Hello again 👋, welcome to the second part of my series on understanding Twitter user behavior.

Before we dive deeper into the code, I want you to understand how I envision the characteristics of a user’s behavior.

First of all, there are three main aspects:

1. temporal 🕰️ - how often does the user open Twitter and how long does a session last?
2. publicly visible 📻 - how often does a user post a tweet or follow someone?
3. publicly invisible 🕵️‍♀️ - how many tweets does a user scroll past, how often do they click on images, videos, etc.?

For each of those three aspects I assigned some quantifiable characteristics. The result looks like this:

| Temporal                      | Publicly Visible             | Publicly Invisible            |
|-------------------------------|------------------------------|-------------------------------|
| Sessions Per Day              | Posting Prob.                | Scroll Timeline Prob.         |
| Session Length                | Like Prob.                   | Click On Media Prob.          |
| Session Distribution Over Day | Retweet Prob.                | Open Details Prob.            |
|                               | Follow After Searching Prob. | Click On Author Profile Prob. |
|                               | Follow From Tweet Prob.      | Click On Hashtag Prob.        |

So the next step is to collect data from real users that will allow me to assign values to each of these cells. To do this, the Android app and the browser extension have to send events that match this Kotlin data class:

data class Event(
    val eventType: EventType,
    val userId: String,
    val action: String,
    val timestamp: String,
    val target: String? = null,
    val selector: String? = null,
    val scrollPosition: Int? = null,
    val estimatedTweetsScrolled: Int? = null
) {
    enum class EventType {
        BROWSER,
        ANDROID
    }
}

As you can see, some of the properties are optional, depending on the value of action, which is either scroll, click, session_start or session_end. Note that this is the data class used in the backend, which is only responsible for writing into the Elasticsearch instance, so I didn’t need to refine it any further. The Android app, however, uses a more complex class hierarchy to facilitate event publishing. More on this in a later post.
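To make the relationship between action and the optional properties more concrete, here is a minimal sketch of a check the backend could run on incoming events. The mapping is my reading of the two JSON examples in this post (click events carry target and selector, scroll events carry scrollPosition and estimatedTweetsScrolled, session events carry neither) and is an assumption, not a documented rule. The missingFields helper is hypothetical and not part of the actual backend.

```kotlin
// Copy of the Event class from this post, so the sketch compiles on its own.
data class Event(
    val eventType: EventType,
    val userId: String,
    val action: String,
    val timestamp: String,
    val target: String? = null,
    val selector: String? = null,
    val scrollPosition: Int? = null,
    val estimatedTweetsScrolled: Int? = null
) {
    enum class EventType { BROWSER, ANDROID }
}

// Hypothetical helper: lists the optional properties that are absent
// although the event's action is assumed to require them.
fun missingFields(event: Event): List<String> = when (event.action) {
    // Assumption: clicks always describe what was clicked and where.
    "click" -> listOfNotNull(
        "target".takeIf { event.target == null },
        "selector".takeIf { event.selector == null }
    )
    // Assumption: scrolls always report position and estimated tweet count.
    "scroll" -> listOfNotNull(
        "scrollPosition".takeIf { event.scrollPosition == null },
        "estimatedTweetsScrolled".takeIf { event.estimatedTweetsScrolled == null }
    )
    // Session markers need nothing beyond the mandatory properties.
    "session_start", "session_end" -> emptyList()
    else -> throw IllegalArgumentException("Unknown action: ${event.action}")
}
```

An empty result means the event is complete under these assumptions; anything else names the properties that should have been sent along.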

A like event in the browser would look like this:

{
    "event_type": "BROWSER",
    "user_id": "b92427ae41e4649b934ca495991b7852b855...",
    "action": "click",
    "timestamp": "2021-01-24T16:11:36.822Z",
    "target": "like",
    "selector": "# tweet-action-buttons > div:nth-child(3) > div > div > div > div"
}

… whereas a scroll event on Android would look similar to this:

{
    "event_type": "ANDROID",
    "user_id": "3a4cab1a6f566207d520622e0bbce78d7...",
    "action": "scroll",
    "timestamp": "2021-01-24T13:29:33.853",
    "scroll_position": 5938,
    "estimated_tweets_scrolled": 5
}

With this information it is later possible to identify sessions, probabilities per session and probabilities per tweet. Thus, in the end, automated software that tries to mimic real users can be fed with these values.
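As a rough illustration of that later step, the sketch below pairs session_start and session_end events per user and derives a like probability per tweet from the result. Everything in it is an assumption made for the example: the Session class and all function names are hypothetical, timestamps without a zone suffix (as in the Android example) are treated as UTC, and "like probability per tweet" is defined here simply as likes divided by estimated tweets scrolled. The actual evaluation may well look different.

```kotlin
import java.time.Duration
import java.time.Instant
import java.time.LocalDateTime
import java.time.ZoneOffset
import java.time.format.DateTimeParseException

// Trimmed-down copy of the Event class (only the properties used here).
data class Event(
    val userId: String,
    val action: String,
    val timestamp: String,
    val target: String? = null,
    val estimatedTweetsScrolled: Int? = null
)

// Hypothetical summary of one session.
data class Session(val userId: String, val length: Duration, val likes: Int, val tweetsScrolled: Int)

// Assumption: timestamps without a zone suffix are UTC.
fun parseTimestamp(raw: String): Instant = try {
    Instant.parse(raw)
} catch (e: DateTimeParseException) {
    LocalDateTime.parse(raw).toInstant(ZoneOffset.UTC)
}

// Walks through each user's events in chronological order and closes a
// session whenever a session_end follows a session_start.
fun buildSessions(events: List<Event>): List<Session> {
    val sessions = mutableListOf<Session>()
    for ((userId, userEvents) in events.groupBy { it.userId }) {
        var start: Instant? = null
        var likes = 0
        var scrolled = 0
        for (event in userEvents.sortedBy { parseTimestamp(it.timestamp) }) {
            when (event.action) {
                "session_start" -> {
                    start = parseTimestamp(event.timestamp)
                    likes = 0
                    scrolled = 0
                }
                "click" -> {
                    if (event.target == "like") likes += 1
                }
                "scroll" -> {
                    scrolled += event.estimatedTweetsScrolled ?: 0
                }
                "session_end" -> {
                    val begin = start
                    if (begin != null) {
                        val length = Duration.between(begin, parseTimestamp(event.timestamp))
                        sessions += Session(userId, length, likes, scrolled)
                    }
                    start = null
                }
            }
        }
    }
    return sessions
}

// Likes divided by estimated tweets scrolled, across all given sessions.
fun likeProbabilityPerTweet(sessions: List<Session>): Double {
    val tweets = sessions.sumOf { it.tweetsScrolled }
    return if (tweets == 0) 0.0 else sessions.sumOf { it.likes }.toDouble() / tweets
}
```

The other cells of the table could be filled in the same way, for example by counting clicks with a different target or by averaging the session lengths per user.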

Those are the basics to get an idea of what the whole infrastructure is about. In the next posts, we will look at the individual components, starting with the browser extension.

For those who haven’t read the first post: here is the code involved.

This post is also available on DEV.