Entities in Twitter Objects

Updated on Mon, 2013-12-16 16:46

See also Entities from the Field Guide.

Entities for Tweets

Entities provide structured data from Tweets, including resolved URLs, media, hashtags and mentions, so you do not have to parse the text to extract that information. Entities are included in all Tweet Objects returned by both REST API and Streaming API endpoints, under the entities attribute.

As usual, be tolerant of new fields and empty/null values in all responses. Since REST API v1.1, entities are included by default, but you can omit them by setting the include_entities parameter to false in your requests.
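A tolerant reader can be sketched as follows. This is a minimal illustration in Python, not an official client: the helper name read_entities and the ENTITY_KEYS tuple are assumptions introduced here, and the input is assumed to be a Tweet already decoded from JSON into a dict.

```python
# Entity lists documented for Tweet Objects.
ENTITY_KEYS = ("media", "urls", "user_mentions", "hashtags", "symbols")


def read_entities(tweet: dict) -> dict:
    """Return every known entity list, using [] when it is absent or null.

    Hypothetical helper: unknown keys are ignored, so a new field in the
    response does not break the caller.
    """
    entities = tweet.get("entities") or {}
    return {key: entities.get(key) or [] for key in ENTITY_KEYS}
```

With this shape, a response fetched with include_entities set to false (no entities attribute at all) and a Tweet without media (no media key) both yield empty lists rather than raising a KeyError.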

Below are the details of entities you can find in Tweet Objects: media, urls, user_mentions, hashtags, and symbols.

The media entity

An array of media attached to the Tweet with the Twitter Photo Upload feature. Each media entity comes with the following attributes:

id: The media ID (int format)
id_str: The media ID (string format)
media_url: The URL of the media file (see the sizes attribute for available sizes)
media_url_https: The SSL URL of the media file (see the sizes attribute for available sizes)
url: The media URL that was extracted
display_url: Not a URL but a string to display instead of the media URL
expanded_url: The fully resolved media URL
sizes: The available sizes, which are thumb, small, medium and large. The media_url defaults to medium, but you can retrieve the media in a different size by appending a colon and the size key (for example: http://pbs.twimg.com/media/A7EiDWcCYAAZT1D.jpg:thumb). Each available size comes with three attributes that describe it:
  w: The width (in pixels) of the media in this particular size
  h: The height (in pixels) of the media in this particular size
  resize: How the media was resized to this particular size (crop or fit)
type: The type of media (only photo for now)
indices: The character positions the media was extracted from
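The size suffix described under the sizes attribute can be sketched in Python as below. The function name sized_media_url and its parameters are hypothetical; the only behavior taken from the description above is appending a colon and the size key to media_url or media_url_https.

```python
def sized_media_url(media: dict, size: str = "medium", https: bool = True) -> str:
    """Build the URL of one media entity in the requested size.

    Hypothetical helper: it only accepts size keys that the entity itself
    lists under "sizes", then appends ":<size>" to the media URL.
    """
    sizes = media.get("sizes") or {}
    if size not in sizes:
        raise ValueError(f"size {size!r} is not listed for this media entity")
    base = media["media_url_https"] if https else media["media_url"]
    return f"{base}:{size}"
```

Checking the key against the entity's own sizes map, rather than a hard-coded list, keeps the sketch tolerant of size keys that are added later.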

JSON Example

{
  ...
  "text": "Four more years. http:\/\/t.co\/bAJE6Vom",
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [],
    "user_mentions": [],
    "media": [{
      "id": 266031293949698048,
      "id_str": "266031293949698048",
      "indices": [17, 37],
      "media_url": "http:\/\/pbs.twimg.com\/media\/A7EiDWcCYAAZT1D.jpg",
      "media_url_https": "https:\/\/pbs.twimg.com\/media\/A7EiDWcCYAAZT1D.jpg",
      "url": "http:\/\/t.co\/bAJE6Vom",
      "display_url": "pic.twitter.com\/bAJE6Vom",
      "expanded_url": "http:\/\/twitter.com\/BarackObama\/status\/266031293945503744\/photo\/1",
      "type": "photo",
      "sizes": {
        "medium": {
          "w": 600,
          "h": 399,
          "resize": "fit"
        },
        "thumb": {
          "w": 150,
          "h": 150,
          "resize": "crop"
        },
        "small": {
          "w": 340,
          "h": 226,
          "resize": "fit"
        },
        "large": {
          "w": 800,
          "h": 532,
          "resize": "fit"
        }
      }
    }]
  }
}

The urls entity

An array of URLs extracted from the Tweet text. Each URL entity comes with the following attributes:

url: The t.co URL that was extracted from the Tweet text
display_url: Not a valid URL but a string to display instead of the URL
expanded_url: The resolved URL
indices: The character positions the URL was extracted from

JSON Example

{
  ...
  "text": "Today, Twitter is updating embedded Tweets to enable a richer photo experience: https:\/\/t.co\/XdXRudPXH5",
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [{
      "url": "https:\/\/t.co\/XdXRudPXH5",
      "expanded_url": "https:\/\/blog.twitter.com\/2013\/rich-photo-experience-now-in-embedded-tweets-3",
      "display_url": "blog.twitter.com\/2013\/rich-phot\u2026",
      "indices": [80, 103]
    }],
    "user_mentions": []
  }
}
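One common use of the urls entity is to show display_url in place of the t.co link. The Python sketch below does this with the indices attribute. The function name display_text is hypothetical, and the sketch assumes that indices count Unicode code points, which is how Python indexes str values; if the indices were produced under a different counting scheme, the offsets would need converting first.

```python
def display_text(text: str, urls: list) -> str:
    """Replace each t.co URL in text with its display_url.

    Hypothetical helper. Replacements run from right to left so that the
    indices of entities earlier in the text stay valid after each edit.
    """
    for entity in sorted(urls, key=lambda e: e["indices"][0], reverse=True):
        start, end = entity["indices"]
        replacement = entity.get("display_url") or entity["url"]
        text = text[:start] + replacement + text[end:]
    return text
```

Applied to the example above, the slice from index 80 to 103 is swapped for the display_url string, and the rest of the Tweet text is left untouched.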

The user_mentions entity

An array of Twitter screen names extracted from the Tweet text. Each user mention entity comes with the following attributes:

id: The user ID (int format)
id_str: The user ID (string format)
screen_name: The user screen name
name: The user full name
indices: The character positions the user mention was extracted from

JSON Example

{
  ...
  "text": "We\u2019re excited to work closely with the external technical community and continue @twittereng\u2019s work with open source. cc @TwitterOSS",
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [],
    "user_mentions": [{
      "screen_name": "TwitterEng",
      "name": "Twitter Engineering",
      "id": 6844292,
      "id_str": "6844292",
      "indices": [81, 92]
    }, {
      "screen_name": "TwitterOSS",
      "name": "Twitter Open Source",
      "id": 376825877,
      "id_str": "376825877",
      "indices": [121, 132]
    }]
  }
}

The hashtags entity

An array of hashtags extracted from the Tweet text. Each hashtag entity comes with the following attributes:

text: The hashtag text
indices: The character positions the hashtag was extracted from

JSON Example

{
  ...
  "text": "Loved #devnestSF",
  "entities": {
    "hashtags": [
      {
        "text": "devnestSF",
        "indices": [
          6,
          16
        ]
      }
    ],
    "symbols": [],
    "urls": [],
    "user_mentions": []
  }
}

The symbols entity

An array of financial symbols, starting with the dollar sign, extracted from the Tweet text. As with hashtags, each symbol entity comes with the following attributes:

text: The symbol text
indices: The character positions the symbol was extracted from

JSON Example

{
  ...
  "text": "$PEP or $COKE?",
  "entities": {
    "hashtags": [],
    "symbols": [
      {
        "text": "PEP",
        "indices": [
          0,
          4
        ]
      },
      {
        "text": "COKE",
        "indices": [
          8,
          13
        ]
      }
    ],
    "urls": [],
    "user_mentions": []
  }
}
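In the hashtag and symbol examples, the indices cover the leading # or $ character, while the text attribute does not include it. A small Python sketch makes the relationship explicit. The helper name entity_spans is hypothetical, and the sketch assumes Python's code-point string indexing lines up with the indices.

```python
def entity_spans(text: str, entities: dict, kind: str) -> list:
    """Return the substring of text that each entity of the given kind covers.

    Hypothetical helper: the end index is exclusive, so a plain slice works.
    """
    return [text[start:end]
            for start, end in (e["indices"] for e in entities.get(kind) or [])]


symbols_example = {
    "symbols": [
        {"text": "PEP", "indices": [0, 4]},
        {"text": "COKE", "indices": [8, 13]},
    ]
}
print(entity_spans("$PEP or $COKE?", symbols_example, "symbols"))
# prints ['$PEP', '$COKE']
```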

Entities for Retweets

From the Twitter API perspective, a Retweet is a special kind of Tweet that contains the original Tweet as an embedded retweeted_status object. For consistency, the top-level Retweet object also has a text property and associated entities.

The Retweet text attribute is composed of the original Tweet text with “RT @username: ” prepended. If the display character count then exceeds 140 characters, this text is truncated and an ellipsis “…” is added. Consequently, some top-level entities might be incorrect or missing, for instance in the case of a truncated hashtag entry.

As a result, the best practice is to retrieve the text, entities, original author and date from the original Tweet in retweeted_status whenever this exists.

JSON Example

{
  ...
  "text": "RT @rsarver: Great, in-depth technical post from @raffi on how they re-architected Twitter to be able to handle 500M tweets\/day https:\/\/t.c\u2026",
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [],
    "user_mentions": [{
      "screen_name": "rsarver",
      "name": "Ryan Sarver",
      "id": 795649,
      "id_str": "795649",
      "indices": [3, 11]
    }, {
      "screen_name": "raffi",
      "name": "Raffi Krikorian",
      "id": 8285392,
      "id_str": "8285392",
      "indices": [49, 55]
    }]
  },
  ...
  "retweeted_status": {
    ...
    "text": "Great, in-depth technical post from @raffi on how they re-architected Twitter to be able to handle 500M tweets\/day https:\/\/t.co\/te5ubjHNsZ",
    "entities": {
      "hashtags": [],
      "symbols": [],
      "urls": [{
        "url": "https:\/\/t.co\/te5ubjHNsZ",
        "expanded_url": "https:\/\/blog.twitter.com\/2013\/new-tweets-per-second-record-and-how",
        "display_url": "blog.twitter.com\/2013\/new-tweet\u2026",
        "indices": [115, 138]
      }],
      "user_mentions": [{
        "screen_name": "raffi",
        "name": "Raffi Krikorian",
        "id": 8285392,
        "id_str": "8285392",
        "indices": [36, 42]
      }]
    }
  }
}

In the above example, the URL is truncated in the top-level text and missing from the top-level entities. Note also the additional top-level user_mentions entity that comes from the “RT @rsarver: ” prefix on the text field.

However, the Tweet text and entities in retweeted_status perfectly reflect the original Tweet with no truncation or incorrect entities, hence our recommendation to rely on this nested object for Retweets.
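This recommendation can be sketched in a few lines of Python. The helper name source_tweet is hypothetical; it assumes the Tweet has already been decoded from JSON into a dict.

```python
def source_tweet(tweet: dict) -> dict:
    """Return the embedded original Tweet for a Retweet, else the Tweet itself.

    Hypothetical helper: callers read text and entities from the result,
    so they never see the truncated "RT @username: " form.
    """
    return tweet.get("retweeted_status") or tweet
```

For example, source_tweet(status)["text"] and source_tweet(status)["entities"] give the untruncated text and its matching entities whether or not status is a Retweet.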

Upcoming Change

Please note that we have announced a change: starting January 6, 2014, top-level entities for Retweets will be consistent with those of the original Tweet. We still strongly recommend using entities from the original retweeted_status, but if you are making use of those top-level entities, below is a detailed example of the changes you will see early next year when we start rolling this out.

The following Retweet example illustrates the change.

JSON Extract Before the Change

{
  ...
  "text": "RT @university: Learn more about the powerful #Linux container engine @docker in this video intro with @solomonstre - http:\/\/t.co\/QJLdA1762\u2026",
  "entities": {
    "hashtags": [{
      "text": "Linux",
      "indices": [46, 52]
    }],
    "symbols": [],
    "urls": [],
    "user_mentions": [{
      "screen_name": "university",
      "name": "Twitter University",
      "id": 1665823832,
      "id_str": "1665823832",
      "indices": [3, 14]
    }, {
      "screen_name": "docker",
      "name": "Docker",
      "id": 1138959692,
      "id_str": "1138959692",
      "indices": [70, 77]
    }, {
      "screen_name": "solomonstre",
      "name": "Solomon Hykes",
      "id": 9551792,
      "id_str": "9551792",
      "indices": [103, 115]
    }]
  },
  ...
  "retweeted_status": {
    ...
    "text": "Learn more about the powerful #Linux container engine @docker in this video intro with @solomonstre - http:\/\/t.co\/QJLdA1762Y @TwitterOSS",
    "entities": {
      "hashtags": [{
        "text": "Linux",
        "indices": [30, 36]
      }],
      "symbols": [],
      "urls": [{
        "url": "http:\/\/t.co\/QJLdA1762Y",
        "expanded_url": "http:\/\/youtu.be\/Q5POuMHxW-0",
        "display_url": "youtu.be\/Q5POuMHxW-0",
        "indices": [102, 124]
      }],
      "user_mentions": [{
        "screen_name": "docker",
        "name": "Docker",
        "id": 1138959692,
        "id_str": "1138959692",
        "indices": [54, 61]
      }, {
        "screen_name": "solomonstre",
        "name": "Solomon Hykes",
        "id": 9551792,
        "id_str": "9551792",
        "indices": [87, 99]
      }, {
        "screen_name": "TwitterOSS",
        "name": "Twitter Open Source",
        "id": 376825877,
        "id_str": "376825877",
        "indices": [125, 136]
      }]
    }
  }
}

Details of the Change

Retweet Entities

Since the URL was truncated in the top-level text, previously the URL entity would be missing. With this change, the URL entity will be included and its indices will start at the beginning of the URL text, but actually end at the ellipsis character. The URL entity will contain the entire shortened URL, even though that URL is not fully contained in the Retweet text.

The @TwitterOSS user mention entity was also missing since it was entirely truncated. It will now be included but will only reference the ellipsis character at indices [139, 140]. Please also note that entities are not guaranteed to be ordered by their indices.
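Because no ordering is guaranteed, code that walks entities in text order should sort them itself. The Python sketch below is one way to do that for Tweet entities. The function name entities_in_text_order is hypothetical, and entries without an indices attribute are skipped as a defensive assumption.

```python
def entities_in_text_order(entities: dict) -> list:
    """Flatten Tweet entities into (kind, entity) pairs sorted by start index.

    Hypothetical helper: anything that is not a list of dicts carrying
    "indices" is skipped rather than treated as an error.
    """
    flat = []
    for kind, items in entities.items():
        if not isinstance(items, list):
            continue
        for entity in items:
            if isinstance(entity, dict) and "indices" in entity:
                flat.append((kind, entity))
    return sorted(flat, key=lambda pair: pair[1]["indices"][0])
```

After the change, a truncated entity such as the @TwitterOSS mention sorts last because its indices point at the ellipsis character.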

JSON Extract After the Change

{
  ...
  "text": "RT @university: Learn more about the powerful #Linux container engine @docker in this video intro with @solomonstre - http:\/\/t.co\/QJLdA1762\u2026",
  "entities": {
    "hashtags": [{
      "text": "Linux",
      "indices": [46, 52]
    }],
    "symbols": [],
    "urls": [{
      "url": "http:\/\/t.co\/QJLdA1762Y",
      "expanded_url": "http:\/\/youtu.be\/Q5POuMHxW-0",
      "display_url": "youtu.be\/Q5POuMHxW-0",
      "indices": [118, 140]
    }],
    "user_mentions": [{
      "screen_name": "university",
      "name": "Twitter University",
      "id": 1665823832,
      "id_str": "1665823832",
      "indices": [3, 14]
    }, {
      "screen_name": "docker",
      "name": "Docker",
      "id": 1138959692,
      "id_str": "1138959692",
      "indices": [70, 77]
    }, {
      "screen_name": "solomonstre",
      "name": "Solomon Hykes",
      "id": 9551792,
      "id_str": "9551792",
      "indices": [103, 115]
    }, {
      "screen_name": "TwitterOSS",
      "name": "Twitter Open Source",
      "id": 376825877,
      "id_str": "376825877",
      "indices": [139, 140]
    }]
  },
  ...
  "retweeted_status": {
    ...
    "text": "Learn more about the powerful #Linux container engine @docker in this video intro with @solomonstre - http:\/\/t.co\/QJLdA1762Y @TwitterOSS",
    "entities": {
      "hashtags": [{
        "text": "Linux",
        "indices": [30, 36]
      }],
      "symbols": [],
      "urls": [{
        "url": "http:\/\/t.co\/QJLdA1762Y",
        "expanded_url": "http:\/\/youtu.be\/Q5POuMHxW-0",
        "display_url": "youtu.be\/Q5POuMHxW-0",
        "indices": [102, 124]
      }],
      "user_mentions": [{
        "screen_name": "docker",
        "name": "Docker",
        "id": 1138959692,
        "id_str": "1138959692",
        "indices": [54, 61]
      }, {
        "screen_name": "solomonstre",
        "name": "Solomon Hykes",
        "id": 9551792,
        "id_str": "9551792",
        "indices": [87, 99]
      }, {
        "screen_name": "TwitterOSS",
        "name": "Twitter Open Source",
        "id": 376825877,
        "id_str": "376825877",
        "indices": [125, 136]
      }]
    }
  }
}

Entities for Users

Entities for User Objects describe URLs that appear in the user-defined profile URL and description fields. They do not describe hashtags or user_mentions. Unlike Tweet entities, user entities can apply to multiple fields within their parent object. To disambiguate, you will find parent nodes called url and description that indicate which field contains the entitized URL.

In this example, the user url field contains a t.co link that is fully expanded within the entities/url/urls[0] node of the response. The user does not have a wrapped URL in their description.

JSON Example

{
  "id": 6253282,
  "id_str": "6253282",
  "name": "Twitter API",
  "screen_name": "twitterapi",
  "location": "San Francisco, CA",
  "description": "The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and our API. Don't get an answer? It's on my website.",
  "url": "http:\/\/t.co\/78pYTvWfJd",
  "entities": {
    "url": {
      "urls": [{
        "url": "http:\/\/t.co\/78pYTvWfJd",
        "expanded_url": "http:\/\/dev.twitter.com",
        "display_url": "dev.twitter.com",
        "indices": [0, 22]
      }]
    },
    "description": {
      "urls": []
    }
  }
  ...
}
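Reading the expanded profile URL from this nested structure can be sketched in Python as follows. The helper name expanded_profile_url is hypothetical, and falling back to the wrapped url when no matching entity exists is a design choice of this sketch, not documented behavior.

```python
def expanded_profile_url(user: dict):
    """Return the expanded form of the user's profile URL, if available.

    Hypothetical helper: looks under entities -> url -> urls for the entry
    whose "url" matches the user's url field, and falls back to that field.
    """
    url_node = (user.get("entities") or {}).get("url") or {}
    for entity in url_node.get("urls") or []:
        if entity.get("url") == user.get("url"):
            return entity.get("expanded_url") or entity["url"]
    return user.get("url")
```

URLs in the description field would be read the same way from the description node, using their indices to locate them in the description text.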

Entities for Direct Messages

Entities for Direct Messages are very similar to entities for Tweets. However, there are a few differences concerning the media entities.

Unlike media shared in Tweets, media shared in Direct Messages requires authorization to view. This authorization can be presented via an authenticated twitter.com session or by signing a request with the user's access token using OAuth 1.0a.

Also, in Tweets, media URLs appear only in the media entities, but in Direct Messages, media URLs appear in both the media and urls entities.

JSON Example

{
    "id": 411031503817039874,
    "id_str": "411031503817039874",
    "text": "test $TWTR @twitterapi #hashtag http:\/\/t.co\/p5dOtmnZyu https:\/\/t.co\/ZSvIEMOPb8",
    "created_at": "Thu Dec 12 07:15:21 +0000 2013",
    "entities": {
        "hashtags": [{
            "text": "hashtag",
            "indices": [23, 31]
        }],
        "symbols": [{
            "text": "TWTR",
            "indices": [5, 10]
        }],
        "urls": [{
            "url": "http:\/\/t.co\/p5dOtmnZyu",
            "expanded_url": "http:\/\/dev.twitter.com",
            "display_url": "dev.twitter.com",
            "indices": [32, 54]
        }, {
            "url": "https:\/\/t.co\/ZSvIEMOPb8",
            "expanded_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
            "display_url": "pic.twitter.com\/ZSvIEMOPb8",
            "indices": [55, 78]
        }],
        "user_mentions": [{
            "screen_name": "twitterapi",
            "name": "Twitter API",
            "id": 6253282,
            "id_str": "6253282",
            "indices": [11, 22]
        }],
        "media": [{
            "id": 411031503833792512,
            "id_str": "411031503833792512",
            "indices": [55, 78],
            "media_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
            "media_url_https": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
            "url": "https:\/\/t.co\/ZSvIEMOPb8",
            "display_url": "pic.twitter.com\/ZSvIEMOPb8",
            "expanded_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
            "type": "photo",
            "sizes": {
                "medium": {
                    "w": 600,
                    "h": 450,
                    "resize": "fit"
                },
                "large": {
                    "w": 1024,
                    "h": 768,
                    "resize": "fit"
                },
                "thumb": {
                    "w": 150,
                    "h": 150,
                    "resize": "crop"
                },
                "small": {
                    "w": 340,
                    "h": 255,
                    "resize": "fit"
                }
            }
        }]
    }
    ...
}
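Because a Direct Message lists each media URL under both media and urls, code that renders links and photos separately may want to drop the duplicates. The Python sketch below shows one way to do so. The helper name non_media_urls is hypothetical, and matching on the wrapped url attribute is an assumption based on the example above, where both entries share the same t.co URL.

```python
def non_media_urls(entities: dict) -> list:
    """Return the urls entities that do not duplicate a media entity.

    Hypothetical helper: an entry is treated as a duplicate when its "url"
    equals the "url" of any entry in the media list.
    """
    media_urls = {m["url"] for m in entities.get("media") or []}
    return [u for u in entities.get("urls") or [] if u["url"] not in media_urls]
```

For the message above, this leaves only the dev.twitter.com link, while the photo remains available through the media entity.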

Note that it's currently not possible to attach media to direct messages with POST direct_messages/new.