Tweet Entities

Updated on Thu, 2013-10-17 07:48

See also Entities from the Field Guide.

Why Tweet Entities?

Tweet text can mention other users or lists, and it can also contain URLs, media, hashtags, and more. Instead of parsing the text yourself to extract those entities, you can use the entities attribute, which contains this data already parsed and structured.

How can I use these Tweet Entities?

As usual, it is important to be tolerant of new fields and empty/null values in all responses. Note also that:

  • With the REST API v1, you need to set the include_entities parameter to 1 (or true) for entities to be included. In API v1.1, entities are always included unless you set include_entities to 0 (or false).
  • With the Streaming API, entities are automatically included.
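The advice about tolerating missing or null values can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper name entity_list and the sample dictionaries are hypothetical, and the code assumes a Tweet has already been decoded from JSON into a dict.

```python
def entity_list(tweet, kind):
    """Return the entities of the given kind, or [] if they are absent or null.

    Hypothetical helper: it tolerates a missing "entities" attribute
    (for example, REST API v1 without include_entities), a null value,
    and entity kinds that this code has never seen.
    """
    entities = tweet.get("entities") or {}
    return entities.get(kind) or []


# Entities were not requested, or the attribute is null.
print(entity_list({"text": "hello"}, "urls"))                      # []
print(entity_list({"text": "hello", "entities": None}, "urls"))    # []

# Entities are present; unknown kinds are simply empty.
tweet = {"text": "Loved #devnestSF",
         "entities": {"hashtags": [{"text": "devnestSF", "indices": [6, 16]}]}}
print(entity_list(tweet, "hashtags"))
print(entity_list(tweet, "some_future_kind"))                      # []
```

Routing every lookup through one helper like this keeps the rest of the code free of repeated null checks.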

The media entity

An array of media attached to the Tweet with the new Twitter Photo Upload feature. Each media entity comes with the following attributes:

id: The media ID (int format)
id_str: The media ID (string format)
media_url: The URL of the media file (see the sizes attribute for available sizes)
media_url_https: The SSL URL of the media file (see the sizes attribute for available sizes)
url: The media URL that was extracted
display_url: Not a URL but a string to display instead of the media URL
expanded_url: The fully resolved media URL
sizes: We support different sizes: thumb, small, medium and large. The media_url defaults to medium, but you can retrieve the media in a different size by appending a colon and the size key (for example: http://p.twimg.com/ARACoSZs_QA8BDB.jpg:thumb). Each available size comes with three attributes that describe it:
  • w: the width (in pixels) of the media in this particular size
  • h: the height (in pixels) of the media in this particular size
  • resize: how we resized the media to this particular size (can be crop or fit)
type: Only photo for now
indices: The character positions the media was extracted from
JSON example:
    "text": "#Photos on Twitter: taking flight http://t.co/qbJx26r",
    "entities": {
      "media": [
        {
          "id": 76360760611180544,
          "id_str": "76360760611180544",
          "media_url": "http://p.twimg.com/AQ9JtQsCEAA7dEN.jpg",
          "media_url_https": "https://p.twimg.com/AQ9JtQsCEAA7dEN.jpg",
          "url": "http://t.co/qbJx26r",
          "display_url": "pic.twitter.com/qbJx26r",
          "expanded_url": "http://twitter.com/twitter/status/76360760606986241/photo/1",
          "sizes": {
            "large": {
              "w": 700,
              "resize": "fit",
              "h": 466
            },
            "medium": {
              "w": 600,
              "resize": "fit",
              "h": 399
            },
            "small": {
              "w": 340,
              "resize": "fit",
              "h": 226
            },
            "thumb": {
              "w": 150,
              "resize": "crop",
              "h": 150
            }
          },
          "type": "photo",
          "indices": [
            34,
            53
          ]
        }
      ],
      "urls": [],
      "user_mentions": [],
      "hashtags": []
    }
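The rule of appending a colon and a size key to the media URL could be applied like this in Python. The function name media_url_for_size is hypothetical, and the media dictionary is a trimmed copy of the media entity in the example above.

```python
def media_url_for_size(media, size="medium", https=False):
    """Build the URL of a media entity in one of its available sizes.

    Hypothetical helper: it appends ":<size>" to media_url (or to
    media_url_https) and rejects sizes the entity does not list.
    """
    sizes = media.get("sizes") or {}
    if size not in sizes:
        raise ValueError(f"size {size!r} is not available for this media")
    base = media["media_url_https"] if https else media["media_url"]
    return f"{base}:{size}"


# Trimmed copy of the media entity from the JSON example.
media = {
    "media_url": "http://p.twimg.com/AQ9JtQsCEAA7dEN.jpg",
    "media_url_https": "https://p.twimg.com/AQ9JtQsCEAA7dEN.jpg",
    "sizes": {
        "large": {"w": 700, "resize": "fit", "h": 466},
        "medium": {"w": 600, "resize": "fit", "h": 399},
        "small": {"w": 340, "resize": "fit", "h": 226},
        "thumb": {"w": 150, "resize": "crop", "h": 150},
    },
}

print(media_url_for_size(media, "thumb"))
print(media_url_for_size(media, "large", https=True))
```

Checking the requested key against sizes first avoids building a URL for a size the entity does not advertise.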

The urls entity

An array of URLs extracted from the Tweet text. Each URL entity comes with the following attributes:

url: The URL that was extracted
display_url (only for t.co links): Not a URL but a string to display instead of the URL
expanded_url (only for t.co links): The fully resolved URL
indices: The character positions the URL was extracted from
JSON example:
    "text": "Twitter for Mac is now easier and faster, and you can open multiple windows at once http://t.co/0JG5Mcq",
    "entities": {
      "media": [],
      "urls": [
        {
          "url": "http://t.co/0JG5Mcq",
          "display_url": "blog.twitter.com/2011/05/twitte…",
          "expanded_url": "http://blog.twitter.com/2011/05/twitter-for-mac-update.html",
          "indices": [
            84,
            103
          ]
        }
      ],
      "user_mentions": [],
      "hashtags": []
    }
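One way to use the indices and display_url attributes together is to swap each extracted URL for its display string. The Python sketch below uses a hypothetical function name, display_text, and assumes that indices count characters the same way Python string slicing does. It works from the last URL to the first so that earlier positions stay valid after each replacement.

```python
def display_text(tweet):
    """Return the Tweet text with each URL replaced by its display_url.

    Hypothetical helper: it falls back to the extracted url when
    display_url is missing (it is only provided for t.co links).
    """
    text = tweet["text"]
    urls = (tweet.get("entities") or {}).get("urls") or []
    # Replace from right to left so earlier indices are not shifted.
    for entity in sorted(urls, key=lambda e: e["indices"][0], reverse=True):
        start, end = entity["indices"]
        replacement = entity.get("display_url") or entity["url"]
        text = text[:start] + replacement + text[end:]
    return text


# Trimmed copy of the Tweet from the JSON example.
tweet = {
    "text": "Twitter for Mac is now easier and faster, and you can open "
            "multiple windows at once http://t.co/0JG5Mcq",
    "entities": {
        "urls": [
            {
                "url": "http://t.co/0JG5Mcq",
                "display_url": "blog.twitter.com/2011/05/twitte…",
                "expanded_url": "http://blog.twitter.com/2011/05/twitter-for-mac-update.html",
                "indices": [84, 103],
            }
        ]
    },
}

print(display_text(tweet))
```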

The user_mentions entity

An array of Twitter screen names extracted from the Tweet text. Each User entity comes with the following attributes:

id: The User ID (int format)
id_str: The User ID (string format)
screen_name: The User screen name
name: The User's full name
indices: The character positions the User mention was extracted from
JSON example:
    "text": "@rno Et demi!",
    "entities": {
      "media": [],
      "urls": [],
      "user_mentions": [
        {
          "id": 22548447,
          "id_str": "22548447",
          "screen_name": "rno",
          "name": "Arnaud Meunier",
          "indices": [
            0,
            4
          ]
        }
      ],
      "hashtags": []
    }
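As a small illustration of reading user mentions, the Python sketch below collects who was mentioned along with the exact text each mention covers. The function name mentions and the shape of its result are hypothetical choices made for this example.

```python
def mentions(tweet):
    """List the users mentioned in a Tweet with the text each mention spans.

    Hypothetical helper: it reads id_str, screen_name and indices from
    every user_mentions entity and slices the Tweet text accordingly.
    """
    text = tweet["text"]
    found = []
    for mention in (tweet.get("entities") or {}).get("user_mentions") or []:
        start, end = mention["indices"]
        found.append({
            "id_str": mention["id_str"],
            "screen_name": mention["screen_name"],
            "matched_text": text[start:end],
        })
    return found


# Trimmed copy of the Tweet from the JSON example.
tweet = {
    "text": "@rno Et demi!",
    "entities": {
        "user_mentions": [
            {"id": 22548447, "id_str": "22548447", "screen_name": "rno",
             "name": "Arnaud Meunier", "indices": [0, 4]}
        ]
    },
}

print(mentions(tweet))
```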

The hashtags entity

An array of hashtags extracted from the Tweet text. Each Hashtag entity comes with the following attributes:

text: The Hashtag text
indices: The character positions the Hashtag was extracted from
JSON example:
    "text": "Loved #devnestSF",
    "entities": {
      "media": [],
      "urls": [],
      "user_mentions": [],
      "hashtags": [
        {
          "text": "devnestSF",
          "indices": [
            6,
            16
          ]
        }
      ]
    }

The symbols entity

An array of financial symbols (each starting with a dollar sign) extracted from the Tweet text. As with hashtags, each symbol entity comes with the following attributes:

text: The symbol text
indices: The character positions the symbol was extracted from
JSON example:
    "text": "$PEP or $COKE?",
    "entities": {
      "hashtags": [],
      "symbols": [
        {
          "text": "PEP",
          "indices": [
            0,
            4
          ]
        },
        {
          "text": "COKE",
          "indices": [
            8,
            13
          ]
        }
      ],
      "urls": [],
      "user_mentions": []
    }
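Because every entity type carries an indices attribute, all entities of a Tweet can be gathered into a single list ordered by position. The Python sketch below is one possible approach. The function name entity_spans is hypothetical, and the code skips anything that lacks a two-element indices list so that unknown or malformed entries do not break it.

```python
def entity_spans(tweet):
    """Return (start, end, kind, matched_text) tuples sorted by position.

    Hypothetical helper: it walks every entity kind present in the Tweet,
    including kinds this code does not know about, and slices the text
    with each entity's indices.
    """
    text = tweet.get("text") or ""
    entities = tweet.get("entities") or {}
    spans = []
    for kind, items in entities.items():
        for item in items or []:
            indices = item.get("indices") if isinstance(item, dict) else None
            if not indices or len(indices) != 2:
                continue  # tolerate entries without usable positions
            start, end = indices
            spans.append((start, end, kind, text[start:end]))
    return sorted(spans)


# Copy of the Tweet from the JSON example.
tweet = {
    "text": "$PEP or $COKE?",
    "entities": {
        "hashtags": [],
        "symbols": [
            {"text": "PEP", "indices": [0, 4]},
            {"text": "COKE", "indices": [8, 13]},
        ],
        "urls": [],
        "user_mentions": [],
    },
}

for span in entity_spans(tweet):
    print(span)
```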