Introduction to Tweet JSON

All Twitter APIs that return Tweets provide that data encoded using JavaScript Object Notation (JSON). JSON is based on key-value pairs, with named attributes and associated values. These attributes and their values are used to describe objects.

At Twitter we serve many objects as JSON, including Tweets and Users. These objects all encapsulate core attributes that describe the object. Each Tweet has an author, a message, a unique ID, a timestamp of when it was posted, and sometimes geo metadata shared by the user. Each User has a Twitter name, an ID, a number of followers, and most often an account bio.

With each Tweet we also generate "entity" objects, which are arrays of common Tweet contents such as hashtags, mentions, media, and links. If there are links, the JSON payload can also provide metadata such as the fully unwound URL and the webpage’s title and description.

So, in addition to the text content itself, a Tweet can have over 150 attributes associated with it. The following JSON for an example Tweet illustrates the structure of these objects and some of their attributes:


{
  "created_at": "Thu Apr 06 15:24:15 +0000 2017",
  "id_str": "850006245121695744",
  "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
  "user": {
    "id": 2244994945,
    "name": "Twitter Dev",
    "screen_name": "TwitterDev",
    "location": "Internet",
    "url": "https:\/\/dev.twitter.com\/",
    "description": "Your official source for Twitter Platform news, updates & events. Need technical help? Visit https:\/\/twittercommunity.com\/ \u2328\ufe0f #TapIntoTwitter"
  },
  "place": {   
  },
  "entities": {
    "hashtags": [      
    ],
    "urls": [
      {
        "url": "https:\/\/t.co\/XweGngmxlP",
        "unwound": {
          "url": "https:\/\/cards.twitter.com\/cards\/18ce53wgo4h\/3xo1c",
          "title": "Building the Future of the Twitter API Platform"
        }
      }
    ],
    "user_mentions": [     
    ]
  }
}
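As a minimal sketch of reading a payload like the one above, the Python below parses a trimmed copy of the example with the standard-library json module and pulls out a few root-level and child attributes. The "created_at" format string is an assumption inferred from the sample timestamp, and the variable names are illustrative only.

```python
import json
from datetime import datetime

# Trimmed copy of the example Tweet above (only a few attributes kept).
# A raw string is used so the JSON escapes (\/, \u2019, \n) reach the parser intact.
raw = r'''
{
  "created_at": "Thu Apr 06 15:24:15 +0000 2017",
  "id_str": "850006245121695744",
  "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
  "user": {"id": 2244994945, "screen_name": "TwitterDev"},
  "entities": {
    "hashtags": [],
    "urls": [
      {
        "url": "https:\/\/t.co\/XweGngmxlP",
        "unwound": {"title": "Building the Future of the Twitter API Platform"}
      }
    ]
  }
}
'''

tweet = json.loads(raw)

# Root-level attributes.
tweet_id = tweet["id_str"]
# Assumed format, inferred from the sample value; relies on English day/month names.
posted = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")

# Child objects: the author and any unwound link metadata.
author = tweet["user"]["screen_name"]
link_titles = [u["unwound"]["title"] for u in tweet["entities"]["urls"] if "unwound" in u]

print(tweet_id, author, posted.isoformat(), link_titles)
```

Reading the ID from "id_str" rather than a numeric field keeps it as a string, which avoids precision problems in environments that store large integers as floating-point numbers.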






 

Fundamental Twitter objects

Tweet objects

When ingesting Tweet data, the main object is the Tweet object, which is a parent object to several child objects. For example, all Tweets include a User object that describes who authored the Tweet. If the Tweet is geo-tagged, there will be a "place" object included. Every Tweet includes an "entities" object that encapsulates arrays of hashtags, user mentions, URLs, cashtags, and native media. If the Tweet has any ‘attached’ or ‘native’ media (photos, video, animated GIF), there will be an "extended_entities" object.

The JSON below illustrates some structural fundamentals of Tweet objects. Here we are showing some core Tweet attributes on the 'root-level' (or top level), including when the Tweet was posted, its unique ID, and the Tweet message. Also on this root-level are other fundamental (child) objects such as the "user" and "entities" objects.


{
  "created_at": "Thu May 10 15:24:15 +0000 2018",
  "id_str": "850006245121695744",
  "text": "Here is the Tweet message.",
  "user": {
  },
  "place": {
  },
  "entities": {
  },
  "extended_entities": {
  }
}





Extended Tweets

JSON that describes Extended Tweets was introduced when 280-character Tweets were launched in November 2017. Tweet JSON was extended to encapsulate these longer messages, while not breaking the thousands of apps parsing these fundamental Twitter objects. To provide full backward compatibility, the original 140-character 'text' field, and the entity objects parsed from that, were retained. In the case of Tweets longer than 140 characters, this root-level 'text' field would become truncated and thus incomplete. Since the root-level 'entities' objects contain arrays of key metadata parsed from the 'text' message, such as included hashtags and links, these collections would be incomplete. For example, if a Tweet message was 200 characters long, with a hashtag included at the end, the legacy root-level 'entities.hashtags' array would not include it. 

A new 'extended_tweet' field was introduced to hold the longer Tweet messages and complete entity metadata. The "extended_tweet" object provides the "full_text" field that contains the complete, untruncated Tweet message when longer than 140 characters. The "extended_tweet" object also contains an "entities" object with complete arrays of hashtags, links, mentions, etc.

Extended Tweets are identified with a root-level "truncated" boolean. When true ("truncated": true), the "extended_tweet" fields should be parsed instead of the root-level fields.

Note in the JSON example below that the root-level "text" field is truncated and the root-level "entities.hashtags" array is empty even though the Tweet message includes three hashtags. Since this is an Extended Tweet, the "truncated" field is set to true, and the "extended_tweet" object provides complete "full_text" and "entities" Tweet metadata.


{
	"created_at": "Thu May 10 17:41:57 +0000 2018",
	"id_str": "994633657141813248",
	"text": "Just another Extended Tweet with more than 140 characters, generated as a documentation example, showing that [\"tru… https://t.co/U7Se4NM7Eu",
	"display_text_range": [0, 140],
	"truncated": true,
	"user": {
		"id_str": "944480690",
		"screen_name": "FloodSocial"
	},
	"extended_tweet": {
		"full_text": "Just another Extended Tweet with more than 140 characters, generated as a documentation example, showing that [\"truncated\": true] and the presence of an \"extended_tweet\" object with complete text and \"entities\" #documentation #parsingJSON #GeoTagged https://t.co/e9yhQTJSIA",
		"display_text_range": [0, 249],
		"entities": {
			"hashtags": [{
				"text": "documentation",
				"indices": [211, 225]
			}, {
				"text": "parsingJSON",
				"indices": [226, 238]
			}, {
				"text": "GeoTagged",
				"indices": [239, 249]
			}]
		}

	},
	"entities": {
		"hashtags": []
	}
}
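The rule described above (prefer the "extended_tweet" fields whenever "truncated" is true) can be sketched in Python as follows. The helper name full_text_and_entities is hypothetical, and the sample dictionary is a shortened stand-in for the example above, not a real payload.

```python
def full_text_and_entities(tweet):
    """Return (text, entities), preferring "extended_tweet" when the Tweet is truncated."""
    extended = tweet.get("extended_tweet")
    if tweet.get("truncated") and extended:
        return extended.get("full_text", ""), extended.get("entities") or {}
    # Not an Extended Tweet: the root-level fields are already complete.
    return tweet.get("text", ""), tweet.get("entities") or {}


# Shortened stand-in for the Extended Tweet example above.
example = {
    "text": "Just another Extended Tweet with more than 140 characters",
    "truncated": True,
    "extended_tweet": {
        "full_text": "Just another Extended Tweet #documentation #parsingJSON #GeoTagged",
        "entities": {
            "hashtags": [
                {"text": "documentation"},
                {"text": "parsingJSON"},
                {"text": "GeoTagged"},
            ]
        },
    },
    "entities": {"hashtags": []},
}

text, entities = full_text_and_entities(example)
hashtags = [tag["text"] for tag in entities.get("hashtags", [])]
print(hashtags)
```

Returning the text and entities together keeps the two in step, so hashtag data is never read from the root level while the message is read from "extended_tweet".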





 

Retweets and Quote Tweets

If you are working with Retweet or Quote Tweet objects, then that JSON payload will contain multiple Tweet objects, and each Tweet object will contain its own User object. The root-level object will contain information on the type of action taken, i.e. whether it is a Retweet or a Quote Tweet, and will contain an object that describes the 'original' Tweet being shared. 

Retweets

Retweets always contain two Tweet objects. The 'original' Tweet being Retweeted is provided in a "retweeted_status" object. The root-level object encapsulates the Retweet itself, including a User object for the account taking the Retweet action and the time of the Retweet. Retweeting is an action to share a Tweet with your followers, and no other new content can be added. Also, a (new) location cannot be provided with a Retweet. While the 'original' Tweet may have been geo-tagged, the Retweet "geo" and "place" objects will always be null.

Even before the introduction of Extended Tweets, the root-level "entities" object was in some cases truncated and incomplete due to the "RT @username " string being prepended to the Tweet message being Retweeted. Note that if a Retweet gets Retweeted, the "retweeted_status" will still point to the original Tweet, meaning the intermediate Retweet is not included. Similar behavior is seen when using twitter.com to 'display' a Retweet. If you copy the unique Tweet ID assigned to the Retweet 'action', the original Tweet is displayed.

Below is an example structure for a Retweet. Again, when parsing Retweets, it is key to parse the "retweeted_status" object for complete (original) Tweet message and entity metadata.


{
  "tweet": {
    "text": "RT @author original message",
    "user": {
      "screen_name": "Retweeter"
    },
    "retweeted_status": {
      "text": "original message",
      "user": {
        "screen_name": "OriginalTweeter"
      },
      "place": {
      },
      "entities": {
      },
      "extended_entities": {
      }
    },
    "entities": {
    },
    "extended_entities": {
    }
  }
}
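A minimal Python sketch of the parsing advice above: when a "retweeted_status" object is present, read the message and entities from it rather than from the root level. The helper names are hypothetical, and the sketch assumes the dictionary passed in is the Tweet object itself (without the "tweet" wrapper shown in the structure above).

```python
def is_retweet(tweet):
    """A Tweet is treated as a Retweet when it carries a "retweeted_status" object."""
    return bool(tweet.get("retweeted_status"))


def tweet_to_parse(tweet):
    """Return the original Tweet for Retweets, otherwise the Tweet itself."""
    return tweet.get("retweeted_status") or tweet


retweet = {
    "text": "RT @author original message",
    "user": {"screen_name": "Retweeter"},
    "retweeted_status": {
        "text": "original message",
        "user": {"screen_name": "OriginalTweeter"},
        "entities": {},
    },
    "entities": {},
}

source = tweet_to_parse(retweet)
# The root-level user took the Retweet action; the nested user wrote the message.
print(retweet["user"]["screen_name"], "retweeted", source["user"]["screen_name"])
print(source["text"])
```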




Quote Tweets

Quote Tweets are much like Retweets except that they include a new Tweet message. These new messages can contain their own set of hashtags, links, and other "entities" metadata. Quote Tweets can also include location information shared by the user posting the Quote Tweet.

Quote Tweets will contain at least two Tweet objects and in some cases three. The Tweet being quoted, which itself can be a Quote Tweet, is provided in a "quoted_status" object. The root-level object encapsulates the Quote Tweet itself, including a User object for the account taking the sharing action and the time of the Quote Tweet.

Note that Quote Tweets cannot have photos or videos added to them using the 'post Tweet' user-interface. When links to externally hosted media are included in the Quote Tweet message, the root-level "entities.urls" will describe those. Since no native media can be added, any root-level "extended_entities" metadata can be ignored.

When Quote Tweets were first launched, a shortened link (t.co URL) to the quoted Tweet was appended to the Quote Tweet message and provided in the root-level "text" field. In addition, metadata for that t.co URL was included in the root-level "entities.urls" array.

Below is an example structure for a Quote Tweet using this original formatting. Note that the root-level "text" attribute consists of the Quote Tweet's own message plus a Twitter-shortened URL to the quoted Tweet.


{
  "text": "My added comments to this Tweet ---> https:\/\/t.co\/LinkToTweet",
  "user": {
    "screen_name": "TweetQuoter"
  },
  "quoted_status": {
    "text": "original message",
    "user": {
      "screen_name": "OriginalTweeter"
    },
    "place": {      
    },
    "entities": {      
    },
    "extended_entities": {      
    }
  },
  "place": {    
  },
  "entities": {
    "urls": [
      {
        "url": "https:\/\/t.co\/LinkToTweet",
        "expanded_url": "https:\/\/twitter.com\/OriginalTweeter\/status\/994281226797137920"
      }
    ]
  }
}





Recent Quote Tweet updates

As announced on our developer forum, changes will be made in how Quote Tweets are represented in JSON. First, the shortened t.co URL to the quoted Tweet will not be included in the root-level "text" field. Second, the metadata for the quoted Tweet will not be included in the "entities.urls" metadata. Instead, URL metadata for the quoted Tweet will be in a new "quoted_status_permalink" object on the root-level (or top-level), so at the same level as the "quoted_status" object.

With these updates, the above Quote Tweet JSON will look like:


{
  "text": "My added comments to this Tweet",
  "user": {
    "screen_name": "TweetQuoter"
  },
  "quoted_status": {
    "text": "original message",
    "user": {
      "screen_name": "OriginalTweeter"
    },
    "place": {      
    },
    "entities": {      
    },
    "extended_entities": {      
    }
  },  
  "quoted_status_permalink": {
     "url": "https:\/\/t.co\/LinkToTweet",
     "expanded": "https:\/\/twitter.com\/OriginalTweeter\/status\/994281226797137920",
     "display": "twitter.com\/OriginalTweeter\/status\/994281226797137920"
  },    
  "place": {    
  },
  "entities": {
    "urls": [
    ]
  }
}





Note that there will be a 30-day period when Quote Tweet JSON will reflect both formats: the new "quoted_status_permalink" attributes will be provided, along with the older style of appending a t.co URL to the Quote Tweet message and putting the quoted Tweet's URL metadata in the root-level "entities" object. After the update in mid-June, only the updated Quote Tweet format will be provided. Be sure to update your parsers by then!
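One way a parser could cope with both formats is sketched below in Python. The helper name quoted_tweet_link is hypothetical. For the older format, the sketch assumes the link to the quoted Tweet is the last entry in "entities.urls" because it was appended to the message; that ordering is an assumption, not something this document guarantees.

```python
def quoted_tweet_link(tweet):
    """Return the expanded URL of the quoted Tweet, or None if this is not a Quote Tweet."""
    if not tweet.get("quoted_status"):
        return None

    # Updated format: a dedicated root-level object.
    permalink = tweet.get("quoted_status_permalink") or {}
    if permalink.get("expanded"):
        return permalink["expanded"]

    # Original format: fall back to the root-level URL entities.
    # Assumption: the appended link is the last URL in the array.
    urls = (tweet.get("entities") or {}).get("urls") or []
    return urls[-1].get("expanded_url") if urls else None


LINK = "https://twitter.com/OriginalTweeter/status/994281226797137920"

old_format = {
    "text": "My added comments to this Tweet ---> https://t.co/LinkToTweet",
    "quoted_status": {"text": "original message"},
    "entities": {"urls": [{"url": "https://t.co/LinkToTweet", "expanded_url": LINK}]},
}

new_format = {
    "text": "My added comments to this Tweet",
    "quoted_status": {"text": "original message"},
    "quoted_status_permalink": {"url": "https://t.co/LinkToTweet", "expanded": LINK},
    "entities": {"urls": []},
}

print(quoted_tweet_link(old_format) == quoted_tweet_link(new_format))
```

Checking "quoted_status_permalink" first means the same code keeps working during the overlap period, when both representations are present, and after the older one is removed.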

 

Data dictionaries

Whatever your Twitter use case, understanding what these JSON-encoded Tweet objects and attributes represent is critical to successfully finding your data signals of interest. To help in that effort, there is a set of Data Dictionaries for these fundamental Twitter objects.

Reflecting the JSON hierarchy above, here are links and further descriptions of these Objects:

  • Tweet - Also referred to as a ‘Status’ object, has many ‘root-level’ attributes, parent of other objects.
    • User - Twitter Account level metadata. Will include any available account-level enrichments, such as Profile geo.
    • Entities - Contains object arrays of #hashtags, @mentions, $symbols, URLs, and media.
    • Extended Entities - Contains up to four native photos, or one video or animated GIF.
    • Places - Parent to ‘coordinates’ object.

 

Parsing best-practices

  • Twitter JSON is encoded in UTF-8.
  • Parsers should tolerate variance in the ordering of fields with ease. It should be assumed that Tweet JSON is served as an unordered hash of data.
  • Parsers should tolerate the addition of 'new' fields. The Twitter platform has continually evolved since 2006, so there is a long history of new metadata being added to Tweets.  
  • JSON parsers must be tolerant of ‘missing’ fields, since not all fields appear in all contexts.
  • It is generally safe to consider a nulled field, an empty set, and the absence of a field as the same thing.
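The practices above can be sketched as a small tolerant accessor in Python. The helper name get_path is hypothetical. It walks nested objects by key, ignores field order and unknown fields, and returns the same default for a missing field, a null, or an empty collection.

```python
import json


def get_path(obj, *keys, default=None):
    """Follow keys through nested dicts; treat missing, null, and empty values alike."""
    current = obj
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    # A nulled field, an empty set, and an absent field are considered equivalent.
    if current is None or current == [] or current == {}:
        return default
    return current


# Two payloads with different field order, an unknown field, a null, and a missing array.
a = json.loads('{"id_str": "1", "place": null, "entities": {"hashtags": []}, "new_field": 42}')
b = json.loads('{"entities": {}, "id_str": "1"}')

for payload in (a, b):
    print(
        get_path(payload, "id_str"),
        get_path(payload, "place"),
        get_path(payload, "entities", "hashtags", default=[]),
    )
```

Both payloads print the same values, since the accessor never depends on a field being present or on the order in which fields arrive.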

 

Important notes

Product details

These JSON attribute dictionaries are specifically for the Tweets delivered by the following Twitter products:

Please note that Tweets sourced elsewhere may vary somewhat in structure from this document.

Tweet JSON formats

The enriched native JSON format (previously referred to as the original format) is Twitter's primary data format. This format includes several data enrichments, including Profile geo, matching rules, Twitter Polls metadata, and Enhanced URLs.

Legacy Gnip developers may still be using the Activity Stream format. As new metadata becomes available, note that it will only be available in the enriched native format and not the Activity Stream format.  

 

Next steps