Data Reference Overview

The Data Reference provides an overview of the data available on Quantopian as well as documentation for each dataset. The documentation includes descriptions, code examples, historical coverage, update frequencies, and more.

The sections below outline some of the concepts that are shared between all or most datasets on Quantopian.

Coverage

Note

This section describes the overall coverage of data available on Quantopian. Individual datasets may have more or less coverage.

The Quantopian platform provides data for global equities. Quantopian provides historically accurate equity data (including stocks, ETFs, ADRs, and more) going back as far as 2002 in the US and as far as 2004 in 43 other countries. This includes equities that are no longer trading today.

Note

Providing data for equities that are no longer listed is important because it helps avoid survivorship bias in quantitative research. Databases that omit delisted assets ignore bankruptcies and other important events, and lead to false optimism about a factor or strategy. For example, LEH (Lehman Brothers) was a tradable asset in 2008, even though the company no longer exists today; Lehman's bankruptcy was a major event that affected many algorithms at the time.

Supported Countries

Below is a table of countries and exchanges for which Quantopian provides equity data.

Country | Country Code | Pipeline Domain | Supported Exchanges
Argentina | AR | AR_EQUITIES | Buenos Aires Stock Exchange
Austria | AT | AT_EQUITIES | Vienna Stock Exchange
Australia | AU | AU_EQUITIES | Australian Securities Exchange, National Stock Exchange of Australia
Belgium | BE | BE_EQUITIES | Euronext Brussels
Brazil | BR | BR_EQUITIES | Sao Paulo Stock Exchange
Canada | CA | CA_EQUITIES | Toronto Stock Exchange, TSX Venture Exchange, Canadian Securities Exchange
Chile | CL | CL_EQUITIES | Santiago Stock Exchange
China | CN | CN_EQUITIES | Shenzhen Stock Exchange, Shanghai Stock Exchange
Colombia | CO | CO_EQUITIES | Colombia Stock Exchange
Czech Republic | CZ | CZ_EQUITIES | Prague Stock Exchange
Denmark | DK | DK_EQUITIES | NASDAQ OMX Copenhagen
Finland | FI | FI_EQUITIES | NASDAQ OMX Helsinki
France | FR | FR_EQUITIES | Euronext Paris
Germany | DE | DE_EQUITIES | Berlin Stock Exchange, Dusseldorf Stock Exchange, XETRA, Frankfurt Stock Exchange, Hamburg Stock Exchange, Hannover Stock Exchange, Munich Stock Exchange, Stuttgart Stock Exchange, Xetra Indices
Great Britain | GB | GB_EQUITIES | London Stock Exchange, ICAP Securities & Derivatives Exchange, Cboe Europe Equities CXE
Greece | GR | GR_EQUITIES | Athens Exchange
Hong Kong | HK | HK_EQUITIES | Hong Kong Stock Exchange
Hungary | HU | HU_EQUITIES | Budapest Stock Exchange
India | IN | IN_EQUITIES | Bombay Stock Exchange, National Stock Exchange of India
Indonesia | ID | ID_EQUITIES | Indonesia Exchange
Ireland | IE | IE_EQUITIES | Irish Stock Exchange, Irish Stock Exchange Bonds & Funds
Italy | IT | IT_EQUITIES | Milan Stock Exchange
Japan | JP | JP_EQUITIES | Tokyo Stock Exchange, JASDAQ, Osaka Exchange, Nagoya Stock Exchange, Fukuoka Stock Exchange, Sapporo Securities Exchange
Malaysia | MY | MY_EQUITIES | Malaysia Stock Exchange
Mexico | MX | MX_EQUITIES | Mexican Stock Exchange
Netherlands | NL | NL_EQUITIES | Euronext Amsterdam
New Zealand | NZ | NZ_EQUITIES | New Zealand Stock Exchange
Norway | NO | NO_EQUITIES | Oslo Exchange
Pakistan | PK | PK_EQUITIES | Pakistan Stock Exchange
Peru | PE | PE_EQUITIES | Lima Stock Exchange
Philippines | PH | PH_EQUITIES | Philippine Stock Exchange
Poland | PL | PL_EQUITIES | Warsaw Stock Exchange
Portugal | PT | PT_EQUITIES | Euronext Lisbon
Russia | RU | RU_EQUITIES | Moscow Exchange
Singapore | SG | SG_EQUITIES | Singapore Exchange
South Africa | ZA | ZA_EQUITIES | Johannesburg Securities Exchange
South Korea | KR | KR_EQUITIES | Korea Exchange, Korea KONEX
Spain | ES | ES_EQUITIES | Madrid Stock Exchange/Spanish Markets
Sweden | SE | SE_EQUITIES | NASDAQ OMX Stockholm, AktieTorget, Nordic Growth Market
Switzerland | CH | CH_EQUITIES | SIX Swiss Exchange, BX Swiss AG, Swiss Fund Data
Taiwan | TW | TW_EQUITIES | Taiwan Stock Exchange
Thailand | TH | TH_EQUITIES | Stock Exchange of Thailand
Turkey | TR | TR_EQUITIES | Istanbul Stock Exchange
United States | US | US_EQUITIES | NYSE, NASDAQ, AMEX

Asset Identifiers

When researching and developing a quantitative investment strategy, it is critical to have a reliable way of identifying equities. However, across the finance industry, there is no one standard that everyone uses to identify equities. Depending on the source of data, equities can be identified using ticker symbols, CUSIPs, SEDOLs, and more. Consolidating data identified using different systems can be a difficult task. To solve this problem, Quantopian collects and surfaces data through uniform APIs by first mapping all datasets to a common set of identifiers called SIDs (security identifiers). SIDs are integer labels that maintain a consistent reference to a particular equity even over symbol changes and other events.

The way that SIDs are determined is different for US and non-US equities. The two methods are described below.

US SIDs

For US equities, SIDs are provided to Quantopian from a third-party vendor. The same vendor also provides mappings from CUSIP and ticker symbols to SIDs for US equities. When integrating a dataset, Quantopian uses these mappings along with proprietary algorithms to associate records from the new dataset with SIDs. For example, FactSet provides Quantopian with CUSIP labels for each of their datasets. These CUSIPs are used to label records from FactSet datasets with SIDs so that the datasets can be used alongside datasets from other vendors.
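The labeling step described above can be pictured as a simple join on CUSIP. The following is only a minimal sketch: the CUSIP strings, SID values, field name, and the `label_with_sids` helper are all hypothetical placeholders, and Quantopian's proprietary matching algorithms are not shown or known here.

```python
# Hypothetical CUSIP -> SID mapping, standing in for the vendor-provided one.
CUSIP_TO_SID = {"00000AAA0": 101, "00000BBB0": 202}

# Hypothetical records from a newly integrated dataset, identified by CUSIP.
vendor_records = [
    {"cusip": "00000AAA0", "field_x": 2.56},
    {"cusip": "00000BBB0", "field_x": 1.73},
    {"cusip": "00000CCC0", "field_x": 0.53},  # no known mapping
]


def label_with_sids(records, cusip_to_sid):
    """Attach a SID to each record; collect records whose CUSIP is unmapped."""
    labeled, unmatched = [], []
    for record in records:
        sid = cusip_to_sid.get(record["cusip"])
        if sid is None:
            unmatched.append(record)
        else:
            labeled.append({**record, "sid": sid})
    return labeled, unmatched


labeled, unmatched = label_with_sids(vendor_records, CUSIP_TO_SID)
```

Once every record carries a SID, records from different vendors can be combined on that one key, regardless of which identifier each vendor used originally.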

Global SIDs (Excluding US)

For non-US equities, SIDs are generated from FactSet's proprietary FSYM Regional Identifiers. Currently, all global (excl. US) data is sourced from FactSet, so aligning identifiers between vendors is not required.

Ticker Symbols

In addition to SIDs, equities on Quantopian are labeled with a ticker symbol. Whenever an equity is displayed in the application (including pipeline output, backtest transactions, etc.), it is typically displayed with its SID and current ticker symbol. Functions like symbols() in research and the IDE support historically accurate ticker symbol lookups, but any time an equity is displayed, the accompanying ticker symbol is the current symbol, even in historical simulations.
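To make the distinction between lookup and display concrete, here is a small sketch. It is not the symbols() implementation; the symbol history, SIDs, tickers, helper names, and label format are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical symbol history: (sid, ticker, first_date, last_date).
# A last_date of None means the ticker is the equity's current symbol.
SYMBOL_HISTORY = [
    (1001, "OLDCO", date(2010, 1, 4), date(2015, 6, 30)),
    (1001, "NEWCO", date(2015, 7, 1), None),   # same company after a rename
    (1002, "OLDCO", date(2016, 1, 4), None),   # ticker reused by another company
]


def lookup_sid(ticker, as_of):
    """Historically accurate lookup: which SID carried `ticker` on `as_of`?"""
    for sid, symbol, start, end in SYMBOL_HISTORY:
        if symbol == ticker and start <= as_of and (end is None or as_of <= end):
            return sid
    return None


def display_label(sid):
    """Display uses the current ticker, whatever date is being simulated."""
    for row_sid, symbol, _start, end in SYMBOL_HISTORY:
        if row_sid == sid and end is None:
            return f"Equity({sid} [{symbol}])"
    return f"Equity({sid})"
```

In this sketch, looking up "OLDCO" as of 2012 resolves to SID 1001, yet that equity is displayed with its current ticker, "NEWCO". The same ticker looked up as of 2017 resolves to a different SID, which is why the SID is the safer key for joining data.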

Point-In-Time Data

To prevent lookahead bias, Quantopian data is stored in a point-in-time fashion. Each data point is stored with two special fields: an asof_date and a timestamp. The asof_date is typically provided by the data vendor and is used to inform Quantopian's simulation engines about where a data point should be slotted in a timeseries. The timestamp is created by Quantopian upon collecting the data from the vendor and is used to inform the pipeline simulation engine about when in the simulation that data point can be used.

The timestamp of each data point is used to control when pipeline uses that data point in a simulation. Each market has a 'cutoff time' set to 45 minutes before market open. The cutoff time is used to decide if a data point was known early enough to be used by pipeline that day. All data points with a timestamp prior to the cutoff time of day N can be used by a pipeline simulation on day N. Data points with a timestamp after the cutoff time on day N will not be used by a pipeline with a simulation date of N.
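The cutoff rule can be sketched in a few lines. This is an illustration, not the engine's code: the function names are hypothetical, datetimes are naive values assumed to be in the market's local time, the 9:30am default matches the US open, and the strict less-than comparison is an assumption about how "prior to" is applied at the exact cutoff instant.

```python
from datetime import date, datetime, time, timedelta


def cutoff_time(simulation_date, market_open=time(9, 30),
                lead=timedelta(minutes=45)):
    """Cutoff for a simulation day: 45 minutes before the market opens."""
    return datetime.combine(simulation_date, market_open) - lead


def usable_on(timestamp, simulation_date, market_open=time(9, 30)):
    """True if a data point with this timestamp can be used on that day."""
    return timestamp < cutoff_time(simulation_date, market_open)


# A data point collected at 10:00am on 2019-03-07 misses that day's cutoff
# but is available to a simulation dated 2019-03-08.
late_point = datetime(2019, 3, 7, 10, 0)
same_day = usable_on(late_point, date(2019, 3, 7))
next_day = usable_on(late_point, date(2019, 3, 8))
```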

Example

Let's say Quantopian had the following data points for field X for company AAAA (trading in the US):

asof_date | value | timestamp
03-04-2019 | 2.56 | 03-04-2019 11:55pm (ET)
03-05-2019 | 1.73 | 03-05-2019 11:55pm (ET)
03-06-2019 | -5.21 | 03-07-2019 10:00am (ET)
03-07-2019 | 0.53 | 03-07-2019 11:55pm (ET)

And let's say a pipeline was defined with a 3-day simple moving average factor over field X. If the pipeline was executed on 03-07-2019 (e.g. run_pipeline(pipe, start_date='2019-03-07', end_date='2019-03-07')), the 3-day SMA computation would be performed on a timeseries for AAAA that looks like this: [2.56, 1.73, 1.73].

Why? The simulation was conducted on 03-07-2019 and company AAAA is trading in the US (markets open at 9:30am ET). Therefore, 03-07-2019 8:45am (ET) was used as the cutoff time. The third data point (value=-5.21) has a timestamp of 03-07-2019 10:00am (ET), which is after the cutoff time, so it was not yet accessible to pipeline. As a result, the value for 03-06-2019 had to be forward-filled from the previous known data point.

If the same pipeline is run on 03-08-2019 (e.g. run_pipeline(pipe, start_date='2019-03-08', end_date='2019-03-08')), the 3-day SMA computation would be performed on a timeseries for AAAA that looks like this: [1.73, -5.21, 0.53].

Why? This time, the second, third, and fourth data points were all known by the cutoff time (03-08-2019 8:45am (ET)). Even though the data point whose value is -5.21 arrived after the 03-07 cutoff time, it was known by the 03-08 cutoff time. After determining which data points it can use, a pipeline uses the asof_date to slot each data point into the timeseries, which is why the -5.21 data point is properly slotted as the second-newest data point in the timeseries.

By using both the asof_date and timestamp, the pipeline simulation engine is able to remove lookahead bias from its computations.
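The AAAA example can be reproduced with a short sketch that filters by timestamp, slots values by asof_date, and forward-fills gaps. It is a simplified model rather than the pipeline engine: `window_values` is a hypothetical helper, times are naive ET values, the 8:45am cutoff is hard-coded for the US, and consecutive calendar days stand in for trading days (which works here because all dates fall within one week).

```python
from datetime import date, datetime, time, timedelta

# (asof_date, value, timestamp) rows from the example; timestamps are ET.
DATA_POINTS = [
    (date(2019, 3, 4), 2.56, datetime(2019, 3, 4, 23, 55)),
    (date(2019, 3, 5), 1.73, datetime(2019, 3, 5, 23, 55)),
    (date(2019, 3, 6), -5.21, datetime(2019, 3, 7, 10, 0)),
    (date(2019, 3, 7), 0.53, datetime(2019, 3, 7, 23, 55)),
]


def window_values(simulation_date, window_length=3):
    """Values a trailing window would see on `simulation_date`."""
    cutoff = datetime.combine(simulation_date, time(8, 45))
    # Step 1: keep only data points whose timestamp precedes the cutoff.
    known = {asof: value for asof, value, ts in DATA_POINTS if ts < cutoff}
    # Step 2: slot values by asof_date, forward-filling missing days.
    days = [simulation_date - timedelta(days=n)
            for n in range(window_length, 0, -1)]
    values = []
    for day in days:
        earlier = [asof for asof in known if asof <= day]
        values.append(known[max(earlier)] if earlier else None)
    return values


def simple_moving_average(values):
    return sum(values) / len(values)


window_0307 = window_values(date(2019, 3, 7))  # [2.56, 1.73, 1.73]
window_0308 = window_values(date(2019, 3, 8))  # [1.73, -5.21, 0.53]
```

Passing either window to `simple_moving_average` gives the 3-day SMA for that simulation date. The two steps are deliberately separate: the timestamp decides whether a value is visible, and the asof_date decides where it lands.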

Historical Data

For historical data which existed prior to Quantopian's integration, timestamps are approximated by adding a delay offset to historical dates provided by the vendor. Each dataset in the Data Reference documents its live collection start date and historical timestamp approximation method. Currently, the most complex timestamp approximation performed by Quantopian occurs in the FactSet Fundamentals dataset.
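As a rough sketch of the approximation idea, a timestamp can be derived by adding a fixed delay to the vendor's date whenever no collection time was recorded. The helper name and the 24-hour delay below are hypothetical; the actual offset and method differ per dataset and are documented on each dataset's page.

```python
from datetime import datetime, timedelta


def timestamp_for(vendor_date, collected_at, delay):
    """Return the recorded collection time if one exists; otherwise
    approximate the timestamp as the vendor's date plus a delay offset."""
    if collected_at is not None:
        return collected_at
    return vendor_date + delay


# Hypothetical 24-hour delay for a historical data point with no
# recorded collection time.
approximated = timestamp_for(datetime(2010, 3, 4), None, timedelta(hours=24))

# A live-collected data point keeps the timestamp recorded at collection.
recorded = timestamp_for(datetime(2019, 3, 6),
                         datetime(2019, 3, 7, 10, 0),
                         timedelta(hours=24))
```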

See also

To learn more about how Quantopian stores and manages point-in-time data, check out the Three-Dimensional Time webinar.

Corporate Action Adjustments

Note

This section describes how pricing data and other per-share data are adjusted for corporate actions like splits, mergers, and dividends. This concept is distinct from how corporate actions are applied to your portfolio holdings in a backtest.

When your pipeline or algorithm calls for historical data denominated in units per share (such as price per share), it is adjusted for splits, mergers, and dividends as of the current simulation date.

Adjustments depend on three pieces of information:

  1. The date of the data point.
  2. The date from which the data point is being considered (the current simulation date).
  3. Any events (splits, dividends, and mergers) that happened between those two dates.

For example, on June 9, 2014, AAPL had a 7:1 stock split event. If we held one share of AAPL before the split, we would have held 7 AAPL shares after the split. Let's walk through this case.

Let's say our simulation date is May 16, 2014. We want yesterday's close price for AAPL. The date of the price is May 15, and the date from which the price is being considered is May 16. Since no events occurred between May 15 and May 16, we can use the as-is close price ($588.82).

Now let's move forward in time. Our simulation date is July 2, 2014. We want yesterday's close price for AAPL. The date of the price is July 1, and the date from which the price is being considered is July 2. Since no events occurred between July 1 and July 2, we can use the as-is close price ($93.52).

But what if we wanted the close price from May 15, 2014 on this same July simulation date? (For example, if we wanted to run a trailing-window calculation with a two-month lookback period.) Without adjustment, we'd see a sudden 84% price drop (from $588.82 to $93.52) in the middle of our data window solely due to the 7:1 stock split, even though the value of a portfolio holding AAPL would not change.

This will clearly result in misleading values for many graphs and trailing-window calculations (for example, the simple moving average over any window that includes June 9). Fortunately, Quantopian data is adjusted so you don't have to account for these sudden jumps.

To continue with the AAPL example: let's say our simulation date is July 2, 2014; we want the close price from May 15, 2014. Instead of showing a sudden 84% decrease in close price, Quantopian adjusts the pre-June 9 prices so that this sudden jump disappears. Since it's a 7:1 stock split, the May 15, 2014 price will be divided by 7 (adjusted price: $84.12).

In this example, any prices from before June 9 will be adjusted (divided by 7) for simulation dates after June 9, 2014. However, prices will not be adjusted for simulation dates before June 9; and prices from after June 9 will not be adjusted at all.

To summarize: On simulation dates after the split occurs, pricing data from before the split will be adjusted.
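The split rule can be sketched as a function of the three inputs listed earlier: the date of the data point, the simulation date, and the events between them. This is a minimal illustration for splits only. The `adjusted_price` helper is hypothetical, and treating a split as applying when its date falls after the price date and on or before the simulation date is an assumption about the exact boundaries.

```python
from datetime import date

# The AAPL 7:1 split from the example: (effective date, split ratio).
SPLITS = [(date(2014, 6, 9), 7.0)]


def adjusted_price(raw_price, price_date, simulation_date, splits=SPLITS):
    """Adjust a historical price for splits that took effect between the
    price's date and the simulation date."""
    adjusted = raw_price
    for effective_date, ratio in splits:
        if price_date < effective_date <= simulation_date:
            adjusted /= ratio
    return adjusted


# May 15 close viewed from May 16: no split in between, so unadjusted.
before_split = adjusted_price(588.82, date(2014, 5, 15), date(2014, 5, 16))

# May 15 close viewed from July 2: divided by 7, about $84.12.
after_split = adjusted_price(588.82, date(2014, 5, 15), date(2014, 7, 2))

# July 1 close viewed from July 2: after the split, so unadjusted.
post_split_price = adjusted_price(93.52, date(2014, 7, 1), date(2014, 7, 2))
```

Because the adjustment depends on the simulation date, the same May 15 data point yields different values depending on when it is viewed, which is how the adjustment avoids lookahead bias.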

Though our AAPL example dealt with a split event specifically, dividends and mergers are dealt with analogously.

If it seems like your data isn't being properly adjusted for a split, merger, or dividend, it's possible that we missed the event. Please reach out to support@quantopian.com if you think this is the case.

Note

Why aren't all prices over all time adjusted for splits/mergers/dividends, regardless of the simulation date? Adjusting for an event before it occurs introduces lookahead bias. While it's difficult to determine exactly how this bias would affect a strategy, it's best practice to avoid lookahead bias whenever possible.

Holdout Periods

Quantopian provides dozens of datasets including pricing, fundamental, and alternative data. Much of the data is available up to the present day, but some datasets have the last 1-2 years of data held out. Full datasets updated through the current day are available in Quantopian's enterprise offering.

Note

Datasets that were available via subscription in the Quantopian Store are currently being phased out of Quantopian. All datasets that are officially supported are listed in this Data Reference.