Index weblogs, mainstream news, and social media with Spinn3r

A firehose (or decahose) and full-text search API for social media.

Advanced Feature Set

Full metadata

Index weblogs, mainstream news, and social media. We handle RSS, Atom, HTML, microformats, and microdata web formats. All our APIs deliver JSON for ease of use and rapid implementation.

Firehose API

Distributed with a full firehose API which handles 95% of the data indexing requirements. No coding required. Just start it up and it spools JSON files to disk.
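As a rough illustration of what consuming the spooled output could look like, here is a minimal Python sketch. It assumes the spool directory holds one JSON document per file with a .json extension; the directory layout, file naming, and the function name read_spooled_documents are assumptions made for this example, not documented Spinn3r behavior.

```python
import json
from pathlib import Path
from typing import Iterator


def read_spooled_documents(spool_dir: str) -> Iterator[dict]:
    """Yield each JSON document found in spool_dir, in file-name order.

    Assumption: one JSON document per *.json file (hypothetical layout).
    """
    for path in sorted(Path(spool_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                yield json.load(handle)
        except json.JSONDecodeError:
            # The file may still be mid-write; skip it and retry on a later pass.
            continue
```

A consumer might loop over read_spooled_documents("/path/to/spool") and hand each document to its own pipeline.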

Admin Console

Full visibility into our crawl. We provide a comprehensive admin console for use by our customers.

+140M Sources Indexed

Indexing over 140M sources available through the API. Vast coverage of social media, weblogs, mainstream news, and more.

Full-text Search

Integrated full-text search powered by Elasticsearch and Kibana. Run powerful queries and aggregations on raw data. Full-text search allows for precise queries over vast amounts of data.
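To give a feel for the kind of query and aggregation this enables, the sketch below builds a request body in the standard Elasticsearch query DSL (a bool query with a match clause, a term filter, and a terms aggregation). The field names main, lang, and publisher_type are hypothetical placeholders; check the actual index mapping before relying on them.

```python
import json


def build_search_body(text: str, lang: str, size: int = 10) -> dict:
    """Build an Elasticsearch query DSL body: full-text match plus a language filter.

    Field names ("main", "lang", "publisher_type") are assumptions for illustration.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match on the document body.
                "must": [{"match": {"main": text}}],
                # Exact, unscored filter on the language code.
                "filter": [{"term": {"lang": lang}}],
            }
        },
        # Count matching documents per publisher type.
        "aggs": {"by_publisher_type": {"terms": {"field": "publisher_type"}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("election", "en"), indent=2))
```

The resulting JSON can be sent to a search endpoint or pasted into the Kibana console for experimentation.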

Boilerplate Removal

Integrated boilerplate removal and content extraction based on state-of-the-art information retrieval techniques. Exclude ads, navigation, and other miscellaneous text on a page.

Language and Spam Detection

Full language detection. Hate spam? Don't worry! Spinn3r ships with integrated spam prevention.

Fault Tolerant

Spinn3r is built on a fault-tolerant infrastructure powered by SoftLayer and is monitored 24/7 to ensure high availability.

Firehose API

Dedicated firehose and content streaming with advanced filtering.

Receive content in real time

Our firehose allows you to index content in real time, as soon as we discover new content. Our client installs as a daemon, runs in the background, and spools content to disk.

Advanced filtering with boolean logic

Our firehose API supports advanced filtering using boolean logic, on any field (or within fields). Search for documents in English, by publisher type, that contain specific terms or tags, etc.
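The idea of combining field-level conditions with boolean logic can be sketched in plain Python. This is not the firehose filter syntax itself; it is a small illustration of the concept applied to already-received documents, and the field names (lang, publisher_type, main, tags) and helper names are assumptions.

```python
from typing import Callable

Predicate = Callable[[dict], bool]


def field_equals(name: str, value: object) -> Predicate:
    """Match documents whose field equals the given value exactly."""
    return lambda doc: doc.get(name) == value


def field_contains(name: str, term: str) -> Predicate:
    """Match documents whose text field contains the term, ignoring case."""
    needle = term.lower()
    return lambda doc: needle in str(doc.get(name) or "").lower()


def has_tag(tag: str) -> Predicate:
    """Match documents whose tag list includes the given tag."""
    return lambda doc: tag in (doc.get("tags") or [])


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND over predicates."""
    return lambda doc: all(p(doc) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Boolean OR over predicates."""
    return lambda doc: any(p(doc) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Boolean NOT of a predicate."""
    return lambda doc: not pred(doc)


# English documents that mention "python" in the body or carry a "python" tag,
# excluding anything tagged "jobs".
wanted = all_of(
    field_equals("lang", "en"),
    any_of(field_contains("main", "python"), has_tag("python")),
    negate(has_tag("jobs")),
)
```

Calling wanted(doc) on each received document returns True only for documents that satisfy the whole expression.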

High throughput

Our firehose API is designed to scale. We serve more than 100TB to our customers per month. Our infrastructure is built on a highly parallel cluster design which we've had in production for nearly a decade.

Easy to use API


Uses the industry-standard JSON format for representing documents. No dealing with disparate APIs, RSS, or microformats. All data in Spinn3r comes through a standardized API.

Easy integration

Simple integration with your app. Check on the status of a source, register new sources, get the recent posts on a source, etc.

Evolving schema

We're constantly iterating and adding new fields and metadata as web standards change over time. This includes modern metadata such as geo, tags, and author information. Our schema can easily accommodate rapidly changing web standards.
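Because new fields can appear over time, client code is usually more robust when it tolerates unknown keys instead of rejecting them. The following Python sketch shows one way to do that; the Document class and the field names permalink, title, and tags are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Fields this client knows about (hypothetical names).
KNOWN_FIELDS = ("permalink", "title", "tags")


@dataclass
class Document:
    permalink: Optional[str] = None
    title: Optional[str] = None
    tags: list = field(default_factory=list)
    # Anything the client does not recognize yet is preserved here.
    extra: dict = field(default_factory=dict)


def parse_document(raw: dict) -> Document:
    """Map known fields onto the dataclass and keep unknown ones in `extra`."""
    known: dict[str, Any] = {
        key: raw[key] for key in KNOWN_FIELDS if raw.get(key) is not None
    }
    extra = {key: value for key, value in raw.items() if key not in KNOWN_FIELDS}
    return Document(**known, extra=extra)
```

With this approach, a newly introduced field such as geo lands in extra without breaking existing code, and can be promoted to a first-class attribute later.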

Trusted data provider

More than 1000 PhDs have access to Spinn3r data, which has been used in more than 350 academic papers.

Sign up for a trial

So, what are you waiting for? It only takes a few minutes and a few lines of code to start indexing the data that really matter to you.
