Spinn3r: RSS Content, News Feeds, News Content, News Crawler and Web Crawler APIs

Advanced Feature Set

Full metadata

Index weblogs, mainstream news, and social media across RSS, Atom, HTML, microformats, and microdata web formats. All our APIs are powered by JSON for ease of use and rapid implementation.

Firehose API

Spinn3r is distributed with a full firehose API that handles 95% of data indexing requirements. No coding required. Just start it up and it spools JSON files to disk.

Admin Console

Full visibility into our crawl. We provide a comprehensive admin console for use by our customers.

+140M Sources Indexed

Indexing over 140M sources available through the API. Vast coverage of social media, weblogs, mainstream news, and more.

Full-text Search

Integrated full-text search powered by Elasticsearch and Kibana. Run powerful queries and aggregations on raw data. Full-text search allows for precise queries over vast amounts of data.

Boilerplate Removal

Integrated boilerplate removal and content extraction based on state-of-the-art information retrieval techniques. Exclude ads, navigation, and other miscellaneous text on a page.

Language and Spam Detection

Full language detection. Hate spam? Don't worry! Spinn3r ships with integrated spam prevention.

Fault Tolerant

Spinn3r is built on a fault-tolerant infrastructure powered by SoftLayer and is monitored 24/7 to ensure high availability.

Learn about our full feature breakdown

Firehose API

Dedicated firehose and content streaming with advanced filtering.

Receive content in real time

Our firehose allows you to index content in real time, as soon as we discover new content. Our client installs as a daemon, runs in the background, and spools content to disk.
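
Once content has been spooled to disk, picking it up from your own code can be very small. The following is a minimal sketch, not the Spinn3r client itself: the function name `iter_spooled_documents`, the `*.json` file naming, and the idea that each file holds either one JSON object or an array of objects are all assumptions made for illustration. Check the actual spool layout before relying on it.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_spooled_documents(spool_dir: str) -> Iterator[dict]:
    """Yield one dict per JSON document found in spool_dir.

    Assumption: each *.json file holds either a single JSON object
    or a JSON array of objects. Adjust to the real spool layout.
    """
    # Sort by file name so documents come back in a stable order.
    for path in sorted(Path(spool_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            payload = json.load(handle)
        # Normalize a single object to a one-element list.
        documents = payload if isinstance(payload, list) else [payload]
        for document in documents:
            yield document
```

A caller would loop over `iter_spooled_documents("/path/to/spool")` and hand each document to its own indexing step. The path here is a placeholder.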

Advanced filtering with boolean logic

Our firehose API supports advanced filtering using boolean logic, on any field (or within fields). Search for documents in English, by publisher type, that contain certain terms or tags, etc.
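
To make the idea of boolean filtering concrete, here is a small sketch of the same kind of logic expressed as plain Python predicates. It does not show Spinn3r's filter syntax. The helper names and the field names `lang`, `publisher_type`, `main`, and `tags` are hypothetical and only illustrate how AND, OR, and NOT conditions combine over document fields.

```python
from typing import Callable

Document = dict
Predicate = Callable[[Document], bool]


def field_equals(field: str, value: object) -> Predicate:
    """Match documents whose field is exactly value."""
    return lambda doc: doc.get(field) == value


def field_contains(field: str, term: str) -> Predicate:
    """Match when term occurs within the field.

    Text fields use a case-insensitive substring test; list-like
    fields (for example tags) use membership.
    """
    def check(doc: Document) -> bool:
        value = doc.get(field)
        if isinstance(value, str):
            return term.lower() in value.lower()
        if isinstance(value, (list, tuple, set)):
            return term in value
        return False
    return check


def all_of(*predicates: Predicate) -> Predicate:
    """Boolean AND over predicates."""
    return lambda doc: all(p(doc) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Boolean OR over predicates."""
    return lambda doc: any(p(doc) for p in predicates)


def negate(predicate: Predicate) -> Predicate:
    """Boolean NOT of a predicate."""
    return lambda doc: not predicate(doc)


# English documents from mainstream news that mention "election"
# in the text or carry the "politics" tag, excluding "sports".
# All field names and values are hypothetical.
wanted = all_of(
    field_equals("lang", "en"),
    field_equals("publisher_type", "MAINSTREAM_NEWS"),
    any_of(field_contains("main", "election"),
           field_contains("tags", "politics")),
    negate(field_contains("tags", "sports")),
)
```

Calling `wanted(document)` returns `True` or `False` for a single document, so the same expression can be reused to check or post-filter results on the client side.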

High throughput

Our firehose API is designed to scale. We serve more than 100TB to our customers per month. Our infrastructure is built on a highly parallel cluster design which we've had in production for nearly a decade.

Full-Text Search and Analytics

Direct Elasticsearch API Access

Use the raw Elasticsearch query API including all features like aggregations, Lucene’s structured query DSL, filters, etc.
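
As an illustration, a request body that combines a full-text match, a filter, and an aggregation might be built as shown below. This is a sketch under stated assumptions: the helper `build_search_body` and the field names `main`, `lang`, and `domain` are hypothetical, and the real index mapping may differ. The `bool` query, `match`, `term`, and `terms` aggregation are standard parts of the Elasticsearch query DSL.

```python
import json


def build_search_body(text: str, lang: str, top_domains: int = 10) -> dict:
    """Build an Elasticsearch search body.

    Field names ("main", "lang", "domain") are assumptions made for
    this example, not a documented schema.
    """
    return {
        # size 0: return only the aggregation, no individual hits.
        "size": 0,
        "query": {
            "bool": {
                # Scored full-text match on the document body.
                "must": [{"match": {"main": text}}],
                # Unscored exact-value filter on the language field.
                "filter": [{"term": {"lang": lang}}],
            }
        },
        "aggs": {
            # Count matching documents per domain.
            "top_domains": {
                "terms": {"field": "domain", "size": top_domains}
            }
        },
    }


print(json.dumps(build_search_body("climate change", "en"), indent=2))
```

The printed JSON can be sent as the body of a search request using whatever HTTP or Elasticsearch client you already have.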

Kibana Analytics

We provide a Kibana search GUI on top of our corpus which allows for easy data visualization.

Indexed Fields

All metadata fields are indexed with the correct Elasticsearch field mapping. Search for inbound links, search by domain, etc.

Easy to use API

JSON over HTTP

Uses the industry-standard JSON format for representing documents. No dealing with separate source APIs, RSS, or microformats. All data in Spinn3r comes through a standardized API.

Easy integration

Simple integration with your app. Check on the status of a source, register new sources, get the recent posts on a source, etc.

Evolving schema

We're constantly iterating and adding new fields and metadata as web standards change over time. This includes modern metadata such as geo, tags, and author information. Our schema can easily accommodate rapidly changing web standards.

Index weblogs, mainstream news, and social media with Spinn3r

A firehose (or decahose) and full-text search API for social media.

One easy to use API

Monitor the crawl

Trusted data provider

More than 1,000 PhDs have access to Spinn3r data, which has been used in more than 350 academic papers.

Sign up for a trial

So, what are you waiting for? It only takes a few minutes and a few lines of code to start indexing the data that really matter to you.