How Search Works

From algorithms to answers.

Part 1 of 3

Crawling and Indexing

Search starts with the web. It's made up of over 60 trillion individual pages and it's constantly growing.


Google navigates the web by crawling.

That means we follow links from page to page.

Site owners choose whether their sites are crawled.
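To make those two ideas concrete, here is a minimal sketch of a link-following crawler in Python. The fetch() and extract_links() helpers are hypothetical placeholders, and the robots.txt rules at the end are invented; Google's actual crawler is of course far more sophisticated.

```python
# A minimal sketch of crawling: follow links from page to page,
# breadth-first, skipping pages the site owner has opted out of
# crawling. fetch() and extract_links() are hypothetical helpers.
from collections import deque
from urllib.robotparser import RobotFileParser

def crawl(seed_urls, fetch, extract_links, robots, max_pages=1000):
    seen = set(seed_urls)
    frontier = deque(seed_urls)
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        if not robots.can_fetch("ExampleBot", url):
            continue                      # owner opted this page out
        html = fetch(url)                 # download the page
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(html):  # follow links from page to page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages

# Site owners choose what gets crawled via robots.txt rules like these:
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
print(robots.can_fetch("ExampleBot", "https://example.com/private/page"))  # False
```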

We sort the pages by their content and other factors.

The index includes text from millions of books from several libraries and other partners.

"The perfect search engine would understand exactly what you mean and give you back exactly what
you want." Larry Page
Co-founder & CEO


Thanks to Street View, we can also include information from the physical world.

The Knowledge Graph provides better answers by organizing information about real world people, places and things.
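One simple way to picture that kind of organization is as a set of (subject, relation, object) facts connecting entities. The Python sketch below uses invented entities and relations and is not Google's actual data model.

```python
# A toy "knowledge graph": facts stored as (subject, relation, object)
# triples, so answers come from connections between real-world things.
# All entities and relations here are invented for illustration.
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "Physics"),
    ("Warsaw", "capital_of", "Poland"),
]

def related(entity):
    """Return every (relation, object) pair connected to an entity."""
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]

print(related("Marie Curie"))  # [('born_in', 'Warsaw'), ('field', 'Physics')]
```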

And we keep track of it all in the index, which is over 100 million gigabytes.
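As a rough illustration, search indexes are often described as inverted indexes: a map from each word to the pages that contain it. The Python sketch below uses made-up documents and is only a cartoon of the real thing.

```python
# A toy inverted index: map each word to the set of pages containing
# it, so lookups go word -> pages instead of scanning every page.
# The documents are made up for illustration.
from collections import defaultdict

docs = {
    "page1": "how search engines crawl the web",
    "page2": "the web is made of linked pages",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

print(sorted(index["web"]))    # ['page1', 'page2']
print(sorted(index["crawl"]))  # ['page1']
```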

Part 2 of 3

Algorithms

We write programs & formulas to deliver the best results possible.


As you search, algorithms get to work looking for clues to better understand what you mean.

Based on these clues, we pull relevant documents from the index.

We then rank the results using over 200 factors.

The most relevant results appear first. We don't accept payment to increase a site's ranking.

All this happens in 1/8th of a second.

Results can take a variety of forms:

  1. Knowledge Graph

    Provides results based on a database of real world people, places, things, and the connections between them.

  2. Snippets

    Shows small previews of information, such as a page's title and short descriptive text, for each search result.

  3. News

    Includes results from online newspapers and blogs from around the world.

  4. Answers

    Displays immediate answers and information for things such as the weather, sports scores and quick facts.

  5. Videos

    Shows video-based results with thumbnails so you can quickly decide which video to watch.

  6. Images

    Shows you image-based results with thumbnails so you can decide which page to visit from just a glance.

  7. Refinements

    Provides features like "Advanced Search," related searches, and other search tools, all of which help you fine-tune your search.

  8. Voice Search

    With the Google Search App, simply say what you want and get answers spoken right back to you.

  9. Mobile

    Includes improvements designed specifically for mobile devices, such as tablets and smartphones.

As You Search

  1. Spelling

    Identifies and corrects possible spelling errors and provides alternatives.

  2. Autocomplete

    Predicts what you might be searching for. This includes understanding terms with more than one meaning (a toy prefix-matching sketch follows this list).

  3. Synonyms

    Recognizes words with similar meanings.

  4. Query Understanding

    Gets to the deeper meaning of the words you type.

  5. Search Methods

    Creates new ways to search, including "search by image" and "voice search."

  6. Google Instant

    Displays immediate results as you type.
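As a toy illustration of the autocomplete idea above, the Python sketch below completes a prefix against a small, invented list of past queries; real prediction draws on far richer signals.

```python
# A toy autocomplete: return stored queries that start with what the
# user has typed so far. The query list is invented for illustration.
import bisect

queries = sorted(["weather today", "weather tomorrow", "web search", "webcam"])

def autocomplete(prefix, limit=3):
    """Return up to `limit` stored queries beginning with `prefix`."""
    start = bisect.bisect_left(queries, prefix)  # sorted list: jump to prefix
    results = []
    for q in queries[start:]:
        if not q.startswith(prefix):
            break                                # past the prefix range
        results.append(q)
        if len(results) == limit:
            break
    return results

print(autocomplete("wea"))  # ['weather today', 'weather tomorrow']
```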

Ranking

  1. Site & Page Quality

    Uses a set of signals to determine how trustworthy, reputable, or authoritative a source is. (One of these signals is PageRank, one of Google's first algorithms, which looks at links between pages to judge their importance; a toy sketch follows this list.)

  2. Freshness

    Shows the latest news and information. This includes gathering timely results when you're searching specific dates.

  3. SafeSearch

    Reduces the amount of adult web pages,
    images, and videos in your results.

  4. User Context

    Provides more relevant results based on
    geographic region, Web History, and other factors.

  5. Translation and Internationalization

    Tailors results based on your language and country.

  6. Universal Search

    Blends relevant content, such as images, news, maps, videos, and your personal content, into a single unified search results page.
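For a feel of the PageRank signal mentioned in the first item, here is a toy power-iteration sketch in Python over an invented three-page link graph. The 0.85 damping factor comes from the original PageRank paper; this is only one of the 200+ ranking factors.

```python
# A toy PageRank: a page is important if important pages link to it.
# The three-page graph is invented; real PageRank runs over the web.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                  # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:                             # pass rank along each link
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(graph))  # "c", linked from both "a" and "b", scores highest
```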

The Search Lab

Our algorithms are constantly changing. These changes begin as ideas in the minds of our engineers. They take these ideas and run experiments, analyze the results, tweak them, and run them again and again.

  1. Precision Evaluation

    In a precision evaluation, Search evaluators (people who are trained to evaluate search quality) rate the usefulness of individual results for a given search.

    In a typical year, we run over 40,000 precision evaluations.
    Curious about what search evaluators are looking for?
    Download the Guidelines

  2. Side-by-Side Experiment

    In a side-by-side experiment, evaluators review two different sets of search results: one from the old algorithm and one from the experimental one.
    They analyze the results and give us feedback on
    the differences.

    In a typical year, we run over 9,000 side-by-side experiments.

  3. Live Traffic Experiment

    In live traffic experiments, we change Search for a small percentage of real Google users and see how it affects their experience (a toy bucketing sketch follows this list). We carefully analyze the results to understand whether the change is an improvement.

    In a typical year, we run over 7,000 live traffic experiments.

  4. Launch

    Our lead engineers review the data from the experiments and decide whether the change should launch for all Google users.

    Based on all these experiments, we'll launch over 500 search improvements in a typical year.

  5. Inside the Search Lab

    Here's a short video we put together that gives a sense of the work that goes into improving Search.
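As a toy illustration of how a live traffic experiment might divert a small slice of users, the sketch below hashes a stable identifier into buckets. The identifiers, experiment name, and 1% figure are assumptions for illustration, not Google's actual mechanism.

```python
# A toy diversion scheme for live traffic experiments: hash a stable
# user identifier so the same user always lands in the same bucket.
# The 1% slice and all names here are illustrative assumptions.
import hashlib

def in_experiment(user_id, experiment, percent=1.0):
    """Deterministically place `percent`% of users in the experiment."""
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10000
    return bucket < percent * 100  # percent of 10,000 buckets

print(in_experiment("user-42", "new-snippet-format"))  # True for ~1% of users
```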

Part 3 of 3

Fighting spam

We fight spam 24/7 to keep your results relevant.


The majority of spam removal is automatic. We examine other questionable documents by hand. If we find spam, we take manual action.
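To give a flavor of what "automatic" can mean, here is a toy Python check for one spam signal, keyword stuffing; the 20% threshold is an arbitrary assumption, and Google's real systems are vastly more sophisticated.

```python
# A toy automatic spam check: flag likely keyword stuffing when one
# word dominates a page. The 20% threshold is an arbitrary assumption.
from collections import Counter

def looks_stuffed(text, threshold=0.20):
    words = text.lower().split()
    if len(words) < 20:                    # too short to judge fairly
        return False
    _, top_count = Counter(words).most_common(1)[0]
    return top_count / len(words) > threshold

print(looks_stuffed("cheap pills " * 50))                   # True: one word is half the page
print(looks_stuffed(" ".join(str(i) for i in range(30))))   # False: all words distinct
```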


When we take action, we attempt to notify the website owners.

Site owners can fix their sites and let us know.

Messages by Month

History of webmaster communication

May 2007

We used to send notifications only via email, and in 2007 webmasters reported receiving fake notifications of Webmaster Guidelines violations. We temporarily paused our notifications in response to this incident while we worked on a new notification system.

July 2007

With the launch of the Message Center feature in Webmaster Tools, we resumed sending notifications in July 2007 after pausing the notifications in May due to email spoofing.

March 2010

We began using a new notification system which enabled us to more easily send messages to the Message Center of Webmaster Tools when we found spam. The first category of spam to use this new system was hacked sites.

July 2010

A bug in our hacked sites notification system reduced the number of messages that we sent to hacked sites.

November 2010

We upgraded our notification system. With this update, we fixed the hacked sites notification bug and began experimenting with sending messages for additional categories of spam such as unnatural links from a site.

February 2011

We expanded notifications to cover additional types of unnatural links to a site.

March 2011

We expanded notifications to cover even more types of unnatural links to a site.

June 2011

We expanded the number of languages we send many of our messages in.

September 2011

We made a change to our classification system for spam. Messages for some categories of spam were not sent, while we created and translated new messages to fit the new categories.

November 2011

A bug in our hacked sites notification system reduced the number of messages that we sent to hacked sites.

December 2011

We expanded the categories of spam that we send notifications for to include pure spam and thin content.

February 2012

The bug affecting our hacked sites notifications was fixed.

Reconsideration Requests by Week

Notable moments for reconsideration requests

December 2006

A bug prevented us from properly storing reconsideration requests for about a week. On December 25th (Christmas), we submitted requests on behalf of sites affected by the bug, creating a small spike at the end of the year.

May/June 2007

Many webmasters received fake notifications of Webmaster Guidelines violations, leading an unusual number to file reconsideration requests.

December 2007

Every year webmasters submit fewer reconsideration requests during the late December holidays.

April 2009

We released a video with tips for reconsideration requests.

June 2009

We started sending responses to reconsideration requests to let webmasters know that their requests have been processed.

October 2010

We upgraded our notification system and started sending out more messages.

April 2011

We rolled out the Panda algorithm internationally. Site owners often file reconsideration requests when they see traffic changes that aren't actually due to manual action.

April - September 2011

We started sending reconsideration responses with more information about the outcomes of reconsideration requests.

June 2012

We began sending messages for a wider variety of webspam issues. We now send notifications for all manual actions by the webspam team which may directly affect a site's ranking in web search results.

Manual Action by Month

Milestones for manual spam fighting

February 2005

We expanded our manual spam-fighting team to Hyderabad, India.

March 2005

We expanded our manual spam-fighting team to Dublin, Ireland.

April 2006

We expanded our manual spam-fighting team to Tokyo, Japan.

June 2006

We expanded our manual spam-fighting team to Beijing, China.

October 2007 - Legacy

We changed our classification system to keep data in a more structured format based on the type of webspam (which allowed us to create this chart). Actions that couldn't be categorized appropriately in the new system are in the “legacy” category. The breakdown by spam type isn't readily available for the older data.

October 2009 - Unnatural links from your site

Improvements in our systems allowed us to reduce the number of actions taken on sites with unnatural outbound links.

November 2009 - Hacked sites

We noticed an increase in hacked sites and increased our efforts to prevent them from affecting search results.

February 2011 - Spammy free hosts and dynamic DNS providers

We increased enforcement of a policy to take action on free hosting services and dynamic DNS providers when a large fraction of their sites or pages violated our Webmaster Guidelines. This allowed us to protect our users from seeing spam when taking action on the individual spammy accounts would be impractical.

October 2011 - Cloaking and/or sneaky redirects

We made a change to our classification system so that the majority of cloaking and sneaky redirect actions were labeled as “Pure spam.” Actions related to less egregious violations continue to be labeled separately.

October 2011 - Parked domains

We reduced our efforts to manually identify parked domains due to improvements in our algorithmic detection of these sites.

April 2012

We launched an algorithmic update codenamed “Penguin” which decreases the rankings of sites that are using webspam tactics.

And that's how search works.

Behind your simple page of results is a complex system, carefully crafted and tested, to support more than one hundred billion searches each month.


Types of Spam

  1. Pure Spam

    Site appears to use aggressive spam techniques such as automatically generated gibberish, cloaking, scraping content from other websites, and/or repeated or egregious violations of Google's Webmaster Guidelines.

  2. Hidden text and/or
    keyword stuffing

    Some of the pages may contain hidden
    text and/or keyword stuffing.

  3. User-generated spam

    Site appears to contain spammy user-generated content. The problematic content may appear on forum pages, guestbook pages, or user profiles.

  4. Parked domains

    Parked domains are placeholder sites with little unique content, so Google doesn't typically include them in search results.

  5. Thin content with little or
    no added value

    Site appears to consist of low-quality or shallow pages which do not provide users with much added value
    (such as thin affiliate pages, doorway pages, cookie-cutter sites, automatically generated content, or
    copied content).

  6. Unnatural links to a site

    Google has detected a pattern of unnatural, artificial, deceptive or manipulative links pointing to the site. These may be the result of buying links that pass PageRank or participating in link schemes.

  7. Spammy free hosts and dynamic DNS providers

    Site is hosted by a free hosting service or dynamic
    DNS provider that has a significant fraction of
    spammy content.

  8. Cloaking and/or
    sneaky redirects

    Site appears to be cloaking (displaying different content to human users than is shown to search engines) or redirecting users to a different page than Google saw (a toy check sketch follows this list).

  9. Hacked site

    Some pages on this site may have been hacked by a third party to display spammy content or links. Website owners should take immediate action to clean their sites and fix any security vulnerabilities.

  10. Unnatural links from a site

    Google detected a pattern of unnatural, artificial, deceptive or manipulative outbound links on this site. This may be the result of selling links that pass PageRank or participating in link schemes.
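As a toy sketch of the cloaking check referenced in item 8, the Python snippet below assumes a hypothetical fetch(url, user_agent) helper and simply compares what a crawler and a browser would receive; real detection is far more involved.

```python
# A toy cloaking check, assuming a hypothetical fetch(url, user_agent)
# helper: request the page as a crawler and as a browser and compare.
# Differing content is a red flag, not proof, of cloaking.
def appears_cloaked(url, fetch):
    as_crawler = fetch(url, user_agent="Googlebot/2.1")
    as_browser = fetch(url, user_agent="Mozilla/5.0")
    return as_crawler != as_browser
```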
