Eight co-champions carry a trophy after winning the 2019 Scripps National Spelling Bee on May 31, in Oxon Hill, Md. Each will get the full winner's prize of $50,000 in cash. (Patrick Semansky/AP)

Pendeloque. Cernuous. Odylic. Erysipelas.

Those are among the eight winning responses to this year’s Scripps National Spelling Bee, and if you’ve never seen letters arranged to form words in those particular orders before this morning, you’re not alone.

Each year it seems as if the Bee organizers have to dig deeper and deeper into their hoard of obscure English words to crown a champion. Laodicean (2009). Cymotrichous (2011). Scherenschnitte (2015).

By contrast, take a look at the winning words from the Bee’s early decades. In 1932, a proper spelling of “knack” was enough to bring home the crown. In 1940 all it took was a little “therapy.” The winning word in 1941? “Initials.”

Has the Bee really gotten that much harder over time?

To find out, I’ve run each year’s winning word through Google Ngrams, which tallies how often a given word or phrase appears in Google’s massive corpus of digital books in any given year. The general idea is that words that appear in the database more often are more common, and hence probably easier to spell.

To give one example, “knack” accounted for 0.0001071291 percent of the words in the database in 1932, the year it appeared as the Scripps spelling bee’s winning word. By contrast “serrefine,” 2007’s winner, accounted for 0.0000000185 percent of the words in the database that year.

Those may both seem like really, really tiny numbers, but the difference between them is actually enormous: Knack was about 6,000 times more common in 1932 than serrefine was in 2007.
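For readers who want to check that arithmetic, here is a minimal Python sketch. The two percentages are the ones quoted above. The variable and function names are made up for illustration and are not part of any Google Ngrams tool.

```python
# Shares of the Google Books corpus, in percent, as quoted in the article.
knack_1932_pct = 0.0001071291
serrefine_2007_pct = 0.0000000185


def times_more_common(common_pct, rare_pct):
    """Return how many times larger common_pct is than rare_pct.

    Hypothetical helper: both inputs must be in the same unit (percent here).
    """
    if rare_pct <= 0:
        raise ValueError("rare_pct must be positive")
    return common_pct / rare_pct


ratio = times_more_common(knack_1932_pct, serrefine_2007_pct)
print(f"'knack' was roughly {ratio:,.0f} times more common than 'serrefine'")
```

The exact quotient is a little under 5,800, which rounds to the "about 6,000" figure.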

When we run that exercise for every single year and plot the results over time, it looks like this:


For starters, note that the y-axis is on a logarithmic scale: Every tick upward represents an increase in frequency by a factor of 10. This is necessary because there’s nearly a 200,000-fold difference between the most and least common words in the list.
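To see why a logarithmic axis helps, it can be useful to convert a fold difference into a number of factor-of-10 ticks. The short Python sketch below does that for the 200,000-fold spread and for the knack-versus-serrefine percentages quoted earlier. The function name is hypothetical.

```python
import math


def ticks_apart(fold_difference):
    """Number of factor-of-10 steps separating two values on a log10 axis."""
    if fold_difference <= 0:
        raise ValueError("fold_difference must be positive")
    return math.log10(fold_difference)


# Spread between the most and least common winning words, per the article.
full_range_ticks = ticks_apart(200_000)

# Spread between "knack" (1932) and "serrefine" (2007), from the quoted percentages.
knack_vs_serrefine_ticks = ticks_apart(0.0001071291 / 0.0000000185)

print(f"Full range: about {full_range_ticks:.1f} ticks")
print(f"knack vs. serrefine: about {knack_vs_serrefine_ticks:.1f} ticks")
```

On a linear axis, a spread that large would flatten every word but the most common ones against the bottom of the chart. On a log axis it fits in a little more than five ticks.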

I’ve also displayed the data as a five-year moving average. The year-to-year data is noisy, so smoothing it out this way helps make the overall trend easier to discern.
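A five-year moving average can be sketched in a few lines of Python. This is an illustration under assumptions, not the exact calculation behind the chart: it uses a trailing window (each point averages that year and the four before it), and the input numbers are toy values chosen to be easy to check by hand, not real word frequencies.

```python
def moving_average(values, window=5):
    """Trailing moving average: one output per full window of `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]  # the `window` most recent values
        averages.append(sum(chunk) / window)
    return averages


# Toy, made-up series that bounces around from year to year.
toy_series = [10.0, 2.0, 8.0, 1.0, 9.0, 3.0, 7.0]
smoothed = moving_average(toy_series, window=5)
print(smoothed)
```

The smoothed series is shorter than the input, because the first four years do not have a full five-year window behind them. A centered window would be an equally reasonable choice and would shift which years drop out.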

While the line does a fair amount of meandering, the general direction is clearly downward. Back in the 1930s and 1940s, for instance, you might expect a winning word to make up about 0.0001 percent of Google’s database. In recent years, however, it’s tended to be somewhere between 0.00001 and 0.000001 percent, or one-tenth to one-hundredth as common as in the early decades of the competition.

Google’s word database is not a perfect tool for measuring how common any given word is. It leans heavily on scientific texts, for instance, and a word that appears once in an obscure academic journal is given just as much weight as one that appears in a mass-market book read by millions. But it nevertheless provides a useful approximation of a word’s frequency in the overall language.

The database is also particularly useful in this case because it calculates frequency within a given year. Language evolves, and words that are common in one decade can go out of fashion in the next. The Google database allows us to measure a word’s frequency within the context of the year it appeared in the Scripps Spelling Bee.

Word difficulty has ramped up in part because the Bee has become bigger and more professional over the years. Back in 1925, there were just nine finalists selected from an initial field of 2 million schoolchildren. This year, 11 million children were winnowed down to 565 finalists. Many of them are aided by personal spelling coaches and special preparation software.

As a result, Bee organizers have taken to rooting about in some of the more obscure realms of English in an effort to stump the players. In recent years they’ve turned to fields like medicine (“erysipelas,” “stromuhr,” “serrefine”), botany (“cernuous,” “bougainvillea”) and textiles (“aiguillette,” “marocain”).

Last night’s results, with an unprecedented eight-way tie, suggest that this year’s crop of contestants is so well prepared that they’ve essentially solved the language: Organizers hit them with the toughest words English has to offer, and the kids kept right at it until the word list was exhausted.