Why There Will Never Be a List of Every Word

Robinson Meyer

Fri, Jun 6, 2014, 4:37 PM

This afternoon, @everyword will send its final missive. The Twitter account has published what it claims to be every word in the English language, one tweet per word, since 2007. It’s now counting the final Zs, and the rest will be silence.

Twitter is now crowded with bots. There’s one that mashes headlines together and another that posts pictures from the Metropolitan Museum of Art and a third that finds tweets that are anagrams of each other. Last summer, a famous bot was revealed not to be a bot at all, and the New Yorker took note.

But last year, the New York Review of Bots implied @everyword was the very best bot. And now it’s ending.

Ruth Spencer of the Guardian recently talked to its creator, the programmer and poet Adam Parrish. “Where,” she asked, “does the library of words you use come from?”

Parrish’s reply, I think, gets to what makes @everyword so interesting:

I honestly don't remember. It's a list of words that I downloaded from a website somewhere. It's not the OED. One of the purposes of @everyword is to raise the question of whether it's possible to have a canonical list of the English language. To me, the obvious answer is no. We come up with new words all the time. We have rules about what can and cannot be words and linguists don't know where to draw the line any more.

Parrish has also written his own @everyword postmortem. He says he hopes to run a “Season 2” for the bot with a more complete word list.

Of course, giving the bot a larger vocabulary will only intensify its existential question. The German mathematician Georg Cantor proved that there could be larger and smaller infinities that were both, still, infinities; a more prolix Every Word will only make its lacunae more noticeable.

Make that list longer and longer, and you will run into an old problem: No one’s sure what a word is, exactly.

For over a century, some scholars have claimed that Shakespeare used more words than any other writer, and that his vocabulary dwarfed those of his English-speaking contemporaries. The number of words he deployed, some insisted, is even double that of modern-day speakers. In 1986, a famed and Emmy-winning PBS documentary, The Story of English, alleged: “Shakespeare had one of the largest vocabularies of any English writer, some 30,000 words. Estimates of an educated person’s vocabulary today vary, but it is probably about half this, 15,000.”

Could that be true? It depends what you mean by vocabulary. As Ward Elliott and Robert Valenza write in their paper, “Shakespeare’s vocabulary: did it dwarf all the others?”, there are three different ways to cut up a text into its words. (They cite Marvin Spevack’s important studies into this issue, which were among the first to use a computer.)

Of the 884,647 tokens (individual running words) in the Riverside Shakespeare corpus, a computer counts 29,066 “types,” that is, distinct strings of letters. This machine-counting doesn’t account for the common alternate spellings of Shakespeare’s day, like wreck and wrack, or murder and murther, nor does it separate plurals and conjugated forms from their more common roots. Therefore, horse and horses are two different words, as are run and running.

That’s because computers (at least in the late 1960s, when Spevack was conducting his study) could only distinguish “types” like those. That horse and horses shared a root meant nothing to them. To count root words, which are sometimes called lemmas, the two scholars had to rely either on hand-counts or on the common estimate that a vocabulary not yet lemmatized is two-thirds larger than one that uses only root words.
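To make the three ways of counting concrete, here is a minimal Python sketch. It is not Spevack’s or Elliott and Valenza’s method: the tokenizer is a simple regular expression, and TOY_LEMMAS is a hypothetical four-entry lookup table standing in for the hand-counts a real study would need.

```python
import re

# Hypothetical toy lemma table (an assumption for illustration only);
# it maps inflected forms and variant spellings to a single root word.
TOY_LEMMAS = {
    "horses": "horse",
    "running": "run",
    "wrack": "wreck",
    "murther": "murder",
}

def tokenize(text):
    """Split text into lowercase tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def count_vocabulary(text):
    """Return (token count, type count, lemma count) for a text."""
    tokens = tokenize(text)                         # every running word
    types = set(tokens)                             # distinct letter strings
    lemmas = {TOY_LEMMAS.get(t, t) for t in types}  # roots, via the toy table
    return len(tokens), len(types), len(lemmas)

sample = "A horse! A horse! My kingdom for a horse! The horses run; he is running."
print(count_vocabulary(sample))  # (15, 11, 9)
```

The sample has 15 tokens but only 11 types, because a and horse repeat. Folding horses into horse and running into run leaves 9 lemmas.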

What’s Shakespeare’s lemmatized vocabulary, then? Both long-respected hand-counted efforts and a mathematical estimation return the same answer: He used between 17,000 and 18,000 root words.
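The mathematical estimate can be checked with a few lines of Python. This is a sketch under one assumption: that “two-thirds larger” means the unlemmatized count equals the lemma count times 1 + 2/3, so the lemma count is the type count times 3/5. The variable names are illustrative.

```python
# Machine-counted types in the Riverside Shakespeare, as cited above.
types_counted = 29_066

# Assumption: types = lemmas * (1 + 2/3), so lemmas = types * 3 / 5.
estimated_lemmas = types_counted * 3 / 5

print(round(estimated_lemmas))  # 17440
```

Under that reading, the estimate falls inside the 17,000 to 18,000 range.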

This count may still be incorrect. Spevack’s machine reading can’t account for homographs, words like spring or bear that can function as nouns or verbs and have many more definitions after that. It also doesn’t track two-token words, like grown up, where types combine to create a new definition. Finally, and this is the largest misestimation of all, it doesn’t account for words that Shakespeare knew but never wrote in a play. Such a challenge engrosses Elliott and Valenza for much of their paper. They conclude, finally, that Shakespeare’s total vocabulary… is just about the same size as or smaller than that of a “run-of-the-mill college-educated modern.”

Look at @everyword, and you can see that its 109,000 tweets aren’t lemmatized. In its Elysian Fields, a single…

horse
— everyword (@everyword) September 22, 2010

gallops among…

horses
— everyword (@everyword) September 22, 2010

and it’s not just doing it to…

run
— everyword (@everyword) December 3, 2012

but because it likes…

running
— everyword (@everyword) December 4, 2012

And the point Parrish wants to make about language is a little different, too. At @everyword’s current rate of one tweet every 30 minutes, no starter list can stay up-to-date for the years and years it would take to complete the English language. Language is much too protean for that.
