September | 2021 | Possibly Wrong

Several months ago I updated the list of words and frequencies of occurrence that I’ve used in various natural language processing experiments (keyword search “ngrams”) over the years, to reflect last year’s update to the Google Books Ngrams dataset.

This past weekend I updated it again, to include the words in the WordNet parts of speech database developed by Princeton University, restricting to just those 63,745 terms consisting of single words of all lowercase letters, i.e., no capitalization, spaces, hyphens, or other punctuation.
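As a rough illustration of that restriction, here is a minimal sketch of the filter. It assumes "lowercase letters" means ASCII `a` to `z` only; the function name and the sample terms are hypothetical, chosen for the example, and are not taken from the actual WordNet files or from the code on GitHub.

```python
import re

# Assumption: a "single word of all lowercase letters" is one or more
# ASCII letters a-z, with nothing else in the term.
LOWERCASE_WORD = re.compile(r"[a-z]+")

def is_plain_word(term):
    """Return True if term has no capitals, spaces, hyphens, or punctuation."""
    return LOWERCASE_WORD.fullmatch(term) is not None

# Hypothetical sample terms, for illustration only.
sample = ["king", "Paris", "ice cream", "well-known", "o'clock", "yonder"]
print([t for t in sample if is_plain_word(t)])  # ['king', 'yonder']
```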

Most of these words (over 98%) were already in the list. But including them lets us also use their categorization into parts of speech (nouns, verbs, adjectives, and adverbs) to automate the generation of random code names for your next super secret project.

Just select a random adjective, followed by a random noun, and you get Operation ALIEN WORK, or DISSENTING CITY, or one of my favorites, OCCASIONAL ENEMY. Weighting the random selections by word frequency keeps the code names reasonably simple, so that you don’t end up with Operation PARABOLOIDAL IMMIGRATION (although an unweighted sample did yield Operation YONDER KING, which sounds pretty cool).
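The selection described above can be sketched in a few lines of Python. This is not the generator from the GitHub repository, just one way to do it under stated assumptions: the word tables are tiny, the counts are made-up placeholders rather than real Ngram frequencies, and the names `pick` and `code_name` are hypothetical. Weighted sampling uses the standard library's `random.choices`.

```python
import random

# Placeholder tables mapping word -> frequency count. The counts are
# invented for illustration and are NOT real Google Books Ngram data.
ADJECTIVES = {"alien": 500, "occasional": 400, "dissenting": 50,
              "yonder": 5, "paraboloidal": 1}
NOUNS = {"work": 900, "city": 600, "enemy": 300,
         "king": 250, "immigration": 40}

def pick(words, weighted=True, rng=random):
    """Draw one word: proportional to its count if weighted, else uniform."""
    names = list(words)
    weights = [words[w] for w in names] if weighted else None
    return rng.choices(names, weights=weights, k=1)[0]

def code_name(adjectives=ADJECTIVES, nouns=NOUNS, weighted=True, rng=random):
    """Return a random adjective followed by a random noun, in upper case."""
    adjective = pick(adjectives, weighted, rng)
    noun = pick(nouns, weighted, rng)
    return f"{adjective} {noun}".upper()

if __name__ == "__main__":
    for _ in range(5):
        print("Operation", code_name())
```

Passing `weighted=False` gives the unweighted sample, where a rare word like "paraboloidal" is as likely as any other.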

The Python code name generator and word lists of parts of speech and corresponding frequencies are available on GitHub.

References:

Google Books Ngram Viewer Exports, English versions 20120701 and 20200217
Princeton University. “About WordNet.” WordNet. Princeton University, 2011.

Code name generator