On November 12, 2012, Randall Munroe’s famous webcomic xkcd published Up Goer Five, an annotated blueprint of the Saturn V rocket written using only the 1000 most common words of the English language (as he estimated them). Later, on November 24, 2015, came Thing Explainer, an entire illustrated book of similar explanations for other objects and concepts. The “only the 1000 most common words” style of writing sounds sometimes stilted, sometimes a bit funny, but these texts show that it’s enough to talk about virtually anything.

In the age of LLMs, would it be possible to build a training set using only the 1000 most common words of the English language?

Let’s try.