Playing Wordle with the command line
Tuesday 04 January, 2022 at 9.09am
I hated when the family made me play Scrabble. I was only ever good at the numbers part of Countdown. And now I am also frustrated by the new word game Wordle that has been showing up in my Twitter stream over the past couple of weeks.
The problem is I forget words all the time, even between starting a sentence and the point in the sentence where I was going to use the word. I know the words, and it's not like I don't understand them when they are used at me. But I can't access them sometimes, and even having a subset of the letters doesn't make it any easier. And seeing jumbled letters seems to make this worse, not better. Consequently, any game that's about being able to come up with words leaves me cold.
I failed my first three attempts at Wordle, and got stuck on the fourth attempt because I couldn't think of any words that matched the constraints before me. So I cheated, and found there was one exact match, and it was just so obvious once I realised I could use a letter twice. But all questions are easy when you know the answer.
Anyway, if you're anything like me, and have access to fairly standard Unix command-line tools, here's how you can play along too, even if you just can't words.
Prepare the first guess
To start with, we can find words in the file /usr/share/dict/words, which contains a pretty comprehensive list of … well, words. Let's find those that are exactly five letters long.
grep '^.....$' /usr/share/dict/words
grep is a command-line tool for finding lines that match a pattern, written in a limited form of regular expressions. The caret (^) anchors the match to the start of the line, the dollar ($) anchors it to the end of the line, and the dot (.) means "any character". So this says "print all lines that consist of exactly five characters".
But let's restrict this a little. I don't know but I suspect proper nouns
aren't likely to be used in Wordle.
grep '^[a-z]....$' /usr/share/dict/words
This replaces the first dot with a bracket expression [a-z], which means a range: "any character between a and z". This will exclude any word starting with a capital letter.
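A stricter variation, sketched here under my own assumptions, applies the range to all five positions. That also throws out entries with a capital letter or an apostrophe in the middle, which some word lists contain. The function name lowercase_five and the sample words are made up, and setting LC_ALL=C is a precaution because in some locales [a-z] can match more than the 26 lowercase ASCII letters.

```shell
# Hypothetical helper: keep only words made of exactly five lowercase ASCII letters.
# LC_ALL=C makes [a-z] mean the plain ASCII range regardless of the user's locale.
lowercase_five() {
    LC_ALL=C grep '^[a-z]\{5\}$'
}

# "can't" is five characters long and starts with a lowercase letter,
# so the looser pattern '^[a-z]....$' would have let it through.
printf '%s\n' truss Wales "can't" mcCoy truly | lowercase_five
```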
Now let's remove words with duplicated letters. To start, we need to break the word into separate letters.
echo -n boost | sed 's/\(.\)/\1\n/g'
This gives us the individual letters on separate lines. The sed command is a "stream editor", and the argument 's/…/…/' does search-and-replace. The search pattern can also use regular expressions. In this case the any-character match is captured for use in the replacement by surrounding it with parens ((.)); to be treated as special characters and not literal parens they need to be escaped, hence the backslashes before them. The replacement (\1\n) means "the first capture, and a newline". The final part (g) is a flag meaning to repeat the match until no more are found, rather than stopping after the first, which is the default behaviour.
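As far as I know, \n in the replacement half of a substitution is not something every sed understands, so if the command above prints the letters with a literal "n" between them, here is a sketch of another way to get the same result. It relies on grep's -o flag ("print only the matching part"), which is widely available but not required by POSIX. The function name split_letters is my own.

```shell
# Hypothetical helper: print each character of its argument on its own line.
# grep -o prints every match separately, and the pattern '.' matches one character.
split_letters() {
    printf '%s\n' "$1" | grep -o .
}

# Prints b, o, o, s and t on five separate lines.
split_letters boost
```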
We can now remove duplicates using sort's uniqueness flag:
echo -n boost | sed 's/\(.\)/\1\n/g' | sort -u
This shows the letter o once, instead of twice. Now, by counting the number of lines and testing that it is still five, we know if the word has five distinct letters. Wrap that up into a loop, and we can check all of the five-letter words.
for word in $(grep '^[a-z]....$' /usr/share/dict/words); do
    letters=$(
        echo -n $word \
            | sed 's/\(.\)/\1\n/g' \
            | sort -u \
            | wc -l
    )
    [ $letters = 5 ] && echo $word
done > initial-wordles.txt
Using for word in ...; do ... done we loop over the words we found previously, putting each word in a variable helpfully called word. Then we set the value of a variable letters to the count of unique letters. Only if it is 5 do we output the word. And by redirecting the output to a file, we can capture this processing for future use.
On my 2015 iMac, processing this list takes slightly less than a minute. Yes, you could process the words much faster in an actual programming environment, but it would take just as long to write the program as it would to compose and run the shell commands.
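If the wait is a nuisance, one alternative that stays within grep is a back-reference: capture any letter, then look for that same letter again later in the word, and invert the match. This is a sketch of a different technique rather than the loop described above, and the function name distinct_letters is made up. Because it is a single pass through grep instead of several processes per word, it should finish far sooner.

```shell
# Hypothetical helper: drop any word in which some letter appears more than once.
# \(.\) captures one character; .*\1 then looks for that same character further along.
# -v inverts the match, so only words with no repeated letter survive.
distinct_letters() {
    grep -v '\(.\).*\1'
}

# boost (two o), truss (two s) and trill (two l) are dropped.
printf '%s\n' boost forth truss react trill | distinct_letters
```

To build the same file as before, the pipeline would be grep '^[a-z]....$' /usr/share/dict/words | distinct_letters > initial-wordles.txt.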
Now we can make a first guess.
% shuf -n1 initial-wordles.txt
forth
shuf shuffles the lines of the initial guesses file, and -n says to then show only the first n lines, in this case one. We get "forth", and use it.
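shuf comes from GNU coreutils and is not installed everywhere, so here is a rough stand-in using awk, which is. The function name pick_one is my own invention. It reads every line into memory and prints one at random.

```shell
# Hypothetical helper: print one randomly chosen line from standard input.
# srand() seeds the generator from the clock; rand() returns a number in [0, 1),
# so int(rand() * NR) + 1 is a line number between 1 and NR.
pick_one() {
    awk 'BEGIN { srand() }
         { lines[NR] = $0 }
         END { if (NR > 0) print lines[int(rand() * NR) + 1] }'
}

printf '%s\n' forth react truly | pick_one
```

Because the seed is the current time in seconds, running it twice within the same second will usually pick the same line.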
The R and the T are in the word, but are both in the wrong place. The other
letters are not in the word.
% grep '^[a-z]....$' /usr/share/dict/words \
| grep -v '[foh]' | wc -l
Using all of the five-letter words again, we filter out (grep -v means "invert match", so only the lines that do not match are output) any words that contain the letters F, O, or H. I use wc -l on the end to get an idea of how many matches there are. 4,666 words out of an initial 8,497, so we've eliminated almost half the possibilities already. Now check for those that have an R and a T.
% grep '^[a-z]....$' /usr/share/dict/words \
| grep -v '[foh]' | grep r | grep t
We grep for both individually because each line has to contain both. Now we're at 399 matches. But we also know where the R and the T aren't in the word, and can incorporate that too.
% grep '^[a-z]....$' /usr/share/dict/words \
| grep -v '[foh]' | grep r | grep t \
| grep '..[^r][^t].'
Using caret (^) at the start of the bracket expression negates it, so this new line matches any-character any-character not-an-r not-a-t any-character. Now we're at 275 matches. Let's pick one.
% grep '^[a-z]....$' /usr/share/dict/words \
| grep -v '[foh]' | grep r | grep t \
| grep '..[^r][^t].' \
| shuf -n1
react
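To see each of those filters doing its job, here is a sketch that runs the same chain over a handful of made-up candidates instead of the dictionary. The word list and the function name after_forth are assumptions for illustration. I have also anchored the positional pattern with ^ and $, which makes no difference when every line is already five letters long but is safer on other input.

```shell
# Made-up candidates standing in for the five-letter dictionary words.
candidates='react
truss
north
tulip
straw
rusty'

# Hypothetical helper applying what the guess "forth" revealed:
#   no f, o or h anywhere; an r and a t somewhere;
#   but no r in position three and no t in position four.
after_forth() {
    grep -v '[foh]' | grep r | grep t | grep '^..[^r][^t].$'
}

# north has o and h, tulip has no r, straw has r in position three,
# and rusty has t in position four, so only react and truss remain.
printf '%s\n' "$candidates" | after_forth
```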
Once again, the R and the T are in the wrong place, and the other letters are not found. Adding this information to our restrictions narrows things down considerably to just 16 matches.
% grep '^[a-z]....$' /usr/share/dict/words \
| grep -v '[aecfoh]' | grep r | grep t \
| grep '[^r].[^r][^t][^t]'
butyr
stirk
stirp
sturk
tikur
trill
trink
tripy
trubu
trull
truly
trump
trunk
truss
twirk
twirl
Note: some words in the Unix "dictionary" aren't words recognised by Wordle, such as the first match "butyr". You could always capture the list of five-letter words to a file, and edit out those that aren't recognised as you come across them.
I pick one myself: "truly". The first three letters match! And although I can now match the pattern in my head, for completeness let's run it one last time, adding L and Y to the not-in-the-word filter, and putting T, R, and U as exact matches.
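Rather than editing the word file by hand, one option is to keep a second file of rejected words and subtract it with grep. This is only a sketch: both file names are hypothetical, and it works in a temporary directory so it won't clobber anything real.

```shell
# Both file names are hypothetical; mktemp -d gives us a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' butyr stirk trump trunk truss > "$workdir/five-letter-words.txt"
printf '%s\n' butyr stirk > "$workdir/rejected.txt"

# -f reads the patterns from a file, -F treats them as fixed strings,
# -x requires a whole-line match, and -v inverts it to keep the non-matches.
remaining=$(grep -v -x -F -f "$workdir/rejected.txt" "$workdir/five-letter-words.txt")
printf '%s\n' "$remaining"

rm -r "$workdir"
```

Each time Wordle refuses a word, appending it to the rejected file keeps it out of future suggestions.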
% grep '^[a-z]....$' /usr/share/dict/words \
| grep -v '[aecflohy]' | grep r | grep t \
| grep 'tru..'
trubu
trump
trunk
truss
trust
Of these, I first pick "trust", dagnabbit. Then finally the correct word "truss".
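If you find yourself retyping the same chain of greps for every guess, the steps can be bundled into a small function. This is a sketch under my own assumptions and not something I used while playing: the name wordle_filter and its three arguments are invented for illustration, and it expects the candidate words on standard input.

```shell
# Hypothetical helper, not part of the walkthrough above.
#   $1  letters known to be absent, e.g. aecfoh (must not be empty)
#   $2  letters known to be present somewhere, e.g. rt (may be empty)
#   $3  five-position pattern, e.g. '[^r].[^r][^t][^t]' or 'tru..'
# Note: plain sh has no local variables, so these names are global.
wordle_filter() {
    absent=$1 present=$2 pattern=$3
    # Drop words containing any absent letter, then apply the positional pattern.
    result=$(grep -v "[$absent]" | grep "^$pattern\$")
    # Require each present letter in turn, one grep per letter.
    for letter in $(printf '%s\n' "$present" | grep -o .); do
        result=$(printf '%s\n' "$result" | grep "$letter")
    done
    printf '%s\n' "$result"
}

# The state of play after guessing "forth" and "react":
# react is dropped for containing a, e and c; the other five all fit.
printf '%s\n' trump trunk truss twirl react stirk \
    | wordle_filter aecfoh rt '[^r].[^r][^t][^t]'
```

In real use the input would be grep '^[a-z]....$' /usr/share/dict/words piped into the function, with the arguments updated after each guess.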