Day 15: Words on CPAN (App::wordlist)
About the series: perlancar's 2014 Advent Calendar: Introduction to a selection of 24 modules which I published in 2014. Table of contents.
Some of you might already know that there are some wordlists on CPAN, under the Games::Word::Wordlist:: namespace. As the package name suggests, these lists can be used for word games like Hangman, Word Search, or Scrabble, but they can also serve other purposes, among others: dictionaries for password cracking/brute-forcing, password generation, feeding into a search engine, or linguistics tasks like word length/frequency analysis.
The wordlist CLI utility (from the App-wordlist distribution) is a small script which I recently wrote to query the contents of the above-mentioned modules.
To use it, first install some wordlists. Enable is a good first choice, and there are several others: SGB, Country, CountrySingleWord, even one in another language. (wordlist is supposed to be able to list all available wordlist modules on CPAN with -L, and to install and uninstall them with -I and -U, but I haven't bothered to add that functionality yet, simply because there is currently just a small handful of such modules on CPAN.)
Next, if you run:
% wordlist
then all words from all installed wordlist modules will be printed. You can select only one module (or several) using the -w option:
% wordlist -w SGB
% wordlist -w Country -w CountrySingleWord
To search/grep for some words, simply specify the criteria as arguments. A word must satisfy all of the arguments (unless --or is specified, in which case it only needs to satisfy one of them). An argument enclosed in /.../ is treated as a regex:
% wordlist x q; # list words containing x as well as q
% wordlist /fo+/; # list words matching a regex
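To make the difference between the default "all arguments" behaviour and --or concrete, here is a minimal sketch that reproduces the same semantics with plain grep on a tiny made-up word list. It is only an analogy, not how wordlist is implemented: the sample words and the variable names (words, both, either, regex) are invented for illustration, and grep -E uses POSIX extended regexes rather than Perl regexes (the pattern fo+ happens to mean the same in both).

```shell
# A made-up sample list standing in for an installed wordlist module.
words=$(printf '%s\n' box quiz equinox food fox exquisite)

# Default behaviour: a word must satisfy every criterion,
# so chain one grep per criterion (like `wordlist x q`).
both=$(printf '%s\n' "$words" | grep x | grep q)

# --or behaviour: a word needs to satisfy only one criterion,
# so hand grep several patterns at once.
either=$(printf '%s\n' "$words" | grep -e x -e q)

# Regex criterion (like `wordlist /fo+/`).
regex=$(printf '%s\n' "$words" | grep -E 'fo+')

printf 'x AND q:\n%s\n\nx OR q:\n%s\n\n/fo+/:\n%s\n' "$both" "$either" "$regex"
```

With this sample, the "AND" query keeps only equinox and exquisite, the "OR" query drops only food, and the regex keeps food and fox.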
To filter by word length, there are the --len, --minlen, and --maxlen options.
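As a rough picture of what these length filters do, the sketch below applies the same kind of filtering with awk to a small made-up list. The sample words and variable names are hypothetical, and it assumes that the minimum and maximum bounds are inclusive, which is not stated above.

```shell
# A made-up sample list standing in for an installed wordlist module.
words=$(printf '%s\n' ox box quiz equinox exquisite)

# Exact length, analogous to --len: keep words of exactly 4 characters.
exact=$(printf '%s\n' "$words" | awk 'length($0) == 4')

# Length range, analogous to --minlen and --maxlen together
# (assumption: both bounds are inclusive).
ranged=$(printf '%s\n' "$words" |
    awk -v min=3 -v max=7 'length($0) >= min && length($0) <= max')

printf 'length 4:\n%s\n\nlength 3 to 7:\n%s\n' "$exact" "$ranged"
```

Here the exact-length filter keeps only quiz, while the range keeps box, quiz, and equinox.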
One more thing: wordlist has a tab completion feature, so if you use bash, just run complete -C wordlist wordlist somewhere and you can conveniently press Tab after -w etc. to complete wordlist names. I'm even thinking of somehow completing the words themselves, so you might not need to enter the command at all and can just see the list of words from the completion entries :-)
As a side note, Unix users traditionally also have something like /usr/share/dict/words on their system, which they can grep for things. The wordlist utility brings this capability to Windows and other places where /usr/share/dict/words is not available but a working Perl installation is.