grep.metacpan.org at the Perl Toolchain Summit
Introducing the new way to grep CPAN: https://grep.metacpan.org/
My co-worker Nicholas and I were honored to be invited to attend the Perl Toolchain Summit in Lyon, France last week. It was a great experience for both of us.
Earlier this year, I needed to answer some questions being put to me by the toolchain gang about the scope of the problem of build/test/install without . in @INC. However, grep.cpan.me was down that day. Additionally, I needed to be able to execute Makefile.PL on each distro and see what happened. I tried to do some simple walking using CPAN::Mini. When I was done, I played around with putting the data in git so I could use tools like git grep. On a whim I uploaded it to GitHub but found 2 problems: 1. You're not allowed to have files over 50MB. 2. GitHub won't search repos with over a certain number of files. I worked around both problems and uploaded the repo.
After getting my invite to the Toolchain Summit, I thought this might be a good tool to shop around for a home. The metacpan folks very graciously offered it a home on day one of the conference, so we were able to go from nothing to something in a very short window of time.
You can find the code here:
- The Front End code that hosts http://grep.metacpan.org
- The code we use to generate the git repo that the front end uses
- The git repo holding all of the latest distro releases on CPAN
The site is still very young. Many of the regexes aren't quite working correctly at the moment. You can report issues here.
Doing it locally
There are about a million files in the checked-out git repo that stores all of this data. This can be cumbersome to check out on your own drive. However, a bare git repo is much smaller and takes up a mere 2GB (vs 12GB) of space:
git clone --bare https://github.com/metacpan/metacpan-cpan-extracted.git
Once you've done this, you can grep the repo locally on your own system by doing:
git grep -i "the meaning of life" master
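If you search often, it may be handy to wrap that command so you can run it from any directory and narrow it to certain file types. Here is a minimal sketch, not something that ships with the repo: the function name cpan_grep and the CPAN_REPO variable are made up for illustration, and it assumes the bare clone sits in the metacpan-cpan-extracted.git directory that the clone command above creates.

```shell
# Hypothetical helper: search the bare clone from any directory.
# CPAN_REPO is an assumed variable name; when unset, fall back to the
# directory name that `git clone --bare` produces for the URL above.
cpan_grep() {
    pattern=$1
    shift
    # -I skips binary files, -i ignores case, -n prints line numbers.
    # -e keeps patterns that start with a dash from being read as options.
    # Remaining arguments are passed through as pathspecs, e.g. '*.pm'.
    git --git-dir="${CPAN_REPO:-metacpan-cpan-extracted.git}" \
        grep -I -i -n -e "$pattern" master -- "$@"
}
```

For example, cpan_grep "the meaning of life" '*.pm' should limit the search to Perl modules, and each hit is printed as master:path:line-number:text. Leaving off the pathspec searches everything, the same as the plain git grep above.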