Virtual Spring Cleaning (part 1 of X)
At the 2016 German Perl Workshop in Nuremberg I first released and demonstrated a simple local search engine built on Elasticsearch, with Perl for the surrounding infrastructure. The search application is distributed via CPAN as Dancer::SearchApp.
As is usual for my distributions, the name is a bad one, because it ties the application to Dancer. In the future, it may no longer be based on Dancer at all. SawyerX hopes to use it as a testbed to show that Dancer2 is ready for production, while I hope that he will do most of the porting work.
The application itself was surprisingly simple to write, because I use Elasticsearch as the search index and Search::Elasticsearch as the interface to it. I had already written the remaining functionality, such as the crawlers, for other purposes. These helper modules are more generic in scope than mere feeders for a simple search application.
What good is a search application if you can't get any content into it?
With the search part solved by Elasticsearch and Search::Elasticsearch, the main challenge for a search engine is getting enough content into the search index.
The first module I hastily bundled for the initial release was the one for controlling Apache Tika.
I originally wrote the module, then named Apache::Tika, in 2011 at YAPC::Europe in Riga, where I first discovered Tika. Since then, it had mostly lingered on my laptop until I got around to writing the rest of the search engine infrastructure.
Another module I repurposed is the wrapper for reading mail via IMAP and extracting the text from either the plain-text body or the HTML part. I originally wrote it for a blog engine based on IMAP folders.
I wonder how much more generally useful code lingers on other computers, code that could use some polishing or simply a release onto CPAN so that others can improve upon it. In the spirit of Spring cleaning, I plan to release these modules as their own distributions so they can be reused instead of being baked into Dancer::SearchApp or, even worse, rotting on my computers.
Hence I came up with the idea of virtual Spring cleaning. I'll try to release as much of my code as possible onto CPAN. Preferably, I will even write some documentation for the modules so that others can use them too.