A feed reader (2)

After asking around, I gave Giannis "feeder" a second look. Indeed adding the missing features was almost trivial and Gianni was highly responsive about merging my patches.

Now the mailed blog posts can look like this:

feeder-screenshot.png

Looking back at the change history, I added

Sending mail via MIME::Lite

Contrary to its documentation, MIME::Lite has always worked well for me on various platforms and necer given me reason to switch to an alternative. So when I wanted to send mail, I naturally reached for this tried and tested tool.

Optional inlining of images in the mail for offline reading

I will be offline or at least bandwidth-constrained when reading the blog posts in the subway. Having all the images already included in the mail as attachments and the HTML rewritten to refer to the internal assets makes the experience much more enjoyable. Neither K9-mail nor Thunderbird ask me about downloading external references and I still get to see the pretty pictures. Maybe the code should live sepatately on CPAN, as HTML-archives and MIME-mails have similar needs to fetch the outside parts and rewrite the HTML for local references.

The code is still hacky and instead of using a proper parser it uses a regular expression to find linked images. Once it annoys me enough, I'll rewrite it to use a proper HTML parser, but I haven't investigated which parser(s) feeder already loads. On the other hand, compared to a newsletter attempting to sell me items, the feed entries already look vastly superior:

soliver-mail.png

Retrieving the linked page as the RSS body

Some RSS feeds only show a teaser instead of the whole post, maybe to better track visitors, or for some other reason. Using LWP::UA, the existing RSS body can optionally be reolaced by the page in its original glory.

Automatic content extraction using HTML::ExtractContent

This is a module that might well deserve its own blog post. As I have written enough web scrapers myself, I know how tedious it is to keep a scraper up to date with a site. HTML::ExtractContent automates finding the "main content" on a page and returns the HTML for it, which is exactly what I want for some pages. Its heuristics fail, for example on jwz's image-only posts, but so far it worked incredibly well for a solution that took effort of 5 minutes.

Look more like a CPAN distribution

Of course, when touching somebody else's code, I can't stop and I'll also introduce/spill some of my infrastructure choices into the project. The app now has a Makefile.PL which is convenient for keeping track of and pulling in the various prerequisites. Gianni took well to that change and only ripped out unneeded parts of my boilerplate code, something that I can live with very well.

Alternatives

Léon Brocard was to give a presentation about other alternatives in Kyiv at the YAPC::Europe 2013, but unfortunately, he was not able to make it there. I'm told that the previous versions of his talk were relevant to my interests and I would have liked to subscribe to his newsletter. Most likely he'll present his talk again somewhere else so all is not lost.

Leave a comment

About Max Maischein

user-pic I'm the author of various CPAN modules. I'm also one of the admins of perlmonks.org.