HTML Content Extraction / Instapaper

By Max Maischein on January 30, 2016 12:47 PM

I recently came across the old Instapaper extraction rules, which rewrite HTML content into a form that is easier on the eyes. This discovery led me to write HTML::ExtractContent::FTR and HTML::ExtractContent::Pluggable, which give me a concise way to scrape content from web sites so I can read it via RSS or mail.

The modules are not yet on CPAN because I haven't written any documentation and haven't used them extensively. Once I have written enough programs that use them, for example for Gianni's Feeder RSS-to-mail program, I will likely release them onto CPAN as well.

This also means writing enough documentation on how to set them up, how to manage local rules, and how to merge your local modifications with the publicly maintained rules.


Tagged as: HTML::ExtractContent


About Max Maischein

I'm the Treasurer of the Frankfurt Perlmongers e.V. I have organized Perl events, including 9 German Perl Workshops and one YAPC::Europe.
