HTML Content Extraction / Instapaper
I recently came across the old Instapaper extraction rules, which rewrite HTML content into a form that is easier on the eyes to read. That discovery led me to write HTML::ExtractContent::FTR and HTML::ExtractContent::Pluggable, which give me a concise way to scrape content from websites and consume it via RSS or mail.
The modules are not on CPAN yet, because I haven't written any documentation and haven't used them extensively. Once I have written enough programs with them (for example, for Gianni's Feeder RSS-to-mail program), I will likely release them to CPAN as well.
Releasing them also means writing enough documentation to explain how to set them up, how to manage the local rules, and how to merge your local modifications with the publicly maintained rules.