Yep, indeed, contributing to the Perl community can be a very ludic activity (not to be confused with luddite!). I tried to list every Perl-related web resource where participants are encouraged to build up some kind of score. Most have charts where participants compete for the highest rank, while some have an absolute goal (like 100% test coverage). The list is in no particular order. Feel free to post any resources I forgot or am unaware of in the comments!
$ \time perl mojo-crawler.pl
23.08user 0.07system 2:11.99elapsed 17%CPU (0avgtext+0avgdata 84064maxresident)k
16inputs+0outputs (0major+6356minor)pagefaults 0swaps
$ \time perl yada-crawler.pl
8.83user 0.30system 0:12.60elapsed 72%CPU (0avgtext+0avgdata 131008maxresident)k
0inputs+0outputs (0major+8607minor)pagefaults 0swaps
How can it be 10x faster in wall-clock time while consuming less than half the CPU time?!
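For the curious, the two ratios can be checked with a few lines of arithmetic. This is just a sketch that hardcodes the numbers from the `\time` output above; the helper name `elapsed_seconds` and the hash layout are made up for illustration.

```perl
use strict;
use warnings;

# Convert GNU time's "m:ss.ss" elapsed format into seconds.
sub elapsed_seconds {
    my ($elapsed) = @_;
    my ($min, $sec) = split /:/, $elapsed;
    return $min * 60 + $sec;
}

# Figures copied from the two \time runs above.
my %run = (
    mojo => { user => 23.08, sys => 0.07, elapsed => elapsed_seconds('2:11.99') },
    yada => { user => 8.83,  sys => 0.30, elapsed => elapsed_seconds('0:12.60') },
);

# Wall-clock speedup: how many times sooner the second crawler finished.
my $speedup = $run{mojo}{elapsed} / $run{yada}{elapsed};

# CPU time (user + system) of the second crawler relative to the first.
my $cpu_ratio = ($run{yada}{user} + $run{yada}{sys})
              / ($run{mojo}{user} + $run{mojo}{sys});

printf "wall-clock speedup: %.1fx\n", $speedup;            # 10.5x
printf "CPU time ratio (yada/mojo): %.2f\n", $cpu_ratio;   # 0.39
```

So the second run finished roughly 10.5 times sooner while spending about 39% of the CPU time of the first one.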
So, once upon a time I had a crazy idea: to put an almost complete resource meter into the tmux status bar. You know, the clock is so boring. Let's add a battery indicator there. And the load numbers. And the memory usage...
Needless to say, this resulted in an unbearable user experience:
a:2.96G c:4.37G f:5.41G i:2.98G l:0.65/1.73/1.41 23:47
Actually, the data is OK; the "gauges" work fine on every Unix I tested them on. If only it were a bit fancier...
It was tested on Mac OS X 10.8.2, Ubuntu 12.04, Ubuntu 11.10, Debian 6.0.6 and works fine with the default system Perl; there are no external dependencies at all.
Liked it? Go ahead, grab your copy and follow the installation instructions: https://github.com/creaktive/rainbarf
Grab the gist with the complete, working source code.
I often hear the question: "So, you're a Perl guy, could you show me how to make a web crawler/spider/scraper, then?" I hope this post series will become my ultimate answer :)
First of all, I compiled a small list of features that people expect of crawlers nowadays:
- capability of concurrent, persistent connections;
- usage of CSS selectors to process HTML;
- easily modifiable source instead of a flexible OOP inheritance structure;
- LESS DEPENDENCIES!
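To give a taste of the CSS selector item from the list above, here is a minimal sketch that pulls links out of an HTML snippet. It assumes the Mojolicious distribution is installed (`Mojo::DOM` and `Mojo::URL`); the HTML and the base URL are hypothetical and there is no network access involved, so this only shows the parsing step, not a complete crawler.

```perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::URL;

# Hypothetical page content and base URL, used only for illustration.
my $base = Mojo::URL->new('http://example.com/docs/');
my $html = <<'HTML';
<html><body>
  <a href="intro.html">Intro</a>
  <a href="/about">About</a>
  <a name="top">No href here</a>
</body></html>
HTML

# Select every <a> element carrying an href attribute,
# then resolve each link against the base URL.
my @links = Mojo::DOM->new($html)
    ->find('a[href]')
    ->map(sub { Mojo::URL->new($_->attr('href'))->to_abs($base)->to_string })
    ->each;

print "$_\n" for @links;
```

The `a[href]` selector skips the anchor that has no `href`, so only two links are printed. In a real crawler the same chain would be applied to the body of each fetched response.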
Some time ago, I announced the existence of LWP::Protocol::Net::Curl here. Lots of things have changed since then thanks to the feedback received, so thank you all, guys! Today, I am proud to announce that it has reached a stable milestone with version 0.011.
- Ludic Perl
- Web Scraping with Modern Perl (Part 2 - Speed Edition)
- Put a fancy CPU/RAM usage chart in tmux status bar
- Web Scraping with Modern Perl (Part 1)
- Merry XS-mas!
- CPAN module recommendation system
- TMTOWTDI, plus benchmarking
- Google Refine + Perl
- libcurl as LWP backend (or "all your protocol are belong to us")