Stop me if you have heard this one before. You have a list of files you need to process in a text file with one item per line. Handling this is fairly simple you read a line in and process it over and over again until you processed the whole list. This works great, but if that list is 40,000 items long and each item takes up to 30 seconds to run it suddenly takes a very long time to finish. In this case processing each item is just a system call to another cli application with no shared resources, thus allowing processing of items in parallel with no fuss. For this task I am using Parallel::ForkManager and here are the important bits:
Oracle is closing down the opensolaris.org site on March 24th, which is inconvenient for the rest of us. I wanted to grab the mailman archives for the mailing lists so I fired up a search engine and looked for any existing open source projects to do this. After trying two different scripts that did not quite work right I realized it would just be faster for me to write what I need.
I started by fetching the listinfo page which has links to all the lists archived and took a look at the data. Based on the page structure the easiest method was to iterate over all the links in the page an…
A little while back I wrote a pair of applications that used Path::Class::Rule to do the file finding. I selected this module because I like the interface for building up rules. I started to run into speed issues as the source directory grew larger and larger. Along comes rjbs's the speed of Perl file finders article and his speed chart backs up my findings that more files equals a marked increase in time.
This is where I found out about ="https://metacpan.org/mod…
In my previous post Text Processing: Divide and Conquer I took a text processing problem profiled it, then developed a few possible solutions. I benchmarked these options and now use the fastest solution… that I tested for. Two comments were posted for that article that gave insight into different and faster ways to solve this problem.
Recently I have been doing some in depth research with regards to development tools of all kinds. Currently I am working my through the various IDEs available in both the open and close source worlds. This is what spurred me into giving Padre another shot. The last time I tried to install it there was a dependency problem and it was not worth solving. So that is my first step, install Padre.
Search this blog
- Parallel Forking and Process Management
- Download a mailman archive
- Finding files faster
- Text Processing Part 2: More Speed
- Using Padre for the first time
- Testing for HTTP compression
- Text Processing: Divide and Conquer
- 3 features I would like to see in Perl
- Q: When not to use Regexp? A: HTML parsing
- A NYTprof encoding hiccup