Learning Perl 6 - Find in a Large File

Currently I feel it is good to prepare for Christmas.

To read in a larger file, /usr/share/dict/words in my case, I do

my @dictionary = split /\n/ , slurp '/usr/share/dict/words';

That is a cool, short piece of code for the task. But I am told it is slow, because it runs a regex across a large file. It is also more code than the task needs. Look at this:
my @dictionary := '/usr/share/dict/words'.IO.lines;

That, too, was slow until a few hours ago, but it has become much faster since. (Don't tell anybody: the trick was going to a Perl Workshop and talking to some people. Nice people, and very, very helpful. I am tempted to try and run a hackathon for them some day.)
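To make the difference between the two approaches concrete, here is a minimal sketch. It assumes a small throwaway file (the name words-demo.txt and its contents are made up for illustration) instead of /usr/share/dict/words, and it uses plain assignment rather than binding. Besides being shorter, .lines also does not produce the empty trailing element that splitting on newlines leaves behind when the file ends with a newline.

```raku
# Hypothetical stand-in for /usr/share/dict/words.
my $path = $*TMPDIR.add('words-demo.txt');
$path.spurt("apple\nbanana\ncherry\n");

# Slurp the whole file, then split it with a regex.
my @via-split = split /\n/, slurp $path;

# Let the IO layer split into lines; no regex involved.
my @via-lines = $path.lines;

say @via-split.elems;   # includes an empty string after the final newline
say @via-lines.elems;   # one element per line

$path.unlink;
```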

The next task was searching for lines containing a $search_string:

for @dictionary -> $line {
    if $line ~~ m/$search_string/ {
        say "match: $line";
    }
}

That looks pretty straightforward, but it is programming Perl 5 in Perl 6. Brrr. ;-) The more idiomatic version:
say @dictionary.grep({m/$search_string/});
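As a rough sketch of how the grep version behaves, the snippet below uses a tiny made-up word list and search string in place of the real dictionary. It also shows a plain substring test with .contains as a possible alternative when no regex features are needed; that variant is my assumption about what fits the task, not something the post itself uses.

```raku
# Hypothetical sample data standing in for the dictionary file.
my @dictionary    = <apple grape pineapple banana>;
my $search_string = 'apple';

# grep with a regex: a Str variable inside m/.../ is matched literally.
my @by-regex = @dictionary.grep({ m/$search_string/ });

# Plain substring test, without going through the regex engine.
my @by-contains = @dictionary.grep(*.contains($search_string));

# Print one match per line instead of the whole list at once.
say "match: $_" for @by-contains;
```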



I like it! Perl 6 is on my Christmas 2015 list, so keep blogging these little beauties for me and I'll catch up when I've got time. I'd like to work my way up to converting my Perl 5 modules over to 6. From what little I've seen of the external library support, it sounds to me a lot easier than dealing with XS code.

If you like hacking on Perl 6 (or learning Perl 6), come to the Austrian Perl Workshop in Salzburg: http://act.useperl.at/apw2014/

After two days of "regular" Perl Workshop, we'll have a two-day hackathon on 12th and 13th Oct. Larry Wall will be there, plus Liz, Jonathan, and several more smart people.

I was having a similar issue, but on a larger file (32M 64-bit integers, about 120x larger than /usr/share/dict/words on my machine). This is more than I'd like to pull into an array, as it uses 13.5GB of memory.

Using the 'for $in -> $x { ... }' style was quite slow, but the helpful people on #perl6 got me to try .get in a loop, e.g. 'while (my $x = $in.get) { ... }', which turns out to be much faster. Not only does it use almost no memory, it's 50% faster than the latest @d = "file".IO.lines.

BTW, this was meant to share my experience with using .get for reading a large file. Big thanks to Liz and others for speeding up Str.lines!
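The .get loop described above might look roughly like the sketch below. The temp file name and its contents are invented for illustration. One deliberate deviation from the one-liner in the comment: the loop tests .defined instead of plain truthiness, on the assumption that the input may contain empty lines, which would otherwise end the loop early.

```raku
# Hypothetical temp file standing in for a large file of integers.
my $path = $*TMPDIR.add('get-demo.txt');
$path.spurt("10\n20\n\n30\n");

my $in         = $path.open(:r);
my $sum        = 0;
my $lines-seen = 0;

# .get returns the next line without its newline, or Nil at end of file.
# Testing .defined (rather than truthiness) keeps the empty third line
# from ending the loop early.
while (my $x = $in.get).defined {
    $lines-seen++;
    $sum += +$x if $x.chars;   # skip empty lines, numify the rest
}
$in.close;
$path.unlink;

say "read $lines-seen lines, sum = $sum";
```

Only one line is held in memory at a time, which matches the low memory use reported in the comment.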
