September 2010 Archives

One-liner XML / Perl / JSON

castaway blew my mind this morning on irc.perl.org #axkit-dahut.

Convert an XML file to Perl data structure:

perl -MXML::Simple -MData::Dumper -le'print Dumper XMLin("foo.xml")'

Convert an XML file to JSON:

perl -MJSON::Any -MXML::Simple \
   -le'print JSON::Any->new()->objToJson(XMLin("foo.xml"))'

"How do I do X?" Like this! Poof!

Some days Perl feels like a Las Vegas magic show. :)

Using blogs.perl.org

I stumbled into blogs.perl.org last night. Here are a couple of "quick start" tips for using this install of Movable Type Pro:

(1) Code blocks. If you choose Format: Markdown, leave a blank line, indent the text with 4 spaces, then leave another blank line:

you will get code blocks like this
# with some
$rudimentary = "syntax highlighting";

(2) Blog subtitle. Erez Schatz was kind enough to point out how to set your blog subtitle (e.g.: Mutation Grid, Inc. "Controlled software evolution." above). From the blogs.perl.org page, click on Post, then choose Preferences - General on the top menu bar. The subtitle is labeled "description".

Overlapping regex matches

irc.perl.org #perl-help posed a good question tonight. Why does this only find some of the matches?

my $sequence = "ggg atg aaa tgt tcc cgg taa atg aat gcc cgg gaa ata tag cct gac ctg a"; 
$sequence =~ tr/ //d; 
print "Input sequence is: $sequence \n";  
while ($sequence =~ /(atg(...)*?(taa|tag|tga))/g) {print "$1 \n";}

By default, a regex with /g begins each subsequent search after the end of the last match, so overlapping hits are not found. As this blog post explains, a positive lookahead assertion is the key to finding all of them: the lookahead itself consumes no characters, so the next search resumes just one position past the previous hit. This works great:

while ($sequence =~ /(?=(atg.*?(taa|tag|tga)))/g) {
   print "$1\n";
}
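One thing to watch: the `.*?` in the lookahead pattern is not tied to codons, so a hit can end at the first stop codon in any reading frame. Here is a minimal sketch, assuming the goal is to keep the three-letter framing of the questioner's original pattern. The names @plain and @overlapping, and the start/orf hash keys, are made up for illustration.

```perl
use strict;
use warnings;

# Same sequence as in the question, with the spaces removed.
my $sequence = "ggg atg aaa tgt tcc cgg taa atg aat gcc cgg gaa ata tag cct gac ctg a";
$sequence =~ tr/ //d;

# Plain /g: each search resumes after the end of the previous match,
# so a start codon sitting inside an earlier match is skipped.
my @plain;
while ($sequence =~ /(atg(?:...)*?(?:taa|tag|tga))/g) {
    push @plain, $1;
}

# Positive lookahead: the match itself is zero-width, so the engine
# moves on one character at a time and tries every start position.
# (?:...)*? keeps the stop codon in the same frame as the atg.
my @overlapping;
while ($sequence =~ /(?=(atg(?:...)*?(?:taa|tag|tga)))/g) {
    # pos() is the offset of the zero-width match, i.e. where atg starts.
    push @overlapping, { start => pos($sequence), orf => $1 };
}

print "plain /g:  ", scalar(@plain), " matches\n";
print "lookahead: ", scalar(@overlapping), " matches\n";
printf "%2d  %s\n", $_->{start}, $_->{orf} for @overlapping;
```

On this input the plain loop finds 2 matches and the lookahead loop finds 4, starting at offsets 3, 8, 21 and 25. Every hit is a whole number of codons long.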

I'm partial to bioinformatics homework after 4 years of hacking on the stuff. :)

About Jay @ Mutation Grid

Perl / web / database development since 1995. Contact us for your next project.