September 2010 Archives

Reading META.yml when it's not UTF-8

Part of the 3% of the distributions I couldn't index with MyCPAN had encoding issues. YAML is supposed to be UTF-8, but I don't always get UTF-8 when I generate a META.yml for distributions that don't have one. I guess I could do the work to poke around in MakeMaker, etc., to convert all the values before I generate the META.yml, but um, no. Not only that, not all of the META.yml files already in the dists are UTF-8. Remember, however, this is a very small part of BackPAN: about 700 distributions out of 140,000 (or about 1/7th of my problem cases).

A couple hundred distros have Makefile.PL files encoded as Latin-1 in a way that matters. If the text isn't collapsible to ASCII, the META.yml ends up with Latin-1 in it. Some YAML parsers refuse to deal with that.

I'm not particularly satisfied with this solution: I assume the file is UTF-8, which is mostly true, and if the YAML loader barfs on it, I load it as Latin-1 and convert it.

sub _load_meta_yml { $_[0]->_try_utf8( $_[1] ) || $_[0]->_try_latin1( $_[1] ) }

sub _try_utf8 { $_[0]->_load_yaml( $_[0]->_load_file( 'utf8', $_[1] ) ) }

sub _try_latin1 {
    require Encode;
    Encode::from_to( my $utf8 = $_[0]->_load_file( 'bytes', $_[1] ), 'latin1', 'utf8' );
    $_[0]->_load_yaml( $utf8 );
    }

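sub _load_file {
    $logger->debug( "Trying to load $_[2] as $_[1]" );
    # $_[1] is the PerlIO layer (utf8 or bytes), $_[2] is the filename
    open my $f, "<:$_[1]", $_[2] or do {
        $logger->error( "Could not open $_[2]: $!" );
        return;
        };
    local $/; # slurp the whole file in one read
    return scalar <$f>;
    }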

sub _load_yaml {
    require YAML::Syck;
    my( $caller ) = ( caller(1) )[3]; 
    my $yaml = eval { YAML::Syck::Load( $_[1] ) } or 
        $logger->error( "$caller: $@" );
    $yaml;
    }
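
A variation on the same fallback idea is to settle the encoding before the YAML parser ever sees the text, instead of waiting for the parser to fail. The following is only a minimal sketch under that assumption; decode_meta_bytes is a hypothetical helper name and is not part of MyCPAN. It relies on Encode::decode with the FB_CROAK check, which dies on malformed UTF-8, and on the fact that any byte sequence is decodable as Latin-1.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode raw bytes as strict UTF-8, and fall
# back to Latin-1 if they are not well-formed UTF-8.
sub decode_meta_bytes {
    my( $bytes ) = @_;

    # decode() may modify its source argument when a CHECK value
    # such as FB_CROAK is given, so work on a copy.
    my $copy = $bytes;
    my $text = eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK ) };
    return ( $text, 'UTF-8' ) if defined $text;

    # Every byte maps to a character in Latin-1, so this step
    # does not fail on any input.
    return ( Encode::decode( 'ISO-8859-1', $bytes ), 'ISO-8859-1' );
    }

# A lone \xE9 byte is Latin-1 for e-acute but malformed as UTF-8
my( $text, $encoding ) = decode_meta_bytes( "author: Andr\xE9\n" );
print "Decoded as $encoding\n";
```

The trade-off is the usual one for this kind of guess: text in some other single-byte encoding will also "succeed" as Latin-1, just with the wrong characters.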

I liked YAML::XS for a bit, but it has a problem with the utf8 pragma that messed up some other stuff I was handling. I don't quite understand it, but LibYAML seems to be fine if everything is always UTF-8, and not so fine otherwise.

MyCPAN indexes 97% of BackPAN

A history of Perl variables

I was curious when various Perl variables showed up, so I started diving through perlvar and perl*delta. Ignoring those that were already there in Perl 4, I have a draft list so far. It's a bit dodgy because some of the variables existed before they were documented, but I'm really interested in the point where they became supported variables (so I also don't care about blead versions):

Does anyone have any corrections or predictions for 5.14? :)

perl 5.14
-----------

???

perl 5.12.0
-----------

???

perl 5.10.0
-----------
${^PREMATCH}
${^MATCH}
${^POSTMATCH}
%+
%-
${^WIN32_SLOPPY_STAT}
${^WARNING_BITS}
${^RE_TRIE_MAXBUF}
${^RE_DEBUG_FLAGS}

perl 5.8.9
-----------
${^CHILD_ERROR_NATIVE}
${^UTF8CACHE}

perl 5.8.8
-----------
${^UTF8LOCALE}

perl 5.8.2 ???
-----------
${^ENCODING}
${^OPEN}
${^UNICODE}

perl 5.8.0
-----------
$^N
${^TAINT}

perl 5.6
-----------
$^C
$^V
@-
@+
%^H

perl 5.005
-----------
%!
$^R

perl 5.004
-----------
$^M
$^S
$^A ???

perl 5.003
-----------
$^E
$^H
$^O
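
For anyone who hasn't met some of these yet, here is a small sketch of a few of them in use: the three match variables and %+ from the 5.10 list, plus @- and @+ from the 5.6 list. The match_parts name and the sample string are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: collect what the match variables report
# about the first run of digits in a string.
sub match_parts {
    my( $string ) = @_;

    # /p asks perl to fill in ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH};
    # (?<count>...) is a named capture that shows up in %+
    return unless $string =~ /(?<count>\d+)/p;

    return {
        before => ${^PREMATCH},
        match  => ${^MATCH},
        after  => ${^POSTMATCH},
        count  => $+{count},
        start  => $-[0],   # offset where the whole match starts
        end    => $+[0],   # offset just past the end of the match
        };
    }

my $parts = match_parts( 'BackPAN has 140000 distributions' );
say "$_: [$parts->{$_}]" for sort keys %$parts;
```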

What non-Perl books do you recommend to Perlers?

I'm overhauling the perlbook documentation and moving the book list from perlfaq2 into it. Besides updating the references, I'd like to include a short section on non-Perl (technical) books that are useful to the Perl programmer. So far I have Jon Bentley's Programming Pearls, but that's an easy one.

What else is there? What other books do you think Perlers should read to help them be better Perl programmers?

Just to head off all the posts I know are coming, Lord of the Rings might help you understand the perl source code, but it's not going in perlbook.

Okay, maybe it is.

Get a free Effective Perl Programming eBook.

Our publisher, Addison-Wesley, would like to give out one free eBook per month to motivated Effective Perler readers who suggest a topic for one of our weekly posts at The Effective Perler. Suggest a topic that we use and you might get a free eBook. See our website for more details.

Make a plan to rescue use.perl content

Chris Nandor is changing employers, so he won't have access to the machine that hosts use.perl after Friday. He's going to take a database dump with him and turn the site into a static site so the content will still be there for a bit, but eventually it's going to disappear. This means that a reboot, power outage, disk failure, or something else might keep the current use.perl from coming back to life.

The Perl community was very fortunate that Geeknet (and all the other names they went by) allowed Chris to host the site. Use.perl was a test bed for slashcode, so whatever Slashdot did or wanted to do, that's what use.perl did. It was a good thing going for a long time, and many of us owe Chris a lot of beer for use.perl, even if we didn't always like slashcode.

Now it's time to figure out what to do without use.perl and all the links to various posts in there. The main goals are:

  • Links to URLs within use.perl continue to serve the same content
  • People get to keep their use.perl content if they'd like to do something else with it.

Since it's Slash, a lot of the stuff is dynamic. Since use.perl basically ran on autopilot, Chris doesn't have the bandwidth. Since use.perl won't be connected to his new employer's tasks, he'll have even less time for it. Maybe there's someone who has the bandwidth for it.

Some possible solutions we talked about:

  • Set up slashcode and serve the site dynamically. If hosting is a problem, talk to me. :)
  • Set up slashcode and serve the site statically. This might mean a caching layer that stores the computed page for every URL. I'm not sure if that will handle the Ajax stuff properly though.
  • Take the database dump and recreate the pages somehow.
  • Crawl the entire site and store the results as static pages.

One idea was to inject everything into blogs.perl.org. I don't know how that would work or if the blogs.perl.org team want to do that. It would require a very large URL rewriting component.

How can I troubleshoot my Perl CGI script?

A while ago I moved my "How can I troubleshoot my Perl CGI script?" to StackOverflow. I'm just getting around to telling everyone about it because it was pretty far down on my to-do list.

I think this has almost pushed the old location on SourceForge out of the googlebrain, but it wouldn't hurt for people to link to it in a blog post, tweet, whatever to encourage Google to find this one. Someday SourceForge will disappear and we won't have to worry about it anymore. How is it even still alive? StackOverflow has pretty good googlejuice though, maybe because Google likes StackOverflow.

Since it's on StackOverflow, this also means that I'm basically letting go of it. StackOverflow encourages people to revise the questions and answers of others to improve them, and I've given it wiki status to encourage that even more. Take a look, see what I've left out (or left in), what's new and exciting (or old and boring).

Even if you don't (or can't) edit it just yet, I'd appreciate any comments on how to bring it up to date. Maybe another StackOverflow user can make the changes if I'm too busy.

Also, sadly, the only thing keeping the bad Perl info out of StackOverflow is a small band of knowledgeable Perlers patrolling the answers (Sinan used my summer absences to pass me as the highest rated Perl user there). If you're looking for a way to promote Perl in a useful way (and you actually know Perl), consider helping out. Providing good answers, voting on good answers (and against bad answers), and refining other answers helps the entire world.

An index for The Perl Journal articles

A couple of years ago I put together a list of The Perl Journal articles I could find on the Dr. Dobb's website. They changed some of their URLs, so I updated those to avoid all of the redirects and in the process found several more articles. My TPJ index is on Perlmonks. You can see some of the beginnings of popular projects, such as Moose, in some of the articles.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).