Reading META.yml when it's not UTF-8

Part of the 3% of the distributions I couldn't index with MyCPAN had encoding issues. YAML is supposed to be UTF-8, but I don't always get UTF-8 when I generate a META.yml for distributions that don't have one. I guess I could do the work to poke around in MakeMaker, etc., to convert all the values before I generate the META.yml, but um, no. Not only that, not all of the META.yml files already in the dists are UTF-8. Remember, however, this is a very small part of BackPAN: about 700 distributions out of 140,000 (or about 1/7th of my problem cases).

A couple hundred distros have Makefile.PL files encoded as Latin-1 in a way that matters. If a value isn't collapsible to ASCII, the generated META.yml ends up with Latin-1 bytes in it. Some YAML parsers refuse to deal with that.
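To see why a parser might refuse those files, here's a minimal sketch of the byte-level difference. The helper name looks_like_utf8 and the sample author value are made up for illustration; the only real call is the core utf8::decode function, which returns false when its argument is not well-formed UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub looks_like_utf8 {
    my( $bytes ) = @_;
    my $copy = $bytes;    # utf8::decode converts in place, so work on a copy
    return utf8::decode( $copy ) ? 1 : 0;
    }

# The same (invented) META.yml line in two encodings
my $latin1 = "author: Andr\xE9 Example\n";       # e-acute as the single byte 0xE9
my $utf8   = "author: Andr\xC3\xA9 Example\n";   # e-acute as the two bytes 0xC3 0xA9

printf "Latin-1 bytes are valid UTF-8? %s\n", looks_like_utf8( $latin1 ) ? 'yes' : 'no';
printf "UTF-8 bytes are valid UTF-8?   %s\n", looks_like_utf8( $utf8 )   ? 'yes' : 'no';
```

In UTF-8, the byte 0xE9 announces a three-byte sequence, so when a plain space follows it the string is malformed. A strict parser has nothing sensible to do with that.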

I'm not particularly satisfied with this solution: I assume that the file is UTF-8, which is mostly true, and if the YAML loader barfs on it, I try to load it as Latin-1 and convert it to UTF-8.

sub _load_meta_yml { $_[0]->_try_utf8( $_[1] ) || $_[0]->_try_latin1( $_[1] ) }

sub _try_utf8 { $_[0]->_load_yaml( $_[0]->_load_file( 'utf8', $_[1] ) ) }

sub _try_latin1 {
    require Encode;
    Encode::from_to( my $utf8 = $_[0]->_load_file( 'bytes', $_[1] ), 'latin1', 'utf8' );
    $_[0]->_load_yaml( $utf8 );
    }

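sub _load_file {
    $logger->debug( "Trying to load $_[2] as $_[1]" );
    # Slurp the whole file through the requested I/O layer
    local $/;
    open my $f, "<:$_[1]", $_[2]
        or do { $logger->error( "Could not open $_[2]: $!" ); return };
    my $content = scalar <$f>;
    }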

sub _load_yaml {
    require YAML::Syck;
    my( $caller ) = ( caller(1) )[3]; 
    my $yaml = eval { YAML::Syck::Load( $_[1] ) } or 
        $logger->error( "$caller: $@" );
    $yaml;
    }
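The same fallback idea can be sketched without a YAML parser or a logger, using only the core Encode module. This is not the MyCPAN code: the name decode_utf8_or_latin1 is hypothetical, and instead of waiting for the YAML loader to complain it asks Encode to decode strictly and falls back when that fails.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode bytes as strict UTF-8, falling back to Latin-1.
# Returns the decoded character string and the name of the encoding that worked.
sub decode_utf8_or_latin1 {
    my( $bytes ) = @_;

    # With FB_CROAK, decode dies on malformed input. It may also modify
    # its argument when a CHECK value is given, so hand it a copy.
    my $copy = $bytes;
    my $text = eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK ) };
    return ( $text, 'UTF-8' ) if defined $text;

    # Every byte is a valid Latin-1 character, so this never fails
    return ( Encode::decode( 'ISO-8859-1', $bytes ), 'ISO-8859-1' );
    }

for my $bytes ( "Andr\xC3\xA9", "Andr\xE9" ) {
    my( $text, $encoding ) = decode_utf8_or_latin1( $bytes );
    printf "%d bytes decoded as %s into %d characters\n",
        length $bytes, $encoding, length $text;
    }
```

The catch is the same one that makes me unhappy with the real code: the Latin-1 step always succeeds, so anything that isn't UTF-8 gets treated as Latin-1 whether or not that's what the author actually used.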

I liked YAML::XS for a bit, but it has a problem with the utf8 pragma that messed up some other stuff I was handling. I don't quite understand it, but LibYAML seems to be fine if everything is always UTF-8, and not so fine otherwise.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).