March 2019 Archives

What to do with doubly-broken UTF-8?

I recently got a few test reports like this:

Although I've put all kinds of stuff in my test file:

the CPAN testers don't like that. How should I deal with these garbage characters?

The solution is this:

use warnings;
use strict;
no utf8;
use FindBin '$Bin';
my $got = 'ック';
my $expected = 'ソー';

dec ($got);
dec ($expected);


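# Decode twice: the first pass undoes the outer layer of UTF-8 encoding,
# the second pass undoes the inner layer.
sub dec
{
    my ($in) = @_;
    utf8::decode ($in);
    utf8::decode ($in);
    print "$in\n";
}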

This turns the doubly-encoded garbage back into readable characters:

[ben@mikan] {14:28 25} moji 513 $ perl ~/oneoff/ 
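
To see why two calls to utf8::decode are needed, it may help to reproduce the breakage on purpose. The following is only a sketch under my own assumptions: the helper names double_encode and double_decode are hypothetical and not part of any module, and the sample string is written with \x{...} escapes so that the example does not depend on how the source file itself is encoded.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string as UTF-8 twice.
# The second pass treats each byte of the first result as a Latin-1
# character and encodes it again, which is how doubly-broken text arises.
sub double_encode
{
    my ($chars) = @_;
    my $once = encode ('UTF-8', $chars);
    my $twice = encode ('UTF-8', $once);
    return $twice;
}

# Hypothetical helper: undo both layers, dying if either layer is not
# well-formed UTF-8.
sub double_decode
{
    my ($bytes) = @_;
    utf8::decode ($bytes) or die "first decode failed";
    utf8::decode ($bytes) or die "second decode failed";
    return $bytes;
}

my $text = "\x{30C3}\x{30AF}";    # the katakana characters ック
my $garbage = double_encode ($text);
my $fixed = double_decode ($garbage);

binmode STDOUT, ':encoding(UTF-8)';
print $fixed eq $text ? "round trip ok\n" : "round trip failed\n";
```

Checking the return value of utf8::decode, as double_decode does, is a cautious choice: if the input was only encoded once, the second call reports failure instead of silently leaving the string as it was.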

About Ben Bullock

Perl user since about 2006, I have also released some CPAN modules.