March 2019 Archives

What to do with doubly-broken UTF-8?

By Ben Bullock on March 27, 2019 2:29 PM

I recently got a few test reports like this:

www.cpantesters.org/cpan/report/49de90f8-4ec9-11e9-98fa-fc611f24ea8f

Although I've put all kinds of stuff in my test file:

https://metacpan.org/source/BKB/Lingua-JA-Moji-0.56/t/katakana2syllable.t#L9-13

the CPAN Testers reports still come out as garbage. How can these garbage characters be dealt with?

The solution is this:

#!/home/ben/software/install/bin/perl
use warnings;
use strict;
no utf8;
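use FindBin '$Bin';
# Print decoded characters as UTF-8 without "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';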
my $got = 'Ã£ÂÂÃ£ÂÂ¯';
my $expected = 'Ã£ÂÂ½Ã£ÂÂ¼';

dec ($got);
dec ($expected);

exit;

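# Undo two layers of UTF-8 encoding and print the result.
sub dec
{
    my ($in) = @_;
    # The first decode undoes the outer, mistaken layer of encoding.
    utf8::decode ($in);
    # The second decode turns the remaining UTF-8 bytes into characters.
    utf8::decode ($in);
    print "$in\n";
}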

This turns the doubly-encoded garbage back into readable characters:

[ben@mikan] {14:28 25} moji 513 $ perl ~/oneoff/superdecode.pl 
ック
ソー
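To see why decoding twice works, it may help to build the broken string from known text rather than pasting the garbage in. The following is only a sketch: undo_multiple_encoding is a hypothetical helper, not something from the script above, and the limit of two passes is an assumption that matches the reports in question. It relies on utf8::decode returning false when its argument is not valid UTF-8.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical helper: apply utf8::decode up to $max times, stopping
# as soon as a pass fails or no longer changes the string.
sub undo_multiple_encoding
{
    my ($bytes, $max) = @_;
    $max //= 2;
    my $text = $bytes;
    for (1 .. $max) {
        my $before = $text;
        # utf8::decode works in place and returns false if the string
        # is not valid UTF-8, for example if it holds wide characters.
        last unless utf8::decode ($text);
        last if $text eq $before;
    }
    return $text;
}

# Known-good text: KATAKANA LETTER SMALL TU, KATAKANA LETTER KU.
my $original = "\x{30C3}\x{30AF}";

# Reproduce the damage: encode once (correct), then once more (the mistake).
my $broken = $original;
utf8::encode ($broken);    # characters -> UTF-8 bytes
utf8::encode ($broken);    # those bytes treated as characters -> UTF-8 again

my $fixed = undo_multiple_encoding ($broken);

binmode STDOUT, ':encoding(UTF-8)';
print $fixed eq $original ? "round trip ok: $fixed\n" : "round trip failed\n";
```

Stopping when a pass changes nothing means plain ASCII and singly-encoded input pass through the helper without being damaged, which the unconditional double decode does not check for.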


About Ben Bullock

Perl user since about 2006, I have also released some CPAN modules.
