What to do with doubly-broken UTF-8?

By Ben Bullock on March 27, 2019 2:29 PM

I recently got a few test reports like this:

www.cpantesters.org/cpan/report/49de90f8-4ec9-11e9-98fa-fc611f24ea8f

Although I've put all kinds of stuff in my test file:

https://metacpan.org/source/BKB/Lingua-JA-Moji-0.56/t/katakana2syllable.t#L9-13

the CPAN testers still don't like that. How do I deal with these garbage characters?

The solution is this:

#!/home/ben/software/install/bin/perl
use warnings;
use strict;
no utf8;    # The string literals below are raw bytes, not characters.
my $got = 'Ã£ÂÂÃ£ÂÂ¯';
my $expected = 'Ã£ÂÂ½Ã£ÂÂ¼';

dec ($got);
dec ($expected);

exit;

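# Undo two layers of UTF-8 encoding and print the result.
sub dec
{
    my ($in) = @_;
    # Each call to utf8::decode strips one layer of UTF-8 encoding.
    utf8::decode ($in);
    utf8::decode ($in);
    print "$in\n";
}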

This turns the doubly-encoded garbage back into readable characters:

[ben@mikan] {14:28 25} moji 513 $ perl ~/oneoff/superdecode.pl 
ック
ソー
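
A round trip shows why two passes are needed. The following is only a minimal sketch, not the code from the module's tests: `double_encode` and `undo_double_encoding` are hypothetical helper names, and the sample string is written with escapes so that the source stays plain ASCII. It assumes the damage is exactly two layers of UTF-8 encoding, as in the report above.

```perl
use strict;
use warnings;

# Hypothetical helper: simulate the damage by UTF-8 encoding a string twice.
sub double_encode
{
    my ($text) = @_;
    utf8::encode ($text);    # characters -> UTF-8 bytes
    utf8::encode ($text);    # bytes misread as Latin-1 characters -> UTF-8 again
    return $text;
}

# Hypothetical helper: reverse the damage with two decodes.
# Returns undef if either pass finds malformed UTF-8.
sub undo_double_encoding
{
    my ($bytes) = @_;
    utf8::decode ($bytes) or return;
    utf8::decode ($bytes) or return;
    return $bytes;
}

my $original = "\x{30C3}\x{30AF}";    # the katakana "ック"
my $damaged = double_encode ($original);
my $repaired = undo_double_encoding ($damaged);

# Two characters become six bytes, and each of those becomes two more.
print length ($damaged), " bytes after double encoding\n";
print $repaired eq $original ? "round trip ok\n" : "round trip failed\n";
```

Checking the return value of `utf8::decode` is a design choice for the sketch: if a string was only encoded once, the second pass may fail, and it seems better to notice that than to print half-repaired text.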

2 comments

Grinnz | March 28, 2019 2:21 AM

Converting to Test2::Suite may be a good idea, as it has built-in support for encoding output to UTF-8.

Ben Bullock replied to comment from Grinnz | March 28, 2019 11:24 AM

Thank you for the tip. I'll carefully examine this choice.

About Ben Bullock

Perl user since about 2006, I have also released some CPAN modules.
