What to do with doubly-broken UTF-8?
I recently got a few test reports like this:
www.cpantesters.org/cpan/report/49de90f8-4ec9-11e9-98fa-fc611f24ea8f
Although I've put all kinds of stuff in my test file:
https://metacpan.org/source/BKB/Lingua-JA-Moji-0.56/t/katakana2syllable.t#L9-13
CPAN Testers still doesn't like it. How do I deal with these garbage characters?
The solution is to run utf8::decode on the string twice:
#!/home/ben/software/install/bin/perl
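use warnings;
use strict;
# The string literals below are raw bytes, not characters.
no utf8;
# Encode the decoded characters as UTF-8 when printing.
binmode STDOUT, ':encoding(UTF-8)';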
my $got = 'ãÂÂã¯';
my $expected = 'ã½ã¼';
dec ($got);
dec ($expected);
exit;
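# Undo two layers of UTF-8 encoding and print the result.
sub dec
{
    my ($in) = @_;
    # First pass: strip the outer layer, leaving the original UTF-8 byte values.
    utf8::decode ($in);
    # Second pass: turn those byte values into Perl characters.
    utf8::decode ($in);
    print "$in\n";
}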
This turns the doubly-encoded garbage back into readable characters:
[ben@mikan] {14:28 25} moji 513 $ perl ~/oneoff/superdecode.pl
ック
ソー
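To see why two decodes are needed, here is a minimal round-trip sketch. It assumes the damage came from the text being encoded to UTF-8 twice, with the bytes from the first pass treated as single characters in the second. The helper names double_encode and double_decode are made up for illustration; only utf8::encode and utf8::decode are core Perl functions.

```perl
use strict;
use warnings;

# Simulate the damage (assumption: the text was encoded to UTF-8 twice).
sub double_encode {
    my ($text) = @_;
    utf8::encode($text);    # characters -> UTF-8 bytes
    utf8::encode($text);    # each byte treated as a character and encoded again
    return $text;
}

# Undo the damage by decoding twice, as in the script above.
sub double_decode {
    my ($bytes) = @_;
    utf8::decode($bytes);   # strips the outer layer
    utf8::decode($bytes);   # strips the inner layer, giving characters
    return $bytes;
}

# Katakana "ック" written as code points so the file encoding does not matter.
my $original = "\x{30C3}\x{30AF}";
my $broken   = double_encode($original);
my $repaired = double_decode($broken);

binmode STDOUT, ':encoding(UTF-8)';
print $repaired eq $original ? "round trip ok\n" : "round trip failed\n";
```

If the text has only been encoded once too often, a single utf8::decode is enough, so it is worth checking how many layers there really are before applying this.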
Converting to Test2::Suite may be a good idea, as it has built-in support for encoding output to UTF-8.
Thank you for the tip. I'll carefully examine this choice.