What to do with doubly-broken UTF-8?
I recently got a few test reports like this:
www.cpantesters.org/cpan/report/49de90f8-4ec9-11e9-98fa-fc611f24ea8f
Although I've put all kinds of stuff in my test file:
https://metacpan.org/source/BKB/Lingua-JA-Moji-0.56/t/katakana2syllable.t#L9-13
the CPAN Testers reports still show garbled characters. How should I deal with this garbage?
The solution is this:
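```perl
#!/home/ben/software/install/bin/perl
use warnings;
use strict;
no utf8;

# Print decoded characters as UTF-8 (avoids "Wide character in print").
binmode STDOUT, ':encoding(UTF-8)';

# The garbled strings, written as byte escapes: UTF-8 encoded twice.
my $got      = "\xC3\xA3\xC2\x83\xC2\x83\xC3\xA3\xC2\x82\xC2\xAF";   # ック
my $expected = "\xC3\xA3\xC2\x82\xC2\xBD\xC3\xA3\xC2\x83\xC2\xBC";   # ソー
dec ($got);
dec ($expected);
exit;

sub dec
{
    my ($in) = @_;
    # The first decode removes the outer layer of encoding,
    # the second recovers the original characters.
    utf8::decode ($in);
    utf8::decode ($in);
    print "$in\n";
}
```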
Running it turns the doubly-encoded garbage back into readable characters:
```
$ perl ~/oneoff/superdecode.pl
ック
ソー
```
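To see where this kind of garbage comes from, here is a small sketch (not from the original post) that encodes a string to UTF-8 twice with Perl's core `utf8::encode`, dumps the bytes, and then undoes it with two `utf8::decode` calls, the same trick as the script above. The `hex_dump` helper is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;
use utf8;    # the string literal below is a character string

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: show each code point of a string as two-digit hex.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, shift }

my $text = 'ック';

my $once = $text;
utf8::encode($once);     # correct UTF-8 octets

my $twice = $once;
utf8::encode($twice);    # each octet treated as a Latin-1 character and encoded again

print 'once:  ', hex_dump($once),  "\n";
print 'twice: ', hex_dump($twice), "\n";

# Two in-place decodes undo the two encodes.
my $back = $twice;
utf8::decode($back) for 1 .. 2;
print "round trip: $back\n";
```

The `once` line shows `E3 83 83 E3 82 AF`, and the `twice` line shows `C3 A3 C2 83 C2 83 C3 A3 C2 82 C2 AF`: every original byte has become a two-byte sequence, which shows up on screen as "ã" mixed with invisible control characters.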
Converting the tests to Test2::Suite may be a good idea, as it has built-in support for encoding output as UTF-8.
Thank you for the tip. I'll carefully examine this choice.
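For the Test2::Suite suggestion, a minimal test file might look like the sketch below. It assumes Test2::Suite is installed and that, as its documentation describes, `use Test2::V0` loads Test2::Plugin::UTF8, which turns on `use utf8` for the file and sets the test output handles to UTF-8. The strings are placeholders, not the data from katakana2syllable.t.

```perl
use Test2::V0;    # also enables strict, warnings and (per its docs) Test2::Plugin::UTF8

# Placeholder data: with the UTF8 plugin active these literals are
# character strings, so 'ソー' has length 2 rather than 6 bytes.
my $got      = 'ソー';
my $expected = 'ソー';

is($got, $expected, "katakana comparison: $expected");
is(length $got, 2, 'literal is a character string');
note "got: $got";    # printed through the UTF-8 output handle

done_testing;
```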