January 2016 Archives

UTF-16 and Windows CRLF, oh my

I recently had to do some quick search/replace on a bundle of Windows XML files. They are all encoded as UTF-16LE, with the Windows \r\n (CRLF) line endings encoded as 0D 00 0A 00.

Perl can handle UTF-16LE just fine, and on Windows it handles CRLF line endings out of the box. The problem is that the default CRLF translation happens too close to the filehandle, on the wrong side of the Unicode translation. The fix is to use the PerlIO layers :raw:encoding(UTF-16LE):crlf. The ":raw" prevents the default CRLF translation from happening at the byte level, the UTF secti…
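As a minimal sketch of how that layer stack might be used for a search/replace, the script below opens both the input and the output with :raw:encoding(UTF-16LE):crlf. The helper name replace_in_file, the sample XML content, and the 'old'/'new' strings are all made up for illustration; they are not from the original files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Layer stack from the post: start from raw bytes, decode UTF-16LE,
# then translate CRLF on the decoded characters.
my $layers = ':raw:encoding(UTF-16LE):crlf';

# Hypothetical helper: copy $in to $out, replacing the literal string
# $from with $to on every line.
sub replace_in_file {
    my ($in, $out, $from, $to) = @_;
    open my $src, "<$layers", $in  or die "open $in: $!";
    open my $dst, ">$layers", $out or die "open $out: $!";
    while (my $line = <$src>) {
        # Inside the loop each line ends in a plain "\n".
        $line =~ s/\Q$from\E/$to/g;
        print {$dst} $line;
    }
    close $src;
    close $dst or die "close $out: $!";
}

# Build a small sample file as raw UTF-16LE bytes with a CRLF ending.
my ($tmp_fh, $in_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} pack('v*', map { ord } split //, "<a>old</a>\r\n");
close $tmp_fh;

my (undef, $out_path) = tempfile(UNLINK => 1);
replace_in_file($in_path, $out_path, 'old', 'new');

# Read the result back as raw bytes so the line ending can be inspected.
open my $check, '<:raw', $out_path or die "open $out_path: $!";
my $bytes = do { local $/; <$check> };
close $check;

# Show the last four bytes of the output as hex.
print join(' ', map { sprintf '%02X', ord } split //, substr($bytes, -4)), "\n";
```

Using the same layers on the output handle means the "\n" seen inside the loop is written back as a UTF-16LE CRLF, so the files should keep their original line endings.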

About Yary

Programming for decades, and learning something new every day.