Pod::Perldoc at 3.19_01

One of the things that brian d foy worked pretty hard on for perldoc in Perl 5.16 was better UTF-8 support. We found that a huge number of variables affect whether you get good Unicode support out of the "man" formatting pipeline. perldoc internally uses the "podlators" distribution to turn POD markup into man pages, HTML, XML, etc. But with the "man" formatting, the pipeline of operations looks something like this:

perldoc (a tiny little wrapper around the Pod::Perldoc module) finds the appropriate POD markup (either embedded in a .pm file or in a standalone .pod file) and passes it to Pod::Man. It then feeds the output from Pod::Man to the "nroff" implementation (which is usually groff) and sends the result to your pager (less, more, etc.), which displays it on your screen.

That's a lot of places where UTF-8 can go sideways. And it usually does.

So it seems like one solution is to simply avoid this pipeline by using Pod::Text::Term, so I've uploaded a developer distribution of perldoc (3.19_01) to CPAN where the default formatter for POD is "Term" instead of "Man". In my admittedly limited testing, this seems to provide much better cross-platform UTF-8 support when displaying documentation.

This is where you can help: please install the 3.19_01 distribution in a perlbrew environment and display some of your UTF-8 documentation. If there are problems, you can report them on the RT queue for the distribution.



This is exciting news! I use non-ASCII UTF-8 characters extensively in some of my docs, especially for modules dealing with Unicode, including grapheme clusters containing multiple code points. When I use Pod::Perldoc 3.19_01 on my laptop running Perl 5.14.2 and Ubuntu 12.04, everything just works, as opposed to Pod::Perldoc 3.19, where everything non-ASCII is broken and displayed as an X per code point. On an old server running Perl 5.8.8 and Debian 4.0, there are still problems, but they're different from before. Each code point is displayed as its UTF-8 bytes in hex instead of being replaced by an X. For example, code point U+0308 is displayed as the two UTF-8 bytes <CC><88>. I have a feeling that this is a problem completely external to Pod::Perldoc, but I'll have to investigate the issue later tonight and open a bug if needed. I'll also take a look at the open issues and see if I can help in any way. Anyway, thanks for working on this important improvement!
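
For anyone wondering where <CC><88> comes from: it is the UTF-8 encoding of U+0308, shown byte by byte. Here is a minimal Python sketch that reproduces the notation. The helper name utf8_hex_bytes is made up for illustration, and the sketch does not claim to show which stage of the pipeline emits the raw bytes on that server.

```python
import unicodedata


def utf8_hex_bytes(text):
    """Hypothetical helper: render each UTF-8 byte of text as <XX>."""
    return "".join(f"<{byte:02X}>" for byte in text.encode("utf-8"))


combining_diaeresis = "\u0308"

# U+0308 is a combining mark, so it normally attaches to the preceding letter.
print(unicodedata.name(combining_diaeresis))  # COMBINING DIAERESIS

# One code point, two bytes in UTF-8.
print(utf8_hex_bytes(combining_diaeresis))    # <CC><88>
```

Seeing the two bytes separately like this suggests that something downstream is treating the stream as raw bytes rather than as UTF-8, which would fit the guess that the problem lies outside Pod::Perldoc.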

My name is "Olivier Mengué", not "Olivier Mengue". Will I at least see my name correctly spelled in the doc of my modules on Win32?
The Win32 console is Unicode-aware, but so far perldoc seems to only feed it with ASCII.
So I have to test this release...

Awesome, 'perldoc Unicode::Collate::Locale' looks so much nicer now!

For any who care to test, here's a simple example pod file that should give you something to try, based on this discussion:

=encoding utf-8


These are less-than-or-equals characters: '≤', 'E<le>', 'E<0x2264>'.
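
If the formatter handles all three spellings, each one should end up as the same character, U+2264. As a rough cross-check outside of Perl, here is a small Python sketch. It assumes the named escape "le" matches the HTML entity of the same name, and it uses Python's html module as a stand-in, not the POD parser itself. The function names are hypothetical.

```python
import html


def three_spellings():
    """Return the literal, named-entity, and numeric forms of the character."""
    literal = "≤"                    # typed directly, as in the POD source
    named = html.unescape("&le;")    # stand-in for the named escape E<le>
    numeric = chr(0x2264)            # stand-in for the numeric escape E<0x2264>
    return literal, named, numeric


def utf8_hex(text):
    """Space-separated hex of the UTF-8 bytes, e.g. for comparing with pager output."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


literal, named, numeric = three_spellings()
print(literal == named == numeric)  # True
print(utf8_hex(numeric))            # E2 89 A4
```

So a correct rendering of the test file shows the same glyph three times. If one of the three differs, that narrows down which kind of escape is being mishandled.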



About Mark Allen

Singer, dad, nerd, not necessarily in that order. @bytemeorg