The sad state of syntax highlighting libraries on CPAN

I just posted reviews of several code syntax highlighting libraries on CPAN. In short, most of them are crap and there is nothing remotely similar to Python's Pygments or Ruby's CodeRay (oh how the mighty CPAN has fallen). I found that Syntax::SourceHighlight, a Perl interface to GNU Source-highlight, is the only usable one. The downside is that installing it pulls in over 100MB of Debian packages, a huge dependency given that my original requirement is merely to colorize some JSON and YAML output on the terminal. It also doesn't support YAML out of the box yet, so I don't plan on using it anytime soon.

Currently I'm investigating (via the lazy web) using Emacs' or Vim's syntax-highlighting capability. It's a good bet that one of those two editors is available on a standard Linux box. A pure-Perl library would be ideal though.
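
To give an idea of how little the pure-Perl route needs for the JSON case, here is a minimal sketch that colorizes JSON with a single regex and Term::ANSIColor (a core module). It is not taken from any of the reviewed libraries: the colorize_json function and the color scheme are made up for illustration, and the regex assumes the input is already valid JSON.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(colored);

# Hypothetical color scheme; the values are Term::ANSIColor attribute names.
my %color = (
    key     => 'bold blue',
    string  => 'green',
    number  => 'cyan',
    literal => 'magenta',
);

# Wrap each JSON token in ANSI escapes; punctuation and whitespace pass through.
sub colorize_json {
    my ($json) = @_;
    $json =~ s{
        ("(?:[^"\\]|\\.)*")(\s*:)?            # a string; a trailing colon marks an object key
      | (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)    # a number
      | \b(true|false|null)\b                 # a literal
    }{
        defined $1 ? colored($1, defined $2 ? $color{key} : $color{string}) . ($2 // '')
      : defined $3 ? colored($3, $color{number})
      :              colored($4, $color{literal})
    }gex;
    return $json;
}

print colorize_json('{"name": "CPAN", "count": 3, "ok": true}'), "\n";
```

Because the string alternative is tried first, digits and words inside strings are never colored as numbers or literals. YAML would need a real tokenizer; a regex like this does not stretch that far.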


10 Comments

I rather like Syntax-Highlight-Engine-Kate. Some languages are better highlighted than others, but it does the job for me.

Hi Steven,

Talking about Emacs, here is a "weirdo" distribution of mine on GitHub:

https://github.com/benkasminbullock/Emacs-HTMLize

This is what I use to make the syntax highlighting on my website, like the following:

http://www.lemoda.net/perl/hash-ref-or-copy/index.html
http://www.lemoda.net/images/sizes/index.html


It is really slow and not conceptually great, which is why it is not released to CPAN. There are all sorts of reasons for the code being like that, most of which I don't remember. One problem was making it run without a terminal process, since Emacs doesn't usually bother doing the syntax highlighting unless there is a controlling terminal. I got the idea to use "expect" from Stack Overflow. On the other hand, it is very good at adapting to any language, e.g. I can write a program in Octave or JavaScript and it "just works":

http://www.lemoda.net/games/othello/index.html
http://www.lemoda.net/octave/normal-probability/index.html

I also have a module which is specifically for C programs:

https://metacpan.org/release/C-Tokenize

There is a script included in the distribution:

https://metacpan.org/module/c2html

This is quite fast, faster than running an Emacs process and then killing it again.

And it's not exactly what I need, because I want ANSI escape output and not HTML.
Oh come on, that is about one line of Perl:
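my %ht = (
    # Example only: map each HTML tag to the ANSI escape that should replace it.
    'htmltag' => "\e[1m",
);
my $regex = join '|', map { quotemeta } keys %ht;
# $text is assumed to already hold the HTML output.
$text =~ s/($regex)/$ht{$1}/g;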
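
Spelled out a little further, the same idea might look like the sketch below. It assumes the highlighter emits non-nested span elements with a class attribute; the class names, the colors, and the html_to_ansi function are hypothetical, not something c2html or any other tool mentioned here is known to produce.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(color);

# Hypothetical mapping from CSS class names to Term::ANSIColor attributes.
my %class_color = (
    keyword => 'bold yellow',
    string  => 'green',
    comment => 'cyan',
);

sub html_to_ansi {
    my ($html) = @_;
    # Opening span: emit the color for a known class, nothing for an unknown one.
    $html =~ s{<span\s+class="([^"]+)">}{ exists $class_color{$1} ? color($class_color{$1}) : '' }ge;
    # Closing span: reset all attributes (this assumes spans are not nested).
    $html =~ s{</span>}{ color('reset') }ge;
    # Drop any remaining tags, then decode the few entities handled here.
    $html =~ s{<[^>]+>}{}g;
    my %entity = (lt => '<', gt => '>', quot => '"', amp => '&');
    $html =~ s/&(lt|gt|quot|amp);/$entity{$1}/g;
    return $html;
}

print html_to_ansi('<span class="keyword">my</span> $x = <span class="string">&quot;a&lt;b&quot;</span>;'), "\n";
```

Entities are decoded in a single pass so that something like &amp;lt; comes out as the literal text &lt; rather than being decoded twice.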

I keep meaning to write an article/example/description/something on how you can use Parser::MGC to parse up the input text and yield a syntax tree annotated to give the positions it found the various constructs in the input. This would make it easy to drive a syntax highlight engine from it.

PPI::HTML is very, very good, but only highlights Perl code of course.


About Steven Haryanto

A programmer (mostly Perl 5 nowadays). My CPAN ID: SHARYANTO. I'm sedusedan on PerlMonks. My Twitter is stevenharyanto (but I don't tweet much). Follow me on GitHub: sharyanto.