The sad state of syntax highlighting libraries on CPAN

By Steven Haryanto on November 16, 2012 7:58 AM

I just posted reviews of several code syntax highlighting libraries on CPAN. In short, most of them are crap and there is nothing remotely similar to Python's Pygments or Ruby's coderay (oh how the mighty CPAN has fallen). I found that Syntax::SourceHighlight, a Perl interface to GNU Source-highlight, is the only usable one. The downside is that I need to pull in over 100MB worth of Debian packages to install it, a huge dependency given that my original requirement is merely to colorize the terminal output of some JSON and YAML. It's also too bad that it doesn't support YAML out of the box yet. So I don't plan on using it anytime soon.

Currently I'm investigating (via the lazy web) using the syntax-highlighting capability of emacs or vim. It's a good bet that one of those two editors is available on a standard Linux box. A pure-Perl library would be ideal though.
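
For the narrow requirement above (colorizing JSON on a terminal), a pure-Perl approach can be sketched with core modules only. This is a minimal sketch under stated assumptions, not the code of any released module: the color_json name and the color choices are made up for illustration, and the data is re-serialized with JSON::PP before coloring, so the original formatting is not preserved.

```perl
use strict;
use warnings;
use JSON::PP ();                  # core module
use Term::ANSIColor qw(colored);  # core module

# Hypothetical helper: serialize a Perl data structure as pretty JSON,
# then wrap each token class in ANSI color codes.
sub color_json {
    my ($data) = @_;
    my $json = JSON::PP->new->pretty->canonical->encode($data);

    $json =~ s{
        ("(?:[^"\\]|\\.)*")                   # group 1: a string (key or value)
        (\s*:)?                               # group 2: a trailing colon marks an object key
      | \b(true|false|null)\b                 # group 3: a literal
      | (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)    # group 4: a number
    }{
        defined($1) ? colored($1, defined($2) ? 'blue' : 'green') . ($2 // '')
      : defined($3) ? colored($3, 'magenta')
      :               colored($4, 'cyan')
    }gex;

    return $json;
}

# Example: keys in blue, strings in green, numbers in cyan, literals in magenta.
print color_json({ name => "CPAN", count => 42, ok => JSON::PP::true() });
```

Strings are matched first at every position, so digits or the words true, false and null inside a string are never colored separately. A real highlighter would work on the input text as given instead of on a re-encoded copy.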

11 comments

Steven Haryanto | November 16, 2012 10:46 AM

BTW, I've just hacked and released JSON::Color and YAML::Tiny::Color to fulfill my specific needs.

Darn, MT won't show images in comments: json-color.png.

Toby Inkster | November 16, 2012 4:57 PM

I rather like Syntax-Highlight-Engine-Kate. Some languages are better highlighted than others, but it does the job for me.

Steven Haryanto replied to comment from Toby Inkster | November 16, 2012 5:09 PM

Yeah, but did you also notice how slow it is?

Ben Bullock | November 16, 2012 10:29 PM

Hi Steven,

Talking about Emacs, here is a "weirdo" distribution of mine on github:

https://github.com/benkasminbullock/Emacs-HTMLize

This is what I use to make the syntax highlighting on my website, like the following:

http://www.lemoda.net/perl/hash-ref-or-copy/index.html
http://www.lemoda.net/images/sizes/index.html

It is really slow and not conceptually great, which is why it is not released to CPAN. There are all sorts of reasons for that code being like that, most of which I don't remember. One problem was making it run without a terminal process, since Emacs doesn't usually bother doing the syntax highlighting unless there is a controlling terminal. I got the idea to use "expect" from Stack Overflow. But it is very good at adapting to any language, e.g. I can write a program in Octave or JavaScript and it "just works":

http://www.lemoda.net/games/othello/index.html
http://www.lemoda.net/octave/normal-probability/index.html

I also have a module which is specifically for C programs:

https://metacpan.org/release/C-Tokenize

There is a script included in the distribution:

https://metacpan.org/module/c2html

This is quite fast, faster than running an Emacs process and then killing it again.

Steven Haryanto replied to comment from Ben Bullock | November 17, 2012 12:06 AM

Hi Ben,

Thanks. Someone at SO also pointed out a similar command, htmlfontify-buffer. I played around with it a bit. It's a bit painful, like you said. And it's not exactly what I need because I want ANSI escapes output and not HTML.

My current favorite is Syntax::SourceHighlight, since it's fast and covers a lot of languages (YAML is not yet in the list though) and output formats (ANSI escape and HTML are supported, among others).

My current needs are so far met by the two modules I wrote today. They're not exactly syntax highlighters, more like dumpers (with fixed formatting), but they're just what I need.

Ben Bullock | November 17, 2012 1:22 PM

"And it's not exactly what I need because I want ANSI escapes output and not HTML."

Oh come on, that is about one line of Perl:

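my %ht = (
    # HTML tag => ANSI escape sequence (a placeholder pair; fill in real ones)
    'htmltag' => "\e[1m",
);
# Escape regex metacharacters in the tags before joining them into an alternation.
my $regex = join '|', map { quotemeta } keys %ht;
# $text is assumed to already hold the HTML to convert.
$text =~ s/($regex)/$ht{$1}/g;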
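
A slightly fuller version of the same idea might look like the sketch below. It assumes the HTML marks tokens with span elements carrying a class attribute; the class names, the color choices and the html_to_ansi name are illustrative assumptions, not the output format of any particular tool.

```perl
use strict;
use warnings;

# Assumed mapping from CSS class names to ANSI SGR sequences.
my %ansi_for = (
    keyword => "\e[1;34m",   # bold blue
    string  => "\e[32m",     # green
    comment => "\e[90m",     # bright black (grey)
);
my $reset = "\e[0m";

# Hypothetical helper: turn span-based HTML highlighting into ANSI escapes.
sub html_to_ansi {
    my ($html) = @_;
    # Opening spans become the mapped escape (or nothing for unknown classes).
    $html =~ s{<span class="([^"]+)">}{ $ansi_for{$1} // '' }ge;
    # Closing spans reset all attributes.
    $html =~ s{</span>}{$reset}g;
    # Drop any remaining tags, then decode the basic entities in a single pass.
    $html =~ s{<[^>]+>}{}g;
    my %entity = (lt => '<', gt => '>', quot => '"', amp => '&');
    $html =~ s{&(lt|gt|quot|amp);}{$entity{$1}}g;
    return $html;
}

print html_to_ansi('<span class="keyword">my</span> $x = 1;'), "\n";
```

Nested spans are not handled: the reset emitted for an inner closing tag also cancels the outer color, so this only suits flat markup.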

Paul "LeoNerd" Evans | November 22, 2012 9:59 PM

I keep meaning to write an article/example/description/something on how you can use Parser::MGC to parse up the input text and yield a syntax tree annotated to give the positions it found the various constructs in the input. This would make it easy to drive a syntax highlight engine from it.

Steven Haryanto replied to comment from Ben Bullock | November 27, 2012 7:46 AM

You're right! Maybe using vim is viable after all (emacs is currently out of the question though, since its startup overhead is too much).

Steven Haryanto replied to comment from Paul "LeoNerd" Evans | December 6, 2012 11:19 AM

Hoping Parser::MGC (or Marpa, or whatever) will help form a basis for the next Perl syntax highlighting library project :)

Toby Inkster | March 13, 2013 5:47 AM

PPI::HTML is very, very good, but only highlights Perl code of course.

Roland Minner | November 29, 2014 10:16 PM

Not a Perl solution, but I typically use vim to get syntax highlighting for code which I post on websites:

vim -f +"syntax on" +"TOhtml" +"wq" +"q" myscript.pl

This will create an HTML file with syntax highlighting and quit right afterwards.
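
To drive the same command from Perl, a thin wrapper could look like the sketch below. The sub names are hypothetical, and the returned file name assumes that TOhtml writes its output next to the source file with .html appended, which is worth checking against the local vim configuration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for the vim invocation above.
sub tohtml_command {
    my ($file) = @_;
    return ('vim', '-f', '+syntax on', '+TOhtml', '+wq', '+q', $file);
}

# Run vim and return the assumed name of the generated file, or die on failure.
sub highlight_with_vim {
    my ($file) = @_;
    # List-form system() bypasses the shell, so the "+..." arguments need no quoting.
    system(tohtml_command($file)) == 0
        or die "vim failed on $file (status $?)\n";
    return "$file.html";   # assumption: TOhtml appends .html to the source name
}
```

Calling highlight_with_vim('myscript.pl') would then run the command as given and hand back 'myscript.pl.html'.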

About Steven Haryanto

A programmer (mostly Perl 5 nowadays). My CPAN ID: SHARYANTO. I'm sedusedan on PerlMonks. My Twitter is stevenharyanto (but I don't tweet much). Follow me on GitHub: sharyanto.
