August 2017 Archives

Picking a better Markdown library for bad input

I was processing some malformed Markdown input with Text::Markdown when I noticed it was generating broken HTML.

I started with (bad) Markdown input " 1. z\n >" and got back HTML <p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>.

(See the incorrectly nested HTML tags? The output opens <p><ol><li> and then closes </p> while <ol> and <li> are still open.)

So I tried feeding this bad Markdown to four different Perl Markdown libraries: Text::Markdown, Text::MultiMarkdown, Text::Markdown::Discount, and Markdent, to see which one would give me valid HTML.

The results?

  • Text::Markdown gave invalid HTML: <p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>\n

  • Text::MultiMarkdown gave the same invalid HTML: <p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>\n

  • Text::Markdown::Discount gave valid HTML! <ol>\n<li> z\n\n<blockquote></blockquote></li>\n</ol>\n\n

  • Markdent gave valid HTML, but as a full document rather than a simple HTML fragment: <!DOCTYPE html>\n<html><head><title></title></head><body><ol><li>z\n &gt;\n</li></ol></body></html>
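One rough way to double-check results like these is to test whether every closing tag matches the most recently opened one. Here is a minimal sketch in Python (so it is independent of the Perl libraries being compared). The names NestingChecker and is_well_nested are made up for illustration, and the check only covers tag balance, not full HTML validity.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (a partial list, enough for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: tracks open tags on a stack and records mismatches."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # A closing tag is only accepted if it matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_well_nested(html):
    """Return True if all tags close in the reverse order they were opened."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return not checker.errors and not checker.stack


# The outputs reported above, with \n written as real newlines.
outputs = {
    "Text::Markdown": "<p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>\n",
    "Text::MultiMarkdown": "<p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>\n",
    "Text::Markdown::Discount": "<ol>\n<li> z\n\n<blockquote></blockquote></li>\n</ol>\n\n",
    "Markdent": "<!DOCTYPE html>\n<html><head><title></title></head><body><ol><li>z\n &gt;\n</li></ol></body></html>",
}

results = {name: is_well_nested(html) for name, html in outputs.items()}
for name, ok in results.items():
    print(f"{name}: {'well nested' if ok else 'badly nested'}")
```

This flags the first two outputs as badly nested and the last two as well nested. A real validator would catch more, such as a <p> that wraps an <ol>, which is balanced but still not allowed in HTML.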

The solution? Switch from Text::Markdown to Text::Markdown::Discount.

About Anirvan

I blog about Perl.