Picking a better Markdown library for bad input
I was handling some bad Markdown input using Text::Markdown, when I saw it generate broken HTML.
I started with the (bad) Markdown input " 1. z\n >" and got back the HTML <p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>.
(See the incorrectly nested HTML tags, <p><ol><li></p>?)
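If you would rather not spot this kind of mis-nesting by eye, here is a minimal sketch of a mechanical check. It is written in Python purely for illustration and was not part of the original comparison. It uses the standard library's html.parser with a stack and reports every end tag that does not close the most recently opened element. The names NestingChecker and nesting_errors are made up for this example. Note that this is a strict nesting check, not a real HTML validator: it knows nothing about optional end tags or content models.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: tracks open elements and records mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.errors = []  # (end_tag, innermost_open_tag) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            innermost = self.stack[-1] if self.stack else None
            self.errors.append((tag, innermost))


def nesting_errors(html):
    """Return a list of nesting problems; an empty list means strictly nested."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    errors = list(checker.errors)
    # Anything still on the stack was opened but never properly closed.
    errors.extend((None, tag) for tag in checker.stack)
    return errors


# The two outputs quoted in this post, as Python string literals.
text_markdown_html = "<p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>\n"
discount_html = "<ol>\n<li> z\n\n<blockquote></blockquote></li>\n</ol>\n\n"

if __name__ == "__main__":
    print("Text::Markdown:", nesting_errors(text_markdown_html))
    print("Text::Markdown::Discount:", nesting_errors(discount_html))
```

With this check, the Text::Markdown output is flagged at its first </p>, which arrives while <li> is still the innermost open element, and the Text::Markdown::Discount output comes back with no errors.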
So I tried feeding this bad Markdown to four different Perl Markdown libraries: Text::Markdown, Text::MultiMarkdown, Text::Markdown::Discount, and Markdent, to see which one would give me valid HTML.
The results?
Text::Markdown — invalid HTML
<p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>\n
Text::MultiMarkdown — invalid HTML
<p><ol>\n<li>z</p>\n\n<blockquote>\n <p></li>\n </ol></p>\n</blockquote>\n
Text::Markdown::Discount — valid HTML!
<ol>\n<li> z\n\n<blockquote></blockquote></li>\n</ol>\n\n
Markdent — valid HTML, but doesn't generate a simple HTML fragment
<!DOCTYPE html>\n<html><head><title></title></head><body><ol><li>z\n >\n</li></ol></body></html>
The solution? Switch from Text::Markdown to Text::Markdown::Discount.
I'm not so sure that Markdown is invalid. Are you? Markdown is notoriously non-specific about all sorts of things.
I think Markdent does the right thing here. As for it producing a full document, it has two classes for producing HTML: Markdent::Handler::HTMLStream::Document and Markdent::Handler::HTMLStream::Fragment. I'm guessing you used the former when you wanted the latter.
(I am not including less-than and greater-than symbols in this comment.)
p and ol are both block-level elements, and a p element can only contain inline content, not another block, so p ol /p is invalid HTML. If you are interested, please see the following discussion:
https://metacpan.org/pod/HTML::Valid::Tagset#Issues-with-HTML::Tagset
I switched to using Pandoc for all my Markdown processing. Haven't seen a problem yet.