It's time to admit I've failed
Two days ago I was so excited! I had an idea for how to make the Perl world a bit better, faster and simpler. Of course, I didn’t spread such exciting news until I had checked, double-checked and benchmarked, until I was absolutely sure I’d found The Holy Grail.
Well, see the title. It hurts. All my benchmarks contained a terrible mistake. And the +20%, or maybe even +100%, speed boost the PugiXML interface could provide isn’t worth all the buzz I created.
I apologize.
Still, the Perl interface to PugiXML I described in my previous post could be (optimistically) twice as fast as LibXML in some cases. But I’m so disappointed by my failure that I just don’t think it’s worth it.
Another lesson learned.
If HTML parsing feels too slow, please use something LibXML-based, like HTML::TreeBuilder::LibXML or just XML::LibXML. Just make sure you use the load_html() family of methods instead of load_xml(), and enable recover() mode, as HTML::TreeBuilder::LibXML does.
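For anyone who wants to see what that looks like, here is a minimal sketch using plain XML::LibXML. The sample markup and variable names are made up for illustration, and I'm assuming the recover and suppress_errors parser options are passed directly to load_html():

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input: deliberately sloppy HTML with unclosed <p> tags.
my $html = '<html><body><p>First paragraph<p>Second paragraph</body></html>';

# load_html() uses libxml2's HTML parser; load_xml() would expect
# well-formed XML and is the wrong tool for real-world HTML.
my $doc = XML::LibXML->load_html(
    string          => $html,
    recover         => 1,   # keep parsing past malformed markup
    suppress_errors => 1,   # do not print recoverable errors to STDERR
);

# Collect the text of every <p> element the parser recovered.
my @paragraphs = map { $_->textContent } $doc->findnodes('//p');
print scalar(@paragraphs), " paragraph(s) found\n";
```

The same options can also be set on a parser object created with XML::LibXML->new if you parse many documents in a row.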
For those who are still interested, the code of the prototype is published at https://github.com/yko/pugixml-perl
At some point I may decide to continue development. Unfortunately, it will not be as lightning fast as I initially expected.
It takes courage to admit when you're wrong. Kudos to you for stepping up and explaining the mistake. It's a great lesson for others (myself included), and even more valuable than a wicked-fast XML module :)
If you don't mind, what was the error? Off-by-one somewhere?
Generally speaking, the error was using the load_xml() method of LibXML instead of load_html(), which is the right method for this case (I mentioned that in my post).
So the error was in my head rather than in the code.
"If you realize that you aren't as wise today as you thought you were yesterday, you're wiser today." - Olin Miller