$tree->delete when you’re done.
Perl added weak references (a.k.a. “weakrefs”) to resolve this problem, but HTML-Tree has never taken advantage of them. Until now.
HTML-Tree 5.00 (just released to CPAN) uses weak references by default. This means that when a tree goes out of scope, it gets deleted whether you called $tree->delete or not. This should eliminate memory leaks caused by HTML-Tree.
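To illustrate the mechanism, here is a minimal sketch that uses only the core Scalar::Util module. The two hash-based nodes are hypothetical stand-ins for a tree, not HTML-Tree's actual internals; the point is only that a weakened back-link stops a child from keeping its parent alive.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical two-node tree: the parent holds its child strongly...
my $parent = { name => 'root', children => [] };
my $child  = { name => 'leaf', parent => $parent };
push @{ $parent->{children} }, $child;

# ...and the child's back-link is weakened, so it no longer counts
# as a reference that keeps the parent alive.
weaken($child->{parent});

# Drop the last strong reference to the root.
undef $parent;

# The weak reference is reset to undef once its target is freed.
print defined $child->{parent} ? "parent still alive\n" : "parent freed\n";
```

Without the weaken call, the parent and child would each hold a strong reference to the other, and neither would be freed until the program exits.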
Unfortunately, it can also break code that was working. Even though that code probably leaked memory, that’s not a big problem with a short-running script. The one real-world example I’ve found so far is pQuery’s dom.t. In pQuery 0.08, it does:
my @elems = pQuery::DOM->fromHTML('<div>xxx<!-- yyy -->zzz</div>')
->childNodes;
my $comment = $elems[1];
is $comment->parentNode->tagName, 'DIV', 'Comment has parentNode';
Notice that it’s not saving the result of the fromHTML call; only the child nodes. Since children now have only a weak reference to their parent, the root node is deleted immediately, and $comment->parentNode is undef.
This can be fixed by saving a reference to the root node:
my @elems = (my $r = pQuery::DOM
->fromHTML('<div>xxx<!-- yyy -->zzz</div>'))
->childNodes;
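The same pitfall and fix can be sketched with HTML-Tree itself rather than pQuery. This assumes HTML-Tree 5 or later is installed with weak references enabled; the markup and variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Element 5 -weak;   # require HTML-Tree 5+ and weak references
use HTML::TreeBuilder;

# Pitfall: the tree is only a temporary here, so nothing keeps the
# root alive once this statement finishes.
my $orphan = HTML::TreeBuilder->new_from_content('<div><p>hi</p></div>')
                              ->look_down(_tag => 'p');
print defined $orphan->parent ? "parent alive\n" : "parent gone\n";

# Fix: hold the root in a variable for as long as you need to walk
# upward from its descendants.
my $tree = HTML::TreeBuilder->new_from_content('<div><p>hi</p></div>');
my $p    = $tree->look_down(_tag => 'p');
print $p->parent->tag, "\n";
```

The rule of thumb is that any code which navigates from a node to its ancestors needs a live variable holding the root.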
As a quick fix for broken code (and to determine whether it’s the weak references that are causing the breakage), you can say:
use HTML::Element -noweak;
This (globally) disables HTML-Tree’s use of weak references. But this is just a temporary measure. You need to fix your code, because the -noweak option will be going away eventually.
If you want to ensure that weak references are enabled, you can say:
use HTML::Element 5 -weak;
(It is necessary to include the version number, because previous versions of HTML-Tree simply ignored the import list.)
The next major change I’m planning for HTML-Tree is to make parse_file use IO::HTML by default. Right now, it opens files in binary mode, which means that it doesn’t do the right thing when the file isn’t ISO-8859-1. IO::HTML uses the HTML5 encoding sniffing algorithm to open files using the right encoding. But you don’t have to wait for HTML-Tree 6; you can start using IO::HTML today. Just use IO::HTML and then use $tree->parse_file(html_file($filename)). (It also works with new_from_file.)
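Put together, that might look like the sketch below. It assumes both HTML-Tree and IO::HTML are installed; the temporary file and its contents are invented here purely so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;
use IO::HTML;                # exports html_file by default

# Write a small UTF-8 page as raw bytes (hypothetical test content).
my ($out, $filename) = tempfile(SUFFIX => '.html', UNLINK => 1);
binmode $out, ':raw';
print {$out} qq{<html><head><meta charset="utf-8"><title>t</title></head>},
             qq{<body><p>caf\xC3\xA9</p></body></html>};
close $out or die "close failed: $!";

# html_file() returns a filehandle opened with the encoding that
# IO::HTML sniffs from the file, so the parser receives characters
# rather than undecoded bytes.
my $tree = HTML::TreeBuilder->new;
$tree->parse_file(html_file($filename));

my $text = $tree->look_down(_tag => 'p')->as_text;
printf "%d characters\n", length $text;
```

If the file were read in binary mode instead, the two-byte UTF-8 sequence for the accented letter would show up as two separate characters in the parsed text.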