Holy bloat, Batman!

Let's compare the latest constant.pm to a minimal equivalent:
    $ ./perl -Ilib -le 'print $^V'; /usr/bin/time -l ./perl -Ilib -le 'use constant X => 1..5; print X' 2>&1 | grep 'maximum resident'
       3829760  maximum resident set size
    $ /usr/bin/time -l ./perl -I/tmp -le 'use constant X => 1..5; print X' 2>&1 | grep 'maximum resident'
       1200128  maximum resident set size
That's 2.6MB of bloat to define a constant. (The culprit turns out to be utf8, natch, to handle Unicode constants. Why, God?!) For reference, /tmp/constant.pm, which does most useful constant-type stuff, is here:
package constant;

sub import
{
    shift;    # discard the class name passed by "use"
    my $caller = caller;
    if (ref $_[0] eq 'HASH') {
        while (my ($k, $v) = each %{$_[0]}) {
            *{"$caller\::$k"} = sub () { $v };
        }
    } else {
        my $k = shift;
        my @vals = @_;
        *{"$caller\::$k"} = sub () { @vals };
    }
}

1;


The offending line:

    *{chr 256} = \3;

which is only used for:

    if (exists ${__PACKAGE__."::"}{"\xc4\x80"}) {
	delete ${__PACKAGE__."::"}{"\xc4\x80"};
	*_DOWNGRADE = sub () {1};
    }
    else {
	delete ${__PACKAGE__."::"}{chr 256};
	*_DOWNGRADE = sub () {0};
    }

Thank you for bringing this to my attention.

(I have sent a message detailing this to the Perl 5 Porters mailing list)
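
A rough way to confirm which files a `use constant` drags in is to snapshot %INC, load the pragma at run time, and list whatever is new. This is only a sketch: the constant name X is taken from the post, and what gets listed depends on the perl version and on which constant.pm is first in @INC.

```perl
use strict;
use warnings;

# Snapshot the table of loaded files before constant.pm comes in.
my %before = %INC;

# require + import at run time, so the snapshot above is taken first.
require constant;
constant->import(X => 1 .. 5);

# Anything new in %INC was loaded on behalf of constant.pm.
my @added = sort grep { !exists $before{$_} } keys %INC;
print "$_\n" for @added;
```

If utf8.pm shows up in that list, the pragma loaded it even though nothing in the program asked for Unicode.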

So… this cost… is it not a once-per-interpreter cost that will be incurred anyway if any part of the program processes Unicode?

Are we, in other words, talking about bloat in constant.pm itself, or about missing lazy loading (which one should hope is possible)?

(Unicode, of course, requires an unfortunate amount of what a lot of people think is bloat. It would be nice to cut down on that; on p5p there have been proposals before, regarding how to do that.)

Did I write that they do? This question and yours both have the same answer.

My own previous questions were not rhetorical. My agenda in asking them pointedly is not to absolve the pragma and dismiss your concern, but to prevent the pragma’s support for Unicode per se from becoming contested. I do not presume the answers to them to be “yes” and “the former”, though, even if I expect it. If they are, then there are two avenues open here: a) ensure that constant.pm will not incur this cost for code which does not require it, and/or b) find ways to reduce that cost significantly. Either of these may address your issue sufficiently. The first is quick and localised; the latter will benefit Perl long-term, long-range.

(What the interpreter currently does to load the Unicode data is both slow and memory hungry compared to how it is done elsewhere. It runs Perl code to parse the tables at runtime into Perl-level hashes, whereas elsewhere it is precompiled to a binary shared library and simply mapped into memory on demand. There were posts to p5p – I do not remember whether by Karl, Father Chrysostomos, Reini, Yves, or someone else yet – proposing to follow that example.)

FWIW, when it comes to scripts, often as not I forgo constant.pm entirely in favour of just (e.g.) sub DEBUG () { 0 }. It’s such thin syntactic sugar I’m always torn over paying any cost for it. This is especially true in my CPAN modules – I am loath to make users pay for… barely even a convenience to me. I am not torn over it when I’m defining so many constants at once that its pass-a-hash feature spares me loads of copy-pasta:

use constant { qw(
    FOO 1
    BAR 2
    BAZ 3
    QUX 4
    ... .
    ... .
    ... .
    ... .
    ... .
) };
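
For comparison, the hand-rolled route might look something like the sketch below. It is an illustration rather than a replacement for constant.pm: DEBUG and the FOO..QUX names and values are borrowed from this comment, and the loop in BEGIN is just one assumed way to install several constants at once.

```perl
use strict;
use warnings;

# A constant without constant.pm: the empty prototype makes it eligible for inlining.
sub DEBUG () { 0 }

# Several at once, still without constant.pm. BEGIN makes the names known
# to the parser before the code below is compiled.
BEGIN {
    my %table = (FOO => 1, BAR => 2, BAZ => 3, QUX => 4);
    no strict 'refs';
    for my $name (keys %table) {
        my $value = $table{$name};
        *{ __PACKAGE__ . "::$name" } = sub () { $value };
    }
}

my $sum = FOO + BAR + BAZ + QUX;
print "debug output enabled\n" if DEBUG;
print "sum = $sum\n";
```

The loop is about as much code as the minimal constant.pm in the post, which is roughly the point: the pragma earns its keep mainly when there are many names to define.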


I have sent a patch to Perl 5 Porters to remedy this situation.

Based on the comment just before "*{chr 256} = \3;" this should have been reworked after the next dev release (5.15.4).

-    # Before this makes its way into a dev perl release, we have to do
-    # browser-sniffing, as it were....
-    *{chr 256} = \3;
-    if (exists ${__PACKAGE__."::"}{"\xc4\x80"}) {
-	delete ${__PACKAGE__."::"}{"\xc4\x80"};
-	*_DOWNGRADE = sub () {1};
-    }
-    else {
-	delete ${__PACKAGE__."::"}{chr 256};
-	*_DOWNGRADE = sub () {0};
-    }
+    my $downgrade = $] < 5.015004 && $] >= 5.008;
+    *_DOWNGRADE = sub () { $downgrade };

This is sufficient to prevent utf8 from being loaded prematurely.
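
A version comparison of this kind can be tried in isolation. The sketch below assumes the gate is a range check on `$]` with bounds 5.008 and 5.015004 (the latter inferred from the 5.15.4 mentioned above), and it checks %INC before and after to show that the comparison itself does not load utf8.pm.

```perl
use strict;
use warnings;

# Assumed bounds: 5.008 and 5.015004 (i.e. 5.15.4); adjust if the real patch differs.
my $utf8_loaded_before = exists $INC{'utf8.pm'};
my $downgrade = ($] >= 5.008 && $] < 5.015004) ? 1 : 0;
my $utf8_loaded_after = exists $INC{'utf8.pm'};

printf "perl %s: downgrade=%d\n", $], $downgrade;
print "utf8.pm loaded by the check: ",
    (!$utf8_loaded_before && $utf8_loaded_after ? "yes" : "no"), "\n";
```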

About educated_foo

I blog about Perl.