August 2010 Archives

Avoid my keys() accident

I broke $work yesterday when a change I thought was mundane turned out not to be. I'd changed some code from

if ( keys %$hash_ref ) {

to

if ( %$hash_ref ) {

on the theory that we weren't supporting any perl older than our current production perl-5.10.0.

What I'd completely missed was that $hash_ref might be undef, and that keys %$... would auto-vivify the hash when necessary. With the old code, the hash was created as a side effect of dereferencing it. With the new code, I got an exception, because merely looking at %$hash_ref doesn't create the hash.

The reason for this is that keys() is actually an lvalue, something you can assign to. The meaning of keys( %... ) = 8 is fairly obscure and not what you would guess if you haven't read the documentation for keys(): it asks perl to allocate at least that many hash buckets rather than adding any keys. Because keys() is an lvalue, the dereference %$... will auto-vivify anything necessary, since I might want to modify it.
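A minimal sketch of the lvalue form (the hash %h is illustrative). Per the perlfunc entry for keys, assigning to keys() only pre-sizes the hash's bucket array; the hash stays empty.

```perl
use strict;
use warnings;

my %h;

# Assigning to keys() pre-allocates buckets; it does not create entries.
keys(%h) = 200;

my $count = keys %h;
print "$count\n";    # 0
```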

When I dropped the call to keys(), a function strictly defined as an lvalue, the dereference stopped auto-vivifying and I had exceptions in production. Wheee!
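A minimal reproduction of the accident, assuming the code runs under use strict (strict refs is what turns the second dereference into an exception). The variable names are hypothetical, and the final guard is just one possible fix, not necessarily the one used at $work.

```perl
use strict;    # strict refs makes an rvalue deref of undef fatal
use warnings;

# Old code path: keys() dereferences in lvalue context, so an undef
# scalar is auto-vivified into a reference to a new, empty hash.
my $old_ref;
if ( keys %$old_ref ) { print "never reached: the new hash is empty\n" }
print ref($old_ref), "\n";    # HASH

# New code path: a plain boolean %$ref is an rvalue dereference,
# so under strict refs an undef reference dies instead.
my $new_ref;
my $ok = eval {
    if ( %$new_ref ) { print "never reached\n" }
    1;
};
my $msg = $ok ? "no error\n" : "died: $@";
print $msg;    # died: Can't use an undefined value as a HASH reference at ...

# One possible guard (hypothetical): test the reference before
# dereferencing it, so undef short-circuits and nothing dies.
my $guarded_ref;
my $nonempty = ( $guarded_ref && %$guarded_ref ) ? 1 : 0;
print "nonempty=$nonempty\n";    # nonempty=0
```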

Ponies are the truth

Did you know you can modify perl's readonly constants for undef, true and false? Yep.

&Internals::SvREADONLY( \ !!1, 0 );
${ \ !!1 } = 'ponies!';
&Internals::SvREADONLY( \ !!1, 1 );

print !!1; # ponies!

Same thing for \ undef and \ !!0.

Benchmarking string trimming

Clever Regexps vs Multiple Simple Regexps:

In reading some code I ran across the expression s/^\s*|\s*$//g, which trims leading and trailing whitespace. It is not the fastest way to write this. The fastest way, in the benchmarks below, is two simpler expressions: s/^\s+//; s/\s+$//. Justification follows.

Conclusion:

  • Using + instead of * means that a regexp which would do no effective work fails to match instead. Failing to match when the work would be useless yielded some 3x to 4x improvement.

  • Using multiple simpler patterns like s/^...//; s/...$// instead of a compound pattern like s/^...|...$//g enabled the regex engine's boundary (anchor) checking optimizations.
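A minimal sketch of the recommended 2+ style wrapped in a reusable helper; the name trim() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper in the "2+" style: two anchored patterns, each
# using + so it fails to match when there is nothing to strip.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;    # strip leading whitespace
    $s =~ s/\s+$//;    # strip trailing whitespace
    return $s;
}

print '[', trim("  string  "), "]\n";    # [string]
```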

Testing:

String length:

long:  80+ chars
short: under 80 chars

Pre/postfixes:

pre/post: "  string  "
pre:      "  string"
post:       "string  "
base:       "string"

Coding styles:

*g: s/^\s*|\s*$//g
+g: s/^\s+|\s+$//g
2*: s/^\s*//
    s/\s*$//
2+: s/^\s+//
    s/\s+$//
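Before comparing speeds, it's worth confirming the four styles are interchangeable. This sketch (not part of the original benchmark) runs each style over the four pre/postfix cases and reports any result other than "string".

```perl
use strict;
use warnings;

# The four coding styles, each applied to a copy of its argument.
my %style = (
    '*g' => sub { my $s = shift; $s =~ s/^\s*|\s*$//g; $s },
    '+g' => sub { my $s = shift; $s =~ s/^\s+|\s+$//g; $s },
    '2*' => sub { my $s = shift; $s =~ s/^\s*//; $s =~ s/\s*$//; $s },
    '2+' => sub { my $s = shift; $s =~ s/^\s+//; $s =~ s/\s+$//; $s },
);

# The pre/postfix cases listed above.
my %case = (
    'pre/post' => '  string  ',
    'pre'      => '  string',
    'post'     => 'string  ',
    'base'     => 'string',
);

my $mismatches = 0;
for my $c ( sort keys %case ) {
    for my $st ( sort keys %style ) {
        my $got = $style{$st}->( $case{$c} );
        next if $got eq 'string';
        print "$st on '$c' gave '$got'\n";
        $mismatches++;
    }
}
print "mismatches: $mismatches\n";    # mismatches: 0
```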

Calculated results (rows marked >> use the 2+ style):

>>  short pre 2+      1638810/s
>>  short base 2+     1622457/s
>>  short post 2+     1351812/s
>>  short pre/post 2+ 1152253/s
>>  long base 2+       564477/s
>>  long pre 2+        534890/s
    short base +g      532709/s
    short post +g      502626/s
>>  long post 2+       501015/s
    short pre +g       479683/s
    short pre/post +g  465137/s
>>  long pre/post 2+   463741/s
    short base 2*      462448/s
    short pre 2*       456719/s
    short pre/post 2*  450081/s
    short post 2*      449661/s
    short base *g      394226/s
    short pre *g       384360/s
    short post *g      367736/s
    short pre/post *g  367624/s
    long post 2*       114832/s
    long base 2*       113787/s
    long pre 2*        110305/s
    long pre/post 2*   110169/s
    long post +g       100847/s
    long base +g        99830/s
    long pre +g         98871/s
    long pre/post +g    98331/s
    long base *g        87066/s
    long post *g        86520/s
    long pre *g         84080/s
    long pre/post *g    81429/s
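The harness behind these numbers isn't shown. A minimal sketch with the core Benchmark module, assuming a single hypothetical "long pre/post" input of 100 x characters padded with two spaces on each side; absolute rates will differ by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical input: the exact test strings aren't given, so this
# assumes a long pre/post case of 100 non-space characters.
my $input = '  ' . ( 'x' x 100 ) . '  ';

# Each sub trims a fresh copy so every iteration does the same work.
my %styles = (
    '*g' => sub { my $s = $input; $s =~ s/^\s*|\s*$//g; $s },
    '+g' => sub { my $s = $input; $s =~ s/^\s+|\s+$//g; $s },
    '2*' => sub { my $s = $input; $s =~ s/^\s*//; $s =~ s/\s*$//; $s },
    '2+' => sub { my $s = $input; $s =~ s/^\s+//; $s =~ s/\s+$//; $s },
);

# A negative count runs each sub for at least that many CPU seconds.
my $results = timethese( -1, \%styles );

# Print a rate table comparing the styles against each other.
cmpthese($results);
```

Adding one %styles entry per length/affix combination reproduces the full grid, at a cost of at least one CPU second per entry.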

My gladiators are getting entangled

Hi, I've had a long-standing interest in having a nice way to browse through perl's memory space. Here's today's attempt. It almost works, except that the two introspection modules Devel::FindRef and Devel::Gladiator don't know enough to keep their hands off each other.

use strict;
use warnings;

require Data::Dumper;
# require Data::Dump::Streamer;
require Devel::Gladiator;
require Devel::FindRef;
require Devel::Peek;    # provides the SvREFCNT_inc/SvREFCNT_dec used below
require Scalar::Util;
require B;

my %SKIP_REF;
my %seen;
my $all;

# Don't report on the probe's own bookkeeping hashes
$SKIP_REF{ Scalar::Util::refaddr( \ %SKIP_REF  ) } = undef;
$SKIP_REF{ Scalar::Util::refaddr( \ %seen      ) } = undef;

$all = Devel::Gladiator::walk_arena();
for ( @$all ) {
    # Hold an extra reference so nothing is freed while we inspect it
    Devel::Peek::SvREFCNT_inc( $_ );
    print STDERR Data::Dumper::Dumper($_);

    next if
        # Skip any variables local to this probe
        exists( $SKIP_REF{ Scalar::Util::refaddr( $_ ) } )

        # Skip the global shared hash: the hash lacking flag 0x2000_0000
        # (assumed to be SVphv_SHAREKEYS on this perl)
        || ( 'HASH' eq Scalar::Util::reftype( $_ )
             && ! ( B::svref_2object( $_ )->FLAGS() & 0x2000_0000 ) )
    ;

    print STDERR Devel::FindRef::track( $_ );
}
print STDERR "Decrementing refcnts\n";
Devel::Peek::SvREFCNT_dec($_) for @$all;
print STDERR "Done\n";

Here's a sample of the output before any attempts to make this less voluminous:

$VAR_0x17b308f81 = \*B::RXf_PMf_EXTENDED;
GLOB(0x17b308f8) [refcount 5] is
+- referenced by REF(0x17b81148) [refcount 1], which is
|  the array element 1 of ARRAY(0x17b81118) [refcount 1], which is
|     referenced by REF(0x17b81178) [refcount 1], which is
|        the member '\{f8}\{08}\{b3}\{17}' of HASH(0x17b80e30) [refcount 1], which is
|           referenced by REF(0x17b80f20) [refcount 1], which is
|              the member 'seen' of Data::Dumper=HASH(0x17b80e60) [refcount 3], which is
|                 +- referenced by REF(0x17a3b6b8) [refcount 1], which is
|                 |  referenced by REF(0x17afe3f8) [refcount 1], which is
|                 |     the array element 5296 of ARRAY(0x17b16018) [refcount 2], which is
|                 |        referenced by REF(0x179f5730) [refcount 2], which is
|                 |              not referenced within the search depth.
|                 +- referenced by REF(0x17a306d8) [refcount 1], which is
|                 |  referenced by REF(0x17b59868) [refcount 1], which is
|                 |     the array element 5461 of ARRAY(0x17b16018) [refcount 2], which was seen before.
|                 +- referenced by REF(0x17a21f78) [refcount 1], which is
|                    referenced by REF(0x17b57360) [refcount 1], which is
|                       the array element 5856 of ARRAY(0x17b16018) [refcount 2], which was seen before.
+- referenced by REF(0x17b80db8) [refcount 1], which is
|  the array element 0 of ARRAY(0x17b80da0) [refcount 2], which is
|     +- referenced by REF(0x17b80f38) [refcount 1], which is
|     |  the member 'todump' of Data::Dumper=HASH(0x17b80e60) [refcount 3], which was seen before.
|     +- referenced by REF(0x17a21e70) [refcount 1], which is
|        referenced by REF(0x17b57468) [refcount 1], which is
|           the array element 5845 of ARRAY(0x17b16018) [refcount 2], which was seen before.
+- referenced by REF(0x17b80d70) [refcount 3], which is
|  +- the array element 0 of ARRAY(0x17b16018) [refcount 2], which was seen before.
|  +- the global $main::_.
+- referenced by REF(0x17a3b6e8) [refcount 1], which is
|  referenced by REF(0x17afe3c8) [refcount 1], which is
|     the array element 5298 of ARRAY(0x17b16018) [refcount 2], which was seen before.
+- the member 'RXf_PMf_EXTENDED' of HASH(0x179f5670) [refcount 3], which is
+- referenced by REF(0x17b54420) [refcount 1], which is
|  the array element 6360 of ARRAY(0x17b16018) [refcount 2], which was seen before.
+- the global %main::B::.
$VAR_0x17b309101 = sub () { 32768 };
CODE(0x17b30910) [refcount 2] is
+- referenced by REF(0x17b80d58) [refcount 3], which is
|  +- the array element 1 of ARRAY(0x17b16018) [refcount 2], which is
|  |  referenced by REF(0x179f5730) [refcount 2], which is
|  |     +- referenced by REF(0x17b54360) [refcount 1], which is
|  |     |  the array element 6368 of ARRAY(0x17b16018) [refcount 2], which was seen before.
|  |     +- the lexical '$all' in CODE(0x179d7530) [refcount 1], which is
|  |        the main body of the program.
|  +- the global $main::_.
+- the global &B::RXf_PMf_EXTENDED.

About Josh ben Jore
