Adventures with clang and ASan

By Reini Urban on November 22, 2011 6:08 PM

clang

LLVM's clang (at least 3.1) can be easily used via -Dcc=clang.

The benefits: the generated code is faster for DEBUGGING builds (not yet for optimized builds), compile and link times are much shorter and need much less memory, the diagnostics are better, and because its AST does not simplify the code beyond repair (as gcc's does), it is easy to add code-checking passes and diagnostics such as ASan.

I found several warnings which I previously ignored in my code.

Storable.xs:5400:2: warning: expression result unused [-Wunused-value]
SvREFCNT_inc(sv); /* XXX seems to be necessary */
^~~~~~~~~~~~~~~~
../../sv.h:233:2: note: expanded from macro 'SvREFCNT_inc'
_sv; \
^~~
Storable.xs:5440:2: warning: null passed to a callee which requires a non-null argument [-Wnonnull]


Socket.xs:837:47: warning: conversion specifies type 'int' but the argument has type 'STRLEN' (aka 'unsigned long') [-Wformat]
croak("Bad arg length for %s, length is %d, should be %d",
~^
%lu
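
The -Wformat fix is mechanical. A minimal C sketch, assuming plain snprintf instead of perl's croak; format_bad_len is a made-up helper name, only the message text is taken from the warning above. Casting to unsigned long matches the %lu that clang suggests and stays correct on platforms where the length type is not unsigned long.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats the message from the warning above.
 * The lengths are cast explicitly so that the conversion (%lu) and
 * the argument type (unsigned long) always agree. */
int format_bad_len(char *buf, size_t bufsize, const char *what,
                   size_t got, size_t want)
{
    return snprintf(buf, bufsize,
                    "Bad arg length for %s, length is %lu, should be %lu",
                    what, (unsigned long)got, (unsigned long)want);
}
```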

Timings

/usr/src/perl
$ grep scripts=21 build-5.15.*/log.test

debugging -O0 -g3

build-5.15.4d-nt@24ad6161/log.test:u=9.37  s=1.54  cu=723.64  
          cs=28.08  scripts=2152  tests=486474
build-5.15.4d-nt@7bb3c074/log.test:u=9.26  s=1.39  cu=699.95  
          cs=26.65  scripts=2152  tests=489756
build-5.15.4d-nt@8e711f0d/log.test:u=9.61  s=1.05  cu=720.40  
          cs=25.88  scripts=2152  tests=486457
build-5.15.4d-nt@dbc6546a/log.test:u=9.32  s=1.50  cu=730.39  
          cs=32.75  scripts=2151  tests=484891
build-5.15.5d-nt/log.test:u=9.10  s=1.38  cu=709.83  
          cs=27.04  scripts=2154  tests=486849
build-5.15.5d-nt-git-clang/log.test:u=6.59  s=1.27  cu=485.97  
          cs=26.38  scripts=2165  tests=489595
build-5.15.5d-nt-git-llvm/log.test:u=11.92  s=1.36  cu=808.02  
          cs=28.55  scripts=2166  tests=487765

non-debugging -O2

build-5.15.5-nt@5e141575/log.test:u=3.89  s=1.28  cu=331.86  
          cs=24.34  scripts=2166  tests=487755
build-5.15.5-nt-git-clang/log.test:u=4.37  s=1.32  cu=349.79  
          cs=24.24  scripts=2166  tests=487823

llvm is llvm-gcc-4.5, which is the slowest; clang is clang 3.1, which is fast; cc is gcc-4.6.1, which is a bit faster with -O2.

address-sanitizer

And then there is Google's address-sanitizer (ASan), which detects invalid pointer accesses (read and write) to stack, heap and globals. Like Dr. Memory and DynamoRIO it works via shadow memory maps, but it is much faster than any other memory checker: only 2x slower than unchecked code, compared to 20x slower with valgrind and 10x with drmemory. It also needs much less memory. See http://code.google.com/p/address-sanitizer/wiki/AddressSanitizer. Google checks chromium with it.

-Dcc='~/address-sanitizer/asan_clang_Linux/bin/clang' 
-Accflags=-faddress-sanitizer 
-Accflags='-mllvm\ -asan-blacklist=asan_blacklist.ignore'
-Aldflags=-faddress-sanitizer 
-Doptimize='-g3\ -O1'

So I added such a perl-5.15.5d-nt-asan to my debugging test suite. But I had to create a custom asan_blacklist.ignore list to exclude lots of early asan bugs/limitations (Most of them are now fixed).

Notes:

An existing old clang in your path will harm the build process.
I first couldn't build on Linux, even clean, only on Darwin. -m32 support was missing in my libc. So make lib64 install did the trick.
Use -Doptimize=-O2 (or use Alex' -O0 patch)

The problem was that current ASan is not yet properly initialized with -O0, so our Configure probes failed. I patched it, but the developers didn't like the patch, though it worked for me to create a miniperl and, with -O0, a perl and most CPAN modules. I just bypassed ASan. You really need -O1 or -O2 to use ASan. Can we persuade Merijn to use -O1 just for ASan? For sure not.

Update: Alex created a better patch to support -O0 and this looks fine now. Great! See issue 11

On one system I got a linker problem with -fstack-protector, so I removed that from the makefile and config.sh. We do not want to check that twice anyway. On my debian box, with my fixed post-configure clang setup, it worked fine with -fstack-protector though.

There's still a Darwin init problem somewhere. Even with DYLD_NO_PIE=1 I had to force init IO, with something like

DYLD_PRINT_OPTS=1 ./miniperl -Dv -Ilib configpm

to get past initial ctor crashes.

export DYLD_PRINT_OPTS=1 && make

and sometimes even

make MINIPERL="./miniperl -Dv -Ilib"

was needed. Problem was not debuggable as it worked okay from the debugger. My darwin seems to load the wrong malloc hook.

And eventually it led to the first worthwhile problem to inspect, an invalid write in a threaded miniperl. valgrind did not detect this.

$ ./miniperl -Ilib configpm
Expected a Configure variable header or another paragraph of description at configpm line 1010, <GLOS> chunk 1035.
written lib/Config.pod
=================================================================
==2079== ERROR: AddressSanitizer unknown-crash on address 0x7f1ec37f92f0 at pc 0x42e546 bp 0x7ffff98ab790 sp 0x7ffff98ab770

WRITE of size 8 at 0x7f1ec37f92f0 thread T0
    #0 0x42e546 (build-5.15.5d-asan@a7d2e0/miniperl+0x42e546)
    #1 0x47b262 (build-5.15.5d-asan@a7d2e0/miniperl+0x47b262)
    #2 0x7f1ec4958ead (/lib/x86_64-linux-gnu/libc-2.13.so+0x1eead)
    #3 0x41da69 (build-5.15.5d-asan@a7d2e0/miniperl+0x41da69)
0x7f1ec37f92f0 is located 624 bytes inside of 2912-byte region [0x7f1ec37f9080,0x7f1ec37f9be0)
allocated by thread T0 here:
    #0 0x7b93d7 (build-5.15.5d-asan@a7d2e0/miniperl+0x7b93d7)
    #1 0x41dc50 (build-5.15.5d-asan@a7d2e0/miniperl+0x41dc50)
    #2 0x47b1b0 (build-5.15.5d-asan@a7d2e0/miniperl+0x47b1b0)
    #3 0x7f1ec4958ead (/lib/x86_64-linux-gnu/libc-2.13.so+0x1eead)
==2079== ABORTING
Shadow byte and word:
  0x1fe3d86ff25e: 0
  0x1fe3d86ff258: 00 00 00 00 00 00 00 00
More shadow bytes:
  0x1fe3d86ff238: 00 00 00 00 00 00 00 00
  0x1fe3d86ff240: 00 00 00 00 00 00 00 00
  0x1fe3d86ff248: 00 00 00 00 00 00 00 00
  0x1fe3d86ff250: 00 00 00 00 00 00 00 00
=>0x1fe3d86ff258: 00 00 00 00 00 00 00 00
  0x1fe3d86ff260: 00 00 00 00 00 00 00 00
  0x1fe3d86ff268: 00 00 00 00 00 00 00 00
  0x1fe3d86ff270: 00 00 00 00 00 00 00 00
  0x1fe3d86ff278: 00 00 00 00 00 00 00 00

What is miniperl+0x42e546? ASan cannot resolve the symbols in the backtrace yet. There is an external tool, scripts/asan_symbolize.py, but it would better be rewritten in perl to be more stable. I have now written such a symbolizer tool at https://gist.github.com/1392123 and put the full results on perl514.cpanel.net.

So far I prefer objdump with manual macro expansion. Adding symbolizing to asan itself would be on my wishlist, as it also sees the expanded macro.

$ objdump -S -d --start-address=0x42e546 miniperl| less
int
perl_run(pTHXx)
{
    dVAR;
    I32 oldscope;
    int ret = 0;
    dJMPENV;

    PERL_ARGS_ASSERT_PERL_RUN;
#ifndef MULTIPLICITY
    PERL_UNUSED_ARG(my_perl);
#endif

000000000042e546 <perl_run+0x1026>:
    oldscope = PL_scopestack_ix;
#ifdef VMS
    VMSISH_HUSHED = 0;
#endif

    JMPENV_PUSH(ret);
  42e546:       40 88 fa                mov    %dil,%dl
  42e549:       80 e2 07                and    $0x7,%dl
  42e54c:       38 ca                   cmp    %cl,%dl
  42e54e:       0f 8c 18 f1 ff ff       jl     42d66c <perl_run+0x14c>
  42e554:       e8 c7 ef 37 00          callq  7ad520 <__asan_report_store1>
  42e559:       44 89 e9                mov    %r13d,%ecx
  42e55c:       83 e1 07                and    $0x7,%ecx
  42e55f:       83 c1 03                add    $0x3,%ecx
  42e562:       38 c1                   cmp    %al,%cl
  42e564:       0f 8c 3c f1 ff ff       jl     42d6a6 <perl_run+0x186>
    case 0:                             /* normal completion */

JMPENV_PUSH(ret) =>

$ make perl.i
$ edit perl.i
(void)( { cur_env.je_prev = (my_perl->Itop_env); 
(void)0; cur_env.je_ret = __sigsetjmp (((cur_env.je_buf)), ((0))); 
(void)0; (my_perl->Itop_env) = &cur_env; 
cur_env.je_mustcatch = (0); 
(ret) = cur_env.je_ret; } );

To check which line in this macro failed, I usually rename perl.i to .c, do a linebreak as above, fix the linenumber before and recompile.

$ mv perl.i perl.c # I'm in a symlinked buildtree!
$ make
....
ASAN:SIGSEGV
==22000== ERROR: AddressSanitizer crashed on unknown address 
0x3ae4f05642e0 (pc 0x00000042dd4d sp 0x7fff9807a7e0 bp 0x7fff9807a990 
ax 0x000000000003 T0)
#0 0x42dd4d (build-5.15.5d-asan@a7d2e0/miniperl+0x42dd4d)

$ objdump -S -d --start-address=0x42dd4d miniperl| less

        cur_env.je_ret = __sigsetjmp (((cur_env.je_buf)), ((0))); 
  42dd4d:       80 3a 00                cmpb   $0x0,(%rdx)

So either cur_env.je_buf or cur_env.je_ret is wrong. Now we really have to use the debugger. If we are lucky the error is reproducible within gdb; in my case it was not. If it is not, add a printf to this line.

Recompiling with the linker instructed to use the same flags as cc helped here. I added -g -O2 -faddress-sanitizer to all LDFLAGS in the makefile. There are three. perl Configure sucks big time with its cc driver centrism, ignoring ld.

This time it compiled fine and I found what looks like a real core bug:

$ ./perl -f -Ilib pod/buildtoc
=================================================================
==30266== ERROR: AddressSanitizer global-buffer-overflow on address 0x7ff6ca5d8d8b 
at pc 0x7ff6c9e9d2dc bp 0x7fff2362e2c0 sp 0x7fff2362e2a0
READ of size 1 at 0x7ff6ca5d8d8b thread T0
    #0 0x7ff6c9e9d2dc (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0x1002dc)
    #1 0x7ff6c9e9b440 (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0xfe440)
    #2 0x7ff6ca5cea42 (build-5.15.5d-nt-asan@a7d2e0/lib/auto/List/Util/Util.so+0x3a42)
    #3 0x7ff6ca025b00 (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0x288b00)
    #4 0x7ff6c9fa5eee (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0x208eee)
    #5 0x7ff6c9e90209 (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0xf3209)
    #6 0x7ff6c9e86e10 (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0xe9e10)
    #7 0x7ff6c9e67c2d (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0xcac2d)
    #8 0x7ff6c9e5a01e (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0xbd01e)
    #9 0x7ff6c9e561f8 (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0xb91f8)
    #10 0x7ff6c9f1fa83 (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0x182a83)
    #11 0x7ff6c9e8d4b8 (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0xf04b8)
    #12 0x7ff6c9e88120 (build-5.15.5d-nt-asan@a7d2e0/libperl.so+0xeb120)
    #13 0x404d9e (build-5.15.5d-nt-asan@a7d2e0/perl+0x404d9e)
    #14 0x7ff6c8f5fead (/lib/x86_64-linux-gnu/libc-2.13.so+0x1eead)
    #15 0x404b59 (build-5.15.5d-nt-asan@a7d2e0/perl+0x404b59)
0x7ff6ca5d8d8b is located 0 bytes to the right of 
global variable '.str27' (0x7ff6ca5d8d80) of size 11
  '.str27' is ascii string 'List::Util'
==30266== ABORTING
Shadow byte and word:
  0x1ffed94bb1b1: 3
  0x1ffed94bb1b0: 00 03 f9 f9 f9 f9 f9 f9
More shadow bytes:
  0x1ffed94bb190: f9 f9 f9 f9 00 00 00 00
  0x1ffed94bb198: f9 f9 f9 f9 00 00 00 00
  0x1ffed94bb1a0: 00 00 00 04 f9 f9 f9 f9
  0x1ffed94bb1a8: 03 f9 f9 f9 f9 f9 f9 f9
=>0x1ffed94bb1b0: 00 03 f9 f9 f9 f9 f9 f9
  0x1ffed94bb1b8: 00 07 f9 f9 f9 f9 f9 f9
  0x1ffed94bb1c0: 00 00 00 00 00 00 00 00
  0x1ffed94bb1c8: 00 00 00 00 05 f9 f9 f9
  0x1ffed94bb1d0: f9 f9 f9 f9 00 04 f9 f9

Now this really looks like an invalid read past the trailing 0-byte on the gv name. (size 11 of 'List::Util' sounds like the 0 was allocated. 8+2+1 = 11)
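
The shadow dump can be decoded by hand. The following C sketch assumes the 8-to-1 shadow mapping described on the AddressSanitizerAlgorithm wiki page and a shadow offset of 1<<44. That offset is not printed in the report; it is inferred here from the two dumps in this post and differs between platforms and asan versions. The function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: offset inferred from the dumps in this post,
 * not an official constant for every platform. */
#define SHADOW_OFFSET (1ULL << 44)

/* One shadow byte describes 8 bytes of application memory. */
uint64_t shadow_addr(uint64_t addr)
{
    return (addr >> 3) + SHADOW_OFFSET;
}

/* Returns 1 if an access of `size` bytes at `addr` would be reported.
 * Only valid for accesses that stay inside one 8-byte granule.
 *   shadow == 0 : all 8 bytes are addressable
 *   shadow 1..7 : only the first `shadow` bytes are addressable
 *   shadow <  0 : redzone (e.g. f9 around the globals in the dump) */
int access_is_bad(int8_t shadow, uint64_t addr, unsigned size)
{
    if (shadow == 0)
        return 0;
    if (shadow < 0)
        return 1;
    return (int)((addr & 7) + size - 1) >= shadow;
}
```

For the faulting address 0x7ff6ca5d8d8b, shadow_addr() yields 0x1ffed94bb1b1, the address printed under "Shadow byte and word". Its value 3 says that only 3 bytes of that granule are valid: the 11-byte string occupies one full granule (00) plus 3 bytes (03). The read at offset 3 of the granule is flagged, while the trailing 0-byte at offset 2 is fine.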

$ objdump -Sd --start-address=0x1002dc libperl.so|less
00000000001002dc <Perl_gv_name_set+0x19c>:

    if (!(flags & GV_ADD) && GvNAME_HEK(gv)) {
        unshare_hek(GvNAME_HEK(gv));
    }

    PERL_HASH(hash, name, len);
  1002dc:       89 fa                   mov    %edi,%edx
  1002de:       83 e2 07                and    $0x7,%edx
  1002e1:       83 c2 03                add    $0x3,%edx
  1002e4:       38 ca                   cmp    %cl,%dl
  1002e6:       7c 70                   jl     100358 <Perl_gv_name_set+0x218>

These macro expansions are a bit longer, so I spare you the details. Same procedure as above. 0xfe440 is in Perl_gv_init_pvn, which comes from List/Util.so which probably defined the global name of the module.

The bug really was there in ListUtil.xs

if (SvTYPE(rmcgv) != SVt_PVGV)
gv_init(rmcgv, lu_stash, "List::Util", 12, TRUE);

12 is clearly off-by-two. A classical copy&paste error from 3 lines above. What worries me is that no other compiler or tool found this. Filed as rt.cpan.org #72700
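
Hard-coded literal lengths like this can be avoided by letting the compiler count. A minimal C sketch, assuming a hypothetical LIT_LEN macro (perl's own handy.h has STR_WITH_LEN in the same spirit) and a made-up name_len_ok() that stands in for any API taking a (name, len) pair like gv_init:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: length of a string literal without the
 * trailing 0-byte, computed at compile time. The "" s "" trick
 * makes it fail to compile for anything but a literal. */
#define LIT_LEN(s) (sizeof("" s "") - 1)

/* Made-up stand-in for an API that takes (name, len):
 * returns 1 only if len is exactly the length of name. */
int name_len_ok(const char *name, size_t len)
{
    return strlen(name) == len;
}
```

With this, LIT_LEN("List::Util") is 10, and a later edit of the literal cannot leave a stale length behind, which is exactly what the copy&paste did here.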

And why does valgrind not complain? Because valgrind cannot find global OOB (out-of-bounds) or stack OOB accesses, only heap OOB. Here we have the global variable '.str27'.

valgrind only found these known leaks: (full details with --leak-check=full)

Warning: bad signal number 0 in sigaction()

HEAP SUMMARY:
    in use at exit: 4,840,027 bytes in 44,571 blocks
  total heap usage: 722,897 allocs, 678,326 frees, 158,660,281 bytes allocated

Searching for pointers to 44,571 not-freed blocks
Checked 9,191,152 bytes

LEAK SUMMARY:
   definitely lost: 1,557 bytes in 82 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 4,838,470 bytes in 44,489 blocks
        suppressed: 0 bytes in 0 blocks
Rerun with --leak-check=full to see details of leaked memory

ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
used_suppression:      4 dl-hack3-cond-1

An invalid read is certainly more important than a minor leak. And valgrind is so slow that it is only used randomly. ASan is so fast and so much better that I compile it in and use it all the time now in my debugging perl.

BTW, the leak is:

 1,557 bytes in 82 blocks are definitely lost in loss record 1,284 of 1,582
    at 0x4C2779D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0x4EE9CE6: Perl_safesysmalloc (util.c:100)
    by 0x4EEB7CC: Perl_savepv (util.c:1103)
    by 0x4E78989: Perl_newXS_len_flags (op.c:7045)
    by 0x4E77E92: Perl_newCONSTSUB_flags (op.c:6947)
    by 0x4E8D893: Perl_gv_init_pvn (gv.c:373)
    by 0x4E9237C: Perl_gv_fetchpvn_flags (gv.c:1691)
    by 0x4E93D55: Perl_gv_fetchsv (gv.c:1395)
    by 0x4E7B187: Perl_ck_rvconst (op.c:7667)
    by 0x4E6DE69: Perl_newUNOP (op.c:3687)
    by 0x4EA6C4B: Perl_yylex (toke.c:6690)
    by 0x4EB934B: Perl_yyparse (perly.c:434)

BTW: I really like the concept of shadow memory maps. See AddressSanitizerAlgorithm or the developer's thesis paper about DynamoRIO at http://www.burningcutlery.com/derek/phd.html. Much easier than with huge guard pages, as in electric fence.

Summary of found perl core bugs

With the old asan rev 144800 I found 32+13 new unique perl core problems, unthreaded. Most of them look security relevant. Just filed one bug report for now. Will have to automate this somehow. scripts/asan_symbolize.py works only on Linux for me, and I wrote a better symbolizer asan_addr2dis. Some of the problems seem to be asan problems not perl.

address-sanitizer: I love you!

There are some minor bugs still, it is currently being merged into llvm proper, but it's usable.

$ perl -lne'BEGIN{$/=q/ERROR: AddressSan/}; 
   print join " ",$1,$2,$3 if /tizer (.+?) on address.*
   ((?:READ|WRITE) of size \d+).*? is located (at offset \d+ in .*?) 
   of T0/s' log.test-5.15.5d-nt-asan\@a7d2e0| sort -u

stack-buffer-overflow READ of size 1  <Perl_pp_entereval>
stack-buffer-overflow READ of size 8  <Perl_sv_vcatpvfn>
stack-buffer-overflow WRITE of size 1  <Perl_gv_stashpvn>
stack-buffer-underflow WRITE of size 1  <Perl_gv_stashpvn>
stack-buffer-overflow WRITE of size 1  <Perl_gv_fetchfile_flags> bogus
stack-buffer-overflow WRITE of size 1  <S_study_chunk>
stack-buffer-overflow WRITE of size 1  <Perl_call_sv>
stack-buffer-overflow WRITE of size 8  <Perl_call_sv>
stack-buffer-underflow WRITE of size 1  <Perl_call_sv>
stack-buffer-overflow WRITE of size 1  <Perl_amagic_call>
stack-buffer-overflow WRITE of size 4  <S_find_byclass>
stack-buffer-overflow WRITE of size 4  <Perl_pregcomp>
stack-buffer-overflow WRITE of size 4  <Perl_re_compile>
stack-buffer-overflow WRITE of size 8  <Perl_re_compile>
stack-buffer-underflow WRITE of size 8  <Perl_re_compile>
stack-buffer-overflow WRITE of size 8  <Perl_sighandler>
stack-buffer-overflow WRITE of size 8  <Perl_call_list>
stack-buffer-overflow WRITE of size 8  <Perl_regexec_flags>
stack-buffer-underflow WRITE of size 8  <Perl_regexec_flags>
stack-buffer-overflow WRITE of size 8  <Perl_die_unwind>
stack-buffer-underflow WRITE of size 4  <Perl_die_unwind>
stack-buffer-underflow READ of size 8  <Perl_sv_vcatpvfn>
stack-buffer-underflow WRITE of size 1  <Perl_die_unwind>
stack-buffer-underflow WRITE of size 1  <Perl_newATTRSUB>
stack-buffer-underflow WRITE of size 1  <Perl_Gv_AMupdate>
stack-buffer-underflow WRITE of size 4  <Perl_pp_die>
stack-buffer-underflow WRITE of size 4  <Perl_croak>
stack-buffer-underflow WRITE of size 8  <Perl_pp_entersub>

$ perl -lne'BEGIN{$/=q/ERROR: AddressSan/}; print join " ",$1,$2,$3 if
   /tizer (.+?) on address.*((?:READ|WRITE) of size \d+).*? is( located
   \d bytes .*? \()/s' log.test-5.15.5d-nt-asan\@a7d2e0| sort -u

global-buffer-overflow READ of size 1  to the right of global variable '.str'
global-buffer-overflow READ of size 1  to the right of global variable '.str69'
heap-buffer-overflow READ of size 1 at 16-byte region
heap-buffer-overflow READ of size 1 at 16-byte region
heap-buffer-overflow READ of size 8 at 16-byte region
heap-buffer-overflow READ of size 8  at 16-byte region
heap-buffer-overflow READ of size 8  at 19-byte region
heap-buffer-overflow READ of size 8  7 bytes to the right of 9-byte region
heap-buffer-overflow READ of size 8  8 bytes to the right of 8-byte region
heap-buffer-overflow READ of size 8  8 bytes to the right of 8-byte region
heap-buffer-overflow READ of size 8  8 bytes to the right of 8-byte region
heap-buffer-overflow READ of size 8  8 bytes to the right of 8-byte region
heap-buffer-overflow READ of size 8  8 bytes to the right of 8-byte region

The developer mentions for those OOB reads:

Please be aware that some of the out-of-bound reads may be caused by over-optimizations in string processing functions. For example, a function may read 8 bytes at a time if it knows that the strings are 8-aligned and NULL-terminated. Theoretically this is still an error, but in practice it should not cause any problems.

I have to check all of them manually and keep them in an asan perl blacklist, which is a suppression file.

After some days of analyzing most of these reports I came to the conclusion that only the very first report caught a perl bug; the rest were false positives, caused either by not detecting local pointer updates or by mangling the control flow with longjmp.

Since then asan has been included in llvm trunk, and miniperl can be compiled out of the box. See the asan HowToBuild instructions. For Configure I used

-D'cc=/usr/src/llvm/projects/compiler-rt/lib/asan_clang_linux/bin/clang'
-A'ccflags=-faddress-sanitizer' 
-A'ldflags=-faddress-sanitizer'

llvm rev 146046

Now only those tests failed:

op/taint.t
op/tie.t
re/pat_re_eval.t
re/pat_rt_report.t
re/reg_mesg.t
re/regexp.t
re/regexp_noamp.t
re/regexp_notrie.t
re/regexp_qr.t
re/regexp_qr_embed.t
re/regexp_trielist.t
run/fresh_perl.t
uni/method.t
uni/parser.t
uni/readline.t

With problems in those functions:

$ perl -lne'BEGIN{$/=q/ERROR: AddressSan/}; 
  print join " ",$1,$2,$3 if /tizer (.+?) on address.*((?:READ|WRITE) of size \d+).*? is located (at offset \d+ in .*?) of T0/s' log.test | sort -u
stack-buffer-overflow READ of size 1 <Perl_sv_compile_2op_is_broken>
stack-buffer-overflow READ of size 4 <Perl_vmess>
stack-buffer-overflow READ of size 8 <Perl_vcroak>
stack-buffer-overflow WRITE of size 1 <Perl_amagic_call>
stack-buffer-overflow WRITE of size 1 <Perl_call_sv>
stack-buffer-overflow WRITE of size 1 <Perl_gv_stashpvn>
stack-buffer-overflow WRITE of size 1 <Perl_re_compile>
stack-buffer-overflow WRITE of size 1 <Perl_vcroak>
stack-buffer-overflow WRITE of size 1 <S_incline>
stack-buffer-overflow WRITE of size 1 <S_re_croak2>
stack-buffer-overflow WRITE of size 4 <Perl_re_compile>
stack-buffer-overflow WRITE of size 4 <S_pack_rec>
stack-buffer-overflow WRITE of size 8 <Perl_call_sv>
stack-buffer-overflow WRITE of size 8 <perl_destruct>
stack-buffer-overflow WRITE of size 8 <Perl_die_unwind>
stack-buffer-overflow WRITE of size 8 <Perl_Gv_AMupdate>
stack-buffer-overflow WRITE of size 8 <Perl_hv_common>
stack-buffer-overflow WRITE of size 8 <Perl_re_compile>
stack-buffer-overflow WRITE of size 8 <Perl_regexec_flags>
stack-buffer-underflow WRITE of size 1 <Perl_die_unwind>
stack-buffer-underflow WRITE of size 8 <Perl_regexec_flags>

Still investigating. They look like false alarms to me.

But I already detected some more CPAN errors, like #73118 in DBI, and #73111 in JSON::XS

Static analysis with clang-analyzer

clang comes with scan-build which uses ccc-analyzer to statically analyze C/C++ code. http://clang-analyzer.llvm.org/

scan-build ./Configure ...
scan-build -V -k make
scan-build -V -k make test

This generates a lot of HTML reports in /tmp/scan-build-*. Have a look and you will be surprised. Mostly it is no big deal, but Perl could definitely benefit from more compiler attributes, especially noreturn, and from more defensive code.
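
Why noreturn matters to a static analyzer can be shown with a minimal C sketch. The names MY_NORETURN, my_croak and checked_len are hypothetical, not perl's; the attribute syntax is the GCC/clang one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#if defined(__GNUC__) || defined(__clang__)
#  define MY_NORETURN __attribute__((noreturn))
#else
#  define MY_NORETURN
#endif

/* Hypothetical die-like function. The attribute tells compiler and
 * analyzer that control never comes back to the caller. */
MY_NORETURN static void my_croak(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

size_t checked_len(const char *s)
{
    if (s == NULL)
        my_croak("null string");
    /* Because my_croak is marked noreturn, the analyzer knows that
     * s is non-NULL here and reports no null dereference. */
    return strlen(s);
}
```

Without the attribute the analyzer has to assume that my_croak may return, and will typically flag the strlen call as a possible null pointer dereference.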

http://perl514.cpanel.net/scan-build-2011-11-22-1/


On simple benchmarks

By Reini Urban on November 18, 2011 8:14 PM

use Benchmark is not good enough. At all.

- You can specify -2 as count, which means 2 seconds. Good.
- If you specify the test code as a string rather than a coderef, you also bench the parsing time for every iteration, not the plain run-time. Coderefs should be used. Otherwise the result is entirely unrealistic, as you compile once and run often.
- The iteration results are not used at all to check the statistical test quality.
- Without using :hireswallclock you get time(2) precision, which is integer seconds.

benchmark-perlformance is too good and too slow. It's good to have a single special and reliable machine for this, but I see no useful results. And I miss simple tests with good op coverage. I do not even see op coverage at all.

How fast is my perl, how good is my test and how good is my test result?

Dumbbench reports at least some statistical quality, but needs too many args. initialruns and targetprecision should not be mandatory.

I need a Benchmark package which automatically selects the number of iterations to get a reliable and reproducible result, rejects automatically high load prior to start, filters outliers within the testing (warmup oscillations, random spikes) and statistically bad results (low precision or low accuracy). Do not print bad results, reject them.
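
The "filter outliers, reject bad results" part could be sketched roughly like this in C. Everything here is an assumption for illustration: the function name bench_accept, the fixed 20% band around the median used as the outlier rule, and the acceptance criterion (at least half of the samples survive and their standard deviation stays below max_rel_dev times the mean). Timings are assumed to be positive. A real package would choose these rules more carefully.

```c
#include <stddef.h>
#include <stdlib.h>

#define OUTLIER_BAND 0.20   /* assumed: keep samples within 20% of the median */

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts and filters `samples` in place. Returns 1 and stores the
 * mean of the kept samples in *mean_out if the run is good enough,
 * 0 if the result should be rejected and not printed. */
int bench_accept(double *samples, size_t n, double max_rel_dev,
                 double *mean_out)
{
    size_t i, kept = 0;
    double med, sum = 0.0, mean, var = 0.0, limit;

    if (n < 3)
        return 0;
    qsort(samples, n, sizeof *samples, cmp_double);
    med = (n % 2) ? samples[n / 2]
                  : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);

    /* Drop warmup oscillations and random spikes. */
    for (i = 0; i < n; i++) {
        if (samples[i] >= med * (1.0 - OUTLIER_BAND) &&
            samples[i] <= med * (1.0 + OUTLIER_BAND))
            samples[kept++] = samples[i];
    }
    if (kept < 2 || kept * 2 < n)
        return 0;               /* too noisy: most samples were outliers */

    for (i = 0; i < kept; i++)
        sum += samples[i];
    mean = sum / (double)kept;
    for (i = 0; i < kept; i++)
        var += (samples[i] - mean) * (samples[i] - mean);
    var /= (double)(kept - 1);  /* sample variance */

    /* Compare squared values to avoid needing sqrt(). */
    limit = max_rel_dev * mean;
    if (var > limit * limit)
        return 0;               /* precision too low */

    *mean_out = mean;
    return 1;
}
```

A caller would keep adding iterations until bench_accept() returns 1 or a time budget is exhausted, so the number of iterations no longer has to be a mandatory argument.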

This is no high art as reading other documentation suggests. Reproducing measurements is basic work done daily. In my previous work with AVL our customers demanded to get good quality measurements results, repeatability and statistical verification. The user should not be bothered with mandatory arguments which will highly influence the results. E.g. Graphics benchmarks give us a single number FPS, which tells us all on any CPU, GPU and architecture. Results should be comparable on different machines. 31.161/s on one machine means what on a slower machine? 2.164.227/s means what? Probably that the test is just a null-op. This should be rejected right away.

Furthermore it should be possible to see the graph 1. for the single benchmark run (to verify quality), and 2. compare results with a graph in time for different perl versions and time. Similar to http://speed.perlformance.net/timeline/ but with more than 3 results in the graph. At least for every major perl version, but optionally for every single commit.

I'll give it a try. At least for the questions how fast is it and how good is my test result.

The profiling stats for how good is my test is another problem, which eg. cannot be tackled with a simple line level profiler like this (simplified version from brian d foy's Mastering Perl)

package Devel::prof;
sub DB::DB {
  my ($file,$line) = (caller)[1,2];
  return unless $file eq $0;
  $c[$line]++
}
END{
  open F, $0 or die;
  print "\nEND - Linecount for $0:\n";
  while (<F>) {
    printf "%5d: %5d %s", $c[$i], $i, $_ if $c[++$i];
  }
}
1;

which is just a simplified version of NYTProf. It can only be tackled with an op-level profiler like B::Stats, which tells you which ops were called how often, so we can see which ops were missed and what the distribution of ops is, and compare this to typical perl programs.

Also needed would be the knowledge of typical op costs, which can be extracted from system profiling tools. How slow is divide compared to idivide, helem vs hslice, keys vs values, method vs methodnamed, leavesub vs leavesublv, enteriter vs grepwhile vs mapwhile.

My initial take on perl-core testing was:

sub f{my($n)=@_;$n==8 and bless{1..4}and$a=~s/$/../;$n<2 and return$n;
f($n-1)+f($n-2)}f(33)

but what ops does this use, and is this a fair example? See the thread starting at http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2010-09/msg00403.html

Did you see spark BTW? https://github.com/holman/spark


A lot of good tiny ideas

By Reini Urban on November 18, 2011 5:22 PM

brian d foy was here in Houston for two days and I got a lot of good tiny ideas:

1. implement last out of grep/map (disabled because broken with 5.6)

2 days. step out 2 scopes in dopoptoloop:
grep, grep_item, block

$ p -MO=Concise,-exec -e'grep{last if $_ == 2} 1..3'
1 0> enter
2 ;> nextstate(main 2 -e:1) v:{
3 0> pushmark s
4 $> const(AV ) s
5 1> rv2av lKPM/1
6 @> grepstart K*
7 |> grepwhile(other->8)[t3] vK
8 0> enter s
9 ;> nextstate…


My take on Modern Perl

By Reini Urban on November 15, 2011 4:21 PM

Szabgab recently provoked me. To his question "I wonder how to teach 'Modern' Perl?" (http://szabgab.com/how-to-teach-modern-perl.html) I answered by pointing to the qore manual.

Modern Perl might be an established term already. As radical modernist (opposed to a radical post-modernist) I still like to use my own vision of a modern perl. Not directly opposing chromatic's modern perl. Just a real modern perl, as an outsider would consider it. Which just happens to be the qore feature set. Everything I would have done to make perl modern would have been what David Nichols already did in qore. Plus channels for IPC.

Qore is basically a modern perl, a rewrite with a modern vision. One can take a perl script, and optionally enhance it with types and background and get a qore speed-up and compile-time safety, but obviously a compatibility problem.

Optional strong typing (what I plan to do in the near future)
Fast native threading, enabled automatically, TID 0 for signals.
Thread-safe library and types
Deadlock detection
Multi-core support as in Go, with SMP thread scheduler, but no channels yet.
Modern rewritten parser (he did it in C++)
Modern exception handling, without string parsing of die messages
Unified network event handling

What is different:

sub return types are C-style. perl has no style so far.

[return_type] sub function_name([[type] variable1, ...]) {
    statements;
}

const are true constants, used by the compiler. Currently only Const::Fast comes close, Readonly and constant not at all.

The class and namespace keywords deviate from perl packages in that they are the modern equivalent of "what to expect from them". There is no magic @ISA variable for inheritance, for example.

class options are [inherits [private|public]

method options are [static] [synchronized] [private]

So as I envision it, a "Modern Perl" would take the same step as "qore" did, plus channels. The existing feature set of chromatic's Modern::Perl is in my opinion the wrong way. Do not add half-features (e.g. Moose, Devel::Declare, types, ...) without native core support, and then struggle later. First the basics should be supported, then the sugar can be built upon them. We don't have the basics but are already using the sugar, in an inefficient and hackish way. perl5 is IMHO not modern enough yet. We are using classes without efficient class and method support, we allow types but do not use them, and we call a feature ithreads which has nothing to do with threads. Marc Lehmann called it an inefficient fork in his strong "Why so-called Perl threads should die" rant. Ops and data are copied on thread init, not shared; there is no copy-on-grow, and there are no thread-safe data structures and library functions. No efficient SMP and modern IPC support.

"Modern" would oppose the current policy to allow everything outside core but hardly support anything inside. That's entirely "post-modern". "Modern" would be native green threads in core, native Class::Mop, native types, native argument handling in core. Handling it outside is nice, but post-modern.


Single-file distro

By Reini Urban on November 9, 2011 8:09 PM

No, not Par.

I have a simple script perlall which is deployed as App::perlall, which comes with Makefile.PL, tests and such, but really I only want to rsync this single script to all of my test machines.

I even wrote an initvm command to deploy it automatically to other machines.

Then I came up with this simple autoinstaller to add missing non-core libs.

BEGIN { # autoinstall the non-core modules
  my @m;
  for (qw(App::Rad IO::Tee IO::Scalar Devel::Platform::Info Devel::PatchPerl)) {
    push @m, $_ unless eval "require $_;" }
  if (@m) { # Checked the API back to 1.76_01 (v5.8.4)
    require CPAN; CPAN->import;
    warn "CPAN::Shell->install(qw(@m))\n"; CPAN::Shell->install(@m); }
  $_->import for @m;
}

I'm still working on it, but I have created so far with perlall about 50 perls on about 6 machines with lots of compilers and features and platforms, and I am constantly testing with these.

https://github.com/rurban/App-perlall/

The App::Cmd magic is done with App::Rad. This does the command, option and parameter detection from @ARGV, reads a config file for defaults, and runs the command or does the simple error handling. Thanks to Naveed Massjouni who said on Google+: "If you like App::Cmd, also take a look at https://metacpan.org/module/App::Rad" In fact I hated App::Cmd and App::Rad is the best tool for such a single-file script.

I even had to extend the limited option handling, because I need global pre-cmd options and special cmd-specific options.

perlall -v build 5.14.2d-nt --link

build = cmd
-v = global pre-cmd option, --link = cmd-specific options.
5.14.2d-nt argument to build

Plain App::Rad only supported pre-cmd options, but changing the cmd and argument detector was trivial.

All normal functions are available as commands, all underscored functions are internal helpers. The usage error screen is created automatically from :Help(text) attributes.

Take care not to use valid perl statements in that text, because Attribute::Handlers is used, which evals the text. So a usage text of

sub version :Help(print version) { $0." ".$VERSION}

will actually print the version at CHECK time. It does CHECK{eval "print version"}. The fix was

sub version :Help(Print version) { $0." ".$VERSION}

Attribute::Handlers sucks big-time because it is insanely hard to detect attributes generally. Better do your attribute handling with manual FETCH_SCALAR_ATTRIBUTES methods.

