Results matching “rperl”

Wondering what you can do for Perl?

An interesting idea seems to be floating around the Internet ... creating a basic interactive website on how to contribute to a particular FLOSS project. After some noise on the Perl Mongers groups and Perl Propaganda lists, I decided to help a few folks put together http://whatcanidoforperl.org/ . You may already have read about it on Perl Weekly. If you have suggestions, please create a ticket, or better yet a pull request, here. Cheers!

perlcc next steps

cPanel now uses the new perl compiler B::C with -O3 and --staticxs on perl 5.14.4 in production, and I'm outlining the next steps here.

Our old compiler (with 5.6.2) needed 2:30 hours for each build and produced ~100 binaries of about 30-50MB each. This sounds like a lot, but it is not. It is about the size a single perl script needs in memory, but a normal perl script first has to find all its dependent files, load and parse them, and construct the stashes, the nested hashes of namespaces. None of this is needed at run-time, hence perlcc-compiled binaries do not need it. The new compiler needs 30 minutes for the same binaries on our build servers, but does not yet use many internal optimizations. It does use a lot of statically initialized strings and data, so the run-time is better than with 5.6, even though the 5.6 run-time is still much faster than 5.14's. But we still don't collect arrays into _independent_comalloc ranges as we did with 5.6, we still don't use link-time or profile-guided optimizations, and we still don't optimize individual modules separately.
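
For reference, a production build like the one described above boils down to perlcc invocations of roughly this shape (script and output names are placeholders; -O3 and --staticxs are the flags mentioned above):

    perlcc -O3 --staticxs -o myapp myapp.pl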

There are not many shared libraries used in such a compiled binary, just libperl and the XS shared libraries that are actually used.

Starting such a single binary is very fast: it is mapped into memory at once, then the dynamic symbols are generated (B::C cannot yet generate them statically, and perl5 is not very helpful in supporting static symbols), while all other data, strings and ops (perl functions) are generated statically. Then the XS modules are initialized (booted), some pointers into shared libraries are updated, as they pointed elsewhere at compile-time, and then the perl program is started.

perlcc -m

The plan for the next year is to support generating shared modules per perl package. perlcc -m compiles a package to a shared library, buildcc generates the Makefile.depend dependencies for each binary, and then you can selectively apply more advanced optimizations per package, i.e. B::CC- or rperl-optimized compilations, which should run 2-20x faster than the current B::C-compiled packages.
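
A rough sketch of the planned workflow (buildcc and the -m linking step do not exist yet, so the flags and file names below are illustrative only):

    buildcc myapp.pl                   # planned: write Makefile.depend with the package dependencies of myapp.pl
    perlcc -m My/Hot/Package.pm        # planned: compile one package to a shared library, with B::C, B::CC or rperl
    perlcc -O3 --staticxs -o myapp myapp.pl    # link the main script against the pre-compiled packages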

B::CC is not stable enough yet, so you can only use it with some modules, and it currently benefits greatly from type information, which upstream maintainers have no interest in providing. rperl is even stricter: as in real programming languages, compared to our simple dynamic scripting language, it explicitly disallows certain kinds of dynamic magic and array autovivification, which slow down run-time considerably, as benchmarked in my YAPC::EU 2013 talk.

So the plan is to compile several optimized libraries separately and link them together later. For binaries which fork or which need to be updated fairly often, shared libraries are preferred; for other simple binaries static libraries are preferred, as they load much faster. Shared libraries can be updated much more easily, and when you fork a binary its shared libraries are shared, so there is no separate copy in memory. This is also one of the main reasons to change our current perl5 unicode implementation from simple and slow perl data structures to compiled shared libraries, which are just mapped in when needed and can be shared.

Data::Compile

But first some other steps are needed. Before we create compiled libraries per perl package we want to compile only some read-only data structures, mainly hashes. The name of this module is Data::Compile, and it will replace our current cdb databases for localization data. cdb maps read-only strings to strings, but cannot store values other than strings and is implemented as a normal hashmap. Our localization data would be much easier to use with stored coderefs, similar to pre-compiled HTML templates, which are a mix of strings and coderefs. By using certain parts of B::C it is trivial to store data as a shared library, which can be dynaloaded on demand. Loading such a compiled data structure of perl data is of course much faster than loading a btree or colliding hashmap and accessing it at run-time through some magic tie interface. B::C is very efficient at storing perl data statically, and only some GV symbols need to be initialized at boot-time.
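
To make the intent concrete, here is a minimal sketch of how such a compiled data structure could be used; Data::Compile is not written yet, so the function names below are assumptions, not a documented API:

    # Hypothetical API, names are placeholders:
    use Data::Compile;
    my %l10n = ( greeting => 'Hello', farewell => 'Goodbye' );
    Data::Compile::compile( \%l10n, 'l10n.so' );    # freeze the hash natively into a shared library

    # later, in the running program:
    my $hash = Data::Compile::load('l10n.so');      # dynaloaded on demand, no parsing, no tie magic
    print $hash->{greeting};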

Data::Compile will also replace the overly slow serializers, like Sereal, Cpanel::JSON::XS, Data::MessagePack or Storable, which have to convert binary data from and to perl data. Data::Compile freezes perl data natively and just thaws it instantly. So loading the CPAN metadata file will go from 16 secs to 0.001 ms, similar to CPAN::SQLite, and cpanm, which runs those queries against some server at run-time, will not be needed anymore. Updating such a compiled CPAN metadata file is estimated to take 2 secs, about the same time as updating CPAN with CPAN::SQLite. And CPAN::SQLite still has a locking bug which prevents multiple cpan sessions, so you are better off with cpanm for now.

In order to optimize the loading of static data (read-only hashes) and to replace cdb or sqlite databases or serialized caches, we need another module first:

Perfect::Hash

Currently there is only gperf to create perfect hashes in C or C++; then there is also the lesser-known Bob Jenkins perfect executable to create perfect hashes (with some weird header files), and of course the cmph library to create perfect hashes of bigger, Google-sized dictionaries. No single database or data structure can be faster to look up than perfect hashes for read-only data; the fastest single-host read-only databases are cdb and mysql, and perfect hashes beat them by far. Ever wondered why Google lookups are so fast? Well, they distribute hashes across several servers with so-called consistent hashes, which map strings into buckets of n servers, and when they insert and delete entries the remapping (copying to other buckets) is minimized. But the real trick are minimal perfect hashes, which minimize lookup times and storage sizes far beyond normal or sparse hashes or b-trees.

So I created phash to replace gperf in the long term, using better algorithms to handle any data (gperf fails to work with anagrams or weird keys), to create optimally fast or optimally small C libraries for fast hash lookups, and even to provide backends that output perfect hashes for several popular programming languages, like perl (XS), java, ruby, php or python. As a C library you can only store strings as keys, and integers and strings as values. With the other backends you can store all supported value types.
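
A small usage sketch of the perl interface (the module is work in progress, so the method names here are an approximation of the current API, not a stable one):

    use Perfect::Hash;
    my @keys = qw(foo bar baz quux);
    my $ph   = Perfect::Hash->new( \@keys, '-minimal' );  # choose a minimal perfect hash strategy
    my $idx  = $ph->perfecthash('bar');                   # collision-free O(1) lookup
    $ph->save_c('l10n_keys');                             # assumed: emit l10n_keys.c/.h as a small C library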

Perfect hashes look different on small 8-bit machines than on fast x86_64 machines, for static libraries or for shared libraries with -fPIC, for <1000 keys or for >1,000,000 keys, for keys with NUL characters or without, for 7-bit-only keys, for unicode keys, for case-insensitive key lookup, and much more. Using a high-level language (in this case perl) to analyze the keys and generate a good perfect hash is much easier than fixing and enhancing gperf.

The other main problem is icu, which is a collection of glorified but only moderately efficient hashmaps of unicode tables, and the even worse perl5 implementation of those unicode tables. Encode does it much better by pre-compiling encoding tables into good shared libraries, but unicode has many more tables than just encodings. parrot currently uses icu with a new gperf-generated workaround for tables missing in icu (the non-existing whitespace class and the missing namealiases, which were broken in icu 5.0), and moarvm came up with its own implementation of those tables to work around icu deficiencies.

So far I have re-implemented the best algorithms to create and use perfect hashes in pure perl, optimized them with XS functions, and I am able to create good C code. I am still testing optimal hash functions and strategies. I need only ~2 seconds to create a perfect hash for 100,000 entries in pure perl; with C libraries this goes down to 0.001 seconds, scaling linearly. The main problem is solved. Compiling such hashes with a C compiler and -O3 needs another second.

E.g. these hashes can be used to replace the constant lookup in ExtUtils::Constant. So I'll soon look into separating my current ~10 different implementations of perfect hashes into one simple but good enough pure-perl version, which will generate pure C code without the need for any external library, and then separating the others into several packages with external dependencies, like zlib for a fast hardware-assisted crc32 hash function (1 op per word), or libcmph and the various output formatters.

Then Data::Compile will use Perfect::Hash::XS to store the read-only hashes. Then source code will change to use those hashes instead, and then perlcc -m will be able to compile and link arbitrary perl packages with various type annotations and various compilers, be it B::C -O0, -O3, B::CC -O0, -O2 or even rperl.

The next plan is then to create a type inferencer in C, but I'll wait with this until p5p comes to a syntax decision on how to annotate return types of functions. If ever.

Thank you for all public comments

In this round, the Grants Committee has received four grant proposals (list).

Since we posted them in public, we have received a number of public comments. At this point, here are the comment counts on the TPF site:

  • 6: Automated generation of DWIM Perl
  • 9: Perl::Lint - Yet Another Static Analyzer for Perl5
  • 12: RPerl Test Suite Upgrade & Module::Compile Integration
  • 10: JERL (Alien::Jerl) Perl5 running in the JVM

This includes the comments from the applicants.

The number of comments is one way to measure feedback, but the quality of the comments is more important. And most comments, if not all, were constructive and of great quality.

Thank you to those who spent time giving feedback.

We will conclude the voting process in one week.

B::C[C] and rperl

With rperl we now have a second serious perl compiler project, which has distinct advantages over B::C and B::CC, but also several disadvantages.

Basically we need to define compilation blocks for rperl (restricted perl) - typed and compile-time optimizable syntax - and combine it with unoptimizable dynamic perl5 syntax, to be able to run ordinary perl5 code, partially optimized.

  • B::C can compile dynamic perl5 syntax to C code and simply run the unoptimized optree through libperl. 10x better startup and destruction times, and 10% memory savings.

  • B::CC does control-flow and simple type optimizations, esp. on stack variables, but has no proper type integration yet, and will never be able to run 100% of perl5 code. 2-5x better run-time. perl5 is too dynamic a language to be properly compilable as you would expect. But perl5 has support to parse and store types, just nobody is using it yet. And p5p does not support optimizations based on the type information; it rather blocks those attempts. So we need a separate compilation to C (reini), C++ (will) or LLVM (goccy).

  • perlcc, the frontend, cannot yet compile .pm packages to single shared libraries and link them together, to be able to use different compile-time optimizations and reduce compile and link times.

  • rperl can compile restricted typed syntax, using C++ libraries to operate on typed data and source-level type information. 5-20x better run-time on 5-10% of your code. This is a much better approach than with B::CC, which only optimizes int and num lexical variables. rperl can also efficiently handle typed, no-magic arrayrefs, hashrefs, function arguments and return types, because it has control over the parser and compiler, which B::CC has not.

The problem is that rperl cannot use the perl AST, the B optree, because it is too tightly bound to the mostly undocumented internal ops and data, which are hard to work with properly from outside perl5. So rperl uses the PPI AST, interestingly called DOCTREE, and does the type optimization and translation to C++ on this doctree via Inline::C++, which is btw. called Inline::CPP, not CXX, and has nothing to do with the C preprocessor.

If you look at rperl/docs/rperl_grammar.txt you'll see the parsed boundaries at which another compiler can interact with rperl. Basically we could use separate compilation units (module files or programs) or single blocks with rperl syntax, and maybe later, as an advanced problem, subroutines; but then you need to pass the type information of the arguments and return values back and forth. This is done via standard XS typemaps.
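
For illustration, the typed calling convention boils down to ordinary typemap entries, which map a C/C++ type to one of the standard XS conversion macros:

    # standard XS typemap syntax
    TYPEMAP
    int      T_IV
    double   T_NV
    char *   T_PV
    AV *     T_AVREF
    HV *     T_HVREF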

  • perlcc and a new script called buildcc should be usable as frontends to link separate compilation units as compiled libraries (shared or static) together, and the different compilers should be able to detect each other and pass the work back and forth.

    It needs to be separate because they use different parsers and different compilers, but agree on the same types and the typed calling convention. This calling convention should be specified at the C/C++ level, and for the user it would be nice if the perl-level types also agreed. The base for these types needs to be the perl6 types, because these are the only ones in existence today.

  • B::C can do int and num and accepts str.

  • rperl also requires object and void, and can already handle aggregate types, like arrayrefs and hashrefs of those types.

  • perl6 adds bool, some more numbers, and intermediate and meta object types, which we don't need yet. What needs to be added are sized array declarations, to be able to omit run-time bounds checks, but this should be trivial.

rperl already offers compile-time or run-time type-checks and compile-time type optimizations.
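
A sketch of what such typed declarations look like; the exact keywords are defined by the rperl grammar (see rperl_grammar.txt) and still evolving, so treat this as an illustration of the flavor, not verbatim rperl code:

    # Illustrative only, not verbatim rperl syntax:
    my integer $count = 42;          # maps to a native C/C++ integer, no SV magic
    my number  $ratio = 3.14;        # maps to a native C/C++ double
    my string  $name  = 'rperl';
    my integer_arrayref $fibs = [ 1, 1, 2, 3, 5, 8 ];   # homogeneous, typed aggregate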

p5p and the mop project have no interest in helping with compile-time optimizations (on type and const information), but there is some minor progress in getting support for compile-time readonly-ness into core. My estimate for this is three years, for a simple project which needs about 2 months of work and actually already exists. It might get in with the addition of proper class, role and method support, but I would not bet on it, as the current p5-mop project ignores those possibilities, and even within perl6 I see no focus on OO performance. I see no single use oo :closed :final or the accompanying use class pragmas yet. But they implemented the basic type optimizations at least.

How to interface?

Inline is not easy to interface with, notoriously hard to debug, and has several unmaintained bugs and omissions. But it's the best and most transparent way, and the only way to mix perl and rperl code at the source level. The biggest bug is the notorious namespace hack, i.e. the stashes for the generated functions and data are missing. Argument passing also has some serious limitations. Will is using a really crazy hack in rperl right now to push arrayref and hashref arguments properly onto the stack. The Inline stack macros are too simple; they only work for trivial examples. Inline::CPP does not properly support passing plain arrays and hashes to and from perl5 code, so rperl now uses just scalar types, esp. arrayrefs and hashrefs. We will need to look into lifting those limitations.
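
The basic Inline::CPP pattern looks like this (scalar arguments only, matching the limitations described above):

    use Inline CPP => <<'END_CPP';
    int add(int a, int b) { return a + b; }
    END_CPP

    print add(2, 3), "\n";    # the generated XS glue binds add() into the calling package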

Methods

Methods and subroutines need to be separated at compile-time for rperl/Inline::CPP. In the old days there was the :method attribute idea, which is not compile-time accessible, only at run-time. And there were Devel::Declare-based hacks to support class and method keywords, but they never made it into core. Nowadays Devel::Declare is not needed anymore, but there is still no flag to denote methods-only or OO semantics as in perl6, to enable OO compile-time optimizations such as method dispatch and method inlining: use oo :closed :final. We cannot even properly declare read-only hashes or @ISA arrays yet, as they still clash with restricted hashes and COW. So they need to be implemented separately, as done in rperl. In rperl, method types are encoded in special type names, which denote the return type and whether it's a method or a normal subroutine.
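
For reference, the run-time-only :method attribute mentioned above looks like this in plain perl5; it can only be queried after compilation, via attributes::get (class name is a placeholder):

    package Animal;
    use attributes ();

    sub speak :method {              # marked as a method, but the flag is only visible at run-time
        my ($self) = @_;
        print ref($self), " speaks\n";
    }

    my @attrs = attributes::get( \&Animal::speak );   # yields ("method")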

Technically the B::C -m switch ("compile module") can already be used for .pm modules alone; it compiles to a library, not to a standalone executable. The problem is automating the separation in the code walker. Each module should only compile itself, not its dependencies, but the parser already included those before the compiler sees the code. The bytecode compiler B::Bytecode already does this separation by calling out to the other module on the require op, and require observes the .pmc extension, so it loads the other bytecode-compiled module. With B::C the walker and loader are different. The split needs to be done on the FILE field in nextstate COPs, CVs and GVs; it cannot be done on the current separation, the GV namespaces.
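
The .pmc hook that B::Bytecode piggybacks on is simply part of require (enabled unless perl was built with -DPERL_DISABLE_PMC); the module name below is a placeholder:

    # For "require My::Module", perl probes @INC for My/Module.pmc before
    # My/Module.pm, so a bytecode-compiled module can transparently shadow
    # its plain-perl source.
    require My::Module;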

The B::C project timeline is to implement -m and its new separation immediately after the 1.43 release, with a target date of fall 2014. buildcc then collects the deps and drives perlcc, and perlcc is used to compile the modules. Work has started in the modul branch of the perl-compiler repo.

-- We, Reini Urban and Will Braswell, wrote most parts of this blog post on Saturday, January 4, 2014 in Austin, at an informal perl11 party celebrating the rperl 1.0 beta, with Matt Trout across the street (it was cold and late), but he didn't show up. Maybe he will add some comments later. He supported the idea of an integration somehow.

Perl 5 Optimizing Compiler, Part 11: RPerl v1.0 Beta Released!

Howdy Perl World,

After many months of effort, I'm proud to release the beta version of RPerl v1.0, now available for public download at Github:

https://github.com/wbraswell/rperl/

As we lead up to the full RPerl v1.0 official release, I'll continue posting regular status updates on Facebook:

https://www.facebook.com/wnbjr

I say this is "beta" because it can only compile "Hello, World" so far! But we have working data-types support, and it is relatively easy fo…

Perl 5 Optimizing Compiler, Part 10: Kickstarter & Performance Benchmarks

Howdy Perl Community,

We've released initial RPerl performance benchmarks, which are very promising indeed!

http://rperl.org/performance_benchmarks.html

Also, our Kickstarter campaign is now live, please consider backing the RPerl v1.0 project, our pledge drive ends on Wednesday 12/4 so don't delay!

http://www.kickstarter.com/projects/wbraswell/perl-5-optimizing-compiler-rperl-v10

Thank you all so mu…
