perlcc next steps
cPanel now uses the new perl compiler B::C with -O3 and --staticxs on 5.14.4 in production, and I'm outlining the next steps here.
Our old compiler (with 5.6.2) needed 2:30 hours for each build and produced ~100 binaries of about 30-50MB each. This sounds like a lot, but it is not: it is about the size a single perl script needs in memory, but an uncompiled perl script has to find all its dependent files first, load and parse them, and construct the stashes, the nested hashes of namespaces. None of this is needed at run-time, hence perlcc compiled binaries skip it entirely. The new compiler needs 30 minutes for the same binaries on our build servers, but does not yet use many internal optimizations. It does use a lot of statically initialized strings and data, so the run-time is better than with 5.6, even though the 5.6 run-time is still much faster than the 5.14 run-time. But we still don't collect arrays into _independent_comalloc ranges as we did with 5.6, we still don't use link-time or profile-guided optimizations, and we still do not optimize single modules separately.
There are not many shared libraries in such a compiled binary, just libperl and the XS shared libraries it uses.
Starting such a binary is very fast: it is simply mapped into memory at once. Then the dynamic symbols need to be generated, because B::C cannot yet generate them statically and perl5 is not very helpful in supporting static symbols, but all other data, strings and ops (perl functions) are generated statically. Then the XS modules are initialized (booted), some pointers into shared libraries are updated, since at compile-time they pointed elsewhere, and finally the perl program is started.
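Today a build is essentially one perlcc call per script, something like perlcc -O3 --staticxs -o app app.pl (the options vary per binary), which lets B::C emit the C code and then compiles and links it against libperl and the XS modules found at compile-time.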
perlcc -m
The plan for the next year is to support generating shared modules per perl package. perlcc -m compiles a package to a shared library, buildcc generates the Makefile.depend dependencies for each binary, and then you can selectively use more advanced optimizations per package, i.e. B::CC- or rperl-optimized compilations, which should run 2-20x faster than the current B::C compiled packages.
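Expressed as a workflow this would look roughly like: perlcc -m Some::Package to emit that package's shared library, buildcc to regenerate the Makefile.depend for each binary, and then relinking only the binaries whose dependencies changed. Some::Package is just a placeholder here; the exact interface is still open.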
B::CC is not stable enough yet, so it can only be used with some modules, and it benefits greatly from type information, in which upstream maintainers have no interest. rperl is even stricter: as in real programming languages, compared to our simple dynamic scripting language, it explicitly disallows certain kinds of dynamic magic and array autovivification, which slow down run-time considerably, as benchmarked in my YAPC::EU 2013 talk.
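As a small illustration of the dynamic behaviour that rperl forbids and that a compiler otherwise has to assume everywhere, here is a pure-perl example of hash autovivification (the key names are made up):

    use strict; use warnings;
    my %h;
    # merely *reading* a nested slot autovivifies the intermediate hashref,
    # so the compiler can never assume the shape of %h stays fixed
    if (exists $h{config}{debug}) { }
    print exists $h{config} ? "autovivified\n" : "untouched\n";   # prints "autovivified"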
So the plan is to compile several optimized libraries separately and link them together later. For binaries which fork or which need to be updated fairly often, shared libraries are preferred; for other simple binaries, static libraries are preferred as they load much faster. Shared libraries can be updated much more easily, and when you fork a binary its shared libraries are shared, so there is no separate copy in memory. This is also one of the main reasons to change our current perl5 unicode implementation from simple and slow perl data structures to compiled shared libraries, which are just mapped in when needed and can be shared.
Data::Compile
But first some other steps are needed. Before we create compiled libraries per perl package we want to compile only some read-only data structures, mainly hashes. The name of this module is Data::Compile and it will replace our current cdb databases for localization data. cdb maps read-only strings to strings, but cannot store values other than strings and is implemented as a normal hashmap. Our localization data would be much easier to use with stored coderefs, similar to pre-compiled html templates, which are a mix of strings and coderefs. By using certain parts of B::C it is trivial to store data as a shared library, which can be dynaloaded on demand. Loading such a compiled data structure of perl data is of course much faster than loading a btree or colliding hashmap and accessing it at run-time through some magic tie interface. B::C is very efficient at storing perl data statically, and only some GV symbols need to be initialized at boot-time.
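Data::Compile does not exist yet, so the following is only a sketch of how I imagine the interface; the function names and paths are invented for illustration:

    use strict; use warnings;
    use Data::Compile;   # hypothetical module -- names are assumptions, not real code yet

    my %l10n = (
        greeting => sub { "Hello, $_[0]!" },   # coderefs can be stored too
        locale   => 'en_US',
    );
    # freeze the read-only hash into a shared library, once, at build time
    Data::Compile::compile(\%l10n, 'auto/My/L10n/L10n.so');

    # at run-time: dynaload the compiled data instead of reading a cdb file
    my $data = Data::Compile::load('My::L10n');
    print $data->{greeting}->('World'), "\n";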
Data::Compile will also replace the overly slow serializers, like Sereal, Cpanel::JSON::XS, Data::MessagePack or Storable, which have to convert binary data from and to perl data. Data::Compile freezes perl data natively and just thaws it instantly. So loading the CPAN Metadata file will go from 16 secs to 0.001 ms, similar to CPAN::SQLite, and cpanm's run-time queries to some remote server will not be needed anymore. Updating such a compiled CPAN Metadata file is estimated to need 2 secs, about the same time as updating CPAN with CPAN::SQLite. And CPAN::SQLite still has a locking bug which prevents multiple cpan sessions, so for now you are better off with cpanm.
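For contrast, here is a minimal sketch of the round trip those serializers do today; Storable's nstore/retrieve are real, the data and file name are just examples:

    use strict; use warnings;
    use Storable qw(nstore retrieve);

    my %cpan_metadata = ( 'Some::Module' => '1.23' );   # toy stand-in for the real index

    # at build/update time: walk the whole perl structure and serialize it
    nstore(\%cpan_metadata, '02packages.stor');

    # at every program start: read the file and rebuild every SV from scratch,
    # which is the O(data) cost Data::Compile wants to replace with a plain mmap
    my $meta = retrieve('02packages.stor');
    print $meta->{'Some::Module'}, "\n";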
In order to optimize the loading of static data, read-only hashes, to replace cdb or sqlite databases or serialized caches, we need another module first:
Perfect::Hash
Currently there is only gperf to create perfect hashes in C or C++, then there is the lesser known perfect executable by Bob Jenkins to create perfect hashes (with some weird header files), and of course the cmph library to create perfect hashes of bigger, Google-sized dictionaries. No single database or data structure can be faster to look up than perfect hashes for read-only data; the fastest single-host read-only databases are cdb and mysql, and perfect hashes beat them by far. Ever wondered why Google lookups are so fast? Well, they distribute hashes across several servers with so-called consistent hashes, which map strings into buckets of n servers, and when they insert and delete entries the remapping (copying to other buckets) is minimized. But the real trick are minimal perfect hashes, which minimize lookup times and storage sizes far beyond normal or sparse hashes or b-trees.
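To make the lookup side concrete, here is a deliberately naive pure-perl sketch: it brute-forces a seed so that one hash maps n keys to n distinct slots, i.e. a minimal perfect hash. The hash function and key set are made up for illustration; gperf, cmph and phash use much smarter, scalable constructions, but the lookup cost is the same idea: one hash, one slot, one final string compare.

    use strict; use warnings;

    my @keys = qw(alpha beta gamma delta epsilon);
    my $n    = @keys;

    sub H {                        # small seeded string hash, demo quality only
        my ($seed, $str) = @_;
        my $h = 0x811c9dc5 ^ $seed;
        $h = (($h ^ $_) * 0x9e37) & 0xffffffff for unpack 'C*', $str;
        return $h;
    }

    my $seed;
    for my $s (0 .. 100_000) {     # brute force: fine for a toy, useless at scale
        my %seen;
        $seen{ H($s, $_) % $n }++ for @keys;
        if (scalar(keys %seen) == $n) { $seed = $s; last }
    }
    die "no seed found" unless defined $seed;

    my @slot;                      # the static table a compiler could emit as C data
    $slot[ H($seed, $_) % $n ] = $_ for @keys;

    my $key = 'gamma';
    my $i   = H($seed, $key) % $n;                   # O(1), collision-free lookup
    print "$key -> slot $i\n" if $slot[$i] eq $key;  # final compare rejects non-keys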
So I created phash to replace gperf in the long term, using better algorithms that can handle any data (gperf fails with anagrams or weird keys), to create optimally fast or optimally small C libraries for fast hash lookups, and even to provide backends that output perfect hashes for several popular programming languages, like perl (XS), java, ruby, php or python. As a C library you can only store strings as keys, and integers and strings as values. With the other backends you can store all supported values. Perfect hashes look different on small 8-bit machines than on fast x86_64 machines, for static libraries or for shared libraries with -fPIC, for <1000 keys or for >1.000.000 keys, for keys with NUL characters or without, for 7-bit-only keys, for unicode keys, for case-insensitive key lookup, and much more. Using a high-level language (in this case perl) to analyze the keys and generate a good perfect hash is much easier than fixing and enhancing gperf.
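From the current prototype, usage is intended to look roughly like this; treat the option and method names as provisional, they may still change:

    use strict; use warnings;
    use Perfect::Hash;                        # the module behind the phash tool

    my @dict = ('AIX', 'Amiga', 'BeOS', 'Cygwin', 'darwin', 'linux');
    # build a minimal perfect hash over the key set
    my $ph = Perfect::Hash->new(\@dict, '-minimal');

    my $i = $ph->perfecthash('linux');        # index into @dict, O(1)
    print "linux at index $i\n" if defined $i and $dict[$i] eq 'linux';

    # or emit a standalone C library for the same key set
    # $ph->save_c('osnames');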
The other main problem is icu, which is a collection of glorified but only moderately efficient hashmaps of unicode tables, and the even worse perl5 implementation of those unicode tables. Encode does it much better by pre-compiling encoding tables into good shared libraries, but unicode has many more tables than just encodings. parrot currently uses icu with a new gperf-generated workaround for tables missing in icu (the non-existing whitespace class and the missing namealiases which were broken in icu 5.0), and moarvm came up with its own implementation of those tables to work around icu deficiencies.
So far I have re-implemented the best algorithms to create and use perfect hashes in pure perl, optimized them with XS functions, and I am able to create good C code. I am still testing optimal hash functions and strategies. I need only ~2 seconds to create a perfect hash for 100.000 entries in pure perl; with C libraries this goes down to 0.001 seconds, scaling linearly. The main problem is solved. Compiling such a hash with a C compiler and -O3 needs another second.
E.g. these hashes can be used to replace the constant lookup in ExtUtils::Constant. So I'll soon look into separating my current ~10 different implementations of perfect hashes into one simple but good enough pure-perl version, which will generate pure C code without the need for any external library, and then separate the others into several packages with external dependencies, like zlib for a fast hardware-assisted crc32 hash function (1 op per word), or libcmph and the various output formatters.
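For the zlib-based hash function mentioned above, the binding already ships with perl; a minimal sketch (the table size is arbitrary, and whether crc32 is actually hardware-assisted depends on how zlib was built):

    use strict; use warnings;
    use Compress::Raw::Zlib ();              # core module, exposes zlib's crc32

    my $h      = Compress::Raw::Zlib::crc32('Some::Key');
    my $bucket = $h % 1024;                  # 1024 slots: an arbitrary example size
    print "Some::Key hashes to bucket $bucket\n";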
Then Data::Compile will use Perfect::Hash::XS to store the read-only hashes. And then the source code will change to use those hashes instead, and perlcc -m will be able to compile and link arbitrary perl packages with various type annotations and various compilers, be it B::C -O0 or -O3, B::CC -O0 or -O2, or even rperl.
The next plan is then to create a type inferencer in C, but I'll wait with this until p5p comes to a syntax decision on how to annotate return types of functions. If ever.