Hailo: A Perl rewrite of MegaHAL

Hinrik and I have been writing a Perl replacement for the well-known MegaHAL conversation simulator. The result is already on the CPAN as Hailo, and the source code is available on GitHub.

MegaHAL has numerous problems that we sought to solve:

  1. It keeps its entire brain in memory. For our use case, a chatbot on a small IRC channel whose logs had reached around 200,000 lines, that meant it was taking up around 600 MB of resident memory.
  2. Its tokenizer is implemented with C's ctype.h functions, which process input byte by byte. It handles non-ASCII input really badly, especially since it internally normalizes tokens by upper-casing them with toupper() before storing them.
  3. Its brain is limited to 2^16 words, or whatever the range of a short happens to be on your system.
  4. It would regularly corrupt its entire brain, especially as it got larger, necessitating a complete reload. I never found out why, but I think it has something to do with it rarely checking the return values of functions like malloc().
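
To make problems #2 and #3 concrete, here is a minimal Python sketch. It is not MegaHAL's actual C code: `bytewise_upper` is a hypothetical helper that mimics what toupper() does in the default "C" locale, and the counter assumes a 16-bit unsigned short, which is common but not guaranteed.

```python
import ctypes


def bytewise_upper(text):
    """Upper-case UTF-8 text one byte at a time, touching only ASCII a-z.

    This mimics toupper() in the "C" locale: bytes >= 128 pass through
    unchanged, so multi-byte characters are never case-mapped.
    """
    raw = text.encode("utf-8")
    return bytes(b - 32 if 97 <= b <= 122 else b for b in raw).decode("utf-8")


name = "ævar"
print(bytewise_upper(name))  # æVAR: the two-byte "æ" stays lower case
print(name.upper())          # ÆVAR: Unicode-aware case mapping

# A 16-bit unsigned counter wraps around once it passes 65535 (2^16 - 1).
word_id = ctypes.c_uint16(65535)
word_id.value += 1
print(word_id.value)         # 0
```

With byte-wise folding, "Ævar" and "ævar" end up as two different tokens, and in other locales toupper() may even rewrite individual bytes of a multi-byte character.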

I managed to hack around #2 and #3, but that left the major issues of memory use and stability unresolved. So Hinrik and I started writing a replacement, now called Hailo (HAL + failo, see this), which:

  • Is a pluggable Moose-based Markov engine written in pure Perl.
  • Has pluggable tokenizer, engine and storage backends. The default is to split the input into words and store it in SQLite, but it also has an in-memory engine and an alternative tokenizer which makes it easy to do things like generate Web 2.0 company names (I've already done so).
  • Hovers at around 45 MB of resident memory where MegaHAL would use around 600 MB. Almost all of that memory is used by Moose and the other dependencies, which we used liberally.
  • Is much faster than MegaHAL was: we're able to generate around 200 replies per second on a database made up of around 200,000 IRC lines.
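
The general shape of a Markov engine with swappable tokenizer and storage parts can be sketched in a few lines. The following is a Python illustration of the idea only, not Hailo's Perl code or its API: every name in it (`word_tokenizer`, `char_tokenizer`, `SqliteStorage`, `MarkovEngine`) is made up for this example, and it generates text from scratch rather than replying to an input line.

```python
import random
import re
import sqlite3

START, END, SEP = "<s>", "</s>", "\x1f"  # sentinels and context separator


def word_tokenizer(text):
    """Split input into word tokens."""
    return re.findall(r"\w+", text)


def char_tokenizer(text):
    """Split input into single characters, handy for inventing names."""
    return list(text)


class SqliteStorage:
    """Keeps (context, token, seen) rows in SQLite; pass a filename to persist."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS chain ("
            "context TEXT, token TEXT, seen INTEGER, "
            "PRIMARY KEY (context, token))"
        )

    def add(self, context, token):
        # Create the row if needed, then bump its counter.
        self.db.execute(
            "INSERT OR IGNORE INTO chain VALUES (?, ?, 0)", (context, token)
        )
        self.db.execute(
            "UPDATE chain SET seen = seen + 1 WHERE context = ? AND token = ?",
            (context, token),
        )

    def commit(self):
        self.db.commit()

    def followers(self, context):
        """Return [(token, seen), ...] for everything observed after context."""
        return self.db.execute(
            "SELECT token, seen FROM chain WHERE context = ? ORDER BY token",
            (context,),
        ).fetchall()


class MarkovEngine:
    """Order-N Markov chain; tokenizer and storage are injected."""

    def __init__(self, tokenizer, storage, order=2, rng=None):
        self.tokenizer = tokenizer
        self.storage = storage
        self.order = order
        self.rng = rng or random.Random()

    def learn(self, text):
        tokens = [START] * self.order + self.tokenizer(text) + [END]
        for i in range(self.order, len(tokens)):
            context = SEP.join(tokens[i - self.order:i])
            self.storage.add(context, tokens[i])
        self.storage.commit()

    def generate(self, joiner=" ", max_tokens=50):
        context, out = [START] * self.order, []
        while len(out) < max_tokens:
            rows = self.storage.followers(SEP.join(context))
            if not rows:
                break
            tokens, weights = zip(*rows)
            # Pick the next token in proportion to how often it was seen.
            token = self.rng.choices(tokens, weights=weights)[0]
            if token == END:
                break
            out.append(token)
            context = context[1:] + [token]
        return joiner.join(out)
```

Swapping `word_tokenizer` for `char_tokenizer` and calling `generate(joiner="")` turns the same engine into a name generator, which is the kind of reuse the pluggable design is meant to allow. Because the chain lives in SQLite, the process only holds the rows it is currently looking at.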

If you're interested then you can:

Lastly, I'd like to highly recommend Moose. This is the first significant thing we've written in Moose, and it made the whole process at least 5x easier than it otherwise would have been. It's really nice when your program has a command-line interface automatically generated from your class definition and you avoid the tedium of manual OO-management.

The downside is that most of the 50 MB memory usage can be attributed to Moose and related modules, and a cold start of the module can take up to 1 second, but the ease of maintenance is well worth it for this sort of program, which is meant to be long-running.
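
For readers who haven't seen a command-line interface derived from a class definition, the idea can be illustrated outside of Perl as well. The sketch below uses Python's dataclasses and argparse as a stand-in for what Moose tooling gives you; `BotOptions`, its fields, `parser_for` and `parse` are all hypothetical and do not reflect Hailo's real options.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class BotOptions:
    """Hypothetical chatbot options; the field names are illustrative only."""

    brain: str = "brain.sqlite"  # where the chain is stored
    order: int = 2               # Markov order
    replies: int = 1             # how many replies to print


def parser_for(cls):
    """Build an argparse parser with one --option per dataclass field."""
    parser = argparse.ArgumentParser(description=cls.__doc__)
    for field in fields(cls):
        parser.add_argument(
            f"--{field.name}", type=field.type, default=field.default
        )
    return parser


def parse(cls, argv):
    """Parse argv into an instance of cls."""
    return cls(**vars(parser_for(cls).parse_args(argv)))


options = parse(BotOptions, ["--order", "3"])
print(options)
```

Adding a new attribute to the class is all it takes to get a new command-line switch, which is the maintenance win described above.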


Any chance of s/Moose/Mouse/g working for this?


About Ævar Arnfjörð Bjarmason

Blogging about anything Perl-related I get up to.