Adventures in Debugging C/XS

... or Why A Good Perl Developer Is Not Automatically A Good C Developer, the Story of C Programming via Google.

My tests failed, but only sometimes. I was building an XS module to interface with a C wrapper around a C++ library (wrapper unnecessary? probably). make test was failing with exit code 11. Some quick searching revealed that I had an intermittent segfault. Calling a function as_xml would fail with a SEGV in strlen(). This only happened in perl, after as_xml returned, when perl was making an SV out of the return value. It also happened mainly during make test: running prove myself would succeed 19 times out of 20, whereas make test would fail 19 times out of 20. Worse, my C test program would never fail at all.

I changed everything I could think of: Using a debugging perl and keeping debug symbols in my C library and XS module made the failures happen less frequently (making debugging ever more frustrating). perlbrew was a big help here, letting me switch between different versions of perl, debugging perl, threaded perl, and combinations thereof.

After playing with GDB (once again succeeding 19 times out of 20), I gave up and searched the web. I found the same mailing list thread multiple times, and read it multiple times, not coming up with anything that was relevant to my situation.

Until I read the thread again after another frustrating half-hour with GDB: I had forgotten to put a prototype in the .h file. Without a prototype, the compiler assumes the function returns an int, so the char* pointer being returned was treated as an int. On my 64-bit system, where a pointer is 64 bits wide and an int only 32, that truncates the address and causes segfaults. The compiler was warning me of this, "warning: initialization makes pointer from integer without a cast", but I didn't understand the warning (and the web was not helpful on that one).

Adding the function prototype to the C wrapper header and recompiling fixed the problem.
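
The truncation is easy to picture with plain integer arithmetic. This is only a rough sketch, written in Perl rather than C and assuming a perl built with 64-bit integers; the address is made up for illustration.

```perl
use strict;
use warnings;

# A made-up 64-bit address, standing in for the char* the C function returned.
# Built with a shift so the literal stays portable.
my $pointer = ( 0x7fff << 32 ) | 0x5e3a1c20;

# Without a prototype, the compiler assumes the function returns int,
# so only the low 32 bits of the address survive.
my $as_int = $pointer & 0xffffffff;

printf "returned: 0x%016x\n", $pointer;
printf "received: 0x%016x\n", $as_int;
print  "same address? ", ( $pointer == $as_int ? "yes" : "no" ), "\n";
```

Whatever lives at the truncated address is what strlen() ends up walking, which would explain a crash that only shows up some of the time.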

And that is why I need to learn a lot more about C. Function prototypes? Header files? Why are those necessary (I'm asking rhetorically, of course)? Heap and stack? Might as well be herp and derp.

Lesson reinforced: Depth of knowledge does not equal breadth of knowledge.

Also, having IRC at work might have saved me a few hours of hassle.

WebGUI 8 Status Report

A major milestone in WebGUI 8 development was reached this week: A dry-run of the WebGUI 8 upgrade was successfully run against the plainblack.com database. This means the only thing remaining before releasing an alpha 8.0.0 is updating all the custom code on http://plainblack.com and http://webgui.org. As always, plainblack.com and webgui.org will be the first sites running the latest bleeding-edge version of WebGUI (unless one of you wants to beat me to the punch).

This month, I also gave a presentation to Madison.PM about building applications in WebGUI 8, a quick introduction to Assets and an overview of the most important changes to how they work. The slides are available at http://preaction.github.com/ and the code samples are linked at the end.

On an unrelated topic, I really enjoyed using S5 to build my slides, SHJS to highlight the code inside, and Github Pages to host the whole thing. I plan on doing the same for all my presentations: They look good, they are readable and editable without a special program, anyone can fork and update them, and they're served by a nice, fast, free host.

CHI Saves The Day

The Server Is Down

  1. No it isn't, I didn't get paged.
  2. Wait a minute, why didn't I get paged?
  3. FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU--
  4. CHALLENGE ACCEPTED

Diagnosis

The client reported that the site sometimes took more than a minute to load. It doesn't seem slow to me, and the pager is only primed to ping me if there is sustained downtime (hiccups are not something I want to wake up for every night at 3:00am).

Strangely, load hovered around 7 most of the time, only spiking to 13 every few minutes. With a 16-core processor, this was well within operating parameters, if just a little worrisome. Nothing in the log files.

Oops, now I get a slow page load. Takes 30 seconds to load a page. Refresh again, and the page loads just fine. Clear browser cache, and the page still loads just fine.

top kept MySQL at the top of the CPU list. Not surprising, as this server is the master database server for a two node cluster. So I keep an eye on top as I poll mysql for its process list.

A pattern emerges: The load spikes and server goes unresponsive when this happens:

[Screenshot: MySQL process list, 2011-03-08 10:43 PM]

This table shows 12 different processes trying to update the same cache location (process ID 2-3, 5-8, 10, 12-13, 18, 23, and 26). Because of MyISAM's table-level lock, any request to get from the cache has to wait for all 12 REPLACE INTO requests to complete. They've already taken 1 second. If each replace takes 2 seconds, that's 24 seconds of non-responsive website.

These 12 processes all saw that the cache item had expired and are trying to update it. This is called a "cache stampede". Only one of them needs to update the cache, the rest are just wasting resources. Worse, they're doing all the work to update the cache, which is much more expensive than getting the value from the cache. If it's expensive enough, the site goes down hard.
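
To make the stampede concrete, here is a toy simulation. It is not the site's code: the hash stands in for the cache table, and rebuild_value is a hypothetical placeholder for the expensive work.

```perl
use strict;
use warnings;

my %cache;              # stand-in for the shared cache table
my $recomputes = 0;

# Expensive work that should only need to happen once per expiration
sub rebuild_value {
    $recomputes++;
    return "fresh content";
}

# Twelve requests arrive right after the item expires. Each one checks
# the cache before any of the others has written the new value back.
my @saw_miss = grep { !exists $cache{page} } 1 .. 12;

# Every request that saw the miss rebuilds and writes the same value
$cache{page} = rebuild_value() for @saw_miss;

print "requests that saw a miss: ", scalar @saw_miss, "\n";
print "times the value was rebuilt: $recomputes\n";
```

One rebuild would have been enough. The other eleven are pure overhead, and each of them also queues up a write against the locked table.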

Management

How can we stop the cache stampede? One way is to mildly randomize the actual expiration date when checking if the cache is expired:

sub is_expired {
    my ( $self, $key ) = @_;
    my $expires = $self->get_expires( $key );
    # Randomize the expiration by up to 5% +/-
    # by first removing 5% and then adding 0-10%
    $expires = $expires - ( $expires * 0.05 ) + ( $expires * 0.10 * rand );
    # Expired once the current time reaches the randomized expiration
    return time >= $expires;
}

In this very simple case, if you are within 5% of the expiration time, you have a chance to have an expired cache item. The chance grows as time passes, reaching 50% at the actual expiration time, and 100% at 5% past the expiration time.
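
The percentages above can be checked with a small standalone sketch. It makes one assumption that goes beyond the snippet: the variance is applied to the item's lifetime (creation to expiration) rather than to the raw timestamp, so a 1000-second item gets a window of 950 to 1050 seconds. The function names and the $created argument are hypothetical.

```perl
use strict;
use warnings;

# Decide whether an item counts as expired. $roll may be passed in to make
# the result repeatable; otherwise a random number 0 <= $roll < 1 is used.
sub is_expired_with_variance {
    my ( $created, $expires, $now, $roll ) = @_;
    $roll = rand unless defined $roll;
    my $lifetime = $expires - $created;
    # Shift the expiration to somewhere between -5% and +5% of the lifetime
    my $effective = $expires - ( $lifetime * 0.05 ) + ( $lifetime * 0.10 * $roll );
    return $now >= $effective ? 1 : 0;
}

# Chance that a check made at $now reports "expired"
sub expiry_chance {
    my ( $created, $expires, $now ) = @_;
    my $lifetime = $expires - $created;
    my $window_start = $expires - $lifetime * 0.05;
    my $chance = ( $now - $window_start ) / ( $lifetime * 0.10 );
    return $chance < 0 ? 0 : $chance > 1 ? 1 : $chance;
}

# Walk through the window for an item created at 0 that expires at 1000
for my $now ( 940, 950, 975, 1000, 1025, 1050 ) {
    printf "%4d s: %3.0f%%\n", $now, 100 * expiry_chance( 0, 1000, $now );
}
```

Because each process rolls its own random number, the processes stop agreeing on the exact moment the item expires, and the first one to see it as expired can refresh it before most of the others look.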

Rather than add this expiration variance to our custom database cache, I instead opted to move this site over to CHI, which has this protection built-in.

my $cache   = CHI->new( 
                driver              => 'DBI',
                namespace           => 'localhost',
                dbh                 => $dbh,
                expires_variance    => '0.10',
            );

This stops the cache stampede, but we're still hitting the database a lot. Remember we have two web nodes hitting one database node. The fewer database hits we make, the better performance we can get without having to ask for more hardware from the client (which takes time, and forms, and more forms, and meetings, and forms, and more meetings, and probably some forms).

Because this is a distributed system, we need a distributed, synchronized cache. We cannot use memcached, as WebGUI 7.x does not support it (but WebGUI 8 does). So for now we must use the database as our synchronized cache, but what if we put a faster, local cache in front of the slower, synchronized cache?

CHI has an awesome way to do this. Add an l1_cache:

my $cache   = CHI->new(
                driver              => 'DBI',
                namespace           => 'localhost',
                dbh                 => $dbh,
                expires_variance    => '0.10',
                l1_cache            => {
                                        driver      => 'FastMmap',
                                        root_dir    => '/tmp/cache',
                                    },
            );

Now we're using FastMmap to share an in-memory cache between our web processes, and if the L1 cache is expired or missing, we look for content from the DBI cache. If that cache is missing or expired, we have a cache miss and have to recompute the value.
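
The lookup order can be sketched with two plain hashes. This is only an illustration of the idea, not CHI's implementation: expiration is left out, and layered_get is a hypothetical name.

```perl
use strict;
use warnings;

my %l1 = ();                               # fast, per-server cache
my %l2 = ( greeting => 'hello from L2' );  # slower, shared cache
my $recomputes = 0;

sub layered_get {
    my ( $key, $compute ) = @_;

    # 1. Fast local cache
    return $l1{$key} if exists $l1{$key};

    # 2. Shared cache; copy hits forward so the next lookup stays local
    if ( exists $l2{$key} ) {
        return $l1{$key} = $l2{$key};
    }

    # 3. Miss everywhere: recompute and fill both layers
    $recomputes++;
    return $l1{$key} = $l2{$key} = $compute->();
}

print layered_get( 'greeting', sub { 'recomputed' } ), "\n";  # found in L2
print layered_get( 'greeting', sub { 'recomputed' } ), "\n";  # now in L1
print layered_get( 'missing',  sub { 'recomputed' } ), "\n";  # computed
```

Only the first lookup of a key on each web node has to touch the shared cache, which is where the savings in database hits would come from.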

Hurdles

I had to install the DB tables myself, which was not difficult, just undocumented (bug report filed). MySQL only allows a 1000-byte key, and CHI::Driver::DBI tries to create a 600-character key. This is fine in the Latin-1 charset, where each character takes one byte, but MySQL complains if you're using UTF-8 by default, because it reserves up to three bytes per character and the key grows to 1800 bytes.
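
The arithmetic behind that complaint, as a quick sketch. The 1000-byte limit is the one mentioned above, and three bytes per character is what MySQL's utf8 charset reserves for indexed columns; key_bytes is a hypothetical helper.

```perl
use strict;
use warnings;

my $max_key_bytes  = 1000;   # the index limit MySQL enforced here
my %bytes_per_char = ( latin1 => 1, utf8 => 3 );

sub key_bytes {
    my ( $chars, $charset ) = @_;
    return $chars * $bytes_per_char{$charset};
}

for my $case ( [ 600, 'latin1' ], [ 600, 'utf8' ], [ 255, 'utf8' ] ) {
    my ( $chars, $charset ) = @$case;
    my $bytes = key_bytes( $chars, $charset );
    printf "VARCHAR(%d) in %-6s needs %4d bytes: %s\n",
        $chars, $charset, $bytes,
        ( $bytes <= $max_key_bytes ? 'fits' : 'too long' );
}
```

A 255-character key stays under the limit even in UTF-8, which is why the tables below use VARCHAR(255).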

The driver also tries to create a TEXT field to hold the cache value, but MySQL expects a text field to hold characters in a known character set. After noticing that my cache values were empty, I changed to a LONGBLOB.

The full create table statements are below:

-- primary cache table: chi_<namespace> --
CREATE TABLE IF NOT EXISTS `chi_localhost` (
    `key` VARCHAR(255),
    `value` LONGBLOB,
    PRIMARY KEY ( `key` )
);

-- CHI metacache table --
CREATE TABLE IF NOT EXISTS `chi__CHI_METACACHE` (
    `key` VARCHAR(255),
    `value` LONGBLOB,
    PRIMARY KEY ( `key` )
);

Results

[Screenshot: results, 2011-03-08 10:20 PM]

The server is stable again! Spikes do not turn into out-of-control loads and an unresponsive server. We'll see how things go tomorrow during normal business hours (the peak time for this site), but right now it looks like CHI has saved the day!

What's New in WebGUI 8.0 #5 - Asset Helpers

By far the biggest change we've made in WebGUI 8 is the new Admin Console. Though parts of it may look familiar, it has been completely rewritten from the ground up to be a flexible, extensible, responsive JavaScript application making calls to JSON services in Perl.

I could talk about how to use the admin interface, but I don't think that's why you would read this blog, so instead I'm going to talk about how you can add functionality to it.

Asset Services

Since Assets are the basic unit of both application and content in WebGUI, much of what the Admin Console does is interact with Assets. It does so by calling out to Asset Helpers.

By default, every asset has a helper to Cut, Copy, Duplicate, Delete, and more. When a helper gets called, it returns a JSON data structure explaining to the Admin Console what to do next.

We can simply show the user a message:

message     => "The work is done, here's what happened."
error       => 'Something went wrong.'

Or we can open up new dialogs or tabs to allow the user to give us more data:

openDialog  => '/helper/get_input'
openTab     => '/helper/get_input'

We can let the user know their command is running in a forked process:

forkId      => '...' # GUID for WebGUI::Fork object

We can even load and run any external JS file:

scriptFile  => '/extras/newscript.js',  # Load a new script file
scriptFunc  => 'myFunction',            # Call a function in that script
scriptArgs  => [ "arg1", "arg2", ],     # Pass some arguments to that func
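
For a feel of what travels over the wire, here is a sketch that turns two such responses into JSON with the core JSON::PP module. How WebGUI itself serializes the hashref is an assumption on my part; only the keys come from the descriptions above.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical() sorts keys so the output is stable
my $json = JSON::PP->new->canonical;

# Hypothetical helper results, using the keys described above
my $done   = { message => "The work is done, here's what happened." };
my $script = {
    scriptFile => '/extras/newscript.js',
    scriptFunc => 'myFunction',
    scriptArgs => [ 'arg1', 'arg2' ],
};

print $json->encode($done),   "\n";
print $json->encode($script), "\n";
```

The Admin Console only has to look at which keys are present to decide what to do next.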

To write an Asset Helper, we inherit from WebGUI::AssetHelper and override the process() method to send back one of the message types from above.

package MyHelper;
use base 'WebGUI::AssetHelper';

sub process {
    my ( $self ) = @_;

    return { error => 'Cry Havoc!' } if !$self->asset->canEdit;

    # Do some work

    return { message => 'Work is done!' };
}

If our Asset Helper needs to get some input from the user, we can open a dialog. Like most everything in WebGUI, Asset Helpers can also have www_ methods.

package MyFormHelper;
use base 'WebGUI::AssetHelper';

sub process {
    my ( $self ) = @_;
    my $url = $self->getUrl( "showForm" );
    return { openDialog => $url };
}

sub www_showForm {
    my ( $self ) = @_;
    my $form    = $self->getForm( 'processForm' ); # WebGUI::FormBuilder
    $form->addField( "text", name => 'why' );
    return $form->toHtml;
}

sub www_processForm {
    my ( $self ) = @_;
    my $input = $self->session->form->get( 'why' ); # input from the form
    return { message => $input }; # Why not?
}

But our asset helpers are not only useful inside of the Admin Console. Because they're all built on a simple JSON API, you can call them from anywhere. For example, the Asset Helper to resize and rotate images could be used by anyone with edit privileges to the Image.

Because we already have these Asset Helpers, the new Asset Manager (now called the Tree view) uses them to perform all of its tasks. This means, again, more code reuse and less code in WebGUI.

Side note: I love deleting code much more than writing it.

Adding Helpers

What would a plugin point be without a way to override what already exists? In our case, if you want another helper to handle the "cut" operation, you can make it happen.

If you have your own asset, you can override the getHelpers method, which returns a hashref of helper descriptions:

package MyAsset;

around getHelpers => sub {
    my ( $orig, $self ) = @_;
    my $helpers = $self->$orig;
    $helpers->{ "cut" } = {
        className   => 'MyCutHelper',
        label       => 'SuperCuts',
    };
    return $helpers;
};

Or if you don't want to edit the asset's code, you could add your helpers to the configuration file:

{
    "assets" : {
        "WebGUI::Asset::Snippet" : {
            "helpers" : {
                "cut" : {
                    "className" : "MyCutHelper",
                    "label"     : "SuperCuts"
                }
            }
        }
    }
}
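
Either way, the effect is that an entry with the same key replaces the default. A rough sketch of that override, assuming the helpers end up in one hash keyed by helper ID; how WebGUI actually combines the two sources is not shown here, and the default class names are made up.

```perl
use strict;
use warnings;

# Made-up defaults, standing in for what getHelpers might return
my %defaults = (
    cut  => { className => 'DefaultCutHelper',  label => 'Cut' },
    copy => { className => 'DefaultCopyHelper', label => 'Copy' },
);

# The override from the examples above
my %from_config = (
    cut => { className => 'MyCutHelper', label => 'SuperCuts' },
);

# Later entries win, so the configured "cut" replaces the default one
my %helpers = ( %defaults, %from_config );

for my $id ( sort keys %helpers ) {
    printf "%-5s -> %s (%s)\n", $id, $helpers{$id}{className}, $helpers{$id}{label};
}
```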

Side Note: Deep data-structure is deep.

A Helper doesn't have to be its own class. It could be any URL at all:

$helpers->{ "edit" } = {
    url     => './edit',
    label   => 'Edit',
};

So Asset Helpers are the new way to add related tasks to your assets. Come back next time when I introduce WebGUI::FormBuilder.

What's New in WebGUI 8.0 #4 -- CHI Cache

Caching is a tricky business. Having just one kind of cache won't work, because the production environment will greatly determine the most efficient caching system. A distributed production environment would be best-served with a distributed cache. A smaller, single-server environment could use a simple shared memory cache.

Enter Jonathan Swartz's CHI module, the greatest Perl module to provide a unified caching interface. CHI is the DBI of caching: It presents an API, and delegates to CHI::Driver modules to perform the heavy lifting. It provides a layered caching system, allowing you to have a faster, more volatile cache in front of a slower, more persistent cache. It also provides a variable expiration time, preventing a "miss stampede" where all processes try to recompute an expired cache item at the same time.

By integrating CHI cache into WebGUI, we have the ability to provide any caching strategy that CHI can provide. We get Memcached, FastMmap, and DBI drivers (and more drivers can be written).

I wrote a CHI cache driver for WebGUI 7.9 that we've been using on many of our shared hosting servers. The performance increase using FastMmap through CHI over the old Storable+DBI cache module is dramatic: 2-5 times faster with CHI and FastMmap.

Using CHI in WebGUI

The fewer wrappers that WebGUI has around CPAN modules we use, the less code I have to write, and the more features will be available to our users without having to change WebGUI to use them.

To that end, you can write a section of the configuration file that gets passed directly to CHI->new. Some massaging occurs to make sure a DBI cache driver gets the right $dbh, but otherwise you can fully configure CHI directly from the WebGUI config file:

# The new default cache for WebGUI, FastMmap
{
    "cache" : {
        "driver" : "FastMmap",
        "root_dir" : "/tmp/WebGUICache",
        "expires_variance" : 0.5
    }
}

# Set up a memcached cache with local memory in front
{
    "cache" : {
        "driver" : "Memcached::libmemcached",
        "servers" : [ "10.0.0.100:11211", "10.0.0.110:11211" ],
        "l1_cache" : {
            "driver" : "Memory"
        }
    }
}
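
The "massaging" can be pictured like this. It is a sketch under assumptions, not WebGUI's code: chi_args is a hypothetical helper, the config is inlined instead of read from a file, and a string stands in for the database handle.

```perl
use strict;
use warnings;
use JSON::PP;

my $config = decode_json(<<'JSON');
{
    "cache" : {
        "driver" : "DBI",
        "namespace" : "localhost",
        "expires_variance" : 0.5
    }
}
JSON

# Hypothetical helper: copy the "cache" section and add what JSON cannot hold
sub chi_args {
    my ( $config, $dbh ) = @_;
    my %args = %{ $config->{cache} };
    # A database handle cannot live in a config file, so inject it here
    $args{dbh} = $dbh if $args{driver} eq 'DBI';
    return %args;
}

my %args = chi_args( $config, 'placeholder for a real $dbh' );
print "$_ => $args{$_}\n" for sort keys %args;

# With CHI installed, the result would be handed over as: CHI->new( %args )
```

Everything else in the section passes through untouched, so any option CHI understands can be set from the config file.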

When you want to use the cache in your code, you can get a CHI object with $session->cache. CHI's interface is sufficiently simple, with some fun tricks:

my $cache = $session->cache; # as read
my $value = $cache->get('cache_key');
# Check defined-ness so a cached 0 or empty string is not treated as a miss
if ( !defined $value ) {
    $value = compute_value();
    $cache->set( 'cache_key', $value );
}

# Combine get and set with intelligence
$value = $cache->compute( 'cache_key', \&compute_value );
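
To show what compute buys you, here is a tiny stand-in cache. TinyCache is a made-up class for illustration, not CHI; the real thing also deals with expiration and drivers.

```perl
use strict;
use warnings;

package TinyCache;

sub new { return bless { store => {} }, shift }

sub get { my ( $self, $key ) = @_; return $self->{store}{$key} }

sub set { my ( $self, $key, $value ) = @_; return $self->{store}{$key} = $value }

# Return the cached value, or run $code once and remember its result
sub compute {
    my ( $self, $key, $code ) = @_;
    my $value = $self->get($key);
    return $value if defined $value;
    return $self->set( $key, $code->() );
}

package main;

my $calls = 0;
my $cache = TinyCache->new;

my $first  = $cache->compute( 'answer', sub { $calls++; 42 } );
my $second = $cache->compute( 'answer', sub { $calls++; 42 } );

print "value: $second, computed $calls time(s)\n";
```

The second call never runs its code block, because the value is already in the cache.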

Future Plans

With a single unified cache that performs well and layers like CHI, we can take our current stow and scratch APIs and move them to the cache. In the case of stow, we remove a redundant API. In the case of scratch, we remove database hits.

We've also been exploring cache-only sessions: instead of updating the session in the database every time a page is requested, we would update only the cache and flush to the database later (or not at all). The fewer DB calls we make per page, the better performance will be.

Special thanks go out to Jonathan Swartz for such a wonderful solution.

Stay tuned for next time when I explore our new Admin Interface. Lots of pretty and screenshots!