<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Steffen Mueller</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/" />
    <link rel="self" type="application/atom+xml" href="http://blogs.perl.org/users/steffen_mueller/atom.xml" />
    <id>tag:blogs.perl.org,2009-11-03:/users/steffen_mueller//31</id>
    <updated>2012-11-13T09:18:11Z</updated>
    <subtitle>blogging about Perl just like everyone else.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.38</generator>

<entry>
    <title>Announcement for Sereal, a binary data serialization format, finally live!</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2012/11/announcement-for-sereal-a-binary-data-serialization-format-finally-live.html" />
    <id>tag:blogs.perl.org,2012:/users/steffen_mueller//31.4040</id>

    <published>2012-11-13T09:20:00Z</published>
    <updated>2012-11-13T09:18:11Z</updated>

    <summary>It&apos;s been long in the making, but finally, I&apos;ve gotten the Sereal announcement article into a shape that I felt somewhat comfortable publishing. Designing and implementing Sereal was a true team effort and we really hope to see non-Perl implementations of it in the future. We&apos;re virtually committed to finishing the Java decoder at least for our data-warehousing infrastructure. Any help and cooperation is welcome, as are patches to improve the actual text of the specification (which is kind...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="Booking.com" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="CPAN" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Sereal" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="documentation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl modules" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="article" label="article" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="blog" label="blog" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="bookingcom" label="Booking.com" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perlmodules" label="perl-modules" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sereal" label="Sereal" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p>It's been long in the making, but I've finally gotten the <a href="http://blog.booking.com/sereal-a-binary-data-serialization-format.html" title="The Sereal Article">Sereal announcement article</a> into a shape that I feel somewhat comfortable publishing. Designing and implementing Sereal was a true team effort, and we really hope to see non-Perl implementations of it in the future. We're virtually committed to finishing the Java decoder, at least for our data-warehousing infrastructure. Any help and cooperation is welcome, as are patches to improve the actual text of the specification (which is still something of a weak point).</p>

<p>By the way, for those who worried about the lack of a comment system on the <a href="http://blog.booking.com/" title="Booking.com dev blog">Booking.com dev blog</a> before: we've added Disqus support.</p>

<p>But now, I'm just glad it's out there!</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Booking.com dev blog goes live!</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2012/11/bookingcom-dev-blog-goes-live.html" />
    <id>tag:blogs.perl.org,2012:/users/steffen_mueller//31.4022</id>

    <published>2012-11-05T11:00:00Z</published>
    <updated>2012-11-05T10:14:46Z</updated>

    <summary>I&apos;m proud to echo the announcement that the Booking.com dev blog has just gone live. Quoting the announcement: Booking.com is an online hotel reservations company founded during the hey-days of the dot com era in the 90s. The product offering was initially limited to just the Dutch market. We grew rapidly to expand our offerings to include 240,000+ accommodations in 171 countries used by millions of unique visitors every month - numbers which continue to grow every single day. With...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="Booking.com" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="announcement" label="announcement" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="blog" label="blog" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="bookingcom" label="Booking.com" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p>I'm proud to echo the announcement that <a href="http://blog.booking.com" title="The Booking.com dev blog">the Booking.com dev blog</a> has just gone live. Quoting the announcement:</p>

<blockquote><p>Booking.com is an online hotel reservations company founded during the hey-days of the dot com era in the 90s. The product offering was initially limited to just the Dutch market. We grew rapidly to expand our offerings to include 240,000+ accommodations in 171 countries used by millions of unique visitors every month - numbers which continue to grow every single day. With such growth come interesting problems of scalability, design and localisation which we love solving every day.</p></blockquote>

<p>The blog kicks off with a <a href="http://blog.booking.com/devel-tracksig-the-signal-handling-blues.html" title="The Signal Handling Blues">quick, humble article</a> of mine about a debugging module I published after needing the functionality at work. At a given location in the code, it lets you find out where in the code base the current set of signal handlers was set up. We plan to publish new content regularly and already have a few interesting stories lined up. So stay tuned!</p>
]]>
        

    </content>
</entry>

<entry>
    <title>New Data::Dumper release: 50% faster</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2012/10/new-datadumper-release-50-faster.html" />
    <id>tag:blogs.perl.org,2012:/users/steffen_mueller//31.3918</id>

    <published>2012-10-04T07:35:29Z</published>
    <updated>2012-10-04T11:46:32Z</updated>

    <summary>Data::Dumper version 2.136 was just uploaded to CPAN. It&apos;s been over a year since the latest stable release of the module. Generally, I just synchronize changes to the module from the Perl core to CPAN releases and do so very carefully with lots of development releases. Recently, however, there was a reason to look at Data::Dumper performance critically. A very simple change meant a speed-up of the order of 50% on my test data set. In a nutshell, Data::Dumper used...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="CPAN" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="optimization" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl modules" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="cpan" label="CPAN" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="datadumper" label="Data::Dumper" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="optimization" label="optimization" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perlmodules" label="perl-modules" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="programming" label="programming" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p><code>Data::Dumper</code> version 2.136 was just uploaded to CPAN. It's been over a year since the last stable release of the module. Generally, I just synchronize changes to the module from the Perl core to CPAN releases, and I do so very carefully, with lots of development releases.</p>

<p>Recently, however, there was a reason to look critically at <code>Data::Dumper</code> performance. A very simple change produced a speed-up of the order of 50% on my test data set. In a nutshell, <code>Data::Dumper</code> used to track each and every value in the data structure, just in case you wanted to use the <code>Seen</code> functionality. That applies to a tiny fraction of all <code>Data::Dumper</code> uses, yet everybody was paying for it. For example, if you use the functional interface (as most people do), you never even get access to that information. Even so, every value was being tracked instead of only those with more than one reference to them.</p>

<p>With <code>Data::Dumper</code> 2.136, the functional interface is faster unconditionally. If you use the OO interface, you may be one of the few people who care about the <code>Seen</code> feature, so there you have to opt in to the new optimization by setting the object's <code>Sparseseen</code> option. If you do, the <code>Seen</code> hash will be useless. Alternatively, you can enable the optimization globally by setting <code>$Data::Dumper::Sparseseen = 1</code>.</p>
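
<p>Here is a minimal sketch of opting in through the OO interface. It assumes <code>Data::Dumper</code> 2.136 or later, because older versions don't have the <code>Sparseseen</code> option. The data and variable names are made up for illustration. Only values referenced from more than one place can appear twice in a dump, so the output should be identical with and without the option. Only the internal bookkeeping behind <code>Seen</code> changes.</p>

<pre><code>use strict;
use warnings;
use Data::Dumper 2.136;    # Sparseseen is new in 2.136

# Hypothetical data: one array ref shared by two keys, plus an
# anonymous hash that is referenced only once.
my $shared = [1, 2, 3];
my $data   = { a => $shared, b => $shared, c => { x => 1 } };

# OO interface: opt in to the cheaper, sparse "seen" tracking.
# Setter methods return the object, so they can be chained.
my $sparse = Data::Dumper->new([$data], ['data'])
                         ->Sortkeys(1)
                         ->Sparseseen(1)
                         ->Dump;

# The same dump with the default (full) tracking, for comparison.
my $full = Data::Dumper->new([$data], ['data'])->Sortkeys(1)->Dump;

print $sparse;

# Or enable the optimization globally, before creating objects:
# $Data::Dumper::Sparseseen = 1;
</code></pre>

<p>The shared array is still recognized, because it has more than one reference. The second key is therefore dumped as a reference back to the first.</p>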

<p>The new release also ports several bug fixes from the Perl core. Unfortunately, some of those changes turned out to be incompatible with older versions of Perl. More specifically, one vstring-related change appears to break some vstring tests on 5.8. I don't currently have the time to investigate. If you are affected by this, why don't you step up and help restore full compatibility?</p>

<p>A big thanks to my employer, Booking.com, for letting me spend work time on this optimization.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>The physicist&apos;s way out</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2010/09/the-physicists-way-out.html" />
    <id>tag:blogs.perl.org,2010:/users/steffen_mueller//31.1043</id>

    <published>2010-09-23T13:04:10Z</published>
    <updated>2010-09-23T15:14:45Z</updated>

    <summary>Previously, I wrote about modeling the result of repeated benchmarks. It turns out that this isn&apos;t easy. Different effects are important when you benchmark run times of different magnitudes. The previous example ran for about 0.05 seconds. That&apos;s an eternity for computers. Can a simple model cover the result of such a benchmark as well as that of a run time on the order of microseconds? Is it possible to come up with a simple model for any case at...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="benchmarking" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="dumbbench" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="programming" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="benchmarking" label="benchmarking" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="dumbbench" label="dumbbench" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="programming" label="programming" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="statistics" label="statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p><a href="http://blogs.perl.org/users/steffen_mueller/2010/09/your-benchmarks-suck.html">Previously</a>, I wrote about modeling the result of repeated benchmarks. It turns out that this isn't easy. Different effects matter when you benchmark run times of different magnitudes. The <a href="http://blogs.perl.org/users/steffen_mueller/2010/09/hard-data-for-timing-distributions.html">previous example</a> ran for about 0.05 seconds. That's an eternity for computers. Can a simple model describe the result of such a benchmark as well as that of one with a run time on the order of microseconds? Is it possible to come up with a simple model for any case at all? The typical physicist's way of testing a model for data is to write a simulation. If we can generate fake data sets from the model that look like the original, real data, the model quite likely contains some truth. For reference, here is the real data that I want to reproduce (more or less):</p>

<p><img src="http://steffen-mueller.net/dumbbench/code2_timings.png" alt="slow benchmark" title="" /></p>

<p>So I wrote a toy Monte Carlo (MC) simulation. The code is available from the <a href="http://github.com/tsee/dumbbench">dumbbench github repository</a> in the folder <em>simulator</em>. Recall that the core of the model is the assumption of a normally distributed measurement around the true run time, plus a set of outliers that are biased toward much longer run times. That is what this MC does. For every timing, it draws a random number from a normal distribution around the true value. In a fraction of all cases, it then adds an offset (again with an uncertainty) to make the timing an outlier. With some <a href="http://github.com/tsee/dumbbench/blob/master/simulator/cfg/slow.yml">fine tuning of the parameters</a>, I get as close as this:</p>

<p><img src="http://steffen-mueller.net/dumbbench/slow_mc.png" alt="slow toy MC" title="" /></p>

<p>Yes, I know it's not the same thing. Humans are excellent at telling apart things that aren't exactly equal. But don't give up on me just yet. The picture shows three lines. The black one, mostly covered by the others, is the raw distribution of times in the Monte Carlo. The red curve is the set of timings that the <a href="http://blogs.perl.org/users/steffen_mueller/2010/09/your-benchmarks-suck.html">Dumbbench algorithm</a> accepted for calculating the expectation value. The blue timings were discarded as outliers.</p>
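
<p>For illustration, here's a rough Perl sketch of a generator like the one described above. It is <em>not</em> the code from the dumbbench repository. The Box-Muller transform for drawing Gaussian numbers, the parameter names, and all parameter values are assumptions made up for this example. Keeping the outlier offset positive with <code>abs()</code> is also my own choice, so that outliers always land on the slow side.</p>

<pre><code>use strict;
use warnings;
use constant PI => 4 * atan2(1, 1);

# Draw a normally distributed number via the Box-Muller transform.
sub gaussian {
    my ($mean, $sigma) = @_;
    my $u1 = 1 - rand();    # in (0, 1], so log() is safe
    my $u2 = rand();
    return $mean + $sigma * sqrt(-2 * log($u1)) * cos(2 * PI * $u2);
}

# Hypothetical parameters, loosely inspired by the slow benchmark.
# They are NOT the values from the dumbbench simulator configuration.
my %param = (
    true_time      => 5e-2,   # seconds
    sigma          => 3e-4,   # width of the main distribution
    outlier_frac   => 0.08,   # fraction of timings that become outliers
    outlier_offset => 2e-3,   # typical extra time of an outlier
    outlier_sigma  => 1e-3,   # spread of that extra time
);

sub simulate_timings {
    my ($n, $p) = @_;
    my @timings;
    for (1 .. $n) {
        # Main distribution: Gaussian around the true run time.
        my $t = gaussian($p->{true_time}, $p->{sigma});
        # In a fraction of all cases, add a fuzzy positive offset.
        if (rand() < $p->{outlier_frac}) {
            $t += abs(gaussian($p->{outlier_offset}, $p->{outlier_sigma}));
        }
        push @timings, $t;
    }
    return @timings;
}

srand(42);    # fixed seed for a reproducible toy run
my @timings = simulate_timings(346, \%param);
</code></pre>

<p>Feeding <code>@timings</code> into a histogram gives a Gaussian peak near <code>true_time</code> and a sparse tail toward longer times. With these made-up numbers, the peak is narrower and the tail is different from the plot above.</p>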

<p>By construction, the simulation reproduces quite a few properties fairly well. The main distribution is in the right spot and has about the right width, if a bit narrow. The far outliers have about the same distribution. The one striking difference is that in the real data, the main distribution doesn't really follow a Gaussian; it's skewed. I could try to sample from a different distribution in the simulation, but let's keep the Gaussian for a while, since that's an underlying assumption of the analysis. Here's the output of the simulation:</p>

<pre><code>Detected a required 1 iterations per run.
timings:           346
good timings:      319
outlier timings:   27
true time:         5.e-2
before correction: 5.0005e-02 +/- 3.1e-05 (mean: 0.0506163970895954)
after correction:  4.9973e-02 +/- 2.9e-05 (mean: 0.0499712054670846)
</code></pre>

<p><em>Correction</em> refers to the outlier rejection done by <em>dumbbench</em>. Clearly, it's not a huge deal in this case. Even the uncorrected mean would have been acceptable, since the fraction of outliers is so low. But this was an optimal case: a long benchmark duration, but not so long that I couldn't conveniently accumulate some data. What if I want to benchmark <code>++$i</code> and see whether it's any faster than the post-increment <code>$i++</code>? Let's ignore the comparison for now and just look at the data I get from benchmarking the post-increment. I run <em>dumbbench</em> with 100000 timings, skip the dry-run subtraction, and care about neither the absolute nor the relative precision:</p>

<pre><code>perl -Ilib bin/dumbbench -i 100000 --no-dry-run -a 0 -p 0.99999 --code='$i++' --plot_timings
</code></pre>

<p><img src="http://steffen-mueller.net/dumbbench/fast_bench.png" alt="short benchmark distribution" title="" /></p>

<pre><code>Ran 121550 iterations (21167 outliers).
Rounded run time per iteration: 4.23978e-06 +/- 1.4e-10 (0.0%)
[disregard the errors on this one]
</code></pre>

<p>Whoa! Rats, what's that? This graph shows <em>a lot</em> of extra complications. Most prominently, the time is measured in discrete units. That's not terribly surprising, since the computer's clock has a finite frequency. The hi-res wall-clock timer seems to have a tick of about 30ns on this machine. Also, my computer can certainly increment Perl variables more than a million times per second, so the measured time per iteration (about 4.2 microseconds) is significantly off. This is because <em>dumbbench</em> goes to some extra effort to run the benchmark in a clean environment. There is also the overhead of taking the time before and after running the code. This is why <em>dumbbench</em> normally subtracts a (user-configurable) dry run from the result and propagates the uncertainties for you. On the upside, the main distribution looks (overall) much more Gaussian than in the long-running benchmark. Let's add the discretization effect to our <a href="http://github.com/tsee/dumbbench/blob/master/simulator/cfg/fast.yml">model</a> and try to simulate this data:</p>

<p><img src="http://steffen-mueller.net/dumbbench/fast_mc.png" alt="fast toy MC" title="" /></p>

<p>Considering the simplicity of what I'm putting in, this isn't all that bad! Let's see how well <em>dumbbench</em> can recover the true time:  </p>

<pre><code>Detected a required 32 iterations per run.
timings:           120000
good timings:      92373
outlier timings:   27627
true time:         4.25e-6
before correction: 4.247000e-06 +/- 8.9e-11 (mean: 4.31880168333532e-06)
after correction:  4.24700e-06 +/- 1.0e-10 (mean: 4.25444989336903e-06)
</code></pre>

<p>Again, the correction isn't important. But in this case, that is mostly due to the discretization of the measurement. If there are a lot of measurements at <em>x</em> and at <em>x+1</em> but none in between, then the median can't get any closer. If you look at the mean before and after correction, you can see that the outlier rejection was indeed effective: it significantly reduced the bias of the mean.</p>
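
<p>To reproduce that effect in a toy model like the one above, one simple approach is to snap each simulated duration onto the clock's tick grid. The sketch below assumes that each measurement starts at a uniformly random point within a tick. A true duration of about 141.7 ticks is then observed as either 141 or 142 ticks, and only the average over many measurements recovers the true value. The 30ns tick is the rough estimate from above. The function name is hypothetical, and the linked model configuration may well do this differently.</p>

<pre><code>use strict;
use warnings;
use POSIX qw(floor);

# Assumed clock resolution; the text estimates roughly 30ns.
my $tick = 30e-9;

# Turn a "true" duration into what a ticking clock would report.
# The start of the measurement falls at a uniformly random point
# within a tick, so the observed duration is a whole number of ticks
# whose average over many measurements equals the true duration.
sub discretize {
    my ($t, $res) = @_;
    my $phase = rand();    # offset of the start within a tick, in [0, 1)
    return $res * floor($phase + $t / $res);
}

# A true time of 4.25e-6 s (about 141.7 ticks) is observed as
# either 141 or 142 ticks, never anything in between.
my @observed = map { discretize(4.25e-6, $tick) } 1 .. 10;
printf "%.0f ticks\n", $_ / $tick for @observed;
</code></pre>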

<p>From this little experiment, I deduce that while the simple model is clearly not perfect (remember the skew of the main distribution), it isn't entirely off and works more or less across radically different conditions. Furthermore, using the model to simulate benchmarks with known "true" time, I saw that the analysis produces a good estimate of the input. It's just a toy, but it's served its purpose. I'm more confident in the setup now than before -- even without diving very far into statistics.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Hard data for timing distributions</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2010/09/hard-data-for-timing-distributions.html" />
    <id>tag:blogs.perl.org,2010:/users/steffen_mueller//31.1036</id>

    <published>2010-09-21T13:47:59Z</published>
    <updated>2010-09-23T15:14:05Z</updated>

    <summary>In the previous article, I wrote about the pitfalls of benchmarking and dumbbench, a tool that is meant to make simple benchmarks more robust. The most significant improvement over time ./cmd is that it actually comes with a moderately well motivated model of the time distribution of invoking ./cmd many times. In data analysis, it is very important to know the underlying statistical distribution of your measurement quantity. I assume that most people remember from high school that you can...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="benchmarking" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="dumbbench" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="programming" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="benchmarking" label="benchmarking" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="dumbbench" label="dumbbench" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="programming" label="programming" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="statistics" label="statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p>In the <a href="http://blogs.perl.org/users/steffen_mueller/2010/09/your-benchmarks-suck.html">previous article</a>, I wrote about the pitfalls of benchmarking and <a href="http://search.cpan.org/dist/Dumbbench">dumbbench</a>, a tool that is meant to make simple benchmarks more robust.  </p>

<p>The most significant improvement over <code>time ./cmd</code> is that it actually comes with a moderately well-motivated model of the distribution of run times when invoking <code>./cmd</code> many times. In data analysis, it is very important to know the underlying statistical distribution of the quantity you measure. I assume that most people remember from high school that you can calculate the mean and the standard deviation (<em>"error"</em>) of data and use those two numbers as an estimate of the true value and a statement of the uncertainty. That is a reasonable thing to do if the measured quantity has a normal (or Gaussian) distribution. Fortunately for everybody, normal distributions are very common: if you add up enough independent random contributions, chances are that the result will be almost Gaussian (<a href="http://en.wikipedia.org/wiki/Central_limit_theorem">Central Limit Theorem</a>).</p>
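
<p>As a quick refresher, the sketch below computes the mean, the sample standard deviation, and the uncertainty on the mean. The uncertainty is the standard deviation divided by the square root of the number of measurements. The function name is made up for this example.</p>

<pre><code>use strict;
use warnings;
use List::Util qw(sum);

# Mean, sample standard deviation, and the uncertainty of the mean.
sub mean_and_error {
    my @x = @_;
    my $n = @x;
    die "need at least two measurements\n" if $n < 2;
    my $mean = sum(@x) / $n;
    # Sample variance with the usual N-1 in the denominator.
    my $var  = sum(map { ($_ - $mean) ** 2 } @x) / ($n - 1);
    my $sd   = sqrt($var);
    return ($mean, $sd, $sd / sqrt($n));
}

my ($mean, $sd, $err) = mean_and_error(1, 2, 3, 4, 5);
printf "%.3f +/- %.3f (standard deviation %.3f)\n", $mean, $err, $sd;
</code></pre>

<p>For the numbers 1 to 5, this prints a mean of 3.000 with an uncertainty of 0.707 and a standard deviation of 1.581. The spread of individual measurements stays the same as you take more data, but the uncertainty on the mean shrinks.</p>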

<p>Unfortunately for you, when you run a benchmark, you would like it to produce a good answer <em>and</em> finish before the heat death of the universe. That means the friendly Central Limit Theorem doesn't apply cleanly, and we have to put a little more thought into the matter to extract more information. In the second half of the previous article, I suggested a simple recipe for analyzing benchmark data. It mostly amounted to this: the main distribution of timings is Gaussian, but a fraction of the data, the outliers, have significantly increased run times. If we get rid of those, we can calculate the mean and uncertainty. But I didn't show you actual data from a realistic benchmark run. Let's fix that:</p>

<p>I ran <em>dumbbench</em> as follows:</p>

<pre><code>dumbbench -p 0.001 --code='local $i; $i++ for 1..1e6' --code='local $i; $i++ for 1..1.1e6' --code='local $i; $i++ for 1..1.2e6' --plot_timings
</code></pre>

<p>With <code>-p 0.001</code>, I'm saying that I want an uncertainty of at most 0.1%. It runs three benchmarks: code1, code2, and code3. They're all the same except that code2 runs 10% more iterations than code1, and code3 runs 20% more. I would expect the resulting run times to be related in a similar fashion. Here is the output of the run:</p>

<pre><code>  Ran 544 iterations of the command.
  Rejected 53 samples as outliers.
  Rounded run time per iteration: 4.5851e-02 +/- 4.6e-05 (0.1%)
  Ran 346 iterations of the command.
  Rejected 25 samples as outliers.
  Rounded run time per iteration: 5.0195e-02 +/- 5.0e-05 (0.1%)
  Ran 316 iterations of the command.
  Rejected 18 samples as outliers.
  Rounded run time per iteration: 5.4701e-02 +/- 5.4e-05 (0.1%)
</code></pre>
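
<p>The run times above can be turned into ratios with uncertainties using standard first-order error propagation. For a ratio of two independent measurements, the relative uncertainties add in quadrature. Treating the three runs as independent is an assumption, and the helper name is made up for this example.</p>

<pre><code>use strict;
use warnings;

# Ratio r = b / a of two independent measurements with uncertainties,
# using first-order (Gaussian) error propagation:
#   (sigma_r / r)^2 = (sigma_a / a)^2 + (sigma_b / b)^2
sub ratio_with_error {
    my ($a_val, $a_err, $b_val, $b_err) = @_;
    my $r   = $b_val / $a_val;
    my $rel = sqrt(($a_err / $a_val) ** 2 + ($b_err / $b_val) ** 2);
    return ($r, $r * $rel);
}

# Numbers taken from the dumbbench output above.
my ($r2, $e2) = ratio_with_error(4.5851e-02, 4.6e-05, 5.0195e-02, 5.0e-05);
my ($r3, $e3) = ratio_with_error(4.5851e-02, 4.6e-05, 5.4701e-02, 5.4e-05);
printf "code2/code1: %.4f +/- %.4f\n", $r2, $e2;
printf "code3/code1: %.4f +/- %.4f\n", $r3, $e3;
</code></pre>

<p>The ratios come out at about 1.095 and 1.193. Each has an uncertainty below 0.2 percentage points.</p>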

<p>A little calculation shows that code2 takes 9.5% longer than code1, and code3 takes 19.3% longer. Fair enough. Since I installed the <a href="http://search.cpan.org/dist/SOOT">SOOT</a> module, the <code>--plot_timings</code> option pops up a bunch of windows with plots for my amusement. Here are the timing distributions for code1 and code2:</p>

<p><img src="http://steffen-mueller.net/dumbbench/code1_timings.png" alt="base benchmark" title="" /></p>

<p><img src="http://steffen-mueller.net/dumbbench/code2_timings.png" alt="base benchmark + 10%" title="" /></p>

<p>Clearly, the two look qualitatively similar, but note the slightly different scale on the x axis. There is good news and bad news. The good news is that there is indeed a main distribution and a bunch of outliers. Clearly, getting rid of the outliers would be a win. The implemented procedure does that fairly well, but it's a bit too strict. The bad news is that the main distribution isn't entirely Gaussian. A better fit might have been a convolution of a Gaussian and an exponential, but I digress.</p>

<p>Let me use the digression as an excuse for another one, <a href="http://blog.plover.com/">MJD-style</a>: brian d foy's comment on the previous entry reminded me of a convenient <em>non-parametric</em> way of comparing samples, the <a href="http://en.wikipedia.org/wiki/Box_plot">box and whisker plot</a>:</p>

<p><img src="http://steffen-mueller.net/dumbbench/box_plot.png" alt="box plot" title="" /></p>

<p>I don't think I could explain it better than the Wikipedia article linked above, but here's a summary. For each of the three benchmarks, the respective gray box contains exactly half of the data. That is, if you cut the distribution into three chunks (the lowest 25%, the middle 50%, and the upper 25%), the box contains the middle chunk. The big black marker in the box is the median of the distribution. The <em>"error bars"</em> (whiskers) stretch from the ends of the box to the largest (or smallest) datum that is not an outlier. The ends of the box are the first and third quartiles, each of which encloses 25% of the data between itself and the median. Here, outliers are defined as data points that lie further from the box than 1.5 times the height of the box.</p>

<p>At a glance, we can see that the whiskers are asymmetric and that there are a lot of outliers on <em>one</em> side. It's an effective way to quickly compare several distributions.</p>
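
<p>The numbers behind such a plot are easy to compute. Here's a sketch that uses the 1.5-times-the-box-height outlier rule described above. There are several conventions for computing quartiles. The linear interpolation used here is only one of them, so a plotting package may draw slightly different boxes for the same data. The function names are made up.</p>

<pre><code>use strict;
use warnings;

# Quantile of sorted data via linear interpolation between order
# statistics (one of several common conventions).
sub quantile {
    my ($q, @s) = @_;          # @s must be sorted ascending
    my $pos = $q * (@s - 1);
    my $lo  = int($pos);
    my $hi  = $lo < $#s ? $lo + 1 : $lo;
    return $s[$lo] + ($pos - $lo) * ($s[$hi] - $s[$lo]);
}

sub box_stats {
    my @s = sort { $a <=> $b } @_;
    my ($q1, $med, $q3) = map { quantile($_, @s) } 0.25, 0.5, 0.75;
    my $iqr = $q3 - $q1;                      # height of the box
    my ($lo_fence, $hi_fence) = ($q1 - 1.5 * $iqr, $q3 + 1.5 * $iqr);
    my @inside   = grep { $_ >= $lo_fence and $_ <= $hi_fence } @s;
    my @outliers = grep { $_ < $lo_fence or $_ > $hi_fence } @s;
    return {
        q1           => $q1,
        median       => $med,
        q3           => $q3,
        whisker_low  => $inside[0],     # smallest non-outlier
        whisker_high => $inside[-1],    # largest non-outlier
        outliers     => \@outliers,
    };
}

my $stats = box_stats(1 .. 9, 100);
printf "median %.2f, box %.2f to %.2f, whiskers %g to %g, outliers: %s\n",
    @{$stats}{qw(median q1 q3 whisker_low whisker_high)},
    join(', ', @{ $stats->{outliers} });
</code></pre>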

<p>Back on topic: the above example benchmarked fairly long-running code. Often, programmers idly wonder whether some tiny bit of code will be faster than another. This is much harder to benchmark, since the shorter the benchmark run, the larger the effect of small disturbances. The best solution is, of course, to change your benchmark to take longer. I'll try to write about the pain of benchmarking extremely short pieces of code next time.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Your benchmarks suck!</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2010/09/your-benchmarks-suck.html" />
    <id>tag:blogs.perl.org,2010:/users/steffen_mueller//31.1021</id>

    <published>2010-09-16T15:14:06Z</published>
    <updated>2010-09-23T15:14:02Z</updated>

    <summary>Virtually every programmer is obsessed with writing FAST code. Curiously, this extends even to those who prefer dynamic languages such as Perl over naturally faster, more low-level languages such as C. Few among us can resist the urge to micro-optimize and a cynical version of myself would claim that the best we can expect is that programmers prove effectiveness of their optimizations with benchmarks. Wait! Proof? What I should have written is that they attempt to demonstrate the effect of...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="benchmarking" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="dumbbench" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="programming" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="benchmarking" label="benchmarking" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="dumbbench" label="dumbbench" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="statistics" label="statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p>Virtually every programmer is obsessed with writing <em>FAST</em> code. Curiously, this extends even to those who prefer dynamic languages such as Perl over naturally faster, more low-level languages such as C. Few among us can resist the urge to micro-optimize, and a cynical version of myself would claim that the best we can expect is that programmers prove the effectiveness of their optimizations with benchmarks.</p>

<p>Wait!  </p>

<p>Proof? What I should have written is that they attempt to demonstrate the effect of their optimization by timing it against another variant. Arriving at a robust conclusion from a benchmark is hard. It takes not only plenty of CPU time but also plenty of brain cycles and experience. People will often publish the result of a simple</p>

<pre><code>$ time ./somecommand --options
  real    0m2.005s
  user    0m2.000s
  sys     0m0.005s

$ time ./somecommand --otheroptions
  real    0m3.005s
  user    0m3.000s
  sys     0m0.005s
</code></pre>

<p>and even if they don't draw a conclusion themselves, they are potentially misleading others. Unfortunately, this situation isn't easily fixed. People (usually) have neither the persistence nor the expertise to do much better. Since this is a pet peeve of mine, I tried to create an almost drop-in replacement for "time" in the above incantation that should, on average, produce more conclusive results. It's called <em>dumbbench</em> and is only available on <a href="http://github.com/tsee/dumbbench">github</a>. I claim neither completeness nor correctness.</p>
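
<p>The core idea behind such a replacement is simple: time the same thing many times instead of once. The following sketch shows only that data-collection step, for a Perl code reference and using the core <code>Time::HiRes</code> module. It is not dumbbench's implementation, which also runs external commands, rejects outliers, and does the statistics. The function name and the workload are hypothetical.</p>

<pre><code>use strict;
use warnings;
use Time::HiRes ();

# Run $code $n times and return one wall-clock duration per run.
# Only the data-collection step: no dry-run subtraction, no outlier
# rejection, no statistics.
sub collect_timings {
    my ($n, $code) = @_;
    my @timings;
    for (1 .. $n) {
        my $start = Time::HiRes::time();
        $code->();
        push @timings, Time::HiRes::time() - $start;
    }
    return @timings;
}

# Hypothetical workload standing in for ./somecommand.
my @timings = collect_timings(23, sub { my $x = 0; $x += $_ for 1 .. 100_000; });
printf "%d timings, fastest %.3g s, slowest %.3g s\n",
    scalar(@timings), (sort { $a <=> $b } @timings)[0, -1];
</code></pre>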

<p>With <em>dumbbench</em>, you trade extra CPU cycles for a statement of the uncertainty on the result and some robustness of the result itself. It doesn't fundamentally solve the problem that in all likelihood <em>your benchmark doesn't matter</em>. You now do:</p>

<pre><code>$ dumbbench -- ./some-command --options
  Ran 23 iterations of the command.
  Rejected 3 samples as outliers.
  Rounded run time per iteration: 9.519e-01 +/- 3.7e-03 (0.4%)
</code></pre>

<p>Okay, I admit this is harder to read than the original, but not much. It ran the benchmark 23 times, did some statistics with the results, decided that three of the runs were bad, and then arrived at the conclusion that your code took 0.95 seconds to run. The uncertainty on that measurement is only 0.4%.  </p>
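
<p>Dumbbench's own procedure for deciding which runs were bad is described in its documentation. As a simplified stand-in, not necessarily dumbbench's exact rule, here's a sketch in the same spirit. It estimates the width of the main distribution robustly from the median absolute deviation (MAD) and drops everything more than <code>$k</code> such widths from the median. It then computes the mean and its uncertainty from the remaining runs. The cut value of 3, the timings, and the helper names are assumptions for illustration.</p>

<pre><code>use strict;
use warnings;
use List::Util qw(sum);

sub median {
    my @s = sort { $a <=> $b } @_;
    my $n = @s;
    return $n % 2 ? $s[($n - 1) / 2]
                  : ($s[$n / 2 - 1] + $s[$n / 2]) / 2;
}

# Keep only points within $k robust "sigmas" of the median. Sigma is
# estimated from the MAD; the factor 1.4826 makes the MAD match the
# standard deviation of a Gaussian. This is a symmetric cut.
sub reject_outliers {
    my ($k, @x) = @_;
    my $med   = median(@x);
    my $sigma = 1.4826 * median(map { abs($_ - $med) } @x);
    return grep { abs($_ - $med) <= $k * $sigma } @x;
}

# Hypothetical timings in seconds; the last one is an outlier.
my @timings = (1.00, 1.01, 0.99, 1.02, 0.98, 1.00, 5.0);
my @good    = reject_outliers(3, @timings);

my $n    = @good;
my $mean = sum(@good) / $n;
my $sd   = sqrt(sum(map { ($_ - $mean) ** 2 } @good) / ($n - 1));
printf "%d of %d kept: %.4f +/- %.4f\n",
    $n, scalar(@timings), $mean, $sd / sqrt($n);
</code></pre>

<p>With these numbers, the 5-second run is rejected and the remaining six give a mean of 1.0000 +/- 0.0058. Since benchmark outliers are biased toward long run times, a real tool might well treat the slow tail differently from the fast one.</p>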

<p>Even if you don't care about the details, rest assured that this measurement is likely more reliable <em>and</em> it will give others more clues about how to interpret your results.</p>

<p>The following essay is taken from the <em>dumbbench</em> documentation and goes into more detail about why benchmarks suck and how <em>dumbbench</em> tries to work around the inevitable. <em>How it works and why it doesn't...</em></p>

<h1>Why it doesn't work and why we try regardless</h1>

<p>Recall that the goal is to obtain a reliable estimate of the run-time of
a certain operation or command. Now, please realize that this is impossible,
since the run-time of an operation may depend on many things that can change rapidly:
Modern CPUs change their frequency dynamically depending on load. CPU caches may be
invalidated at odd moments, and page faults provide less fine-grained distraction.
Naturally, OS kernels will do weird things just to spite you. It's almost hopeless.</p>

<p>Since people (you, I, everybody!) insist on benchmarking anyway, this is a best-effort
attempt at estimating the run-time. Naturally, it includes estimating the uncertainty of the
run time. This is extremely important for comparing multiple benchmarks and that
is usually the ultimate goal. In order to get an estimate of the expectation value
and its uncertainty, we need a model of the underlying distribution:  </p>

<h1>A model for timing results</h1>

<p>Let's take a step back and think about how the run-time of multiple
invocations of the same code will be distributed. Having a qualitative
idea of what the distribution of many (<strong>MANY</strong>) measurements looks like is
extremely important for estimating the expectation value and uncertainty
from a sample of few measurements.  </p>

<p>In a perfect, deterministic, single-tasking computer, we will get N times the
exact same timing. In the real world, there are at least a million ways that
this assumption is broken on a small scale. For each run, the load of the
computer will be slightly different. The content of main memory and CPU
caches may differ. All of these small effects will make a given run a tiny
bit slower or faster than any other. Thankfully, this is a case where statistics (more precisely
the Central Limit Theorem) provides us with the <em>qualitative</em> result: The
measurements will be normally distributed (i.e. following a Gaussian
distribution) around some expectation value (which happens to be the mean in this case).
Good. Unfortunately, benchmarks are more evil than that. In addition to the small-scale
effects that smear the result, there are effects that (relative to the run time of the benchmark)
may be large enough to cause a sizeable jump in run time. Assuming these are
comparatively rare and typically cause extraordinarily long run-times (as opposed to
extraordinarily short run-times), we arrive at an overall model of
having a central, smoothish normal distribution with a few outliers towards
long run-times.  </p>
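<p>To make the model concrete, here is a small simulation of it in Perl. It is only a sketch: the true run time, the width of the jitter, the outlier rate and the size of the extra delay are all made-up assumptions, and the Gaussian jitter is drawn with the Box-Muller transform rather than taken from any real benchmark. The helper names <code>gaussian</code> and <code>simulated_timing</code> are hypothetical.</p>

```perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

# Draw one standard-normal random number (Box-Muller transform).
sub gaussian {
    my $u1 = 1 - rand();    # in (0, 1], so log() is always defined
    my $u2 = rand();
    return sqrt(-2 * log($u1)) * cos(2 * PI * $u2);
}

# One simulated timing under the model: the true run time, smeared by small
# Gaussian jitter, plus (rarely) a large, strictly positive extra delay.
sub simulated_timing {
    my (%p) = @_;
    my $t = $p{true_time} + $p{jitter} * gaussian();
    $t += $p{outlier_delay} * (1 + rand()) if rand() < $p{outlier_rate};
    return $t;
}

srand(42);    # fixed seed so that runs are reproducible
my @timings = map {
    simulated_timing(
        true_time     => 1.0,     # seconds (assumed)
        jitter        => 0.01,    # width of the central Gaussian (assumed)
        outlier_rate  => 0.05,    # 5% of the runs get disturbed (assumed)
        outlier_delay => 0.5,     # ...by 0.5 to 1.0 extra seconds (assumed)
    )
} 1 .. 1000;
```

<p>Sorting the resulting <code>@timings</code> shows a tight cluster around one second and a sparse tail of runs between roughly 1.5 and 2 seconds, which is the shape the model describes.</p>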

<p>So in this model, if we perform <em>N</em> measurements, almost all of the <em>N</em> timings
will be close to the expectation value and a small fraction will be significantly higher.
This is troublesome because the outliers create a bias in the uncertainty
estimation and the asymmetry of the overall distribution will bias a simple
calculation of the mean.  </p>
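<p>A tiny worked example shows the bias. The six timings below are invented: five runs scatter around one second, and one run was slowed down by some unrelated disturbance. The plain mean is dragged up by a third of a second, while the median barely moves.</p>

```perl
use strict;
use warnings;
use List::Util qw(sum);

sub mean { return sum(@_) / @_ }

sub median {
    my @s   = sort { $a <=> $b } @_;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Five well-behaved runs near 1 s and one disturbed run (invented numbers).
my @timings = (1.00, 1.01, 0.99, 1.02, 0.98, 3.00);

printf "mean:   %.4f s\n", mean(@timings);      # prints 1.3333 s
printf "median: %.4f s\n", median(@timings);    # prints 1.0050 s
```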

<p>What we would like to report to the user is the mean and uncertainty
of the main distribution while ignoring the outliers.  </p>

<p>Before I go into the details of how we can account for the various complications, let me show you an example of a benchmark result that defies all attempts at automatically arriving at a quantitative result. You know. Just so you don't imagine you're safe if you follow my advice!</p>

<p><img src="http://steffen-mueller.net/tmp/aaarg.png" alt="Benchmark, horribly gone wrong" title="" /></p>

<p>In this example, you can see several disjoint distributions, each with its own bit of jitter around it. Possibly, the differences are caused by page faults or CPU frequency changes. I can't tell and that's exactly the point of the example because I'd wager that neither can you!  </p>

<h2>A robust estimation of the expectation value</h2>

<p>Given the previously discussed model, we estimate the expectation value
with the following algorithm:  </p>

<ol>
<li><p>Calculate the median of the whole distribution.
The median is a fairly robust estimator of the expectation value
with respect to outliers (assuming they're comparatively rare).</p></li>
<li><p>Calculate the median-absolute-deviation from the whole distribution
(MAD, see Wikipedia). The MAD needs rescaling to become a measure
of variability: for normally distributed data, multiplying it by about 1.4826
turns it into an estimate of the standard deviation. The MAD will be our initial
guess for an uncertainty. Like the median, it is quite robust against outliers.</p></li>
<li><p>We use the median and MAD to remove the tails of our distribution.
All timings that deviate from the median by more than <em>X</em> times the MAD
are rejected. This measure should cut outliers without introducing
much bias both in symmetric and asymmetric source distributions.  </p>

<p>An alternative would be to use an ordinary truncated mean (that is
the mean of all timings while disregarding the <em>N</em> largest and <em>N</em>
smallest results). But the truncated mean can produce a biased result
in asymmetric source distributions. The resulting expectation value
would be artificially increased.</p>

<p>In summary: Using the median as the initial guess for the expectation value and the
MAD as the guess for the variability keeps the bias down in the general case.</p></li>
<li><p>Finally, we use the mean of the truncated distribution as the expectation
value and the (rescaled) MAD of the truncated distribution as a measure of variability.
To get the uncertainty on the expectation value, we take <em>MAD / sqrt(N)</em> where
<em>N</em> is the number of remaining measurements.</p></li>
</ol>
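<p>The four steps above can be sketched in a few lines of Perl. This is not the actual <em>dumbbench</em> code, just an illustration of the procedure under a few assumptions: the cut-off <em>X</em> defaults to 3 here, the MAD is rescaled by the usual factor of about 1.4826 for normal data, and the function names are hypothetical. For comparison, it also includes the truncated mean mentioned in step 3.</p>

```perl
use strict;
use warnings;
use List::Util qw(sum);

sub median {
    my @s   = sort { $a <=> $b } @_;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Median absolute deviation, rescaled so that it estimates the standard
# deviation if the data are normally distributed.
sub rescaled_mad {
    my $m = median(@_);
    return 1.4826 * median(map { abs($_ - $m) } @_);
}

# $max_dev plays the role of X; the default of 3 is an assumption.
sub robust_estimate {
    my ($timings, $max_dev) = @_;
    $max_dev //= 3;

    # Steps 1 and 2: robust initial guesses for location and spread.
    my $median = median(@$timings);
    my $mad    = rescaled_mad(@$timings);

    # Step 3: reject everything more than $max_dev MADs from the median.
    my @kept = grep { abs($_ - $median) <= $max_dev * $mad } @$timings;

    # Step 4: mean of the survivors, MAD / sqrt(N) as its uncertainty.
    my $n = @kept;
    return {
        mean        => sum(@kept) / $n,
        uncertainty => rescaled_mad(@kept) / sqrt($n),
        rejected    => @$timings - $n,
    };
}

# The alternative from step 3: drop the $k smallest and $k largest values.
sub truncated_mean {
    my ($timings, $k) = @_;
    my @s   = sort { $a <=> $b } @$timings;
    my @mid = @s[$k .. $#s - $k];
    return sum(@mid) / @mid;
}

my @timings = (1.00, 1.01, 0.99, 1.02, 0.98, 3.00);    # invented numbers
my $r = robust_estimate(\@timings);
printf "robust:    %.4f +/- %.4f s (%d rejected)\n",
    @$r{qw(mean uncertainty rejected)};
printf "truncated: %.4f s\n", truncated_mean(\@timings, 1);
```

<p>On these invented timings, the 3-second run is rejected and the robust estimate comes out at 1.0000 +/- 0.0066 s. The truncated mean with <code>$k = 1</code> gives 1.005 s instead, because dropping one value from each end throws away a perfectly good fast run along with the outlier.</p>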

<h2>Conclusion</h2>

<p>I hope I have convinced you that interpreting less sophisticated benchmarks
is a dangerous if not futile exercise. The reason this module exists is
that not everybody is willing to go through such contortions to arrive
at a reliable conclusion, but everybody loves benchmarking. So let's at least
get the basics right.  Do not compare raw timings of meaningless benchmarks but
robust estimates of the run time of meaningless benchmarks instead.</p>

<h2>Disclaimer</h2>

<p>This whole rant (and writing the program) was inspired by a recent thread on a certain mailing list. Neither the title nor the content of this post is intended as a slight to anybody involved in the discussion. I'm simply venting long-standing frustration.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Tiny vim convenience hack</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2010/08/tiny-vim-convenience-hack.html" />
    <id>tag:blogs.perl.org,2010:/users/steffen_mueller//31.937</id>

    <published>2010-08-24T12:31:25Z</published>
    <updated>2010-08-24T12:47:14Z</updated>

    <summary> Since I maintain a lot of code that wasn&apos;t originally written by me, I am in the unfortunate situation that I get to edit code with many different coding and indentation styles on a regular basis. Reformatting usually isn&apos;t a good idea since not only may the original author object, but the revision control history is more important than code and indentation style. I&apos;m a vi(m) user. Modifying the settings to suit whatever insane tab-compression-indentation scheme some crazy emacs...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="editor" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl modules" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="editor" label="editor" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="indentation" label="indentation" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="vi" label="vi" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p>
Since I maintain a lot of code that wasn't originally written by me, I am in the unfortunate situation of having to edit code with many different coding and indentation styles on a regular basis. Reformatting usually isn't a good idea, since not only may the original author object, but the revision control history is also more important than coding and indentation style.
</p>
<p>
I'm a vi(m) user. Modifying the settings to suit whatever insane tab-compression-indentation scheme some crazy emacs user decided on for individual documents is severely annoying. My editor should do this for me. If I were using Padre for my daily work, it <b>would</b> do this for me automatically. Today, I finally sat down and wrangled vi until it would auto-detect the indentation style of the current document. More precisely, <a href="http://search.cpan.org/perldoc?Text::FindIndent">Text::FindIndent</a> does (version 0.09 was just released and is required). Install the module and put the following in your <tt>.vimrc</tt> as a single line:
</p>
<pre><code class="prettyprint">
map &lt;F5&gt; &lt;Esc&gt; :perl use Text::FindIndent;VIM::DoCommand($_) for Text::FindIndent-&gt;to_vim_commands(join "\n", $curbuf-&gt;Get(1..$curbuf-&gt;Count()));&lt;CR&gt;</code></pre>
<p>Now, open up somebody else's source code that uses some crazy indentation scheme. Hit F5. Your vi settings have now been marvelously modified to produce the same broken indentation as whatever you opened for editing.</p>]]>
        
    </content>
</entry>

<entry>
    <title>CPAN Testers and giant library</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2010/04/cpan-testers-and-giant-library.html" />
    <id>tag:blogs.perl.org,2010:/users/steffen_mueller//31.479</id>

    <published>2010-04-14T14:32:41Z</published>
    <updated>2010-08-24T12:47:41Z</updated>

    <summary>Dear lazyweb, I am about to upload a new Alien:: distribution that downloads, builds, and installs a very, very large library. The installation of this Alien:: distribution occupies about 240MB on my laptop and compile times are huge even on my fast computer. Is there a way to flag a distribution as unsuitable for CPAN testers? I&apos;d rather not abuse the volunteer infrastructure by having them compile a library over night....</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="cpan testers" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl modules" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="alien" label="alien" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="cpantesters" label="cpan-testers" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perlmodules" label="perl-modules" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p>Dear lazyweb,</p>

<p>I am about to upload a new Alien:: distribution that downloads, builds, and installs a very, very large library. The installation of this Alien:: distribution occupies about 240MB on my laptop and compile times are huge even on my fast computer.</p>

<p>Is there a way to flag a distribution as unsuitable for CPAN testers? I'd rather not abuse the volunteer infrastructure by having them compile a library overnight.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>XS bits: Overloaded interfaces</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2010/04/xs-bits-overloaded-interfaces.html" />
    <id>tag:blogs.perl.org,2010:/users/steffen_mueller//31.442</id>

    <published>2010-04-05T08:37:18Z</published>
    <updated>2010-08-24T12:49:01Z</updated>

    <summary> When writing Perl, people often create hybrid interfaces that accept either a reference to an array or hash, a string, or a reference to a string. The Perl code to do appropriate conversion behind the scenes is usually trivial. Some even use this to overload their interface to do something entirely unrelated depending on the type passed in. However much one might loathe such interfaces, when replacing Perl code with XS, one usually has to reproduce the properties of...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="documentation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="xs" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="xs-bits" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="documentation" label="documentation" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="xs" label="xs" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="xsbits" label="xs-bits" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p>
When writing Perl, people often create hybrid interfaces that accept either a reference to an array or hash, a string, or a reference to a string. The Perl code to do appropriate conversion behind the scenes is usually trivial. Some even use this to overload their interface to do something entirely unrelated depending on the type passed in. However much one might loathe such interfaces, when replacing Perl code with XS, one usually has to reproduce the properties of the original. That is what this entry is about.
</p>
<p>
A rather reasonable example of such a hybrid interface is <a href="http://search.cpan.org/perldoc?PPI::Document"><code class="prettyprint">PPI::Document</code></a> whose constructor accepts either a string (interpreted as a file name) or a reference to a scalar that contains the source code as a string.
While (different) named arguments would have been clearer for a casual reader of the resulting code, this case of an overloaded interface is a generally reasonable optimization.
</p>
<p>
The straightforward (and least error-prone) way to provide an overloaded interface is to keep the interface in Perl and just call XSUBs from there using a simpler, more XS-friendly interface. If I were replacing said constructor of <code class="prettyprint">PPI::Document</code>, I could do something like the following, where the <code class="prettyprint">_xs_new_from_*</code> functions stand in for the actual XSUBs:
</p>
<pre><code class="prettyprint">package PPI::Document;
use strict;
use warnings;
use Carp qw(croak);

sub new {
  my $class  = shift;
  my $source = shift;
  if (not ref($source)) {
    # A plain string is a file name.
    return _xs_new_from_file($class, $source);
  }
  elsif (ref($source) eq 'SCALAR') {
    # A scalar reference holds the source code itself.
    return _xs_new_from_string($class, $source);
  }
  else {
    croak("Huh?");
  }
}
</code></pre>

<p>
But if the code in question is in a tight loop already or you are a tad crazy, you may want to have this logic in XS as well. This is one way to do it:
</p>
<pre><code class="prettyprint">void
new(class, source)
    SV* class;
    SV* source;
  INIT:
    SV* inner;
  PPCODE:
    if (!<a href="http://perldoc.perl.org/perlapi.html#SvROK">SvROK</a>(source))
      <a href="http://perldoc.perl.org/perlapi.html#mXPUSHs">mXPUSHs</a>( _new_document_from_file(class, source) );
    else {
      inner = <a href="http://perldoc.perl.org/perlapi.html#SvRV">SvRV</a>(source);
      if (<a href="http://perldoc.perl.org/perlapi.html#SvTYPE">SvTYPE</a>(inner) &lt;= SVt_PVMG)
        <a href="http://perldoc.perl.org/perlapi.html#mXPUSHs">mXPUSHs</a>( _new_document_from_string(class, <a href="http://perldoc.perl.org/perlapi.html#SvRV">SvRV</a>(source)) );
      else
        croak("Huh?");
    }
</code></pre>
<p>
I'll pull that apart in detail for an XS beginner in a moment. The key bits for the interface are
<code class="prettyprint">!SvROK(source)</code>, which tests whether the source SV is a reference
at all, and <code class="prettyprint">SvTYPE(inner) &lt;= SVt_PVMG</code>, which ensures that the dereferenced SV is a scalar
(and not an array, etc.).
</p>
<p>
While the test for an SV being a reference is fairly common and simple, the test for being a scalar
reference is slightly more obscure. It grabs the type (an enum value) of the SV using <code class="prettyprint">SvTYPE()</code>
and checks whether the type is less than or equal to <code class="prettyprint">SVt_PVMG</code>. <code class="prettyprint">SVt_PVMG</code> indicates
a scalar with <i>magic</i> attached. The reason we use it in a less-than-or-equal comparison
lies in <a href="http://search.cpan.org/dist/illguts/index.html#svtypes">the order of the SV types</a>: 
All SV types below SVt_PVMG happen to be scalars. Above, you'll find more complicated things
such as arrays, hashes, code references, etc. Using this construct, you could easily add cases
that test for array or hash references by comparing (equality!) with 
<code class="prettyprint">SVt_PVAV</code> or <code class="prettyprint">SVt_PVHV</code> respectively.
</p>
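<p>At the Perl level, the same dispatch, extended with the array and hash cases just mentioned, might look like the following sketch. <code>classify_source</code> is a hypothetical helper, and <code>Scalar::Util::reftype</code> is only a rough analogue of <code>SvTYPE(SvRV(source))</code>: like <code>SvTYPE</code>, it ignores whether the referent is blessed, but it reports, for instance, a reference to a reference as <code>REF</code> rather than lumping it in with other scalars.</p>

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(reftype);

# Hypothetical helper: decide how a constructor argument should be treated.
sub classify_source {
    my ($source) = @_;

    # Not a reference at all (cf. !SvROK(source)): a file name.
    return 'file' unless ref $source;

    # reftype() looks at the referent and ignores any blessing.
    my $type = reftype($source);
    return 'string' if $type eq 'SCALAR';   # reference to the source code
    return 'array'  if $type eq 'ARRAY';    # cf. SvTYPE(inner) == SVt_PVAV
    return 'hash'   if $type eq 'HASH';     # cf. SvTYPE(inner) == SVt_PVHV

    croak("Huh?");
}

print classify_source($_), "\n" for 'Foo.pm', \"print 42;", [], {};
```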
<p>
Now, the example punts on one bit of the <code class="prettyprint">PPI::Document-&gt;new()</code> interface:
You can call <code class="prettyprint">new()</code> without arguments to receive an empty document.
Optional parameters to an XSUB aren't particularly complicated once you understand
how parameters are actually passed inside perl. But that is a topic for another post.
</p>
<p>
Here are a few random notes that may or may not help XS beginners understand
the code:
</p>
<ul>
  <li>
    The XSUB is declared void and the actual C code is inside an XS section named
    <a href="http://perldoc.perl.org/perlxs.html#The-PPCODE:-Keyword"><code class="prettyprint">PPCODE</code></a>.
    This tells the XSUB compiler that we will (if necessary) manage
    returning values via the argument stack ourselves.
  </li>
  <li>
    This is done with the <a href="http://perldoc.perl.org/perlapi.html#mXPUSHs"><code class="prettyprint">mXPUSHs()</code></a>
    macro, which takes an SV (assumed here to be
    returned by the <code class="prettyprint">_new_document_from_*</code> functions) and pushes it onto the argument stack.
    The <code class="prettyprint">X</code> indicates that it will extend the stack if necessary. The
    <code class="prettyprint">s</code> suffix indicates that we're returning a pre-manufactured SV. The <code class="prettyprint">m</code>
    prefix means that the macro will mortalize the SV. This is roughly equivalent to marking
    it as a temporary that perl frees automatically later, which is necessary for freshly created SVs
    returned via the argument stack, since the stack itself does not own a reference to them.
    I'm going through all bits of this macro because there are a ton of variants in the API
    which become moderately obvious once you understand the naming conventions.
  </li>
</ul>
]]>
        
    </content>
</entry>

<entry>
    <title>XS bits: Which class is this object blessed into?</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2010/04/xs-bits-which-class-is-this-object-blessed-into.html" />
    <id>tag:blogs.perl.org,2010:/users/steffen_mueller//31.434</id>

    <published>2010-04-03T11:58:33Z</published>
    <updated>2010-08-24T12:49:31Z</updated>

    <summary> So it turns out I&apos;m not really good at this blogging thing. It&apos;s not that I feel I have nothing worth writing about, but that my time is usually better spent on other things. Instead of pretending to myself that I will eventually write long rants (and then effectively writing nothing), I&apos;ll try to post occasional bits of code that I found useful and not particularly obvious. I&apos;ve been writing a lot of XS lately and the somewhat incomplete...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="documentation" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="xs" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="xs-bits" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="documentation" label="documentation" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="xs" label="xs" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="xsbits" label="xs-bits" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p>
So it turns out I'm not really good at this blogging thing. It's not that I feel I have nothing worth writing about, but that my time is usually better spent on other things. Instead of pretending to myself that I will eventually write long rants (and then effectively writing nothing), I'll try to post occasional bits of code that I found useful and not particularly obvious. I've been writing a lot of XS lately and the somewhat incomplete or at least inaccessible XS / perl API documentation could do with a little more cookbook style. Furthermore, when writing C/XS code that uses the perl API, I find myself trying to do the same things that are trivial in Perl and taking much, much longer to actually succeed. Thus, I'll start out with a bit of perl API.
</p>
<p>
Today's Not-So-Frequently-Encountered-XS-Problem:<br/><b>Which class is this object blessed into?</b>
</p>
<p>
If you wonder why this isn't a more frequent issue, consider that 90% of the time, you really want to know whether the given object is blessed into a class that is derived from some other class. That related problem is simple to solve using
</p>
<pre><code class="prettyprint">bool    <a href="http://perldoc.perl.org/perlapi.html#sv_derived_from">sv_derived_from</a>(SV* sv, const char* name)</code></pre>
<p>
There is no builtin function/macro to give you the class name itself. You can get it with:
</p>
<pre><code class="prettyprint">const char* className = <a href="http://perldoc.perl.org/perlapi.html#HvNAME">HvNAME</a>(<a href="http://perldoc.perl.org/perlapi.html#SvSTASH">SvSTASH</a>(<a href="http://perldoc.perl.org/perlapi.html#SvRV">SvRV</a>(thePerlObject)));</code></pre>
<p>
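This works by dereferencing the Perl object using <a href="http://perldoc.perl.org/perlapi.html#SvRV"><code>SvRV</code></a>, then fetching the symbol table hash (stash) that the STASH pointer in the inner SV points to (<a href="http://perldoc.perl.org/perlapi.html#SvSTASH"><code>SvSTASH</code></a>), and then using <a href="http://perldoc.perl.org/perlapi.html#HvNAME"><code>HvNAME</code></a> to get the stash's name. Note that this assumes <code>thePerlObject</code> really is a blessed reference; for anything else, the result is undefined and will likely crash, so check it with <code>sv_isobject()</code> first if in doubt.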
</p>
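<p>For comparison, both questions are one-liners in Perl itself. The sketch below uses <code>Scalar::Util::blessed</code> to get the class name and the <code>isa</code> method, which is roughly the Perl-level counterpart of <code>sv_derived_from</code>. The classes <code>My::Base</code> and <code>My::Derived</code> are made up for the example.</p>

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Two made-up classes for the example.
package My::Base;
sub new { my $class = shift; return bless {}, $class }

package My::Derived;
our @ISA = ('My::Base');

package main;

my $obj = My::Derived->new;

# Class name, cf. HvNAME(SvSTASH(SvRV(obj))). blessed() returns undef for
# anything that is not a blessed reference instead of crashing.
my $class = blessed($obj);

# Inheritance test, cf. sv_derived_from(obj, "My::Base").
my $is_base = $obj->isa('My::Base');

print "$class derives from My::Base: ", ($is_base ? "yes" : "no"), "\n";
```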
<p>
I guess it's simple enough to figure out when you read the code, but it wasn't obvious to me when I wrote it, err, asked better programmers for help with it.
</p>
<p>PS: <strike>Is there a web site that has the perlapi document in HTML form and HTML anchors for each function/macro to link to?</strike></p>
<p><i>Update</i>: Thanks to Christian's comment, I added the appropriate links to the documentation.</p>]]>
        
    </content>
</entry>

<entry>
    <title>LHC @ 7 TeV</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2010/03/lhc-7-tev.html" />
    <id>tag:blogs.perl.org,2010:/users/steffen_mueller//31.418</id>

    <published>2010-03-30T07:19:36Z</published>
    <updated>2010-08-24T12:49:55Z</updated>

    <summary>There is a live webcast about the ramp up to 7 TeV collisions right now. Head over there!...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
    <category term="lhc" label="LHC" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="physics" label="physics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[There is a live webcast about the ramp up to 7 TeV collisions <strong>right now</strong>. Head <a href="http://webcast.cern.ch/lhcfirstphysics">over there</a>!]]>
        
    </content>
</entry>

<entry>
    <title>Astronomy with libnova</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/steffen_mueller/2009/12/astronomy-with-libnova.html" />
    <id>tag:blogs.perl.org,2009:/users/steffen_mueller//31.99</id>

    <published>2009-12-13T15:57:41Z</published>
    <updated>2010-08-24T12:50:15Z</updated>

    <summary>libnova is a Celestial Mechanics, Astrometry and Astrodynamics Library written in C. Just yesterday, I uploaded an inital, thin XS wrapper to CPAN as Astro::Nova, so we can use it from Perl. Here&apos;s a simple example that calculates the current moon rise, transit and set times at my home in Karlsruhe using Astro::Nova. It&apos;s quite similar to the equivalent example of the C version. use strict; use warnings; use Astro::Nova qw/:all/; my $observer = Astro::Nova::LnLatPosn-&gt;new(); # observer location: Karlsruhe, for...</summary>
    <author>
        <name>Steffen Mueller</name>
        <uri>http://steffen-mueller.net</uri>
    </author>
    
        <category term="astronomy" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="libnova" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="perl modules" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="astronomy" label="astronomy" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="libnova" label="libnova" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="modules" label="modules" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/steffen_mueller/">
        <![CDATA[<p><a href="http://libnova.sourceforge.net/index.html">libnova</a> is a <em>Celestial Mechanics, Astrometry and Astrodynamics Library</em> written in C. Just yesterday, I uploaded an initial, thin XS wrapper to CPAN as <a href="http://search.cpan.org/dist/Astro-Nova">Astro::Nova</a>, so we can use it from Perl.
</p>
<p>
Here's a simple example that calculates the current moon rise, transit and set times at my home in Karlsruhe using <em>Astro::Nova</em>. It's quite similar to the equivalent example of the C version.
</p>
<p>
<pre><code class="prettyprint">use strict;
use warnings;
use Astro::Nova qw/:all/;

my $observer = Astro::Nova::LnLatPosn->new();
# observer location: Karlsruhe, for the rise/set/transit (rst) times
$observer->set_lat( Astro::Nova::DMS->from_string("49°00' N")->to_degrees );
$observer->set_lng( Astro::Nova::DMS->from_string("8°23' E")->to_degrees );

my $now = get_julian_from_sys(); # current julian date

print "Current Julian Day: $now\n";

my $moon_from_earth = get_lunar_geo_posn($now, 0);

my ($moonx, $moony, $moonz) = ($moon_from_earth->get_X, $moon_from_earth->get_Y, $moon_from_earth->get_Z);
printf("Moon is at (%.02fkm, %.02fkm, %.02fkm)\n", $moonx, $moony, $moonz);
printf("Moon distance: %.02fkm\n", get_lunar_earth_dist($now));


my $moon_lnglat = get_lunar_ecl_coords($now, 0);
print "Moon at:\n", $moon_lnglat->as_ascii();

my $moon_equatorial = get_lunar_equ_coords($now);

my $moon_fraction = get_lunar_disk($now);
print "Current moon fraction: $moon_fraction\n";
print "Moon phase " . get_lunar_phase($now) . "\n";

my ($status, $moon_rst) = get_lunar_rst($now, $observer);
if ($status == 1) {
  print "Moon is circumpolar\n";
}
else {
  print "Rise time:\n",    get_local_date( $moon_rst->get_rise() )->as_ascii(), "\n";
  print "Transit time:\n", get_local_date( $moon_rst->get_transit() )->as_ascii(), "\n";
  print "Set time:\n",     get_local_date( $moon_rst->get_set() )->as_ascii(), "\n";
}
</code></pre>
</p>
<p>
The interface is mostly still very C-ish and that's unlikely to change entirely, but the various containers will probably gain a few more convenience and conversion methods.
</p>
<p>
Next piece of the puzzle: <a href="http://search.cpan.org/dist/Astro-Hipparcos">a module for reading a star catalog</a>. Astronomy is fun.</p>]]>
        
    </content>
</entry>

</feed>
