<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Schwern</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/" />
    <link rel="self" type="application/atom+xml" href="http://blogs.perl.org/users/michael_g_schwern/atom.xml" />
    <id>tag:blogs.perl.org,2009-11-03:/users/michael_g_schwern//786</id>
    <updated>2013-05-16T04:04:37Z</updated>
    <subtitle>Please don&apos;t hammer nails with a gun.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.38</generator>

<entry>
    <title>Blog moved to blog.schwern.net</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2013/05/blog-moved-to-blogschwernnet.html" />
    <id>tag:blogs.perl.org,2013:/users/michael_g_schwern//786.4681</id>

    <published>2013-05-16T04:03:41Z</published>
    <updated>2013-05-16T04:04:37Z</updated>

    <summary>My blog has moved to blog.schwern.net....</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p>My blog has moved to <a href="http://blog.schwern.net">blog.schwern.net</a>.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>How Not To Highlight Women In Perl</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2012/07/how-not-to-highlight-women-in-perl.html" />
    <id>tag:blogs.perl.org,2012:/users/michael_g_schwern//786.3473</id>

    <published>2012-07-03T17:24:24Z</published>
    <updated>2012-07-03T18:03:18Z</updated>

    <summary>Here&apos;s the short version: gender anonymity is protection, and in a male-dominated community many women prioritize safety. A machine-parsable list of female CPAN authors threatens their anonymity even if they&apos;re not on it. Here&apos;s the even shorter version: highlighting...</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p>Here's the short version: gender anonymity is <em>protection</em>, and in a male-dominated community many women prioritize <em>safety</em>.  A machine-parsable list of female CPAN authors threatens their anonymity <em>even if they're not on it</em>.</p>

<p>Here's the even shorter version: highlighting gender is <em>advanced</em> and should not be done lightly.</p>

<p>A module was recently uploaded to CPAN whose aim was to provide a big list of female CPAN authors.  I believe the author had good intentions, or at least nothing more than "I was curious", and has been quite puzzled at the reaction that it's creepy and the requests for it to be deleted.  Fortunately, he voluntarily removed the module when asked.  Unfortunately, because the community is not well versed in gender politics, this sort of thing is likely to happen again.  Here's an opportunity to talk about it so it doesn't.</p>
]]>
        <![CDATA[<p>Yes, I am deliberately not mentioning the module so as not to highlight the data nor embarrass the author further.  No, don't link to it in the comments.  The module itself is not the point and it would be best if it just quietly disappeared.  There is a far larger issue to be discussed here.  And please, commenters, if your comment is going to be about "censorship" or "freedom of speech", just move on.  Honor The Other Perl Motto ("try it") and let other people try something without being drowned out by the usual noise this sort of post attracts.  If you must, talk about it on your own blog and link to it in the comments.  Thank you for your restraint.</p>

<p>While CPAN gender information may be publicly available, it's only available in a form which requires each individual human who wants to use that data to laboriously go through each author name and guess their gender.  This introduces a barrier to entry to doing things with that data.  That barrier cuts both ways.  It prevents people from doing good things on a whim, but it also prevents people from doing <em>awful</em> things on a whim.</p>

<p>Putting gender data together in an easily digestible form eliminates that barrier.  Unfortunately, in a community which is extremely male dominated, folks are going to assume the primary application is to <em>stalk female CPAN authors</em>, particularly in the wake of <a href="http://bits.blogs.nytimes.com/2012/03/30/girls-around-me-ios-app-takes-creepy-to-a-new-level/">the Facebook/Foursquare stalker app</a>.  <em>Whether or not it actually happens</em>, that is the perception and emotion and real danger for women.  There's no amount of good intention that changes that.  A bit of demographic data is not worth the damage.</p>

<p>If the intention was to help raise awareness of female CPAN authors and encourage more women to join, it will have the opposite effect.  Some will look at it and shrug, but more will see "creeper app".  A lot will see it as an involuntary highlighting of their gender, whether or not the actual process is voluntary.  Done incorrectly, this causes women more problems in the community as people see them less as Perl people and more as women to either be barraged with "women in open source" issues, or taken advantage of.  See also <a href="http://geekfeminism.wikia.com/wiki/Unicorn_Law">The Unicorn Law</a> where every woman in an Open Source community is expected to represent women and can't just be left to code.</p>

<p>This is one of those situations where the more out of whack the gender balance is, the more sensitive we have to be to gender.  The less power and representation women have, the more they have to watch out for being taken advantage of.  The first thought is protection, and anonymity is protection.  Conversely, in a more balanced situation, this <em>might</em> have been acceptable, or at least not immediately called for deletion, or just laughed off.  Fix the gender power imbalance, and we can all relax.</p>

<p>There are ways to highlight women correctly; <a href="http://www.flickr.com/groups/whatacomputerscientistlookslike/">This Is What A Computer Scientist Looks Like</a> is one example: voluntary, empowering, and not useful to creepers (don't take this as a challenge).  This is about crafting a message which is welcoming, empowering and safe.  Doing this requires a viewpoint that most of us do not have and cannot imagine, but we can learn how to be sensitive to it.  It is <em>advanced</em> and should not be taken lightly, but it should be taken on... but not by uploading a module to CPAN.</p>

<p>If you're a guy, and you're considering something like this, don't just do the usual thing and hack it together alone on a weekend.  Get some people together and talk about it beforehand.  Get some public commentary.  Get some women to look at it, not just one woman ("my girlfriend said it was ok" does not work), particularly those outside the mainstream community.  Most importantly, get women involved <em>and empowered</em> in the project.  Maybe give them the reins to make the big decisions and you handle the grunt work.  If you want to do a project to empower women, it helps if you're empowering women within the project.</p>
]]>
    </content>
</entry>

<entry>
    <title>Further Reading For The YAPC::NA 2012 Keynote</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2012/06/yapcna2012-keynote-refs.html" />
    <id>tag:blogs.perl.org,2012:/users/michael_g_schwern//786.3380</id>

    <published>2012-06-13T14:14:23Z</published>
    <updated>2012-06-13T04:20:37Z</updated>

    <summary>This is a placeholder post to house links to further reading from my YAPC::NA 2012 keynote....</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    <category term="yapc" label="yapc" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p>This is a placeholder post to house links to further reading from my <a href="http://yapcna.org/">YAPC::NA 2012</a> keynote.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Test::Builder2 vs CPAN and How You Can Help</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2012/02/testbuilder2-vs-cpan-and-how-you-can-help.html" />
    <id>tag:blogs.perl.org,2012:/users/michael_g_schwern//786.2878</id>

    <published>2012-02-28T01:34:18Z</published>
    <updated>2012-02-28T02:18:34Z</updated>

    <summary>Test::Builder2 is now down to two issues. The problem of using Mouse The problem of backwards compatibility The former is complicated, but suffice it to say TB2 cannot rely on Mouse or Moose or Moo. It&apos;s being solved by writing...</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    <category term="tb2" label="TB2" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p>Test::Builder2 is now down to two issues.</p>

<ol>
<li>The problem of using Mouse</li>
<li>The problem of backwards compatibility</li>
</ol>

<p>The former is complicated, but suffice it to say TB2 cannot rely on Mouse or Moose or Moo.  It's being solved by writing an OO compiler, something which will generate accessor methods and roles at build time rather than relying on a runtime compiler.  This should also solve TB2's less than ideal startup time.  It might be <a href="https://github.com/schwern/Mite">Mite</a> or it might be Moo, but the problem is being taken care of.</p>

<p>The second is harder and is what I call "Test::Builder2 vs CPAN".  Because Test::Builder has been around for so long, and so much depends on it indirectly, there's a lot of not entirely documented behavior being relied on.  We've been using CPAN modules as a broad test suite right along for this reason.  TB2 has the potential to seriously break a lot of modules' test suites, so it's best to get it as right as possible before a stable release.</p>
]]>
        <![CDATA[<p>Andreas, using the incredible <a href="http://analysis.cpantesters.org">analysis.cpantesters.org</a>, has provided a list of <a href="https://github.com/schwern/test-more/issues/249">the top 100 modules which are failing because of TB2</a>.  That's a lot more than I thought at this point, honestly.  Worse, I was hoping they were mostly failing for a handful of reasons.  Unfortunately it seems they're unique.</p>

<p>I'm asking for two ways people can help.</p>

<p>First, if you're the author of a module which is failing because of TB2 I ask you to <a href="https://github.com/schwern/test-more/issues/new">please report it to TB2</a> <strong>even if the problem turns out to be your own</strong> (like if you violated encapsulation) and <strong>even if you can fix it on your own</strong>.  We want to know how Test::Builder is being used.</p>

<p>More importantly, if your CPAN module is failing then probably other software we don't know about will fail.  We want to know about these problems and be able to decide what to do about them.  So please, report any trouble you have with TB2.  Also report any successes you have with TB2.  Really just talk to us about how TB2 is affecting you.</p>

<p>Second, I'd like to crowd source some bug analysis.  Each failure in <a href="https://github.com/schwern/test-more/issues/249">the list</a> needs to be examined to see <em>what</em> is failing and then <em>why</em> it is failing before anything can be done about it.  It would be a great help if this could be tackled by people taking a handful of them.</p>

<p>For each failing release...</p>

<ul>
<li>Make sure nobody's already done it.</li>
<li>Create a <a href="https://github.com/schwern/test-more/issues/new">new issue</a> in the TB2 queue</li>
<li>Put the release in the title, ex: "XML-LibXML-SAX-ChunkParser-0.00004"</li>
<li>Add the "CPAN Regression" label</li>
<li>Add a link to one illustrative failing test report</li>
<li>Mention <a href="https://github.com/schwern/test-more/issues/249">issue #249</a> in the description</li>
</ul>

<p>If all you do is that, that would be a great help!  I can't stress this enough.</p>

<p>If you'd like to go further...</p>

<ul>
<li>Run the release's tests with Test::More 0.98 and 1.5 to verify the failure.</li>
<li>Post the relevant bits of the failing test in the issue's comments.</li>
</ul>

<p>This provides a list of pre-digested issues for people to examine and determine how to fix.  If you'd like to go even further you can do some of that analysis yourself.</p>

<ul>
<li>Provide an analysis of what caused the failure.</li>
</ul>

<p>If you really want to be a champ...</p>

<ul>
<li>Write a test for TB2 to test the regression.</li>
</ul>

<p>And if you're bucking for my job...</p>

<ul>
<li>Write a patch for TB2 to fix it.</li>
</ul>

<p>There are 100 releases to go through, a lot for me, but not much for a few dozen people who read this blog and do a few.  <a href="https://github.com/schwern/test-more/issues/257">Here's a good example</a> of what the resulting issue looks like.</p>

<p>Thanks crowd!</p>
]]>
    </content>
</entry>

<entry>
    <title>A Real Developer Challenge</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2012/02/a-real-developer-challenge.html" />
    <id>tag:blogs.perl.org,2012:/users/michael_g_schwern//786.2759</id>

    <published>2012-02-03T23:21:00Z</published>
    <updated>2012-02-04T02:23:28Z</updated>

    <summary>Spotify is having a coding challenge to find &quot;top-notch talent to join our NYC team&quot;. The challenge is to solve the most algorithmic puzzles in four hours... alone. &quot;You may not cooperate with anyone, and you may not publish any...</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p><a href="https://codequest.spotify.com/">Spotify is having a coding challenge</a> to find "top-notch talent to join our NYC team".  The challenge is to solve the most <a href="http://www.spotify.com/us/jobs/tech/">algorithmic puzzles</a> in four hours... alone.  "You may not cooperate with anyone, and you may not publish any discussion of solutions."  What sort of developer will win this competition?  Someone who is quick, dirty, has a mathematical mindset and is lucky enough to write something that happens to work for the test data set.  The "rockstar".  Is this somebody you want on your team?  Would you want to maintain their code?</p>

<p>Last year while on contract, the company in question was passing around their coding problem they used to test new hires.  It was pretty typical stuff: give the data going in, the data they want out, and write a little program to do the transform.  They even supplied most of the program, including a test; the prospective hire just needed to write one sort subroutine which could deal with "Low", "Medium" and "High" as well as numbers.</p>

<p>Predictably, this halted all coding in the office for a solid half day while everyone figured out the most clever way to sort the data.  My opus was to observe that the input data was already sorted, so I redefined the <code>shuffle()</code> routine.  The best one was from a co-worker who observed that "Low", "Medium" and "High" sort correctly by their last letter in reverse order.  It was fun for us, but it wasn't very useful.</p>
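
<p>For the curious, here is a minimal sketch of that last-letter trick.  It is not my co-worker's actual code, the variable names are made up for illustration, and it ignores the numbers the real problem also had to handle.</p>

<pre><code>use strict;
use warnings;

my @levels = qw(High Low Medium);

# Compare only the last letter of each word, in reverse order:
# "w" (Low), then "m" (Medium), then "h" (High).
my @sorted = sort { substr($b, -1) cmp substr($a, -1) } @levels;

print "@sorted\n";    # prints "Low Medium High"
</code></pre>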

<p>This is a pretty typical coding problem used to judge potential hires, and it sucks.  All it tells you is the candidate is not completely incompetent.  Why do we keep using them?  They're easy.  They're easy to think up, easy to judge and easy to administer.  They're also the sort of clean, algorithmic problems a stereotypical programmer loves to solve.  Do they have anything to do with detecting a good developer?  No.  Can we fix it?  Yes!</p>
]]>
        <![CDATA[<p>Before this can be fixed, first we have to work out what a project wants in a developer.  What do developers do all day?  We can work this out by reversing every contrived element of our typical programming contest: clean, well defined inputs; clean, well defined expected output; a clear description of the algorithm; pre-existing template code; pre-existing acceptance tests.  When's the last time you were handed a problem like that in the real world?</p>

<p>Instead, we get poorly defined inputs, sample input that has anything to do with reality being a luxury, inputs riddled with mistakes.  Expected behavior and output are vaguely defined.  As a developer we're presented with a clean sheet, no template, no tests, just a blinking cursor and a blank page.</p>

<p>How does one solve the contrived example?  First the specification, inputs and outputs are carefully examined.  If there's any ambiguity it's discussed up front with the person who probably wrote the problem and understands it perfectly.  Then the candidate goes off and writes some code until it passes the tests.  There is little or no interaction with people, everything is handed on a silver platter in unambiguous terms.</p>

<p>How does one solve the real example?  First you have to find somebody who understands the problem, usually not a programmer, and discuss the problem.  Then you drag some samples out of them, converting them into a format you can actually use.  The user probably doesn't know what they really want, so the behavior/output will be ill defined.  Pressed, the user will make up something that you know will be wrong as soon as you show it to them.  Armed with this "information" you hammer it into some sort of algorithm and now need to write some code.</p>

<p>But you don't just hammer out a script.  You need to do it in a way that matches the team's coding style.  You'll probably want to make the meat of it reusable, so you need to write it as a library not just a one-off script.  It'll have to deal with the inevitable bad input, which means good error handling and recovery.  Other people will have to use it, which means good documentation.  It needs tests written in a way which works with whatever integration server the team is using.  And, of course, it should all be checked into version control with well defined and logged commits.</p>

<p>And then, when you've written all that and got it working, you take it back to the user and they tell you it's not what they wanted.  Or they show you some new input that doesn't match what they originally said.  If you're good... and lucky, your code is robust enough to handle it.  If you're not... back to coding with you!  Repeat until dead.</p>

<p>In that light, a good developer is one who works well with others, but also can make a multitude of small, detailed decisions about a problem they know very little about.  They need to take vague requirements and work with a user to turn them into something a computer can do repeatedly.  A good developer looks beyond the immediate requirements and thinks ahead ensuring the code is flexible enough to withstand future change.  A good developer writes code not for themselves, but for everyone else on the team.</p>

<p>So... how do you test all that?  And in less than an hour?  Turns out it's pretty easy with some simple modifications to the classic example.  You keep the same basic algorithmic problem, but you give it to the candidate the way a <em>user</em> would.  It can be as simple as this:</p>

<blockquote>
  <p>Could you sort this data please?  [Excel document attached]</p>
</blockquote>

<p>You can be a bit more clever, feeding the candidate what seems like enough information to do the work but is full of subtle ambiguities, tempting them to take the easy route and just get to coding, weeding out those with the undesirable tendency to program without questioning what they're doing and why, but that's the crux of it.  </p>

<p>Now sit back and see what the candidate does with that.  Respond to their questions, but remain in the persona of a user.  What you're looking for here is how they push back.  What sort of questions do they ask to gather and clarify the requirements?  What sort of assumptions do they make?  How well do they bridge the communications gap between customer and programmer?</p>

<p>Given a blank page to code on, what do they do with that?  Do they write tests and documentation?  Do they write a quick script or a library?  Do they use version control?  Do they ask about your code standards?  Do they use pre-existing libraries?  Do they write to the letter of the requirements or leave some flexibility?  And again, what are their communications like through this process?</p>

<p>Once they've submitted their first solution, change the requirements on them subtly by offering a second set of inputs.  These will be the same but subtly different.  Maybe throw in some Unicode or deliberately malformatted lines.  Maybe change the sort criteria.  Does their code fail gracefully?  Messily?  Silently?  Do they notice the changes?  What shape does the discussion over the change take?  Are they indignant about the changing requirements, or do they take them in stride?  How much work does it take them to adapt their code to the new requirements?</p>

<p>This test will take more time than a traditional puzzle test, but it can be administered easily enough over email and doesn't involve more than a few minutes' attention at any given time.  A dud developer can sink your team and cause far more damage than good, so it's worth the little bit of extra effort to find out if they can actually do what developers do, and not just solve clever puzzles.</p>
]]>
    </content>
</entry>

<entry>
    <title>True + True == 2</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2012/02/true-true-2.html" />
    <id>tag:blogs.perl.org,2012:/users/michael_g_schwern//786.2765</id>

    <published>2012-02-03T22:47:28Z</published>
    <updated>2012-02-04T18:39:11Z</updated>

    <summary>I&apos;ve seen lots of new Perl programmers confuse 1 and true before, usually something like this: do_something if boolean_function() == 1; This is not only redundant, but it&apos;s also brittle and redundant. In Perl, it&apos;s very easy for a boolean...</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p>I've seen lots of new Perl programmers confuse 1 and true before, usually something like this:</p>

<pre><code>do_something if boolean_function() == 1;
</code></pre>

<p>This is not only redundant, but it's also brittle and redundant.  In Perl, it's very easy for a boolean function to return something other than one.</p>

<pre><code>sub has_stuff {
    ...
    return scalar @stuff;
}
</code></pre>
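
<p>To make that concrete, here's a minimal, self-contained sketch.  The contents of <code>@stuff</code> are made up for illustration; the point is only that a count of three is true but is not equal to one.</p>

<pre><code>use strict;
use warnings;

# Hypothetical data, just for this example.
my @stuff = ('a', 'b', 'c');

# Returns the number of elements: true, but not 1.
sub has_stuff {
    return scalar @stuff;
}

print "plain truth test passed\n" if has_stuff();
print "== 1 test passed\n"        if has_stuff() == 1;
</code></pre>

<p>Only the first line prints.  The <code>== 1</code> version silently treats "three things" as false.</p>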

<p>This <a href="https://github.com/Perl-Toolchain-Gang/Module-Build/blob/bdb3e896cbecabfbc5b3e8fa6d705246a462107f/lib/Module/Build/Base.pm#L41">code I found today in Module::Build::Base</a> takes the cake.</p>

<pre><code>if ( $self-&gt;check_prereq + $self-&gt;check_autofeatures != 2) {
</code></pre>

<p>(<a href="https://github.com/Perl-Toolchain-Gang/Module-Build/commit/fce7c9ccdf619f70cbe9fb957b626cd420527102">Yes, I fixed it</a>)</p>

<p><strong>UPDATE</strong>:  Whoops, I broke it!  As rsimões pointed out in the comments, the original does not short circuit and my change does.  That was not a bug, both check routines should always run (a consequence of the checks having side effects).  I had to think about it a bit, and the simplest thing I came up with is:</p>

<pre><code>if( grep { !$_ } $self-&gt;check_prereq, $self-&gt;check_autofeatures ) {
</code></pre>

<p>Which goes to show, and I should have known this myself, don't assume the author is an idiot.  There's usually a reason why they did what they did.  Find it before touching the code.</p>
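
<p>If the difference isn't obvious, here's a small sketch of it.  <code>check_a</code> and <code>check_b</code> are hypothetical stand-ins for the real check methods, and the short-circuiting version is written with <code>and</code> as an assumed example of that kind of rewrite, not necessarily the exact change I committed.</p>

<pre><code>use strict;
use warnings;

# Hypothetical stand-ins for the two checks; each records that it ran.
my @ran;
sub check_a { push @ran, 'a'; return 0 }
sub check_b { push @ran, 'b'; return 1 }

# Short-circuiting "and": check_b is skipped once check_a returns false.
my $both_ok   = ( check_a() and check_b() );
my @short_ran = @ran;

# grep evaluates its whole argument list first, so both checks run.
@ran = ();
my $failures = grep { !$_ } check_a(), check_b();
my @grep_ran = @ran;

print "short-circuit ran: @short_ran\n";    # prints "short-circuit ran: a"
print "grep ran: @grep_ran\n";              # prints "grep ran: a b"
</code></pre>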
]]>
        

    </content>
</entry>

<entry>
    <title>Help Test::Builder 1.5</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2011/11/help-testbuilder-15.html" />
    <id>tag:blogs.perl.org,2011:/users/michael_g_schwern//786.2443</id>

    <published>2011-11-14T22:16:17Z</published>
    <updated>2011-11-14T22:37:46Z</updated>

    <summary>To keep myself focused on getting a feature complete Test::Builder 1.5 out this month, I&apos;ve been writing down non-critical tasks rather than doing them myself. Refactorings, documentation and interface fixups. They&apos;re helpfully categorized: Easy are things which are easy. Gardening...</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p>To keep myself focused on getting a feature complete Test::Builder 1.5 out this month, I've been writing down non-critical tasks rather than doing them myself.  Refactorings, documentation and interface fixups.  They're helpfully categorized:</p>

<ul>
<li><a href="https://github.com/schwern/test-more/issues?labels=Easy%2CTest-Builder2&amp;sort=created&amp;direction=desc&amp;state=open&amp;page=1">Easy</a> are things which are easy.</li>
<li><a href="https://github.com/schwern/test-more/issues?labels=Gardening%2CTest-Builder2&amp;sort=created&amp;direction=desc&amp;state=open&amp;page=1">Gardening</a> are refactorings and other cleanups.</li>
<li><a href="https://github.com/schwern/test-more/issues?labels=Docs%2CTest-Builder2&amp;sort=created&amp;direction=desc&amp;state=open&amp;page=1">Docs</a> are for documentation.</li>
</ul>

<p>If you'd like to help, I'd love the help.  There's plenty to do for everybody.  Our <a href="https://github.com/schwern/test-more/wiki/Preferred-workflow">preferred workflow</a> is laid out and pretty easy to follow.  There's a <a href="https://github.com/schwern/test-more/blob/Test-Builder1.5/lib/Test/Builder2/Design.pod">design document</a> to give you an overview of what's going on, though it is out of date in places.</p>
]]>
        <![CDATA[<p>When the Google Code In starts, these tasks will be made available for GCI students.</p>

<p><strong>About Test::Builder2</strong></p>

<p>As you may or may not know, a major rewrite of Test::Builder (the thing which coordinates and does the heavy lifting for the Test modules) has been in the works for some time.  Called Test::Builder2, it has a grant from The Perl Foundation.  The grant's final deadline is at the end of the month and I'm frantically coding to get a feature complete alpha out in time.</p>

<p>This release is being referred to as Test::Builder 1.5.  It contains all the internal reworkings of Test::Builder2, but in order to save time the new class to replace Test::Builder itself (which is also called Test::Builder2) is not done.</p>

<p>But users still get to use and benefit from the new internals!</p>
]]>
    </content>
</entry>

<entry>
    <title>The End Of 5.6 Is Nigh!</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2011/11/the-end-of-56-is-nigh.html" />
    <id>tag:blogs.perl.org,2011:/users/michael_g_schwern//786.2433</id>

    <published>2011-11-12T23:50:24Z</published>
    <updated>2011-11-13T00:14:40Z</updated>

    <summary>It&apos;s that time again! Time when I hammer the last few nails in the coffin of a version of Perl. A few years back, I killed 5.004 and 5.005 in a stroke by upping the minimum version of Test::More, upon...</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p>It's that time again!  Time when I hammer the last few nails in the coffin of a version of Perl.  A few years back, I killed 5.004 and 5.005 in a stroke by upping the minimum version of Test::More, upon which 80% of CPAN relies, from 5.004 to 5.6.0.  In a few months I'll be doing it again.</p>
]]>
        <![CDATA[<p>The next major release of Test::More (aka Test::Builder1.5) will support 5.8.1 and up.  ExtUtils::MakeMaker will probably go that way, too.  This effectively cuts off most of CPAN from 5.8.0 and down.  It will happen in the next few months.</p>

<p>Test::More might nudge its requirement a little higher depending on just how difficult it is to work around threading bugs in the earlier 5.8 releases.</p>

<p>At this point I don't imagine this will cause too much disruption.  There aren't many serious 5.6 users left and whoever is left has to have already come up with some sort of 5.6PAN solution.</p>

<p>Dropping 5.6 will ease maintenance and testing of these modules and make a greater baseline of core modules available.  However, if a company or organization would like to see 5.6 compatibility retained, they can contact me about sponsoring the extra work for continued maintenance.</p>

<p>Thanks to Sarathy and rgs for some great and long lived releases!  It's been a good eight years.</p>

<p>Schwern</p>

<blockquote>
  <p>Slayer of Pseudo-Hashes</p>

<p>Defender of Lexical Encapsulation</p>

<p>Destroyer of Perl Versions</p>
</blockquote>
]]>
    </content>
</entry>

<entry>
    <title>How (not) To Load a Module or Bad Interfaces Make Good People Do Bad Things</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2011/10/how-not-to-load-a-module-or-bad-interfaces-make-good-people-do-bad-things.html" />
    <id>tag:blogs.perl.org,2011:/users/michael_g_schwern//786.2244</id>

    <published>2011-10-02T17:44:37Z</published>
    <updated>2011-10-23T07:21:42Z</updated>

    <summary>tl;dr version: The design of require makes it all but impossible to use in a secure and correct fashion. To fix this, Perl needs two new ops: one which will only load files, and one which will only load modules....</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    <category term="require" label="require" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="security" label="security" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p><strong>tl;dr version</strong>:  The design of <code>require</code> makes it all but impossible to use in a secure and correct fashion.  To fix this, Perl needs two new ops: one which will <em>only</em> load <em>files</em>, and one which will <em>only</em> load <em>modules</em>.  Both would <em>only</em> load from <code>@INC</code>.</p>

<p>The more I look into the problem, the more I'm convinced that there is no good way to load a module from a variable in Perl.  None of the existing techniques or modules fully solve the problem.  They all have security holes or limitations.  This is kind of embarrassing: it's an easy thing and it should be easy.  My investigation into how many ways the simple act of loading a module can go wrong has led me to believe that the solution is a new op which just loads modules.</p>

<p>For those of you wondering, near as I can tell this is how you correctly and securely load a module from a variable...</p>

<pre><code>sub require_module {
    my $module = shift;

    # Is it defined?
    die unless defined $module;

    # Is the caller using utf8?
    require utf8;
    my $with_utf8 = (caller(0))[8] &amp; $utf8::hint_bits;

    # Are Unicode package names ok?
    my $check = $with_utf8 ? qr{\A [[:alpha:]_] [[:word:]]*    (?: :: [[:word:]]+ )* \z}x
                           : qr{\A [A-Z_a-z]    [0-9A-Z_a-z]*  (?: :: [0-9A-Z_a-z]+  )* \z}x;

    # Is it a syntactically valid module name?
    die unless $module =~ $check;

    # Transform to a pm file path
    my $file = $module;
    $file .= ".pm";
    $file =~ s{::}{/}g;

    # What were we doing again?
    return require $file;
}
</code></pre>

<p>Isn't that EASY?!  Nothing I've looked at gets this all correct.  And please don't cut and paste that into your code, fix one of the module loading modules instead.</p>

<p>This all started when I <a href="http://stackoverflow.com/questions/7598425/what-are-the-common-pitfalls-when-using-perls-eval/7603939#7603939">mentioned on StackOverflow</a> that having your compiler and your exception handler share the same function is a security hole by way of a bad interface.  It encourages people to use them interchangeably without really thinking about the consequences.  The common example I came up with was <code>eval "require $module"</code> which <a href="http://www.google.com/codesearch#search/&amp;q=eval%5C%20%5C%22require%5C%20%5C$%20lang:%5Eperl$&amp;type=cs">happens a lot</a>.</p>

<p>The real fun came when I tried to fix it.</p>

<p>See, I found a security hole in a major module (details withheld because I'd rather not point it out too conspicuously before it gets updated) which allows arbitrary code execution.  Whoops.  Yes, it did filter $module... or it tried to anyway.  And yes, it was written by a very competent programmer.  Loading a module from a variable is hard.  It should be easy.  Easy things should be easy.</p>

<p>My first solution was to replace it with <code>eval { require $module }</code>.  Yeah, it's wrong.  I was in a rush.  My superhuman strength was needed to lift boxes, some of which were solidly packed with metal bars.  It may have been a bank heist, the details are hazy.  Funny enough, the module's tests did not catch the mistake (they do now).</p>

<p>If you know why it's wrong, skip ahead to the next <code>tl;dr</code>.  The rest of you sit down and learn where this whole problem comes from.</p>

<p><code>require</code> is really two functions and that's where the whole problem comes from.  Are you seeing a theme?  If you pass require a variable, like <code>require $module</code>, it will consider that to be a file path and try to load it, appending it to each entry in @INC assuming the path is not absolute (that becomes a concern later).  If you pass <code>require</code> a bareword, like <code>require Module::Name</code>, it will consider that to be a module, validate it (probably the Perl compiler validates it), transform it into a .pm file path and then do what <code>require $scalar</code> does.</p>

<p>Got that?  Confused?  It's hard to keep straight, and that's half the problem.  Let's make it clear(er):</p>

<p><code>require Bareword::Name</code></p>

<ul>
<li><code>Bareword::Name</code> treated as a module name.</li>
<li>Checked it is valid module name syntax.</li>
<li>Transformed into a relative .pm file (ex. <code>Bareword/Name.pm</code>)</li>
<li>Searched through <code>@INC</code>.</li>
</ul>

<p><code>require $file</code></p>

<ul>
<li>$file treated as a file path.</li>
<li>No checks are done.</li>
<li>Absolute paths and paths starting with "./" are simply loaded.</li>
<li>Other paths search through <code>@INC</code>.</li>
</ul>
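
<p>A minimal sketch of the difference, assuming a stock Perl install where the core module <code>File::Spec</code> is available.  The variable names are mine, purely for illustration.</p>

<pre><code>use strict;
use warnings;

# Bareword: treated as a module name, turned into File/Spec.pm
# and searched for in @INC.
require File::Spec;
print "bareword loaded File/Spec.pm\n" if $INC{'File/Spec.pm'};

# Variable: treated as a file path.  Perl looks for a file literally
# named "File::Spec" and, on a normal install, does not find one.
my $module = 'File::Spec';
my $ok = eval { require $module; 1 };
print "require \$module failed: $@" unless $ok;
</code></pre>

<p>Same name, two different lookups, and nothing in the syntax warns you which one you asked for.</p>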

<p><code>require Bareword::Name</code> is safe in that it will load a .pm in your <code>@INC</code> <em>and nothing else</em>.  If the content of @INC can be trusted, you're good.  (If it can't, you shouldn't be wasting time reading overly verbose security blog posts.  Fix that shit now.)</p>

<p><code>require $file</code> is not safe.  It does no validation.  It can be used to load <em>any valid Perl on the filesystem</em>.  Got that?  If an attacker injects a file into <code>/tmp</code> an unprotected <code>require $file</code> might let them execute it with elevated privileges.  This is bad.  It gets worse.</p>
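
<p>Here is a small sketch of that, using a scratch directory from the core <code>File::Temp</code> module to stand in for <code>/tmp</code>.  The file contents and the <code>$loaded_from_outside</code> flag are made up for the demonstration.</p>

<pre><code>use strict;
use warnings;
use File::Temp qw(tempdir tempfile);

# A scratch directory that is NOT in @INC, standing in for /tmp.
my $dir = tempdir( 'CLEANUP', 1 );
my ( $fh, $file ) = tempfile( 'DIR', $dir, 'SUFFIX', '.pm' );

# The "attacker's" file: any valid Perl that returns a true value.
print {$fh} q{$main::loaded_from_outside = 1; 1;};
close $fh or die "close failed: $!";

# $file is an absolute path, so require never consults @INC.
our $loaded_from_outside;
require $file;
print "loaded $file from outside \@INC\n" if $loaded_from_outside;
</code></pre>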

<p>Trouble is, <code>require Bareword</code> lets you load from a bareword and not a variable.  This is how Perl knows you're asking for a module and not a file.  Rather than have two clearly defined functions we have clever syntax.  Yaaay... guh.</p>

<p>The user is forced into two ways around this inflexibility, both unpalatable.  Either you write <code>eval "require $module"</code> to trick Perl into thinking $module is a bareword, or you take it upon yourself to transform $module into a pm file path and use <code>require $file</code>.  Both are security holes.</p>

<p>Welcome back <code>tl;dr</code> readers!  Wipe the drool off the side of your face, we're not done.</p>

<p>Some of you are thinking, "I'm not stupid.  I can write <code>eval "require $module"</code>!  All I have to do is validate $module or make sure it comes from a trusted source!"  Maybe.</p>

<p>Once upon a time my friend was showing off his new Glock.  He pointed out the lack of a conventional safety; instead there are a number of very clever things to ensure the gun will not fire unless the trigger is deliberately pulled.  On the other hand, a loaded Glock will <em>always</em> fire if the trigger is deliberately pulled.  This is a great feature if you are, say, a trained police officer.  If you're a civilian, this is a great way to shoot yourself or your house guest in the foot.  My friend was so confident in the safety of his gun he bragged he could hammer nails with it.</p>

<p>Just because you can hammer nails with a gun doesn't mean you should build a house with it.  Just because you think you can make <code>eval "require $module"</code> safe doesn't mean it is.  Or that everyone else will.  The best security is the security you don't have to be careful about.  You always have to be careful with <code>eval STRING</code>, every single time.</p>

<p>You will get lazy and not validate.  Or a clever attacker will figure a way around your validation.  Or your validation will reject valid module names.  Or code which previously only took safe $module is later changed to take unsafe user input.  I've seen all of this.  This is not the path to security.</p>
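
<p>In case it isn't obvious what "unsafe" means here, a minimal sketch.  The attacker-controlled string and the <code>$pwned</code> flag are hypothetical; <code>strict</code> is just a module that is certain to load.</p>

<pre><code>use strict;
use warnings;

our $pwned = 0;

# Hypothetical attacker-controlled "module name".  The require of
# strict succeeds, and everything after the semicolon runs as code.
my $module = 'strict; $main::pwned = 1';

eval "require $module";
print "arbitrary code ran\n" if $pwned;
</code></pre>

<p>Swap that harmless assignment for anything you like.  That's the hole.</p>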

<p>So what about <code>require $file</code>?  Because it's so easy to jump the <code>@INC</code> rails, it starts out insecure by design.  Here's how...</p>

<p>Let's say an attacker has found a way to get files onto your filesystem, doesn't matter what owner.  This is already a security hole, but they don't have any way to execute the file.  A cracker will now find a way to escalate their privilege, stacking one minor security hole onto another to make a great big one.  An unprotected <code>require $file</code> is all they need to execute it, because it will load any absolute path.  Already <code>require</code> is a security risk because you have to jump through more hoops to make it stick to <code>@INC</code>.</p>

<p>Filter for absolute paths and you're good?  No, also paths starting with <code>./</code>.  And <code>../</code> because those will jump out of <code>@INC</code> too.  Don't forget to check for <code>.\</code> and <code>..\</code> on Windows!  Writing that down?  Good.  Save space, there's more.</p>
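
<p>For the note-takers, a rough sketch of what that filter might look like.  <code>path_escapes_inc</code> is a hypothetical helper, not something from an existing module, and it is deliberately paranoid: it rejects any <code>.</code> or <code>..</code> segment anywhere in the path with either kind of slash, so it will also turn away some harmless paths.  It makes no attempt to cover platforms other than Unix and Windows.</p>

<pre><code>use strict;
use warnings;

# Hypothetical helper: true if $path could make require leave @INC.
sub path_escapes_inc {
    my $path = shift;

    # Absolute path, either slash.
    return 1 if $path =~ m{\A [/\\]}x;

    # Windows drive letter, like C:
    return 1 if $path =~ m{\A [A-Za-z] :}x;

    # Any "." or ".." segment, anywhere.
    return 1 if grep { $_ eq '.' or $_ eq '..' } split m{[/\\]}, $path;

    return 0;
}

for my $path ( 'Foo/Bar.pm', './evil.pl', '..\\evil.pl', '/tmp/LOL/PWND.pm' ) {
    printf "%-18s %s\n", $path, path_escapes_inc($path) ? 'reject' : 'ok';
}
</code></pre>

<p>And that's just the path.  It says nothing about whether the thing is a module.</p>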

<p>This is about loading a module, not any old file.  Usually people will first transform the module into a file path.  That doesn't seem so hard, right?</p>

<pre><code>my $file = $module;
$file .= ".pm";
$file =~ s{::}{/}g;
require $file;
</code></pre>

<p>Done?  No.  It's not even totally valid, it doesn't handle the <code>'</code> namespace separator, but only <a href="http://www.youtube.com/watch?v=O8n1XbRaZJ4#t=0m12s">Klingons and hippies</a> care about that.  No, it's worse.  It's a security hole and one that is all over our Perl code.</p>

<p>Remember how <code>require $file</code> can be made to jump the rails and load any file?  You'd think the module transformation would prevent that, but no.  Consider the "module" <code>/tmp/LOL/PWND</code> which the code above happily loads as <code>/tmp/LOL/PWND.pm</code>.  Ok, maybe your code has some validation and only lets things that look like module names in.  If the validation isn't well written it might allow <code>::tmp::LOL::PWND</code>.</p>
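
<p>To see it without actually loading anything, here is the same transformation wrapped in a helper.  The name <code>naive_module_to_file</code> is made up for this illustration.</p>

<pre><code>use strict;
use warnings;

# The usual module-to-path transformation, minus the require,
# so the result can be inspected safely.
sub naive_module_to_file {
    my $file = shift;
    $file .= ".pm";
    $file =~ s{::}{/}g;
    return $file;
}

for my $module ( 'Foo::Bar', '/tmp/LOL/PWND', '::tmp::LOL::PWND' ) {
    printf "%-18s becomes %s\n", $module, naive_module_to_file($module);
}
</code></pre>

<p>The last two both come out as the absolute path <code>/tmp/LOL/PWND.pm</code>, which <code>require</code> will load without ever looking at <code>@INC</code>.</p>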

<p>The combination of a salad of insecurities makes loading a module almost impossible to properly secure unless you are very knowledgeable and very careful.</p>

<p>That's fine, in situations like this the smart and safe thing is to use a module!  Then lots of people who think about this stuff can get it right.  Right?  Surely a module dedicated to the one act of loading a module has thought this through?  Right?  Please?</p>

<p>Nope.</p>

<p><a href="https://metacpan.org/module/perl5i#require">perl5i</a> gets it wrong.  <code>$module-&gt;require</code> does not validate <code>$module</code> and so is vulnerable to the <code>::tmp::LOL::PWND</code> trick.</p>

<p><a href="https://metacpan.org/module/Module::Load">Module::Load</a> will jump out of <code>@INC</code> just like <code>require</code> does.</p>

<p><a href="https://metacpan.org/module/Class::Load">Class::Load</a> comes close.  I haven't been able to make it do anything insecure, but it tells me that things which are not valid module names, like "123", are valid, and that things which are valid module names, like "Mégå::Mödulé", are not.  Also it has a <a href="http://deps.cpantesters.org/?module=Class::Load">rather long dependency chain</a> just to safely load a module.</p>

<p><a href="https://metacpan.org/module/UNIVERSAL::require">UNIVERSAL::require</a> <strike>works great</strike> is vulnerable to shenanigans.  By making require a class method call, it uses Perl's own idea of what a valid package name is.  Unfortunately, what the Perl parser considers a package name and what the Perl runtime considers a package name are different.  The runtime will accept all sorts of junk for the purposes of a class method call, like <code>subdir/../../../../../tmp/LOL/PWND</code>.  For fuck's sake.</p>

<p>Even if a module is made convenient, correct and secure, people will still avoid adding a CPAN dependency for something as "trivial" as loading a module.  The CPAN modules should be fixed to be sure, but they are not the long term solution.</p>

<p>The design of <code>require</code> makes it almost impossible to secure.  Those modules went past a lot of eyeballs, including mine, and they still don't work right.  While it is good to patch those modules, what is needed is a better <code>require</code> built right into Perl.  More precisely, two.</p>

<p>The first, let's call it <code>require_module</code>, loads modules.  It <em>only</em> loads modules in <code>@INC</code> and does nothing else.  By doing this inside Perl it can perfectly validate the module name avoiding both validation mistakes and security holes.</p>

<p>The second, let's call it <code>require_file</code>, loads files.  It <em>only</em> loads files from inside <code>@INC</code>.  Absolute paths and those which try to updir are rejected.  This allows a programmer to be sure what Perl code can be loaded is at least coming from secured locations.</p>

<p><code>require_module</code> is easy, it can be carved out of the existing require code without much trouble (uhh... as far as writing new keywords in Perl goes).  <code>require_file</code> is made difficult because Perl lacks built-in file path operations.  Detecting an absolute path is pretty straightforward, but detecting one that's trying to updir is not.  That requires a complete file path parsing library... in C.</p>

<p>Fortunately, only <code>require_module</code> is needed to solve our problem.  With an in core way to securely and correctly load a module from a variable, the impulse to use any other hack fades away with older versions of Perl.  A CPAN module can encapsulate the decision to use the built-in or a pure Perl work around depending on the version of Perl.</p>

<p>I'm writing up the tests and docs now.  I'll be doing the work in <a href="https://github.com/schwern/perl/tree/feature%2Fsplit_require">a branch on Github called feature/split_require</a> if you'd like to contribute.  It especially needs someone who can write the actual C code using something better than my rusty chainsaw.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>And the fastest OO accessor is...</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/michael_g_schwern/2011/03/and-the-fastest-oo-accessor-is.html" />
    <id>tag:blogs.perl.org,2011:/users/michael_g_schwern//786.1614</id>

    <published>2011-03-31T10:21:51Z</published>
    <updated>2011-04-02T02:53:15Z</updated>

    <summary>There&apos;s a lot of FUD out there about the performance of various OO modules, particularly Mouse. So let&apos;s set it straight with some benchmarking. I&apos;ve chosen to simulate that bugaboo of OO performance in Perl, the simple accessor. The one...</summary>
    <author>
        <name>Michael G Schwern</name>
        <uri>http://schwern.net</uri>
    </author>
    
    <category term="benchmarks" label="benchmarks" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="oo" label="oo" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/michael_g_schwern/">
        <![CDATA[<p>There's a lot of FUD out there about the performance of various OO modules, particularly Mouse.  So let's set it straight with some benchmarking.</p>

<p>I've chosen to simulate that bugaboo of OO performance in Perl, the simple accessor.  The one that you're going to call millions of times and that you'll be sorely tempted to reimplement with a hash or tear all the argument checks out of for "performance".  To make it a little more realistic, I'm checking both getting and setting as well as a simple argument check.</p>
]]>
        <![CDATA[<p>Each benchmark gets or sets an integer once via an accessor (except in the case of the plain hash).  The object is created outside the benchmark to avoid distortion.</p>

<p>Each accessor validates that the argument is an integer.</p>

<p>I picked the following modules because they are either popular or make performance claims:</p>

<ul>
<li>Mouse, both XS and pure Perl</li>
<li>Moose</li>
<li>Moo, with and without Sub::Quote</li>
<li>A plain hash</li>
<li>A class written by hand (aka "manual")</li>
</ul>

<p>I also threw in Object::Tiny and Object::Tiny::XS, even though it's not really fair.  They're read only and do no argument checks.  They sacrifice everything for performance.  Let's see if it's worth it.</p>

<p>Finally, I did a hash with no argument checks, just to get an upper bound.</p>

<p>I won't clog this post with <a href="https://gist.github.com/896004">the full benchmark code</a>, but here's what the Moose class looks like for example.</p>

<pre><code>package Foo::Moose;
use Moose;
has bar =&gt; (is =&gt; 'rw', isa =&gt; "Int");
__PACKAGE__-&gt;meta-&gt;make_immutable;
</code></pre>

<p><strong>Results</strong></p>

<p>And the results, from slowest to fastest getter with the cheaters at the bottom.</p>

<pre><code>Name                  Get            Set
-----------------------------------------
Moo                   -11%            -4%   
manual                  0%             0%    
Mouse, no XS            0%           -33%  
Moo w/quote_sub         5%           -46%  
Moose                   8%           -40%  
Mouse                 145%           468%  

manual, no check        0%           153%  
Object::Tiny           20%           n/a   
Object::Tiny::XS      226%           n/a    
hash, no check        279%         1,116%  
hash                  289%           147%
</code></pre>

<p>That's with Perl 5.12.2 on OS X using the latest versions of all those modules as of this writing (Moose 1.24, Mouse 0.91, Moo 0.009007, Object::Tiny 1.08, Object::Tiny::XS 1.01).</p>

<p>The percentages are the % improvement from a hand written class (manual).</p>
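
<p>As a sketch of the arithmetic, assuming the raw numbers are rates (calls per second) and that "improvement" means the relative change against the manual class.  <code>percent_improvement</code> is a hypothetical helper, and the rates below are made up for illustration; they are not the benchmark data.</p>

<pre><code>use strict;
use warnings;

# Percent change of $rate relative to $baseline_rate.
# 0 means "same as baseline", 100 means "twice as fast".
sub percent_improvement {
    my ( $rate, $baseline_rate ) = @_;
    return 100 * ( $rate - $baseline_rate ) / $baseline_rate;
}

# Illustrative rates only, not measured values.
printf "%.0f%%\n", percent_improvement( 200, 100 );
printf "%.0f%%\n", percent_improvement( 90,  100 );
</code></pre>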

<p>You can get all the data <a href="http://blogs.perl.org/users/michael_g_schwern/data/accessor_benchmarks.csv/accessor_benchmarks.csv">as a CSV</a>.</p>

<p><strong>Conclusions</strong></p>

<p>The go-to module is clearly <strong>Mouse</strong> with XS.  The power of Moose, but faster than everything equivalent, faster than writing it by hand, faster (on setting with equivalent validation) than a raw hash, and no required dependencies.  Unless you need the meta capabilities of Moose, there's little reason to use anything else.</p>

<p>If you want a read-only object with no argument checks, use <strong>Object::Tiny::XS</strong>, hands down.</p>

<p><strong>Object::Tiny</strong>, on the other hand, is pretty poky; significantly slower than Mouse with XS.  Given its extreme limitations, if you can use XS then there's no point to Object::Tiny.</p>

<p><strong>Moose</strong> and <strong>pure Perl Mouse</strong> stack up about the same and fare well against a hand written class, though lose out in setting.  You should be getting far more than setting, so you won't be seeing much performance gain by hand writing methods.</p>

<p>The big surprise is the newcomer, <strong>Moo</strong>... but not in a good way.  It's "an extremely light-weight, high-performance Moose replacement" but the numbers don't stack up.  Absolutely creamed by Mouse with XS, it's the slowest getter of them all with a slight edge in setting.</p>

<p>Another surprise is Moo's <strong><code>quote_sub</code></strong>.  This is supposed "to create coderefs that are inlineable, giving us a handy, XS-free speed boost."  The Moo docs suggest using them for "isa" type checks, so I did.  The numbers don't pan out, it's worse than a regular sub.  I'm going to assume it's a bug and I've filed a report.  Curiously, using quote_sub on isa made the getter faster... which makes no sense.</p>

<p>The final conclusion is this: performance is no longer an excuse for not using OO.  Accessors are what will be called most often and they're what tempts programmers to micro optimize and throw out their abstractions.  Mouse and Object::Tiny::XS can blow the doors off what you can write by hand.  The performance improvements in Mouse show that abstraction not only makes the code cleaner, but it allows radical optimizations which you can have just by upgrading to a new version.</p>

<p><strong>Bias</strong></p>

<p>Every benchmark has its biases.  Here are the ones in this benchmark that I can identify.</p>

<p>Of course the version of Perl, operating system and versions of modules all matter.  One particularly large gap will be in Mouse.  Its XS optimizations are a fairly recent thing, so older versions of Mouse will not perform well.</p>

<p>The benchmark uses a very simple type check, an integer.  I chose that because it's what an awful lot of methods are going to use.  Because it's so simple and common Moose, and especially Mouse, have a clear opportunity to optimize for it.  A less common or more complicated type check might have impacted performance more.  A string might provide different numbers.</p>

<p>Finally, I really like Mouse. :-)</p>

<p><strong>Update</strong></p>

<p>Corrected my statement about Mouse being faster than a raw hash.  That only happens when setting with validation.  When I originally did the benchmarks they combined setting and getting.</p>

<p>Added a wider conclusion about the role of performance in choosing an OO system.</p>
]]>
    </content>
</entry>

</feed>
