The Price of Cleverness (YAML is not Safe)
Today I wasted a few hours tracking down this delightful bug:
Undefined subroutine &main::main:: called at
...lib/site_perl/5.12.4/YAML/Mo.pm line 5.
So what does YAML::Mo
line 5 look like?
That's right. That's line 5. I've wrapped it to make it easier to read. When you load the latest YAML
, you load YAML::Mo
and that contains the above monstrosity. And it has a serious bug. Do you see it?
This bug was first reported a couple of months ago but no one could reproduce it. There was speculation that it was related to *{""} stringification, but while it's related, it's sort of a red herring.
I first encountered this bug when I was was using Dancer and was using the excellent Data::DPath (thank's Steffen!) to extract data from a JSON data structure. At first, Dancer
was telling me that YAML
was not installed, but a pull request later, it was giving me the error message I opened this post with. And what a heck of a problem it was to debug.
After a lot of work stripping down Data::DPath
, I managed to get this test case:
use Safe;
BEGIN { Safe->new } # used in Data::DPath
use YAML;
As it turns out, there's a tiny bug in Safe
's calculation of shared code and I've filed a bug report for it. Essentially, it does this:
\&main::;
That creates a reference to a sub whose name is the empty string! Actually, no it doesn't. It appears to autovivify the typeglob slot in the same way a forward declaration will, but it provides no sub body.
# a forward declaration of a sub, but no sub body
sub foo;
Now what does strict
do? Here's some fun:
[Ovid:~] $ perl -le 'my $x = foo; print $x'
foo
[Ovid:~] $ perl -le 'sub foo { "bar" } my $x = foo; print $x'
bar
If you don't enable strict
, Perl will see if a bareword is a subroutine and if it is, it will call it. Otherwise, it will assume that the bareword is a string.
Actually, it's use strict 'subs'
that does this:
[Ovid:~] $ perl -le 'use strict 'subs'; my $x = foo; print $x'
Bareword "foo" not allowed while "strict subs" in use at -e line 1.
Execution of -e aborted due to compilation errors.
At the beginning of the YAML::Mo
code, we have this:
package YAML::Mo;
my $M = __PACKAGE__ . ::;
Because strict
is not enabled, $M
will be assigned the value YAML::Mo::
. Clever, huh? No, not really. When Safe
is loaded, the reference \&main::
in Safe
will create a subroutine effectively named "" (the empty string). When you reference a package variable or subroutine, you're allowed to leave off the main::
if it's in package main
, so if you have this:
package main;
sub foo { 'bar' }
The rest of your code can call foo()
with either main::foo()
or ::foo()
. In this case, here's what happens:
- Perl sees a bare
::
in__PACKAGE__ . ::;
- It first tries to see if
::
is a sub, but sincemain
is optional, it sees::
as a sub inmain::
. - Because of the
Safe
bug, there is now an empty string sub in*main::{CODE}
, but it has no sub body. - So Perl tries to call
*main::
, resulting in the strangeUndefined subroutine &main::main::
error.
Here's the smallest test case to replicate the bug:
[Ovid~] $ perl -e 'BEGIN { \&:: } use Mo'
Undefined subroutine &main::main:: called at
... lib/site_perl/5.12.4/Mo.pm line 3.
So now, any code that used Mo
to generate "inline Mo" code needs to be rereleased to work around this bug because updating Mo
is not enough. You must update the code that inlines Mo
.
The fundamental problem here isn't actually the lack of using strict
(though this problem would not have happened if it was used). It's trying to be too clever. I have often maintained two things:
- Clever programmers often aren't.
- Expert programmers try to make their code as clean and simple as possible.
I have no problem with "clever" code, but only as a proof of concept, for an Acme::
module, or when you have no choice. Willy-nilly applying this to YAML
was not a good decision (Sorry Ingy. I luv ya, but honestly ...).
So everyone using YAML
now has a potential time bomb in their code. If they load Safe
first, or any other code that accidentally creates subs in namespaces where "Mo" thinks there should not be subs, they could fall afoul of this.
And for your amusement, here's a bug report whereby someone gets a commit bit for golfing Mo even further.
And this is the bug report for YAML.
Please don't be clever.
Don't be clever; being great is good enough. ;)
I think the quoted code is shitty code, not clever and there is no point putting all that crap into a single line
I thought YAML.pm basically was a proof-of-concept, and the production implementation was YAML::XS. Certainly I've had problems in the past which were fixed by switching; I would never choose to use YAML in production code.
Its for this reason that I use Moose and not the lighter alternatives that are probably all I need. Give me the real stuff please. I know that Mouse/Moo probably can't be this bad, but I think I will stick with Moose and hope that perl5-MOP goes core eventually to make my Moosey modules lighter.
You do realise that in your example, the single-quoted words in your perl -le commands actually aren't quoted at all, because your inlined single quote stops the shell quoting here? So from perl's point of view, the "bar" in your sub foo() is a bareword, too...
Thanks for catching that Uli. I've fixed it now.
Ovid, i'm not sure if you were aware, but:
The code snippet above is generated from this code:
https://github.com/ingydotnet/mo-pm/blob/master/src/Mo.pm
It is condensed down with an auto-golfer that i helped write and there's honestly a lot less cleverness at work there than one might think. :)
I had this error recently in an EmbPerl project. The error disappeared, when I used YAML directly before Data::DPath:
This crashes:
This does not:
Mithaldu, I'm aware of how it's generated and it's certainly nifty, but this should not be something people rely on in production, much less put into YAML.pm and risking breaking production systems everywhere. And Test::YAML depends on Test::Base which in turn depends on Spiffy. That's not something to give me a lot of confidence. So now we have YAML depending on two OO systems (and the META.yml lists a dependency on a source filter). I want code this important to be conservative and play things safe.
@Joel: Nothing to do with Mouse or Moo, which are not implemented with insane golf-code. Mo is, for a reason I'm not sure to understand.
I would have had nothing against that as long as it had remained in the Mo sandbox. But a depency of YAML? Seriously?
And please, @Ben, you cant say "not to use YAML.pm in production", it's YAML, a natural choice when searching the CPAN for ... YAML.
Anyway, thanks a lot @Ovid for the hard-core and fruitful investigation!
Ovid, for what's worth, i agree. I don't think Mo should have been put in there and if it had to be put in, at least as a module, not copy-pasted. However, i'm fairly sure ingy disagrees because it served its purpose in that position by sussing out issues like this.
Mithaldu: so if I follow you, breaking YAML is a good way to debug Mo?
Good one! :)
...
That's not what i meant, implied or wrote.
@asukrieh:
Second paragraph of YAML.pm's documentation…
I'm OK with agreeing on the fact that YAML.pm is not "robust and fast", but it clearly does not imply that YAML.pm is supposed to be unstable.
I'm sorry, I disagree.
Call it Acme::YAML or YAML::Mo if you like, but with such a generic name... it's clearly a trap, IMHO.
[...] it served its purpose in that position by sussing out issues like this.
Sorry, that's what I understood here.
@oid.philipp-soehnlein.de: Wow, an Embperl user. A dying breed these days...
You should thanks Ingy for his great work, if you are clever try to understand a little bit more and don't judge what you don't know well. That code is generated by Mo::inline, if you have a problem submit a patch on Mo repo ... that's why exists github and open source :P