Safely load untrusted YAML in Perl
Usually people deal with YAML files from trusted sources. But sometimes you want to load YAML that comes from an untrusted source, for example input submitted through a website. This can lead to problems, and this article covers what you can do to make loading safe.
The problems I'll talk about are loading objects, cyclic references and general parsing problems.
I will cover YAML.pm, YAML::Tiny, YAML::XS, YAML::Syck and YAML::PP.
Loading Objects
YAML can serialize objects, and here is a simple example of an object of the class 'Dice', which is an array of integers:
---
dice: !perl/array:Dice
- 3
- 6
# Supported by YAML::XS and YAML::Syck
another: !Dice
- 3
- 6
The resulting object is:
bless( [ 3, 6 ], 'Dice' )
Some modules allow disabling this feature, some don't, and some don't load objects at all.
Note that none of the modules actually loads the specified classes via use. They just bless the data structure, so if the Dice module is not loaded, there shouldn't be a problem.
It might not be obvious that this still can be a security problem. An exploit is not trivial, but possible.
If you know Perl's implementation of object orientation, you probably know that you can add a DESTROY method to your class, which will automatically be called when the object goes out of scope or is set to undef.
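To make this concrete, here is a minimal sketch that uses only core Perl. The Dice class and its @log variable are made up for illustration; the bless call stands in for what a YAML loader effectively does when it sees a tag like !perl/array:Dice. No method is ever called explicitly, yet DESTROY still runs:

```perl
use strict;
use warnings;

package Dice;    # hypothetical class that is already loaded in the program

our @log;        # records what the destructor saw

sub DESTROY {
    my ($self) = @_;
    push @log, "DESTROY called with [@$self]";
}

package main;

{
    # Roughly what a loader does for "!perl/array:Dice [3, 6]":
    # it blesses the data into the class named by the input.
    my $object = bless [ 3, 6 ], 'Dice';
}   # $object goes out of scope here, so Dice::DESTROY runs

print "$_\n" for @Dice::log;    # prints "DESTROY called with [3 6]"
```

The class name and the data both come from the input, so whoever writes the YAML decides which already-loaded DESTROY method runs and what data it sees.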
If you're lucky, it might just generate an unwanted warning, like this example:
use DBI;
use YAML::XS;
my $yaml = <<EOM;
---
!!perl/hash:DBI::st { foo: bar }
EOM
my $data = Load $yaml;
__END__
SV = IV(0x56488cbda908) at 0x56488cbda918
REFCNT = 1
FLAGS = (ROK,READONLY,PROTECT)
RV = 0x56488cbc6340
SV = PVHV(0x56488cbccd90) at 0x56488cbc6340
REFCNT = 1
FLAGS = (OBJECT,SHAREKEYS)
STASH = 0x56488cd099d0 "DBI::st"
ARRAY = 0x56488cbe7b30 (0:7, 1:1)
hash quality = 100.0%
KEYS = 1
FILL = 1
MAX = 7
Elt "foo" HASH = 0x6dd7bf3c
SV = PV(0x56488cbc6e30) at 0x56488cbda888
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x56488ceb9250 "bar"\0
CUR = 3
LEN = 10
(in cleanup) dbih_getcom handle DBI::st=HASH(0x56488cbc6340) is not a DBI handle (has no magic)
But think of File::Temp, which will remove a temporary file in its DESTROY method. It has some protection against this kind of attack by keeping a record of the files it created, but it still seems to be exploitable, as far as I know.
In any case, you don't want arbitrary user input to call a method on your server.
In my first version of this article I mentioned that you can use Data::Structure::Util::unbless() to unbless all objects afterwards. This will not work if the YAML is syntactically invalid, because the YAML modules start blessing things during parsing, so objects can already exist (and be destroyed) before the error is reported. Also, if you do a $data = Load $yaml instead of @data = Load $yaml, and the input contains more than one document, only the first will get stored, and all others will get destroyed before unbless() gets the chance to run.
So we have to look at whether the various modules provide a way to not bless objects:
YAML::Tiny
YAML::Tiny doesn't support tags at all, so it also won't load any objects. Safe!
YAML::Syck
YAML::Syck lets you disable loading objects like this:
use YAML::Syck 1.21;
$YAML::Syck::LoadBlessed = 0;
YAML::XS
Since version 0.69, released in December 2017, YAML::XS also lets you disable objects:
use YAML::XS 0.69;
$YAML::XS::LoadBlessed = 0;
Note that if you enable Boolean support via $YAML::XS::Boolean, it will still load JSON::PP::Boolean or boolean.pm objects.
YAML.pm
In YAML.pm, you cannot disable this.
YAML::PP
YAML::PP cannot load objects yet, but when implemented, this will be off by default. Safe!
Cyclic references
YAML supports serializing references via Anchors/Aliases:
---
some mapping: &mymap
a: 1
b: 2
c: 3
the same mapping: *mymap
The resulting data structure printed by Data::Dumper looks like this:
$VAR1 = {
'some mapping' => {
'c' => 3,
'a' => 1,
'b' => 2
},
'the same mapping' => $VAR1->{'some mapping'}
};
So far, so good. But it also supports cyclic references, so you can serialize graphs with it.
---
nodes:
- &NodeA
name: NodeA
links:
- &NodeB
name: NodeB
links:
- *NodeA
- *NodeB
...
Output by Data::Dumper:
$VAR1 = {
'nodes' => [
{
'name' => 'NodeA',
'links' => [
{
'name' => 'NodeB',
'links' => [
$VAR1->{'nodes'}[0]
]
}
]
},
$VAR1->{'nodes'}[0]{'links'}[0]
]
};
If you deal with such data structures, you probably know that they can cause a memory leak: Perl frees memory by reference counting, and the members of a cycle keep each other's reference count above zero even when nothing else refers to them anymore.
That means loading untrusted YAML can result in a memory leak.
Also, if you aren't careful when processing such a cyclic structure, you could run into an endless loop by accident.
A general way to protect yourself against this could be Data::Structure::Util:
use Data::Structure::Util qw/ circular_off /;
my $data = Load $yaml;
circular_off($data);
It will weaken any circular ref in the data structure.
It will not protect you against an endless loop, though. You can use the has_circular_ref function to check if there are any circular refs. circular_off will also return the number of references it weakened, so you could just abort processing in this case.
use Data::Structure::Util qw/ circular_off /;
my @data = Load $yaml;
circular_off(\@data)
and die "Won't process circular refs";
There's still a chance to get a memory leak if the YAML input is syntactically invalid, though, because the constructor will already begin creating data structures while parsing.
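If you would rather not add a dependency just for the check, the detection part can be sketched with core modules only. The following is a minimal sketch, not the implementation used by Data::Structure::Util; has_cycle is a hypothetical helper name. It walks the structure depth-first and reports a cycle when it reaches a container that is still on the current path. Aliases that are merely shared, like the mymap example, are not reported:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical helper (not part of any YAML module): returns 1 if the
# data structure contains a reference cycle, 0 otherwise.
sub has_cycle {
    my ($node, $on_path, $done) = @_;
    return 0 unless ref $node;
    $on_path ||= {};    # containers on the current path from the root
    $done    ||= {};    # containers that were already fully explored

    my $addr = refaddr $node;
    return 1 if $on_path->{$addr};    # back at an ancestor: a cycle
    return 0 if $done->{$addr};       # shared, but already checked

    my $type = reftype $node;
    my @children;
    if    ($type eq 'HASH')  { @children = values %$node }
    elsif ($type eq 'ARRAY') { @children = @$node }
    elsif ($type eq 'REF' or $type eq 'SCALAR') { @children = ($$node) }

    $on_path->{$addr} = 1;
    my $found = 0;
    for my $child (@children) {
        if (has_cycle($child, $on_path, $done)) { $found = 1; last }
    }
    delete $on_path->{$addr};
    $done->{$addr} = 1;
    return $found;
}

# The NodeA/NodeB graph from above, built by hand instead of loaded from YAML
my $node_a = { name => 'NodeA', links => [] };
my $node_b = { name => 'NodeB', links => [$node_a] };
push @{ $node_a->{links} }, $node_b;
my $graph = { nodes => [ $node_a, $node_b ] };

# An alias that is shared but not cyclic, like the "mymap" example
my $map    = { a => 1, b => 2, c => 3 };
my $shared = { 'some mapping' => $map, 'the same mapping' => $map };

print has_cycle($graph)  ? "cyclic\n" : "acyclic\n";    # prints "cyclic"
print has_cycle($shared) ? "cyclic\n" : "acyclic\n";    # prints "acyclic"
```

Like circular_off, this can only run after loading has finished, so it does not help with the leak on syntactically invalid input described above. It also recurses once per nesting level, so very deeply nested input will trigger Perl's deep recursion warning.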
So let's look at the modules:
YAML::Tiny
YAML::Tiny doesn't support Anchors/Aliases at all. Safe!
YAML.pm, YAML::Syck, YAML::XS
They don't provide a way to detect or disable cyclic refs.
YAML::PP
The next YAML::PP version will support disabling cyclic references or warning about them.
While detecting cyclic references is not that trivial if you have an existing data structure, the way they are built in YAML actually makes them easy to detect during loading with just a little bit of overhead.
Maybe YAML::PP can also support weakening cyclic references in the future, but that's a bit more complicated to do during the loading process.
Parsing Problems
Since YAML is not as trivial to parse as, for example, JSON, there might be bugs in the modules that can lead to problems.
I don't know of any YAML that will result in an endless loop with one of the mentioned modules.
YAML::Syck can segfault sometimes. Have a look at its list of issues on GitHub.
Recently, a stack overflow was found for libyaml, but YAML::XS does not use the part of libyaml which causes this.
In the past, YAML::XS could segfault when loading many regex objects in one document. This has been fixed.
In general, because of their implementation in C, YAML::Syck and YAML::XS could still have undetected bugs that can lead to segfaults.
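One way to keep a segfault from taking down your application is to do the loading in a separate process. The following is a minimal sketch under some assumptions: load_in_child is a hypothetical helper, the loader is passed in as a code reference (with YAML::XS installed you could pass \&YAML::XS::Load), and the result is handed back to the parent as JSON using the core module JSON::PP. A stand-in loader is used here so that the example runs without any YAML module:

```perl
use strict;
use warnings;
use JSON::PP ();
use POSIX ();

# Hypothetical helper: call $loader->($input) in a forked child and hand
# the loaded documents back to the parent as JSON over a pipe.
sub load_in_child {
    my ($input, $loader) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!\n";
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: load, serialize, and leave without returning to the caller
        close $reader;
        my $ok = eval {
            my @docs = $loader->($input);
            print {$writer} JSON::PP->new->utf8->encode(\@docs);
            close $writer or die "close failed: $!\n";
            1;
        };
        POSIX::_exit($ok ? 0 : 1);
    }

    # Parent: read everything, then check how the child ended
    close $writer;
    my $json = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    die "Loading failed in child process (wait status $?)\n" if $? != 0;
    return @{ JSON::PP->new->utf8->decode($json) };
}

# Stand-in loader for illustration; a real call might look like
#   my @docs = load_in_child($yaml, \&YAML::XS::Load);
my @docs = load_in_child("ignored", sub { return ({ dice => [ 3, 6 ] }) });
print scalar(@docs), " document(s), first value: $docs[0]{dice}[0]\n";
# prints "1 document(s), first value: 3"
```

If the child is killed by a signal such as SIGSEGV, the wait status is non-zero and the parent gets an ordinary exception instead of crashing. With its default settings JSON::PP refuses to encode blessed objects and gives up on structures nested deeper than its depth limit, so such input is rejected as well. This does not make DESTROY harmless, though: the destructors still run in the child, which has the same file system access as the parent.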
Conclusion
To load untrusted YAML:
- When using YAML::Syck/YAML::XS, run it in its own process, if possible
- Run it via the timeout utility, or use perl's alarm function
- Don't use YAML.pm
- Use YAML::Syck >= 1.21 / YAML::XS >= 0.69 and disable loading objects with $YAML::Syck::LoadBlessed = 0 / $YAML::XS::LoadBlessed = 0
- Use Data::Structure::Util for detecting and weakening cyclic refs
- Limit the input length, if possible
- Using only YAML::Tiny or YAML::PP >= 0.006 should be safe
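The input length limit and the alarm-based timeout from this list can be combined in a small wrapper. This is a minimal sketch; load_with_limits and its parameter names (input, loader, max_bytes, timeout) and default values are made up for illustration, and the loader is again passed in as a code reference:

```perl
use strict;
use warnings;

# Hypothetical helper: reject over-long input and abort the loader
# if it runs longer than the given number of seconds.
sub load_with_limits {
    my (%args) = @_;
    my $input     = $args{input};
    my $loader    = $args{loader};
    my $max_bytes = $args{max_bytes} // 1_000_000;    # assumed default
    my $timeout   = $args{timeout}   // 5;            # assumed default

    die "Input too long\n" if length($input) > $max_bytes;

    my @docs;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "Timed out\n" };
        alarm $timeout;
        @docs = $loader->($input);    # list context keeps all documents
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm stays pending after a failure
    die $err unless $ok;
    return @docs;
}

# Stand-in loaders for illustration
my @docs = load_with_limits(input => "--- 1\n", loader => sub { return (1) });
print "loaded ", scalar(@docs), " document(s)\n";    # prints "loaded 1 document(s)"

eval {
    load_with_limits(
        input   => "--- 1\n",
        loader  => sub { 1 while 1 },    # simulates an endless loop
        timeout => 1,
    );
};
print "error: $@";    # prints "error: Timed out"
```

One caveat: Perl delivers signals only between its own operations, so alarm may not interrupt a parser that is stuck inside C code, as could happen with YAML::Syck or YAML::XS. For those modules a separate process run under the timeout utility is the more robust choice.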
If you think I'm wrong, please contact me via email or GitHub. blogs.perl.org will not notify me in case of a reply.
Thanks to jwilk's comment (https://github.com/ingydotnet/yaml-libyaml-pm/issues/45#issuecomment-371786236), I learned that there is a way to exploit it even when using Data::Structure::Util::unbless. I edited the article accordingly.
Also, Data::Structure::Util::unbless has problems with tied data, so it can be dangerous when used together with YAML.pm and $YAML::Preserve.