Safely load untrusted YAML in Perl

Usually people deal with YAML files from trusted sources. But maybe you want to load input from a website as YAML. This can lead to problems, and this article describes what you can do to make loading safe.

The problems I'll talk about are loading objects, cyclic references and general parsing problems.

I will cover YAML.pm, YAML::Tiny, YAML::XS, YAML::Syck and YAML::PP.

Loading Objects

YAML can serialize objects, and here is a simple example of an object of the class 'Dice', which is an array of integers:

---
dice: !perl/array:Dice
  - 3
  - 6

# Supported by YAML::XS and YAML::Syck
another: !Dice
  - 3
  - 6

The resulting object is:

bless( [ 3, 6 ], 'Dice' )

Some modules allow disabling this feature, some don't, and some don't load objects at all.

Note that none of the modules actually loads the specified classes via use; they just bless the data structure. If the Dice module is not loaded, there shouldn't be a problem.

It might not be obvious that this can still be a security problem. An exploit is not trivial, but it is possible.

If you know Perl's implementation of object orientation, you probably know that you can add a DESTROY method to your class, which is called automatically when the object goes out of scope or its last reference is set to undef.

If you're lucky, it might just generate an unwanted warning, like this example:

use DBI;
use YAML::XS;
my $yaml = <<EOM;
---
!!perl/hash:DBI::st { foo: bar }
EOM
my $data = Load $yaml;

__END__
SV = IV(0x56488cbda908) at 0x56488cbda918
  REFCNT = 1
  FLAGS = (ROK,READONLY,PROTECT)
  RV = 0x56488cbc6340
    SV = PVHV(0x56488cbccd90) at 0x56488cbc6340
      REFCNT = 1
      FLAGS = (OBJECT,SHAREKEYS)
      STASH = 0x56488cd099d0    "DBI::st"
      ARRAY = 0x56488cbe7b30  (0:7, 1:1)
      hash quality = 100.0%
      KEYS = 1
      FILL = 1
      MAX = 7
        Elt "foo" HASH = 0x6dd7bf3c
        SV = PV(0x56488cbc6e30) at 0x56488cbda888
          REFCNT = 1
          FLAGS = (POK,pPOK)
          PV = 0x56488ceb9250 "bar"\0
          CUR = 3
          LEN = 10
        (in cleanup) dbih_getcom handle DBI::st=HASH(0x56488cbc6340) is not a DBI handle (has no magic)

But think of File::Temp, which removes a temporary file in its DESTROY method. It has some protection against this kind of attack by keeping a record of the files it created, but as far as I know it still seems to be exploitable.

In any case, you don't want arbitrary user input to call a method on your server.
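
To see why blessing alone is enough, here is a minimal sketch using only core Perl. The class Local::Cleanup and its path attribute are made up for illustration, and the bless call merely stands in for what a loader does when it meets a !perl/hash tag; no YAML module is involved.

```perl
use strict;
use warnings;

# Hypothetical class that already exists somewhere in the application.
# Its destructor acts on whatever happens to be stored in the object.
package Local::Cleanup {
    our @removed;    # records what DESTROY would have deleted

    sub DESTROY {
        my ($self) = @_;
        # A real class might call unlink here; this sketch only records the path.
        push @removed, $self->{path};
    }
}

package main;

# Stand-in for a loader handling "!perl/hash:Local::Cleanup { path: ... }":
# no constructor runs, the attacker-chosen hash is simply blessed.
{
    my $data = bless { path => '/etc/important.conf' }, 'Local::Cleanup';
}    # $data goes out of scope here, so DESTROY runs

print "DESTROY saw: $_\n" for @Local::Cleanup::removed;
```

The destructor runs with data the class never created itself, which is exactly the situation a class author usually does not plan for.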

In the first version of this article I mentioned that you can use Data::Structure::Util::unbless() to unbless all objects afterwards. This will not work if the YAML is syntactically invalid, as the YAML modules start blessing things during parsing. Also, if you do a $data = Load $yaml instead of @data = Load $yaml, and the input contains more than one document, only the first will get stored, and all others will get destroyed before unbless() gets the chance to run.

So we have to look at whether the various modules provide a way to not bless objects:

YAML::Tiny

YAML::Tiny doesn't support tags at all, so it also won't load any objects. Safe!

YAML::Syck

YAML::Syck lets you disable loading objects like this:

use YAML::Syck 1.21;
$YAML::Syck::LoadBlessed = 0;

YAML::XS

Since version 0.69, released in December 2017, YAML::XS also lets you disable objects:

use YAML::XS 0.69;
$YAML::XS::LoadBlessed = 0;

Note that if you enable Boolean support via $YAML::XS::Boolean, it will still load JSON::PP::Boolean or boolean.pm objects.

YAML.pm

In YAML.pm, you cannot disable this.

YAML::PP

YAML::PP cannot load objects yet, but when implemented, this will be off by default. Safe!

Cyclic references

YAML supports serializing references via Anchors/Aliases:

---
some mapping: &mymap
  a: 1
  b: 2
  c: 3
the same mapping: *mymap

The resulting data structure printed by Data::Dumper looks like this:

$VAR1 = {
  'some mapping' => {
    'c' => 3,
    'a' => 1,
    'b' => 2
  },
  'the same mapping' => $VAR1->{'some mapping'}
};

So far, so good. But it also supports cyclic references, so you can serialize graphs with it.

---
nodes:
- &NodeA
  name: NodeA
  links:
  - &NodeB
    name: NodeB
    links:
    - *NodeA
- *NodeB
...

Output by Data::Dumper:

$VAR1 = {
  'nodes' => [
    {
      'name' => 'NodeA',
      'links' => [
        {
          'name' => 'NodeB',
          'links' => [
            $VAR1->{'nodes'}[0]
          ]
        }
      ]
    },
    $VAR1->{'nodes'}[0]{'links'}[0]
  ]
};

If you deal with such data structures, you probably know that they can cause a memory leak: Perl frees memory by reference counting, and the nodes in a cycle keep each other's reference count above zero even when nothing else refers to them.

That means loading untrusted YAML can result in a memory leak.
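
The leak can be observed with core Perl alone. The following sketch rebuilds the NodeA/NodeB graph by hand (no YAML module involved) and uses a weakened probe reference to check whether the memory was released; the function name cycle_is_freed is made up for this illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Builds the NodeA/NodeB cycle inside a block, lets the lexicals go out
# of scope, and reports whether the structure was actually released.
sub cycle_is_freed {
    my ($break_cycle) = @_;
    my $probe;
    {
        my $node_a = { name => 'NodeA', links => [] };
        my $node_b = { name => 'NodeB', links => [$node_a] };
        push @{ $node_a->{links} }, $node_b;

        # Weakening one edge is enough to break this cycle.
        weaken( $node_b->{links}[0] ) if $break_cycle;

        # A weak probe does not keep NodeA alive; it turns into undef
        # as soon as NodeA is freed.
        $probe = $node_a;
        weaken($probe);
    }
    return defined $probe ? 0 : 1;
}

print "strong cycle freed:   ", ( cycle_is_freed(0) ? "yes" : "no" ), "\n";
print "weakened cycle freed: ", ( cycle_is_freed(1) ? "yes" : "no" ), "\n";
```

The first call reports that the cycle is still allocated after its variables are gone; the second shows that a single weakened edge lets Perl release it.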

Also, if you aren't careful when processing such a cyclic structure, you could run into an endless loop by accident.

A general way to protect yourself against this could be Data::Structure::Util:

use Data::Structure::Util qw/ circular_off /;
my $data = Load $yaml;
circular_off($data);

It will weaken any circular ref in the data structure.

It will not protect you against an endless loop, though. You can use the has_circular_ref function to check if there are any circular refs. circular_off will also return the number of references it weakened, so you could just abort processing in this case.

use Data::Structure::Util qw/ circular_off /;
my @data = Load $yaml;
circular_off(\@data)
    and die "Won't process circular refs";

There's still a chance to get a memory leak if the YAML input is syntactically invalid, though, because the constructor will already begin creating data structures while parsing.
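
If installing Data::Structure::Util is not an option, the detection part can be approximated with core modules. This is only a sketch, not a replacement for has_circular_ref: the function name has_cycle is invented here, it follows hashes, arrays and references to references only, and it recurses, so very deeply nested input will trigger Perl's deep recursion warning.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Returns 1 if following references from $data leads back to a container
# that is still being visited, i.e. a cycle. Shared but non-cyclic
# references (like the *mymap alias) are not reported.
sub has_cycle {
    my ( $data, $on_path ) = @_;
    $on_path ||= {};
    return 0 unless ref $data;

    my $addr = refaddr $data;
    return 1 if $on_path->{$addr};

    my $type     = reftype $data;
    my @children = $type eq 'HASH'  ? values %$data
                 : $type eq 'ARRAY' ? @$data
                 : $type eq 'REF'   ? ($$data)
                 :                    ();    # everything else is a leaf

    # Mark this node only while we are somewhere below it.
    local $on_path->{$addr} = 1;
    for my $child (@children) {
        return 1 if has_cycle( $child, $on_path );
    }
    return 0;
}

# Hand-built test data mirroring the two YAML examples above.
my %shared  = ( a => 1, b => 2, c => 3 );
my $aliased = { 'some mapping' => \%shared, 'the same mapping' => \%shared };

my $node_a = { name => 'NodeA', links => [] };
my $node_b = { name => 'NodeB', links => [$node_a] };
push @{ $node_a->{links} }, $node_b;
my $graph = { nodes => [ $node_a, $node_b ] };

print "aliased mapping: ", ( has_cycle($aliased) ? "cycle" : "no cycle" ), "\n";
print "node graph:      ", ( has_cycle($graph)   ? "cycle" : "no cycle" ), "\n";
```

Tracking only the nodes on the current path, rather than every node seen so far, is what keeps plain aliases from being reported as cycles.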

So let's look at the modules:

YAML::Tiny

YAML::Tiny doesn't support Anchors/Aliases at all. Safe!

YAML.pm, YAML::Syck, YAML::XS

They don't provide a way to detect or disable cyclic refs.

YAML::PP

The next YAML::PP version will support disabling cyclic references or warning about them.

While detecting cyclic references is not that trivial if you have an existing data structure, the way they are built in YAML actually makes them easy to detect during loading with just a little bit of overhead.

Maybe YAML::PP can also support weakening cyclic references in the future, but that's a bit more complicated to do during the loading process.

Parsing Problems

Since YAML is not as trivial to parse as, for example, JSON, there might be bugs in the modules that can lead to problems.

I don't know of any YAML that will result in an endless loop with one of the mentioned modules.

YAML::Syck can segfault sometimes. Have a look at its list of issues on GitHub.

Recently, a stack overflow was found in libyaml, but YAML::XS does not use the part of libyaml that causes it.

In the past, YAML::XS could segfault when loading many regex objects in one document. This has been fixed.

In general, because of their implementation in C, YAML::Syck and YAML::XS could still have undetected bugs that can lead to segfaults.

Conclusion

To load untrusted YAML:

  • When using YAML::Syck/YAML::XS, run it in its own process, if possible
  • Run it via the timeout utility, or use Perl's alarm function
  • Don't use YAML.pm
  • Use YAML::Syck >= 1.21 / YAML::XS >= 0.69 and disable loading objects with $YAML::Syck::LoadBlessed = 0 / $YAML::XS::LoadBlessed = 0
  • Use Data::Structure::Util for detecting and weakening cyclic refs
  • Limit the input length, if possible
  • Using only YAML::Tiny or YAML::PP >= 0.006 should be safe
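
Several of these points (separate process, time limit, length limit) can be combined in one helper. What follows is a rough sketch under a number of assumptions, not a hardened implementation: it assumes a Unix-like system with a real fork, the name load_in_child and its parameters are invented here, the default limits are arbitrary, and the loader is passed in as a code reference. In real use that would be something like sub { YAML::XS::Load($_[0]) }; the example call uses a stub so that it runs with core modules only. The result is sent back to the parent as JSON.

```perl
use strict;
use warnings;
use JSON::PP ();
use POSIX    ();

# Runs $arg{loader}->($arg{yaml}) in a child process and returns the
# loaded documents. Dies if the input is too long, the loader fails,
# or the child does not finish in time.
sub load_in_child {
    my (%arg)     = @_;
    my $loader    = $arg{loader};
    my $yaml      = $arg{yaml};
    my $max_bytes = $arg{max_bytes} // 1_000_000;    # assumed default
    my $timeout   = $arg{timeout}   // 5;            # seconds, assumed default

    die "input too long\n" if length($yaml) > $max_bytes;

    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child. No SIGALRM handler is installed, so the default action
        # terminates the process even if it is stuck inside C code.
        close $reader;
        alarm $timeout;
        my $ok = eval {
            my @docs = $loader->($yaml);
            print {$writer} JSON::PP->new->utf8->encode( \@docs );
            close $writer or die "close failed: $!";
            1;
        };
        POSIX::_exit( $ok ? 0 : 1 );    # skip END blocks and global cleanup
    }

    # Parent: read everything before waiting, so a large result cannot
    # block the child on a full pipe.
    close $writer;
    my $json = do { local $/; <$reader> };
    close $reader;
    waitpid $pid, 0;
    die "loader failed or timed out (wait status $?)\n" if $? != 0;
    return @{ JSON::PP->new->utf8->decode($json) };
}

# Hypothetical stand-in for a real YAML loader.
my $fake_loader = sub { return { name => 'NodeA', links => [] } };
my @docs = load_in_child(
    loader => $fake_loader,
    yaml   => "name: NodeA\nlinks: []\n",
);
print "loaded ", scalar(@docs), " document(s)\n";
```

Passing the data through JSON costs some fidelity, but a crash or hang in the loader then only takes down the child. As a side effect, with JSON::PP's default settings a blessed object in the result makes the encoding step fail in the child instead of reaching the parent.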

If you think I'm wrong, please contact me via email or GitHub. blogs.perl.org will not notify me in case of a reply.

About tinita

just another perl punk,