Some of these changes might break code, but I expect this will be rare.
Since more and more CPAN modules are using YAML::PP, I decided to make these changes as soon as possible.
I will explain all changes and the reasons.
Loading YAML in scalar context

In scalar context, the various load* functions will return the first document rather than the last.
This is only relevant if your input has more than one document:
--- # Document 1
a: b
c: d
--- # Document 2
e: f
g: h
Usually you would load that in list context:
my @docs = Load($yaml);
But the common code for loading YAML data uses scalar context:
my $data = Load($yaml);
The behaviour of other CPAN modules is different. YAML::Syck::Load also returns the first document, while YAML::XS::Load, YAML::Tiny::Load and YAML::Load return the last.
I believe returning the first document is the more natural behaviour. If your YAML files have only one document, the context does not matter. If you later add more documents to a YAML file, existing code using scalar context will still get the same result.
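A minimal sketch of the behaviour described above, using YAML::PP's object interface:

```perl
use strict;
use warnings;
use YAML::PP;

my $yaml = <<'EOM';
--- # Document 1
a: b
--- # Document 2
e: f
EOM

my $yp = YAML::PP->new;
my @docs  = $yp->load_string($yaml); # list context: both documents
my $first = $yp->load_string($yaml); # scalar context: now Document 1
```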
The YAML 1.2 Specification recommends three Schemas that a YAML processor should implement: Failsafe, JSON and Core. Core is actually the recommended default.
In the past, I switched to the JSON Schema by default, because it has only a few special values. However, the official YAML 1.2 JSON Schema is actually different from my implementation: all values that are supposed to be strings must be quoted. Only true, false, null and numbers go without quotes.
In YAML::PP::Schema::JSON, the quotes are not necessary. But I will probably add an option in the future to require quotes, so that it reflects the official schema.
The YAML world is slowly moving towards YAML 1.2. More and more YAML processors are written that implement YAML 1.2, and most just implement one Schema: Core. So by default, YAML::PP will be compatible with that standard Schema.
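As a rough sketch of what loading with the Core Schema looks like; the resolution details follow the YAML 1.2 Core rules, but treat them as assumptions and check the regex table for your version:

```perl
use YAML::PP;

# Core resolves more plain scalars than JSON,
# e.g. octal 0o7, True/TRUE as booleans, .inf/.nan
my $yp = YAML::PP->new( schema => ['Core'] );
my $data = $yp->load_string(<<'EOM');
octal: 0o7
bool: True
plain string: yes   # a boolean in YAML 1.1, but a string in 1.2 Core
EOM
```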
To get an overview of the different Schemas, and the behaviour of YAML modules, I created a table with regular expressions per Schema.
Also I created a HTML page from my test data and compared the load results of YAML::PP and other YAML modules.
This should give you an idea of which incompatibilities to expect if you switch from one of the other modules to YAML::PP. As you can see, none of YAML::Syck, YAML::XS and YAML.pm implements any of the standard Schemas.
It would be possible to make YAML::PP behave like one of the other modules for compatibility, but it would be a lot of work. (Let me know if you need this and want to sponsor it ;-)
Since the official YAML 1.2 JSON Schema does not allow unquoted strings, the empty node is actually forbidden:
---
key: # no value
Because YAML::PP::Schema::JSON allows unquoted strings, I had to decide whether empty nodes resolve to null or to the empty string ''. I decided to make the empty string the default, but to make it configurable:
my $yp = YAML::PP->new( schema => ['JSON'] );
my $yp = YAML::PP->new( schema => [qw/ JSON empty=str /] );
my $yp = YAML::PP->new( schema => [qw/ JSON empty=null /] );
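A sketch of the difference when loading an empty node, per the default described above:

```perl
use YAML::PP;

my $str = YAML::PP->new( schema => [qw/ JSON empty=str /] )
                  ->load_string("key:\n");
# $str->{key} is the empty string ''

my $null = YAML::PP->new( schema => [qw/ JSON empty=null /] )
                   ->load_string("key:\n");
# $null->{key} is undef
```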
YAML::PP will now just assume all input data are unicode characters and won't do an explicit utf8::upgrade.
See Issue 16.
Also, some control characters weren't correctly escaped. See Issue 17.
While preparing the HTML page for the different schemas, I noticed that I forgot +.inf, +.Inf and +.INF in the Core Schema.
Before, when dumping empty sequences and mappings, the emitter added a newline:
---
empty sequence:
[]
empty mapping:
{}
The new output will not have that newline:
---
empty sequence: []
empty mapping: {}
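The new behaviour can be seen by dumping empty flow collections (a sketch; the key order in the output may differ):

```perl
use YAML::PP;

my $yp = YAML::PP->new;
print $yp->dump_string({
    'empty sequence' => [],
    'empty mapping'  => {},
});
```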
But for YAML that's not unusual, and people don't expect it to do potentially dangerous things by default.
In this case thanks should go to the folks from the Debian Perl Team, Gregor Herrmann and Dominique Dumont.
They kept reminding us to do this change ;-)
Forgot to mention them in the post, as it was late yesterday
You can create any kind of object with YAML. The creation itself is not the critical part, but if the class has a DESTROY method, it will be called once the object is deleted. An example with File::Temp removing files can be found here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862373
YAML::Syck has had the option to disable this feature via $YAML::Syck::LoadBlessed for a long time. Since 2018, YAML.pm and YAML::XS also have this variable.
See also my blog post from 2018: Safely load untrusted YAML in Perl
In the past, this feature was enabled by default in all three modules.
This will now be disabled by default, to make sure that Perl's YAML libraries are, by default, more secure.
If you are using one of the modules to serialize/load objects, you have to set this variable now:
use YAML; # since 1.30
local $YAML::LoadBlessed = 1;
use YAML::Syck; # since 1.32
local $YAML::Syck::LoadBlessed = 1;
use YAML::XS; # since 0.81
local $YAML::XS::LoadBlessed = 1;
Always use local in a very small scope to avoid setting this variable globally.
If you are loading YAML from an untrusted source and are potentially using older versions, it's still recommended to set this variable to 0.
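A sketch of keeping the scope tight, as recommended above (the class name and input are made up for illustration):

```perl
use YAML::XS ();

my $trusted_yaml = "--- !!perl/hash:My::Class\na: b\n";

my $object = do {
    # object loading is enabled only inside this block
    local $YAML::XS::LoadBlessed = 1;
    YAML::XS::Load($trusted_yaml);
};
```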
Note that YAML::Tiny cannot load objects at all, and YAML::PP does not load objects by default.
The modules will be released within the next few hours.
Update

We already saw some modules breaking (thanks to Slaven Rezic's tireless testing, of course!)
I added a list on Reddit
We are using a tool called cpanspec to create .spec files from CPAN modules. From the spec file, the OBS then builds rpm packages.
I noticed there are a lot of modules missing information, or having other problems that prevent us from automatically creating a working .spec file.
CPAN/PAUSE is not very strict about how a distribution should look, though. That makes creating OS-specific packages automatically harder.
Whenever I see a mistake in a CPAN module, and I have time, I try to file a bug report.
But since I came across the same common mistakes over and over, I decided to put up a page with a list of Best Practices that make it possible for most modules to be turned into an OS package automatically:
Best Practices for creating Perl5 CPAN Modules
But I need your help.
I think it would be useful to create more concrete examples.
For specific items it would make sense to link to a more detailed tutorial. For example, many issues can be solved by using Dist::Zilla and the correct entries in the dist.ini.
I would be glad if, for example, Debian people could add a similar paragraph like I did for openSUSE.
Looking forward to suggestions, bug reports and pull requests
Previous hackweek posts:
Enhancing the YAML::LibYAML::API module

Today, the YAML::LibYAML::API module is a very simple wrapper around the C libyaml. You create an empty array reference and then pass that to the parse_string_events (or parse_file_events, parse_filehandle_events) function.
my $events = [];
YAML::LibYAML::API::XS::parse_string_events($yaml, $events);
When parsing is done, you can then do stuff with the events.
This is how YAML::PP::LibYAML is using it currently.
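For illustration, iterating over the returned events might look like this; the exact structure of the event hashes (the name key) is an assumption here, so check the module's documentation:

```perl
use YAML::LibYAML::API::XS;

my $events = [];
YAML::LibYAML::API::XS::parse_string_events("a: b\n", $events);

for my $event (@$events) {
    # e.g. stream_start_event, document_start_event, scalar_event, ...
    print $event->{name}, "\n";
}
```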
This has some disadvantages, although they are not very relevant for simple usage.
My plan was to create a perl object that has a "reference" to the C struct yaml_parser_t, so I can keep it around and refer to it later.
This will make it possible in the future to query information from the parser, for example getting the exact error details in case of a parsing error.
Another possibility comes to mind when handling large YAML files with many documents, and you only want to load one document.
So I had to learn how to keep a C struct around. The solution in the end was pretty simple, but I was trying many things, and fortunately Harald Jörg and others helped me very patiently when I asked questions about this on IRC. Thanks a lot!
For keeping the parser struct around, you have to malloc memory for it:
yaml_parser_t *parser;
parser = malloc(sizeof(yaml_parser_t));
Now, I would like to "reference" this struct in my perl object. I was trying to return an integer to XS, and somehow I wasn't able to get the struct back. It turned out I had to use a uintptr_t and long:
return (long) (uintptr_t) parser;
In the XS code it's still sufficient to return an SViv:
RETVAL = newSViv(id);
My perl object looks like this:
bless { uid => 1234567 }, 'YAML::LibYAML::API::XS'
To access the parser struct again in XS/C, the following is necessary:
HV *hash;
SV* obj_sv;
SV **sv;
long id;
yaml_parser_t *parser;
SvGETMAGIC(obj);
if (!SvROK(obj))
croak("Not a reference");
obj_sv = SvRV(obj);
if (SvTYPE(obj_sv) != SVt_PVHV)
croak("Not a reference to a hash");
hash = (HV *)(obj_sv);
sv = hv_fetch(hash, "uid", 3, TRUE);
if (!sv) {
croak("%s\n", "Could not get uid");
}
id = (long) SvIV(*sv);
parser = (yaml_parser_t*) (uintptr_t) id;
Now I am able to let C call a perl callback function whenever it gets a new parsing event. This is also how the YAML::PP::Parser backend is working, so the integration will be a bit nicer.
This is how it currently looks:
my $xsparser = YAML::LibYAML::API::XS->new;
$xsparser->parser_create;
my $yaml = <<'EOM';
---
a: b
EOM
$xsparser->parser_init_string($yaml);
my $cb = sub {
my ($event) = @_;
warn Dumper $event;
};
$xsparser->set_parse_callback($cb);
$xsparser->parse_callback();
$xsparser->parser_delete();
Everything is still in a branch. I have to add exception handling and refactor a bit.
And, of course I have to add the same kind of thing to the Emitter.
Many thanks to SUSE again for dedicating a week of everyone's worktime for stuff like this!
I think every company using Open Source should do this once in a while.
Previous posts:
Perl Objects in YAML::Syck, YAML::XS and YAML.pm

Hopefully you know that loading untrusted YAML can be exploited when the feature of loading objects is enabled.
YAML::Syck has had $YAML::Syck::LoadBlessed for many years. YAML::XS has $YAML::XS::LoadBlessed since v0.69 (2017), and YAML.pm has $YAML::LoadBlessed since v1.25 (2018).
They all default to true for backwards compatibility. However, we have been discussing that it would be good to change the default to false for security. We are not sure yet when this will happen.
If you are using one of these modules and need the feature of loading objects, it's a good idea to set this option to true now!
By default, YAML::PP doesn't load any perl types or objects. It only supports scalars, arrays and hashes.
To load scalar references, references of references, regular expressions and objects, you need to add the Perl Schema:
my $yp = YAML::PP->new( schema => [qw/ JSON Perl /] );
To enable also loading code refs:
my $yp = YAML::PP->new( schema => [qw/ JSON Perl +loadcode /] );
But sometimes you might want to load special perl types, but only allow certain classes.
I added an option now that lets you do that. For that you have to instantiate the Perl Schema first:
my $perl = YAML::PP::Schema::Perl->new(
classes => [qw/ My::Class1 My::Class2 /],
);
my $yp = YAML::PP->new( schema => ['JSON', $perl] );
If you also want to enable loading code refs, you do it like this:
my $perl = YAML::PP::Schema::Perl->new(
classes => [qw/ My::Class1 My::Class2 /],
loadcode => 1,
);
my $yp = YAML::PP->new( schema => ['JSON', $perl] );
If the Loader encounters tags with unknown classes like this:
--- !perl/hash:Class::Not::Allowed
a: b
It will load this as a simple hash reference and throw away the class.
The same happens when dumping. An object like
my $object = bless { a => 'b' }, 'Class::Not::Allowed';
will end up as a simple hash:
---
a: b
If you pass an empty array ref for classes, it will not load any objects, but it will still support scalar/ref references and regular expressions.
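A sketch of the class filtering (the class names are made up; blessed() shows whether the tag was honoured):

```perl
use Scalar::Util qw/ blessed /;
use YAML::PP;
use YAML::PP::Schema::Perl;

my $perl = YAML::PP::Schema::Perl->new(
    classes => ['My::Class1'],
);
my $yp = YAML::PP->new( schema => ['JSON', $perl] );

my $allowed = $yp->load_string("--- !perl/hash:My::Class1\na: b\n");
my $denied  = $yp->load_string("--- !perl/hash:Class::Not::Allowed\na: b\n");

print blessed($allowed) // 'plain hash', "\n"; # My::Class1
print blessed($denied)  // 'plain hash', "\n"; # plain hash
```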
YAML already allows reusing data with so-called anchors and aliases:
---
invoice:
billing address: &address # define anchor
name: Santa Claus
street: Santa Claus Lane
city: North Pole
shipping address: *address # use alias
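Loading such a document gives you one shared reference for both addresses, which you can verify with refaddr (a minimal sketch):

```perl
use Scalar::Util qw/ refaddr /;
use YAML::PP;

my $data = YAML::PP->new->load_string(<<'EOM');
invoice:
  billing address: &address
    name: Santa Claus
  shipping address: *address
EOM

my $billing  = $data->{invoice}{'billing address'};
my $shipping = $data->{invoice}{'shipping address'};
print refaddr($billing) == refaddr($shipping) ? "shared\n" : "copied\n";
```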
But sometimes you have data that you want to reuse in several YAML documents, or maybe your files are very big and it gets hard to edit them.
I was planning to implement !include for YAML::PP at some point.
Recently, Martin Barth approached me on IRC and asked me if he would be able to read RAML files with YAML::PP.
RAML files are YAML 1.2, and the only non-standard tag they use is !include.
It was already possible to implement that with the previous version of YAML::PP, but the API for that is not yet stable and not documented.
Also, when including other files, you have to take care of several things, so it's better to have a standard plugin.
So I began working on this already a while before hackweek, but there were still some issues I had to think about, and I created a working version this week.
You can now use includes in YAML::PP v0.017 and adjust it to your needs, if necessary. It's called YAML::PP::Schema::Include.
Let's take the example from above:
--- # invoice.yaml
invoice:
billing address: &address !include includes/santa-claus-address.yaml
shipping address: *address
--- # includes/santa-claus-address.yaml
name: Santa Claus
street: Santa Claus Lane
city: North Pole
You can read that with the following code:
# create an instance of the Include class
my $include = YAML::PP::Schema::Include->new;
# Add it as a schema for YAML::PP
my $yp = YAML::PP->new( schema => ['JSON', $include] );
# let the $include instance know of the YAML::PP object
# I might remove the need for this eventually
$include->yp($yp);
my $invoice = $yp->load_file("invoice.yaml");
The filename specified in the !include tag is relative to the currently processed filename. You can also recursively include files in the included files.
Circular includes are prevented, and YAML::PP will die.
By default, absolute filenames and .. are forbidden, so that the following won't work:
---
passwd: !include /etc/passwd
---
passwd: !include ../../../../../../etc/passwd
If you want to allow that, you have to instantiate the object like this:
my $include = YAML::PP::Schema::Include->new(
allow_absolute => 1,
);
Another possibility is to specify the include path yourself, so that the included files will be searched for in this directory. You can even specify a list of directories:
my $include = YAML::PP::Schema::Include->new(
paths => ['/path/one', '/path/two'],
);
The loader will iterate through the paths until it finds the specified file.
Apparently that's not enough for reading RAML files. The specification says that an include can also just load the content of files that are not RAML; it depends on the content of the file. Only if it has a RAML directive will it be loaded as YAML.
Here is a simple example of how you can handle the loading yourself:
my $include = YAML::PP::Schema::Include->new(
loader => sub {
my ($yp, $filename) = @_;
if ($filename =~ m/\.txt$/) {
# open file and just return text
}
else {
# default behaviour
return $yp->load_file($filename);
}
},
);
All the details about finding the full filename and detecting circular includes will still be done for you.
The only thing not (yet) working with this is this RAML syntax:
type: !include elements.xsd#Foo
I'm thinking of a way to get this working.
It was a bit tricky to get this working.
An instance of YAML::PP and its helper objects save the state of parsing. We need a fresh object for including a file, but that object should have the same configuration as the root object.
So I added a clone method to several YAML::PP::* classes which creates new objects with the same options as the original objects. I think that's as efficient as it gets. I reuse the YAML::PP->schema object, because it is static, and creating it is the part of building a completely new object that costs the most time.
So, thanks to SUSE for making this happen!
This blog post is about what I hacked on Tuesday, when Hackweek started for me. Expect more posts for the other days ;-)
Writing shell scripts

First I'll write about shell scripts. Later we'll see how Perl comes into play.
Many of you probably create or maintain shell scripts sometimes. For certain tasks it can be a better choice than Perl, for example.
But there are problems, and not unique to shell scripts.
Maybe you think that these problems are not that important for small, simple scripts. But scripts grow, and I find it annoying if I have to use a tool which has outdated (or unhelpful) usage output and no tab completion.
Here are the problems I see, and then I will show you a tool that could be able to fix all that.
Let's say you have a bash script with a single command line switch -v. Extracting such a single switch from the command line is comparatively easy. Now you also want to add --verbose, and maybe another option --task, which takes an argument. And maybe you want to support levels of verbosity, so the -v switch works incrementally (e.g. -vvv).
There are tools out there that let you automatically parse that, but they are limited.
Before a script gets too big, it can be a good idea to add subcommands.
(Imagine git without subcommands. Or think about gpg...)
Some options are global, some only for a certain subcommand.
You probably all know the problem that, when options are added to a script, or changed, the usage output / man page doesn't get updated. Because it's forgotten.
Writing usage output for several subcommands isn't fun.
Nobody (well, almost) likes to manually write tab completion for their scripts.
Sometimes scripts come with completion, but it's very basic, and maybe only for bash.
Zsh, for example, offers much more features, and if people can only use the bash completion in zsh, they are missing several features.
Some of your command line options may take a static list of values, for example ssh-keygen -t {rsa,dsa,ecdsa,ed25519}. You need to check if a value is in the list. (And, additionally, these values should appear when hitting <TAB>.)
Maybe you already heard about my perl5 App::Spec command line framework.
It does all the argument parsing for you, can generate completion, pod/man and help output. You only have to write a YAML file and the actual code for your script's commands.
How to port this to Bash?
The cool thing is, for pod/man and completion, I can simply reuse the existing code. Half of the work is already done!
Now, for writing the bash command line parser from the YAML spec file, I had the idea not to do all the work in bash, but to generate bash code instead, which saves me from working around the problem that bash doesn't really have rich data structures.
It can already do most of what I mentioned above, except argument value checking.
It supports flags and options with values, incremental flags (-vvv), options with multiple values, stacking of short options (-a -b -t val == -abtval) and subcommands.
The documentation is minimal right now. The following example is taken from the command's pod.
For your bash script, the following files are needed:
share/mytool.yaml # spec
bin/mytool # simple script calling the framework
lib/mytool # your code for each subcommand
lib/appspec # generated
share/completion/zsh/_mytool # generated
share/completion/bash/mytool.bash # generated
pod/mytool.pod # generated
lib/help # generated
To generate the files, you need the appspec tool and appspec-bash. The latter one is not yet on CPAN, only on github.
This is the YAML specification file:
# share/mytool.yaml
---
name: mytool # commandname
appspec: { version: 0.001 }
title: My cool tool # Will be shown in help
class: MyTool # "Class name" (means function prefix here)
subcommands:
command1:
op: command1 # The function name
summary: cmd one # Will be shown in help and completion
options:
- foo|f=s --Foo # --foo or -f; '=s' means string
- bar|b --Bar # --bar or -b; a flag
Maybe you recognize that the syntax for options looks like the one from Getopt::Long. I thought it would be good to reuse existing syntax, but for more features I added other syntax elements. There's not much documentation so far for this except App::Spec::Argument. For examples, you can also have a look at my collection of shell completions.
The script:
#!/bin/bash
# bin/mytool
DIR="$( dirname "$BASH_SOURCE" )"
source "$DIR/../lib/appspec"
source "$DIR/../lib/mytool"
APPSPEC.run "$@"
The actual app:
#!/bin/bash
# lib/mytool
MyTool.command1() {
echo "=== OPTION foo: $OPT_FOO"
echo "=== OPTION bar: $OPT_BAR"
}
Output example:
$ ./bin/mytool command1 --foo x --bar
# or
$ mytool command1 -f x -b
=== OPTION foo: x
=== OPTION bar: true
Please see the documentation to learn how to generate the other files.
So far I have only a few tests, so there might be bugs.
It's not only a framework for perl. It can also generate shell tab completion for other tools.
Since then I have been busy with other things, but recently continued working on it for several reasons, and fixed several bugs, mostly for bash.
Last year I started a collection of generated completion scripts for bash and zsh:
https://github.com/perlpunk/shell-completions
Today it contains completions for 20 tools, mostly for perl commands. If a tool you use is missing there, let me know, or try to write your own YAML specification and generate the completion.
Below you will see some examples.
Features

It supports tools with nested subcommands, option and parameter completion.
Zsh has a builtin feature for showing descriptions; I built something similar for bash.
bash $ fatpack <TAB><TAB>
file -- Recurses into the lib and fatlib directories and bundles all .pm files found into a BEGIN block...
help -- Show command help
packlists-for -- Searches your perls @INC for .packlist files containing the .pm files
pack -- Pack script and modules
trace -- Writes out a trace file containing every module required
tree -- Takes a list of packlist files and copies their contents into a tree at the requested location
bash $ dzil <TAB><TAB>
add -- add modules to an existing dist
authordeps -- list your distributions author dependencies
build -- build your dist
clean -- clean up after build, test, or install
commands -- list the applications commands
help -- Show command help
install -- install your dist
listdeps -- print your distributions prerequisites
new -- mint a new dist
nop -- do nothing: initialize dzil, then exit
release -- release your dist
run -- run stuff in a dir where your dist is built
setup -- set up a basic global config file
smoke -- smoke your dist
test -- test your dist
bash $ fatpack trace -<TAB><TAB>
--help -- Show command help
-h -- Show command help
--to -- Location of trace file
--to-stderr -- Write the trace to STDERR instead
--use -- Specify module(s) to be included additionally
bash $ dzil build -<TAB><TAB>
--help -- Show command help
-h -- Show command help
-I -- additional @INC dirs
--in -- the directory in which to build the distribution
--lib-inc -- additional @INC dirs
--tgz -- build a tarball (default behavior)
--trial -- build a trial release that PAUSE will not index
--verbose -- log additional output
--verbose-plugin -- log additional output from some plugins only
-v -- log additional output
-V -- log additional output from some plugins only
As you can see, for example --help and -h show up on their own lines, which is a disadvantage of this feature and can make completion very verbose. I might add an option to disable descriptions for bash.
This is handled much better in zsh:
zsh % dzil build -<TAB>
Completing option
--help -h -- Show command help
--in -- the directory in which to build the distribution
--lib-inc -I -- additional @INC dirs
--tgz -- build a tarball (default behavior)
--trial -- build a trial release that PAUSE will not index
--verbose -v -- log additional output
--verbose-plugin -V -- log additional output from some plugins only
bash $ plackup -L <TAB><TAB>
Delayed Plack::Loader Restarter Shotgun
The corresponding lines in the spec:
- name: loader
type: string
enum: [Plack::Loader, Restarter, Delayed, Shotgun]
summary: Specifies the server loading subclass
aliases: [L]
You can also specify an external command for completion.
When the -M switch for prove is completed, it calls a command to find all installed modules in @INC:
bash $ prove -M Parse::<TAB><TAB>
ANSIColor::Tiny BBCode::HTML BBCode::Tag BBCode::XHTML RecDescent
BBCode BBCode::Markdown BBCode::Text CPAN::Meta
The corresponding lines in the spec:
- name: M
summary: Load a module
type: string
multiple: true
completion:
# TODO filter directories like x86_64-linux
command_string: |-
\
for incpath in $(perl -wE'say for @INC'); do \
find $incpath -name "*.pm" -printf "%P\n" \
| perl -plE's{/}{::}g; s{\.pm}{}' \
| grep "^$CURRENT_WORD"; \
done
When adding such custom callbacks, it's important to use a syntax which is bash and zsh compatible.
For the module completion for cpan and cpanm I used Ingy's trick of grepping the 02packages.details.txt.gz file in your cpan/cpanm directory.
The specification for a command is written in YAML. The appspec tool can then generate completions from it.
YAML comes in very handy especially if some subcommands have the same options. You don't need to repeat them but can use the YAML alias feature.
# specs/dzil.yaml
# [...]
build:
summary: build your dist
options:
- &trial trial --build a trial release that PAUSE will not index
- tgz --build a tarball (default behavior)
- in=s --the directory in which to build the distribution
# [...]
release:
summary: release your dist
options:
- *trial # Reuse option from above
At the beginning the format was quite verbose for simple options:
options:
- name: proxy
summary: Set HTTP proxy
aliases: [p]
type: string
At some point I added a shorter syntax, which was inspired by the Getopt::Long syntax and Ingy's Schematype syntax.
options:
- proxy|p=s --Set HTTP proxy
See the usage instructions at shell-completions.
For zsh, you can just add the path to the $fpath variable. Completions in this directory will not be loaded every time you start a zsh session; instead they will be autoloaded when the command is first used.
For bash it might be better to just pick the completions you really want to use.
There are still problems with special characters like quotes and spaces. They have to be escaped correctly, and unfortunately this works differently in bash and zsh.
Some months ago Ingy started a similar project: complete-shell.
While it has the same goal, adding completions for existing commands, it works quite differently.
Try out both and report bugs and feature requests to us ;-)
For older perls I had a look at Data::Alias. But I didn't know about Array::RefElem before.
By "slot in a container", do you mean that I can only add an alias to an array or hash element?
I'm not yet sure that the implementation will be trivial, because at the time I parse and store the alias, it is stored in a temporary structure, which will be rearranged during the construction process.
It's implemented like that because I provide callbacks for custom constructors. A custom constructor for a mapping will get the keys and values as an arrayref, so that it can a) receive the original order, b) be able to do something with keys that are not strings, and c) do something else.
I guess that problem has to be solved in any case - for refaliasing and for Array::RefElem.
Adding it to YAML::XS would be possible, I guess. Just a bit of work because of, well, XS... And Ingy should be OK with adding it before I start doing any work.
If you want to get some of the speed of C, you could also try YAML::PP::LibYAML, where you could use the merge key feature already.
Let me know if you have more questions or suggestions (preferably via github, email or IRC, because I don't get notifications here on blogs.perl.org).