Dist::Zilla Starter revision 3 - Git, versioning, and more

I've just released revision 3 of the @Starter plugin bundle for the Dist::Zilla CPAN distribution authoring tool. There are no changes to the base configuration from revision 2, but there are now additional options to help manage common authoring tasks using modern best practices. You must set your revision to 3 to enable these new options, as a safety measure to make sure you have a new enough version of the bundle to support them.

The managed_versions option is the most complex new addition. Instead of declaring a distribution version in your dist.ini or using a plugin like [Git::NextVersion] or [AutoVersion] to determine it, this option lets you version your modules like you normally would and reads the version of your main module to use as the distribution version. The versions of any other modules in the distribution are updated to match during the build or release, and after a release the version is incremented in the source files. This is widely considered the least fragile and most contributor-friendly method of managing versions with Dist::Zilla currently, as your modules don't require Dist::Zilla to determine their version while testing.

The regenerate option makes use of a very useful quasi-phase called -Regenerator created by Kent Fredric. For each filename you specify, it copies that file from the Dist::Zilla build tree to your source tree whenever you release, as well as when you run dzil regenerate. This can be useful for keeping generated files such as LICENSE in your git repository where they may also be important, as well as copying your Makefile.PL/Build.PL and META.json to your source tree if you wish to make it directly installable with cpanm for testing or Travis CI. (Note: this method is only for testing unreleased changes; you should always install distributions from CPAN normally.)
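As a rough sketch, a dist.ini enabling both of these options might look like the following. The distribution name, author, and license values are placeholders, and the list of regenerated files is only one possible choice. Note that there is no version line, since the version would be read from the main module.

```ini
; placeholder metadata, substitute your own
name             = My-Dist
author           = Example Author <author@example.com>
license          = Perl_5
copyright_holder = Example Author

[@Starter]
revision         = 3
managed_versions = 1
regenerate       = LICENSE
regenerate       = Makefile.PL
regenerate       = META.json
```

With a configuration like this, the main module would carry an ordinary declaration such as our $VERSION = '0.001'; in its source.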

Finally, revision 3 made it possible to release the @Starter::Git variant bundle, which takes all of the same options as @Starter, and interleaves plugins appropriate to a Git workflow. [Git::GatherDir] replaces [GatherDir], as it's a simple subclass that prevents cruft from being gathered by simply gathering the files tracked in Git. This means your .gitignore can be used to exclude files from both the repository and your distribution builds. This mostly obsoletes the need for [PruneCruft] and [ManifestSkip] to clean up your distribution build, but the bundle still includes them as they don't tend to cause any problems. In addition, the bundle will check for any uncommitted changes before releasing, and after releasing commit and tag the release (as well as a separate version bump commit if you also enable managed versions), and push to your Git origin if you have one configured.
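Since @Starter::Git takes the same options as @Starter, switching an existing configuration over should in principle just be a matter of changing the bundle name. A minimal sketch, assuming the rest of the dist.ini (name, author, license) is already in place:

```ini
[@Starter::Git]
revision         = 3
managed_versions = 1
```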

The @Starter::Git bundle also comes with a matching Dist::Zilla minting profile, which is similar to the @Starter profile but additionally adds a basic .gitignore to start you out, and initializes the new directory as a Git repository.

I hope these new options are helpful in modernizing your distribution workflow. As always, any plugins used by the bundle can be fully configured by config slicing; you can even look at the new bundle's own dist.ini for examples, since it is built using itself. Please stop by #distzilla on irc.perl.org if you have any questions, big or small; we are happy to help.

perldoc.pl now powered by Elasticsearch

perldoc.pl, the alternative perldoc browser, originally leveraged PostgreSQL full text searching. But as I ran into new issues with it, along with ones I had encountered before, I started working to make the search backend swappable.

There are now three search backends available: PostgreSQL, Elasticsearch, and SQLite using the FTS5 extension. While SQLite is of course the simplest to deploy as it requires no setup, I decided against using it for the main perldoc.pl instance as it does not support skipping stopwords, but the backend is now provided so basic search features can be added to a deployment without setting up a database. Additionally, the application can be deployed without any search backend configured, which will just remove the search box and allow viewing of all pages normally.

Elasticsearch is quite complex to set up and utilize correctly in comparison to PostgreSQL, and makes quite a bit more use of server resources to be performant, but almost every aspect of its indexing and searching process is configurable. I often find this flexibility to be important for full text search applications that tend to all have slightly (or wildly) different requirements, and it's always nice to be able to tweak results as needed.

So now that Elasticsearch is in use, be sure to let me know in comments or the issue tracker when a search doesn't find what you hoped.

perldoc.pl - A new way to perldoc

For the past decade or more, perldoc.perl.org has been a useful and convenient resource for viewing perl documentation online. However, it has suffered from lack of maintenance and mounting unfixed issues over the past few years. Being familiar with the excellent Mojolicious documentation site and how it also can display core perldocs, I reasoned that such features would be simple to provide in this modern framework. And so, what would become perldoc.pl (thanks to a domain acquired by pink_mist) was born.

Based on (a now heavily customized fork of) Mojolicious::Plugin::PODRenderer, the basis of the site is simply rendering the perldocs available in various perl installations. MetaCPAN::Pod::XHTML is used to provide similar rendering and linking features to metacpan, thanks to haarg. Using ideas from perldoc.perl.org, convenient features like being able to switch to the version of the current document in another perl, a function listing, and a search have been added, and I intend to add more and make tweaks as necessary while keeping the site simple in presentation.

My current ideas for additional features are: redirecting to the appropriate part of perlvar when a known special variable is entered in the search box; making the individual questions in the perlfaqs appear in search results; improving the mobile view of the site; and if it becomes needed, caching the rendered output of pages so they can be served faster. I am sure that the search will need lots of tweaking as well; I consider its present state a first draft. Let me know what you like or don't like, and what you want added, in the comments or the issue tracker.

A Guide to Versions in Perl

Version numbers in Perl are very important; they allow orderly updating and maintenance of modules and distributions across the CPAN, and allow CPAN modules and consumers to require the versions of modules with the bugfixes or features they need. However, in this context, they are also a unique beast with a complex and bumpy history that leads to some surprises for those who are not familiar with the nuances. These are the things CPAN authors should know about versions in Perl, and are nice for consumers to know as well.

To summarize:

  • Versions in Perl come in two forms, decimal numbers and dotted decimal tuples.

  • The two formats can be compared using a defined conversion method, implemented by the version module.

  • Versions for modules should be declared as string values.

  • Underscores can be used in either form of version to indicate a trial version, but have pitfalls to watch for.

  • The VERSION method can be used on any module to test if it's a certain version or greater, and version objects can be used for arbitrary version comparisons.

Version Schemes

To start with, it's important to know that there are two distinct types of versions recognized by Perl. The first kind is simply a decimal number, with digits before and after a decimal separator, for example '1.0' or '0.003'. These versions are compared as numbers, so '1.1' and '1.10' refer to the same version. Despite being numbers, they should always be declared and referenced as strings within the code, so that any trailing zeroes are kept around for consistency in display, and to avoid possible floating point errors for versions with a particularly large number of digits.
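A short sketch of both points, using version.pm objects for the comparison (the variable names are only for illustration):

```perl
use strict;
use warnings;
use version;

# Decimal versions compare as numbers, so these are the same version
my $short = version->parse('1.1');
my $long  = version->parse('1.10');
print "1.1 and 1.10 are equal\n" if $short == $long;

# A string keeps its trailing zero; a bare number has already lost it
# by the time version.pm sees it
my $from_string = version->parse('1.10');
my $from_number = version->parse(1.10);
print "from string: $from_string\n"; # from string: 1.10
print "from number: $from_number\n"; # from number: 1.1
```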

While decimal number versions are a simple concept, they are different from the versions used by most everything outside of Perl. For that, Perl recognizes a second type of version: a tuple (or sequence) of integers separated by dots, sometimes referred to as "dotted-decimal" versions, and used for the concept of semantic versioning. These look like 'v1.2.3' and 'v0.0.30'. For these versions, each segment is an integer, so trailing zeroes are significant -- 'v1.2.3' is the same version as 'v1.02.03', but not the same as 'v1.20.30'. Each segment after the first must also be 999 or lower, to allow conversion as described in the next section.

Neither of these version forms allow alphabetic characters or hyphens, as you may see in other versioning schemes. I'll go into the use of underscores later.

Responsible for parsing and disambiguating these two types of versions, as well as performing version comparisons as described below, is the module version.pm (referred to in this way to disambiguate from the word "version" itself). It has been a core module since Perl v5.10.0, and it is also dual-life, so you can install a newer version from CPAN on any version of Perl (which is helpful for reasons that will become apparent). It disambiguates these two types with simple rules: if the version contains a leading "v" or more than one decimal separator, it's a version tuple; otherwise, it's a decimal number. For this reason it's best to include both a leading "v" and at least two decimal separators for clarity when using tuple versions.
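To see how a given string will be interpreted, one option is the is_qv method of version.pm objects, which reports whether the version was parsed as a tuple. This sketch also checks the tuple comparisons described earlier:

```perl
use strict;
use warnings;
use version;

# A leading "v" or more than one dot means a tuple; otherwise a decimal
for my $string ('v1.2', '1.2.3', '1.2') {
  my $kind = version->parse($string)->is_qv ? 'tuple' : 'decimal';
  print "$string is a $kind version\n";
}

# Tuple segments are integers: leading zeroes are ignored, trailing ones are not
my $same   = version->parse('v1.2.3') == version->parse('v1.02.03');
my $differ = version->parse('v1.2.3') != version->parse('v1.20.30');
print "v1.2.3 equals v1.02.03\n"        if $same;
print "v1.2.3 differs from v1.20.30\n"  if $differ;
```

This is also why the leading "v" matters for two-segment versions: 'v1.2' is the tuple (1, 2), while '1.2' is the decimal number 1.2.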

Comparing and Converting

For the sake of comparison, a conversion method is defined between the two types of versions. The first integer in a tuple version and the integer component of a decimal version are considered the same, and each successive segment of the tuple version is considered equivalent to three decimal places in a decimal version. So for example, to convert the version '3.01002' to a tuple version, the first segment is 3, the second is 10, and the third is 20 (padding with zeroes so you have groups of 3 decimal places), resulting in 'v3.10.20'. To convert the version 'v0.4.1' to a decimal version, the integer component is 0, the first three decimal places after the separator are 004, and the next three are 001, resulting in '0.004001'.

AAA.BBBCCC <-> vAAA.BBB.CCC

This method extends to however many segments or decimal places a version may have. The conversion is used by version.pm to compare version objects sourced from either scheme of version. In this way, Perl can determine how one version compares to another even when they are using different schemes.
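The two conversions worked through above can be reproduced with the normal and numify methods of version.pm objects. A small sketch:

```perl
use strict;
use warnings;
use version;

# Decimal to tuple: three decimal places per segment, padded with zeroes
my $as_tuple = version->parse('3.01002')->normal;
print "3.01002 as a tuple: $as_tuple\n";   # v3.10.20

# Tuple to decimal: each segment after the first becomes three decimal places
my $as_decimal = version->parse('v0.4.1')->numify;
print "v0.4.1 as a decimal: $as_decimal\n"; # 0.004001

# Versions from either scheme compare equal after conversion
print "equivalent\n" if version->parse('v0.4.1') == version->parse('0.004001');
```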

Declaring Versions

Version numbers of each of these forms are not only recognized by Perl for the purpose of checking module versions at runtime, but also by the PAUSE indexer when indexing CPAN modules. In this case and similarly when versions are extracted using Module::Metadata, the behavior is to find the declaration of $VERSION and execute only that line in isolation, so your version declaration line must be executable on its own. The simplest and standard form of declaration is:

our $VERSION = '1.02'; # for decimal versions, or
our $VERSION = 'v1.2.3'; # for tuple versions

As noted in David Golden's blog post Version Numbers Should Be Boring, support for tuple versions has varied widely among old versions of Perl. As support is mostly based on the version.pm module, which has been core since Perl v5.10.0, you need to take special care if your module will use tuple versions on Perl v5.8.9 or older. Your module should declare a dependency on version.pm (at least version 0.77, but preferably more recent), and declare the version in this way:

use version 0.77; our $VERSION = version->declare('v1.2.3');

Keeping this whole declaration on a single line is important so that version.pm is always loaded when this line is executed by PAUSE or other version extraction tools. And of course, the simpler way to remain compatible with older versions of Perl is to stick to decimal number versions.

Underscores in Versions

Another complex factor in Perl version numbers is the use of underscores. Underscores are, by convention, used to indicate a development or trial release of a module or distribution, which should not be indexed by PAUSE.

For decimal versions, it's mostly straightforward: an underscore is placed somewhere between the digits, and ignored for comparison purposes. However, since these versions look like decimal numbers, some may naively compare them using numeric operators, which will fail if they contain an underscore:

our $VERSION = '0.01_02';
if ($VERSION >= '0.0101') {
  # never reached: $VERSION is treated as the number 0.01 here
}

This comparison will fail (and throw a non-numeric warning if warnings are enabled) because $VERSION is truncated to '0.01' when used as a number. (A less naive comparison may be performed using the VERSION method or version.pm objects as described later.) To account for this possibility while still leaving the underscore in the declaration for static parsers, a common idiom is to remove the underscore in a following line:

our $VERSION = '0.01_02';
$VERSION =~ tr/_//d;

You may also see eval() used to remove the underscore, but the tr method is more straightforward and preserves trailing zeroes.
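The effect of the tr idiom on a naive numeric comparison can be checked with a small script like this one (the numeric warning is silenced only for the sake of the demonstration):

```perl
use strict;
use warnings;

our $VERSION = '0.01_02';

# With the underscore in place, the string numifies to 0.01
my $before = do { no warnings 'numeric'; $VERSION >= 0.0101 ? 1 : 0 };

# After removing the underscore, the full version takes part in the comparison
$VERSION =~ tr/_//d;
my $after = $VERSION >= 0.0101 ? 1 : 0;

print "before: $before, after: $after, version: $VERSION\n";
# before: 0, after: 1, version: 0.0102
```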

Underscores in tuple versions have a significantly more complicated history, and may be interpreted wildly differently depending on the version of version.pm in use. My recommendation would be to avoid doing this entirely, but if you must, your distribution should declare a dependency on version.pm version 0.9915, when its interpretation of underscores in tuple versions was fixed in several ways, and so that it considers underscores in tuple versions the same way as in decimal versions (i.e. not as a separator).

Alternative to Underscore Versions

Rather than using underscores in your versions, there are other mechanisms to indicate a development or trial release of a distribution that don't involve module versions. If the archive file uploaded to CPAN has a name ending in -TRIAL (before the file extensions), PAUSE will not index it as a stable release. Additionally, you can set the release_status metadata field in meta-spec 2; a value of "testing" or "unstable" will indicate that the release should not be indexed. The method of setting this depends on your authoring tool. At the time of writing, either of these methods is sufficient to prevent PAUSE from indexing the distribution, and both are performed automatically by the "--trial" option when releasing using Dist::Zilla or Minilla.

Checking for Versions

With all of this in mind, the safest and most consistent way to check for a particular version of a module is with the UNIVERSAL::VERSION method, which is implicitly used by the use Module::Name VERSION syntax. The VERSION method compares both the module's $VERSION and the passed version as version.pm objects, which automatically does the above-mentioned conversion between version schemes if needed, and throws an exception if the module version is less than the passed version. Like when declaring versions, the version passed to the VERSION method should always be a string or version object.

if (eval { require Module::Name; Module::Name->VERSION('v1.2.3'); 1 }) {
  # Module::Name is able to be loaded and is at least version v1.2.3
}

You can of course use a nicer exception-handling method than bare eval if appropriate.

On Perls older than v5.10.0 you should make sure to 'use version' before doing a comparison involving tuple versions, which will update the VERSION method to use version objects as it does on more recent perls. I would also, as above, recommend depending on version.pm version 0.9915 if you intend to do any comparisons involving versions with underscores.

A more generic method of performing arbitrary version comparisons is to parse the versions into version.pm objects and compare them using standard numeric operators.

use version;
if (version->parse('v1.0.3') == version->parse('1.000003')) {
  # versions are equivalent
}
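Because version objects overload the comparison operators, the same approach works for sorting a mixed list of versions. A sketch with made-up version strings:

```perl
use strict;
use warnings;
use version;

# Hypothetical versions in both schemes
my @versions = ('v1.2.3', '1.002', '1.10', 'v1.0.30');

# Sort by parsing each string into a version object for comparison
my @sorted = sort { version->parse($a) <=> version->parse($b) } @versions;

print "@sorted\n"; # v1.0.30 1.002 v1.2.3 1.10
```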

Versions for Vendors

Perl versions aren't just used within Perl, of course. If you are a CPAN author, you should be mindful that Perl modules from CPAN are packaged for use in various distributions, which primarily use the tuple version format. This doesn't mean you need to use it yourself, but to be polite to those translating your versions, you should follow these simple rules:

  1. If using decimal versions, never decrease the number of significant digits you use without a major version bump. For example, go from '1.19' to '1.20' or '2.0', not to '1.2'.

  2. Never change version schemes without a major version bump. Even if you use the above translation correctly, the package vendor may not, and thus end up with a decreasing version.
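To illustrate the first rule, here is a sketch comparing Perl's ordering with a hypothetical naive translation that simply treats the decimal version as dot-separated integers. The naive_tuple_cmp function is an assumption made up for this example, not the behavior of any particular packaging tool.

```perl
use strict;
use warnings;
use version;

# Hypothetical naive comparison: split on dots, compare segments as integers
sub naive_tuple_cmp {
  my ($left, $right) = @_;
  my @left  = split /\./, $left;
  my @right = split /\./, $right;
  while (@left or @right) {
    my $cmp = (shift(@left) // 0) <=> (shift(@right) // 0);
    return $cmp if $cmp;
  }
  return 0;
}

my ($old, $new) = ('1.19', '1.2');

# Perl sees 1.2 as 1.200, which is greater than 1.190
my $perl_says_newer = version->parse($new) > version->parse($old);

# The naive translation sees segment 2 as less than segment 19
my $naive_says_older = naive_tuple_cmp($new, $old) < 0;

print "Perl: $new is newer than $old\n"  if $perl_says_newer;
print "Naive: $new is older than $old\n" if $naive_says_older;
```

Going to '1.20' instead keeps both orderings in agreement.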

Methods to Avoid

There are a couple of other ways to declare versions that should be wholly avoided. One is to declare a bare decimal number version, which means you will lose trailing zeroes and possibly encounter floating-point issues. Sometimes this is used because Perl has a feature that looks like underscores in version numbers:

our $VERSION = 0.01_02; # don't do this

However, Perl will immediately compile this to the number 0.0102; underscores in numeric literals are only a visual-aid feature.
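Both problems with bare numbers are easy to see by printing them:

```perl
use strict;
use warnings;

# The underscore is purely visual and is gone after compilation
my $underscore = 0.01_02;
print "$underscore\n"; # 0.0102

# A trailing zero does not survive in a numeric literal either
my $trailing = 1.10;
print "$trailing\n";   # 1.1
```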

Another method with a complicated history is the Perl vstrings feature. This was a feature added in Perl v5.6.0 that seemed to add a convenient syntax for declaring versions that would be stored as a binary string.

our $VERSION = v1.2.3; # don't do this

This example would create a string consisting of three characters, with ordinals 1, 2, and 3, equivalent to the string "\x01\x02\x03" (note: the vstring syntax uses decimal ordinals, but the '\x' syntax uses hexadecimal ordinals). While version.pm does parse and allow vstrings to be used to initialize version objects, their interpretation is inconsistent among older versions of Perl and their implementation is surprising in the context of versions (for instance, they are not numerically comparable), so it's better to just use a string. It turns out vstrings are most useful for creating binary string literals, and not versions.
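A quick sketch showing what the vstring actually contains; the %vd format of sprintf is used to print the ordinals of each character joined by dots:

```perl
use strict;
use warnings;

my $vstring = v1.2.3;

# Three characters with ordinals 1, 2, and 3
print "length: ", length($vstring), "\n";                   # length: 3
print "same as escapes\n" if $vstring eq "\x01\x02\x03";

# The ordinals can be recovered for display
my $readable = sprintf '%vd', $vstring;
print "readable: $readable\n";                              # readable: 1.2.3
```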

Perl Versions

As you may notice, the versions of the Perl interpreter itself use the same rules as those for Perl modules. The $] variable is a representation of the version of the current Perl interpreter as a decimal number, and the $^V variable is the version as a version object. From Perl v5.6.0 until Perl v5.10.0 $^V was a vstring rather than a version object, so it's not recommended to use this variable if your code will be running on Perls v5.8.9 or older.

The use VERSION statement and runtime require VERSION allow one to require a certain version of the Perl interpreter using either scheme of version number. Note however that both of these forms require a bareword version rather than a string, and the decimal number scheme is preferred for compatibility. Using the translation method previously described, Perl v5.26.1 would be represented as the decimal version 5.026001.

use 5.014; # also activates feature bundle and strict
use Syntax::Keyword::Try;
try { require 5.026001; say "Perl is >= v5.26.1" } catch { say "Old Perl: $@" }
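The relationship between $] and $^V can also be checked directly. This sketch assumes Perl v5.10.0 or newer, where $^V is a version object:

```perl
use strict;
use warnings;
use version;

# $] is the interpreter version as a decimal number, $^V as a version object
my $decimal = $];
my $tuple   = $^V;
print "decimal form: $decimal\n";
print "tuple form:   $tuple\n";

# Parsed as version objects, the two forms describe the same version
my $same = version->parse($decimal) == $tuple;
print "both forms are equivalent\n" if $same;
```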

Encode::Simple - Encode and decode text, simply

Encode is a well known core module in Perl with support for encoding and decoding text from almost any character encoding you can think of. But it's also an old module with a large amount of historical cruft.

With inspiration from #perl on freenode IRC, Encode::Simple is an attempt to at least improve on its interface and usability issues. Rather than an awkward and unintuitive bitmask and the option of clobbering the input data, Encode::Simple exports straightforward encode and decode functions that simply return the encoded or de…