CPAN Dependencies, static and dynamic

Dependencies or prerequisites are an integral feature of the CPAN software repository. They define what other CPAN modules are required for a particular CPAN distribution to be built, tested, or ultimately to function, as well as optionally to improve or add functionality. To define them properly for a distribution, it is helpful to understand exactly how they will be used, and what all the different distribution files like Makefile.PL, Build.PL, META.json, and MYMETA.json are for.

sidebar: In this post, I will focus on the "requires" relationship, which covers hard dependencies that must be satisfied. The "recommends" and "suggests" relationships are also defined, indicating strongly and weakly optional dependencies respectively; CPAN installers may install these depending on the options that are passed.

Most commonly, and at a basic level, dependencies are defined in a (generated) file called META.json in the root of the CPAN distribution, but this may not be the complete picture. Historically, CPAN installers would determine what is needed at the time the user requests to install the distribution ("install time"). There is now a formal concept of static prerequisites, covering the most common case where they are the same for every install environment, but some distributions still need to determine prerequisites at install time, using the original dynamic configuration process.
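To make the static/dynamic distinction concrete, here is a minimal sketch of the idea, not of how any particular installer works. The distribution name, the module names and versions, and the helper install_time_requires are all hypothetical; only the prereqs layout (phase, then relationship, then module) and the dynamic_config field follow the CPAN metadata specification.

```perl
use strict;
use warnings;

# Hypothetical metadata, shaped like the relevant parts of a META.json file.
my %meta = (
  name           => 'Acme-Example',
  dynamic_config => 1,    # 1: prerequisites may change at install time
  prereqs        => {
    runtime => {
      requires   => { 'perl' => '5.008001', 'JSON::PP' => '2.0' },
      recommends => { 'Cpanel::JSON::XS' => '4.0' },
    },
    test => { requires => { 'Test::More' => '0.88' } },
  },
);

# Hypothetical stand-in for what a Makefile.PL or Build.PL might decide when
# run on the installing machine; the result is the kind of list that would
# end up in MYMETA.json.
sub install_time_requires {
  my ($meta, $os) = @_;
  my %requires = %{ $meta->{prereqs}{runtime}{requires} };
  return \%requires unless $meta->{dynamic_config};    # static: use as-is
  $requires{'Win32::Console'} = '0' if $os eq 'MSWin32';    # assumed example
  return \%requires;
}

my $linux   = install_time_requires(\%meta, 'linux');
my $windows = install_time_requires(\%meta, 'MSWin32');
print join(', ', sort keys %$linux),   "\n";
print join(', ', sort keys %$windows), "\n";
```

With dynamic_config set to 0, an installer can trust the static list and skip this install-time step entirely.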

CGI::Tiny - Perl CGI, but modern

Originally published at dev.to

In a previous blog post, I explored the modern way to write CGI scripts using frameworks like Mojolicious. But as pointed out in comments, despite the many benefits, there is one critical problem: when you actually need to deploy to a regular CGI server, where the scripts will be loaded each time and not persisted, frameworks designed for persistent applications add lots of overhead to each request.

CGI scripts have historically been written using the CGI module (or even more ancient libraries). But this module is bulky, crufty, and has serious design issues that led to it being removed from Perl core.

Enter CGI::Tiny. It is built for one thing only: serving the CGI protocol. In most cases, frameworks are still the right answer, but in the case of scripts that are forced to run under the actual CGI protocol (such as shared web hosting), or when you want to just drop in CGI scripts with no need to scale, CGI::Tiny provides a modern alternative to CGI.pm. You can explore the interface differences from CGI.pm or suggested ways to extend CGI::Tiny scripts.

So without further ado, here is the equivalent CGI::Tiny script to my previous blog post's examples:

#!/usr/bin/env perl
use strict;
use warnings;
use CGI::Tiny;

cgi {
  my $cgi = $_;
  my $input = $cgi->param('input');
  $cgi->render(json => {output => uc $input});
};

Migrating from DBD::mysql to DBD::MariaDB

DBD::mysql has long provided the interface to connect to the MySQL database from Perl code. But as old as it is, it was implemented with some critical bugs that cannot simply be corrected without breaking mountains of existing code relying on those bugs. For this same reason, though DBD::MariaDB corrects these bugs, care must be taken when migrating existing code to avoid subtle breakage.

This blog post is far too short to explain Unicode and encodings like UTF-8; for anyone seeking a more solid grasp on the concepts, I recommend a read through The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.

In particular, like much XS code of its time, DBD::mysql suffers "The Unicode Bug" by improperly ignoring the (poorly-named) internal UTF8 flag of strings it works with, which for XS code is a necessary indication of how the string was stored. Perl's string implementation has two internal formats: a downgraded format, where the characters of the string are stored directly as the bytes representing each ordinal, and an upgraded format, where the characters of the string are stored roughly as the UTF-8 encoding of the Unicode characters representing each ordinal (roughly, because it is more permissive than UTF-8 in order to allow encoding any character allowed in a Perl string).

The reasons for these two distinct formats involve both legacy and performance, and the system works rather well, as long as everything interfacing with these internals pays attention to which format the string has been stored in, as indicated by the "UTF8 flag". And importantly, Perl does not provide guarantees as to when it will or will not upgrade or downgrade a string (converting its internal format while keeping the logical contents of the string the same), and does so as needed or as an optimization; this behavior may change between versions of Perl or based on what features are enabled, and commonly occurs during various normal string operations.

The practical effect of this is that strings passed into MySQL that are considered string-equal in Perl (according to the eq operator and other common mechanisms) may get stored or retrieved as different strings, due to factors that may be surprising, such as string concatenation operations, or the version of Perl in use, or whether the unicode_strings feature was enabled when the string was created or modified.
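A small sketch of the two internal formats, using only core functions. It forces the upgrade explicitly with utf8::upgrade so the result is predictable; in real code the same change can happen implicitly, which is exactly the problem.

```perl
use strict;
use warnings;
use bytes ();    # makes bytes::length available without enabling the pragma

my $downgraded = "r\xE9sum\xE9";    # six characters, stored one byte each
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);           # same characters, UTF-8-like storage

# Perl considers the two strings equal...
my $same_string = $downgraded eq $upgraded ? 1 : 0;

# ...but the internal flag and the stored bytes differ.
my @flags        = map { utf8::is_utf8($_) ? 1 : 0 } $downgraded, $upgraded;
my @char_lengths = map { length $_ } $downgraded, $upgraded;
my @byte_lengths = map { bytes::length($_) } $downgraded, $upgraded;

print "eq: $same_string\n";
print "UTF8 flag: @flags\n";
print "length in characters: @char_lengths\n";
print "length of internal storage: @byte_lengths\n";
```

XS code that reads the stored bytes without checking the flag sees two different byte sequences for what Perl code treats as one string.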

An attempt was made to correct this in DBD::mysql 4.042, but was reverted in the next version due to the danger of subtly breaking existing code in a way that cannot be easily caught or explained. Subsequently, PALI released DBD::MariaDB as a fork of DBD::mysql with these fixes, and development continues separately on that fork.

DBD::MariaDB is not specific to the MariaDB database, and in fact both DBD::mysql and DBD::MariaDB support interfacing with either MySQL or MariaDB clients and servers for the most part, as their protocols are compatible. However, DBD::mysql has recently removed support for building against MySQL clients older than 8.0 and against newer MariaDB clients, providing additional impetus to migrate to DBD::MariaDB.

The mysql_enable_utf8(mb4) option

In DBD::mysql, there is an option called mysql_enable_utf8 (and the preferred mysql_enable_utf8mb4 which will be omitted here for brevity) which purports to encode and decode parameters from UTF-8 and tell the database server to expect and return parameters in UTF-8, allowing the code to work with general Unicode text. However, this option suffers "The Unicode Bug" whether it is enabled or disabled. In common cases, it works just enough to fool the user into thinking their code will continue working correctly.

If a string contains codepoints with an ordinal of 256 or greater, it can be safely assumed to be a text string, because byte strings by definition cannot include ordinals of 256 or greater. Such characters that force interpretation as Unicode text are commonly called "wide characters", and these strings can only be stored in Perl's internal upgraded string format, which happens to produce the bytes of the string's UTF-8 encoding. Characters in the ASCII range of 0-127 are identical to the byte ordinals of their UTF-8 encoding (a precise way to say that the encode operation does not affect them), so this distinction is practically meaningless for ASCII strings.

But in between ASCII and wide characters are the ordinals from 128-255 that prove ambiguous; both byte strings and text strings can contain these ordinals, and when encoded or decoded to UTF-8, a valid but different string will result. Perl can use either of its internal representations to store such a string, and DBD::mysql will send it however Perl has stored it without checking.
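The ambiguity of ordinals 128-255 can be seen with the core Encode module alone. This is a sketch of the general failure mode, not of anything DBD::mysql does internally: bytes that were already encoded get treated as text and encoded a second time, and every step produces a perfectly valid string.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "\xE9";                    # one character: U+00E9
my $bytes = encode('UTF-8', $text);    # two bytes: 0xC3 0xA9

# Mistake: treating the already-encoded bytes as text and encoding again.
my $double_encoded = encode('UTF-8', $bytes);    # four bytes, still valid UTF-8

# Decoding once yields a valid but wrong two-character string.
my $mojibake = decode('UTF-8', $double_encoded);

printf "%d %d %d %d\n", map { length $_ } $text, $bytes, $double_encoded, $mojibake;
```

Nothing in this sequence raises an error or warning, which is why such corruption tends to be noticed only when the data is displayed.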

A string of bytes treated as text and encoded to UTF-8 will generally result in another valid string of bytes, and it is possible (though rare, outside of ASCII) that a string of text treated as bytes can be interpreted as valid UTF-8 and decoded to another valid string of text. This is commonly how text data ends up as "mojibake" or mangled characters, or a byte string ends up unusable. There are a couple of common ways that this can result in corrupted or incorrect data using DBD::mysql:

 use utf8;
 my $str1 = "résumé";
 warn length $str1; # 6, thanks to "use utf8"
 utf8::downgrade $str1;
 warn length $str1; # still 6
 # but mysql_enable_utf8 would wrongly send the Latin-1 encoding
 # as a binary string

 my $str2 = "\N{U+2603}\N{U+2603}\N{U+2603}"; # snowman party
 warn length $str2; # 3, regardless
 use Encode 'encode';
 my $str3 = encode 'UTF-8', $str2;
 warn length $str3; # 9,
 # but mysql_enable_utf8 would wrongly send the same as $str2

Binary data can suffer the opposite issue; in most cases, it will be stored in Perl's downgraded internal string format which simply stores the bytes as-is, and in this case the string DBD::mysql sends will match the intended binary data, even with mysql_enable_utf8 enabled. But such a string may be upgraded in the normal course of Perl string operations, in which case the binary data DBD::mysql sends to the database server will be mangled.

 use File::Slurper 'read_binary';
 my $binary_data = read_binary $filepath;
 warn length $binary_data; # file size
 utf8::upgrade $binary_data;
 warn length $binary_data; # still the file size
 # but mysql_enable_utf8 would send an incorrect (likely longer) string

When retrieving strings, DBD::mysql knows whether it is a text or binary column, and so (with mysql_enable_utf8) returns text data as an upgraded Perl string, and binary data as a downgraded Perl string. In the common case, this also works, but in particular if binary data was mangled upon storage, that mangled data will be returned as stored, rather than reversing the mangle.

It is possible to work around this mysql_enable_utf8 bug to write reliable code with DBD::mysql if necessary, as explained by the mysql_enable_utf8 documentation in DBD::mysql, where the examples shown also will seamlessly convert to DBD::MariaDB (though the upgrades and downgrades are unnecessary for DBD::MariaDB).
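As a rough sketch of what those upgrades and downgrades look like, assuming mysql_enable_utf8 is on and the driver sends each string exactly as Perl has stored it: the helper names below are hypothetical and not part of DBD::mysql, and the documentation remains the authority on the details.

```perl
use strict;
use warnings;

# Hypothetical helper: force the upgraded format, so the stored bytes are the
# UTF-8-like form that text is expected to be sent in.
sub as_text_param {
  my ($string) = @_;
  utf8::upgrade($string);    # works on the copy; logical contents unchanged
  return $string;
}

# Hypothetical helper: force the downgraded format, so the stored bytes are
# exactly the binary data.
sub as_binary_param {
  my ($string) = @_;
  utf8::downgrade($string);    # dies on characters above 255, which are not bytes
  return $string;
}

my $text   = as_text_param("r\xE9sum\xE9");
my $binary = as_binary_param($text);    # round trip, for illustration only
print utf8::is_utf8($text)   ? "text is upgraded\n"     : "text is downgraded\n";
print utf8::is_utf8($binary) ? "binary is upgraded\n"   : "binary is downgraded\n";
```

Since Perl makes no promises about when a string changes format, helpers like these would need to be applied immediately before the value is handed to the driver.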

The DBD::MariaDB correction

In DBD::MariaDB, there is no such option as mysql_enable_utf8 (or mysql_enable_utf8mb4); input parameters are always treated as Unicode strings and encoded to UTF-8 unless indicated otherwise, and text data in the response is always decoded from UTF-8 and returned as a Unicode string. (Note that this still works fine if your tables do not use utf8/mb4 charsets for storage, because MySQL/MariaDB convert automatically between the connection charset and the storage charsets.) This along with correcting "The Unicode Bug" in the XS code results in much more reliable behavior, but code that only accidentally worked due to the mysql_enable_utf8 bugs must be updated for use with DBD::MariaDB or it may consistently mangle the data.

Due to the ambiguities previously discussed, these issues cannot be automatically detected or fixed. Instead, all instances of inserting text and particularly binary data must be audited to ensure that the data is being provided as the database driver expects. Text strings are expected to be provided in decoded Unicode character form; for example, whether the string is upgraded or downgraded, a string representing 3 Unicode characters (codepoints) should have a length() of 3 when provided to the database query.

 use utf8;
 my $str1 = "résumé"; # length 6
 my $str2 = "\N{U+2603}\N{U+2603}\N{U+2603}"; # length 3
 $dbh->do('INSERT INTO `some_table` SET `x`=?, `y`=?', undef, $str1, $str2);
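When auditing, the usual source of trouble is text that arrived as encoded bytes and was never decoded. A minimal sketch of the check, using the core Encode module; the input bytes here are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Nine UTF-8 bytes, as they might arrive from a file or socket.
my $raw = "\xE2\x98\x83" x 3;

# FB_CROAK dies on malformed input; LEAVE_SRC keeps $raw unmodified.
my $text = decode('UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC());

printf "%d bytes in, %d characters out\n", length $raw, length $text;
```

Here $text, with a length of 3, is the form to pass as a text parameter; passing $raw instead would have the driver encode the nine bytes a second time.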

For binary data, DBD::MariaDB requires that the parameter be marked as binary, so that it does not encode it to UTF-8, as the database driver does not have a way of intuiting whether the query will ultimately interpret the data as binary or text. This marking is done using the parameter binding interface:

 my $sth = $dbh->prepare($query);
 $sth->bind_param(1, $text_string);
 $sth->bind_param(2, $binary_data, DBI::SQL_BINARY);
 $sth->execute;

Unfortunately, there is no interface built into DBI to allow specifying parameter types in the more convenient methods such as do() and selectall_arrayref(); the full prepare/bind_param/execute flow must be used. I commonly write a wrapper function to execute these steps for my application in a single function call while being able to specify parameters as binary. Alternatively, Mojo::mysql provides a similar wrapper, and can transparently use DBD::MariaDB in place of DBD::mysql.
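A wrapper along those lines might look like the following sketch. The function name and the convention of passing a hash reference with value and type keys are assumptions made for this example, not part of DBI; the table and column names in the usage comment are likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not part of DBI. Plain scalars are bound as ordinary
# (text) parameters; a hash reference of the form {value => ..., type => ...}
# is bound with the given type.
sub query_with_types {
  my ($dbh, $query, @params) = @_;
  my $sth = $dbh->prepare($query);
  for my $index (0 .. $#params) {
    my $param = $params[$index];
    if (ref $param eq 'HASH') {
      $sth->bind_param($index + 1, $param->{value}, $param->{type});
    } else {
      $sth->bind_param($index + 1, $param);
    }
  }
  $sth->execute;
  return $sth;
}

# Intended use, assuming DBI is loaded and $dbh is a DBD::MariaDB handle:
#   query_with_types($dbh, 'INSERT INTO `files` SET `name`=?, `content`=?',
#     $name, {value => $binary_data, type => DBI::SQL_BINARY});
```

Placeholders are numbered from 1 in bind_param, hence the offset from the array index. The statement handle is returned so the caller can fetch rows as usual.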

For more information, check out the DBD::MariaDB Unicode documentation.

Perl 7: A Modest Proposal

I've written a new blog post on Perl 7 (prev: Perl 7: A Risk-Benefit Analysis and Perl 7 By Default). You can find it, and likely my future posts, on dev.to#perl, for similar reasons as mentioned here.

Perl 7 By Default

Perl 7 has been announced as the next direction of Perl development. My previous blog post explored at a high level the risks and benefits of the announced direction, as well as those of a more incremental proposal. The primary and critical difference between these two approaches is the decision to change interpreter defaults in an incompatible manner; specifically, to have strict and warnings and possibly other features enabled by default for code that does not specify otherwise. I would like to explore each of the arguments presented for this design choice.

Optimizing For New

The primary benefit of changing the implicit defaults is, of course, to allow Perl programmers to write code in a more modern way and newcomers to program in a safer environment without having to know the sometimes arcane or niche ways to activate such an environment.

In a perfect world, a change in defaults helps with this problem. But in practice, it will be a step backwards. As it stands today, people continue to use Perl tutorials and code examples from the early 2000s, and this won't change because some new ones will have a 7 attached. The oldest examples out there will fail to run under this Perl 7 (in the best case; the worst case is more unpredictable), which is most likely to turn the newcomer away. These aren't great examples to learn Perl from in any case, but the chance of continued engagement is better than none. The vast majority of tutorials and examples will still recommend and include strict and warnings, which will behave no differently under this Perl 7.

And finally, any new material written for this Perl 7 will omit the boilerplate entirely. Without the boilerplate, nothing will indicate that the code was written to a Perl 7 standard, or prevent it from running on a Perl 5 interpreter, as these features do not carry any inherent incompatibility. This puts the onus on every person sharing such code to include these caveats, with no guarantee that a reader will understand such context. Even if only the bare minimum of strict and warnings are enabled by default, this code run on a Perl 5 interpreter will silently miss the compile- and run-time safety features these pragmas provide, resulting in less exposure of the modern coding environment we wish to present, not more. If other features such as say are enabled by default in Perl 7, such code on a Perl 5 interpreter will result in the same confusing errors one currently gets when omitting the necessary feature.

Even temporarily accepting the optimistic assumption that changing the programming environment based on Perl version rather than explicit declaration will be more attractive to new programmers, we should not ignore the needs of the existing userbase. Yes, Perl use is declining; no, it is not dead (as evidenced by having this discussion). There are active CPAN authors, active use of Perl in corporate codebases, and 15-year-old one-liners and scripts used for systems administration. New defaults do not inherently improve the experience of any of these users; they have not only existing code that may stop working, but existing expertise, code style and assumptions that affect new code they write. New defaults require re-evaluation of both of these aspects; a new version declaration requires only re-evaluating what is written or converted within that scope.

This is not a choice to optimize for new users instead of abandoned code, but a choice to optimize for appearance instead of both existing code and active users.

Why Not Both

As changing the defaults would necessarily alienate those who wish to continue using old code without having to rewrite it, it has been suggested that such users can simply continue to use Perl 5, that LTS versions of Perl 5 would be provided for many years, and further that distributors of Perl are likely to distribute separate perl5 and perl7 binaries.

If we expect Perl 7 to be provided ultimately as a separate incompatible binary, then we are expecting a fork of the Perl ecosystem. Existing code will not be guaranteed to run on the new Perl 7 binary, and new Perl 7 code will at best fail to run on the old Perl 5 binaries, and at worst silently fail to inform the user of errors. Code analysis tools like perlcritic will need to be told which Perl they are analyzing as the code will not contain any hints itself. Any discussion of Perl coding will have to start with "Are you using Perl 5 or Perl 7?"

This approach is technically feasible with an uncomfortable transition period, and would make sense if Perl 7 were to be a significant evolution of the language that brought great benefits inherent in the incompatibility. This is simply not the case; all of the foreseeable benefits can be realized without breaking changes, and thus this design and its potential for confusion and schism are unnecessary.

The greatest risk of this approach, of course, is the scenario that the Perl 7 interpreter never reaches widespread adoption and development restarts on Perl 5. This risk should not be taken lightly, particularly as this approach seems predicated on the expectation that most people will in fact stick with Perl 5 if Perl 7 is not compatible. We should certainly learn from the Raku saga, and not follow the same path without even a new language to offer this time.

A New Culture

Since the new defaults will not contain any revolutionary new feature or ability, a Perl 7 with changed defaults would not provide sufficient value inherent in the incompatibility. This has been rationalized with a secondary goal to change the culture of Perl development. The current Perl policy is one that it has held for decades: backwards compatibility is paramount, despite the inconvenience. It is suggested that future Perl releases will in fact follow the Perl 7 model to a greater extent, and thus the Perl 7 changes to defaults are designed to set a precedent of major versions breaking compatibility, possibly without the restrained nature of the currently suggested changes.

This seems short-sighted. By all measures, all of the features that could be introduced in future major versions will similarly be easy to introduce in a non-breaking manner. And more importantly, do we really want to go down the road of releasing major versions that break code as a policy rather than when it is necessary and beneficial? This approach does not seem to be in line with Perl's defining pragmatism. Perl is a known quantity, for better or worse; we cannot reinvent what Perl means to the world, only how we present its reality.

They Did It

Python and other languages have experienced major versions with significant breaking changes. After a long and arduous transition period, the Python community is just now beginning to coalesce around Python 3. This appears to show that major versions can break compatibility and the end result can justify the difficult transition.

But the comparison is flawed. The Python 2 to 3 change primarily heralded a one-time incompatible change of explicitly marking strings as bytes or characters, resulting in an objectively better language feature that can only work when applied throughout the whole stack of code. Perl has no feature that even approaches implementing an explicit distinction like Python added, and all of Perl's existing features are easily able to be limited to a lexical scope and leave other code unmodified. Thus, Perl has no feature that would benefit from the significant risk of this approach in the way that Python did. And far from the idea that it facilitates a culture of breaking changes in major versions, the ordeal of Python 3 has very likely made developers more apprehensive of future breaking versions so as not to repeat the trauma without good reason.

Moving Forward

Another suggested impetus for incompatible major versions is to allow Perl development to continue more smoothly, without the shackles of past cruft.

It is of course a worthwhile goal, but changing interpreter defaults does not assist in this goal. Changing defaults does not affect policy or practical ability to remove or alter actual language features. Of course the established policy and application of it can be debated, and the ability to leverage major versions provides significant opportunity in this regard, but this discussion is unrelated to the choice to change user-facing defaults.

Further, it has been pointed out by two of the three people who have recently been working on forward-moving language features that user-facing features and syntax are not a significant barrier to such development. See these messages from Dave Mitchell, who has been working on the signatures feature, and Paul Evans, who recently implemented the isa feature and has many more planned including core try/catch. (Zefram, who recently implemented chained comparisons, has not weighed in on the discussion to my knowledge.)

Conclusion

Moving forward in Perl development and culture would be best achieved by the simple and consistent approach of leveraging the use v7 declaration to provide a modern environment lexically, and interpreting code without such a declaration the same way as now in Perl 5.

The major version provides significant opportunity for messaging of new features, as well as what features such a declaration will enable. The announcement of Perl 7 can detail all of the great things you get in a scope with the line use v7. It is significantly easier to remember that, hypothetically, use v7 will enable warnings and use v8 will enable signatures, than to remember that use v5.12 enables strict and use v5.16 enables __SUB__. The declaration provides explicit reference that the code within that file or code snippet requires that version of Perl, without relying on context of shebangs, filenames, or surrounding text that can get left out. And it provides an explicit hint to static parsers (syntax highlighters, perlcritic, etc) how the following code should be interpreted. Someday, when we are ready for a Perl 8, it will not require repeating any of the plans discussed above for introducing incremental incompatibility; it will only require use v8.
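The lexical effect of a version declaration can already be observed in Perl 5, since use v7 does not exist yet. The sketch below assumes Perl 5.12 or newer and deliberately has no strict at file scope, to mimic boilerplate-free code; the helper compiles is a hypothetical name for this example.

```perl
# Deliberately no "use strict" here, to mimic code written without boilerplate.

# Compile a snippet inside an anonymous sub without running it;
# returns 1 if it compiled, 0 otherwise.
sub compiles {
  my ($code) = @_;
  return eval("sub { $code }; 1") ? 1 : 0;
}

# Without a declaration, an undeclared variable compiles silently.
my $without_declaration = compiles('$undeclared_total = 1');

# With a version declaration, strict is enabled for that scope only.
my $with_declaration = compiles('use v5.12; $undeclared_count = 1');

print "compiles without declaration: $without_declaration\n";
print "compiles with use v5.12:      $with_declaration\n";
```

The declaration changes how its own scope is compiled and nothing else, which is the property the versioned approach relies on: code without the line keeps behaving as it always has.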

I believe making good use of a new major version is extremely important to portraying the continued and forward development of Perl to the wider programming community. A major version with major features can be a significant boon to jumpstart the stagnating perception of Perl and bring it in line with the reality of its development. But changing the interpreter defaults is unnecessary, irresponsible, and counterproductive. A simple scoped and versioned declaration to enable the modern coding environment provides a stronger way forward and no new risk. It allows Perl 5 users and Perl 7 users to just be Perl users. It should be our default choice.
