Musings Archives

GitHub and the Perl License

When we publish our Perl module repository on GitHub, we might notice something peculiar in the "About" section of our repository: GitHub doesn't recognize the Perl 5 license. This can be a bit confusing, especially when we've explicitly stated the licensing in our LICENSE file.

Without a properly defined license, GitHub ranks the quality of a repository lower. This is also unfortunate because it limits the "searchability" of our repository: GitHub cannot index it by license, so users cannot find it when they search by license. This matters more today than ever before, as many enterprises rule out open source projects purely on the grounds that their license is poorly managed.

The Problem: Two Licenses in One File

The standard Perl 5 license, as used by many modules, is a dual license: Artistic License (2.0) and GNU General Public License (GPL) version 1 or later. Often, this is included in a single LICENSE file in the repository root.

GitHub's license detection mechanism, powered by Licensee, is designed to identify a single, clear license. When it encounters a file with two distinct licenses concatenated, it fails to make a definitive identification.
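To see why a concatenated file is a problem, it helps to know that Licensee, as far as I understand it, compares the normalized text of a license file against known license templates with a Sørensen-Dice similarity score and accepts only matches above a high confidence threshold. The Python sketch below imitates that idea with toy data. The template strings, the 0.98 threshold and all function names are illustrative assumptions, not Licensee's actual code and not the real license texts.

```python
from typing import Dict, Optional, Set

# Toy stand-ins for license templates (NOT the real license texts).
ARTISTIC_TOY = ("this license establishes the terms under which a given "
                "free software package may be copied modified distributed")
GPL_TOY = ("the gnu general public license is a free copyleft license "
           "for software and other kinds of works")
TEMPLATES: Dict[str, str] = {"Artistic-2.0": ARTISTIC_TOY, "GPL-3.0": GPL_TOY}

# Assumed confidence threshold for this sketch.
THRESHOLD = 0.98


def word_set(text: str) -> Set[str]:
    """Normalize a text into a set of lowercase words."""
    return set(text.lower().split())


def dice(a: str, b: str) -> float:
    """Sørensen-Dice similarity of the two word sets (0.0 to 1.0)."""
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def detect(text: str, templates: Dict[str, str]) -> Optional[str]:
    """Return the best-matching license name, or None if below threshold."""
    best = max(templates, key=lambda name: dice(text, templates[name]))
    return best if dice(text, templates[best]) >= THRESHOLD else None


# One file holding both licenses, as in a typical Perl 5 LICENSE file.
COMBINED = ARTISTIC_TOY + "\n" + GPL_TOY

print(detect(ARTISTIC_TOY, TEMPLATES))  # a single license matches its template
print(detect(COMBINED, TEMPLATES))      # the combined file matches neither well enough
```

The combined file contains every word of each template, yet its similarity to either one drops well below the threshold because half of its content belongs to the other license.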

Here's an example of a repository where GitHub doesn't recognize the license. Notice the missing license badge in the "About" section:

github-licenses-not-visible.png

Also, the "quick select" banner above the README file does not show which license applies:

github-licenses-not-visible-bottom-bar.png

The Solution: Separate License Files

The simplest and most effective solution is to provide each license in its own dedicated file. This allows Licensee to easily identify and display both licenses. This is perfectly valid because the Perl 5 license explicitly allows for distribution under either the Artistic License or the GPL. Providing both licenses separately simply makes it clearer which licenses apply and how they are presented.

(The other reason for having multiple licenses is a situation where different parts of the repository are under different licenses. But that is not our problem here.)

For example, instead of a single LICENSE file containing both, we would have:

  • LICENSE-Artistic-2.0
  • LICENSE-GPL-3
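A quick way to confirm that a checkout follows this layout is a small script. The following Python sketch only checks that the two files exist and are not empty; the function names are hypothetical, and the file names are simply the ones from the list above.

```python
from pathlib import Path

# File names taken from the list above; adjust for your own repository.
EXPECTED_LICENSE_FILES = ("LICENSE-Artistic-2.0", "LICENSE-GPL-3")


def missing_license_files(repo_root, expected=EXPECTED_LICENSE_FILES):
    """Return the expected license files that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def has_combined_license(repo_root):
    """True if a single combined LICENSE file is still present."""
    return (Path(repo_root) / "LICENSE").is_file()


if __name__ == "__main__":
    problems = missing_license_files(".")
    if problems:
        print("Missing or empty:", ", ".join(problems))
    if has_combined_license("."):
        print("A combined LICENSE file is still present.")
```

Run from the repository root, it prints nothing when both files are in place and no combined LICENSE file remains.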

Let's look at an example from my own env-assert repository. In this repository, I've separated the licenses into LICENSE-Artistic-2.0 and LICENSE-GPL-3.

And here's how GitHub's "About" section looks for env-assert, clearly recognizing both licenses:

github-licenses-visible.png

As we can see, GitHub now correctly identifies "Artistic-2.0" and "GPL-3.0" as the licenses for the project.

The same is visible in the "quick select" bar:

github-licenses-visible-bottom-bar.png

Automating with Software::Policies and Dist::Zilla::Plugin::Software::Policies

Manually creating and maintaining these separate license files for every module can be tedious. Fortunately, there is a way to automate this process if you are using Dist::Zilla for authoring.

Dist::Zilla::Plugin::Software::Policies

If we're using Dist::Zilla for our module authoring, Dist-Zilla-Plugin-Software-Policies can automatically check that we have the correct license files. It uses Dist::Zilla's internal variable license to determine the correct license files.

The Dist::Zilla plugin uses Software-Policies as a backend to do the heavy lifting.

Software::Policies

Software::Policies is a module that provides a framework for defining and enforcing software policies, including licensing. It comes with a pre-defined policy for Perl 5's double license. It can also generate other policy files, such as CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.

By using Software::Policies, we can programmatically check for the presence and content of our license files.

This approach not only solves the GitHub license detection problem but also helps us maintain consistent and correct licensing across all our Perl modules, integrating it directly into our build workflow.

By configuring this plugin in our dist.ini, we can ensure that our distribution always includes the correct and properly formatted license files, making GitHub (and other license scanners) happy.

Here's a simplified example of how we might configure it in our dist.ini:

[Software::Policies / License]
policy_attribute = perl_5_double_license = true

[Test::Software::Policies]
include_policy = License

This configuration tells the Dist::Zilla plugin Test::Software::Policies to apply the Perl licensing policy, which typically means Artistic License 2.0 and GPL. When we build our distribution with Dist::Zilla, the plugin creates a test file that checks for the existence and content of the LICENSE-Artistic-2.0 and LICENSE-GPL-3 files. During the testing phase, when we run dzil test or dzil release, the test file is run, and if the license files are missing or incorrect, the tests fail.

To generate the files, we can run the command dzil policies License or just dzil policies. This creates the files according to the [Software::Policies / License] section of dist.ini.

We cannot create the files automatically during the build because then they would only be included in the release, not in the repository. It is precisely in the repository that we need them for GitHub's sake. So the process of creating or updating the license files has to include this small manual step.

Perl in a Business Application


Everybody knows that Perl is not the right language for a large-scale enterprise application. This is common knowledge, right? But why is that? There are as many explanations as there are people explaining: everything from "it's a script language, therefore slow" to "its free syntax breeds incoherence" to "Perl developers are horrible individualists".

Well, I didn't believe this, and I went on to help in a startup which wants to build some fintech systems, the first aim of which is to integrate with Finnish banks and collect daily payments from a customer's bank account.

It was decided to use Perl as the core language. If Perl is (was) good enough for Goldman Sachs and Morgan Stanley it surely is good enough for us. So off to build a framework!

Two Failed Attempts at System Architecture

We decided to do the web part with Dancer2 and build our own object system, with mappers to read from and write to the database and a clever filing metaphor with a class called BusinessObjectManager which creates, stores, restores and retires (removes) one object at a time. I had previously worked with a similar metaphor in a C++ system. That was, of course, much more rigid, whereas with Perl we didn't seem to be able to create the safeguards we wanted to prevent future developers from abusing the system. The design just grew more complicated. I considered employing Moose but instead took a whole different approach.

The second attempt took an approach familiar from Java web programming and Java internal services, one which concentrates on business processes instead of business objects. I built an example service called SimulateBank with a simple structure. At the top was the Dancer2 package SimulateBank::Web, which created the web interface. It used the SimulateBank::Client package, which in turn called the JSON REST API. The API was built with Dancer2 in the SimulateBank::Api package, which internally used SimulateBank::Service, and that spoke to the database directly. SimulateBank::Service implemented "services" like find_deposits, create_deposit, report_deposit and cancel_deposit. These are processes which codify the business rules of the company. We do not care about objects, we only care about interfaces and processes.
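The layering can be sketched in a few lines. This is a Python illustration of the structure only, not the original Perl code: the class and method names mirror the packages named above, the in-memory list stands in for the database, and the Client calls the Api object directly where the real one would go over HTTP. All details are assumptions made for the sketch.

```python
import json


class Service:
    """Business processes; the only layer that touches storage."""

    def __init__(self):
        self._deposits = []  # stand-in for the database

    def create_deposit(self, account, amount_cents):
        deposit = {"id": len(self._deposits) + 1,
                   "account": account,
                   "amount_cents": amount_cents}
        self._deposits.append(deposit)
        return deposit

    def find_deposits(self, account):
        return [d for d in self._deposits if d["account"] == account]


class Api:
    """JSON REST layer: decodes requests and delegates to Service."""

    def __init__(self, service):
        self._service = service

    def handle(self, method, path, body=""):
        if method == "POST" and path == "/deposits":
            args = json.loads(body)
            result = self._service.create_deposit(args["account"],
                                                  args["amount_cents"])
            return json.dumps(result)
        if method == "GET" and path.startswith("/deposits/"):
            account = path.rsplit("/", 1)[1]
            return json.dumps(self._service.find_deposits(account))
        return json.dumps({"error": "not found"})


class Client:
    """Used by the Web layer; it repeats every Service method."""

    def __init__(self, api):
        self._api = api

    def create_deposit(self, account, amount_cents):
        body = json.dumps({"account": account, "amount_cents": amount_cents})
        return json.loads(self._api.handle("POST", "/deposits", body))

    def find_deposits(self, account):
        return json.loads(self._api.handle("GET", f"/deposits/{account}"))


client = Client(Api(Service()))
client.create_deposit("FI01", 500)
print(client.find_deposits("FI01"))
```

Every process exists twice, once in Service and once in Client, with the Api translating between them. Adding a new process means touching all three classes.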

This second approach was much more convenient and efficient. However, it still required code duplication and synchronization between Client and Service. What's more, I wasn't happy with it because it didn't feel "Perlish". I was reasonably happy with the "super-structure" of the code, the division between Web and Api. But I felt I wasn't using the possibilities of the Dancer2 framework and its plugins, like Dancer2::Plugin::Database and Dancer2::Plugin::Queue. I felt I was doing the same job twice when implementing my own interfaces to the database and message queue instead of using the readily available Dancer2 facilities.

├── SimulateBank
│   ├── Api
│   │   └── Transactions.pm
│   ├── Client
│   │   └── Transactions.pm
│   ├── Common.pm
│   ├── Service
│   │   └── Transactions.pm
│   └── Web
│       └── Transactions.pm
└── SimulateBank.pm

And then it struck me! I had looked at the whole problem from the wrong angle. My approach was code first. I tried to create a perfect structure for future Perl programmers to use.

Towards a Perlish Approach

The Perlish approach is result-oriented. After all, do we not pride ourselves on using a language which is fast to program with? Software only matters if it's put into production. But what about all that horrible Perlish hacking, the quick-and-dirty way?

That is the Perl way! To create a working solution today. Not to worry about tomorrow.

In my experience the most far-reaching problems in software development are not caused by programmers, or they are caused by programmers who have been forced into roles that should be filled by others, such as database designer, integration architect and user interface designer. These are the people who should do the worrying. They are paid to design long-term solutions!

Business App Blues

The purpose of most enterprise applications is to collect data, and then deliver or distribute it, or act upon it. So is ours. From the very beginning we planned our data models and the resulting database schema meticulously. For instance, the application has only SELECT and INSERT access to many of our tables to prevent the loss of past state information. The whole schema has only one sequence, and its value is inserted in every table so that the whole flow and order of operations in the database is trackable.
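These two ideas, insert-only tables and a single schema-wide sequence, can be sketched with SQLite from Python's standard library. This is only an illustration under stated assumptions: the table and column names are made up, SQLite triggers stand in for the SELECT and INSERT grants a database server would use, and a one-column table emulates the sequence because SQLite has no sequence objects.

```python
import sqlite3

SCHEMA = """
-- One schema-wide counter shared by every table.
CREATE TABLE global_seq (id INTEGER PRIMARY KEY AUTOINCREMENT);

CREATE TABLE customer (
    seq  INTEGER NOT NULL UNIQUE,
    name TEXT    NOT NULL
);

CREATE TABLE payment (
    seq          INTEGER NOT NULL UNIQUE,
    customer_seq INTEGER NOT NULL,
    amount_cents INTEGER NOT NULL
);

-- Reject changes to existing rows: payment is insert-only.
CREATE TRIGGER payment_no_update BEFORE UPDATE ON payment
BEGIN SELECT RAISE(ABORT, 'payment is append-only'); END;

CREATE TRIGGER payment_no_delete BEFORE DELETE ON payment
BEGIN SELECT RAISE(ABORT, 'payment is append-only'); END;
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def next_seq(conn):
    """Draw the next value from the single shared sequence."""
    return conn.execute("INSERT INTO global_seq DEFAULT VALUES").lastrowid


def add_customer(conn, name):
    seq = next_seq(conn)
    conn.execute("INSERT INTO customer (seq, name) VALUES (?, ?)", (seq, name))
    return seq


def add_payment(conn, customer_seq, amount_cents):
    seq = next_seq(conn)
    conn.execute(
        "INSERT INTO payment (seq, customer_seq, amount_cents) VALUES (?, ?, ?)",
        (seq, customer_seq, amount_cents),
    )
    return seq


conn = open_db()
alice = add_customer(conn, "Alice")
first = add_payment(conn, alice, 1250)
second = add_payment(conn, alice, 990)
print(alice, first, second)  # sequence values increase across both tables
```

Because every row carries a value from the same counter, sorting rows from different tables by seq reconstructs the order in which things happened. An UPDATE or DELETE on payment fails with a database error instead of silently overwriting history.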

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

Fred Brooks, The Mythical Man-Month: Essays on Software Engineering (1975, 1995), https://en.wikiquote.org/wiki/Fred_Brooks

With Fred Brooks' quote in mind, I revisited the issue of code first. Code could not come first. Data and data models came first. Code second. In fact, code comes third, because second place goes to APIs, most notably the REST APIs created with Dancer2.

Interfaces and Silos

Database schemas and system APIs are both interfaces to data. They are the longest lasting parts of the system - and the ones that are the most difficult to change later. Code isn't. It can always be refactored and improved and tested against the unchanging interfaces.

If the interfaces are locked, especially the database schema, and we can be 99% sure that our data is always protected from a malfunctioning program, then it's time to give programmers a free hand to create the best code they can.

Furthermore, our application is in constant change - as in many startups. Microservices are a natural extension of this way of thinking. Different parts of the system become microservices and silos whose implementation code is private to them. This code can be changed quickly, and it must have no connection to any other silo's code. This allows very radical changes if need be, such as switching web programming frameworks, math packages or even the Perl interpreter version.

And what's best, none of the changes in the code threaten the stability of the whole system. Code quality becomes a matter of code reviews. Our backs (interfaces) covered, coding with Perl is fast and fun, because it just works (TM).

The Perl Way

There were several times when we second-guessed our decision to use Perl. This whole story happened in the course of one year. I consider myself lucky to have been given the chance to go through that whole mental process. I believe I understand Perl a lot better now - perhaps not as a language but as a way of seeing software development and organizing development projects.

The first version was indeed only good for throwing away, as Brooks writes in The Mythical Man-Month. We saved some parts and also some ideas from the second version. Speed is of the essence. The internal services for accessing the database are mostly skipped, and the Dancer2 database plugin is used to fetch the data directly. Most action happens right in the same package where the Dancer2 REST interface endpoints are defined, because in most cases the data fetched from or written to the database requires no additional handling. So there is no need to create additional layers, especially when the rigid database schema ensures that fetched data is always sound (no nulls, no missing values or missing foreign fields). While quality control was earlier exercised only via code reviews, those are now complemented with API tests and rigid database schema modelling.

[Software engineering is the] establishment and use of sound engineering principles to obtain economically software that is reliable and works on real machines efficiently.

Friedrich Bauer (1972) "Software Engineering", In: Information Processing. p. 71

About Mikko Koivunalho

Perl Programmer for fun and office. CPAN modules and command line tools.