Top GitHub Languages for 2013 (so far)

Maxim Yemelyanov pointed me to this post comparing programming languages based on the number of new GitHub repositories in the first 8 months of 2012 and 2013.
(excluding forks!)

Interestingly, the numbers dropped significantly for every language (except JavaScript).

Perl dropped a lot more (in percentage) than other languages (from 48,620 to 15,412).
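
To put a number on that, here is a minimal sketch in Python that works out the relative drop from the two Perl counts quoted above. The function name `percent_drop` is only illustrative; it does not come from the original post.

```python
def percent_drop(before: int, after: int) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# New Perl repositories in the first 8 months of 2012 and 2013,
# as quoted in the post.
perl_2012 = 48_620
perl_2013 = 15_412

perl_drop = percent_drop(perl_2012, perl_2013)
print(f"Perl: {perl_drop:.1f}% fewer new repositories")
```

By this measure Perl lost roughly two thirds of its new-repository count from one period to the next.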

Some Perl repositories might be miscategorized by GitHub, which would contribute to the low number. But then either the miscategorization got worse in the last year, or Perl has dropped a lot regardless.

Still, there is something strange.

According to the stats on CPAN Testers, there were 1,524 new distributions uploaded to CPAN in the first 8 months of 2013. That's only 10% of the number of new GitHub repositories written in Perl. What are all the other Perl-based projects?

Maybe some of those repositories were for older CPAN distributions that just moved to GitHub? There are 28,459 distributions on CPAN, but according to that blog post, just in the 2 * 8 months checked there were about 64,000 new Perl projects on GitHub.
Probably there are a lot more.
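
The arithmetic behind those comparisons can be sketched in a few lines of Python. All figures are the ones quoted in this post; the variable names are mine.

```python
# Figures quoted in the post.
cpan_uploads_2013 = 1_524   # new CPAN distributions, first 8 months of 2013
github_perl_2012 = 48_620   # new Perl repos on GitHub, first 8 months of 2012
github_perl_2013 = 15_412   # new Perl repos on GitHub, first 8 months of 2013
cpan_total = 28_459         # all distributions on CPAN

# Share of new GitHub Perl repos that could be matched by a new CPAN upload.
upload_share = cpan_uploads_2013 / github_perl_2013 * 100

# New Perl repos over the two 8-month windows, compared with all of CPAN.
github_two_periods = github_perl_2012 + github_perl_2013
github_to_cpan = github_two_periods / cpan_total

print(f"CPAN uploads as a share of new GitHub repos: {upload_share:.1f}%")
print(f"New GitHub Perl repos in both windows: {github_two_periods:,}")
print(f"That is {github_to_cpan:.2f} times the size of CPAN")
```

So even if every distribution on CPAN had been moved to GitHub during those two windows, that would account for less than half of the new Perl repositories.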

I looked at my own GitHub account. Out of the 70 or so repositories I created, about 20-30 are on CPAN. The rest are experiments, sources of web sites, non-Perl projects, etc. The CPAN/GitHub ratio is certainly much higher than 10%.

So what are all those other Perl projects that are not on CPAN?


Among my public source repos I have 24 repos that aren't on CPAN because they're either:

- incomplete
- not something that belongs on CPAN
- playgrounds and experiments

I have several pieces of code which just work (however, those are not in separate GitHub repos).

I expect CPAN code to be of good quality, so the need for all of the following stops me from publishing mine on CPAN:

- documentation (including explanation why one would need this)
- API which won't break backward compatibility in the future
- good test suite
- no brittle tests
- centralized exception handling
- error handling and protection from misuse
- proper dependency tracking (M::B or EU::MM files)
- proper localization of perl magic variables
- compatibility across OSes (and between different OS distros)
- compatibility with fork()'ed processes
- compatibility with threads
- proper EINTR/EWOULDBLOCK handling
- proper Unicode/byte/character string handling
- workaround for bugs in dependencies (in different versions, in different perl)

If I ignore all of the above and just submit "code which just works", either no one would use it or people will report bugs until I fix all of the above issues.

So I think there is a big gap between working code and production-quality code.

I think they would simply be forks of other repositories, for the most part.

> Perl dropped a lot more (in percentage) than other languages (from 48,620 to 15,412).

The metric is the number of new repositories on GitHub.

I observed that 2012 was a big jump for the CPAN community on GitHub. Many CPAN authors moved their old source repositories to GitHub.
I think (I have no metrics) that 2012 had a big boost of new Perl projects just because some old Perl projects moved to GitHub. Now that GitHub is the established open source host, the mass-move effect is gone and project creation has slowed down.
This is just speculation.

We just need to resurrect the gitpan project to skew the stats in our favor... ;)

I have never put a line of my distributed Perl code on GitHub, but it's mostly there through gitpan-style auto-copying. That suggests to me that these numbers are even less useful than TIOBE.

TIOBE guesses on indirect evidence from external sources. GitHub can tell you directly the real counts (not guesses) from their own operation. That "your" code is on there doesn't change that. It's open source, and "your" open source code is on there.

What's with the scare quotes around "your"? I wrote it. People can do what they want with it, even try to make money without giving me credit (though such people are sociopaths). My point was that measuring how good some language's advocates are at writing scripts to spam GitHub is kind of a waste.

> So what are all those other Perl projects that are not on CPAN?

I think a better question is "where are all those CPAN distributions that aren't on GitHub?" -- with an eye to encouraging everyone to update their distribution metadata to contain the repository location.

It might even be worthwhile to track this explicitly, similarly to how tracks Changes updates.
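
As a rough illustration of what such tracking could check, here is a sketch in Python that looks for a repository location in a distribution's META.json. It assumes the CPAN::Meta::Spec version 2 layout, where the location sits under `resources` -> `repository` with `url` and/or `web` keys, and it also accepts the older style where `repository` is a plain URL string. The function name, the distribution name, and the URL are all made up for the example.

```python
import json


def repository_location(meta_json_text):
    """Return the declared repository location, or None if there is none."""
    meta = json.loads(meta_json_text)
    resources = meta.get("resources") or {}
    repo = resources.get("repository")
    if isinstance(repo, dict):
        # Version 2 metadata: prefer the browsable address, fall back to the clone URL.
        return repo.get("web") or repo.get("url")
    if isinstance(repo, str):
        # Older metadata stored the repository as a bare URL string.
        return repo or None
    return None


# Hypothetical metadata for a made-up distribution.
with_repo = json.dumps({
    "name": "Example-Dist",
    "resources": {
        "repository": {
            "type": "git",
            "url": "https://github.com/example/Example-Dist.git",
            "web": "https://github.com/example/Example-Dist",
        }
    },
})
without_repo = json.dumps({"name": "Example-Dist"})

print(repository_location(with_repo))
print(repository_location(without_repo))
```

Running a check like this over recent uploads would give the list of distributions whose authors could be nudged to add the link.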

> What are all the other Perl-based projects?

Also, the stats make it very unclear whether this counts forks or not. One CPAN distribution can be represented as many GitHub repositories, if there are multiple contributors and they all maintain a separate fork.

These stats are only meaningful if a project and all of its forks are counted as one repository.
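
A sketch of that kind of counting in Python, under a simplifying assumption: each repository record carries a `fork` flag and, for forks, the name of the repository it ultimately derives from in a `source` field. The record layout and the sample names are hypothetical, not taken from the stats being discussed.

```python
def count_distinct_projects(repos):
    """Count repositories so that a project and all its forks count once."""
    roots = set()
    for repo in repos:
        # A fork is attributed to the repository it derives from;
        # anything else is its own root.
        root = repo["source"] if repo["fork"] else repo["full_name"]
        roots.add(root)
    return len(roots)


# Made-up sample: one distribution with a contributor's fork, plus one other project.
sample_repos = [
    {"full_name": "alice/Example-Dist", "fork": False, "source": None},
    {"full_name": "bob/Example-Dist", "fork": True, "source": "alice/Example-Dist"},
    {"full_name": "carol/Other-Project", "fork": False, "source": None},
]

distinct = count_distinct_projects(sample_repos)
print(f"{len(sample_repos)} repositories, {distinct} distinct projects")
```

With this sample the three repositories collapse into two projects, which is the number a per-language comparison would want.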

@Gabor: cool! I'm a bit shocked that only ~60% of recent uploads contain repository links... have you considered publishing a feed of the results, like does? (I bet most of the same code could be used.. perhaps even hosted on the same server?)

That would give the questhub folks something specific to look at to find repositories to help patch.

About Gabor Szabo

Perl author and trainer. Usually writing on other sites: Writing the Perl 5 Maven tutorial Perl 6 articles. Started a Perl IDE. Running the Weekly Perl newsletter. My personal blog.