Do your part to fix TIOBE or stop talking about it

Many people talk about TIOBE and how it's bad, or irrelevant, or broken, or reach for other vague descriptions of why it should be ignored.

Everyone talking about TIOBE misses one crucial point: it is software, it has an algorithm, and it is not "bad"; it is buggy. That means it can be fixed.

So either fix it, or stop talking about it.

Here's why you can fix it

The TIOBE algorithm is to search for "[language] programming" on a number of search engines, then apply a weight to the resulting count, based on the search engine, and sum the results up to get a score.

They do this because C is a letter, Ruby is a stone, and Python is a snake (or a comedy troupe). If they searched only for those words, they'd get a lot of trash results. Adding "programming" means they will mostly get results that contain the phrases "[language] programming" or "[language] programming language". The latter is especially important: people talking about the other languages often add "programming language" to disambiguate, while Perl developers have no reason to do so (since the word is mostly unique) and simply don't, thus removing themselves from what TIOBE can find.

Here are the top four search engines and their weights:

  • Google: 28%
  • Blogger: 28%
  • Wikipedia: 13%
  • YouTube: 7%
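
The scoring described above can be sketched as a weighted sum. This is only an illustration of the description in this post, not TIOBE's actual code: the table holds just the four weights listed here, and the names `WEIGHTS` and `weighted_score` are made up for the example.

```python
# Sketch of the scoring as described in this post: weight each search
# engine's hit count and sum the results. Not TIOBE's real implementation.

# Only the four weights listed above; the remaining engines are left out.
WEIGHTS = {
    "Google": 0.28,
    "Blogger": 0.28,
    "Wikipedia": 0.13,
    "YouTube": 0.07,
}


def weighted_score(hits_by_engine):
    """Sum hit counts, each multiplied by its engine's weight.

    Engines without a known weight are ignored (an assumption of this sketch).
    """
    return sum(
        WEIGHTS[engine] * hits
        for engine, hits in hits_by_engine.items()
        if engine in WEIGHTS
    )


if __name__ == "__main__":
    # Wikipedia hit counts quoted further down in this post; other engines omitted.
    print(weighted_score({"Wikipedia": 179}))  # "perl programming"
    print(weighted_score({"Wikipedia": 289}))  # "python programming"
```

With only the Wikipedia counts plugged in, Perl contributes roughly 23.3 and Python roughly 37.6 to their respective sums.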

What is notable? All of these sites either point to other content or are directly editable. I'll go on with Wikipedia, as that's the easiest example to use. Here are some search terms and their current result counts:

perl 5194
perl programming 179
python programming 289
python 7239

By comparing the list of results for "perl" and "perl programming" you'll find that many perfectly fine results simply won't show up for the latter and thus won't be visible to TIOBE either.
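
To put a number on that gap, the share of bare-name results that also match the TIOBE-style query can be computed from the Wikipedia counts above. A minimal sketch: the function name `visible_share` is made up for this example, and it assumes every "[language] programming" hit is also a hit for the bare name.

```python
# Wikipedia result counts quoted above: (bare name, "<name> programming").
COUNTS = {
    "perl": (5194, 179),
    "python": (7239, 289),
}


def visible_share(bare_hits, phrase_hits):
    """Fraction of bare-name results that also match '<name> programming'."""
    return phrase_hits / bare_hits


if __name__ == "__main__":
    for language, (bare, phrase) in COUNTS.items():
        share = visible_share(bare, phrase)
        print(f"{language}: {share:.1%} of results match the phrase")
```

For these counts that comes out to about 3.4% for Perl and 4.0% for Python. The bare "python" count also includes pages that aren't about the language at all, so the two shares aren't directly comparable.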

It's not that the talk about Perl isn't there; the problem is that TIOBE can't see it. And that can be changed.

Here's how you can fix it

The solution is simple: The string "perl programming language" needs to be added to results that are valid for Perl, but don't currently contain it and as such remain invisible to TIOBE. Once TIOBE can actually see the talk about Perl, it will at least stop being a source of false bad press for Perl.

  • Go through this list and ask site owners to update their site.
  • Go through this list and ask blog owners to update their blog or entries.
  • Go through this list and update the page where appropriate.
  • Go through this list and ask uploaders to update their video descriptions.

Lastly, if you own or create any of the types of content that appear in the result lists above, make sure you mention "perl programming language" as well.
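
One way to find your own pages that need the phrase is to scan their text for mentions of Perl that lack "perl programming". A rough sketch, assuming the page text is already available as a string; the function name `needs_phrase` and the sample pages are invented here, and the case-insensitive match is only a guess at how a search engine would treat the text.

```python
import re

# Matches "perl" as a whole word, in any capitalisation.
PERL_WORD = re.compile(r"\bperl\b", re.IGNORECASE)
# Matches "perl programming", allowing any whitespace between the words.
PHRASE = re.compile(r"\bperl\s+programming\b", re.IGNORECASE)


def needs_phrase(text):
    """True if the text mentions Perl but never says 'perl programming'."""
    return bool(PERL_WORD.search(text)) and not PHRASE.search(text)


if __name__ == "__main__":
    # Hypothetical page contents, for illustration only.
    pages = {
        "a.html": "Moose is an object system for Perl.",
        "b.html": "Moose is an object system for the Perl programming language.",
        "c.html": "A page about pearls.",
    }
    for name, text in pages.items():
        if needs_phrase(text):
            print(f"{name}: mentions Perl without 'perl programming'")
```

Only pages flagged this way need a manual look; whether the phrase fits naturally is still an editorial decision.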


32 Comments

Updated my own perl stuff.

*points to the footer of this ’ere page*

(Of every single page on this ’ere site, in fact.)

Oh that was you! The first I saw of this was the internal discussion about it (“would this be scummy or are we fine with it?”, which unanimously came down on the side of “fine” after a moment’s consideration), not the pull request itself – so I didn’t make the connection. Plus, for some reason my ability to connect nicks and real names has been highly inconsistent in recent times.

Anyway – thanks for prompting this!

What about search.cpan.org and metacpan? I don't even see "Perl" on their pages. No meta-keywords, no meta-description.

And of course module authors don't put "written in perl programming language" into .pod (for obvious reasons).

> So either fix it, or stop talking about it.
I choose to stop talking about it.

I haven't cared about TIOBE and don't normally talk about it. That said, this seems to be convincing logic and I will do my best to try to help. In the meantime, is there any chance that TIOBE would listen to this logic and allow "Perl" to work in place of "Perl programming" since it's not ambiguous? Probably not, but you never know.

Wouldn't it be better to try to get TIOBE to treat "perl" as a synonym for "perl programming?" It would be a special case for them but it's easier (fix in one place vs. update the entire Internet), more robust (you'll never get everyone to update their content) and forward-facing (handles new content by people who don't follow the "rule"). Ultimately, it should be TIOBE's problem if their algorithm doesn't handle the way people discuss the language. This problem wouldn't seem to be unique to Perl, either. I wouldn't expect "PL/SQL programming" to be as common as just "PL/SQL."

I've heard the "negative stance towards Perl" comment a number of times, but I've not seen evidence of this. Can you provide a reference?

@Joel
@Michael

perl 5194
perl programming 179
python programming 289
python 7239

If they counted bare "perl" results, Perl would get an index roughly 18 times higher than "python programming" (5194 vs. 289).

The thing is, Python (I mean the programming language) is also used as a sole word, without "programming".

They have a list of exceptions for search queries.
http://www.tiobe.com/index.php/content/paperinfo/tpci/tpci_definition.htm

(for example, ABC is a language unless it's "ABC tv")

and grouping (Awk = Gawk)

I wonder if Perl5 (or even Perl6?) could be added as an alias for Perl?

also:


> Artifacts or ideas on improving the calculation of the TIOBE index will be received with gratitude (tpci@tiobe.com).

I don't think that having "perl programming" on every page that talks about Perl will solve the main problems. I don't think this will suddenly increase the number of people looking for Perl

BUT

I do think that TIOBE has an impact on perception and the comments of RickTick and then of mithaldu on Reddit convinced me how the lack of "perl programming" on the Perl sites lies to TIOBE and any other organization who checks these numbers.

"Of the first 20 hits for Perl, 17 are actual hits on Perl. Meanwhile for the 20 Python results, 5 are actually about the programming language."

BTW, you could also try "perl -programming" and "python -programming" to see which pages could be updated.
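
The query variants mentioned in this thread can be generated for any language name. A small sketch: `search_queries` is a made-up helper, and the `-programming` exclusion syntax is the one suggested in the comment above, assumed here to be supported by the search engine in question.

```python
def search_queries(language):
    """Return the query variants discussed here for one language name."""
    return {
        # What TIOBE searches for, as described in the post.
        "counted": f"{language} programming",
        # Every page that mentions the name at all.
        "bare": language,
        # Pages that mention the name but lack "programming":
        # candidates for an update.
        "missing": f"{language} -programming",
    }


if __name__ == "__main__":
    for language in ("perl", "python"):
        for label, query in search_queries(language).items():
            print(f"{label:8} {query}")
```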

So I added "Programming Perl" to all the pages on the Perl programming weekly, and on the Perl Programming Maven sites. That's about 400 pages.

Now we only need Google to re-index those pages and to see them as "important".

@Mithaldu, those links lead to google.de and it complains about various Perl non-programming sites in German....

Don't hold your breath for such title.

Nor for any "spike".

Since then I also updated the two Perl Mongers sites I maintain and sent out a call to all the Perl Monger admins to use their web assets wisely.

Oh and to further brag a bit I also added perl programming tags to all the interviews with perl programmers.

BTW, I think it would also be important for people who update their sites with the magic phrase to mention it here.

Others, even if they don't have web sites, could then share those links on Google+, further encouraging Google to take those pages seriously and maybe even reindex them sooner.

A couple more issues, as I don't really understand the numbers.

I just searched on Google:


perl programming is higher here than python programming, but if Google really carries 28% of the weight in the TIOBE index, then either they see different numbers than I did, or Python outweighs Perl by a huge margin in the other searches.

Let's see YouTube:

There you go. On YouTube Python Programming is more than 10 times bigger than Perl.

I have no idea how to search on Blogger.com, but I found search.blogger.com that redirected to Google with the "Blog" tab lit up.
I think these are the corresponding searches:

I don't understand this. There are 3 times more hits for perl programming there than for python programming, so if this is 28% of the weight, then I think the 10-times lead on YouTube with its 7% weight should not have such a big impact.

Very strange.

Let's roll with what you say. Are these the searches they use?

Google:

Google blogs:

YouTube

I think the basic problem with TIOBE is that their concept is wrong. They are using a proxy for interest, a proxy that doesn't really mean what they say it does, and then making conclusions from it. Much like the perlmonks' CB stats. Except there I'm quite explicit that the stats are invalid.

TIOBE isn't doing proper sampling to extrapolate anything. Their core methodology is simply flawed. Adding "programming" after the word "Perl" doesn't fix it. It simply does some SEO to game the system.

If TIOBE wanted to reach relevance, they would have to tweak their algorithms to fit reality, to reduce the number of legitimate pages skipped and yet minimise the number of unrelated pages counted. In a way that handles all languages, not just the ones whose communities have rallied around it.

Right now, the strongest conclusion that can be drawn from their numbers is, "oh, that's interesting." Anyone drawing stronger conclusions is putting far too much trust in the system and doesn't understand enough statistical analysis to comprehend what the numbers really mean (i.e., nothing).

perlenespanol.com forum, from 2120 results, to 3250, and going up.

I'm curious if we can use the <abbr> tag or title attribute to tag the word Perl with "Perl programming language" and still have it be indexed. That way we can use Perl and still have the text flow naturally.

I've added a quest stencil on Questhub: update 5 pages as per Mithaldu's directions. I've just kicked it off by taking 5 easy wins on Wikipedia :-)

Mithaldu, I do wish you would just have let sleeping dogs lie, and forgotten all about TIOBE. If we try to game TIOBE, then so will the user communities of other programming languages, and then we are going to start an arms race.

In any case, since I am unhappy with TIOBE being taken for granted, without close reconsideration, I decided to do my part in educating people against it using my newly created anti-TIOBE page (which is just another page in my section of pages against bad software). Since this was inspired by this post and its publicity, I guess I should thank you for it.

Still as needed as ever...

About Mithaldu

Not writing much, but often found pawing at CPAN with other #perl-cats