Scientific papers and software

This week a researcher friend sent me a paper from the scientific journal Bioinformatics. The journal is very well known among bioinformaticians and describes itself as 'The leading journal in its field'. I'm not going to name the authors or the paper because I don't think it's necessary. Put simply, the paper describes a piece of software written in Perl, designed to improve the performance of database searches using protein mass spectra.
I became interested, so I downloaded the .rar. What I found was 7 .pl scripts, 2 .xml files and a README explaining the right order of execution and the expected inputs and outputs. My first impression was not good: there was no organization at all, the documentation was limited to comments above functions and, most importantly, the authors did not provide any test files. The scripts were also a bit messy: the 'strict' and 'warnings' pragmas were not used, and some scripts weren't even indented.
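To show why the missing pragmas matter, here is a small, self-contained sketch of two mistakes that 'strict' and 'warnings' catch. The variable names and protein identifiers are made up for illustration. Without the pragmas, Perl would silently create the misspelled variable as a new global and treat the undefined hash value as 0, so the script would run and quietly produce a wrong number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1. 'strict' turns a mistyped variable name into a compile-time error.
#    The snippet is compiled with a string eval so this script keeps running.
my $ok = eval q{
    use strict;
    my $peptide_count = 3;
    $peptide_cuont++;    # deliberate typo: this variable was never declared
    1;
};
my $strict_error = $ok ? '' : $@;
print "strict caught: $strict_error";

# 2. 'warnings' reports silent mistakes, such as doing arithmetic with an
#    undefined value (here, a hash lookup for a key that was never stored).
my @caught;
{
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    my %score = (Q9XYZ1 => 42);          # hypothetical identifiers
    my $total = 10 + $score{P12345};     # P12345 was never stored
}
print "warnings caught: $caught[0]";
```

Adding those two lines at the top of each script costs nothing and turns both silent bugs into messages that point at the exact line.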
My point here is that scientific journals should have better peer-review processes for papers based on software, especially those at the intersection of biology and informatics. I know it can be hard for someone who doesn't code in Perl to spot some of these problems, but good practices are common to all languages. If bad laboratory work can be rejected, why can't bad code be rejected as well? This project became a publication in a scientific journal, yet with this software I can't be sure that it works exactly as the authors described, and I can't know whether I will get reproducible results.
Good practices in software development are as important as good laboratory practices, and everyone should respect them.


Hear hear! I started a comment, but it has become a post.


Perhaps, then, we need a pool of volunteers to review code before it is published.

But would the authors of such code join in? I don't know.

And if they did, would they accept the reviewers' recommendations the way they (presumably) do with the text? I hope so.

I could spend some time helping out with things like that, but I don't want to get flooded with requests :-)!


If there is one thing I know, it's grad student coding.

It's awful, but gets their job done.

I'm not surprised at all. Most get very little training on that. TDD isn't even in their vocabulary. You're lucky if they don't stare at you vacantly when you ask about "version control".

Their job is to do whatever it takes to graduate. They don't give a second thought about writing code that someone else could use.

Would you believe I'm generally an optimistic person?

We could try Ron's idea of helping out with code reviews, but it probably won't amount to much. It's just a hoop they won't jump through because their advisor won't require it, so there's no incentive to do it.

Hmmm, I guess I'm a little guilty on this front: my UEM simulation has no tests. How would you go about testing something like that? It's not even an idle question; I hope to put it on CPAN before I graduate. That said, I like to think my coding style is better than most grad students', that most of my projects are at least decently well tested (if not 100%), and nearly all of them are on CPAN.

Testing is my weakest point, but being a sysadmin is my excuse.

As for testing, couldn't you check that a given set of inputs produces the correct output?

Also, test that the various methods/subroutines behave properly when given good or bad input.
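The two suggestions above (known input gives known output; good and bad input are handled properly) can be sketched with Test::More, which ships with Perl. The peptide_mass() subroutine and its residue table are hypothetical, chosen only to fit the mass-spectrometry theme of the post. In a CPAN-style layout the subroutine would usually live in a module under lib/ and the test in a .t file under t/.

```perl
#!/usr/bin/perl
# Hypothetical test file, e.g. t/peptide_mass.t
use strict;
use warnings;
use Test::More;

# Monoisotopic residue masses (Da) of the 20 standard amino acids, rounded.
my %RESIDUE_MASS = (
    G => 57.02146,  A => 71.03711,  S => 87.03203,  P => 97.05276,
    V => 99.06841,  T => 101.04768, C => 103.00919, L => 113.08406,
    I => 113.08406, N => 114.04293, D => 115.02694, Q => 128.05858,
    K => 128.09496, E => 129.04259, M => 131.04049, H => 137.05891,
    F => 147.06841, R => 156.10111, Y => 163.06333, W => 186.07931,
);
my $WATER = 18.01056;

# Hypothetical subroutine under test: neutral monoisotopic peptide mass.
sub peptide_mass {
    my ($seq) = @_;
    die "empty or undefined sequence\n" unless defined $seq && length $seq;
    my $mass = $WATER;
    for my $aa (split //, uc $seq) {
        die "unknown residue '$aa'\n" unless exists $RESIDUE_MASS{$aa};
        $mass += $RESIDUE_MASS{$aa};
    }
    return $mass;
}

# Good input: a known sequence gives a known mass (compare with a tolerance).
ok(abs(peptide_mass('GA') - 146.06913) < 1e-4, 'GA has the expected mass');
is(peptide_mass('ga'), peptide_mass('GA'), 'lowercase input is accepted');

# Bad input: the subroutine must refuse it loudly instead of guessing.
eval { peptide_mass('GXA') };
like($@, qr/unknown residue 'X'/, 'unknown residue is rejected');
eval { peptide_mass('') };
like($@, qr/empty/, 'empty sequence is rejected');

done_testing();
```

Running it with prove or plain perl prints TAP output ("ok 1 - ..."). For a stochastic simulation like the one mentioned above, one common approach is to call srand with a fixed seed inside the test so runs are repeatable on the same Perl build, and to check invariants of the output rather than exact values.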

As a one-time Software QA engineer/analyst, you'd think I would be more knowledgeable about testing, but that's part of this science-and-Perl thing, too. How can we get grad students to use TDD? The first step really is to get them to use version control, then TDD.

I've got the version control thing down :-) It works well with LaTeX, too!

Thanks for this important post. You are not alone in calling for code review as part of the publication process. C. Titus Brown recently spoke on a panel at the Bioinformatics Open Source Conference (BOSC) 2012, and his slides and notes are on SlideShare.

His main points are:
* There should be review standards for software as part of the journal's review process
* The focus of that review should be on replication of the results
* Reviewers should opt in

He proposes starting a website to support reviewers who want to do this kind of thing, with checklists, dynamic features to calculate scores and so on. I don't know whether this proposal has seen any uptake.

Okay, I just left a comment, and it's not showing up. I'm not sure if it needs to go through moderation, or if it was automatically spam-filtered (it has a few links). So this comment is to try again, and also to add one link that I neglected to add in my first comment, to an excellent blog post on this topic by the same C. Titus Brown, "Our approach to replication in computational science".



About leprevost

I blog about Perl, Bioinformatics, Big Data and Complex Networks.