Scientific papers and software

By leprevost on July 12, 2012 6:11 PM

This week a researcher friend sent me a paper from the scientific journal Bioinformatics. The journal is very well known among bioinformaticians and describes itself as 'The leading journal in its field'. I'm not going to name the authors or the paper because I don't think it's necessary. Put simply, the paper is about software written in Perl, designed to improve the performance of database searches that use protein mass spectra.
I became interested, so I downloaded the .rar. What I found was 7 .pl scripts, 2 .xml files and a readme telling me the right order of execution and the correct inputs and outputs. My first impression was not good: there was no organization at all, the documentation was limited to comments above functions and, most important, the authors did not provide test files. The scripts were a little messy too: neither the 'warnings' nor the 'strict' pragma was used, and some scripts weren't even indented.
My point is that scientific journals should have better peer-review systems for papers based on software, especially those that sit between biology and informatics. I know it could be hard for someone who doesn't code in Perl to spot some of these problems, but good practices are common to all languages. In scientific projects, if bad laboratory work can be rejected, why can't bad coding be rejected as well? This project became a publication in a scientific journal, yet with this software I can't be sure that it works exactly as the authors describe, and I can't know whether I will get reproducible results.
Good practices in software development are as important as good laboratory practices; everyone should respect them.

10 comments


Joel Berger | July 12, 2012 9:01 PM

Hear hear! I started a comment, but it has become a post.

Ron Savage | July 12, 2012 11:56 PM

Perhaps then we need a pool of volunteers to review code to be published.

But, would the author of such code join in? I don't know.

And, if they did, would they accept the reviewer's recommendations, the way they (presumably) do with the text? I hope so.

I could spend some time helping out with things like that, but I don't want to get flooded with requests :-)!

Cheers
Ron
http://savage.net.au/

gizmo_mathboy | July 13, 2012 3:32 AM

If there is one thing I know it's grad student coding.

It's awful, but gets their job done.

I'm not surprised at all. Most get very little training on that. TDD isn't even in their vocabulary. You're lucky if they don't stare at you vacantly when you ask about "version control".

Their job is to do whatever it takes to graduate. They don't give a second thought about writing code that someone else could use.

Would you believe I'm generally an optimistic person?

We could try Ron's idea of helping out with code reviews but it probably won't amount to much. It's just a hoop they won't jump through because their advisor won't require it, thus no incentive to do it.

leprevost | July 13, 2012 4:23 AM

Thank you all for your comments. My reply to Joel is in his post here.
The idea of a group of people dedicated to code review is very interesting. This kind of process is already conventional when publishing scientific papers, but I have never heard of anything like it for software. I think journals should have people devoted to code review on their boards; they already have people specialized in, for example, biology, statistics, math and chemistry, among other areas.
Ron's idea is interesting: could there be a group of volunteers available to scientists for code reviews?
I also agree with gizmo_mathboy. I was once a grad student too, and at that time I had never heard of things like TDD. In a scientific group it's very common to learn from your advisor or colleagues through daily practice, and sometimes even they learned from other colleagues and advisors who weren't concerned with software quality. This creates the impression that TDD, for example, is an unnecessary waste of time.

Joel Berger | July 13, 2012 4:39 AM

Hmmm, I guess I'm a little guilty on this front: my UEM simulation has no tests. How would you go about testing something like that? It's not even an idle question; I hope to put this on CPAN before I graduate. That said, I like to think my coding style is better than most grad students', and most of my projects are at least decently well tested (if not 100%), and nearly all are on CPAN.

gizmo_mathboy replied to comment from Joel Berger | July 14, 2012 3:02 AM

Testing is my weakest point but being a sysadmin is my excuse.

However, as far as testing goes, you could make sure that a given set of inputs produces the correct output.

Also, test that the various methods/subroutines behave properly when given good or bad input.

As a one-time software QA engineer/analyst, you'd think I would be more knowledgeable about testing, but that's part of this science and Perl thing, too. How can we get grad students to use TDD? The first thing, really, is to get them to use version control, then TDD.
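
The two suggestions in this comment (known inputs should give known outputs, and subroutines should behave sensibly on good and bad input) could look roughly like the Perl sketch below. It uses Test::More and Scalar::Util, both of which ship with Perl. The routine mean_mass is hypothetical and invented purely for illustration; it is not taken from the paper discussed in the post or from anyone's simulation code.

```perl
use strict;
use warnings;
use Test::More;
use Scalar::Util qw(looks_like_number);

# Hypothetical stand-in for a piece of scientific code:
# returns the arithmetic mean of a list of peak masses.
sub mean_mass {
    my @masses = @_;
    die "no masses given\n" unless @masses;
    for my $m (@masses) {
        die "not a number: $m\n" unless looks_like_number($m);
    }
    my $sum = 0;
    $sum += $_ for @masses;
    return $sum / @masses;    # @masses in numeric context is the count
}

# Known input, known output.
is( mean_mass( 100, 200, 300 ), 200, 'mean of three masses' );

# Floating-point results are compared with a tolerance, not exact equality.
ok( abs( mean_mass( 0.1, 0.2 ) - 0.15 ) < 1e-9, 'mean within tolerance' );

# Bad input should fail loudly rather than return a wrong number.
ok( !defined eval { mean_mass(); 1 },             'empty input dies' );
ok( !defined eval { mean_mass( 1, 'abc' ); 1 },   'non-numeric input dies' );

done_testing();
```

For a simulation, the same pattern might be applied to small pieces whose answers are known analytically, with a tolerance chosen to suit the numerics; that is a suggestion, not something the thread settles.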

Joel Berger | July 15, 2012 9:59 AM

I've got the version control thing down :-) It works well with LaTeX too! https://github.com/jberger

Chris Maloney | July 22, 2012 1:34 PM

Thanks for this important post. You are not alone in calling for code review as part of the publication process. C. Titus Brown recently gave a panel discussion at the Bioinformatics Open Source Conference (BOSC) 2012, and his slides / notes are on Slideshare here.

His main points are:
* There should be review standards for software as part of the journal's review process
* The focus of that review should be on replication of the results
* Reviewers should opt-in

He proposes to start a web site to facilitate the reviewers who want to do this kind of thing, with checklists and dynamic features to calculate scores, and that sort of thing. I don't know if this proposal has seen any uptake.

Chris Maloney | July 22, 2012 1:44 PM

Okay, I just left a comment, and it's not showing up. I'm not sure if it needs to go through moderation, or if it was automatically spam-filtered (it has a few links). So this comment is to try again, and also to add one link that I neglected to add in my first comment, to an excellent blog post on this topic by the same C. Titus Brown, "Our approach to replication in computational science".

Cheers!

leprevost | July 23, 2012 12:31 AM

Hi Maloney, thank you for your comments. Your first post was marked for approval, maybe because of the links, as you point out. I'm going to approve it.
Cheers


About leprevost

I blog about Perl, Bioinformatics, Big Data and Complex Networks.
