Learning from other industries, part 1 of n

My first job was as a bus conductor, and my second one was as a student trainee in an engineering company - proper engineering, with production lines, big machines, hot things, and "danger of death" notices on equipment. In both of these, safety was an important concern, and especially in the second one it was drilled into me that safety and quality are closely related and arise from systems, not merely from individual endeavour. While I never completed my degree in manufacturing/systems engineering (I dropped out because I was fed up after too many years in the classroom) I still retain an interest in the subject.

I recently came across the excellent Disastercast podcast by Drew Rae. Of particular interest to programmers is the sixth episode, which looks at the report into a fatal rail crash caused by a poor safety and testing culture.

In the Perl world, we like to think that we're pretty good at testing, and to a certain extent that's true - most of us write tests for our code, use tests to find bugs, and use them to prevent the introduction of new bugs in old code when adding new features or bugfixes. But there are still lessons that we can learn from this podcast.

In particular it looks at what the job of a test engineer is. While most of us aren't officially test engineers, we are responsible to varying degrees for testing, and in some organisations there will be someone who, either because it's his job or just because it's what interests him, takes some leading role in testing. I fall into the latter category and have hassled my management into making it my job.

Rae tells us that a test engineer has three main responsibilities, only one of which is actually conducting tests. The others are to continually improve the standard of testing, and to improve the training and competence of other testers. It's nice to have someone put into words what I've been doing at work on an ad hoc "it seems like the right thing to do" basis. I've done very little formal training of my colleagues - I used them as guinea pigs for my well-received talk on Unit Testing which was presented at the London Perl Workshop 2012, but that's about it. Most of it has been teaching by example, and by ad hoc discussions about how to write and test specific bits of functionality. Their testing competence has visibly improved. I don't think they'd mind if I told you that testing here used to be somewhat below standard. It is now about average and still improving and, most importantly, enthusiastically improving.

He also tells us that test engineers always need to strike a balance between getting the task at hand done, and long term organisational needs. In the part of my job that is improving the standard of testing and the competence of other testers (that is, all the other developers) there is often resistance because those improvements get in the way of producing a new feature or fixing an important bug. Which leads us on to his last important point.

Bugs (or, in his example, safety failures) can be avoided by having those who know most about testing spend less time testing and more time creating a better test organisation. Which, again, is nice to have put in words, because it's largely what I do. I don't write much "production" code - new features and the like - and I spend very little time writing tests for other people's code. I spend much of my time improving the infrastructure.

I've done things like set up Jenkins, so that whenever we merge code to master it gets tested, so we now catch cases where a developer has just run the tests for the bit of code he was working on and not realised that he's introduced bugs elsewhere. I've set up automatic test coverage reports which, amongst other things, have found parts of our tests that never get run (see my previous blog entry). And I've improved our test harness so that it automatically checks some things without anyone else having to write any code - things like whether a call to our API results in a reasonable number of database queries or not.
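To make that last kind of check concrete, here is a minimal sketch of a "query budget" for a test, written in Python against an in-memory SQLite database so that it is self-contained. It is an illustration only, not the harness described above: the names query_budget, TooManyQueries, make_db and the two example handlers are all hypothetical, and the limit of 2 is an arbitrary stand-in for "a reasonable number".

```python
import sqlite3
from contextlib import contextmanager


class TooManyQueries(AssertionError):
    """Raised when a block of code runs more SQL statements than allowed."""


@contextmanager
def query_budget(conn, limit):
    """Fail if the wrapped code runs more than `limit` SQL statements."""
    statements = []
    # sqlite3 calls this with the SQL text of each statement it executes.
    conn.set_trace_callback(statements.append)
    try:
        yield statements
    finally:
        conn.set_trace_callback(None)  # always stop tracing afterwards
    if len(statements) > limit:
        raise TooManyQueries(
            f"{len(statements)} queries run, budget was {limit}"
        )


def make_db():
    """Build a tiny example database (hypothetical schema)."""
    # isolation_level=None means autocommit, so no implicit BEGIN is traced.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (user_id INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "ann"), (2, "bob"), (3, "cat")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "tea"), (2, "jam"), (3, "pie")],
    )
    return conn


def orders_one_query(conn):
    """Well-behaved handler: a single JOIN."""
    sql = (
        "SELECT u.name, o.item FROM users u "
        "JOIN orders o ON o.user_id = u.id ORDER BY u.id"
    )
    return conn.execute(sql).fetchall()


def orders_query_per_user(conn):
    """Badly-behaved handler: one extra query for every user."""
    result = []
    users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    for user_id, name in users:
        items = conn.execute(
            "SELECT item FROM orders WHERE user_id = ?", (user_id,)
        )
        for (item,) in items:
            result.append((name, item))
    return result
```

Wrapping every API call in the test suite with something like `with query_budget(conn, 2):` means the per-user version fails the build as soon as it is merged, without whoever wrote it having to remember to check.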

This improved infrastructure has proven invaluable in finding (and fixing) bugs before our customers do, and fixing egregiously slow code. I have also managed to make my colleagues more test-aware and all of us and our management more confident that when we release a new version of the application it'll do what it's meant to do.

I recommend Disastercast to you all. I've found every episode so far to be interesting and well-presented, and even those episodes which don't appear to have much bearing on software will at least make you stop and think about the world around you.

3 Comments

Thanks, downloading the episodes now.

Looking forward to part 2..n.

I really enjoyed reading this. Thanks for posting.

About David Cantrell

I'm in yur test resultz analyzn yr failz