TIL about Literate Programming

My first programming job was in the mortgage software industry, so “TIL” has always meant “Truth In Lending” to me: you know, that document that the bank is required to give you when you get a mortgage, that’s supposed to explain how much you’re really paying after all the bank’s hidden finance charges, except the numbers don’t seem to make any sense so you just sign it anyway and don’t know anything more than you did before?  Yeah, that one.

Of course, nowadays it means something else, and I’ve had to redirect my ossified mental patterns into new channels, so that, now when I see “TIL,” I can have my brain recognize it as “Today I Learned.” Which is a handy phrase: it encapsulates feelings of discovery, serendipity, and epiphany all into one.  And TIL1 that the way I’ve always tried to write code has a name, a history, and a venerable progenitor—most of my life, without even realizing it, I’ve been trying to use literate programming (only without the tangling).

Let me elaborate.  The aforementioned venerable progenitor is one of the two names you’d expect to see attached to a profound statement about the nature of computer science:2 Donald Knuth.  Here’s how he explained the concept in his essay from 1984 entitled (appropriately enough) “Literate Programming”:

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Now, at the time Knuth was writing this, the heavy hitter in the computer language scene was C—Pascal was an adolescent, Smalltalk was a toddler, C++ was an infant, and Common Lisp was a newborn.3  Java wasn’t even a zygote yet ... nor was Perl, though it would be birthed into the world in 3 short years.  So most of Knuth’s demonstrations of how literate programming might actually work were done in Pascal and C: specifically using WEB (for the former) and CWEB (for the latter).4  As you can imagine, you can’t really write C code (or even Pascal code) like an “essayist,” so Knuth’s approach was to write in English with codelike constructs sprinkled in that could then be preprocessed into actual C code—a process he called “tangling.” The end result looked a lot like a programming textbook: chunks of text broken up by pseudocode.  Here’s an example Wikipedia gives; it’s part of a literate programming version of wc.

The present chunk, which does the counting, was actually one of
the simplest to write. We look at each character and change state if it begins or ends
a word.

<<Scan file>>=
while (1) {
    <<Fill buffer if it is empty; break at end of file>>
    c = *ptr++;
    if (c > ' ' && c < 0177) {
        /* visible ASCII codes */
        if (!in_word) {
            word_count++;
            in_word = 1;
        }
        continue;
    }
    if (c == '\n') line_count++;
    else if (c != ' ' && c != '\t') continue;
    in_word = 0;
    /* c is newline, space, or tab */
}
@
Now, given what Knuth had to work with, perhaps this is the best that we can expect for the time.  But I have to say: this doesn’t really read like an essay, now, does it?
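
If you’re wondering what tangling actually involves, it’s less mysterious than it sounds.  Here’s a minimal sketch in Perl, assuming noweb-style markup like the Wikipedia example above: `<<name>>=` opens a chunk, a lone `@` closes it, and `<<name>>` inside a chunk refers to another chunk.  The function names (`parse_chunks`, `tangle`) and the sample chunks are invented for illustration; this is not how WEB or CWEB is implemented, just the general idea.

```perl
use strict;
use warnings;

# Collect named chunks from noweb-style source text.
# Prose outside of chunks is ignored: that part is for the humans.
sub parse_chunks
{
    my ($source) = @_;
    my (%chunk, $current);
    foreach my $line (split /\n/, $source)
    {
        if ($line =~ /^<<(.+)>>=\s*$/)          # a chunk definition starts
        {
            $current = $1;
            $chunk{$current} //= [];
        }
        elsif ($line =~ /^@\s*$/)               # the chunk definition ends
        {
            undef $current;
        }
        elsif (defined $current)                # a line of code inside a chunk
        {
            push @{ $chunk{$current} }, $line;
        }
    }
    return \%chunk;
}

# Recursively replace chunk references with the chunks they name,
# keeping the indentation of the line that made the reference.
sub tangle
{
    my ($chunks, $name) = @_;
    die "undefined chunk: $name\n" unless exists $chunks->{$name};
    my @out;
    foreach my $line (@{ $chunks->{$name} })
    {
        if ($line =~ /^(\s*)<<(.+)>>\s*$/)
        {
            my ($indent, $ref) = ($1, $2);
            push @out, map { "$indent$_" } tangle($chunks, $ref);
        }
        else
        {
            push @out, $line;
        }
    }
    return @out;
}

# A made-up literate source: prose, then chunks, in "essay order."
my $source = <<'END';
We count lines by looking for newlines.

<<Count lines>>=
while (defined(my $c = getc STDIN)) {
    <<Bump the counter on newline>>
}
@

Only a newline ends a line.

<<Bump the counter on newline>>=
$line_count++ if $c eq "\n";
@
END

print "$_\n" foreach tangle(parse_chunks($source), 'Count lines');
```

Running it prints the three lines of the assembled loop, with the referenced chunk spliced in at the indentation of its reference.  A real tool would also have to cope with cyclic references and the like, which this sketch happily ignores.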

It’s also interesting to me that Knuth’s system seems like it has “comments” built right into the code.  That first paragraph there isn’t actually contributing to the code at all.  It’s just there to give the reader of the code some idea of what’s going on—essentially, a comment.  Now, the past few decades have seen a real shift away from comments.  Comments are bad, we’re told in article after article (after article).  It’s devolved into somewhat of a religious debate, like vi vs emacs, or spaces vs tabs.  But there’s a crucial difference here between comments and many of those other subjects of religious wars:  If you used emacs to write your code, I may disagree as violently as I like with your choice of editor, but, in the end, it doesn’t really impact me at all ... not even if I’m the person who has to maintain your code.  Tabs vs spaces is a little more impactful, but if I can’t switch the code from one to the other with a single command-line call, I really shouldn’t be calling myself a coder, now, should I?  I may not like where you put your curly braces (to take another area of contention), but I can still read the code just fine.  On the other hand, if you abhor or adore comments, that will very much impact my experience reading your code.5

Now, I don’t actually want to get into whether or not you should use comments, because it’s really a separate issue.  But I bring it up for a reason.  One might think that people who write articles such as the ones I mentioned above are actually advocating for less literate programming by telling us to use fewer comments.  But that’s not their message at all.  These authors (primarily) object to comments which explain what the code is doing.  Those sorts of comments are unnecessary, they argue: your code explains what your code is doing, and, if it doesn’t, your code is not written clearly enough.  Instead of glomming on extra words in the form of comments, you should rewrite your code.  Jeff Atwood6 puts it thus:

... if your feel your code is too complex to understand without comments, your code is probably just bad. Rewrite it until it doesn’t need comments any more.  If, at the end of that effort, you still feel comments are necessary, then by all means, add comments ... carefully.

I focus on the Atwood article above the others for a couple of reasons.  Firstly, I think Atwood is an articulate and trustworthy source, not to be dismissed lightly (or at all, for that matter).  But also because this article contains a bit of dubious wisdom:7

Perhaps that’s the dirty little secret of code comments: to write good comments you have to be a good writer. Comments aren’t code meant for the compiler, they’re words meant to communicate ideas to other human beings.  While I do (mostly) love my fellow programmers, I can’t say that effective communication with other human beings is exactly our strong suit.  I’ve seen three-paragraph emails from developers on my teams that practically melted my brain.  These are the people we’re trusting to write clear, understandable comments in our code?  I think maybe some of us might be better off sticking to our strengths—that is, writing for the compiler, in as clear a way as we possibly can, and reaching for the comments only as a method of last resort.

There are two implied premises here that I strongly reject:

  1. Code isn’t meant for other human beings; it’s only for the compiler.
  2. In order to write good code, you don’t need to be a good writer.

Now, I’m pretty sure that first one isn’t too controversial.  It’s been over 10 years, after all, since Robert Martin told us that:

Indeed, the ratio of time spent reading vs. writing is well over 10:1.  We are constantly reading old code as part of the effort to write new code.

Because this ratio is so high, we want the reading of code to be easy, even if it makes the writing harder.  Of course there’s no way to write code without reading it, so making it easy to read actually makes it easier to write.

You really will go farther in this business if you believe that you’re writing code primarily for other humans, and the fact that the compiler can understand it too is just a side benefit.  That may sound radical to some folks: after all, if the compiler can’t understand it, the program doesn’t work, right?  But that’s the wrong way to look at it.  If you can’t make the compiler understand you, you don’t last long as a programmer at all.  The flipside, though, is the programmer who can make the compiler understand them when no other human can.  Folks like that can last a depressingly long time in our field, but everyone hates working on their code, no one wants to recommend them to work at the same place they work, everyone complains about them behind their backs (or, occasionally, given our tendency towards social bluntness, to their faces) ... that’s not how you’d like to be known by your fellow programmers, I’m sure.  Getting the compiler to understand your code is literally the bare minimum you can do as a coder.  Getting your fellow coders to understand your code is the real goal.

So let’s take it as read that we all reject premise #1.  What about premise #2?  Surely that’s much more controversial.

But, to me, that’s where literate programming comes in.  I want my code to read like an interesting essay.  I try to construct my code carefully, using the same principles I use when writing: I break my code into blocks just like I break my writing into paragraphs; each paragraph needs a topic sentence, which is usually (but not always!) the first sentence; I define my terms before I use them, but then I expect my reader to know what they mean, or else they’ll have to look them up in the dictionary (which in this case is a library module).  And I expect a lot of meaning to be delivered via context.  If I name a method fetchFrobnabulator, then you should be able to assume that that method will go fetch a frobnabulator and return it: you don’t need to go find and read the code of the method to know that.  Now, you might still have to find and read it for other reasons of course: perhaps it isn’t delivering a frobnabulator when you think it should, or it’s delivering the wrong one, so you need to dig into it and see where it’s going wrong.  But, barring those special circumstances, you shouldn’t need to break the flow of your reading by jumping around to a whole separate piece of code; you can just assume the thing does what it says on the tin until you have reason to suspect otherwise.

And finally we get to the place where this is a Perl blog post and not just a general programming blog post.  Because, you see, Perl is the most literate programming language I know of.  I don’t need those big blocks of text like Knuth had to write for his weaving and tangling and all that.  I can just write code, and it reads pretty much like English.  Just to take a random chunk of code I wrote recently—not even something that I chose specifically to illustrate this point—here’s some code that’s part of my attempt to do a better job of having “sessions” in vim than vim does:8

my $vim_script = tempfile();
sh vim => -c => "mksession! $vim_script", -c => 'q', '>/dev/null', '2>&1';
my @baseline_mappings = uniq map { parse_mappings } apply { chomp } $vim_script->slurp;
if ($OPT{D})
{
    say foreach map { "#-->PRE: $_" } @baseline_mappings;
}

Note a few things here.  You technically have no idea what sh does, or what parse_mappings does, but if you know what sh is in Linux, and you know anything about vim session files and what’s in them, you can very easily guess.  You technically don’t know what $OPT{D} is either, but it takes very little imagination to work out that it refers to a -D option, which indicates that the user wants debugging (and, even if you lacked that much imagination, there was a block which laid out all the options up above, so hopefully you filed that away for future reference at the time).  You may not know off the bat that tempfile() comes from Path::Tiny,9 but then you don’t even really need to know that to understand that it’s going to create a tempfile.  It’s all very clear.
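
To see that chain of list operations actually run, here’s a self-contained sketch that needs nothing outside core Perl (5.26 or later, for a List::Util that has uniq).  Everything the original gets from elsewhere is faked and labeled as such: `@session_lines` stands in for `$vim_script->slurp`, the local `apply` is a stand-in for the one in List::MoreUtils, and this `parse_mappings` is a made-up simplification (the real one isn’t shown here).

```perl
use strict;
use warnings;
use feature 'say';
use List::Util 1.45 qw< uniq >;

# Stand-in for List::MoreUtils::apply, so this sketch needs only core modules:
# run the block on a *copy* of each item and return the modified copies.
sub apply (&@)
{
    my ($code, @copies) = @_;
    $code->() foreach @copies;          # foreach aliases $_ to each copy in turn
    return @copies;
}

# Hypothetical: keep only lines that look like vim mapping commands.
# Returns an empty list for anything else, so that map drops those lines.
sub parse_mappings
{
    return /^\s*([nvxsoilc]?(?:nore)?map!?\s+.+)$/ ? $1 : ();
}

# Made-up excerpt of a session file, standing in for $vim_script->slurp.
my @session_lines = (
    "let s:so_save = &so\n",
    "nnoremap <silent> ,w :w<CR>\n",
    "inoremap <C-U> <C-G>u<C-U>\n",
    "nnoremap <silent> ,w :w<CR>\n",
    "set backspace=indent,eol,start\n",
);

my @baseline_mappings = uniq map { parse_mappings } apply { chomp } @session_lines;
say foreach map { "#-->PRE: $_" } @baseline_mappings;
```

Read the pipeline right to left: chomp each line, keep the ones that parse as mappings, then drop the duplicates.  With the sample lines above, that leaves the two distinct mappings, each printed with the `#-->PRE:` prefix.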

This is just a stupid, simple example, but it illustrates a number of principles that I try to use for all my code:10

  • Write your code so it flows naturally for the reader.  Start with the first thing that happens, then tell the next thing, and so on.  Don’t go all Pulp Fiction unless you have very good reason.
  • Let context do a lot of the heavy lifting.  Name things so they make sense, and imply what they actually do.  Reuse terms that other programmers are familiar with (here, slurp, uniq, and tempfile might not be obvious to a random native English speaker, but most coders are going to get those immediately).
  • Take advantage of other stuff that people have written to make shortcuts for you, as long as they contribute to readability.  Here, uniq and apply from List::Util and List::MoreUtils respectively,11 and tempfile from Path::Tiny all make perfect sense, and they’re all named well and fit seamlessly into the “narrative.”
  • Where nothing currently exists that fills a niche you need, write your own.  In this case, sh is a shortcut for making a bash call,12 and %OPT is the variable created by my opts function that I wrote to process command-line options.13 Just make sure you do the same two things you look for in other folks’ libraries: name them well—it’s perfectly fine if you spend more time thinking up the perfect name than you actually do writing the code—and give them calling semantics that make code that uses them flow seamlessly.
  • Do all the above well and you probably won’t need many—or any—comments.  That block doesn’t have any comments because I didn’t particularly feel like it needed any, but if I were going to add some, they would probably look like this:14
# See what mappings are defined _prior_ to loading the session, and save them for comparison with
# the mappings we see _afterwards_.
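
The opts function itself isn’t shown in this post, only described (in the footnotes) as mostly a wrapper around Getopt::Std.  A wrapper in that spirit might look something like the sketch below; the signature, the usage message, and the `-f` option are all assumptions made for illustration, not the real thing.

```perl
use strict;
use warnings;
use Getopt::Std qw< getopts >;

our %OPT;

# Hypothetical opts(): parse @ARGV according to a Getopt::Std spec and fill in %OPT.
# Dies with a (likewise hypothetical) usage message if an unknown switch is given.
sub opts
{
    my ($spec, $usage) = @_;
    getopts($spec, \%OPT) or die "usage: $usage\n";
    return;
}

# Pretend the script was called as:  script -D -f session.vim extra
@ARGV = qw< -D -f session.vim extra >;
opts('Df:', "$0 [-D] [-f file] [args]");

print "debugging is on\n" if $OPT{D};
print "session file: $OPT{f}\n";
print "remaining args: @ARGV\n";
```

The point isn’t the dozen lines of wrapper; it’s that the calling code gets to say `opts('Df:', ...)` once and then read like `if ($OPT{D})` everywhere else.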


Doing this takes effort.  Writing well is a skill like any other.  But this idea that if you can make a compiler do what you want you don’t have to care about writing well is, I think, a fallacy.  Writing well is about how to communicate with your fellow humans, and that’s worth learning for lots of reasons, many of which have nothing whatsoever to do with coding.  But, if you also want to communicate with your fellow humans through code, it’s probably worth spending a little time learning how to do it well.

So I want my code to read like well-crafted prose, and Perl helps me do that.  In today’s world, where most people believe that Perl is dying, I’m under just as much pressure as anyone to switch languages.  And I’m often accused of being resistant to change, or even afraid of it.  But that’s not true: I’m perfectly happy with change ... as long as it’s change for the better.  If I ever find a language that helps me write my own little version of literate programming better than Perl does, I will switch in a heartbeat.

But, so far, I’m still looking.





__________

1 Okay, to be completely honest, it was a few days ago.  But it took me a while to write this post.

2 The other, of course, being Edsger Dijkstra.

3 Though of course Lisp in general had been around forever.  It just hadn’t caught on yet.  I suppose some would say it never did ...

4 Possibly the most important piece of software (and one of the few that still remain) resulting from C/WEB is TeX.

5 Unless, of course, I’m one of those obdurate coders who just utterly refuses to read any comments, on general principle.  That creates entirely different issues though.

6 In the third of the articles I provided links to in the previous paragraph.

7 Just because I think we should listen to what he has to say doesn’t mean I agree with him all the time, you know.  I dove into this a little in an older post on my Other Blog.

8 Don’t even get me started on why vim session files suck.  That’s probably a whole ’nother blog post.

9 Okay, technically speaking I’m using my own Path::Class::Tiny here, but tempfile() is just a pass-through from Path::Tiny, so ... same diff.

10 Not claiming I’m always successful, of course.  But I try.

11 Or List::AllUtils, if you want to get ’em all in one go.

12 It is in fact a fairly thin wrapper around my own PerlX::bash.

13 And, just to stave off comments, yes, I know about Getopt::Std, and Getopt::Long, and Getopt::Declare, and Getopt::Compact, and Getopt::Euclid, and ... I’ve tried a bunch of ’em, and read about even more, and I still wrote my own.  But, to be fair, it’s mostly just a wrapper around Getopt::Std.

14 Still mostly trying to avoid the comment controversy, but hopefully we can agree that comments shouldn’t explain how the code works—that’s what the code is for.  But comments can be useful to explain why code exists.

2 Comments

It's no surprise that Perl is good for literate programming. Larry Wall is a linguist and has a degree in Natural and Artificial Languages.


About Buddy Burden

14 years in California, 25 years in Perl, 34 years in computers, 55 years in bare feet.