Please Don't Throw Me into that Briar Patch ...
[Quick note: This is mostly a Git story, although there’s plenty of links to Perl code and Perl discussions. It also turned out to be quite a bit longer story than I originally intended. But, even though it gets pretty deep into Git features, I think it should be interesting enough for anyone who has a CPAN module and uses Git. I hope.]
The Backstory: So, a couple months back, I look in my email and spy a message from Damian Conway. Now, some of you fine readers probably know Damian personally, and I’m sure I’ll get several comments pointing out how he’s just an ordinary bloke and all, but to you I say: hush! You may be all jaded and world-weary, but I’m still just a regular Perl schmoe, and when I see an email from The Damian addressed personally to me, I get all goosebumpy. So hush up and allow me my fanboy gushes.
Especially since the purpose of his email was to send me a patch for Method::Signatures (for which I’m a co-maintainer) and to tell me:
I adore Method::Signatures and am now using it in nearly all my own development...and in absolutely all my teaching.Ooohh ... high praise indeed, from a Perl luminary. Coolio, man!
Anyways, this missive just happened to have the ill fortune to land at the same time that $work began to (uncharacteristically) demand a larger chunk of my life than normal. So I didn’t have time to review the patch too closely, but I certainly didn’t want to lose the opportunity to discuss a few larger design issues with Mr. Conway. Thus began a long, winding discussion among me, Damian, and Schwern (who must be given full credit as the original author of MS), both in email and in GitHub issues, and even spilling over into my blog. And I learned something about Damian: he thinks in code. So, every time he comes up with a new idea, he sends me a new patch. By the time it’s all over, and $work is settled enough that I can really start devoting some time to this issue, I’ve got six, fairly hefty patchfiles from Damian—some of them overwriting or cancelling out the direction of others, the first four of them patched against current HEAD (of master) on GitHub, and the last two as both patches against current HEAD and against the previous patch.
What to do, what to do?
A Brief Tangent on Contributions: At some point, Schwern noted that we’d finally come to a general agreement on basic design, and that he’d made Damian a contributor, so Damian should just go ahead and make a branch, and put his code on it. To which Damian replied:
Re branching repo: Probably won’t. Expect annoyingly complete patches. (I know. Not properly social. Not a team player. Not git-ish. But also not a new phenomenon. Every single report card I ever received included some variation of: “Does not play well with others”.)Now, at the risk of putting words in The Damian’s mouth, I read this as “hey, I’ve been delivering patches to CPAN authors for years, and I’m not going to change now just because of the latest newfangled method.” Which leads me to ponder a bit on the relationship between author and contributor.
Like so many things in life, this involves a balance. The contributor has a feature they want added. The author wants people to fix their code for them (for free, even). This is a mutually beneficial arrangement, but there’s still all sorts of things to work out. Perhaps the author uses Dist::Zilla and the contributor doesn’t, or perhaps the author adheres to strict TDD and the contributor has sent no tests for the new feature. Or, maybe, the contributor has sent a patch when the author would really prefer a GitHub pull request.
Now the two can have some negotiation. In the TDD example, perhaps the author wants the new feature badly enough that they’re willing to write the tests themselves. Or perhaps the contributor wants the new feature badly enough that they’re willing to go back and add the tests upon request. But, the point is, everybody has to evaluate how bad they want it.
So ... how bad do I want a pack of new features and refactorings from The Damian? Yeah, you bet your ass.
The Meat of the Matter: So here’s what I suggested: I’ll make a branch, apply the patches to it, one at a time, review the code (not that I feel Damian’s code needs a lot of review, but, as he said: “Long and bitter experience indicates I’m every bit as fallible as a real human. Better check that code. ;-)”), make sure it all fits together, and then offer up the results for review.
This sounded simple enough at the time.
Of course, the tricky part is that, for the first four patches at least, they’re all patches against HEAD. That means that, while the first patch is no problem, the next three are going to be annoying, because the source I’m patching against isn’t the source originally used to generate the patchfile. Happily, I foresee this, and start to formulate a plan to work around it. Here’s how it went, and hopefully this is helpful to someone else out there.
First, I make a new branch off head, which I creatively name “damian_features”. Next, I apply patch #1 to my new branch. Since it was an exact copy of master’s HEAD, it applied cleanly ... once I remembered how to work
patch. Been a while since I had any cause to apply a patchfile. Looking at the structure of the file helped me jog my memory: each file consists of a set of context diffs from several files. Each file is identified with a relative path. Within each file’s section are one or more “hunks”; that is, the individual context diffs themselves. The
patch command (which, you may recall, was Larry Wall’s original claim to fame, before he invented Perl) simply finds the files indicated in the patchfile, then applies each “hunk” of context diffs, one at a time, to said files. If the source of the file being patched matches the source of the file the patchfile was generated from, the hunk “applies cleanly” and all is well. If not ... then you’ve got a problem to solve.
Since patchfiles are nearly always generated a directory above the source directory, but nearly always applied in the source directory, the filenames in the patchfile generally have an extra directory component in them that you need to strip off. Happily,
patch can do that for you. In fact, it can strip off an arbitrary number of directories for you, but we’ll just tell it to strip one. The only other mildly tricky bit is that patchfiles are read from stdin, not as args:
And it all works just fine. Next a simple
patch -p1 </tmp/Method-Signatures-Conway.patch
git diffto review the changes—making notes of anything I want to change or ask about, but not actually changing anything yet—then I stage and commit. For the commit message, I make up a brief summary line, but I pull the guts of the message from Damian’s description of the patch in his email. Then I make any little tweaks I want to, as separate commits (that way Damian’s changes and my changes aren’t comingled). I could hold off on this until the end, but doing it in between each commit preserves the history better, and plus the next patch isn’t going to work straight up anyway, so there’s no advantage in waiting.
Okay, the next few patches aren’t going to apply cleanly, so let’s see how bad I want them. Looks like patch #3 pretty much wipes out everything patch #2 did (hey, when you’re thinking out loud in code, you’re going to go down a few blind alleys here and there), but, other than that, all the patches are pretty useful, at least in terms of showing a gradual progression to how we got to the end product. So I decide to toss patch #2 but keep the rest.
So, how do I proceed? Simple:
- I switch back to the
- I apply patch #3 there, which is the same HEAD the patchfile was generated from, so it will apply cleanly.
- I then stash the changes.
- Then I switch back to the
- Then I unstash.
Voilà. Now, this sounds simple(ish) in theory, but of course in practice it isn’t perfect. First, I want to stash any files which the patchfile adds as well as the ones it modifies. To do this, I need
git stash -u, but that’s only available in
git 1.7.7 or later, which I don’t actually have, so I have to upgrade
git. That’s okay; I probably needed to do that anyway.
Next problem I run into is what happens when patch #1 adds a file, and then patch #3 adds the same file. This blows the unstash up (because it’s trying to add a file that already exists), and I have to move the existing file out of the way, then re-unstash, then do a manual
diff to see if the two “new” files are identical or not. And, of course, if the first unstash added any really new files before it blew up, I have to remove those before the re-unstash or it’ll blow up again (this takes me a few tries to get right). Overall, this is an annoyance, but luckily none of the changes between patches to any such files are any big deal.
Final problem I run across is, what happens if patch #1 adds some code in a place, and patch #3 doesn’t touch the code in that same place? Well, the code really should be removed, because patch #3 doesn’t contain it, and it’s a full replacement for patch #1. But my method is going to leave it, because an unstash is essentially a merge, so it has no idea that code shouldn’t just stay right where it is. Luckily this only happens once throughout the process (if you think about it, it makes sense that this would be rare, since the patches are all messing with the same features), and I catch it easily.
So basically it’s just rinse and repeat this whole shebang for patch #4, and then patches #5 and #6 have versions against the previous patches, so I just apply those directly.
Now, you may be asking yourself: why go through all this rigamarole with the stashing and unstashing? If the patch doesn’t apply cleanly, you have a mess to clean up. On the other hand, if the unstash doesn’t merge cleanly, you have a mess to clean up there too. What’s the difference?
Well, when the patch doesn’t apply cleanly, it often keels over right there and doesn’t move on to the next file. Plus, what it generates is a file where as many of the hunks as it could apply cleanly are applied, the hunks it couldn’t apply cleanly are stuffed into a
.rej file, and the original source is saved as a
.orig file. Which is fine, as far as it goes. But it means that you have to pick the rejected hunks out of the
.rej, find where they were supposed to go in the file based on the context (because the line numbers likely aren’t right if there are earlier hunks which were applied), insert the rejected hunk, which is actually a diff, which means it’s got little +‘s and -‘s in front of all the lines ... it’s just a little messy, is all. But if the unstash doesn’t apply cleanly, it’s just a merge conflict, and those are simple to fix.
At the end of this whole process, I’ve got 11 commits (5 substantial ones from Damian, and 6 little ones from me), and I review them all before I push:
That shows me all the commits that I haven’t pushed yet (that command is so handy and yet so hard to remember that I have it scripted). In my review, the first thing I notice is that I can’t really tell Damian’s commits from mine, at least not at a glance. I’ve done all the committing, so they all show up under my name. But that’s wrong: I really want them to show up under Damian’s name. And I know that
git log origin/damian_features..HEAD
githas the concept of the author and committer being separate, which is perfect for this application. If only I’d remembered to use the
--authorswitch when I committed!
But, wait: this is
git we’re talking about. I can fix this ... right?
Of course I can:
Yes, that’s 11 carets, and, yes, there’s a way to abbreviate that, but I don’t recall it off the top of my head (I’m sure someone will supply it in the comments). You might think it should be 10 carets, because HEAD counts as one—at least I always think that—but it’s actually not. This puts me in an editor with all 11 commits listed, with “pick” in front of each. Now all I need to do is change “pick” to “edit” for each of the Damian commits. When I save that file,
git rebase -i HEAD^^^^^^^^^^^
gitreapplies each commit, stopping at each one that I changed to
edit(actually, stopping just after committing each of those), then I just have to:
with just a quick
git commit --amend --author "Damian's info here" git rebase --continue
ZZin between to resave the commit messages (which are fine), rinse and repeat 4 more times for the other Damian commits, and, Bob’s yer uncle, it’s all fixed. One last review to make sure it all looks good, then
git pushto GitHub and send Damian an email with a link to review (look at the commits on Sep 26, 2012, in case there happen to be more by the time you read this), and I’m all done.
Wrap-up: I hope you found something in this long post useful, or interesting, or at least entertaining. I found it to be an interesting and challenging exercise in making two different forms of contributing work together, and I bet it’s not the last time I’ll need to do that.
Certainly, as long as Damian is willing to keep sending me patches, I’ll keep integrating them. Look at the awesomeness we’ve gotten so far!
[Notes: 1) Excerpts from Damian’s emails used by permission. 2) If you’re interested in getting to use the actual code that Damian wrote for Method::Signatures, don’t worry: I’m going to be doing a bit more testing on different Perl versions, but you should see it hit CPAN soon. 3) If you didn’t catch the reference in the post title, you could read the classic American folktale online.]