Converting Complex SVN Repositories to Git - Part 1
In May and June, I worked on converting the DBIx::Class repository from SVN to Git. A number of people have asked me to describe the process and show the code I used. I had been somewhat busy with various projects, including working on the web client for The Lacuna Expanse, but I’ve finally had some time to write up a bit about it. The code I used for the conversion is on my GitHub account, although not in a form meant for reuse.
I had previously done the Git conversion for WebGUI, so JT Smith mentioned to me that the DBIx::Class developers wanted to move to Git. The somewhat convoluted history of the DBIx::Class repository and its extensive use of SVK made the conversion more complex than the existing tools could handle automatically. I ended up using git-svn to import the raw data, a set of scripts I wrote or modified from others, and a bit of manual digging to create a reasonably accurate history of the project.
git-svn is a tool included with Git that lets you work with SVN repositories using Git. While its bidirectional capabilities aren’t useful when just doing a conversion, it does a serviceable job of importing the history into Git. The main problem areas are branch locations and merge tracking. For many projects, branch locations won’t present a problem. For DBIx::Class, though, the repository layout had been changed a few times. This meant I had to search through the project history to find the old locations, but this was relatively easy to do. The larger problem, merge tracking, isn’t as easy to resolve. Newer versions of SVN record extra information about merges, as does SVK. But this was an older repository, and in many cases the recorded merge information wasn’t adequate. Additional work was needed to track down the unrecorded merges and to smooth over the recorded ones.
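To give a feel for what the recorded merge information looks like, here is a minimal Python sketch that parses the value of SVN's svn:mergeinfo property (written by SVN 1.5 and later). It is only an illustration of the format, not part of the conversion scripts; the function name, branch paths, and revision numbers are all made up, and SVK keeps its own merge records in a separate property that this does not cover.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source_path: [(start, end), ...]}."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<source path>:<revision list>"; split on the last
        # colon in case the path itself contains one.
        path, _, ranges = line.rpartition(":")
        spans = []
        for item in ranges.split(","):
            # A trailing '*' marks a non-inheritable range; ignore it here.
            item = item.strip().rstrip("*")
            start, _, end = item.partition("-")
            # A single revision "140" is treated as the range 140-140.
            spans.append((int(start), int(end or start)))
        merged[path] = spans
    return merged


# Hypothetical property value, for illustration only.
example = "/branches/feature_x:120-134,140\n/branches/bugfix_y:201"
result = parse_mergeinfo(example)
print(result)
```

Each entry says which revisions of which source path have been merged into the path carrying the property. When that record is missing or incomplete, as it was for much of this repository, the merges have to be reconstructed some other way.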
Grafts and filter-branch
History in Git is tracked by each commit listing its parent commits. Merges are represented by commits with multiple parents. A commit’s ID is a hash of its contents, including the IDs of its parents, so Git’s storage model prevents you from altering a commit without also changing all of its descendants. You can, however, record an alternate set of parent commits using grafts. Grafts live in a local file rather than in the normal repository data, so they aren’t passed along when the repository is cloned and aren’t suitable for redistribution. They can be ‘baked in’ by the filter-branch command, which rewrites the affected commits so you can redistribute the result, and which also lets you make any other changes to a commit.
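As a rough illustration of what a graft looks like, the grafts file (.git/info/grafts) holds one line per grafted commit: the commit’s ID followed by the IDs of the parents it should appear to have, separated by spaces. The Python sketch below formats such a line for a commit that should be shown as a merge. The helper name and the commit IDs are made up for the example, and this is not the code used for the actual conversion.

```python
def graft_line(commit, parents):
    """Format one grafts entry: the commit ID followed by its new parent IDs."""
    return " ".join([commit, *parents])


# Hypothetical 40-character commit IDs, for illustration only.
merge_commit = "c" * 40   # commit that git-svn imported with a single parent
trunk_parent = "a" * 40   # its existing parent on trunk
branch_parent = "b" * 40  # tip of the branch that was actually merged

line = graft_line(merge_commit, [trunk_parent, branch_parent])
print(line)
```

Appending that line to .git/info/grafts makes the commit show up as a merge of the two parents in the local repository only. Running filter-branch afterwards rewrites the commits so the new parents become part of the real history.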
Tracking down the branch locations, importing everything into Git, and cleaning up commit messages were all relatively straightforward. Most of my effort was spent on creating all of the needed grafts. This involved writing scripts to automatically find merges missed by git-svn, making tools to find and fix merges that were recorded in convoluted ways, and manually tracking down what happened to almost every branch in the repository history. Some of this may not have been strictly necessary, but the goal was to create a repository where you didn’t have to think about the fact that it had previously existed in SVN. I think the result comes about as close to that goal as is possible.
Next: Initial import from SVN to Git