Converting Complex SVN Repositories to Git - Part 2

Initial Import into Git

Creating a mirror

SVN is slow, and git-svn is slower. The amount of network traffic needed by SVN makes everything slow, especially since git-svn needs to walk the history multiple times. Even if I made no mistakes and only had to run the import once, having a local copy of the repository makes the process much faster. svnsync will do this for us:

# create repository
svnadmin create svn-mirror
# svn won't let us change revision properties without a hook in place
echo '#!/bin/sh' > svn-mirror/hooks/pre-revprop-change && chmod +x svn-mirror/hooks/pre-revprop-change
# do the actual sync
svnsync init file://$PWD/svn-mirror http://dev.catalyst.perl.org/repos/bast/
svnsync sync file://$PWD/svn-mirror

Importing with git-svn

Next, we have to import it with git-svn:

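# assumption: BASE_DIR is the directory that holds svn-mirror, authors and author-generate
BASE_DIR=$PWD
mkdir DBIx-Class
cd DBIx-Class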

05.import:

git init

git svn init \
    -TDBIx-Class/0.08/trunk \
    -ttags/DBIx-Class \
    -tDBIx-Class/tags \
    -bbranches/DBIx-Class \
    -bDBIx-Class/0.08/branches \
    -bDBIx-Class/0.08/branches/_abandoned_but_possibly_useful \
    -bbranches \
    --prefix=svn/ \
    file://$BASE_DIR/svn-mirror

git config svn.authorsfile $BASE_DIR/authors
git svn fetch --authors-prog=$BASE_DIR/author-generate

Several parts go together here. The most important is the location of all of the branches. The current branch locations (DBIx-Class/0.08/branches and .../_abandoned_but_possibly_useful) were simple. Trunk (DBIx-Class/0.08/trunk) would be tracked back past the point where it had been moved, but past branches wouldn't be found. For those, I manually searched through the repository for past branch locations. Another option would be searching the entire history for any files whose path ends with 'lib/DBIx/Class.pm' and assuming each one marks a branch. With the configuration given, branches also get imported for other projects that kept their branches in the same directories. These can just be deleted after the fact.
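The history search mentioned as an alternative could look roughly like the sketch below. It is not a script from this conversion: the function name is made up, and it assumes the usual verbose 'svn log' output, where changed paths appear as indented lines such as "   A /some/path".

```shell
#!/bin/sh
# Hypothetical helper: read `svn log -v` output on stdin and print every
# directory that has ever contained lib/DBIx/Class.pm, i.e. a candidate
# trunk, branch, or tag root.
candidate_branch_roots() {
    # Changed-path lines look like "   A /path" or
    # "   A /path (from /other/path:123)"; drop the copy source first,
    # then keep only the directory in front of lib/DBIx/Class.pm.
    sed -n \
        -e 's| (from [^)]*)$||' \
        -e 's|^ *[AMDR] \{1,\}\(/.*\)/lib/DBIx/Class\.pm$|\1|p' |
    LC_ALL=C sort -u
}

# Example use against the local mirror (assumes BASE_DIR is set as in
# the import script):
# svn log -v "file://$BASE_DIR/svn-mirror" | candidate_branch_roots
```

Each printed directory would still need a manual look before being passed to 'git svn init' as a -b, -t or -T location.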

The second part is defining an authors file. This lists the mappings between SVN user names and a name and email as used by Git. We don't have this information yet, so the author-generate script is used, which generates a fake name and records it. That recorded list of names will later be used to re-write the authors using the correct information.
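For illustration, a helper in the style of author-generate might look like the following. This is a guess at the idea, not the real script: git-svn runs the --authors-prog program with the SVN committer name as its first argument and expects a single "Name <email>" line back, while the AUTHORS_SEEN variable, the default file name and the example.invalid domain are placeholders invented here.

```shell
#!/bin/sh
# Sketch of an --authors-prog helper. git-svn passes the SVN committer
# name as the first argument and reads one "Name <email>" line from stdout.
generate_author() {
    svn_user=$1
    # Placeholder location for the list of user names seen so far.
    seen_file=${AUTHORS_SEEN:-authors.generated}
    touch "$seen_file"
    # Record each user name once so real details can be filled in later.
    grep -qxF -e "$svn_user" "$seen_file" ||
        printf '%s\n' "$svn_user" >> "$seen_file"
    # Emit a fake identity derived from the SVN user name.
    printf '%s <%s@example.invalid>\n' "$svn_user" "$svn_user"
}

# A standalone script would finish with:
# generate_author "$1"
```

The recorded list then serves as the starting point for the real name and email mapping.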

The 'git svn fetch' operation takes many hours to run, but as long as the branch locations are correct, this only needs to be done once. Running 'svnsync sync' and 'git svn fetch' again will update the Git repository with any later changes to the SVN repo (10.update). All of the steps past this are much faster, but are also destructive. At this stage I just created a backup of the Git repository to be restored as I made corrections to the later scripts.
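The update and backup steps can be sketched as two small functions. The function names are illustrative, and the layout (svn-mirror, DBIx-Class and author-generate all sitting under BASE_DIR) is an assumption based on the import commands above.

```shell
#!/bin/sh
# Sketch of an update step followed by a backup. Not the actual
# 10.update script; paths assume everything lives under $BASE_DIR.
update_from_svn() {
    # Pull new revisions into the local mirror, then into the Git repository.
    svnsync sync "file://$BASE_DIR/svn-mirror" &&
    ( cd "$BASE_DIR/DBIx-Class" &&
      git svn fetch --authors-prog="$BASE_DIR/author-generate" )
}

backup_repo() {
    # Copy the whole repository (working tree and .git) to a fresh directory,
    # replacing any earlier backup at the same location.
    src=$1
    dest=$2
    rm -rf "$dest" && cp -Rp "$src" "$dest"
}

# Example:
# update_from_svn && backup_repo "$BASE_DIR/DBIx-Class" "$BASE_DIR/DBIx-Class.bak"
```

Restoring is the same copy in the other direction, which keeps each rerun of the later, destructive scripts cheap.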

Initial Cleanup

The next step is to remove some of the extra branches created during the import. There are some branches that existed in the same branch root but weren't actually part of the DBIx::Class project. The 20.delete-non-branches script removes these by searching through each branch and deleting any that don't contain the file lib/DBIx/Class.pm.
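A minimal sketch of that kind of filter is shown below. It is not the 20.delete-non-branches script itself: it only checks the tip of each imported ref (an assumption) and relies on the svn/ prefix chosen at import time.

```shell
#!/bin/sh
# Sketch: delete every imported ref under refs/remotes/svn whose tip
# does not contain lib/DBIx/Class.pm. Run inside the imported repository.
delete_non_branches() {
    git for-each-ref --format='%(refname)' refs/remotes/svn |
    while read -r ref; do
        # cat-file -e exits non-zero when the path is missing from the tree.
        if ! git cat-file -e "$ref:lib/DBIx/Class.pm" 2>/dev/null; then
            echo "deleting $ref"
            git update-ref -d "$ref"
        fi
    done
}
```

Because the deletion is permanent, this is one of the steps worth running only against a repository that has a backup.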

There are also some duplicate branches, created when the same branch was found in two different branch roots. These are labeled with an @ and a revision number at the end. I initially made a script (collapse-past-branches) to delete these branches only if they were actually duplicates, and not different branches that had been given the same name. I found that they were all duplicates though, so I ended up just deleting all of the branches marked with @ symbols (25.delete-past-branches).
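Deleting the @-suffixed refs can be sketched in a few lines. Again this is an approximation rather than the 25.delete-past-branches script; it assumes the svn/ prefix from the import and treats any ref name ending in @ plus digits as one of these branches.

```shell
#!/bin/sh
# Sketch: delete imported refs whose names end in "@<revision>".
# Run inside the imported repository.
delete_past_branches() {
    git for-each-ref --format='%(refname)' refs/remotes/svn |
    grep '@[0-9][0-9]*$' |
    while read -r ref; do
        echo "deleting $ref"
        git update-ref -d "$ref"
    done
}
```

Listing the matches first (the same pipeline without the loop) is a cheap way to confirm nothing unexpected is caught.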

The last step in the initial rough import was to create standard Git branches and tags for all of the imported branches. The 30.fix-refs script does this work. Most of it is taken from nothingmuch's git-svn-abandon project, which does a similar task to my scripts, but without as much cleanup. For branches, all that is done is to create normal local branches rather than the svn/ prefixed remote branches created by the import. Because SVN doesn't differentiate between branches and tags, git-svn doesn't create real tags when importing. So the fix-refs script searches backward from the tag to find what commit it refers to and tags that. Due to the reorganization that had been done to the SVN repository this wasn't entirely adequate, so I had to manually fix some of the tags later.
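The general shape of such a ref fix-up is sketched below. It is written in the spirit of git-svn-abandon but is neither that code nor the 30.fix-refs script: it assumes tags were imported under refs/remotes/svn/tags/, treats everything else under refs/remotes/svn as a branch, and finds the tagged commit by walking back while the parent has an identical tree.

```shell
#!/bin/sh
# Sketch: turn git-svn's remote refs into ordinary branches and tags.
# Run inside the imported repository.
fix_refs() {
    git for-each-ref --format='%(refname)' refs/remotes/svn |
    while read -r ref; do
        name=${ref#refs/remotes/svn/}
        case $name in
        tags/*)
            # An SVN tag is a copy commit. Walk back while the parent's
            # tree is identical to find the commit that was really tagged.
            target=$ref
            tree=$(git rev-parse "$ref^{tree}")
            while [ "$(git rev-parse -q --verify "$target~1^{tree}")" = "$tree" ]; do
                target="$target~1"
            done
            git tag -f "${name#tags/}" "$(git rev-parse "$target")"
            ;;
        *)
            # Plain local branch pointing at the same commit.
            git branch -f --no-track "$name" "$ref"
            ;;
        esac
    done
}
```

A tag whose tree matches none of its ancestors simply stays on the copy commit, which is the kind of case that ends up needing a manual fix.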

The repository is now starting to resemble a real Git repository.

Next: Calculating the many merge points to record as grafts.

About Graham Knop
