Thursday, March 4, 2010

Stitching git histories

We finally finished migrating the Castle subversion repository to git. When starting the migration we decided that each project under the Castle umbrella would keep all of its history, which meant including the history from when the projects weren't separate and stand-alone but a single humongous project. This was a problem, as git-svn couldn't follow the project split.

I first asked on stackoverflow about this, but didn't get any real solutions. So after a few failed experiments I settled on using grafts and filter-branch. Here's the guide I wrote to migrate each project; I think it could be of help to someone in a similar situation.

I had already run basic git-svn migrations of everything, so I'll just skip that step.

First, clone the original-history project from the read-only URL (to prevent accidentally pushing to it):

$ git clone git://github.com/castleproject/castle.git
$ cd castle

Add the recent-history project as a remote (with the private read-write URL) and fetch it:

$ git remote add recent git@github.com:castleproject/Castle.Facilities.ActiveRecordIntegration.git
$ git fetch recent

Launch gitk to see both trees:

$ gitk --all

Press F2 and select remotes/recent/master

Both histories are unrelated!

Take note of the SHA1 of the first commit in the recent-history (in this case, the one that has the description "Creating Facilites new folders and setting up the structure"). The SHA1 of this commit is 1ad7a4e10b711d1a58f7ac610078dcdf39b36d08

Search gitk for the exact commit in the original-history where the project was moved to its own repository. The first commit in recent-history has the date 2009-10-20 07:30:08, so it has to be around that time.

Found it! Take note of the SHA1: 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28

Now we're going to build the graft point. Create a .git/info/grafts file with the SHA1s we wrote down:

1ad7a4e10b711d1a58f7ac610078dcdf39b36d08 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28

Note that the format is <child SHA1> <parent SHA1>.

Restart gitk and check that both histories are now related.
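Newer versions of git can record the same relationship with git replace --graft instead of a hand-edited .git/info/grafts file. This is a swapped-in alternative, not what the steps in this guide use, and graft_parent is a made-up helper name:

```shell
# Hypothetical helper: make commit $1 (the child) appear to have
# commit $2 as its parent, using a replace ref instead of the
# .git/info/grafts file.
graft_parent() {
    git replace --graft "$1" "$2"
}
```

With the SHA1s noted above, that would be:

$ graft_parent 1ad7a4e10b711d1a58f7ac610078dcdf39b36d08 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28

filter-branch honors replace refs as well as the grafts file, so the rest of the guide should carry over. The cleanup differs, though: the replace ref is removed with git replace -d <child SHA1> rather than by deleting the grafts file.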

Now let's make this permanent with git-filter-branch. First we locate all branches and tags in recent-history. In this case there are two branches: master and svn, and no tags. Create local branches and tags for each of these:

$ git branch rmaster recent/master
$ git branch rsvn recent/svn

Now we run filter-branch for these heads:

$ git filter-branch -- 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28..rmaster 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28..rsvn

If it complains about a dirty working copy when running filter-branch, reset the working copy (for example with git reset --hard, which discards uncommitted changes) and retry.

Refresh gitk and check that everything's OK.
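The same check can be done without gitk. A rough sketch, where is_joined and count_roots are made-up helper names:

```shell
# Hypothetical helper: succeeds if commit $1 is an ancestor of
# (reachable from) ref $2.
is_joined() {
    git merge-base --is-ancestor "$1" "$2"
}

# Hypothetical helper: count the parentless (root) commits
# reachable from ref $1.
count_roots() {
    git rev-list --max-parents=0 "$1" | wc -l | tr -d ' '
}
```

For example:

$ is_joined 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28 rmaster && echo joined
$ count_roots rmaster

If the old commit is reachable from the rewritten branch, the two histories are stitched together. count_roots shows how many independent starting points the branch still has.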

Remove the graft and the original heads:

$ rm -rf .git/info/grafts .git/refs/original
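Deleting .git/refs/original by hand only removes loose ref files. If the backup refs might have been packed, a sketch like the following goes through git itself (drop_original_refs is a made-up name):

```shell
# Hypothetical helper: delete every backup ref that filter-branch
# left under refs/original/, whether loose or packed.
drop_original_refs() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
}
```

Running drop_original_refs inside the repository does nothing if there are no backup refs left. The grafts file still has to be removed separately.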

Check gitk again. If everything's OK, relocate master:

$ git reset --hard rmaster

The temporary branches can be removed now (if git refuses because a branch isn't fully merged, use -D instead of -d):

$ git branch -d rmaster rsvn

And finally push:

$ git push -f recent

Note that we need to use the -f (force) flag since we rewrote history.

Check on github that everything looks good. Hmm, there's an outdated svn branch; let's remove it:

$ git push recent :svn

On github, check that the committers are correctly mapped: each commit should be linked to the profile of its author.

Now add the build scripts as a submodule:

$ git submodule add git://github.com/castleproject/Castle.Buildscripts.git buildscripts

Commit and push. That's it.

Actually, after all of this we decided to avoid submodules and instead copy the build scripts and build tools to make forking easier for everyone.

Also, this guide wasn't applied verbatim for all projects. Some projects were merged into other projects, so these "destination" projects required multiple graft points to merge the other projects' histories.

1 comment:

fotos said...

Works great! One step closer on migrating all SVN repos to GiT.

Thanks a lot for writing this down Mauricio.

-fotos