We finally finished migrating the Castle subversion repository to git. When starting the migration we decided that each project under the Castle umbrella would keep all of its history, which meant including the history from when the projects weren't separate and stand-alone but a single humongous project. This was a problem, as git-svn couldn't follow the project split.
I first asked on Stack Overflow about this, but didn't get any real solutions. So after a few failed experiments I settled on using grafts and filter-branch. Here's the guide I wrote to migrate each project. I think it could be of help to someone in a similar situation.
I had already run basic git-svn migrations of everything, so I'll just skip that step.
First, clone the original-history project from the read-only URL (to prevent accidentally pushing to it):
$ git clone git://github.com/castleproject/castle.git
$ cd castle
Add the recent-history project as a remote (with the private read-write URL) and fetch it:
$ git remote add recent email@example.com:castleproject/Castle.Facilities.ActiveRecordIntegration.git
$ git fetch recent
Launch gitk to see both trees:
$ gitk --all
Press F2 and select remotes/recent/master
Both histories are unrelated!
Take note of the SHA1 of the first commit in the recent-history (in this case, the one that has the description "Creating Facilites new folders and setting up the structure"). The SHA1 of this commit is 1ad7a4e10b711d1a58f7ac610078dcdf39b36d08
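If you'd rather not scroll to the bottom of gitk, the first commit can also be looked up from the command line. This is only a sketch: root_commits is a helper name I made up, and it assumes the recent remote has already been fetched.

```shell
# Print the root commit(s), i.e. commits without parents, reachable from a ref.
# root_commits is a hypothetical helper name, not a git command.
root_commits() {
    git rev-list --max-parents=0 "$1"
}

# Usage, inside the clone after fetching the remote:
#   root_commits recent/master
```

If the history has more than one root, this prints one SHA1 per line, so double-check which one you actually want.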
Search gitk for the exact commit in the original-history where the project was moved to its own repository. The first commit in recent-history is dated 2009-10-20 07:30:08, so it has to be around that time.
Found it! Take note of the SHA1: 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28
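Narrowing the search by date can also be done with git log instead of browsing gitk. A rough sketch, where commits_between is a hypothetical helper and the two-day window is just my guess at a reasonable range around the date above:

```shell
# List commits reachable from a ref whose commit date falls inside a window.
# commits_between is a hypothetical helper name, not a git command.
#   $1 = ref, $2 = window start, $3 = window end
commits_between() {
    git log --since="$2" --until="$3" --date=iso --format='%H %cd %s' "$1"
}

# Usage, for the date in this example:
#   commits_between master '2009-10-19 00:00:00' '2009-10-21 00:00:00'
```

You still have to read the candidates and pick the commit where the project was actually moved out. The date only narrows things down.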
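Now we're going to build the graft point. Create a .git/info/grafts file containing a single line with the two SHA1s we wrote down:
1ad7a4e10b711d1a58f7ac610078dcdf39b36d08 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28
Note that the format is <child SHA1> <parent SHA1>.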
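Writing the file from the shell avoids copy-paste slips with two 40-character hashes. A minimal sketch, assuming you run it from the repository root. write_graft is a hypothetical helper name, and it overwrites any existing grafts file:

```shell
# Write one graft line, "<child SHA1> <parent SHA1>", to .git/info/grafts.
# write_graft is a hypothetical helper; it replaces any existing grafts file.
#   $1 = child (first commit of recent-history)
#   $2 = parent (split commit in original-history)
write_graft() {
    mkdir -p .git/info
    printf '%s %s\n' "$1" "$2" > .git/info/grafts
}

# Usage with the two SHA1s from this example:
#   write_graft 1ad7a4e10b711d1a58f7ac610078dcdf39b36d08 \
#               3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28
```

Newer Git releases print a deprecation warning for the grafts file and point to git replace --graft instead. I didn't use that for this migration, so treat it as something to look into rather than a tested step.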
Restart gitk and check that both histories are now related.
Now let's make this permanent with git-filter-branch. First we locate all branches and tags in recent-history. In this case there are two branches: master and svn, and no tags. Create local branches and tags for each of these:
$ git branch rmaster recent/master
$ git branch rsvn recent/svn
Now we run filter-branch for these heads:
$ git filter-branch -- 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28..rmaster 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28..rsvn
If filter-branch complains about a dirty working copy, reset the working copy (for example with git reset --hard) and retry.
Refresh gitk and check that everything's OK.
Remove the graft and the original heads:
$ rm -rf .git/info/grafts .git/refs/original
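Once the grafts file is gone, the join should hold on its own because filter-branch rewrote the commits. Besides looking at gitk, you can ask git directly whether the original-history commit is now an ancestor of the rewritten branch. A small sketch, where is_ancestor is a hypothetical wrapper name:

```shell
# Exit status 0 if commit $1 is an ancestor of ref $2, non-zero otherwise.
# is_ancestor is a hypothetical wrapper around git merge-base.
is_ancestor() {
    git merge-base --is-ancestor "$1" "$2"
}

# Usage with the parent SHA1 from this example:
#   is_ancestor 3526f1a76f6ee2fb23cb2b402201bab1fc5a5b28 rmaster \
#       && echo "histories are joined"
```

If this fails after removing the graft, the rewrite did not stick, and it's better to find out before resetting master and pushing.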
Check gitk again. If everything's OK, relocate master:
$ git reset --hard rmaster
The temporary branches can be removed now:
$ git branch -d rmaster rsvn
And finally push:
$ git push -f recent
Note that we need to use the -f (force) flag since we rewrote history.
Check on GitHub that everything looks good. Hmm, there's an outdated svn branch, so let's remove it:
$ git push recent :svn
On GitHub, check that the committers are correctly mapped: each commit should be linked to the profile of its author.
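As far as I know, GitHub links a commit to a profile by matching the author email, so listing the distinct authors locally is a quick way to spot entries that were not mapped. A sketch, where list_authors is a hypothetical helper name:

```shell
# Print each distinct "Name <email>" author reachable from a ref, sorted.
# list_authors is a hypothetical helper name, not a git command.
list_authors() {
    git log --format='%an <%ae>' "$1" | sort -u
}

# Usage:
#   list_authors master
```

Any entry that still shows a raw svn user name or a placeholder email probably won't be linked to a profile.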
Now add the build scripts as a submodule:
$ git submodule add git://github.com/castleproject/Castle.Buildscripts.git buildscripts
Commit and push. That's it.
Actually, after all of this we decided to avoid submodules and instead copy the build scripts and build tools to make forking easier for everyone.
Also, this guide wasn't applied verbatim for all projects. Some projects were merged into other projects, so these "destination" projects required multiple graft points to merge the other projects' histories.