Sunday, November 21, 2010

Shallow submodules in Git

A git submodule is really little more than an embedded repository, plus some configuration in the parent repository that records which commit of it should be checked out. When you first clone the parent repository, you have to fetch the submodules manually by running:

git submodule init
git submodule update

The first time, this runs a clone for each submodule, so each submodule ends up as a fully fledged repository. You can even commit and push from inside a submodule (if you have permission, of course).

But sometimes you only need submodules for read-only purposes. For example, you might have a "superproject" whose purpose is to integrate several projects through submodules. In that case, downloading the entire history of each submodule is annoying: it wastes time, bandwidth and disk space for everyone who wants to work with the superproject. Shallow clones are great for this, but how can we apply them to submodules?

After I asked on Stack Overflow, Ryan Graham gave me the hint that "submodule update" can handle submodules that were already cloned, so it was just a matter of running a "manual" shallow clone of each submodule between "submodule init" and "submodule update". This script does just that:
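The following is a minimal sketch of that init / shallow clone / update sequence, written for illustration rather than copied from the original script; the function name shallow_submodules is hypothetical. It assumes you run it from the top level of a freshly cloned superproject, that "git submodule init" has copied each submodule's URL into .git/config, and that the commit recorded in the superproject is the tip of the submodule's default branch. If the recorded commit is older than that tip, a depth-1 clone won't contain it and the final update step may fail or need to fetch more history.

```shell
#!/bin/sh
# Hypothetical helper, not the original script: shallow-clone every
# submodule between "git submodule init" and "git submodule update".
# Run it from the top level of a freshly cloned superproject.
shallow_submodules() {
    # Register each submodule (and its URL) from .gitmodules in .git/config
    git submodule init || return 1

    # Walk every submodule.<name>.path entry declared in .gitmodules.
    # The here-document keeps the loop in the current shell, so
    # "return" below leaves the function on failure.
    while read -r key sm_path; do
        [ -n "$key" ] || continue          # no submodules: skip the empty line
        name=${key#submodule.}
        name=${name%.path}
        url=$(git config "submodule.$name.url") || return 1
        # Only clone into a missing or empty directory, so submodules
        # that were already cloned are left alone
        if [ -z "$(ls -A "$sm_path" 2>/dev/null)" ]; then
            # Assumption: depth 1 contains the commit the superproject records
            git clone --depth 1 "$url" "$sm_path" </dev/null || return 1
        fi
    done <<EOF
$(git config --file .gitmodules --get-regexp '^submodule\..*\.path$')
EOF

    # "update" finds the clones already in place and only checks out
    # the commit recorded in the superproject
    git submodule update
}
```

For example, after cloning the superproject, source the file containing this function and run shallow_submodules from the superproject's top-level directory. Newer Git releases can also do this without a helper, via git submodule update --init --depth 1.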

Keep in mind that the disk-space savings are usually smaller than you'd expect.


Diogo Saad said...

Wouldn't the final line be "git submodule init" instead of "update"?

Mauricio Scheffer said...

@Diogo: Nope. First init, then the shallow clone, and finally update.