26
votes

I have read in several places that it's possible to share the objects directory between multiple git repositories, e.g. with symbolic links. I would like to do this to share the object databases between several bare repositories in the same directory:

shared-objects-database/
foo.git/
  objects -> ../shared-objects-database
bar.git/
  objects -> ../shared-objects-database
baz.git/
  objects -> ../shared-objects-database

(I'm doing this because there are going to be lots of large blobs redundantly stored in each objects directory otherwise.)

My concern about this is that when using these repositories, git gc will be called automatically and may prune objects that are unreachable from one repository but still needed by another, making those repositories incomplete. Is there any easy way of ensuring that this doesn't happen? For example, is there a config option that would force --no-prune to be the default for git gc, and, if so, would that be sufficient to use this setup without risking data loss?

At the moment, I've been using the objects/info/alternates mechanism to share objects between these repositories, but maintaining these pointers from each repository to all the others is a bit hacky.
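
Concretely, the current setup looks something like this (a sketch only; relative paths in the alternates file are resolved from the objects directory, and foo.git, bar.git and baz.git are siblings):

# inside foo.git (with the mirror-image entries in bar.git and baz.git)
echo "../../bar.git/objects" >> objects/info/alternates
echo "../../baz.git/objects" >> objects/info/alternates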

(My other alternative is just to have a single bare repository, with all the branches of foo.git, bar.git and baz.git named foo-master, foo-testing, bar-master, etc. However, that'd be a bit more work to manage, so if the symlinked objects directory can work safely, I'd rather do that.)

You might guess that this is one of those Using Git For What It Was Not Intended use cases, but I hope the question is clear and valid nonetheless ;)

2
I'm curious why it's more work to manage extra refs within one repository. (Also, you can name them foo/master, foo/testing, bar/master - a bit better for organization. You can see from the history of git.git that they use that kind of setup.) – Cascabel
OK :) I have a large USB disk with a similar repository structure to that described above, and on each of the computers (e.g. "foo") a symlink ~/.git -> /media/big-disk/foo.git - I'm using a modified version of gibak for backup and "time-travel" through the history of my home directory on each of these computers when the disk is plugged in. If I had a single repository with different branches, I'd need an extra step after plugging in (changing HEAD manually or "git checkout --leave-my-working-tree-alone foo-master" (?)) before things like "git diff" would work in the obvious way. – Mark Longair
You might also be interested in git-new-workdir, which sounds like it works for my similar use-case (multiple checkouts of possibly-unrelated branches in the same repo, only slightly ew!). It symlinks refs and packed-refs which should stop git gc from nuking anything that's been committed; you just need to point each HEAD at a different branch. Stuff in your index is another issue, but if there's nothing important that isn't in the repo or your working tree, rm .git/index; git reset HEAD seems to do the trick. – tc.
Googlers might also be interested to know that git clone --shared is a way to create such repos: stackoverflow.com/questions/23304374/… – Ciro Santilli 新疆再教育营六四事件法轮功郝海东
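
(For reference, a minimal sketch of that approach, assuming an existing local repository at /path/to/source.git; --shared sets up the new clone's objects/info/alternates to point at the source's object store instead of copying the objects:)

git clone --shared /path/to/source.git shared-clone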

2 Answers

14
votes

Perhaps this was added to git after this question was asked/answered: it seems there is now a way to do this explicitly. It's described here:

https://git.wiki.kernel.org/index.php/Git_FAQ#How_to_share_objects_between_existing_repositories.3F

How to share objects between existing repositories? Do

echo "/source/git/project/.git/objects/" > .git/objects/info/alternates

and then follow it up with

git repack -a -d -l

where the -l means that it will only put "local" objects in the pack-file (strictly speaking, it will put any loose objects from the alternate tree too, so you'll have a fully packed archive, but it won't duplicate objects that are already packed in the alternate tree).
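
Applied to the layout in the question, that might look roughly like this (a sketch only, assuming you pick foo.git as the repository whose object store the others borrow from, and that the three bare repositories sit next to each other):

# inside bar.git, and likewise baz.git; alternates paths are resolved
# relative to the objects directory
echo "../../foo.git/objects" > objects/info/alternates
git repack -a -d -l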

9
votes

Why not just crank the gc.pruneExpire variable way up, to something like 1000 years? It's unlikely you'll ever have loose objects 1000 years old that you don't want deleted.

To make sure that the things which really should be pruned do get pruned, you can keep one repo which has all the others as remotes. git gc would be quite safe in that one, since it really knows what is unreachable.
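
A rough sketch of that arrangement (the name all-refs.git and the paths are just for illustration):

# one bare repository that knows about every ref of the others
git init --bare all-refs.git
cd all-refs.git
git remote add foo /path/to/foo.git
git remote add bar /path/to/bar.git
git remote add baz /path/to/baz.git
git fetch --all       # pick up every branch from every remote
git gc                # safe here: it can see everything that is reachable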

Edit: Okay, I was a bit cavalier about the time limit; as is pointed out in the comments, 1000 years isn't gonna work too well, but the beginning of the epoch would, or never.
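
In config terms, that's e.g.

git config gc.pruneExpire never

run in each of the shared repositories (gc.pruneExpire accepts "never" as a value to suppress pruning entirely).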