Mercurial Repo Living Archive

Question

We have an Hg repo that is over 6GB and 150,000 changesets. It has 8 years of history on a large application. We have used a branching strategy over the last 8 years. In this approach, we create a new branch for a feature and when finished, close the branch and merge it to default/trunk. We don't prune branches after changes are pushed into default.

As our repo grows, it is getting more painful to work with. We love having the full history on each file and don't want to lose that, but we want to make our repo size much smaller.

One approach I've been looking into would be to have two separate repos, a 'Working' repo and an 'Archive' repo. The Working repo would contain the last 1 to 2 years of history and would be the repo developers cloned and pushed/pulled from on a daily basis. The Archive repo would contain the full history, including the new changesets pushed into the working repo.

I cannot find the right Hg commands to enable this. I was able to create a Working repo using hg convert <src> <dest> --config convert.hg.startref=<rev>. However, Mercurial sees this as a completely different repo, breaking any association between our Working and Archive repos. I'm unable to find a way to merge/splice changesets pushed to the Working repo into the Archive repo and maintain a unified file history. I tried hg transplant -s <src>, but that resulted in several 'skipping emptied changeset' messages. It's not clear to me why the hg transplant command considered those changesets empty. Also, if I were to get this working, does anyone know if it maintains a file's history, or is my repo going to see the transplanted portion as separate, maybe showing up as a delete/create or something?

Anyone have a solution to either enable this Working/Archive approach or have a different approach that may work for us? It is critical that we maintain full file history, to make historical research simple.

Thanks

Hello Bryan, could you detail why you want to make the repo much smaller? Is it because cloning is too slow? Is it because some operations are too slow (commit, push, pull)? There are some experimental changes that have landed recently in Mercurial that could help you, but first I would need more information about your repository. Could you run hg heads -T "\n" | wc -l? It will give the number of open heads in your repository. - Boris Feld

Pierre-Yves David · Accepted Answer · 2018-08-23T09:58:24

You might be hitting a known bug with the underlying storage compression. 6GB for 150,000 revisions is a lot.

This storage issue is usually encountered in very branchy repositories, and it affects an internal data structure that stores the content of each revision. The current fix for this bug can reduce repository size by up to tenfold.

Possible Quick Fix

You can blindly try to apply the current fix for the issue and see if it shrinks your repository.

1. Upgrade to Mercurial 4.7.
2. Add the following to your repository configuration:

    [format]
    sparse-revlog = yes

3. Run hg debugupgraderepo --optimize redeltaall --run (this will take a while).

Some other improvements are also turned on by default in 4.7, so upgrading to 4.7 and running debugupgraderepo should help in all cases.
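
If you would rather not run the upgrade directly on your main repository, the steps above can be tried on a scratch copy first. The following is only a sketch: the function name and both paths are hypothetical, and it assumes that passing format.sparse-revlog through the global --config option has the same effect as the [format] entry in the repository configuration.

```shell
#!/bin/sh
# Sketch only: src and work are hypothetical paths supplied by the caller.
try_sparse_revlog() {
    src=$1
    work=$2
    # Refuse to reuse an existing directory so nothing is overwritten.
    if [ -e "$work" ]; then
        echo "refusing to overwrite $work" >&2
        return 1
    fi
    # Plain filesystem copy: the original repository stays untouched.
    cp -R "$src" "$work" || return 1
    before_kb=$(du -sk "$work/.hg/store" | cut -f1)
    # --config sets format.sparse-revlog for this invocation only, which is
    # assumed to have the same effect as the [format] entry in the hgrc.
    hg -R "$work" --config format.sparse-revlog=yes \
        debugupgraderepo --optimize redeltaall --run || return 1
    after_kb=$(du -sk "$work/.hg/store" | cut -f1)
    echo "store before: ${before_kb} KB, after: ${after_kb} KB"
}

# Example (hypothetical paths):
#   try_sparse_revlog /path/to/repo /path/to/scratch-copy
```

Comparing the two numbers should tell you whether the fix is worth applying to the real repository.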

Finer Diagnostic

Can you tell us the size of the .hg/store/00manifest.d file compared to the full size of .hg/store?

In addition, can you provide us with the output of hg debugrevlog -m?
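
One possible way to get the size comparison asked for above is a small shell function like the one below. It is a sketch, not an official tool: the function name is made up for illustration, and sizes come from du, so they are in kilobytes as reported by your filesystem.

```shell
#!/bin/sh
# Sketch only: the repository path argument is supplied by the caller.
manifest_share() {
    store="${1:-.}/.hg/store"
    manifest="$store/00manifest.d"
    if [ ! -f "$manifest" ]; then
        echo "no 00manifest.d under $store" >&2
        return 1
    fi
    manifest_kb=$(du -sk "$manifest" | cut -f1)
    store_kb=$(du -sk "$store" | cut -f1)
    echo "manifest_kb=$manifest_kb store_kb=$store_kb"
    # Integer percentage; skipped if du reports an empty store.
    if [ "$store_kb" -gt 0 ]; then
        echo "manifest_percent=$(( manifest_kb * 100 / store_kb ))"
    fi
}

# Example (hypothetical path):
#   manifest_share /path/to/repo
```

Posting these numbers together with the hg debugrevlog -m output should be enough for the diagnostic.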

Other reason?

Another reason for repository size to grow is large files (usually binary) being committed to it. Do you have any of them?
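
To check for that, one rough approach is to list the largest per-file history files under .hg/store/data. This is a sketch under assumptions: the function name and the default of 20 entries are arbitrary, and the names under .hg/store/data are encoded forms of the tracked paths, so they may not match your working-copy file names exactly.

```shell
#!/bin/sh
# Sketch only: lists the biggest files under .hg/store/data, largest first.
largest_filelogs() {
    data="${1:-.}/.hg/store/data"
    count=${2:-20}
    find "$data" -type f | while IFS= read -r f; do
        # wc -c gives the exact byte size; arithmetic expansion strips padding.
        printf '%s\t%s\n' "$(( $(wc -c < "$f") ))" "$f"
    done | sort -rn | head -n "$count"
}

# Example (hypothetical path): show the 10 largest entries.
#   largest_filelogs /path/to/repo 10
```

Each output line is a size in bytes followed by the store path, so a handful of very large entries near the top would point to big files in the history.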
