Merging history of multiple branches from merged subrepository in mercurial

I'm starting with a Mercurial repository which has multiple subrepositories that I'm trying to merge into it, as if they had always been a part of the main repository. They should have never been subrepositories in the first place.

I put together a process to convert the old history into a single repository:

1. Take a sub-repo and hg convert it using --filemap to move everything into a subdirectory with the name of the directory it should end up in, as described here: Join multiple subrepos into one and preserve history in Mercurial
2. Grab a list of revisions from each of the repositories using Clay Lenhart's answer here: Is it possible to manually change/reorder the revision numbers (provided topology remains the same)?
3. Sort all the revisions by date, from all the repositories to be merged.
4. Pull each revision, one by one, into a single new repository.
5. Do an hg convert on the resulting repository using a python script, as described here: https://www.mercurial-scm.org/wiki/ConvertExtension#Customization, to strip out references to the subrepository in .hgsub and .hgsubstate files
6. Manually merge the branches.
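
To make steps 2 and 3 concrete, here is a minimal sketch of the sorting I mean. It assumes each repository's revisions were dumped with hg log -T "{date|hgdate} {node}\n", which prints a Unix timestamp, a timezone offset and the node hash per line. The function names parse_log and merge_by_date are made up for this illustration.

```python
# Sketch only: parse_log and merge_by_date are hypothetical helper names.
# Input is assumed to come from: hg log -T "{date|hgdate} {node}\n"
# where each line looks like "<unix timestamp> <tz offset> <node>".

def parse_log(repo, text):
    """Turn one repository's log dump into (timestamp, repo, node) tuples."""
    revs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        timestamp, _offset, node = line.split()
        revs.append((int(timestamp), repo, node))
    return revs


def merge_by_date(*rev_lists):
    """Combine several revision lists and order them by commit timestamp."""
    combined = [rev for revs in rev_lists for rev in revs]
    # sorted() is stable, so revisions with equal timestamps keep input order.
    return sorted(combined, key=lambda rev: rev[0])
```

Commit dates are not guaranteed to follow parent/child order, so a purely date-based ordering like this can place a child before its parent if a clock was off.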

The problem I'm left with is that the history is unusable, i.e. I can't go back and update to a specific version, because each branch only has its own data. So let's say my main repo is A and my subrepos are B and C: if I update to history of branch A, it doesn't have the files from branch B or branch C, and if I update to history of branch B, it doesn't have files from branch A or branch C.

What I want is some way to merge the whole history together, so it is mostly a single branch, and the files from all the branches appear in each commit. Is there a way to convert it so that the history of all the branches is merged throughout, and not just by a single merge at the very end?

merge mercurial

2 Answers

While I do not have a canned process for you, I can tell you where things go wrong, and that's in steps 3-6. What you want to do is:

3. List all the revisions in all subprojects, using the superproject / master repo to control which revisions are grouped together and in what order. (You can do this on the fly as a part of step 4.)
4. Bring in (pull) the next revision(s) from the next repository/ies according to the sequencing obtained in step 3. If the superproject says that the next commit uses rev 41 of subproject A with rev 97 of subproject B and rev 11 of subproject C, those are the three to bring in. Then collapse them down to a single revision that does not reference any subprojects (e.g., using hg histedit if that seems suitable).

There is no step 5, and there is no step 6, as there is nothing to merge at this point: the combining happened in step 4.
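
As a rough illustration of step 3, the grouping can be read out of the superproject's .hgsubstate file, which pins one subrepo revision per line as "<node> <path>". The sketch below assumes you fetch that file's content for each superproject revision yourself (for example with hg cat -r REV .hgsubstate). The names parse_hgsubstate and changed_subrepos are hypothetical, not part of Mercurial.

```python
# Sketch only: both function names are invented for this example.
# .hgsubstate holds one "<node> <path>" line per subrepository.

def parse_hgsubstate(text):
    """Map each subrepo path to the node the superproject pins it at."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        node, path = line.split(" ", 1)
        state[path] = node
    return state


def changed_subrepos(previous, current):
    """Return {path: node} for subrepos whose pinned node differs."""
    return {
        path: node
        for path, node in current.items()
        if previous.get(path) != node
    }
```

Comparing the state of two consecutive superproject revisions then tells you which subrepo revisions have to be brought in alongside that superproject commit.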

I have a working process now. Instead of doing a pull --force for each step:

1. First decide the overall order to import from each repository. Names in [] are placeholders.
2. (In source repo, export the changeset): hg export --git -r [OldID] -o [TempPath]\patch[OldID]
3. (In target repo, choose the parent): hg update --clean [ParentID]
4. (In target repo, import the changeset): hg import [TempPath]\patch[OldID] --import-branch --bypass
5. (In target repo, grab the node for later use): hg log -T "{rev} {node}" -r [NewID]

This works because the patch doesn't carry the full version information, just what has changed. It doesn't expect the parent to match up to where it was exported from, just the files you're patching. And since each patch only affects files from one branch on one repository, this is a perfectly reasonable thing to assume.
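
For anyone scripting this, the per-changeset commands above could be strung together roughly as follows. This is a sketch rather than the exact script I used: import_commands and run_commands are names invented here, and using "tip" to find the freshly imported changeset is an assumption (it relies on nothing else being added to the target repo in between).

```python
import subprocess

# Sketch only: import_commands and run_commands are hypothetical helpers.

def import_commands(old_id, parent_id, patch_path):
    """Build the hg commands for transplanting one non-merge changeset."""
    return [
        # Export the changeset from the source repository as a git-style patch.
        ("source", ["hg", "export", "--git", "-r", old_id, "-o", patch_path]),
        # Select the parent in the target repository.
        ("target", ["hg", "update", "--clean", parent_id]),
        # Import on top of that parent without touching the working directory.
        ("target", ["hg", "import", patch_path, "--import-branch", "--bypass"]),
        # Assumption: the changeset just imported is now the tip.
        ("target", ["hg", "log", "-T", "{rev} {node}", "-r", "tip"]),
    ]


def run_commands(commands, source_repo, target_repo):
    """Run each command in the right repository; return the last stdout."""
    cwd = {"source": source_repo, "target": target_repo}
    output = ""
    for where, argv in commands:
        result = subprocess.run(
            argv, cwd=cwd[where], check=True, capture_output=True, text=True
        )
        output = result.stdout
    return output
```

The output of the last command is the "{rev} {node}" pair to record, so later changesets (and merges) can refer to the new node.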

Changesets which are merges are a bit trickier. I have to edit the patch file to match up its source node ID to the newly imported node ID. This is why we must capture the node ID in step 5 above. You can import a merge changeset as long as you use --exact, but that doesn't work unless the 3 nodes in the patch file match the actual nodes in the repository.

So in case of a merge, after step 2 above, modify the patch file so each "# Parent" matches the correct parent node ID in the target repository. Then after it has been imported, update the patch file again, this time replacing "# Node ID" with the imported node ID.

At this point, the patch file should have all 3 accurate node IDs, and an "hg import --exact --bypass" will work. This second time you import the patch, it will overwrite the existing node you already imported, but it will mark it as a merge between the proper nodes.
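
The patch editing for merges can be automated. Below is a minimal sketch, assuming the usual hg export header with "# Node ID" and "# Parent" lines at the top of the file. The function name rewrite_patch_header and the node_map dictionary (old node ID to newly imported node ID) are hypothetical, and only the header is touched, never the diff body.

```python
import re

# Sketch only: rewrite_patch_header is a made-up helper name.
_PARENT = re.compile(r"^# Parent\s+([0-9a-f]+)\s*$")
_NODE = re.compile(r"^# Node ID\s+([0-9a-f]+)\s*$")


def rewrite_patch_header(patch_text, node_map, new_node_id=None):
    """Rewrite "# Parent" (and optionally "# Node ID") in a patch header.

    node_map maps node IDs from the source repository to the IDs the same
    changesets received in the target repository.
    """
    out = []
    in_header = True
    for line in patch_text.split("\n"):
        # The header is the leading run of lines that start with "#".
        if in_header and not line.startswith("#"):
            in_header = False
        if in_header:
            match = _PARENT.match(line)
            if match:
                old = match.group(1)
                # Unknown parents are left as they are.
                line = "# Parent  " + node_map.get(old, old)
            elif new_node_id is not None and _NODE.match(line):
                line = "# Node ID " + new_node_id
        out.append(line)
    return "\n".join(out)
```

Call it once with only node_map before the first import, then again with new_node_id set to the node reported after that import, and feed the result to the --exact import.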

The other slightly tricky part is to make sure merges are correctly parented. Each merge is defined in terms of 1 branch merging into the other. If you choose the wrong parent, the merge will fail. So we define the main branch as the branch which all other branches were merged into. When merging two repositories together, you have to start at the tip and follow the main branch back to the beginning. Only the main branches from each repository should be concatenated together, and the other branches stay as separate branches.