Named Branches vs Multiple Repositories

130

votes

We're currently using Subversion on a relatively large codebase. Each release gets its own branch, and fixes are performed against the trunk and migrated into release branches using svnmerge.py.

I believe the time has come to move on to better source control, and I've been toying with Mercurial for a while.

There seem to be two schools of thought on managing such a release structure using Mercurial. Either each release gets its own repo, and fixes are made against the release branch and pushed to the main branch (and any other newer release branches), or named branches are used within a single repository (or multiple matching copies).

In either case it seems like I might be using something like transplant to cherry-pick changes for inclusion in the release branches.

I ask of you: what are the relative merits of each approach?

version-control mercurial branch dvcs

129

votes

The biggest difference is how the branch names are recorded in the history. With named branches the branch name is embedded in each changeset and will thus become an immutable part of the history. With clones there will be no permanent record of where a particular changeset came from.

This means that clones are great for quick experiments where you don't want to record a branch name, and named branches are good for long term branches ("1.x", "2.x" and similar).

Note also that a single repository can easily accommodate multiple light-weight branches in Mercurial. Such in-repository branches can be bookmarked so that you can easily find them again. Let's say that you have cloned the company repository when it looked like this:

[a] --- [b]

You hack away and make [x] and [y]:

[a] --- [b] --- [x] --- [y]

Meanwhile, someone puts [c] and [d] into the repository, so when you pull you get a history graph like this:

            [x] --- [y]
           /
[a] --- [b] --- [c] --- [d]

Here there are two heads in a single repository. Your working copy will always reflect a single changeset, the so-called working copy parent changeset. Check this with:

% hg parents

Let's say that it reports [y]. You can see the heads with

% hg heads

and this will report [y] and [d]. If you want to update your working copy to a clean checkout of [d], then simply do (substitute [d] with the revision number for [d]):

% hg update --clean [d]

You will then see that hg parents reports [d]. This means that your next commit will have [d] as parent. You can thus fix a bug you've noticed in the main branch and create changeset [e]:

            [x] --- [y]
           /
[a] --- [b] --- [c] --- [d] --- [e]

To push changeset [e] only, you need to do

% hg push -r [e]

where [e] is the changeset hash. By default hg push will simply compare the repositories and see that [x], [y], and [e] are missing, but you might not want to share [x] and [y] yet.

If the bugfix also affects you, you will want to merge it with your feature branch:

% hg update [y]
% hg merge

After you commit the merge, your repository graph will look like this:

            [x] --- [y] ----------- [z]
           /                       /
[a] --- [b] --- [c] --- [d] --- [e]

where [z] is the merge between [y] and [e]. You could also have opted to throw the branch away:

% hg strip [x]

The main point of this story is this: a single clone can easily represent several tracks of development. This has always been true for "plain hg" without using any extensions. The bookmarks extension is a great help, though. It allows you to assign names (bookmarks) to changesets. In the case above you will want a bookmark on your development head and one on the upstream head. Bookmarks can be pushed and pulled since Mercurial 1.6 and became a built-in feature in Mercurial 1.8.

If you had opted to make two clones, your development clone would have looked like this after making [x] and [y]:

[a] --- [b] --- [x] --- [y]

And your upstream clone will contain:

[a] --- [b] --- [c] --- [d]

You now notice the bug and fix it. Here you don't have to hg update since the upstream clone is ready to use. You commit and create [e]:

[a] --- [b] --- [c] --- [d] --- [e]

To include the bugfix in your development clone you pull it in there:

[a] --- [b] --- [x] --- [y]
           \
            [c] --- [d] --- [e]

and merge:

[a] --- [b] --- [x] --- [y] --- [z]
           \                   /
            [c] --- [d] --- [e]

The graph might look different, but it has the same structure and the end result is the same. Using the clones you had to do a little less mental bookkeeping.

Named branches didn't really come into the picture here because they are quite optional. Mercurial itself was developed using two clones for years before we switched to using named branches. We maintain a branch called 'stable' in addition to the 'default' branch and make our releases based on the 'stable' branch. See the standard branching page in the wiki for a description of the recommended workflow.

29

votes

I think you want the entire history in one repo. Spawning off a short-term repo is for short-term experiments, not major events like releases.

One of the disappointments of Mercurial is that there seems to be no easy way to create a short-lived branch, play with it, abandon it, and collect the garbage. Branches are forever. I sympathize with never wanting to abandon history, but the super-cheap, disposable branches are a git feature that I would really like to see in hg.

14

votes

You should do both.

Start with the accepted answer from @Norman: Use one repository with one named branch per release.

Then, have one clone per release branch for building and testing.

One key note is that even if you use multiple repositories, you should avoid using transplant to move changesets between them because 1) it changes the changeset hash, and 2) it may introduce bugs that are very difficult to detect when there are conflicting changes between the changeset you transplant and the target branch. Do the usual merge instead (and without premerge: always visually inspect the merge), which will result in what @mg said at the end of his answer:

The graph might look different, but it has the same structure and the end result is the same.

More verbosely, if you use multiple repositories, the "trunk" repository (or default, main, development, whatever) contains ALL changesets in ALL repositories. Each release/branch repository is simply one branch in the trunk, all merged back to trunk one way or another, until you want to leave an old release behind. Therefore, the only real difference between that main repo and the single repo in the named branch scheme is whether branches are named or not.

That should make it obvious why I said "start with one repo". That single repo is the only place you'll ever need to look for any changeset in any release. You can further tag changesets on the release branches for versioning. It's conceptually clear and simple, and makes system admin simpler, as it's the only thing that absolutely has to be available and recoverable all the time.

But then you still need to maintain one clone per branch/release that you need to build and test. It's trivial as you can hg clone <main repo>#<branch> <branch repo>, and then hg pull in the branch repo will only pull new changesets on that branch (plus ancestor changesets on earlier branches that were merged).

This setup best fits the Linux kernel commit model of a single puller (doesn't it feel good to act like Lord Linus? At our company we call the role integrator), as the main repo is the only thing that developers need to clone and the puller needs to pull into. Maintenance of the branch repos is purely for release management and can be completely automated. Developers never need to pull from/push to the branch repos.

Here is @mg's example recast for this setup. Starting point:

[a] - [b]

Make a named branch for a release version, say "1.0", when you get to alpha release. Commit bug fixes on it:

[a] - [b] ------------------ [m1]
         \                 /
          (1.0) - [x] - [y]

(1.0) is not a real changeset since a named branch does not exist until you commit. (You could make a trivial commit, such as adding a tag, to make sure the named branch is properly created.)

The merge [m1] is the key to this setup. Unlike a developer repository where there can be an unlimited number of heads, you do NOT want to have multiple heads in your main repo (except for old, dead release branches as mentioned before). So whenever you have new changesets on release branches, you must merge them back to the default branch (or a later release branch) immediately. This guarantees that any bug fix in one release is also included in all later releases.

In the meantime, development on the default branch continues toward the next release:

          ------- [c] - [d]
         /
[a] - [b] ------------------ [m1]
         \                 /
          (1.0) - [x] - [y]

And as usual, you need to merge the two heads on the default branch:

          ------- [c] - [d] -------
         /                         \
[a] - [b] ------------------ [m1] - [m2]
         \                 /
          (1.0) - [x] - [y]

And this is the 1.0 branch clone:

[a] - [b] - (1.0) - [x] - [y]

Now it's an exercise to add the next release branch. If it's 2.0 then it'll definitely branch off default. If it's 1.1 you can choose to branch off 1.0 or default. Regardless, any new changeset on 1.0 should be first merged to the next branch, then to default. This can be done automatically if there's no conflict, resulting in merely an empty merge.

I hope the example makes my earlier points clear. In summary, the advantages of this approach are:

Single authoritative repository that contains complete changeset and version history.
Clear and simplified release management.
Clear and simplified workflow for developers and integrator.
Facilitates workflow iterations (code reviews) and automation (automatic empty merges).

UPDATE: hg itself does this: the main repo contains the default and stable branches, and the stable repo is the stable branch clone. It doesn't use versioned branches, though, as version tags along the stable branch are good enough for its release management purposes.

5

votes

The major difference, as far as I know, is something you've already stated: named branches are in a single repository. Named branches have everything handy in one place. Separate repos are smaller and easy to move around. The reason there are two schools of thought on this is that there's no clear winner. Whichever side's arguments make the most sense to you is probably the one you should go with, because it's likely their environment is most similar to yours.

2

votes

I think it's clearly a pragmatic decision depending on the current situation, e.g. the size of a feature/redesign. I think forks are really good for contributors with not-yet-committer roles to join the developer team by proving their aptitude with negligible technical overhead.

0

votes

I'd really advise against using named branches for versions. That's really what tags are for. Named branches are meant for long lasting diversions, like a stable branch.

So why not just use tags? A basic example:

Development happens on a single branch
Whenever a release is created, you tag it accordingly
Development just continues on from there
If you have some bugs to fix (or whatever) in a certain release, you just update to its tag, make your changes and commit

That will create a new, unnamed head on the default branch, aka. an anonymous branch, which is perfectly fine in hg. You may then at any point merge the bugfix commits back into the main development track. No need for named branches.
