The Mercurial workflow that I use
Published: aug 14, 2009
Category: Scribbles
Tags: dvcs, mercurial, sose
Language: [English]
Over the last year, I've been using Mercurial as my distributed version control system of choice. It's easy to use, trivial to set up and available for any platform out there. However, since DVCSs are quite new, the best workflows are still not entirely known. Being distributed also gives a lot more room for different workflows than a centralized system ever could. There are simply a lot more choices to make.
As there are so many ways of using the tool, it took me quite a while to find a flow that fits my needs. Below I detail the solution I am using now: Mercurial with its pbranch extension. It might give you some ideas of what could work for you.
For details on how to use Mercurial itself, you can refer to the excellent online book.
My normal environment is one where I work with Mercurial on small to medium-sized projects, where I'm either a developer with repositories that people can pull from, or I send out patches to other developers. I publish some of my repositories in a public place, but I'm not using them as a shared central repository (which would be no problem at all, it's just not something I do).
For small and simple code, and system configuration files, I just use a single repository. Because there's no cost in setting up a repository, whenever I find some code lying around, I just run hg init; hg addremove on it. This is also the case when contributing a one-off patch to an existing project. I just grab the tar.gz and initialize my Mercurial repository. Then I have all the tools to write the patch. If a new version comes out at a later date, tracking those changes and my patch also becomes simple.
For more complex projects I have some more rules. First there's the general best practice to commit often. Each time you achieve something that gives a basis to continue with the task, do a commit. It's like a long string of stepping stones. Each gives a solid basis for the next step.
However, another equally important rule is that each commit to the main branch should add value. A code base whose test suite doesn't pass loses value, even if it has extra features. Outdated documentation also reduces the value of the code base, and when is the documentation of a feature going to be updated if not with the commit that adds it? To get the tests, the code and the documentation updated, I want to make more than one commit. So just committing to the main branch will not get me what I want.
During my years with Subversion this was easy enough to solve. For each feature or bug, open a feature branch, do all the commits to that branch and then, when all is well, merge it back into the main branch, which is called trunk in Subversion. On the trunk you see exactly one commit for a feature or bugfix. That one commit contains the code, tests and documentation updates. Anybody could get a copy of the trunk and trust that the tests would run and the documentation was up to date. Looking at the history of the trunk, there were relatively few commits, with each commit being a complete feature. The Subversion log reads like a change log. Great!
Using Subversion has other issues and limitations for me. So, when I switched to Mercurial to fix those issues, I went looking for a way to get my workflow back.
One of the standard practices of Mercurial is to make a local clone of the main development repository for each feature being developed, and to develop the feature there. When it's finished, you "push" those changes back into the main development repository. That works fine, except that all the intermediate commits I made to create the feature are visible in the history of the main repository. There's no way to tell when I thought a feature was finished, except that that commit is a merge (a change that has two parents in Mercurial). But I do merges all the time (to pull the changes made in the main branch into the feature branch), so that was not a good criterion. Using commit messages like "Fixes issue #1234" is also fiddly. It reminds me of the bad old days of working with SVN without merge tracking. But the worst is, it's a lie. That one change doesn't close the issue; it and all the changesets before it close it, together.
So I went looking for a solution. Many of the smart people who develop the Linux kernel use something called a "patch queue". Basically the idea is to have patches live on top of a repository, which allows for their modification, creating updates step by step. Once a patch is ready, it is turned into a proper commit on the main repository. So the changes to the patches are sort of like pre-commit changes. It sounded like a good thing.
Mercurial has an excellent implementation of this concept called "Mercurial Queues". MQ is a very good extension for Mercurial and allows you to do all kinds of fiendishly complex things with (stacks of) patches. For me it just has one problem. It made my brain melt. MQ has a separate repository that doesn't contain files; it contains patches. And when a patch is updated, MQ records a patch of a patch. And reading patches-of-patches to know what changed from one version of my feature to the next, that hurts! I use Mercurial because, to me, it's the simplest version control solution. And that does not include brain melt.
So there I was again, wondering how to implement my workflow. Then I found the excellent patch branches extension, also known as 'pbranch'. It supports much of the Mercurial Queues functionality, but uses standard Mercurial concepts like named branches. The main idea behind pbranch is to use named branches as feature branches, allowing a tree (a DAG really) of those features on top of the default development branch. Once a feature is ready, the changes are exported as one patch, which can be sent to the mailing list or a developer for inclusion in the main development branch.
A new patch branch is created with hg pnew. It creates a Mercurial named branch with some nice tricks added. One thing to keep in mind when creating a pbranch is that if you are already in a pbranch and run hg pnew, you get a branch that is "on top of" the old pbranch. This allows for a "tree" of patches. The relation of pbranches to the default branch and to each other can be seen by running hg pgraph. It's easy to change the relation between branches by editing the .hg/pgraph file. It's a good idea to use hg peditmessage to set a short message about what the patch is supposed to solve.
After changing some code for a patch, just run the familiar hg commit as you normally would. So, commit by commit, patches are created. All of the normal Mercurial commands work as expected, and switching between patches to work on is a simple hg up branchname.
If the default branch has changed, you can merge those changes into the pbranches with hg pmerge. To see which pbranches need updating, run the pgraph command with the --status option. When the patch is complete, hg pexport branchname generates the patch from all your commits, ready to be sent off to the mailing list.
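The pbranch commands described above fit together roughly as follows. This is only a sketch in Python that lists the hg command lines in order without running anything. The helper name pbranch_cycle, the branch name and the commit message are made up for illustration, and it assumes that hg pnew accepts the name of the new branch as its argument.

```python
import shlex


def pbranch_cycle(branch, commit_message):
    """Return the hg command lines for one patch-branch cycle, in order.

    Hypothetical helper: it only builds the commands, it never runs hg.
    """
    return [
        # Start a patch branch on top of whatever branch is checked out.
        # Assumption: pnew takes the new branch name as its argument.
        ["hg", "pnew", branch],
        # Record a short message about what the patch should solve.
        ["hg", "peditmessage"],
        # Ordinary commits build up the patch step by step.
        ["hg", "commit", "-m", commit_message],
        # Check which patch branches need updating.
        ["hg", "pgraph", "--status"],
        # Merge changes from the default branch into the patch branch.
        ["hg", "pmerge"],
        # Collapse all commits on the branch into one patch.
        ["hg", "pexport", branch],
    ]


if __name__ == "__main__":
    for argv in pbranch_cycle("feature-x", "Add tests for feature x"):
        print(shlex.join(argv))
```

In practice the commit step repeats many times and pmerge is only needed when the default branch has moved, so the list is best read as the order of events rather than as a script to run top to bottom.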
After messing about a bit, my workflow has solidified into the following. I have two repositories: a main one and a development one. Both are clones of the public Mercurial repository. The main one just reflects the changes in the public repository and I don't do any work on it directly.
Normally I like to keep all of the patches directly dependent on the "default" branch, so that I can release a feature patch without having to wait for other work to be completed first. Regularly pulling and merging new changes from the main development branch helps to keep the patches free of conflicts once they are finished.
The classic example of where this is not the case is when, while developing a feature, the code really needs some refactoring to get the feature to fit right. When this happens, I split up the branch into a refactor branch and a feature branch. However, most of the time the feature fits well in the code and this is not needed.
Once I've finished sculpting my patch (all the tests pass and the documentation is up to date), I make sure that my development repository has all the latest revisions of the main development branch. Once I'm sure, I run pexport and get the patch, collapsing all of the effort into one diff file. This is the patch I send to the developers for review.
If the patch needs more work to be accepted, I can continue working on it without any trouble, generating patches with hg pexport as often as I need until it does get accepted into the main development branch. If collaboration on a patch is needed, it's very easy to make the pbranch development repository available, either entirely or just that branch.
On some of the projects, I am the main developer. In these cases, when I have a patch ready, I switch back to the main repository and run hg import to import my patch into the mainline. Without any options, import will take a patch file and commit it to the local copy of the repository. This is basically what any repository maintainer does when they receive a patch. Finally, all the project's unit tests should pass, of course. Visible in the main repository history is one commit, containing all the changes in one. Of course, nothing of the branch that was used for developing the patch is visible in the main repository.
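The export-then-import hand-off between the two repositories can be written down as a short plan. The sketch below only formats the commands; it does not run them. The Step class, the release_steps helper, the paths and the branch name are all hypothetical. It also assumes that hg pexport prints the patch on standard output so that it can be redirected to a file, and that the development repository pulls straight from the local main clone.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    cwd: str                          # repository the command runs in
    argv: List[str]                   # the hg command line
    stdout_to: Optional[str] = None   # file that receives the output, if any

    def render(self) -> str:
        """Format the step as a shell line, for reading or pasting."""
        line = f"cd {shlex.quote(self.cwd)} && {shlex.join(self.argv)}"
        if self.stdout_to is not None:
            line += f" > {shlex.quote(self.stdout_to)}"
        return line


def release_steps(dev_repo, main_repo, branch, patch_file):
    """Plan the hand-off of one finished patch branch (hypothetical helper).

    patch_file should be an absolute path, because the export and the
    import run in different directories.
    """
    return [
        # Bring the latest mainline revisions into the development repo.
        Step(dev_repo, ["hg", "pull", main_repo]),
        # Switch to the patch branch and merge those revisions into it.
        Step(dev_repo, ["hg", "up", branch]),
        Step(dev_repo, ["hg", "pmerge"]),
        # Assumption: pexport writes the collapsed patch to stdout.
        Step(dev_repo, ["hg", "pexport", branch], stdout_to=patch_file),
        # Apply the patch to the mainline as a single commit.
        Step(main_repo, ["hg", "import", patch_file]),
    ]


if __name__ == "__main__":
    plan = release_steps("/work/dev", "/work/main", "feature-x",
                         "/tmp/feature-x.patch")
    for step in plan:
        print(step.render())
```

Keeping the plan as data makes it easy to read the steps over before running any of them, which suits a hand-off that ends in a commit on the mainline.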
By my own doing, or someone else's, a changeset with the patch is now in the main repository and the work of the pbranch is done. Because I use the same development repository for many patches, I found it useful to do some extra housekeeping steps. First, I pull the applied patch back into the development repository, so that it contains the same changes twice: once as the pbranch and once as the pulled changeset. Next, I merge the pbranch with the changeset of the feature. This "links" the pbranch with the main changeset for future reference. Then I remove the pbranch from .hg/pgraph, turning it into a normal named branch. It is always possible to add it to .hg/pgraph again later, reviving it as a proper pbranch. Finally, when sufficient time has passed, I close the branch with hg commit --close-branch. All of this could be scripted with little effort, I just haven't felt the need for it yet.
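A script for the housekeeping steps could start out like the sketch below. It is a dry run that only lists what to do and when. The helper name housekeeping_steps, the commit messages and the revision identifier are invented for illustration. The merge direction (updating to the pbranch and merging the applied changeset into it) is one possible reading of the description, and editing .hg/pgraph is left as a manual step.

```python
import shlex


def housekeeping_steps(main_repo, branch, applied_rev):
    """List the cleanup steps for a finished patch branch, in order.

    Hypothetical helper: each entry is (when, what). Nothing is executed.
    applied_rev identifies the changeset that hg import created on the
    mainline.
    """
    def hg(*args):
        return shlex.join(("hg",) + args)

    return [
        # Pull the applied changeset into the development repository.
        ("now", hg("pull", main_repo)),
        # Link the patch branch to the applied changeset with a merge.
        ("now", hg("update", branch)),
        ("now", hg("merge", "-r", applied_rev)),
        ("now", hg("commit", "-m", f"Merge applied changeset into {branch}")),
        # Turn the patch branch back into a plain named branch.
        ("manual", f"remove {branch} from .hg/pgraph"),
        # Once enough time has passed, close the branch.
        ("later", hg("update", branch)),
        ("later", hg("commit", "--close-branch", "-m", f"Close {branch}")),
    ]


if __name__ == "__main__":
    for when, what in housekeeping_steps("/work/main", "feature-x", "1a2b3c"):
        print(f"{when:>6}: {what}")
```

The "now", "manual" and "later" labels keep the timing from the description visible, since closing the branch is deliberately delayed.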
The end result is that there are two publicly visible repositories. The first is the traditional mainline repository, with clean changesets. This is the one other developers are encouraged to pull from and the one the continuous deployment server runs against. Then there's the development repository, with the branches and all the tedious steps that went into creating the patches, where it is possible to collaborate with others on specific issues or problems.
The conclusion is that Mercurial can go a long way towards supporting a specific workflow with little effort.

