The Mercurial workflow that I use
[en] Scribbles dvcs, mercurial, sose, aug 14, 2009

Over the last year, I've been using Mercurial as my distributed version control system of choice. It's easy to use, trivial to set up and available on any platform out there. However, since DVCSes are quite new, the best workflows are not yet entirely settled. Being distributed also gives a lot more room for different workflows than a centralized system ever could. There are simply a lot more choices to make.

As there are so many ways of using the tool, it took me quite a while to find a flow that fits my needs. Below I detail the Mercurial setup, with its pbranch extension, that I am using now. It might give you some ideas of what might work for you.

For details on how to use Mercurial itself, you can refer to the excellent online book.

My normal environment is one where I work with Mercurial on small to medium-sized projects, where I'm either a developer with repositories that people can pull from, or I send out patches to other developers. I publish some of my repositories in a public place, but I'm not using one as a shared central repository (which is no problem at all, it's just not something I do).

For small and simple code, and for system configuration files, I just use a single repository. Because there's no cost in setting up a repository, whenever I find some code lying around, I just hg init; hg addremove it. The same goes for contributing a one-off patch to an existing project: I just grab the tar.gz and initialize my Mercurial repository. Then I have all the tools to write the patch. If a new version comes out at a later date, tracking those changes alongside my patch also becomes simple.

For more complex projects I have some more rules. First there's the general best practice to commit often. Each time you achieve something that gives a basis to continue the task, do a commit. It's like a long string of stepping stones: each one gives a solid basis for the next step.

However, another equally important rule is that each commit to the main branch should add value. A test suite that doesn't pass all its tests loses value, even if the code base has extra features. Outdated documentation also reduces the value of the code base, and when is the documentation of a feature going to be updated if not in the commit that adds it? To get the tests, the code and the documentation all updated, I want to make more than one commit. So just committing to the main branch will not get me what I want.

During my years with Subversion this was easy enough to solve. For each feature or bug, open a feature branch, do all the commits on that branch and then, when all is well, merge it back into the main branch, which is called trunk in Subversion. On the trunk you see exactly one commit per feature or bugfix. That one commit contains the code, tests and documentation updates. Anybody could get a copy of the trunk and trust that the tests would run and the documentation was up-to-date. Looking at the history of the trunk, there were relatively few commits, each one a complete feature. The Subversion log reads like a change-log. Great!

Subversion has other issues and limitations for me, though. So when I switched to the Mercurial DVCS to fix those issues, I went looking for a way to keep this workflow.

One of the standard practices of Mercurial is, for each feature being developed, to make a local clone of the main development repository and develop the feature there. When it's finished, you "push" those changes back into the main development repository. That works fine, except that all the intermediate commits I made to create the feature are visible in the history of the main repository. There's no way to tell when I considered a feature finished, except that that commit is a merge (a change that has two parents in Mercurial). But I do merges all the time (to pull the changes made on the main branch into the feature branch), so that was not a good criterion. Using commit messages like "Fixes issue #1234" is also fiddly. It reminds me of the bad old days of working with SVN without merge-tracking. But worst of all, it's a lie. That one change doesn't close the issue; it and all its preceding changesets close it, together.

So I went looking for a solution. Many of the smart people who develop the Linux kernel use something called a "patch queue". The basic idea is to have patches living on top of a repository, allowing them to be modified and updated step by step. Once a patch is ready, it is turned into a proper commit on the main repository. So the changes to the patches are sort of pre-commit changes. It sounded like a good thing.

Mercurial has an excellent implementation of this concept called "Mercurial Queues". MQ is a very good extension for Mercurial and allows you to do all kinds of fiendishly complex things with (stacks of) patches. For me it has just one problem: it made my brain melt. MQ has a separate repository that doesn't contain files, it contains patches. And when a patch is updated, MQ makes a patch of a patch. And reading patches-of-patches to know what changed from one version of my feature to the next, that hurts! I use Mercurial because, to me, it's the simplest version control solution. And that does not include brain melt.

So, back I was, wondering how to implement my workflow. Then I found the excellent patch branches extension, also known as 'pbranch'. It supports much of the Mercurial Queues functionality, but uses standard Mercurial concepts like named branches. The main idea behind pbranch is to use named branches as feature branches, allowing a tree (a DAG, really) of those features on top of the default development branch. Once a feature is ready, the changes are exported as one patch, which can be sent to the mailing list or a developer for inclusion in the main development branch.

A new patch branch is created with hg pnew. It creates a Mercurial named branch with some nice tricks added. One thing to keep in mind when creating a pbranch is that if you are already in a pbranch and do hg pnew, you get a branch that is "on top of" the old pbranch. This allows for a "tree" of patches. The relation of the pbranches to the default branch and to each other can be seen by running hg pgraph. It's easy to change the relations between branches by editing the .hg/pgraph file. It's a good idea to use hg peditmessage to set a short message about what the patch is supposed to solve.

After changing some code for a patch, just do the familiar hg commit on it as you would normally. So, commit by commit, the patches are created. All of the normal Mercurial commands work as expected, and switching between patches to work on is a simple hg up branchname.

If the default branch has changed, those changes can be merged into the pbranches with hg pmerge. To see which pbranches need updating, run the pgraph command with the --status option. When the patch is complete, hg pexport branchname generates the patch, built from all your commits, to be sent off to the mailing list.
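Put together, one feature cycle looks roughly like this (a sketch, not a transcript; `myfeature` is a hypothetical branch name and the pbranch extension must of course be enabled):

```shell
hg up default            # start from the main development branch
hg pnew myfeature        # create the patch branch
hg peditmessage          # note what the patch is supposed to solve

# hack, commit, repeat
hg commit -m "one stepping stone towards the feature"

hg pull                  # new changes arrived on default?
hg pgraph --status       # see which pbranches need updating
hg pmerge                # merge them into the patch branch

hg pexport myfeature > myfeature.patch   # the finished patch
```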

After messing about a bit, my workflow has solidified into the following. I have two repositories: a main one and a development one. Both are clones of the public Mercurial repository. The main one just reflects the changes in the public repository and I don't do any work on it directly.

Normally I like to keep all of the patches depending directly on the "default" branch, so that I can release a feature patch without having to depend on other work being completed first. Regularly pulling and merging new changes from the main development branch helps to keep the patches from having conflicts once they are finished.

The classic example of where this is not the case is when, while developing a feature, some refactoring is really needed to make the feature fit right into the code. When this happens I split the branch up into a refactor branch and a feature branch. However, most of the time the feature fits well into the code and this is not needed.

Once I've finished sculpting my patch (all the tests pass and the documentation is up to date), I make sure that my development repository has all the latest revisions of the main development branch. Once I'm sure, I run pexport and get the patch, collapsing all of the effort into one diff file. This is the patch I send to the developers for review.

If the patch needs more work to be accepted, I can continue working on it without any trouble, generating patches with hg pexport as often as I need until it does get accepted into the main development branch. If collaboration on a patch is needed, it's very easy to make the pbranch development repository available, either entirely or just that branch.

On some of the projects, I am the main developer. In those cases, when I have a patch ready, I switch back to the main development repository, run hg import and import my patch into the mainline. Without any options, import takes a patch file and commits it to the local copy of the repository. This is basically what any repository maintainer does when they receive a patch. Finally, all the project's unit tests should pass, of course. Visible in the main repository's history is one commit, containing all the changes in one. Nothing of the branch that was used for developing the patch is visible in the main repository.

By my own doing, or someone else's, a changeset with the patch is now in the main repository and the work of the pbranch is done. Because I use the same development repository for many patches, I found it useful to do some extra housekeeping steps. First, I pull the applied patch back into the development branch, so that the same changes exist twice: once as the pbranch and once as the pulled changeset. Next, I merge the pbranch with the changeset of the feature. This "links" the pbranch with the main changeset for the future. Then I remove the pbranch from .hg/pgraph, turning it into a normal named branch. It is always possible to add it to .hg/pgraph later, reviving it as a proper pbranch. Finally, when sufficient time has passed, I close the branch with hg commit --close-branch. All of this could be scripted with little effort, I just haven't felt the need for it yet.

The end result is that there are two publicly visible repositories. The traditional mainline repository, with clean changesets: this is the one other developers are encouraged to pull from and the continuous deployment server runs against. Then there's the development repository, with the branches and all the tedious steps of the creation of the patches, where it is possible to collaborate with others on specific issues or problems.

The conclusion is that, with little effort, Mercurial can go a long way towards supporting a specific workflow.

Managing changes on a win server: patch to zip script
[en] Scribbles software, sose, c3, jun 25, 2009
Situation: I work with a version control system for development. I roll out my changes daily (or more frequently) to the production system. Changes are always plain-text script files. However, the production system is a Windows box and no version control system can be installed on it. Windows doesn't come with the patch program.

Moving the whole directory over for each change is not an option.

So how to get the changes to the machine, and maintain some control?

Here's my solution:
  1. Get the version control system to publish the changeset as a diff and save it at patches/my_change.patch.
  2. Run patch2zip, which creates a zip with the same directory structure, containing all the changed files.
  3. Send the zip to the Windows box.
  4. On the Windows box, compare the directory with the zip, creating a back-up of each file before copying the new version from the zip over it.
It's ugly, manual and scales badly for changes that touch many files. However, nothing needs to be installed and it works well for small changes.

Download the patch2zip command here. It needs the zip command available in the path. Stand in the directory with the source files and point it at the patch file. It will generate the zip file next to the patch.
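For the curious, the core of steps 1 and 2 can be sketched in a few lines of Python. This is an illustration, not the actual patch2zip script: it assumes unified diffs and uses only the standard library.

```python
import re
import zipfile


def patch2zip(patch_path, zip_path):
    """Zip up the files touched by a unified diff, keeping directory structure."""
    changed = []
    with open(patch_path) as patch:
        for line in patch:
            # '+++ b/some/dir/file.ext' marks the post-change side of a file header
            match = re.match(r'^\+\+\+ (\S+)', line)
            if not match:
                continue
            path = match.group(1)
            if path == '/dev/null':        # the file was deleted; nothing to ship
                continue
            if path.startswith(('a/', 'b/')):
                path = path[2:]            # strip the conventional diff prefix
            changed.append(path)
    with zipfile.ZipFile(zip_path, 'w') as archive:
        for path in changed:               # paths are relative to the working dir
            archive.write(path)
    return changed
```

Run it from the directory with the source files, e.g. patch2zip('patches/my_change.patch', 'patches/my_change.zip').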

Opensource becoming OCP?
[en] Scribbles opensource, software, sose, apr 23, 2009

So, Sun is being bought by Oracle. Whether that's a good thing for the open source projects run by Sun remains to be seen. My guess is that MySQL might have become a lot more than it now ever will be.

One thing is sure: the projects are now fully part of the maneuvering of giant IT corporations, just like normal product lines are. They buy and sell companies based on the projects they start, support and promote.

Whatever happened to the cathedral and the bazaar?

What was the main selling point of Linux systems, back when they were just being laughed at... What was it... ohhh ya... It was that the system never ever went down, solid as a rock. If Linux went down, it was because it had faulty hardware, end of discussion.

And that was the big deal. It showed the competition for what it was: a nicely polished but profoundly broken product. And why was Linux like that? What magic made the Linux kernel so much superior?

The mantra of "Release early, release often". Seeing early on how the product performs in the cruel and unforgiving world outside the lab. And care for quality. Developers saw a kernel "oops" and took it personally. Their system should *not* do "oops", ever.

The difference was not that the code was available; it's the process that makes the difference.
In "traditional" open source, the priorities are set by people who are personally involved with it. Those who will suffer when the software suffers: the developers, administrators and users. They know that they will need to maintain, support and use the beasts they create and adopt.

Compare that with Open-Corporate-Projects. They are open source in name and license, and especially in marketing. But it does not go much further than that.

In an OCP, the choices on what to develop, what the design should be, which features to include in the next stable release, what is "good enough" to ship and how to handle security issues are corporate ones. They are often decided for internal reasons: politics, budgets, sales, marketing forecasts, market segmentation, bonuses and the competition.

All of which have only a very, very indirect bearing on the quality, usability or maintainability of the product.

Using a forum staffed by paid support is not a substitute for following the developer mailing list, where the features are discussed as they are being implemented. Having an issue tracker is no substitute for being part of the project's design as a peer. Phoning home at each start-up is not something an open source project would do. Nor should it try to detect a license.

An OCP will degenerate to the same state as closed software, with all its tendencies toward low quality, feature bloat and poor design... But there is premium support and someone-to-yell-at!

Nice polish on a broken process, soon leading to broken products.

More than just "access to the code", maybe we need to rescue respectful software. Respectful of resources and of the time of the people who need to work with it.

I guess I should go back to referring to free software. Opensource is too tainted by OCP now. :-/

Presentation of mercurial DVCS.
[en] Scribbles dvcs, mercurial, sose, c1, apr 22, 2009

I gave a presentation at Sourcesense NL about Distributed Version Control Systems (DVCS) in general and Mercurial in particular. The slides are available: dvcs-presentation.pdf

Bootstrapping Users in Alfresco
[en] Scribbles alfresco, development, sose, mrt 3, 2009

Just posted a decent fix for bootstrapping users in Alfresco. I posted it to the forums for better visibility by the Alfresco community.

I hope this helps someone avoid spending an inordinate amount of time on something that should have been properly documented in the first place!

Sans top 25 dangerous programming errors.
[en] Scribbles development, security, sose, feb 12, 2009

[I didn't blog about this earlier since I assumed most people would catch these headlines, but since I found a few intelligent people who hadn't heard about them, I'd rather help spread the word...]

A few weeks back the SANS institute published a compilation of the top 25 most dangerous errors developers can make.

It was compiled by a long list of corporate hot-shots, but it is mostly a good listing of proper common sense. Most code reviews would catch these things.... Hmmm... code reviews... now there's a novelty!...

Of course the big software pushers are going to try to sell you semi-automated tools for detecting+fixing the stuff.  Also expect new empty marketing BS like "SANS25 verified" on software. But such is the state of IT.

WSSE the authentication protocol that could have been...
[en] Scribbles authentication, sose, steward, wsse, jan 12, 2009

For the soon-to-be-released Steward 0.3 (*cough*) I implemented the part of the Atom Publishing Protocol relevant for publishing binary blobs; after all, that's what Steward is all about. With APP came implementing the WSSE authentication protocol. Web-Service Security Extension is an attempt to fix the problems with HTTP-basic and HTTP-digest authentication. However, it will never be globally implemented.

As we all know, HTTP-basic sends your user name and password in the clear over the wire. So unless the service is wrapped in TLS, it is not a good idea. The advantage of HTTP-basic is that every client since the dawn of time supports it, and supports it well.

Then there's HTTP-digest, which fixes some of the problems but has problems of its own. And honestly, I've never seen it implemented in earnest, since its implementation cost is higher than the marginal improvement in security.

Thus WSSE was born. On paper it looks pretty good. And it actually has a lot going for it:
  • No password going in the clear over the wire.
  • Uses SHA-1 as the hash algorithm.
  • Uses nonces (one-time random numbers), preventing replay attacks.
  • Contains a date, also helping mitigate replays.
  • It can be used without a round-trip to the server (fire and forget).
  • And last of all, it's simple to implement.
And that last part is important. If it has any hope of becoming a standard, droves of mediocre developers will have to implement both the server and the client.
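As a taste of that simplicity, here is a minimal sketch in Python of building the X-WSSE UsernameToken header as I understand the commonly circulated draft: the digest is Base64(SHA1(Nonce + Created + Password)), with the nonce taken as raw bytes. Encoding details vary between implementations (as I found out the hard way), so treat this as an illustration rather than a spec.

```python
import base64
import hashlib
import os
from datetime import datetime, timezone


def wsse_header(username, password):
    """Build an X-WSSE UsernameToken header value.

    PasswordDigest = Base64(SHA1(Nonce + Created + Password)).
    Note that the computation needs the clear-text password,
    on the client *and* on the server.
    """
    nonce = os.urandom(16)
    created = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    digest = hashlib.sha1(nonce + created.encode() + password.encode()).digest()
    return ('UsernameToken Username="%s", PasswordDigest="%s", '
            'Nonce="%s", Created="%s"'
            % (username, base64.b64encode(digest).decode(),
               base64.b64encode(nonce).decode(), created))
```

The server repeats the exact same computation to verify the digest, which is precisely why it needs the clear-text password.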

So what are the catches:
  • Server needs access to the clear-text password to authenticate.
  • There seems to be no published standard.
  • Uses Yet-another-HTTP-header.
But it's never going to be massively adopted because of the need for a clear-text password in the database on the server. If anything has stuck after 30 years of network security, it's that it is good practice to store only a hash of the password on the server. So LDAP stores it as MD5, SMD5, SHA1, SSHA1 or whatever your taste in hashes is. Even Active Directory does.

So all corporations have their databases of people, but no clear-text passwords for those accounts. And that means: no implementation. Also, if a provider does support it, that makes it a that much more interesting attack target, because the web server has access to the clear text.

And because there is no official spec, the Nokia software on my N95 of course implemented it in its own particular way. I had to implement a nonstandard way of encoding to make it work with the phone.

My guess is that for proper authentication, the client and the server would need to agree on which hash algorithm the password is stored with on the server. But then you need more round trips and more intelligence in the client; at least it would work in most scenarios, though.

WSSE is a step in the right direction; unfortunately, it is not wise to implement it in the real world. I guess that wrapping HTTP-basic in TLS is still the only proper way to do things.

Apache2 Sucks; Hail Nginx
[en] Scribbles apache, nginx, sose, nov 11, 2008

For a while I have been hating Apache's HTTPD. It is slow, it uses memory like crazy, it suffers from bloat, it doesn't encourage good practices, and in its standard real-world configuration it's one honking security problem... Honestly, Apache httpd is a pain. It's the sendmail of webservers.

Besides, with web 2.0 you need some _fast_ web infrastructure. One page element can now trigger many requests, and I'm not talking about static content. What's worse, when the user is waiting for a widget to respond, you don't have 300ms to get a page out the door. You have 30ms.

So for a while I have been experimenting with alternative httpds. And I am starting to really love Nginx. It's lightweight, it's FAST (10K req/sec compliant), it takes advantage of a modern OS, and the config is really clean.

The nice thing about Nginx is that it doesn't try to be more than a great frontline HTTP server. It doesn't do CGI (and that's a feature!), you cannot embed PHP, Perl, Python or whatever inside it (and you shouldn't want to either). About the only thing Nginx serves by itself is static files.

When I want to do something dynamic, Nginx just passes the request on to an application server via FastCGI, or simply proxies it. Nginx does the compression, the HTTP/1.1 keep-alives, the TLS (SSL), HTTP-level authentication, everything. Nginx sends an HTTP/1.0 request to the application server, which responds and can forget about it.

Now that I have switched to Nginx, I have started to think differently about webspace. Nginx makes it natural to compose a site by hooking up different services, each responding to its own URL space, behind the same Nginx engine. So you start to compose a site. Now I have a bunch of light (and simple) processes, each dedicated to one task, hanging behind their own URL, receiving requests from Nginx. It's like it should be.
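A sketch of what such a composed setup looks like in the config (hypothetical paths, ports and server name; not a complete configuration):

```nginx
server {
    listen 80;
    server_name example.org;

    gzip on;                              # nginx handles the compression

    # the only thing nginx serves itself: static files
    location /static/ {
        root /var/www/example;
    }

    # each service hangs behind its own URL space
    location /blog/ {
        proxy_pass http://127.0.0.1:8001;
    }
    location /shop/ {
        proxy_pass http://127.0.0.1:8002;
        proxy_set_header Host $host;
    }
}
```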

Nginx made setting up URLspace fun again! :)

If a service is not monitored, it's DOWN
By Marijn Vriens (inspired by Bruce Ecken)