Patterns for Managing Source Code Branches

source link: https://martinfowler.com/articles/branching-patterns.html

Source code is a vital asset to any software development team, and over the decades a set of source code management tools has been developed to keep code in shape. These tools allow changes to be tracked, so we can recreate previous versions of the software and see how it develops over time. These tools are also central to the coordination of a team of multiple programmers, all working on a common codebase. By recording the changes each developer makes, these systems can keep track of many lines of work at once, and help developers work out how to merge these lines of work together.

This division of development into lines of work that split and merge is central to the workflow of software development teams, and several patterns have evolved to help us keep a handle on all this activity. Like most software patterns, few of them are gold standards that all teams should follow. Software development workflow is very dependent on context, in particular the social structure of the team and the other practices that the team follows.

My task in this article is to discuss these patterns. I do so in a single article, interspersing the pattern explanations with narrative sections that better explain their context and the interrelationships between them. To help make it easier to distinguish them, I've identified the pattern sections with the "✣" dingbat.

Base Patterns

In thinking about these patterns, I find it useful to develop two main categories. One group looks at integration: how multiple developers combine their work into a coherent whole. The other looks at the path to production: using branching to help manage the route from an integrated code base to a product running in production. Some patterns underpin both of these, and I'll tackle these now as the base patterns. That leaves a couple of patterns that are neither fundamental nor fit into the two main groups, so I'll leave those till the end.

Source Branching

Create a copy and record all changes to that copy.

If several people work on the same code base, it quickly becomes impossible for them to work on the same files. If I want to run a compile, and my colleague is in the middle of typing an expression, then the compile will fail. We would have to holler at each other: "I'm compiling, don't change anything". Even with two people this would be difficult to sustain; with a larger team it would be incomprehensible.

The simple answer to this is for each developer to take a copy of the code base. Now we can easily work on our own features, but a new problem arises: how do we merge our two copies back together again when we're done?

A source code control system makes this process much easier. The key is that it records every change made to each branch as a commit. Not only does this ensure nobody forgets the little change they made to utils.java, recording changes also makes it easier to perform the merge, particularly when several people have changed the same file.
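One reason recorded history helps with merging is that it supplies the common ancestor of two lines of work. Tools such as git perform a three-way merge against that ancestor (the merge base), so they can tell which side changed each line. The following is a minimal Python sketch of the idea, assuming that edits only replace lines and never insert or delete them; real tools use diff algorithms that handle those cases too. The function name `merge3` and the conflict marker format are hypothetical.

```python
def merge3(base: list[str], ours: list[str], theirs: list[str]) -> tuple[list[str], list[int]]:
    """Toy line-by-line three-way merge.

    Assumes all three versions have the same number of lines (edits only
    replace lines). Returns the merged lines and the indices of conflicts.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles line-for-line edits")
    merged: list[str] = []
    conflicts: list[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree, or neither changed the line
            merged.append(o)
        elif o == b:      # only "theirs" changed this line: take it
            merged.append(t)
        elif t == b:      # only "ours" changed this line: take it
            merged.append(o)
        else:             # both changed it differently: a human must decide
            merged.append(f"<<<<<<< {o} ======= {t} >>>>>>>")
            conflicts.append(i)
    return merged, conflicts


base = ["a = 1", "b = 2", "c = 3"]
scarlett = ["a = 1", "b = 20", "c = 3"]   # Scarlett edits line 2
violet = ["a = 1", "b = 2", "c = 30"]     # Violet edits line 3

print(merge3(base, scarlett, violet))
# (['a = 1', 'b = 20', 'c = 30'], [])
```

Without `base`, a tool comparing `b = 20` with `b = 2` could not tell whether Scarlett changed the line or Violet did; the recorded ancestor is what turns that ambiguity into a clean merge.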

This leads me to the definition of branch that I'll use for this article. I define a branch as a particular sequence of commits to the code base. The head, or tip, of a branch is the latest commit in that sequence.


That's the noun, but there's also the verb, "to branch". By this I mean creating a new branch, which we can also think of as splitting the original branch into two. Branches merge when commits from one branch are applied to another.
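To make these definitions concrete, here is a toy model in Python, a sketch under simplifying assumptions rather than how any real tool stores its data. Each commit records its parent commits; a branch is the sequence reached by walking back from its head; branching means two commits share a parent; merging means one commit has two parents. The names `Commit` and `history` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True, eq=False)  # eq=False: commits are compared by identity
class Commit:
    """A recorded change, linked back to the commit(s) it was built on."""
    message: str
    parents: tuple[Commit, ...] = ()


def history(head: Commit) -> list[str]:
    """Return the messages of every commit reachable from head, oldest first.

    Each ancestor is listed once, even if reached through several parents.
    """
    seen: set[Commit] = set()
    order: list[str] = []

    def visit(c: Commit) -> None:
        if c in seen:
            return
        seen.add(c)
        for p in c.parents:
            visit(p)
        order.append(c.message)

    visit(head)
    return order


root = Commit("initial")
a1 = Commit("scarlett: edit utils", (root,))  # to branch: two commits share root
b1 = Commit("violet: add feature", (root,))
merged = Commit("merge", (a1, b1))            # to merge: one commit, two parents

print(history(a1))      # ['initial', 'scarlett: edit utils']
print(history(merged))  # ['initial', 'scarlett: edit utils', 'violet: add feature', 'merge']
```

In this model the head is just the last commit of the sequence, which matches the definition above: a branch is identified by where it ends.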


The definitions I'm using for "branch" correspond to how I observe most developers talking about them. But source code control systems tend to use "branch" in a more particular way.

I can illustrate this with a common situation in a modern development team that's holding their source code in a shared git repository. One developer, Scarlett, needs to make a few changes, so she clones that git repository and checks out the master branch. She makes a couple of changes, committing them back into her master. Meanwhile, another developer, let's call her Violet, clones the repository onto her laptop and checks out the master branch. Are Scarlett and Violet working on the same branch or a different one? They are both working on "master", but their commits are independent of each other and will need to be merged when they push their changes back to the shared repository. What happens if Scarlett decides she's not sure about the changes that she's made, so she tags the last commit and resets her master branch to origin/master (the last commit she cloned from the shared repository)?


According to the definition of branch I gave earlier, Scarlett and Violet are working on separate branches, both separate from each other, and separate from the master branch on the shared repository. When Scarlett puts aside her work with a tag, it's still a branch according to my definition (and she may well think of it as a branch), but in git's parlance it's a tagged line of code.

This terminological confusion gets worse when we run into different version control systems as they all have their own definitions of what constitutes a branch. A branch in Mercurial is quite different to a branch in git, which is closer to Mercurial's bookmark. Mercurial can also branch with unnamed heads and Mercurial folks often branch by cloning repositories.

All of this terminological confusion leads some to avoid the term. A more generic term that's useful here is codeline. I define a codeline as a particular sequence of versions of the code base. It can end in a tag, be a branch, or be lost in git's reflog. You'll notice an intense similarity between my definitions of branch and codeline. Codeline is in many ways the more useful term, and I do use it, but it's not as widely used in practice. So for this article, unless I'm in the particular context of git (or another tool's) terminology, I'll use branch and codeline interchangeably.

A consequence of this definition is that, whatever version control system you're using, every developer has at least one personal codeline on the working copy on their own machine as soon as they make local changes. If I clone a project's git repo, check out master, and update some files, that's a new codeline even before I commit anything. Similarly, if I make my own working copy of the trunk of a Subversion repository, that working copy is its own codeline, even if there's no Subversion branch involved.

When to use it

An old joke says that if you fall off a tall building, the falling isn't going to hurt you, but the landing will. So with source code: branching is easy, merging is harder.

Source control systems that record every change as a commit do make the process of merging easier, but they don't make it trivial. If Scarlett and Violet both change the name of a variable, but to different names, then there's a conflict that the source management system cannot resolve without human intervention. At least this kind of textual conflict is something the source code control system can spot, alerting the humans to take a look. More awkwardly, conflicts often appear where the text merges without a problem, but the system still doesn't work. Imagine Scarlett changes the name of a function, and Violet adds some code to her branch that calls this function under its old name. This is what I call a Semantic Conflict. When these kinds of conflicts happen the system may fail to build, or it may build but fail at run-time.
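The rename scenario can be reproduced in a few lines. This is a hypothetical Python sketch: the function names `total` and `sum_prices` and the helper `run` are invented for illustration. Scarlett's branch renames a function; Violet's branch appends a call that uses the old name. Because the two edits touch different lines, a textual merge applies both without complaint, yet the merged program fails at run-time.

```python
# Common ancestor: a function and one caller.
BASE = "def total(prices):\n    return sum(prices)\n\nresult = total([1, 2])\n"

# Scarlett renames total -> sum_prices, updating the definition and its caller.
SCARLETT = "def sum_prices(prices):\n    return sum(prices)\n\nresult = sum_prices([1, 2])\n"

# Violet, still on the old version, appends a new call using the old name.
VIOLET = BASE + "extra = total([3, 4])\n"

# The edits touch disjoint lines, so a textual merge takes Scarlett's lines
# plus Violet's appended line, with no conflict reported.
MERGED = SCARLETT + "extra = total([3, 4])\n"


def run(source: str) -> str:
    """Execute a version of the code in a fresh namespace and report the outcome."""
    try:
        exec(compile(source, "<version>", "exec"), {})
        return "ok"
    except Exception as err:
        return f"fails: {type(err).__name__}: {err}"


for name, src in [("base", BASE), ("scarlett", SCARLETT),
                  ("violet", VIOLET), ("merged", MERGED)]:
    print(name, run(src))
# base, scarlett and violet each run fine; merged fails with a NameError for 'total'
```

Each branch is healthy on its own; the failure exists only in the combination, which is why no textual merge algorithm can catch it and why building and testing the merged result matters.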

The problem is familiar to anyone who has worked with concurrent or distributed computing. We have some shared state (the code base) with developers making updates in parallel. We need to somehow combine these by serializing the changes into some consensus update. Our task is made more complicated by the fact that getting a system to execute and run correctly implies very complex validity criteria for that shared state. There's no way of creating a deterministic algorithm to find consensus. Humans need to find the consensus, and that consensus may involve mixing choice parts of different updates. Often consensus can only be reached with original updates to resolve the conflicts.

I start with: "what if there was no branching". Everybody would be editing the live code, half-baked changes would bork the system, people would be stepping all over each other. And so we give individuals the illusion of frozen time, that they are the only ones changing the system and those changes can wait until they are fully baked before risking the system. But this is an illusion and eventually the price for it comes due. Who pays? When? How much? That's what these patterns are discussing: alternatives for paying the piper.
-- Kent Beck

Hence the rest of this article, where I lay out various patterns that support the pleasant isolation and the rush of wind through your hair as you fall, while minimizing the consequences of the inevitable contact with the hard ground.

I'm releasing this article in installments. I expect future installments to include the patterns: Mainline, Healthy Branch, Mainline Integration, Feature Branching, Continuous Integration, Reviewed Commits, Release Branch, Hotfix Branch, Production Branch, Release Train, Experimental Branch, and Future Branch.

To find out when I publish the next installment, subscribe to the site's RSS feed or my Twitter stream.

