PROBLEM SOLVING ON UNIX/LINUX SYSTEMS: Why You Should Switch from Subversion to Git

You may have heard some hubbub over distributed version control systems recently. You may dismiss it as the next hot thing, the newest flavor of kool-aid currently quenching the collective thirst of the bandwagon jumpers. You, however, have been using Subversion quite happily for some time now. It has treated you pretty well, you know it just fine and you are comfortable with it – I mean, it’s just version control, right?

You may want to give it a second look. Not just at distributed version control systems, but at the real role of version control in your creative toolkit. In this article, I’m going to introduce you to Git, my favorite DVCS, and hopefully show you why it is not only a better version control system than Subversion, but also a revolutionary way to think about how you get your work done.

Now, this isn’t really a how-to on Git – I won’t be going over a lot of specific commands or get you up and running. This is a list of arguments on why you should be seriously considering Git if you’re currently using SVN. To learn Git, there is a free online book called Pro Git that I wrote that will walk you through Git step by step, should this article entice you. For each point I make here, I will be linking to the appropriate section of that book, should you want to find out more about that specific feature of Git.So, first we’re going to look at the inherent advantages of distributed systems over centralized ones. These are things that systems like Subversion simply cannot do. Then we’ll cover the powerful context switching and file crafting tools that are technically possible to do with Subversion, but which Git makes easy enough that you would actually use them. These tools should completely change the way you work and the way you think about working.

The Advantages of Being Distributed

Git is a distributed version control system. So what does “distributed” actually mean? Well it means that instead of running `svn checkout (url)` to get the latest version of your repository, with Git you run `git clone (url)`, which gives you a complete copy of the entire history of that project. This means that immediately after the clone, there is basically no information about that project that the server you cloned from has that you do not have. Interertingly, Subversion is so inefficient at this that in general it’s nearly as fast to clone an entire repository over Git as it is to checkout a single version of the same repository over Subversion.

Now, this gives you a couple of immediate advantages. One is that nearly every operation is now done off data on your local disk, meaning that it is both unbeliveably fast and can be done offline. This means that you can do commits, diffs, logs, branches, merges, file annotation and more – entirely offline, off VPN and generally instantly. Most commands you run in Git take longer to type then they do to execute. Now stop for a moment and try to remember how many times you’ve gone to get a cup of coffee while Subversion has been running some command. Or jot down a quick list of occasions on which you’ve wanted to commit but didn’t have an internet connection or couldn’t connect to your corporate VPN.

The other implicit advantage of this model is that your workflow does not have a single point of failure. Since every person working on your project has what is essentially a full backup of the project data, losing your collaboration servers is a minor inconvenience at best. Imagine for a moment your SVN server having a hard drive corruption – when was your last backup and how many hours will it take to get to the point where your team can start working again? In Git, any team member can push to any server where every member has SSH access and the whole team can be easily up and running in a matter of minutes.

The final advantage I’ll cover of distributed systems are the incredible workflows that are now available to you. Git does not depend on a centralized server, but does have the ability to syncronize with other Git repositories – to push and pull changes between them. This means that you can add multiple remote repositories to your project, some read-only and some possibly with write access as well, meaning you can have nearly any type of workflow you can think of.

You can continue to use a centralized workflow, with one central server that everyone pushes to and pulls from. However, you can also do more interesting things. For example, you can have a remote repository for each user or sub-team in your group that they have write access to, then a designated maintainer or QA team or integrator can then pull their work together and push it to a ‘gold’ repository that is deployed from.

An example distributed workflow involving several Git repositories

You can build any sort of heirarchical or peer-based workflow model with Git that you can think of, in addition to being able to use it as a centralized hub as in SVN. Your workflow can grow and adapt with your business model.

You can also use it in other ways – an interesting example of this is deploying on the Ruby hosting company Heroku. To deploy to their systems, you simply push to your ‘heroku’ remote repository. You can develop and collaborate on other remote repositories, but then when you actually want to deploy your code to running servers, you push to the Heroku Git repository instead. Imagine trying to do that with Subversion.

Lightweight Branches: Frictionless Context Switching

Before I begin explaining this, which is actually my favorite feature of Git, I need you to do me a favor. Forget everthing you know about branches. Your knowledge of what a ‘branch’ means in Subversion is poisonous, especially if you internalized it pre-1.5, like I did, before Subversion finally grew some basic merge tracking capabilities. Forget how painful it was to merge, forget how long it took to switch branches, forget how impossible it was to merge from a branch more than once – Git gives you a whole new world when it comes to branching and merging.

In Git, branches are not a dirty word – they are used often and merged often, in many cases developers will create one for each feature they are working on and merge between them possibly multiple times a day, and it’s generally painless. This is what hooked me on Git in the first place, and in fact has changed the entire way I approach my development.

When you create a branch in Git, it does so locally and it happens very fast. Here is an example of creating a branch and then switching to your new branch to start doing development.

$ time git branch myidea real 0m0.009s user 0m0.002s sys 0m0.005s $ time git checkout myidea Switched to branch "myidea" real 0m0.298s user 0m0.004s sys 0m0.017s

It took about a third of a second for both commands together. Think for a second about the equivalent in Subversion – running a `copy` and then a `switch`

$ time svn copy -m 'my idea' real 0m5.172s user 0m0.033s sys 0m0.016s $ time svn switch real 0m8.404s user 0m0.153s sys 0m0.835s

Now the difference between 1/3 of a second and 13 seconds (not to mention the time it takes to remember each long URL) may not seem huge at first, but there is a significant psychological difference there. Add to that the fact that your network speed, server load and connectivity status are all factors in Subversion, where it always takes 1/3 of a second in Git and that makes a pretty big difference. Also, branching is considered a fast operation in Subversion – you will see even more pronounced speed differences in other common operations like log and diff.

However, that is not the real power of Git branches. The real power is how you use them, the raw speed and ease of the commands just makes it more likely that you will. In Git, a common use case is to create a new local branch for everything you work on. Each feature, each idea, each bugfix – you can easily create a new branch quickly, do a few commits on that branch and then either merge it into your mainline work or throw it away. You don’t have to mess up the mainline just to save your experimental ideas, you don’t have to be online to do it and most importantly, you can context switch almost instantly.

Now, once you have work on a couple of branches, what about merging? If you’re from the world of Subversion, you may cringe at that word, ‘merge’. Since Git records your commit history as a directed graph of commits, it’s generally easy for it to automatically figure out the best merge base to do a 3 way merge with. Most Subversion users are used to having to figure that out manually, which is an error prone and time consuming process – Git makes it trivial. Furthermore, you can merge from the same branch multiple times and not have to resolve the same conflicts over and over again. I often do dozens of merges a day on certain Git projects of mine and rarely have even trivial merge conflicts – certainly nothing that isn’t predictable. Raise your hand if you’ve ever done a dozen branch merges on a Subversion project at least once a week and didn’t end each day by drinking heavily.

As an anecdotal case study, take my Pro Git book. I put the Markdown source of the book on GitHub, the social code hosting site that I work for. Within a few days, I started getting dozens of people forking my project and contributing

copy edits, errata fixes and even translations. In Git, each of these forks is treated as a branch which I could pull down and merge individually. I spend a few minutes once or twice a week to pull down all the work that has happened, inspect each branch and merge the approved ones into my mainline.

The Network Graph for the Pro Git project

As of the time of writing this article, I’ve done 34 merges in about 2 weeks – I sit down in the morning and merge in all the branches that look good. As an example, during the last merge session I inspected and merged 5 seperate branches in 13 minutes. Once again, I will leave it as an exercise to the reader to contemplate how that would have gone in Subversion.

Becoming a Code Artist

You get home on Friday after a long week of working. While sitting in your bean bag chair drinking a beer and eating Cheetos you have a mind blowing idea. So, you whip out your laptop and proceed to work on your great idea the entire weekend, touching half the files in your project and making the entire thing 87 times more amazing. Now you get into work and connect to the VPN and can finally commit. The question now is what do you do? One great big honking commit? What are your other options?

In Git, this is not a problem. Git has a feature that is pretty unique called a “staging area”, meaning you can craft each commit at the very last minute, making it easy to turn your weekend of frenzied work into a series of well thought out, logically separate changesets. If you’ve edited a bunch of files and you want to create several commits of just a few files each, you simply have to stage just the ones you want before you commit and repeat that a few times.

$ git add file1.c file2.c file3.c $ git commit -m 'files 1-3 for feature A' $ git add file4.c file5.c file6.c $ git commit -m 'files 4-6 for feature B'

This allows other people trying to figure out what you’ve done to more easily peer-review your work. If you’ve changed three logically different things in your project, you can commit them as three different reviewable changesets as late as possible.

Not only that, which is pretty powerful in itself, but Git also makes it easy to stage parts of files. This is a feature that has prevented coworkercide in my professional past. If someone has changed 100 lines of a file, where 96 of them were whitespace and comment formatting modifications, while the remaining 4 were significant business logic changes, peer-reviewing that if committed as one change is a nightmare. Being able to stage the whitespace changes in one commit with an appropriate message, then staging and committing the business logic changes seperately is a life saver (literally, it may save your life from your peers). To do this, you can use Git’s patch staging feature that asks you if you want to stage the changes to a file one hunk at a time (git add -p).

These tools allow you to craft your commits to be easily reviewable, cherry-pickable, logically seperate changes to your project. The advantages to thinking of your project history this way and having the tools to easily maintain that discipline without having to carefully plan out every commit more than a few seconds before you need to create them gives you a freedom and flexibility that is very empowering.

In Subversion the only real way to accomplish the same thing is with a complicated system of diffing to temporary files, reverting and partially applying those temporary files again. Raise your hand if you’ve ever actually taken the time to do that and if you would consider the process ‘easy’ in any way. Git users often do this type of operation on a daily basis and you need nothing outside of Git itself to accomplish it.

Not Just for Teams of Coders

I hear from individuals all the time that this could not possibly be worth switching because they don’t work in large teams or don’t collaborate with other people at all. Or perhaps you’re not really a programmer, but a designer or a writer.

Well, on the individual versus a team front, I would argue that nearly everything I love about Git, much of which I’ve written about here, I love because it helps me, not because it helps my teammates. Screw them.

Local branching and frictionless context switching is entirely useful to an individual and probably the most unique and revolutionary feature of Git. In fact, I very often use Git like you might use RCS – just fire it up on some local directory and check stuff in every once in a while, having no remote repositories at all. Creating commits as logically seperate changesets is also helpful to you to remember why you did something a month ago, so those tools are also helpful on an individual level and finally, speed and backups are always a good thing, team or individual.

If you’re not really a software developer, I’ve already listed an example of using Git to collaborate on a book. Pro Git is being published by Apress, a major publishing company, and most of the writing and review of the book was done in Markdown using Git to collaborate. All the errata and translations are being handled in Git branches. You don’t know real writing bliss until you merge in a technical reviewers or copy editors modifications with something as simple as `git merge`.

In Closing…

In closing, this is really just the tip of the iceburg of awesome that is Git. There are tons of fantastic and powerful features in Git that help with debugging, complex diffing and merging and more. There is also a great developer community to tap into and become a part of and a number of really good free resources online to help you learn and use Git. The few things I’ve mentioned here are simply the features that most changed the way I think about working and version control. They are the major reasons I could never go back to a system like Subversion. It wouldn’t be like saying to me “you have to use a Toyota instead of a Mercedes”, it would be like saying “you have to use a typewriter instead of a computer” – it has forever changed the way I approach and think about creating things.

I want to share with you the concept that you can think about version control not as a neccesary inconvenience that you need to put up with in order to collaborate, but rather as a powerful framework for managing your work seperately in contexts, for being able to switch and merge between those contexts quickly and easily, for being able to make decisions late and craft your work without having to pre-plan everything all the time. Git makes all of these things easy and prioritizes them and should change the way you think about how to approach a problem in any of your projects and version control itself.

This article was written by Scott Chacon

PROBLEM SOLVING ON UNIX/LINUX SYSTEMS

2015-10-18

Why You Should Switch from Subversion to Git

The Advantages of Being Distributed

Lightweight Branches: Frictionless Context Switching

Becoming a Code Artist

Not Just for Teams of Coders

In Closing…

No comments:

Post a Comment