You may have heard some hubbub over distributed version control
systems recently. You may dismiss it as the next hot thing, the newest
flavor of kool-aid currently quenching the collective thirst of the
bandwagon jumpers. You, however, have been using Subversion quite
happily for some time now. It has treated you pretty well, you know it
just fine and you are comfortable with it – I mean, it’s just version
control, right?
You may want to give it a second look. Not just at distributed
version control systems, but at the real role of version control in your
creative toolkit. In this article, I’m going to introduce you to Git,
my favorite DVCS, and hopefully show you why it is not only a better
version control system than Subversion, but also a revolutionary way to
think about how you get your work done.
Now, this isn’t really a how-to on Git – I won’t be going over a lot of specific commands or getting you up and running.
This is a list of arguments on why you should be seriously considering
Git if you’re currently using SVN. To learn Git, there is a free online
book called Pro Git
that I wrote that will walk you through Git step by step, should this
article entice you. For each point I make here, I will be linking to the
appropriate section of that book, should you want to find out more
about that specific feature of Git.
So, first
we’re going to look at the inherent advantages of distributed systems
over centralized ones. These are things that systems like Subversion
simply cannot do. Then we’ll cover the powerful context switching and
file crafting tools that are technically possible to do with Subversion,
but which Git makes easy enough that you would actually use them. These
tools should completely change the way you work and the way you think
about working.
The Advantages of Being Distributed
Git is a distributed version control system. So what does
“distributed” actually mean? Well it means that instead of running `svn
checkout (url)` to get the latest version of your repository, with Git
you run `git clone (url)`, which gives you a complete copy of the entire
history of that project. This means that immediately after the clone,
there is basically no information about that project that the server you
cloned from has that you do not have. Interestingly, Subversion is so
inefficient at this that in general it’s nearly as fast to clone an entire repository over Git as it is to check out a single version of the same repository over Subversion.
Now, this gives you a couple of immediate advantages. One is that
nearly every operation is now done off data on your local disk, meaning
that it is both unbelievably fast and can be done offline. This means
that you can do commits, diffs, logs, branches, merges, file annotation
and more – entirely offline, off VPN and generally instantly. Most
commands you run in Git take longer to type than they do to execute. Now
stop for a moment and try to remember how many times you’ve gone to get
a cup of coffee while Subversion has been running some command. Or jot
down a quick list of occasions on which you’ve wanted to commit but
didn’t have an internet connection or couldn’t connect to your corporate
VPN.
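If you want to see this for yourself, here is a minimal sketch. It assumes a local directory standing in for the remote server, and the names in it (`server`, `laptop`, `notes.txt`, the throwaway identity) are made up for illustration. Once the clone exists, the "server" is moved out of reach and the history commands still answer from local disk.

```shell
set -e
work=$(mktemp -d)   # scratch directory for the demo
cd "$work"

# Stand-in for the remote server: a repository with two commits.
git init -q server
cd server
git config user.email "you@example.com"   # throwaway identity, assumed
git config user.name "Example User"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "second draft" > notes.txt
git commit -q -am "second commit"
cd ..

# Clone it, then make the "server" unreachable.
git clone -q server laptop
mv server server.unreachable

# Everything below is answered from the clone's own .git directory.
cd laptop
git log --oneline
git diff HEAD~1 HEAD
git branch offline-idea
commit_count=$(git rev-list --count HEAD)
echo "commits available offline: $commit_count"
```

The clone carries both commits, so log, diff and branch creation never need to reach the original repository.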
The other implicit advantage of this model is that your workflow does
not have a single point of failure. Since every person working on your
project has what is essentially a full backup of the project data,
losing your collaboration servers is a minor inconvenience at best.
Imagine for a moment your SVN server having a hard drive corruption –
when was your last backup and how many hours will it take to get to the
point where your team can start working again? In Git, any team member
can push to any server where every member has SSH access and the whole
team can be easily up and running in a matter of minutes.
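As a rough sketch of that recovery, assuming the replacement server is just an empty bare repository (a local path here, where a real team would more likely use an SSH URL), any existing clone can repopulate it. The names `teammate` and `new-server.git` are hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A team member's existing clone (built locally here for illustration).
git init -q teammate
cd teammate
git config user.email "you@example.com"   # throwaway identity, assumed
git config user.name "Example User"
echo "project data" > README
git add README
git commit -q -m "project history"
branch=$(git symbolic-ref --short HEAD)

# The replacement server: an empty bare repository.
git init -q --bare "$work/new-server.git"

# Point the clone at the replacement and push every branch and tag.
git remote add origin "$work/new-server.git"
git push -q origin --all
git push -q origin --tags

restored=$(git --git-dir="$work/new-server.git" rev-list --count "$branch")
echo "commits restored on the new server: $restored"
```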
The final advantage of distributed systems that I’ll cover is the incredible workflows
that are now available to you. Git does not depend on a centralized
server, but does have the ability to synchronize with other Git
repositories – to push and pull changes between them. This means that
you can add multiple remote repositories to your project, some read-only
and some possibly with write access as well, meaning you can have
nearly any type of workflow you can think of.
You can continue to use a centralized workflow, with one
central server that everyone pushes to and pulls from. However, you can
also do more interesting things. For example, you can have a remote
repository for each user or sub-team in your group that they have write
access to, then a designated maintainer or QA team or integrator can
then pull their work together and push it to a ‘gold’ repository that is
deployed from.
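Here is one way that integrator workflow might look, sketched with local directories in place of real servers. The contributor names (`alice`, `bob`), the `gold.git` repository and the `set_identity` helper are all assumptions made for the example, not part of any standard setup.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: give the current repository a throwaway identity.
set_identity() {
  git config user.email "you@example.com"
  git config user.name "Example User"
}

# Shared starting point, owned by the integrator.
git init -q integrator
cd integrator
set_identity
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
mainline=$(git symbolic-ref --short HEAD)
cd ..

# Two contributors, each with a repository of their own.
for who in alice bob; do
  git clone -q integrator "$who"
  (
    cd "$who"
    set_identity
    echo "work by $who" > "$who.txt"
    git add "$who.txt"
    git commit -q -m "feature from $who"
  )
done

# The 'gold' repository that deployments come from.
git init -q --bare gold.git

# The integrator pulls everyone's work together and publishes the result.
cd integrator
git remote add alice "$work/alice"
git remote add bob "$work/bob"
git remote add gold "$work/gold.git"
git fetch -q alice
git fetch -q bob
git merge -q -m "merge alice" "alice/$mainline"
git merge -q -m "merge bob" "bob/$mainline"
git push -q gold "$mainline"
```

Nothing here is special to three repositories. Each additional contributor or sub-team is one more `git remote add` on the integrator's side.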
You can build any sort of hierarchical or peer-based workflow model
with Git that you can think of, in addition to being able to use it as a
centralized hub as in SVN. Your workflow can grow and adapt with your
business model.
You can also use it in other ways – an interesting example of this is deploying on the Ruby hosting company Heroku.
To deploy to their systems, you simply push to your ‘heroku’ remote
repository. You can develop and collaborate on other remote
repositories, but then when you actually want to deploy your code to
running servers, you push to the Heroku Git repository instead. Imagine trying to do that with Subversion.
Lightweight Branches: Frictionless Context Switching
Before I begin explaining this, which is actually my favorite feature
of Git, I need you to do me a favor. Forget everything you know about
branches. Your knowledge of what a ‘branch’ means in Subversion is
poisonous, especially if you internalized it pre-1.5, like I did, before
Subversion finally grew some basic merge tracking capabilities. Forget
how painful it was to merge, forget how long it took to switch branches,
forget how impossible it was to merge from a branch more than once –
Git gives you a whole new world when it comes to branching and merging.
In Git, branches are not a dirty word – they are used often and
merged often. In many cases developers will create one for each feature
they are working on and merge between them, possibly multiple times a
day, and it’s generally painless. This is what hooked me on Git in the
first place, and in fact has changed the entire way I approach my
development.
When you create a branch in Git, it does so locally and it happens
very fast. Here is an example of creating a branch and then switching to
your new branch to start doing development.
    $ time git branch myidea
    real 0m0.009s
    user 0m0.002s
    sys 0m0.005s
    $ time git checkout myidea
    Switched to branch "myidea"
    real 0m0.298s
    user 0m0.004s
    sys 0m0.017s
It took about a third of a second for both commands together. Think
for a second about the equivalent in Subversion – running a `copy` and
then a `switch`:
    $ time svn copy -m 'my idea'
    real 0m5.172s
    user 0m0.033s
    sys 0m0.016s
    $ time svn switch
    real 0m8.404s
    user 0m0.153s
    sys 0m0.835s
Now the difference between 1/3 of a second and 13 seconds (not to
mention the time it takes to remember each long URL) may not seem huge
at first, but there is a significant psychological difference there. Add
to that the fact that your network speed, server load and connectivity
status are all factors in Subversion, whereas in Git it takes about 1/3 of a second every time, and that makes a pretty big difference. Also, branching is considered a fast operation in Subversion – you will see even more pronounced speed differences in other common operations like log and diff.
However, that is not the real power of Git branches. The real power
is how you use them; the raw speed and ease of the commands just make
it more likely that you will. In Git, a common use case is to create a
new local branch for everything you work on. Each feature, each
idea, each bugfix – you can easily create a new branch quickly, do a
few commits on that branch and then either merge it into your mainline
work or throw it away. You don’t have to mess up the mainline just to
save your experimental ideas, you don’t have to be online to do it and
most importantly, you can context switch almost instantly.
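To make that concrete, here is a small sketch of both outcomes: one topic branch that gets merged and one that gets thrown away. The branch names (`good-idea`, `bad-idea`) and the file are invented for the example, and the mainline branch name is read from the repository rather than assumed.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git config user.email "you@example.com"   # throwaway identity, assumed
git config user.name "Example User"
echo "stable" > app.txt
git add app.txt
git commit -q -m "stable mainline"
mainline=$(git symbolic-ref --short HEAD)

# An idea worth keeping: branch, commit, merge back, clean up.
git checkout -q -b good-idea
echo "improvement" >> app.txt
git commit -q -am "try an improvement"
git checkout -q "$mainline"
git merge -q good-idea
git branch -d good-idea

# An idea that did not pan out: branch, commit, throw it away.
git checkout -q -b bad-idea
echo "regret" >> app.txt
git commit -q -am "experiment"
git checkout -q "$mainline"
git branch -D bad-idea

cat app.txt
```

The mainline ends up with the improvement and no trace of the abandoned experiment, and none of it touched a network.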
Now, once you have work on a couple of branches, what about merging?
If you’re from the world of Subversion, you may cringe at that word,
‘merge’. Since Git records your commit history as a directed graph of
commits, it’s generally easy for it to automatically figure out the best
merge base to do a 3 way merge with. Most Subversion users are used to
having to figure that out manually, which is an error prone and time
consuming process – Git makes it trivial. Furthermore, you can merge
from the same branch multiple times and not have to resolve the same
conflicts over and over again. I often do dozens of merges a day on
certain Git projects of mine and rarely have even trivial merge
conflicts – certainly nothing that isn’t predictable. Raise your hand if
you’ve ever done a dozen branch merges on a Subversion project at least
once a week and didn’t end each day by drinking heavily.
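A minimal sketch of merging from the same branch twice, under the assumption of a single `feature` branch and made-up file names. The point to watch is the merge base before the second merge: Git reads it off the commit graph, and it turns out to be the feature commit that was already merged, so only the new work is considered.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git config user.email "you@example.com"   # throwaway identity, assumed
git config user.name "Example User"
echo "start" > main.txt
git add main.txt
git commit -q -m "initial commit"
mainline=$(git symbolic-ref --short HEAD)

# First round of work on the feature branch.
git checkout -q -b feature
echo "part one" > feature.txt
git add feature.txt
git commit -q -m "feature, part one"

# Meanwhile the mainline moves on, so this is a true 3-way merge.
git checkout -q "$mainline"
echo "more mainline work" >> main.txt
git commit -q -am "mainline work"
git merge -q -m "first merge of feature" feature

# More work lands on the same branch after the merge.
git checkout -q feature
echo "part two" >> feature.txt
git commit -q -am "feature, part two"

# Before merging again, ask Git which commit it will use as the base.
git checkout -q "$mainline"
merge_base=$(git merge-base "$mainline" feature)
already_merged=$(git rev-parse feature~1)
git merge -q -m "second merge of feature" feature
merge_count=$(git rev-list --count --merges HEAD)
echo "merges from the same branch: $merge_count"
```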
As an anecdotal case study, take my Pro Git book. I put the Markdown source of the book on GitHub,
the social code hosting site that I work for. Within a few days, I
started getting dozens of people forking my project and contributing
copy edits, errata fixes and even translations.
In Git, each of these forks is treated as a branch which I could pull
down and merge individually. I spend a few minutes once or twice a week
to pull down all the work that has happened, inspect each branch and
merge the approved ones into my mainline.
As of the time of writing this article, I’ve done 34 merges in about 2
weeks – I sit down in the morning and merge in all the branches that
look good. As an example, during the last merge session I inspected and
merged 5 separate branches in 13 minutes. Once again, I will leave it as
an exercise to the reader to contemplate how that would have gone in Subversion.
Becoming a Code Artist
You get home on Friday after a long week of working. While sitting in
your bean bag chair drinking a beer and eating Cheetos you have a mind
blowing idea. So, you whip out your laptop and proceed to work on your
great idea the entire weekend, touching half the files in your project
and making the entire thing 87 times more amazing. Now you get into work
and connect to the VPN and can finally commit. The question now is what
do you do? One great big honking commit? What are your other options?
In Git, this is not a problem. Git has a feature that is pretty unique called a “staging area”,
meaning you can craft each commit at the very last minute, making it
easy to turn your weekend of frenzied work into a series of well thought
out, logically separate changesets.
If you’ve edited a bunch of files and you want to create several
commits of just a few files each, you simply have to stage just the ones
you want before you commit and repeat that a few times.
    $ git add file1.c file2.c file3.c
    $ git commit -m 'files 1-3 for feature A'
    $ git add file4.c file5.c file6.c
    $ git commit -m 'files 4-6 for feature B'
This allows other people trying to figure out what you’ve done to
more easily peer-review your work. If you’ve changed three logically
different things in your project, you can commit them as three different
reviewable changesets as late as possible.
Not only that, which is pretty powerful in itself, but Git also makes it easy to stage parts
of files. This is a feature that has prevented coworkercide in my
professional past. If someone has changed 100 lines of a file, where 96
of them were whitespace and comment formatting modifications, while the
remaining 4 were significant business logic changes, peer-reviewing that
if committed as one change is a nightmare. Being able to stage the
whitespace changes in one commit with an appropriate message, then
staging and committing the business logic changes separately is a life
saver (literally, it may save your life from your peers). To do this,
you can use Git’s patch staging feature that asks you if you want to stage the changes to a file one hunk at a time (`git add -p`).
These tools allow you to craft your commits to be easily reviewable, cherry-pickable,
logically separate changes to your project. Thinking of your project
history this way, and having the tools to maintain that discipline
without planning every commit more than a few seconds before you
create it, gives you a freedom and flexibility that is very empowering.
In Subversion the only real way to accomplish the same thing is with a
complicated system of diffing to temporary files, reverting and
partially applying those temporary files again. Raise your hand if
you’ve ever actually taken the time to do that and if you would consider
the process ‘easy’ in any way. Git users often do this type of
operation on a daily basis and you need nothing outside of Git itself to
accomplish it.
Not Just for Teams of Coders
I hear from individuals all the time that this could not possibly be
worth switching because they don’t work in large teams or don’t
collaborate with other people at all. Or perhaps you’re not really a
programmer, but a designer or a writer.
Well, on the individual versus a team front, I would argue that
nearly everything I love about Git, much of which I’ve written about
here, I love because it helps me, not because it helps my teammates. Screw them.
Local branching and frictionless context switching are entirely useful
to an individual and are probably the most unique and revolutionary features
of Git. In fact, I very often use Git like you might use RCS – just fire it up
on some local directory and check stuff in every once in a while,
having no remote repositories at all. Creating commits as logically
separate changesets also helps you remember why you did
something a month ago, so those tools are helpful on an individual
level too. Finally, speed and backups are always a good thing, team or individual.
If you’re not really a software developer, I’ve already listed an example of using Git to collaborate on a book. Pro Git
is being published by Apress, a major publishing company, and most of
the writing and review of the book was done in Markdown using Git to
collaborate. All the errata and translations are being handled in Git
branches. You don’t know real writing bliss until you merge in a
technical reviewer’s or copy editor’s modifications with something as
simple as `git merge`.
In Closing…
In closing, this is really just the tip of the iceberg of awesome
that is Git. There are tons of fantastic and powerful features in Git
that help with debugging, complex diffing and merging and more. There is also a great developer community to tap into and become a part of and a number of really good free resources online
to help you learn and use Git. The few things I’ve mentioned here are
simply the features that most changed the way I think about working and
version control. They are the major reasons I could never go back to a
system like Subversion. It wouldn’t be like saying to me “you have to
use a Toyota instead of a Mercedes”, it would be like saying “you have
to use a typewriter instead of a computer” – it has forever changed the
way I approach and think about creating things.
I want to share with you the concept that you can think about version
control not as a necessary inconvenience that you need to put up with
in order to collaborate, but rather as a powerful framework for managing
your work separately in contexts, for being able to switch and merge
between those contexts quickly and easily, for being able to make
decisions late and craft your work without having to pre-plan everything
all the time. Git makes all of these things easy and prioritizes them,
and it should change the way you think about version control itself and
about how you approach a problem in any of your projects.
This article was written by Scott Chacon