Sunday, February 5, 2012

Version Control for Paper Writing

Version control is a concept I've come to appreciate more and more recently. Originally, I thought its usefulness was limited to programming and code writing. Lately, however, I've found it to be indispensable for writing papers and other documents.

Branch timeline diagram for Debian Linux.
What's version control?
These days, pretty much everything has a version: movies (Star Wars Episode 1, Star Wars Episode 1 HD, Star Wars Episode 1 3D), your phone (iPhone, iPhone 3G, etc.), your phone's operating system (Android Honeycomb, Android Ice Cream Sandwich, etc.), and so on. Version control embraces the fact that what you've done in the past is good, but life moves on, so you need a way to keep track of how your product evolves.

When you produce different versions of a product, it's often nice to be able to have access to old versions while working on new versions. With physical objects, there's not much to do about this: You have last year's car sitting around, and you want to make improvements to it, so you start stripping things off and pasting other things on.

But with electronic (and/or intellectual) products, it can be a waste of space to keep exact copies of old versions around. Think about it this way: You've written a first edition of a book. It has 10 chapters and it looks great. You've saved all the text and the formatting of the book on your computer in a directory called "First Edition".

Fast forward five years: your publisher tells you about some cool new development and says they'd really like a second edition of your book, with an extra chapter on the end (or... in the middle!). Are you going to copy your whole "First Edition" directory, rename it "Second Edition" and tack the extra chapter on in there? That seems like a waste of space if nothing in the First Edition has changed.

If you applied a version control system to your book writing, you could simply start a new "version" of the book in which the first ten chapters "point" to the First Edition, with your new chapter tacked on the end (or wherever it is to go).
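To make the "pointing" idea concrete, here is a minimal sketch in Python. It is a toy, not how any particular version control system is implemented: the `snapshot` function, the `store` dictionary, and the chapter texts are all made up for illustration. The idea is that each chapter is stored once under a hash of its contents, and an edition is just a list of those hashes.

```python
import hashlib


def snapshot(store, chapters):
    """Save each chapter once, keyed by a hash of its text.

    Returns the edition as a list of keys ("pointers") into the store.
    """
    edition = []
    for text in chapters:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        store.setdefault(key, text)  # only stored if not already present
        edition.append(key)
    return edition


store = {}

# Hypothetical first edition: ten chapters.
first_chapters = [f"Text of chapter {n}" for n in range(1, 11)]
first_edition = snapshot(store, first_chapters)

# Second edition: the same ten chapters plus one new one.
second_chapters = first_chapters + ["Text of the new chapter"]
second_edition = snapshot(store, second_chapters)

shared = set(first_edition) & set(second_edition)
print(len(store))   # 11 chapters stored in total, not 21
print(len(shared))  # 10 chapters shared between the two editions
```

Both editions can still be rebuilt in full by looking up each key in `store`, but the ten unchanged chapters only take up space once.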

How does it apply to document preparation?
You may say to all this, "OK, computer projects get changed all the time, and with open-source licensing, it makes sense to split and track these projects, but when I write a paper/essay/document, I know what's going in it, and no one is going to be branching off my work!"

You are probably right: You don't want people branching your work before it's published. However, there are many more parallels with software development than you may realize. For example, how many academic papers (or even books) these days are written by a single author? Not many. A version control system is useful even among a few co-authors. Different people can write different (or, heaven forbid, the same) sections and then merge them downstream into a complete draft.
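As a rough illustration of what "merging" means, here is a simplified sketch in Python. It assumes a paper is just a dictionary of section names and text, and it compares whole sections against a common starting draft; real tools work at a much finer level (usually line by line). The function `merge_sections` and the sample sections are hypothetical.

```python
def merge_sections(base, mine, theirs):
    """Three-way merge of {section name: text} dictionaries.

    base is the draft both authors started from. A section changed by only
    one author takes that author's text. A section changed differently by
    both is reported as a conflict and keeps the base text for now.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(name), mine.get(name), theirs.get(name)
        if m == t:
            result = m          # both agree (or neither touched it)
        elif m == b:
            result = t          # only my co-author changed it
        elif t == b:
            result = m          # only I changed it
        else:
            conflicts.append(name)
            result = b          # both changed it: needs a human decision
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {
    "Introduction": "draft intro",
    "Methods": "draft methods",
    "Results": "draft results",
}
mine = dict(base, Methods="methods, rewritten by me", Results="my results")
theirs = dict(base, Introduction="intro, rewritten by co-author", Results="their results")

merged, conflicts = merge_sections(base, mine, theirs)
print(merged["Introduction"])  # intro, rewritten by co-author
print(merged["Methods"])       # methods, rewritten by me
print(conflicts)               # ['Results']
```

The "heaven forbid" case is the conflict: both of us rewrote the Results section, so the sketch flags it instead of silently picking a winner.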

Also, in my experience, while the overall topic of a paper rarely changes, the details and sections are very often fluid in the early stages. Instead of writing most of a section, then deciding that you want to go in another direction and deleting what you just wrote, why not create a new "version" of the paper and write about the new discoveries there? This way, if you have to go back, you can look at the old "version" to see what you've already written.

Ok, I'm sold. How do I begin?
Well, that was easy! There are lots of options for version control systems. I haven't tested all (or even most) of them. Of the ones that seem to be popular at the moment, I've used SVN and git. I have to say that I stand wholeheartedly behind git (but that's another post....).

Stay committed,
Clay
