Sunday, April 17, 2011

Is A GitHub Fork A Real Fork ?

Karl Fogel, who was most notably involved with the development of CVS & Subversion and is a director of the Open Source Initiative (OSI), makes an interesting observation about Github's usage of the term "fork" in an interview:
Actually, I think the notion of forking has not changed -- there has been some terminological shift, perhaps, but no conceptual shift.

When I look at the dynamics of how open source projects work, I don't see huge differences based on what forge the project is using. GitHub has a terrific product, but they also have terrific marketing, and they've promoted this idea of projects inviting users to "fork me on GitHub", meaning essentially "make a copy of me that you can work with".

But even though there is a limited technical sense in which a copy of a git-based project is in theory a "fork", in practice it is not a fork -- because the concept of a fork is fundamentally political, not technical.

To fork a project, in the old sense, meant to raise up a flag saying "We think this project has been going in the wrong direction, and we are going to take a copy of it and develop it in the right direction -- everyone who agrees, come over and join us!" And then the two projects might compete for developer attention, and for users, and perhaps for money, and maybe eventually one would win out. Or sometimes they'd merge back together. Either way, the process was a political one: it was about gaining adherents.

That dynamic still exists, and it still happens all the time. So if we start to use the word "fork" to mean something else, that's fine, but it doesn't change anything about reality, it just changes the words we use to describe reality.

GitHub started using "fork" to mean "create a workable copy". Now, it's true that the copy has a nice ability to diverge and remerge with the original on which it was based -- this is a feature of git and of all decentralized version control systems. And it's true that divergence and "remergence" is harder with centralized version control systems, like Subversion and CVS. But all these Git forks are not "forks" in the real sense. Most of the time, when a developer makes a git copy and does some work in it, she is hoping that her work will eventually be merged back into the master copy. When I say "master" copy, I don't mean "master" in some technical sense, I mean it exactly in the political sense: the master copy is the copy that has the most users following it.

So I think these features of Git and of GitHub are great, and I enjoy using them, but there is nothing revolutionary going on here. There may be a terminology shift, but the actual dynamics of open source projects are the same: most developers make a big effort to get their changes into the core distribution, because they do not want the effort of maintaining there changes independently. Even though Git somewhat reduces the overhead of maintaining an independent set of changes, it certainly does not reduce it so much that it is no longer a factor. Smart developers form communities and try to keep the codebase unified, because that's the best way to work. That is not going to change.