A Git Origin Story (2018) (linuxjournal.com)
188 points by revorad on June 15, 2022 | hide | past | favorite | 59 comments


Andrew Tridgell maintains that he didn't violate the terms of the BK license. The story of his 'reverse engineering' of BK is necessary to understand the context: https://lwn.net/Articles/132938/


So he noticed a repo identifier of bk://thunk.org:5000/, tried telnetting to it, typed 'help', got a list of commands, tried 'clone', and it gave him all the files as SCCS files. That is still reverse engineering, even if it is facilitated by the wire protocol, especially when one has been asked directly not to do any reverse engineering.

It also seems unlikely that that was where it stopped, as that likely wouldn't have gotten any attention. The whole claim just seems disingenuous.

One can be fulfilling one's curiosity in simple and obvious ways and still be breaking license terms or the law. That something is trivial doesn't mean that it is moral, legal or licensed.


Andrew Tridgell never downloaded or used the BitKeeper client program; he was never party to the BitKeeper license terms.


It would still fall afoul of the CFAA (for which there is no exception for reverse engineering) and equivalent laws outside the US - the server he was connecting to was owned by BitMover (as far as I see) and he wasn't authorised to use it in that fashion. And the person through whom he might have authorisation (Linus) had explicitly disallowed reverse engineering. So nobody who could authorise it had done so. People have been arrested and lost court cases for less.

In good news, the Supreme Court clarified/limited the CFAA in June 2021 [1], so it's now seemingly legal to access a server however you like as long as you're not accessing information you wouldn't otherwise have access to.

[1] https://www.theverge.com/2021/6/5/22491859/supreme-court-van...


> the server he was connecting to was owned by BitMover (as far as I see)

That would be surprising to me. BK being a DVCS, lots of folks in the community would be running their own servers and sharing the URLs to them; that is where Andrew would have noticed URLs like bk://thunk.org:5000. For instance, here's one such URL being shared on the LKML, where Andrew hung out: https://www.cs.helsinki.fi/linux/linux-kernel/2003-12/1108.h... It is my assumption that bk.arm.linux.org.ku was not owned by BitMover.

Perhaps that means that whoever was running the server that Andrew connected to was in violation of their license by "authorizing" Andrew's access.


Thank you, that clarifies things a little. I found odd the broad claims that the metadata was stored on BitMover's servers and that they might own that IP.

Note that the CFAA is a criminal law, so it was possible for Andrew to be prosecuted for breaking it, given a report to the police, if he hadn't been given direct permission to access that server in that way, whether or not the server owner minded.


That's not what the article claims he says.

"It was not, he says, an act of wizardly reverse engineering."

The word "wizardly" is important here. He isn't denying the reverse engineering itself, only that the reverse engineering was "wizardly", i.e. anything magical or special. Just plain old curiosity.

Basically, the browser version of view source.


Hmm, no mention of Mercurial, which was released around the same time as Git and, IIRC, was also a contender for the Linux kernel development VCS.

Larry McVoy basically forced the creation of Git via Andrew Tridgell, and look where we are today: the whole world uses Git. Larry must have had his reasons, but I wonder how he feels about the outcome.


Around that time I (and a couple of others) worked on the excitingly named hgfront (https://github.com/tanepiper/hgfront) which was a Mercurial frontend a little like GitHub.

There's an old low-res video (https://www.youtube.com/watch?v=NARcsoPp4F8) of it in action.

I really liked Mercurial and used BitBucket for a while, but in the end GitHub won out.


I'd argue that GH won over Bitbucket because of two key factors:

1. It wasn't developed by Atlassian, who at the time had a reputation for building more unwieldy tools (Jira, anyone?) and had a weird, awkward feel. The fact that Git had gained near instantaneous traction and Bitbucket needed you to learn Mercurial made it a near non-starter.

2. GitHub's model ("Social Coding") and zero-effort force amplification made it just the right wedge. Git had terrible frontend tooling, and GitHub providing "just enough" tooling around the server side, while also providing the right kinds of tooling (webhooks, enterprise-ish features, click-to-fork, a very good PR path, etc.), made it easy to use Git.

Mercurial might be the "superior" tool from certain standpoints but Git had a few tricks up its sleeve that made people like it:

* Written in C. No extra runtimes.

* Modular construction: an executable named `git-slap` is immediately available as `git slap`, like a native subcommand.

* Performance of Mercurial was bad on massive codebases and slower machines. I mean terrible. Once you hit 500 KLOC or got into 2 MLOC territory, a simple merge could take minutes, if not tens of minutes, in the old days if you weren't on a fairly modern (2005-7-ish) machine. If you'd kept your slightly older P2 laptop around, you were absolutely roasting under the weight of what took Git mere seconds.
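The modularity point above is just PATH lookup: git dispatches any executable named `git-<name>` found on PATH as `git <name>`. A minimal sketch, reusing the joke `git-slap` name from the list above (assumes a POSIX shell and git installed):

```shell
# Any executable named git-<name> on PATH becomes `git <name>`.
mkdir -p "$HOME/bin"
cat > "$HOME/bin/git-slap" <<'EOF'
#!/bin/sh
# Demo subcommand: git passes the remaining arguments straight through.
echo "slapped: $*"
EOF
chmod +x "$HOME/bin/git-slap"
export PATH="$HOME/bin:$PATH"

git slap trout   # prints: slapped: trout
```

This is the same mechanism git's own porcelain historically used, which is why third-party tools like `git-lfs` or `git-flow` feel native once installed.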


Mercurial performance is still terrible for large codebases.


Someone said it is their job to keep HN a nice place. I will try to follow suit.

Could you provide citations or evidence for such a claim? Following the HN guidelines, I did not do a quick dismissal. I spent time and ran tests. I do not see the same conclusions. I am ready to post up some data in good faith.


I work with (a fork of) Mercurial + a huge codebase. Checkout and merge operations in a local repo frequently take minutes. Perhaps there is something going on that I don't understand, or perhaps it is the age of the fork. But I don't think so, because I think it would have been fixed by now.

> I spent time and ran tests.

Ok? Share 'em. The scale I'm talking about is ginormous, though (roughly: multiple GBs, 100k+ files, and probably 250k+ commits in history).

For all I know it could be state-of-the-art performance for the scale, but repeat tasks taking minutes is still bad news.


From the article: In 2005, Andrew tried to reverse-engineer the BitKeeper networking protocols in order to create a free software alternative. If it hadn't been him, it would've been someone else—it was only a matter of time. Larry McVoy had warned the Linux developers that he would pull the plug if anyone tried this, and that's exactly what he did.

Some speculation, but is there a chance that Tridgell was aware of McVoy's condition on the use of BitKeeper as source control for the Linux kernel and attempted the reverse engineering deliberately, to force the scenario where the development of an open-source BitKeeper alternative was necessary?



Interestingly, about a year after that conversation, BitKeeper was made open source.


At some point you realize that you can no longer monetize some proprietary code you have. Now what? You basically have two options: let it die, or open source it. Pride dictates that you'll want to open source it.

In the world of infrastructure software, you have to keep moving forward because open source will replace your proprietary infrastructure software eventually. Stay still, milk that cow, and you'll lose the cow.

It's that simple.

Or better yet: develop a solid open source monetization plan to begin with. Not that that's easy! But D. Richard Hipp managed.

Thinking about it, why couldn't BK have been the SQLite of DVCSes? The answer is probably that McVoy didn't think of the SQLite model, probably because SQLite was the first to succeed at it, and that came after the BK-to-Git migration; or at least the realization that SQLite had a great business model came later.

To recap, the SQLite model is this: a) make a great open source project, b) make a proprietary test suite for it with 100% branch coverage, c) convince the world that (b) exists, d) let them come to you for features and bug fixes, e) form a consortium to get paid through. BK coulda been a contendah!


>...To recap, the SQLite model is this: a) make a great open source project, b) make a proprietary test suite for it with 100% branch coverage, c) convince the world that (b) exists, d) let them come to you for features and bug fixes, e) form a consortium to get paid through.

Where does this recap come from?

My understanding was that SQLite was initially created in a context for a client that needed assurances of correctness, thus the extensive testing suite.

As for the revenue, there's a proprietary extension to support fully encrypted db and access. This is licensed and supported.


> Where does this recap come from?

Is it not true?

> My understanding was that SQLite was initially created...

Ah, you think I meant that that was the SQLite model on day 1. But no, I meant that this is what it became.

> As for the revenue, there's a proprietary extension to support fully encrypted db and access. This is licensed and supported.

Consortium membership costs money. That money pays for the dev team.


For completeness here's where that was discussed:

https://news.ycombinator.com/item?id=11667494


Every time he pops in he gets asked about this, apparently...

https://news.ycombinator.com/item?id=26204218#26205688 (2021)


Git undoubtedly has disrupted the SCM space and become the major player for some very good reasons. But this bit of history casts the famous Linus quote in a bit of a different light; "Git proved I could be more than a one-hit wonder." Initial adoption by a major project is not something everyone can make happen by throwing their weight around. If he had communicated better, arch, darcs and monotone could've been git.

SQLite has its own SCM (Fossil), which hasn't taken off to nearly the same degree. Some of it has to do with the main problems different projects have. Some of it has to do with market share, and git started on 2nd base in that area.


It’s true that Darcs or something else could have won if only Linus told them exactly what to do. Maybe he didn’t even know how until he tried himself, but it doesn’t cast these other SCMs in a good light to say they couldn’t build something performant without Linus telling them to; you’re giving him a little too much power & credit.

It’s also true that Fossil (or any of the DVCSs released after Git) might’ve won if only they had released before Git. Fossil was released after Git was, and after Git’s popularity had started growing. I have the impression that Fossil was built partly as a reaction to Git since the Fossil docs have always compared themselves to Git and complained about certain Git features. Fossil historically might not even exist without Git, but either way, market share may not be the primary reason, since one of these two actually predates the other. BitKeeper previously had the market share that Git took over, and BK didn’t win either, so market share may not be the explanation at all. Other reasons for nothing yet overtaking Git may be that nothing yet is better enough. I’d be surprised if something doesn’t replace Git eventually though.


I don't think Git "won" principally on technical merits.

I think it won because VCS transitions are very painful so people wanted to use the system that they expected to end up being most widely adopted. Certainly that's why I chose it in 2007.

And of course when you're making this decision you should assume that many other people are making it on the same grounds.

I believe the term for this situation is a "Keynesian beauty contest".


> I don't think Git "won" principally on technical merits.

Git seems a very pure example of the Unix philosophy "worse is better". Lots of evidence that Linus examined darcs in some detail, decided it was "too smart" and wanted/needed "dumb but fast". It's even in the name of git given "git" is a British colloquialism for "idiot" or "fool". Sometimes you just need the dumb tool that works fast and gets out of the way, and the technically better and smarter (but slower; harder to understand) tool loses.


Git started out dumb, but had the right internal primitives. The rest got built over time.

Nothing wrong with that. Yes, there's warts left over all over the place. So what.


So what? It's a useful reminder that the "best technical solution" isn't always "the best solution for me/right now/here/this need that I have".

It's also useful to remember that git started out dumb and is still dumb in its own ways, because that gives us room to grow, things to aspire to, and areas to continue to research, better "technical" tools to keep an eye on.

Git was named dumb especially with respect to darcs, and from that perspective git will always be dumb: you call git's internals "right", but "right" has multiple meanings. Yes, they are well suited to the job. But the darcs perspective here is that they aren't the most "technically right". Darcs internals use primitives that much more closely resemble OTs/CRDTs. There's a lot of HN love for the technical beauty of tools like CRDTs. Darcs made the case from its beginning that source control, and the mental model of source control, is much closer to things like CRDTs (keeping in mind darcs predated or coincided with much CRDT development) than to the "trees of trees of files" of git's internals. Darcs is closer to the "right" tech internally, especially where its primitives model programmer behavior better.

It's useful to still have darcs to compare to for what a "smarter git" could look like, even/especially at the internal primitives level. It's useful that projects like pijul are still exploring those ideas even despite "git has won" and "worse is better".

At this point like most everyone, I use git because it is incredibly pragmatic. I appreciate knowing that git isn't perfect, isn't "technically smart", and that there are cases to continue learning how we do the beautiful technically smart stuff in a way that is also or at least nearly as pragmatic.


For sure.


I think GitHub was a massive factor in Git's rise.

It would be much easier to displace if a huge part of the world's code hosting infrastructure wasn't using it.

Nearly every project will use GitHub, directly or indirectly.


To me, git won because of ergonomics. CVS and SVN felt like GNU screen: cryptic and a bit limited, requiring a careful setup phase to get right. An egg walk.

You grab git, clone stuff, branch out, merge in... it was transparent locally or networked. Plug and play.


Fossil is amazing, but a) it lacks Git's deployment at scale experience, and b) it's too opinionated, like Mercurial.

(a) means that, for example, the Windows source code can never fit in a Fossil repository. The issue is that the SQL metadata alone is daunting. Git devs have developed a bunch of techniques for dealing with this, such as shallow clones, partial clones, and the use of Bloom filters for speeding up git log traversal (for, e.g., `git blame`).
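For reference, the scaling techniques mentioned here correspond to real git commands; a sketch with a placeholder URL, assuming git ≥ 2.27 for the Bloom filter option:

```shell
# Shallow clone: history truncated to the most recent commit only.
git clone --depth 1 https://example.com/big-repo.git

# Partial clone: commits and trees up front, blobs fetched lazily on
# demand (the server must have uploadpack.allowFilter enabled).
git clone --filter=blob:none https://example.com/big-repo.git

# Commit-graph with Bloom filters over changed paths, which speeds up
# path-limited traversals such as `git log -- <path>` and `git blame`.
git commit-graph write --reachable --changed-paths
```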

(b) is just terribly obnoxious. Rebase == script around cherry-pick. Fossil provides cherry-picking. Fossil refuses to provide rebase. Mercurial too went through the same opinionated stage and then landed at providing all the things that Git power users want. Until Fossil improves this, for example, I can't be bothered to use it much. Why would you reject a huge portion of your audience, especially power users who can influence projects to choose your DVCS?


Come on, Subsurface [0] had already proved Torvalds wasn't a one-hit wonder.

Of course, being a one hit wonder in software is pretty good if your hit is the Linux kernel.

[0] https://subsurface-divelog.org/


According to Wikipedia Torvalds started developing that in 2011, whereas he started Git in 2005.

[1] https://en.wikipedia.org/wiki/Subsurface_(software), which is just an alias for Torvalds' page; search for "late 2011".


I think Git benefited from Linux, but not from Linus' fame (which was next to nil outside Linux circles at that time; it took off a little when Linux started to power phones as the kernel for Android, I believe). By being used for a project of significant size, Git proved that it was efficient and rock solid. And those facts overcame the fact that it came from a niche community and that its basic user interface is, according to many, lacking.


There's also this Tech Talk Linus gave in 2007 -

https://youtu.be/4XpnKHJAok8


Imagine if McVoy had adopted the SQLite business model instead of taking his ball and going home!

The world would be a different place. It would be better, though I don't know how much better.


McVoy claimed that it would take someone roughly 5 years to solve the problems that BitKeeper solved. Torvalds took a few weeks off and did it. Granted, git at that time was far removed from the feature set and polish it had after a few years.

But considering how many times we have heard the "I could build this in a weekend of coding" trope, git is a truly exceptional weekend project (+) that actually delivered on the promise.

(+) - Taking vast liberties by stretching the weekend


It might well have been true(-ish) at the time McVoy made the claim. The problem is that a lot of these things are hard until you see a solution, but become fairly trivial, or at least much simpler, with even the smallest hints at which part of the solution space is fruitful. If you want to monetise an idea like that, you'd better build a lot of amazing stuff around it to give people a reason to pay you.

E.g., to do something like git, you'd save a whole lot of effort the moment you know to consider a hierarchical store addressed by hash of contents, from the whole repository down to individual files, plus a list of the hashes of the roots of each revision (this description is already a refinement of the first iteration of how git stores trees, which apparently didn't use a hierarchical structure at first but just a flat list of files, according to [1]).

[1] https://utcc.utoronto.ca/~cks/space/blog/tech/GitTreeEvoluti...
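To make that concrete, here is a toy content-addressed store in POSIX shell (the `put` helper is hypothetical; real git additionally adds type headers, zlib compression, and packfiles). Blobs and tree listings are both stored under the hash of their own bytes, so an unchanged file is stored once and any change bubbles up to a new root hash:

```shell
obj=$(mktemp -d)              # stand-in for .git/objects

put() {                       # store stdin under the hash of its content
  f=$(mktemp)
  cat > "$f"
  h=$(sha1sum < "$f" | cut -d' ' -f1)
  mv "$f" "$obj/$h"
  echo "$h"
}

# Revision 1: two files plus a root "tree" listing name -> hash.
code=$(printf "print('hello')\n" | put)
readme1=$(printf "readme v1\n" | put)
root1=$(printf "README %s\nhello.py %s\n" "$readme1" "$code" | put)

# Revision 2: only README changed.
readme2=$(printf "readme v2\n" | put)
root2=$(printf "README %s\nhello.py %s\n" "$readme2" "$code" | put)

[ "$root1" != "$root2" ]      # the change propagates up to the root hash
ls "$obj" | wc -l             # 5 objects; the unchanged blob is shared
```

The deduplication falls out for free: storing identical content twice writes to the same path, which is much of why git clones and branches are cheap.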


> Linus himself was the person who first discovered the techniques of open-source development that had eluded the GNU project, and that would be imitated by open-source projects for decades to come, right up to the present day

A bold claim, but what exactly are these techniques?


I assume it's some sort of reference to GNU Hurd and something like "release something that works now rather than continually plan something that will someday be better".



>The Linux kernel was the most important open-source project in history

That, I think, is unfair; it's the most prominent, but I would argue that GNU, and especially GCC, was more important. However, the way Linus split the work, and with that idea used BitKeeper and then later created Git, was revolutionary.

Some open-source kernels already existed at that time, for example Mach and BSD (to name a micro and a monolith).

EDIT: But yeah we all know why BSD was legally problematic around the 90's.


This is probably related to a tweet I made sharing one perspective on the timeline of git: https://twitter.com/swyx/status/1536832603411451905?s=20&t=U... It took 4 days from the call to starting the project.


But I heard Linus "wrote git as a joke [on] a weekend coding binge, jacked up on blow in Vegas." https://www.youtube.com/watch?v=CDeG4S-mJts


What was the relationship or connection with Apache Subversion, if any? I remember using SVN for about 5 years before Git.


The article has a one-liner: Linus didn't like it. But if you google it, you will come across this Linus quote: "Subversion has been the most pointless project ever started. Subversion used to say CVS done right: with that slogan there is nowhere you can go. There is no way to do CVS right."


Just watch the Linus Google talk. It's all there.


The article mentions that merging in svn was not as easy as what BK did and what git needed. It allowed merges, but they were "big deals", whereas git makes merges very easy and common.

I remember projects where out of 50 people only 1-2 were capable and trusted to do merges. And that was a tedious and boring job so the joke was that you never wanted anyone to know you could merge a code base for release.


I suspect this is a myth. I used svn for years and don't remember merging being problematic. In fact I remember it being better (less need to manually merge) than today's git. What svn couldn't support well is: zillions of branches. So branch-happy workflows would kill the server. Today's PR-style would probably work ok since it mirrors how we typically worked with svn -- small frequent commits to trunk. Branches only used for maintenance on past releases.


The thing that CVS/SVN didn't like much was people working on the same area of the code at the same time.

Arguably git doesn't like it much as well, but it can branch easier.

The real thing needed was the decentralization: CVS/SVN weren't set up to allow every copy to be a master copy on equal footing (and even now most people do NOT use git that way; they set up main or master and let GitHub be the central authority).


It can be slow. As in: 20+ minutes slow.

It's possible to only commit part of a merge.

It's possible to accidentally include additional changes after a merge.


Rebasing chains of check-ins is a PITA in svn, and with git taking over the world, the svn book is no longer getting the love it needs, so the de facto reference documentation is no longer fully up to date with the tool.


You are remembering correctly. Branching on svn was painful; on git it's basically free.
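"Basically free" is literal: a git branch is nothing but a ref, i.e. a single small file containing a commit hash. A sketch, assuming git with the default files ref backend:

```shell
git init -q demo && cd demo
git commit -q --allow-empty -m "initial"

# Creating a branch writes one ~41-byte file; nothing is copied.
git branch feature
cat .git/refs/heads/feature    # the whole branch "data structure": one hash
```

Compare svn, where a branch was conventionally a server-side copy under `branches/`; cheap on the server thanks to copy-on-write, but heavyweight in workflow terms.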


The article talks about Torvalds disliking CVS and Subversion but never making himself clear about the reasons, and later suggests that the most important "not fully articulated" reason was speed.

But I think a more important reason for him was that he felt that CVS and SVN implicitly came with a development model where a small group of people has official commit privileges, while he very strongly wants a model without that.

See for example what he says here: https://www.tag1consulting.com/blog/interview-linus-torvalds... (search for "commit bit").


SVN enforces in code the reality of "who has control", whereas git allows that to be more lax. If Linus were to go insane tonight, one of the other kernel maintainers' git trees would become the "main" without anything official needing to be done.


svn doesn't support loosely distributed operation. You need a single central server and all contributors need accounts on that server. That server ends up being costly to provide for projects of non trivial size. Git "won" because it didn't have those two problems.


The article mentions that Subversion existed at the time Linus chose Bitkeeper. As I understand it, the development goal of Subversion was to be CVS without the problems.


Subversion was so bad I missed CVS.


I liked subversion much better than cvs as it was less buggy and was more integrated with Apache for web-based vcs.



