
GitTorrent: A P2P Network of Git Repositories Shared Over BitTorrent

31 Aug 2015 2:36pm

GitTorrent is a project in development by Chris Ball that will allow developers to use decentralized remote Git repositories. While Git itself is distributed in nature (many developers can have their own complete copy of a project’s history), nearly all projects require some authoritative place for the project code. Currently, this single source of truth lives in one place, such as GitHub, with no distribution or fallback. GitTorrent attempts to solve this problem by storing the code on a peer-to-peer (P2P) network. You might be wondering, though: why would someone ever want this when we have GitHub?

GitHub is a primary destination for stashing open source code. This is somewhat old news at this point, given that GitHub reached 10 million repositories back in 2013. GitHub has become such a de facto standard that giants like Microsoft have been hosting projects on GitHub instead of on their own CodePlex platform.

Microsoft’s CodePlex isn’t the first example of a major and unexpected concession to GitHub. Not too long ago, Google announced it was shutting down Google Code, and tipped its hat to GitHub’s superior repository offering.

I doubt many, myself included, feel that there is a problem with GitHub’s dominance. Nor has GitHub given us cause to worry. But there is no way to guarantee that the GitHub we currently know and love will be the same GitHub many years from now. It is important to keep in mind that even if GitHub can be considered a bastion for free and open source software, and GitHub has contributed a substantial amount to open source itself (Hubot, anyone?), organizations change over time.

Enter SourceForge

Venerable old SourceForge spent many years, at least by the standards of software timelines, hosting open source projects. However, when love for SourceForge started shifting to GitHub, SourceForge’s heart and intentions turned bitter and increasingly desperate. In an attempt to make up for lost traffic, SourceForge displayed a growing number of ads on its download pages, some of which turned out to be malicious without SourceForge’s knowledge. SourceForge even admitted, to its credit, that the ad problem was out of control. As annoying as ads might be, this problem pales in comparison to the recent issue with the GIMP project.

Having grown tired of SourceForge’s ways, GIMP decided to stop using SourceForge as a download mirror for their code. SourceForge responded by hosting a version of GIMP that was secretly bundled with adware. This was the final nail in the coffin for SourceForge’s reputation: What was once a gradual migration from SourceForge to GitHub became a full-on exodus. Five, or even ten, years ago, I doubt anyone would have expected the beloved SourceForge to fall so far.

While Bitbucket is another great source code repository and a strong competitor to GitHub, it still trails well behind in terms of its open source user base. SourceForge’s actions were somewhat mitigated by the fact that all this happened long after they fell out of favor. If GitHub were to do something similar, is there a comparable service to jump ship to? This risk is always present, regardless of the company. The only way to truly mitigate it is to have a Git hosting service that is just as decentralized as Git itself.

GitTorrent combines several technologies to sidestep the problems that can arise from a non-distributed source such as GitHub. In addition to Git itself, GitTorrent brings BitTorrent’s P2P technology into the mix, as well as the Bitcoin blockchain. While the blockchain component isn’t ready for prime time yet, it is being actively developed, and the work done so far is superb.

GitTorrent Components

Let’s take a more in-depth look at each of the components, starting with Git’s extensible network protocols. Once GitTorrent is installed, a repo can be cloned using the gittorrent:// prefix. For example, to clone the Hubot repo, we would run git clone with a gittorrent:// URL that points at the repo’s usual GitHub location.

Git passes the git clone command to git-remote-gittorrent, which then processes the URL. This Git transport helper can be used for other operations as well, such as fetching and pushing.

It may seem odd that we just talked about GitTorrent as a way to avoid GitHub, and then the first thing we do is point to a GitHub repo; however, there is more here than meets the eye. GitTorrent will not actually download any files from GitHub; we are simply asking GitHub what the latest commit for the repository is. Once we know the unique Git SHA1 for that commit, we ask peers on the GitTorrent network whether they have that SHA1, and then download the Git objects from those peers. Specifically, we will be grabbing all the commits for a specific Git project. Just as with regular BitTorrent downloads, we will be grabbing different commits from different users, and some of the users we download from might not even have the entire set of commits.
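To make that first step concrete, here is a minimal Python sketch of how a client could pull the latest commit SHA1 out of the kind of listing that git ls-remote prints (one SHA1, a tab, and a ref name per line). Whether GitTorrent itself shells out to git ls-remote is an assumption on my part, and the function names and sample hashes below are made up for illustration.

```python
from typing import Dict


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map each ref name to its SHA1, given `git ls-remote`-style text.

    Each non-empty line is expected to look like "<sha1>\t<ref>".
    """
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha1, ref = line.split("\t", 1)
        refs[ref.strip()] = sha1.strip()
    return refs


def latest_commit(output: str, ref: str = "HEAD") -> str:
    """Return the SHA1 that the given ref currently points at."""
    return parse_ls_remote(output)[ref]


# Placeholder listing: these hashes are invented, not real commits.
SAMPLE = (
    "1111111111111111111111111111111111111111\tHEAD\n"
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/heads/feature\n"
)

if __name__ == "__main__":
    # This SHA1 is what the client would then ask GitTorrent peers for.
    print(latest_commit(SAMPLE))
```

The point of the sketch is that only a single small hash comes from the central host; everything after that can be fetched from peers.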

Additionally, there is another style of working that does not rely on anything outside of GitTorrent. Each GitTorrent user has a public key, and we can grab a specific repo from a user by supplying a SHA1 hash of their public key in place of the GitHub location.
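As a rough sketch of the idea, the Python below derives a 40-character SHA1 identifier from some key bytes and tells a key-style remote apart from a host-style one. Treat the details as assumptions: which serialization of the public key GitTorrent actually hashes, the layout of the URL with the hash in the host position, and the helper names key_fingerprint and describe_remote are all illustrative rather than taken from the project.

```python
import hashlib
from typing import Tuple
from urllib.parse import urlparse

HEX_DIGITS = set("0123456789abcdef")


def key_fingerprint(public_key: bytes) -> str:
    """Return the SHA1 hex digest of the given key bytes (40 hex characters)."""
    return hashlib.sha1(public_key).hexdigest()


def describe_remote(url: str) -> Tuple[str, str, str]:
    """Classify a gittorrent:// URL as ("key" or "host", authority, repo path).

    Assumption: a 40-hex-digit authority is a key fingerprint; anything
    else is treated as an ordinary host name such as a Git hosting site.
    """
    parsed = urlparse(url)
    if parsed.scheme != "gittorrent":
        raise ValueError("not a gittorrent:// URL: %r" % url)
    authority = parsed.netloc
    repo = parsed.path.lstrip("/")
    is_key = len(authority) == 40 and set(authority.lower()) <= HEX_DIGITS
    return ("key" if is_key else "host", authority, repo)


if __name__ == "__main__":
    fingerprint = key_fingerprint(b"placeholder public key bytes")
    print(describe_remote("gittorrent://%s/myrepo" % fingerprint))
    print(describe_remote("gittorrent://example.com/user/project"))
```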

While referring to a user via a hash is conceptually straightforward, a hash value doesn’t make for a memorable username. An alternative is to encode user information in the Bitcoin blockchain, but that is still in the works.

Regardless of which of the above styles you use to query the repo, the next step involves the GitTorrent network’s distributed hash table. This distributed hash table is very much like BitTorrent’s distributed hash table. For each Git commit we are interested in downloading as part of that repo, we can ask the distributed hash table to give us nodes that have the particular commit we are looking for.
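For a feel of how such a lookup decides which nodes to ask, here is a small Python sketch of the XOR-distance ordering used by Kademlia-style tables such as BitTorrent’s mainline DHT. It assumes GitTorrent’s table orders nodes the same way, and it ranks a local list of made-up node IDs; a real lookup walks the network iteratively rather than sorting a list it already holds.

```python
import hashlib
from typing import List


def xor_distance(a_hex: str, b_hex: str) -> int:
    """Kademlia-style distance: XOR of the two IDs read as integers."""
    return int(a_hex, 16) ^ int(b_hex, 16)


def closest_nodes(target_sha1: str, node_ids: List[str], k: int = 3) -> List[str]:
    """Return the k node IDs nearest to the target key by XOR distance."""
    return sorted(node_ids, key=lambda node: xor_distance(node, target_sha1))[:k]


# Hypothetical node IDs and commit key, generated only for the demo.
nodes = [hashlib.sha1(("node-%d" % i).encode()).hexdigest() for i in range(8)]
target = hashlib.sha1(b"some commit").hexdigest()
nearest = closest_nodes(target, nodes)

if __name__ == "__main__":
    for node in nearest:
        print(node, xor_distance(node, target))
```

Because commits and nodes share the same 160-bit SHA1 key space, "which nodes should know about this commit" reduces to "which node IDs are numerically closest under XOR".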

To actually download these files from a peer, there is a BitTorrent protocol extension that facilitates the transfer, much as the Git transport helper does on the Git side. Once a client has connected to another node, it sends a request for the SHA1 it’s looking for as bencoded JSON. The node providing the packfile then returns its response.
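To illustrate what a bencoded message looks like on the wire, here is a minimal bencoder in Python applied to a request of this general shape. The bencoding rules themselves are the standard BitTorrent ones; the field names "gittorrent" and "ask" and the placeholder SHA1 are assumptions for the sake of the example, not a confirmed copy of the project’s message format.

```python
def bencode(value) -> bytes:
    """Encode ints, strings, bytes, lists and dicts using BitTorrent bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings are length-prefixed: b"3:ask"
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys must be byte strings and appear in sorted order.
        items = [
            (key.encode("utf-8") if isinstance(key, str) else key, item)
            for key, item in value.items()
        ]
        items.sort(key=lambda pair: pair[0])
        body = b"".join(bencode(key) + bencode(item) for key, item in items)
        return b"d" + body + b"e"
    raise TypeError("cannot bencode %r" % type(value))


# Hypothetical request: field names and SHA1 are placeholders.
request = {"gittorrent": {"ask": "a" * 40}}
encoded_request = bencode(request)

if __name__ == "__main__":
    print(encoded_request)
```

The nested dictionary maps naturally onto bencoding’s d...e containers, which is presumably why a JSON-like structure is a convenient way to describe these messages.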

Conclusion

While the GitTorrent project is still fairly new, it shows great promise. BitTorrent and Git have both been around for a while now, and it’s fascinating to see how this project puts them together. Using the Bitcoin blockchain for verification is also a very interesting idea, and I’m curious to see how it plays out. In case you missed it earlier, be sure to check out, and possibly contribute to, the project’s GitHub repo.

Feature image: “Beijing, Nov-2014” by Mitch Altman is licensed under CC BY-SA 2.0.
