
GitHub’s Plan to Freeze Your Code for Thousands of Years

8 Mar 2020 6:00am, by

Recently I discovered some computer code I’d written will outlive me, for many centuries. A copy of it has been stored in a chilly cave far above the Arctic Circle.

It’s part of a fascinating project by GitHub, the “Arctic Code Vault” program, which brings modern technologies into a surprisingly primitive environment. The program bestows an unexpected honor on a wide swath of the 100 million code repositories currently hosted on GitHub’s servers: all of this material is being archived in perpetuity on a remote Norwegian archipelago, near the northernmost town in the world.

GitHub’s vice president of special projects, Thomas Dohmke, tells news.com.au that GitHub is uniquely positioned to take on this archive, and “has the responsibility to protect and preserve the collaborative work of millions of developers around the world.” On its webpage for the project, GitHub strikes a similarly grand tone, calling open source software “a hidden cornerstone of modern civilization, and the shared heritage of all humanity.”

“We will protect this priceless knowledge by storing multiple copies, on an ongoing basis, across various data formats and locations,” Dohmke said.

On a visit, GitHub’s CEO Nat Friedman described the storage location, a decommissioned coal mine, as “more mine-y and rustic and raw-hole-in-the-rock than I thought it would be,” according to a recent article in Bloomberg. The news service goes on to note that, “to Friedman, it’s a natural next step. Open source software, in his view, is one of the great achievements of our species, up there with the masterpieces of literature and fine art.”

And it’s not the only priceless knowledge being stored in this remote location. According to Bloomberg, the other shelves in the mine include Vatican archives, Italian movies, Brazilian land registry records, “and the recipe for a certain burger chain’s special sauce.”

Pull Requests

But what’s the rationale for this massive effort? The project’s page cites the threat of code being “abandoned, forgotten, or lost.” Worse yet, how else would the code survive a global catastrophe?

“There exists a range of possible futures in which working modern computers exist, but their software has largely been lost to bit rot. The GitHub Archive Program will include much longer-term media to address the risk of data loss over time,” the site notes.

Of course, the code repository service has also given some thought to how “the future” might use our code. “There is a long history of lost technologies from which the world would have benefited, as well as abandoned technologies which found unexpected new uses,” explains the project web page. “It is easy to envision a future in which today’s software is seen as a quaint and long-forgotten irrelevancy until an unexpected need for it arises.”

Future historians might see the significance in “our age of open source ubiquity, volunteer communities, and Moore’s Law.”

Which repositories make the cut? According to GitHub: “The archive will include every repo with any commits between the announcement at GitHub Universe on Nov. 13 and 02/02/2020, every repo with at least 1 star and any commits from the year before the snapshot (02/02/2019 – 02/02/2020), and every repo with at least 250 stars. Plus, gh-pages for any repository that meets the aforementioned criteria.”
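Those three inclusion rules amount to a simple filter over repository metadata. As a rough illustration only, here is a minimal Python sketch of that filter. The `Repo` type, its fields, and the function names are hypothetical, not anything GitHub has published; the sketch also assumes the date ranges are inclusive and that the Nov. 13 announcement refers to GitHub Universe 2019. The gh-pages rule isn't modeled, since it simply rides along with any repository that already qualifies.

```python
from dataclasses import dataclass, field
from datetime import date

# Dates quoted in GitHub's criteria (announcement year assumed to be 2019).
ANNOUNCEMENT = date(2019, 11, 13)
SNAPSHOT = date(2020, 2, 2)
YEAR_BEFORE_SNAPSHOT = date(2019, 2, 2)


@dataclass
class Repo:
    """Hypothetical stand-in for a repository's metadata."""
    name: str
    stars: int
    commit_dates: list[date] = field(default_factory=list)


def has_commit_between(repo: Repo, start: date, end: date) -> bool:
    """True if any commit falls in [start, end] (inclusive range is an assumption)."""
    return any(start <= d <= end for d in repo.commit_dates)


def qualifies_for_vault(repo: Repo) -> bool:
    """Sketch of the three published inclusion rules."""
    # Rule 1: any commit between the announcement and the snapshot.
    if has_commit_between(repo, ANNOUNCEMENT, SNAPSHOT):
        return True
    # Rule 2: at least 1 star and any commit in the year before the snapshot.
    if repo.stars >= 1 and has_commit_between(repo, YEAR_BEFORE_SNAPSHOT, SNAPSHOT):
        return True
    # Rule 3: at least 250 stars, regardless of commit activity.
    return repo.stars >= 250


if __name__ == "__main__":
    print(qualifies_for_vault(Repo("side-project", 0, [date(2019, 12, 1)])))   # True (rule 1)
    print(qualifies_for_vault(Repo("old-hobby", 0, [date(2019, 6, 1)])))       # False
    print(qualifies_for_vault(Repo("popular-lib", 300)))                       # True (rule 3)
```

Note how the rules overlap: a zero-star repository only needed a single commit in the final weeks before the snapshot, while a dormant one needed 250 stars.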

The Norwegian data-storage company Piql, whose custom film and archiving technologies will allow the project to store terabytes of data for over 1,000 years, brags that code is now headed into the “gold standard of long-term data storage.”

But besides offering “vault storage services,” Piql also offers a unique form of data digitization. Piql is storing the code on hundreds of reels of film made from polyester and silver halide. Bloomberg points out they’re coated with an iron oxide powder “for added Armageddon-resistance.” Each of its microfilm-like frames holds over 8.8 million pixels. Piql explains that its method involves converting 1’s and 0’s into QR codes. “No electricity or other human intervention is needed as the climatic conditions in the Arctic are ideal for long-term archival of film,” explained a Piql web page.

“By using a self-contained and technology-independent storage medium, future generations will be able to read back the information,” according to Piql. The project also includes instructions on how to unpack and read the code.

Bloomberg even notes that there’s a treaty in place which keeps Svalbard neutral in times of war. Because it’s all stored on offline film reels, GitHub doesn’t have to worry about power outages. An added layer of security comes from its remote location. One GitHub video points out that the Svalbard archipelago is home to the northernmost town in the world, as well as thousands of polar bears. The video’s description notes that though it’s called the GitHub “Arctic Code Vault,” it’s actually closer to the North Pole than to the Arctic Circle.

Response Time

It’s been fun to watch the reactions to GitHub’s video. “The future will be amazed by my JavaScript Calculator,” joked one comment.

Others couldn’t resist commenting on the Arctic location. (“Now my code can freeze before it even gets run…”) Another naysayer even quipped, “When your code is so bad that you need to bury it under the permafrost…”

GitHub’s FAQ says the company plans to re-evaluate the program (and its storage medium) every five years — at which point it’ll decide whether to take another snapshot.

And if you’re curious what it’s like in a Svalbard mine, a nearby coal mine is offering tours. “Most of Svalbard’s old Norwegian and Russian coal mines have shut down,” explains Bloomberg, “so locals have rebranded their vast acres of permafrost as an attraction to scientists, doomsday preppers, and scientist doomsday preppers.”
