GDAL: The Open Source Technology Behind Google Maps
If I had to pick an innovation that has dramatically changed my life for the better, Google Maps would top the list. Google Maps used to make my commute (remember those?) manageable, and more generally it has helped me navigate the physical world, including once guiding me up a steep couloir in Utah’s Wasatch Mountains before the sun rose to light the way. “Are we in the right or left fork of the Y Couloir?” I asked my ski partner. “Let me check Google Maps.”
He might just as well have said, “Let me check GDAL,” because the open source Geospatial Data Abstraction Library is a key part of Google Maps (not to mention Google Earth, NASA’s planetary cartography, Uber’s mapping functionality, and every other major imagery provider in the world). GDAL isn’t the only geographic information system (GIS) but, as Robert Simmon has written, “the most pervasive GIS software is expensive [and] difficult to learn.” By contrast, he continues, GDAL is “free, broadly supported, constantly updated, and runs on almost anything.”
GDAL, in short, may not be a household name, but it’s everywhere and helps us to map everything. Or, as the folks at Google Earth put it, “GDAL makes the world go round.”
The reasons for that broad utility arise from its diverse open source community. To better navigate the GDAL community, I talked with Even Rouault, a lead maintainer and prominent contributor to GDAL since 2007.
Born for Geospatial
GDAL, launched in 1998, is primarily a translation tool, able to read and write more than 250 different file formats (databases or protocols, raster or vector), most of them geospatial. Do GIS/mapping vendors and users absolutely need GDAL? Perhaps not. Instead, they could write their own proprietary readers and/or writers for the file formats they need (and there are many). It’s “only” 1.4 million lines of code….
Or they could instead use the free and open source GDAL, a bit of a “Swiss army knife” for geospatial, as Rouault described it. GDAL has also become important in cloud computing, as it includes a virtual file system layer for accessing the storage services of most cloud vendors, including AWS. As Rouault says, “You can do things like reading the metadata of a GeoTIFF file stored in a multi-gigabyte ZIP sitting on Amazon S3 with just a few kilobytes actually retrieved.”
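To make that last point concrete, here is a minimal sketch of how such a request might look from GDAL's Python bindings. It assumes GDAL's documented path-chaining convention, in which the /vsizip/ prefix is stacked on top of a /vsis3/ path. The bucket, archive, and file names are hypothetical, and the two helper functions are illustrative names rather than part of GDAL. Actually reading the file would also require the bindings to be installed and AWS credentials to be configured.

```python
def vsi_zip_on_s3(bucket: str, zip_key: str, member: str) -> str:
    """Build a chained GDAL virtual file system path for a file inside a ZIP on S3."""
    # /vsis3/ addresses the archive object in the bucket;
    # /vsizip/ stacked in front of it addresses a member inside that archive.
    return f"/vsizip//vsis3/{bucket}/{zip_key.strip('/')}/{member.strip('/')}"


def read_metadata(path: str):
    """Return the gdal.Info() report for `path`, or None if the bindings are missing."""
    try:
        from osgeo import gdal  # GDAL's Python bindings, if installed
    except ImportError:
        return None
    gdal.UseExceptions()
    # Only the metadata is requested here, not the pixel data.
    return gdal.Info(path, format="json")


# Hypothetical bucket and file names, for illustration only.
path = vsi_zip_on_s3("example-bucket", "archives/scenes.zip", "scene_001.tif")
print(path)  # /vsizip//vsis3/example-bucket/archives/scenes.zip/scene_001.tif
```

Calling read_metadata(path) against a real archive is what would trigger the small ranged reads Rouault describes. The sketch stops short of that call because the names above do not point at real data.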
But whether in the cloud or elsewhere, most of the time someone uses C/C++ GIS software, they’re depending on GDAL in some way, sometimes without even knowing what GDAL is. It’s just that pervasive and foundational in geospatial.
Though Rouault didn’t start using GDAL until 2005, he became familiar with one problem GDAL solves as a teenager. His father, who happened to work in geospatial, needed his help translating file formats so that he could exchange data between different GIS software. Years later Rouault worked with digital mapping as a software engineer. His employer decided to switch from a homegrown GIS stack to an open source alternative that was based on MapServer, which relies on GDAL for raster (pixel) data access.
This is where we start to see the power of open source. Rouault worked in the defense industry in 2005, home to incredibly niche formats (NITF, anyone?). Supporting such formats is hard to justify in a proprietary program, absent big dollar commitments, but in an open source project like GDAL, it’s easier, because developers who need the support can contribute the requisite code.
Developers like Rouault
Rouault discovered that some of the defense industry-specific formats were already supported by GDAL. However, he said there were a few bugs or compliance issues with formal test suites, and he started patching GDAL to address them. Rouault enjoyed his work with GDAL so much that it became his after-hours source of relaxation, too, “because it was fun.” It helped that the project maintainers tended to like the patches he was suggesting, quickly making him a key part of the GDAL community.
In fact, after a few years of contributing, Rouault now runs a GDAL consulting company, Spatialys, and works on GDAL full-time. What started as a way to do his job became his hobby which, in turn, became his job. It’s an optimistic tale of how open source sustains itself.
Not that the story is all happiness and sunshine.
More than 15 years ago, I wrote about the importance of modularity to open source projects, because modular code makes incremental development easier. Rouault says GDAL is quite modular, which allows unfamiliar contributors to focus on their particular needs/corners of the project without worrying about GDAL in its entirety. This has helped the project to thrive, he says, but it has introduced problems as well. He notes that the modular code has enabled high-velocity contribution without massive time investment, but has also limited GDAL’s ability “to build up a cadre of sticky core contributors to help for bug triaging, fixing, issuing releases, reviewing pull requests, ensuring continuous integration keeps running, and all the other countless maintenance tasks.”
Catch that? By enabling “drive-by development,” too many GDAL contributors work on their part of GDAL and then drive away, as it were.
With all that said, of course even such drive-by contributions are useful. After all, one of the reasons GDAL is so attractive is its expansive file format support. This principle has helped other projects, too. As Gerald Combs, founder of the Wireshark project, said in an interview, “Our drive-by [contributions] usually come in the form of a new or updated protocol dissector. Each new dissector makes Wireshark incrementally more useful and grows our community a bit, which means more useful feedback for long-term developers.”
The same is true for GDAL.
More Contributors, Please!
What kind of contributions are GDAL developers making? According to Rouault, companies tend to develop and maintain specific drivers for the format or remote service that ties into their own products. Or maybe someone contributes to support their own country-specific formats. Although important, such contributions don’t help to sustain the core upstream project. There are, however, examples of contributions that do feed the upstream.
Rouault said one large company contributed numerous code cleanups to improve the standards of the codebase. A member of that team is a power user who helps with bug triaging and assists people on the mailing list, which is great, but more contributions to GDAL would be very welcome.
“The core developer base of GDAL is quite static,” Rouault says, and it has been challenging to expand this base of repeat contributors, as well as to financially sustain GDAL. While the project has had some success raising money to address specific, discrete areas of development, funding for more significant, long-term goals, such as deep architectural refreshes, has “been lacking.”
As such, when asked to name one thing to improve GDAL, Rouault’s response is unsurprising, if unfortunately very necessary: “More contributors and especially co-maintainers!”
This doesn’t even necessarily mean code contributions. As Rouault suggested, “For people wanting to contribute to GDAL development, that can be through enhancing documentation, or just trying to address the issue they are facing, or trying to implement the new feature they want in an existing one.” Given just how deep and broad the dependence on GDAL is, there is no shortage of potential contributors to the GDAL code, documentation, or other areas of the project.
If you’re involved with GDAL and have yet to contribute, maybe now is the time to start.