
Datasette: A Developer, a Shower and a Data-Inspired Moment

18 Jun 2020 9:00am, by Matt Asay

This is part of a series on Open Source Builders. For a list of other articles in this series, check out the introductory post.

Amazon Web Services (AWS) sponsored this post.

Matt Asay
Matt is a principal at AWS and has been involved in open source and all that it enables (cloud, machine learning, data infrastructure, mobile, etc.) for nearly two decades, working for a variety of open source companies and writing regularly for InfoWorld and TechRepublic. You can follow him on Twitter (@mjasay).

We tend to think of open source as a community, even as an altruistic endeavor. But sometimes a project can be “aggressively open source,” as Datasette founder Simon Willison puts it, for a bunch of “very selfish reasons.” Willison has written a great deal of open source software (he co-created Django) and proprietary software (he co-founded Lanyrd) in his career, but he says he turns to open source as a “creative outlet” that lets him retain a measure of independence, even when full-time employment constrains it.

You heard that right: Open source doesn’t always need to save the world. It’s enough if it saves an individual developer’s sanity.

Of course, many others do benefit from Willison’s open source contributions. But even if they didn’t, Willison would continue to open source his code. Why? Because it pushes him to write better software, even as it helps him keep current with the wide variety of projects (285 repositories and counting) to which he contributes. It’s an incredibly efficient way to build. One thing he enjoys about open source is that once he solves a problem, he never has to solve it again: “The code will be out there forever,” he says.

Indeed, Willison’s approach to open source offers deep insights into how to set up a project for maximum community benefit — even when it’s a community of one.

Of Showers and Steam

As Willison explained in an interview, his need for a tool like Datasette, which explores and publishes data, became clear to him in 2009 while he was working at The Guardian, a UK newspaper. There he helped start the data blog, a place to publish the data behind The Guardian’s stories. Though Willison ended up using Google Sheets to publish the underlying static data sets, he longed for a better way to publish and query that data.

Years later, he had his Datasette “shower moment.”

“It was literally a moment and I was literally in the shower,” he says.

First, he reasoned, cloud providers were making it cheap and easy to host dynamic code. Second, he could put SQLite, a widely used, public-domain relational database, in places where databases don’t normally live.

“If I get SQLite, then export the data, and I bundle it in with a Docker container with a little app that can give you an interface and stuff … Maybe that’s a really interesting space to be exploring,” he says.

And it was. Datasette was born.
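The first step of that idea, loading exported data into a single SQLite file that a small app can then serve, can be sketched with Python’s built-in sqlite3 module. This is a minimal illustration under stated assumptions, not Datasette’s own code: the function name export_to_sqlite, the table name and the sample rows are hypothetical. A file written this way (for example, data.db) is the kind of input Datasette is pointed at, e.g. `datasette data.db`; bundling it into a Docker container is a separate step not shown here.

```python
import sqlite3
from typing import Iterable, Sequence


def export_to_sqlite(path: str, table: str, columns: Sequence[str],
                     rows: Iterable[Sequence]) -> sqlite3.Connection:
    """Hypothetical helper: load rows into a SQLite table.

    Pass a file path such as "data.db" to persist, or ":memory:" for a
    throwaway database.
    """
    conn = sqlite3.connect(path)
    # Identifiers are double-quoted; this sketch assumes trusted table/column names.
    col_defs = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    # Values go through ? placeholders, so sqlite3 handles quoting and escaping.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return conn


# Hypothetical sample data, for illustration only.
conn = export_to_sqlite(":memory:", "items", ["name", "value"],
                        [("alpha", 3), ("beta", 5)])
for row in conn.execute("SELECT name, value FROM items ORDER BY value DESC"):
    print(row)
```

Once the data sits in SQLite, any question about it becomes a SQL query, which is what makes a generic browsing and querying interface possible.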

But Datasette wasn’t merely a matter of tackling a technical problem. It was also a way for Willison to express himself and to maintain independence. Throughout his career, Willison had worked within larger organizations. Such employment brings privileges, but it also constrains a developer’s ability to build. He says part of his reason for starting Datasette was to have a creative outlet.

“It was an opportunity for me to get really deep into technology again,” he explains. “I wanted a project where I got to decide what to build and have that as my own personal place. It was almost a way of blowing off steam.”

Open Source Maintenance Group with a Population of One

Today Datasette has roughly 30 contributors, but most of the work is done by Willison. He’s OK with that. In fact, it’s great, he says, because outside contributions aren’t an unalloyed good.

“There’s a dream that you wake up one morning and there’s a beautiful, shining pull request with a new feature, but actually, if that happens, it’s kind of stressful,” Willison says. “Because you then have to go through the code and review it to make sure it fits the wider model of what you’re trying to build.”

One alternative, he says, is a plugin architecture, which he built into Datasette almost from the start.

“The beautiful thing about plugins is that I don’t have to give anyone permission to load your plugin, and they don’t even have to talk to me,” Willison explains. He says he could wake up one morning to find that Datasette has a brand-new feature, without his having done anything. “And it doesn’t cause any harm to the core if it’s low quality or if it doesn’t quite work,” he says.

Not only does this plugin architecture protect the Datasette core from others’ bad code, but it also protects the core from Willison’s bad code. “I have a lot of crazy ideas, and I don’t want to put those into the core because maybe they’re terrible ideas, but there’s no harm at all in me putting them into a plugin,” he says.

The dream is for Datasette to become a bit like WordPress, whose plugin architecture served as his model. “WordPress is a very decent CMS with 7,000 plugins that mean it can do anything. I want Datasette to be a ‘decent sort of engine’ for data analysis and exploration with 7,000 plugins that mean that if you want a map or chart or all of these different things, it’s all available to you,” Willison says.
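To make the plugin idea concrete, here is a generic hook registry in plain Python. It is a hypothetical sketch, not Datasette’s implementation (Datasette’s real hooks are built on the pluggy library and have their own names and signatures); the class PluginHost and the hook name format_value are invented for illustration. It shows the two properties Willison describes: plugins attach to the core without the core changing or anyone granting permission, and a failing plugin is isolated instead of taking the core down with it.

```python
from typing import Any, Callable, Dict, List


class PluginHost:
    """Hypothetical host: the core exposes named hooks, plugins register callbacks."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[..., Any]]] = {}
        self.errors: List[str] = []  # plugin failures are recorded, not raised

    def register(self, hook: str, func: Callable[..., Any]) -> Callable[..., Any]:
        # No approval step: any code can attach itself to a hook.
        self._hooks.setdefault(hook, []).append(func)
        return func

    def call(self, hook: str, *args: Any) -> List[Any]:
        """Run every callback for `hook`, collecting results from those that succeed."""
        results = []
        for func in self._hooks.get(hook, []):
            try:
                results.append(func(*args))
            except Exception as exc:
                # A low-quality plugin is skipped; the core keeps running.
                self.errors.append(f"{func.__name__}: {exc}")
        return results


def shout(value: Any) -> str:
    # A working (hypothetical) plugin.
    return str(value).upper()


def broken(value: Any) -> str:
    # A half-finished (hypothetical) plugin that always fails.
    raise ValueError("not finished yet")


host = PluginHost()
host.register("format_value", shout)
host.register("format_value", broken)
print(host.call("format_value", "hello"))  # ['HELLO']
print(host.errors)  # ['broken: not finished yet']
```

The design choice is that the core only defines where extension happens; what a plugin does, and whether it works, stays outside the core.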

Open Source for ‘Very Selfish Reasons’

If Willison’s plugin approach protects him and others from bad code, his embrace of open source is perhaps the best way he’s found to ensure he writes good code in the first place.

“Datasette is aggressively open source for a bunch of reasons,” Willison says. “Most of them are very selfish reasons.”

For starters, he explains, he’s written a lot of closed-source software in his career, including at Lanyrd, which he and his wife and co-founder, Natalie Downe, sold to Eventbrite. Once you leave that employer, however, you don’t get to use that software again in the future, Willison says.

He points out that even if you’re writing software on your own and not for your employer, if it’s not open source you don’t really build it with a mind to having other people use it, which means it can go stale. When working in the open, by contrast, Willison says, “It forces me to write good code. It forces me to write really good documentation.” Not to mention great unit tests.

That documentation, including associated comments, helps to tell the story of his code. While that may prove useful for outside contributors, the docs also help Willison pick up where he left off. Willison maintains 73 open source projects, and he says the only way to maintain 73 projects is to treat every single one of them as if you’re not a core maintainer. Each must have a README, tests, and detailed issue threads discussing what he was working on. “Because then you can drop back into them after a six-month gap and be productive with them,” he says.

“I’m an open source maintenance group with a population of one,” Willison says. “I’m taking the lessons I learned [at Eventbrite], a 600-engineer organization, and applying them to a one-engineer organization.”

A moment of truth now looms for Willison. For the past year, he’s been a John S. Knight Journalism Fellow at Stanford University, getting paid to tinker with his Datasette dream. His current thought is to do freelance development using Datasette to solve interesting problems related to data journalism for a variety of organizations, which he says could be a great way to figure out what the software should do.

Are you interested in helping Willison shape the future of data journalism? Visit the Datasette page to contribute code, thoughts, documentation, and more.

If you or someone you know needs help with a different open source project, visit the AWS Open Source site to learn how open source projects can apply for AWS promotional credits.

Feature image via Pixabay.
