Bit.io Offers Serverless Postgres to Make Data Sharing Easy
When Tony Grant, a UK-based digital marketing professional, set out to analyze data around local businesses, he found Excel too limited and too memory intensive, even on a powerful Windows machine.
If someone searched for a plumber in his home city of Lincoln, for instance, he wanted to be able to automatically find the three other nearest plumbers.
“This sounds easy enough, but when you start to understand how UK postcodes work, it becomes less helpful. You cannot just assume that similar postcodes are near to each other, and you also need to factor in that there are almost 2.25 million different postcodes in England alone, not including Wales, Scotland or Northern Ireland,” he explained in an email.
Then there was the problem that postcodes are not naturally adjacent and do not equate to a set area of land. A UK postcode could refer to just one house, or it could refer to several hundred, if not thousands.
The logical answer was to turn to geo-coordinates, but that would triple the data set, making this an impossible task with Excel’s limit of around 1 million rows per worksheet.
So he set out to find an online tool for the project, but found most of the options limited and expensive as well. Amazon Web Services he found too technical and jargony for a layman, Airtable too sales-focused. Then he stumbled upon bit.io, which promised to solve these problems and be simple.
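To make the lookup Grant describes more concrete, here is a minimal Python sketch of a nearest-neighbour search over geo-coordinates. It uses the standard haversine formula to approximate great-circle distance. The business names and coordinates are invented for illustration, and nothing here reflects how Grant actually built his project.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_others(origin, businesses, k=3):
    """Return the k businesses closest to `origin`, excluding `origin` itself."""
    others = (b for b in businesses if b["name"] != origin["name"])
    return heapq.nsmallest(
        k,
        others,
        key=lambda b: haversine_km(origin["lat"], origin["lon"], b["lat"], b["lon"]),
    )


# Hypothetical sample data: names and coordinates are made up.
PLUMBERS = [
    {"name": "A", "lat": 53.2307, "lon": -0.5406},
    {"name": "B", "lat": 53.2350, "lon": -0.5450},
    {"name": "C", "lat": 53.2600, "lon": -0.5000},
    {"name": "D", "lat": 53.4000, "lon": -0.7000},
    {"name": "E", "lat": 52.9500, "lon": -1.1500},
]

closest = nearest_others(PLUMBERS[0], PLUMBERS, k=3)
print([b["name"] for b in closest])
```

A brute-force scan like this is fine for a handful of rows. At the scale of millions of postcodes, the same comparison would more plausibly be pushed into the database as a query.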
“Bit.io was a dream. Even the free starter table had something like 10 million rows. Even if I added Scotland and Wales, I would still be only 25% full,” he said.
Tapping the Postgres Ecosystem
Bit.io enables users to quickly create a full-featured serverless Postgres database and share it with team members or clients easily.
“You can go to Bit today and without signing up, drag a file and get a database,” said Bit co-founder and CEO Adam Fletcher. “We just want people to use the software and find the value right away. That was a key tenet for us.”
These databases work with any tool that works with Postgres. Users don’t even have to set up an account, although doing so unlocks more features.
You can load data by dragging and dropping files, entering a data file’s URL, sending data from R or Python applications or using just about any other Postgres or HTTP client. It has an in-browser SQL editor or you can work in tools like R, Python, Jupyter notebooks, the command line and more.
Getting to Value Quickly
Fletcher previously was head of technology for cybersecurity vendor BlueVoyant, director of engineering at healthcare analytics platform Nuna, and a site reliability engineer at Google. Co-founder Jonathan Mortensen, a data scientist at Stanford, led data science at medical and cybersecurity companies. BlueVoyant bought out their developer tools company Gyroscope Software.
“Everywhere we’ve been we’ve encountered the same data problems,” Fletcher said. “In particular, there’s the problem around productivity with data and effectiveness with data and iteration speed.
“When we left the company that bought my last company, we said, ‘Let’s build that thing, just one last time for everybody, right? Like everybody seems to have this problem.’”
Basically they wanted to alleviate the headaches of data ingestion and data sharing. Making it serverless means users don’t have to manage infrastructure.
Bit.io uses algorithms to determine the structure for CSV, JSON and other data for loading into Postgres.
“It’s actually quite hard in any traditional relational database to get data in,” Fletcher said. “You have to do it in a programming language. You have to do it with command line tools that are hard to use and not native. And what we just said was, ‘Look, people have these datasets. All we want to have them do is drag and drop the data they have without having to do any work on it, and it turns into a real Postgres database.’
“In order to do that, we had to solve a couple of problems, one with CSV files in particular and Excel files, you know, sort of the regular tabular data you see all over the place. … there’s no schema, right? And so we used a bunch of open source tools, and then we modified them and wrote a bunch of technology on top of that, so we can do things like say, ‘Oh, this column is a string, this column is an integer, this column is you know, whatever.’ And type it into Postgres columns such that … you have a real schema, real columns, real data types, and then support all of the millions of corner cases that show up when you do that,” he said.
“So it’s really just taking on that kind of engineering / dirty data problem upfront, and making sure it’s as easy as possible.”
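The schema problem Fletcher describes can be illustrated with a deliberately simple Python sketch: read a CSV, guess a Postgres type for each column, and emit a CREATE TABLE statement. This is an assumption-laden toy, not bit.io's implementation. The function names are hypothetical, only three types are considered, and it ignores the "millions of corner cases" Fletcher mentions, such as dates, quoted numbers and mixed encodings.

```python
import csv
import io


def infer_type(values):
    """Guess a Postgres type for one column from its string values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "text"  # nothing to go on, so fall back to text
    # Try the narrowest type first, then widen.
    # Note: Python's int() and float() accept some inputs Postgres would reject.
    for pg_type, caster in (("bigint", int), ("double precision", float)):
        try:
            for v in non_empty:
                caster(v)
            return pg_type
        except ValueError:
            continue
    return "text"


def infer_schema(csv_text):
    """Map each header name to an inferred Postgres type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return {
        name: infer_type([row[i] for row in data if i < len(row)])
        for i, name in enumerate(header)
    }


def create_table_sql(table, schema):
    """Build a CREATE TABLE statement from an inferred schema."""
    cols = ", ".join(f'"{name}" {pg_type}' for name, pg_type in schema.items())
    return f'CREATE TABLE "{table}" ({cols});'


# Hypothetical sample file contents.
SAMPLE = "postcode,households,lat\nLN1 1AA,12,53.2\nLN2 2BB,,53.3\n"

schema = infer_schema(SAMPLE)
print(create_table_sql("places", schema))
```

Empty cells are skipped during inference on the assumption that they would load as NULL. A real loader would also need to decide what to do when a later row contradicts the type guessed from earlier rows.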
Bespoke Control Plane
Then there’s the sharing problem, which Fletcher describes as “hard in a different way.”
Sharing is hard, he said, because of authentication and authorization models that come with Postgres. They decided to take a page from GitHub and built authorization and authentication into a control plane outside of Postgres.
Extracting user and security information and sending it to the Bit control plane to ensure users have the correct access took what Fletcher called “a complex bit of programming on that side too.”
Users can make their data either public or private. COVID data, for example, have been popular public data sets that can be shared with read-only access.
“We’ve had some really interesting stuff like liquor consumption in Iowa. … like during the pandemic, do people drink more? Iowa happens to track every sold bottle of liquor and beer and everything like that, right? Like, what a fun piece of data!” he said.
But sharing internally is important too. As a member of a finance team, for instance, you could share billing information as read-only. And when a team member leaves the company, access can be revoked with the click of a button.
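As a rough illustration of the idea of keeping access rules in a layer outside the database, the Python sketch below models grants, public read-only data sets and one-step revocation. Every class and method name here is hypothetical. It is a toy model of the concept, not a description of bit.io's control plane or its API.

```python
from enum import Enum


class Role(Enum):
    READ = 1
    WRITE = 2


class ControlPlane:
    """Toy access-control layer that sits in front of the databases."""

    def __init__(self):
        self._grants = {}     # (user, database) -> Role
        self._public = set()  # databases readable by anyone

    def grant(self, user, database, role):
        self._grants[(user, database)] = role

    def revoke(self, user, database):
        # Removing the grant is all it takes to cut off access.
        self._grants.pop((user, database), None)

    def make_public(self, database):
        self._public.add(database)

    def can(self, user, database, needed):
        """Return True if `user` holds at least the `needed` role."""
        role = self._grants.get((user, database))
        if role is None and database in self._public:
            role = Role.READ  # public data sets are read-only
        return role is not None and role.value >= needed.value


# Hypothetical users and database names.
plane = ControlPlane()
plane.grant("finance_analyst", "billing", Role.READ)
plane.make_public("covid_data")

print(plane.can("finance_analyst", "billing", Role.READ))
print(plane.can("finance_analyst", "billing", Role.WRITE))
plane.revoke("finance_analyst", "billing")
print(plane.can("finance_analyst", "billing", Role.READ))
```

In a design like this, a proxy would consult the access layer before forwarding a connection, so the database itself never has to know about individual users.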
In addition to focusing on ingestion and sharing, the team built out proxying and an API so you can do this all programmatically. Combining the Postgres ecosystem with a serverless model means users don’t have to manage the database or worry about scaling. He said the system has been tested to 25,000 transactions per second.
“It just scales up automatically. It just kind of grows as you go and then it scales down. And if you’re not using it, it shuts itself off,” he said.
“I think that people had this idea that the cloud would be completely elastic and flexible, and you wouldn’t have to worry about hardware anymore. And in many ways, that has not yet been true,” he said in an interview.
“What the team at bit.io is doing is helping the cloud live up to its promise for data by creating a serverless offering, which means that when you create a database there, you don’t have to tell the thing how much hardware to run it on or what type of hardware to run it on. It just gives you an endpoint in the cloud that grows and expands as you need it without being limited to the granularity of machines.
“And in particular, when you have lots of little new experimental things, there can be a lot of overhead for that, right? Your new experimental application might only need 3% of a machine and you don’t want to allocate a whole machine or even … equivalent of a quarter of a machine — that still may be a lot more than you need. And it doesn’t necessarily sound like a big problem. You’re renting still a fairly small slice, and it’s not that expensive. But in terms of fostering experimentation and innovation, really being able to use just what you need makes a big, big difference. You may want to let 1,000 flowers bloom and each flower isn’t worth spending 100 bucks on [because] you don’t know that anything is going to come of it.
“That’s why I think a lot of people are excited about this idea of a serverless database that you just use as much as you need. Maybe it’s really tiny, maybe it’s medium, eventually, maybe it grows into something large. But it’s really great for experimentation. It’s great for getting started.”
Bit.io is far from the only serverless database, though. It competes with serverless offerings including CockroachDB, PlanetScale, Amazon Aurora and DynamoDB, Google Firestore and Fauna DB.
The San Francisco-based startup announced general availability of its Database as a Service product in October, as well as $7.5 million in seed funding led by Battery Ventures and GreatPoint Ventures. Founded in 2019, the company’s product had been in private beta since 2021.
It has since grown to more than 15,000 users, including companies like Ford, Visa and Morgan Stanley, and use cases including production OLTP (online transaction processing) workloads, building web applications, low-code/no-code backends, data analysis and mobile applications.
The free tier allows users to create up to three free databases with 3GB of storage and query 1 billion rows with data access via any Postgres-compatible tool. It supports all major programming languages, business intelligence tools such as Tableau and PowerBI, and ETL (extract, transform, load) tools such as Airbyte, Airflow, and Dagster. It also offers the ability to migrate from the open source SQLite database.
“bit.io has very simply answered almost all of my initial needs. It delivered a truly simple service that doesn’t baffle the normal user, and doesn’t complain when I try to throw 2.5 million rows of data at it,” Grant said.
Almost as an afterthought, he added that he is blind and can only access the computer through the keyboard, not a mouse.
“This tool works perfectly under this accessibility situation, so another thumbs-up from me.”