PlanetScale Rewind: An ‘Undo’ Button for Bad Schema Migrations
In his first week as a database administrator for GitHub, Sam Lambert crashed the website while making schema changes.
“I wanted to be super impressive. It’s my first week on; I want to help out. They thought they’d cleaned up all the pieces of the code that were accessing certain database columns, like, you know, this column we don’t use anymore, I want to clean it all up, get the models in the app looking good,” he said of readily going to the command line to make the requested changes.
The platform was down for two hours and when it came back up, the data from the moment of the change had been lost.
“I can feel the response again, now in my stomach. It’s horrible. Truly horrible,” the CEO of PlanetScale recalled.
It’s a common mistake and one that PlanetScale takes aim at with Rewind, an “undo” button that enables users to recover in seconds from schema changes that break production databases. It is, in effect, the equivalent of Control-Z: it lets you roll back a schema change without losing any data written after that change was made.
“This is very common, right?” explained PlanetScale vice president of engineering Nick Van Wiggeren. “Engineers think, ‘I can drop this column because nothing is using it, or I can change this table and my code will handle it.’ Common mistake. They think they’re right; they’re not. And then all of a sudden, they end up in a situation where they’ve broken production or they’ve broken their database.”
He describes Rewind as the “magic button” that lets you step back in time.
“So you want to be signing up new users — you’re always signing up new users. … But if you’ve broken a different part of the website, you don’t want to just rewind to when you made the schema migration because then you lose all the new registrations that you made in that interim period. So what Rewind is able to do is actually plot out here’s how I can undo the migrations I just did, while still making sure that the database is operational. And all the data that was written is functional.”
PlanetScale is a serverless MySQL database built atop Vitess, the open source project developed at YouTube to scale its databases horizontally during YouTube’s years of hypergrowth. Basically, Vitess is a sharding middleware system that sits between your applications and the underlying MySQL shards, presenting the database as a single entity so your application doesn’t have to keep track of which shard holds the data being queried.
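To make the idea of a routing layer concrete, here is a minimal Python sketch of middleware that hides several shards behind one interface. It is not how Vitess itself maps rows to shards; the `ShardRouter` class, the hash-modulo placement and the in-memory dictionaries standing in for MySQL instances are all assumptions made for illustration.

```python
import hashlib


class ShardRouter:
    """Toy routing layer: presents several shards as one logical store.

    Hypothetical illustration only; each dict stands in for a MySQL shard.
    """

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # Hash the key so the same key always lands on the same shard.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, row):
        self.shards[self._shard_for(key)][key] = row

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)


# The caller never names a shard; the router decides where the row lives.
router = ShardRouter(shard_count=4)
router.put(42, {"name": "Ada"})
```

The application code only calls `put` and `get`, which is the property the article describes: the shard layout stays an internal detail of the middleware.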
Vitess became a graduated Cloud Native Computing Foundation project in 2019. Its users include GitHub, HubSpot, Slack, SoundCloud and Square.
PlanetScale announced the general availability of its zero-downtime schema migration technology in November after a six-month beta.
It also introduced a beta technology based on the Vitess VReplication feature that lets users easily import data from any existing MySQL database. Van Wiggeren calls it a “jetpack” atop MySQL and Vitess. Rewind is built on that technology.
“It’s able to take the changelogs, much like a binary log from MySQL, and when you perform a schema migration is able to transform those changes and say, ‘Here’s what the data looks like after the migration. And here’s what the data looked like before the migration,’” he explained.
For 30 minutes after a migration, it can make sure you have a transactionally accurate, up-to-date picture of the database. You can flip between the two within a second to make sure everything is correct.
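The mechanism described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not PlanetScale’s implementation: the `RevertibleMigration` class is hypothetical, plain dictionaries stand in for tables, and the only migration modeled is dropping a single column. The point it illustrates is that every write made after the migration is also translated back into the old schema’s shape, so reverting does not discard new rows.

```python
class RevertibleMigration:
    """Toy model: keep pre- and post-migration copies of a table in sync.

    Hypothetical illustration; the migration modeled is "drop one column".
    """

    def __init__(self, rows, dropped_column, default=None):
        self.dropped_column = dropped_column
        self.default = default
        # Copy with the pre-migration shape (still has the dropped column).
        self.old_table = {pk: dict(row) for pk, row in rows.items()}
        # Copy with the post-migration shape (column removed).
        self.new_table = {pk: self._to_new(row) for pk, row in rows.items()}
        self.active = "new"

    def _to_new(self, row):
        return {k: v for k, v in row.items() if k != self.dropped_column}

    def _to_old(self, pk, row):
        # Keep the dropped column's last known value, or fall back to a default.
        previous = self.old_table.get(pk, {})
        old_row = dict(row)
        old_row[self.dropped_column] = previous.get(self.dropped_column, self.default)
        return old_row

    def write(self, pk, row):
        """Apply a write made against the new schema to both copies."""
        self.new_table[pk] = dict(row)
        self.old_table[pk] = self._to_old(pk, row)

    def revert(self):
        self.active = "old"

    def current(self):
        return self.old_table if self.active == "old" else self.new_table


m = RevertibleMigration(
    {1: {"email": "a@example.com", "legacy_flag": True}},
    dropped_column="legacy_flag",
)
m.write(2, {"email": "b@example.com"})  # a new signup after the migration
m.revert()                              # undo the schema change
```

After `revert()`, `m.current()` has the old shape again and still contains the row written after the migration, with the restored column filled by the assumed default.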
It’s all built on Vitess’s ability to understand and manipulate MySQL. “To the user, it just looks like how a database should work,” Van Wiggeren said.
Such schema changes are a major source of outages, and recovering from them generally requires hours, if not days, of manual work: comparing database backups and combing through binary logs to find the exact point at which a table was dropped. Instead of having teams of DBAs working on this, Rewind brings an “easy button” to the developer to quickly fix it, according to the company.
Removing Developer Risk
Lee Robinson, director of developer relations at Vercel, called it an exciting feature.
“From my perspective as a frontend developer, PlanetScale is helping lower the barrier to entry for any developer wanting to use a database. Features like Rewind would typically only be possible for those with lots of backend or DBA experience and are now being made accessible to a lot more developers,” he said.
“Rewind is analogous with frontend development too. You push a change that breaks your frontend, you rollback. Now, you push a change that breaks your backend, you Rewind. It de-risks developers from innovating and trying new things.”
PlanetScale announced a $30 million Series B in June led by Insight Partners and a $50 million Series C in November led by Kleiner Perkins, with existing investors a16z, SignalFire, Insight Partners and others participating. The New Stack is a wholly-owned subsidiary of Insight Partners.
Focused on User Experience
PlanetScale calls itself a developer-first database, one that requires no knowledge of cloud zones, cluster sizes and other database-centric details that increase complexity for developers. It maintains that Rewind is a feature no other database has.
RedMonk analyst Stephen O’Grady said that to his knowledge, that is true.
“Historically, database design has focused more on core features like performance, stored procedures, triggers and so on, and less on the user experience. Unlike in the application development world, therefore, the experience of working with databases hasn’t changed all that much over the years,” he said.
Lambert maintains that even the databases introduced in the past few years, including Azure Cosmos DB, CockroachDB, Fauna, MongoDB Atlas, DataStax Astra and others, have taken a “tone-deaf” approach to user experience by focusing too much on storing data and too little on developer pain, though those vendors would likely dispute that.
Added Van Wiggeren: “Over the last 10 or 15 years, entire industries, and, you know, trillions of dollars of value have been created by making developers’ lives easier — Kubernetes, serverless pushed with Lambda … all the things that, you know, AWS and Google Cloud allow you to do are in service of making developers more productive and making companies more productive.”
Yet the database is the piece of the stack that has kept up the least, he said.
“For many companies that change workflows, still, I propose a schema migration, I open a ticket, I get it over to my DBA team, my DBA team runs a Perl script against it and says, ‘OK, I think we should add indexes.’ You add the indexes, and they say it’ll be five to seven business days before your schema migration is entered.”
Even at companies like GitHub, this is largely the process, he said, but that’s not the speed that businesses need to execute in 2022.
“At PlanetScale, we’re trying to throw out a lot of the crustiness that people have experienced, we’ve experienced ourselves, and actually build the thing that will keep up with software development elsewhere, that will keep up with serverless. That will keep up with all the amazing advancements that have been made in the big and small clouds from Vercel and Netlify, all the way over to Fargate and other Amazon primitives. We don’t want to just bring people the best place to store their data. We want to bring people the database that will let them build their company at the speed they need to build their company,” he said.
The company has experienced surprising, explosive growth since its GA announcement in November, the executives said. One of the things they’ll be working on this year is abstracting the complexity of Kubernetes away from users.
Vitess was migrated into Borg, the precursor to Kubernetes, supporting PlanetScale’s assertion that Vitess was Kubernetes-ready before Kubernetes even existed. But Lambert and Van Wiggeren maintain that while Kubernetes is complex because it is so powerful, that doesn’t mean that every organization needs to deal with that complexity. It doesn’t have to be something that’s just passed through to customers.
“Our goal is to soak up some of that complexity for our users, … whether they’re small or whether they’re huge … give customers something that they can get started with in under a minute and can just use. It’s essential to us that there’s continued innovation in all of these layers of the stack,” Van Wiggeren said.