Embracing Database Deployments in CI/CD Practices with Git

Imagine that you are working the line at a Starbucks during peak hours. Each section behind the bar is designed to optimize the production of a different category of items: food, cold drinks and hot drinks. But with so many moving parts, you’re often left with lost order tickets and a visibly chaotic process.
This is what the database DevOps and CI/CD tooling landscape reminds me of. Tools like Jenkins, GitHub Actions, CircleCI and Spinnaker have popped up and made the build, test and deploy process for code changes smooth. Meanwhile, container and orchestration technologies like Docker and Kubernetes have made it easier to isolate your application, push changes and scale up on demand without affecting other environments.
Even with these advancements, databases have not been well integrated into the CI/CD tooling landscape.
This is because schema deployment is fundamentally different from application code deployment. Due to the stateful nature of databases, a botched change risks irreversible data corruption and consistency issues.
There are tools specifically designed to make database change management smoother. However, juggling additional CI/CD tools for the database alongside the tools that manage your application code introduces complexity. You’ll likely run into versioning issues, complicated rollback mechanisms and, ironically, slow and risky deployments, the very things CI/CD is meant to eliminate.
Databases, Version Control and Git
For database change management, applying git-like concepts can help teams avoid the complexity of the CI/CD tooling landscape while still incorporating the database into both the CI and CD pipelines.
The following are some observations on how a modern database can be designed to integrate into CI/CD workflows using concepts from git.
Creating Test Environments and Branching Functionality for the Database
In git, branches are used to manage changes to application code. Database schema deployments are often left out of this kind of version control. So development and operations teams invest in tools designed to manage schema changes or follow a process to generate test environments where changes can be tested before deployment.
These are some options in practice:
- You could invest in a tool like Flyway or Liquibase to manage schema versioning (see the sketch after this list).
- You could use blue/green deployments to spin up an isolated (green) instance to test schema changes without affecting your production (blue) environment.
- Conceptually, you could spin up an isolated instance of your entire application, including the database and schema.
- Or you could use a collection of custom scripts to help manage different versions of your database.
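To make the first option concrete, here’s a minimal sketch of what a Flyway-style versioned migration looks like. Flyway’s convention is one SQL file per change, named V<version>__<description>.sql; the table and column below are illustrative:

```sql
-- V2__add_address_to_users.sql
-- Flyway applies pending versioned migrations in order and records each
-- one in its schema history table, so every environment converges on the
-- same schema version.
ALTER TABLE users ADD COLUMN address VARCHAR(255);
```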
All of these options tend to add operational overhead, as often happens when you purchase additional tools or dedicate resources to mirroring production environments. With more orchestration required, they introduce more complexity than intended. Staging in general is a pain, as staged environments go stale and require frequent rebuilding. While this may be acceptable for some use cases, it’s expensive and tedious to manage for many.
There is an alternative, however. In git, branching streamlines collaboration across teams making changes to a single code repo. What if this could be done for each of your production databases?
Databases like PlanetScale for MySQL provide branching functionality that requires no additional orchestration or overhead to manage. Instead of configuring and managing another disparate tool or spinning up entirely new environments to test, schema branching becomes a process closer to the database.
This functionality often works by generating isolated database instances that mirror the schema in your production environment, or in some cases, the data as well. In the same way that git branching creates another line of development in your central code repository, production databases live on production branches, which can be used as the base for development and test branches. Once changes have been tested in a development or test branch, they can be safely merged back to production.
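As a rough sketch of what this looks like in practice with PlanetScale’s pscale CLI (the database and branch names below are placeholders, and this assumes the CLI is installed and authenticated; consult the CLI docs for exact flags):

```bash
# Create an isolated branch that starts from the production schema
pscale branch create my-database add-address-column

# Open a local tunnel to the branch to apply and test schema changes
pscale connect my-database add-address-column --port 3309

# Once the change looks good, open a deploy request to merge it back
pscale deploy-request create my-database add-address-column
```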
Investing in database branching minimizes the need to use additional resources for spinning up test environments while at the same time speeding up the build/test process.
Automating Schema Deployments and Git
Application changes often require a corresponding schema change. How nice would it be if you could automate a schema deployment by simply merging the pull request created for the related application change?
The functionality just described is possible with the strategic use of GitHub Actions or other CI/CD providers.
With this GitHub-native tool, you can create custom workflows that make it easier to build, test and deploy changes to your code. On top of new code deployments, you could create custom workflows to streamline the code and schema change process. For example, once a pull request is created, you can kick off a GitHub Actions workflow that will create a database branch, apply any corresponding changes to the schema and run tests against your database branch to ensure they are the intended changes.
The YAML file for a workflow that does this would look something like:
```yaml
name: Automate Schema Deployment
on:
  pull_request:
    types: [opened, reopened]
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      DB_SERVICE_TOKEN_ID: ${{ secrets.DB_SERVICE_TOKEN_ID }}
      DB_SERVICE_TOKEN: ${{ secrets.DB_SERVICE_TOKEN }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Deploy to production
        run: npm run deploy
```
You’d need to configure your workflow, define the job that will be run in your workflow, make sure you have the right permissions set to connect to your database environment and maintain this over time.
Alternatively, your database provider may create predefined GitHub Actions to use within your CI flow. Databases that have a holistic workflow for change management with options to integrate automation inherently make these processes closer to modern software delivery practices.
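As a sketch of how a provider-maintained action might slot into a workflow, here’s a hypothetical example. The action name and inputs are illustrative stand-ins, not a specific vendor’s published interface; check your provider’s documentation or the GitHub Marketplace for the real ones:

```yaml
name: Create Database Branch
on:
  pull_request:
    types: [opened, reopened]
jobs:
  db-branch:
    runs-on: ubuntu-latest
    steps:
      # Hypothetical provider-published action; the real name and inputs
      # come from your database provider's marketplace listing.
      - name: Create database branch for this PR
        uses: example-provider/create-branch-action@v1
        with:
          database: my-database
          branch: pr-${{ github.event.number }}
          service_token_id: ${{ secrets.DB_SERVICE_TOKEN_ID }}
          service_token: ${{ secrets.DB_SERVICE_TOKEN }}
```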
Imagine this:
Your application is written in Rails and runs on a MySQL database in PlanetScale. You need to push a new feature live that requires a new column called `address` on your `users` table, and you have a pull request staged with modifications to your code. Within this pull request, you update the backend with a migration and schema file that proposes changes to the database schema itself. This looks something like:
```ruby
# db/migrate/20230830123456_add_address_to_users.rb
class AddAddressToUsers < ActiveRecord::Migration[6.0]
  def change
    add_column :users, :address, :string
  end
end
```
When you create that pull request, a GitHub Actions workflow kicks off, creates a database branch and opens a matching deploy request in your PlanetScale MySQL database. Your team then reviews the deploy request, accepts the proposed changes and merges the pull request in GitHub.
Just by merging the pull request in GitHub, the feature is built and deployed to your application and your database schema is also changed. The schema diff is added as a comment in the pull request in GitHub to review, and PlanetScale is staged to help you revert this schema migration in case it introduces a breaking change.
Reducing the schema change process, and other database change management workflows, to a handful of steps makes schema changes notably easier and more congruent with the process you already use to push changes to your application code:
- Opening a pull request in GitHub
- Defining the schema change in a migration file within the pull request
- Merging the original pull request in GitHub to apply the change to your application AND your database schema
Comprehensive workflows like this, especially when maintained and guided by your database provider, can smooth out the friction you run into when application changes also dictate a schema change. You get the functionality out of the box using your existing toolset (GitHub for git version control and automation), integrated with your database.
Versioning, Rollbacks and Online Schema Changes
We’ve all experienced that moment of dread after accidentally deleting or erroneously modifying a table, column or index. It’s scary to not be able to easily revert those changes without data loss, especially when they introduce a breaking change. Having to fix the problem from backups can take hours or days.
Just as a code deployment in git can be rolled back if it introduces bugs, performance regressions or other issues, the same workflow should be available for database schema deployments. If an erroneous schema change is merged, schema rollback functionality can help restore previous versions of the schema. Ideally, this would work even if you’re actively writing to the database as the revert occurs.
As mentioned previously, some of this functionality can be obtained using third-party tools. However, the answer to performing this natively within the database may actually lie with the nature of online schema changes.
In summary, online schema change logic works like this:
- Create an empty shadow table mirroring your production schema.
- Apply schema changes to this shadow table.
- Sync data from the production table to the shadow table.
- Replace the production table with the shadow table.
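In plain SQL, a simplified version of that flow looks like the following. Table and column names are illustrative, and production tools such as gh-ost also keep up with ongoing writes via triggers or binlog streaming, which this sketch omits:

```sql
-- 1. Create an empty shadow table with the same structure as production
CREATE TABLE users_shadow LIKE users;

-- 2. Apply the schema change to the shadow table only
ALTER TABLE users_shadow ADD COLUMN address VARCHAR(255);

-- 3. Backfill existing rows from the production table
INSERT INTO users_shadow (id, name, email)
SELECT id, name, email FROM users;

-- 4. Atomically swap the shadow table into place
RENAME TABLE users TO users_old, users_shadow TO users;
```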
Online schema changes make it so that you can test and merge changes without table locking or downtime. There are different ways you can implement online schema changes using open source command line tools, but not all of them allow you to roll back a change.
You could implement general online schema change functionality with tools like gh-ost, a tool developed at GitHub, or pt-online-schema-change, a Percona Toolkit command-line tool. Unfortunately, both of these solutions only carry you so far. Both terminate once the migration is complete, and the state they tracked about the migration is lost with them. This makes it impossible to revert a schema change once it’s deployed without some data loss.
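For reference, a typical gh-ost invocation looks something like this; connection details and the table change are placeholders. Without --execute, gh-ost performs a dry run:

```bash
gh-ost \
  --host=127.0.0.1 --user=ghost --password=... \
  --database=mydb --table=users \
  --alter="ADD COLUMN address VARCHAR(255)" \
  --execute
```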
The solution to this problem may be Vitess, an open source database management and middleware technology built at YouTube. With Vitess, you can run DDL statements (that is, schema changes) using the “online” (or “vitess”) strategy. Behind the scenes, this is powered by Vitess’ VReplication feature, which lets you perform online schema changes with the ability to revert them, providing better versioning than alternative solutions.
VReplication uses MySQL GTIDs (global transaction identifiers) to track changes to data and maintains a precise record of all migrations. Unlike other online schema change tools, it can keep the older version of the schema in sync with continuous data updates even after a migration is complete. Because of this, you can revert the last successful migration within 24 hours of completion through a SQL interface, using the migration’s universally unique identifier (UUID):
```sql
REVERT VITESS_MIGRATION '<uuid>';
```
The revert itself runs as a new migration and completes quickly, since no table data needs to be copied. Schema changes can now be rolled back to their original state, often without any data loss.
This method requires you to implement and maintain Vitess and is specific to MySQL. If a MySQL database fits your needs, note that every database spun up in PlanetScale runs on Vitess, giving you this functionality out of the box with no manual orchestration.
If you’re not on MySQL, there are other databases that provide native online schema changes with some ability to recover data that may have been lost. This includes restoration from a backup that may use additional resources and potentially require downtime to complete.
Human error is inevitable. Rollback functionality is an essential component of your database’s CI/CD process. Choosing a database with this functionality could be the difference between long periods of downtime and smooth deployments that seldom affect the end-user experience.
Investing in Forward-Thinking Database Solutions
Forward-thinking companies are investing in forward-thinking database solutions. Those databases that incorporate facets of git into the database change management process will make it significantly easier for teams to manage their data with confidence. The future isn’t a mess of complex DevOps tools with the database siloed from the CI/CD process. It’s databases that are integrated with a smooth change management process that matches the git-like framework developers are used to.