Move Fast and Test in Kubernetes without Breaking Things
Continuous integration and continuous delivery (CI/CD) in application development is an approach that enables developers to make code changes rapidly and reliably, accelerating development lifecycles and getting new applications to market faster. But the underlying assumption is that developers can test on data at different stages of the pipeline.
Herein lies the rub: Historically, fear and risk have prevented development teams from integrating production databases into the development process. Why are they afraid? There are three valid reasons:
- Static schemas. Databases have traditionally been "snowflakes": one-off, hand-tuned schemas and configurations, with data residing outside the application clusters. Performance and data integrity are key issues, and snowflakes often undergo manual changes, which increases risk.
- Automation gaps. For many databases, automation is lacking because it’s not built into the CI/CD pipeline early, and testing may happen with synthetic or stale data.
- Silo mentality. Even with the DevOps shift helping developers and operations to work together, it’s still common for database engineers to operate in a silo, away from developers.
Adding to these concerns is the increased risk of data corruption. Clearly, it’s not a good idea to test on production data and run the risk of corrupting data in your production environment. So, how do you increase your agility and accelerate development by bringing databases into your cloud native environment without the risk?
Some Best Practices for Getting Started
To benefit from the added agility of having your database in your cloud native environment, you need to begin with a source control system that is tightly integrated with the application code your database is serving. Your application repository should include all schema changes, upgrade tools and modifications to your source code. Additionally, your database infrastructure must be expressed as code in the same application repository.
Finally, automated testing is key. Build automated tests into the CI/CD pipeline as early as possible, not just in its final stages.
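As a minimal sketch of what keeping schema changes in the application repository and testing them automatically can look like, the example below applies versioned migrations in order and records which ones have run. It uses Python's built-in sqlite3 module purely for illustration; the table names, the MIGRATIONS list and the helper functions are hypothetical and not taken from any particular tool.

```python
import sqlite3

# Hypothetical migrations; in a real repository each would live in its own
# versioned file next to the application code it serves.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]


def current_version(conn):
    """Return the highest applied migration version, or 0 for a fresh database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn):
    """Apply every pending migration in order and return the versions applied."""
    applied = []
    start = current_version(conn)
    for version, statement in MIGRATIONS:
        if version <= start:
            continue  # already applied on an earlier run
        conn.execute(statement)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied


def column_names(conn, table):
    """List the column names of a table, for use in a CI assertion."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

A CI job could run migrate against a throwaway database on every commit and assert on column_names, so a broken schema change fails long before it reaches a shared environment.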
Why Use Kubernetes?
While these best practices apply across various development environments, Kubernetes offers a number of critical advantages that can increase your team’s agility and performance:
- Good DevOps hygiene. Eventually, developers will move away from using snowflakes in infrastructure, and expressing configurations and infrastructure as code will make automation and testing simple and reliable.
- Testing at scale. In a large-scale testing environment, developers can work on individual copies of databases, which reduces costs and boosts performance.
- Universal control plane. Over the long term, it doesn’t make sense to have one control plane for your database services and a separate one for the stateless components of your application. Combining these within Kubernetes provides greater efficiency.
Bringing your database into your cloud native development environment makes it easier to deliver better value to the end customer, and do so faster, with automated testing at every stage of the development lifecycle. More importantly, it will lead to a higher degree of confidence and better engineering agility. A faster iteration process will enable better and faster deployments and the ability to catch potential issues before they affect your production environment.
The Keys to Successful Database Integration
To get to an ideal state of database integration, you must integrate against different layers in the stack, including your storage, database and applications.
First, you may need to integrate with storage volumes through volume-level storage APIs for efficiency. This will enable your developers to grab a snapshot and move it to their desktops for testing. Next, database integration is important for consistent data capture and data stores. You might want to create hooks into your NoSQL system or relational database, or quiesce your database (pause writes) before you take a volume-level snapshot.
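One way to picture a database-level hook around a volume snapshot is a wrapper that pauses writes, takes the snapshot and then always resumes writes, even if the snapshot fails. The sketch below assumes that is what the hook does; pause_writes, resume_writes, take_volume_snapshot and FakeDatabase are hypothetical names standing in for whatever your database and storage layer actually provide.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(database):
    """Pause writes for the duration of the block, then always resume them."""
    database.pause_writes()
    try:
        yield database
    finally:
        # Runs even if the snapshot raises, so the database is never left paused.
        database.resume_writes()


def consistent_snapshot(database, take_volume_snapshot):
    """Take a volume-level snapshot while the database is quiesced."""
    with quiesced(database):
        return take_volume_snapshot()


class FakeDatabase:
    """Stand-in used to demonstrate call ordering; it only records events."""

    def __init__(self):
        self.events = []

    def pause_writes(self):
        self.events.append("pause")

    def resume_writes(self):
        self.events.append("resume")
```

The context manager keeps the ordering guarantee in one place, so each caller does not have to remember to resume writes after the snapshot.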
At the application level, you might be using multiple best-of-breed data services within a single application, such as MongoDB or Redis. Such polyglot persistence in microservice-based applications requires app-level coordination, as does masking data to protect sensitive information.
Essentially, these challenges relate to data mobility. To extract data from various sources in a consistent manner, you need the application context, particularly when you’re doing advanced operations such as data masking and encryption, prior to moving it into the test/dev environment.
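To make the data masking step more concrete, here is a minimal sketch of masking sensitive fields before records leave production. It uses a keyed hash (HMAC-SHA256) so the same input always maps to the same token, which keeps joins across tables working in the test copy. The field list, the key and the function names are hypothetical, and this is only one of several possible masking strategies.

```python
import hashlib
import hmac

# Hypothetical field list and key; a real pipeline would load both from
# configuration and a secret store rather than hard-coding them.
SENSITIVE_FIELDS = {"email", "ssn"}
MASKING_KEY = b"replace-with-a-secret-from-your-secret-store"


def mask_value(value, key=MASKING_KEY):
    """Replace a value with a keyed, deterministic token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return f"masked-{digest[:16]}"


def mask_record(record, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of the record with sensitive, non-null fields masked."""
    return {
        field: mask_value(value)
        if field in sensitive_fields and value is not None
        else value
        for field, value in record.items()
    }
```

For example, mask_record({"id": 7, "email": "a@example.com"}) leaves id unchanged and replaces email with a token beginning "masked-". Knowing which fields are sensitive is exactly the kind of application context a volume-level snapshot alone cannot supply.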
Tackling Data Mobility Challenges with Kanister
When using data mobility to improve your CI/CD pipeline, it’s important to consider the data at different layers in your application stack. In many instances, you must perform operations on multiple layers of your application at once, as well as interact with Kubernetes itself. Kasten by Veeam developed Kanister to address these data mobility challenges and enable organizations to test safely with real data.
Kanister, an open source project, provides a Kubernetes-native framework for application-level data management that supports complex data management workflows. Domain experts can capture application-specific data management tasks in blueprints, which can be easily shared and extended, eliminating many of the tedious details around execution on Kubernetes.
Kanister is easy to integrate with your CI/CD pipeline, because it uses Kubernetes API extensions called custom resources. You can easily extend Kanister to work with custom applications as well as several common cloud native databases, simplifying and streamlining testing operations while reducing risk.
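As a rough illustration of what driving Kanister through custom resources from a pipeline might look like, the sketch below builds an ActionSet manifest as plain data and prints it as JSON. The field layout follows the ActionSet examples in the Kanister project as best understood here, so verify it against the documentation for the version you run. The blueprint name, workload name, namespaces and the build_action_set helper are all hypothetical.

```python
import json


def build_action_set(action, blueprint, workload_kind, workload_name,
                     workload_namespace, controller_namespace="kanister"):
    """Build an ActionSet manifest asking Kanister to run one blueprint action.

    The structure is an assumption based on published Kanister examples;
    check it against the project's documentation before relying on it.
    """
    return {
        "apiVersion": "cr.kanister.io/v1alpha1",
        "kind": "ActionSet",
        "metadata": {
            "generateName": f"{action}-",
            "namespace": controller_namespace,
        },
        "spec": {
            "actions": [
                {
                    "name": action,
                    "blueprint": blueprint,
                    "object": {
                        "kind": workload_kind,
                        "name": workload_name,
                        "namespace": workload_namespace,
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    manifest = build_action_set(
        action="backup",
        blueprint="example-db-blueprint",
        workload_kind="StatefulSet",
        workload_name="example-db",
        workload_namespace="staging",
    )
    print(json.dumps(manifest, indent=2))
```

Because the manifest is ordinary data, a pipeline step could write it to a file and submit it with kubectl create -f, the same way it handles any other Kubernetes resource.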
Kanister is available as an open source project at github.com/kanisterio/kanister.