Rollbar sponsored this post.
Modern applications need frequent updates to deliver on business objectives with fast turnaround times. The days of month- or year-long release cycles are gone. Modern applications may be deployed on short notice, multiple times per day. This gives the enterprise more agility to make changes that deliver business value faster. Product teams can now iterate quickly on new versions of the product, testing the impact on key metrics and fixing problems immediately.
Faster, incremental deployments also decouple development teams so they can work in parallel. This can increase the efficiency of your development teams. With the advent of service-oriented architecture and microservices, we can deploy changes to multiple services in parallel with no downtime or service interruption.
However, this also creates new challenges for the operations team. With more frequent deployments, it’s more likely that the deployed code could negatively affect site reliability or customer experience.
It’s important to develop strategies for deploying code that minimize the risk to the product and customers. In this article, we will talk about a few deployment strategies, best practices and tools that will allow your team to work faster and more reliably.
Challenges of Modern Applications
Modern applications are often distributed and cloud-based. They can scale elastically to meet demand, and are more resilient to failure thanks to highly available architectures. They may even use fully managed services like AWS Lambda or Elastic Container Service (ECS), where the platform takes on some of the operational responsibility. These applications almost always have frequent releases. For example, a mobile application or a public-facing website may undergo several changes within a month.
These applications frequently use microservice architectures in which several components work together to deliver the full functionality. There can be different release cycles for different components, but they all have to work together seamlessly. With multiple development teams making changes in parallel throughout the code base, it can be difficult to determine the root cause of a problem.
Another complexity arises from the abstraction of the infrastructure layer, which is now considered code. Deployment of a new application thus may mean deployment of new infrastructure code with it.
Why Use a Deployment Strategy
As mentioned above, since there are so many moving parts and so many changes happening in modern apps, there are more opportunities for something to go wrong. To meet this challenge, application and infrastructure teams need to devise and adopt a deployment strategy suitable for their use case.
However, one particular deployment strategy may not fit all use cases. A deployment method for a new microservice may not be the ideal solution for a cloud-based office productivity suite rollout.
That’s why it’s best to be familiar with different deployment techniques. We will review several and discuss the pros and cons of each so that you can choose the best one for your organization.
“Big Bang” Deployment
As the name suggests, “big bang” deployments update the whole or large parts of an application in one operation. This has been the default approach since the days when software was released on physical media and installed by the customer. Big bang deployments require the business to conduct extensive development and testing before release, and are often associated with the “waterfall model” of large sequential releases. Modern applications can be updated regularly and automatically on the client side or the server side, which makes the big bang approach slow and less agile by comparison.
Characteristics of big bang deployment are:
- All major pieces are packaged in one deployment;
- It largely or completely replaces an existing software version with a new one;
- The deployment is usually the result of long development and testing cycles;
- It assumes a minimal chance of failure as rollbacks may be impossible or impractical;
- The completion time is usually long and can take multiple teams’ efforts;
- It may require action from clients to update the client-side installation.
Big bang deployments aren’t suitable for modern applications because the risks are unacceptable for public-facing or business-critical applications where outages mean huge financial loss. Rollbacks are often costly, time-consuming or even impossible.
The big bang approach can be suitable for non-production systems (e.g., re-creating a development environment) or vendor-packaged solutions like desktop applications.
Rolling Deployment
Rolling, phased or step deployments are better than big bang ones because they minimize many of the associated risks, such as user-facing downtime with no easy way to roll back. In a rolling deployment, the old version of an application is gradually replaced with a new version. The new and old versions coexist without affecting functionality or user experience, and the actual deployment happens over a period of time. It’s also easy to roll back any new component that turns out to be incompatible with the old components.
The following diagram shows the pattern where the old version is shown in blue and the new version is shown in green across each server in the cluster.
An example of rolling deployment can be the upgrade of an application suite. If the original applications were deployed in containers, the upgrade can tackle one container at a time: Each container is modified to download the latest image from the app vendor’s site, and the container is re-created. If there is a compatibility issue for one of the apps, the older image can be used to recreate the container. In this case, the new and old versions of the suite’s applications coexist until all the apps are upgraded.
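The container-by-container upgrade described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rolling_deploy`, the `versions` dictionary and the `is_healthy` probe are hypothetical names, and a real rollout would call your container platform instead of updating a dictionary.

```python
from typing import Callable, Dict, List


def rolling_deploy(
    versions: Dict[str, str],
    new_version: str,
    is_healthy: Callable[[str, str], bool],
) -> List[str]:
    """Upgrade instances one at a time; return the names that were rolled back.

    `versions` maps an instance name to the version it runs and is updated
    in place. `is_healthy(name, version)` is a hypothetical health probe
    supplied by the caller.
    """
    rolled_back = []
    for name in list(versions):
        old_version = versions[name]
        versions[name] = new_version      # re-create this instance on the new version
        if not is_healthy(name, new_version):
            versions[name] = old_version  # incompatible: restore the older image
            rolled_back.append(name)
    return rolled_back


if __name__ == "__main__":
    suite = {"mail": "1.0", "calendar": "1.0", "docs": "1.0"}
    # Pretend the new "calendar" build has a compatibility problem.
    failed = rolling_deploy(suite, "2.0", lambda name, version: name != "calendar")
    print(suite)   # mail and docs run 2.0, calendar stays on 1.0
    print(failed)  # ['calendar']
```

Old and new versions coexist in `versions` while the loop runs, which is the defining property of a rolling deployment.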
Blue-Green, Red-Black or A/B Deployment
This is another fail-safe process. In this method, two identical production environments are used in parallel. One is the currently running production environment receiving all user traffic (depicted as Blue, Red or A), the other is a clone of it, but idle (Green, Black or B). Both use the same database back-end and app configuration. The setup is shown below:
The new version of the application is deployed in the green environment and tested for functionality and performance. Once the testing results are all good, application traffic is routed from blue to green. Green then becomes the new production.
If there is an issue after green becomes live, traffic can be routed back to blue.
In a blue-green deployment, both systems use the same persistence layer or database back end, so it’s necessary to keep the application data in sync. You can do this with a mirrored database: blue uses the primary database for write operations, and green uses the secondary for read operations. During switchover from blue to green, the database is failed over from primary to secondary. If green also needs to write data during testing, the databases can be set up with bidirectional replication.
Once green becomes live, you can shut down or recycle the old blue instances. You might deploy a newer version on those instances and make them the new green for the next release.
Blue-green deployments rely on traffic routing. This can be done by updating the DNS CNAME records for hosts, but the change can be delayed by long TTL values. Alternatively, you can change the load balancer settings so that the change takes effect immediately. Features like connection draining in Elastic Load Balancing (ELB) let in-flight connections finish before the old instances stop serving traffic.
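As a rough sketch of the switchover logic, the Python below models the load balancer as a small object that sends all traffic to one environment. `Router`, `release` and the `smoke_test` callback are hypothetical names used for illustration; in practice the switch would be a DNS or load balancer change.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Stand-in for a load balancer that sends all traffic to one environment."""
    live: str = "blue"
    idle: str = "green"

    def switch(self) -> None:
        # Swap roles: the idle environment starts receiving all traffic.
        self.live, self.idle = self.idle, self.live


def release(router: Router, smoke_test: Callable[[str], bool]) -> bool:
    """Cut over to the idle environment only if it passes `smoke_test`."""
    if not smoke_test(router.idle):
        return False  # keep serving from the current environment
    router.switch()
    return True


if __name__ == "__main__":
    router = Router()
    release(router, lambda env: True)
    print(router.live)  # green
    router.switch()     # rollback: route traffic back to blue
    print(router.live)  # blue
```

Because the previous environment is kept intact after the cutover, rolling back is just a second call to `switch()`.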
Canary Deployment
Canary deployment is like blue-green, except it’s more risk averse. Instead of switching from blue to green in one step, you use a phased approach.
With canary deployment, you deploy new application code to a small part of the production infrastructure. Once the application is signed off for release, only a few users are routed to it. This minimizes any impact. If no errors are reported, the new version is gradually rolled out to the rest of the infrastructure. The image below shows this:
The main challenge of canary deployment is to devise a way to route some users to the new application. Also, some applications may always need the same group of users for testing, while others may require a different group every time.
Routing users to the canary can be achieved with several techniques:
- Exposing internal users to the canary deployment before allowing external user access. The routing can be based on the source IP range;
- Releasing the application in only certain geographic regions;
- Using application logic to unlock new features for certain users and groups. This logic is removed when the application is made live for the rest of the users.
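Two of these techniques, routing by source IP range and exposing a fixed slice of users, can be sketched as follows. This is a minimal illustration, not a production router: the `10.0.0.0/8` internal range, the `route` and `bucket` functions and the `canary_percent` parameter are all assumptions made for the example.

```python
import hashlib
import ipaddress

# Assumed internal (office or VPN) address range; replace with your own.
INTERNAL_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(user_id: str, source_ip: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for one request."""
    if ipaddress.ip_address(source_ip) in INTERNAL_NETWORK:
        return "canary"  # internal users always see the canary
    if bucket(user_id) < canary_percent:
        return "canary"  # a fixed slice of external users
    return "stable"


if __name__ == "__main__":
    print(route("alice", "10.1.2.3", 0))       # canary (internal address)
    print(route("alice", "203.0.113.7", 0))    # stable (no external rollout yet)
    print(route("alice", "203.0.113.7", 100))  # canary (full rollout)
```

Hashing the user ID keeps the same users in the canary group on every request. If you would rather test with a different group for each release, one option is to hash the user ID together with a release identifier.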
Deployment Best Practices
Modern application teams can follow a number of best practices to keep deployment risks to a minimum.
- Use a deployment checklist. For example, one of the items in the checklist can be “back up all databases only after app services have been stopped.” An item like this can prevent data corruption;
- Use continuous integration (CI). CI ensures code checked into the feature branch of a code repository is merged with its main branch only after it has gone through a series of dependency checks, unit and integration tests, and a successful build. If there are errors along the path, the build fails and the app team is notified. Using CI therefore means every change to the application is tested before it’s made available for deployment;
- Use continuous delivery (CD). With CD, the CI-built code artifact is packaged and deployed in one or more environments. The incremental deployments thus minimize risk;
- Use standard operating environments (SOEs) to ensure environment consistency. You can use tools like Vagrant and Packer for development workstations and servers;
- Use build automation tools like AWS CloudFormation to automate environment builds. It should be as simple as clicking a button to tear down an entire infrastructure stack and rebuild it from scratch;
- Use configuration management tools like Puppet, Chef or Ansible in target servers to automatically apply OS settings, apply patches or install software;
- Use communication channels like Slack for automated notifications of unsuccessful builds and application failures;
- Have a process for alerting the responsible team on deployments that fail. Ideally you’ll catch these in the CI environment, but if the changes are deployed to prod you’ll need a way to notify the responsible team so they can fix the problem;
- Consider automated rollbacks for deployments that fail health checks. The health checks should cover not just availability but also error rate.
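The last practice, rolling back on both availability and error rate, could look something like the sketch below. The `HealthSample` fields and the thresholds (a 2% error rate and 99% availability) are hypothetical values chosen for illustration; pick numbers that match your own service.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    requests: int       # requests served since the deployment
    errors: int         # requests that ended in an error
    checks_passed: int  # successful availability probes
    checks_total: int   # availability probes attempted


def should_roll_back(
    sample: HealthSample,
    max_error_rate: float = 0.02,
    min_availability: float = 0.99,
) -> bool:
    """Return True if the new version fails either health criterion."""
    # With no probe results at all, treat the service as unavailable.
    availability = (
        sample.checks_passed / sample.checks_total if sample.checks_total else 0.0
    )
    error_rate = sample.errors / sample.requests if sample.requests else 0.0
    return availability < min_availability or error_rate > max_error_rate


if __name__ == "__main__":
    # The service is fully available, yet 5% of requests fail.
    print(should_roll_back(HealthSample(1000, 50, 100, 100)))  # True
```

Checking the error rate catches releases that respond to every probe yet still fail for real users.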
Despite best efforts and careful planning, deployments can still go wrong. It’s therefore best to use a tool to monitor application performance and errors after deployment.
An application performance monitoring (APM) solution gives you a way to monitor critical performance metrics, such as server response times, after deployments. Changes in application logic or system architecture can dramatically affect application performance. When those changes push performance past your service level objectives (SLOs), your operations team needs to investigate and potentially roll back.
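To make the SLO comparison concrete, here is a small sketch that checks the 95th-percentile response time after a deployment against a target. The functions, the choice of the 95th percentile and the 300 ms target are assumptions for illustration only; an APM product would compute these figures for you.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile; `pct` is a whole number from 1 to 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def breaches_slo(response_times_ms: Sequence[float], slo_p95_ms: float) -> bool:
    """Return True if the 95th-percentile response time exceeds the SLO."""
    return percentile(response_times_ms, 95) > slo_p95_ms


if __name__ == "__main__":
    before = [100.0] * 95 + [900.0] * 5   # 5% slow requests
    after = [100.0] * 90 + [900.0] * 10   # 10% slow requests after the release
    print(breaches_slo(before, 300.0))  # False
    print(breaches_slo(after, 300.0))   # True
```

A percentile is used here instead of an average because a small share of very slow requests can hide behind a healthy-looking mean.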
An error-monitoring solution like Rollbar is equally essential when practicing CI/CD. It will quickly notify your team and give them visibility into new errors that may be hurting the customer experience. New or reactivated errors could indicate bugs in the code that require developer attention. Error monitoring allows you to proactively fix those problems before they are reported to the support or sales teams. It will also help your team resolve errors faster by pointing them to the code changes responsible for the error messages.
With the right deployment strategies and tools in place, your team should be able to make releases more frequently with confidence and deliver a great customer experience.