3 Phases to a Safe and Successful Cloud Migration
When migrating complex, business-critical applications from on-premises to the cloud, the safest, most effective approach involves three phases.
In Phase 1, applications are “lifted and shifted” to the cloud without refactoring or rearchitecting.
In Phase 2, IT can use cloned environments to make software development and testing more efficient while refactoring the application piece by piece to use cloud native services. Even without cloud native services, automation for environment builds can be introduced incrementally, starting with a non-automated working model and gradually arriving at a fully automated version.
In Phase 3, most of the application exists in cloud native formats, but it maintains a connection back to on-premises systems. The concepts outlined in this post create a quicker and safer path to cloud innovation while reducing reliance on existing on-prem resources.
Let’s walk through this “safe path to the cloud” in detail.
Phase 1 – Lift and Shift
In Phase 1, on-prem applications are reproduced in the destination cloud without refactoring or rearchitecting. This allows IT to apply “cloud flexibility” to what was historically a “cloud stubborn” system: fast cloning, short-lived (ephemeral) environments, software-defined networking and API automation can all be applied to the application now running in the cloud. After this step is complete, IT can hand out environment “clones” to multiple engineering groups, allowing each to work simultaneously and independently at an accelerated pace.
It’s good practice to create a “clone” of the system of record that will continue to exist on-prem, without re-engineering any components into cloud native equivalents. Create the same number of VMs/LPARs (logical partitions) with the same memory/disk/CPU allocations, the same file system structures, the exact same IP addresses and hostnames, the same network subnets, and so on.
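One way to confirm that the cloud copy really mirrors the source is to compare an inventory of both sides before handing anything out. The following is a minimal sketch of that idea. The `HostSpec` fields and the `diff_inventories` helper are hypothetical names chosen for illustration and are not part of any cloud provider's API.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical record of the attributes that should survive lift and shift."""
    hostname: str
    ip_address: str
    subnet: str
    cpus: int
    memory_gb: int
    disk_gb: int


def diff_inventories(on_prem, cloud):
    """Return human-readable mismatches between two lists of HostSpec.

    Hosts are matched by hostname; an empty list means the copy matches.
    """
    problems = []
    cloud_by_name = {host.hostname: host for host in cloud}
    for src in on_prem:
        dst = cloud_by_name.pop(src.hostname, None)
        if dst is None:
            problems.append(f"{src.hostname}: missing in cloud")
            continue
        for field_name, expected in asdict(src).items():
            actual = getattr(dst, field_name)
            if actual != expected:
                problems.append(
                    f"{src.hostname}: {field_name} is {actual}, expected {expected}"
                )
    # Anything left over exists only in the cloud copy.
    problems.extend(f"{name}: not present on-prem" for name in sorted(cloud_by_name))
    return problems
```

In practice the two inventories would come from whatever CMDB or provider listing is available; the comparison itself stays the same.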
Note that applications running on IBM i series or AIX cannot be lifted and shifted directly to Azure, Google Cloud Platform or Amazon Web Services without specialized modifications. However, some solution providers offer this capability. The cost of doing this is generally outweighed by the benefit of adding cloud capabilities to these traditional applications.
Once a collection of VMs/LPARs representing an “environment” has been created in the cloud, the environment is saved as a single object called a template. The template is used to clone other working environments. Clones are exact duplicates of the template, down to the hostname, IP address, subnet, and disk allocations. Multiple environment clones can be running simultaneously without colliding, although the work required to set this up varies among cloud providers.
Creating ready-to-use environments from a template is the most powerful component of the cloud-based approach. It provides multiple exact copies of the reference system to be handed out to numerous engineering/dev/test groups, all of which can be running in parallel. There is no need to change the IP address of individual servers or their hostnames.
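To make the template/clone relationship concrete, here is a small in-memory sketch. The `VM`, `Template` and `Environment` classes are hypothetical and stand in for whatever objects a given cloud provider exposes. The point is only that every clone keeps the same hostnames and IP addresses while remaining an independent copy.

```python
import copy
import itertools
from dataclasses import dataclass, field


@dataclass
class VM:
    hostname: str
    ip_address: str
    disk_gb: int


@dataclass
class Environment:
    env_id: int   # the only thing that differs between clones
    vms: list


@dataclass
class Template:
    """Hypothetical saved environment from which clones are stamped out."""
    name: str
    vms: list
    _ids: itertools.count = field(
        default_factory=lambda: itertools.count(1), repr=False
    )

    def clone(self):
        # Deep copy so that changes made inside one clone never leak
        # into the template or into sibling clones.
        return Environment(env_id=next(self._ids), vms=copy.deepcopy(self.vms))
```

Each call to `clone()` yields an environment whose VMs are identical to the template's, which is why no re-addressing is needed before a team starts using it.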
Each environment runs in its own virtual data center alongside the others. If environments need to communicate with on-prem resources, they are differentiated via an isolated NAT mechanism, as described below. Many of the environments contain the same VM clone base image(s) with the same hostnames, IP addresses, and so forth.
If necessary, assign templates to projects and assign those projects to groups of users. Most cloud providers offer a built-in access control/security model so users can only work on what has been assigned to them; a QA user cannot see an environment assigned solely to ENG, for example. Users also have role assignments that allow them to view, edit or administer the VMs/LPARs defined in an environment assigned to a project.
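The project and role model described above can be sketched in a few lines. This is an illustrative assumption about how such a model might behave, not any provider's actual implementation; `Role`, `AccessControl` and the user and project names are all hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    # Ordered so that a higher role implies the lower ones.
    VIEW = 1
    EDIT = 2
    ADMIN = 3


class AccessControl:
    """Hypothetical project-scoped access model."""

    def __init__(self):
        self._project_envs = {}  # project name -> set of environment names
        self._grants = {}        # (user, project) -> Role

    def assign_environment(self, project, env):
        self._project_envs.setdefault(project, set()).add(env)

    def grant(self, user, project, role):
        self._grants[(user, project)] = role

    def can(self, user, env, needed):
        """True if the user holds at least `needed` on a project owning env."""
        return any(
            role >= needed and env in self._project_envs.get(project, ())
            for (grant_user, project), role in self._grants.items()
            if grant_user == user
        )
```

With an environment assigned only to ENG, a user whose sole grant is on QA gets `False` from `can()` for every role, which matches the QA/ENG example in the text.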
How to Create Cloned Environments with Duplicate Address Spaces
To create multiple working environments that replicate the same network topology as the final target system, some form of isolation must be implemented to avoid collision across duplicate environments. In this case, “replicate” means re-using the same hostnames, IP addresses and subnets within each environment. Ideally, each environment should exist within its own software-defined networking space that is not visible to other environments that are running. Each environment becomes a virtual private data center.
Here’s one way to achieve this using an “environmental virtual router” (EVR). Cloned environments communicate back to upstream on-prem resources via the EVR. The EVR hides the VMs behind it, which carry duplicate hostnames and IP addresses, and exposes a single unique IP address to the wider on-prem network. This gives multiple duplicate environments a simple way to coexist without breaking basic network constructs.
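The addressing idea behind the EVR can be illustrated with a short sketch. It assumes each environment is handed one address from an on-prem-facing range and that all outbound traffic is source-translated to that address. The class name, method names and address ranges are hypothetical, and a real EVR would do this in the network layer rather than in application code.

```python
import ipaddress


class EnvironmentRouterPool:
    """Hypothetical allocator: one unique upstream address per environment."""

    def __init__(self, external_cidr):
        # Usable host addresses of the on-prem-facing range, handed out in order.
        self._free = iter(ipaddress.ip_network(external_cidr).hosts())
        self._external_by_env = {}

    def attach(self, env_name):
        """Return the environment's external address, allocating it on first use."""
        if env_name not in self._external_by_env:
            self._external_by_env[env_name] = next(self._free)
        return self._external_by_env[env_name]

    def translate_outbound(self, env_name, internal_ip):
        # Source NAT: whatever the (possibly duplicated) internal address is,
        # upstream systems only ever see the environment's external address.
        return self._external_by_env[env_name]
```

Two environments that both contain a host at 10.0.0.10 therefore appear upstream as two different addresses, so the duplicates never collide.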
By allowing duplicate hostnames and IP addresses to exist, individual hosts do not have to go through a “re-IP” process, which is error-prone and time-consuming. The EVR, paired with a “jump host”, can be configured to forward SSH requests (via SSH proxy, OpenSSH 7.x and higher), which allows SSH access to each individual host in an environment. From on-prem, users SSH to any host in the environment (e.g. ssh user@environment-1-host-2). The connection reaches the environment’s unique on-prem-facing IP address and is then relayed down to the VM within that environment.
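As a sketch of the jump-host setup, the helper below generates client-side SSH configuration entries. It assumes the proxy mechanism is OpenSSH’s `ProxyJump` option (available from OpenSSH 7.3), that the jump host is reachable at the environment’s unique external address, and that aliases follow the `environment-N-host-M` naming used in the example above. The function name, the `jump` user and the addresses are hypothetical.

```python
def ssh_config_for_environment(env_name, jump_host, internal_ips, jump_user="jump"):
    """Build ssh_config stanzas that reach each internal host via the jump host.

    env_name     -- alias prefix, e.g. "environment-1"
    jump_host    -- the environment's unique on-prem-facing address
    internal_ips -- internal (possibly duplicated) addresses, in host order
    """
    stanzas = []
    for index, internal_ip in enumerate(internal_ips, start=1):
        stanzas.append(
            f"Host {env_name}-host-{index}\n"
            # HostName is resolved from the jump host's side of the network.
            f"    HostName {internal_ip}\n"
            f"    ProxyJump {jump_user}@{jump_host}\n"
        )
    return "\n".join(stanzas)
```

Appending the generated text to a user’s `~/.ssh/config` would let `ssh user@environment-1-host-2` work from on-prem, even though another environment contains a host with the same internal address.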
Once the cloned environments have been created and passed off to the appropriate teams, the project is ready to move on to the next phase.
Phase 2 – Replatform / Refactor
Once application components have been migrated to the cloud, they can be used as a dev/test “sandbox” while IT incrementally refactors them to native cloud services using proven design patterns such as “Side Car” or “Strangler”. Developers can also slowly build automation into the application piece by piece even if they are not using cloud native services. As an alternative, the application can simply be re-hosted (see the Microsoft post “The 5Rs of Rationalization” for more information on how to make this decision) without significantly changing its original on-prem structure.
This approach follows the “Strangler Pattern” methodology described by Martin Fowler. Microsoft has more recently described the conceptual Strangler process, with an accompanying visualization.
This incremental approach offers the following benefits to Agile teams who will ultimately refactor the application:
- It allows for an incremental approach to transformation instead of starting from scratch. Refactoring R&D is done in a rationalized manner so that the overall application continues to run. This prevents the creation of totally “net-new” development efforts that are application-wide in scope. A “start from scratch” approach across multiple applications traveling through the migration factory is high risk and goes against the Agile principle of “limit work in progress.”
- It increases velocity. By cloning the original on-prem application, complete working versions of the reference application can be delivered to Agile teams performing short-duration sprints, each investigating different aspects of the application to be refactored.
- It lowers risk. Splitting the application into smaller parts reduces the risk to the project as a whole and can shorten the overall migration. Delivering a small part of the refactoring, as opposed to the entire application, aligns with the Agile Manifesto values of “working software” and “responding to change.”
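The incremental hand-over at the heart of the Strangler approach can be sketched as a routing facade. Every request goes to the legacy application by default, and individual routes are switched to refactored handlers as they become ready. `StranglerFacade` and the handler signature are hypothetical and only illustrate the pattern; a production setup would typically do this at a gateway or load balancer.

```python
class StranglerFacade:
    """Hypothetical facade that routes each request to legacy or new code."""

    def __init__(self, legacy_handler):
        self._legacy = legacy_handler
        self._migrated = {}  # route -> refactored handler

    def migrate(self, route, new_handler):
        """Switch a single route over to its cloud native replacement."""
        self._migrated[route] = new_handler

    def rollback(self, route):
        """Send a route back to the legacy application if the new code misbehaves."""
        self._migrated.pop(route, None)

    def handle(self, route, payload):
        # Unmigrated routes keep hitting the lifted-and-shifted application,
        # so the overall system keeps running throughout the refactoring.
        handler = self._migrated.get(route, self._legacy)
        return handler(route, payload)
```

Because each route moves independently, a team can deliver one small piece per sprint and roll it back without touching the rest of the application.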
Phase 3 – Migration Complete
In Phase 3, most of the application exists in cloud native format but maintains a dedicated connection back to on-prem IT. If you think of the migration process as a factory, multiple applications will be traveling through the assembly line at different rates. When applications exit the factory at Phase 3, other applications on the transformation target list are entering Phase 1 and Phase 2. As experience grows working with cloud native services and components, the “factory floor” can speed up, since solutions to problems discovered earlier in the transformation journey can be applied quickly without requiring excessive R&D, trial and error, and re-work.
Safe, Agile and Successful
This three-phased approach enables organizations to take advantage of cloud benefits like on-demand capacity, and it aligns with many Agile software development best practices. It speeds up many engineering, QA and dev/test processes by allowing teams to work simultaneously, while reducing risk by not re-platforming or rearchitecting during the migration and by tackling that work afterward in small, manageable increments. It’s both the safest migration strategy for important on-premises applications and the most likely to succeed.