Cloud adoption is on the rise, with cloud spending expected to grow at more than six times the rate of general IT spending through 2020. Businesses today are tasked with moving huge amounts of data from on-premises systems to cloud storage platforms such as AWS S3. An effective data migration strategy is extremely important, yet it is often overlooked until problems surface during the migration itself.
When we consider the sheer scale of Big Data, on-premises infrastructure is limiting in both scale and capability. Many organizations have found it convenient to default to the cloud for everything related to Big Data and AI. A few scenarios where cloud migration can turn out to be the preferred option:
- You want a speedy application implementation and deployment
- Your project has started receiving heavy volumes of traffic overnight
- You are concerned about the effect of a data center going down
- Administering your growing database needs is becoming expensive
Challenges with Migrating Data
A common misconception about cloud migration is that it will be a one-time journey. But the reality is that the process of migrating data infrastructure to the cloud should happen gradually and systematically, whilst minimizing downtime and disruption to users. Moving data is only one part of the puzzle. There are several other challenges associated with cloud migration.
Cost can play a significant role in deciding which approach to take. Underestimating the resources involved in cloud migration can quickly cause costs to spiral out of control, and the migration could eventually turn into a cash-eating monster. Lyft recently reported that it will be spending $300 million on Amazon Web Services over the next three years.
With big data and cloud, data security is the elephant in the room. Your organization’s sensitive data is put at risk when it is moved from on-premises to the cloud, and companies can incur large economic losses if this data is leaked during the process. It is important to remember that the onus to secure data is yours, not the cloud provider’s.
Another serious challenge is finding people with the right skill sets to execute a cloud migration plan successfully. Lack of knowledge of ever-changing cloud technologies and insufficient skill sets can lead to slow, ineffective adoption and stand in the way of a seamless cloud migration.
Before starting a migration process, it is crucial to analyze in detail the cloud’s dependencies and constraints, migration patterns, potential applications and the advantages of infrastructure as a service (IaaS). This will launch you on the path that works best for your company. There are three primary types of cloud migration, based on the way different companies want to use the cloud to accomplish their goals.
Data Migration Models
Broadly classified, there are three models of data lake migration from on-premises to the cloud:
Forklift: This model moves an on-premises Hadoop cluster to one built from the ground up on basic compute instances in the cloud. It is the simplest migration model and leverages existing staff skill sets. It uses only the IaaS aspect of the cloud, with persistent compute instances and typically instance-local storage. Except for infrastructure access, security is entirely the cloud customer’s responsibility, as is the creation, configuration, monitoring and maintenance of the cluster.
Hadoop as a service (Hadoop AAS): The second model moves from Hadoop on-premises to Hadoop as a service from the cloud provider. Much of the heavy lifting around Hadoop cluster setup and configuration, and ensuring compatibility of Hadoop ecosystem components, is left to the cloud provider. A data lake management application may aid in the creation and use of transient Hadoop clusters on demand and interface directly with cloud-native persistent storage.
Hybrid: The third model involves a gradual transition from Hadoop on-premises to hybrid on-premises/cloud architectures. It uses a variety of cloud-native storage options and services in addition to the Hadoop ecosystem tools, and adopts cloud service patterns for processing event streams, real-time analytics, and machine learning. This model presupposes a metadata management layer to remove any mismatch between the underlying technologies and provide a seamless data fabric view across all the data regardless of storage location.
There are numerous ways to migrate, depending on the set of options you choose:
- The aforementioned three migration models (Forklift, Hadoop AAS, Hybrid)
- Hadoop distributions (Cloudera, Hortonworks, MapR)
- Hadoop ecosystem tool variations
- Cloud service providers (AWS, Azure, GCP)
Meaningful comparisons need to be made in the context of specific business and technical requirements.
Developing an Effective Data Migration Strategy
Your migration is unique to your Hadoop environment, so there isn’t really a one-size-fits-all migration plan. Make a plan for your migration that gives you the flexibility to translate each piece to a cloud-computing paradigm.
Knowing your current software architecture, infrastructure and database schemas helps in defining the timeframe, cost and effort required to implement your cloud migration. You can begin by evaluating the business use case of the data lake and the security considerations, and by prioritizing the apps and data that need to be moved first.
POC on Subset of Data
Testing the waters before you go all-in with a new cloud vendor is highly recommended. You need to develop a proof of concept to validate the network challenges, feature parity, and performance comparisons. In this phase, you need to test your workload thoroughly and understand the cloud storage services, the necessary security controls and production cluster sizing.
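One way to picture a proof of concept on a subset of data is to pick a small, reproducible sample of files, copy only those, and then compare checksums on both sides. The sketch below is a minimal illustration under stated assumptions, not a prescribed tool: the function names (`file_sha256`, `pick_sample`, `find_mismatches`) are hypothetical, and it assumes you can produce a dictionary of relative path to checksum for both the on-premises source and the cloud target.

```python
import hashlib
import random
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pick_sample(paths, fraction, seed=0):
    """Pick a reproducible random subset of paths for the proof of concept."""
    ordered = sorted(paths)
    if not ordered:
        return []
    # Always take at least one file so a tiny fraction still tests something.
    count = min(len(ordered), max(1, round(len(ordered) * fraction)))
    return random.Random(seed).sample(ordered, count)


def find_mismatches(source_checksums, target_checksums):
    """Return paths that are missing from the target or differ from the source."""
    return sorted(
        rel_path
        for rel_path, checksum in source_checksums.items()
        if target_checksums.get(rel_path) != checksum
    )
```

A fixed seed keeps the sample stable between runs, so a failed check can be repeated against the same files after the underlying issue is fixed.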
Once you have verified that the cloud provider and model meet your requirements, you can proceed with the migration process itself and begin moving your data and apps to the cloud. A phased approach consistent with the chosen migration model takes into account the following:
- Infrastructure migration decisions — storage and compute, sizing, scaling, networking
- Security of data and governance of data access, and resource usage in the cloud
- Retooling data ingestion so that data currently received by the on-premises platform from different sources is sent to the cloud data lake
- Detailed inventory of on-premises data lake, and mapping to cloud platform
- Data transformation pipelines and corresponding translation to cloud mechanisms
- Application migration — forklift vs rewrite, processes for development, test, and production
- Migration options for historical data
- Data Lake management applications
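The inventory-and-mapping item in the list above can be sketched in a few lines. This is only an illustration under assumptions: the `InventoryEntry` type, the `to_s3_uri` and `build_mapping` helpers, and the bucket and prefix names are all hypothetical, and the sketch assumes the target is an S3-style object store where each absolute HDFS path becomes an object key under a common prefix.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class InventoryEntry:
    """One file in the on-premises data lake inventory (hypothetical shape)."""
    hdfs_path: str   # absolute path, e.g. "/data/raw/events/part-0000.parquet"
    size_bytes: int


def to_s3_uri(hdfs_path: str, bucket: str, prefix: str = "") -> str:
    """Map an absolute HDFS path to an S3 URI under a bucket and optional prefix.

    Raises ValueError if hdfs_path is not absolute.
    """
    relative = PurePosixPath(hdfs_path).relative_to("/")
    key = PurePosixPath(prefix) / relative if prefix else relative
    return f"s3://{bucket}/{key}"


def build_mapping(entries, bucket, prefix=""):
    """Return (source-to-target mapping, total bytes) for the inventory."""
    mapping = {e.hdfs_path: to_s3_uri(e.hdfs_path, bucket, prefix) for e in entries}
    total_bytes = sum(e.size_bytes for e in entries)
    return mapping, total_bytes


if __name__ == "__main__":
    inventory = [
        InventoryEntry("/data/raw/events/part-0000.parquet", 1024),
        InventoryEntry("/data/curated/users.parquet", 2048),
    ]
    # "example-data-lake" and "onprem" are placeholder names, not real resources.
    mapping, total = build_mapping(inventory, "example-data-lake", "onprem")
    for source, target in mapping.items():
        print(source, "->", target)
    print("total bytes:", total)
```

Keeping the mapping as plain data makes it easy to review before any bytes move, and the total size gives a first input for sizing and transfer estimates.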
Once your data and applications are successfully re-hosted, you can focus on automating processes within the new infrastructure and optimizing its performance. It’s best to put automated testing frameworks to use and to consider an Infrastructure as Code (IaC) approach to streamline your deployment process. You can also double-check some of the most critical aspects of your infrastructure manually, such as security, compliance and performance.
When migrating to the cloud, enterprises need a partner with a broad set of cloud migration capabilities to support a diverse array of technologies, regulatory requirements, operating models and target environments. Companies today frequently settle for what they can get now, rather than what they actually want or need. A comprehensive risk evaluation backed by professional cloud expertise can help achieve one’s long-term strategic targets. Service providers, on the other hand, must remain flexible in adapting to evolving market demands in order to make the most of new technologies.
Cloud computing can offer a variety of organizational benefits: flexibility, efficiency and strategic value. With a thorough assessment, any organization can create a solid migration plan fitted to its short-term and long-term business objectives. As most successful companies have shown, the time and effort required for cloud migration are more than repaid by the resulting gains in quality, efficiency and speed to market of technological solutions.
Still running your business on cumbersome and outdated infrastructure? It might be time you consider migrating your processes to the cloud with minimum risk to your business.