As we enter the final months of the 2010s, cloud computing has become ubiquitous. I’ve spent the last decade working as a cloud architect and product manager and have seen up close how the space has evolved.
What do all these changes mean for IT infrastructure moving forward? I’ve been thinking seriously about that and it’s time for new rules:
1. Modifications Disadvantage You
Don’t you squirm when you hear vendors proclaim that they can somehow make open source projects “enterprise-ready” by releasing a “hardened” version of infrastructure software based on upstream projects? That game is over. OpenStack, for example, has been stable for many releases and can address even the most advanced use cases and workloads without any vendor intervention.
This is the most important new rule of all because ignoring it is self-limiting. Why would you restrict the number of people able to work on, support, and innovate with your platform in production by introducing downstream patches? The whole point of open infrastructure is to engage with the larger community for support and to create a common basis for hiring, training, and innovating on your next-generation infrastructure platform. Don’t lose that advantage.
2. Standardization Is King
“Hand-crafted” is for beer, not infrastructure. Large-scale implementations are without exception based on standardization of components and simplicity in architecture. The only way to assure that knowledge of clusters can be transferred to new teams or recover from staff departures is to avoid customized reference architectures that introduce technical debt.
3. Automate Well into the Future
Almost no team has automated to the degree it ought to. Most realize this at some level but fail to act on it. The tools that have become popular with operators address the first 80 percent of automation use cases nicely, but not the rest. The result is that lifecycle management events such as upgrades, canary rollouts, and expansion remain too complicated and time-consuming.
When choosing orchestration automation, assume that the technology stack will change over the course of your hardware amortization period (typically five years). Today’s VMware estate might become tomorrow’s OpenStack; a Kubernetes cluster might run on top of it, alongside it on bare metal, or replace it entirely. It is unrealistic to expect a specific set of hardware to be tied to a specific infrastructure stack for its entire lifespan.
4. Run at Capacity On-Prem; Use Public Cloud as Overflow
If providing the best economics in the data center is an organization’s paramount goal, running on-premises infrastructure as close to capacity as possible is a natural fit. Hardware should be chosen to provide the best value for the money, which may not always mean the lowest upfront cost but will yield the best economics overall, especially if the goal is a cost structure comparable to public cloud alternatives.
That said, don’t mistake this rule as a call to avoid public cloud. On the contrary, it’s smart to work with a minimum of two public cloud providers in addition to having a solid on-prem strategy that fulfills economic objectives. Having two public cloud partners allows for healthy competition and enforces cloud-neutral automation in your operations, a key attribute of a successful multicloud strategy.
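The overflow pattern behind this rule can be sketched as a thin, provider-neutral placement layer. The sketch below is purely illustrative: `CloudProvider`, `InMemoryProvider`, and `place_with_overflow` are hypothetical names, and the in-memory fakes stand in for real provider APIs; the point is only that the automation never hard-codes a single vendor.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Provider-neutral interface (hypothetical, for illustration)."""

    @abstractmethod
    def create_instance(self, name: str, size: str) -> str:
        ...


class InMemoryProvider(CloudProvider):
    """Fake provider with a fixed capacity, standing in for a real API."""

    def __init__(self, prefix: str, capacity: int):
        self.prefix = prefix
        self.capacity = capacity
        self.instances: list[str] = []

    def create_instance(self, name: str, size: str) -> str:
        if len(self.instances) >= self.capacity:
            raise RuntimeError(f"{self.prefix}: at capacity")
        instance_id = f"{self.prefix}-{name}-{size}"
        self.instances.append(instance_id)
        return instance_id


def place_with_overflow(providers: list[CloudProvider], name: str, size: str) -> str:
    # Try providers in priority order: on-prem first (run it near capacity),
    # then public clouds as overflow.
    for provider in providers:
        try:
            return provider.create_instance(name, size)
        except RuntimeError:
            continue
    raise RuntimeError("no capacity on any provider")
```

Because every provider sits behind the same interface, adding a second public cloud is a one-line change to the priority list, which is what keeps the automation cloud-neutral.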
5. Upgrade, Don’t Backport
As upstream project support cycles shorten (think of the number of supported releases and maintenance windows for OpenStack and Kubernetes, for example), it’s vital to get into the habit of upgrading rather than backporting, which introduces technical debt that exacerbates the cost of every subsequent lifecycle management event. With the right automation in place, upgrading becomes a predictable, solvable problem that takes a reasonable amount of time.
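The "right automation" for upgrades usually means a repeatable canary-first rollout. A minimal sketch, assuming you supply your own `upgrade` and `healthy` callables (both hypothetical, not from the article): upgrade a canary node, verify health, then proceed through the rest, aborting on the first failure.

```python
from typing import Callable


def rolling_upgrade(
    nodes: list[str],
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
    canary_count: int = 1,
) -> list[str]:
    """Canary-first rolling upgrade sketch (illustrative, not a real tool)."""
    canaries, rest = nodes[:canary_count], nodes[canary_count:]
    for node in canaries:
        upgrade(node)
        if not healthy(node):
            # A failed canary stops the rollout before the fleet is touched.
            raise RuntimeError(f"canary {node} unhealthy; aborting rollout")
    for node in rest:
        upgrade(node)
        if not healthy(node):
            raise RuntimeError(f"{node} unhealthy after upgrade")
    return nodes
```

Running this on every release keeps upgrades routine, so there is never a backlog of skipped versions tempting you to backport instead.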
6. Workload Placement Matters
Clouds are by nature dynamic, so debugging a service-level violation needs to take the changing nature of the infrastructure into account. Every cloud of reasonable scale has this problem, yet most operations teams neglect to maintain the correlation between what happens at the bare metal layer and what happens at the virtual machine and container layers. Think about workload placement as you onboard tenants, and establish the telemetry needed to capture placement events in their full context. That lays the groundwork for predictive analysis, which may ultimately allow you to introduce AI into your operations (and the larger and more complex your cloud infrastructure is, the more urgent that becomes).
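The correlation this rule describes can be captured with a simple placement record per workload. The schema below is a hypothetical sketch (field names and helpers are mine, not a standard): each event ties a tenant's container to its VM and bare-metal host, so a later SLA investigation can be scoped to everything that ran on a given node.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PlacementEvent:
    """One placement record tying a workload to its full stack.

    Field names are illustrative, not a standard telemetry schema.
    """
    tenant: str
    container: str
    vm: str
    bare_metal_host: str
    timestamp: float = field(default_factory=time.time)


events: list[PlacementEvent] = []


def record_placement(tenant: str, container: str, vm: str, host: str) -> PlacementEvent:
    # Emit at onboarding time and on every reschedule/migration.
    ev = PlacementEvent(tenant, container, vm, host)
    events.append(ev)
    return ev


def events_for_host(host: str) -> list[PlacementEvent]:
    # Correlate downward: everything that ran on one bare-metal node,
    # e.g. to scope an SLA-violation investigation to the affected stack.
    return [e for e in events if e.bare_metal_host == host]
```

With timestamps attached, the same records become the training data for the predictive analysis mentioned above.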
7. Stop Treating Security as Separate
Most cloud projects are devised between developers and operations, and they infrequently involve a separate security team. The result? Security specialists are confronted with “done deals,” to which they mostly react by throwing cold water on the plans. Security is a mindset and a posture that should be present from the start and remain a focus throughout requirements analysis. It is just as critical as any other non-functional requirement the cluster must meet to satisfy stakeholders’ expectations. So involve security early and often, and stay close.
8. Embrace Shiny Objects
The whole point of open infrastructure is to foster innovation and to give companies a competitive edge through the acceleration of their next-generation application rollout. Why stand in the way? If your developers want containers, why not? If your developers want serverless, why not? Being part of the solution rather than deriding new technology stacks as “shiny objects” only highlights a lack of confidence in the existing operational paradigm and automation.
By adhering to these eight new rules, enterprises should be able to position themselves for maximum efficiency and productivity as a new decade dawns.