HPC Needs to Be Built for the Cloud, Not Just Run on the Cloud

With governments and the private sector increasingly pursuing digital and cloud transformation strategies, the engineering and R&D groups within these organizations and agencies are also being pushed to move to the cloud.
This transition is accelerating as engineering and R&D teams find that traditional high-performance computing (HPC) practices are severely constraining their ability to innovate and create new products. Because HPC is the engine that powers engineering and R&D, demand for HPC resources continues to rise. Simply lifting and shifting workloads from on-premises systems to the cloud is not enough.
A lot of money is at stake, so getting this right matters. According to Grand View Research, the overall HPC market is expected to reach $53.6 billion by 2027, with government agencies representing the largest and fastest-growing category of adoption. Digitally transforming engineering and R&D requires a fundamental rethinking of HPC practices: the focus needs to move from optimizing HPC hardware to optimizing R&D throughput.
For HPC to serve as a growth and innovation engine as workloads move to the cloud, engineering needs a culture change on par with the one software development went through over the past decade, when it moved from waterfall processes to agile development with continuous integration and continuous delivery. Adopting new HPC capabilities and practices will require similar cultural adjustments.
Here are some of the chief challenges that may drive HPC to the cloud:
1. Supply chain issues continue to affect HPC vendors' ability to fulfill customer orders.
These delays are likely to persist for some time. As a result, IT is unable to upgrade on-premises HPC infrastructure to meet growing demand from engineering or to absorb ad hoc spikes in computing needs. This has led to longer wait times for engineers and to project delays.
2. Enterprises’ needs for HPC resources are accelerating.
As computational science and engineering methods see broader adoption, demand for HPC is increasing. In addition, organizations using simulation are onboarding new workloads and adopting additional techniques such as multiphysics and surrogate ML models, further driving up demand for HPC resources.
3. A shortage of HPC talent is affecting both HPC vendors and customers.
Many enterprises are finding that they lack the expertise to implement new technologies, which slows their IT modernization efforts. Vendors, meanwhile, are in little better position to help.
4. Islands of analysis are proliferating within organizations.
Because traditional HPC approaches rely on tightly coupled systems, R&D teams build workload-specific technology stacks focused on their own needs, generally disconnected from one another. The result has been slower product innovation and less collaboration across R&D teams.
5. Supporting increasingly distributed workforces.
Organizations have discovered that remote work is often feasible in a post-pandemic world. For engineering and R&D, this means providing anywhere access and better collaboration capabilities, which offer flexibility and help attract more talent.
Why Changing HPC Practices Matters in the Cloud
HPC practices today look a lot like software development did 20 years ago, when companies used waterfall development models built on monolithic, proprietary technology stacks. The cloud set off a Cambrian explosion of open source tools, leading to social coding, microservices, and continuous integration and delivery. The result has been empowered developers and a dramatic acceleration in the creation of new software services. Services like Twitter, Airbnb and Uber owe much of their success to this cloud transformation.
The cloud transformation of the HPC stack has come later because of the stack's complexity: specialized computing hardware architectures and the dominance of commercial packaged software. Today, virtually every specialized architecture is available in the cloud, and simulation software providers are also adopting cloud business models.
However, simply having the hardware and software in the cloud isn't enough. Cloud transformation requires more than a "lift and shift"; it requires starting with a "built for the cloud" approach.
Industry analyst research consistently shows increasing demand for HPC, with adoption of cloud HPC accelerating even faster. Key factors driving cloud HPC include new workloads in deep learning, machine learning and artificial intelligence, along with the need for more flexible infrastructure so organizations can more easily run each new workload on the best-performing architecture.
Meeting these challenges means shifting our approach to HPC: changing its traditional tenets and focusing squarely on the outcomes we are trying to achieve. (In fact, this is the work we do at Rescale.)
Here are five key strategies for embracing this new era of HPC built for the cloud (vs. simply running HPC in the cloud). Organizations competing in HPC need to start shifting from:
1. Hardware-centric to user-centric. Scientists and engineers, like software developers, are increasingly the most expensive line item in any business. Solve for ease of use to optimize researcher productivity.
2. Inflexible to unlimited. It is becoming impossible to remain competitive in any HPC vertical without the broadest choice of tools and deployment models.
3. Siloed to connected. Team collaboration will be table stakes in any HPC market. Unify the islands of analysis to enable cross-team collaboration and the sharing of best practices.
4. Static to intelligent. Organizations will need to make smarter and faster decisions on real-world cost-performance tradeoffs, and the cloud offers a practically unlimited range of options to choose from.
5. Manual to automated. Operationalizing policy-based control is a requirement for Global 2000 organizations. Automate it. Ensure security and compliance while also empowering engineers.
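To make strategy 4 more concrete, here is a minimal Python sketch of one kind of cost-performance decision: given several ways to run the same job, pick the cheapest one that is estimated to finish before a deadline. The `HardwareOption` type, the `choose_option` function, the option names, and all prices and runtimes are hypothetical illustrations, not Rescale's method or any cloud provider's actual pricing; in practice the runtime estimates would have to come from benchmarks or past runs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HardwareOption:
    """One candidate way to run a job. All fields are illustrative inputs."""
    name: str                    # hypothetical label, not a real instance type
    price_per_node_hour: float   # assumed price in USD per node-hour
    nodes: int                   # number of nodes the job would use
    est_runtime_hours: float     # estimated wall-clock runtime on this option

    @property
    def cost(self) -> float:
        # Total cost = price per node-hour x node count x runtime.
        return self.price_per_node_hour * self.nodes * self.est_runtime_hours


def choose_option(options: Sequence[HardwareOption],
                  deadline_hours: float) -> Optional[HardwareOption]:
    """Return the cheapest option estimated to finish within the deadline.

    Ties on cost are broken by shorter runtime. Returns None when no
    option is estimated to finish in time.
    """
    feasible = [o for o in options if o.est_runtime_hours <= deadline_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda o: (o.cost, o.est_runtime_hours))


# Hypothetical numbers chosen only to show the tradeoff; not real prices.
EXAMPLE_OPTIONS = [
    HardwareOption("cpu-small", 2.0, 4, 10.0),   # $80 total, 10 h
    HardwareOption("cpu-large", 2.0, 16, 3.0),   # $96 total, 3 h
    HardwareOption("gpu", 25.0, 4, 1.0),         # $100 total, 1 h
]

if __name__ == "__main__":
    for deadline in (12, 4, 1, 0.5):
        choice = choose_option(EXAMPLE_OPTIONS, deadline)
        print(deadline, choice.name if choice else "no option meets deadline")
```

With these made-up inputs, a 12-hour deadline favors the slow but cheap option, while tightening the deadline to 4 hours or 1 hour shifts the choice to more parallel or accelerated hardware at a higher total cost, and a 30-minute deadline cannot be met at all. A real system would weigh many more factors (queue times, licensing, data movement), but the shape of the decision is the same.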