Hardware Independence Is Critical to Innovation in Machine Learning
We’re in the midst of a storm. On one side we have a global chip shortage with no end in sight, forcing major companies to re-forecast their product targets. On the other, there’s a race between the big hardware players (e.g., Intel, Nvidia, Arm) and smaller, more specialized chip startups (e.g., Cerebras Systems and Ampere Computing), all vying for dominance in artificial intelligence.
To make matters worse, we (ML researchers and practitioners) live in a world of hardware dependence, where models and software have to be tuned and optimized according to specific hardware characteristics and frameworks. This is in stark contrast to the software world, where code is easily portable and established practices like DevOps exist.
This, along with the current economic climate, is forcing the ML industry to shift its focus from innovation to budgets. Given rising inflation and talk of recession, many companies are looking to cut costs without impacting user experience, but it’s proving difficult. That’s not surprising, since 90% of ML compute costs are tied to inferencing production workloads.
If it were possible to migrate those workloads from one cloud instance to another, or from cloud to edge, it could save money, but it could also degrade performance to the point where the app or service is unusable.
This comes as a shock to many in the software industry because we’ve become accustomed to shifting generic software workloads around, reaping major price/performance benefits from the latest chip innovations, like cloud GPUs. Not so with ML development.
First, the cost and performance of a given model on different hardware are highly variable and often unpredictable. Second, migrating ML models in production requires specialized talent, which is scarce and costly. Third, even with that talent, it still takes weeks of manual work to migrate workloads, and that’s per model. And even if you overcome all of these hurdles, you have only a 50% chance of seeing any cost or performance benefit at all.
Billions of dollars in time, talent and R&D are going into solving this dependence problem with little result. And this is threatening AI innovation.
We as an industry must achieve hardware independence if we’re to fulfill the promise of AI. Achieving hardware independence will enable faster innovation, unlock hybrid options for model deployment and ultimately save practitioners time and energy.
Breaking the Dependence Barrier
I won’t get into all the brass tacks of how to overcome this problem here, but the bottom line is that all of it can be solved through automation. Once you can make models portable and automate deployment, you can achieve:
Faster Innovation: The negative impact of hardware dependence manifests in different scenarios. What if the hardware you need to deploy your application isn’t available in your region — or at all, given the shortage? Say you’re looking to deploy on a GPU but availability is constrained. Having the choice to move to, say, a CPU would take pressure off the GPU and let you run your application at peak performance.
Here’s another example. What if you’re trying to scale an application but can’t predict its load accurately enough to add or remove instances as demand changes? Hardware independence lets you optimize your models to run wherever you want them, so you can scale as needed.
The operative word here is choice. Some hardware is simply better suited to certain applications, so having the choice of hardware target is critical. Think of resource-constrained environments, like the edge, where you have to live with edge processors. Independence enables practitioners to operate in any environment: wearable devices, smart cameras, self-driving cars, you name it.
Ultimately, eliminating dependence by enabling choice in hardware targets helps you use the hardware you already have, and extend its life by maximizing performance.
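To make the idea of choice concrete: once a model runs on multiple targets, picking one reduces to a simple constrained cost minimization over measured latency and instance price. The sketch below is a minimal, hypothetical illustration — the target names, latencies and prices are invented for the example, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    latency_ms: float      # measured p95 inference latency on this hardware
    cost_per_hour: float   # on-demand instance price in USD


def pick_target(targets, latency_budget_ms):
    """Return the cheapest hardware target that meets the latency budget."""
    viable = [t for t in targets if t.latency_ms <= latency_budget_ms]
    if not viable:
        raise ValueError("no target meets the latency budget")
    return min(viable, key=lambda t: t.cost_per_hour)


# Hypothetical numbers for illustration only -- real latencies and prices
# must come from benchmarking your own model on each instance type.
candidates = [
    Target("gpu-instance", latency_ms=12.0, cost_per_hour=3.06),
    Target("cpu-x86", latency_ms=45.0, cost_per_hour=0.77),
    Target("cpu-arm", latency_ms=38.0, cost_per_hour=0.58),
]

best = pick_target(candidates, latency_budget_ms=50.0)
print(best.name)  # the Arm instance: cheapest target that fits the budget
```

With a tight 20 ms budget the same function would fall back to the GPU instance — the point is that the decision becomes data-driven rather than dictated by whatever hardware the model happens to be tied to.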
Hybrid Deployment: Hardware independence enables ML models to migrate between, or even be split across, on-premise, cloud and edge environments. Think of the latest Stable Diffusion models that generate art. To make such a model viable in a mobile app in its current form, it may be necessary to split the workload, running part on the device and part in the cloud. Or think of a language model that represents a style of writing. If you’re typing on a computer, you can run this model in the cloud. But if you’re typing on your phone, you’d want to run the model locally for responsiveness. Enabling fluid migration of ML models between different hardware will enable new experiences and broaden the impact of ML on applications.
Energy/Cost Efficiency: Hardware independence, together with target-specific model optimization, gives you the choice to run wherever it is more energy- or cost-efficient. Benchmarking inference on, say, GCP instances with Intel Cascade Lake against AWS Graviton3 can quickly reveal cost savings. In fact, we did this research ourselves and found that this migration would save 73% on compute costs for natural language processing (NLP) models such as GPT-2. And that’s just one example. The trend toward specialized functional units in future hardware will make this even more pronounced.
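A simple way to run this kind of comparison yourself is to compute cost per million inference requests from measured throughput and the instance’s hourly price. The sketch below uses purely illustrative throughput and price assumptions, not the data behind the 73% figure cited above.

```python
def cost_per_million(requests_per_sec, price_per_hour):
    """USD to serve one million inference requests at steady state."""
    seconds_needed = 1_000_000 / requests_per_sec
    return price_per_hour * seconds_needed / 3600


# Hypothetical throughput and on-demand prices -- substitute numbers
# measured on your own model and target instances.
current = cost_per_million(requests_per_sec=40, price_per_hour=1.00)
candidate = cost_per_million(requests_per_sec=55, price_per_hour=0.60)

savings = 1 - candidate / current
print(f"{savings:.0%} cheaper per million requests")  # 56% with these inputs
```

Note that the cheaper instance doesn’t need a lower hourly price to win; a higher requests-per-second figure after target-specific optimization can deliver the same result.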
The Future of ML Depends on Choice
We’re living in interesting times. ML is poised to effect real change across many industries, and businesses are investing heavily as a result. But ROI remains questionable given the limits on where companies can run and deploy applications. Success in ML hinges on adaptability and agility, both of which are currently lacking.
Whatever the future holds, one thing is certain: hardware independence will be a defining factor in the success of ML. Given all the larger forces affecting compute availability and flexibility, relying on hardware-dependent ML will surely hinder innovation and business operations.