
Unlock the Next Wave of Machine Learning with the Hybrid Cloud

Even when analysts and data scientists overcome the hurdle of getting access to data in other parts of the business, they quickly find that they lack effective tools and hardware to leverage the data. Domino Data Lab's Kjell Carlsson discusses how to achieve AI transformation with a hybrid-cloud machine learning strategy.
Mar 22nd, 2023 10:00am by

Machine learning is no longer about experiments. Most industry-leading enterprises have already seen dramatic successes from their investments in machine learning (ML), and there is near-universal agreement among business executives that building data science capabilities is vital to maintaining and extending their competitive advantage.

The bullish outlook is evident in the U.S. Bureau of Labor Statistics’ predictions regarding growth of the data science career field: Employment of data scientists is projected to grow 36% from 2021 to 2031, much faster than the average for all occupations.

The aim now is to extend these early successes beyond the specific parts of the business where they first emerged. Companies are looking to scale their data science capabilities to support their entire suite of business goals and embed ML-based processes and solutions everywhere the company does business.

Vanguards within the most data-centric industries, including pharmaceuticals, finance, insurance, aerospace and others, are investing heavily. They are assembling formidable teams of data scientists with varied backgrounds and expertise to develop and place ML models at the core of as many business processes as possible.

More often than not, they are running headlong into the challenges of executing data science projects across the regional, organizational, and technological divisions that abound in every organization. Data is worthless without the tools and infrastructure to use it, and both are fragmented across regions and business units, as well as in cloud and on-premises environments.

Even when analysts and data scientists overcome the hurdle of getting access to data in other parts of the business, they quickly find that they lack effective tools and hardware to leverage the data. At best, this results in low productivity, weeks of delays, and significantly higher costs due to suboptimal hardware, expensive data storage, and unnecessary data transfers. At worst, it results in project failure, or not being able to initiate the project to begin with.

Successful enterprises are learning to overcome these challenges by embracing hybrid-cloud strategies. Hybrid cloud — the integrated use of on-premises and cloud environments — also encompasses multicloud, the use of cloud offerings from multiple cloud providers. A hybrid-cloud approach enables companies to leverage the best of all worlds.

They can take advantage of the flexibility of cloud environments, the cost benefits of on-premises infrastructure, and the ability to select best-of-breed tools and services from any cloud vendor and machine learning operations tooling. More importantly for data science, hybrid cloud enables teams to leverage the end-to-end set of tools and infrastructure necessary to unlock data-driven value everywhere their data resides.

It allows them to arbitrage the inherent advantages of different environments while preserving data sovereignty and providing the flexibility to evolve as business and organizational conditions change.

Use Hybrid Cloud to Weave Machine Learning into Every Part of Your Business

While many organizations try to cope with disconnected platforms spread across different on-premises and cloud environments, today the most successful organizations understand that their data science operations must be hybrid cloud by design. That is, they implement end-to-end ML platforms that support hybrid cloud natively and provide integrated capabilities that work seamlessly and consistently across environments.

In a recent Forrester survey of AI infrastructure decision-makers, 71% of IT decision-makers said hybrid-cloud support by their AI platform is important for executing their AI strategy, and 29% said it’s already critical. Further, 91% said they will be investing in hybrid cloud within two years, and 66% said they had already invested in hybrid support for AI workloads.

In addition to the overarching benefit of a hybrid-cloud strategy for data science — the ability to execute data science projects and implement ML solutions anywhere in your business — there are three key drivers that are accelerating the trend:

  • Data sovereignty: In more and more parts of the world, regulatory requirements like GDPR are forcing companies to process data locally under threat of heavy fines. The EU Artificial Intelligence Act, which triages AI applications across three risk categories and calls for outright bans on applications deemed to be the riskiest, will go a step further than fines. Gartner predicts that 65% of the world’s population will soon be covered by similar regulations.

  • Cost optimization: The size of ML workloads grows as companies scale data science because of the increasing number of use cases, larger volumes of data and the use of computationally intensive deep learning models. Hybrid-cloud platforms enable companies to direct workloads to the most cost-effective infrastructure, for example to optimize utilization of an on-premises GPU cluster and mitigate rising cloud costs.

  • Flexibility: Taking a hybrid-cloud approach allows for future-proofing to address the inevitable changes in business operations and IT strategy, such as a merger or acquisition involving a company that has a different tech stack, expansion to a new geography where your default cloud vendor does not operate or even a cloud vendor becoming a significant competitor.
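To make the first two drivers concrete, here is a minimal Python sketch of how a platform might place a workload: it keeps only environments in the region where the data must stay, then picks the cheapest one that has the required hardware. Everything in it is an assumption made for illustration: the `Environment` and `Workload` types, the `place` function, the environment names and the hourly costs are hypothetical and do not describe any vendor's product or pricing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    name: str              # hypothetical environment name
    region: str            # where the hardware physically sits
    has_gpu: bool
    cost_per_hour: float   # illustrative figure, not a real price


@dataclass(frozen=True)
class Workload:
    name: str
    data_region: str       # region the data must not leave
    needs_gpu: bool


def place(workload: Workload, environments: list[Environment]) -> Optional[Environment]:
    """Return the cheapest environment that keeps the data in its region."""
    eligible = [
        env for env in environments
        if env.region == workload.data_region          # data sovereignty
        and (env.has_gpu or not workload.needs_gpu)    # hardware fit
    ]
    # Cost optimization: lowest hourly cost among eligible environments,
    # or None when nothing satisfies the constraints.
    return min(eligible, key=lambda env: env.cost_per_hour, default=None)


ENVIRONMENTS = [
    Environment("onprem-frankfurt", "eu", has_gpu=True, cost_per_hour=1.10),
    Environment("cloud-a-eu", "eu", has_gpu=True, cost_per_hour=2.40),
    Environment("cloud-b-us", "us", has_gpu=True, cost_per_hour=0.90),
]

job = Workload("churn-model-training", data_region="eu", needs_gpu=True)
chosen = place(job, ENVIRONMENTS)
print(chosen.name if chosen else "no eligible environment")
```

In this sketch the cheapest environment overall sits in the wrong region, so the job lands on the on-premises cluster instead. The flexibility driver shows up in the same place: absorbing a new cloud vendor or an acquired company's data center would mean adding an entry to the list, not rewriting the placement logic.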

Three Key Elements of Your Hybrid-Cloud Strategy for Machine Learning

Implementing a hybrid-cloud strategy for ML is easier said than done. For example, no public cloud vendor offers more than token support for on-premises workloads, let alone support for a competitor’s cloud, and the range of tools and infrastructure your data science teams need scales as you grow your data science rosters and undertake more ML projects. Here are the three essential capabilities for which every business must provide hybrid-cloud support in order to scale data science across the organization:

  • Full data science life cycle coverage: From model development to deployment to monitoring, enterprises need data science tooling and operations to manage every aspect of data science at scale.

  • Agnostic support for data science tooling: Given the variety of ML and AI projects and the differing skills and backgrounds of the data scientists across your distributed enterprise, your strategy needs to provide hybrid cloud support for the major open-source data science languages and frameworks — and likely a few proprietary tools — not to mention the extensibility to support the host of new tools and methods that are constantly being developed.

  • Scalable compute infrastructure: More data, more use cases and more advanced methods require the ability to scale up and scale out with distributed compute and GPU support, but this also requires an ability to support multiple distributed compute frameworks since no single framework is optimal for all workloads. Spark may work perfectly for data engineering, but you should expect that you’ll need a data-science-focused framework like Ray or Dask (or even OpenMPI) for your ML model training at scale.

Embedding ML models throughout your core business functions lies at the heart of AI-based digital transformation. Organizations must adopt a hybrid-cloud or equivalent multicloud strategy to expand beyond initial successes and deploy impactful ML solutions everywhere.

Data science teams need end-to-end, extensible and scalable hybrid-cloud ML platforms to access the tools, infrastructure and data they need to develop and deploy ML solutions across the business. Organizations need these platforms for the regulatory, cost and flexibility benefits they provide.

The Forrester survey notes that organizations that adopt hybrid-cloud approaches to AI development are already seeing the benefits across the entire AI/ML life cycle, experiencing 48% fewer challenges in deploying and scaling their models than companies relying on a single-cloud strategy. All evidence suggests that the vanguard of companies that have already invested in their data science teams and platforms is pulling even further ahead using hybrid cloud.
