Cloud Services / Data / Kubernetes / Machine Learning

Hybrid Cloud Machine Learning on Kubernetes with Azure Arc

8 Mar 2021 9:38am, by

Azure Arc is Microsoft’s hybrid solution for getting the simplicity and value of cloud services on any infrastructure, by putting a representation of that infrastructure in Azure so the automation, monitoring and policy tools that work in the cloud can manage it too.

Arc started with servers, VMs, Kubernetes and SQL Server databases — an on-premises equivalent of IaaS — and moved on to bring cloud PaaS services to the infrastructure you manage with Arc, starting with putting Azure Data Services on Kubernetes containers. That’s an “evergreen” database service (for SQL Server or PostgresDB) with the continuous updates, automation and elasticity of a cloud service, just on your own hardware or your infrastructure in other clouds.

The next Azure service to come to Arc is machine learning, which you can now use to run training locally on the data in the databases you’re managing with Arc. That could be data in a different cloud that you don’t want to copy to Azure, incurring data egress costs and latency, data you want to keep in your own data center for regulatory reasons, or data you want to process at the edge so you can act on it immediately.

Arc-enabled Machine Learning will be useful at the edge for workloads like predictive maintenance, monitoring assets in the field for failure or analyzing activity in retail locations, where you don’t have the connectivity or bandwidth to use cloud services.

“Customers leveraging machine learning today operate under a broad set of scenarios and environments,” a Microsoft spokesperson told The New Stack. “They may be in regulated industries like financial services or healthcare and have specific requirements as a result. We want to support our customers and their scenarios, so they can do machine learning now, where they have their data. With the flexibility of Kubernetes and Arc-enabled ML they can leverage existing hardware, work across data residency requirements, in edge scenarios, or when their data is in other clouds.”

Arc shows ML frameworks in native Kubernetes environments.

Microsoft suggested some typical customers for machine learning on Arc; banking, healthcare — medical imaging analysis or drug discovery — and renewable energy infrastructure like wind farms. “Banks commonly have data on-premises. They frequently need to run financial risk modeling predictions (for capital, credits, etc.) and rely heavily on machine learning models. They are also often highly regulated and want to run ML on-premises to meet requirements. Healthcare organizations commonly need multicloud support given the inter-organizational collaboration structures.”

“Wind farms are a great renewable energy source but fluctuate greatly depending on wind speed and other environmental factors. They also generate petabytes worth of data continuously onsite. ML models can help forecast energy capacity and make power commitments. With Azure Arc enabled ML, models can be trained on-premises without data movement, and as the data changes, the models retrain themselves and maintain accuracy.”

It will also allow organizations to use existing local infrastructure that’s not currently required for other workloads to run machine learning training that they’d otherwise be paying to run in Azure Machine Learning. If they need more resources temporarily, they could also burst out to AKS clusters.

“Orchestrating training jobs in this manner and lifecycle management of the Kubernetes clusters are done by the customer, in a manner consistent with their DevOps process and preferences,” Microsoft explained.

Machine Learning Pipeline

Managing all your different machine learning models with the same tools is also an important part of operationalizing machine learning. “All your models, no matter where they’re built, can be stored and tracked in a central location in Azure Machine Learning for sharing, reproducibility, and audit compliance,” Director of Azure Management Jeremy Winter explained at the Microsoft Ignite event.

Arc-enabled Machine Learning runs on Kubernetes clusters (with any centrally-applied Arc policies applying to workloads), uses native Kubernetes objects and sticks to standard Kubernetes concepts like separation of roles.

That helps with one of the conundrums of machine learning on Kubernetes; it’s a scalable, distributed infrastructure that works well for machine learning training, but machine learning professionals rarely have Kubernetes expertise.

Data engineers don’t need to learn Kubernetes to use Arc; it’s just another compute target for Azure Machine Learning Studio, the Azure Machine Learning Python SDK, Jupyter notebooks or frameworks like SciKit, TensorFlow, PyTorch, or MPI.

“Azure Machine Learning provides built-in support for many open source frameworks like SciKit, PyTorch, TensorFlow and more. Customers can use Azure Arc enabled ML to train models using built-in support for these open source frameworks,” Microsoft told us.

But Kubernetes operators don’t need to know those machine learning tools. The Azure Machine Learning agent can be deployed from the command line or with the usual Kubernetes patterns like GitOps and can run on any Kubernetes cluster that Arc supports. The agent extends the Kubernetes API so operators for tools like PyTorch and TensorFlow show up as Kubernetes objects in kubectl, and it’s also how the Azure Machine learning service can talk to your cluster and deploy training jobs for you.

Admins can see their Arc-enabled Kubernetes clusters in the Azure portal, whether they’re on Azure, Google Cloud Platform, Amazon Web Services, edge devices or in a data center, and choose the ones they want to run Arc-enabled Machine Learning on. “IT operators will need to provide storage access to the Kubernetes cluster, set up network security configurations to their data endpoints on-premises per their requirements, and then run the Azure Machine Learning agent,” Microsoft told us.

They can also use the portal to choose which data engineers they want to give access to — either the entire cluster or to just part of it, using familiar Kubernetes patterns.

“The design for Arc-enabled Machine Learning helps IT operators leverage native Kubernetes concepts such as operators, custom resource definitions, namespaces and labels. By letting the IT operator manage this setup, we create a seamless experience for data scientists who don’t need to learn or use Kubernetes directly. The flexible design also helps us support a broad range of customer scenarios such as using AKS clusters for training when data is in a different cloud, training on Azure Stack Hub, and more.”

When those data engineers use Azure Machine Learning Studio for building and deploying machine learning models, they can see their data sets, notebooks, experiments, models and pipelines, and the compute locations they can use. That includes the Arc-enabled Kubernetes clusters they have access to, which they can pick as a location directly or use inside a Jupyter notebook as a location for training the model.

Target Arc clusters from Jupyter notebooks.

Any training runs for that model show up in the list of experiments, so they can compare the performance and results of training a model in different locations or at different stages of development. This is particularly handy for remote sites and distributed staff, who might be working from home in different locations but can still collaborate on building machine learning systems through the cloud, without the data having to leave the remote site.

Arc-enabled Machine Learning doesn’t yet support other machine learning orchestration patterns like the popular Kubeflow but Microsoft is looking into how that would work. “We are talking to customers about their Kubernetes environments and are interested in getting more feedback on what they would like to see supported with Kubeflow — for example, pipelines, notebooks, or hyperparameter tuning and so on.”

Arc-enabled Machine Learning is available in public preview now: apply at this link, giving some details of the type of workload you’re interested in.

The Arc Roadmap

In the long run, Arc is the way Microsoft will bring more and more Azure services to run in the hybrid model. The Arc team confirmed at Ignite that Microsoft will announce some additional Azure Arc-enabled services later this year. We’re expecting one of those to be Arc App Services, which will combine Azure Functions and Event Grid into a PaaS deployment platform, but there is also a lot of customer interest in bringing Azure Key Vault to Arc.

Although you can use Arc to manage any Cloud Native Computing Foundation-certified Kubernetes distribution, Microsoft is also working with vendors to certify platforms like OpenShift, Charmed Kubernetes and RKE to run Arc-enabled Kubernetes clusters and it’s now added VMware Tanzu and Nutanix’s Karbon Kubernetes management solution to the list, for customers using their hyperconverged systems.

Feature image: Training runs from Arc clusters in Azure ML. All images courtesy of Microsoft.

A newsletter digest of the week’s most important stories & analyses.