Azure CTO: Open Source Is Key to Machine Learning in the Cloud, or on the Edge

22 Mar 2018 2:41pm, by

In this interview recorded at the Open Source Leadership Summit, Azure CTO Mark Russinovich and The New Stack’s founder Alex Williams discussed how Microsoft builds on and contributes to open source for Azure’s artificial intelligence (AI) and machine learning. As Russinovich — who had just given a keynote suggesting that AI owes its current strength to the combination of open source and the cloud — explained, “Fundamentally a lot of AI, machine learning and analytics is built on top of open source and it’s a key part of our strategy to build with and use that open source, as well as to contribute back and to add to open source.”

Examples of that range from Microsoft contributing its own enhancements and fixes to existing projects like YARN (which is used in Azure Data Lake Analytics), to supporting the R open source community, to working with Facebook and AWS on the ONNX project to exchange models between Caffe, MXNet and CNTK, Microsoft’s own deep learning framework — which is also open source.

“CNTK is our own intellectual property; it’s a differentiated convolutional neural network framework… that we developed internally for Bing. We use that internally for a lot of our cognitive APIs… and we contributed that to [the] open source [community],” Russinovich explained.

The way Azure brings those open source components to customers is usually as part of a service, like the Cognitive Services APIs or the way R is integrated with SQL Server and Azure SQL DB so you can do your machine learning where your data is, he said. “Either we’re making it available for use on our platform or powering our platform and providing services to customers [with it].”

Increasingly, customers put those pieces together as part of a pipeline to create machine learning-driven apps, which is a good fit for new application-development practices like serverless computing. In the keynote, Russinovich showed a new app called DiagnostiCX that uses machine learning to help doctors who don’t have access to a trained radiologist interpret chest x-rays for signs of pneumonia — one of the biggest killers of children around the world.

“It works as a pipeline; a new x-ray comes in that kicks off a workflow, which will go do analytics on it, that will then go and do scoring, which will result in that score triggering some workflow like producing a report that the doctor takes an action on in the application they get. Everything is basically data-driven, which corresponds to events that are triggering the production and the processing of that data.”
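The event-driven shape Russinovich describes can be illustrated with a minimal sketch. It is not the DiagnostiCX implementation: the `EventBus` class, the event names, and the "mean pixel intensity" score are all hypothetical stand-ins, assumed here only to show how one event (a new x-ray) can trigger analysis, then scoring, then a report.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Tuple[str, dict]  # (event kind, payload)


class EventBus:
    """Tiny in-process event bus (hypothetical; stands in for a cloud event service)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}
        self._queue: Deque[Event] = deque()

    def subscribe(self, kind: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, kind: str, payload: dict) -> None:
        self._queue.append((kind, payload))

    def run(self) -> None:
        # Process events until none remain; handlers may publish further events.
        while self._queue:
            kind, payload = self._queue.popleft()
            for handler in self._handlers.get(kind, []):
                handler(payload)


def build_pipeline(bus: EventBus, reports: List[dict]) -> None:
    """Wire up the three stages: analytics, scoring, report generation."""

    def analyze(p: dict) -> None:
        # Placeholder "analytics": mean pixel intensity, NOT a real radiology feature.
        mean = sum(p["pixels"]) / len(p["pixels"])
        bus.publish("analyzed", {"id": p["id"], "mean": mean})

    def score(p: dict) -> None:
        # Placeholder "model": scales the mean into [0, 1] for 8-bit pixels.
        bus.publish("scored", {"id": p["id"], "score": p["mean"] / 255.0})

    def report(p: dict) -> None:
        # The 0.5 threshold is an arbitrary assumption for illustration.
        reports.append(
            {"id": p["id"], "score": p["score"], "needs_review": p["score"] >= 0.5}
        )

    bus.subscribe("xray_received", analyze)
    bus.subscribe("analyzed", score)
    bus.subscribe("scored", report)


bus = EventBus()
reports: List[dict] = []
build_pipeline(bus, reports)
bus.publish("xray_received", {"id": "scan-1", "pixels": [255, 255]})
bus.run()
```

Each stage knows only about the event it consumes and the event it emits, which is what makes this style a natural fit for serverless functions triggered by storage or queue events.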

Other open source technologies like containers are also useful for putting machine learning apps into production. Azure services like the Machine Learning Workbench create data models that can run both on the Azure Container Service and on IoT devices at the edge. “On a factory floor with anomalous object detection for finding faulty parts that’s done with image recognition where you don’t want to send all the images of the good parts up to the cloud for additional processing and classification to… figure out what’s causing the problems in the factory assembly line; you want to just take the anomalous ones and send them up. So, if you put the image detection right on the edge you save all that bandwidth.”
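The bandwidth argument can be sketched in a few lines. This is an illustration under stated assumptions, not an Azure API: `brightness_anomaly_score` is a hypothetical placeholder for a real image-recognition model running on the edge device, and the threshold is arbitrary. The point is only that scoring happens locally and just the anomalous images are selected for upload.

```python
from typing import Callable, Iterable, List, Sequence

Image = Sequence[float]  # flattened pixel values in [0, 1] (assumed representation)


def brightness_anomaly_score(image: Image, expected_mean: float = 0.5) -> float:
    """Placeholder anomaly score: distance of mean brightness from an expected value.

    A real deployment would run a trained model here instead.
    """
    mean = sum(image) / len(image)
    return abs(mean - expected_mean)


def select_for_upload(
    images: Iterable[Image],
    score_fn: Callable[[Image], float],
    threshold: float,
) -> List[Image]:
    """Keep only images whose anomaly score exceeds the threshold."""
    return [img for img in images if score_fn(img) > threshold]


# Two "good part" images and one anomalous one (made-up data).
captured = [[0.5, 0.5], [0.5, 0.5], [1.0, 1.0]]
to_upload = select_for_upload(captured, brightness_anomaly_score, threshold=0.2)
```

In this toy run only one of the three captured images would leave the device; the other two are discarded at the edge.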

Azure also has tools like IoT Hub for bringing all that together into a functional and manageable IoT application that can be provisioned remotely, with tools for both developers and the ops team, Russinovich pointed out. “What we’re looking at is let’s create an application framework and infrastructure to support the most efficient way to manage an entire application to deployment to operations at scale, regardless of the topology of it.” It’s that combination of cloud and open source that has unlocked the power of machine learning for developers, he maintains.

Microsoft is a sponsor of The New Stack.
