Microsoft Puts AI Where the Data Is

25 Apr 2017 1:00am

If you want to do machine learning, you need data to do it with. So far, however, the complexity of machine learning tools has usually meant developing with a framework like TensorFlow or the Microsoft Cognitive Toolkit, using R and Python alongside specialist statistical tools, or calling cloud machine learning services through their APIs.

Any of these approaches requires getting the data out of a database, and then integrating the output of the machine learning system with the applications. Those transforms and transfers and integrations make development and deployment more complex, slow things down, can be error prone, and discourage retraining models as frequently as you might want (to avoid ‘ML rot’).

With the second Community Technology Preview of the SQL Server 2017 relational database management system (RDBMS), Microsoft is adding in-database machine learning functions as stored procedures, plus support for Python as well as R. SQL R Services is now called SQL Machine Learning Services, and this interface also lets you reach out to GPU-powered analytics, data processing and machine learning tools such as deep learning frameworks.

“You never have to take your data out of the database, so you have all the security and audit tools that you’re used to,” Joseph Sirosh, corporate vice president of the Microsoft data group, explained to The New Stack. He called SQL Server 2017 “the first commercial, transactional RDBMS that supports AI, that supports intelligence in the database. The manageability is a huge part of the value that a database brings and now you have an intelligence management system.”

Speed and Security

Although Microsoft has an increasing range of machine learning services (Cognitive Services now includes 25 APIs), you can connect a wide range of machine learning tools to SQL Server 2017. “The speed of innovation in artificial intelligence is in open source,” said Sirosh. A query could use Python or R code to invoke a GPU-powered library to transform data, then run deep learning on that transformed data, and get a deep-learned prediction.

“Intelligent solutions will not be restricted to any one company, they’re going to be democratized the way all computing has been and they’re going to be created by mainstream developers,” Sirosh said. “If you think about what developers need to create their intelligence revolution, we think they need simplification of intelligence and availability of intelligence in the platforms they use.”

Putting intelligence in the database is about performance, but it’s also about manageability and developer productivity, he said. “The data we learn from is massive; you can’t move that around networks without incredible slowdown. When you locate the algorithms right next to the data in the data platform, you don’t have to slosh data around; the algorithm comes to the data and it runs dramatically faster because it’s running in place.”
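As a loose analogy for "the algorithm comes to the data," the sketch below uses SQLite from Python's standard library, not SQL Server. It registers a scoring function with the database engine so that a plain SQL query can call it row by row, instead of exporting the table and scoring it elsewhere. The table, the column names, the weights and the `toy_score` function are all hypothetical, and SQL Server's own mechanism (stored procedures that run R or Python scripts) works differently under the hood.

```python
import sqlite3

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = {"visits": 0.5, "spend": 0.01}
BIAS = -1.0


def toy_score(visits, spend):
    """Compute a toy score from the columns of a single row."""
    return BIAS + WEIGHTS["visits"] * visits + WEIGHTS["spend"] * spend


def build_demo_db():
    """Create an in-memory database with sample rows and register the scorer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, visits INTEGER, spend REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 2, 100.0), (2, 10, 50.0), (3, 0, 0.0)],
    )
    # Make the Python function callable from SQL as toy_score(visits, spend).
    conn.create_function("toy_score", 2, toy_score)
    return conn


def score_in_database(conn):
    """Score every row inside the query; only ids and scores are returned."""
    query = "SELECT id, toy_score(visits, spend) FROM customers ORDER BY id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for customer_id, score in score_in_database(connection):
        print(customer_id, score)
```

The design point is that the application only ever sees the query result. The raw rows stay behind whatever access controls the database applies to the query.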

Compliance and security are another good reason to keep the data you learn from in the database; after all, it’s often data about your customers that will damage your brand and your bottom line if it leaks. “Databases provide high availability, access control, security, encryption; you can take advantage of that. You can train deep learning models with data that resides in the database and you can deploy them in the database itself and you never take data out of the database.”

Because the intelligence features in SQL Server are treated like any stored procedure, users can also take advantage of the other SQL Server built-in security and access controls like hiding rows and columns users don’t have the rights to see. “You can learn with an identity that has access to all the data but when you deploy [your intelligent app], the identity using it might not have access to the privacy-sensitive piece of data because those can be masked off,” Sirosh pointed out. You could even create data simulations and what-if scenarios in SQL Server if you have a limited training set that you need to bulk up.

 

Where the Machine Learning Meets the Apps

Once your machine learning system is trained, you need to operationalize it. Often that means rewriting R code in another language like JavaScript so you can run it in a web server, as well as provisioning the system to run it, which is another inefficient and time-consuming step.

If you need to draw data from multiple sources like Hadoop, SQL Server already includes the PolyBase technology, which dramatically simplifies querying Hadoop. Rather than computing joins between the different data sources and setting up a highly available system, which is complicated to build, analyze and scale out as usage grows, administrators can use the database to do the heavy lifting, given that databases “have support for concurrency, for joins, for data management, as well as security,” Sirosh said.

SQL Server 2017 offers a built-in platform to serve machine learning models from, with monitoring and performance tools, and the usual database development tools, like SQL Server Management Studio and the SQL Server Data Tools, as well as Visual Studio. You can expect these tools to be better integrated in future, Sirosh suggested, and more machine learning models to be built in. “In the future, we want to make AI functions into simple SQL functions, like the ‘analyze faces’ function in SQL Azure.”

Microsoft is applying the same principles of putting data and intelligence tools in the same place to R Server and its Azure cloud data services. Azure Data Lake Analytics lets you run U-SQL, R, Python and .NET code against petabyte-scale databases and U-SQL includes a number of the APIs from Cognitive Services as functions you can call.

If you’re storing data in DocumentDB, Microsoft’s globally distributed NoSQL service, it now integrates with Spark so you can run machine learning on that data. And Microsoft R Server 9.1 includes several machine learning algorithms from Microsoft, plus pre-trained neural network models for sentiment analysis and image recognition.

But for many enterprises, SQL Server is still where their data lives and it’s what drives the apps that create and use that data. The history of SQL Server — much like the history of Windows Server — is Microsoft taking features like high availability, data analytics and business intelligence that were once reserved for expensive, high-end systems and bringing them to mainstream businesses at affordable prices.

While the machine learning and artificial intelligence landscape is far more complicated than the database market, if Microsoft can turn SQL Server into a platform where enterprises can work with machine learning and AI from the comfort of their own database systems, where they have familiar controls and development tools, then AI and ML may truly find a home in tomorrow’s enterprise.

Feature image via Pixabay.

