Amazon Web Services Brings Machine Learning to DataOps
Remember that old saying, “If the mountain will not come to Muhammad, then Muhammad must go to the mountain”? Well, Amazon Web Services is hoping to bring more machine learning to the worldwide development community.
Over the past few years, Amazon Web Services has put a lot of engineering effort into integrating the processes of creating and refining machine learning models into modern development lifecycles. The result is a platform, Amazon SageMaker, built to streamline that work.
Now, the company is taking the next step, integrating ML workflows directly into the sources of data themselves. The company has incorporated its tool for automating the creation of ML models, called SageMaker Autopilot, into many of its chief data management services, said Swami Sivasubramanian, who manages ML/AI services for AWS, at this year’s annual user conference, AWS re:Invent, which was held virtually.
The idea is to give users of AWS data storage, database and data warehouse tools the ability to create models through an interface all database administrators know: the Structured Query Language (SQL).
“Machine learning is a very iterative process. You prepare some of the data, you train the model and you [check] the model is converging the right way. What often happens is during that process you realize that you need to change your data preparation, like include new labels in the data, new features in the data [or] combine the features in a different way,” said Bratin Saha, vice president for AWS ML, explaining to The New Stack why AWS is integrating data preparation with SageMaker and its machine learning infrastructure. “When you’re going through hundreds of models, you really need this closely coupled so it’s a single tool, and teams become much more productive [with] a single tool.”
This work started last year, when AWS integrated ML into Amazon Aurora for relational database developers. The feature allowed them to add ML capabilities to an enterprise application through a simple query. That same year, AWS did something similar with its interactive query service, Athena, allowing developers to call built-in or custom ML models directly from ad-hoc Athena queries.
.@awscloud launched a #ML service to check for #Bias in #AI models. Running in #SageMaker, #Clarify allows you to specify attributes (age, gender) that it will check for associated trends indicating prejudice, i.e. more denials for one particular group -@phenomenashlie #awsreinvent
— Joab Jackson (@Joab_Jackson) December 8, 2020
This year the integrations continue. The company’s Redshift data warehouse has been outfitted with machine learning capabilities. As the company explained in a blog post:
Amazon Redshift now enables you to run ML algorithms on Amazon Redshift data without manually selecting, building, or training an ML model. Amazon Redshift ML works with Amazon SageMaker Autopilot, a service that automatically trains and tunes the best ML models for classification or regression based on your data while allowing full control and visibility.
When you run an ML query in Amazon Redshift, the selected data is securely exported from Amazon Redshift to Amazon Simple Storage Service (Amazon S3). SageMaker Autopilot then performs data cleaning and preprocessing of the training data, automatically creates a model, and applies the best model. All the interactions between Amazon Redshift, Amazon S3, and SageMaker are abstracted away and automatically occur. When the model is trained, it becomes available as a SQL function for you to use.
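As a rough illustration of the workflow the blog post describes, the sketch below assembles the two kinds of SQL statement involved: one that asks Redshift to train a model, and one that calls the resulting SQL function. It follows the general shape of Redshift ML's CREATE MODEL statement, but every table, column, function, role and bucket name here is a hypothetical placeholder, and the helpers only build strings; they do not connect to a cluster.

```python
def build_create_model_sql(model_name, training_query, target_column,
                           function_name, iam_role_arn, s3_bucket):
    """Assemble a Redshift ML CREATE MODEL statement as a string.

    All identifiers passed in are placeholders chosen by the caller.
    """
    return (
        f"CREATE MODEL {model_name}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target_column}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


def build_prediction_sql(function_name, feature_columns, table):
    """Assemble a query that calls the trained model's SQL function."""
    columns = ", ".join(feature_columns)
    return f"SELECT {function_name}({columns}) AS prediction FROM {table};"


# Hypothetical example: predict customer churn from two feature columns.
create_sql = build_create_model_sql(
    model_name="customer_churn",
    training_query="SELECT age, monthly_spend, churned FROM customers",
    target_column="churned",
    function_name="predict_churn",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    s3_bucket="example-redshift-ml-bucket",
)
predict_sql = build_prediction_sql(
    "predict_churn", ["age", "monthly_spend"], "new_customers"
)

print(create_sql)
print(predict_sql)
```

In this sketch the training query supplies both the feature columns and the target column, and the name given in the FUNCTION clause is what later queries would call, which is the sense in which the trained model becomes available as a SQL function.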
Even Amazon’s graph database, Neptune, gets some ML smarts. A graph database can be used to examine the links between different entities, revealing patterns that can’t be identified strictly through an examination of the entities themselves. A new update to Neptune brings graph neural networks (GNNs), a technique that, according to the company, can improve the accuracy of predictions by more than 50% compared with traditional approaches.
“Neptune ML uses the Deep Graph Library (DGL), an open-source library to which AWS contributes that makes it easy to develop and apply GNN models on graph data. As a result, you can now create, train, and apply ML on Neptune data in hours instead of weeks without the need to learn new tools and ML technologies. Now, any developer with data in Neptune can easily use ML on their graphs,” according to the company.
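To give a feel for why links between entities matter to a GNN, here is a minimal sketch of one round of neighbor aggregation, the basic building block such networks rely on. It is plain Python for illustration only, not the Neptune ML or DGL API, and the node names and feature values are made up. A real GNN would also apply learned weights and a nonlinearity at each round.

```python
def aggregate_neighbors(features, edges):
    """Run one round of mean aggregation over an undirected graph.

    features: dict mapping node name -> list of floats (same length for all)
    edges: list of (node, node) pairs
    Returns a new dict in which each node's vector is the average of its
    own vector and the vectors of its direct neighbors.
    """
    neighbors = {node: [] for node in features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = {}
    for node, own in features.items():
        vectors = [own] + [features[n] for n in neighbors[node]]
        updated[node] = [
            sum(vec[i] for vec in vectors) / len(vectors)
            for i in range(len(own))
        ]
    return updated


# Hypothetical graph: "alice" and "bob" are linked, "carol" is isolated.
features = {
    "alice": [1.0, 0.0],
    "bob": [0.0, 1.0],
    "carol": [0.0, 0.0],
}
edges = [("alice", "bob")]

result = aggregate_neighbors(features, edges)
print(result)
```

After one round, the two linked nodes carry information about each other while the isolated node is unchanged, which is how a prediction about one entity can come to depend on the entities it is connected to.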
Amazon Web Services is a sponsor of The New Stack.