AWS Serves Up Tools for Data Heads, Cloud Native Security

As is typical at its annual re:Invent conference, held this week in Las Vegas, AWS has delivered a slew of new tools and previews of products and services for developers, data analysts, security specialists, DevOps practitioners and others.
Making announcements at a dizzying pace during his more than two-hour keynote, AWS CEO Adam Selipsky shed light on new AWS technology for nearly every specialty. For instance, for data heads, Selipsky announced new products that put Amazon Web Services (AWS) on the road to delivering a “zero-ETL [extract, transform, load] future,” he said to hearty applause from the audience.
Zero ETL
“This is our vision, what we’re calling a zero-ETL future and in this future, data integration is no longer a manual effort,” Selipsky said.
One of the places where customers spend the most time building and managing ETL pipelines is between transactional databases and data warehouses, which is where AWS set its sights.
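To make concrete what such a pipeline involves, here is a minimal sketch of a hand-written ETL job in Python. It is an illustration only: two in-memory SQLite databases stand in for the transactional database and the data warehouse, and the table names, column names and the `run_manual_etl` helper are all hypothetical. None of it reflects how Aurora or Redshift work internally.

```python
import sqlite3


def run_manual_etl(oltp, warehouse):
    """Copy per-customer order totals from a transactional store to a warehouse."""
    # Extract: read raw rows from the (hypothetical) transactional table.
    rows = oltp.execute("SELECT customer_id, amount_cents FROM orders").fetchall()

    # Transform: sum cents per customer, then convert to dollars.
    totals = {}
    for customer_id, amount_cents in rows:
        totals[customer_id] = totals.get(customer_id, 0) + amount_cents
    transformed = [(cid, cents / 100) for cid, cents in sorted(totals.items())]

    # Load: replace the warehouse table contents with the fresh aggregates.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer_id INTEGER PRIMARY KEY, total_dollars REAL)"
    )
    warehouse.execute("DELETE FROM customer_totals")
    warehouse.executemany("INSERT INTO customer_totals VALUES (?, ?)", transformed)
    warehouse.commit()
    return len(transformed)


def demo():
    # Stand-in transactional database with three sample orders.
    oltp = sqlite3.connect(":memory:")
    oltp.execute("CREATE TABLE orders (customer_id INTEGER, amount_cents INTEGER)")
    oltp.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 800), (1, 750)]
    )
    # Stand-in data warehouse.
    warehouse = sqlite3.connect(":memory:")
    run_manual_etl(oltp, warehouse)
    return warehouse.execute(
        "SELECT customer_id, total_dollars FROM customer_totals ORDER BY customer_id"
    ).fetchall()
```

Even in this toy form, the job has to be scheduled, monitored and kept in step with schema changes on both sides, which is the manual effort the zero-ETL pitch targets.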
“We’ve been working for a few years now…to make it easier to do analytics and machine learning without having to deal with ETL,” Selipsky said as he introduced Amazon Aurora zero-ETL integration with Amazon Redshift and Amazon Redshift integration for Apache Spark.
Amazon Aurora now supports zero-ETL integration with Amazon Redshift, to enable near-real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from Aurora, AWS said in a press release.
“Within seconds of transactional data being written into Aurora, the data is available in Amazon Redshift, so you don’t have to build and maintain complex data pipelines to perform extract, transform, and load operations,” AWS said.
Moreover, customers can use Amazon Redshift’s analytics and capabilities such as built-in ML, materialized views, data sharing and federated access to multiple data stores and data lakes to derive insights from transactional and other data, AWS said.
“Zero ETL between Aurora and Redshift is something I’ve been begging them to do for a long time as AWS needs to make life simpler and take the burden of integration off the shoulders of their clients. Versus folks like Oracle that do analytics and OLTP in the same database, they need to remove this point of pain,” said Tony Baer, an analyst at dbInsight.
Meanwhile, Amazon Redshift integration for Apache Spark helps developers build and run Apache Spark applications on Amazon Redshift data. Customers using AWS analytics and machine learning services, such as Amazon EMR, AWS Glue and Amazon SageMaker, can now build Apache Spark applications that read from and write to their Amazon Redshift data warehouse without compromising the performance of their applications or the transactional consistency of their data, AWS said.
“In-database Spark in Redshift follows a similar theme,” Baer said. “Google is already doing this with BigQuery. It was only a matter of time before AWS would respond.”
These are two more steps toward AWS’s zero-ETL vision.
“We’re going to keep on adding and finding new ways to make it easier for you to access and analyze data across all of your data stores,” Selipsky said.
Amazon DataZone
In other data-related news, AWS introduced Amazon DataZone, a new data management service that makes it easier for data producers to manage and govern access to data and enables data consumers to discover, use and collaborate on data to drive business insights. DataZone makes it faster and easier for customers to catalog, discover, share and govern data stored across AWS, on-premises and third-party sources, the company said.
DataZone is “a data management service that helps catalog, discover, share and govern data across your organization,” Selipsky said. “DataZone enables you to set data free throughout the organization safely by making it easy for admins and data stewards to manage and govern access to data. And it makes it easy for data engineers, data scientists, product managers, analysts and other business users to discover, use and collaborate around that data to derive insights for your business.”
Early customers weighed in with thoughts about the new service. ENGIE, a global energy company, is an early DataZone user that took part in Selipsky’s keynote.
“Rather than building and maintaining a platform to support our data-sharing and governance needs, over the last six months we have been working with the Amazon DataZone team as a beta customer, providing input into creating an AWS-native service and are looking forward to using Amazon DataZone to disseminate data throughout the organization and gain simplified access to AWS analytics services and governance tooling,” said Gregory Wolowiec, chief technology officer at Data@ENGIE, in a statement. “This will empower our analysts and line-of-business leaders to create innovative projects and make data-driven decisions.”
Meanwhile, sports, news and entertainment brand Fox Corp. is also a beta user of DataZone.
“Amazon DataZone will help streamline and automate our data discovery and sharing — with the right governance — so we can ensure it is accessed at the right time and with the right tools,” said Alex Tverdohleb, vice president of Data Infrastructure at Fox Corp., in a statement.
DataZone maintains a catalog by using machine learning to collect and suggest metadata for each dataset and by training on a customer’s taxonomy and preferences to improve over time, AWS said.
“In your catalog, you can define the taxonomy or the business glossary,” Selipsky said. “You can organize these in terms of datasets based on your organizational hierarchies … and then you connect data zones to data sources. And it uses machine learning to collect and populate your catalog with the appropriate metadata, and you can add labels and descriptions to provide additional information …”
Baer said DataZone drew his attention as one of the standout announcements from the event.
“It’s a response to Google Dataplex. I was surprised that this didn’t become an extension of Glue, which AWS is keeping with its ‘some ETL’ focus,” he said.
Amazon Security Lake
Targeting both data and security specialists, AWS also introduced in preview Amazon Security Lake, a data lake for security.
“Security Lake is a data lake that makes it easy for security teams to automatically select and combine security data at petabyte scale,” Selipsky said.
Amazon Security Lake automatically centralizes security data from the cloud, on-premises and custom sources into a data lake stored in a user’s account. Security Lake makes it easier to analyze security data so users can get a more complete understanding of their security posture across the entire organization. It helps users improve the protection of their workloads, applications and data. And it automatically gathers and manages all of a user’s security data across accounts and regions, AWS said.
“Amazon Security Lake is the first data lake that supports the OCSF standard,” Selipsky said. “Security Lake automatically collects and aggregates security data from partner solutions like Cisco, CrowdStrike and Palo Alto Networks as well as more than 50 security tools integrated into AWS Security Hub.”
Indeed, Security Lake has adopted the Open Cybersecurity Schema Framework (OCSF), an open standard that helps normalize and combine security data from AWS and a broad range of enterprise security data sources.
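The idea of normalizing security data into one schema can be sketched in a few lines of Python. This is a simplified illustration, not the actual OCSF schema: the two input log formats, the output field names (`time`, `severity`, `src_ip`, `source`) and every function name here are assumptions made up for the example.

```python
from datetime import datetime, timezone


def from_firewall(record):
    # Hypothetical firewall log: epoch seconds and a numeric severity level.
    return {
        "time": datetime.fromtimestamp(record["ts"], tz=timezone.utc).isoformat(),
        "severity": "high" if record["level"] >= 7 else "low",
        "src_ip": record["src"],
        "source": "firewall",
    }


def from_endpoint(record):
    # Hypothetical endpoint agent log: ISO timestamp ending in "Z", text severity.
    parsed = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
    return {
        "time": parsed.isoformat(),
        "severity": record["sev"].lower(),
        "src_ip": record["remote_addr"],
        "source": "endpoint",
    }


# One normalizer per source type; each maps a raw record to the shared shape.
NORMALIZERS = {"firewall": from_firewall, "endpoint": from_endpoint}


def normalize(kind, record):
    return NORMALIZERS[kind](record)


def combine(events):
    """Normalize (kind, record) pairs and return them in chronological order."""
    normalized = [normalize(kind, record) for kind, record in events]
    return sorted(normalized, key=lambda e: datetime.fromisoformat(e["time"]))
```

Once every source lands in the same shape, a single query or detection rule can run across all of them, which is the practical payoff of a shared schema.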
Container Security
And for the cloud native set, Selipsky highlighted enhancements to the Amazon GuardDuty security service to cover containers.
“I’m really pleased to announce a new capability of GuardDuty that adds container runtime threat detection,” Selipsky said. “Now GuardDuty will help protect against threats from software running inside the container by monitoring operating system-level behavior in the container itself, such as file access, process execution and network connection. It can detect an attempt to access underlying compute nodes and obtain an instance credential or identify a container that’s trying to communicate with the malicious actor’s command-and-control server.”
The runtime capability builds on earlier container coverage. In January, AWS expanded Amazon GuardDuty to continuously monitor and profile Amazon Elastic Kubernetes Service (Amazon EKS) cluster activity to identify malicious or suspicious behavior that represents potential threats to container workloads. Amazon GuardDuty for EKS Protection monitors control plane activity by analyzing Kubernetes audit logs from existing and new Amazon EKS clusters in user accounts.
“We’ve taken our security learnings and your feedback to build machine learning models that intelligently and continuously monitor and identify hard-to-detect threats, often a lot faster than other security products,” Selipsky said.
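As a rough illustration of what analyzing Kubernetes audit logs can mean, the Python sketch below scans JSON audit events for two simple patterns: requests from the anonymous user, and `exec` calls into pods. The rules and the `suspicious_findings` helper are hypothetical and far simpler than anything GuardDuty does; only the event fields used (`verb`, `user.username`, `objectRef`) come from the Kubernetes audit event format.

```python
import json


def suspicious_findings(audit_lines):
    """Return (rule, username, target) tuples for events matching toy rules."""
    findings = []
    for line in audit_lines:
        event = json.loads(line)
        user = event.get("user", {}).get("username", "")
        ref = event.get("objectRef") or {}
        verb = event.get("verb", "")

        # Toy rule 1: any request made as the unauthenticated anonymous user.
        if user == "system:anonymous":
            findings.append(("anonymous-access", user, ref.get("resource", "")))

        # Toy rule 2: someone opening an exec session inside a pod.
        if (
            verb == "create"
            and ref.get("resource") == "pods"
            and ref.get("subresource") == "exec"
        ):
            findings.append(("pod-exec", user, ref.get("name", "")))
    return findings
```

A production detector would weigh context such as source IPs, baselines of normal activity and threat intelligence rather than fixed rules like these.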