Kinesis, Kafka and Amazon Managed Service for Apache Flink
Apache Flink is an open source framework and distributed processing engine that offers connectors to multiple data sources. It performs computations such as joins and aggregations, and it provides extract, transform, and load (ETL) capabilities. It also supports advanced real-time techniques such as complex event processing.
Deepthi Mohan and Nagesh Honnalli of AWS joined us on The New Stack Makers to discuss Apache Flink, Amazon Managed Service for Apache Flink (MSF), and the past ten years, which saw the emergence of Amazon Kinesis and the eventual focus on Flink as a data framework known for its connectors to third-party services, including those developed by users.
MSF supports customers with different preferences and requirements, Honnalli said.
Some customers would like to have complete control. At the other end of the spectrum are customers who want to avoid doing anything related to infrastructure. They don’t want to know what kind of instances are running. They want AWS to manage it for them.
Use cases fall into three buckets, Honnalli said. The first is streaming ETL, which would include, for example, log aggregation for storage and later auditing. The second is real-time analytics, which helps customers use dashboards to, for example, understand their transactions for fraud detection. The third is complex event processing, in which data from multiple sources needs to be joined and aggregated to make more sense of the information.
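To make the real-time analytics bucket more concrete, here is a minimal sketch in plain Python of a windowed aggregation of the kind a fraud dashboard might rely on. It is not Flink code and does not use any AWS or Flink API; the function names, the sample events, the 60-second window, and the threshold are all assumptions made for illustration. In a stream processor, the same idea would typically be expressed as a keyed, fixed-size (tumbling) window over an unbounded stream.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each timestamp to the start of the window it falls in.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


def flag_busy_keys(counts, threshold):
    """Return the (window_start, key) pairs whose count exceeds the threshold."""
    return sorted(pair for pair, n in counts.items() if n > threshold)


# Hypothetical transaction events: (timestamp in seconds, account id).
events = [
    (0, "acct-1"),
    (10, "acct-1"),
    (20, "acct-2"),
    (59, "acct-1"),
    (61, "acct-1"),
    (75, "acct-2"),
]

counts = tumbling_window_counts(events, window_seconds=60)
flagged = flag_busy_keys(counts, threshold=2)
print(flagged)
```

Here the account with three transactions inside the first 60-second window is the only one flagged. A real streaming job would also have to deal with late or out-of-order events and with state that outlives a single process, which this sketch ignores.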
MSF launched in 2018. But the origins of the service weave a story that speaks to Amazon Web Services’ role with its managed services and technologies.
In 2013, AWS launched Amazon Kinesis, while Apache Kafka emerged in the open source community. These services came long before MSF. They reflected a shift to real-time data and the eventual evolution of Flink and its diverse connector ecosystem.
Under the Kinesis umbrella, AWS offers Amazon Kinesis Data Firehose, which delivers data streams to destinations like Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and AWS Lambda.
In 2016, AWS also launched Kinesis Data Analytics, a SQL-based offering. AWS customers said they wanted the ability to write code in different languages, primarily JVM-based languages like Java and Scala, to do more complex things than SQL alone could provide.
“So which is why in 2018, we decided to support under the Kinesis Data Analytics umbrella, the Flink offering,” Mohan said. “We’ve been running Flink on AWS since the latter half of 2018.”
Thousands of customers are on the MSF service, Mohan said. In 2019, AWS also launched Amazon MSK, Amazon Managed Streaming for Apache Kafka.
“And that’s when things got a bit confusing because customers who were using Kinesis Data Analytics weren’t sure that they could be using Flink with Kafka,” Mohan said. “So we listened to our customers. We wanted to improve our service’s awareness, so we are now renaming Kinesis Data Analytics for Apache Flink to Amazon Managed Service for Apache Flink. So it’s an existing service going through a rebranding exercise to attach to Flink because that’s the open source project we are based on.”