3 Trade-offs to Consider When Deploying Apache Kafka in the Cloud

Organizations that rapidly produce and process high volumes of data, like Netflix, Salesforce, Shopify and even the United States Postal Service (USPS), are constantly applying and testing new methods to manage the complexity of data streaming in the cloud.
Data streaming and the real-time use cases it supports have transformed the way consumers and businesses operate in daily life. The examples are endless, from real-time location tracking and personalized content feeds to intelligent fraud analysis and operational monitoring.
Engineering, operations and data science teams are looking forward to learning from other experts in the data streaming space this October at Current 2022: The Next Generation of Kafka Summit. In the lead-up to the event, we’ve talked with speakers who are presenting on various operational challenges of using Apache Kafka in the cloud.
Across every industry and business function, organizations are racing to deploy high-performance applications, streaming pipelines and analytics services to enhance their most critical capabilities. As a result, the data streams we so heavily rely on often start and end in chaos.
Handling Cloud Integration Challenges with Streaming Data
Companies are getting better at using Kafka to put data to work as soon as it’s generated, whether deployed in on-premises or cloud environments. Deploying Kafka workloads in the cloud, however, gives organizations the ability to massively scale their real-time capabilities — that is, if their integration architecture doesn’t get in the way.
The ongoing demand for “speed to value” undoubtedly delivers results — the business benefits of real-time ingestion, processing and observability have become essential for growth and competitive innovation. But this speed-over-strategy approach threatens long-term operational reliability.
Spaghetti architectures abound, resulting in technical debt that promises future headaches at every stage of the software development life cycle.
Trade-off No. 1: Speed to Value vs. Scalable Integration Architectures

Accelerated delivery timelines can leave little time for essential steps like integration testing. Viktor Gamov, principal developer advocate at Kong Inc., is covering the importance of integration testing during his Current 2022 breakout session, “Testing Kafka containers with Testcontainers: There and back again.”
According to Gamov, “Kafka has never been the easy tool to learn. There is inherent complexity that comes from the architecture because it’s a distributed system.” During his session, he plans to show attendees how they can use the Testcontainers library to assess whether complex data streaming stacks are ready for production.
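As a rough illustration of that pattern, the sketch below spins up a disposable Kafka broker with Testcontainers and checks that a producer can actually write to it. It assumes JUnit 5, the Testcontainers Kafka module and the Kafka Java client are on the test classpath; the image tag, topic name and test class are placeholders rather than anything taken from Gamov’s session.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

import java.util.Map;

class KafkaSmokeTest {

    @Test
    void brokerAcceptsWrites() throws Exception {
        // Start a throwaway single-node Kafka broker in Docker for the duration of the test.
        try (KafkaContainer kafka =
                 new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.2.1"))) {
            kafka.start();

            // Point a plain Kafka producer at the container's mapped bootstrap address.
            Map<String, Object> config = Map.of(
                "bootstrap.servers", kafka.getBootstrapServers(),
                "key.serializer", StringSerializer.class.getName(),
                "value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(config)) {
                // get() blocks until the send completes, so a broker problem fails the test here.
                producer.send(new ProducerRecord<>("smoke-test", "key", "value")).get();
            }
        }
    }
}
```

The container is created and torn down inside the test itself, which is what makes this approach useful for integration-testing a streaming stack in CI without a shared, long-lived Kafka cluster.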
Striving for Real-Time Data Ingestion across Multiple Sources
In a recent survey, 60% of tech leaders cited the difficulty of integrating multiple data sources as their biggest hurdle to accessing real-time data. According to Jay Patel, software engineer and technical lead for the Kafka connector at Snowflake, that challenge will likely grow as data streaming use cases advance.

“Streaming data applies to most industry segments and big data use cases. Initially, applications may process data streams to produce simple reports and perform simple actions in response, such as raising alarms when key measures exceed certain thresholds,” Patel says. “Eventually these applications will be used to perform more sophisticated forms of data analysis, like applying machine learning algorithms, to extract deeper insights from the data.”
In his upcoming breakout session, “How Snowflake Sink Connector Uses Snowpipe’s Streaming Ingestion Feature to Load Data in Near Real Time,” Patel plans to demonstrate how users can spin up Kafka and Kafka Connect environments to overcome latency and bandwidth issues and load data into Snowflake databases in near-real time.
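As a hedged sketch of what that setup can look like, the snippet below registers a Snowflake sink connector configured for Snowpipe Streaming against a Kafka Connect worker’s REST API. It assumes a Connect worker is reachable at localhost:8083 and that the Snowflake connector plugin (which added the SNOWPIPE_STREAMING ingestion method) is installed; the account URL, credentials, topic and database names are all placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSnowflakeSink {
    public static void main(String[] args) throws Exception {
        // Connector config for the Snowflake sink; SNOWPIPE_STREAMING switches it from
        // file-based Snowpipe loads to the lower-latency streaming ingestion path.
        String config = """
            {
              "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
              "topics": "orders",
              "snowflake.ingestion.method": "SNOWPIPE_STREAMING",
              "snowflake.url.name": "myaccount.snowflakecomputing.com:443",
              "snowflake.user.name": "kafka_connector_user",
              "snowflake.private.key": "<private-key>",
              "snowflake.database.name": "ORDERS_DB",
              "snowflake.schema.name": "PUBLIC",
              "value.converter": "org.apache.kafka.connect.storage.StringConverter"
            }
            """;

        // Register (or update) the connector on the locally running Kafka Connect worker.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8083/connectors/snowflake-sink/config"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(config))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

With the streaming ingestion method, rows land in the target table as the connector consumes them rather than waiting for staged files to be copied, which is what makes near-real-time loads into Snowflake practical.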
Real-time ingestion has become an essential capability for most organizations today, but that isn’t the only concern. To provide the most value, many artificial intelligence and machine learning (AI/ML) applications not only need to be fed data as soon as possible, they also need high data quality and completeness.
Trade-off No. 2: High-Speed Data Ingestion vs. Centralized Data Governance
To ensure all three (speed, quality and completeness), organizations need an effective data governance framework. Although tools like change data capture and stream processing can help organizations control data quality and completeness, centralized data teams almost inevitably become a bottleneck as the demand for their services outpaces their capacity.
Achieving Resiliency in Multiregion Deployments
As the data streaming space matures, technology vendors and internal product teams are moving toward more strategic, scalable approaches to stream governance like data mesh. Until that becomes a reality, operations and DevOps teams are still expected to ensure the resilience and availability of Kafka workloads.

Managing data replication for Kafka cloud deployments has become increasingly complicated, as many businesses need to replicate Kafka topics across global deployments while also navigating data sovereignty and security regulations. Sanjana Kaundinya, a software engineer who focuses on multiregion replication and cluster linking at Confluent, will be leading a session on the topic, “To Infinity and Beyond: Extending the Apache Kafka Replication Protocol Across Clusters.”
In today’s hybrid and multicloud world, downtime has become more costly — and more difficult to avoid. “When organizations have production traffic flowing through real-time critical applications, it’s difficult and sometimes even impossible to introduce downtime for managing operational issues,” Kaundinya says.
Organizations can automate and modify Kafka’s replication protocol to tailor its performance to their specific operational and disaster recovery requirements. But doing so comes at a steep cost, leaving organizations to weigh the pros and cons of:
Trade-off No. 3: Global Resiliency vs. Operability
The overhead involved in building a truly resilient Kafka architecture at scale can be enormous. “Let’s say you write something in California. There may be some delay in New York, Europe or India, for that matter. How do you minimize that delay as much as possible? And how can you make it as seamlessly synchronous as possible without impacting performance, latency or availability?” Kaundinya says. “Distributed systems are very complicated. Then when you have a globally distributed system, it only becomes 10 times more complicated.”
Beyond Kafka: Connecting Across the Data Streaming Ecosystem
To sustain innovation and developer productivity, organizations using data streaming today need to maximize the value of their streaming data with the scalability of the cloud. That requires carefully navigating operational trade-offs when developing, deploying and managing cloud native applications, especially in hybrid cloud or multicloud environments.
How organizations evaluate and balance these trade-offs will depend on each company’s unique set of business goals and technical requirements. The ecosystem of data streaming technologies certainly consists of much more than Kafka — from other messaging systems to the supporting cast of data ingestion, integration, monitoring and developer tools.
Learn how companies across the data streaming ecosystem are approaching these challenges at Current 2022, Oct. 4-5, in person in downtown Austin or with a virtual pass. And check out the event agenda to see the topics and discussions that speakers like Patel, Kaundinya and Gamov are bringing to the table.