Modal Title
Data

LinkedIn Layered Architecture Minimizes Kafka Scaling Issues

With Kafka, too many data producers can cause issues, as can having too many data consumers. Here's how LinkedIn separated the resources to alleviate exhaustion.
Oct 13th, 2022 7:32am by
Featued image for: LinkedIn Layered Architecture Minimizes Kafka Scaling Issues

Fan-in and fan-out are both problems that occur when producers and consumers all fight for resources from the same Kafka broker. But there is a way to avoid this, LinkedIn has found, and it’s a layered architecture.

Ryanne Dolan, a senior staff software engineer at LinkedIn presented this information at the Current 2022: The Next Generation Kafka Summit, in his talk “Fan-in Flames: Scaling Kafka to Millions of Producers.” He sat down with The New Stack afterward for a one-on-one interview to further discuss this information.

LinkedIn’s architecture separate producer and consumer groups for purposes of geo-replication. Data flows between the two through its Brooklin data bus. Dolan, having the experience of working with LinkedIn’s layered topology early in this career, noticed the problem of fan-in and fan-out later on while working with clients and realized that LinkedIn’s layered approach also solved the fan-in fan-out issues. Separation of resource usage helps alleviate exhaustion.

Dolan spent his entire career working at massive scale in big tech so when asked how often did the average size company really face fan-in fan-out issues he admitted, “I laugh to myself all the time because there’s this concept of being big timed, right? Where someone with a big fancy title or the big company steps all over. But I feel like working in big tech, occasionally I get small-timed where I’ll give a talk and people are like, ‘well, how does that appeal to me? How does that have anything to do with, you know, the vast majority of us engineers that are not in big tech?’” And to this, let’s present….

The Problem

The problem of fan-in occurs when too many producers are overwhelming a Kafka broker. This problem happens organically as an application scales. At extreme scale, fan-in can be more important than throughput.

Applications using many microservices, hosts, and containers, or even more commonly used application logs, tracing, and distributed tracing are all at risk or already experiencing the effects of fan-in.

Separation of resource usage helps alleviate exhaustion.

Fan-out happens when too many consumer groups overwhelming a Kafka broker. Fan-out is a throughput multiplier (IO Fin + Fin × Fout).1 MBps per producer  is moving through 100 producers and 10 consumers. This translates into 1.1GBps of data.  Fan-out is less organic and tends to happen though adding consumer groups.

Cache hydration or any type of broadcast or broadcast-esque architecture is a strong example of fan out.

The Great Equalizer

Dolan took no chances with relating the problem of fan-in and fan-out to developers large and small.  His talk included a simulation of the problem done with just 1,000 producers and a max of 10 consumer groups. The simulation was done very creatively on his home computer using Docker Compose because he couldn’t, “just borrow a data center, right? And it doesn’t make sense to spin up hundreds of 1,000s of machines in the cloud just for a talk.”

Control Environment

Server Specs: 800 MHz, 8 cores and each broker used .4 core.

The Applications

Application #1: Traditional Model — N Producers and M Consumer per Workload. In this model, both producers and consumers share resources from the same broker.

Application #2: Mirrored (a derivation of Application#1) — N + ` Producers and M + 1 Consumers per Workload. This approach adds a new layer and separates the producers from consumers so they aren’t sharing the same broker.

The Measurements

  • End-to-End Latency: the time between the record recreation (before send) and when it’s processed by a consumer (after fetch).
  • Send Latency: time between record creation and when it’s written to the disk on the last broker (before ACK).
  • Fetch Latency: time between when the record is written to fin on the last broker (before ACK) and when it’s processed by a consumer (after fetch).

Workload:

(constant) 10k RPS, 10kQPS, 300kBPS.

Simulations

Simulation #1 ran with 1,000 producers, 1 consumer, 1 workload.

The following results are in milliseconds.

End-to-End latency

Send Latency

Fetch Latency

The results in milliseconds

End-to-end and send make sense. The additional layer adds additional time. No surprises there. Fetch is where it starts to diverge from expectations. What is the difference between the direct and the mirrored from the fetch latency perspective? That is fan-in. In the direct scenario, the data is sent by 1,000 producers but in the mirrored scenario the data is only sent from one producer, mirror maker.

At worst, end-to-end with 1 consumer was 13% worse. Mirroring made latency 1.13x worse but not 2x worse as one may expect by going through two clusters.

Simulation #2 ran with 1,000 producers, 10 consumers, 10x 1 workload (10 workloads).

The following results are in milliseconds.

End-to-End Latency

Send Latency

Fetch Latency

Given the constraints, the end-to-end latency was unsurprisingly bad but what is surprising about the results is that the mirrored latency now is consistently faster. Though only slightly faster in the end-to-end latency, the mirrored fetch latency is significantly faster. This is all the proof of producers and consumers competing for resources that’s needed.

Fan-in can quickly overwhelm a broker, even at low or relatively low throughput,  as seen in the first simulation and fan-out pours gasoline on that match. Mirroring reduced end-to-end latency by 23%. Yes, there was an additional hop but latency was still reduced.

While this example is contrived it still clearly illustrates that producers and consumers compete for resources. Fan-in can quickly overwhelm a broker, even at low throughput. Fan-out magnifies this effect. Replication is a fan-out divider.

Layered Topologies to Reduce Fan-Out

There are two ways to do this internal replication and mirroring. Mirroring was used in the example above.

Mirroring is a cluster-to-cluster mirroring topology. This uses Mirror Maker, an external topology and enables cluster-to-cluster. This reduces fan-out and fan-in (as seen in the example above).

Internal Replication is a broker-to-broker replication and is already built into the Kafka brokers. This requires the use of KIP-392 fetch-from-follower. This adds durability.

Data Pipelines is another way to reduce stress on a Kafka broker is to add data pipelines, in addition to mirroring.

The simulation above shows us that this example !== optimal fan-in out.

There are two ways we can achieve more desired results. We can use data pipelines to filter the main topic into smaller topics and split the consumer groups that way.

The second way is to create many small topics and join them together in larger aggregated topics prior to sending them to the consumer groups. This way is advised over sending the small topics individually especially if joins are being performed.

Application Logs are a great example of the choice between one large topic or dividing into smaller topics. A Tech Company wants to save all of its logs in the cloud, have a specific stream for the developers to access from one part of the application, and a specific stream for the devs in another area.

This is with either diagram style. In the first diagram, all applications send log events to one big topic which sends everything to the cloud. The data pipeline routes read based on application ID, container ID, and host ID. Consumers can process a single application, container, or host.

In the second diagram, each application sends data to its own topic and the data pipeline aggregates data across containers and hosts and sends the data to the cloud.

Layered Topologies to Reduce Fan-In

The approach here is to separate the read vs write sets.

Dolan said, “If you’re seeing this problem with just one part of an application, a reasonable first step would be to split into read and write brokers.” Read vs write sets are, “Easiest [of the topologies] and works to a point,” he added. The “to a point” refers to scaling as it may be time to adopt a new method if the amount of data is small relative to the size of the cluster.

A more advanced and sophisticated application of this methodology is by creating an ingestion layer.

This includes producers sharded across multiple ingestion clusters.

Partitioning

Partitioning helps alleviate fan-in but the round-robin sticky partitioning presents a separate set of challenges to prepare for. To use round-robin sticky successfully, try to either limit the number of brokers per producer so all producers can’t send to one broker or measure the latency as to avoid slow partitions. While a transient latency spike isn’t a massive problem, it can cause issues downstream.

Shard or Combine Producers or Batch

All of the following, because they deal with separating and combining producers depends on the amount of data being sent per producer and the remainder of the topology but all of these are methods can potentially work for reducing fan-in.

Shard: divide the workload into non-overlapping groups.

Combine producers: avoid having many producer clients within the same application.

Batch:send more records per request.

LinkedIn’s Multilayered Topology

LinkedIn’s topology solves the issue of fan-in and fan-out because it separates the producers from consumers. Dolan said, “I usually talk about it in terms of geo-replication.” The intended purpose of the data pipelines is to deliver data from one data center to another.

But there’s a side effect to this two-layered topology — reduction of fan-in and fan-out. The producers tend to be connected to the local clusters while the consumers tend to be connected to the aggregate clusters and the only thing between them is LinkedIn’s Brooklin data bus.

Just by putting a pipeline between the two layers of clusters and separating the producers from consumers, LinkedIn avoids any fan-in, fan-out issues without actually targeting that problem specifically.

Conclusion

If an application is struggling with fan-in or fan-out, the best place to start is by separating reads and writes. In the instance of fan-in, fan-out, adding hop doesn’t always mean a longer time if resources are exhausted. If it’s at the place where you’re seeing this companywide, there’s likely already large deployment and mirroring is the suggestion. Fan-in and fan-out can lead to latency issues but they don’t have to. Layered approaches can help alleviate current issues or avoid them altogether.

Group Created with Sketch.
TNS owner Insight Partners is an investor in: Docker.
THE NEW STACK UPDATE A newsletter digest of the week’s most important stories & analyses.