Apache Flink and Apache Kafka Streams: a comparison and guideline for users – Confluent
The open source stream processing space is currently exploding, with more systems becoming available presenting users with many alternatives. In the Apache Software Foundation alone, there are now more than 10 stream processing projects, some in incubation and others graduated to top-level project status. While the availability of alternatives benefits the industry and the users of these systems by enabling competition and thus, encouraging innovation, it can also be quite confusing: with all these options, which one is right for me both now and in the future? Stream processors can be evaluated on several dimensions, including performance (throughput and latency), integration with other systems, ease of use, fault tolerance guarantees, etc, but making such a comparison is not the topic of its post (and we are certainly biased).
For some time now, the Apache Kafka project has served as a common denominator in most open source stream processors as the the de-facto storage layer for storing and moving potentially large volumes of data in streaming fashion with low latency. Recently, the Kafka community introduced Kafka Streams, a stream processing library that ships as part of Apache Kafka. With the addition of Kafka StreamsandKafka Connect, Kafka has now added significant stream processing capabilities.
In this post, we focus on discussing how Flink and Kafka Streams compare with each other on stream processing, and we attempt to provide clarity on that question in this post. Flink and Kafka Streams were created with different use cases in mind. While they have some overlap in their applicability, they are designed to solve orthogonal problems and have very different sweet spots and placement in the data infrastructure stack.