Exactly-once Semantics is Possible: Here’s How Apache Kafka Does it

I’m thrilled that we have hit an exciting milestone the Kafka community has long been waiting for: we have introduced exactly-once semantics in Apache Kafka in the 0.11 release. In this post, I’d like to tell you what exactly-once semantics mean in Apache Kafka, why it is a hard problem, and how the new idempotence and transactions features in Kafka enable correct exactly-once stream processing using Kafka’ ...

Amazon Kinesis vs. Apache Kafka For Big Data Analysis – Dataconomy

Data processing today is done in the form of pipelines which include various steps like aggregation, sanitization, filtering and finally generating insights by applying various statistical models. Amazon Kinesis is a platform to build pipelines for streaming data at the scale of terabytes per hour. Parts of the Kinesis platform are a direct competitor to the Apache Kafka project for Big Data Analysis. The platf ...

Apache Flink and Apache Kafka Streams: a comparison and guideline for users – Confluent

The open source stream processing space is currently exploding, with more systems becoming available presenting users with many alternatives. In the Apache Software Foundation alone, there are now more than 10 stream processing projects, some in incubation and others graduated to top-level project status. While the availability of alternatives benefits the industry and the users of these systems by enabling ...

Distributed, Real-time Joins and Aggregations on User Activity Events using Kafka Streams

In previous blog posts we introduced Kafka Streams and demonstrated an end-to-end Hello World streaming application that analyzes Wikipedia real-time updates through a combination of Kafka Streams and Kafka Connect. In this blog post we want to continue the introduction series on Kafka Streams by implementing a very common and very important use case in stream processing: to enrich an incoming stream of eve ...

Hello World, Kafka Connect + Kafka Streams

In the last few years, with the widespread adoption of Apache Kafka, stream processing has come to the forefront. More recently, several stream processing systems have emerged that integrate with Kafka. One of those systems, Apache Samza, has a particularly interesting “hello world” tutorial for getting started with the system; Hello Samza, as it is called, uses Wikipedia real-time updates published on its ...

Kafka Streams – the KStreams API – Random Thoughts on Coding

The last post covered the new Kafka Streams library, specifically the “low-level” Processor API. This time we are going to cover the “high-level” API, the Kafka Streams DSL. While the Processor API gives you greater control over the details of building streaming applications, the trade-off is more verbose code. In most cases, however, the level of detail provided by the Processor API is not required and the ...

Introducing Kafka Streams: Stream Processing Made Simple

I’m really excited to announce a preview of a new feature in Apache Kafka called Kafka Streams. Kafka Streams is a Java library for building distributed stream processing apps using Apache Kafka. It will be part of the upcoming Kafka 0.10 release and we’ve made a preview version available to make it easy to try out now. The Kafka Streams source code is available under the Apache Kafka project. A stream proce ...

Spotify’s Event Delivery – The Road to the Cloud (Part II) | Labs

Whenever a user performs an action in the Spotify client (such as listening to a song or searching for an artist), a small piece of information, an event, is sent to our servers. Event delivery, the process of making sure that all events get transported safely from clients all over the world to our central processing system, is an interesting problem. In this series of blog posts, we are going to look at some ...

Real-Time Data Pipelines with Spark, Kafka, and Cassandra (on Docker) | BlueData

In my experience as a Big Data architect and data scientist, I’ve worked with several different companies to build their data platforms. Over the past year, I’ve seen a significant increase in focus on real-time data and real-time insights. It’s clear that real-time analytics provide the opportunity to make faster (and better) decisions and gain competitive advantage. Immediate insights into real-time data ...

Building a Streaming Analytics Data Stack

This post lays out the blueprint for the pieces we used and how we put them together. We’ll cover: Ingest: how to bring in many different types of data streams. Index and querying: efficient storage and unified queries. Wiring it up: how data flows through the system. Optimization: making queries fast. We hope that this will be useful and can serve as a high-level orientation for those who are getting star ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.