You Are Here: Home » Kafka

Apache Kafka and the four challenges of production machine learning systems – O’Reilly Media

Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their product, processes or customer experience. The cartoon version of machine learning sounds quite easy: you feed in training data made up of examples of good and bad outcomes, and the computer automatically learns from these and spits out a model that can make simila ...

Read more

Getting Started Analyzing Twitter Data in Apache Kafka through KSQL

KSQL is the open source streaming SQL engine for Apache Kafka. It lets you do sophisticated stream processing on Kafka topics, easily, using a simple and interactive SQL interface. In this short article we’ll see how easy it is to get up and running with a sandbox for exploring it, using everyone’s favourite demo streaming data source: Twitter. We’ll go from ingesting the raw stream of tweets, through to fi ...

Read more

Publishing with Apache Kafka at The New York Times | Confluent

At The New York Times we have a number of different systems that are used for producing content. We have several Content Management Systems, and we use third-party data and wire stories. Furthermore, given 161 years of journalism and 21 years of publishing content online, we have huge archives of content that still need to be available online, that need to be searchable, and that generally need to be availa ...

Read more

Introducing KSQL: Open Source Streaming SQL for Apache Kafka

What does it even mean to query streaming data, and how does this compare to a SQL database? Well, it’s actually quite different to a SQL database. Most databases are used for doing on-demand lookups and modifications to stored data. KSQL doesn’t do lookups (yet), what it does do is continuous transformations— that is, stream processing. For example, imagine that I have a stream of clicks from users and a t ...

Read more

Using Apache Kafka as a Scalable, Event Driven Backbone for Service Architectures

The last post in this microservices series looked at building systems on a backbone of events, where events become both a trigger as well as a mechanism for distributing state. These have a long history of implementation using a wide range of messaging technologies. But while Apache Kafka™ is a messaging system of sorts, it’s quite different from typical brokers. It comes with both pros and cons and, like a ...

Read more

Exactly-once Semantics is Possible: Here’s How Apache Kafka Does it

I’m thrilled that we have hit an exciting milestone the Kafka community has long been waiting for: we have introduced exactly-once semantics in Apache Kafka in the 0.11 release. In this post, I’d like to tell you what exactly-once semantics mean in Apache Kafka, why it is a hard problem, and how the new idempotence and transactions features in Kafka enable correct exactly-once stream processing using Kafka’ ...

Read more

Amazon Kinesis vs. Apache Kafka For Big Data Analysis – Dataconomy

Data processing today is done in form of pipelines which include various steps like aggregation, sanitization, filtering and finally generating insights by applying various statistical models. Amazon Kinesis is a platform to build pipelines for streaming data at the scale of terabytes per hour. Parts of the Kinesis platform are a direct competitor to the Apache Kafka project for Big Data Analysis. The platf ...

Read more

Apache Flink and Apache Kafka Streams: a comparison and guideline for users – Confluent

The open source stream processing space is currently exploding, with more systems becoming available presenting users with many alternatives. In the Apache Software Foundation alone, there are now more than 10 stream processing projects, some in incubation and others graduated to top-level project status. While the availability of alternatives benefits the industry and the users of these systems by enabling ...

Read more

Distributed, Real-time Joins and Aggregations on User Activity Events using Kafka Streams

In previous blog posts we introduced Kafka Streams and demonstrated an end-to-end Hello World streaming application that analyzes Wikipedia real-time updates through a combination of Kafka Streams and Kafka Connect. In this blog post we want to continue the introduction series on Kafka Streams by implementing a very common and very important use case in stream processing: to enrich an incoming stream of eve ...

Read more

Hello World, Kafka Connect + Kafka Streams

In the last few years, with the widespread adoption of Apache Kafka, stream processing has come to the forefront. More recently, several stream processing systems have emerged that integrate with Kafka. One of those systems, Apache Samza has a particularly interesting “hello world” tutorial for getting started with the system;  Hello Samza, as it is called, uses Wikipedia real-time updates published on its ...

Read more

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop and the Hadoop elephant logo, Sprark are trademarks of the Apache Software Foundation.

Scroll to top