Introducing KSQL: Open Source Streaming SQL for Apache Kafka

What does it even mean to query streaming data, and how does this compare to a SQL database? Well, it’s actually quite different from a SQL database. Most databases are used for doing on-demand lookups and modifications to stored data. KSQL doesn’t do lookups (yet); what it does do is continuous transformations, that is, stream processing. For example, imagine that I have a stream of clicks from users and a t ...

When every drop counts: Schneider Electric transforms agriculture with the Internet of Things for sustainable farming – Transform

In the grassy Canterbury Plains of New Zealand, Craig Blackburn raises cattle and sheep in a line of work with a long tradition, in which he keeps a close eye on crops, land, weather and water. But Blackburn blends modern technology with his agricultural roots to manage the 990-acre Blackhills farm, a complex, bustling operation with 2,100 cattle and 800 sheep. The farm runs on irrigated water from the scen ...

Exactly-once Semantics is Possible: Here’s How Apache Kafka Does it

I’m thrilled that we have hit an exciting milestone the Kafka community has long been waiting for: we have introduced exactly-once semantics in Apache Kafka in the 0.11 release. In this post, I’d like to tell you what exactly-once semantics mean in Apache Kafka, why it is a hard problem, and how the new idempotence and transactions features in Kafka enable correct exactly-once stream processing using Kafka’ ...

Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard – Cloudera Engineering Blog

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de-facto standard for columnar in-memory processing and interchange. Here’s how it works. Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits: A columnar memory-layout permitting O(1) random access. The layout is highly cache-efficient in a ...

Google’s new cloud service eases data preparation for machine learning | Computerworld

BigQuery gets a bunch of updates for big data, too. One of the challenges that data scientists face when running machine learning workloads is processing information before it’s ready for use. Google unveiled a new cloud service Thursday aimed at easing that pain. Google Cloud Dataprep will automatically detect data schemas, joins, and ano ...

The Seven Essentials of AI-Based Predictive Selling

Having a complete 360-degree view of each customer is imperative for predictive sales success. Where does the data for this comprehensive customer profile come from? And when should you start creating your profiles? Last week, we looked at what AI-based predictive selling, also known as predictive sales, is doing right now. It’s making sales teams more eff ...

10 Ways AI Chatbots Will Change Customer Service | The Huffington Post

10 ways AI chatbots will bring about change to today’s customer service: 1. Bots will free up time for humans to handle more complex situations. Since AI chatbots will have the ability to assist with rather simple quick response needs of the customer, it will give customer service representatives a chance to handle the even more pressing problems for its clients. If more high-touch interaction is requi ...

Stop overdoing it when cleaning your big data – TechRepublic

Enough is enough: your big data might actually be getting too clean. Find out why it can be useful to keep bad, garbage data. When you got a job as a data scientist, I bet you didn't imagine you'd spend so much time cleaning up bad data. Don't feel badly; none of us did. When data science rolled on the scene, many of us who were already in the data warehousing and b ...

Apache Beam and Spark: New coopetition for squashing the Lambda Architecture? | ZDNet

The nice thing about open source projects and standards is that there are so many of them to choose from. And on January 10, the Apache community welcomed Beam as its latest "top level" project (getting top level means your project has made it to prime time in Apache). Google traditionally kept its technology to itself, typically publishing research papers that the open source community would then reinvent ...

Apache Kudu 1.0 is Released – Cloudera VISION

This week, the Apache Kudu team announced the release of Kudu 1.0. This release marks the one-year anniversary of Kudu’s public debut, and is the culmination of much hard work by a growing team of developers and community members. In this blog post, I’ll recap the original vision for Kudu, review our accomplishments over the last year, and share where I see the project going in the future. The Origins of Ku ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
