
Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard – Cloudera Engineering Blog

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de facto standard for columnar in-memory processing and interchange. Here’s how it works. Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits: a columnar memory layout permitting O(1) random access. The layout is highly cache-efficient in a ...

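The Arrow excerpt above credits the columnar layout with O(1) random access and cache efficiency. As a rough illustration only, the sketch below mimics that idea with Python's standard `array` module: each column is one contiguous buffer of fixed-width values. This is not the Arrow format or the `pyarrow` API. Real Arrow arrays also carry validity bitmaps for nulls and offset buffers for variable-length types. The table and the helper names `value_at` and `column_sum` are hypothetical.

```python
from array import array

# Hypothetical three-row table, used only for illustration.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 3.25},
    {"id": 3, "price": 7.0},
]

# Columnar layout (a simplified stand-in, NOT the Arrow format):
# each column lives in its own contiguous buffer of fixed-width values.
columns = {
    "id": array("q", (r["id"] for r in rows)),        # signed 64-bit ints
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
}

def value_at(column, row_index):
    # Fixed-width values: element i sits at byte offset i * itemsize,
    # so a lookup is a single index into the buffer, i.e. O(1).
    return column[row_index]

def column_sum(column):
    # A scan reads only this column's contiguous memory, which is why
    # columnar layouts tend to be cache-friendly for analytic queries.
    return sum(column)

print(value_at(columns["price"], 2))
print(column_sum(columns["price"]))
```

Fetching row 2 of `price` never touches the `id` buffer, and the aggregate walks one dense buffer instead of hopping between per-row records.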

Google’s new cloud service eases data preparation for machine learning | Computerworld

BigQuery gets a bunch of updates for big data, too. One of the challenges that data scientists face when running machine learning workloads is processing information before it’s ready for use. Google unveiled a new cloud service Thursday aimed at easing that pain. Google Cloud Dataprep will automatically detect data schemas, joins, and ano ...


The Seven Essentials of AI-Based Predictive Selling

Having a complete 360-degree view of each customer is imperative for predictive sales success. Where does the data for this comprehensive customer profile come from? And when should you start creating your profiles? Last week, we looked at what AI-based predictive selling, also known as predictive sales, is doing right now. It’s making sales teams more eff ...


10 Ways AI Chatbots Will Change Customer Service | The Huffington Post

1. Bots will free up time for humans to handle more complex situations. Because AI chatbots will be able to take care of simple, quick-response customer needs, customer service representatives will have a chance to handle the more pressing problems their clients face. If more high-touch interaction is requi ...


Stop overdoing it when cleaning your big data – TechRepublic

Enough is enough: your big data might actually be getting too clean. Find out why it can be useful to keep bad, garbage data. When you got a job as a data scientist, I bet you didn't imagine you'd spend so much time cleaning up bad data. Don't feel bad; none of us did. When data science rolled onto the scene, many of us who were already in the data warehousing and b ...


Apache Beam and Spark: New coopetition for squashing the Lambda Architecture? | ZDNet

The nice thing about open-source projects and standards is that there are so many of them to choose from. And on January 10, the Apache community welcomed Beam as its latest "top-level" project (reaching top level means your project has made it to prime time in Apache). Google has traditionally kept its technology to itself, typically publishing research papers that the open-source community would then reinvent ...


Apache Kudu 1.0 is Released – Cloudera VISION

This week, the Apache Kudu team announced the release of Kudu 1.0. This release marks the one-year anniversary of Kudu’s public debut, and is the culmination of much hard work by a growing team of developers and community members. In this blog post, I’ll recap the original vision for Kudu, review our accomplishments over the last year, and share where I see the project going in the future. The Origins of Ku ...


Data Outliers: 10 Ways To Prevent Big Data Damage – InformationWeek

Most business decision-makers aren't trained to understand data outliers, but they can learn the basics. Executives, managers, and employees without math degrees can ask smarter questions about the analyses they base crucial judgments on. Here are some things to know. There's a data quality problem: people and machines may be responsible for poor-quality data that makes its way into an analysis. Someone may ...


Dataflow/Beam & Spark: A Programming Model Comparison – Cloud Dataflow — Google Cloud Platform

With the programming model/SDK portion of Google Cloud Dataflow moving into an Apache Software Foundation incubator project, Apache Beam, we thought now was a good time to discuss the unique features and capabilities that distinguish Dataflow from Apache Spark, from a strictly programming-model perspective. Dataflow is unique among data-parallel systems in that it is built upon a comprehensive model for out-o ...


2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
