You Are Here: Home » Data Pipelines

Apache Kafka and the four challenges of production machine learning systems – O’Reilly Media

Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their product, processes or customer experience. The cartoon version of machine learning sounds quite easy: you feed in training data made up of examples of good and bad outcomes, and the computer automatically learns from these and spits out a model that can make simila ...

Read more

Getting Started Analyzing Twitter Data in Apache Kafka through KSQL

KSQL is the open source streaming SQL engine for Apache Kafka. It lets you do sophisticated stream processing on Kafka topics, easily, using a simple and interactive SQL interface. In this short article we’ll see how easy it is to get up and running with a sandbox for exploring it, using everyone’s favourite demo streaming data source: Twitter. We’ll go from ingesting the raw stream of tweets, through to fi ...

Read more

Go Python, Go! Stream Processing for Python · Wallaroo Labs

We’ve been hard at work for 18 months on a new processing engine called Wallaroo for deploying and operating big data, fast data, and machine learning applications. We designed Wallaroo to make the infrastructure virtually disappear, so you get rapid deployment and easy-to-operate applications. It provides a simple model for building fast applications that scale automatically across any number of workers. W ...

Read more

Open Sourcing Vespa, Yahoo’s Big Data Processing and Serving Engine

Ever since we open sourced Hadoop in 2006, Yahoo – and now, Oath – has been committed to opening up its big data infrastructure to the larger developer community. Today, we are taking another major step in this direction by making Vespa, Yahoo's big data processing and serving engine, available as open source on GitHub. Vespa architecture overview Building applications increasingly means dealing with huge a ...

Read more

Unite Real-Time and Batch Analytics Using the Big Data Lambda Architecture, Without Servers! | AWS Big Data Blog

The Big Data Lambda Architecture seeks to provide data engineers and architects with a scalable, fault-tolerant data processing architecture and framework using loosely coupled, distributed systems. At a high level, the Lambda Architecture is designed to handle both real-time and historically aggregated batched data in an integrated fashion. It separates the duties of real-time and batch processing so purpo ...

Read more

Introducing KSQL: Open Source Streaming SQL for Apache Kafka

What does it even mean to query streaming data, and how does this compare to a SQL database? Well, it’s actually quite different to a SQL database. Most databases are used for doing on-demand lookups and modifications to stored data. KSQL doesn’t do lookups (yet), what it does do is continuous transformations— that is, stream processing. For example, imagine that I have a stream of clicks from users and a t ...

Read more

When every drop counts: Schneider Electric transforms agriculture with the Internet of Things for sustainable farming – Transform

In the grassy Canterbury Plains of New Zealand, Craig Blackburn raises cattle and sheep in a line of work with a long tradition, in which he keeps a close eye on crops, land, weather and water. But Blackburn blends modern technology with his agricultural roots to manage the 990-acre Blackhills farm, a complex, bustling operation with 2,100 cattle and 800 sheep. The farm runs on irrigated water from the scen ...

Read more

Azure Data Lake Store: a hyperscale distributed file service for big data analytics | the morning paper

Azure data lake store: a hyperscale distributed file service for big data analytics Douceur et al., SIGMOD’17 Today’s paper takes us inside Microsoft Azure’s distributed file service called the Azure Data Lake Store (ADLS). ADLS is the successor to an internal file system called Cosmos, and marries Cosmos semantics with HDFS, supporting both Cosmos and Hadoop workloads. Microsoft are in the process of migra ...

Read more

Stop overdoing it when cleaning your big data – TechRepublic

Stop overdoing it when cleaning your big dataEnough is enough--your big data might actually be getting too clean. Find out why it can be useful to keep bad, garbage data.When you got a job as a data scientist, I bet you didn't imagine you'd spend so much time cleaning up bad data. Don't feel badly—none of us did.When data science rolled on the scene, many of us who were already in the data warehousing and b ...

Read more

How FICO scores big with cloud-based collaboration and data solutions – Microsoft Enterprise

Data rules everything around us. From traffic lights to medical records, those ones and zeros quickly dictate everything from the ads we see to the music we hear. As more and more organizations adopt big data strategies, we as consumers see exciting new innovations and solutions that leverage these possibilities. Smart phones, driver-less cars—this wave of data analytics brings the future to life in excitin ...

Read more

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop and the Hadoop elephant logo, Sprark are trademarks of the Apache Software Foundation.

Scroll to top