When every drop counts: Schneider Electric transforms agriculture with the Internet of Things for sustainable farming – Transform

In the grassy Canterbury Plains of New Zealand, Craig Blackburn raises cattle and sheep in a line of work with a long tradition, in which he keeps a close eye on crops, land, weather and water. But Blackburn blends modern technology with his agricultural roots to manage the 990-acre Blackhills farm, a complex, bustling operation with 2,100 cattle and 800 sheep. The farm runs on irrigated water from the scen ...

Exactly-once Semantics is Possible: Here’s How Apache Kafka Does it

I’m thrilled that we have hit an exciting milestone the Kafka community has long been waiting for: we have introduced exactly-once semantics in Apache Kafka in the 0.11 release. In this post, I’d like to tell you what exactly-once semantics mean in Apache Kafka, why it is a hard problem, and how the new idempotence and transactions features in Kafka enable correct exactly-once stream processing using Kafka’ ...

Baidu employs the PaddlePaddle framework internally for prediction systems, along with Python to make training models and deriving predictions a snap. Many of the latest machine learning and data science tools purport to be easy to work with compared to previous generations of such frameworks and libraries. Chinese search engine giant Baidu now has an open source project in the same vein: a machine learning ...

Azure Data Lake Store: a hyperscale distributed file service for big data analytics | the morning paper

Douceur et al., SIGMOD’17. Today’s paper takes us inside Microsoft Azure’s distributed file service called the Azure Data Lake Store (ADLS). ADLS is the successor to an internal file system called Cosmos, and marries Cosmos semantics with HDFS, supporting both Cosmos and Hadoop workloads. Microsoft are in the process of migra ...

Serverless Scaling for Ingesting, Aggregating, and Visualizing Apache Logs with Amazon Kinesis Firehose, AWS Lambda, and Amazon Elasticsearch Service | AWS Database Blog

In 2016, AWS introduced the EKK stack (Amazon Elasticsearch Service, Amazon Kinesis, and Kibana, an open source plugin from Elastic) as an alternative to ELK (Amazon Elasticsearch Service, the open source tool Logstash, and Kibana) for ingesting and visualizing Apache logs. One of the main features of the EKK stack is that the data transformation is handled via the Amazon Kinesis Firehose agent. In this pos ...

Running Streaming Jobs Once a Day For 10x Cost Savings – The Databricks Blog

This is the sixth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Traditionally, when people think about streaming, terms such as “real-time,” “24/7,” or “always on” come to mind. You may have cases where data only arrives at fixed intervals. That is, data appears every hour or once a day. For these use cases, it is still beneficial to perform incrementa ...

Manage Query Workloads with Query Monitoring Rules in Amazon Redshift | AWS Big Data Blog

Data warehousing workloads are known for high variability due to seasonality, potentially expensive exploratory queries, and the varying skill levels of SQL developers. To obtain high performance in the face of highly variable workloads, Amazon Redshift workload management (WLM) enables you to flexibly manage priorities and resource usage. With WLM, short, fast-running queries don’t get stuck in queues behi ...

Google Spanner: Beginning of the End of the NoSQL World? – ACM SIGMOD Blog

Google has recently announced that its flagship wide-area database, Spanner, has been made available on the Google Cloud. Google Spanner is the next-generation globally distributed database built inside Google and announced to the world through the paper published in OSDI 2012 [1]. This article explores the implications of Google Spanner, in particular for the NoSQL world. CAP Theorem: A Quick Recap The t ...

Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard – Cloudera Engineering Blog

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de-facto standard for columnar in-memory processing and interchange. Here’s how it works. Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits: A columnar memory-layout permitting O(1) random access. The layout is highly cache-efficient in a ...

Microsoft Updates its Deep Learning Toolkit | Cortana Intelligence and Machine Learning Blog

We are delighted to announce that Microsoft has brought Microsoft Cognitive Toolkit version 2.0 out of beta and is making the first release candidate available today. The toolkit, previously known as CNTK, is a system for deep learning used to speed advances in areas such as speech and image recognition and search relevance on CPUs and NVIDIA® GPUs. Cognitive Toolkit can be used on-premises or in the cloud ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
