Uber’s case for incremental processing on Hadoop – O’Reilly Media

Uber’s mission is to provide “transportation as reliable as running water, everywhere, for everyone.” To fulfill this promise, Uber relies on making data-driven decisions at every level, and most of these decisions can benefit from faster data processing. Examples include using data to understand areas for growth and giving the city operations team access to fresh data to debug each city. Needless to say, the cho ...

Predictive policing: The future of law enforcement

As Dj Das, founder and CEO of Third Eye Consulting Services, sums it up, “For fighting crime and keeping every citizen safe, Microsoft has the most sophisticated cloud-based big data technologies stack, which can help police departments not only understand why and how crime occurs but also predict when and where it can happen. Powerful data analytics tools like Azure Stream Analytics and Azure ML, coupled w ...

DistCp Performance Improvements in Apache Hadoop – Cloudera Engineering Blog

Recent improvements to Apache Hadoop’s native backup utility, which are now shipping in CDH, make that process much faster. DistCp is a popular tool in Apache Hadoop for periodically backing up data across and within clusters. (Each run of DistCp in the backup process is referred to as a backup cycle.) Its popularity has grown despite relatively slow performance. In this post, we’ll provide a ...

How Uber Uses Spark and Hadoop to Optimize Customer Experience

If you’ve ever used Uber, you’re aware of how ridiculously simple the process is. You press a button, a car shows up, you go for a ride, and you press another button to pay the driver. But there’s a lot more going on behind the scenes, and much of that infrastructure increasingly runs on Hadoop and Spark, as the Uber data team recently shared. Uber has the enviable position of sitting at the junction of the di ...

Profiling Big Data | Mawazo

Data profiling is the process of examining data to learn about its important characteristics. It’s an important part of any ETL process, and it’s often necessary before embarking on any serious analytic work. I have implemented various open source Hadoop-based data profiling MapReduce jobs. Most of them are in the project chombo. Some are in other projects. I will provide an overvi ...

The Data Infrastructure Meta-Analysis: How Top Engineering Organizations Built Their Big Data Stacks – The Data Point

In the process of validating the market for our recent RJMetrics Pipeline launch, we kept running across data points that we had never anticipated getting. It turns out that the “How we built our data infrastructure at [company name]” is approaching meme-like status on engineering blogs across the internet. We’ve found many such blog posts in the past several months, and have enjoyed reading each of them: Z ...

Cutting: Spark an ‘All-Around Win’ for Hadoop

Hadoop co-creator Doug Cutting said today that Apache Spark is “very clever” and is “pretty much an all-around win” for Hadoop, adding that it will enable developers to build better and faster data-oriented applications than MapReduce ever could. Cutting talked at length about Spark during today’s Cloudera webinar, titled “Uniting Spark and Hadoop: The One Platform Initiative.” The Hadoop distributor, which ...

Press Release – Third Eye Consulting Builds Big Data Analytics Application on Google Cloud Platform for AbsolutData

Consulting Team Uses Google Cloud Platform to Mine and Analyze Big Data to Deliver a Comprehensive Conversion Optimization Solution. Santa Clara, California (PRWEB) June 04, 2015 Third Eye Consulting Services and Solutions, a leading consulting and analytics expert in Big Data & Cloud solutions, is making its mark helping technology companies build Big Data applications and solutions. Providing Big Data ...

IBM Wants to Push Spark, Real-Time Big Data Tool, Into Mainstream

International Business Machines has thrown its support behind one of the fastest-growing open source efforts ever, which addresses the limitations of the Hadoop data platform by making it easier to analyze information in real time. That capability opens all sorts of potential business applications, such as instantly targeting ads toward people who pass in front of a digital billboard. As the Internet of Thi ...

Comparative Analysis of Big Data Analytical Tools – Hive, Tez, Impala, SparkSQL, PrestoDB, Drill & BigQuery – on Google Cloud Platform

BACK BY POPULAR DEMAND. NOW WITH ADDED COMPARISONS TO APACHE DRILL! Big Data analytics is the answer for businesses to glean insights from data to take timely actions. Businesses now have a plethora of technologies at their disposal to perform such Big Data analytics. This webinar performs a comparative analysis of the various Big Data analytical tools available today. All the tools were installed and ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
