Why MapR Is Right to Give Back to Apache Hadoop

Big data startup MapR is now an official corporate contributor to the Apache Hadoop project, a somewhat interesting turn of affairs given its corporate mission to lure users away from Apache’s Hadoop Distributed File System. Although this might seem like an odd partnership — even more so now after EMC announced MapR as the storage foundation for its Apache Hadoop alternative — it demonstrates the type of co ...

BIG DATA CLOUD RECOMMENDS – The Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics

A white paper by Dr. Ralph Kimball. The enterprise data warehouse (EDW) community has entered a new realm of meeting new and growing business requirements in the era of “Big Data.” A few of the common challenges include: extreme integration, semi- and un-structured data sources, petabytes of behavioral and image data accessed ...

Map Reduce Secondary Sort Does It All

I came across a question on Stack Overflow recently about calculating web chat room statistics using Hadoop Map Reduce. The answer to the question was begging for a solution based on map reduce secondary sort. I will provide details, along with a code snippet, to complement my answer to the question. The Problem: The data consists of a time stamp, chat room zone and number of users. The data is logged once ...

Presence Data Analytic using MongoDb and Map Reduce

My last post was on location data query and indexing using MongoDB. Location data query and index support is a unique and powerful feature of MongoDB. Continuing along the same thread, I will dig into the Map Reduce framework built right into MongoDB. Some NoSQL database systems provide a built-in map reduce framework. When the query engine is not enough for complex aggregate queries or other complex computation ...

The cloud will finally solve the ‘big data’ problem

Innovation around the management of large data sets is coming from the cloud, such as through MapReduce and Hadoop. InfoWorld's own Pete Babb provided some good coverage around the "analytics cloud" recently debuted by IBM, called Blue Insight. You can think of Blue Insight as a system that gathers data from those who use it and externalizes the data to those who need it, doing so on a cloud -- a private clo ...

Solve cloud-related Big Data problems with MapReduce

Discover how MapReduce and cloud computing are ideal for dealing with lots of data. At times, you need to be able to access more physical and virtual resources to achieve complex compute-intensive results, but setting up a grid system within an organization can face resource, logistical, and technical hurdles; even some political ones. Cloud computing comes to the rescue in this case. It also combines perfec ...

Recommendation Engine Powered by Hadoop (Part 2)

In Part 1 of this post the focus was on finding the correlation between items, based on rating data available for individual items. The MR job output was the correlation coefficient matrix, with correlation coefficient values between 0 and 1 for any item pair. Next step: Armed with the item correlation data and item rating data for any visitor, we will find the new items correlated with the current items of ...

Recommendation Engine Powered by Hadoop (Part 1)

Personalized recommendations are ubiquitous on social networking and shopping sites these days. How do they do it? As long as enough user interaction data is available for items (e.g., products on shopping sites), a recommendation engine based on what’s known as Collaborative Filtering is not that difficult to build. My approach: I will follow a technique called Item Based Collaborative Filtering. The basi ...

The SMAQ stack for big data

Storage, MapReduce and Query are ushering in data-driven products and services. "Big data" is data that becomes large enough that it cannot be processed using conventional methods. Creators of web search engines were among the first to confront this problem. Today, social networks, mobile phones, sensors and science contribute to petabytes of data created daily. To meet the challenge of processing such large ...

© 2011 Third Eye Consulting Services & Solutions LLC.
