Recommendation Engine Powered by Hadoop (Part 2)

In Part 1 of this post, the focus was on finding the correlation between items, based on the rating data available for individual items. The MR job output was the correlation coefficient matrix, with correlation coefficient values between 0 and 1 for any item pair. Next step: Armed with the item correlation data and the item rating data for any visitor, we will find the new items correlated with the current items of ...

Recommendation Engine Powered by Hadoop (Part 1)

Personalized recommendations are ubiquitous on social networking and shopping sites these days. How do they do it? As long as enough user interaction data is available for items (e.g., products on shopping sites), a kind of recommendation engine based on what’s known as Collaborative Filtering is not that difficult to build. My approach: I will follow a technique called Item Based Collaborative Filtering. The basi ...

Using Flume to Collect Apache 2 Web Server Logs

Flume is a flexible, scalable, and reliable system for collecting streaming data. The Flume User Guide describes how to configure Flume, and the new Flume Cookbook contains instructions (called recipes) for common Flume use cases. In this post, we present a recipe that describes the common use case of using a Flume node to collect Apache 2 web server logs and deliver them to HDFS. Follow this posting ...

The Seeds of Apple's Cloud

Apple has always sucked at the internet. With Ping and the new Apple TV, Apple sucks a little bit less at it. But Apple could be good at it. Apple's finally starting to reward people for buying into the Apple ecosystem, but everything they're doing is only a half-step toward what it could be, should be doing. It launched two social networks, and showed us how it's going to wirelessly connect iOS devices wit ...

10 Hadoop-able Problems

Anyway, a meeting was arranged for today where we could watch a presentation on Cloudera’s Hadoop (which you can see here at GoMeeting, although only on Windows and only after registering; great, more vendor lock-in!). It was called ‘10 Common Hadoopable Problems’, given by Jeff Hammerbacher (their Chief Scientist, no less!), and was basically about things that you can do with Hadoop (other than counting words…). I ...

VMware’s Cloud Strategy: Neither Deadwood Nor Sandalwood But More Than A Collection Of Driftwoods

Last week VMworld happened and some of the Clouderati were busy hanging out in the halls of Moscone Center in San Francisco. I was planning to attend the event but had to cancel due to personal reasons. But a steady stream of tweets and blog posts kept me updated about everything from the keynotes to the happenings at various booths on the Expo floor. Based on these third-party reports, my takeaway from the ...

Another blog? Why bother?

There are blogs and there are blogs! So, why add another one to the blogosphere? Well, we felt (and we don’t think we are alone in this feeling, but please feel free to tell us if you do think so!) that it’s time there was a blog wholly dedicated to all things around “Big Data” & “Cloud Computing”. Mankind has never seen so much data as it is seeing today; Cloud Computing technologies are ...

2013 © Big Data Cloud Inc. All Rights Reserved.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
