Cassandra Range Query Made Simple

In Cassandra, rows are hash partitioned by default. If you want data sorted by some attribute, the column name sorting feature of Cassandra is usually exploited. If you look at the Cassandra slice range API, you will find that you can specify only the range start, the range end, and an upper limit on the number of columns fetched. However, in many applications the need is to paginate through the data, i.e., each call ...


Data Loader for NOSQL Databases

In one of my recent projects, I had to load product data from a CSV file into HBase and also index it for search purposes. I decided to separate out the loader part of the project as a standalone tool and make it available as open source. Currently, it’s hosted on GitHub. It supports HBase. I will be adding support for Cassandra soon. I am working on Solr indexing right now. Introduction: The tool is very g ...


The Cassandra SF 2011 summit – BigDataCloud members get 20% off!

The Cassandra SF 2011 summit on July 11th, 2011 promises to be a ‘must-go’ event for anyone working with “Big Data”. The Cassandra community has grown so much in the last year that the organizers now have a larger venue, with two rooms set aside for presentations and multiple rooms set aside for Birds of a Feather talks, committer meetups, and other small discussions. The agenda includes: Use Cases: Everyone ...


Cassandra Secondary Index Patterns

We all know that any real application needs to query on attributes other than the primary key, or the row key in the case of Cassandra. Cassandra version 0.7 onwards provides native secondary index support, but there are several limitations. Follow this posting on Pranab Ghosh's blog... Find other postings from Pranab Ghosh... ...


Geo Spatial Indexing with MongoDB

MongoDB is another NoSQL database that seems to be rising in popularity. Recently, I was evaluating NoSQL databases for a project. I was planning to use one for storing and managing vast amounts of Hadoop post-processed data for future queries and audit purposes. While I really liked the architecture of Cassandra, I was not happy with its limited querying and indexing capabilities. This is where MongoDB shines ...


NoSQL Is for the Birds

Scale breaks everything. Scale even breaks your assumptions about how best to store and query data. Scale does not care about your personal engineering preferences, or about SQL vs. NoSQL. The demands of rapid growth and ever-higher expectations for availability, performance, and cost efficiency force you to re-evaluate and re-imagine what you need, what is possible, and how to best achieve your business go ...


Recommendation Engine Powered by Hadoop (Part 2)

In Part 1 of this post, the focus was on finding the correlation between items, based on the rating data available for individual items. The MR job output was the correlation coefficient matrix, with correlation coefficient values between 0 and 1 for any item pair. Next step: Armed with the item correlation data and item rating data for any visitor, we will find the new items correlated with the current items of ...


Recommendation Engine Powered by Hadoop (Part 1)

Personalized recommendations are ubiquitous on social networking and shopping sites these days. How do they do it? As long as enough user interaction data is available for items, e.g., products on shopping sites, a kind of recommendation engine based on what’s known as Collaborative Filtering is not that difficult to build. My approach: I will follow a technique called Item Based Collaborative Filtering. The basi ...


2013 © Big Data Cloud Inc. All Rights Reserved.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
