Best Practices For Managing Big Data

Big Data is the result of practically everything in the world being monitored and measured, creating data faster than the available technologies can store, process or manage it. Since it is a lot more intuitive to represent information as a “file” than as a relational object, there has been a surge of unstructured data, making up as much as 80% of the new data we must manage. Organizations are struggling to manage ...

Cassandra Range Query Made Simple

In Cassandra, rows are hash partitioned by default. If you want data sorted by some attribute, the column name sorting feature of Cassandra is usually exploited. If you look at the Cassandra slice range API, you will find that you can specify only the range start, the range end and an upper limit on the number of columns fetched. However, in many applications the need is to paginate through the data, i.e. each call ...
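A minimal sketch of one common way to paginate with only those three parameters: ask for one column more than the page size and use that extra column's name as the start of the next call. This is not necessarily the approach taken in the full post. `get_slice` here is a hypothetical in-memory stand-in for a slice range call, not a Cassandra client API, and the inclusive treatment of start and end is an assumption of the sketch.

```python
from bisect import bisect_left


def get_slice(columns, start, end, count):
    """Hypothetical stand-in for a slice range call (not a real client API).

    columns: list of (name, value) tuples sorted by name.
    start, end: inclusive column-name bounds; "" means unbounded.
    count: upper limit on the number of columns returned.
    """
    names = [name for name, _ in columns]
    lo = bisect_left(names, start) if start else 0
    result = []
    for name, value in columns[lo:]:
        if end and name > end:
            break
        result.append((name, value))
        if len(result) == count:
            break
    return result


def paginate(columns, page_size, start="", end=""):
    """Yield pages of at most page_size columns, in column-name order."""
    while True:
        # Fetch one extra column; its name becomes the next range start.
        batch = get_slice(columns, start, end, page_size + 1)
        if not batch:
            return
        yield batch[:page_size]
        if len(batch) <= page_size:
            return  # no extra column came back, so this was the last page
        start = batch[page_size][0]


# Usage: seven columns read three at a time.
row = [("c%02d" % i, i) for i in range(7)]
pages = list(paginate(row, 3))
```

Fetching the extra column avoids having to compute a "next name after the last one", which depends on the column comparator.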


Data Loader for NOSQL Databases

In one of my recent projects, I had to load product data from a CSV file into HBase and also index it for search purposes. I decided to separate out the loader part of the project as a standalone tool and make it available as open source. Currently, it’s hosted on GitHub. It supports HBase. I will be adding support for Cassandra soon. It supports Solr indexing right now. ...

OSCON Data | July 25-27, Portland, OR – BIGDATACLOUD MEMBERS SAVE 15%

New to the Open Source Conference this year is OSCON Data, for developers pioneering the evolving architectures and tools to manage data. See how Hadoop is used to optimize scalability and reliability at Yahoo. Find out how Facebook utilizes HBase to manage real-time messaging. Learn why Netflix moved from relational DBs to NoSQL cloud systems for personalized movie choosing. The in-depth sessions at OSCON Data, ...

Big Data: The Time is Now for Managing It and Leveraging the Advantages

Is the day of reckoning for big data upon us? To many observers, the growth in data is nothing short of incomprehensible. Data is streaming into, out of, and through enterprises from a dizzying array of sources: transactions, remote devices, partner sites, websites, and nonstop user-generated content. Not only are the data stores resulting from this information driving databases to scale into the terabyte an ...

Presence Data Analytic using MongoDb and Map Reduce

My last post was on location data query and indexing using MongoDB. Location data query and index support is a unique and powerful feature of MongoDB. Continuing along the same thread, I will dig into the Map Reduce framework built right into MongoDB. Some NoSQL database systems provide a built-in map reduce framework. When the query engine is not enough for complex aggregate queries or other complex computation ...

Geo Spatial Indexing with MongoDB

MongoDB is another NoSQL database that seems to be rising in popularity. Recently, I was evaluating NoSQL databases for a project. I was planning to use one for storing and managing vast amounts of Hadoop post-processed data for future queries and audit purposes. While I really liked the architecture of Cassandra, I was not happy with its limited querying and indexing capabilities. This is where MongoDB shines ...

NoSQL Is for the Birds

Scale breaks everything. Scale even breaks your assumptions about how best to store and query data. Scale does not care about your personal engineering preferences, or about SQL vs. NoSQL. The demands of rapid growth and ever-higher expectations for availability, performance, and cost efficiency force you to re-evaluate and re-imagine what you need, what is possible, and how to best achieve your business go ...


Reality Check: Very Large Data Sets – From Gigabytes to Petabytes

Today’s data volumes are growing incrementally. The amount of data a business collects has grown exponentially over the past decade and there is no end in sight. This amount of data, although more data is good, presents significant data management issues for DBAs and data analysts. Firstly, how does a business contend with daily VLD (Very Large Data set) volumes in a timely manner? In my particular case, we were tryin ...

2013 © Big Data Cloud Inc. All Rights Reserved.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
