
Data Loader for NOSQL Databases

In one of my recent projects, I had to load product data from a CSV file into HBase and also index it for search. I decided to separate out the loader part of the project as a standalone tool and make it available as open source. It’s currently hosted on GitHub. It supports HBase today. I will be adding support for Cassandra soon, and I am working on Solr indexing right now.

Introduction

The tool is generic and configurable. It takes a CSV file as input and writes to HBase; Cassandra support is planned.

The CSV file could have been generated from queries on Oracle or MySQL, so the tool could be used to migrate data from an RDBMS to a NOSQL database.

It also takes a JSON file, which defines the mapping between the columns in the CSV file and the NOSQL column families and columns, along with other metadata.
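
To make the idea concrete, here is a minimal Python sketch of what such a mapping could look like and how it might be applied to one CSV row. The JSON layout, the field names (`csvColumn`, `family`, `qualifier`), and the `map_row` helper are all made up for illustration; they are not the tool's actual configuration format.

```python
import csv
import io
import json

# Hypothetical mapping; the tool's real JSON schema may look different.
MAPPING_JSON = """
{
  "columns": [
    {"csvColumn": 0, "family": "info",  "qualifier": "name"},
    {"csvColumn": 1, "family": "info",  "qualifier": "brand"},
    {"csvColumn": 2, "family": "price", "qualifier": "list"}
  ]
}
"""


def map_row(row, mapping):
    """Turn one CSV row into a list of (family, qualifier, value) cells."""
    return [
        (col["family"], col["qualifier"], row[col["csvColumn"]])
        for col in mapping["columns"]
    ]


mapping = json.loads(MAPPING_JSON)

# A one-line in-memory CSV stands in for the real input file.
reader = csv.reader(io.StringIO("Widget,Acme,9.99\n"))
for row in reader:
    print(map_row(row, mapping))
```

Each resulting cell would then be handed to whatever client writes to the target store; that part is left out here because it depends on the database.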

Here is a quick summary of the features. The terminology I am using is based on HBase.

  • Loads data from a CSV file.
  • Mapping between CSV columns and NOSQL column families and columns is provided in a JSON file.
  • There is a many-to-many association between CSV columns and NOSQL column families and columns.
  • The row key for NOSQL could be created by concatenating multiple CSV columns.
  • Solr indexing of data as it’s being loaded.

The indexing feature is not implemented yet; I will be working on it next. The many-to-many association works in both directions. A CSV column can be split into multiple parts and used to populate multiple NOSQL columns. On the flip side, multiple CSV columns can be consolidated to populate one NOSQL column.
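
The row key, split, and consolidate behaviors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the separators, and the sample row are hypothetical and are not taken from the tool itself.

```python
def make_row_key(row, key_columns, sep=":"):
    """Build a row key by concatenating the chosen CSV columns.

    The ':' separator is an assumption made for this sketch.
    """
    return sep.join(row[i] for i in key_columns)


def split_column(value, delimiter, targets):
    """Split one CSV value and assign the parts to several NOSQL columns."""
    parts = value.split(delimiter)
    return dict(zip(targets, parts))


def consolidate_columns(row, source_columns, sep=" "):
    """Join several CSV values into a single NOSQL column value."""
    return sep.join(row[i] for i in source_columns)


# Hypothetical product row: id, country, name, dimensions, color, finish
row = ["1001", "US", "Acme Widget", "10x20", "blue", "matte"]

row_key = make_row_key(row, [1, 0])                                # country + id
dims = split_column(row[3], "x", ["spec:width", "spec:height"])    # one to many
finish = consolidate_columns(row, [4, 5])                          # many to one
```

Here the `family:qualifier` strings follow the HBase naming used in this post; for another store the target names would simply be different.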


About The Author

Big Data Consultant

Software professional with many years of experience in multiple business domains, using a myriad of technologies and platforms. A skilled architect and developer with strong problem-solving and analytical capabilities, who creates the technical vision and actively engages in understanding customer requirements. Results-oriented and hands-on, balancing resource and time constraints while doing it right.


2015 © Big Data Cloud Inc. All Rights Reserved.
