
The Real World of Columnar Databases

In recent times a number of vendors have announced so-called ‘hybrid storage models’ for analytic databases (e.g. Aster Data). A ‘hybrid storage model’ is one where an entire table, or partitions within it, can be stored either by row or by column. An oft-mentioned motivation is that different kinds of analytics require different storage models, so the user should be able to choose as needed.

Below we discuss how a true end-to-end columnar architecture differs from columnar storage, and why an end-to-end columnar architecture, as opposed to columnar storage alone, is best suited for analytic workloads.

WHAT AND WHO IS COLUMNAR?

It is important to note that whether the underlying database is columnar or row-wise, the user’s SQL is always “relational” and independent of the underlying system.

The columnar model dates back to the late 1960s and has now been fully embraced by Sybase IQ, Sand, ParAccel and Vertica as core to the design of their database engines. Additionally, a columnar option has been adopted by Greenplum and Aster as an extension to their row-wise Postgres code bases, and by Infobright as an extension to MySQL.

Of these vendors, only ParAccel, Vertica, Greenplum and Aster are MPP. ParAccel and Vertica began their columnar development in 2005*, Greenplum and Aster much more recently.

Columnar is, primarily, a storage model designed to perform far less I/O for analytic queries. This is important because disks are really, really slow mechanical devices whose speed has only improved about 2X in the past 10 years versus more than 10X for CPUs. Think “Ferrari on bicycle tires”.
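To make the I/O argument concrete, here is a minimal back-of-the-envelope sketch in Python. The table layout, column widths and row count are hypothetical, and the model ignores compression, indexes and block sizes; it only counts the bytes each storage model would have to scan for a query that references two of six columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    width_bytes: int  # hypothetical fixed width per value


def bytes_scanned_row_store(columns, num_rows):
    """A row store reads whole rows, so every column's bytes are scanned."""
    return num_rows * sum(c.width_bytes for c in columns)


def bytes_scanned_column_store(columns, num_rows, referenced):
    """A column store reads only the columns the query references."""
    return num_rows * sum(c.width_bytes for c in columns if c.name in referenced)


# Hypothetical six-column fact table (names and widths are illustrative only).
SALES = [
    Column("order_id", 8),
    Column("customer_id", 8),
    Column("store_id", 4),
    Column("sale_date", 4),
    Column("amount", 8),
    Column("comment", 96),
]
NUM_ROWS = 1_000_000

# Columns touched by: SELECT store_id, SUM(amount) FROM sales GROUP BY store_id
referenced = {"store_id", "amount"}

row_bytes = bytes_scanned_row_store(SALES, NUM_ROWS)
col_bytes = bytes_scanned_column_store(SALES, NUM_ROWS, referenced)

print(f"row store scans    {row_bytes:,} bytes")
print(f"column store scans {col_bytes:,} bytes")
print(f"reduction          {row_bytes / col_bytes:.1f}x")
```

Under these assumptions the column store scans 12 of the 128 bytes per row. The wider the table and the fewer columns a query touches, the larger the gap becomes.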

A well-designed columnar architecture has significant implications all the way up the execution stack within the database. One example of a columnar architecture versus merely columnar storage is “late row assembly”, where a fully columnar architecture can avoid scanning and transferring data from columns until the set of candidate rows has been pruned down later in the query. There’s been speculation in the industry about whether the hybrid columnar players do this or not. I’ll not add to the speculation (okay, just did).
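The idea behind late row assembly can be illustrated with a toy sketch in Python. This is not how any particular vendor implements it; the in-memory table, the function names and the "values read" counter are all assumptions made for illustration. The early variant stitches rows together before filtering, while the late variant scans only the predicate column, keeps the matching positions, and fetches the remaining columns at those positions alone.

```python
def early_assembly(table, predicate_col, predicate, output_cols):
    """Assemble every row first, then filter: all needed columns are read for all rows."""
    needed = [predicate_col] + [c for c in output_cols if c != predicate_col]
    num_rows = len(table[predicate_col])
    values_read = 0
    rows = []
    for i in range(num_rows):
        rows.append({c: table[c][i] for c in needed})
        values_read += len(needed)
    result = [
        tuple(row[c] for c in output_cols)
        for row in rows
        if predicate(row[predicate_col])
    ]
    return result, values_read


def late_assembly(table, predicate_col, predicate, output_cols):
    """Filter on one column, then fetch other columns only at surviving positions."""
    pred_values = table[predicate_col]
    positions = [i for i, v in enumerate(pred_values) if predicate(v)]
    values_read = len(pred_values)  # the predicate column is scanned in full
    fetched = {}
    for c in output_cols:
        if c == predicate_col:
            # Already scanned above, so no additional reads are counted.
            fetched[c] = [pred_values[i] for i in positions]
        else:
            fetched[c] = [table[c][i] for i in positions]
            values_read += len(positions)
    # Rows are assembled only now, from the pruned column fragments.
    result = list(zip(*(fetched[c] for c in output_cols)))
    return result, values_read


# Hypothetical column-oriented table: one Python list per column.
table = {
    "region": ["east", "west", "east", "north", "west", "east"],
    "customer": ["a", "b", "c", "d", "e", "f"],
    "amount": [10, 20, 30, 40, 50, 60],
}

# Roughly: SELECT customer, amount FROM t WHERE region = 'west'
early_result, early_reads = early_assembly(
    table, "region", lambda v: v == "west", ["customer", "amount"]
)
late_result, late_reads = late_assembly(
    table, "region", lambda v: v == "west", ["customer", "amount"]
)

print(early_result, early_reads)
print(late_result, late_reads)
```

Both variants return the same rows, but in this toy example the late variant touches 10 values instead of 18. The saving grows with the selectivity of the predicate and the number of output columns.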



© 2011 Third Eye Consulting Services & Solutions LLC.
