Tuesday, December 2, 2014

Columnar Database

A columnar database is a database management system (DBMS) that stores data in columns instead of rows. The goal of a columnar database is to write and read data to and from hard disk storage efficiently, in order to reduce the time it takes to return a query result.

Columnar database is a concept rather than a particular architecture or implementation. In other words, there is no single description of how these databases work; indeed, several are built upon a traditional, row-oriented DBMS, simply storing the data in tables with one (or, quite often, two) columns and adding the layer needed to access the columnar data easily.

They generally differ from traditional (row-oriented) databases with regard to:
• performance
• storage requirements
• ease of schema modification
In particular, they offer advantages in these areas when the typical workload computes aggregate values over a limited number of columns, as opposed to retrieving all or most columns for a given entity.

In a columnar database, all the column 1 values are physically together, followed by all the column 2 values, etc. The data is stored in record order, so the 100th entry for column 1 and the 100th entry for column 2 belong to the same input record. This allows individual data elements, such as customer name, to be accessed column by column as a group, rather than individually row by row.
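The record-order rule can be illustrated with a minimal Python sketch. It is only an in-memory model, not how any particular DBMS lays out data on disk; the column names, sample values, and the helpers `get_column` and `get_record` are hypothetical and chosen for illustration.

```python
# Hypothetical column store: one list per column, every list in record order.
columns = {
    "customer_name": ["Ann", "Bob", "Cara"],
    "city": ["Oslo", "Lima", "Pune"],
}

def get_column(columns, name):
    """Return one column as a group, without touching the other columns."""
    return columns[name]

def get_record(columns, i):
    """Rebuild record i by taking the i-th entry of every column."""
    return {name: values[i] for name, values in columns.items()}

names = get_column(columns, "customer_name")
third_record = get_record(columns, 2)
print(names)
print(third_record)
```

Reading a single column touches one list, while rebuilding a full record has to visit every column at the same position. That is the trade-off the description above implies.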

Here is an example of a simple database table with 4 columns and 3 rows:

ID    Last     First    Bonus
1     Doe      John     8000
2     Smith    Jane     4000
3     Beck     Sam      1000

In a row-oriented database management system, the data would be stored like this:
 
    1,Doe,John,8000; 2,Smith,Jane,4000; 3,Beck,Sam,1000;
  
In a column-oriented database management system, the data would be stored like this:

    1,2,3; Doe,Smith,Beck; John,Jane,Sam; 8000,4000,1000;
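The two layouts can be reproduced with a short Python sketch. The function names `serialize_row_oriented` and `serialize_column_oriented` are made up for this example, and the text format simply mirrors the notation used in this post rather than any real on-disk format.

```python
# The example table, one tuple per record: (ID, Last, First, Bonus).
rows = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]

def serialize_row_oriented(rows):
    """Write each record out in full before starting the next one."""
    return " ".join(",".join(str(v) for v in row) + ";" for row in rows)

def serialize_column_oriented(rows):
    """Write all values of column 1, then all values of column 2, and so on."""
    # zip(*rows) transposes the table: one tuple per column, in record order.
    return " ".join(",".join(str(v) for v in col) + ";" for col in zip(*rows))

row_layout = serialize_row_oriented(rows)
column_layout = serialize_column_oriented(rows)
print(row_layout)
print(column_layout)
```

The only difference between the two functions is the transpose step. The data is the same and only the order in which it is written changes.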

One of the main benefits of a columnar database is that data can be highly compressed, because the values within a single column share a type and often repeat. Compression reduces the amount of data that has to be read, so columnar operations such as MIN, MAX, SUM, COUNT and AVG can be performed very rapidly. Another benefit is that, because a column-based DBMS is self-indexing, it typically uses less disk space than a row-oriented database management system containing the same data.

Read more at http://www.columnardatabase.org.
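As an illustration of the compression point, here is a minimal Python sketch that uses run-length encoding. That scheme is an assumption made for this example, since the post does not name a specific compression method. The `rle_*` helper names and the sample bonus values are hypothetical.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

# Aggregates computed directly on the compressed runs, without decompressing.
def rle_sum(runs):
    return sum(value * count for value, count in runs)

def rle_count(runs):
    return sum(count for _, count in runs)

def rle_min(runs):
    return min(value for value, _ in runs)

def rle_max(runs):
    return max(value for value, _ in runs)

def rle_avg(runs):
    return rle_sum(runs) / rle_count(runs)

# Hypothetical bonus column, sorted so that equal values sit next to each other.
bonus = [1000, 1000, 1000, 4000, 4000, 8000]
runs = rle_encode(bonus)
print(runs)
print(rle_sum(runs), rle_count(runs), rle_min(runs), rle_max(runs))
```

Each aggregate visits one entry per run instead of one entry per row, so the more a column repeats, the less work the aggregate needs.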
