Subject: Big Data Analytics
3.2 NoSQL data architecture patterns
I. NOSQL Patterns
1. These storage solutions differ significantly from the RDBMS model
and are collectively known as NoSQL. Some of the key players include ...
• Google BigTable, HBase, Hypertable
• Amazon Dynamo, Voldemort, Cassandra, Riak
• Redis
• CouchDB, MongoDB
2. These solutions have a number of characteristics in common:
i. Key-value store
ii. Run on a large number of commodity machines
iii. Data is partitioned and replicated among these machines
iv. Relaxed data consistency requirements (because the CAP theorem
shows that a distributed system cannot guarantee Consistency,
Availability and Partition tolerance all at the same time)
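
Partitioning and replication (point iii above) can be illustrated with a minimal sketch. It assumes a fixed list of nodes and a simple "hash the key, then take the next nodes in order" placement rule; the node names, the `owners` function and the replica count are hypothetical and do not describe how any particular product places data.

```python
import hashlib

def owners(key, nodes, replicas=2):
    """Return the nodes assumed to hold `key`: one primary plus its successors."""
    # Use a stable hash so that every machine computes the same placement.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Replicas go to the following nodes, wrapping around the list.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

# Hypothetical cluster of four commodity machines.
nodes = ["node-a", "node-b", "node-c", "node-d"]

# Each key is stored on two different machines.
placement = {key: owners(key, nodes) for key in ["user:1", "user:2", "cart:42"]}
for key, holders in placement.items():
    print(key, "->", holders)
```

Because every key lives on more than one machine, a read can still be served when one node is unreachable, at the price of the copies possibly being briefly out of date.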
Key-value stores
1. Key-value stores are the most basic type of NoSQL database.
2. Designed to handle huge amounts of data.
3. Many are based on the design described in Amazon’s Dynamo paper.
4. Key-value stores allow developers to store schema-less data.
5. In key-value storage, the database stores data as a hash table where each key is
unique and the value can be a string, JSON, BLOB (binary large object), etc.
6. In some stores (Redis, for example) the value may also be a structured type such as
a hash, list, set or sorted set, stored against its key.
7. For example, a key-value pair might consist of a key like "Name" that is associated with a
value like "Robin".
8. Key-value stores can be used as collections, dictionaries, associative arrays, etc.
9. Key-value stores typically favour the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
10. Key-value stores work well for shopping cart contents, or individual
values like color schemes, a landing page URI, or a default account number.
11. Examples of key-value store databases: Redis, Dynamo, Riak, etc.
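
The points above can be sketched with an in-memory hash table. This is only an illustration that assumes a single machine and a Python dict as the storage layer; the `KeyValueStore` class and the key names are hypothetical, not the API of Redis, Dynamo or Riak.

```python
import json

class KeyValueStore:
    """Toy key-value store: unique keys, opaque schema-less values."""

    def __init__(self):
        self._data = {}  # the hash table

    def put(self, key, value):
        self._data[key] = value  # overwrites any earlier value for the key

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("Name", "Robin")  # simple string value
# A shopping cart stored as a JSON string; the store never inspects it.
store.put("cart:42", json.dumps({"items": ["book", "pen"], "total": 12.5}))

cart = json.loads(store.get("cart:42"))
print(store.get("Name"), cart["items"])
```

The store only ever looks up by key; interpreting the value (here, decoding the JSON) is left to the application.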
Figure 3.4: Pictorial Representation
Figure 3.5: Key value store
Column-oriented databases
1. Column-oriented databases primarily work on columns and every column is
treated individually.
2. Values of a single column are stored contiguously.
3. A column store keeps the data of each column in column-specific files.
4. In column stores, query processors work on columns too.
5. All data within each column data file has the same type, which makes it
well suited to compression.
6. Column stores can improve query performance because a query reads only the
specific columns it needs.
7. High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN,
MAX).
8. Well suited to data warehouses, business intelligence, customer relationship
management (CRM), library card catalogs, etc.
9. Examples of column-oriented databases: BigTable, Cassandra, SimpleDB,
etc.
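
The difference between row and column layout can be sketched as follows. The sample table, the column names and the run-length encoding step are assumptions chosen for illustration; real column stores use their own on-disk formats and compression schemes.

```python
from itertools import groupby

# Row layout: each record is stored together (hypothetical sales table).
rows = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "north", "amount": 200},
    {"id": 3, "region": "south", "amount": 80},
    {"id": 4, "region": "south", "amount": 50},
    {"id": 5, "region": "south", "amount": 50},
]

# Column layout: the values of each column are stored contiguously.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate touches only the one column it needs.
amounts = columns["amount"]
total = sum(amounts)
average = total / len(amounts)
print(total, average, min(amounts), max(amounts))

# Same-typed, repetitive values compress well, e.g. with run-length encoding.
region_runs = [(value, len(list(run))) for value, run in groupby(columns["region"])]
print(region_runs)
```

In the row layout, computing the same SUM would require reading every field of every record, even though only `amount` is used.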
Figure 3.6: Column store
Graph databases
1. A graph database stores data in a graph.
2. It represents highly connected data naturally, because relationships are stored directly with the data.
3. A graph database is a collection of nodes and edges.
4. Each node represents an entity (such as a student or business) and each edge
represents a connection or relationship between two nodes.
5. Every node and edge is defined by a unique identifier.
6. Each node knows its adjacent nodes.
7. As the number of nodes increases, the cost of a local step (or hop) remains the same.
8. Indexes are used for lookups, for example to find the node where a traversal starts.
9. Examples of graph databases: OrientDB, Neo4J, Titan, etc.
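
The node, edge and adjacency ideas above can be sketched with plain dictionaries. The identifiers, labels and the `reachable` helper are hypothetical; this is not the data model or query language of OrientDB, Neo4J or Titan.

```python
from collections import deque

# Nodes: unique identifier -> properties of the entity.
nodes = {
    "s1": {"type": "student", "name": "Asha"},
    "s2": {"type": "student", "name": "Robin"},
    "c1": {"type": "course", "name": "Big Data Analytics"},
    "p1": {"type": "professor", "name": "Dr. Rao"},
}

# Edges: (edge id, source node, relationship label, target node).
edges = [
    ("e1", "s1", "ENROLLED_IN", "c1"),
    ("e2", "s2", "ENROLLED_IN", "c1"),
    ("e3", "c1", "TAUGHT_BY", "p1"),
]

# Each node keeps a list of its adjacent nodes, so one hop is a direct lookup
# whose cost does not depend on the total number of nodes.
adjacency = {node_id: [] for node_id in nodes}
for _edge_id, src, label, dst in edges:
    adjacency[src].append((label, dst))

def reachable(start):
    """Breadth-first traversal that follows edges outward from `start`."""
    order, seen, queue = [start], {start}, deque([start])
    while queue:
        current = queue.popleft()
        for _label, neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order

print(reachable("s1"))
```

The `nodes` dictionary plays the role of the index from point 8: it finds the starting node by identifier, after which the traversal only follows adjacency lists.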
Figure 3.7: Graph Database
Document oriented databases
1. Document-oriented databases store data as documents.
2. A database is a collection of documents.
3. Data in this model is stored inside documents.
4. A document is a key-value collection where the key allows access to its value.
5. Documents are not typically forced to have a schema and are therefore
flexible and easy to change.
6. Documents are grouped into collections in order to organize different kinds of data.
7. Documents can contain many different key-value pairs, or key-array pairs, or
even nested documents.
8. Examples of document-oriented databases: MongoDB, CouchDB, etc.
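
A minimal sketch of the document model, assuming documents are Python dicts held in named in-memory collections. The `insert` and `find` helpers, the `_id` field and the sample data are hypothetical and are not the MongoDB or CouchDB API.

```python
# Collection name -> {document id -> document}.
collections = {}

def insert(collection, document):
    """Store a document in a collection under its '_id' (assumed to be present)."""
    collections.setdefault(collection, {})[document["_id"]] = document

def find(collection, field, value):
    """Return every document in the collection whose top-level field equals value."""
    return [doc for doc in collections.get(collection, {}).values()
            if doc.get(field) == value]

# Two documents in the same collection with different shapes (no fixed schema).
insert("students", {
    "_id": 1,
    "name": "Robin",
    "courses": ["BDA", "ML"],        # key-array pair
    "address": {"city": "Mumbai"},   # nested document
})
insert("students", {"_id": 2, "name": "Asha", "year": 3})

match = find("students", "name", "Robin")[0]
print(match["address"]["city"], match["courses"])
```

Unlike a plain key-value store, the database can look inside the document, which is what makes a query on a field such as `name` possible.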
Figure 3.8: Document Store