NoSQL Systems For Big Data Management

Venkat N. Gudivada
Weisburg Division of Computer Science
Marshall University
Huntington, WV, USA
gudivada@marshall.edu

Dhana Rao
Biological Sciences Department
Marshall University
Huntington, WV, USA
raod@marshall.edu

Vijay V. Raghavan
Center for Advanced Computer Studies
University of Louisiana at Lafayette
Lafayette, LA, USA
vijay@cacs.louisiana.edu
I. Introduction
Until recently, relational database management systems (RDBMS) were the mainstay for managing all types of data, irrespective of their natural fit to the relational data model. The emergence of Big Data and mobile computing necessitated new database functionality to support applications such as real-time log file analysis, cross-selling in eCommerce, location-based services, and micro-blogging. Many of these applications exhibit a preponderance of insert and retrieve operations on a very large scale. Relational databases were found to be inadequate in handling the scale and varied structure of such data.
The above requirements ushered in an array of choices for Big Data management under the umbrella term NoSQL [1], [2], [3]. NoSQL (meaning "not only SQL") has come to describe a large class of databases that do not have the properties of traditional relational databases and are generally not queried with SQL (Structured Query Language). NoSQL systems provide data partitioning and replication as built-in features. They typically run on cluster computers made from commodity hardware and provide horizontal scalability.
Developing applications using NoSQL systems is quite different from the process used with RDBMS [4].
C. Eventual Consistency
RDBMS data models are optimized for row-wise processing, based on the assumption that applications process most of an entity's attributes. In contrast, some NoSQL data models are designed to efficiently perform column-wise processing to support computing aggregates on one or two attributes across all entities. For example, the data models of column-oriented NoSQL databases resemble a data warehouse star schema, where the fact table at the center stores de-normalized row data and each dimension table stores all columns of a column family.
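To make the contrast concrete, here is a minimal Python sketch (with invented attribute names) of the same records in row-wise and column-wise layouts; the column-wise aggregate reads a single attribute list rather than every row.

```python
# Hypothetical data: the same three entities in row-wise and column-wise layouts.
rows = [
    {"id": 1, "name": "Alice", "age": 34, "city": "Huntington"},
    {"id": 2, "name": "Bob",   "age": 29, "city": "Lafayette"},
    {"id": 3, "name": "Carol", "age": 41, "city": "Prague"},
]

columns = {
    "id":   [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age":  [34, 29, 41],
    "city": ["Huntington", "Lafayette", "Prague"],
}

# Row-wise average age: every row (all attributes) is touched.
row_avg = sum(r["age"] for r in rows) / len(rows)

# Column-wise average age: only the 'age' column is read.
col_avg = sum(columns["age"]) / len(columns["age"])

assert row_avg == col_avg
```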
E. Application Synthesized Relationships
RDBMS data models are designed to capture one-to-one, one-to-many, and many-to-many relationships between entities. In contrast, many NoSQL data models do not explicitly and inherently model relationships (NoSQL systems based on the graph data model are an exception). Most NoSQL systems do not support the relational join operation. NoSQL applications are expected to synthesize relationships by processing the data, as the sketch below illustrates. If the data is inherently rich in relationships, many NoSQL data models are a poor fit.
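As an illustration of such application-synthesized relationships, the following Python sketch uses plain dictionaries (with hypothetical keys and fields) to stand in for two key-value collections and reconstructs a one-to-many relationship that an RDBMS would express as a join.

```python
# Two key-value "collections"; the relationship between them is not stored
# by the database but reconstructed by the application.
customers = {
    "c1": {"name": "Alice"},
    "c2": {"name": "Bob"},
}
orders = {
    "o1": {"customer_id": "c1", "total": 25.0},
    "o2": {"customer_id": "c1", "total": 40.0},
    "o3": {"customer_id": "c2", "total": 15.0},
}

def orders_for(customer_id):
    """Application-side 'join': scan the orders and keep those for one customer."""
    return [o for o in orders.values() if o["customer_id"] == customer_id]

print(customers["c1"]["name"], orders_for("c1"))  # Alice's two orders
```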
I. Horizontal Scalability
RDBMS for Big Data applications are forced to use distributed storage. Reasons for this include data that exceeds disk size limits, the need for partitioned table storage, recovery from hard disk failures through data redundancy, and data replication to enable high availability. A typical RDBMS query requires join operations across multiple tables. How fast a query is processed is limited by how quickly data can be moved from hard disks to primary storage, and by the amount of data moved. Database update operations are usually run as database transactions. If the table data is partitioned, update operations run slowly due to contention for exclusive locks.
Securing and releasing write locks, coordinating this across various disks, and complying with ACID (atomicity, consistency, isolation, and durability) properties slow down transaction throughput. Long-running transactions exacerbate the transaction throughput problem further. Vertical scaling, which involves adding more CPUs, more cores per CPU, and additional main memory, is often proposed as a solution to the transaction throughput problem.
However, vertical scaling quickly reaches its saturation point. Other techniques for improving transaction throughput include relaxing ACID compliance and using methods such as disabling database logging.
Horizontal scalability, in contrast, refers to scaling by adding new processors complete with their own disks (also known as nodes). Closely associated with horizontal scaling is the partitioning of data: each node contains only a subset of the data. The process of assigning data to nodes is referred to as sharding, which is done automatically in some NoSQL systems.
It is easier to achieve horizontal scalability with NoSQL databases. The complexity involved in enforcing ACID compliance simply does not exist for most NoSQL systems, as they are ACID non-compliant by design. Furthermore, insert and read operations dominate in NoSQL databases, while update and delete operations fade into insignificance in terms of volume. NoSQL systems also achieve horizontal scalability by delegating the two-phase commit required for transaction implementation to applications. For example, MongoDB does not guarantee ACID compliance for concurrent operations, which may infrequently result in undesirable outcomes such as phantom reads.
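The following is a minimal Python sketch of the sharding idea, assuming a fixed, hypothetical set of nodes and a stable hash of the record key; production systems layer replication and rebalancing on top of this basic placement rule.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical cluster

def shard_for(key: str) -> str:
    """Route a record to a node by hashing its key (simple modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(shard_for("user:42"))   # every client computes the same placement
print(shard_for("user:43"))
```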
A. Shared-Nothing Architecture
In this architecture, each node is self-sufficient and acts independently to remove any single point of resource contention. Nodes share neither memory nor disk storage. Database systems based on the shared-nothing architecture can scale almost linearly by adding new nodes assembled from inexpensive commodity hardware components. Data is distributed across the nodes in a non-overlapping manner, which is called sharding. Though this concept has long existed, with its roots in distributed databases, it gained prominence with the advent of NoSQL systems.
B. Hash Trees and Consistent Hashing
A hash tree (aka Merkle tree) is a tree in which every non-leaf node is labeled with the hash of the labels of its child nodes. Hash trees enable efficient and secure verification of data transmitted between computers, for example when detecting inconsistencies between replicas.
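As a minimal illustration, the Python sketch below builds a hash tree over a list of data blocks; two replicas can compare just their root hashes to detect that some block differs. The block contents and tree shape are invented for the example, and real anti-entropy protocols compare subtrees to locate the exact differences.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash every leaf, then repeatedly hash pairs of child hashes upward."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

replica_a = [b"row1", b"row2", b"row3", b"row4"]
replica_b = [b"row1", b"rowX", b"row3", b"row4"]   # one divergent block

# Comparing two short root hashes is enough to detect that the replicas differ.
print(merkle_root(replica_a) == merkle_root(replica_b))  # False
```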
G. Vector Clocks
A vector clock is an algorithm for reasoning about events based on event timestamps. It extends the multiversion concurrency control (MVCC) used in RDBMS to multiple servers. Each server keeps its own copy of the vector clock. When servers send and receive messages among themselves, vector clocks are incremented and attached to the messages. A partial-order relationship is defined over the server vector clocks and is used to derive causal relationships between database item updates.
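A minimal Python sketch of a vector clock follows, assuming one integer counter per server; comparing two clocks either establishes a causal order or reveals concurrent, conflicting updates that the application must reconcile.

```python
def increment(clock, server):
    """A server increments its own entry before sending or applying an update."""
    clock = dict(clock)
    clock[server] = clock.get(server, 0) + 1
    return clock

def merge(local, received):
    """On receipt, take the element-wise maximum of the two clocks."""
    return {s: max(local.get(s, 0), received.get(s, 0))
            for s in set(local) | set(received)}

def dominates(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(s, 0) >= v for s, v in b.items())

v1 = increment({}, "server-1")             # {'server-1': 1}
v2 = increment(merge(v1, {}), "server-2")  # {'server-1': 1, 'server-2': 1}
print(dominates(v2, v1))                   # True: v2 causally follows v1
v3 = increment(v1, "server-1")             # concurrent with v2
print(dominates(v2, v3) or dominates(v3, v2))  # False: conflicting updates
```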
E. BASE Properties
In RDBMS, the consistency property ensures that every transaction transforms the database from one valid state to another. Once a transaction updates a database item, all database clients (i.e., programs and users) see the same value for the updated item.
ACID properties are to RDBMS what BASE is to NoSQL systems. BASE refers to basic availability, soft state, and eventual consistency. Basic availability implies disconnected client operation and delayed synchronization, and tolerance to temporary inconsistency and its implications. Soft state refers to state changes that occur without input, which are required for eventual consistency. Eventual consistency means that if no further updates are made to an updated database item for a long enough period of time, all clients will eventually see the same value for the updated item.
I. Memory-Mapped Files
A memory-mapped file is a segment of virtual memory that is associated with an operating system file or file-like resource (e.g., a device or shared memory) on a byte-for-byte basis. Memory-mapped files increase I/O performance, especially for large files.
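For illustration, the Python sketch below maps a small, hypothetical file into virtual memory with the standard mmap module, so byte ranges can be read, searched, and updated in place without explicit read() calls.

```python
import mmap

# Create a small file to map; a database would map its much larger data files.
with open("example.dat", "wb") as f:
    f.write(b"key1=value1\nkey2=value2\n")

with open("example.dat", "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:   # length 0 maps the whole file
        print(mm[0:11])                    # read a byte range directly
        print(mm.find(b"key2"))            # search without loading the file
        mm[5:11] = b"VALUE1"               # in-place update of equal length
```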
J. CAP Theorem
Consistency, availability, and partition tolerance (CAP) are the three primary concerns that determine which data management system is suitable for a given application. The consistency property guarantees that all clients of a data management system have the same view of the data. Availability assures that all clients can always read and write. Finally, partition tolerance ensures that the system continues to operate even when the network connecting its nodes is partitioned. The CAP theorem states that it is impossible for any data management system to provide all three guarantees simultaneously.
V. Table-type/Column Databases
RDBMS are row-based systems, as their processing is row-centric. They are designed to efficiently return entire rows of data. Rows are uniquely identified by system-generated row ids. As the name implies, column databases are column-centric. Conceptually, a columnar database is like an RDBMS with an index on every column, but without the overhead associated with the latter. It is also useful to think of column databases as nested key-value systems.
Column database applications are characterized by tolerance to temporary inconsistency, the need for versioning, flexible database schema, sparse data, partial record access, and high-speed insert and read operations. When a value changes, it is stored as a different version of the same value using a timestamp. In other words, the notion of update is effectively nonexistent.
Partial record access contributes to dramatic performance improvements for certain applications. Columnar databases perform aggregate operations such as computing maxima, minima, averages, and sums on large datasets with extreme efficiency.
Recall that a column family is a set of related columns. Column databases require predefining column families, but not columns. A column family may contain any number of columns of any type of data, as long as the latter can be persisted as byte arrays. Columns in a family are logically related to each other and are physically stored together. Performance gains are achieved by grouping columns with similar access characteristics into the same family. Database schema evolution is achieved by adding columns to column families. A column family is similar to the column concept in RDBMS.
Systems in this category include Google BigTable (available through Google App Engine), Apache Cassandra, Apache HBase, Hypertable, Cloudata, Oracle RDBMS Columnar Expression, and Microsoft SQL Server 2012 Enterprise Edition. Table II summarizes the characteristics of some columnar databases.
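Taking the nested key-value view literally, the following Python sketch (with hypothetical row keys, column families, and columns) models a column store as nested dictionaries in which every write appends a timestamped version and reads fetch a single column.

```python
# row key -> column family -> column -> list of (timestamp, value) versions
store = {}

def put(row, family, column, value, ts):
    """Appends a new timestamped version instead of updating in place."""
    versions = store.setdefault(row, {}).setdefault(family, {}).setdefault(column, [])
    versions.append((ts, value))

def get(row, family, column):
    """Returns the most recent version of a single column (partial record access)."""
    return max(store[row][family][column])  # latest timestamp wins

put("user:42", "profile", "name", b"Alice", 1001)
put("user:42", "activity", "last_login", b"2014-05-01", 1007)
put("user:42", "activity", "last_login", b"2014-05-02", 1010)
print(get("user:42", "activity", "last_login"))   # (1010, b'2014-05-02')
```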
Table I
Key-Value Databases

Memcached: Shared-nothing architecture; in-memory object caching system with no disk persistence. Automatic sharding but no replication. Client libraries for popular programming languages including Java, .NET, PHP, Python, and Ruby.

Aerospike: Shared-nothing architecture; in-memory database with disk persistence. Automatic data partitioning and synchronous replication. Data structure support for string, integer, BLOB, map, and list. ACID with a relax option, backup and recovery, high availability. Cross data center replication. Client access through Java, Lua, and REST.

Cassandra: Shared-nothing, master-master architecture; in-memory database with disk persistence. Key-range based automatic data partitioning. Synchronous and asynchronous replication across multiple data centers. High availability. Client interfaces include Cassandra Query Language (CQL), Thrift, and MapReduce. The largest known Cassandra cluster has over 300 TB of data in an over 400-node cluster.

Redis: Shared-nothing architecture; in-memory database with disk persistence; ACID transactions. Supports several data structures including sets, sorted sets, hashes, strings, and blocking queues. Backup and recovery. High availability. Client interface through C and Lua.

Riak: Shared-nothing architecture; in-memory database with disk persistence; data treated as BLOBs; automatic data partitioning; eventual consistency; backup and recovery; and high availability through multi data center replication. Client API includes Erlang, JavaScript, MapReduce queries, full-text search, and REST.

Voldemort: Shared-nothing architecture; in-memory database with disk persistence; automatic data partitioning and replication; versioning; map and list data structures; ACID with a relax option; backup and recovery; high availability. Protocol Buffers, Thrift, Avro, and Java serialization options. Client access through a Java API.
Table II
Column Databases

BigTable: A sparse, persistent, distributed, multi-dimensional sorted map. Features strict consistency and runs on distributed commodity clusters. Ideal for data on the order of billions of rows with millions of columns.

HBase: Open-source Java implementation of BigTable. No data types; everything is a byte array. Client access tools: shell (i.e., command line), Java API, Thrift, REST, and Avro (binary protocol). Row keys are typically 64 bytes long. Rows are byte-ordered by their row keys. Uses a distributed deployment model. Works with the Hadoop Distributed File System (HDFS), but uses the filesystem API to avoid strong coupling. HBase can also be used with CloudStore.

Cassandra: Provides eventual consistency. Client interfaces: phpcassa (a PHP wrapper), Pycassa (a Python binding), command line/shell, Thrift, and Cassandra Query Language (CQL). Popular for developing financial services applications.
Table III
Graph Databases

Neo4J: In-memory, or in-memory with persistence. Full support for transactions. Nodes/vertices in the graph are described using properties; the relationships between nodes are typed and can have their own properties. Deployed on compute clusters in a single data center or across multiple geographically distributed data centers. Highly scalable: existing applications have 32 billion nodes, 32 billion relationships, and 64 billion properties. Client interfaces: REST, Cypher (SQL-like), Java, and Gremlin.

AllegroGraph: Full read concurrency, near-full write concurrency, and dynamic and automatic indexing of committed data; soundex support; fine-grained security; geospatial and temporal reasoning; and social network analysis. Online backups, point-in-time recovery, replication, and warm standby. Integration with Solr and MongoDB. Client interfaces: JavaScript, CLIF++, and REST (Java Sesame, Java Jena, Python, C#, Clojure, Perl, Ruby, Scala, and Lisp).
Each document is an independent entity with potentially varied and nested attributes. Documents are indexed by their primary identifiers as well as by semi-structured document field values. Document databases are ideal for applications that involve aggregates across document collections.
Systems in this category include MongoDB, CouchDB, Couchbase, RavenDB, and FatDB. Document databases often integrate with full-text search engines such as Solr, Lucene, and ElasticSearch. For example, ElasticSearch provides real-time responses to document queries in JSON format, and RavenDB uses Lucene. Table IV summarizes the characteristics of some document databases.
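As a concrete illustration of aggregating across a document collection, the sketch below uses MongoDB's aggregation pipeline via the pymongo driver; the server address, database, collection, and field names are assumptions made for the example, and a MongoDB server is assumed to be running locally.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local server
orders = client["shop"]["orders"]                   # hypothetical collection

# Each document is an independent entity; nested fields need no fixed schema.
orders.insert_one({"customer": "c1", "total": 40.0,
                   "items": [{"sku": "a1", "qty": 2}]})

# Aggregate across the document collection: total order value per customer.
pipeline = [{"$group": {"_id": "$customer", "spent": {"$sum": "$total"}}}]
for row in orders.aggregate(pipeline):
    print(row["_id"], row["spent"])
```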
X. Hybrid Systems
Systems in this category are those that evolved from traditional RDBMS or those that fall into more than one of the categories discussed above. Systems in this category include PostgreSQL, VoltDB, OrientDB, Aerospike, ArangoDB, and Spanner. On Amazon EC2, VoltDB achieved 95 thousand transactions per second (TPS) in a Node.js application running on 12 nodes, and 34 million TPS with 30 nodes. Table V summarizes the characteristics of some hybrid databases.
XI. Conclusions
Unprecedented data volumes, connected data, and the performance and scalability requirements of modern Big Data-driven applications have effectively challenged the practice of treating RDBMS as the only approach to data management. Consistency, availability, and partition tolerance are the three primary concerns that determine which data management system is suitable for a given application.
Netflix moved from Oracle RDBMS to Apache Cassandra and achieved over a million writes per second across the cluster, with over 10,000 writes per second per node, while maintaining an average latency of less than 0.015 milliseconds. The total cost of setting up and running Cassandra on Amazon EC2 was around $60 per hour for a cluster of 48 nodes.
Cisco recently replaced an Oracle RAC solution for master data management with Neo4J. Query times were reduced from minutes to milliseconds, in addition to the ease of expressing queries on connected data. This application has 35 million nodes, 50 million relationships, and 600 million properties.
As the above two cases exemplify, NoSQL data models are designed for efficiently supporting insert and read operations; storing sparse data and enabling column-wise processing; enabling disconnected client operation and delayed synchronization; and tolerating temporary inconsistency and its implications.
A more typical path for NoSQL systems to gain ubiquitous use will come from using an array of data management systems, each naturally suited to the type of data and operations it manages, with all of them abstracted away behind a Web or application server. Furthermore, Web servers will manage access control and authorization centrally. Also, the migration from
Table IV
Document Databases

MongoDB: No transaction support; only modifier operations offer atomic consistency. The lack of isolation levels may result in phantom reads. Uses memory-mapped file storage. Support is available for geospatial processing and the MapReduce framework. Indexing, replication, GridFS, and an aggregation pipeline. JavaScript expressions as queries. Client access tools: the JS shell (a command line tool) and drivers for most programming languages. Suitable for applications that require auto-sharding and high horizontal scalability for managing schema-less, semi-structured documents. Stores documents in BSON format; data is transferred across the wire in binary format.

CouchDB: Open-source database written in Erlang. JSON format for documents. Client access tools: REST API, CouchApps (an application server), and MapReduce. JavaScript is used for writing MapReduce functions.

Couchbase: Incorporates the functionality of CouchDB and Membase. Data is automatically partitioned across cluster nodes. All nodes can do both reads and writes. Used in many commercial high-availability applications and games.
Table V
Hybrid Databases

PostgreSQL: Recent versions provide JSON support and key-value support through an add-on called hstore. Suitable for applications that primarily depend on an RDBMS for data management but also need a scalable means of managing key-value data.

VoltDB: In-memory database running on a single thread. Eliminates the overhead associated with locking and latching in multi-threaded environments. Uses snapshots to save data to disk; the database can be restored to a previous state using snapshots. Data is distributed across several servers. Supports transactions. Supports only a subset of ANSI/ISO SQL, so migration to VoltDB will require rewriting some existing SQL queries. Client interfaces: JSON API, Java, C++, C#, PHP, Python, Node.js, Ruby, and Erlang.

VoltCache: A key-value database implemented on top of VoltDB. Client access is through a Memcached-compatible API.

OrientDB
ArangoDB
Aerospike: Hybrid system like OrientDB, but also provides traditional RDBMS functionality.

Google Spanner: Multi-version, globally distributed, and synchronously replicated database. Spanner supports externally consistent distributed transactions.
References

[3] Proceedings of the 8th ACM European Conference on Computer Systems, ser. EuroSys '13. New York, NY, USA: ACM, 2013, pp. 183-196.
[4] D. McCreary and A. Kelly, Making Sense of NoSQL: A Guide for Managers and the Rest of Us. Manning Publications, 2013.
[5] C. Mohan, "History repeats itself: Sensible and NonsenSQL aspects of the NoSQL hoopla," in Proceedings of the 16th International Conference on Extending Database Technology, ser. EDBT '13. New York, NY, USA: ACM, 2013, pp. 11-16.
[6] V. Abramova and J. Bernardino, "NoSQL databases: MongoDB vs Cassandra," in Proceedings of the International C* Conference on Computer Science and Software Engineering, ser. C3S2E '13. New York, NY, USA: ACM, 2013, pp. 14-22.