Why this topic?
< Client’s Application Roadmap
– “Reduction of cycle time for the document
intake process. Currently, it can take anywhere
from a few days to a few weeks from the time
the documents are received to when they are
available to the client.”
< New York Times used Hadoop/MapReduce to
convert pre-1980 articles that were TIFF
images to PDF.
2
Agenda
< Some history
< What is NoSQL
< CAP Theorem
< What is lost
< Types of NoSQL
< Data Model
< Frameworks
< Demo
< Wrapup
3
History of the World, Part 1
< Relational
Databases – mainstay of business
< Web-based applications caused spikes
– Especially true for public-facing e-Commerce sites
< Developers begin to front RDBMS with memcache or
integrate other caching mechanisms within the
application (ie. Ehcache)
4
Scaling Up
< Issues with scaling up when the dataset is just too
big
< RDBMS were not designed to be distributed
< Began to look at multi-node database solutions
< Known as ‘scaling out’ or ‘horizontal scaling’
< Different approaches include:
– Master-slave
– Sharding
5
Scaling RDBMS – Master/Slave
< Master-Slave
– All writes are written to the master. All reads
performed against the replicated slave databases
– Critical reads may be incorrect as writes may not
have been propagated down
– Large data sets can pose problems as master needs
to duplicate data to slaves
6
Scaling RDBMS - Sharding
< Partition or sharding
– Scales well for both reads and writes
– Not transparent, application needs to be partition-
aware
– Can no longer have relationships/joins across
partitions
– Loss of referential integrity across shards
7
Other ways to scale RDBMS
< Multi-Masterreplication
< INSERT only, not UPDATES/DELETES
< No JOINs, thereby reducing query time
– This involves de-normalizing data
< In-memory databases
8
What is NoSQL?
< Stands for Not Only SQL
< Class of non-relational data storage systems
< Usually do not require a fixed table schema nor do
they use the concept of joins
< All NoSQL offerings relax one or more of the ACID
properties (will talk about the CAP theorem)
9
Why NoSQL?
< For data storage, an RDBMS cannot be the be-
all/end-all
< Just as there are different programming languages,
need to have other data storage tools in the toolbox
< A NoSQL solution is more acceptable to a client now
than even a year ago
– Think about proposing a Ruby/Rails or Groovy/Grails
solution now versus a couple of years ago
10
How did we get here?
< Explosion of social media sites (Facebook,
Twitter) with large data needs
< Rise of cloud-based solutions such as Amazon
S3 (simple storage solution)
< Just as moving to dynamically-typed
languages (Ruby/Groovy), a shift to
dynamically-typed data with frequent schema
changes
< Open-source community
11
Dynamo and BigTable
< Three
major papers were the seeds of the NoSQL
movement
– BigTable (Google)
– Dynamo (Amazon)
• Gossip protocol (discovery and error detection)
• Distributed key-value data store
• Eventual consistency
– CAP Theorem (discuss in a sec ..)
12
The Perfect Storm
< Large datasets, acceptance of alternatives, and
dynamically-typed data has come together in a
perfect storm
< Not a backlash/rebellion against RDBMS
< SQL is a rich query language that cannot be rivaled
by the current list of NoSQL offerings
13
CAP Theorem
< Three properties of a system: consistency,
availability and partitions
< You can have at most two of these three properties
for any shared-data system
< To scale out, you have to partition. That leaves
either consistency or availability to choose from
– In almost all cases, you would choose availability over
consistency
14
Availability
< Traditionally, thought of as the server/process
available five 9’s (99.999 %).
< However, for large node system, at almost any point
in time there’s a good chance that a node is either
down or there is a network disruption among the
nodes.
– Want a system that is resilient in the face of network
disruption
15
Consistency Model
<A consistency model determines rules for visibility
and apparent order of updates.
< For example:
– Row X is replicated on nodes M and N
– Client A writes row X to node N
– Some period of time t elapses.
– Client B reads row X from node M
– Does client B see the write from client A?
– Consistency is a continuum with tradeoffs
– For NoSQL, the answer would be: maybe
– CAP Theorem states: Strict Consistency can't be
achieved at the same time as availability and partition-
tolerance.
16
Eventual Consistency
< When no updates occur for a long period of time,
eventually all updates will propagate through the
system and all the nodes will be consistent
< For a given accepted update and a given node,
eventually either the update reaches the node or the
node is removed from service
< Known as BASE (Basically Available, Soft state,
Eventual consistency), as opposed to ACID
17
What kinds of NoSQL
< NoSQL solutions fall into two major areas:
– Key/Value or ‘the big hash table’.
• Amazon S3 (Dynamo)
• Voldemort
• Scalaris
– Schema-less which comes in multiple flavors,
column-based, document-based or graph-based.
• Cassandra (column-based)
• CouchDB (document-based)
• Neo4J (graph-based)
• HBase (column-based)
18
Key/Value
Pros:
– very fast
– very scalable
– simple model
– able to distribute horizontally
Cons:
- many data structures (objects) can't be easily
modeled as key value pairs
19
Schema-Less
Pros:
- Schema-less data model is richer than key/value pairs
- eventual consistency
- many are distributed
- still provide excellent performance and scalability
Cons:
- typically no ACID transactions or joins
20
Common Advantages
< Cheap, easy to implement (open source)
< Data are replicated to multiple nodes (therefore identical
and fault-tolerant) and can be partitioned
– Down nodes easily replaced
– No single point of failure
< Easy to distribute
< Don't require a schema
< Can scale up and down
< Relax the data consistency requirement (CAP)
21
What am I giving up?
< joins
< group by
< order by
< ACID transactions
< SQL as a sometimes frustrating but still powerful
query language
< easy integration with other applications that support
SQL
22
Cassandra
< Originallydeveloped at Facebook
< Follows the BigTable data model: column-oriented
< Uses the Dynamo Eventual Consistency model
< Written in Java
< Open-sourced and exists within the Apache family
< Uses Apache Thrift as it’s API
23
Thrift
< Created at Facebook along with Cassandra
< Is a cross-language, service-generation framework
< Binary Protocol (like Google Protocol Buffers)
< Compiles to: C++, Java, PHP, Ruby, Erlang, Perl, ...
24
Searching
< Relational
– SELECT `column` FROM `database`,`table` WHERE
`id` = key;
– SELECT product_name FROM rockets WHERE id =
123;
< Cassandra (standard)
– keyspace.getSlice(key, “column_family”, "column")
– keyspace.getSlice(123, new ColumnParent(“rockets”),
getSlicePredicate());
25
Typical NoSQL API
< Basic API access:
– get(key) -- Extract the value given a key
– put(key, value) -- Create or update the value given its
key
– delete(key) -- Remove the key and its associated
value
– execute(key, operation, parameters) -- Invoke an
operation to the value (given its key) which is a
special data structure (e.g. List, Set, Map .... etc).
26
Data Model
< Within Cassandra, you will refer to data this
way:
– Column: smallest data element, a tuple with
a name and a value
:Rockets, '1' might return:
{'name' => ‘Rocket-Powered Roller Skates',
‘toon' => ‘Ready Set Zoom',
‘inventoryQty' => ‘5‘,
‘productUrl’ => ‘rockets\1.gif’}
27
Data Model Continued
– ColumnFamily: There’s a single structure used to group
both the Columns and SuperColumns. Called a
ColumnFamily (think table), it has two types, Standard &
Super.
• Column families must be defined at startup
– Key: the permanent name of the record
– Keyspace: the outer-most level of organization. This
is usually the name of the application. For example,
‘Acme' (think database name).
28
Cassandra and Consistency
< Talked previous about eventual consistency
< Cassandra has programmable read/writable
consistency
– One: Return from the first node that responds
– Quorom: Query from all nodes and respond with the
one that has latest timestamp once a majority of
nodes responded
– All: Query from all nodes and respond with the one
that has latest timestamp once all nodes responded.
An unresponsive node will fail the node
29
Cassandra and Consistency
– Zero: Ensure nothing. Asynchronous write done in
background
– Any: Ensure that the write is written to at least 1
node
– One: Ensure that the write is written to at least 1
node’s commit log and memory table before receipt to
client
– Quorom: Ensure that the write goes to node/2 + 1
– All: Ensure that writes go to all nodes. An
unresponsive node would fail the write
30
Consistent Hashing
< Partition using consistent hashing
– Keys hash to a point on a
fixed circular space
– Ring is partitioned into a set of
ordered slots and servers and
keys hashed over these slots
< Nodes take positions on the circle.
< A, B, and D exists.
– B responsible for AB range.
– D responsible for BD range.
– A responsible for DA range.
< C joins.
– B, D split ranges.
– C gets BC from D.
31
Domain Model
< Design your domain model first
< Create your Cassandra data store to fit your domain
model
<Keyspace Name="Acme">
<ColumnFamily CompareWith="UTF8Type" Name="Rockets" />
<ColumnFamily CompareWith="UTF8Type" Name="OtherProducts" />
<ColumnFamily CompareWith="UTF8Type" Name="Explosives" />
…
</Keyspace>
32
Data Model
ColumnFamily: Rockets
Key Value
1 Name Value
name Rocket-Powered Roller Skates
toon Ready, Set, Zoom
inventoryQty 5
brakes false
2 Name Value
name Little Giant Do-It-Yourself Rocket-Sled Kit
toon Beep Prepared
inventoryQty 4
brakes false
3 Name Value
name Acme Jet Propelled Unicycle
toon Hot Rod and Reel
inventoryQty 1
wheels 1
33
Data Model Continued
– Optional super column: a named list. A super
column contains standard columns, stored in recent
order
• Say the OtherProducts has inventory in categories. Querying
(:OtherProducts, '174927') might return:
{‘OtherProducts' => {'name' => ‘Acme Instant Girl', ..},
‘foods': {...}, ‘martian': {...}, ‘animals': {...}}
• In the example, foods, martian, and animals are all super
column names. They are defined on the fly, and there can be
any number of them per row. :OtherProducts would be the
name of the super column family.
– Columns and SuperColumns are both tuples with a
name & value. The key difference is that a standard
Column’s value is a “string” and in a SuperColumn the
value is a Map of Columns.
34
Data Model Continued
< Columns are always sorted by their name. Sorting
supports:
– BytesType
– UTF8Type
– LexicalUUIDType
– TimeUUIDType
– AsciiType
– LongType
< Each of these options treats the Columns' name as a
different data type
35
Hector
< Leading Java API for Cassandra
< Sits on top of Thrift
< Adds following capabilities
– Load balancing
– JMX monitoring
– Connection-pooling
– Failover
– JNDI integration with application servers
– Additional methods on top of the standard get, update,
delete methods.
< Under discussion
– hooks into Spring declarative transactions
36
Hector and JMX
37
Code Examples: Tomcat Configuration
Tomcat context.xml
<Resource name="cassandra/CassandraClientFactory"
auth="Container"
type="me.prettyprint.cassandra.service.CassandraHostConfigurator"
factory="org.apache.naming.factory.BeanFactory"
hosts="localhost:9160"
maxActive="150"
maxIdle="75" />
J2EE web.xml
<resource-env-ref>
<description>Object factory for Cassandra clients.</description>
<resource-env-ref-name>cassandra/CassandraClientFactory</resource-
env-ref-name>
<resource-env-ref-
type>org.apache.naming.factory.BeanFactory</resource-env-ref-type>
</resource-env-ref>
38
Code Examples: Spring Configuration
Spring applicationContext.xml
<bean id="cassandraHostConfigurator“
class="org.springframework.jndi.JndiObjectFactoryBean">
<property name="jndiName">
<value>cassandra/CassandraClientFactory</value></property>
<property name="resourceRef"><value>true</value></property>
</bean>
<bean id="inventoryDao“
class="com.acme.erp.inventory.dao.InventoryDaoImpl">
<property name="cassandraHostConfigurator“
ref="cassandraHostConfigurator" />
<property name="keyspace" value="Acme" />
</bean>
39
Code Examples: Cassandra Get Operation
try {
cassandraClient = cassandraClientPool.borrowClient();
// keyspace is Acme
Keyspace keyspace = cassandraClient.getKeyspace(getKeyspace());
// inventoryType is Rockets
List<Column> result = keyspace.getSlice(Long.toString(inventoryId), new
ColumnParent(inventoryType), getSlicePredicate());
inventoryItem.setInventoryItemId(inventoryId);
inventoryItem.setInventoryType(inventoryType);
loadInventory(inventoryItem, result);
} catch (Exception exception) {
logger.error("An Exception occurred retrieving an inventory item", exception);
} finally {
try {
cassandraClientPool.releaseClient(cassandraClient);
} catch (Exception exception) {
logger.warn("An Exception occurred returning a Cassandra client to the pool", exception);
}
}
40
Code Examples: Cassandra Update Operation
try {
cassandraClient = cassandraClientPool.borrowClient();
Map<String, List<ColumnOrSuperColumn>> data = new HashMap<String,
List<ColumnOrSuperColumn>>();
List<ColumnOrSuperColumn> columns = new ArrayList<ColumnOrSuperColumn>();
// Create the inventoryId column.
ColumnOrSuperColumn column = new ColumnOrSuperColumn();
columns.add(column.setColumn(new Column("inventoryItemId".getBytes("utf-8"),
Long.toString(inventoryItem.getInventoryItemId()).getBytes("utf-8"), timestamp)));
column = new ColumnOrSuperColumn();
columns.add(column.setColumn(new Column("inventoryType".getBytes("utf-8"),
inventoryItem.getInventoryType().getBytes("utf-8"), timestamp)));
….
data.put(inventoryItem.getInventoryType(), columns);
cassandraClient.getCassandra().batch_insert(getKeyspace(),
Long.toString(inventoryItem.getInventoryItemId()), data, ConsistencyLevel.ANY);
} catch (Exception exception) {
…
}
41
Some Statistics
< FacebookSearch
< MySQL > 50 GB Data
– Writes Average : ~300 ms
– Reads Average : ~350 ms
< Rewritten with Cassandra > 50 GB Data
– Writes Average : 0.12 ms
– Reads Average : 15 ms
42
Some things to think about
< Ruby on Rails and Grails have ORM baked in. Would
have to build your own ORM framework to work with
NoSQL.
– Some plugins exist.
< Same would go for Java/C#, no Hibernate-like
framework.
– A simple JDO framework does exist.
< Support for basic languages like Ruby.
43
Some more things to think about
< Troubleshooting performance problems
< Concurrency on non-key accesses
< Are the replicas working?
< No TOAD for Cassandra
– though some NoSQL offerings have GUI tools
– have SQLPlus-like capabilities using Ruby IRB
interpreter.
44
Don’t forget about the DBA
< Itdoes not matter if the data is deployed on a
NoSQL platform instead of an RDBMS.
< Still need to address:
– Backups & recovery
– Capacity planning
– Performance monitoring
– Data integration
– Tuning & optimization
< What happens when things don’t work as
expected and nodes are out of sync or you
have a data corruption occurring at 2am?
< Who you gonna call?
– DBA and SysAdmin need to be on board
45
Where would I use it?
< For most of us, we work in corporate IT and a
LinkedIn or Twitter is not in our future
< Where would I use a NoSQL database?
< Do you have somewhere a large set of uncontrolled,
unstructured, data that you are trying to fit into a
RDBMS?
– Log Analysis
– Social Networking Feeds (many firms hooked in
through Facebook or Twitter)
– External feeds from partners (EAI)
– Data that is not easily analyzed in a RDBMS such as
time-based data
– Large data feeds that need to be massaged before
entry into an RDBMS
46
Summary
< Leading users of NoSQL datastores are social
networking sites such as Twitter, Facebook,
LinkedIn, and Digg.
< To implement a single feature in Cassandra, Digg
has a dataset that is 3 terabytes and 76 billion
columns.
< Not every problem is a nail and not every solution is
a hammer.
47
Questions
48
Resources
< Cassandra
– http://cassandra.apache.org
< Hector
– http://wiki.github.com/rantav/hector
– http://prettyprint.me
< NoSQL News websites
– http://nosql.mypopescu.com
– http://www.nosqldatabases.com
< High Scalability
– http://highscalability.com
< Video
– http://www.infoq.com/presentations/Project-
Voldemort-at-Gilt-Groupe
49