Introduction to Cassandra Basics

This document provides an overview of and introduction to Apache Cassandra, including: what Cassandra is and how it differs from relational databases in its data model and high-availability capabilities; how to install Cassandra on Amazon Web Services and configure a single-node cluster; best practices for data modeling in Cassandra, including denormalizing data and modeling tables to match queries; and an example Twitter-like application called "Twissandra" that demonstrates how to model and query data in Cassandra.

Getting to know Cassandra

by Michelle Darling
August 2013

Agenda:

What is Cassandra?
Installation, CQL3
Data Modelling
Summary

Only 15 min to cover these, so please hold questions until the end, or email me :-) and I'll summarize Q&A for everyone.

Unfortunately, no time for:

DB Admin
Detailed Architecture
Partitioning / Consistent Hashing
Consistency Tuning
Data Distribution & Replication
System Tables
App Development
Using Python, Ruby, etc. to access Cassandra
Using Hadoop to stream data into Cassandra

What is Cassandra?
NoSQL Distributed DB
Consistency - A__ID
Availability - High
Point of Failure - none
Good for Event Tracking & Analysis

Fortuneteller of Doom from Greek Mythology. Tried to warn others about future disasters, but no one listened. Unfortunately, she was 100% accurate.

Time series data

Sensor device data
Social media analytics
Risk Analysis
Failure Prediction

Rackspace: Which servers are under heavy load and are about to crash?

The Evolution of Cassandra

2005
Data Model
Wide rows, sparse arrays
High performance through very fast write throughput.

2006

Infrastructure
Peer-Peer Gossip
Key-Value Pairs
Tunable Consistency

Originally for Inbox Search

But now used for Instagram

2008: Open-Source Release / 2013: Enterprise & Community Editions

Other NoSQL vs. Cassandra

NoSQL Taxonomy:

Key-Value Pairs: Dynamo, Riak, Redis
Column-Based: BigTable, HBase, Cassandra
Document-Based: MongoDB, Couchbase
Graph: Neo4J

C* Differentiators:
Production-proven at Netflix, eBay, Twitter, 20 of Fortune 100
Clear Winner in Scalability, Performance, Availability
Big Data Capable
-- DataStax

Architecture

Cluster (ring)
Nodes (circles)
Peer-to-Peer Model
Gossip Protocol

Partitioner:
Consistent Hashing
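
To make "Partitioner: Consistent Hashing" a little more concrete, here is a minimal Python sketch of a hash ring. It is a toy under stated assumptions, not Cassandra's implementation: the `Ring` class, the `token` helper, and the node names are hypothetical, each node gets a single token, and MD5 is used purely for illustration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: each node owns the arc that ends at its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, row_key: str) -> str:
        # First node clockwise from the key's token, wrapping around at the end.
        i = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return self._ring[i][1]


ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("paul")  # the same key always lands on the same node
```

The point of the design: when a node joins, only the keys on the arc it takes over change owner; every other key stays where it was.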

Netflix
Streaming Video

Personalized Recommendations per family member
Built on Amazon Web Services (AWS) + Cassandra

Cloud installation using Amazon Web Services (AWS) Elastic Compute Cloud (EC2)

Free for the 1st year! Then pay only for what you use.
Sign up for AWS EC2 account: Big Data University Video (4:34 minutes)

Amazon Machine Image (AMI)

Preconfigured installation template
Choose: DataStax AMI for Cassandra Community Edition
Follow these *very good* step-by-step instructions from DataStax.
AMIs also available for CouchBase, MongoDB (make sure you pick the free tier community versions to avoid monthly charge$$!!!).

AWS EC2 Dashboard

DataStax AMI Setup

--clustername Michelle
--totalnodes 1
--version community

Roll your Own Installation

DataStax Community Edition

Install instructions for Linux, Windows, MacOS:
http://www.datastax.com/2012/01/getting-started-with-cassandra

Video: Set up a 4-node Cassandra cluster in under 2 minutes
http://www.screenr.com/5G6

Invoke CQLSH, CREATE KEYSPACE

./bin/cqlsh
cqlsh> CREATE KEYSPACE big_data
       WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE big_data;
cqlsh:big_data>

Tip: Skip Thrift -- use CQL3

Thrift RPC

// Your Column
Column col = new Column(ByteBuffer.wrap("name".getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis());
// Don't ask
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);
// Prepare to be amazed
Mutation mutation = new Mutation();
mutation.setColumnOrSuperColumn(cosc);
List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);
Map<ByteBuffer, Map<String, List<Mutation>>> mutations_map =
    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
Map<String, List<Mutation>> cf_map = new HashMap<String, List<Mutation>>();
cf_map.put("Standard1", mutations);  // column family name -> its mutations
mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map);
cassandra.batch_mutate(mutations_map, consistency_level);

CQL3

- Uses cqlsh
- SQL-like language
- Runs on top of Thrift RPC
- Much more user-friendly.

The Thrift code above equals this in CQL3:

INSERT INTO Standard1 (id, name) VALUES ('key', 'value');
CREATE TABLE
cqlsh:big_data> create table user_tags (
    user_id varchar,
    tag varchar,
    value counter,
    primary key (user_id, tag)
);
TABLE user_tags: How many times has a user mentioned a hashtag?
COUNTER datatype - Computes & stores counter value at the time data is written. This optimizes query performance.

UPDATE TABLE
cqlsh:big_data> UPDATE user_tags SET value = value + 1
                WHERE user_id = 'paul' AND tag = 'cassandra';

SELECT FROM TABLE
cqlsh:big_data> SELECT * FROM user_tags;

 user_id | tag       | value
---------+-----------+-------
    paul | cassandra |     1
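
The counter idea (do the arithmetic when the data is written, so a read is a plain lookup) can be sketched in Python with an in-memory stand-in for user_tags. This is only an illustration of the access pattern; it does not talk to Cassandra, and the helper names `mention` and `count` are hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for user_tags: (user_id, tag) -> counter value.
user_tags = defaultdict(int)


def mention(user_id: str, tag: str) -> None:
    """Mimics: UPDATE user_tags SET value = value + 1 WHERE user_id = ? AND tag = ?"""
    user_tags[(user_id, tag)] += 1


def count(user_id: str, tag: str) -> int:
    """A read is a single key lookup; nothing is aggregated at query time."""
    return user_tags.get((user_id, tag), 0)


mention("paul", "cassandra")
mention("paul", "cassandra")
mention("paul", "nosql")
```

Here `count("paul", "cassandra")` reads the stored total directly instead of scanning and summing individual mention records.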

DATA MODELING
A Major Paradigm Shift!

RDBMS | Cassandra
Structured Data, Fixed Schema | Unstructured Data, Flexible Schema
Array of Arrays. 2D: ROW x COLUMN | Nested Key-Value Pairs. 3D: ROW Key x COLUMN key x COLUMN values
DATABASE | KEYSPACE
TABLE | TABLE a.k.a. COLUMN FAMILY
ROW | ROW a.k.a. PARTITION. Unit of replication.
COLUMN | COLUMN [Name, Value, Timestamp]. a.k.a. CLUSTER. Unit of storage. Up to 2 billion columns per row.
FOREIGN KEYS, JOINS, ACID Consistency | Referential Integrity not enforced, so A_CID. BUT relationships represented using COLLECTIONS.

Cassandra: 3D+ (Nested Objects)
RDBMS: 2D (Rows x columns)
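
One way to picture "nested key-value pairs" (ROW key x COLUMN key x COLUMN values) is as nested dictionaries. The Python sketch below is a mental model only, not how Cassandra stores data on disk; the `put` and `get` helpers, the "users" table, and the sample values are all made up for illustration.

```python
import time

# table -> row key -> column key -> (value, timestamp)
keyspace = {"users": {}}


def put(table, row_key, column_key, value, ts=None):
    """Write one column; rows need not share the same set of columns."""
    row = keyspace[table].setdefault(row_key, {})
    row[column_key] = (value, ts if ts is not None else time.time())


def get(table, row_key, column_key):
    """Return the stored value, or None when that column is absent."""
    cell = keyspace[table].get(row_key, {}).get(column_key)
    return cell[0] if cell else None


put("users", "paul", "email", "paul@example.com", ts=1)
put("users", "paul", "city", "Boston", ts=2)
put("users", "ann", "email", "ann@example.com", ts=3)  # no "city" column
```

Unlike a fixed 2D grid, the row for "ann" simply has no "city" entry, which is what "sparse" and "flexible schema" mean in practice.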

Example: Twissandra Web App
Twitter-inspired sample application written in Python + Cassandra.
Play with the app: twissandra.com
Examine & learn from the code on GitHub.

Features/Queries:
Sign In, Sign Up
Post Tweet
Userline (User's tweets)
Timeline (All tweets)
Following (Users being followed by user)
Followers (Users following this user)

Twissandra.com vs Twitter.com

Twissandra - RDBMS Version

Entities:
USER, TWEET
FOLLOWER, FOLLOWING
FRIENDS
Relationships:
USER has many TWEETs.
USER is a FOLLOWER of many USERs.
Many USERs are FOLLOWING USER.

Twissandra - Cassandra Version

Tip: Model tables to mirror queries.

TABLES or CFs:
TWEET
USER, USERNAME
FOLLOWERS, FOLLOWING
USERLINE, TIMELINE
Notes:
Extra tables mirror queries.
Denormalized tables are pre-formed for faster performance.
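
A rough Python sketch of "extra tables mirror queries": every write fans out to each table that some query will read, so no read needs a join. The dictionaries stand in for the tables listed above; the function names are hypothetical, and treating TIMELINE as "tweets from users I follow" is an assumption made for this sketch rather than something the slides spell out.

```python
from collections import defaultdict

tweets = {}                    # tweet_id -> (username, body)
followers = defaultdict(set)   # username -> users who follow them
following = defaultdict(set)   # username -> users they follow
userline = defaultdict(list)   # username -> ids of their own tweets
timeline = defaultdict(list)   # username -> ids of tweets by users they follow


def follow(fan, star):
    # Two writes, so "Followers" and "Following" are each a single lookup.
    following[fan].add(star)
    followers[star].add(fan)


def post_tweet(tweet_id, username, body):
    # Fan out at write time: one write per table that mirrors a query.
    tweets[tweet_id] = (username, body)
    userline[username].append(tweet_id)
    for fan in followers[username]:
        timeline[fan].append(tweet_id)


follow("ann", "paul")
post_tweet(1, "paul", "hello #cassandra")
```

The trade-off is deliberate: writes do more work and data is duplicated, in exchange for reads that touch exactly one row of one table.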

Tip: Remember, skip Thrift -- use CQL3

TABLE

What does C* data look like?

TABLE Userline
List all of a user's Tweets
*************
Row Key: user_id
Columns
Column Key: tweet_id
at Timestamp
TTL (Time to Live): seconds until the expiration date.
*************
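
The Userline layout (row key user_id, one column per tweet_id, each with a write time and an optional TTL) can be mimicked in a few lines of Python. This is a sketch of the semantics only: `add_tweet` and `read_userline` are hypothetical helpers, and the clock is passed in explicitly so the behavior is easy to follow.

```python
userline = {}  # user_id -> {tweet_id: (body, written_at, ttl_seconds or None)}


def add_tweet(user_id, tweet_id, body, written_at, ttl=None):
    """Store a column under the user's row, optionally with a time-to-live."""
    userline.setdefault(user_id, {})[tweet_id] = (body, written_at, ttl)


def read_userline(user_id, now):
    """Return (tweet_id, body) pairs ordered by column key, skipping expired ones."""
    row = userline.get(user_id, {})
    return [
        (tweet_id, body)
        for tweet_id, (body, written_at, ttl) in sorted(row.items())
        if ttl is None or now < written_at + ttl
    ]


add_tweet("paul", 1, "first", written_at=100)
add_tweet("paul", 2, "expires soon", written_at=100, ttl=60)
```

Reading at `now=120` returns both tweets; reading at `now=200` returns only the first, because the second column's 60-second TTL has elapsed.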

Cassandra Data Model = LEGOs?

Flexible Schema

Summary:
Go straight from SQL to CQL3; skip Thrift, Column Families, SuperColumns, etc.

Denormalize tables to mirror important queries.
Roughly 1 table per important query.

Choose wisely:
Partition Keys
Cluster Keys
Indexes
TTL
Counters
Collections
See DataStax Music Service Example

Consider hybrid approach:

20% - RDBMS for highly structured, OLTP, ACID requirements.
80% - Scale Cassandra to handle the rest of data.

Remember:
Cheap: storage, servers, OpenSource software.
Precious: User AND Developer Happiness.

Resources

C* Summit 2013: Slides
Cassandra at eBay Scale (slides)
Data Modelers Still Have Jobs: Adjusting For the NoSQL Environment (slides)
Real-time Analytics using Cassandra, Spark and Shark (slides)
Cassandra By Example: Data Modelling with CQL3 (slides)
DataStax C*ollege Credit: Data Modelling for Apache Cassandra (slides)

I wish I found these 1st:

How do I Cassandra? (slides)
Mobile version of DataStax web docs (link)
