0% found this document useful (0 votes)

304 views7 pages

Cassandra Database Guide

1. This document discusses querying data stored in a Cassandra database. 2. It provides examples of different types of queries including: simple select statements to retrieve data from tables; queries using aggregates to count and group data; complex queries involving joins between tables and denormalized data models. 3. The document also discusses challenges with querying Cassandra including the inability to perform joins natively and provides suggestions for modeling data to maximize the types of queries that are possible, such as using maps and lists instead of tuples. It also covers creating user-defined aggregate functions.

Uploaded by

white

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

304 views7 pages

Cassandra Database Guide

Uploaded by

white

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

Cassandra

Advanced Topics on NoSQL databases

A4 - S8

ESILV
nicolas.travers (at) devinci.fr
Chapter 1
Chapter 1. Create a database onhttps://chewbii.com
Nicolas Travers Cassandra

Create a database on Cassandra

1.1 Files Transfer

1.1.1 Transfer the file to the container. in your command line (OS console/shell not CLI console):

docker cp PATH_TO_FOLDER/DBLP.tar.gz Cassandra:/

Of course “PATH_TO_FOLDER” is the folder where you have put the DBLP.tar.gz file. You can drag and
drop it into the shell.
“Cassandra” is the name of the container. If you did not named it like this, please change to the corresponding
name.
1.1.2 Open container’s Docker CLI, available on the Docker dashboard (docker console)

1.1.3 In the Docker CLI environment, unarchive the file in the container:

tar xzvf DBLP.tar.gz

1.1.4 Open a CQLSH console for the following steps:

cqlsh

1.2 Create Your Keyspace

CREATE KEYSPACE IF NOT EXISTS DBLP

WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 3 };

Here we create a ”DBLP” database for which the replication factor is set to 3, in order to manage falt tolerance.
To select the database:
USE DBLP;

Nicolas Travers Cassandra 2/7

Chapter 1. Create a database on Cassandra
1.3. Create Tables and Import Data
1.3 Create Tables and Import Data
All those instructions has to be applied in the Docker CLI environment in CQLSH (the COPY and SOURCE instruc-
tions are not recognized by TablePlus).

1.3.1 Publications
Create the “publications” table:

CREATE TABLE publications (

art_id TEXT, type TEXT, title text, pages_start INT, pages_end int, booktitle text,
journal_series text, journal_editor text, journal_volume int, journal_isbn text,
url text, year int,
PRIMARY KEY (art_id)
);
ALTER TABLE publications WITH GC_GRACE_SECONDS = 0;
CREATE INDEX btree_publi_type on publications(type);

Import data from the CSV file:

COPY publications(art_id,type,year,title,pages_start,pages_end,booktitle,journal_series,
journal_editor,journal_volume,journal_isbn,url)
FROM 'DBLP_publis.csv' WITH HEADER = true AND DELIMITER=';';

1.3.2 Authors
Create the “authors” table:

CREATE TABLE authors (

art_id TEXT, author TEXT, pos INT,
PRIMARY KEY ((author), art_id)
);
ALTER TABLE authors WITH GC_GRACE_SECONDS = 0;
CREATE INDEX btree_authors_art_id on authors(art_id);
CREATE INDEX btree_authors_pos on authors(pos);

Import data from the CSV file:

COPY authors(art_id,author,pos) FROM 'authors.csv' WITH HEADER = true AND DELIMITER=';';

1.3.3 Authors publications

Create the “authors_publis” table:

CREATE TABLE authors_publis (

art_id TEXT, author TEXT, type TEXT, title text, pages_start INT, pages_end int,
booktitle text, journal_series text, journal_editor text, journal_volume int,
journal_isbn text, url text, year int, pos int,
PRIMARY KEY ((author), art_id)
);
ALTER TABLE authors_publis WITH GC_GRACE_SECONDS = 0;

CREATE INDEX btree_authors_publi_type on authors_publis(type);

CREATE INDEX btree_authors_publi_title on authors_publis(title);

Nicolas Travers Cassandra 3/7

Chapter 1. Create a database on Cassandra
1.3. Create Tables and Import Data
Import data from the CSV file:

COPY authors_publis(art_id,author,type,year,title,pages_start,pages_end,booktitle,
journal_series,journal_editor,journal_volume,journal_isbn,url,pos)
FROM 'authors_publis.csv' WITH HEADER = true AND DELIMITER=';';

Nicolas Travers Cassandra 4/7

Chapter 2 Cassandra
Chapter 2. Querying Nicolas Travers https://chewbii.com

Querying Cassandra
New we have a database, we can query it. However, since we use a query language very close to SQL, we will try
to use it in this way. This practice work will be a little bit disturbing. The goal is to show how to use Cassandra in
the proper way, not in the way to use traditional databases.
The queries can be either executed in “CQLSH ” (in the Docker CLI environment) ‘TablePlus”.

2.1 CQL : Simple Queries

The Cassandra query Language CQL is available here: https://cassandra.apache.org/doc/latest/cql/dml.
html#select. It is inspired from SQL. In the following, express the following queries in CQL:

2.1.1 List of pubications,

2.1.2 List of publications titles,

2.1.3 Booktitle of publications id “series/sci/2008-156”,

2.1.4 Number of “Book” publications,

2.1.5 Number of publications WHERE booktitle is equal to “HICSS”,

2.1.6 Use “ALLOW FILTERING” to execute the query,

2.1.7 The good approach is to create a seconday index on the “booktitle” attribute to be more eﬀicient.
Execute the query again without “ALLOW FILTERING”,

2.1.8 Number of publications where type is “Article” and booktitle is equal to “HICSS”,

2.1.9 Number of authors whose position is equal to 3,

2.1.10 Number of authors whose position is above to 3,

Bonus: The Token Hash Function Give the number of publications for which “token(art_id)” is below 0 (and
above). You can also give the token and arti_id of each publication.

2.2 Complex Queries: Aggregates

Some grouping queries are available on the partitioning key (the complex part of the modelization).

2.2.1 Count the number of publications per author,

2.2.2 Count the number of publications per author when they are in third position,

2.2.3 Try to count the number of authors per position.

Hint: Choose an other partitioning key.

2.2.4 Distribution of positions for author “Oscar Castillo”,

2.3 Complex Queries: Joins & Denormalization

2.3.1 Give authors’ name for publication which title is “Medical imaging archiving: A comparison between several
NoSQL solutions.”. Join between tables publications and authors,

Nicolas Travers Cassandra 5/7

Chapter 2. Querying Cassandra
2.4. Hard Queries: User Define Aggregate functions
2.3.2 There is no way to do a join un CQL1 . A first denormalization step has been done on this dataset with table
“authors_publis”. Try the previous query on this denormalized table.

2.3.3 Give titles and position of publications from “Oscar Castillo”,

2.3.4 Give authors’ name who published with “Oscar Castillo”,

2.3.5 To answer this query, it requires a new denormalization with SET, MAP, LIST, TYPES or TUPLE. Create a
table “publicationsNorm’’ which can insert documents in file “DBLP.json”. An example is given below:

INSERT INTO publicationsNorm JSON

'{"id":"series/cogtech/BrandhermSNL13", "type":"Article", "year":2013,
"title":"A SemProM Use Case: Health Care and Compliance.",
"authors":["Boris Brandherm","Michael Schmitz","Robert Ne?elrath","Frank Lehmann"],
"pages":{"start":349, "end":361}, "booktitle":"SemProM",
"journal":{"series":"", "editor":"", "volume":0, "isbn":[ "" ]},
"url":"db/series/cogtech/364237376.html#BrandhermSNL13", "cites":[ "" ]}';

2.3.6 Once this sample can be inserted, import the whole dataset with this command (in the Docker CLI environ-
ment):

SOURCE '/DBLP.json';

2.3.7 Create an index on attribute ’title’ of this new table,

2.3.8 Give authors’ name for publication “Data Quality” in this new table,

2.3.9 Give the journal’s series of this publication,

2.3.10 Give the pages’ end of this publication,

2.3.11 Give the first author of this publication,

2.3.12 Give title’s publications where authors’ name is “Oscar Castillo”,

2.3.13 Give titles and the starting page of publications which ends at page 99 while using an index,

2.3.14 Give titles of journal series : “Advances in Database Systems”

Comments on Cassandra Cassandra is used to manipulate mostly simple tables or very few denormalization. For
instance, a sensor management system is really interesting with Partitioning Keys like in question 2.3.2.
To model and manipulate tuples, it is preferable to avoid TYPE and TUPLE. Use MAP or LIST to maximize
the queries you wish to apply.

2.4 Hard Queries: User Define Aggregate functions

To agregate values from different rows, we need to create User Define Aggregate functions (UDA). To achieve this,
we need first to activate this functionnality with parameter “enable_user_defined_functions” in Cassandra.
For this:

• Edit2 file “cassandra.yaml” (config Cassandra folder),

• Find “user_define”,

• Modify the parameter with ’true’ value (a space is mandatory between ’:’ and ’true’),
1 Since it is a distributed database, a wide and costly broadcast is necessary and has to be avoided absolutely.
2 To edit in the Docker CLI, use vim or nano (apt-get install if necessary)

Nicolas Travers Cassandra 6/7

Chapter 2. Querying Cassandra
2.4. Hard Queries: User Define Aggregate functions
• Save the config file,
• restart the Cassandra server.

Then, we can create the UDA:

2.4.1 Create the state function:

CREATE OR REPLACE FUNCTION avgState ( state tuple<int,bigint>, val int )

CALLED ON NULL INPUT RETURNS tuple<int,bigint> LANGUAGE java
AS 'if (val !=null) { state.setInt(0, state.getInt(0)+1);
state.setLong(1, state.getLong(1)+val.intValue()); }
return state;';

2.4.2 Create final function:

CREATE OR REPLACE FUNCTION avgFinal ( state tuple<int,bigint> )

CALLED ON NULL INPUT RETURNS double LANGUAGE java
AS 'double r = 0;
if (state.getInt(0) == 0) return null;
r = state.getLong(1);
r/= state.getInt(0);
return Double.valueOf(r);';

2.4.3 Create the UDA function

CREATE AGGREGATE IF NOT EXISTS average ( int )

SFUNC avgState STYPE tuple<int,bigint>
FINALFUNC avgFinal INITCOND (0,0);

2.4.4 Compute the average position of “Oscar Castillo” in his publications,

2.4.5 Idem with the average number of pages,

2.4.6 Create a new UDA to produce an equivalence to ”GROUP BY + COUNT” on textual attributes, like for:

SELECT countGroup(pos) FROM authors_publis;

The type of the “state” parameter must be a “MAP<int, int>”.

Nicolas Travers Cassandra 7/7

8.2. CQL Exercises
100% (1)
8.2. CQL Exercises
16 pages
No SQL
No ratings yet
No SQL
32 pages
All MCQ
No ratings yet
All MCQ
16 pages
L3: Finding Similar Items: Locality Sensitive Hashing
No ratings yet
L3: Finding Similar Items: Locality Sensitive Hashing
54 pages
DBSCAN Clustering Explained in ML
No ratings yet
DBSCAN Clustering Explained in ML
15 pages
OLAP
100% (1)
OLAP
107 pages
Unit I Introduction To Data Science
No ratings yet
Unit I Introduction To Data Science
79 pages
70+ SQL Interview Questions
No ratings yet
70+ SQL Interview Questions
19 pages
DL Lab Manual
No ratings yet
DL Lab Manual
29 pages
358 33 Powerpoint Slides DSC Chapter 15
No ratings yet
358 33 Powerpoint Slides DSC Chapter 15
55 pages
Data Warehousing & Mining Overview
75% (4)
Data Warehousing & Mining Overview
14 pages
Data Warehousing and Data Mining
No ratings yet
Data Warehousing and Data Mining
14 pages
Data Mining Concepts and Techniques Guide
No ratings yet
Data Mining Concepts and Techniques Guide
44 pages
Supervised Learning Essentials
No ratings yet
Supervised Learning Essentials
30 pages
Question For Practice Oracle (11g) With Answers
100% (4)
Question For Practice Oracle (11g) With Answers
32 pages
DATA Replication
No ratings yet
DATA Replication
4 pages
Unit 2
No ratings yet
Unit 2
11 pages
Data Warehousing & BI Essentials
No ratings yet
Data Warehousing & BI Essentials
41 pages
Hadoop Overview and Insights
No ratings yet
Hadoop Overview and Insights
46 pages
Data Mining and Warehousing Overview
No ratings yet
Data Mining and Warehousing Overview
1 page
How To Set Up A Hadoop Cluster in Docker
No ratings yet
How To Set Up A Hadoop Cluster in Docker
13 pages
Daa Sem4
No ratings yet
Daa Sem4
4 pages
CNNs for Image Pattern Recognition
100% (1)
CNNs for Image Pattern Recognition
3 pages
Neo4j:Cypher Query Language (Part-II)
No ratings yet
Neo4j:Cypher Query Language (Part-II)
37 pages
Iv Semester: Data Mining Question Bank: Unit 2 2 Mark Questions)
No ratings yet
Iv Semester: Data Mining Question Bank: Unit 2 2 Mark Questions)
5 pages
DPCO Unit 1 - New
No ratings yet
DPCO Unit 1 - New
78 pages
Computer Arithmetic Operations
No ratings yet
Computer Arithmetic Operations
5 pages
Full Stack UNIT 3
No ratings yet
Full Stack UNIT 3
36 pages
Machine Learning
No ratings yet
Machine Learning
2 pages
Feature Engineering
No ratings yet
Feature Engineering
6 pages
10
No ratings yet
10
4 pages
Modeling With UML: Solutions
No ratings yet
Modeling With UML: Solutions
6 pages
Neo4j Cypher Refcard 4
No ratings yet
Neo4j Cypher Refcard 4
21 pages
Java 8 Features: Lambdas & Streams Guide
No ratings yet
Java 8 Features: Lambdas & Streams Guide
35 pages
Classification Exam Prep
No ratings yet
Classification Exam Prep
9 pages
DMDW Full Notes
No ratings yet
DMDW Full Notes
26 pages
Spark-GraphX and Neo4j
No ratings yet
Spark-GraphX and Neo4j
32 pages
Efficient Sequential Pattern Mining
No ratings yet
Efficient Sequential Pattern Mining
7 pages
Bda - 2 Unit
No ratings yet
Bda - 2 Unit
12 pages
Ocs353 DSF Unit III Notes
No ratings yet
Ocs353 DSF Unit III Notes
11 pages
Understanding Hash Tables and Algorithms
100% (1)
Understanding Hash Tables and Algorithms
30 pages
Airline Reservation System Overview
No ratings yet
Airline Reservation System Overview
33 pages
Concepts and Techniques: - Chapter 4
No ratings yet
Concepts and Techniques: - Chapter 4
50 pages
Hdfs Part 1
No ratings yet
Hdfs Part 1
72 pages
MongoDB-Assignment 3
No ratings yet
MongoDB-Assignment 3
4 pages
DCCN Notes
No ratings yet
DCCN Notes
27 pages
Introduction To ETL in Python: Stefano Francavilla
No ratings yet
Introduction To ETL in Python: Stefano Francavilla
62 pages
Staruml Documentation: Release 2.0.0
No ratings yet
Staruml Documentation: Release 2.0.0
61 pages
Unit 6 Database Progrmming With Nodejs and MongoDB
No ratings yet
Unit 6 Database Progrmming With Nodejs and MongoDB
47 pages
PHP Full Notes
No ratings yet
PHP Full Notes
49 pages
Chapter 3 Exercises
50% (2)
Chapter 3 Exercises
3 pages
Blockchain and Cryptocurrency Course Overview
No ratings yet
Blockchain and Cryptocurrency Course Overview
59 pages
Chapter 3 Class Diagram Solutions
No ratings yet
Chapter 3 Class Diagram Solutions
14 pages
Sales Forecasting with Kernel SVM
No ratings yet
Sales Forecasting with Kernel SVM
6 pages
Classification Algorithms Overview
No ratings yet
Classification Algorithms Overview
206 pages
Top 60 Hadoop Interview Q&A Guide
No ratings yet
Top 60 Hadoop Interview Q&A Guide
3 pages
Chapter 8: Complex Data Solutions
No ratings yet
Chapter 8: Complex Data Solutions
4 pages
Nosql Database Systems: M.Tech. (Iind, Sem Ce/Cn)
100% (1)
Nosql Database Systems: M.Tech. (Iind, Sem Ce/Cn)
135 pages
Understanding Hadoop and MapReduce
No ratings yet
Understanding Hadoop and MapReduce
21 pages
Cassandra TP
No ratings yet
Cassandra TP
10 pages
Storage Devices Overview
No ratings yet
Storage Devices Overview
40 pages
Secure Design & Defense Models Guide
No ratings yet
Secure Design & Defense Models Guide
9 pages
PGP-AIML Career Transition Handbook
No ratings yet
PGP-AIML Career Transition Handbook
23 pages
Buster Hernandez Aka Brian Kil
No ratings yet
Buster Hernandez Aka Brian Kil
38 pages
Abap On Hana Cum S/4 Hana Training
No ratings yet
Abap On Hana Cum S/4 Hana Training
7 pages
Lesson 4 Lec - Data Security Awareness
No ratings yet
Lesson 4 Lec - Data Security Awareness
20 pages
Module One
No ratings yet
Module One
37 pages
SEM 6 Syllabus
No ratings yet
SEM 6 Syllabus
30 pages
Network Security Lecture Notes
No ratings yet
Network Security Lecture Notes
26 pages
Programming Variables Explained
No ratings yet
Programming Variables Explained
39 pages
Oracle WebCenter Interview Questions and Answers
No ratings yet
Oracle WebCenter Interview Questions and Answers
10 pages
EPOI ONU Broadband and VoIP Configuration
No ratings yet
EPOI ONU Broadband and VoIP Configuration
14 pages
Router Set-Up Guide - Technicolor TG588 in Bridge Modem Mode
No ratings yet
Router Set-Up Guide - Technicolor TG588 in Bridge Modem Mode
6 pages
LO1 Basic Network Overview
No ratings yet
LO1 Basic Network Overview
51 pages
GL300A @track Air Interface Protocol R200
No ratings yet
GL300A @track Air Interface Protocol R200
95 pages
TimeSheets Threat Report
No ratings yet
TimeSheets Threat Report
30 pages
Botnet Analysis Report Final Unclassified v2.0
No ratings yet
Botnet Analysis Report Final Unclassified v2.0
195 pages
VMDR Lab Guide for IT Professionals
No ratings yet
VMDR Lab Guide for IT Professionals
59 pages
Cryptography and Network Security Notes
No ratings yet
Cryptography and Network Security Notes
40 pages
Vikas Kumar: Work Experience Skills
No ratings yet
Vikas Kumar: Work Experience Skills
1 page
Internet Surfing and Security
No ratings yet
Internet Surfing and Security
15 pages
Parallel Computing Using MPI
No ratings yet
Parallel Computing Using MPI
17 pages
Report of Sony Picture Attack
No ratings yet
Report of Sony Picture Attack
5 pages
IT Administrator Responsibilities SOP
No ratings yet
IT Administrator Responsibilities SOP
4 pages
Checkpoint Firewall Training
100% (1)
Checkpoint Firewall Training
48 pages
Rising Antivirus: User Guide & Review
No ratings yet
Rising Antivirus: User Guide & Review
10 pages
Enhancing Io TSecurityand Privacyinthe Cloud
No ratings yet
Enhancing Io TSecurityand Privacyinthe Cloud
10 pages
Privacy and Security in The Digital Age Seminar
No ratings yet
Privacy and Security in The Digital Age Seminar
22 pages
OCA Customizing OCPP Implementations
No ratings yet
OCA Customizing OCPP Implementations
23 pages
Cloud Computing for IT Professionals
No ratings yet
Cloud Computing for IT Professionals
33 pages

Cassandra Database Guide

Uploaded by

Cassandra Database Guide

Uploaded by

Cassandra

Advanced Topics on NoSQL databases

Create a database on Cassandra

1.1 Files Transfer

docker cp PATH_TO_FOLDER/DBLP.tar.gz Cassandra:/

tar xzvf DBLP.tar.gz

1.1.4 Open a CQLSH console for the following steps:

1.2 Create Your Keyspace

CREATE KEYSPACE IF NOT EXISTS DBLP

Nicolas Travers Cassandra 2/7

CREATE TABLE publications (

Import data from the CSV file:

CREATE TABLE authors (

Import data from the CSV file:

COPY authors(art_id,author,pos) FROM 'authors.csv' WITH HEADER = true AND DELIMITER=';';

1.3.3 Authors publications

CREATE TABLE authors_publis (

CREATE INDEX btree_authors_publi_type on authors_publis(type);

Nicolas Travers Cassandra 3/7

Nicolas Travers Cassandra 4/7

2.1 CQL : Simple Queries

2.1.1 List of pubications,

2.1.2 List of publications titles,

2.1.3 Booktitle of publications id “series/sci/2008-156”,

2.1.4 Number of “Book” publications,

2.1.5 Number of publications WHERE booktitle is equal to “HICSS”,

2.1.6 Use “ALLOW FILTERING” to execute the query,

2.1.9 Number of authors whose position is equal to 3,

2.1.10 Number of authors whose position is above to 3,

2.2 Complex Queries: Aggregates

2.2.1 Count the number of publications per author,

2.2.3 Try to count the number of authors per position.

2.2.4 Distribution of positions for author “Oscar Castillo”,

2.3 Complex Queries: Joins & Denormalization

Nicolas Travers Cassandra 5/7

2.3.3 Give titles and position of publications from “Oscar Castillo”,

2.3.4 Give authors’ name who published with “Oscar Castillo”,

INSERT INTO publicationsNorm JSON

2.3.7 Create an index on attribute ’title’ of this new table,

2.3.9 Give the journal’s series of this publication,

2.3.10 Give the pages’ end of this publication,

2.3.11 Give the first author of this publication,

2.3.12 Give title’s publications where authors’ name is “Oscar Castillo”,

2.3.14 Give titles of journal series : “Advances in Database Systems”

2.4 Hard Queries: User Define Aggregate functions

• Edit2 file “cassandra.yaml” (config Cassandra folder),

Nicolas Travers Cassandra 6/7

Then, we can create the UDA:

2.4.1 Create the state function:

CREATE OR REPLACE FUNCTION avgState ( state tuple<int,bigint>, val int )

2.4.2 Create final function:

CREATE OR REPLACE FUNCTION avgFinal ( state tuple<int,bigint> )

2.4.3 Create the UDA function

CREATE AGGREGATE IF NOT EXISTS average ( int )

2.4.4 Compute the average position of “Oscar Castillo” in his publications,

SELECT countGroup(pos) FROM authors_publis;

The type of the “state” parameter must be a “MAP<int, int>”.

Nicolas Travers Cassandra 7/7

You might also like