
NoSQL Column Family Database:

Prepared By: Prof. Preeti Sharma


Assistant Professor, CE, Faculty of Technology, Dharmsinh Desai University, Nadiad.
Ph.D. Pursuing CE/IT, LDCE, GTU, Ahmedabad
M.E. IT, LDCE, Ahmedabad
B.E. CE, VGEC, Chandkheda, Ahmedabad
D.E. CE, GPG, Ahmedabad
Column Family NoSQL Database:

● A column family database, such as Apache Cassandra or HBase, is a type of NoSQL database that stores data in a column-oriented format rather than the traditional row-oriented approach of relational databases.
● Columns are grouped into column families. Each column has a name and a value, and columns can be added to or removed from rows dynamically. Each row can have different columns; there is no strict schema.
● Super Columns: Some column family databases (like early versions of Cassandra)
support super columns, which are essentially columns of columns. They provide a
hierarchical structure within a row.
● In Apache Cassandra, a super column is a data structure that helps organize data more hierarchically
within a column family.
● It is a type of column that contains sub-columns (also called a map of columns), which allows for more
complex data storage and retrieval.
● In early versions of Cassandra, super columns were used to model nested structures.

● But they were later deprecated because they were inefficient and hard to manage.

● Modern Cassandra uses composite columns or collections (maps, sets, lists) to represent the same
hierarchical data in a cleaner way.
Introduction to Apache Cassandra:

What is Apache Cassandra?

● Apache Cassandra is an open-source, distributed NoSQL database designed for handling large amounts of data
across many commodity servers without a single point of failure.
● Highly Scalable: Easily add more nodes without downtime.
● Fault Tolerant: Built to handle hardware failures with no downtime.
● Write-Optimized: Optimized for heavy write operations.
● Flexible Schema: Schema-less data model allows for flexible data representation.
● Eventual Consistency: Tunable consistency levels for applications that need high availability and fault
tolerance.

In essence, Cassandra is a great choice for applications that need to store vast amounts of data and ensure it’s always
available, even in the face of hardware failures, but can tolerate eventual consistency rather than strict consistency.
History of Apache Cassandra:
● Cassandra was initially designed at Facebook.
● The design of Cassandra was influenced by Amazon’s Dynamo, a highly available and distributed key-value
store that was used for Amazon’s retail operations. Dynamo provided concepts like eventual consistency,
replication, and partitioning, which Cassandra adopted and expanded on. While Dynamo was a key influence,
Cassandra also integrated ideas from other systems, including Google’s Bigtable and Amazon's S3 for
large-scale storage.

Apache Cassandra was designed to meet these challenges with the following design objectives in mind:
● Full multi-primary database replication
● Global availability at low latency
● Scaling out on commodity hardware
● Linear throughput increase with each additional processor
● Online load balancing and cluster growth
● Partitioned key-oriented queries
● Flexible schema
Features of Apache Cassandra:
Write Optimized:

● Cassandra is optimized for write-heavy workloads. It handles massive write operations efficiently by
using memtables (in-memory structures) and SSTables (on-disk files) to store data in an append-only
fashion.
● Example:
○ Yahoo uses Cassandra to store time-series data related to user behavior and ad analytics. Their
system receives millions of events per second, and Cassandra is ideal for this because of its ability
to efficiently handle high write throughput, ensuring real-time analytics can occur.

Tunable Consistency:

● The Tunable Consistency feature in Apache Cassandra allows you to balance the trade-off between
consistency and availability depending on your application’s requirements. This flexibility is key to
Cassandra's ability to offer high availability and fault tolerance, as it enables users to control how strictly
data consistency is enforced in a distributed system.
Cassandra Query Language (CQL):

● Cassandra Query Language (CQL) Feature: Cassandra provides a SQL-like query language called CQL
(Cassandra Query Language), which is familiar to those who have worked with relational databases but
tailored for Cassandra's NoSQL model.
● Example:
○ Spotify relies on Cassandra to handle real-time user activity data, which includes listening history,
preferences, and user-generated playlists. Since the system needs to support millions of concurrent
users, performance and scalability are essential.

Elastic Scalability:

● Elastic scalability is a core feature of Apache Cassandra that allows the database to scale dynamically
based on workload requirements, without requiring downtime or major changes to the system. This means
that you can add or remove nodes to the cluster as needed, and Cassandra will automatically rebalance
data to accommodate the new configuration.
Why use Apache Cassandra?

The following are some of the important reasons to use Apache Cassandra.

● Apache Cassandra can scale from gigabytes to petabytes.
● Apache Cassandra can achieve linear performance gains by adding nodes to the cluster.
● Replication and data distribution are handled easily.
● Cassandra can be easily deployed on commodity hardware and in the cloud.
● In the Cassandra architecture there is no separate caching layer, because its peer-to-peer architecture removes the need for one.
● Cassandra provides a flexible schema for structured, semi-structured, and unstructured data.
● Cassandra compresses data using Google's Snappy compression algorithm.
Cassandra Data Model:
Components of Apache Cassandra:
Cluster
● The Cassandra cluster is the collection of individual nodes that are connected in a ring and operate together. Each node in the cluster holds replicas of data, and if a node fails, the replicas on other nodes serve its requests. Cassandra manages the ring architecture and distributes tasks among the nodes.

Keyspace
● Keyspace is a fundamental concept that represents a logical container for data. Keyspaces help in logically
organizing data into distinct namespaces. For example, a company might use different keyspaces for different
departments, such as sales_keyspace and marketing_keyspace, to keep their data isolated and manage it
separately.

Column Family
● Cassandra Column Family is the collection of sorted rows which are used to store related data.

Column
● A Cassandra column is the basic data structure; each column has a name, a value, and a timestamp. Column values can be of types such as big integer, text, Boolean, double, float, and so on.
Architecture of Cassandra:
The following is the list of the important components of the Apache Cassandra architecture:

● Cluster
● Data center
● Node
● Commit log
● Mem-table
● SSTable
● Bloom filter
Components of Cassandra Architecture
There are the following components in the Cassandra architecture:

Node
The node is the place where data is stored. It is the basic component of Cassandra.

Data Center
A collection of nodes is called a data center. Many nodes are categorized as a data center.

Cluster
The cluster is the collection of many data centers.

Commit Log
Every write operation is written to the commit log. The commit log is used for crash recovery.

Mem-table
After data is written to the commit log, it is written to the mem-table. Data is held in the mem-table temporarily.

SSTable
When the mem-table reaches a certain threshold, data is flushed to an SSTable disk file.
● Cassandra is designed such that it has no master
or slave nodes.
● It has a ring-type architecture, that is, its nodes
are logically distributed like a ring.
● Data is automatically distributed across all the
nodes.
● Similar to HDFS, data is replicated across the
nodes for redundancy.
● Data is kept in memory and lazily written to the
disk.
● Hash values of the keys are used to distribute the
data among nodes in the cluster.
● In a ring architecture, each node is assigned a
token value.
Additional features of Cassandra architecture are:

● Cassandra architecture supports multiple data centers.
● Data can be replicated across data centers.
● You can keep three copies of data in one data center and a fourth copy in a remote data center for remote backup.
● Data reads prefer a local data center to a remote data center.
Rack:
The term ‘rack’ is usually used when explaining network
topology.
A rack is a group of machines housed in the same physical
box.
Each machine in the rack has its own CPU, memory, and hard
disk. However, the rack has no CPU, memory, or hard disk of
its own.

Features of racks are:


● All machines in the rack are connected to the
network switch of the rack
● The rack’s network switch is connected to the
cluster.
● All machines on the rack have a common power
supply. It is important to notice that a rack can fail
due to two reasons: a network switch failure or a
power supply failure.
● If a rack fails, none of the machines on the rack can
be accessed. So it would seem as though all the
nodes on the rack are down.
Memtable:
● A memtable is an in-memory data structure where Cassandra initially writes new data.
● It acts as a write buffer before the data is eventually flushed to disk.
● By writing to memory, Cassandra can quickly handle write requests without immediately performing disk I/O,
which is much slower.
● It reduces the number of disk writes by batching multiple updates into a single flush operation.
● When a write request arrives, the data is first written to the memtable.
● Memtables are typically implemented as sorted in-memory structures (for example, red-black trees or skip lists), allowing efficient in-memory queries and updates.
● When the memtable reaches a certain size or age, it is flushed to disk to create SSTable files.
SSTable (Sorted String Table)

● SSTable is a persistent, immutable data file format used to store data on disk. Once data is written to an
SSTable, it is not modified, ensuring efficient and reliable storage.
● SSTables store the data that has been flushed from the memtable, ensuring that data is not lost even if the
system restarts.
● During the flush operation, data from the memtable is sorted and written to an SSTable. This process involves
creating multiple files:
● Data File: Contains the actual data rows.
● Index File: Allows quick lookups by storing indexes that map keys to their positions in the data file.
● Bloom Filter: An auxiliary data structure used to quickly check if a key is present in the SSTable, reducing
unnecessary disk reads.
● Compaction: Over time, as more SSTables are created, they are periodically compacted. Compaction merges
multiple SSTables into a smaller number of larger SSTables, improving read performance and reclaiming disk
space.
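To make the write path concrete, here is a minimal Python sketch of the memtable-to-SSTable idea: writes land in an in-memory sorted structure and are flushed to immutable "SSTables" once a threshold is reached. The class name and flush threshold are illustrative, not Cassandra internals.

class Memtable:
    """Illustrative only: a toy memtable that flushes to immutable 'SSTables'."""
    def __init__(self, flush_threshold=3):
        self.data = {}                 # in-memory writes: key -> (value, timestamp)
        self.flush_threshold = flush_threshold
        self.sstables = []             # flushed, immutable, sorted snapshots

    def write(self, key, value, timestamp):
        self.data[key] = (value, timestamp)      # fast, purely in-memory write
        if len(self.data) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by key and "write" an immutable SSTable-like snapshot.
        self.sstables.append({k: self.data[k] for k in sorted(self.data)})
        self.data = {}                 # start a fresh memtable

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.data:
            return self.data[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

mt = Memtable()
mt.write("user:1", "Alice", 100)
mt.write("user:2", "Bob", 101)
mt.write("user:3", "Cara", 102)        # third write triggers a flush
print(mt.read("user:1"))               # ('Alice', 100), served from the "SSTable"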
Working of Bloom Filter:
● Bloom filter is stored as part of each SSTable (Sorted String Table) on disk.
● Each SSTable has its own Bloom filter. When an SSTable is created, a Bloom Filter is also created for that
SSTable.
● Bloom filter is used to quickly determine whether a data item is likely to be in a specific SSTable or not.
● It can return two types of results:
○ Definitely not present (no false negatives)
○ Possibly present (with a certain probability of false positives)
How Bloom Filters are Used in Cassandra?

● Each element is hashed by multiple hash functions to generate several bit positions in the bit array.

Why multiple Hash functions are used for the same record?
Bloom filters are a probabilistic data structure that allows Cassandra to determine one of two possible states: the data definitely does not exist in the given file, or the data probably exists in the given file.

The primary goal of using multiple hash functions is to reduce the false positive rate, which is the probability that
the Bloom filter incorrectly indicates that an element is present when it is not.

● Single Hash Function: If only one hash function is used, each element is mapped to a single bit in the bit
array. This can lead to high false positive rates, especially if the bit array is relatively small compared to the
number of elements.
● Multiple Hash Functions: By using multiple hash functions, each element is mapped to several bits in the bit
array. This spreads the bits more evenly across the array, reducing the likelihood of collisions and thus the
false positive rate.
● For a given element (like key "A"), each hash function produces a different bit position in the bit array. These
positions are used to set the corresponding bits to 1.
Insertion into the Bloom Filter

Suppose we want to insert customer IDs into a Bloom Filter.

Let’s use a Bloom Filter with:

● A bit array of size 10 bits (for simplicity).


● 3 hash functions.

Let’s insert the customer IDs "123", "456", and "789" into the Bloom Filter.
Hashing and Setting Bits:

1. Insert ID "123": Based on three different Hash functions, the key is hashed.
○ H1("123") -> Index 2
○ H2("123") -> Index 5
○ H3("123") -> Index 8
○ Set bits at indices 2, 5, and 8 to 1 in the Bloom Filter.
2. Insert ID "456":
○ H1("456") -> Index 1
○ H2("456") -> Index 4
○ H3("456") -> Index 7
○ Set bits at indices 1, 4, and 7 to 1 in the Bloom Filter.
3. Insert ID "789":
○ H1("789") -> Index 0
○ H2("789") -> Index 3
○ H3("789") -> Index 6
○ Set bits at indices 0, 3, and 6 to 1 in the Bloom Filter.

Assuming the Bloom Filter bit array looks like this after inserting all three IDs: [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
Querying the Bloom Filter

Let’s check if a new customer ID, say "1234", is in the Bloom Filter.

Hashing "1234":

1. H1("1234") -> Index 2


2. H2("1234") -> Index 5
3. H3("1234") -> Index 8

We check the bits at indices 2, 5, and 8 in the Bloom Filter:

● Bit at Index 2: 1
● Bit at Index 5: 1
● Bit at Index 8: 1

Since all these bits are set to 1, the Bloom Filter indicates that "1234" might be in the set. However, it’s possible
that "1234" is not actually in the set, so a disk read might still be necessary to confirm its presence.
False Positive Example

To illustrate a false positive, suppose we check for an ID "0000" that was never added. The hash functions might
produce:

1. H1("0000") -> Index 2


2. H2("0000") -> Index 5
3. H3("0000") -> Index 8

The Bloom Filter indicates indices 2, 5, and 8 are all set to 1, so it will incorrectly tell us that "0000" might be in
the set, even though it was not. This is a false positive.
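The behaviour described above can be sketched in a few lines of Python. This toy Bloom filter derives its hash functions by salting a single MD5 digest; real Cassandra Bloom filters use different hashing and sizing, so treat this purely as an illustration.

import hashlib

class BloomFilter:
    def __init__(self, size=10, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def _positions(self, key):
        # Derive several "hash functions" from one digest by salting with i.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False -> definitely absent; True -> possibly present (may be a false positive).
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for customer_id in ["123", "456", "789"]:
    bf.add(customer_id)
print(bf.might_contain("123"))    # True (it was inserted)
print(bf.might_contain("0000"))   # False, or True if its bits happen to collide (false positive)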
Exercise:
Assume we have a Bloom filter with a bit array of size 10
(i.e., 10 bits, all initialized to 0). It uses following two hash
functions.

● Hash Function 1: h1(x) = (sum of ASCII values of


characters in x) % 10
● Hash Function 2: h2(x) = (product of ASCII values
of characters in x) % 10

Words to insert: "cat", "dog", and "fish".

Query the Bloom filter for the words "rat" and "cat".
Solution:
Insertion Process

1. Insert "cat"

● Hash Function 1:
○ ASCII values: c = 99, a = 97, t = 116
○ Sum: 99 + 97 + 116 = 312
○ h1("cat") = 312 % 10 = 2
● Set bit at index 2 to 1.
● Hash Function 2:
○ Product: 99 * 97 * 116 = 1127884
○ h2("cat") = 1127884 % 10 = 4
● Set bit at index 4 to 1.

Bit Array: [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]


Insert "dog" Insert "fish"

● Hash Function 1: ● Hash Function 1:


○ ASCII values: d = 100, o = 111, g ○ ASCII values: f = 102, i = 105, s =
= 103 115, h = 104
○ Sum: 100 + 111 + 103 = 314 ○ Sum: 102 + 105 + 115 + 104 = 426
○ h1("dog") = 314 % 10 = 4 ○ h1("fish") = 426 % 10 = 6
● Set bit at index 4 to 1 (already set). ● Set bit at index 6 to 1.
● Hash Function 2: ● Hash Function 2:
○ Product: 100 * 111 * 103 = ○ Product: 102 * 105 * 115 * 104 =
1144300 12875040
○ h2("dog") = 1144300 % 10 = 0 ○ h2("fish") = 12875040 % 10 = 0
● Set bit at index 0 to 1. ● Set bit at index 0 to 1 (already set).
Updated bit array: [1, 0, 1, 0, 1, 0, 0, 0, Updated bit array: [1, 0, 1, 0, 1, 0, 1, 0, 0,
0, 0] 0]

Query Example for False Positive

Suppose we query the Bloom filter for the word "rat":

● Hash Function 1:
○ Sum of ASCII values: 114 + 97 + 116 = 327
○ h1("rat") = 327 % 10 = 7
○ Check bit at index 7: It is 0.
● Hash Function 2:
○ Product of ASCII values: 114 * 97 * 116 = 1297732
○ h2("rat") = 1297732 % 10 = 2
○ Check bit at index 2: It is 1.
● Result: Since one of the bits (index 7) is 0, the Bloom filter correctly identifies that "rat" is definitely
not in the filter (true negative).
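A few lines of Python can verify the exercise, using exactly the two toy hash functions defined above:

def h1(word):                       # sum of ASCII values mod 10
    return sum(ord(c) for c in word) % 10

def h2(word):                       # product of ASCII values mod 10
    product = 1
    for c in word:
        product *= ord(c)
    return product % 10

bits = [0] * 10
for word in ["cat", "dog", "fish"]:
    bits[h1(word)] = 1
    bits[h2(word)] = 1

print(bits)                         # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
for query in ["rat", "cat"]:
    hit = bits[h1(query)] == 1 and bits[h2(query)] == 1
    print(query, "possibly present" if hit else "definitely not present")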
Easy Data Distribution
Transparent Distribution of Data:
● Cassandra ensures that data is evenly distributed across all nodes without the user worrying about node locations.
● It uses the following concepts to implement transparent distribution:
○ Partitioning
○ Partition Key
○ Hashing
○ Token ring and Virtual Nodes (VNodes)
○ Replication
Data Partitioning and Hashing:
Partitioning:

Data is divided into partitions based on a partition key. Each partition is placed on nodes in the cluster.

Partition Key

The partition key decides where the data lives in the cluster. Rows with the same partition key go to the same node.

Hashing

● Cassandra applies a hash function (Murmur3 by default) on the partition key to convert it into a token.
This token determines the position of the data in the token ring.
Token Ring & Virtual Nodes (VNodes)

● The cluster is visualized as a ring of token ranges.


● Each node owns multiple token ranges (via VNodes), ensuring even distribution.
● Example: If the ring has token values from -2^63 to 2^63 - 1 (the Murmur3 range), each node owns a share of this range.

Replication

To ensure high availability, data is replicated across multiple nodes. The number of copies is controlled by the
Replication Factor (RF).
Example: With RF = 3, each piece of data is stored on 3 different nodes.
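A rough Python sketch of the partitioning idea follows: a partition key is hashed to a token, the first vnode clockwise from that token identifies the owning node, and RF - 1 further distinct nodes hold the replicas. MD5 stands in for Murmur3 here, and the node and vnode names are made up for illustration.

import bisect
import hashlib

def token(value):
    # Stand-in for Murmur3: map any value to a 64-bit integer token.
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16) % (2 ** 64)

nodes = ["node1", "node2", "node3", "node4"]
# Each node owns several vnode tokens scattered around the ring.
ring = sorted((token(f"{n}:vnode{i}"), n) for n in nodes for i in range(4))
tokens = [t for t, _ in ring]

def replicas(partition_key, rf=3):
    t = token(partition_key)
    i = bisect.bisect(tokens, t)          # first vnode clockwise from the key's token
    owners = []
    while len(owners) < rf:               # walk clockwise, skipping repeated nodes
        node = ring[i % len(ring)][1]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners

print(replicas("user_42", rf=3))   # e.g. ['node3', 'node1', 'node4'] -- three distinct replicas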
Example: Social Media Application

Imagine you’re building a social media application where users can post status updates. You need to design a Cassandra table to store
these status updates efficiently.

You create a table called status_updates to store the status updates for each user:

CREATE TABLE status_updates (


user_id UUID,
post_id UUID,
content TEXT,
created_at TIMESTAMP,
PRIMARY KEY (user_id, post_id)
);

How the Partition Key Works

1. Data Distribution: When you insert a new status update, Cassandra uses the user_id to determine where to store the data. The
user_id is hashed to produce a token, and Cassandra uses this token to map the data to a specific node in the cluster.
2. Grouping Data: All status updates for a given user_id will be stored on the same node. This means that when you query status
updates for a specific user, Cassandra can efficiently fetch all posts from the node where they are stored.
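Assuming a locally running Cassandra node, a keyspace (hypothetically named social here) containing the table above, and the DataStax Python driver installed (pip install cassandra-driver), usage might look roughly like this:

import uuid
from datetime import datetime, timezone
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])           # assumption: a local single-node cluster
session = cluster.connect("social")        # assumption: keyspace "social" already exists

user_id = uuid.uuid4()
session.execute(
    "INSERT INTO status_updates (user_id, post_id, content, created_at) "
    "VALUES (%s, %s, %s, %s)",
    (user_id, uuid.uuid4(), "Hello Cassandra!", datetime.now(timezone.utc)),
)

# All posts for this user live in one partition, so this read targets one node's data.
for row in session.execute(
    "SELECT post_id, content FROM status_updates WHERE user_id = %s", (user_id,)
):
    print(row.post_id, row.content)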
Virtual Nodes:
● Virtual nodes in a Cassandra cluster are also called vnodes.
● Vnodes can be defined for each physical node in the cluster.
● Each node in the ring can hold multiple virtual nodes.
● By default, each node has 256 virtual nodes.
● Virtual nodes help achieve finer granularity in the partitioning of
data, and data gets partitioned into each virtual node using the
hash value of the key.
● On adding a new node to the cluster, the virtual nodes on it get
equal portions of the existing data. So there is no need to
separately balance the data by running a balancer.
● The image depicts a cluster with four physical nodes. Each physical node in the cluster has four virtual nodes, so there are 16 vnodes in the cluster.
Why do we need Virtual Nodes?
Without VNodes (Single Large Range)

● Each node owns one big continuous token range.


● Suppose Node 3 has token range 1000–2000.
● If Node 3 fails, all the data for that range must be rebuilt.

By Cassandra’s replication rules, usually only one neighbor (say Node 4) has the replica for that entire range.
So Node 4 must do all the repair work → slow + heavy load.
With VNodes (Many Small Ranges per Node)

● Each node owns many small token ranges (e.g., 256 random ranges across the ring).
● Example: Node 3 might own ranges like 100–200, 800–900, 1500–1600, etc.
● These small ranges overlap with different neighbors, not just one.

So when Node 3 fails:

● Range 100–200 might be replicated on Node 1 → Node 1 helps recover it.


● Range 800–900 might be replicated on Node 5 → Node 5 helps recover it.
● Range 1500–1600 might be replicated on Node 7 → Node 7 helps recover it.

Instead of one node doing 100% of the repair, the workload is spread across multiple neighbors.
Token Generator:
● The token generator is used in Cassandra versions earlier than version 1.2 to assign a token to each node in the cluster.
In these versions, there was no concept of virtual nodes and only physical nodes were considered for distribution of
data.
● The token generator tool is used to generate a token for each node in the cluster based on the data centers and number
of nodes in each data center.
● A token in Cassandra is a 127-bit integer assigned to a node. Data partitioning is done based on the token of the nodes
as described earlier in this lesson.
● Starting from version 1.2 of Cassandra, vnodes are also assigned tokens and this assignment is done automatically so
that the use of the token generator tool is not required.
Example of Token Generator:

A token generator is an interactive tool which generates tokens for the topology specified. Let us now look at an
example in which the token generator is run for a cluster with 2 data centers.

● Type token-generator on the command line to run the tool.


● A question is asked next: “How many data centers will participate in this cluster?” In the example, specify 2
as the number of data centers and press enter.
● Next, the question: “How many nodes are in data center number 1?” is asked. Type 5 and press enter.
● The next question is: “How many nodes are in data center number 2?” Type 4 and press enter.
● The example shows the token numbers being
generated for 5 nodes in data center 1 and 4
nodes in data center 2.
● The first node always has the token value as 0.
● These token numbers will be copied to the
Cassandra.yaml configuration file for each
node.

The following diagram depicts a four-node cluster with token values of 0, 25, 50, and 75.

For a given key, a hash value is generated in the range of 1 to 100. Keys with hash values in the range 1 to 25 are stored on
the first node, 26 to 50 are stored on the second node, 51 to 75 are stored on the third node, and 76 to 100 are stored on the
fourth node. Please note that actual tokens and hash values in Cassandra are 127-bit positive integers.
Fault Tolerance:
Replication in Cassandra:
● One piece of data can be replicated to multiple
(replica) nodes, ensuring reliability and fault
tolerance. Cassandra supports the notion of a
replication factor (RF), which describes how many
copies of your data should exist in the database. So
far, our data has only been replicated to one replica
(RF = 1).
● Cassandra allows replication based on nodes, racks,
and data centers, unlike HDFS that allows
replication based on only nodes and racks.
● Replication across data centers guarantees data
availability even when a data center is down.
SimpleStrategy in Cassandra

● SimpleStrategy is used when you have just one data center.
● SimpleStrategy places the first replica on the node selected by the partitioner.
● After that, the remaining replicas are placed in the clockwise direction in the node ring.
NetworkTopologyStrategy in Cassandra:

● NetworkTopologyStrategy is used when you have more than one data center.
● In NetworkTopologyStrategy, replicas are set for each data center separately.
● NetworkTopologyStrategy places replicas in the clockwise direction in the ring until it reaches the first node in another rack.
● This strategy tries to place replicas on different racks in the same data center, because a failure or problem can sometimes occur in a rack; replicas on nodes in other racks can then provide the data.
● Here is the pictorial representation of the NetworkTopologyStrategy:
● In the following figure, we can see that the data is replicated twice, indicated in red and blue.
Network Topology:
Network topology refers to how the nodes, racks and data centers in a cluster are organized. You can specify a
network topology for your cluster as follows:

● Specify it in the cassandra-topology.properties file.


● Your data centers and racks can be specified for each node in the cluster.
● Specify <ip-address>=<data center>:<rack name>.
● For unknown nodes, a default can be specified.
● You can also specify the hostname of the node instead of an IP address.
The following diagram depicts an example of a topology configuration file

● This file shows the topology defined for


four nodes.
● The node with IP address 192.168.1.100 is
mapped to data center DC1 and is present
on the rack RAC1.
● The node with IP address 192.168.2.200 is
mapped to data center DC2 and is present
on the rack RAC2.
● Similarly, the node with IP address
10.20.114.10 is mapped to data center DC2
and rack RAC1 and the node with IP
address 10.20.114.11 is mapped to data
center DC2 and rack RAC1.
● There is also a default assignment of data
center DC1 and rack RAC1 so that any
unassigned nodes will get this data center
and rack.
Snitches:
● In Apache Cassandra, a snitch is a component that helps Cassandra understand the network topology of the
cluster.
● The snitch provides information about the location of nodes (e.g., data centers and racks) so that Cassandra
can make informed decisions about data replication, read and write operations, and failure handling.
● Snitches define the topology in Cassandra. A snitch defines a group of nodes into racks and data centers.
● Two types of snitches are most popular:
○ Simple Snitch - A simple snitch is used for single data centers with no racks.
○ Property File Snitch - A property file snitch is used for multiple data centers with multiple racks.
PropertyFileSnitch:

Configuration File: PropertyFileSnitch relies on a properties file (cassandra-topology.properties) to define the data center and rack for each node. This file maps IP addresses or hostnames of nodes to their respective data centers and racks.
File Location and Format: The properties file is typically located in the Cassandra configuration directory (e.g., /etc/cassandra/). It has a simple key-value format where each line specifies the data center and rack information for a node.

Example:
# cassandra-topology.properties
192.168.1.1=dc1:rack1
192.168.1.2=dc1:rack1
192.168.2.1=dc2:rack2

In this example:

● 192.168.1.1 and 192.168.1.2 are in dc1 (data center 1) and rack1 (rack 1).
● 192.168.2.1 is in dc2 (data center 2) and rack2 (rack 2).
Determine Node Location:

● Snitches help Cassandra determine which data center and rack a node belongs to.

Data Replication:

● By understanding the topology, snitches enable Cassandra to place replicas of data in different data centers and
racks, enhancing data availability and fault tolerance.

Read and Write Routing:

● Snitches help route read and write requests to the most appropriate nodes based on their location. This
minimizes latency and balances the load across the cluster.

Failure Detection:

● Snitches provide information that helps Cassandra detect and handle node and data center failures effectively.
This ensures that data remains available even if some nodes or data centers fail.
Gossip Protocol:
Cassandra has no master node.
Every node must:

1. Know about all other nodes in the cluster.


2. Detect failures quickly.
3. Share schema, state, and location info.

Without gossip, Cassandra wouldn’t know how to:

● Route a request to the right node.


● Replicate data across datacenters.
● Handle node failures gracefully.
How Gossip Works
● Gossip runs every second on each node.
● Each node randomly contacts up to three other nodes and exchanges information.
● This information is merged and quickly spreads across the cluster → eventually, all nodes know about everyone else (like gossip spreading in real life).

What Information Gossip Shares


1. Node state → Alive, dead, or joining.
2. Token ranges → Which tokens a node is responsible for.
3. Schema version → To keep schema synchronized across nodes.
4. Datacenter & Rack info (via snitches).
5. Application state → if the node is bootstrapping or decommissioning.
● In step 1, one node connects to three other nodes.
● In step 2, each of the three nodes connects to three other nodes, thus connecting to nine nodes in total in step 2.
● So a total of 13 nodes are connected in 2 steps
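The fan-out effect can be illustrated with a toy simulation. This is a sketch only; real gossip exchanges versioned endpoint state rather than a single "informed" flag.

import random

def gossip_rounds(total_nodes=100, fanout=3, seed_node=0):
    informed = {seed_node}
    rounds = 0
    while len(informed) < total_nodes:
        newly = set()
        for node in informed:
            # Each informed node contacts a few random peers this round.
            newly.update(random.sample(range(total_nodes), fanout))
        informed |= newly
        rounds += 1
    return rounds

print(gossip_rounds())   # typically only a handful of rounds for 100 nodes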
Seed Nodes:
Cluster Discovery: When a new node starts up, it needs to discover other nodes in the cluster to join and start
functioning. The seed nodes are pre-configured in the new node’s settings, allowing it to contact these nodes to get
information about the existing cluster.

Initial Contact: The seed node helps in the initial discovery phase. The new node contacts the seed node to get
information about the rest of the cluster. The seed node then provides information about other nodes in the cluster,
allowing the new node to connect to them.

Not a Unique Role: It’s worth noting that seed nodes are not special or unique in terms of cluster operations; they
don't have any additional responsibilities beyond this initial discovery process. Once the new node has connected to
other nodes in the cluster, it doesn’t need to rely on the seed node anymore for cluster operations.

Configuration: In the cassandra.yaml configuration file, you specify the list of seed nodes under the
seed_provider configuration. It’s a good practice to list more than one seed node to provide redundancy. This way, if
one seed node is unavailable, the new node can still discover and join the cluster via another seed node.

Best Practices: To ensure high availability and reliability, you should configure multiple seed nodes (typically 2-3)
across different machines or data centers if possible. This way, the failure of a single node will not prevent new
nodes from joining the cluster.
Seed Nodes:

Seed nodes are used to bootstrap the gossip protocol. The features of
seed nodes are:

● They are specified in the configuration file Cassandra.yaml.


● Seed nodes are used for bootstrapping the gossip protocol
when a node is started or restarted.
● They are used to achieve a steady state where each node is
connected to every other node but are not required during the
steady state.

In the figure, on startup, the two new nodes connect to the two nodes specified as seed nodes. Once all four nodes are connected, seed node information is no longer required, as the steady state has been achieved.
Configurations:
● The main configuration file in Cassandra is the cassandra.yaml file. We will look at this file in more detail in the lesson on installation.
● Right now, let us remember that this file contains the name of the cluster, the seed nodes for this node, topology file information, and the data file location.
● This file is located in /etc/cassandra in some installations and in /etc/cassandra/conf in others.
Efficient Writes in Cassandra:
Cassandra Write Process:
The Cassandra write process ensures fast writes. Steps in the Cassandra write process are:

1. Data is written to a commitlog on disk.


2. The data is sent to a responsible node based on the hash value.
3. Nodes write data to an in-memory table called memtable.
4. From the memtable, data is periodically flushed to an SSTable on disk. SSTable stands for Sorted String Table; together, the SSTables hold the consolidated data of all the updates to the table.
5. Reads are served by merging data from the memtable and the SSTables.
6. If the responsible node is down, the write is stored temporarily on another node as a hint and handed off to the responsible node when it comes back alive.
● When a write operation (like an insert or update) is performed, the data is first written to the commit log before
being written to the in-memory data structure (MemTable) and eventually to the disk as an SSTable. This ensures
that in case of a crash or power failure, Cassandra can recover the data by replaying the commit log.
● Write-Ahead Logging: The commit log acts as a write-ahead log. This means it records the changes that will be
made to the database before they are applied to the data structures that are more directly used by queries. This
approach guarantees that no writes are lost, even if the system crashes right after the write operation.
Write Consistency Levels:
● ANY: A write is considered successful if it is written to at least one node. This level provides the highest
availability but the lowest consistency, as it might not wait for the write to be recorded in the commit log or
the MemTable.

● ONE: A write is successful if it is acknowledged by at least one replica node. This level offers a good balance
between availability and consistency but does not guarantee that all replicas will have the latest data.

● QUORUM: A write is successful if a majority (quorum) of the replica nodes acknowledge the write. For a replication factor (RF) of N, the quorum is ⌊N/2⌋ + 1; for instance, if RF is 3, then a quorum is 2 (see the sketch after this list). This level provides a strong consistency guarantee while still maintaining good availability.

● ALL: A write is considered successful only if all replica nodes acknowledge it. This level provides the
highest consistency but at the cost of availability, as the write will fail if even a single node is down or
unreachable.
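For reference, the quorum size can be computed as a one-liner:

def quorum(rf):
    # Quorum size for a given replication factor: floor(RF / 2) + 1.
    return rf // 2 + 1

for rf in (1, 2, 3, 5):
    print(f"RF={rf} -> quorum={quorum(rf)}")   # prints 1, 2, 2, 3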
How are write requests accomplished?

● The coordinator sends a write request to all replicas that own the row being written.
● As long as all replica nodes are up and available, they will get the write regardless of the consistency level
specified by the client.
● The write consistency level determines how many replica nodes must respond with a success acknowledgment
in order for the write to be considered successful.
● Success means that the data was written to the commit log and the memtable as described in how data is
written.
● The coordinator node forwards the write to replicas of that row, and responds to the client once it receives write
acknowledgments from the number of nodes specified by the consistency level.
Exceptions:
○ If the coordinator cannot write to enough replicas to meet the requested consistency level, it throws an
Unavailable Exception and does not perform any writes.
○ If there are enough replicas available but the required writes don't finish within the timeout window, the
coordinator throws a Timeout Exception.
● For example, in a single datacenter
10-node cluster with a replication
factor of 3, an incoming write will go
to all 3 nodes that own the requested
row.
● If the write consistency level specified
by the client is ONE, the first node to
complete the write responds back to
the coordinator, which then proxies the
success message back to the client.
● A consistency level of ONE means
that it is possible that 2 of the 3
replicas can miss the write if they
happen to be down at the time the
request is made.
● If a replica misses a write, the row is made consistent later using one of the built-in repair mechanisms: hinted handoff, read repair, or anti-entropy node repair.

Single datacenter cluster with 3 replica nodes and consistency set to ONE
Hinted Handoff:
● Hinted Handoff is a mechanism in Apache Cassandra designed to handle temporary failures or
unavailability of nodes during write operations.
● It helps maintain data consistency and availability by ensuring that writes are eventually
propagated to all replicas, even if some of them are temporarily down.
How Hinted Handoff Works?
● On occasion, a node becomes unresponsive while data is being written. Reasons for unresponsiveness are
hardware problems, network issues, or overloaded nodes that experience long garbage collection (GC)
pauses.
● By design, hinted handoff inherently allows Cassandra to continue performing the same number of writes
even when the cluster is operating at reduced capacity.
● After the failure detector marks a node as down, missed writes are stored by the coordinator for a period of
time, if hinted handoff is enabled in the cassandra.yaml file.
● In Cassandra 3.0 and later, the hint is stored in a local hints directory on each node for improved replay.
● The hint consists of a target ID for the downed node, a hint ID that is a time UUID for the data, a message
ID that identifies the Cassandra version, and the data itself as a blob.
● Hints are flushed to disk every 10 seconds, reducing the staleness of the hints.
● When gossip discovers that a node has come back online, the coordinator replays each remaining hint to write the data to the newly returned node, then deletes the hint file.
● If a node is down for longer than max_hint_window_in_ms (3 hours by default), the coordinator stops
writing new hints.
How Hinted Handoff Works

1. Write Operation: When a write request is made, Cassandra writes the data to the appropriate nodes based on
the replication strategy.
2. Node Failure: If a node is down or unreachable, the other nodes will store a hint for the write operation that
was meant for the unavailable node.

Hint Logs are files or data structures in Cassandra where hints about write operations are stored. These hints
are kept temporarily to ensure that write operations are not lost when a node is down or unreachable during
the write.

3. Hint Storage: These hints are stored on the nodes that successfully received the write request.
4. Node Recovery: When the previously unavailable node comes back online, the nodes that stored the hints will
send these hints to the recovered node.
5. Replay Hint: The recovered node then replays the hints and applies the write operations, ensuring it
eventually gets all the updates.
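The steps above can be mocked up in a small Python sketch. The Node and Coordinator classes are invented purely for illustration and are not Cassandra APIs.

class Node:
    def __init__(self, name):
        self.name, self.up, self.data = name, True, {}

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []                              # (target_node, key, value) tuples

    def write(self, key, value):
        acks = 0
        for node in self.replicas:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                self.hints.append((node, key, value))  # remember the missed write
        return acks                                   # compared against the consistency level

    def replay_hints(self):
        remaining = []
        for node, key, value in self.hints:
            if node.up:
                node.data[key] = value                # deliver the missed write
            else:
                remaining.append((node, key, value))
        self.hints = remaining

a, b, c = Node("A"), Node("B"), Node("C")
c.up = False                                          # node C is temporarily down
coord = Coordinator([a, b, c])
print(coord.write("row-K", "v1"))                     # 2 acks: ONE or QUORUM (RF=3) is still met
c.up = True
coord.replay_hints()                                  # hint delivered once C is back
print(c.data)                                         # {'row-K': 'v1'}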
● For example, consider a cluster consisting of three nodes, A, B, and C, with a replication factor of 2.
● When a row K is written to the
coordinator (node A in this case),
even if node C is down, the
consistency level of ONE or
QUORUM can be met.
● Why? Both nodes A and B will
receive the data, so the
consistency level requirement is
met.
● A hint is stored for node C and
written when node C comes up. In
the meantime, the coordinator can
acknowledge that the write
succeeded.
Cassandra Read Process:
● The Cassandra read process ensures fast reads. Read happens across all nodes in parallel.
● If a node is down, data is read from the replica of the data.
● Priority for the replica is assigned on the basis of distance.

● Features of the Cassandra read process are:


○ Data on the same node is given first preference and is considered data local.
○ Data on the same rack is given second preference and is considered rack local.
○ Data on the same data center is given third preference and is considered data center local.
○ Data in a different data center is given the least preference.

Data in the memtable and sstable is checked first so that the data can be retrieved faster if it is already in memory.
The example cluster has two data centers:

● data center 1
● data center 2

Data center 1 has two racks, while data center 2 has three racks. Fifteen nodes are distributed across this cluster, with nodes 1 to 4 on rack 1, nodes 5 to 7 on rack 2, and so on.
Example of Cassandra Read Process:
The diagram explains the Cassandra read process in a cluster with two data centers, five racks, and 15 nodes.

Data row 1 is a row of data with four replicas.

● The first copy is stored on node 3


● The second copy is stored on node 5
● The third copy is stored on node 7

All these nodes are in data center 1. The fourth copy is stored on node 13 of data center 2.

If a client process running on node 7 wants to access data row 1, node 7 is given the highest preference as the data is local there. The next preference is node 5, where the data is rack local. The next preference is node 3, where the data is on a different rack but within the same data center.

The least preference is given to node 13 that is in a different data center. So the read process preference in this example is node 7, node
5, node 3, and node 13 in that order.
Read Consistency Levels:

● ONE: A read is considered successful if at least one replica node returns the data. This level is quick but does
not guarantee the most recent data, as it might not be synchronized with all replicas.
● QUORUM: A read is successful if a majority of replica nodes return the data. This level ensures that the data
is recent and consistent across the majority of nodes, providing a good balance between consistency and
performance.
● ALL: A read is successful only if all replica nodes return the data. This level ensures the highest consistency
but can be slower and less available if some nodes are down.
● LOCAL_QUORUM: This is similar to the QUORUM consistency level but applies only to replicas in the
same data center. It helps reduce cross-data-center latency while still ensuring consistency.
● EACH_QUORUM: A read or write operation is successful if a quorum of nodes in each data center (in a
multi-data-center setup) acknowledges the operation. This ensures consistency across all data centers.
SERIAL and LOCAL_SERIAL Consistency Levels:
SERIAL = Cassandra checks across the whole cluster (all datacenters) to make sure the read/write you’re doing
with an IF condition is the most up-to-date and correct.
Global correctness, but slower.

LOCAL_SERIAL = Cassandra only checks inside your local datacenter for the most up-to-date value.
Faster, but might not include very recent changes made in another datacenter.

Example: Library Book Borrowing

Imagine Cassandra is a library system spread across 2 datacenters:

● Datacenter A = Delhi branch


● Datacenter B = Mumbai branch

You want to borrow a book, but only if nobody else has already borrowed it.
Example 1: Serial Consistency

Objective: Ensure that a user profile update is consistent across all data centers and appears in a strict order.

1. Application Requirement: You have a user profile with an account_balance field. When a user makes a
purchase, you need to update their account balance.
2. Operation: You want to ensure that the update to account_balance happens in a strictly serialized manner
across all replicas, irrespective of which data center the request hits.

Example 2: Local Serial Consistency

Objective: Ensure that a user profile update is consistent within a single data center, but not necessarily across
multiple data centers.

1. Application Requirement: The same e-commerce application with user profiles. Now, you want to ensure
strong consistency for profile updates, but only within a single data center.
2. Operation: You are fine with eventual consistency across data centers but need strong consistency within a
single data center to avoid conflicts during high traffic.
Examples of read consistency levels:

A single datacenter cluster with a consistency level of QUORUM

● In a single datacenter cluster with a replication


factor of 3, and a read consistency level of
QUORUM, 2 of the 3 replicas for the given
row must respond to fulfill the read request.
● If the contacted replicas have different
versions of the row, the replica with the most
recent version will return the requested data. In
the background, the third replica is checked for
consistency with the first two, and if needed, a
read repair is initiated for the out-of-date
replicas.
Fig: Single datacenter cluster with 3 replica nodes and consistency set to
QUORUM
A single datacenter cluster with a
consistency level of ONE

● In a single datacenter cluster with a


replication factor of 3, and a read
consistency level of ONE, the closest
replica for the given row is contacted
to fulfill the read request.
● In the background a read repair is
potentially initiated, based on the
read_repair_chance setting of the
table, for the other replicas.

Single datacenter cluster with 3 replica nodes and consistency set to ONE
A two datacenter cluster with a consistency level of
QUORUM

● In a two datacenter cluster with a replication factor of 3, and a


read consistency of QUORUM, 4 replicas for the given row must
respond to fulfill the read request.
● The 4 replicas can be from any datacenter. In the background, the remaining replicas are checked for consistency with the first four, and if needed, a read repair is initiated for the out-of-date replicas.
● Multiple datacenter cluster with 3 replica nodes and consistency
level set to QUORUM
A two datacenter cluster with a consistency
level of LOCAL_QUORUM

● In a multiple datacenter cluster with a replication


factor of 3, and a read consistency of
LOCAL_QUORUM, 2 replicas in the same
datacenter as the coordinator node for the given
row must respond to fulfill the read request.
● In the background, the remaining replicas are
checked for consistency with the first 2, and if
needed, a read repair is initiated for the out-of-date
replicas.
● Multiple datacenter cluster with 3 replica nodes
and consistency set to LOCAL_QUORUM
A two datacenter cluster with a consistency level
of ONE

● In a multiple datacenter cluster with a replication


factor of 3, and a read consistency of ONE, the closest
replica for the given row, regardless of datacenter, is
contacted to fulfill the read request.
● In the background a read repair is potentially
initiated, based on the read_repair_chance setting of
the table, for the other replicas.
● Multiple datacenter cluster with 3 replica nodes and
consistency set to ONE
A two datacenter cluster with a consistency level
of LOCAL_ONE

● In a multiple datacenter cluster with a replication


factor of 3, and a read consistency of LOCAL_ONE,
the closest replica for the given row in the same
datacenter as the coordinator node is contacted to
fulfill the read request.
● In the background a read repair is potentially
initiated, based on the read_repair_chance setting of
the table, for the other replicas.
● Multiple datacenter cluster with 3 replica nodes and
consistency set to LOCAL_ONE
Anti Entropy (For Efficient Read Operations):
Anti Entropy:

● "Anti-entropy"is a process used to ensure consistency across different nodes in the cluster by resolving
discrepancies in data.
● Specifically, it’s a way to repair differences (or "conflicts") that may occur due to the eventual consistency
model that Cassandra uses.
● Since Cassandra allows data to be written to different nodes (replicas) and doesn't require an immediate sync
between them, sometimes inconsistencies can arise. This might happen if one replica has stale or missing data,
or if a network partition occurred and nodes couldn't communicate properly.
● Anti-entropy repair in Cassandra is the mechanism that fixes these inconsistencies. It uses a structure known as a Merkle tree to compare the data between two replicas and identify differences.
Read Repair in Anti Entropy:
● Read Repair in Cassandra is an automatic process that ensures data consistency across replicas during a read
operation.
● When a client queries the database, Cassandra may detect that different replicas have inconsistent data for the
same piece of information.
● Read Repair is triggered to fix this inconsistency.
● Cassandra compares the data returned from different replicas for the same row. If discrepancies are detected
between these replicas (i.e., they have different versions of the same data), Cassandra will repair the
inconsistency by updating the outdated replicas with the most recent version of the data.
● It ensures that all replicas are consistent with the most recent write and helps bring the system closer to
consistency over time.
How does Read Repair Work?
1. Client Issues a Read Request: A client requests data (for example, a row with a specific row_id) from the
Cassandra cluster.
2. Coordinator Queries Replicas: The coordinator node (which receives the read request) queries multiple
replicas (e.g., nodes in the cluster) for the data.
3. Replicas Return Data: The replicas return their version of the requested data. Each replica stores data with a
timestamp indicating when the data was last updated.
4. Detecting Inconsistent Data: The coordinator compares the data returned from the replicas. If any
discrepancies are found (e.g., replicas have different values for the same data), the coordinator identifies that
the replicas are inconsistent.
5. Repairing the Inconsistent Data: The coordinator will then use the data from the replica with the most
recent timestamp (the most up-to-date version) and propagate it to the other replicas that have outdated data.
This ensures that all replicas are consistent with the latest value.
6. Returning Data to the Client: After the read repair is performed, the client receives the most recent (and
consistent) data from the coordinator.
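Here is a minimal sketch of the decision in steps 4 and 5 (last write wins by timestamp), with made-up replica responses:

replica_responses = {
    "node1": {"value": "alice@new.com", "timestamp": 1700000200},
    "node2": {"value": "alice@old.com", "timestamp": 1700000100},
    "node3": {"value": "alice@new.com", "timestamp": 1700000200},
}

# The replica with the newest timestamp holds the winning value.
latest_node = max(replica_responses, key=lambda n: replica_responses[n]["timestamp"])
latest = replica_responses[latest_node]

for node, resp in replica_responses.items():
    if resp["timestamp"] < latest["timestamp"]:
        print(f"read repair: update {node} to {latest['value']!r}")

print("return to client:", latest["value"])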
Role of Merkle Tree in Read Repair:
● Merkle Trees are used as an efficient way to detect data inconsistencies between replicas during the read
repair process.
● They enable Cassandra to quickly identify which part of the data needs to be repaired without having to
compare the entire dataset on each node.
● A Merkle tree, also known as a Hash tree, is a data structure that efficiently verifies the integrity of large
datasets by hashing data into a tree-like structure, where each leaf node represents a hash of a data block and
non-leaf nodes represent hashes of their child nodes.
Merkle Tree:
EXAMPLE:

Insert the following data blocks into a Merkle tree:

● D1 = 8, D2 = 16, D3 = 32, D4 = 64

Hash function: H(N) = (sum of digits of N) mod 10

Step 1: Apply the hash function to each data block

1. H(D1) = H(8) = 8
2. H(D2) = H(16) = 7
3. H(D3) = H(32) = 5
4. H(D4) = H(64) = 0

Step 2: Combine adjacent hashes and hash them again

Next, we combine the adjacent hashes (by concatenation) and apply the hash function to the result.

1. Combine H(D1) = 8 and H(D2) = 7: 8∥7 = 87; H(87) = (8 + 7) mod 10 = 5
2. Combine H(D3) = 5 and H(D4) = 0: 5∥0 = 50; H(50) = (5 + 0) mod 10 = 5

Step 3: Combine the results from Step 2 and hash them again

1. Combine 5 and 5: 5∥5 = 55; H(55) = (5 + 5) mod 10 = 0

The resulting tree:

        [0]   (Merkle Root)
       /   \
     [5]   [5]
     / \   / \
   [8] [7] [5] [0]
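The same worked example in a few lines of Python, using the digit-sum hash defined above:

def h(n):
    return sum(int(d) for d in str(n)) % 10       # H(N) = (sum of digits of N) mod 10

def merkle_root(blocks):
    level = [h(b) for b in blocks]                # leaf hashes
    while len(level) > 1:
        # Concatenate adjacent hashes as digits, then hash the result.
        level = [h(int(f"{level[i]}{level[i + 1]}")) for i in range(0, len(level), 2)]
    return level[0]

print([h(b) for b in (8, 16, 32, 64)])   # [8, 7, 5, 0]
print(merkle_root([8, 16, 32, 64]))      # 0, the Merkle root from the example
print(merkle_root([8, 16, 32, 99]))      # a different root reveals the changed block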
Step-By-Step Process of using Merkle Tree in Read
Repair:
1. Read Request: A client issues a read request to the coordinator node.
2. Coordinator Queries Replicas: The coordinator node sends read requests to the replicas holding the data.
3. Generate Merkle Trees: Each replica generates a Merkle Tree for the data it holds and computes the root hash.
4. Compare Root Hashes: The coordinator node compares the root hashes of the Merkle Trees from all the replicas.
5. Mismatch Detected: If the root hashes are different, it means that some data is inconsistent across replicas.
6. Navigate Merkle Tree: The coordinator navigates through the Merkle Trees to identify which parts of the data
differ.
7. Repair Inconsistent Data: The coordinator identifies the most recent data version (based on timestamps) and
initiates a read repair. The most up-to-date data is propagated to the replicas with outdated data.
8. Return Data to Client: Once the read repair is done, the coordinator returns the correct, consistent data to the
client.
Repair Process:
● Detection: The coordinator node compares Merkle trees and identifies inconsistent data ranges.
● Correction: Inconsistent data is corrected by streaming the correct data from one replica to others. This process
updates the replicas with the latest data.

Repair Types:
● Full Repair: Compares and repairs all SSTables (Sorted String Tables) in a node, ensuring complete consistency.
● Incremental Repair: Focuses on SSTables that have not been previously repaired, using metadata to track repaired
versus unrepaired data. It’s more efficient if repairs are run frequently.

Compaction Strategies:
● Anti-Compaction: Compaction is the process of merging multiple SSTables into a single SSTable to improve read performance and reclaim disk space. Anti-compaction, typically triggered as part of an incremental repair, does the opposite: it splits SSTables so that repaired and unrepaired data are kept in separate SSTables.

Repair Modes:
● Sequential Repair: Repairs nodes one at a time. It creates snapshots of SSTables, minimizing data changes during
repair.
● Parallel Repair: Repairs multiple nodes simultaneously, improving repair speed but increasing network load.
Full Repair vs Incremental Repair:

1. Full Repair:
● Scope: Compares and repairs all SSTables on a node.
● Process: Generates Merkle trees for the entire dataset and identifies discrepancies.
● Resource Usage: High, as it involves complete data processing and comparison.
● Duration: Generally longer due to the comprehensive nature of the repair.
● Use Case: Best for initial consistency checks or when a thorough repair is needed.

2. Incremental Repair:
● Scope: Focuses on SSTables that have not been repaired before, using metadata to track repaired and
unrepaired data.
● Process: Builds Merkle trees only for unrepaired SSTables, compares them, and corrects discrepancies.
● Resource Usage: Lower, as it deals with a subset of data.
● Duration: Shorter and more efficient due to its targeted approach.
● Use Case: Suitable for routine maintenance and frequent repairs to minimize performance impact.
Why is Read Repair Needed?
Cassandra is designed for eventual consistency, which means that nodes might have different versions of the same
data due to various reasons, such as:

● Network partitions: When nodes are temporarily unable to communicate, they may diverge and store
different versions of the same data.
● Node failures: If a node is down during writes, it might miss some updates.
● Writes happening at different times on different nodes, causing temporary inconsistency.

Read Repair ensures that even if data is inconsistent between replicas, it gets fixed during subsequent read
operations, ensuring that the system eventually converges to consistency.
Tunable Consistency:
In Apache Cassandra, "tunable consistency" refers to the ability to configure the consistency level of read and write
operations to balance between data consistency and system performance. This flexibility allows you to make
trade-offs based on your application's needs.

Trade-offs

● Higher Consistency: Requires more replicas to acknowledge a request, which means stronger guarantees that
the data is accurate but can affect performance and latency.
● Lower Consistency: Faster and can handle more load, but offers weaker guarantees that the data is accurate.
Examples of Tunable Consistency in Practice
Example 1: Balancing Consistency and Performance
Suppose you have a Cassandra cluster with a replication factor of 3 (RF=3). You might choose:

● Write Consistency Level: QUORUM (which requires 2 out of 3 replicas to acknowledge the write)
● Read Consistency Level: QUORUM (which also requires 2 out of 3 replicas to respond)

This configuration balances consistency and performance. It ensures that most of the replicas have the latest data
and reduces the risk of stale reads while maintaining a reasonable level of availability.

Example 2: Maximizing Availability During Network Partitions


In a scenario where you are concerned about network partitions and want to ensure that the system remains
available, you might choose:

● Write Consistency Level: ONE (only one replica needs to acknowledge the write)
● Read Consistency Level: ONE (only one replica needs to respond to the read request)

This configuration maximizes availability but at the cost of potentially higher risk of data inconsistency if some
replicas are out of sync.
Example 3: Ensuring Strong Consistency
For applications requiring strong consistency, such as financial systems, you might use:

● Write Consistency Level: ALL (all replicas must acknowledge the write)
● Read Consistency Level: ALL (all replicas must respond to the read request)

This setup ensures that all replicas are in sync and provides the highest level of consistency but may impact performance
and availability, particularly if some replicas are down.
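With the DataStax Python driver, the consistency level can be chosen per statement. The keyspace and table below (shop, accounts) are hypothetical; this is a sketch of the mechanism, not a recommended schema.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")   # assumption: local node, keyspace "shop"

# Balanced: QUORUM write (2 of 3 replicas must acknowledge when RF=3).
write = SimpleStatement(
    "UPDATE accounts SET balance = %s WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (99.50, "user-1"))

# Availability-first: ONE read (fast, but may return slightly stale data).
read = SimpleStatement(
    "SELECT balance FROM accounts WHERE user_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
print(session.execute(read, ("user-1",)).one())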
What happens if Consistency is not maintained?
Failure Scenarios: Node Failure
Cassandra is highly fault tolerant. The effects of node failure are as follows:

● Other nodes detect the node failure.
● Requests for data on that node are routed to other nodes that hold replicas of that data.
● Writes destined for the failed node are stored temporarily as hints on other nodes (hinted handoff) and replayed when the node comes back online.
● Any memtable data that had not yet been flushed to disk is recovered from the commit log when the node restarts.
● A node can be permanently removed using the nodetool utility.
Failure Scenarios: Disk Failure
When a disk becomes corrupt, Cassandra detects the problem and takes corrective action. The effects of Disk Failure
are as follows:

● The data on the disk becomes inaccessible.
● Reading that data from the node is not possible.
● The issue is treated as a node failure for that portion of the data.
● Memtables are not affected, as they are in-memory structures; SSTables stored on the failed disk, however, are lost.
● The lost data is recovered from the replica copies held on other nodes.
Failure Scenarios: Rack Failure
Sometimes, a rack can stop functioning due to a power failure or a network switch problem.

The effects of Rack Failure are as follows:

● All the nodes on the rack become inaccessible.
● Reading data from the rack nodes is not possible.
● Reads are routed to other replicas of the data.
● This is treated as if each node in the rack has failed.
Failure Scenarios: Data Centre Failure
Data center failure occurs when a data center is shut down for maintenance or when it fails due to natural calamities. When that happens:

● All data in the data center becomes inaccessible.
● All reads have to be routed to other data centers.
● The replica copies in other data centers are used.
● Though the system remains operational, clients may notice a slowdown due to network latency, because multiple data centers are normally located at physically different sites and connected by a wide area network.
Cassandra Query Language (CQL):
CQLSH in Cassandra:
● Apache Cassandra cqlsh stands for Cassandra Query Language Shell.
● It is the command-line interface tool used to interact with Cassandra to define schemas, insert data, and execute queries.
● CQLSH comes with the Cassandra package and is present in the bin/ directory.
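By way of illustration, cqlsh can be started by pointing it at a node; the host and port below are the usual defaults, shown here only as an example:

$ cqlsh 127.0.0.1 9042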
A keyspace acts as a container or namespace for tables. This organization allows you to group related tables together under a common namespace.
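A minimal sketch of creating and selecting a keyspace (the keyspace name and replication settings are illustrative):

CREATE KEYSPACE dev
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};   -- keep 3 copies of every row
USE dev;   -- make dev the working keyspace for subsequent statements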
Cassandra DML (Data Manipulation Language)
Insert Operation:
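A minimal sketch, assuming a hypothetical table dev.emp with columns emp_id, emp_name, emp_city and emp_sal:

CREATE TABLE dev.emp (emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint);
INSERT INTO dev.emp (emp_id, emp_name, emp_city, emp_sal)
VALUES (1, 'Robin', 'Nadiad', 50000);   -- writes (or silently overwrites) the row with partition key 1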
Update Operation:
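Continuing with the same hypothetical dev.emp table:

UPDATE dev.emp SET emp_city = 'Ahmedabad', emp_sal = 60000
WHERE emp_id = 1;   -- the full primary key must be given; behaves as an upsert if the row does not exist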
Delete Operation:
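Delete can remove a single column value or the entire row (same hypothetical table):

DELETE emp_sal FROM dev.emp WHERE emp_id = 1;   -- removes only the emp_sal value of that row
DELETE FROM dev.emp WHERE emp_id = 1;           -- removes the whole row (a tombstone is written)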
Select Operation:
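Typical select forms against the hypothetical dev.emp table:

SELECT * FROM dev.emp;                    -- all columns, all rows (a full scan; use with care)
SELECT emp_name, emp_city FROM dev.emp;   -- only the listed columns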
Cassandra “where” clause:
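The where clause normally filters on the partition key or an indexed column; filtering on other columns requires ALLOW FILTERING. A sketch on the same hypothetical table:

SELECT * FROM dev.emp WHERE emp_id = 1;                           -- filters on the partition key
SELECT * FROM dev.emp WHERE emp_city = 'Nadiad' ALLOW FILTERING;  -- non-key column: forces a scan, use sparingly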
Create Index:
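A secondary index lets you query a non-primary-key column without ALLOW FILTERING (index and column names are illustrative):

CREATE INDEX emp_city_idx ON dev.emp (emp_city);
SELECT * FROM dev.emp WHERE emp_city = 'Nadiad';   -- now served through the index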
Drop Index:
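Dropping the index created above (qualify the index name with the keyspace if you are outside it):

DROP INDEX IF EXISTS dev.emp_city_idx;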
Cassandra TTL (Time to Live) using Automatic Data Expiration
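TTL makes a written value expire automatically after the given number of seconds. A sketch using the same hypothetical table:

INSERT INTO dev.emp (emp_id, emp_name, emp_city, emp_sal)
VALUES (2, 'Laura', 'Nadiad', 45000) USING TTL 86400;   -- this row expires after 24 hours
SELECT TTL(emp_sal) FROM dev.emp WHERE emp_id = 2;      -- remaining lifetime of the value, in seconds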
Cassandra Collections:
In Cassandra Query Language (CQL), collections are used to store and manage multiple values in a single
column.

Schema Flexibility

Using collections can make the schema more flexible and dynamic. Instead of creating separate tables to manage
related data, you can use collections to embed these relationships directly within a single row. This allows you to:

● Avoid Joins: Reduce the need for complex queries and joins by embedding related data within a single
column.
● Adapt to Data Changes: Easily adjust the schema as your data requirements evolve without needing to
redesign your database structure.

There are mainly three types of collections that Cassandra supports:


1. Set
2. List
3. Map
Cassandra Set Collection:
Sets are unordered collections of unique elements. They do not allow duplicate values and do not maintain the
order of elements.
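A minimal sketch, assuming a hypothetical table with a set<text> column:

CREATE TABLE dev.emp_contact (emp_id int PRIMARY KEY, emails set<text>);
INSERT INTO dev.emp_contact (emp_id, emails) VALUES (1, {'robin@dev.com', 'robin@home.com'});
UPDATE dev.emp_contact SET emails = emails + {'robin@backup.com'} WHERE emp_id = 1;   -- add one element; duplicates are ignored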
Cassandra List Collection

Lists are ordered collections of elements where each element can appear multiple times. They maintain the order of
insertion, meaning that the order in which elements are added is preserved.
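A similar sketch with a list<text> column (table and column names are illustrative):

CREATE TABLE dev.emp_phones (emp_id int PRIMARY KEY, phones list<text>);
INSERT INTO dev.emp_phones (emp_id, phones) VALUES (1, ['98765-11111', '98765-22222']);
UPDATE dev.emp_phones SET phones = phones + ['98765-33333'] WHERE emp_id = 1;   -- appended at the end, order preserved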
Cassandra Map Collection
Maps are collections of key-value pairs where each key is unique, and each key maps to a single value. Maps are
useful for scenarios where you need to associate a specific value with a unique key.
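A sketch with a map<text, text> column (names are illustrative):

CREATE TABLE dev.emp_address (emp_id int PRIMARY KEY, address map<text, text>);
INSERT INTO dev.emp_address (emp_id, address) VALUES (1, {'city': 'Nadiad', 'state': 'Gujarat'});
UPDATE dev.emp_address SET address['pincode'] = '387001' WHERE emp_id = 1;   -- add or replace a single key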
Cassandra JMX (Java Management Extensions)
Authentication:
● JMX authentication ensures that only authorized users can access management and
monitoring interfaces.
● Without authentication, anyone with network access could potentially monitor or
manipulate your Cassandra nodes, which could lead to security breaches or
unauthorized data access.
Configure Authentication and Authorization:
● In Cassandra, authentication and authorization are disabled by default.
● You have to configure the cassandra.yaml file to enable authentication and authorization.
● Open the cassandra.yaml file and set the options that deal with internal authentication and authorization, as shown below.
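The relevant settings typically look like the lines below (a sketch of cassandra.yaml; confirm the option names for your Cassandra version):

authenticator: PasswordAuthenticator   # replaces the default AllowAllAuthenticator
authorizer: CassandraAuthorizer        # replaces the default AllowAllAuthorizer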
Logging in
● Now that authentication is enabled, if you try to access any keyspace without logging in, Cassandra will return an error.
● By default, Cassandra provides a superuser account with username ‘cassandra’ and password ‘cassandra’. Logged in as this account, you can perform any operation.
● Logging in with anything other than valid credentials is rejected; with the default ‘cassandra’ credentials, the login succeeds.
● You can also create other users with this account. It is recommended to change the password from the default.
● Here is an example of logging in as the Cassandra user and changing the default password.
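A sketch of the commands involved (the new password value is, of course, only an example):

$ cqlsh -u cassandra -p cassandra                         # log in with the default superuser credentials
alter user cassandra with password 'StrongPassword123';   -- run inside cqlsh to replace the default password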
Create New User
New accounts can be created with the ‘Cassandra’ account.
To create a new user, the login name and password are specified, along with whether the user is a superuser. Only a superuser can create new users. The first statement below creates a superuser; the second shows the form without the superuser keyword, which creates a regular user.

create user robin with password 'manager' superuser;

create user robin with password 'newhire';

You can get a list of all users by the following syntax: list users;
Authorization
Authorization is the process of assigning permissions to users, which determines what actions a particular user can perform.
Here is the generic syntax for assigning permission to users.
GRANT permission ON resource TO user
There are following types of permission that can be granted to the user.
1. ALL
2. ALTER
3. AUTHORIZE
4. CREATE
5. DROP
6. MODIFY
7. SELECT
Here are examples of assigning permission to the user.
Create user laura with password 'newhire';
grant all on dev.emp to laura;
revoke all on dev.emp from laura;
grant select on dev.emp to laura;

A new user ‘laura’ is created with password ‘newhire’.


For example, if user ‘laura’ tries to access the emp_bonus table, an error is returned: laura has permission only on dev.emp and no permission on dev.emp_bonus.

You can get a list of all permissions assigned to a user, and you can also list all the permissions granted on a particular resource, such as a table. Examples of both are shown below.
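A sketch of both listing commands (the exact output depends on the grants that exist in your cluster):

list all permissions of laura;      -- every permission granted to the user laura
list all permissions on dev.emp;    -- every permission granted on the dev.emp table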
Enabling JMX Authentication
With the default settings of Cassandra, JMX can only be accessed from localhost. If you want to access JMX remotely, change the LOCAL_JMX setting in cassandra-env.sh and enable authentication or SSL.
After enabling JMX authentication, make sure OpsCenter and nodetool are configured to use
authentication.
Procedure
The following steps enable JMX authentication.
1. In the cassandra-env.sh file, add or update the following lines.
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"

Also, change the LOCAL_JMX setting in cassandra-env.sh:


LOCAL_JMX=no
2. Copy jmxremote.password.template from /jdk_install_location/lib/management/ to /etc/cassandra/ and rename it to jmxremote.password.
cp /jdk_install_location/lib/management/jmxremote.password.template /etc/cassandra/jmxremote.password
3. Change the ownership of jmxremote.password to the user you run Cassandra with, and change its permissions to read-only:
chown cassandra:cassandra /etc/cassandra/jmxremote.password
chmod 400 /etc/cassandra/jmxremote.password
4. Edit jmxremote.password and add the user and password for JMX-compliant utilities:
monitorRole QED
controlRole R&D
cassandra cassandrapassword
5. Add the Cassandra user with read and write permission to
/jdk_install_location/lib/management/jmxremote.access
monitorRole readonly
cassandra readwrite
controlRole readwrite \
create javax.management.monitor.*,javax.management.timer.* \
unregister
6. Restart Cassandra
7. Run nodetool with the Cassandra user and password.
$ nodetool status -u cassandra -pw cassandra
Thanks!!
