Cassandra Notes
UNIT-1
(10-Marks)
1. Define Data and Information. Explain how they differ with examples.
Definition:
⦁ Data: Raw, unprocessed facts or observations collected from various sources, lacking
context or meaning until analyzed. It can be numbers, text, images, or any other form
(e.g., sensor readings from an IoT device).
⦁ Information: Processed data that has been organized, analyzed, and given context to
provide meaning and support decision-making. It is the output of interpreting data (e.g.,
a report summarizing sensor trends).
Differences:
⦁ Structure: Data is unorganized (e.g., a list of temperatures: 25°C, 27°C, 23°C), while
information is structured and meaningful (e.g., "Average temperature today is 25°C,
indicating stable weather").
⦁ Utility: Data is passive and needs interpretation, while information is actionable (e.g.,
data from a smart grid shows power usage; information predicts a peak load at 6 PM).
⦁ Example in Context: In an IoT healthcare system, raw data might be heart rate readings
(e.g., 72, 75, 80 bpm) collected every minute. Information emerges when these are
analyzed to show "Patient’s heart rate averaged 75 bpm with a spike at 80 bpm,
suggesting possible stress—alert doctor."
2. What is RDBMS? List its limitations in the context of big data applications.
Definition:
A Relational Database Management System (RDBMS) is a software system that manages data
stored in a structured format using tables (rows and columns) with predefined schemas. It uses
SQL (Structured Query Language) for querying and ensures data integrity through relationships
(e.g., primary and foreign keys). Examples include MySQL, Oracle, and PostgreSQL.
Limitations in Big Data Applications:
⦁ Scalability Issues: RDBMS is designed for vertical scaling (adding more power to a single
server), which is costly and limited compared to the horizontal scaling (adding more
servers) required for big data’s massive volume (e.g., petabytes of IoT sensor data).
⦁ Schema Rigidity: RDBMS requires a fixed schema, making it inflexible for handling
unstructured or semi-structured big data (e.g., JSON logs from smart devices) that
evolves rapidly.
⦁ Performance with High Velocity: The row-based storage and join operations in RDBMS
struggle with real-time processing of high-velocity data streams (e.g., millions of
transactions per second in a smart city).
⦁ Limited Parallel Processing: RDBMS lacks native support for distributed computing,
unlike big data frameworks (e.g., Hadoop), leading to bottlenecks when analyzing large
datasets across multiple nodes.
⦁ Cost and Complexity: Managing large-scale RDBMS instances (e.g., Oracle RAC) involves
high licensing costs and complex administration, whereas big data solutions like NoSQL
databases are more cost-effective for scale.
⦁ Example: In a big data IoT application tracking global shipping, an RDBMS might fail to
handle 1 TB of unstructured GPS and sensor data daily, whereas a NoSQL solution like
Cassandra could scale horizontally and process it efficiently.
3. What are the ACID properties? Explain how Cassandra handles them.
Definitions:
⦁ Atomicity: Ensures that a transaction is fully completed or fully rolled back; no partial
updates occur.
⦁ Consistency: Guarantees that a transaction brings the database from one valid state to
another, adhering to defined rules (e.g., constraints).
Cassandra is a distributed NoSQL database designed for high availability and scalability,
particularly for big data and IoT applications. However, it does not fully adhere to ACID
properties due to its eventual consistency model and distributed architecture. Here's how it
behaves:
⦁ Atomicity:
⦁ Cassandra ensures atomicity at the row level within a single partition. If a write
operation updates multiple columns in one row, it either succeeds or fails
entirely. However, atomicity across multiple rows or partitions is not
guaranteed unless explicitly managed (e.g., using lightweight transactions with
IF NOT EXISTS).
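A lightweight transaction can be sketched in CQL; the keyspace and table names here are illustrative, not from the source:

```sql
-- Paxos-backed conditional insert: applied only if the row does not already exist
INSERT INTO iot.sensor_events (sensor_id, event_time, value)
VALUES ('S001', '2025-06-20 10:00:00', 25.4)
IF NOT EXISTS;
```

Cassandra returns an [applied] column indicating whether the condition held, at the cost of an extra consensus round trip.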
⦁ Consistency:
⦁ Cassandra offers tunable consistency rather than strict ACID consistency. Each
read or write specifies a consistency level (e.g., ONE, QUORUM, ALL) that
controls how many replicas must respond, trading consistency against latency
and availability.
⦁ Isolation:
⦁ Cassandra provides isolation through its write path, where updates are written
to a commit log and memtable, ensuring no interference during a single
operation. However, it does not support full transaction isolation levels (e.g.,
Serializable) like RDBMS. Concurrent writes to the same row can lead to "last
write wins" conflicts unless resolved with timestamps or lightweight
transactions.
⦁ Durability:
⦁ Durability is guaranteed through the commit log: every write is appended to
the commit log on disk before being acknowledged, so data survives node
crashes. Replication to multiple nodes adds further protection.
Example in Context:
In an IoT smart grid application, Cassandra stores power usage data. A write operation updating
voltage and current for a single sensor (row) is atomic and durable. However, if two nodes
update the same row concurrently with different values, the latest timestamp wins (weak
isolation), and consistency depends on the chosen level (e.g., QUORUM ensures eventual
consistency across nodes).
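In cqlsh, this trade-off can be exercised per session; the table here is hypothetical:

```sql
-- Require a majority of replicas to acknowledge reads/writes in this session
CONSISTENCY QUORUM;
SELECT voltage, current FROM grid.sensor_readings WHERE sensor_id = 'S001';

-- Relax to a single replica for lower latency, weaker consistency
CONSISTENCY ONE;
```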
(15-Marks questions)
1. Discuss in detail the evolution and history of Apache Cassandra. Why was it developed and
how has it grown?
⦁ Origins (2008): Cassandra was initially developed by Facebook to power its Inbox
Search feature, which required handling massive amounts of data with high availability
and fault tolerance. It was created by Avinash Lakshman (who also co-authored
Amazon’s Dynamo) and Prashant Malik, combining ideas from Amazon’s Dynamo
(decentralized design) and Google’s Bigtable (column-family data model).
⦁ Maturity (2011–2015): With the 1.0 release in 2011, Cassandra became production-
ready, adding support for CQL (Cassandra Query Language), which resembled SQL and
improved usability. Version 2.0 (2013) introduced lightweight transactions, while version 3.0
(2015) added materialized views and enhanced performance with an improved storage engine
and compaction.
⦁ Scalability Needs: Facebook's Inbox Search demanded horizontal scaling across
commodity servers, overcoming vertical scaling constraints.
⦁ High Availability: The requirement for zero downtime during server failures (e.g.,
during peak usage) led to a design inspired by Dynamo’s peer-to-peer architecture.
⦁ Fault Tolerance: With data centers worldwide, Cassandra was designed to handle node
failures and network partitions, ensuring data accessibility.
⦁ Community and Ecosystem: The open-source model fostered a global community, with
companies like Netflix, Apple, and Uber adopting Cassandra for real-time applications.
The DataStax enterprise edition further commercialized it, adding tools like OpsCenter.
⦁ Use Cases: It grew to support diverse applications, from IoT sensor data storage to e-
commerce order processing, handling petabytes of data with high throughput.
⦁ Global Impact: By 2025, Cassandra’s adoption in smart cities (e.g., traffic data) and
healthcare (e.g., patient monitoring) reflects its growth into a robust, globally
recognized solution, with over 2,000 contributors and continuous updates.
Example:
Netflix uses Cassandra to manage 1.3 billion requests daily across multiple data centers,
showcasing its scalability and fault tolerance, a direct result of its original design goals.
2. Compare Cassandra with Traditional RDBMS in terms of scalability, performance, and data
structure.
Scalability:
⦁ Cassandra: Scales horizontally by adding commodity nodes to the cluster; throughput
grows near-linearly, so doubling the nodes roughly doubles
capacity for a smart grid application.
⦁ Traditional RDBMS: Relies on vertical scaling (upgrading hardware like CPU or RAM),
which is limited by physical constraints and costly. Systems like Oracle or MySQL
struggle with large-scale distributed environments, often requiring sharding or
replication with complex management.
Performance:
⦁ Cassandra: Optimized for high write throughput via its log-structured storage engine
(commit log plus memtables flushed to SSTables), sustaining very high write rates
across a cluster with low latency.
⦁ Traditional RDBMS: Performs well for complex reads and transactions on moderate
data volumes, but joins, locking, and in-place disk updates limit throughput under
big data write loads.
Data Structure:
⦁ Cassandra: A wide-column store NoSQL database, storing data in tables with flexible
schemas (e.g., rows can have different column sets). It uses a key-value pair approach
within column families, supporting unstructured or semi-structured data (e.g., JSON
from smart devices). This flexibility suits evolving IoT data models.
⦁ Traditional RDBMS: Uses a rigid, tabular structure with fixed schemas (e.g., rows and
columns in MySQL), enforcing relationships via foreign keys. This is efficient for
structured data (e.g., customer records) but inflexible for unstructured big data,
requiring schema changes for new data types.
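The schema-flexibility contrast can be made concrete: Cassandra adds a column to a live table with a metadata-only change (the table name below is assumed):

```sql
-- No table rewrite or downtime; existing rows simply have no value for the new column
ALTER TABLE iot.sensor_data ADD humidity float;
```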
3. Explain CAP theorem with examples. How does Cassandra address the trade-offs?
Explanation of CAP Theorem:
The CAP theorem, proposed by Eric Brewer, states that a distributed system can only guarantee
two out of three properties simultaneously:
⦁ Consistency: All nodes see the same data at the same time after a write operation.
⦁ Availability: Every request receives a response, even if some nodes fail, ensuring no
downtime.
⦁ Partition Tolerance: The system continues to operate despite network partitions (e.g.,
node failures or communication breaks).
Examples:
⦁ Consistency over Availability: A banking system using an RDBMS (e.g., Oracle) ensures
all accounts reflect the same balance after a transaction (consistency), but if a partition
occurs, it may reject requests until resolved (unavailable).
⦁ Availability over Consistency: A social media platform like Twitter uses eventual
consistency, allowing users to post during a partition (available), but some followers
might see updates later (inconsistent).
⦁ Partition Tolerance in Action: An IoT smart grid with Cassandra operates across
multiple regions; if a datacenter fails, it remains functional but may temporarily show
inconsistent data until synchronized.
How Cassandra Addresses the Trade-offs:
⦁ Tunable Consistency: Cassandra is AP by default, but per-query consistency levels
(ONE, QUORUM, ALL) let applications trade availability for consistency as needed.
⦁ Replication Strategy: With a replication factor (e.g., 3), Cassandra stores copies across
nodes, ensuring partition tolerance. If one node fails, others serve data, maintaining
availability.
Example in Context:
In a distributed IoT healthcare system, Cassandra stores patient vitals across three data centers.
During a network partition, it uses a QUORUM consistency level to ensure most nodes agree on
critical updates (e.g., heart rate spikes), maintaining consistency and partition tolerance. For
non-critical data (e.g., routine logs), it switches to ONE, ensuring availability by accepting writes
on any available node, with consistency restored later. This flexibility addresses trade-offs based
on application needs.
UNIT-2
(10-Marks)
1. Explain the differences between a logical and a physical data model with examples.
⦁ Logical Data Model: Represents the conceptual structure of data, focusing on entities,
relationships, and attributes without detailing how data is stored or implemented. It is
independent of the database management system (DBMS) and emphasizes business
requirements.
⦁ Key Characteristics: Defines what data exists (e.g., entities like "Customer" and
"Order"), their relationships (e.g., one-to-many), and attributes (e.g.,
CustomerID, OrderDate), using tools like Entity-Relationship (ER) diagrams.
⦁ Example: In an IoT smart home system, a logical model might include entities
"Sensor" and "Room," with a relationship "Sensor monitors Room," and
attributes like SensorID and RoomTemperature, abstracting the business need
to track temperature per room.
⦁ Physical Data Model: Specifies how data is stored, accessed, and managed in a specific
DBMS, including tables, columns, indexes, partitions, and storage details. It translates
the logical model into a technical implementation.
⦁ Key Characteristics: Includes physical storage details (e.g., table names, data
types, indexes), optimization strategies (e.g., partitioning), and performance
considerations (e.g., caching).
⦁ Example: For the same smart home system, a physical model in Cassandra
might define a table "SensorData" with columns (SensorID text, RoomID text,
Temperature float, Timestamp timestamp), partitioned by SensorID, and
indexed for quick timestamp-based queries.
Differences:
⦁ Abstraction Level: Logical models are abstract and business-focused, while physical
models are technical and implementation-specific.
⦁ Detail: Logical models omit storage or performance details, while physical models
include them (e.g., data types, indexing).
⦁ Example in Context: A logical model for an e-commerce system might define "Product"
and "Order" entities with a many-to-one relationship. The physical model in an RDBMS
might create tables with primary keys (ProductID, OrderID) and foreign key constraints,
while in Cassandra, it might use a wide-row structure with OrderID as the partition key
and ProductID in a collection column.
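The smart-home physical model described above can be written out in CQL; a sketch using the example's columns (the timestamp column is named ts here to avoid shadowing the type name):

```sql
CREATE TABLE smart_home.sensor_data (
    sensor_id   text,
    room_id     text,
    temperature float,
    ts          timestamp,
    PRIMARY KEY ((sensor_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
```

Partitioning by sensor_id and clustering by ts makes the "latest readings per sensor" query a single-partition scan.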
2. Describe how Cassandra designs differ from RDBMS in terms of data modeling and
querying.
Data Modeling Differences:
⦁ Cassandra:
⦁ Uses a wide-column store model, organizing data into tables with flexible,
sparse column families. Data is modeled around query patterns rather than
normalized relationships, using partition keys and clustering columns to
distribute data across nodes.
⦁ RDBMS:
⦁ Approach: Normalization (e.g., 3NF) ensures data integrity but may require
joins, which can complicate queries. For instance, linking "Orders" and
"Customers" tables via CustomerID.
Querying Differences:
⦁ Cassandra:
⦁ Uses CQL (Cassandra Query Language), a SQL-like language, but optimized for
high-throughput, write-heavy workloads. Queries are designed based on
partition keys, limiting flexibility (e.g., cannot query without specifying the
partition key unless indexed).
⦁ Example: Queries must target a partition (e.g.,
"SELECT * FROM SensorReadings WHERE SensorID = 'S001' AND Timestamp >
'2025-06-20'"), with tunable consistency (e.g., QUORUM).
⦁ Limitation: Joins are not supported; data must be pre-joined during modeling.
⦁ RDBMS:
⦁ Uses SQL with full support for complex queries, including joins, aggregations,
and subqueries (e.g., "SELECT * FROM Orders JOIN Customers ON
Orders.CustomerID = Customers.CustomerID WHERE OrderDate >
'2025-06-20'").
⦁ Flexibility: Offers rich querying capabilities but may incur latency with large
datasets or complex joins.
Example in Context:
In an IoT traffic monitoring system, Cassandra might model data with a table partitioned by
CameraID, storing timestamped vehicle counts, optimized for queries like "Get counts for
Camera C001 today." An RDBMS might normalize this into "Cameras" and "VehicleCounts"
tables, requiring a join to retrieve the same data, which could be slower with millions of
records.
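A sketch of the traffic-monitoring table described above (keyspace and exact names are assumptions):

```sql
CREATE TABLE traffic.vehicle_counts (
    camera_id     text,
    ts            timestamp,
    vehicle_count int,
    PRIMARY KEY ((camera_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- "Get counts for Camera C001 today" touches a single partition
SELECT ts, vehicle_count FROM traffic.vehicle_counts
WHERE camera_id = 'C001' AND ts >= '2025-06-20';
```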
3. What steps are involved in evaluating and refining a data model in Cassandra?
⦁ Define Query Requirements:
⦁ Analyze the application’s read and write requirements (e.g., time-series queries
for IoT sensor data). Identify primary access patterns, such as retrieving data by
SensorID and Timestamp, to design tables around these needs.
⦁ Design Tables Around Queries:
⦁ Create tables based on query patterns, selecting partition keys (e.g., SensorID)
to distribute data evenly and clustering columns (e.g., Timestamp) for ordering.
Define column families to denormalize data, avoiding joins. For example, a table
"SensorData" might include (SensorID, Timestamp, Value).
⦁ Test Data Distribution:
⦁ Load sample data and use tools like nodetool ring to check partition key
distribution. Ensure no hotspots (e.g., all data under one SensorID) to prevent
performance issues. Adjust partition keys if uneven distribution is detected.
⦁ Evaluate Performance:
⦁ Run benchmark tests with realistic workloads (e.g., 1 million inserts, 10,000
reads per second) using tools like Cassandra Stress Tool. Measure latency,
throughput, and error rates. For instance, check if a query on SensorID takes
<10ms.
⦁ Refine Based on Feedback:
⦁ Gather feedback from application performance and user experience (e.g., slow
dashboard updates). Redesign tables or add new ones (e.g., a summary table
for aggregated data) if query patterns change, such as adding weekly averages.
Example in Context:
For an IoT weather system, an initial model with a partition key of StationID and clustering by
Timestamp is tested. If queries for hourly averages are slow, a precomputed summary table is
added (Cassandra materialized views cannot compute aggregates), and performance is
re-evaluated, ensuring sub-second response times for 1,000 stations.
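One way to sketch that refinement is a summary table keyed for hourly reads; names are illustrative, and the averages are assumed to be written by the application or a batch job:

```sql
CREATE TABLE weather.hourly_averages (
    station_id text,
    hour       timestamp,
    avg_temp   float,
    PRIMARY KEY ((station_id), hour)
) WITH CLUSTERING ORDER BY (hour DESC);
```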
(15-Marks)
1. Design a Cassandra data model for an e-commerce application covering users, products,
orders, and order items.
Keyspace Design:
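The keyspace itself can be created before the tables; a sketch assuming a single datacenter "dc1" with replication factor 3:

```sql
CREATE KEYSPACE IF NOT EXISTS ecommerce_db
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```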
⦁ Users Table
⦁ Structure:
CREATE TABLE ecommerce_db.users (
user_id text PRIMARY KEY,
email text,
name text,
address text,
registration_date timestamp
);
⦁ Primary Key: user_id (partition key) ensures each user’s data is uniquely
distributed across nodes.
⦁ Products Table
⦁ Structure:
CREATE TABLE ecommerce_db.products (
product_id text,
category text,
name text,
price decimal,
stock int,
PRIMARY KEY ((category), product_id)
);
⦁ Primary Key: Composite key with category (partition key) and product_id
(clustering column) to distribute data by category and order products
within each category.
⦁ Orders Table
⦁ Structure:
CREATE TABLE ecommerce_db.orders (
order_id text,
user_id text,
order_date timestamp,
total_amount decimal,
status text,
PRIMARY KEY ((user_id), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC);
⦁ Explanation: Supports queries like "Get last 10 orders for user123," with
CLUSTERING ORDER BY DESC for recent orders.
⦁ Order_Items Table
⦁ Structure:
CREATE TABLE ecommerce_db.order_items (
order_id text,
product_id text,
quantity int,
unit_price decimal,
PRIMARY KEY ((order_id), product_id)
);
⦁ Primary Key: Composite key with order_id (partition key) and product_id
(clustering column) to store items per order.
Explanation of Design:
⦁ Partitioning Strategy: Partition keys (e.g., user_id, category) ensure even data
distribution, preventing hotspots in a system with 1 million users.
⦁ Example: For a user "user123" placing order "O001" with product "P001"
(electronics category), data is stored across nodes, with queries like "SELECT *
FROM orders WHERE user_id = 'user123' LIMIT 10" returning recent orders
efficiently.
2. Compare the design philosophies of RDBMS and Cassandra.
Overview:
RDBMS and Cassandra embody fundamentally different design
philosophies, rooted in their intended use cases: transactional systems vs. distributed,
high-scale data stores.
Key Differences:
⦁ Data Model: RDBMS uses normalized tables with fixed schemas and enforced
relationships; Cassandra uses a denormalized wide-column model shaped by
query patterns.
⦁ Scalability: RDBMS scales vertically on a single server; Cassandra scales
horizontally across a peer-to-peer cluster of commodity nodes.
⦁ Consistency Model: RDBMS enforces strict ACID consistency; Cassandra offers
tunable, eventual consistency (e.g., ONE, QUORUM, ALL).
⦁ Querying Approach: RDBMS supports ad-hoc SQL with joins and subqueries;
Cassandra's CQL restricts queries to partition-key-driven access with no joins.
⦁ Data Distribution: RDBMS typically centralizes data on one server,
requiring manual sharding for distribution (e.g., MySQL with Galera
cluster); Cassandra distributes data automatically via consistent hashing.
⦁ Philosophical Goal: RDBMS prioritizes data integrity and complex querying for
transactional systems (e.g., e-commerce payments), while Cassandra focuses on
high availability and scalability for write-heavy, distributed systems (e.g., real-
time analytics).
3. Describe the step-by-step process of data modeling in Cassandra, from requirements to
deployment.
Steps:
⦁ Requirement Analysis:
⦁ Gather application requirements (e.g., track IoT sensor data). Identify key
queries (e.g., "Get last hour’s readings for Sensor S001") to drive table
design.
⦁ Schema Design:
⦁ Create keyspaces and tables based on queries. Define partition keys (e.g.,
SensorID) for data distribution and clustering columns (e.g., Timestamp)
for ordering. Example:
CREATE KEYSPACE sensor_db WITH replication = {'class':
'NetworkTopologyStrategy', 'dc1': 3};
CREATE TABLE sensor_db.readings (
sensor_id text,
timestamp timestamp,
value float,
PRIMARY KEY ((sensor_id), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
⦁ Data Loading and Distribution Testing:
⦁ Load sample data (e.g., 1 million sensor readings) using tools like cqlsh or
a data generator. Use nodetool ring to verify even distribution across
nodes, adjusting partition keys if hotspots occur (e.g., too many readings
for one SensorID).
⦁ Performance Evaluation:
⦁ Benchmark reads and writes with the Cassandra Stress Tool; if time-based
queries are slow, add a materialized view, for example:
CREATE MATERIALIZED VIEW sensor_db.readings_by_time AS
SELECT sensor_id, timestamp, value
FROM sensor_db.readings
WHERE sensor_id IS NOT NULL AND timestamp IS NOT NULL
PRIMARY KEY ((timestamp), sensor_id);
⦁ Application Integration:
⦁ Integrate the model with the application (e.g., a Java app using the DataStax
driver). Perform end-to-end tests (e.g., simulate 10,000 users querying
sensor data) to validate query performance and data integrity.
⦁ Deployment and Monitoring:
⦁ Deploy the model to a production cluster using a CI/CD pipeline (e.g., via
Terraform or Ansible). Monitor post-deployment with tools like Grafana,
tracking metrics like read/write latency and node health. Set alerts for
anomalies (e.g., latency > 50ms).
⦁ Continuous Refinement:
⦁ Collect user and system feedback (e.g., slow dashboard updates). Refine
by adding new tables (e.g., aggregated hourly data) or adjusting TTL (e.g.,
expire data after 30 days) to manage storage.
Example in Context:
For an IoT weather system, the process starts with designing a readings table, populating
it with 1 million records, and testing 100 reads/second. If latency exceeds 10ms, a summary
table of hourly averages is added. After deployment to a 5-node cluster,
monitoring reveals a hotspot, prompting a partition key change to (SensorID, Region),
ensuring scalability and efficiency.
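The 30-day retention refinement mentioned above can be applied table-wide (the table name follows the sketch earlier in this section):

```sql
-- New writes expire 30 days (2,592,000 s) after insertion
ALTER TABLE sensor_db.readings
WITH default_time_to_live = 2592000;
```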
UNIT-3
10-Marks
1. Explain the role of the cassandra.yaml file in Cassandra configuration. Discuss at least five
critical properties.
Role:
The cassandra.yaml file is the main configuration file for a Cassandra node, defining cluster
membership, networking, storage paths, and performance parameters; every node reads it at
startup.
Critical Properties:
⦁ cluster_name: A logical name that all nodes in a cluster must share; nodes with a
different name refuse to join, preventing accidental cross-cluster membership.
⦁ listen_address: The IP address the node uses for internode (gossip and replication)
communication; a wrong value isolates the node from its peers.
⦁ rpc_address: Defines the IP address for client connections (e.g., "0.0.0.0" for all
interfaces). This enables clients to query the node, critical for real-time order processing
during a sale.
⦁ endpoint_snitch: Tells Cassandra the network topology (e.g.,
GossipingPropertyFileSnitch) so replicas are placed across racks and datacenters for
fault tolerance.
⦁ num_tokens: Determines the number of virtual nodes (vnodes) per physical node
(default 256). Increasing this (e.g., to 512) improves data distribution across a 10-node
cluster handling 1 million orders.
Explanation:
The cassandra.yaml file acts as the central control point, allowing fine-tuning to match workload
demands. For instance, during a peak sale, adjusting num_tokens and endpoint_snitch
ensures balanced data distribution and efficient cross-region replication, while listen_address
and rpc_address maintain cluster connectivity and client access.
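An illustrative cassandra.yaml excerpt tying these properties together (all values are assumptions for a hypothetical cluster):

```yaml
cluster_name: 'EcommerceCluster'      # must match on every node
num_tokens: 256                       # vnodes per physical node
listen_address: 192.168.1.101         # internode (gossip/replication) traffic
rpc_address: 0.0.0.0                  # accept client connections on all interfaces
endpoint_snitch: GossipingPropertyFileSnitch
```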
2. Explain the significance of Cassandra's storage directory settings (data_file_directories,
commitlog_directory, hints_directory).
⦁ data_file_directories:
⦁ Purpose: Specifies the directories where SSTables (on-disk data files) are stored
(e.g., "/var/lib/cassandra/data"). Multiple directories can be listed for load
balancing across disks.
⦁ commitlog_directory:
⦁ Purpose: Defines the location for the commit log, a durable record of all writes
before they are flushed to SSTables (e.g., "/var/lib/cassandra/commitlog").
⦁ Importance: Isolating the commit log on a separate, fast device (e.g., another
SSD) improves write performance and recovery speed after a node restart,
critical for maintaining order integrity during peak traffic.
⦁ hints_directory:
⦁ Purpose: Stores hint files that track data for nodes that are temporarily down,
enabling repair when they recover (e.g., "/var/lib/cassandra/hints").
⦁ Importance: Hints allow the cluster to tolerate brief node outages and network
partitions (e.g., a datacenter outage), allowing seamless operation and
recovery, which is vital for real-time e-commerce updates.
Explanation:
Configuring these directories on separate drives (e.g., SSD for data_file_directories, another for
commitlog_directory) leverages I/O parallelism, enhancing throughput and fault tolerance. For
example, during tonight’s sale, a well-configured setup ensures quick writes to the commit log
and efficient SSTable access, while hints facilitate recovery if a node fails under load.
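A sketch of the directory layout discussed above, assuming two SSDs for data and a separate device for the commit log:

```yaml
data_file_directories:
    - /mnt/ssd1/cassandra/data
    - /mnt/ssd2/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
hints_directory: /var/lib/cassandra/hints
```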
3. Discuss the use and significance of cassandra-env.sh in JVM tuning for Cassandra
performance optimization.
⦁ Memory Configuration:
⦁ Use: Sets JVM heap size via MAX_HEAP_SIZE and HEAP_NEWSIZE (e.g.,
MAX_HEAP_SIZE="8G" for an 8GB heap).
⦁ Garbage Collection Tuning:
⦁ Use: Configures GC flags (e.g., "-XX:+UseG1GC") and thread counts (e.g., "-
XX:ParallelGCThreads=4") based on CPU cores.
Explanation:
The cassandra-env.sh file bridges Cassandra's performance with JVM capabilities, allowing
customization for workload-specific needs. For instance, during a peak sale, increasing
MAX_HEAP_SIZE to 8GB and using G1GC ensures the cluster handles 100,000 writes/second
without GC-related delays, while thread tuning maximizes CPU utilization across nodes.
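An illustrative cassandra-env.sh excerpt for such a node (sizes are assumptions for a machine with roughly 32 GB of RAM):

```shell
MAX_HEAP_SIZE="8G"         # total JVM heap
HEAP_NEWSIZE="2G"          # young generation (relevant for CMS-era setups)
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"             # low-pause G1 collector
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=200" # target pause ceiling
```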
15-Marks
1. Describe a configuration strategy for deploying Cassandra in a production multi-node
cluster.
Overview:
Deploying Cassandra in a production multi-node cluster requires a robust configuration strategy
to ensure scalability, high availability, and performance, especially for a high-traffic scenario like
an e-commerce sale at 10:43 PM IST on June 20, 2025. This strategy balances resource
allocation, data durability, and network efficiency across a 5-node cluster.
Configuration Strategy:
⦁ Cluster Name:
⦁ Setting: cluster_name: 'EcommerceCluster' in cassandra.yaml (the name itself is
illustrative).
⦁ Rationale: A unique name ensures nodes join the intended cluster, preventing
misconfiguration during the sale peak with 1 million orders.
⦁ Seed Nodes:
⦁ Setting: seed_provider: [{class_name:
org.apache.cassandra.locator.SimpleSeedProvider, parameters: [{seeds:
"<node1-ip>,<node2-ip>"}]}] in cassandra.yaml (seed IPs shown as placeholders).
⦁ IP Settings:
⦁ Setting: listen_address and rpc_address set to each node's own IP in
cassandra.yaml, keeping gossip and client traffic flowing between the 5 nodes.
⦁ Directory Separation:
⦁ Setting: data_file_directories on a dedicated SSD, with commitlog_directory on a
separate fast disk for parallel I/O.
⦁ Memory Tuning:
⦁ Setting: MAX_HEAP_SIZE="8G" in cassandra-env.sh, plus
-XX:+UseG1GC for the G1 garbage collector and -XX:MaxGCPauseMillis=200
to limit pauses.
⦁ Rationale: Tuning heap size and GC prevents out-of-memory errors, while G1GC
minimizes latency (<200ms) for real-time queries, supporting 10,000 concurrent
users.
Implementation Steps:
⦁ Configure cassandra.yaml and cassandra-env.sh on each node with the above settings,
ensuring consistency.
⦁ Start nodes sequentially (seed nodes first), verifying cluster status with nodetool status.
Example:
During the sale, the 5-node cluster handles 1 million orders, with seed nodes facilitating join,
SSDs ensuring fast data access, and JVM tuning maintaining sub-10ms latency for order lookups.
2. Explain the step-by-step process of modifying Cassandra's configuration for large-scale
workloads.
Overview:
Modifying Cassandra to handle large-scale workloads (e.g., 1 million orders at 10:43 PM IST on
June 20, 2025) involves adjusting configuration files to optimize performance, scalability, and
reliability. This process ensures the system adapts to increased data volume and concurrency.
Step-by-Step Process:
⦁ Update cassandra.yaml:
⦁ Increase num_tokens (e.g., to 512) for finer data distribution, set
endpoint_snitch to "GossipingPropertyFileSnitch" for multi-data-center support,
and raise concurrent_writes to 32 for high write throughput.
⦁ Adjust cassandra-rackdc.properties:
⦁ Importance: Ensures data locality and fault tolerance across regions, critical for
global order processing.
⦁ Benchmark:
⦁ Use the Cassandra Stress Tool to simulate 1 million writes and 100,000 reads,
monitoring latency and throughput with OpsCenter.
⦁ Deploy:
⦁ Roll out changes to the 5-node cluster, draining each node with nodetool drain
before restarting the Cassandra service. Monitor with Grafana for metrics
(e.g., CPU > 80%, latency > 50ms).
Example:
For the sale, increasing num_tokens and heap size in cassandra.yaml and cassandra-env.sh
allows the cluster to scale to 1.5 million orders, while tuned commit log settings prevent log
saturation, and cassandra-rackdc.properties ensures data availability across India-South and
US-East.
3. Discuss the implications of incorrect configuration in cassandra.yaml.
Overview:
Incorrect configuration in cassandra.yaml can lead to performance degradation, data loss, or
cluster instability, especially under the high load of an e-commerce sale at 10:43 PM IST on June
20, 2025. Understanding these implications is key to maintaining a 5-node cluster handling 1
million orders.
Implications:
⦁ Data Inconsistency: Improper replication or consistency settings may lead to data loss
or unavailability.
⦁ Cluster Instability: Wrong network or seed node settings can cause nodes to fail to join,
disrupting operations.
⦁ Resource Exhaustion: Poor directory or memory settings can exhaust disk space or
memory, crashing nodes.
Examples:
⦁ Wrong listen_address:
⦁ Effect: Nodes cannot gossip with their peers, so they fail to join or split the
cluster.
⦁ Implication: Orders for 500,000 users may fail to sync, causing transaction
losses.
⦁ Misconfigured num_tokens:
⦁ Effect: Causes uneven data distribution, creating hotspots on nodes with fewer
tokens. This overloads disk I/O, increasing latency to >50ms for order queries.
⦁ Implication: Results in lost sales and customer complaints during the peak sale.
⦁ Single data_file_directories entry:
⦁ Effect: Forces all SSTables onto a single disk, exhausting 1 TB space with 10 TB
of order data. This causes node crashes and data unavailability.
Mitigation:
Regular validation with nodetool status and monitoring with Grafana can detect these issues.
For example, adjusting listen_address to a node-specific IP and adding multiple
data_file_directories on SSDs ensures stability and performance during the sale.
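The routine validation described above amounts to a handful of commands run on each node (interpreting the output, not exact flags, is the point):

```shell
nodetool status    # per-node state (UN/DN), load, and token ownership
nodetool tpstats   # thread-pool backlogs that signal overload
nodetool cleanup   # discard data a node no longer owns after topology changes
```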
Example:
If num_tokens is misconfigured to 16, a node handling 2 TB of orders may crash, while correct
configuration to 256 balances the load, maintaining sub-10ms latency for 1 million orders at
10:43 PM IST.
UNIT-4
10-Marks
1. Explain the output and significance of the 'nodetool status' command in Cassandra.
⦁ Status: A letter pair indicating node state (e.g., "UN" for Up/Normal, "UL" for Up/Leaving,
"DN" for Down/Normal).
⦁ State: The node’s role (e.g., "N" for Normal, "L" for Leaving).
⦁ Load: The amount of data stored on the node (e.g., 1.23 TB).
⦁ Example Output:
Datacenter: dc1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load     Tokens  Owns   Host ID                               Rack
UN  [Link]      1.25 TB  256     20.0%  550e8400-e29b-11d4-a716-446655440001  rack1
Significance:
⦁ Cluster Health: Indicates operational status (e.g., "UN" nodes are healthy, "DN" nodes
need investigation), critical for ensuring 1 million e-commerce orders process smoothly
at 10:49 PM IST.
⦁ Load Balancing: Monitors data distribution (e.g., 1.23 TB vs. 1.20 TB) to detect
imbalances, preventing hotspots during peak traffic.
⦁ Fault Detection: Identifies down nodes (e.g., [Link]) for repair or replacement,
ensuring high availability.
⦁ Capacity Planning: Helps assess if the 5-node cluster can handle 10 TB of data, guiding
node additions.
⦁ Example: During tonight’s sale, nodetool status reveals a "DN" node, prompting
immediate repair to maintain order processing for 10,000 concurrent users.
2. Describe how the 'nodetool info' command helps administrators monitor Cassandra nodes.
⦁ Heap Memory Usage: Current and maximum heap (e.g., 6G/8G).
⦁ Example Output:
ID : 550e8400-e29b-11d4-a716-446655440000
Load : 1.23 TB
Generation No : 1624200000
Uptime : 10d 2h
Rack : rack1
Significance:
⦁ Health Monitoring: Confirms gossip and Thrift activity, ensuring node communication
and client access, vital for 100,000 writes/second during the sale.
⦁ Resource Utilization: Tracks heap usage (e.g., 6G/8G) to detect memory pressure,
guiding JVM tuning to avoid crashes under 10,000 users.
⦁ Load Assessment: Monitors data load (e.g., 1.23 TB) to identify storage constraints,
aiding capacity planning for 10 TB of order data.
⦁ Uptime and Stability: Verifies uptime (e.g., 10d 2h) to assess reliability, critical for
uninterrupted e-commerce operations.
⦁ Troubleshooting: Identifies issues (e.g., "Native Transport active: false") for quick
resolution, preventing downtime.
⦁ Example: At 10:49 PM IST, nodetool info shows 90% heap usage, prompting an increase
to 12G in [Link] to handle the sale peak.
3. List and Explain the Different Thread Pool Metrics Available Through 'nodetool tpstats'.
⦁ ReadStage:
⦁ Metrics: Active, Pending, Completed, Blocked tasks (e.g., Active: 5, Pending: 2).
⦁ Significance: High pending tasks (e.g., 10) indicate read bottlenecks, suggesting
index optimization for 100,000 order lookups.
⦁ MutationStage:
⦁ Significance: Excessive pending writes (e.g., 15) during 1 million order inserts
signal the need to increase concurrent_writes in cassandra.yaml.
⦁ CompactionExecutor:
⦁ Significance: High pending compactions (e.g., 5) increase read latency and disk
I/O, and may require throttling compaction throughput during peak
traffic.
⦁ RequestResponseStage:
⦁ Significance: Tracks replies to inter-node requests; a backlog here points to
network latency or overloaded replica nodes.
⦁ GossipStage:
⦁ Significance: Handles cluster state exchange; pending gossip tasks suggest CPU
saturation, risking false failure detection.
⦁ MigrationStage:
⦁ Significance: Pending tasks (e.g., 2) during schema updates can disrupt order
processing, requiring off-peak execution.
⦁ Example Output:
Pool Name             Active  Pending  Completed  Blocked
ReadStage             5       2        100000     0
MutationStage         8       3        150000     0
CompactionExecutor    2       1        5000       0
RequestResponseStage  3       0        80000      0
GossipStage           1       0        1000       0
MigrationStage        0       0        10         0
Explanation:
These metrics help diagnose performance issues. For instance, at 10:49 PM IST, 3 pending
MutationStage tasks during 1 million order writes suggest increasing concurrent_writes to 32,
while 2 pending ReadStage tasks indicate indexing needs for faster order retrieval, ensuring
sub-10ms latency for 10,000 users.
15-Marks
1. Demonstrate how nodetool commands help monitor and maintain a Cassandra cluster.
⦁ nodetool status:
⦁ Demonstration: Run nodetool status; a "DN" entry identifies a failed node and
its data load.
⦁ Action: Restart the Cassandra service on that node and repair data with nodetool repair
to restore availability for 500,000 order processes.
nodetool info:
⦁ Scenario: Heap memory usage reaches 90% (7.2G/8G) on node [Link], slowing
order queries.
⦁ Demonstration: nodetool info reports the heap figure directly.
⦁ Action: Raise MAX_HEAP_SIZE (e.g., to 12G) in cassandra-env.sh and restart the
node to relieve memory pressure.
nodetool tpstats:
⦁ Demonstration: nodetool tpstats shows pending MutationStage tasks during the
1 million order writes.
⦁ Action: Increase concurrent_writes to 32 in cassandra.yaml and retest to
reduce pending tasks.
⦁ nodetool repair:
⦁ Action: Schedule during off-peak (e.g., 2 AM IST) to avoid impacting the sale,
ensuring data integrity.
⦁ nodetool cleanup:
⦁ Action: Run after adding nodes to remove data a node no longer owns, freeing
disk space without affecting availability.
Explanation:
These commands enable proactive maintenance, ensuring the cluster handles the sale load at
10:56 PM IST. For example, repair restores consistency, while cleanup optimizes space,
maintaining sub-10ms latency for order processing.
2. Analyze a Scenario Where nodetool tpstats Reveals Thread Pool Congestion. What
Corrective Actions Would You Recommend?
Scenario Analysis:
At 10:56 PM IST on June 20, 2025, during an e-commerce sale with 1 million orders, nodetool
tpstats on a 5-node cluster shows:
⦁ Observations:
⦁ ReadStage and MutationStage show large pending queues (e.g., 30 pending
writes) with blocked tasks, while latency climbs above the 10ms target.
⦁ Root Causes:
⦁ Insufficient thread pool sizes, high I/O contention, or inadequate JVM memory
(e.g., 8G heap at 90% usage) under peak load.
Corrective Actions:
⦁ Increase Thread Pools: Raise concurrent_reads and concurrent_writes in
cassandra.yaml to clear the backlog of
pending reads and 30 pending writes.
⦁ Rationale: Reduces queue buildup, ensuring sub-10ms latency for reads and
writes.
⦁ Rationale: Frees I/O resources, improving read performance for order queries.
⦁ Rebalance Data:
⦁ Use nodetool status to check load (e.g., 1.25 TB vs. 1.20 TB) and nodetool
cleanup on uneven nodes.
⦁ Scale Out:
⦁ Add a node if pending tasks exceed 50 after tuning, verified by nodetool tpstats.
Implementation:
Apply changes, drain each node with nodetool drain before restarting the service, and re-run
nodetool tpstats to confirm pending tasks drop (e.g., to 5 for MutationStage). Monitor with
Grafana to
ensure stability.
Example:
After tuning, ReadStage pending drops to 5, and MutationStage to 10, reducing latency to 5ms,
ensuring smooth order processing for the sale peak.
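The congestion check described above can be automated with a short script. This is a minimal sketch that assumes the classic nodetool tpstats column layout (Pool Name, Active, Pending, Completed, Blocked, All time blocked); the sample output and the pending threshold of 20 are illustrative, not taken from a real cluster.

```python
# Flag congested thread pools in `nodetool tpstats` output.
# Assumes the classic six-column layout; threshold is illustrative.

def congested_pools(tpstats_output: str, pending_threshold: int = 20):
    """Return (pool, pending) pairs whose Pending count exceeds the threshold."""
    flagged = []
    for line in tpstats_output.splitlines():
        parts = line.split()
        # Data rows have a single-token pool name followed by five integers.
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            pool, pending = parts[0], int(parts[2])
            if pending > pending_threshold:
                flagged.append((pool, pending))
    return flagged

# Illustrative sample, not real cluster output.
sample = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0        30        1234567         0                 0
ReadStage                         0        25         654321         0                 0
GossipStage                       0         0          99999         0                 0
"""

print(congested_pools(sample))  # → [('MutationStage', 30), ('ReadStage', 25)]
```

In practice the same function could be fed the output of `nodetool tpstats` on a schedule, alerting when any pool's pending count stays above the threshold.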
3. Explain the Role of Gossip in Cassandra. How Can nodetool info Help Verify Its Status and Issues?
⦁ Node Discovery: New nodes join the cluster by contacting seed nodes, sharing state via
gossip messages.
⦁ State Propagation: Each node periodically (every second) exchanges state information
(e.g., uptime, load) with up to three random peers, ensuring all nodes have a consistent
view.
⦁ Failure Detection: Marks nodes as down if they miss heartbeats, triggering repairs (e.g.,
hint replay).
⦁ Example: If node [Link] fails, gossip updates other nodes, enabling the cluster to
redistribute 1.20 TB of data.
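The state-propagation mechanics above can be sketched as a toy simulation. This is illustrative only, not Cassandra's actual implementation: each node holds a map of {node_id: (heartbeat_version, state)} and, once per round, merges its view with up to three random peers, with both sides keeping the entry carrying the higher heartbeat.

```python
import random

# Toy gossip simulation (illustrative, not Cassandra's real code).

def merge(a, b):
    # Both views keep the entry with the higher heartbeat version.
    for key in set(a) | set(b):
        best = max((d[key] for d in (a, b) if key in d),
                   key=lambda entry: entry[0])
        a[key] = best
        b[key] = best

def gossip_round(views, fanout=3):
    nodes = list(views)
    for node in nodes:
        peers = random.sample([n for n in nodes if n != node],
                              min(fanout, len(nodes) - 1))
        for peer in peers:
            merge(views[node], views[peer])

random.seed(42)
# Five nodes; each initially knows only its own state.
views = {f"n{i}": {f"n{i}": (1, "UP")} for i in range(5)}
for _ in range(5):
    gossip_round(views)

# After a few rounds every node holds a consistent view of all peers.
print(all(len(view) == 5 for view in views.values()))  # → True
```

The simulation shows why gossip converges quickly: with a fanout of three among five nodes, every node's state reaches the whole cluster within a couple of rounds, which is what makes divergent nodetool info values a reliable symptom of a gossip problem.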
⦁ Issue Detection: Gossip active: false signals a failure, possibly due to network
issues or listen_address misconfiguration, halting state updates.
⦁ Verification: Consistent load and uptime across nodes (e.g., 1.20–1.25 TB, 10d
uptime) suggest healthy gossip, balancing 10 TB of data.
⦁ Issue Detection: Divergent values (e.g., 0 TB load) indicate a node isn’t receiving
gossip, possibly due to a seed node failure.
⦁ Thrift and Native Transport Status:
⦁ Issue Detection: false values with active gossip suggest internal inconsistencies,
requiring nodetool drain and restart.
⦁ Actionable Insights:
⦁ Example: At 10:56 PM IST, nodetool info shows Gossip active: true and Load:
1.23 TB, confirming healthy gossip, while a false state would trigger a network
diagnostic.
Explanation:
Gossip ensures cluster cohesion, and nodetool info provides a real-time health check. For the
sale, verifying gossip status prevents data inconsistency, ensuring all 1 million orders are
processed reliably.
UNIT-5
10-Marks
1. Explain the role and purpose of the commit log and memtable in Cassandra's write path.
Commit Log:
⦁ Role: The commit log is a durable log file that records all write operations before they
are applied to the memtable. It is stored on disk (e.g., /var/lib/cassandra/commitlog)
and serves as a crash-recovery mechanism.
Memtable:
⦁ Role: The memtable is an in-memory data structure (e.g., a sorted skip list) that holds
recent write data before it is flushed to disk as an SSTable. It resides in the JVM heap.
⦁ Purpose: Provides fast write performance by allowing in-memory storage and quick
lookups for recent data. During the sale peak, the memtable handles 100,000 order
writes/second, enabling sub-10ms latency for updates.
⦁ Process: Writes are first written to the commit log, then inserted into the memtable.
Once the memtable reaches a size threshold (e.g., 128MB) or a time limit, it is flushed
to an SSTable.
⦁ The write path begins with a client request, which is logged in the commit log for
durability and stored in the memtable for performance. This dual mechanism ensures
both speed (memtable) and reliability (commit log), critical for handling 1 million orders
tonight.
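The dual commit-log/memtable mechanism can be sketched as a simplified model. This is not Cassandra's code: the file path is a temp file, the flush threshold is counted in entries rather than bytes for simplicity, and the "SSTable" is just a sorted in-memory snapshot.

```python
import json, os, tempfile

# Simplified model of Cassandra's write path: every write is appended
# to a commit log first (durability), then stored in a memtable (speed).
# Threshold counted in entries for simplicity; paths are illustrative.

class WritePath:
    def __init__(self, commitlog_path, flush_threshold=3):
        self.commitlog_path = commitlog_path
        self.memtable = {}        # key -> row (sorted at flush time)
        self.flush_threshold = flush_threshold
        self.sstables = []        # flushed, immutable snapshots

    def write(self, key, row):
        # Step 1: append to the commit log before acknowledging.
        with open(self.commitlog_path, "a") as log:
            log.write(json.dumps({key: row}) + "\n")
        # Step 2: insert into the in-memory memtable.
        self.memtable[key] = row
        # Step 3: flush once the threshold is reached.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write a sorted, immutable "SSTable" and clear the commit log.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        open(self.commitlog_path, "w").close()

path = os.path.join(tempfile.gettempdir(), "toy_commitlog.txt")
open(path, "w").close()
wp = WritePath(path)
for i in range(4):
    wp.write(f"user{i}", {"total_amount": 10 * i})

print(len(wp.sstables), len(wp.memtable))  # → 1 1 (one flush, one pending write)
```

Even this toy version shows the key property: a crash after step 1 loses nothing, because the commit log can be replayed into a fresh memtable on restart.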
2. Describe the flushing process in Cassandra. When does it happen, and what are the
consequences?
Flushing Process:
The flushing process in Cassandra involves transferring data from the memtable to an SSTable
on disk. It occurs in the following steps:
⦁ When the memtable reaches a configurable size threshold (e.g., 128MB, governed by
memtable space settings such as memtable_heap_space_in_mb in [Link]) or a time
interval (e.g., every 10 minutes, controlled by memtable_flush_period_in_ms), it is
marked as immutable.
⦁ A new memtable is created to handle incoming writes, while the old memtable is
written to disk as an SSTable in the data_file_directories.
⦁ The commit log entries for the flushed data are cleared to free space, ensuring the log
doesn’t grow indefinitely.
When It Happens:
⦁ Size-Based Trigger: Occurs when the memtable size exceeds the threshold, e.g., after
100,000 order writes accumulate 150MB of data at 11:07 PM IST.
⦁ Time-Based Trigger: Happens periodically (e.g., every 10 minutes) even if the size limit
isn’t reached, ensuring regular data persistence.
⦁ Manual Trigger: Administrators can force flushing with nodetool flush during
maintenance, e.g., to stabilize the cluster under load.
Consequences:
⦁ Positive:
⦁ Frees up memory in the JVM heap, preventing out-of-memory errors during the
sale peak with 10,000 concurrent users.
⦁ Persists data to SSTables, enabling durable storage and reducing commit log
size for faster recovery.
⦁ Negative:
⦁ Triggers compaction later, which can consume CPU and I/O resources, slowing
reads if not tuned (e.g., via compaction_throughput_mb_per_sec).
⦁ Example: If flushing occurs for 200MB of order data, it ensures durability but may
temporarily delay reads, requiring SSD optimization to maintain sub-10ms performance.
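The size-based and time-based triggers above can be expressed as a single decision function. The 128MB and 10-minute defaults mirror the examples in the text and are illustrative, not authoritative configuration values.

```python
# Decide whether a memtable should be flushed, combining the
# size-based and time-based triggers. Thresholds are illustrative.

def should_flush(memtable_bytes, seconds_since_last_flush,
                 size_threshold=128 * 1024 * 1024,
                 period_seconds=600):
    if memtable_bytes >= size_threshold:
        return "size"   # e.g., 150MB of order writes accumulated
    if seconds_since_last_flush >= period_seconds:
        return "time"   # periodic persistence even under light load
    return None         # keep accumulating writes in memory

print(should_flush(150 * 1024 * 1024, 60))   # → size
print(should_flush(10 * 1024 * 1024, 900))   # → time
print(should_flush(10 * 1024 * 1024, 60))    # → None
```

A manual `nodetool flush` corresponds to bypassing this check entirely and forcing the flush path regardless of either trigger.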
3. List and Describe the Main Components of an SSTable and Their Purposes.
⦁ Data File (.db):
⦁ Description: Contains the actual data in sorted order based on the partition
key and clustering columns (e.g., order data sorted by user_id and order_date).
⦁ Purpose: Serves as the primary on-disk store, enabling
efficient read access. For the 1 million orders at 11:07 PM IST, it holds 10 TB of
sorted records.
⦁ TOC File (.toc):
⦁ Description: A table of contents listing all component files (e.g., .db, .index) for
the SSTable.
Explanation:
These components work together to balance storage efficiency and query performance. For
instance, during the sale, the Bloom filter and index file enable rapid order retrieval, while the
statistics file aids in optimizing compactions, maintaining cluster health under load.
15-Marks
1. Draw and Explain the Complete Write Path in Cassandra, Including Commit Log, Memtable,
and Flushing to SSTables.
⦁ Another arrow points to "Memtable" (an in-memory structure), where the write is
inserted.
⦁ From the Memtable, an arrow labeled "Flush Trigger (Size/Time)" leads to "SSTable"
(on-disk file, e.g., /var/lib/cassandra/data), with a feedback loop to "Clear Commit Log"
after flushing.
⦁ Step 1: Client Write Request: A write operation (e.g., inserting an order with user_id,
order_date, and total_amount) arrives from a client during the e-commerce sale peak.
⦁ Step 2: Commit Log Update: The write is synchronously appended to the commit log on
disk, ensuring durability. For 100,000 writes/second, this guarantees recovery if a node
fails, protecting 1 million orders.
⦁ Step 3: Memtable Insertion: The write is added to the memtable, a sorted in-memory
structure (e.g., a skip list), enabling fast lookups and writes with sub-10ms latency. The
memtable grows with each write.
⦁ Step 4: Flush Trigger: When the memtable reaches a size threshold (e.g., 128MB) or a
time limit (e.g., 10 minutes), it becomes immutable. A new memtable handles new
writes, and the old one is scheduled for flushing.
⦁ Step 5: Flush to SSTable: The immutable memtable is written to disk as an SSTable, which
persists data for 10 TB of order data.
⦁ Step 6: Commit Log Clearance: After successful flushing, the corresponding commit log
segments are deleted, freeing space and preparing for the next cycle.
⦁ Example: At 11:13 PM IST, an order for "user123" is logged, inserted into the
memtable, and flushed to an SSTable when it hits 150MB, ensuring durability and
performance for 10,000 concurrent users.
Significance:
This write path balances speed (memtable) and reliability (commit log), with flushing ensuring
persistent storage, critical for handling the sale’s high throughput.
2. Illustrate and Discuss How Cassandra Reads Data with Optimizations Like Bloom Filters,
Row Cache, and Compression Maps.
⦁ "SSTable Data File" with "Compression Map" to decompress and retrieve rows.
⦁ An arrow loops back to "Row Cache" for future reads if data is cached.
⦁ Step 1: Client Read Request: A query (e.g., SELECT * FROM orders WHERE user_id =
'user123') arrives at 11:13 PM IST during the sale.
⦁ Step 2: Bloom Filter Check: The Bloom filter, an in-memory probabilistic data structure,
checks if the partition key exists in the SSTable. If likely present (low false positive rate),
it proceeds; if not, it skips the SSTable, reducing I/O for 10,000 queries/second.
⦁ Step 3: Partition Key Lookup: The index file maps the key to an offset in the SSTable
data file, enabling a seek to the relevant data range, optimizing read latency to
sub-10ms.
⦁ Step 4: Row Cache Utilization: If the requested row (e.g., recent order for "user123") is
in the row cache (configured via row_cache_size_in_mb), it is returned directly from
memory, bypassing disk I/O for frequent accesses.
⦁ Step 5: SSTable Data Retrieval with Compression Map: If not cached, the data file is
accessed. The compression map (stored in the SSTable) provides offsets for
decompressing compressed blocks, retrieving the row efficiently from 10 TB of data.
⦁ Step 6: Cache Update: The retrieved row is added to the row cache for future reads,
improving performance for repeated queries.
⦁ Example: For "user123"’s orders, the Bloom filter skips irrelevant SSTables, the index
locates the data, and the row cache serves a recent order, while the compression map
handles a 1GB compressed block, ensuring fast response times.
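The Bloom filter check in Step 2 can be illustrated with a minimal implementation. This is a toy scheme (SHA-256-derived bit positions, 1024 bits, 3 hashes), not Cassandra's actual filter; the key property it demonstrates is that a negative answer is definitive while a positive answer may be a false positive.

```python
import hashlib

# Minimal Bloom filter: k hash functions set/check k bits in a bitset.
# "No" means the key is definitely absent (skip the SSTable);
# "yes" means it is probably present (proceed to the index lookup).

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # integer used as a bitset

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bf = BloomFilter()
bf.add("user123")
print(bf.might_contain("user123"))  # → True
print(bf.might_contain("user999"))  # almost certainly False: skip this SSTable
```

Tuning num_bits and num_hashes trades memory for false-positive rate, which is what the bloom_filter_fp_chance setting controls in Cassandra.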
Discussion of Optimizations:
⦁ Bloom Filters: Reduce unnecessary disk reads, critical for large datasets (e.g., 10 TB),
though false positives may increase I/O slightly.
⦁ Row Cache: Enhances read performance for hot data (e.g., popular products), but
requires tuning to avoid memory pressure (e.g., 512MB limit).
⦁ Compression Maps: Minimize storage (e.g., 2:1 ratio) and I/O by decompressing only
needed blocks, though excessive compression can slow reads if CPU-bound.
⦁ Trade-Offs: These optimizations prioritize speed but require careful configuration (e.g.,
bloom_filter_fp_chance at 0.01) to balance accuracy and performance during the sale
peak.
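The compression map lookup from Step 5 reduces to simple arithmetic, sketched below under the assumption of fixed-size uncompressed chunks (as in chunked compression generally); the chunk size and on-disk offsets are illustrative values, not real SSTable data.

```python
# Sketch of a compression map lookup: uncompressed data is split into
# fixed-size chunks (e.g., 64KB); the map stores the on-disk offset of
# each compressed chunk, so only the needed chunk is decompressed.

CHUNK_LENGTH = 64 * 1024  # 64KB uncompressed chunks

# On-disk offsets of each compressed chunk (compressed sizes vary).
compression_map = [0, 31000, 58500, 90125]

def chunk_for(uncompressed_offset):
    """Return (chunk_index, on_disk_offset) for an uncompressed offset."""
    index = uncompressed_offset // CHUNK_LENGTH
    return index, compression_map[index]

# A row at uncompressed offset 140000 lives in chunk 2 (on disk at 58500),
# so only that ~30KB compressed chunk is read and decompressed.
print(chunk_for(140000))  # → (2, 58500)
```

This is why compression saves I/O as well as storage: a single-row read touches one chunk, not the whole compressed file.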
3. Analyze the Role of SSTables in Cassandra. What Makes Them Efficient for Reads and How
Are They Structured?
⦁ Data Persistence: Store all committed data (e.g., 10 TB of e-commerce orders at 11:13
PM IST) durably on disk.
⦁ Read Optimization: Enable efficient data retrieval through indexing and sorting.
⦁ Compaction Support: Facilitate merging and cleanup of data during compaction,
maintaining performance.
⦁ Sorted Structure: Data is sorted by partition key (e.g., user_id), allowing binary search-
like access, reducing seek time to sub-10ms.
⦁ Index File: Provides quick offset lookups, minimizing disk scans for large datasets (e.g., 1
million orders).
⦁ Bloom Filters: Pre-filter non-existent keys, cutting I/O by up to 90% for irrelevant
SSTables.
⦁ Compression: Reduces storage (e.g., 2:1 ratio) and I/O, with compression maps
enabling block-level decompression, optimizing read bandwidth.
⦁ Row Cache Integration: Caches frequently accessed rows in memory, bypassing disk for
hot data (e.g., recent orders).
⦁ Immutability: Once written, SSTables are read-only, avoiding write locks and enabling
parallel reads across nodes.
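The sorted layout plus the sampled summary index can be sketched as a two-stage lookup: binary search over the sampled keys narrows the search to one region, then only that region of the full index is searched. The key list and sampling interval are illustrative.

```python
import bisect

# Two-stage SSTable-style lookup: an in-memory summary (every Nth key)
# narrows the search by binary search, then only one small region of
# the full key index is searched. Keys and interval are illustrative.

keys = [f"user{i:04d}" for i in range(1000)]   # sorted partition keys
SAMPLE_EVERY = 100
summary = keys[::SAMPLE_EVERY]                 # sampled index held in memory

def lookup(key):
    # Stage 1: binary search the summary for the key's region.
    region = max(bisect.bisect_right(summary, key) - 1, 0)
    start = region * SAMPLE_EVERY
    # Stage 2: binary search only within that region of the full index.
    block = keys[start:start + SAMPLE_EVERY]
    pos = bisect.bisect_left(block, key)
    if pos < len(block) and block[pos] == key:
        return start + pos                     # offset into the data file
    return None

print(lookup("user0123"))  # → 123
print(lookup("user9999"))  # → None
```

The sketch shows why the summary file saves memory: only every 100th key is held in RAM, yet lookups remain logarithmic because the data file is sorted and immutable.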
Structure of SSTables:
⦁ Data File (.db): Contains the sorted key-value data (e.g., user_id, order_date,
total_amount), persisted from the memtable. It holds the bulk of 10 TB of order data.
⦁ Index File (.index): Stores a summary of partition keys with offsets, enabling rapid
location of data ranges (e.g., offset for user123’s orders).
⦁ Filter File (.bloom): A Bloom filter predicting key existence, reducing unnecessary reads
(e.g., checks for user999).
⦁ Statistics File (.stats): Metadata like min/max keys and row counts, aiding compaction
and repair (e.g., min key = 'user001').
⦁ Summary File (.summary): A sampled index of keys at intervals (e.g., every 100th key),
optimizing memory usage for range queries.
⦁ Compression Info File (.compressioninfo): Stores the
compression map for block offsets, enabling block-level decompression.
⦁ TOC File (.toc): Lists all component files, ensuring integrity during maintenance.
⦁ Example: An SSTable for "user123"’s orders includes a .db file with 100 rows, an .index
for offsets, and a .bloom filter, enabling a 5ms read for the latest order.
Analysis:
SSTables’ efficiency stems from their immutable, sorted nature and supporting files, which
minimize I/O and leverage memory caches. However, their read performance depends on
proper indexing and compression tuning (e.g., chunk_length_in_kb at 64KB). During
the sale, their structure ensures scalability, but excessive SSTable growth requires compaction
to prevent read degradation.