
IN-MEMORY DATA STRUCTURE STORE

A PROJECT REPORT
Submitted by
NAVEEN PAUDEL(211B192)
NIKHIL SAHU(211B195)
PANSHUL KHURCHWAL(211B202)
UNDER THE GUIDANCE OF:

DR AJAY KUMAR

November 2024
Submitted in partial fulfilment of the requirements for the award of the degree
of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Department of Computer Science and Engineering
JAYPEE UNIVERSITY OF ENGINEERING & TECHNOLOGY,
A.B. ROAD, RAGHOGARH, DISTRICT GUNA-473226, MP, INDIA

DECLARATION

We hereby declare that the work reported in the 7th semester Major Project entitled “In-Memory
Data Structure Store”, submitted in partial fulfillment of the requirements for the award of the degree of
B.Tech. at Jaypee University of Engineering and Technology, Guna, is our own. To the best of our
knowledge and belief, there is no infringement of intellectual property rights or copyright. In case of any
violation, we will be solely responsible.

Signature of Students

NAVEEN PAUDEL(211B192)
NIKHIL SAHU(211B195)
PANSHUL KHURCHWAL(211B202)

Department of Computer Science and Engineering,

Jaypee University of Engineering and Technology, Guna,473226

Date:

JAYPEE UNIVERSITY OF ENGINEERING & TECHNOLOGY
Accredited with Grade-A+ by NAAC & Approved U/S 2(f) of the UGC Act, 1956
A.B. Road, Raghogarh, District Guna (MP), India, Pin-473226 Phone: 07544
267051, 267310-14, Fax: 07544 267011
Website: www.juet.ac.in

CERTIFICATE

This is to certify that the work titled “In-Memory Data Structure Store” submitted
by Naveen Paudel (211B192), Nikhil Sahu (211B195), and Panshul Khurchwal (211B202) in
partial fulfillment for the award of the degree of Bachelor of Technology (Computer Science and
Engineering) of Jaypee University of Engineering & Technology, Guna, has been carried out
under my supervision. To the best of my knowledge and belief, there is no infringement of
intellectual property rights or copyright. Also, this work has not been submitted partially or
wholly to any other University or Institute for the award of this or any other degree or
diploma. In case of any violation, the concerned students will be solely responsible.

Signature of Supervisor

(Name of Supervisor)

Designation

Date

ACKNOWLEDGEMENT

No endeavour can lead to success unless a proper platform is provided for it. This is why we
consider ourselves very fortunate to have completed our major project work under the
supervision of DR. AJAY KUMAR. We express our sincere gratitude to him for having faith
in us and allowing us to carry out a project on a technology completely new to us, for which
we had to research and learn many new things that will help us deal with advanced work in
the future. He helped us immensely by guiding us throughout the project.

Secondly, we would like to thank the Department of Computer Science & Engineering for
creating this opportunity.

Last but not least, we would like to thank our family and friends who continuously
encouraged and helped us in every possible way they could.

Executive Summary

The advent of digital transformation and the proliferation of data-driven applications have
underscored the need for high-performance, low-latency database systems capable of
processing vast amounts of information in real-time. Traditional disk-based databases, while
reliable, often fall short in scenarios requiring instantaneous data access and rapid event
processing. In response to these limitations, this project explores the development and
implementation of an In-Memory Database (IMDB), a system designed to revolutionize
how data is stored, accessed, and managed in real-time environments.

 The IMDB operates by storing data directly in Random Access Memory (RAM) rather
than on traditional disk storage. This design eliminates the latency associated with disk
I/O operations, enabling faster query responses and higher throughput. By leveraging
RAM’s speed, the IMDB is particularly well-suited for applications where
performance is paramount, such as real-time analytics, IoT data streams, financial
transactions, and online gaming environments. This report provides a comprehensive
analysis of the IMDB's architecture, features, implementation, and its transformative
impact on modern computing systems.

 A key aspect of the project was designing a hybrid storage architecture that balances
the speed of in-memory storage with the durability of persistent storage. To address
concerns about reliability and fault tolerance, the database integrates features like
replication, automated failover, and snapshotting, ensuring data consistency and
availability even in the event of system failures.

 In conclusion, this project not only addresses the limitations of existing database
technologies but also sets a foundation for the next generation of real-time data
management systems. The IMDB's innovative features and architecture make it an
indispensable tool for modern, data-intensive applications, underscoring its role as a
cornerstone in the evolution of high-performance computing. The findings and
advancements outlined in this report pave the way for future research and
development, with potential enhancements including integration of non-volatile
memory (NVM) and predictive analytics to further extend its capabilities. This project
represents a significant contribution to the field of database systems, highlighting the
transformative power of in-memory technology in meeting the demands of a data-
driven world.

List of figures

S.No. Fig. No. Figure Name Page No.

1 2.1 On-Disk database storage 18
2 2.2 Memory-first database store 21
3 3.1 Load parameters 29
4 3.2 Compaction of a key-value update log 31
5 3.3 Performing compaction and segment merging 32
6 3.4 An SSTable with an in-memory index 32
7 3.5 Growing a B-tree by splitting a page 33
8 3.6 Leader-based (master–slave) replication 35

TABLE OF CONTENTS

Cover Page ……………………………………………………………………………… i


Declaration of the Student ……………………………………………………............... ii
Certificate of the Guide …………………………………………………………………... iii
Acknowledgement ………………………………………………………………………. iv
Executive Summary ………………………………………………………………………. V
List of Figure ……………………………………………………………………………… vi
Table of Contents …………………………………………………………………………. vii

1. INTRODUCTION ……………………………………………………………………. 8
1.1 Problem Definition …………………………………………………………… 8
1.2 Project Overview …………………………………………………………….. 11
1.3 Hardware Specification ………………………………………………………. 11
1.4 Software Specification ……………………………………………………….. 12
2. LITERATURE REVIEW…..………………………………………………………. 14
2.1 Overview Of Existing Research on in Memory database…….………………. 14
2.2 Comparison with Traditional databases……….……..………………………… 22
2.3 Examples of data systems……………………………………………………… 25
3. METHODOLOGIES……………………………………………………………….. 27
3.1 Foundations of Data systems ….……………………………………………… 27
3.2 Design and Architecture….…………………………………………………… 29
3.3 DATA Distributing Techniques.……………………………………………… 32
4. RESULTS & DISCUSSIONS………………………………………………………. 38
5. CONCLUSIONS ……………………………………………………………………… 64
6. REFERENCES ……………………………………………………………………….. 66
7. STUDENTS PROFILE ………………………………………………………………. 27

Chapter1
INTRODUCTION

In-Memory Data Structure Databases (IMDSDBs) are a class of databases that store data
entirely in a computer's main memory (RAM) rather than on disk. This design allows for
faster data access and processing, making IMDSDBs suitable for applications that need
high-speed operations, such as real-time analytics and caching. Minimizing disk I/O
considerably improves performance and reduces latency. While IMDSDBs offer great speed
advantages, there are usually trade-offs with regard to memory usage and durability, as data
in RAM is more volatile than data in traditional disk-based systems.

1.1 Problem Definition

With the increase in data-intensive applications, the disk-based relational databases that have
been the backbone of data management for many years are being challenged on both
performance and scalability. A significant limitation of conventional databases is disk I/O
latency: because data is stored on physical disks, every read or write operation implies a disk
access, which is considerably slower than retrieval from memory. This is particularly critical
in high-throughput applications like real-time analytics, online transactions, or gaming,
where latency in the order of microseconds can impair performance or user experience.

Although the gap between disk and memory speeds has narrowed significantly through
improvements like solid-state drives, it remains considerable. Another limitation is
scalability. Relational databases are optimized for vertical scaling; better performance
therefore often requires upgrading the server hardware, for example to more powerful
processors and more memory. However, vertical scaling is limited by physical and economic
constraints, which restricts its usefulness for the large data volumes and growing demands of
modern large-scale applications.
Another problem arises when data in a cache and the underlying database are not kept
properly synchronized: queries can return incorrect results, or the system may even crash.
A further disadvantage of traditional database systems is their weak support for real-time
analytics. The heavy dependence on a disk-oriented architecture and on batch processing for
complex queries makes traditional systems unsuitable for applications requiring
instantaneous insights, such as fraud detection or recommendation algorithms. Interactions
with large datasets often incur heavy overheads, which multiply the latency between raw
data and actionable insights.

The ultimate issues relate to cost and maintenance. Enhancing the performance of
conventional databases frequently requires substantial investments in hardware upgrades
and alternative software approaches, such as data warehouses or distributed architectures.
Such configurations contribute to increased expenses and complexities associated with
deployment, implementation, and ongoing maintenance. Additionally, the reliability and
resilience of traditional databases heavily rely on the replication of data across various
servers, which further escalates the total expenditure. In the current environment of
overwhelming demand for instant access to vast and diversified datasets, conventional
databases struggle to meet these requirements. The inherent design restrictions of such
systems, combined with the growing need for fast, agile, and scalable solutions, have led to
increasing interest in modern alternatives, namely in-memory databases, which in turn raise
a set of interlinked challenges of their own.

1.2 Project Overview

The In-Memory Database is a high-performance data management system that eliminates
the bottlenecks associated with disk storage by leveraging the speed and efficiency of
RAM. It is specifically designed to deliver sub-millisecond query response times and
support millions of concurrent events per second. The system integrates modern
architectural principles, such as event-driven design and adaptive indexing, to ensure
optimal performance across a variety of use cases.

Purpose and Goals


The primary purpose of this project is to provide a robust and scalable database solution
that can handle real-time data ingestion, storage, and querying with minimal latency. The
goals include:

 Enhancing Performance: Achieve near-instantaneous data processing by eliminating traditional disk-based delays.

 Ensuring Scalability: Design a horizontally scalable architecture capable of accommodating increasing workloads.

 Guaranteeing Reliability: Incorporate mechanisms for data durability and fault tolerance, such as replication and failover systems.

 Supporting Real-Time Applications: Develop a database that meets the needs of applications requiring immediate data access and analytics.

Key Features
 Hybrid Storage Model: Combines in-memory data storage for speed with persistent backups to ensure data durability.

 Event-Driven Processing: Facilitates real-time handling of event streams, ideal for IoT and analytics workloads.

 Adaptive Indexing: Optimizes query execution by dynamically adjusting indexing strategies based on data usage patterns.

 Fault Tolerance: Includes replication, snapshotting, and automated recovery mechanisms to ensure high availability.

 Scalability and Extensibility: Supports distributed systems to enable seamless scaling without compromising performance.

Implementation
The project involves the following stages:
 System Design: Development of a modular architecture to support high performance and flexibility.

 Technology Selection: Utilization of high-performance programming languages and frameworks tailored for real-time applications.

 Prototype Development: Building a functional prototype that integrates the key features of the IMDB.

 Testing and Benchmarking: Evaluating system performance under different scenarios, including stress testing and comparative analysis with existing solutions.

 Deployment and Validation: Implementing the database in a simulated real-world environment to assess its effectiveness and reliability.

Significance
The IMDB developed in this project addresses critical gaps in existing database systems,
particularly for applications that demand real-time data access and processing. By
combining speed, scalability, and reliability, it offers a transformative solution for
industries that rely on rapid decision-making and high-frequency data analysis. Examples
include fraud detection systems in financial services, event monitoring in IoT, and
matchmaking algorithms in gaming platforms.

This project establishes a foundation for advancing in-memory database technology,
paving the way for future innovations that could include machine learning-driven
indexing, integration with non-volatile memory (NVM), and further enhancements in fault
tolerance. As a result, it contributes not only to the academic understanding of database
systems but also to practical advancements in modern data-driven technologies.

1.3 Hardware Requirements



 Processor:

o Type: Multi-core CPU (Intel i5/i7 or AMD Ryzen 5/7 or higher)

o Reason: To efficiently handle concurrent processing and simulate high-throughput workloads.

 Memory (RAM):

o Size: Minimum 8 GB (32 GB recommended for larger data sets)

o Reason: The database operates entirely in memory, requiring sufficient RAM to store and process data in real time.

 Storage:

o Type: SSD (Solid State Drive) with at least 256 GB capacity

o Reason: For persistence mechanisms like snapshot storage and logs; faster storage ensures quick data retrieval during restarts.

1.4 Software Requirements

1. Operating System:

o Options: Windows 10/11; Linux or WSL (Windows Subsystem for Linux) with telnet

o Reason: Linux is preferred for server-side applications due to performance and compatibility with development tools.

2. Programming Language:

o Language: Go

o Reason: Go is ideal for high-performance systems with minimal latency and efficient memory management.

3. Development Tools:

o Compiler/IDE: Go toolchain (go build, go run)

o Reason: For building and debugging the database code.

4. Version Control:

o Tool: Git

o Reason: To manage source code versions and collaborate efficiently.

5. Libraries and Dependencies:

o Networking Libraries: For implementing socket-based communication.

o Data Structures: Libraries for hash tables, B-trees, or other indexing techniques.

o Testing Tools: Go's testing framework.

6. Persistence Tools:

o Libraries or modules for snapshotting, write-ahead logging (WAL), and data compression.

7. Command-Line Tools:

o Redis CLI or custom-built CLI commands for database interaction (e.g., GET, SET, PSYNC).

o Tools like Telnet or Netcat for sending raw commands to test socket-based communication (a minimal Go client sketch follows below).
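As an alternative to Telnet or Netcat, the server's socket interface can be exercised with a few lines of Go. The sketch below is illustrative only: it assumes the server is listening on localhost:6379 (the default Redis port), writes RESP-encoded commands over a TCP connection, and prints the first line of each reply.

package main

import (
    "bufio"
    "fmt"
    "net"
)

// sendCommand encodes args as a RESP array, writes it to the connection,
// and prints the first line of the server's reply.
func sendCommand(conn net.Conn, r *bufio.Reader, args ...string) {
    // RESP encodes a command as: *<count>\r\n then $<len>\r\n<arg>\r\n per argument.
    msg := fmt.Sprintf("*%d\r\n", len(args))
    for _, a := range args {
        msg += fmt.Sprintf("$%d\r\n%s\r\n", len(a), a)
    }
    if _, err := conn.Write([]byte(msg)); err != nil {
        panic(err)
    }
    reply, err := r.ReadString('\n')
    if err != nil {
        panic(err)
    }
    fmt.Printf("%v -> %s", args, reply)
}

func main() {
    // Assumed address; change it to wherever the server is listening.
    conn, err := net.Dial("tcp", "localhost:6379")
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    r := bufio.NewReader(conn)
    sendCommand(conn, r, "PING")
    sendCommand(conn, r, "SET", "name", "redis")
    sendCommand(conn, r, "GET", "name")
}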

Chapter 2
LITERATURE REVIEW

2.1 Overview of existing research on in-memory databases

Many new tools for data storage and processing have emerged in recent years. They are
optimized for a variety of different use cases, and they no longer fit neatly into traditional
categories [1]. For example, there are datastores that are also used as message queues
(Redis), and there are message queues with database-like durability guarantees (Apache
Kafka). The boundaries between the categories are becoming blurred. Secondly,
increasingly many applications now have such demanding or wide-ranging requirements
that a single tool can no longer meet all of their data processing and storage needs. Instead,
the work is broken down into tasks that can be performed efficiently on a single tool, and
those different tools are stitched together using application code.

We typically think of databases, queues, caches, etc. as being very different categories of
tools. Although a database and a message queue have some superficial similarity— both
store data for some time—they have very different access patterns, which means different
performance characteristics, and thus very different implementations.

1. Architectural Advancements

Early research focused on developing architectures that fully leverage the speed of main
memory while addressing traditional database issues like durability and consistency:

 Memory-Centric Designs: Researchers have proposed memory-centric architectures


that eliminate the need for disk I/O, enabling ultra-low latency. Systems like H-Store
(a precursor to VoltDB) are entirely memory-resident and use partitioning to achieve
parallel processing.

 Hybrid Models: Studies explore combining in-memory storage with persistent disk-
based storage to ensure durability. SAP HANA, for instance, utilizes columnar storage
with in-memory processing for analytical workloads.

2. Query Optimization

Optimizing query execution in IMDBs has been a critical area of research:

 Indexing Mechanisms: Adaptive and dynamic indexing techniques, such as tree-
based and hash-based indexes, have been extensively studied. Research highlights
their role in speeding up queries in memory-intensive environments.

 Compiled Queries: Techniques like Just-In-Time (JIT) compilation have been


developed to generate highly optimized machine code for query execution.

 Vectorized Processing: Studies show that vectorized execution, where multiple rows
of data are processed simultaneously, can significantly boost performance.

3. Persistence and Recovery


Despite being memory-based, IMDBs need to ensure durability:
 Snapshot Mechanisms: Research has focused on periodic snapshotting of data to
persistent storage, allowing recovery in case of system failure.
 Write-Ahead Logging (WAL): WAL mechanisms have been adapted for in-memory
systems to achieve ACID compliance without compromising speed.
 Emerging Storage Technologies: Some studies explore non-volatile memory (NVM),
which combines the speed of RAM with persistence, as a future direction for IMDBs.

4. Scalability and Distributed Systems

In-memory systems must scale horizontally to handle large data volumes:

 Sharding and Partitioning: Techniques for partitioning data across multiple nodes in
a cluster have been extensively researched to maintain performance during scaling.

 Consistency Models: Distributed IMDBs face challenges in ensuring strong


consistency. Protocols like Raft and Paxos are often applied to manage replication and
consensus.

 Replication Strategies: Research has proposed synchronous and asynchronous


replication techniques to achieve high availability without significant performance
degradation.

5. Applications and Use Cases

Research has also focused on specific applications of IMDBs:

 Real-Time Analytics: Studies highlight the use of IMDBs in scenarios like stock
trading, fraud detection, and IoT platforms, where real-time insights are critical.

 Hybrid Transaction/Analytical Processing (HTAP): Research explores combining


OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing)
capabilities in a single system, as seen in SAP HANA and Oracle TimesTen.

 Caching and Acceleration: Research has shown that using IMDBs as caching layers
can significantly improve the performance of backend systems.

6. Benchmarking and Performance Evaluation

 Benchmarks: Research often uses benchmarks like YCSB (Yahoo! Cloud Serving
Benchmark) and TPC-C to evaluate IMDB performance.

 Workload-Specific Optimization: Studies explore how different workloads, such as


read-heavy vs. write-heavy, influence the design and tuning of IMDB systems.

7. Emerging Areas of Research


 Machine Learning Integration: Recent studies examine how in-memory databases
can integrate with ML frameworks to accelerate model training and inference.
 Edge Computing: Research explores the role of lightweight IMDBs in edge
environments, where processing occurs close to data sources (e.g., IoT devices).
 Graph Databases: With the growing importance of graph-based data, in-memory
systems optimized for graph storage and traversal are becoming a focus area.
 Energy Efficiency: As IMDBs consume significant memory resources, researchers
are exploring energy-efficient algorithms and hardware designs.

Key Research Contributions


1. Stonebraker et al. (2007): Proposed the H-Store system, which pioneered the idea of
in-memory databases for OLTP workloads, emphasizing partitioned and parallel
execution.
2. Plattner (2009): Introduced the concept of in-memory columnar storage in SAP
HANA, transforming analytics and HTAP processing.
3. Microsoft Research: Developed Hekaton, an in-memory OLTP engine integrated
into SQL Server, demonstrating the feasibility of embedding IMDBs in traditional
databases.
4. Oracle TimesTen: Studies on TimesTen show its ability to integrate IMDB
functionality into enterprise environments.

Challenges Highlighted in Research

1. Cost: High memory requirements make IMDBs expensive for large-scale adoption.

2. Durability vs. Performance: Balancing speed with data persistence remains a key
challenge.

3. Scalability: Maintaining low latency while scaling across multiple nodes is non-
trivial.
4. Specialized Workloads: Designing IMDBs that adapt to diverse workload
requirements, such as analytics vs. transactions.
Existing research provides a strong foundation for understanding in-memory databases,
their capabilities, and their limitations. The field continues to evolve, with emerging
technologies like NVM and edge computing offering new avenues for innovation. Future
research can build on these advancements to further optimize performance, scalability, and
integration into modern data architectures.

2.2 Comparison with traditional databases


2.2.1 On-Disk Database

Fig 2.1 On-Disk Database storing


Storage Mechanism

1. Data Storage: Stores data on persistent disk storage (HDD or SSD).

2. Persistence: Data is inherently persistent and does not require additional


mechanisms to survive system restarts.

Performance

1. Speed: Generally slower compared to in-memory databases due to the time


required for disk I/O operations.

2. Suitability: More suited for applications where the speed of data access is
less critical.

Use Cases

1. Transactional Systems: Widely used in transactional applications (OLTP


systems), where data persistence is key.

2. Large Data Sets: Ideal for applications with large data sets that cannot be
cost-effectively stored in memory.

3. General-Purpose Databases: Most traditional databases (like MySQL,


PostgreSQL) are on-disk and cater to a wide range of applications.

Advantages

1. Scalability: On-disk databases can handle larger datasets than in-memory databases since
they are not limited by the available system memory. They can efficiently manage terabytes
or petabytes of data across multiple disks.

2. Data Durability: Data stored in on-disk databases is persistent and survives system restarts
or failures. This ensures data durability and consistency, critical for transactional and
mission-critical applications.

3. Cost-Effective Storage: Disk storage is generally cheaper than RAM, making on-disk
databases more cost-effective for storing large volumes of data over the long term.

4. Flexible Storage Options: On-disk databases offer various storage
configurations and optimizations, such as partitioning, indexing, and
compression, to optimize performance and storage efficiency.

Limitations

However, on-disk databases also have some drawbacks:

1. Slower Performance: Accessing data from disk is slower compared to memory access,
leading to higher latency for database operations. This may be acceptable for batch processing
or non-real-time applications but can be a bottleneck for latency-sensitive workloads.

2. Disk I/O Bottleneck: Disk I/O operations can become a performance bottleneck,
particularly in high-concurrency environments or when dealing with large datasets.
Optimizing disk I/O is essential for maximizing database performance.

3. Complexity: Managing disk-based databases involves additional complexity, such as disk


space management, data backup and recovery, and optimizing disk performance. This
complexity may require additional expertise and resources.

Examples of On-Disk Databases

1. MySQL:

■ One of the most popular open-source relational database management


systems. MySQL is widely used for web applications and supports a broad
array of features.

2. PostgreSQL:

■ An advanced open-source relational database. PostgreSQL is known for its


robustness, scalability, and support for advanced data types and features.

3. MongoDB:

■ A leading NoSQL database that stores data in JSON-like documents. It is


designed for ease of development and scaling.

4. Oracle Database:

■ A multi-model database management system known for its feature-rich,


enterprise-grade capabilities, widely used in large organizations.

5. Microsoft SQL Server:

■ A relational database management system developed by Microsoft, offering a
wide range of data analytics, business intelligence, and transaction processing
capabilities.

6. SQLite:

■ A C-language library that implements a small, fast, self-contained, high-reliability SQL
database engine. It is widely used in applications where an embedded, lightweight database
is needed.

2.2.2 Memory-First Database

Fig 2.2 Memory-first Database store


Memory-first databases are a hybrid architecture that primarily uses memory (RAM) for
data storage and processing while ensuring durability and persistence through secondary
storage (disk). This approach balances the speed of in-memory databases with the
reliability of disk-based storage. Below is a detailed explanation based on the provided
points:

1. Storage Mechanism

 Primary Storage in Memory:


Memory-first databases store active and frequently accessed data in RAM to leverage
its high-speed performance.
 Durability through Disk:
Unlike pure in-memory databases, memory-first databases periodically write data to
disk for persistence. This may involve transaction logs, snapshots, or append-only files
to ensure that data can be recovered in case of a crash.
 Automatic Tiering:
Some memory-first systems use algorithms to move less frequently used data to disk
while keeping hot data in RAM for faster access.

2. Performance

 High Throughput and Low Latency:


The use of RAM for primary operations ensures near-real-time data access and
processing, making these databases ideal for high-performance applications.

 Optimized Writes:
Writing to disk is typically asynchronous, using techniques like write-ahead logging
(WAL) to minimize performance impact.

 Scalability:
Memory-first databases can scale horizontally by distributing data across multiple
nodes, though they remain limited by the available memory per node.

3. Use Cases

 Real-Time Analytics:
Ideal for dashboards, fraud detection systems, and recommendation engines where
low-latency analytics are critical.

 Event Processing:
Used in IoT platforms, gaming leaderboards, and social media for processing high
volumes of events in real-time.

 Session Stores:
Memory-first databases are often used to manage user sessions in web and mobile
applications.

 Hybrid Transactional and Analytical Processing (HTAP):


Support simultaneous transactional and analytical queries, making them suitable for e-
commerce and financial applications.

4. Advantages

 Speed and Performance:


Operations on data stored in memory are significantly faster than disk-based
operations.
 Reliability:
Unlike pure in-memory databases, data is not lost during crashes or power failures due
to regular persistence to disk.

 Flexibility:
Supports both transactional (OLTP) and analytical (OLAP) workloads.

 Cost-Efficiency:
While relying on RAM for speed, it also utilizes cheaper disk storage for less critical
data, optimizing resource usage.

 Simplified Application Development:


Provides a unified platform for developers by supporting multiple workloads and data
models.
5. Limitations

 Memory Dependency:
Limited by the amount of available RAM, making it challenging for very large
datasets unless tiering to disk is well-optimized.

 Complexity in Management:
Balancing memory and disk storage requires careful configuration and tuning.
 Cost:
High-performance memory modules (e.g., DRAM) are more expensive than
traditional disk storage.

 Data Recovery Overhead:


Recovering large datasets from disk after a crash can take time, which might affect
availability.

6. Examples

 Redis:
A popular key-value store that supports persistence through RDB snapshots and AOF
(Append-Only File) mechanisms.

 SAP HANA:
Combines in-memory storage with disk persistence, excelling in analytical and
transactional processing.

 Memcached with Persistence:
A memory caching solution extended with disk persistence for durability.

 VoltDB:
A memory-first database optimized for high-speed transaction processing with disk-
based durability.

 Aerospike:
Hybrid architecture using in-memory data for speed and SSDs for persistent storage,
making it highly scalable and reliable.

Memory-first databases bridge the gap between the speed of in-memory databases and the
reliability of disk-based databases. By offering high performance and durability, they have
become integral to modern applications requiring real-time insights and high availability.
However, their success depends on efficient memory management and balanced resource
utilization to overcome limitations like memory dependency and recovery overhead.

2.3 Examples of top existing technologies

1. Redis
 Overview: Redis (Remote Dictionary Server) is an open-source, highly scalable in-
memory key-value store that supports multiple data structures such as strings, hashes,
lists, sets, and sorted sets.
 Features:
o Persistence: Offers RDB snapshots and append-only file (AOF) options for
durability.
o Replication: Master-slave replication for high availability.
o Performance: Handles millions of read/write operations per second.
o Modules: Extensible with modules like RedisGraph and RedisJSON.
 Use Cases:
o Real-time analytics
o Caching layer
o Message brokering (via Pub/Sub model)

2. Memcached
 Overview: A simple, distributed memory caching system primarily used to speed up
dynamic web applications by reducing database load.
 Features:
o Lightweight and highly performant.
o Limited to simple key-value pair storage.
o No persistence mechanism.
 Use Cases:
o Session caching
o Content delivery networks (CDNs)
o Query caching in web applications

3. Apache Ignite
 Overview: An open-source distributed database and computing platform that
combines in-memory data storage with compute capabilities.
 Features:
o SQL support for relational queries.
o Durable memory with disk-based persistence.
o Built-in machine learning and streaming APIs.
 Use Cases:
o Real-time transaction processing.
o High-performance data grids.
o Event-driven architectures.

4. SAP HANA
 Overview: A relational database management system (RDBMS) developed by SAP
that uses in-memory storage for real-time processing.
 Features:
o Columnar storage for faster query execution.
o Built-in advanced analytics (e.g., predictive analytics, machine learning).
o Enterprise-grade reliability and security.
 Use Cases:
o Real-time business intelligence.
o Supply chain optimization.
o Financial data analysis.

5. Amazon ElastiCache
 Overview: A fully managed in-memory data store offered by AWS, supporting Redis
and Memcached.
 Features:
o High availability with auto-scaling.
o Monitoring via AWS CloudWatch.
o Easy integration with AWS services like Lambda and RDS.
 Use Cases:
o Caching to reduce database query loads.
o Gaming leaderboards.
o Real-time analytics pipelines.

6. VoltDB
 Overview: An in-memory database designed for high-speed transactions and real-time
analytics.
 Features:
o ACID compliance for transactional integrity.
o Supports SQL and Java for application development.
o Built-in fault tolerance and high availability.
 Use Cases:
o Fraud detection.
o Telecommunication networks.
o IoT data processing.

7. Hazelcast IMDG (In-Memory Data Grid)


 Overview: A distributed in-memory data store and computing platform.
 Features:
o Supports key-value, map, and queue data structures.
o Distributed computing and event processing.
o Strong integration with Java applications.
 Use Cases:
o Distributed caching.
o Event streaming.
o Real-time monitoring systems.

8. Aerospike

 Overview: An in-memory and flash-optimized distributed database designed for ultra-


low latency and scalability.

 Features:

o Hybrid storage (RAM + SSD for persistence).

o Tuned for high availability and cross-datacenter replication.

o Suitable for both operational and analytical workloads.

 Use Cases:

o Ad-tech platforms.

o Customer behavior tracking.

o Fraud detection in financial systems

Chapter 3
METHODOLOGIES

3.1 Foundations of Data systems


There are many factors that may influence the design of a data system, including the skills
and experience of the people involved, legacy system dependencies, the time scale for
delivery, your organization’s tolerance of different kinds of risk, regulatory constraints,
etc. Beyond those factors, three concerns matter in most data systems, as described below:
reliability, scalability, and maintainability.

Reliability

Everybody has an intuitive idea of what it means for something to be reliable or unreliable.
For software, typical expectations include:

 The application performs the function that the user expected.

 It can tolerate the user making mistakes or using the software in unexpected ways.

 Its performance is good enough for the required use case, under the expected load and data volume.

 The system prevents any unauthorized access and abuse.


Reliability is not just for nuclear power stations and air traffic control software;
more mundane applications are also expected to work reliably. Bugs in business applications
cause lost productivity (and legal risks if figures are reported incorrectly), and outages of
ecommerce sites can have huge costs in terms of lost revenue and damage to reputation.

Scalability

Even if a system is working reliably today, that doesn’t mean it will necessarily work reliably
in the future. One common reason for degradation is increased load: perhaps the system has
grown from 10,000 concurrent users to 100,000 concurrent users, or from 1 million to 10
million. Perhaps it is processing much larger volumes of data than it did before. Scalability is
the term we use to describe a system’s ability to cope with increased load. Note, however, that
it is not a one-dimensional label that we can attach to a system: it is meaningless to say “X is
scalable” or “Y doesn’t scale.” Discussing scalability means considering questions like “If
the system grows in a particular way, what are our options for coping with the growth?” and
“How can we add computing resources to handle the additional load?”

Describing Load

Load can be described with a few numbers which we call load parameters. The best choice of
parameters depends on the architecture of your system: it may be requests per second to a web
server, the ratio of reads to writes in a database, the number of simultaneously active users in
a chat room, the hit rate on a cache, or something else. Perhaps the average case is what
matters for you, or perhaps your bottleneck is dominated by a small number of extreme cases.

To make this idea more concrete, let’s consider Twitter as an example, using data
published in November 2012 [16]. Two of Twitter’s main operations are:
Post tweet
A user can publish a new message to their followers (4.6k requests/sec on average, over
12k requests/sec at peak).

Home timeline
A user can view tweets posted by the people they follow (300k requests/sec). Simply
handling 12,000 writes per second (the peak rate for posting tweets) would be fairly easy.
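The harder part is the home timeline read path. One common approach, and the one Twitter moved to, is to fan out each new tweet into a cache of every follower's home timeline at write time. If, purely for illustration, each user is followed by about 75 people on average, then 4.6k tweets per second become roughly 4.6k × 75 ≈ 345k writes per second to home timeline caches, and users with millions of followers push the tail far higher. The distribution of followers per user, not just the raw request rate, is therefore the key load parameter for this workload.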

Fig 3.1 Load parameters

Maintainability
It is well known that the majority of the cost of software is not in its initial development, but
in its ongoing maintenance—fixing bugs, keeping its systems operational, investigating
failures, adapting it to new platforms, modifying it for new use cases, repaying technical debt,
and adding new features.
Yet, unfortunately, many people working on software systems dislike maintenance of so-
called legacy systems—perhaps it involves fixing other people’s mistakes, or working with
platforms that are now outdated, or systems that were forced to do things they were never
intended for. Every legacy system is unpleasant in its own way, and so it is difficult to give
general recommendations for dealing with them. However, we can and should design software
in such a way that it will hopefully minimize pain during maintenance, and thus avoid
creating legacy software ourselves. To this end, we will pay particular attention to three
design principles for software systems:
Operability: Make it easy for operations teams to keep the system running smoothly.
Simplicity: Make it easy for new engineers to understand the system, by removing as much complexity as possible.
Evolvability: Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change.

3.2 Design and Architecture


In-memory databases are designed to leverage the speed and efficiency of random access
memory (RAM) while ensuring data consistency and durability. Their structure emphasizes
minimal latency, efficient data access, and optimized resource utilization. Below is an outline
of their architecture:

 Core Components:
o Data Engine: Handles query execution and transaction management with
focus on low-latency operations.
o Memory Storage: Acts as the primary data store for active datasets, avoiding
disk I/O bottlenecks.
o Persistence Layer: Ensures durability through mechanisms like snapshots,
append-only files, and write-ahead logging (WAL); a minimal WAL sketch follows this list.
o Indexing Structures: Advanced in-memory indexes, such as hash tables, AVL
trees, or B-trees, enable faster lookups and updates.
 Key Features of the Architecture:
o Data is fully or partially stored in RAM.
o Concurrent access is managed through efficient locking mechanisms or multi-
version concurrency control (MVCC).
o Redundant copies or replicas are maintained for fault tolerance and scalability.
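To make the persistence layer above concrete, the following sketch shows one minimal way of pairing an in-memory map with a write-ahead log in Go: every SET is appended and flushed to a log file before the map is updated, so the map can be rebuilt by replaying the log after a restart. The file name and record format here are illustrative assumptions, not the project's actual on-disk layout.

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
    "sync"
)

// Store is a tiny in-memory key-value store backed by a write-ahead log.
type Store struct {
    mu   sync.Mutex
    data map[string]string
    wal  *os.File
}

// Open replays an existing log (if any) into memory, then keeps the file
// open for appending subsequent writes.
func Open(path string) (*Store, error) {
    f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_APPEND, 0o644)
    if err != nil {
        return nil, err
    }
    s := &Store{data: make(map[string]string), wal: f}

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        // Each log record is "SET<TAB>key<TAB>value".
        parts := strings.SplitN(scanner.Text(), "\t", 3)
        if len(parts) == 3 && parts[0] == "SET" {
            s.data[parts[1]] = parts[2]
        }
    }
    return s, scanner.Err()
}

// Set appends the write to the log, flushes it, and only then updates RAM.
func (s *Store) Set(key, value string) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    if _, err := fmt.Fprintf(s.wal, "SET\t%s\t%s\n", key, value); err != nil {
        return err
    }
    if err := s.wal.Sync(); err != nil { // durability before acknowledging
        return err
    }
    s.data[key] = value
    return nil
}

// Get reads straight from memory; the log is never consulted on reads.
func (s *Store) Get(key string) (string, bool) {
    s.mu.Lock()
    defer s.mu.Unlock()
    v, ok := s.data[key]
    return v, ok
}

func main() {
    s, err := Open("store.wal")
    if err != nil {
        panic(err)
    }
    s.Set("language", "Go")
    v, _ := s.Get("language")
    fmt.Println(v) // survives a restart because Open replays the log
}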
Hash Indexes

Key-value stores are quite similar to the dictionary type that you can find in most
programming languages, and which is usually implemented as a hash map (hash table).
Let’s say our data storage consists only of appending to a file, as in the preceding
example. Then the simplest possible indexing strategy is this: keep an in-memory hash
map where every key is mapped to a byte offset in the data file—the location at which the
value can be found, as illustrated in Figure 3-1. Whenever you append a new key-value
pair to the file, you also update the hash map to reflect the offset of the data you just wrote
(this works both for inserting new keys and for updating existing keys).
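A minimal sketch of this indexing strategy in Go is shown below; for brevity the append-only log is held as an in-memory byte slice rather than a file, but the hash map of key-to-offset works the same way.

package main

import (
    "encoding/binary"
    "fmt"
)

// logStore appends records to a log and keeps a hash index mapping each key
// to the byte offset of its most recent value.
type logStore struct {
    log   []byte
    index map[string]int // key -> offset of the record in log
}

func newLogStore() *logStore {
    return &logStore{index: make(map[string]int)}
}

// A record is encoded as: 4-byte key length, 4-byte value length, key, value.
func (s *logStore) set(key, value string) {
    offset := len(s.log)
    var hdr [8]byte
    binary.BigEndian.PutUint32(hdr[0:4], uint32(len(key)))
    binary.BigEndian.PutUint32(hdr[4:8], uint32(len(value)))
    s.log = append(s.log, hdr[:]...)
    s.log = append(s.log, key...)
    s.log = append(s.log, value...)
    // Updating the index makes the new record the one future reads will see;
    // the old record stays in the log until compaction.
    s.index[key] = offset
}

func (s *logStore) get(key string) (string, bool) {
    offset, ok := s.index[key]
    if !ok {
        return "", false
    }
    keyLen := int(binary.BigEndian.Uint32(s.log[offset : offset+4]))
    valLen := int(binary.BigEndian.Uint32(s.log[offset+4 : offset+8]))
    start := offset + 8 + keyLen
    return string(s.log[start : start+valLen]), true
}

func main() {
    s := newLogStore()
    s.set("cat-video-42", "12")
    s.set("cat-video-42", "13") // supersedes the previous record
    v, _ := s.get("cat-video-42")
    fmt.Println(v) // prints 13, read via the offset stored in the hash index
}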

Fig 3.2 Compaction of a key-value update log (counting the number of times each cat video
was played), retaining only the most recent value for each key.

Segments are never modified after they have been written, so the merged segment is written
to a new file. The merging and compaction of frozen segments can be done in a background
thread, and while it is going on, we can still continue to serve read and write requests as
normal, using the old segment files. After the merging process is complete, we switch read
requests to using the new merged segment instead of the old segments, and then the old
segment files can simply be deleted.
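Compaction itself can be sketched in a few lines: replay the old records (here simply a slice, standing in for a frozen segment) and keep only the most recent value for each key, then write the survivors out as a new segment. Real systems additionally merge several segments at once in a background thread, as described above.

package main

import "fmt"

// record stands in for one key-value entry in a frozen log segment.
type record struct{ key, value string }

// compactSegment keeps only the most recent value for each key, preserving
// the order in which surviving keys were first seen.
func compactSegment(old []record) []record {
    latest := make(map[string]string)
    var order []string
    for _, r := range old {
        if _, seen := latest[r.key]; !seen {
            order = append(order, r.key)
        }
        latest[r.key] = r.value // later records overwrite earlier ones
    }
    compacted := make([]record, 0, len(order))
    for _, k := range order {
        compacted = append(compacted, record{k, latest[k]})
    }
    return compacted
}

func main() {
    segment := []record{
        {"mew", "1078"}, {"purr", "2103"}, {"purr", "2104"}, {"mew", "1079"},
    }
    fmt.Println(compactSegment(segment)) // [{mew 1079} {purr 2104}]
}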

SSTables and LSM-Trees

Each log-structured storage segment is a sequence of key-value pairs. These pairs appear
in the order that they were written, and values later in the log take precedence over values
for the same key earlier in the log. Apart from that, the order of key-value pairs in the file
does not matter.

Now we can make a simple change to the format of our segment files: we require that the
sequence of key-value pairs is sorted by key. At first glance, that requirement seems to
break our ability to use sequential writes, but we’ll get to that in a moment.

Fig 3.4 An SSTable with an in-memory index

In order to find a particular key in the file, you no longer need to keep an index of all the
keys in memory. See Fig 3.4 for an example: say you're looking for the key
handiwork, but you don't know the exact offset of that key in the segment file. However,
you do know the offsets for the keys handbag and handsome, and because of the sorting
you know that handiwork must appear between those two. This means you can jump to the
offset for handbag and scan from there until you find handiwork (or not, if the key is not
present in the file).
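A sketch of that lookup in Go, assuming a sparse in-memory index that holds only a few of the sorted keys together with their positions (a file offset in a real SSTable):

package main

import (
    "fmt"
    "sort"
)

// entry is one key-value pair in the sorted segment.
type entry struct {
    key, value string
}

// indexEntry records where some of the keys start within the segment.
type indexEntry struct {
    key string
    pos int // position in the segment (a file offset in a real SSTable)
}

// lookup finds the last indexed key <= target, then scans forward from there.
func lookup(segment []entry, index []indexEntry, target string) (string, bool) {
    // Binary search the sparse index for the scan start position.
    i := sort.Search(len(index), func(i int) bool { return index[i].key > target })
    start := 0
    if i > 0 {
        start = index[i-1].pos
    }
    // Keys are sorted, so we can stop as soon as we pass the target.
    for j := start; j < len(segment) && segment[j].key <= target; j++ {
        if segment[j].key == target {
            return segment[j].value, true
        }
    }
    return "", false
}

func main() {
    segment := []entry{
        {"handbag", "8786"}, {"handcuffs", "2729"}, {"handful", "44662"},
        {"handicap", "70836"}, {"handiwork", "45521"}, {"handkerchief", "20952"},
        {"handprinted", "33632"}, {"handsome", "86478"},
    }
    // Only some keys are kept in memory, e.g. every few entries.
    index := []indexEntry{{"handbag", 0}, {"handful", 2}, {"handsome", 7}}

    v, ok := lookup(segment, index, "handiwork")
    fmt.Println(v, ok) // 45521 true
}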

B-Trees

Like SSTables, B-trees keep key-value pairs sorted by key, which allows efficient key
value lookups and range queries. But that’s where the similarity ends: B-trees have a very
different design philosophy. The log-structured indexes we saw earlier break the database
down into variable-size segments, typically several megabytes or more in size, and always
write a segment sequentially. By contrast, B-trees break the database down into fixed-size
blocks or pages, traditionally 4 KB in size (sometimes bigger), and read or write one page
at a time. This design corresponds more closely to the underlying hardware, as disks are
also arranged in fixed-size blocks.

Fig 3.5 Growing a B-tree by splitting a page.

If you want to update the value for an existing key in a B-tree, you search for the leaf page
containing that key, change the value in that page, and write the page back to disk (any
references to that page remain valid). If you want to add a new key, you need to find the
page whose range encompasses the new key and add it to that page. If there isn’t enough
free space in the page to accommodate the new key, it is split into two half-full pages, and
the parent page is updated to account for the new subdivision of key ranges.

3.3 Data Distribution Techniques

REPLICATION
Replication means keeping a copy of the same data on multiple machines that are connected
via a network.
There are several reasons why you might want to replicate data:

 To keep data geographically close to your users (and thus reduce latency)

 To allow the system to continue working even if some of its parts have failed (and thus increase availability)

 To scale out the number of machines that can serve read queries (and thus increase read throughput)

Each node that stores a copy of the database is called a replica. With multiple replicas, a
question inevitably arises: how do we ensure that all the data ends up on all the replicas?
Every write to the database needs to be processed by every replica; otherwise, the replicas
would no longer contain the same data. The most common solution for this is called leader-
based replication (also known as active/passive or master–slave replication) and is illustrated
in Fig 3.6 below. It works as follows:

1. One of the replicas is designated the leader (also known as master or primary). When
clients want to write to the database, they must send their requests to the leader, which
first writes the new data to its local storage.

2. The other replicas are known as followers (read replicas, slaves, secondaries, or hot
standbys). Whenever the leader writes new data to its local storage, it also sends the
data change to all of its followers as part of a replication log or change stream. Each
follower takes the log from the leader and updates its local copy of the database
accordingly, by applying all writes in the same order as they were processed on the
leader.

3. When a client wants to read from the database, it can query either the leader or any of
the followers. However, writes are only accepted on the leader (the followers are
read-only from the client's point of view).
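The project's own propagateCommand function (shown in Chapter 4) follows this pattern when forwarding SET commands to replicas. As a simplified, self-contained illustration, the sketch below lets a leader apply each write locally and then push it onto a per-follower channel, while a follower goroutine applies the writes in the same order:

package main

import (
    "fmt"
    "sync"
)

type write struct{ key, value string }

// replica is a node holding its own copy of the data.
type replica struct {
    mu   sync.Mutex
    data map[string]string
}

func newReplica() *replica { return &replica{data: make(map[string]string)} }

func (r *replica) apply(w write) {
    r.mu.Lock()
    r.data[w.key] = w.value
    r.mu.Unlock()
}

// leader is a replica that also forwards every write to its followers.
type leader struct {
    *replica
    followers []chan write // one replication stream per follower
}

// set applies the write locally first, then forwards it to every follower,
// preserving the order in which writes were processed on the leader.
func (l *leader) set(key, value string) {
    w := write{key, value}
    l.apply(w)
    for _, ch := range l.followers {
        ch <- w
    }
}

func main() {
    follower := newReplica()
    stream := make(chan write, 16)
    var wg sync.WaitGroup
    wg.Add(1)
    go func() { // follower applies the replication log in order
        defer wg.Done()
        for w := range stream {
            follower.apply(w)
        }
    }()

    l := &leader{replica: newReplica(), followers: []chan write{stream}}
    l.set("user:1", "alice")
    l.set("user:1", "bob")

    close(stream)
    wg.Wait()
    fmt.Println(follower.data["user:1"]) // bob: follower converged with leader
}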
Fig 3.6 Leader-based (master–slave) replication.

This mode of replication is a built-in feature of many relational databases, such as
PostgreSQL (since version 9.0), MySQL, Oracle Data Guard [2], and SQL Server's
AlwaysOn Availability Groups [3]. It is also used in some nonrelational databases, including
MongoDB, RethinkDB, and Espresso [4]. Finally, leader-based replication is not restricted to
only databases: distributed message brokers such as Kafka [5] and RabbitMQ highly available
queues [6] also use it. Some network filesystems and replicated block devices such as DRBD
are similar.

PARTITIONING

Partitioning is usually combined with replication so that copies of each partition are stored on
multiple nodes. This means that, even though each record belongs to exactly one partition, it
may still be stored on several different nodes for fault tolerance. A node may store more than
one partition. If a leader–follower replication model is used, the combination of partitioning
and replication can look like the figure below. Each partition's leader is assigned to one node,
and its followers are assigned to other nodes. Each node may be the leader for some partitions
and a follower for other partitions.

Fig 3.7 Combining replication and partitioning: each node acts as leader for some
partitions and follower for other partitions.

Our goal with partitioning is to spread the data and the query load evenly across nodes. If
every node takes a fair share, then—in theory—10 nodes should be able to handle 10
times as much data and 10 times the read and write throughput of a single node (ignoring
replication for now). If the partitioning is unfair, so that some partitions have more data or
queries than others, we call it skewed. The presence of skew makes partitioning much less
effective. In an extreme case, all the load could end up on one partition, so 9 out of 10
nodes are idle and your bottleneck is the single busy node. A partition with disproportionately
high load is called a hot spot.

We discussed two main approaches to partitioning:

• Key range partitioning, where keys are sorted, and a partition owns all the keys from some
minimum up to some maximum. Sorting has the advantage that efficient range queries are
possible, but there is a risk of hot spots if the application often accesses keys that are close
together in the sorted order. In this approach, partitions are typically rebalanced
dynamically by splitting the range into two subranges when a partition gets too big.

• Hash partitioning, where a hash function is applied to each key, and a partition owns a
range of hashes. This method destroys the ordering of keys, making range queries
inefficient, but may distribute load more evenly.
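Both approaches boil down to a function from key to partition. The sketch below contrasts them under simplified assumptions: key-range partitioning against a fixed set of boundary keys, and hash partitioning using FNV-1a over a fixed number of partitions.

package main

import (
    "fmt"
    "hash/fnv"
    "sort"
)

// rangePartition assigns a key to one of len(boundaries)+1 range partitions:
// the partition index is the number of boundary keys that are <= key, so
// range scans over adjacent keys stay within one or a few partitions.
func rangePartition(key string, boundaries []string) int {
    return sort.Search(len(boundaries), func(i int) bool { return boundaries[i] > key })
}

// hashPartition maps a key to one of n partitions via FNV-1a; range queries
// lose locality, but keys tend to spread more evenly.
func hashPartition(key string, n int) int {
    h := fnv.New32a()
    h.Write([]byte(key))
    return int(h.Sum32() % uint32(n))
}

func main() {
    boundaries := []string{"g", "n", "t"} // defines 4 range partitions

    for _, k := range []string{"apple", "handbag", "redis", "zebra"} {
        fmt.Printf("%-8s range=%d hash=%d\n",
            k, rangePartition(k, boundaries), hashPartition(k, 4))
    }
}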

TRANSACTIONS

A transaction is a way for an application to group several reads and writes together into a
logical unit. Conceptually, all the reads and writes in a transaction are executed as one
operation: either the entire transaction succeeds (commit) or it fails (abort, rollback). If it fails,
the application can safely retry. With transactions, error handling becomes much simpler for
an application, because it doesn’t need to worry about partial failure—i.e., the case where
some operations succeed and some fail (for whatever reason).
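The project's MULTI, EXEC, and DISCARD commands (Chapter 4) expose this model to clients. As a simplified illustration of the idea, the sketch below queues commands issued inside a transaction and applies the whole queue under a single lock on EXEC, so any operation that takes the same lock sees either none or all of the transaction's writes; DISCARD simply drops the queue.

package main

import (
    "fmt"
    "sync"
)

type command struct{ key, value string }

type db struct {
    mu   sync.Mutex
    data map[string]string
}

// transaction queues writes until Exec applies them all at once.
type transaction struct {
    db    *db
    queue []command
}

func (d *db) Multi() *transaction { return &transaction{db: d} }

func (t *transaction) Set(key, value string) {
    t.queue = append(t.queue, command{key, value}) // queued, not yet applied
}

// Exec applies every queued command while holding the lock, so the writes
// become visible together rather than one at a time.
func (t *transaction) Exec() {
    t.db.mu.Lock()
    defer t.db.mu.Unlock()
    for _, c := range t.queue {
        t.db.data[c.key] = c.value
    }
    t.queue = nil
}

// Discard drops the queued commands without touching the database.
func (t *transaction) Discard() { t.queue = nil }

func main() {
    d := &db{data: make(map[string]string)}

    tx := d.Multi()
    tx.Set("balance:alice", "90")
    tx.Set("balance:bob", "110")
    tx.Exec() // both updates become visible together

    fmt.Println(d.data["balance:alice"], d.data["balance:bob"]) // 90 110
}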

1. Dirty reads

One client reads another client’s writes before they have been committed. The read
committed isolation level and stronger levels prevent dirty reads.

2. Dirty writes

One client overwrites data that another client has written, but not yet committed. Almost
all transaction implementations prevent dirty writes.

3. Read skew (nonrepeatable reads)

A client sees different parts of the database at different points in time. This issue is most
commonly prevented with snapshot isolation, which allows a transaction to read from a
consistent snapshot at one point in time. It is usually implemented with multi-version
concurrency control (MVCC).

4. Lost updates

Two clients concurrently perform a read-modify-write cycle. One overwrites the other’s
write without incorporating its changes, so data is lost. Some implementations of snapshot
isolation prevent this anomaly automatically, while others require a manual lock (SELECT
FOR UPDATE).

5. Write skew

A transaction reads something, makes a decision based on the value it saw, and writes the
decision to the database. However, by the time the write is made, the premise of the
decision is no longer true. Only serializable isolation prevents this anomaly.

6. Phantom reads

A transaction reads objects that match some search condition. Another client makes a
write that affects the results of that search. Snapshot isolation prevents straightforward
phantom reads, but phantoms in the context of write skew require special treatment, such
as index-range locks.

Chapter 4
RESULTS & DISCUSSION

4.1 Implementation of Project

This project is a simple implementation of a Redis server in Go. It is designed to support
a variety of basic Redis commands, providing a lightweight alternative to the full Redis
server for learning and experimentation purposes.

Running the Server

To run the server, execute the following command:

./start_server.sh

This script will compile the Go code and start the Redis server.

Supported Commands

The server currently supports the following Redis commands:

Basic Commands

- `PING`: Returns PONG.

- `ECHO <message>`: Returns the input string.

- `SET <key> <value>`: Sets a key to a value.

- `GET <key>`: Gets the value of a key.

- `INCR <key>`: Increments the integer value of a key.

- `INFO`: Returns information about the server.

- `KEYS <pattern>`: Returns all keys matching a pattern.

- `TYPE <key>`: Returns the type of a key.

Stream Commands

- `XADD <stream> <id> <field> <value>`: Adds a message to a stream.

- `XRANGE <stream> <start> <end>`: Gets a range of messages from a stream.

- `XREAD STREAMS <stream> <id>`: Reads messages from a stream.

Transaction Commands

- `MULTI`: Starts a transaction.

- `EXEC`: Executes a transaction.

- `DISCARD`: Discards a transaction.

Server Configuration Commands

- `REPLCONF <option> <value>`: Configures replication.

- `PSYNC <replicaid> <offset>`: Partial synchronization.

- `WAIT <numreplicas> <timeout>`: Blocks until the specified number of replicas acknowledge the write.

1.app.go

 Description:
This is the entry point for the application. It initializes the server, sets up the
supported commands, and starts the application.

 Key Responsibilities:

o Configuring the server instance.

o Managing application lifecycle (start, stop, etc.).

o Setting up logging, middleware, or dependency injection.

 Example Features:

o Reading configuration files.

o Initializing core components like the in-memory database and handlers.

2.handler.go

 Description:
Contains the logic to handle user commands or API requests. Handlers act as
intermediaries, parsing user inputs and invoking the appropriate database operations.

 Key Responsibilities:

o Processing commands like GET, SET, DEL.

o Validating inputs before passing them to the database layer.

o Returning results or errors to the client.

package main

import (
"bytes"
"fmt"
"io"
"math"
"strconv"
"strings"
"time"

radix "server/radix"
)

// Handler entry point --------------------------------------------------------
func (s *Server) Handler(parsedResp *RESP, conn *ConnRW) (resp []*RESP) {
switch parsedResp.Type {
case ERROR, INTEGER, BULK, STRING:
return []*RESP{{Type: ERROR, Value: "Response type " +
parsedResp.Value + " handle not yet implemented"}}
case ARRAY:
return s.handleArray(parsedResp, conn)
case RDB:
return []*RESP{s.decodeRDB(NewBuffer(bytes.NewReader([]byte(parsedResp.Value))))}
default:
return []*RESP{{Type: ERROR, Value: "Response type " +
parsedResp.Value + " not recognized"}}
}
}

func (s *Server) handleArray(resp *RESP, conn *ConnRW) []*RESP {


command, args := resp.getCmdAndArgs()
switch command {
case "PING":
return []*RESP{ping(args)}
case "ECHO":
return []*RESP{echo(args)}
case "SET":
s.propagateCommand(resp)
return []*RESP{s.set(args)}
case "GET":
return []*RESP{s.get(args)}
case "XADD":
return []*RESP{s.xadd(args)}
case "XRANGE":
return []*RESP{s.xrange(args)}
case "XREAD":
go func() {
result := s.xread(args)
Write(conn.Writer, result)
}()
return []*RESP{}
case "INCR":
return []*RESP{s.incr(args)}
case "INFO":
return []*RESP{info(args, s.Role.String(), s.MasterReplid,
s.MasterReplOffset)}
case "REPLCONF":
s.replConfig(args, conn)
return []*RESP{}
case "PSYNC":
conn.Type = REPLICA
s.ReplicaCount++
go s.checkOnReplica(conn, false)
return []*RESP{psync(s.MasterReplid, s.MasterReplOffset), getRDB()}
case "WAIT":
return []*RESP{s.wait(args)}
case "KEYS":
return []*RESP{s.keys(args)}
case "TYPE":
return []*RESP{s.typecmd(args)}
case "MULTI":
go func() {
s.multi(conn)
}()
return []*RESP{OkResp()}
case "EXEC":
return []*RESP{s.exec(conn)}
case "DISCARD":
return []*RESP{s.discard()}
case "CONFIG":
return []*RESP{s.config(args)}
case "COMMAND":
return []*RESP{commandFunc()}
default:
return []*RESP{{Type: ERROR, Value: "Unknown command " + command}}
}
}

func (s *Server) propagateCommand(resp *RESP) {


for _, conn := range s.Conns {
if conn.Type != REPLICA {
continue
}
marshaled := resp.Marshal()
s.MasterReplOffset += len(marshaled)
Write(conn.Writer, marshaled)
}
}

func (s *Server) checkOnReplica(conn *ConnRW, featureOn bool) {


if !featureOn {
return
}
getAckResp := GetAckResp().Marshal()
n := len(getAckResp)
for {
time.Sleep(5 * time.Second)
fmt.Println("Checking On Replica")
s.MasterReplOffset += n
Write(conn.Writer, getAckResp)
}
}

// -----------------------------------------------------------------------------
// General commands -------------------------------------------------------------
func commandFunc() *RESP {
return &RESP{Type: NULL, Value: "Command"}
}

func ping(args []*RESP) *RESP {


if len(args) == 0 {
return &RESP{Type: STRING, Value: "PONG"}
}
return &RESP{Type: STRING, Value: args[0].Value}
}

func echo(args []*RESP) *RESP {


if len(args) == 0 {
return &RESP{Type: STRING, Value: ""}
}
return &RESP{Type: STRING, Value: args[0].Value}
}

func info(args []*RESP, role, mrid string, mros int) *RESP {


if len(args) != 1 {
return NullResp()
}
switch args[0].Value {
case "replication":
return &RESP{
Type: BULK,
Value: "# Replication\n" +
"role:" + role + "\n" +
"master_replid:" + mrid + "\n" +
"master_repl_offset:" + strconv.Itoa(mros) + "\n",
}
default:
return NullResp()
}
}

// -----------------------------------------------------------------------------

// Server specific commands ------------------------------------------------------
func (s *Server) decodeRDB(buf *Buffer) *RESP {
data := buf.reader

// Header section
header := make([]byte, 9)
_, err := io.ReadFull(data, header)
if err != nil {
return ErrResp("Error reading RDB header")
}

if string(header[:5]) != "REDIS" {
return ErrResp("Invalid RDB file")
}

// version := string(header[5:])
// if version != "0007" {
// return ErrResp("Invalid RDB version")
// }

// Metadata section
for {
fa, err := data.ReadByte()
if err != nil {
return ErrResp("Error reading metadata section")
}
if fa != 0xfa {
data.UnreadByte()
break
}

// Metadata Key
_, err = decodeString(data)
if err != nil {
return ErrResp("Error reading metadata section")
}
// Metadata Value
_, err = decodeString(data)
if err != nil {
return ErrResp("Error reading metadata section")
}
}

for {
byt, _ := data.Peek(1)
if byt[0] == 0xff {
break
}
// Database section - 0xfe
data.ReadByte()
// This byte is the database index
// TODO - Implement support for multiple databases
decodeSize(data)

fb, err := data.ReadByte()


if err != nil || fb != 0xfb {
return ErrResp("Error reading database section")
}

dbsize, err := decodeSize(data)


if err != nil {
return ErrResp("Error reading database section")
}

// Expiry size
_, err = decodeSize(data)
if err != nil {
return ErrResp("Error reading database section")
}

// Iterate over keys


for i := 0; i < dbsize; i++ {
// Expiry
expiryTime, err := dedodeTime(data)
if err != nil {
return ErrResp("Error reading expiry")
}

// This byte is the key type


// TODO - Implement support for different key types
data.ReadByte()

// Key
key, err := decodeString(data)
if err != nil {
return ErrResp("Error reading key")
}

// Value
value, err := decodeString(data)
if err != nil {
return ErrResp("Error reading value")
}

s.SETsMu.Lock()
s.SETs[string(key)] = string(value)
if expiryTime > 0 {
s.EXPs[string(key)] = expiryTime
fmt.Println("Key: ", key, "Value: ", value, "Expiry: ",
expiryTime)
}
s.SETsMu.Unlock()
}

next, _ := data.Peek(1)
if next[0] == 0xff {
break
}
}
return OkResp()
}

func (s *Server) keys(args []*RESP) *RESP {


if len(args) != 1 {
return &RESP{Type: ERROR, Value: "ERR wrong number of arguments for 'keys' command"}
}

pattern := args[0].Value
keys := []string{}

if pattern == "*" {
s.SETsMu.Lock()
for k := range s.SETs {
keys = append(keys, k)
}
s.SETsMu.Unlock()
} else {
s.SETsMu.Lock()
for k := range s.SETs {
if strings.Contains(k, pattern) {
keys = append(keys, k)
}
}
s.SETsMu.Unlock()
}

return &RESP{
Type: ARRAY,
Values: ToRespArray(keys),
}
}
func (s *Server) set(args []*RESP) *RESP {
if !(len(args) == 2 || len(args) == 4) {
return &RESP{Type: ERROR, Value: "ERR wrong number of arguments for 'set' command"}
}
s.NeedAcks = true
var length int
if len(args) > 2 {
if strings.ToLower(args[2].Value) != "px" {
return &RESP{Type: ERROR, Value: "ERR syntax error"}
}

l, err := strconv.Atoi(args[3].Value)
if err != nil {
return &RESP{Type: ERROR, Value: "ERR value is not an integer or out of range"}
}
length = l
}

key, value := args[0].Value, args[1].Value

s.SETsMu.Lock()
s.SETs[key] = value
if length > 0 {
// Set expiry time in milliseconds
s.EXPs[key] = time.Now().Add(time.Duration(length) *
time.Millisecond).UnixMilli()
}
s.SETsMu.Unlock()

return OkResp()
}

func (s *Server) get(args []*RESP) *RESP {


if len(args) != 1 {
return &RESP{Type: ERROR, Value: "ERR wrong number of arguments for 'get' command"}
}

key := args[0].Value

s.SETsMu.Lock()
value, ok := s.SETs[key]
if exp, ok := s.EXPs[key]; ok {
expTime := time.UnixMilli(exp)
if time.Now().After(expTime) {
delete(s.SETs, key)
delete(s.EXPs, key)
s.SETsMu.Unlock()
return NullResp()
}
}
s.SETsMu.Unlock()

if !ok {
return NullResp()
}

return &RESP{Type: STRING, Value: value}


}
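
// Note on expiry (illustrative): SET key value PX 100 stores an absolute
// deadline, time.Now().Add(100*time.Millisecond).UnixMilli(), in s.EXPs.
// get() then compares the current time against that deadline and lazily
// deletes the key once it has passed, returning a null response instead of
// a stale value.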

func (s *Server) xadd(args []*RESP) *RESP {


if len(args) < 2 {
return &RESP{Type: ERROR, Value: "ERR wrong number of arguments for 'xadd' command"}
}

streamKey := args[0].Value
stream, ok := s.XADDs[streamKey]
if !ok {
s.XADDsMu.Lock()
stream = radix.NewRadix()
s.XADDs[streamKey] = stream
stream.Insert("0-0", &StreamTop{Time: 0, Seq: 0})
s.XADDsMu.Unlock()
}

id := args[1].Value
time, seq, err := validateEntryID(stream, id)
if err != nil {
return ErrResp(err.Error())
}

entries := []*StreamKV{}
for i := 2; i < len(args); i += 2 {
entries = append(entries, &StreamKV{Key: args[i].Value, Value:
args[i+1].Value})
}

timeStr := intToStr(time) + "-" + intToStr(seq)


streamEntry := &StreamEntry{Seq: seq, Entries: entries}
stream.Insert(timeStr, streamEntry)
stream.Insert("0-0", &StreamTop{Time: time, Seq: seq})

if s.XREADsBlock {
s.XADDsCh <- false
}

return &RESP{Type: BULK, Value: timeStr}


}

func (s *Server) xrange(args []*RESP) *RESP {


if len(args) < 3 {
return &RESP{Type: ERROR, Value: "ERR wrong number of arguments for 'xrange' command"}
}

streamKey := args[0].Value
stream, ok := s.XADDs[streamKey]
if !ok {
return ErrResp("ERR stream not found")
}

// st = starttime, ss = startseq
st, ss, err := splitEntryId(args[1].Value)
if err != nil {
return ErrResp(err.Error())
}
if st == math.MinInt64 {
key, _, _ := stream.GetFirst()
st, ss, _ = splitEntryId(key)
}
// et = endtime, es = endseq
et, es, err := splitEntryId(args[2].Value)
if err != nil {
return ErrResp(err.Error())
}
if et == math.MaxInt64 {
key, _, _ := stream.GetLast()
et, es, _ = splitEntryId(key)
}

entries := []*RESP{}

for t := st; t <= et; t++ {


tStr := intToStr(t)
sEntries := stream.FindAll(tStr)
for _, e := range sEntries {
switch entry := e.(type) {
case *StreamEntry:
if t > st || t < et ||
(t == st && entry.Seq >= ss && t == et && entry.Seq <= es) {
outter := []*RESP{}
outter = append(outter, SimpleString(tStr+"-"+intToStr(entry.Seq)))
inner := make([]string, 0, len(entry.Entries)*2)
for _, en := range entry.Entries {
inner = append(inner, en.Key)
inner = append(inner, en.Value)
}
outter = append(outter, &RESP{Type: ARRAY, Values: ToRespArray(inner)})
entries = append(entries, &RESP{Type: ARRAY, Values: outter})
}
default:
continue
}
}
}

return &RESP{Type: ARRAY, Values: entries}


}

func (s *Server) xread(args []*RESP) *RESP {


if len(args) < 3 {
return &RESP{Type: ERROR, Value: "ERR wrong number of arguments for 'xread' command"}
}

blockTime := -1
if strings.ToUpper(args[0].Value) == "BLOCK" {
t, err := strconv.Atoi(args[1].Value)
if err != nil {
return ErrResp("ERR block time is not an integer or out of
range")
}
blockTime = t
args = args[2:]
}

if blockTime > 0 {
time.Sleep(time.Duration(blockTime) * time.Millisecond)
} else if blockTime == 0 {
s.XREADsBlock = true
s.XREADsBlock = <-s.XADDsCh
}

if args[0].Value != "streams" {
return &RESP{Type: ERROR, Value: "ERR can only read streams at the moment"}
}

args = args[1:]
if len(args)%2 != 0 {
return ErrResp("Err wrong number of arguments for 'xread' command")
}

readLen := len(args) / 2

streamLst := []*RESP{}

for i := 0; i < readLen; i++ {


streamKey := args[i].Value
stream, ok := s.XADDs[streamKey]
if !ok {
return ErrResp("ERR stream not found")
}

start := args[i+readLen].Value
if start == "$" {
start, _, _ = stream.GetLast()
} else {
start, _, ok = stream.GetNext(start)
if !ok {
return NullResp()
}
}
// st = starttime, ss = startseq
st, ss, err := splitEntryId(start)
if err != nil {
return ErrResp(err.Error())
}

// et == endtime, es = endseq
last, _, _ := stream.GetLast()
et, es, _ := splitEntryId(last)

entryLst := []*RESP{BulkString(streamKey)}

for t := st; t <= et; t++ {


tLst := []*RESP{}
tStr := intToStr(t)
sEntries := stream.FindAll(tStr)
for _, e := range sEntries {
switch entry := e.(type) {
case *StreamEntry:
if t > st || t < et ||
(t == st && entry.Seq >= ss && t == et && entry.Seq <= es) {
idLst := []*RESP{BulkString(tStr + "-" + intToStr(entry.Seq))}
kvLst := make([]string, 0, len(entry.Entries)*2)
for _, en := range entry.Entries {
kvLst = append(kvLst, en.Key)
kvLst = append(kvLst, en.Value)
}
idLst = append(idLst, &RESP{Type: ARRAY, Values: ToRespArray(kvLst)})
tLst = append(tLst, &RESP{Type: ARRAY, Values: idLst})
}
default:
continue
}
}
entryLst = append(entryLst, &RESP{Type: ARRAY, Values: tLst})
}
streamLst = append(streamLst, &RESP{Type: ARRAY, Values: entryLst})
}

return &RESP{Type: ARRAY, Values: streamLst}


}

func (s *Server) incr(args []*RESP) *RESP {


if len(args) != 1 {
return ErrResp("ERR wrong number of arguments for 'incr' command")
}
key := args[0].Value
s.SETsMu.Lock()
defer s.SETsMu.Unlock()
if val, ok := s.SETs[key]; ok {
val, err := strconv.ParseInt(val, 10, 64)
if err != nil {
return ErrResp("ERR value is not an integer or out of range")
}
s.SETs[key] = intToStr(val + 1)
return Integer(val + 1)
} else {
s.SETs[key] = "1"
return Integer(1)
}
}

func (s *Server) replConfig(args []*RESP, conn *ConnRW) (resp *RESP) {


if len(args) != 2 {
return &RESP{Type: ERROR, Value: "ERR wrong number of arguments for 'replconf' command"}
}

if strings.ToUpper(args[0].Value) == "GETACK" && args[1].Value == "*" {


// Replica received REPLCONF GETACK * -> Send ACK <offset> to master
resp = &RESP{
Type: ARRAY,
Values: []*RESP{
{Type: BULK, Value: "REPLCONF"},
{Type: BULK, Value: "ACK"},
{Type: BULK, Value: strconv.Itoa(s.MasterReplOffset)},
},
}
fmt.Println("Response: ", resp)
Write(conn.Writer, resp)
} else if strings.ToUpper(args[0].Value) == "ACK" {
// Master received REPLCONF ACK <offset> from replica -> Read <offset> from replica
resp = &RESP{
Type: INTEGER,
Value: args[1].Value,
}
} else {
// Master received REPLCONF listening-port <port> or REPLCONF capa psync2 from replica -> Do nothing
resp = OkResp()
Write(conn.Writer, resp)
}
return resp
}

func (s *Server) wait(args []*RESP) *RESP {


if !s.NeedAcks {
return &RESP{Type: INTEGER, Value: strconv.Itoa(s.ReplicaCount)}
}
getAck := GetAckResp().Marshal()
defer func() {
s.MasterReplOffset += len(getAck)
s.RedirectRead = false
s.NeedAcks = false
fmt.Println("")
}()

numReplicas, _ := strconv.Atoi(args[0].Value)
timeout, _ := strconv.Atoi(args[1].Value)

timeoutChan := time.After(time.Duration(timeout) * time.Millisecond)


acks := 0

s.RedirectRead = true
go func() {
for _, c := range s.Conns {
if c.Type != REPLICA {
continue
}
Write(c.Writer, getAck)
}
}()

for {
select {
case <-timeoutChan:
return &RESP{
Type: INTEGER,
Value: strconv.Itoa(acks),
}
default:
for _, c := range s.Conns {
if c.Type != REPLICA {
continue
}
select {
case parsedResp := <-c.Chan:
fmt.Println("Received ACK from replica")
_, args := parsedResp.getCmdAndArgs()
result := s.replConfig(args, c)
strconv.Atoi(result.Value)
// replOffset, _ := strconv.Atoi(result.Value)
// if replOffset == s.MasterReplOffset {
acks++
if acks == numReplicas {
return &RESP{
Type: INTEGER,
Value: strconv.Itoa(acks),
}
}
// }
case <-timeoutChan:
return &RESP{
Type: INTEGER,
Value: strconv.Itoa(acks),
}
default:
continue
}
}
}
}
}

func (s *Server) multi(conn *ConnRW) {


conn.RedirectRead = true
q := conn.TransactionsQueue
for {
resp := <-conn.Chan
if resp.IsExec() {
break
}
if resp.IsDiscard() {
q.Clear()
Write(conn.Writer, OkResp())
conn.RedirectRead = false
return
}
q.Enqueue(resp)
Write(conn.Writer, QueuedResp())
}
s.exec(conn)
}

func (s *Server) exec(conn *ConnRW) *RESP {


if !conn.RedirectRead {
return ErrResp("ERR EXEC without MULTI")
}
q := conn.TransactionsQueue
response := &RESP{Type: ARRAY, Values: []*RESP{}}

for !q.IsEmpty() {
resp, _ := q.Dequeue()
results := s.Handler(resp.(*RESP), conn)
response.Values = append(response.Values, results...)
}

Write(conn.Writer, response)
conn.RedirectRead = false

return OkResp()
}

func (s *Server) discard() *RESP {


return ErrResp("ERR DISCARD without MULTI")
}

func (s *Server) config(args []*RESP) *RESP {


if strings.ToUpper(args[0].Value) == "GET" {
if strings.ToLower(args[1].Value) == "dir" {
return &RESP{
Type: ARRAY,
Values: []*RESP{
{Type: STRING, Value: "dir"},
{Type: STRING, Value: s.Dir},
},
}
}
return &RESP{
Type: ARRAY,
Values: []*RESP{
{Type: STRING, Value: "dbfilename"},
{Type: STRING, Value: s.Dbfilename},
},
}
}
return &RESP{
Type: ERROR,
Value: "ERR unknown subcommand or wrong number of arguments",
}
}

func (s *Server) typecmd(args []*RESP) *RESP {


if len(args) == 0 {
return ErrResp("Err no key given to TYPE command")
}
if len(args) > 1 {
return ErrResp("Too many keys given to TYPE command")
}

key := args[0].Value
s.SETsMu.Lock()
_, ok := s.SETs[key]
s.SETsMu.Unlock()
if ok {
return SimpleString("string")
}

s.XADDsMu.Lock()
_, ok = s.XADDs[key]
s.XADDsMu.Unlock()
if ok {
return SimpleString("stream")
}

return SimpleString("none")
}

// ----------------------------------------------------------------------------

server.go

Description:
Implements the core server functionality, managing client connections, request routing, and execution.

Key Responsibilities:

o Starting and stopping the TCP server.

o Managing client sessions and concurrency.

o Routing parsed commands to appropriate handlers.

package main

import (
"errors"
"flag"
"fmt"
"net"
"os"
"strings"
"sync"
"time"

"math/rand/v2"

queue "server/queue"

radix "server/radix"
)

func (st ServerType) String() string {


switch st {
case MASTER:
return "master"
case REPLICA:
return "slave"
default:
return "unknown"
}
}

// ----------------------------------------------------------------------------

// Server creation ---------------------------------------------------------------
func NewServer(config *Config) (*Server, error) {
server := &Server{
Role: MASTER,
Port: config.Port,
MasterReplOffset: 0,
Conns: []*ConnRW{},
SETs: map[string]string{},
SETsMu: sync.RWMutex{},
EXPs: map[string]int64{},
XADDs: map[string]*radix.Radix{},
XADDsMu: sync.RWMutex{},
XADDsCh: make(chan bool, 1),
}

// Set server port number


l, err := net.Listen("tcp", "0.0.0.0:"+config.Port)
if err != nil {
fmt.Println("Failed to bind to port " + config.Port)
return nil, err
}
server.Listener = l

// Set server role, master host and master port


if config.IsReplica {
server.Role = REPLICA
server.MasterHost = config.MasterHost
server.MasterPort = config.MasterPort
}

// Set server repl id and repl offset


server.MasterReplid = RandStringBytes(40)

// Set Dir and Dbfilename if given


if config.Dir != "" && config.Dbfilename != "" {
server.Dir = config.Dir
server.Dbfilename = config.Dbfilename
server.LoadRDB()
return server, nil
}

return server, nil
}

// Generate random string for repl id
const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func initRandom() {
// rand.Seed(uint64(time.Now().UnixNano()))
rand.New(rand.NewPCG(uint64(time.Now().UnixNano()), 0))
}

func RandStringBytes(n int) string {


initRandom()
b := make([]byte, n)
for i := range b {
b[i] = letterBytes[rand.Int64()%int64(len(letterBytes))]
}
return string(b)
}

func (s *Server) LoadRDB() {


// Check if directory exists
if _, err := os.Stat(s.Dir); os.IsNotExist(err) {
fmt.Println("Directory does not exist")
return
}

// Check if file exists


if _, err := os.Stat(s.Dir + "/" + s.Dbfilename); os.IsNotExist(err) {
fmt.Println("File does not exist")
return
}

// Open file and read contents


file, err := os.Open(s.Dir + "/" + s.Dbfilename)
if err != nil {
fmt.Println("Failed to open file")
return
}
defer file.Close()

s.decodeRDB(NewBuffer(file))
}

// ----------------------------------------------------------------------------

// Accept / Handshake / Close connection ------------------------------------------
func (s *Server) serverListen() {
for {
s.serverAccept()
}
}

func (s *Server) serverAccept() {


conn, err := s.Listener.Accept()
if err != nil {
fmt.Println("Error accepting connection: ", err.Error())
return
}

if s.Role == MASTER {
go s.handleClientConnAsMaster(conn)
} else {
go s.handleClientConnAsReplica(conn)
}
}

// Handshake happens in 3 stages


func (s *Server) handShake() error {
conn, err := net.Dial("tcp", s.MasterHost+":"+s.MasterPort)
s.MasterConn = conn
if err != nil {
fmt.Println("Failed to connect to master")
os.Exit(1)
}

resp := NewBuffer(conn)
writer := NewWriter(conn)
connRW := &ConnRW{MASTER, conn, resp, writer, nil, false, false,
queue.NewQueue()}

// Stage 1
Write(writer, PingResp())
parsedResp, _, err := resp.Read()
if err != nil {
return err
}
if !parsedResp.IsPong() {
return errors.New("master server did not respond with PONG")
}

// Stage 2
Write(writer, ReplconfResp(1, s.Port))
parsedResp, _, err = resp.Read()
if err != nil {
return err
}
if !parsedResp.IsOkay() {
return errors.New("master server did not respond with OK")
}

Write(writer, ReplconfResp(2, s.Port))


parsedResp, _, err = resp.Read()
if err != nil {
return err
}
if !parsedResp.IsOkay() {
return errors.New("master server did not respond with OK")
}

// Stage 3
Write(writer, Psync(0, 0))
rdb, err := resp.ReadFullResync()
if err != nil {
return err
}
s.Handler(rdb, connRW)

s.MasterReplOffset = 0
go s.handleMasterConnAsReplica(connRW)

return nil
}
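
// Illustrative summary of the three handshake stages performed above
// (replica -> master); the message contents follow from the helpers used:
//
//	Stage 1: PING  -> the master must answer with +PONG
//	Stage 2: two REPLCONF messages (ReplconfResp 1 and 2), which per the
//	         replConfig handler carry the replica's listening port and its
//	         capabilities; each must be answered with +OK
//	Stage 3: PSYNC -> the master answers with its replication id/offset and
//	         an RDB snapshot, which the replica decodes to seed its key space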

func (s *Server) serverClose() {


for _, conn := range s.Conns {
conn.Conn.Close()
}
}

// ----------------------------------------------------------------------------

// Handle connection --------------------------------------------------------------
func (s *Server) handleClientConnAsMaster(conn net.Conn) {
resp := NewBuffer(conn)
writer := NewWriter(conn)
ch := make(chan *RESP)
connRW := &ConnRW{CLIENT, conn, resp, writer, ch, false, false,
queue.NewQueue()}
s.Conns = append(s.Conns, connRW)
for {
parsedResp, _, err := resp.Read()
if err != nil {
fmt.Println(err)
fmt.Println("Closing")
return
}

if s.RedirectRead || connRW.RedirectRead {
fmt.Println("Handling client connection on redirect",
parsedResp)
connRW.Chan <- parsedResp
} else {
fmt.Println("Handling client connection on main loop",
parsedResp)
results := s.Handler(parsedResp, connRW)

for _, result := range results {


fmt.Println("Writing response", result)
Write(writer, result)
}
}
}
}

func (s *Server) handleClientConnAsReplica(conn net.Conn) {


resp := NewBuffer(conn)
writer := NewWriter(conn)
connRW := &ConnRW{CLIENT, conn, resp, writer, nil, false, false,
queue.NewQueue()}
s.Conns = append(s.Conns, connRW)
for {
parsedResp, n, err := resp.Read()
var results []*RESP
if err != nil {
if err.Error() == "EOF" {
fmt.Println("Closing")
return
}
fmt.Println(err)
} else {
results = s.Handler(parsedResp, connRW)
s.MasterReplOffset += n
}

for _, result := range results {


Write(writer, result)
}
}
}

func (s *Server) handleMasterConnAsReplica(connRW *ConnRW) {


s.Conns = append(s.Conns, connRW)
for {
fmt.Println("Handling master connection")
parsedResp, n, err := connRW.Reader.Read()
fmt.Println("Read: ", parsedResp)
if err != nil {
if err.Error() == "EOF" {
fmt.Println("Closing")
return
}
fmt.Println("Error: ", err)
} else {
s.Handler(parsedResp, connRW)
s.MasterReplOffset += n
}
}
}

// ----------------------------------------------------------------------------

// Entry point and command line arguments ------------------------------------------
func parseFlags() (*Config, error) {
config := &Config{}
flag.StringVar(&config.Port, "port", "6379", "Server Port")
repl := ""
flag.StringVar(&repl, "replicaof", "", "Master connection <address port> to replicate")
flag.StringVar(&config.Dir, "dir", "", "directory to rdb file")
flag.StringVar(&config.Dbfilename, "dbfilename", "", "rdb file name")

flag.Parse()

if repl != "" {
config.IsReplica = true
ap := strings.Split(repl, " ")
if len(ap) != 2 {
return nil, errors.New("wrong argument count for --replicaof")
}
config.MasterHost, config.MasterPort = ap[0], ap[1]
}
return config, nil
}
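
// Example invocations (illustrative; the binary name is an assumption, the
// flag names are exactly those registered in parseFlags above):
//
//	./server --port 6379 --dir ./data --dbfilename dump.rdb
//	./server --port 6380 --replicaof "localhost 6379"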

func main() {
config, err := parseFlags()
if err != nil {
fmt.Println(err)
os.Exit(1)
}

server, err := NewServer(config)


if err != nil {
fmt.Println("Failed to create server")
os.Exit(1)
}
defer server.serverClose()
if server.Role == REPLICA {
err := server.handShake()
if err != nil {
fmt.Println("failed to connect to master server")
os.Exit(1)
}
}

fmt.Println("listening on port: " + server.Port + "...")

server.serverListen()
}
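
The listing above can be exercised end to end with a small stand-alone client. The sketch below is illustrative only: it assumes the server is running locally on port 6379 and that it speaks the standard RESP wire format shown earlier; it is not part of the project sources.

package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	// Connect to the in-memory store started by the server's main().
	conn, err := net.Dial("tcp", "localhost:6379")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	reader := bufio.NewReader(conn)

	// SET foo bar, encoded as a RESP array of bulk strings.
	fmt.Fprint(conn, "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n")
	reply, _ := reader.ReadString('\n')
	fmt.Println("SET reply:", reply) // expected: +OK

	// GET foo should return the value stored above.
	fmt.Fprint(conn, "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n")
	reply, _ = reader.ReadString('\n')
	fmt.Println("GET reply:", reply)
}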

Chapter 5
CONCLUSION

The advent of in-memory databases represents a significant transformation in the realm of
data management, providing a paradigm shift from disk-centric to memory-first architectures.
This report has comprehensively analyzed their design, storage mechanisms, performance
capabilities, and practical applications, shedding light on why in-memory databases are
gaining traction across diverse industries.

5.1 Revolutionizing Data Access and Performance

At the core of in-memory databases lies the promise of unparalleled speed and efficiency. By
storing data primarily in RAM, these systems eliminate the latency associated with disk I/O,
making them indispensable for applications requiring real-time data processing. Whether it’s
financial trading, fraud detection, IoT data streams, or gaming, the low-latency nature of in-
memory databases ensures that critical operations can be executed with minimal delay. This
characteristic has positioned them as essential tools in industries where time-sensitive
decisions can have significant implications.

Moreover, the use of advanced indexing techniques, such as hash tables and AVL trees,
further enhances performance by enabling rapid data retrieval and updates. The integration of
compression algorithms ensures that memory resources are used efficiently, making these
systems viable for datasets of substantial size.
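
To make the first of these structures concrete, the sketch below (a simplified illustration, not the project code itself) shows the hash-map-plus-read-write-lock pattern that gives an in-memory store constant-time average lookups while allowing many concurrent readers, the same pattern used for the key-value map in the implementation described in this report:

package main

import (
	"fmt"
	"sync"
)

// Store is a minimal in-memory key-value store: a hash map guarded by a
// read-write mutex so that many readers can proceed concurrently.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: map[string]string{}} }

func (s *Store) Set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[k] = v
}

func (s *Store) Get(k string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[k]
	return v, ok
}

func main() {
	st := NewStore()
	st.Set("user:1", "naveen")
	if v, ok := st.Get("user:1"); ok {
		fmt.Println(v) // constant-time average lookup via the hash map
	}
}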

5.2 Balancing Durability and Volatility

One of the challenges inherent to in-memory databases is their reliance on volatile memory,
which can lead to data loss in the event of power failures or crashes. However, modern
systems have introduced innovative solutions to mitigate this risk. Techniques such as write-
ahead logging (WAL), append-only files (AOF), and snapshot-based persistence ensure that
critical data can be recovered even after a failure. These advancements have struck a balance
between performance and durability, enabling in-memory databases to cater to both volatile
and non-volatile use cases.
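
As a simplified sketch of the append-only-file idea (illustrative of the general technique only, not the persistence format of any particular product), the program below appends every write to a log file, forces it to disk, and replays the log on startup to rebuild the in-memory map after a restart:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// appendCommand writes one SET command to the append-only log and flushes it
// to disk so the write survives a crash of the process.
func appendCommand(f *os.File, key, value string) error {
	if _, err := fmt.Fprintf(f, "SET %s %s\n", key, value); err != nil {
		return err
	}
	return f.Sync()
}

// replay rebuilds the in-memory map by re-applying every logged command.
func replay(path string) (map[string]string, error) {
	data := map[string]string{}
	f, err := os.Open(path)
	if err != nil {
		if os.IsNotExist(err) {
			return data, nil // no log yet: start empty
		}
		return nil, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), " ", 3)
		if len(parts) == 3 && parts[0] == "SET" {
			data[parts[1]] = parts[2]
		}
	}
	return data, sc.Err()
}

func main() {
	const path = "appendonly.log"
	data, err := replay(path)
	if err != nil {
		panic(err)
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := appendCommand(f, "balance:42", "100"); err != nil {
		panic(err)
	}
	data["balance:42"] = "100"
	fmt.Println("keys recovered and written:", len(data))
}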

This dual focus on speed and reliability has expanded their adoption in areas such as hybrid
transactional/analytical processing (HTAP), where both transactional consistency and
analytical insights are required in near-real time.

5.3 Applications and Use Cases

The versatility of in-memory databases extends beyond traditional data management tasks.
They are instrumental in powering high-performance caching solutions, enabling faster web
application responses, and supporting distributed computing frameworks. Their ability to
process vast amounts of data in parallel makes them ideal for big data analytics, where
insights must be derived from rapidly changing datasets.

Examples like Redis, SAP HANA, and Aerospike illustrate the diversity of use cases, ranging
from simple key-value storage to complex analytics. These systems demonstrate how in-
memory databases can adapt to varied requirements, from small-scale applications to
enterprise-level deployments.

5.4 Challenges and Limitations

Despite their advantages, in-memory databases are not without challenges. The cost of
maintaining large amounts of RAM can be prohibitive, particularly for startups or small
organizations. Furthermore, the reliance on memory imposes limitations on data storage
capacity, requiring careful consideration of sharding and replication strategies for scalability.

Another concern is the evolving nature of data compliance and privacy regulations. In-
memory databases must address challenges related to secure data handling, especially when
used in sensitive industries such as healthcare and finance.

5.5 Future Prospects

The future of in-memory databases looks promising, driven by advancements in hardware
technologies, such as non-volatile memory (NVM) and persistent memory modules. These
innovations blur the lines between RAM and disk, offering the speed of in-memory systems
with the durability of traditional storage. Furthermore, the integration of machine learning and
AI tools into in-memory platforms is likely to open up new avenues for intelligent data
processing and decision-making.
As organizations increasingly adopt cloud-native architectures, the scalability and
elasticity of in-memory databases will play a crucial role in modern application
development. Their ability to seamlessly integrate with distributed systems and cloud
platforms ensures they will remain relevant in the evolving technological landscape.

5.6 Final Thoughts

In-memory databases have redefined the expectations of data management systems by
prioritizing speed, flexibility, and real-time capabilities. While they may not completely
replace traditional on-disk databases, their role in complementing existing technologies and
addressing specialized use cases is undeniable. Organizations that leverage these systems
effectively can gain a competitive edge in a data-driven world, where agility and
responsiveness are key.
By understanding the design principles, use cases, and challenges of in-memory databases,
this report underscores their transformative potential and provides a roadmap for future
innovation and research. As technology continues to evolve, the role of in-memory databases
will undoubtedly expand, shaping the future of data-driven industries.

REFERENCES

1. Depoortere, R., & Van Landeghem, H. (2022). A survey of in-memory databases. Journal of
Database Management, 34(2), 45-60.
2. "High Performance In-Memory Computing with Apache Ignite" by Shamim Bhuiyan,
Michael Zheludkov, and Dmitriy Setrakyan
3. "Redis Essentials" by Maxwell Dayvson da Silva and Hugo Lopes Tavares
4. "Designing Data-Intensive Applications" by Martin Kleppmann

5. https://aws.amazon.com/nosql/in-memory/
6. https://www.couchbase.com/resources/concepts/in-memory-database/
7. https://www.dragonflydb.io/guides/in-memory-databases
8. https://www.mongodb.com/resources/basics/databases/in-memory-database
9. https://en.wikipedia.org/wiki/In-memory_database

STUDENTS PROFILE

Name: Naveen Paudel

Enrolment No.: 211B192

Email: [email protected]

Address: BHOPAL, Madhya Pradesh

Contact: 7049005595

Name: Nikhil Sahu

Enrolment No.: 211B195

Email: [email protected]

Address: Guna, Madhya Pradesh

Contact: 8299115905

Name: Panshul Khurchwal

Enrolment No.: 211B202

Email: [email protected]

Address: Faridabad, Haryana

Contact: 8813982207
