
M23CST102 - ADVANCED DATA STRUCTURES AND ALGORITHMS

UNIT – I ROLE OF ALGORITHMS IN COMPUTING & COMPLEXITY ANALYSIS 9


Review of Basic Concepts – Asymptotic Analysis of Recurrences – Asymptotic Notation – Importance of Efficient Algorithms – Program Performance Measurement – Randomized Algorithms: Randomized Quicksort, Analysis of Hashing Algorithms – Recursion: Implementing Recursion using Stacks and Queues – ADT.

REVIEW OF BASIC CONCEPTS


Data structures are fundamental building blocks in computer science that enable efficient organization, storage, and
retrieval of data. They provide a way to manage and manipulate data, facilitating the development of efficient algorithms.
Here are some basic concepts of data structures:

Arrays: An array is a collection of elements stored in contiguous memory locations.
Elements are accessed using an index or a key.
Arrays have a fixed size, and accessing elements is generally fast.
Linked Lists: A linked list is a collection of nodes where each node contains data and a reference to the next node in the sequence. Linked lists can be singly linked (each node points to the next node) or doubly linked (each node points to both the next and previous nodes).
Stacks: A stack is a Last In, First Out (LIFO) data structure, where the last element added is the first one to be removed.
Operations include push (add an element to the top) and pop (remove the top element).
Queues: A queue is a First In, First Out (FIFO) data structure, where the first element added is the first one to be removed.
Operations include enqueue (add an element to the back) and dequeue (remove the front element).
Trees: A tree is a hierarchical data structure consisting of nodes connected by edges.
The top node is called the root, and nodes with no children are called leaves.
Common types of trees include binary trees, binary search trees, and AVL trees.
Graphs: A graph is a collection of nodes (vertices) and edges connecting these nodes.
Graphs can be directed (edges have a direction) or undirected.
Common representations include adjacency matrices and adjacency lists.
Hash Tables: A hash table is a data structure that stores key-value pairs.
It uses a hash function to compute an index into an array of buckets or slots, where the desired value can be found.
Heaps: A heap is a specialized tree-based data structure that satisfies the heap property.
Heaps are commonly used to implement priority queues.
Sorting and Searching Algorithms: Various algorithms are used to organize and search data, such as bubble sort, quicksort, mergesort, linear search, and binary search.
Complexity Analysis: Understanding the time and space complexity of algorithms and data structures is crucial. Big O notation is commonly used to describe the upper bound of an algorithm's time or space complexity.
These concepts provide a foundation for understanding and implementing more advanced data structures and algorithms
in computer science.

ASYMPTOTIC ANALYSIS OF RECURRENCES ASYMPTOTIC NOTATION


Asymptotic analysis is a mathematical method for describing the limiting behavior of a function as its input size
approaches infinity. This analysis is commonly used in computer science to analyze the efficiency of algorithms and
understand their performance characteristics. Recurrences are equations that define the running time of algorithms in
terms of the size of the input.
Three common asymptotic notations used in the analysis of recurrences are:
1. Big-O Notation (O):
 Big-O notation provides an upper bound on the growth rate of a function.
 For a given function f(n), O(f(n)) represents the set of functions that grow at most as fast as a constant multiple of f(n) for sufficiently large n.
 Example: If an algorithm has a running time of O(n^2), it means the running time grows at most quadratically
with the input size.
2. Omega Notation (Ω):
 Omega notation provides a lower bound on the growth rate of a function.
 For a given function f(n), Ω(f(n)) represents the set of functions that grow at least as fast as a constant multiple of f(n) for sufficiently large n.
 Example: If an algorithm has a running time of Ω(n), it means the running time grows at least linearly with the
input size.
3. Theta Notation (Θ):
 Theta notation provides a tight bound on the growth rate of a function, combining both upper and lower bounds.
 For a given function f(n), Θ(f(n)) represents the set of functions that grow at the same rate as a constant multiple of f(n) for sufficiently large n.
 Example: If an algorithm has a running time of Θ(n log n), it means the running time grows proportionally to n log n, up to constant factors, for sufficiently large n.
When analyzing recurrences, especially in the context of algorithmic complexity, these notations help express the
efficiency of an algorithm without getting into the details of constant factors and lower-order terms. Solving recurrences
using techniques like the Master Theorem is common in algorithm analysis, and the results are often expressed using one
of the asymptotic notations.
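As a brief worked example of solving a recurrence with the Master Theorem: the merge sort recurrence T(n) = 2T(n/2) + n has a = 2, b = 2, and f(n) = n. Since n^(log_b a) = n^(log_2 2) = n matches f(n), the second case of the Master Theorem applies and T(n) = Θ(n log n). Similarly, the binary search recurrence T(n) = T(n/2) + 1 has a = 1, b = 2, and f(n) = 1 = n^(log_2 1), so T(n) = Θ(log n).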
IMPORTANCE OF EFFICIENT ALGORITHMS

Efficient algorithms play a crucial role in computer science and various fields for several reasons:
1. Optimal Resource Utilization:
 Efficient algorithms use computational resources such as time and memory more effectively, leading to optimal
performance. This is essential in applications where resource constraints are significant.
2. Improved System Performance:
 Efficient algorithms contribute to faster execution times and reduced processing overhead. This is particularly
important in real-time systems, where timely responses are critical.
3. Scalability:
 Efficient algorithms are designed to handle larger datasets and input sizes gracefully. This scalability is essential
as systems and datasets continue to grow.
4. Cost-Effectiveness:
 In various applications, especially in cloud computing and distributed systems, efficient algorithms can lead to
cost savings. They allow for better resource management, reducing the need for excessive computing power.
5. Energy Efficiency:
 Energy consumption is a significant concern in modern computing. Efficient algorithms often result in lower
energy consumption, which is crucial for mobile devices, data centers, and other energy-sensitive applications.
6. Improved User Experience:
 In software applications, efficient algorithms contribute to a smoother and more responsive user experience.
Faster response times and reduced latency enhance user satisfaction.
7. Optimized Storage:
 Algorithms that efficiently organize and manage data contribute to optimal storage usage. This is vital in
applications dealing with large databases, file systems, and distributed storage.
8. Effective Problem Solving:
 In many fields, including artificial intelligence, optimization, and data analysis, efficient algorithms are essential
for solving complex problems in a reasonable amount of time. This allows for practical solutions to real-world
challenges.
9. Algorithmic Competitiveness:
 In competitive industries, businesses and systems that employ efficient algorithms gain a competitive edge. The
ability to process data faster, make quicker decisions, and deliver timely results can be a key differentiator.
10. Scientific Advancements:
 In scientific research, simulations, and data analysis, efficient algorithms enable researchers to process and
analyze large datasets, simulate complex systems, and make breakthroughs in various domains.
11. Global Impact:
 In fields like healthcare, finance, transportation, and communication, efficient algorithms contribute to
advancements that have a direct impact on people's lives. For example, efficient algorithms play a role in
medical imaging, financial modeling, traffic optimization, and communication networks.
In summary, the importance of efficient algorithms extends across various domains, influencing system performance,
cost-effectiveness, energy consumption, user experience, and the ability to solve complex problems. As technology
continues to advance, the role of efficient algorithms becomes increasingly critical in shaping the capabilities and impact
of computing systems.

PROGRAM PERFORMANCE MEASUREMENT


Program performance measurement is a crucial aspect of software development, helping developers identify bottlenecks,
optimize code, and ensure efficient resource utilization. There are several techniques and tools available for measuring
and analyzing the performance of a program. Here are some common methods:
1. Profiling:
 Profiling tools help developers identify the areas of code that consume the most resources, such as CPU time or
memory.
 Profilers can provide information about function call times, memory usage, and other performance-related
metrics.
 Examples of profiling tools include:
 CPU Profilers: Such as gprof for C/C++ or built-in profilers in modern IDEs.
 Memory Profilers: Such as Valgrind or the memory profiler in Visual Studio.
2. Benchmarking:
 Benchmarking involves running a program or a specific code segment under controlled conditions to measure its
performance.
 Developers can use benchmarking frameworks to automate the process and compare the performance of
different implementations.
 Popular benchmarking tools include JMH (Java Microbenchmark Harness) for Java applications.
3. Execution Time Measurement:
 Manually measuring the execution time of specific code segments can provide insights into performance.
 Using timers or built-in timing functions in programming languages allows developers to measure the time taken
by a piece of code.
 For example, in Python, the timeit module can be used for simple time measurements.
4. Memory Usage Monitoring:
 Tools and techniques are available to monitor a program's memory usage.
 Profilers often provide information about memory consumption, but standalone tools like ps (process status) or
top (task overview) in Unix-like systems can also be useful.
5. Resource Monitoring:
 Monitoring system-level resources (CPU, memory, disk I/O) during program execution can help identify
performance issues.
 Tools like perf in Linux or the Windows Performance Monitor provide insights into system-level resource
usage.
6. Code Review and Static Analysis:
 Code review and static analysis tools can help identify potential performance issues and coding patterns that
may lead to inefficiencies.
 Tools like SonarQube, linting tools, and IDE-based static analyzers can assist in identifying performance-related
code smells.
7. Logging and Tracing:
 Incorporating logging and tracing mechanisms in the code can help developers understand the flow of execution
and identify performance bottlenecks.
 Tools like the Elastic Stack (Elasticsearch, Logstash, Kibana) or distributed tracing tools like Jaeger can assist in
analyzing logs and traces.
8. Code Profiling in Integrated Development Environments (IDEs):
 Many IDEs provide built-in profiling tools that can be used to analyze the performance of code during
development.
 Visual Studio, IntelliJ IDEA, and Eclipse are examples of IDEs with profiling capabilities.
When measuring program performance, it's essential to consider the specific requirements and constraints of the
application, as well as the target platform and deployment environment. Continuous performance monitoring and
optimization are integral parts of the software development lifecycle, ensuring that applications meet performance
expectations and deliver a positive user experience.
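As a small illustration of the manual execution-time measurement mentioned in item 3 above, the following sketch uses Python's timeit module to compare two ways of building a list; the function names and repetition count are arbitrary choices for the example.

import timeit

def build_with_loop(n=10_000):
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def build_with_comprehension(n=10_000):
    return [i * i for i in range(n)]

# Run each function a fixed number of times and report the total elapsed time.
loop_time = timeit.timeit(build_with_loop, number=100)
comp_time = timeit.timeit(build_with_comprehension, number=100)

print(f"loop:          {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")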

RANDOMIZED ALGORITHMS
Randomized algorithms are algorithms that use randomness to make decisions during their execution. Unlike
deterministic algorithms, which produce the same output for a given input every time they run, randomized algorithms
introduce an element of randomness to achieve certain desirable properties. Randomized algorithms are particularly
useful in situations where the input data is uncertain, dynamic, or difficult to analyze deterministically. Here are some
key aspects of randomized algorithms:
1. Probabilistic Analysis:
 Randomized algorithms are often analyzed using probabilistic techniques. Instead of providing exact guarantees,
they offer probabilistic bounds on their performance.
 The analysis typically involves considering the average-case or expected behavior over a range of possible
inputs.
2. Monte Carlo and Las Vegas Algorithms:
 Two main types of randomized algorithms are Monte Carlo and Las Vegas algorithms.
 Monte Carlo Algorithms: These algorithms may produce incorrect results with a small probability. However,
the probability of error can be controlled, and they often run faster than their deterministic counterparts.
 Las Vegas Algorithms: These algorithms always produce the correct result, but their runtime may vary. They
use randomness to speed up their expected running time.
3. Randomized Data Structures:
 Randomization is often employed in the design of data structures to achieve better average-case performance.
 Examples include randomized skip lists and randomized binary search trees, where random choices during
insertion or deletion operations help maintain balance.
4. Randomized Quicksort:
 Quicksort, a sorting algorithm, can be randomized by selecting a random element as the pivot during the
partition step. This helps avoid worst-case scenarios that may occur with a fixed pivot selection strategy.
5. Primality Testing:
 Algorithms for testing whether a number is prime can be randomized. The Miller-Rabin primality test is an
example of a probabilistic algorithm used for this purpose.
6. Randomized Selection:
 The problem of finding the kth smallest (or largest) element in an array can be solved using randomized
algorithms. The randomized quickselect algorithm is an example.
7. Hashing:
 Randomized algorithms are widely used in hashing for building hash functions and handling collisions.
 Universal hashing is a technique where a hash function is randomly chosen from a family of hash functions to
minimize collisions.
8. Randomized Algorithms in Machine Learning:
 Randomized algorithms find applications in machine learning, especially in areas like randomized optimization,
randomized sampling, and randomized algorithms for approximating solutions to large-scale problems.
9. Communication Networks:
 Randomized algorithms are used in routing and network protocols to handle uncertainties and provide efficient
solutions in dynamic network environments.
10. Randomized Algorithm for Matrix Multiplication:
 Randomized approaches to matrix multiplication and its verification, such as Freivalds' algorithm for checking a matrix product, use randomness to reduce the cost of the computation or its verification in certain scenarios, at the price of a small, controllable probability of error.
While randomized algorithms offer advantages in terms of simplicity and efficiency in some contexts, they are not
suitable for all situations. The choice of using randomness in algorithm design depends on the specific requirements of
the problem and the trade-offs between deterministic and randomized approaches. Probabilistic analysis plays a crucial
role in understanding the expected behavior and performance of randomized algorithms.
RANDOMIZED QUICKSORT
Randomized Quicksort is a variant of the Quicksort algorithm that introduces an element of randomness to improve its
average-case performance. The primary idea is to randomly select a pivot element during each partition step, which helps
avoid worst-case scenarios that can occur with deterministic pivot selection strategies. In the worst case, the time
complexity of traditional Quicksort can degrade to O(n^2), but Randomized Quicksort provides an expected time
complexity of O(n log n) due to the random pivot selection.
Here is an outline of the Randomized Quicksort algorithm:
1. Random Pivot Selection:
 Instead of always choosing the first, last, or middle element as the pivot, Randomized Quicksort randomly
selects a pivot from the array.
2. Partitioning:
 The array is partitioned into two subarrays such that elements less than the pivot are on the left, and elements
greater than the pivot are on the right.
3. Recursion:
 The algorithm recursively applies the same process to the subarrays created in the partitioning step.
4. Base Case:
 The recursion stops when the subarrays become small enough to be efficiently sorted using a different sorting
algorithm (e.g., insertion sort).
The randomized pivot selection reduces the likelihood of encountering a worst-case scenario where the array is already
sorted or nearly sorted. As a result, the expected time complexity of Randomized Quicksort is O(n log n) for a random
input.
Here's a simple implementation of Randomized Quicksort in Python:
import random

def randomized_quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        # Choose a random pivot to avoid worst-case behavior on sorted inputs.
        pivot = random.choice(arr)
        less = [x for x in arr if x < pivot]
        equal = [x for x in arr if x == pivot]
        greater = [x for x in arr if x > pivot]
        return randomized_quicksort(less) + equal + randomized_quicksort(greater)

# Example usage:
arr = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
sorted_arr = randomized_quicksort(arr)
print(sorted_arr)
It's important to note that the random.choice(arr) line is responsible for selecting a random pivot element. This random pivot selection helps improve the average-case performance of the algorithm.
While Randomized Quicksort has an expected time complexity of O(n log n), its worst-case time complexity remains
O(n^2) in rare situations, but the probability of encountering such cases is reduced compared to the deterministic version
of Quicksort. The randomization makes the algorithm more robust and suitable for various input scenarios.
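The following is a minimal sketch of an in-place variant that follows the partitioning outline above more literally; the helper names randomized_partition and quicksort_inplace, and the use of a Lomuto-style partition, are illustrative choices rather than part of the original notes.

import random

def randomized_partition(arr, low, high):
    # Swap a randomly chosen element into the pivot position (the last index).
    pivot_index = random.randint(low, high)
    arr[pivot_index], arr[high] = arr[high], arr[pivot_index]
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quicksort_inplace(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = randomized_partition(arr, low, high)
        quicksort_inplace(arr, low, p - 1)
        quicksort_inplace(arr, p + 1, high)

# Example usage:
data = [3, 1, 4, 1, 5, 9, 2, 6]
quicksort_inplace(data)
print(data)  # [1, 1, 2, 3, 4, 5, 6, 9]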
ANALYSIS OF HASHING ALGORITHMS
Hashing algorithms play a crucial role in computer science and are widely used in various applications, such as data
retrieval, security, and distributed systems. The analysis of hashing algorithms involves evaluating their performance,
collision resolution strategies, and other relevant properties. Here are key aspects of the analysis of hashing algorithms:
1. Hash Function Properties:
 A good hash function should evenly distribute keys across the hash table to minimize collisions.
 Desirable properties of hash functions include:
 Deterministic: The same input should always produce the same hash value.
 Efficient to Compute: The hash function should be fast to compute.
 Uniform Distribution: The hash values should be evenly distributed over the range.
2. Collision Resolution:
 Collisions occur when two different keys produce the same hash value.
 Common collision resolution strategies include:
 Chaining: Each bucket in the hash table is a linked list, and colliding elements are stored in the same
bucket.
 Open Addressing: Collisions are resolved by placing the colliding element in the next available slot in
the hash table.
3. Load Factor:
 The load factor of a hash table is the ratio of the number of stored elements to the total number of buckets.
 A higher load factor can lead to more collisions, affecting performance, while a lower load factor may result in
wasted space.
 Balancing load factor is crucial for efficient hashing.
4. Collision Handling Efficiency:
 The efficiency of collision resolution strategies is a key factor in the overall performance of hashing algorithms.
 Analyzing the worst-case and average-case time complexity of collision resolution operations (e.g., insertion,
deletion, and search) provides insights into the efficiency of the algorithm.
5. Universal Hashing:
 Universal hashing is a technique that uses a family of hash functions, and a random function is chosen at
runtime.
 It helps reduce the likelihood of collisions and is particularly useful in situations where the distribution of keys
is unknown.
6. Security Analysis (for Cryptographic Hash Functions):
 In the context of cryptographic applications, hash functions must exhibit specific security properties.
 Collision resistance, preimage resistance, and second preimage resistance are crucial properties in cryptographic
hash functions.
7. Time and Space Complexity:
 Analyzing the time and space complexity of hashing algorithms provides a comprehensive understanding of
their efficiency.
 The complexity of hash function computation, collision resolution, and overall operations impact the algorithm's
performance.
8. Dynamic Resizing:
 Dynamic resizing, such as doubling or halving the size of the hash table, helps adapt to changes in the number of
stored elements.
 The efficiency of dynamic resizing operations, including rehashing, affects the overall performance.
9. Cuckoo Hashing:
 Cuckoo hashing is an alternative approach that uses multiple hash functions and two hash tables.
 Analyzing the worst-case behavior and average-case performance of cuckoo hashing is essential for
understanding its efficiency.
10. Real-world Considerations:
 Practical considerations, such as cache efficiency, locality, and hardware-specific characteristics, can impact the
performance of hashing algorithms in real-world scenarios.
Analyzing hashing algorithms involves a combination of theoretical analysis and practical considerations. Different
applications may require different trade-offs, and the choice of a hashing algorithm depends on the specific requirements
of the use case.
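To make the ideas of a hash function, chaining, and the load factor concrete, the following is a minimal sketch of a hash table with separate chaining; the class name ChainedHashTable and the fixed initial bucket count are illustrative assumptions, and Python's built-in hash() stands in for an application-specific hash function.

class ChainedHashTable:
    def __init__(self, num_buckets=8):
        # Each bucket holds a list of (key, value) pairs (separate chaining).
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0

    def _bucket(self, key):
        # Map the key to a bucket index using Python's built-in hash function.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # Update an existing key.
                return
        bucket.append((key, value))
        self.size += 1

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def load_factor(self):
        # Ratio of stored elements to buckets; high values mean longer chains.
        return self.size / len(self.buckets)

# Example usage:
table = ChainedHashTable()
table.put("apple", 3)
table.put("banana", 5)
print(table.get("apple"), table.load_factor())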
THE RECURSION
Recursion is a programming and problem-solving technique where a function calls itself directly or indirectly in order to
solve a problem. In the context of programming, a recursive function is a function that calls itself during its execution.
This technique is particularly useful for solving problems that can be broken down into smaller, similar sub-problems.
Here are key concepts related to recursion:
1. Base Case:
 Every recursive function must have a base case, which defines the simplest version of the problem that does not
require further recursion.
 The base case is essential to prevent infinite recursion and ensure that the function eventually terminates.
2. Recursive Case:
 The recursive case defines how the problem is broken down into smaller sub-problems and how the function
calls itself with these sub-problems.
 Each recursive call should move towards the base case to ensure progress towards the termination condition.
3. Function Call Stack:
 Recursive function calls are managed by the function call stack. Each recursive call pushes a new frame onto the
stack, storing local variables and the return address.
 The stack is popped as the base case is reached or when the function completes its execution.
4. Indirect Recursion:
 Indirect recursion occurs when a function calls another function, and that function eventually calls the original
function, creating a cycle of function calls.
5. Tail Recursion:
 A tail-recursive function is a type of recursion where the recursive call is the last operation in the function. Some
programming languages offer optimizations for tail-recursive functions, turning them into iterative processes.
6. Advantages of Recursion:
 Simplicity: Recursive solutions can be more concise and easier to understand for certain problems.
 Divide and Conquer: Recursive algorithms often follow a divide-and-conquer approach, breaking down a
problem into smaller sub-problems.
7. Disadvantages of Recursion:
 Memory Usage: Recursion uses the call stack, and deep recursion can lead to stack overflow errors if not
managed carefully.
 Performance: Recursive solutions may have higher overhead due to multiple function calls and stack
operations.
8. Examples of Recursive Problems:
 Factorial: n! = n * (n-1) * ... * 1
 Fibonacci Sequence: F(n) = F(n-1) + F(n-2)
 Binary Search: A recursive approach to searching a sorted array.
 Tree Traversal: Recursive algorithms are often used for traversing tree structures.
Here's a simple example of a recursive function in Python to calculate the factorial of a number:
def factorial(n):
    # Base case
    if n == 0 or n == 1:
        return 1
    # Recursive case
    else:
        return n * factorial(n - 1)

# Example usage
result = factorial(5)
print(result)  # Output: 120
In this example, the base case is when n is 0 or 1, and the recursive case involves calling the factorial function with a
smaller value (n-1). Recursion provides an elegant solution for problems that exhibit a recursive structure.
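Since binary search is listed above as a naturally recursive problem, the following is a small sketch of a recursive binary search over a sorted list; the function name binary_search and its parameter defaults are illustrative choices.

def binary_search(arr, target, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    # Base case: the search range is empty, so the target is absent.
    if low > high:
        return -1
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    elif target < arr[mid]:
        # Recursive case: search the left half.
        return binary_search(arr, target, low, mid - 1)
    else:
        # Recursive case: search the right half.
        return binary_search(arr, target, mid + 1, high)

# Example usage
print(binary_search([2, 4, 7, 9, 12], 9))  # Output: 3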

IMPLEMENTING RECURSION USING STACKS


Recursion relies on the call stack to manage function calls and their local variables. When implementing recursion using
an explicit stack (instead of relying on the call stack provided by the programming language), it often involves manually
managing the state of each recursive call. Here's a simple example of implementing recursion using a stack in Python:
Let's consider the example of calculating the factorial of a number using an iterative approach with a stack:
def factorial_iterative(n):
    stack = []
    result = 1

    while n > 1 or stack:
        if n > 1:
            stack.append(n)
            n -= 1
        else:
            result *= stack.pop()

    return result

# Example usage
result_iterative = factorial_iterative(5)
print(result_iterative)  # Output: 120
In this example:
 We use a stack (stack) to manage the state of the iterative process.
 The loop continues until n becomes 1 and the stack is empty.
 If n is greater than 1, we push it onto the stack and decrement n. Otherwise, we multiply the result by the top element of
the stack.
This example mimics the behavior of a recursive function without actually using recursion. The stack keeps track of the
state that would otherwise be managed by the call stack in a recursive solution.
Keep in mind that the explicit use of a stack can be less intuitive and may have additional complexity compared to direct
recursion. Recursive solutions often leverage the implicit call stack provided by programming languages, which makes
the code more concise and easier to understand.
If you are working in a language that supports tail call optimization, you might consider using a tail-recursive function,
which can be transformed into an iterative process by the compiler or interpreter. However, not all languages support this
optimization.
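As a further illustration of replacing the implicit call stack with an explicit stack, the following sketch performs an in-order traversal of a binary tree iteratively; the small TreeNode class and the function name in_order_iterative are illustrative choices for the example.

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def in_order_iterative(root):
    result = []
    stack = []
    current = root
    while current or stack:
        # Walk as far left as possible, saving nodes to revisit.
        while current:
            stack.append(current)
            current = current.left
        current = stack.pop()
        result.append(current.value)
        # After visiting a node, traverse its right subtree.
        current = current.right
    return result

# Example usage:
root = TreeNode(2)
root.left = TreeNode(1)
root.right = TreeNode(3)
print(in_order_iterative(root))  # Output: [1, 2, 3]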

IMPLEMENTING RECURSION USING QUEUES


Implementing recursion using queues is less common than using stacks, but it is possible for certain types of problems. In
this example, I'll demonstrate a simple case using a queue to implement a breadth-first traversal of a binary tree. This is a
form of recursion where each level of the tree is processed before moving to the next level.
Here's a Python example:
from collections import deque

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def breadth_first_traversal(root):
    if not root:
        return []

    result = []
    queue = deque([root])

    while queue:
        current_node = queue.popleft()
        result.append(current_node.value)

        if current_node.left:
            queue.append(current_node.left)
        if current_node.right:
            queue.append(current_node.right)

    return result

# Example usage:
# Construct a simple binary tree
#       1
#      / \
#     2   3
#    / \
#   4   5
root = TreeNode(1)
root.left = TreeNode(2)
root.right = TreeNode(3)
root.left.left = TreeNode(4)
root.left.right = TreeNode(5)

result_bfs = breadth_first_traversal(root)
print(result_bfs)  # Output: [1, 2, 3, 4, 5]
In this example:
 We use a deque (double-ended queue) to implement the queue for breadth-first traversal.
 The breadth_first_traversal function iterates through the tree level by level, using the queue to keep track of nodes to
process.
 The result list stores the values of the visited nodes.
It's important to note that not all recursive algorithms can be easily adapted to use queues. This example is specific to
problems where a breadth-first traversal is applicable. Recursive algorithms often use a stack-based approach, where the
order of processing follows the depth-first strategy.
For more general recursion, a stack-based approach is typically more suitable. If you encounter a problem that requires
recursion and involves a queue, it's essential to carefully design the algorithm to fit the specific problem requirements.
ADT.
ADT stands for Abstract Data Type, which is a high-level description of a set of operations that can be performed on a
data structure, along with the properties of those operations. It emphasizes what operations can be performed on a data
structure and not how those operations are implemented. ADTs are important in computer science and programming as
they provide a way to abstract and encapsulate data structures, allowing for modularity and reusability.
Here are key concepts related to Abstract Data Types (ADTs):
1. Abstraction:
 ADTs abstract the underlying details of a data structure, focusing on the operations that can be performed and
the properties of those operations.
 This abstraction allows users to work with data structures at a higher level, without needing to know the internal
implementation details.
2. Encapsulation:
 Encapsulation is the bundling of data and the operations that can be performed on that data into a single unit,
often referred to as an object.
 It helps hide the internal details of the data structure and provides a clear interface for interacting with it.
3. Operations:
 ADTs define a set of operations that can be performed on the data structure. These operations include creating,
accessing, modifying, and deleting elements.
 Examples of operations for common ADTs:
 Stack ADT: push, pop, isEmpty
 Queue ADT: enqueue, dequeue, isEmpty
 List ADT: insert, delete, retrieve
4. Data Structure Independence:
 ADTs are independent of the specific data structures that can implement them.
 For example, a stack ADT can be implemented using an array or a linked list, but the user interacts with it using
the same set of operations defined by the ADT.
5. Formal Specification:
 ADTs are often formally specified through contracts that define the behavior of each operation. These contracts
specify preconditions, postconditions, and invariants.
 Precondition: Conditions that must be true before an operation is executed.
 Postcondition: Conditions that must be true after an operation completes.
 Invariant: A property that remains true before and after each operation.
6. Modularity and Reusability:
 ADTs promote modularity by encapsulating the functionality of a data structure into a self-contained unit. This
enhances code organization and maintainability.
 The same ADT interface can be reused across different applications or projects, allowing for code reuse.
7. Examples of ADTs:
 Stack: Represents a last-in, first-out (LIFO) data structure.
 Queue: Represents a first-in, first-out (FIFO) data structure.
 List: Represents a collection of elements with operations for insertion, deletion, and retrieval.
 Set: Represents a collection of unique elements without duplicates.
 Map (Dictionary): Represents a collection of key-value pairs.
8. Relationship with Data Structures:
 ADTs provide a blueprint for the behavior of a data structure, while data structures define how that behavior is
implemented.
 Different data structures (arrays, linked lists, trees, etc.) can implement the same ADT.
In summary, ADTs provide a conceptual framework for designing and understanding data structures, offering a way to
describe the behavior of data structures independently of their implementations. They contribute to code organization,
modularity, and reusability in software development.
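To tie the ADT idea to a concrete interface, the following is a minimal sketch of the Stack ADT (push, pop, isEmpty) backed by a Python list; the class name StackADT is an illustrative choice, and the same interface could equally be implemented with a linked list.

class StackADT:
    """Stack ADT: the interface exposes push, pop, and is_empty only."""

    def __init__(self):
        self._items = []  # Internal representation hidden from users (encapsulation).

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from an empty stack")
        return self._items.pop()

    def is_empty(self):
        return len(self._items) == 0

# Example usage:
s = StackADT()
s.push(10)
s.push(20)
print(s.pop())       # Output: 20
print(s.is_empty())  # Output: False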

UNIT – II HIERARCHICAL DATA STRUCTURES

Binary Search Trees: Basics – Querying a Binary Search Tree – Insertion and Deletion – Application to Splay Trees. External Memory ADT – B-Trees – Applications to Shortest Path Algorithms – Basic Operations on B-Trees. Red-Black Trees: Properties of Red-Black Trees – Rotations – Insertion – Deletion. Priority Queues and Their Extensions: Heaps, Binomial Heaps, Fibonacci Heaps. String Matching Algorithms.

BINARY SEARCH TREES: BASICS


Binary Search Trees (BSTs) are a type of binary tree data structure that maintains a specific ordering property, making it
efficient for searching, insertion, and deletion of elements. In a Binary Search Tree, each node has at most two children,
with the left child containing elements less than the node's value and the right child containing elements greater than the
node's value.
Here are the basic concepts of Binary Search Trees:
1. Binary Tree Structure:
 A Binary Search Tree is a binary tree, meaning each node has at most two children: a left child and a right child.
 For each node in the tree, all elements in its left subtree are less than its value, and all elements in its right
subtree are greater than its value.
2. Search Operation:
 The key feature of BSTs is their efficient search operation.
 To search for a specific element in a BST, compare the target value with the value of the current node.
 If the target value is smaller, move to the left subtree; if it is greater, move to the right subtree.
 Repeat the process until the target is found or the subtree becomes empty.
3. Insertion Operation:
 To insert an element into a BST, perform a search operation to find the appropriate location for the new node.
 Once the location is found (either as a leaf or as the child of a node with an empty subtree), insert the new node
with the given value.
4. Deletion Operation:
 Deleting a node from a BST involves three cases:
 If the node to be deleted is a leaf, simply remove it.
 If the node to be deleted has one child, replace it with its child.
 If the node to be deleted has two children, find the node's in-order successor (or predecessor), replace
the node's value with the successor's value, and recursively delete the successor.
5. In-Order Traversal:
 In-order traversal of a BST visits all nodes in ascending order.
 For each node, recursively visit its left subtree, process the node's value, and then recursively visit its right
subtree.
6. BST Property:
 The Binary Search Tree property ensures that for every node 'N':
 All nodes in its left subtree have values less than 'N'.
 All nodes in its right subtree have values greater than 'N'.
7. Time Complexity:
 The time complexity of search, insertion, and deletion operations in a well-balanced BST is O(log n), where 'n'
is the number of nodes.
 However, in the worst case (skewed tree), the time complexity can be O(n).
8. Balanced vs. Unbalanced Trees:
 A balanced BST has approximately equal numbers of nodes in its left and right subtrees, leading to efficient
operations.
 An unbalanced BST, where one subtree is much larger than the other, can lead to worst-case time complexities.
9. Self-Balancing BSTs:
 To address the issue of unbalanced trees, self-balancing BSTs like AVL trees or Red-Black trees automatically
maintain balance during insertions and deletions.
10. Applications:
 BSTs are used in various applications, including databases, symbol tables, and file systems, where efficient
search, insertion, and deletion operations are critical.
Here's a simple example of a Python implementation of a Binary Search Tree:
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def in_order_traversal(root):
    if root:
        in_order_traversal(root.left)
        print(root.key, end=" ")
        in_order_traversal(root.right)

# Example usage:
root = None
keys = [50, 30, 20, 40, 70, 60, 80]

for key in keys:
    root = insert(root, key)

print("In-order traversal:")
in_order_traversal(root)
This example demonstrates the basic structure of a Binary Search Tree with insertion and in-order traversal operations.
The in-order traversal output should display the keys in ascending order: 20, 30, 40, 50, 60, 70, 80.
Querying a Binary Search Tree (BST) involves searching for a specific element in the tree. The search operation takes
advantage of the binary search property of the tree, where elements in the left subtree are smaller than the current node,
and elements in the right subtree are larger.
Here's how you can query (search) for a specific element in a Binary Search Tree using Python:
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def search(root, key):
    # Base cases: root is None or the key is present at the root
    if root is None or root.key == key:
        return root
    # Key is smaller than the root's key, search in the left subtree
    if key < root.key:
        return search(root.left, key)
    # Key is larger than the root's key, search in the right subtree
    return search(root.right, key)

# Example usage:
# Construct a Binary Search Tree
#         50
#        /  \
#      30    70
#     /  \  /  \
#   20  40 60  80
root = TreeNode(50)
root.left = TreeNode(30)
root.right = TreeNode(70)
root.left.left = TreeNode(20)
root.left.right = TreeNode(40)
root.right.left = TreeNode(60)
root.right.right = TreeNode(80)

# Query for a specific key (e.g., 40)
key_to_search = 40
result_node = search(root, key_to_search)

if result_node:
    print(f"Key {key_to_search} found in the BST.")
else:
    print(f"Key {key_to_search} not found in the BST.")
In this example:
 The search function takes the root of the BST and the key to be searched.
 It recursively compares the key with the current node's key, navigating to the left or right subtree accordingly.
 If the key is found, the function returns the corresponding node; otherwise, it returns None if the key is not present in the
BST.
Keep in mind that this search operation has a time complexity of O(log n) on average in a balanced BST. However, in the
worst case (e.g., a skewed tree), the time complexity can be O(n). To ensure efficient searching, it's essential to maintain
the balance of the BST or consider using self-balancing BST implementations like AVL trees or Red-Black trees.

INSERTION AND DELETION ON BINARY SEARCH TREES


Insertion and deletion operations on Binary Search Trees (BSTs) involve maintaining the ordering property of the tree
while adding or removing elements. Here's how you can perform insertion and deletion on a BST using Python:
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return TreeNode(key)

    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)

    return root

# Example usage:
# Construct an empty Binary Search Tree
root = None

# Keys to be inserted
keys_to_insert = [50, 30, 20, 40, 70, 60, 80]

# Insert keys into the BST
for key in keys_to_insert:
    root = insert(root, key)
Deletion Operation:
def find_min(node):
    current = node
    while current.left is not None:
        current = current.left
    return current

def delete(root, key):
    if root is None:
        return root

    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Node with only one child or no child
        if root.left is None:
            return root.right
        elif root.right is None:
            return root.left

        # Node with two children: get the in-order successor (smallest key
        # in the right subtree); the in-order predecessor (largest key in
        # the left subtree) could be used instead.
        root.key = find_min(root.right).key

        # Delete the in-order successor from the right subtree
        root.right = delete(root.right, root.key)

    return root

# Example usage:
# Deleting a key (e.g., 30) from the BST
key_to_delete = 30
root = delete(root, key_to_delete)
In these examples:
 The insert function recursively inserts a new key into the BST while maintaining the ordering property.
 The delete function removes a key from the BST while preserving the ordering property. It handles cases where the node
to be deleted has no children, one child, or two children.
It's important to note that maintaining the balance of the BST is crucial to ensure efficient search, insertion, and deletion
operations. In practice, self-balancing BSTs like AVL trees or Red-Black trees are often used to automatically balance
the tree during these operations.
APPLICATION TO SPLAY TREES
A Splay Tree is a self-adjusting binary search tree in which every accessed node is moved ("splayed") to the root through a sequence of rotations, so recently accessed elements become cheap to reach again. The standard operations are adapted as follows:
1. Search:
 Perform a standard binary search to find the target node.
 Splay the found node to the root.
2. Insertion:
 Insert the new node as in a standard binary search tree.
 Splay the newly inserted node to the root.
3. Deletion:
 Delete the target node as in a standard binary search tree.
 Splay the parent of the deleted node to the root.
Applications to Splay Trees:
1. Caching:
 Splay Trees are used in caching scenarios where frequently accessed elements are brought closer to the root for
faster access.
2. Data Compression:
 Splay Trees are employed in certain data compression algorithms where frequently accessed symbols are kept
closer to the root to improve compression efficiency.
3. Dynamic Optimality:
 Splay Trees are related to the study of dynamic optimality in binary search trees, aiming to achieve good
performance for a sequence of access operations.
4. Online Algorithm:
 Splay Trees are considered online algorithms because they adapt dynamically to the access pattern of the
elements during runtime.
Python Implementation (Basic Splay Tree Structure):
class SplayTreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class SplayTree:
    def __init__(self):
        self.root = None

    def splay(self, key):
        # Splaying operation to bring the accessed node to the root
        # (zig, zig-zig, and zig-zag rotations). Implement splaying logic here.
        pass

    def search(self, key):
        # Perform a standard binary search,
        # then splay the accessed node to the root.
        pass

    def insert(self, key):
        # Perform a standard binary search tree insertion,
        # then splay the newly inserted node to the root.
        pass

    def delete(self, key):
        # Perform a standard binary search tree deletion,
        # then splay the parent of the deleted node to the root.
        pass
In the SplayTree class, the splay, search, insert, and delete methods need to be implemented to perform the respective
operations while maintaining the splay tree property. The exact splaying logic depends on the specific configuration of
the nodes and whether it involves zig-zig, zig-zag, or zig operations.
Splay Trees offer flexibility and efficiency for certain dynamic access patterns, and their simplicity makes them suitable
for various applications where frequent access patterns change over time.
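Because the splaying logic above is left as a skeleton, the following is a minimal sketch of the two basic rotations that the zig, zig-zig, and zig-zag steps are built from; the standalone helper functions rotate_right and rotate_left are illustrative and assume nodes shaped like SplayTreeNode.

def rotate_right(node):
    # The left child becomes the new subtree root ("zig" when node is the tree root).
    new_root = node.left
    node.left = new_root.right
    new_root.right = node
    return new_root

def rotate_left(node):
    # The right child becomes the new subtree root.
    new_root = node.right
    node.right = new_root.left
    new_root.left = node
    return new_root

# Example usage with SplayTreeNode from the skeleton above:
# root = SplayTreeNode(2); root.left = SplayTreeNode(1)
# root = rotate_right(root)   # node 1 becomes the new root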
EXTERNAL MEMORY ADT
External Memory (EM) refers to the scenario where data is too large to fit entirely in the main memory (RAM) of a
computer and must be stored in external storage, such as disk drives. External Memory Algorithms and Abstract Data
Types (ADTs) are designed to efficiently process and manage data stored in external memory.
Here are key considerations and aspects related to External Memory ADT:
1. Block Transfer Model:
 External Memory algorithms often operate under the Block Transfer Model, where data is read or written in
fixed-size blocks (pages) between internal and external memory.
 The goal is to minimize the number of block transfers between internal and external memory, as these transfers
are much more costly than internal computations.
2. External Memory Arrays:
 An External Memory Array is an ADT that extends the concept of an array to external memory.
 It is divided into blocks, and each block is stored in external storage.
3. Sorting in External Memory:
 External Memory Sorting algorithms efficiently sort large datasets that don't fit entirely in internal memory.
 Popular external memory sorting algorithms include External Merge Sort and Polyphase Merge Sort.
4. B-trees and External Memory Search Structures:
 B-trees are often used as external memory search structures to provide efficient search, insertion, and deletion
operations.
 External Memory B-trees are designed to minimize the number of block transfers during these operations.
5. Priority Queues:
 External Memory Priority Queues are data structures that efficiently maintain a set of elements with priorities.
 They are used in applications such as task scheduling and event-driven simulations.
6. Graph Algorithms in External Memory:
 External Memory Graph Algorithms deal with large graphs that don't fit entirely in internal memory.
 Algorithms for tasks like breadth-first search and minimum spanning tree construction are adapted to external
memory scenarios.
7. Cache-Oblivious Algorithms:
 Cache-Oblivious Algorithms are designed to perform well in multiple levels of memory hierarchy, including
external memory.
 They do not rely on knowledge of the block size and are efficient across different memory configurations.
8. I/O-Efficient Algorithms:
 I/O-Efficient Algorithms aim to minimize the number of I/O operations between internal and external memory.
 They are crucial for optimizing data-intensive applications, such as databases and large-scale data processing.
9. External Memory Hashing:
 External Memory Hashing techniques extend hashing to external memory scenarios.
 They are used to efficiently support operations like search, insert, and delete on large datasets.
10. Streaming Algorithms:
 Streaming algorithms are designed to process data in a single pass, making them suitable for scenarios with
continuous data streams.
 External Memory Streaming Algorithms efficiently process data that arrives sequentially.
The design of External Memory ADTs and algorithms is motivated by the need to efficiently manage and process data
that exceeds the capacity of internal memory. These techniques play a crucial role in handling large-scale datasets in
applications such as database systems, geographic information systems, and data-intensive scientific computing.
B-TREES
B-trees (Balanced Trees) are a type of self-balancing search tree data structure that maintains sorted data and allows
searches, insertions, deletions, and sequential access in logarithmic time. B-trees are particularly well-suited for use in
external memory storage systems, where data is stored on disk or other external storage devices. The balanced nature of
B-trees ensures that the height of the tree remains logarithmic, leading to efficient operations.
Here are the key features and concepts related to B-trees:
1. Definition:
 A B-tree is a self-balancing tree data structure in which each node can have multiple children, with the branching governed by a parameter B (the order of the tree).
 A B-tree of order B is characterized by the following properties:
 Each internal node has at most 2B−1 keys and at least B−1 keys.
 Each internal node with k keys has k+1 children.
 All leaves of the tree are at the same level.
2. Balancing Property:
 The balancing property of B-trees ensures that all paths from the root to the leaves have the same length,
resulting in a balanced tree.
 Balancing is maintained through split and merge operations during insertions and deletions.
3. Search Operation:
 Searching in a B-tree is similar to searching in a binary search tree. Starting from the root, key comparisons
determine the path to follow.
 B-trees are designed to minimize the number of disk accesses, making them suitable for external memory
storage systems.
4. Insertion Operation:
 To insert a key into a B-tree, the tree is traversed to find the appropriate leaf node.
 If the leaf has room for the new key, it is inserted. Otherwise, the leaf is split, and the median key is moved up to
the parent node.
5. Deletion Operation:
 Deleting a key involves searching for the key and removing it from the tree.
 If the key is in an internal node, it is replaced by its in-order successor (or predecessor), and the successor (or
predecessor) is recursively deleted from its leaf position.
6. Split and Merge Operations:
 Split operations are performed during insertions when a node becomes full. The node is split into two, and the
median key is moved up to the parent.
 Merge operations are performed during deletions when a node becomes less than half-full. The node is merged
with a neighboring sibling, and a key from the parent is moved down.
7. Bulk Loading:
 B-trees support efficient bulk loading, where a sorted sequence of keys is inserted into an empty tree. This
operation avoids the need for repeated splits.
8. Concurrency and Locking:
 B-trees can be efficiently implemented in a concurrent environment with proper locking mechanisms.
 Concurrent B-trees support multiple operations simultaneously without compromising data integrity.
9. Applications:
 B-trees are widely used in databases, file systems, and other storage systems where data needs to be efficiently
organized and searched.
 They are particularly well-suited for scenarios where data is stored on disk or other external storage devices.
10. Variants:
 There are variations of B-trees, including B+ trees (commonly used in databases), B* trees, and others, each
with specific modifications to the original B-tree design.
Here's a simplified example of a B-tree in Python (for educational purposes only):
class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf

class BTree:
    def __init__(self, order):
        self.root = BTreeNode()
        self.order = order

    def search(self, key, node=None):
        # Search for a key in the B-tree
        # Implement search logic here
        pass

    def insert(self, key):
        # Insert a key into the B-tree
        # Implement insertion logic here
        pass

    def delete(self, key):
        # Delete a key from the B-tree
        # Implement deletion logic here
        pass
In this example, the BTree class represents a B-tree, and the BTreeNode class represents a node in the B-tree. The actual implementation of search, insertion, and deletion operations would involve navigating the tree, splitting and merging nodes as necessary to maintain the balancing property of the B-tree. The order of the B-tree (B) determines the maximum number of keys a node can hold.
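As a sketch of how the search method above could be filled in, the following recursive helper follows the search procedure described in this section; the function name btree_search is an illustrative choice, and it assumes the BTreeNode layout defined above with the keys list kept sorted within each node.

def btree_search(node, key):
    # Find the first key in this node that is >= the target key.
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    # Found the key in this node.
    if i < len(node.keys) and node.keys[i] == key:
        return node, i
    # Reached a leaf without finding the key.
    if node.leaf:
        return None
    # Otherwise, descend into the appropriate child.
    return btree_search(node.children[i], key)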
APPLICATIONS TO SHORTEST PATH ALGORITHMS
Shortest path algorithms are used to find the most efficient path between two points in a graph, where the efficiency is
measured by the sum of the weights assigned to the edges. B-trees, however, are typically associated with efficient
searching, insertion, and deletion operations in data structures. The two concepts are not directly related, and B-trees are
not typically used for solving shortest path problems.
Shortest path algorithms, on the other hand, can be categorized into two main types: single-source shortest path and all-
pairs shortest path.
1. Single-Source Shortest Path Algorithms:
 Dijkstra's Algorithm: This algorithm finds the shortest path from a single source vertex to all other vertices in
a weighted graph with non-negative edge weights. It uses a priority queue to efficiently select the vertex with the
smallest tentative distance.
 Bellman-Ford Algorithm: This algorithm also finds the shortest paths from a single source vertex to all other
vertices in a weighted graph, even when there are negative edge weights. It can detect negative weight cycles.
2. All-Pairs Shortest Path Algorithms:
 Floyd-Warshall Algorithm: This algorithm finds the shortest paths between all pairs of vertices in a weighted
graph. It works for graphs with both positive and negative edge weights but is less efficient than single-source
algorithms for sparse graphs.
 Johnson's Algorithm: This algorithm combines Dijkstra's algorithm with Bellman-Ford to efficiently find all
pairs shortest paths in the presence of negative edge weights.
These algorithms are widely used in various applications, including:
 Routing in Networks: Shortest path algorithms are used in computer networks and the internet to find the most efficient
route for data packets.
 Transportation and Logistics: Shortest path algorithms are employed in determining optimal routes for delivery trucks,
vehicles, or logistics planning.
 GPS and Navigation Systems: Navigation systems use shortest path algorithms to find the quickest route between two
locations.
 Game Development: Shortest path algorithms are used in video game development for character or enemy movement.
 Network Design: Shortest path algorithms are crucial in designing efficient communication networks.
While B-trees and shortest path algorithms serve different purposes, they are both essential tools in computer science and are applied in a variety of fields to solve different types of problems related to data organization and graph traversal.
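To illustrate the single-source approach described above, the following is a minimal sketch of Dijkstra's algorithm using a binary heap as the priority queue; the adjacency-list format (a dict mapping each vertex to a list of (neighbor, weight) pairs) is an assumption made for this example.

import heapq

def dijkstra(graph, source):
    # graph: {vertex: [(neighbor, weight), ...]} with non-negative weights.
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # Stale entry; a shorter path to u was already found.
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Example usage:
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}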
BASIC OPERATIONS ON B-TREES
B-Trees support various basic operations that ensure efficient searching, insertion, and deletion of keys while maintaining
the balanced property of the tree. The balancing property of B-Trees is crucial for keeping the tree height logarithmic
and, consequently, optimizing performance. Here are the basic operations on B-Trees:
1. Search Operation:
 To search for a key in a B-Tree, start from the root and recursively navigate down the tree, following the
appropriate child based on key comparisons.
 If the key is present in a node, the search is successful.
 If the key is not in a node, move to the appropriate child based on key comparisons and continue the search.
2. Insertion Operation:
 To insert a key into a B-Tree, start by searching for the appropriate leaf node where the key should be inserted.
 If the leaf has space for the key, insert it into the node in a sorted order.
 If the leaf is full, split the leaf into two and promote the middle key to the parent node. This may cause a
cascading split up to the root.
3. Deletion Operation:
 To delete a key from a B-Tree, start by searching for the key in the tree.
 If the key is in an internal node, replace it with its in-order predecessor or successor and recursively delete the
predecessor or successor from its leaf position.
 If the key is in a leaf node, remove it from the node. If the node becomes less than half-full, consider borrowing
from siblings or merging with a sibling.
4. Split Operation:
 Splitting is a fundamental operation in B-Trees that occurs during insertions when a node becomes full.
 When a node is split, the middle key is promoted to the parent node, and the remaining keys are distributed
between the two new child nodes.
5. Merge Operation:
 Merging is another fundamental operation in B-Trees that occurs during deletions when a node becomes less
than half-full.
 When a node is merged with a sibling, a key from the parent node is moved down, and the two nodes are
combined into a single node.
6. Bulk Loading:
 B-Trees support efficient bulk loading, especially when inserting a sorted sequence of keys into an empty tree.
This operation avoids repeated splits and results in a balanced tree.
7. Range Queries:
 B-Trees support efficient range queries, where a range of keys is selected. The tree can be traversed to
efficiently find all keys within the specified range.
8. Search for Successor/Predecessor:
 B-Trees support finding the in-order successor or predecessor of a given key efficiently. This is useful during
deletion operations.
9. Concurrency and Locking:
 B-Trees can be implemented with support for concurrent operations in a multi-threaded or multi-user
environment. Proper locking mechanisms are used to maintain data integrity.
10. Rebalancing:
 Rebalancing is performed during insertions and deletions to maintain the balanced property of the B-Tree. It
involves split and merge operations as needed.
The efficiency of B-Trees makes them suitable for use in various applications, including databases, file systems, and
external storage systems, where they provide fast and reliable access to large datasets.
RED BLACK TREES
Red-Black Trees are a type of self-balancing binary search tree data structure. They maintain balance during insertions
and deletions, ensuring that the height of the tree remains logarithmic. Red-Black Trees are widely used in various
applications, particularly in implementing associative containers like sets and maps in many programming languages and
libraries.
Here are the key features and properties of Red-Black Trees:
1. Binary Search Tree Property:
 Like other binary search trees, Red-Black Trees follow the binary search property: for each node, all nodes in its
left subtree have values less than the node's value, and all nodes in its right subtree have values greater than the
node's value.
2. Coloring:
 Each node in a Red-Black Tree is associated with a color, either red or black.
 The color of a node is used to maintain balance during insertions and deletions.
3. Properties of Red-Black Trees:
 Root Property: The root is black.
 Red Property: Red nodes have only black children.
 Black Height Property: Every path from a node to its descendant leaves has the same number of black nodes.
This ensures that the tree is balanced.
4. Balancing Operations:
 Red-Black Trees use rotations and color flips to maintain balance during insertions and deletions.
 Rotations include left rotations, right rotations, and combinations of both to adjust the tree structure.
 Color flips involve changing the colors of nodes to maintain the red-black properties.
5. Insertion Operation:
 When a new node is inserted into the tree, it is initially colored red.
 After insertion, the tree may violate the red-black properties, and balancing operations are performed to restore
balance.
6. Deletion Operation:
 When a node is deleted from the tree, it is replaced by its in-order successor (or predecessor), and then the tree
may be adjusted to maintain the red-black properties.
 If the replacement node is black, additional adjustments may be needed.
7. Search Operation:
 The search operation in a Red-Black Tree is the same as in a regular binary search tree. It follows the binary
search property to navigate the tree and find the desired node.
8. Time Complexity:
 The time complexity of search, insertion, and deletion operations in a well-balanced Red-Black Tree is O(log n),
where 'n' is the number of nodes in the tree.
 The balancing operations ensure that the height of the tree remains logarithmic.
9. Applications:
 Red-Black Trees are commonly used as the underlying data structure for associative containers, such as sets and
maps, in programming languages like C++ (std::set, std::map) and Java (TreeSet, TreeMap).
 They are also used in various database systems and file systems for efficient indexing.
10. Variants:
 There are variations of Red-Black Trees, including AVL trees, which are another type of self-balancing binary
search tree with stricter balance conditions.
Here's a simplified example of a Red-Black Tree in Python (for educational purposes only):
class Node:
    def __init__(self, key, color='red'):
        self.key = key
        self.color = color
        self.left = None
        self.right = None
        self.parent = None

class RedBlackTree:
    def __init__(self):
        self.nil = Node(None, color='black')  # NIL sentinel node
        self.root = self.nil

    # Insertion and balancing methods go here.
    # The actual implementation of insertion and balancing is more involved.
In a complete implementation, you would have methods to perform insertion and the associated balancing operations to
maintain the red-black properties. The NIL node acts as a sentinel, simplifying boundary conditions in the code.
PROPERTIES OF RED-BLACK TREES
Red-Black Trees are a type of self-balancing binary search tree with specific properties that ensure balanced structure and
efficient search, insertion, and deletion operations. The key properties of Red-Black Trees are:
1. Node Color:
 Each node in a Red-Black Tree is colored either red or black.
 The color is used to maintain balance during tree modifications.
2. Root Property:
 The root of the tree is always black.
 This property ensures that the black height (the number of black nodes on any path from the root to a leaf) is
consistent across all paths.
3. Red Property:
 Red nodes cannot have red children.
 This property prevents consecutive red nodes along any path, maintaining balance and preventing the tree from
becoming too skewed.
4. Black Height Property:
 Every path from a node to its descendant leaves has the same number of black nodes.
 This ensures that the tree is balanced and the height remains logarithmic.
5. Leaf Nodes (NIL Nodes):
 The leaf nodes (external nodes) are considered black. These nodes are often represented by a single NIL node.
 The NIL nodes simplify the implementation and help in handling boundary cases.
6. Color of NIL Nodes:
 The color of NIL nodes is considered black. This ensures that the black height is consistent for all paths,
including those leading to NIL nodes.
7. Red-Black Properties after Insertion:
 When a new node is inserted, it is initially colored red.
 If the parent of the new node is also red, additional adjustments are made to maintain the red-black properties.
8. Balancing Operations:
 Balancing operations include rotations (left and right) and color flips to restore the red-black properties after
insertions and deletions.
 These operations ensure that the tree remains balanced and the black height is consistent.
9. Red-Black Properties after Deletion:
 When a node is deleted, it is replaced by its in-order successor (or predecessor).
 If the replacement node is black, the tree may violate the black height property, and additional adjustments are
made.
10. Time Complexity:
 The time complexity of search, insertion, and deletion operations in a well-balanced Red-Black Tree is O(log n),
where 'n' is the number of nodes in the tree.
 The balancing operations ensure that the height of the tree remains logarithmic.
11. Applications:
 Red-Black Trees are commonly used in implementations of associative containers, such as sets and maps, in
programming languages and libraries.
 They are used in various database systems and file systems for efficient indexing.
Red-Black Trees strike a balance between maintaining a relatively balanced structure and minimizing the number of
rotations and adjustments required during modifications. These properties make them well-suited for applications where
efficient search and modification operations are critical.
ROTATIONS – INSERTION – DELETION OF RED BLACK TREES
Rotations are fundamental operations in Red-Black Trees that are used to maintain the balance of the tree during
insertions and deletions. There are two types of rotations: left rotations and right rotations. These rotations, combined
with color changes, ensure that the Red-Black Tree properties are preserved. Below, I'll provide an overview of rotations
and demonstrate their application during insertion and deletion.
Rotations:
Left Rotation:
A left rotation is applied to a node and its right child. It reorganizes the nodes to maintain the binary search tree property
and the Red-Black Tree properties.
    A                  B
     \                / \
      B     ===>     A   C
       \
        C
Right Rotation:
A right rotation is applied to a node and its left child. It reorganizes the nodes to maintain the binary search tree property
and the Red-Black Tree properties.
        C              B
       /              / \
      B     ===>     A   C
     /
    A
Insertion:
When a new node is inserted into the Red-Black Tree, it is initially colored red. After insertion, the tree may violate the
Red-Black Tree properties, and rotations are performed to restore balance. The following steps are taken after inserting a
new red node:
1. Color the New Node Red: After insertion, the new node is colored red.
2. Fix Violations:
 If the parent of the new red node is black, no violations occur, and the tree remains balanced.
 If the parent of the new red node is red, additional adjustments are made to restore balance.
 Possible cases:
 Case 1: The uncle of the new node is red.
 Case 2: The uncle of the new node is black, and the new node is the right child of its parent.
 Case 3: The uncle of the new node is black, and the new node is the left child of its parent.
3. Apply Rotations:
 Depending on the cases, rotations (left or right) and color flips are applied to restore balance.
Deletion:
When a node is deleted from the Red-Black Tree, it is replaced by its in-order successor (or predecessor). After deletion,
the tree may violate the Red-Black Tree properties, and rotations are performed to restore balance. The following steps
are taken after deleting a node:
1. Fix Double Black Violation:
 If the replacement node is black, it creates a "double black" violation.
 The double black violation is propagated up the tree, and adjustments are made to resolve it.
 Possible cases:
 Case 1: Sibling is red.
 Case 2: Sibling is black, and both nephews are black.
 Case 3: Sibling is black, and the nearer nephew is red.
 Case 4: Sibling is black, and the farther nephew is red.
2. Apply Rotations:
 Depending on the cases, rotations (left or right) and color flips are applied to restore balance.
It's important to note that the details of these operations can be intricate, and their implementation involves handling
various cases and edge conditions. The exact adjustments depend on the specific scenario encountered during insertion or
deletion.
For a complete and detailed understanding of these operations, it's recommended to refer to textbooks or online resources
that provide step-by-step explanations and pseudocode for Red-Black Tree operations.
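As a concrete illustration of the left rotation described above, here is a minimal Python sketch using the node layout from the earlier example (left, right, and parent pointers plus a NIL sentinel). The right rotation is symmetric, and the colour fix-up logic is omitted; this is a sketch, not a complete implementation.
def left_rotate(tree, x):
    """Rotate left around node x; x's right child y takes x's place."""
    y = x.right
    x.right = y.left                 # y's left subtree becomes x's right subtree
    if y.left is not tree.nil:
        y.left.parent = x
    y.parent = x.parent              # attach y to x's former parent
    if x.parent is tree.nil:
        tree.root = y                # x was the root, so y becomes the new root
    elif x is x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x                       # place x as y's left child
    x.parent = y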
PRIORITY QUEUE AND THEIR EXTENSION
A priority queue is a data structure that stores elements with associated priorities and allows for efficient retrieval of the
element with the highest (or lowest) priority. It supports two main operations: inserting an element with a priority, and
removing the element with the highest (or lowest) priority.
There are several variations and extensions of priority queues to address specific needs and use cases. Here are some of
them:
1. Min-Max Heap:
o A regular binary heap supports efficiently retrieving the minimum element. A min-max heap, on the
other hand, allows for efficient retrieval of both the minimum and maximum elements. It is a binary
heap where each level alternates between being a min-level and a max-level.
2. Fibonacci Heap:
o Fibonacci Heap is a more advanced data structure that supports a decrease-key operation in constant
time. This can be beneficial for certain algorithms like Dijkstra's algorithm and Prim's algorithm.
However, Fibonacci Heaps have more complex implementations compared to binary heaps.
3. Binomial Heap:
o Binomial Heaps are another type of heap that allows for efficient merging of two heaps in constant
time. They are composed of a collection of binomial trees, each satisfying the heap property.
4. Pairing Heap:
o Pairing Heaps are a type of heap that allows more flexibility in terms of the structure of the heap. They
support operations like melding (combining two heaps), which can be done efficiently.
5. Adaptable Priority Queue:
o An adaptable priority queue allows for the efficient decrease or increase of the priority of an element.
This is useful in applications where priorities may change dynamically.
6. Double-Ended Priority Queue:
o Also known as a dequeue or a max-min heap, a double-ended priority queue supports efficient removal
of both the maximum and minimum elements. This can be useful in scenarios where you need quick
access to both extreme values.
7. Multi-level Priority Queue:
o In a multi-level priority queue, elements are assigned to different priority levels, and each level can
have its own priority queue. This structure is helpful when there is a need to prioritize elements within
different categories.
8. Parallel Priority Queue:
o Designed for parallel computing environments, a parallel priority queue allows multiple processors to
operate on the priority queue concurrently. This can improve performance in parallel or distributed
systems.
When choosing a priority queue or its extension, the selection depends on the specific requirements of the application, the
types of operations needed, and the efficiency requirements for those operations. Each type has its own advantages and
trade-offs, so the choice depends on the specific context in which it will be used.
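As a simple baseline for the variations listed above, a basic min-priority queue can be sketched with Python's heapq module; the (priority, item) tuple layout is an assumption made for this example.
import heapq

class PriorityQueue:
    """Minimal min-priority queue: the smallest priority value is removed first."""
    def __init__(self):
        self._heap = []

    def insert(self, item, priority):
        heapq.heappush(self._heap, (priority, item))

    def extract_min(self):
        priority, item = heapq.heappop(self._heap)
        return item

pq = PriorityQueue()
pq.insert("low-priority task", 5)
pq.insert("high-priority task", 1)
print(pq.extract_min())   # -> "high-priority task"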
HEAP- BINOMIAL HEAPS
A binomial heap is a specific type of heap data structure that extends the concept of a binary heap. It consists of a
collection of binomial trees, where each tree is defined recursively as follows:
1. A binomial tree of order 0 is a single node.
2. A binomial tree of order k can be formed by taking two binomial trees of order k-1 and making one of them the
leftmost child of the other.
The order of a binomial tree is the number of nodes in the tree. In a binomial heap, there can be at most one binomial tree
of each order. Therefore, a binomial heap can be represented as a collection of binomial trees, each corresponding to a
different order.
The key properties of binomial heaps are as follows:
1. Shape Property:
o The binomial heap satisfies the binomial tree properties, and the order of the binomial trees in the heap
is distinct. That is, there is at most one binomial tree of order k for every k.
2. Heap Property:
o Each binomial tree in the heap satisfies the heap property. For a min-heap, the key of each node is less
than or equal to the key of its parent.
Operations on Binomial Heaps:
1. Union (or Merge):
o Merging two binomial heaps of the same type involves combining the heaps by merging their binomial
trees of the same order. This operation is crucial for maintaining the heap properties efficiently.
2. Insert:
o To insert a new element into a binomial heap, a new binomial heap is created with a single-node
binomial tree of order 0, representing the new element. The resulting heap is then merged with the
original binomial heap.
3. Extract-Min (or Delete-Min):
o Extracting the minimum element from a binomial heap involves finding the binomial tree with the
minimum root and removing that tree. The remaining trees are then combined into a new binomial
heap, and the overall heap is updated.
4. Decrease Key:
o This operation involves decreasing the key of a specific node in the binomial heap. After the key is
decreased, the heap structure may be violated, so the heap is rearranged to restore the binomial heap
properties.
Advantages of Binomial Heaps:
1. Efficient Union Operation:
o The union (merge) operation in binomial heaps is particularly efficient, taking O(log n) time, where n is
the total number of elements in the combined heaps.
2. Supports Efficient Decrease Key:
o Binomial heaps support efficient decrease-key operations, which is crucial for algorithms like Dijkstra's
algorithm and Prim's algorithm.
3. Amortized Time Complexity:
o The amortized time complexity of most operations in a binomial heap is favorable, making it suitable
for dynamic data structures.
4. Parallelism:
o Binomial heaps can be efficiently parallelized, which makes them suitable for parallel computing
environments.
While binomial heaps have several advantages, their main drawback is that they can require more memory compared to
other types of heaps due to the structure of binomial trees. In practice, the choice of heap depends on the specific
requirements of the application and the types of operations that are most frequently performed.

FIBONACCI HEAPS
Fibonacci Heap is a more advanced and flexible type of heap data structure that was introduced by Michael L. Fredman
and Robert E. Tarjan in 1984. It is an extension of the binomial heap and is known for its efficient support of certain
operations, especially the decrease key operation. Fibonacci Heaps have amortized time complexities that can be more
favorable than other types of heaps in certain scenarios.
Here are the key features and operations of Fibonacci Heaps:
Features:
1. Dynamic Structure:
o Fibonacci Heaps have a dynamic structure that allows for efficient melding (combining) of heaps,
making them suitable for algorithms where heaps need to be merged frequently.
2. Lazy Merging:
o Unlike other heap structures, Fibonacci Heaps use a lazy approach to merging during the union
operation. This means that the actual merging of two Fibonacci Heaps is deferred until it becomes
necessary, leading to more efficient operations.
3. Decrease Key Operation:
o The decrease key operation in Fibonacci Heaps is particularly efficient, taking constant amortized time.
This makes Fibonacci Heaps well-suited for algorithms like Dijkstra's shortest path and Prim's
minimum spanning tree algorithms, where decrease key operations are common.
4. Potential Method:
o Fibonacci Heaps use the potential method to amortize the cost of operations. The potential method
involves assigning a potential to the heap and using it to spread the cost of expensive operations across
a sequence of operations, resulting in an amortized constant time for certain operations.
Operations:
1. Insert:
o To insert a new element into a Fibonacci Heap, a new node is created with the given value, and it is
added to the root list of the heap.
2. Union (or Merge):
o The union operation in Fibonacci Heaps is efficient and involves simply concatenating the root lists of
two Fibonacci Heaps.
3. Extract-Min (or Delete-Min):
o Extracting the minimum element from a Fibonacci Heap involves cutting the minimum node from the
root list and consolidating the heap to ensure that no two trees have the same degree.
4. Decrease Key:
o Decreasing the key of a node in a Fibonacci Heap involves decreasing the key value and potentially
cutting the node from its parent. The operation is efficient and takes constant amortized time.
5. Delete:
o Deleting a node from a Fibonacci Heap involves decreasing its key to negative infinity (to make it the
minimum), followed by an extract-min operation.
Advantages of Fibonacci Heaps:
1. Amortized Time Complexity:
o The amortized time complexity of many operations in Fibonacci Heaps, especially decrease key, is
favorable compared to other heap structures.
2. Efficient Union Operation:
o The union operation is very efficient, making Fibonacci Heaps suitable for applications where heaps
need to be merged frequently.
3. Supports Decrease Key in Constant Amortized Time:
o Decrease key operations are particularly fast in Fibonacci Heaps, making them well-suited for certain
graph algorithms.
Drawbacks:
1. Higher Memory Overhead:
o Fibonacci Heaps can have higher memory overhead compared to simpler heaps due to their dynamic
structure.
2. Complexity of Implementation:
o The implementation of Fibonacci Heaps is more complex than that of simpler heaps, which can make
them harder to understand and maintain.
In practice, Fibonacci Heaps are chosen when the specific advantages they offer, such as fast decrease key operations and
efficient union operations, outweigh the potential drawbacks for a given application. They are commonly used in
situations where the amortized time complexity is a critical factor.
STRING MATCHING ALGORITHMS.
String matching algorithms are fundamental techniques used in computer science to find occurrences of a substring (or
pattern) within a larger string (or text). There are various algorithms with different complexities and strengths. Here are
some notable string matching algorithms:
1. Brute Force Method:
o The simplest method involves checking each position in the text against the pattern. This method has a
time complexity of O((n-m+1)m), where n is the length of the text and m is the length of the pattern.
2. Knuth-Morris-Pratt (KMP) Algorithm:
o The KMP algorithm is a linear-time algorithm that avoids unnecessary character comparisons by
utilizing information about the pattern's structure. It preprocesses the pattern to create a "failure
function" that helps skip unnecessary comparisons during the actual search.
3. Boyer-Moore Algorithm:
o The Boyer-Moore algorithm is known for its practical efficiency. It preprocesses the pattern to skip
large portions of the text in each comparison, making it especially effective for searching in large texts.
It employs both bad character and good suffix rules.
4. Rabin-Karp Algorithm:
o Rabin-Karp is a probabilistic algorithm that uses hashing for string matching. It computes hash values
for overlapping substrings and compares them with the hash value of the pattern. It is useful for
multiple pattern matching.
5. Aho-Corasick Algorithm:
o The Aho-Corasick algorithm constructs a finite state machine to efficiently search for multiple patterns
simultaneously. It preprocesses the patterns and creates a trie structure, allowing for fast searches.
6. Finite Automaton Algorithm:
o A finite automaton can be constructed to represent the pattern. The text is scanned, and transitions in
the automaton are made based on the characters encountered. This algorithm has linear time complexity
in the best case.
7. Bitap Algorithm (Shift-Or Algorithm):
o Bitap is an approximate string matching algorithm. It uses bitwise operations to efficiently search for a
pattern with allowed errors or "don't care" positions.
8. Suffix Trees and Suffix Arrays:
o Suffix trees and suffix arrays are advanced data structures that store all the suffixes of a string. They
enable efficient pattern matching and support various string-related operations.
9. Burrows-Wheeler Transform (BWT):
o BWT is a reversible transformation of a string that can be used for compression and efficient pattern
matching. Algorithms like the FM-index, based on the BWT, are used in bioinformatics and other
applications.
10. Smith-Waterman Algorithm:
o Smith-Waterman is a dynamic programming algorithm used for local sequence alignment. It is often
employed in bioinformatics for comparing biological sequences.
The choice of a specific string matching algorithm depends on factors such as the size of the text, the length of the
pattern, and whether approximate matching or multiple pattern matching is required. Each algorithm has its strengths and
weaknesses, and the optimal choice depends on the characteristics of the problem at hand.
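As one concrete example from the list above, here is a minimal Python sketch of the Knuth-Morris-Pratt algorithm, including the failure-function preprocessing it relies on.
def kmp_search(text, pattern):
    """Return the starting indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    # Build the failure function: fail[i] = length of the longest proper
    # prefix of pattern[:i+1] that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    # Scan the text, reusing the failure function to avoid re-comparisons.
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches

print(kmp_search("ababcababcabc", "abc"))  # -> [2, 7, 10]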

UNIT - III GRAPHS


Elementary Graph Algorithms: Breadth-First Search – Depth-First Search – Topological Sort –
Strongly Connected Components- Minimum Spanning Trees: Growing a Minimum Spanning Tree –
Kruskal’s and Prims– Single-Source Shortest paths in Directed Acyclic Graphs – Dijkstra‘s Algorithm;
Dynamic Programming - All-Pairs Shortest Paths: Shortest Paths and Matrix Multiplication – The Floyd-
Warshall’s Algorithm-Connectivity.

ELEMENTARY GRAPH ALGORITHMS: BREADTH-FIRST SEARCH


Breadth-First Search (BFS) is an elementary graph algorithm that explores all the vertices of a graph in breadthward
motion, i.e., it visits all the neighbors of a vertex before moving on to the next level of vertices. BFS is often used to find
the shortest path in an unweighted graph and to explore all connected components of an undirected graph.
Procedure for Breadth-First Search:
1. Initialization:
o Start with a source vertex s.
o Initialize a queue to keep track of the vertices to be visited.
o Mark the source vertex s as visited and enqueue it.
2. Explore Neighbors:
o While the queue is not empty:
 Dequeue a vertex v from the queue.
 For each unvisited neighbor u of v:
 Mark u as visited.
 Enqueue u.
 Optionally, record the predecessor of u to reconstruct the shortest path later.
3. Termination:
o When the queue is empty, the BFS is complete.

Pseudocode:
BFS(graph, start):
    create a queue Q
    enqueue start into Q
    mark start as visited

    while Q is not empty:
        current = dequeue from Q
        for each neighbor of current:
            if neighbor is not visited:
                mark neighbor as visited
                enqueue neighbor into Q
                // Optionally, record the predecessor of neighbor
Properties of BFS:
 Shortest Paths:
o BFS discovers the shortest path from the source vertex to every other reachable vertex in an
unweighted graph.
 Connected Components:
o BFS can be used to find connected components in an undirected graph.
 Bipartiteness Testing:
o BFS can be used to test whether an undirected graph is bipartite.
Time Complexity:
The time complexity of BFS is O(V + E), where V is the number of vertices and E is the number of edges. This
is because each vertex and each edge are processed once.
Example:
Consider the following undirected graph:
1 -- 2 -- 3
|    |
4 -- 5 -- 6
Starting BFS from vertex 1:
1. Visit 1, enqueue its neighbors (2, 4); visit 2, enqueue its unvisited neighbors (3, 5); visit 4.
2. Continue visiting vertices level by level until all vertices are visited.
The order of traversal might be: 1, 2, 4, 3, 5, 6.
BFS explores vertices in layers, ensuring that the shortest path to each vertex is discovered first.
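A runnable Python version of the BFS procedure above, using an adjacency-list dictionary (the graph representation, and the adjacency lists for the example graph, are assumptions for this sketch):
from collections import deque

def bfs(graph, start):
    """Return vertices in the order BFS visits them from start.

    graph: dict mapping each vertex to a list of its neighbours.
    """
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        for neighbour in graph[current]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

# The undirected example graph above (each edge listed in both directions).
graph = {1: [2, 4], 2: [1, 3, 5], 3: [2], 4: [1, 5], 5: [2, 4, 6], 6: [5]}
print(bfs(graph, 1))   # -> [1, 2, 4, 3, 5, 6]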
Depth-First Search
Depth-First Search (DFS) is another elementary graph algorithm that explores a graph by visiting as far as possible along
each branch before backtracking. It is often used to traverse or search through graphs and trees. DFS can be implemented
using either recursion or an explicit stack.
Procedure for Depth-First Search:
1. Initialization:
o Start with a source vertex s.
o Mark the source vertex s as visited.
2. Explore Neighbors:
o For each unvisited neighbor u of the current vertex:
 Recursively apply DFS starting from u.
 Optionally, record the predecessor of u to reconstruct the traversal or paths later.
3. Termination:
o DFS continues until all vertices are visited.
Pseudocode (Recursive):
DFS(graph, current):
    mark current as visited
    for each neighbor of current:
        if neighbor is not visited:
            DFS(graph, neighbor)
            // Optionally, record the predecessor of neighbor
Pseudocode (Iterative with Stack):
DFS(graph, start):
    create a stack S
    push start onto S
    mark start as visited

    while S is not empty:
        current = pop from S
        for each neighbor of current:
            if neighbor is not visited:
                mark neighbor as visited
                push neighbor onto S
                // Optionally, record the predecessor of neighbor
Properties of DFS:
 Topological Sorting:
o DFS can be used to perform a topological sorting of the vertices in a directed acyclic graph (DAG).
 Connected Components:
o DFS can be used to find connected components in an undirected graph.
 Cycle Detection:
o DFS can be used to detect cycles in a graph.
Time Complexity:
The time complexity of DFS is O(V + E), where V is the number of vertices and E is the number of edges. This
is because each vertex and each edge are processed once.
Example:
Consider the following undirected graph:
1 -- 2 -- 3
|    |
4 -- 5 -- 6
Starting DFS from vertex 1:
1. Visit 1, recursively visit 2, recursively visit 3, backtrack to 2, recursively visit 4, backtrack to 1.
2. Continue exploring until all vertices are visited.
The order of traversal might be: 1, 2, 3, 4, 5, 6.
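A minimal recursive Python version of the DFS procedure, using the same adjacency-list representation assumed in the BFS sketch above:
def dfs(graph, start, visited=None, order=None):
    """Return vertices in the order a recursive DFS visits them from start."""
    if visited is None:
        visited, order = set(), []
    visited.add(start)
    order.append(start)
    for neighbour in graph[start]:
        if neighbour not in visited:
            dfs(graph, neighbour, visited, order)
    return order

graph = {1: [2, 4], 2: [1, 3, 5], 3: [2], 4: [1, 5], 5: [2, 4, 6], 6: [5]}
print(dfs(graph, 1))   # -> [1, 2, 3, 5, 4, 6] (one possible DFS order)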
DFS explores as deep as possible before backtracking, which may lead to different traversal orders compared to BFS. It
is useful for tasks such as topological sorting, detecting cycles, and exploring connected components.
TOPOLOGICAL SORT
Topological sorting is an ordering of the vertices of a directed acyclic graph (DAG) such that for every directed edge
(u, v), vertex u comes before v in the ordering. In other words, it is a linear ordering of the vertices that respects
the partial order defined by the edges of the graph.
Topological sorting is primarily used in scenarios where there are dependencies between tasks, and the tasks need to be
executed in a certain order to satisfy those dependencies.
Algorithm for Topological Sort:
The standard algorithm for topological sorting is based on Depth-First Search (DFS). Here's a high-level description:
1. Initialization:
o Mark all vertices as unvisited.
o Initialize an empty stack to store the topologically sorted order.
2. DFS-based Ordering:
o For each vertex v in the graph:
 If v is not visited, perform a DFS starting from v.
 During the DFS, when a vertex u is completely explored (all its neighbors are visited), push
u onto the stack.
3. Retrieve Order:
o Pop vertices from the stack. The order in which vertices are popped represents the topological order.
Pseudocode:
TopologicalSort(graph):
    create a stack S
    mark all vertices as unvisited

    for each vertex v in graph:
        if v is not visited:
            DFS(graph, v, S)

    // The stack now contains the topologically sorted order
    result = empty list
    while S is not empty:
        result.append(pop from S)
    return result

DFS(graph, current, stack):
    mark current as visited
    for each neighbor of current:
        if neighbor is not visited:
            DFS(graph, neighbor, stack)
    // All neighbors visited, push current onto stack
    push current onto stack
Example:
Consider the following DAG:
1 --> 2 --> 3
|     |
v     v
4 --> 5 --> 6
Starting the topological sort:
1. DFS from 1: visit 1, DFS to 2, DFS to 3 (push 3), DFS to 5, DFS to 6 (push 6), push 5, push 2, DFS to 4 (push 4), push 1.
2. Pop vertices from the stack to read off the topological order.
The topological order is 1, 4, 2, 5, 6, 3 (one valid topological ordering of this DAG).
Time Complexity:
The time complexity of the topological sort algorithm is O(V + E), where V is the number of vertices and E is
the number of edges in the graph. This is because it uses DFS, which has a linear time complexity for graphs.
Applications:
 Task Scheduling:
o In project management, topological sorting can be used to schedule tasks in a way that respects
dependencies.
 Build Systems:
o In build systems, where tasks depend on the successful completion of other tasks, topological sorting
can determine the order in which tasks should be executed.
 Dependency Resolution:
o In package management systems, topological sorting can be used to resolve dependencies between
software packages.
 Course Prerequisites:
o In educational systems, topological sorting can be applied to determine the order in which courses
should be taken based on prerequisites.
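A compact Python sketch of the DFS-based topological sort above; the DAG is assumed to be a dictionary mapping each vertex to a list of its successors, with every vertex present as a key.
def topological_sort(graph):
    """Return a topological ordering of the vertices of a DAG."""
    visited, stack = set(), []

    def dfs(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w)
        stack.append(v)        # all successors finished, push v

    for v in graph:
        if v not in visited:
            dfs(v)
    return stack[::-1]         # reverse of the finish order

# The DAG from the example above (1->2, 2->3, 1->4, 2->5, 4->5, 5->6).
dag = {1: [2, 4], 2: [3, 5], 3: [], 4: [5], 5: [6], 6: []}
print(topological_sort(dag))   # -> [1, 4, 2, 5, 6, 3]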
STRONGLY CONNECTED COMPONENTS-
Strongly Connected Components (SCCs) are subsets of vertices in a directed graph such that every vertex in a subset is
reachable from every other vertex in the same subset. In other words, for any pair of vertices u and v in an SCC, there
is a directed path from u to v and a directed path from v to u.
Finding strongly connected components is a fundamental operation in the analysis of directed graphs and has various
applications, including in optimization problems, compiler design, and network analysis.
Kosaraju's Algorithm:
Kosaraju's algorithm is a popular method for finding strongly connected components in a directed graph. The algorithm is
based on two passes of Depth-First Search (DFS).
Here's an outline of the algorithm:
1. First Pass (DFS on Reverse Graph):
o Perform a Depth-First Search (DFS) on the reverse graph (obtained by reversing all the edges in the
original graph).
o During DFS, record the finishing times of the vertices.
2. Second Pass (DFS on Original Graph):
o Perform a DFS on the original graph in decreasing order of finishing times obtained from the first pass.
o Each tree in the DFS forest forms a strongly connected component.
Pseudocode:
Kosaraju(graph):
    // First Pass: DFS on Reverse Graph
    mark all vertices as unvisited
    create an empty stack S

    for each vertex v in graph:
        if v is not visited:
            DFS_reverse(graph, v, S)

    // Second Pass: DFS on Original Graph
    mark all vertices as unvisited

    while S is not empty:
        current = pop from S
        if current is not visited:
            SCC = empty set
            DFS_original(graph, current, SCC)
            // SCC contains the vertices of a strongly connected component
            print(SCC)

DFS_reverse(graph, current, stack):
    mark current as visited
    for each neighbor of current:
        if neighbor is not visited:
            DFS_reverse(graph, neighbor, stack)
    push current onto stack

DFS_original(graph, current, SCC):
    mark current as visited
    add current to SCC
    for each neighbor of current:
        if neighbor is not visited:
            DFS_original(graph, neighbor, SCC)
Example:
Consider the following directed graph:
1 --> 2 --> 3      4 <--- 5
^     |     |       \     |
|     v     v        \    v
8 <-- 7     6 ------> 9
Applying Kosaraju's algorithm:
1. First Pass (DFS on Reverse Graph):
o Starting DFS from 1, 2, 3, 4, 5, 6, 7, 8, 9, recording finishing times.
o Finishing times: 8, 7, 1, 2, 3, 9, 6, 4, 5.
2. Second Pass (DFS on Original Graph):
o Starting DFS from vertices in decreasing order of finishing times.
o SCCs: {8, 7, 1}, {2}, {3}, {9, 6, 4, 5}.
The strongly connected components are {8, 7, 1}, {2}, {3}, and {9, 6, 4, 5}.
Time Complexity:
The time complexity of Kosaraju's algorithm is O(V + E), where V is the number of vertices and E is the
number of edges in the graph. This is because the algorithm performs two passes of DFS, and each vertex and edge are
processed once in each pass.
Applications:
 Network Analysis:
o SCCs can represent groups of closely interconnected nodes in a network.
 Compiler Optimization:
o SCCs are used in compiler design for code optimization and register allocation.
 Component-Based Systems:
o SCCs can be used in systems where components need to communicate with each other.
 Dead Code Elimination:
o In compilers, SCCs are used for dead code elimination.
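A minimal Python sketch of Kosaraju's two-pass algorithm described above; the graph is assumed to be a dictionary mapping every vertex (including those with no outgoing edges) to a list of its successors.
from collections import defaultdict

def kosaraju(graph):
    """Return the strongly connected components of a directed graph."""
    # Build the reverse graph.
    reverse = defaultdict(list)
    for u in graph:
        for v in graph[u]:
            reverse[v].append(u)

    # First pass: DFS on the reverse graph, recording finish order.
    visited, order = set(), []
    def dfs_reverse(u):
        visited.add(u)
        for v in reverse[u]:
            if v not in visited:
                dfs_reverse(v)
        order.append(u)
    for u in graph:
        if u not in visited:
            dfs_reverse(u)

    # Second pass: DFS on the original graph in decreasing finish order.
    visited.clear()
    sccs = []
    def dfs_original(u, component):
        visited.add(u)
        component.append(u)
        for v in graph[u]:
            if v not in visited:
                dfs_original(v, component)
    for u in reversed(order):
        if u not in visited:
            component = []
            dfs_original(u, component)
            sccs.append(component)
    return sccs

# Small example: one cycle {1, 2, 3} and a single vertex {4}.
g = {1: [2], 2: [3], 3: [1, 4], 4: []}
print(kosaraju(g))   # -> [[4], [1, 2, 3]]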
Minimum Spanning Trees: Growing a Minimum Spanning Tree
Growing a Minimum Spanning Tree (MST) is a process used in graph theory to construct a minimum spanning tree from
an undirected, connected graph. A minimum spanning tree is a subgraph of the original graph that includes all vertices
and forms a tree with the minimum possible total edge weight.
One common algorithm for growing a minimum spanning tree is Prim's Algorithm. Another popular algorithm is
Kruskal's Algorithm. Here, we'll focus on the process of growing a minimum spanning tree using Prim's Algorithm.
PRIM'S ALGORITHM:
Prim's Algorithm starts with an arbitrary vertex and grows the minimum spanning tree one vertex at a time. At each step,
the algorithm selects the smallest edge that connects a vertex in the growing tree to a vertex outside the tree.
Here's a step-by-step outline of Prim's Algorithm:
1. Initialization:
o Start with an arbitrary vertex as the initial tree.
o Maintain a priority queue (or a min-heap) to keep track of candidate edges.
o Mark the initial vertex as visited.
2. Growing the Tree:
o While the tree does not include all vertices:
 Identify the smallest edge that connects a visited vertex to an unvisited vertex.
 Add the selected edge and the unvisited vertex to the tree.
 Mark the newly added vertex as visited.
3. Termination:
o The algorithm terminates when all vertices are visited.
Pseudocode:
Prim(graph):
    create a priority queue Q
    start with an arbitrary vertex s
    mark s as visited

    while not all vertices are visited:
        for each edge (u, v) connected to the visited vertices:
            if v is not visited:
                insert (u, v) with weight w into Q
        select the edge with the smallest weight from Q
        add the selected edge to the minimum spanning tree
        mark the unvisited vertex as visited
Example:
Consider the following undirected, weighted graph:
        2
   1 ------ 2
   | \      |
   |  \     |
   |   \    |
   4 ------ 3
        1
Starting with vertex 1:
1. Add edge (1, 2) to the tree with weight 2.
2. Add edge (2, 3) to the tree with weight 1.
3. Add edge (3, 4) to the tree with weight 1.
The minimum spanning tree is formed by edges (1, 2), (2, 3), and (3, 4), with a total weight of 4.
Time Complexity:
The time complexity of Prim's Algorithm is O(E log V), where V is the number of vertices and E is the
number of edges. This is because the algorithm involves selecting the minimum-weight edge at each step, and
maintaining the priority queue has a time complexity of O(log V).
Applications:
 Network Design:
o In computer networks, finding the minimum spanning tree helps optimize the design and reduce costs.
 Circuit Design:
o In electronic circuit design, minimum spanning trees are used to connect components efficiently.
 Cluster Analysis:
o Minimum spanning trees can be used in clustering algorithms to identify relationships between data
points.
 Graph Partitioning:
o Minimum spanning trees are used in partitioning large graphs into connected components.
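A heap-based Python sketch of Prim's Algorithm; the graph is assumed to be a dictionary mapping each vertex to a list of (neighbour, weight) pairs, and the weights of the edges not labelled in the figure above are assumptions made for the example.
import heapq

def prim(graph, start):
    """Return the list of MST edges (u, v, w) for a connected weighted graph."""
    visited = {start}
    mst = []
    # Candidate edges leaving the start vertex, keyed by weight.
    heap = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(heap)
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue                      # this edge leads back into the tree
        visited.add(v)
        mst.append((u, v, w))
        for x, wx in graph[v]:
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return mst

graph = {
    1: [(2, 2), (3, 3), (4, 2)],
    2: [(1, 2), (3, 1)],
    3: [(1, 3), (2, 1), (4, 1)],
    4: [(1, 2), (3, 1)],
}
print(prim(graph, 1))   # -> [(1, 2, 2), (2, 3, 1), (3, 4, 1)], total weight 4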
Kruskal’s
Kruskal's Algorithm is another popular algorithm for finding a Minimum Spanning Tree (MST) in a connected,
undirected graph. Like Prim's Algorithm, Kruskal's Algorithm grows the MST edge by edge, but it does so by
considering the edges in ascending order of their weights.
KRUSKAL'S ALGORITHM
1. Initialization:
o Create a forest (a collection of trees), initially containing all the vertices of the graph as individual
trees.
o Sort all the edges in the graph in non-decreasing order of their weights.
2. Growing the MST:
o For each edge in the sorted list of edges:
 If adding the edge to the forest does not create a cycle (i.e., the two endpoints of the edge are
in different trees):
 Add the edge to the Minimum Spanning Tree.
 Merge the two trees that contain the endpoints of the selected edge.
3. Termination:
o The algorithm terminates when the Minimum Spanning Tree includes all vertices.
Pseudocode:
Kruskal(graph):
    create a forest F, initially containing all vertices as individual trees
    sort all edges in non-decreasing order of weights

    for each edge (u, v) in the sorted list:
        if u and v are not in the same tree in F:
            add (u, v) to the Minimum Spanning Tree
            merge the trees containing u and v in F
Example:
Consider the following undirected, weighted graph:
        2
   1 ------ 2
   | \      |
   |  \     |
   |   \    |
   4 ------ 3
        1
Sorting the edges by weight: (3, 4), (2, 3), (1, 2), (1, 4), (2, 4).
1. Add edge (3, 4) to the MST.
2. Add edge (2, 3) to the MST.
3. Add edge (1, 2) to the MST.
4. Skip edges (1, 4) and (2, 4), since their endpoints are already in the same tree (adding them would create a cycle).
The minimum spanning tree is formed by edges (3, 4), (2, 3), and (1, 2), with a total weight of 4.
Time Complexity:
The time complexity of Kruskal's Algorithm is O(E log V), where V is the number of vertices and E is the
number of edges. This is due to the sorting of the edges and the disjoint-set data structure used for checking connectivity,
which has a time complexity of O(log V).
Applications:
 Network Design:
o Kruskal's Algorithm is widely used in designing efficient networks, such as laying down cable to
connect computers.
 Circuit Design:
o In electronic circuit design, Kruskal's Algorithm is used for connecting components in a way that
minimizes cost.
 Cluster Analysis:
o The algorithm can be applied in clustering algorithms to identify relationships between data points.
 Maze Generation:
o Kruskal's Algorithm can be used to generate mazes by removing walls between cells in a grid.
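A Python sketch of Kruskal's Algorithm using a simple disjoint-set (union-find) structure; the example edge list uses the weights from the figure above plus assumed weights for edges (1, 4) and (2, 4).
def kruskal(vertices, edges):
    """Return MST edges for an undirected graph.

    vertices: iterable of vertex labels.
    edges: list of (weight, u, v) tuples.
    """
    parent = {v: v for v in vertices}

    def find(v):                      # root of v's tree, with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []
    for w, u, v in sorted(edges):     # edges in non-decreasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                  # endpoints in different trees: no cycle
            parent[ru] = rv           # merge the two trees
            mst.append((u, v, w))
    return mst

edges = [(1, 3, 4), (1, 2, 3), (2, 1, 2), (2, 1, 4), (3, 2, 4)]
print(kruskal([1, 2, 3, 4], edges))
# -> [(2, 3, 1), (3, 4, 1), (1, 2, 2)], total weight 4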
Single-Source Shortest paths in Directed Acyclic Graphs – Dijkstra‘s Algorithm;
Dijkstra's Algorithm is a well-known algorithm for finding the shortest paths from a single source vertex to all other
vertices in a directed graph with non-negative edge weights. However, Dijkstra's Algorithm is not applicable to graphs
with negative edge weights or cycles.
For Directed Acyclic Graphs (DAGs), Dijkstra's Algorithm can be simplified, and it becomes more efficient than the
general case. In a DAG, there are no cycles, and the topological ordering of vertices can be exploited to compute the
shortest paths in a linear time complexity.
DIJKSTRA'S ALGORITHM FOR DAGS:
1. Topological Sorting:
o Perform a topological sorting of the vertices in the DAG.
2. Initialization:
o Initialize the distance to the source vertex as 0 and to all other vertices as infinity.
3. Relaxation:
o For each vertex u in topological order:
 For each neighbor v of u:
 Relax the edge (u, v) by updating the distance to v: dist[v] = min(dist[v], dist[u] + weight(u, v)).
Pseudocode:
DijkstraDAG(graph, source):
    topologicalOrder = TopologicalSort(graph)
    initialize dist[] with infinity, except dist[source] = 0

    for each vertex u in topologicalOrder:
        for each neighbor v of u:
            relax the edge (u, v)

    return dist[]
Example:
Consider the following directed acyclic graph:
1 --(1)--> 2 --(4)--> 3
| | |
v v v
4 --(2)--> 5 --(3)--> 6
Performing topological sorting: 1, 4, 2, 5, 3, 6.
1. Initialize distances: dist[1] = 0, dist[2] = dist[3] = dist[4] = dist[5] = dist[6] = ∞.
2. Start relaxation (processing vertices in the topological order 1, 4, 2, 5, 3, 6):
o Relax (1, 2): dist[2] = min(∞, 0 + 1) = 1
o Relax (1, 4): dist[4] = min(∞, 0 + 0) = 0
o Relax (4, 5): dist[5] = min(∞, 0 + 2) = 2
o Relax (2, 5): dist[5] = min(2, 1 + 4) = 2
o Relax (2, 3): dist[3] = min(∞, 1 + 3) = 4
o Relax (5, 6): dist[6] = min(∞, 2 + 3) = 5
o Relax (3, 6): dist[6] = min(5, 4 + 3) = 5
The shortest paths from vertex 1 to all other vertices are: dist[1] = 0, dist[2] = 1, dist[3] = 4, dist[4] = 0, dist[5] = 2,
dist[6] = 5.
Time Complexity:
The time complexity of Dijkstra's Algorithm for DAGs is O(V + E), where V is the number of vertices and E is
the number of edges. The linear time complexity is achieved due to the topological sorting and the single-pass relaxation.
Applications:
Dijkstra's Algorithm for DAGs is useful in scenarios where the graph represents dependencies between tasks, and you
want to find the minimum time or cost to perform a particular task or reach a particular vertex from a source vertex.
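A Python sketch combining the topological-sort and relaxation steps above; the graph is assumed to be a dictionary mapping each vertex to a list of (successor, weight) pairs, using the edge weights from the worked example.
def dag_shortest_paths(graph, source):
    """Single-source shortest paths in a weighted DAG."""
    # Topological order via DFS (reverse of the finish order).
    visited, order = set(), []
    def dfs(u):
        visited.add(u)
        for v, _ in graph[u]:
            if v not in visited:
                dfs(v)
        order.append(u)
    for u in graph:
        if u not in visited:
            dfs(u)
    order.reverse()

    # Relax edges in topological order.
    dist = {u: float('inf') for u in graph}
    dist[source] = 0
    for u in order:
        if dist[u] == float('inf'):
            continue                      # unreachable vertex, nothing to relax
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

dag = {1: [(2, 1), (4, 0)], 2: [(5, 4), (3, 3)], 3: [(6, 3)],
       4: [(5, 2)], 5: [(6, 3)], 6: []}
print(dag_shortest_paths(dag, 1))
# -> {1: 0, 2: 1, 3: 4, 4: 0, 5: 2, 6: 5}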
Dynamic Programming
Dynamic Programming is a powerful optimization technique that involves breaking down a problem into smaller
overlapping subproblems and solving each subproblem only once, storing the solutions to subproblems in a table to avoid
redundant computations. The method is particularly useful for solving optimization problems where the goal is to find the
best solution among a set of feasible solutions.
Here are the key principles and components of Dynamic Programming:
1. Overlapping Subproblems:
Dynamic Programming is applicable when a problem can be broken down into smaller subproblems, and the solutions to
these subproblems can be reused multiple times. These subproblems often overlap, meaning that the same subproblem is
solved multiple times during the computation.
2. Optimal Substructure:
The optimal solution to the original problem can be constructed from the optimal solutions of its subproblems. In other
words, the problem exhibits optimal substructure if an optimal solution to the problem contains optimal solutions to its
subproblems.
COMPONENTS OF DYNAMIC PROGRAMMING
a. Memoization:
Memoization is a technique where the results of expensive function calls are stored and returned when the same inputs
occur again. In dynamic programming, this often involves creating a table to store the solutions to subproblems, and
before computing a solution, checking whether it has already been computed and stored.
b. Tabulation:
Tabulation is another approach in dynamic programming where solutions to subproblems are filled in a table
systematically, starting from the smallest subproblems and progressing towards the original problem. Tabulation is often
done using bottom-up iteration.
Types of Dynamic Programming:
a. Top-down (Memoization):
The recursive approach where the problem is solved by breaking it down into smaller subproblems, and the solutions to
these subproblems are stored in a table to avoid redundant computations.
b. Bottom-up (Tabulation):
The iterative approach where the table is filled systematically, starting from the smallest subproblems and progressing
towards the original problem.
Applications of Dynamic Programming:
1. Fibonacci Sequence:
o Computing Fibonacci numbers using dynamic programming significantly improves efficiency by
avoiding redundant computations.
2. Shortest Paths:
o Algorithms like Floyd-Warshall and Dijkstra's Algorithm can be optimized using dynamic
programming.
3. Matrix Chain Multiplication:
o Finding the most efficient way to multiply a sequence of matrices.
4. Longest Common Subsequence (LCS):
o Finding the longest subsequence common to two sequences of characters.
5. Knapsack Problem:
o Optimizing the selection of items with given weights and values to maximize the total value without
exceeding a given weight capacity.
6. Edit Distance:
o Finding the minimum number of operations (insertion, deletion, substitution) required to transform one
string into another.
7. Optimal Binary Search Trees:
o Constructing a binary search tree that minimizes the expected search time.
8. Dynamic Time Warping:
o Measuring the similarity between two sequences with different lengths.
Dynamic Programming is a versatile technique that can be applied to a wide range of problems in various domains. It
often leads to efficient and optimized solutions compared to naive recursive approaches.
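To make the memoization/tabulation distinction concrete, here is a minimal Python sketch using the Fibonacci sequence from the applications list above.
from functools import lru_cache

# Top-down (memoization): results of subproblems are cached automatically.
@lru_cache(maxsize=None)
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up (tabulation): build solutions from the smallest subproblem upwards.
def fib_tab(n):
    if n < 2:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr

print(fib_memo(30), fib_tab(30))   # -> 832040 832040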
All-Pairs Shortest Paths: Shortest Paths and Matrix Multiplication
The All-Pairs Shortest Paths problem involves finding the shortest paths between all pairs of vertices in a weighted
directed graph. One of the efficient algorithms for solving this problem is based on the concept of matrix multiplication.
Floyd-Warshall algorithm is a well-known algorithm that uses matrix multiplication to compute the all-pairs shortest
paths.
FLOYD-WARSHALL ALGORITHM
The Floyd-Warshall algorithm works for both positive and negative edge weights but does not handle graphs with
negative cycles. It is based on the idea of dynamic programming and uses a matrix to store intermediate results.
Algorithm Steps:
1. Initialization:
o Create a matrix D of size V × V (where V is the number of vertices) to store the distances
between all pairs of vertices.
o Initialize D[i][j] to the weight of the edge from i to j if there is an edge; otherwise, set D[i][j] to ∞.
o Initialize D[i][i] to 0 for all i.
2. Iteration:
o For each vertex k from 1 to V:
 For each pair of vertices i and j:
 Update D[i][j] to the minimum of D[i][j] and D[i][k] + D[k][j].
Pseudocode:
FloydWarshall(graph):
    V = number of vertices in graph
    initialize matrix D with dimensions V x V

    for each vertex i:
        for each vertex j:
            if i equals j:
                D[i][j] = 0
            else if there is an edge from i to j:
                D[i][j] = weight of the edge from i to j
            else:
                D[i][j] = infinity

    for k from 1 to V:
        for i from 1 to V:
            for j from 1 to V:
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D
MATRIX MULTIPLICATION AND SHORTEST PATHS:
The Floyd-Warshall algorithm's key insight is based on the relationship between matrix multiplication and the shortest
paths problem. If D1 and D2 are matrices of shortest path distances for two subgraphs, then their "min-plus" product
D1 × D2 (in which scalar multiplication is replaced by addition and summation by taking the minimum) gives the
shortest path distances for the combined graph.
In other words, if D1[i][j] is the shortest path distance from vertex i to vertex j in the first subgraph, and D2[i][j] is the
shortest path distance in the second subgraph, then (D1 × D2)[i][j] is the shortest path distance in the combined graph.
Time Complexity:
The time complexity of the Floyd-Warshall algorithm is O(V³), where V is the number of vertices in the graph.
The algorithm iterates over all pairs of vertices for each intermediate vertex, leading to a cubic time complexity.
Applications:
 Routing Algorithms:
o Floyd-Warshall is used in computer networks for finding the shortest paths between routers.
 Traffic Engineering:
o Used in traffic engineering to optimize traffic flow through a network.
 Robotics:
o Applied in robotics for path planning.
 Game Development:
o Used for pathfinding in game development.
 Graph Analysis:
o Analyzing the structure and properties of graphs.
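A direct Python translation of the Floyd-Warshall pseudocode above, on an adjacency-matrix representation; the small three-vertex example and its weights are assumptions made for illustration.
INF = float('inf')

def floyd_warshall(weights):
    """All-pairs shortest paths.

    weights: n x n matrix where weights[i][j] is the edge weight from i to j
    (INF if there is no edge) and weights[i][i] is 0.
    """
    n = len(weights)
    dist = [row[:] for row in weights]          # copy so the input is untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

w = [[0,   3,   INF],
     [INF, 0,   1],
     [2,   INF, 0]]
print(floyd_warshall(w))   # -> [[0, 3, 4], [3, 0, 1], [2, 5, 0]]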
CONNECTIVITY.
In graph theory, connectivity refers to the property of a graph that describes how vertices are connected to each other.
The concept of connectivity is crucial in understanding the structure and properties of graphs. There are different types of
connectivity, each capturing different aspects of relationships between vertices.
1. Vertex Connectivity:
Vertex connectivity measures how many vertices need to be removed to disconnect a graph. A graph is k-vertex
connected (or simply k-connected) if removing any k−1 vertices does not disconnect the graph. The minimum
number of vertices whose removal disconnects the graph is called the vertex connectivity.
2. Edge Connectivity:
Edge connectivity measures how many edges need to be removed to disconnect a graph. A graph is k-edge connected
(or simply k-edge-connected) if removing any k−1 edges does not disconnect the graph. The minimum number of
edges whose removal disconnects the graph is called the edge connectivity.
3. Connected Graph:
A graph is connected if there is a path between every pair of vertices. In a connected graph, every vertex is reachable
from every other vertex.
4. Strongly Connected Graph:
For directed graphs, strong connectivity is used. A directed graph is strongly connected if there is a directed path from
any vertex to any other vertex.
5. Biconnected Graph:
A biconnected graph is a connected and 2-vertex connected graph. Removing any single vertex does not disconnect the
graph.
6. Triconnected Graph:
A triconnected graph is a connected and 3-vertex connected graph. It is more robust than biconnected graphs in terms of
vertex connectivity.
7. k-Connected Graph:
A graph is k-connected if it is k-vertex connected. In a k-connected graph, there are k vertex-disjoint paths between
any pair of vertices.
8. Planar Graph:
A graph is planar if it can be drawn in the plane without any edges crossing. Kuratowski's theorem states that a graph is
planar if and only if it does not contain a subgraph that is a subdivision of K5 (the complete graph on 5 vertices) or
K3,3 (the complete bipartite graph on 3 and 3 vertices).
9. Eulerian Graph:
An Eulerian graph is a graph that contains an Eulerian circuit, which is a circuit that visits every edge exactly once.
Eulerian circuits are relevant to the concept of connectivity because they provide a way to traverse all edges.
10. Hamiltonian Graph:
A Hamiltonian graph is a graph that contains a Hamiltonian cycle, which is a cycle that visits every vertex exactly once.
Hamiltonian cycles also relate to connectivity, as they provide a way to traverse all vertices.
Understanding the connectivity properties of a graph is essential for various applications, including network design,
transportation planning, and analyzing the robustness of systems. Different types of connectivity capture different aspects
of the relationships within a graph, providing valuable insights into its structure and behavior.

UNIT - IV ALGORITHM DESIGN TECHNIQUES 9


Dynamic Programming: Matrix-Chain Multiplication – Elements of Dynamic Programming – LongestCommon
Subsequence- Greedy Algorithms: – Elements of the Greedy Strategy- An Activity-Selection Problem - Huffman
Coding. Backtracking, branch and bound, Brute force search
DYNAMIC PROGRAMMING: MATRIX-CHAIN MULTIPLICATION
Matrix Chain Multiplication is a classic optimization problem in computer science and linear algebra that involves
finding the most efficient way to parenthesize a sequence of matrices to minimize the number of scalar multiplications
required to compute their product. This problem is relevant in various applications, including computer graphics,
robotics, and numerical simulations.
Problem Statement:
Given a sequence of matrices A1, A2, …, An, where the dimensions of Ai are p(i−1) × p(i) for i = 1, 2, …, n, the goal is
to find the optimal parenthesization of the matrices to minimize the total number of scalar multiplications needed to
compute their product A1 × A2 × … × An.
Dynamic Programming Solution:
Dynamic Programming can be applied to efficiently solve the Matrix Chain Multiplication problem by breaking it down
into smaller overlapping subproblems.
1. Subproblem Definition:
Let m[i][j] represent the minimum number of scalar multiplications needed to compute the product
Ai × A(i+1) × … × Aj.
2. Recursive Relationship:
The optimal parenthesization can be expressed as:
m[i][j] = min over i ≤ k < j of { m[i][k] + m[k+1][j] + p(i−1) × p(k) × p(j) }
where matrix Ai has dimensions p(i−1) × p(i), so p(i−1) is its number of rows and p(i) is its number of columns.
3. Bottom-Up Computation:
The dynamic programming table mm can be filled in a bottom-up manner, starting from subproblems of size 2 and
progressively building up to the original problem:
for l = 2 to n:
    for i = 1 to n - l + 1:
        j = i + l - 1
        m[i][j] = min_{i <= k < j} { m[i][k] + m[k+1][j] + p_{i-1} * p_k * p_j }
Example:
Consider three matrices A1 with dimensions 10 × 30, A2 with dimensions 30 × 5, and A3 with dimensions 5 × 60. The
sequence of matrices is [A1, A2, A3].
p = [10, 30, 5, 60]
The optimal parenthesization for this sequence is:
((A1 × A2) × A3)
Time Complexity:
The time complexity of the dynamic programming solution for Matrix Chain Multiplication is O(n³), where n is
the number of matrices in the sequence. The space complexity is O(n²) for storing the m table.
Applications:
Matrix Chain Multiplication has applications in various fields, including computer graphics, where it is used for
transformations and rendering; robotics, where it is used for robot motion planning; and numerical simulations, where it
is applied in scientific computing. Efficiently parenthesizing the matrices can significantly reduce the computational cost
of matrix multiplication.
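A bottom-up Python sketch of the recurrence above, applied to the 10×30, 30×5, 5×60 example:
def matrix_chain_order(p):
    """Minimum scalar multiplications for matrices A1..An, where Ai is
    p[i-1] x p[i]. Returns the cost table m (1-indexed on matrices)."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):              # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                          for k in range(i, j))
    return m

p = [10, 30, 5, 60]           # A1: 10x30, A2: 30x5, A3: 5x60
m = matrix_chain_order(p)
print(m[1][3])                # -> 4500, achieved by ((A1 x A2) x A3)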
Dynamic Programming is a powerful optimization technique used to solve problems by breaking them down into smaller
overlapping subproblems and solving each subproblem only once, storing the solutions to subproblems in a table to avoid
redundant computations. Here are the key elements of dynamic programming:
1. Optimal Substructure:
o Dynamic Programming problems exhibit the optimal substructure property, which means that an
optimal solution to the problem contains optimal solutions to its subproblems. This allows us to solve a
problem by solving its smaller subproblems.
2. Overlapping Subproblems:
o Subproblems in a dynamic programming approach overlap, meaning that the same subproblems are
solved multiple times. Dynamic Programming techniques involve storing the solutions to subproblems
in a table to avoid redundant computations.
3. Memoization:
o Memoization is a top-down approach where the results of expensive function calls are stored and
returned when the same inputs occur again. It involves using a data structure (often a table or a
dictionary) to store the solutions to subproblems.
4. Tabulation:
o Tabulation is a bottom-up approach where solutions to subproblems are filled in a table systematically,
starting from the smallest subproblems and progressing towards the original problem. This approach is
often used when memoization is not practical or when the structure of computation is more
straightforward.
5. State Transition Recurrence:
o Dynamic Programming problems can often be expressed using state transition recurrence, which
defines how the solution to a larger problem can be constructed from the solutions of its subproblems.
This recurrence relation helps formulate the dynamic programming algorithm.
6. Base Cases:
o Base cases define the solutions to the smallest subproblems and serve as the starting point for building
up solutions to larger problems. The base cases are essential to ensure that the algorithm terminates and
returns correct results for the smallest instances of the problem.
7. Time and Space Complexity:
o Analyzing the time and space complexity of a dynamic programming solution is crucial. Understanding
how the size of the problem relates to the number of subproblems and the space needed for
memoization or tabulation helps assess the efficiency of the algorithm.
8. Examples of Dynamic Programming Problems:
o Numerous problems in various domains can be solved using dynamic programming. Some classic
examples include:
 Fibonacci Sequence: Efficient computation of Fibonacci numbers.
 Shortest Paths: Dynamic-programming algorithms such as Bellman-Ford and Floyd-Warshall (Dijkstra's algorithm, by contrast, is greedy).
 Longest Common Subsequence (LCS): Finding the longest subsequence common to two
sequences.
 Knapsack Problem: Optimization problem of selecting items to maximize total value without
exceeding a given weight capacity.
9. Greedy vs. Dynamic Programming:
o Dynamic Programming and Greedy algorithms are related optimization techniques. While Greedy
algorithms make locally optimal choices at each step, Dynamic Programming ensures a globally
optimal solution by solving subproblems.
10. Trade-off:
o The use of dynamic programming involves a trade-off between time complexity and space complexity.
Memoization might use more space, while tabulation may involve redundant computations. The choice
depends on the specific problem and constraints.
Understanding these elements helps in formulating and solving problems using dynamic programming. While the
technique is powerful, choosing the right approach (top-down with memoization or bottom-up with tabulation) depends
on the characteristics of the problem at hand.
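As a small illustration of the memoization versus tabulation trade-off discussed above, the following Python sketch computes Fibonacci numbers both top-down (caching the results of recursive calls) and bottom-up (filling a table from the base cases); only standard-library facilities are used.

from functools import lru_cache

# Top-down (memoization): each distinct call is computed once and cached.
@lru_cache(maxsize=None)
def fib_memo(n):
    if n < 2:
        return n                      # base cases F(0) = 0, F(1) = 1
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up (tabulation): fill a table from the smallest subproblems upward.
def fib_tab(n):
    if n < 2:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

assert fib_memo(30) == fib_tab(30) == 832040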
LONGEST COMMON SUBSEQUENCE
The Longest Common Subsequence (LCS) problem is a classic dynamic programming problem that involves finding the
longest subsequence common to two given sequences. A subsequence is a sequence that appears in the same relative
order but not necessarily contiguous. The goal is to find the length of the longest common subsequence.
Problem Statement:
Given two sequences X = {x1, x2, …, xm} and Y = {y1, y2, …, yn}, the task is to find the length of the longest common subsequence (LCS) of X and Y.
Dynamic Programming Solution:
Dynamic programming can be applied to solve the LCS problem efficiently. The idea is to build a table C where C[i][j] represents the length of the LCS of the prefixes X[1..i] and Y[1..j].
1. Subproblem Definition:
Let C[i][j] represent the length of the LCS of X[1..i] and Y[1..j].
2. Recursive Relationship:
The length of the LCS can be expressed in terms of smaller subproblems:
C[i][j] = 0, if i = 0 or j = 0
C[i][j] = C[i-1][j-1] + 1, if x_i = y_j
C[i][j] = max(C[i-1][j], C[i][j-1]), if x_i ≠ y_j
3. Bottom-Up Computation:
The dynamic programming table C can be filled in a bottom-up manner:
for i from 0 to m:
for j from 0 to n:
if i = 0 or j = 0:
C[i][j] = 0
else if x[i] = y[j]:
C[i][j] = C[i-1][j-1] + 1
else:
C[i][j] = max(C[i-1][j], C[i][j-1])
Example:
Consider two sequences X = {A, B, C, B, D, A, B} and Y = {B, D, C, A, B, A}. An LCS is {B, C, B, A}, and its length is 4.
Time Complexity:
The time complexity of the dynamic programming solution for LCS is O(mn), where m and n are the lengths of
the input sequences X and Y.
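A direct Python translation of the bottom-up computation is sketched below, checked against the example above; the function name and the choice to return the full table are illustrative assumptions.

def lcs_length(x, y):
    # Bottom-up DP table: C[i][j] is the LCS length of x[:i] and y[:j].
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1          # characters match
            else:
                C[i][j] = max(C[i - 1][j], C[i][j - 1])  # drop one character
    return C[m][n], C

length, _ = lcs_length("ABCBDAB", "BDCABA")
print(length)   # 4  (one LCS is "BCBA")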
Applications:
LCS has applications in various fields, including:
 Bioinformatics:
o DNA sequence comparison and identification of common genetic elements.
 Version Control Systems:
o Detecting changes and finding differences between different versions of files.
 Natural Language Processing:
o Text comparison and plagiarism detection.
 Data Compression:
o Identifying repeating patterns in data.
The LCS problem is a fundamental problem in computer science with widespread applications.
GREEDY ALGORITHMS – ELEMENTS OF THE GREEDY STRATEGY
Greedy algorithms are a class of algorithms that make locally optimal choices at each step with the hope of finding a
global optimum. They are often used for optimization problems where the goal is to find the best solution among a set of
feasible solutions. The key characteristic of greedy algorithms is that they make a series of choices, each of which is the
best at the moment, without considering the global situation.
Here are the key elements and characteristics of the greedy strategy:
1. Greedy Choice Property:
o A greedy algorithm makes a locally optimal choice at each step, with the hope that the series of locally
optimal choices will lead to a globally optimal solution. The choice made at each step is based on the
information available up to that point.
2. Optimal Substructure:
o The problem must exhibit the optimal substructure property, meaning that an optimal solution to the
problem contains optimal solutions to its subproblems. This property allows us to make locally optimal
choices at each step.
3. Greedy Algorithm Paradigm:
o The greedy paradigm involves making a sequence of choices, each of which looks best at the moment,
without worrying about the consequences of these choices. The algorithm is guided by the "greedy
choice property."
4. Greedy Choice:
o At each step, a greedy algorithm makes the choice that seems best at the moment. This is determined
by a greedy criterion, which is often based on maximizing or minimizing a certain value.
5. Feasible Solutions:
o The problem must have the property that any locally optimal solution is also globally optimal. This
ensures that the greedy algorithm's series of locally optimal choices leads to the best overall solution.
6. Proof of Correctness:
o To demonstrate the correctness of a greedy algorithm, one often needs to provide a proof that the series
of locally optimal choices indeed leads to a globally optimal solution. Inductive proofs and
contradiction are commonly used techniques.
7. Examples of Greedy Algorithms:
o Greedy algorithms are used in various problems, including:
 Greedy Coin Change: Minimizing the number of coins to make change.
 Fractional Knapsack: Maximizing the value of items placed in a knapsack given their
weights and values.
 Huffman Coding: Constructing a variable-length prefix-free binary tree for efficient data
compression.
 Dijkstra's Algorithm: Finding the shortest paths in a graph with non-negative edge weights.
 Prim's Algorithm: Constructing a minimum spanning tree in a graph.
8. Greedy vs. Dynamic Programming:
o Greedy algorithms and dynamic programming are both optimization techniques. While greedy
algorithms make locally optimal choices, dynamic programming ensures a globally optimal solution by
solving subproblems and storing their solutions.
9. Greedy Algorithms for NP-Hard Problems:
o Greedy algorithms are often used for approximating solutions to NP-hard problems, where finding an
exact solution is computationally infeasible. While not guaranteed to be optimal, they often provide
solutions close to the optimum.
10. Trade-offs:
o The advantage of greedy algorithms lies in their simplicity and efficiency, but the downside is that they
may not always guarantee an optimal solution. Careful analysis and proof are required to ensure
correctness.
Greedy algorithms are widely used in practice due to their simplicity and efficiency. While not suitable for all types of
problems, they are powerful tools when the greedy choice property and optimal substructure are present.
AN ACTIVITY-SELECTION PROBLEM
The Activity Selection Problem is a classic scheduling problem that involves selecting a maximum-size set of non-
overlapping activities, given their start and finish times. The goal is to find the optimal schedule that maximizes the
number of activities performed.
Problem Statement:
Given a set of n activities with their start and finish times represented as pairs (si, fi), where si is the start time and fi is the finish time of the i-th activity, the task is to select a maximum-size set of non-overlapping activities.
Greedy Algorithm:
The greedy approach is well-suited for solving the Activity Selection Problem. The algorithm sorts the activities based on
their finish times and then iteratively selects activities that finish earliest without conflicting with the previously selected
ones.
Greedy Choice Property:
At each step, the algorithm selects the activity with the earliest finish time among the remaining activities that do not
conflict with the previously selected activities.
Optimal Substructure:
The problem exhibits optimal substructure because a globally optimal solution can be constructed from locally optimal
solutions (subproblems).
Greedy Algorithm for Activity Selection:
1. Sort activities by finish time in ascending order.
2. Select the first activity as the first task in the schedule.
3. For each subsequent activity:
o If the start time of the current activity is greater than or equal to the finish time of the last selected
activity, select the current activity.
Pseudocode:
ActivitySelection(activities):
    Sort activities by finish time in ascending order
    Add the first activity to the schedule
    lastSelected = 0
    for i from 1 to length(activities) - 1:
        if activities[i].startTime >= activities[lastSelected].finishTime:
            Add activities[i] to the schedule
            lastSelected = i
    return schedule
Example:
Consider the following set of activities with their start and finish times:
Activity Start Time Finish Time
A 1 4
B 3 5
C 0 6
D 5 7
E 8 9
F 5 9
The greedy algorithm selects activities A, D, and E, resulting in a maximum-size set of three non-overlapping activities.
Time Complexity:
The time complexity of the greedy algorithm for the Activity Selection Problem is O(n log n), where n is the
number of activities. Sorting the activities by finish time dominates the running time.
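The following Python sketch implements the same greedy procedure; representing each activity as a (start, finish) tuple is an assumption made here for illustration, and the input is the activity table from the example above.

def activity_selection(activities):
    # Greedy selection: sort by finish time, then repeatedly take the first
    # activity that starts no earlier than the last selected one finishes.
    if not activities:
        return []
    order = sorted(range(len(activities)), key=lambda i: activities[i][1])
    schedule = [order[0]]
    last_finish = activities[order[0]][1]
    for i in order[1:]:
        start, finish = activities[i]
        if start >= last_finish:          # compatible with the last selected activity
            schedule.append(i)
            last_finish = finish
    return schedule

# Activities A..F from the table above, as (start, finish) pairs.
acts = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 9), (5, 9)]
print([chr(ord('A') + i) for i in activity_selection(acts)])   # ['A', 'D', 'E']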
Applications:
The Activity Selection Problem has applications in scheduling tasks and resource allocation, such as in project
management, conference scheduling, and classroom scheduling. The greedy approach provides an efficient solution to
this optimization problem.
HUFFMAN CODING, BACKTRACKING, BRANCH AND BOUND
Huffman Coding:
Problem: Huffman coding is a compression algorithm that assigns variable-length codes to input characters based on
their frequencies. The goal is to minimize the total number of bits required to represent the input.
Greedy Algorithm:
1. Build a frequency table: Count the frequency of each character in the input.
2. Build a priority queue (min-heap): Each character becomes a leaf node with its frequency as the key.
3. Build the Huffman tree:
o Repeatedly remove two nodes with the smallest frequencies from the priority queue.
o Create a new internal node with a frequency equal to the sum of the frequencies of the two nodes.
o Insert the new node back into the priority queue.
o Repeat until there is only one node in the queue, which becomes the root of the Huffman tree.
4. Generate Huffman codes:
o Traverse the Huffman tree to assign binary codes to each character.
o Left edges represent 0, and right edges represent 1.
Huffman coding guarantees a prefix-free code, meaning no code is a prefix of another. This property ensures
unambiguous decoding.
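A minimal Python sketch of the construction, using a binary min-heap (the standard heapq module) as the priority queue; the list-based node representation and the tie-breaking counter are implementation choices made here, not part of the algorithm itself.

import heapq
from collections import Counter

def huffman_codes(text):
    # Returns a dict mapping each character to its prefix-free bit string.
    freq = Counter(text)                      # step 1: frequency table
    if len(freq) == 1:                        # edge case: a single distinct symbol
        return {next(iter(freq)): "0"}

    # Step 2: min-heap of [frequency, tie-breaker, payload] entries.
    heap = [[count, i, ch] for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    # Step 3: repeatedly merge the two lowest-frequency nodes.
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        heapq.heappush(heap, [lo[0] + hi[0], counter, (lo, hi)])
        counter += 1

    # Step 4: traverse the tree; left edges are 0, right edges are 1.
    codes = {}
    def assign(node, prefix):
        payload = node[2]
        if isinstance(payload, tuple):            # internal node
            assign(payload[0], prefix + "0")
            assign(payload[1], prefix + "1")
        else:
            codes[payload] = prefix               # leaf: a character
    assign(heap[0], "")
    return codes

# More frequent characters receive shorter codes, e.g. {'a': '0', 'c': '10', 'b': '11'}.
print(huffman_codes("aaaabbbcc"))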

Backtracking:
Concept: Backtracking is a general algorithmic technique for solving computational problems by iteratively trying to
build a solution incrementally and abandoning a partial candidate ("backtracking") as soon as it determines that the
candidate cannot be completed to a valid solution.
Key Elements:
1. Decision Space:
o Define a solution space representing all possible solutions to the problem.
o At each step, make a decision that narrows down the possibilities.
2. Recursive Approach:
o Formulate the problem in terms of smaller subproblems.
o Recursively explore each subproblem until a solution is found or it is determined that the current path
cannot lead to a solution.
3. Backtracking Mechanism:
o If the current partial solution cannot be extended to a valid solution, backtrack to the previous decision
point.
o Undo the last decision and explore alternative choices.
4. Base Cases:
o Define base cases that indicate when a solution has been found.
Applications:
 N-Queens Problem
 Subset Sum
 Hamiltonian Path
 Cryptarithmetic Puzzles
 Sudoku
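To make the backtracking mechanism concrete, here is a sketch for the Subset Sum application listed above. It assumes the input integers are positive (which is what makes the remaining < 0 cut-off valid); the function name is illustrative.

def subset_sum(nums, target):
    # Backtracking search for a subset of nums that sums to target.
    # Returns one such subset, or None if no subset exists.
    chosen = []

    def backtrack(i, remaining):
        if remaining == 0:                   # base case: a valid solution is found
            return True
        if i == len(nums) or remaining < 0:
            return False                     # dead end: abandon this partial candidate
        chosen.append(nums[i])               # decision: include nums[i]
        if backtrack(i + 1, remaining - nums[i]):
            return True
        chosen.pop()                         # backtrack: undo the last decision
        return backtrack(i + 1, remaining)   # decision: exclude nums[i]

    return chosen if backtrack(0, target) else None

print(subset_sum([3, 1, 4, 2, 2], 6))   # e.g. [3, 1, 2]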
Branch and Bound:
Concept: Branch and Bound is an algorithm design paradigm used for solving optimization problems. It systematically
searches the solution space and uses bounds to eliminate subproblems that cannot lead to an optimal solution.
Key Elements:
1. Bounding Function:
o Define a bounding function to estimate the best possible solution achievable in a subtree without
exploring all nodes.
o Use this function to prioritize the order of exploration.
2. Exploration Strategy:
o Explore nodes in a priority order based on the bounding function.
o Use a data structure (e.g., priority queue) to manage the exploration order.
3. Branching:
o Divide the problem into subproblems (branches) at each decision point.
o Explore the branches that have the potential for an optimal solution.
4. Pruning:
o Prune branches that cannot lead to an optimal solution based on the bounding function.
o Avoid unnecessary exploration of subproblems.
Applications:
 Traveling Salesman Problem
 Knapsack Problem
 Job Scheduling
 Job Shop Scheduling
 Quadratic Assignment Problem
Branch and Bound is particularly useful for solving combinatorial optimization problems efficiently by intelligently
exploring the solution space. It combines elements of both divide and conquer and greedy strategies.
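The sketch below applies branch and bound to the 0/1 Knapsack Problem listed above. The bounding function used here relaxes the problem fractionally (a common choice, assumed for this example, and it requires positive weights) to obtain an optimistic estimate, and any subtree whose bound cannot beat the best value found so far is pruned.

def knapsack_branch_and_bound(items, capacity):
    # items: list of (value, weight) pairs with positive weights.
    # Sort by value-to-weight ratio so the fractional bound is valid.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0

    def bound(i, value, remaining):
        # Optimistic estimate: fill the remaining capacity, fractionally if needed.
        b = value
        while i < len(items) and items[i][1] <= remaining:
            b += items[i][0]
            remaining -= items[i][1]
            i += 1
        if i < len(items) and remaining > 0:
            b += items[i][0] * remaining / items[i][1]
        return b

    def explore(i, value, remaining):
        nonlocal best
        if value > best:
            best = value
        if i == len(items) or bound(i, value, remaining) <= best:
            return                              # prune: this subtree cannot beat the best so far
        v, w = items[i]
        if w <= remaining:                      # branch 1: take item i
            explore(i + 1, value + v, remaining - w)
        explore(i + 1, value, remaining)        # branch 2: skip item i

    explore(0, 0, capacity)
    return best

print(knapsack_branch_and_bound([(60, 10), (100, 20), (120, 30)], 50))   # 220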
Brute force search
Brute force search is a straightforward approach to problem-solving that involves systematically exploring all possible
solutions to a problem. It is a general problem-solving technique that exhaustively checks every potential solution,
making no attempt to optimize or eliminate possibilities based on prior knowledge.
Key Characteristics:
1. Exhaustive Enumeration:
o Brute force considers every possible candidate solution, leaving no stone unturned.
2. Systematic Search:
o The search is systematic and explores the entire solution space in a predetermined order.
3. No Heuristics:
o Brute force does not use heuristics or optimization strategies; it simply evaluates all possibilities.
4. Applicability:
o Brute force is applicable to a wide range of problems, especially when the problem size is small enough
to handle all possibilities.
Examples:
1. String Matching:
o Brute force can be used to search for a substring within a longer string by comparing all possible
substring matches.
2. Traveling Salesman Problem (TSP):
o Brute force can be applied to solve the TSP by exploring all possible permutations of cities and
selecting the one with the minimum total distance.
3. Subset Sum:
o Given a set of numbers, brute force can be used to find all possible subsets and determine which
subset(s) sum to a specific target value.
4. Password Cracking:
o Brute force can be used to systematically try all possible combinations of characters to crack a
password.
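As a concrete instance of exhaustive enumeration, the string-matching example above can be written in a few lines of Python that try every possible alignment of the pattern in the text; the function name is illustrative.

def brute_force_match(text, pattern):
    # Naive string matching: returns the index of the first occurrence, or -1.
    # Worst-case time is O((n - m + 1) * m) for text length n and pattern length m.
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):           # try every starting position
        if text[i:i + m] == pattern:     # compare the aligned substring
            return i
    return -1

print(brute_force_match("abracadabra", "cada"))  # 4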
Advantages:
1. Simplicity:
o Brute force is easy to understand and implement, making it suitable for small-scale problems.
2. Guaranteed Correctness:
o Since brute force explores all possibilities, it guarantees finding the correct solution if one exists.
Disadvantages:
1. Inefficiency:
o Brute force is often inefficient, especially for large problem sizes, as it considers a large number of
unnecessary possibilities.
2. Computational Complexity:
o The time and space complexity of brute force solutions can be high, making them impractical for
certain problems.
3. Not Suitable for NP-Hard Problems:
o Brute force is not suitable for solving problems classified as NP-hard, where the number of possibilities
grows exponentially.
4. Lack of Scalability:
o The approach becomes increasingly impractical as the problem size increases, leading to computational
infeasibility.
Use Cases:
 Brute force is suitable for small-scale problems or situations where the solution space is limited and can be
exhaustively explored within reasonable computational limits.
Note: While brute force may be inefficient for some problems, it can be a valuable baseline approach for validating more
complex algorithms. It is often used for small-scale testing and as a reference for correctness when developing optimized
solutions.
UNIT - V NP COMPLETE AND NP HARD
NP-Completeness: Polynomial Time – Polynomial-Time Verification – NP-Completeness and Reducibility – NP-Completeness Proofs – NP-Complete Problems
NP-COMPLETENESS: POLYNOMIAL TIME
Understanding NP-completeness and polynomial time involves delving into the realm of computational complexity
theory, a field that explores the inherent difficulty of computational problems. Let's break down the key concepts:
NP-Completeness:
1. NP (Nondeterministic Polynomial) Class:
o NP is a complexity class that represents problems for which a given solution can be checked quickly (in
polynomial time) by a nondeterministic Turing machine.
o A problem is in NP if, given a solution, its correctness can be verified efficiently.
2. P Class:
o P is a subset of NP and represents problems for which a solution can be found quickly (in polynomial
time) by a deterministic Turing machine.
o P problems are considered efficiently solvable.
3. NP-Complete (NPC) Problems:
o A problem is NP-complete if it is in NP, and any problem in NP can be polynomial-time reducible to it.
o The first problem proven to be NP-complete was the Boolean satisfiability problem (SAT).
4. Cook's Theorem:
o Stephen Cook's theorem (Cook's theorem) established the concept of NP-completeness and introduced
the notion of NP-completeness reductions.
o A problem is NP-complete if it is both in NP and as hard as the hardest problems in NP.
5. Implications of NP-Completeness:
o If there exists a polynomial-time algorithm for any NP-complete problem, then there exists a
polynomial-time algorithm for all problems in NP. This is known as the P vs NP problem.
Polynomial Time:
1. Polynomial Time Complexity:
o A problem is said to be solvable in polynomial time if there exists an algorithm for solving it with time
complexity bounded by a polynomial function of the problem size.
2. Polynomial Time Reduction:
o If we can transform an instance of one problem into an instance of another problem in polynomial time,
we say that the first problem is polynomial-time reducible to the second problem.
3. Polynomial-Time Reductions and NP-Completeness:
o NP-completeness involves polynomial-time reductions between problems. If problem A is polynomial-
time reducible to problem B, and B is NP-complete, then A is also NP-complete.
4. Implications for NP-Completeness:
o If we can show a polynomial-time reduction from a known NP-complete problem to a new problem, we
prove the new problem to be NP-complete as well.
5. Cook-Levin Theorem:
o The Cook-Levin theorem shows that Boolean satisfiability (SAT) is NP-complete. Many other
problems have since been proven NP-complete through reductions from SAT.
Key Points:
 NP-complete problems are challenging because no polynomial-time algorithm is currently known for solving
them, and if one exists, it implies polynomial-time solutions for all NP problems.
 Proving a problem to be NP-complete involves demonstrating polynomial-time reductions from a known NP-
complete problem to the new problem.
 The study of NP-completeness and polynomial time has deep implications for the inherent difficulty of
computational problems and has sparked significant research in algorithms and complexity theory.
 While NP-complete problems are believed to be inherently hard, polynomial-time algorithms are sought after
because they would make solving many practical problems much more efficient.
Polynomial-Time Verification
Polynomial-time verification is a concept related to the complexity class NP (Nondeterministic Polynomial time). In
computational complexity theory, the term refers to the ability to efficiently verify the correctness of a solution to a
problem in polynomial time.
NP (Nondeterministic Polynomial) Class:
1. Decision Problems:
o In the context of NP, we often deal with decision problems that have a binary answer (yes or no).
2. Nondeterministic Polynomial Time:
o A decision problem is in NP if, given a solution, its correctness can be verified in polynomial time by a
nondeterministic Turing machine.
3. Certificate or Witness:
o The solution to an NP problem is often accompanied by a certificate or witness that provides evidence
of the correctness of the solution.
POLYNOMIAL-TIME VERIFICATION:
1. Polynomial-Time Verifiability:
o For a problem to be in NP, the verification process must be polynomial-time, meaning that given a
solution and its certificate, it should be possible to check the correctness of the solution in polynomial
time.
2. Example: SAT (Boolean Satisfiability):
o SAT is a classic NP-complete problem.
o A potential solution is a Boolean assignment to variables, and the certificate is the assignment itself.
o Given an assignment and the original Boolean formula, we can check in polynomial time whether the
assignment satisfies the formula.
3. Certificate Check:
o The certificate check involves examining the provided certificate alongside the input and verifying that
the certificate is indeed a valid solution.
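A small Python sketch of the certificate check for SAT, using the formula from the SAT example later in this unit; the clause encoding (signed integers, where -2 means NOT x2) is an assumed convention chosen for illustration.

def check_sat_certificate(clauses, assignment):
    # Verify a SAT certificate in time linear in the size of the formula.
    # clauses: CNF as a list of clauses, each a list of signed variable indices.
    # assignment: dict mapping variable index to True/False.
    for clause in clauses:
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False          # some clause is left unsatisfied
    return True

# (x1 OR NOT x2) AND (x2 OR x3 OR NOT x4) AND (NOT x1 OR NOT x3)
formula = [[1, -2], [2, 3, -4], [-1, -3]]
print(check_sat_certificate(formula, {1: True, 2: True, 3: False, 4: False}))  # True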
Polynomial-Time Verification vs. Polynomial-Time Solution:
1. Efficient Verification:
o While verifying a solution can be done efficiently in polynomial time, finding the solution may not be
as easy.
2. NP vs. P:
o Problems in P (polynomial time) have efficient algorithms for finding solutions.
o Problems in NP have efficient algorithms for verifying solutions, but finding solutions may be
computationally hard.
3. Cook's Theorem:
o Cook's theorem states that the Boolean satisfiability problem (SAT) is NP-complete.
o Any problem in NP can be polynomial-time reducible to SAT, implying that if we have a polynomial-
time algorithm for SAT, we have one for all problems in NP.
Example of Polynomial-Time Verification:
Consider the Hamiltonian Cycle problem:
 Decision Problem: Does a given graph have a Hamiltonian cycle?
 Certificate: A permutation of vertices representing a potential Hamiltonian cycle.
 Verification: Check that the permutation visits each vertex exactly once and that there is an edge between
consecutive vertices in the permutation.
The verification process is polynomial-time because checking the permutation requires linear time in the size of the input
graph.
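A Python sketch of this verification; the edge representation as a set of frozensets and the 0-based vertex numbering are assumptions made for the example.

def verify_hamiltonian_cycle(n, edges, permutation):
    # Check a Hamiltonian-cycle certificate for a graph on vertices 0..n-1.
    if sorted(permutation) != list(range(n)):
        return False                      # must visit every vertex exactly once
    for i in range(n):
        u, v = permutation[i], permutation[(i + 1) % n]   # consecutive vertices, wrapping around
        if frozenset((u, v)) not in edges:
            return False                  # missing edge between consecutive vertices
    return True

# 4-cycle on vertices 0..3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
print(verify_hamiltonian_cycle(4, edges, [0, 1, 2, 3]))   # True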
In summary, polynomial-time verification is a key aspect of problems in NP. While finding solutions to NP problems
may be difficult, verifying solutions can be done efficiently in polynomial time, allowing for a more manageable
approach to certain types of problems.
NP- Completeness and Reducibility
Understanding NP-completeness and reducibility involves exploring the concepts central to computational complexity
theory. Let's delve into these concepts:
NP-COMPLETENESS:
1. Definition:
o A decision problem is NP-complete if it belongs to the class NP (problems for which solutions can be
verified quickly) and any problem in NP can be polynomial-time reducible to it.
2. Cook's Theorem:
o Cook's theorem, proved by Stephen Cook, introduced the concept of NP-completeness.
o The Boolean satisfiability problem (SAT) was the first problem shown to be NP-complete.
3. NP-Complete Problems:
o Many problems in various domains have been proven to be NP-complete. Examples include the
Traveling Salesman Problem, Knapsack Problem, and Graph Coloring.
4. Reductions:
o Proving a problem to be NP-complete involves demonstrating polynomial-time reductions from a
known NP-complete problem to the new problem.
Reducibility:
1. Polynomial-Time Reduction:
o A problem A is polynomial-time reducible to problem B if there exists a polynomial-time algorithm
that transforms instances of A into instances of B.
o Denoted as A ≤p B (A is polynomial-time reducible to B).
2. Concept of Reduction:
o If we can efficiently solve problem B, and we can transform instances of problem A into instances of B
in polynomial time, then we can efficiently solve problem A.
3. Transitive Property:
o If A ≤p B and B ≤p C, then A ≤p C.
o The transitive property of reductions allows the comparison of the relative difficulty of different
problems.
NP-COMPLETENESS AND REDUCIBILITY:
1. Proving NP-Completeness:
o To prove a problem A is NP-complete, two conditions must be satisfied:
 A is in NP.
 Every problem in NP is polynomial-time reducible to A.
2. Cook-Levin Theorem:
o The Cook-Levin theorem demonstrated that the Boolean satisfiability problem (SAT) is NP-complete.
o Many other NP-complete problems have been identified through reductions from SAT.
3. Implications:
o If there exists a polynomial-time algorithm for any NP-complete problem, there exists a polynomial-
time algorithm for all problems in NP. This is known as the P vs NP question.
Example:
Consider the Hamiltonian Cycle problem:
 Decision Problem: Does a given graph have a Hamiltonian cycle?
 NP-Completeness: The Hamiltonian Cycle problem is NP-complete.
 Reduction: To prove another problem (e.g., Graph Coloring) NP-complete, one would demonstrate a
polynomial-time reduction from Hamiltonian Cycle to Graph Coloring.
Importance:
1. Inherent Difficulty:
o NP-completeness characterizes problems believed to be inherently hard, as no polynomial-time
algorithms are currently known for them.
2. Algorithm Design:
o Reducibility helps in the design of algorithms by establishing relationships between problem
complexities.
3. Computational Complexity:
o The study of NP-completeness and reducibility provides insights into the inherent complexity of
computational problems.
Understanding NP-completeness and reducibility is crucial for assessing the difficulty of computational problems and
recognizing the relationships between different problems in terms of their solvability. These concepts form the
foundation of computational complexity theory.
NP-Completeness Proofs
Proving that a problem is NP-complete involves demonstrating two key properties:
1. Membership in NP:
o Show that the problem is in the complexity class NP, meaning that given a solution, its correctness can
be verified efficiently in polynomial time.
2. NP-Completeness Reducibility:
o Prove that every problem in the class NP can be polynomial-time reducible to the given problem. This
involves showing that there is a polynomial-time algorithm that can transform any instance of an NP
problem into an instance of the given problem.
The following steps outline the typical process for proving NP-completeness:
Step 1: Choose a Known NP-Complete Problem
Start with a known NP-complete problem, often a classic one like Boolean satisfiability (SAT), 3SAT, or the
Hamiltonian Cycle problem.
Step 2: Formulate a Polynomial-Time Reduction
Define a polynomial-time reduction from the known NP-complete problem to the problem you want to prove NP-
complete.
 Reduction Function:
o Design a function that transforms instances of the known problem into instances of the new problem in
polynomial time.
Step 3: Prove Correctness of Reduction
Demonstrate that the reduction function is correct:
 If Instance A is a "Yes" Instance:
o The transformed instance in the new problem is also a "Yes" instance.
 If Instance A is a "No" Instance:
o The transformed instance in the new problem is also a "No" instance.
Step 4: Prove Membership in NP
Show that the new problem is in NP by demonstrating an efficient verification algorithm:
 Efficient Verification:
o Given a potential solution and a certificate (witness), verify the correctness of the solution in
polynomial time.
Step 5: Conclude NP-Completeness
Once you have established both membership in NP and NP-completeness reducibility, you can conclude that the problem
is NP-complete.
Example: Proving Graph Coloring NP-Completeness
Suppose you want to prove that the Graph Coloring problem is NP-complete. You can follow these steps:
1. Choose a Known NP-Complete Problem:
o Choose a classic NP-complete problem like 3SAT.
2. Formulate a Polynomial-Time Reduction:
o Design a polynomial-time reduction from 3SAT to Graph Coloring. Transform an instance of 3SAT
into an equivalent instance of Graph Coloring.
3. Prove Correctness of Reduction:
o Show that if an instance of 3SAT is satisfiable, the corresponding graph in Graph Coloring is colorable,
and vice versa.
4. Prove Membership in NP:
o Demonstrate an efficient verification algorithm for Graph Coloring. Given a coloring of the graph and a
certificate (witness), verify in polynomial time that the coloring is valid.
5. Conclude NP-Completeness:
o Conclude that Graph Coloring is NP-complete by showing both membership in NP and polynomial-
time reducibility from 3SAT.
It's important to note that proving NP-completeness requires careful reasoning and precision. The process involves
constructing the reduction function and demonstrating its correctness, which can be a non-trivial task. Once proven, the
result has implications for the inherent complexity of the problem.
NP-Complete Problems.
NP-complete (Non-deterministic Polynomial complete) problems are a class of decision problems for which a solution
can be verified quickly, and if a polynomial-time algorithm exists for any NP-complete problem, then it implies a
polynomial-time algorithm exists for all problems in NP (Nondeterministic Polynomial time). In other words, NP-
complete problems are considered among the most difficult problems in NP.
Here are some well-known NP-complete problems:
1. Boolean Satisfiability (SAT):
o Given a Boolean formula, is there an assignment of truth values to the variables that makes the formula
true?
o Example: (x1 ∨ ¬x2) ∧ (x2 ∨ x3 ∨ ¬x4) ∧ (¬x1 ∨ ¬x3)
2. 3SAT:
o A specific version of SAT where each clause has exactly three literals.
o Example: (x1 ∨ x2 ∨ ¬x3) ∧ (¬x2 ∨ x3 ∨ x4) ∧ (¬x1 ∨ ¬x2 ∨ x4)
3. Clique:
o Given an undirected graph G and a positive integer k, does G contain a complete subgraph of size at least k?
o Example: Given a graph with vertices {1, 2, 3, 4} and edges {(1,2), (2,3), (3,4), (4,1)}, does it contain a clique of size 3?
4. Vertex Cover:
o Given an undirected graph G and a positive integer k, does G have a vertex cover of size at most k?
o Example: Given a graph with vertices {1, 2, 3, 4} and edges {(1,2), (2,3), (3,4), (4,1)}, does it have a vertex cover of size 2?
5. Hamiltonian Cycle:
o Given an undirected graph G, does it contain a Hamiltonian cycle (a cycle that visits each vertex exactly once)?
o Example: Given a graph with vertices {1, 2, 3, 4} and edges {(1,2), (2,3), (3,4), (4,1)}, does it have a Hamiltonian cycle?
6. Knapsack Problem:
o Given a set of items, each with a weight and a value, determine the maximum value that can be
obtained by selecting a subset of items such that the sum of their weights does not exceed a given
capacity.
o Example: You have a set of items with weights and values, and a maximum weight capacity. What is
the most valuable subset of items you can take?
7. Traveling Salesman Problem (TSP):
o Given a list of cities and the distances between each pair of cities, find the shortest possible tour that
visits each city exactly once and returns to the original city.
o Example: Given cities A, B, and C with distances between them, what is the shortest tour that visits
each city exactly once?
These problems are called NP-complete because they are in NP, and they are as hard as the hardest problems in NP. The
concept of NP-completeness was introduced by Stephen Cook in his groundbreaking paper, and it has had profound
implications in the study of computational complexity. Many other problems have been proven to be NP-complete, and
establishing new NP-completeness results often involves reductions from known NP-complete problems.

NP-COMPLETE PROBLEMS.
Here are a few more examples of NP-complete problems:
8. Subset Sum:
o Given a set of positive integers and a target sum, is there a subset of the integers that adds up to the
target sum?
o Example: Given the set {3, 1, 4, 2, 2} and a target sum of 6, is there a subset that adds up to 6?
9. Partition:
o Given a set of positive integers, can it be partitioned into two subsets with equal sums?
o Example: Given the set {1, 5, 11, 5}, can it be partitioned into two subsets with equal sums?
10. Knapsack Problem (Decision Version):
 Given a set of items, each with a weight and a value, and a weight capacity, is there a subset of items that can be
packed into the knapsack to achieve a total value of at least a given threshold?
 Example: You have a set of items with weights and values, a maximum weight capacity, and a threshold value.
Can you pack items to achieve the required value without exceeding the weight capacity?
11. Bin Packing:
 Given a set of items with sizes and bins with a fixed capacity, can the items be packed into the bins efficiently?
 Example: You have items of different sizes and bins with a fixed capacity. Can you pack the items into the bins
in an optimal way?
12. Boolean Circuit Satisfiability (Circuit-SAT):
 Given a Boolean circuit, is there an assignment of truth values to the inputs that makes the output true?
 Example: Given a complex Boolean circuit with AND, OR, and NOT gates, is there an assignment of truth
values that makes the output true?
13. Integer Linear Programming (ILP):
 Given a system of linear inequalities and an objective function, is there an assignment of integer values to the
variables that satisfies the inequalities and maximizes (or minimizes) the objective function?
 Example: You have a set of linear inequalities and an objective function. Can you find integer values for the
variables that satisfy the inequalities and optimize the objective function?
14. 3-Coloring:
 Given an undirected graph, can its vertices be colored with at most 3 colors such that no two adjacent vertices
have the same color?
 Example: Given a graph, can you color its vertices with at most 3 colors without any adjacent vertices having
the same color?