Module I
Module I
DATA STRUCTURES
Pallavi Joshi
Asst. Professor,
PICT.
Bad Programmer worries about the
Code, While the good Programmers
worry about the Data Structures and
their relationship.
-Linus Torvalds
3
Course Content
Introduction of Algorithms, Analysis of Algorithms,
Complexity of algorithms- Space complexity, Time
complexity.
Data Structures: Abstract Data Types (ADT), Concept
of linear and Non-linear data structures. Use of Array
and Associative/Jagged Arrays.
Searching: Search Techniques, Sequential search,
Sentinel search, Binary search, Fibonacci search.
Sorting- Types of Sorting- Internal and External
Sorting, Stable Sorting, Sorting methods- Bubble sort,
Insertion sort, Selection sort, Quick sort, Merge sort,
Comparison of All Sorting Methods.
Hashing: Types of Hashing, Hash table, Hash functions,
5
6
What is a Data
Data is nothing but a piece of information.
Data input, data manipulation (or data processing),and data output are
the functions of computers.
Hence all information taken as input, processed within a computer,
or provided as output to the user is nothing but data.
It can be a number, a string, or a set of many numbers and strings.
There are two types
1. Atomic Data and
2. Composite Data
⬛Atomic data is the data that we choose to consider as a single,
non-decomposable entity. For example, the integer 1234, day
⬛The opposite of atomic data is composite data. Composite data can be
broken down into subfields that have meaning. For example, a
student’s record, address.
7
8
What is a Data Type
• Data type refers to the kind of data a variable may store.
Whenever we try to implement any algorithm in some
programming language, we need variables.
9
What is Data Structure?
• A data structure is a storage that is used to store and
organize data. It is a way of arranging data on a
computer so that it can be accessed and updated
efficiently.
12
Data Abstraction:
•Definition: Separates the logical properties of data from its
physical representation.
•Purpose: Hides the details of how data is stored and
manipulated, allowing users to interact with data through a
simplified interface.
•Example: In a database, a user might interact with customer
data through a query language, without needing to know how
the data is physically stored on disk.
Procedural Abstraction:
•Definition: Separates the logic of an operation from its
implementation details.
•Purpose: Allows users to perform an operation without
needing to understand the underlying steps involved.
•Example: A user might call a function to sort a list, without 13
• An ADT is the one in which the set of operations
is defined at a formal, logical level, without being
restricted by the operational details. In other
words, in an ADT we encapsulate the data and
the operations on this data and hide them from
the user.
NON-PRIMIT
PRIMITIVE
IVE
NON-LI
LINEAR
NEAR
17
Primitive D.S
Primitive data structures define a set of primitive elements
that do not involve any other elements as its subparts —
for example, data structures defined for integers and
characters.
These are generally primary or built-in data types in
programming languages. Eg. int, char, float.
22
Partially persistent: Can access any version, but
can modify only the most recent one.
V
1
V
2
V
3
V
4
V
5
23
Fully persistent: Can access and modify any
version any number of times .
V
1
V V V
4 2 5
V V
5 3
24
Confluently persistent: fully persistent and there
is an operation that combines two or more
versions to one new version.
V
1
V V V
4 2 5
V V
5 3
25
Sequential Access and Direct Access Data Structures
26
Linear List
27
Types Of Linear Lists:
ARRAYS:
An array is used to store elements of the same type.
1 2 3 4 5 6 7 8 9 10
B A E C D
30
LINK LIST
A link list is a linear collection of elements called nodes. The
linear order is maintained by pointers. Each node is divided
into two or more parts; one data field and another pointer
NULL
START
field.
10 3352
Data 10 5689
Data 10 1012
Data 10
Data
31
STRUCTURE
• Collections of related variables (aggregates) of under one
name.
• Can contain variables of different data types
• Commonly used to define records to be stored in files
Eg. struct ADate
{
int month;
int day;
int year;
} Date;
32
Types of Non Linear Lists
TREES
A tree can be defined as finite set of data items called
nodes.
Trees are non linear type of data structures in which
data items are arranged or stored in a sorted
sequence.
It represents the hierarchical relationship between
various elements. A ROOT
There is a special data item at the top called Root of
tree. B C
D E F
33
GRAPHS
34
Program Representation
1. Algorithm
2. Pseudocode
3. Flowchart
35
ALGORITHM
36
How to develop an algorithm
1. Understand the problem.
Important
definitions
Solutions that can be solved with a series of known
actions are called Algorithmic Solutions
Employing a self-learning approach to the solution
of a problems is known as Heuristic Solutions
Examples
Algorithmic solution:
numbers
Heuristic solutions:
Key Characteristic :
❖ Language-independent:
❖ Human-readable:
❖ Focus on logic: 41
How to Write Pseudocode
✔ Always capitalize the initial word
✔ Make only one statement per line.
✔ Indent to show hierarchy, improve readability and
show nested constructs.
✔ Always end multi-line sections using any of the END
keywords (ENDIF, ENDWHILE, etc.).
✔ Keep your statements programming language
independent.
✔ Use the naming domain of the problem, not that of the
implementation. For instance: “Append the last name
to the first name” instead of “name = first+last.”
✔ Keep it simple, concise and readable.
42
Practice question
• 30 Candidates in engg. examination are required to
take 5 single subject papers. For passing the exam, a
candidate must score at least 35 in each subject and
obtain an average of at least 40.The details each
candidate i.e name,rno,m1,m2,m3,m4,m5 is stored in a
file. Write a psedo code to process a file and print the
list of successful candidates with their rollno,name and
total marks.
43
Algorithmic Analysis
• Need ?
A. To estimate resource consumption
• Time & Space (what & How much)
• Registers (low level language)
• Bandwidth (channel capacity)
• Battery power: energy consumption (mob/wireless)
44
Methodology of analysis
• 1. H/W Platform
• CPU speed
• Memory speed
• Disk speed
• 2. S/W platform
• Programming language compiler
• OS
45
Analysis Methodology
A Posteriori Analysis A priori Analysis
• Platform independent
• Platform dependent (s/w,
h/w) • B4 running algo on
machine
• After execution strategy
• Done on basis of order
• Experimentation
of magnitude
• Adv: exact values in units
• Adv: uniform, easy
of time & space
performance
• Disadv: difficult to comparison
experiment, Non
• Disadv: No exact value,
uniform, performance
apprx
comparison diff.
• Its most suitable 46
A priori Analysis
Concept: order of magnitude
• Statement
• Construct
• O.S. operation
• Method
• Program element
47
How to analyse
• X=Y+Z ….order of magnitude= 1
• For i=1 to n …..n
• X= y + z ….. 1
• For i=1 to n ….. N
For j=1 to n ….. n
X= y + z ….. F(n) = 1+n+n2
48
What do you conclude??
• Which algorithm is better?
49
• Note:
Counting iterations will be useful through out the course.
50
The algorithms are correct, but which is the best?
• Measure the running time (number of operations needed).
• Measure the amount of memory used.
• Note that the running time of the algorithms increase as the size of
the input increases.
51
Analysis of algorithms
• Speed is fun!
Running Time
Worst-case: (usually)
• T(n) = maximum time of algorithm on any input of
size n.
Average-case: (sometimes)
• T(n) = expected time of algorithm over all inputs of
size n.
• Need assumption of statistical distribution of inputs.
Best-case:
• Cheat with a slow algorithm that works fast on some
input.
ALGORITHM ANALYSIS
An algorithm requires following 2 resources:
56
Machine-Independent time
Assumptions:
• Each simple operation such as +, -, if ...etc takes exactly
one time step.
• Loops and subroutines are not considered simple
operations.
• Each memory acces takes exactly one time step.
Factors governing execution time of a program:
58
59
Framework for Algorithm analysis
Basic Idea:
•To study mathematical model of computer
•Mentally Executes algorithm on computer model and
evaluate time and not on real computer.
60
Mathematical model of computer
RAM model(Random Access Machine)
It has
Processor+ memory
•Memory is a collection of locations
•has one processor
•executes one instruction at a time
•each instruction takes "unit time“
61
Instruction set of Processor
• One step execution
• Arithmetic and logical operations allowed
A=b+c(1 step instruction)
• Jump and conditional jumps
Goto
If(a>b)then goto
• Pointer instruction
B=*c
• Array operations
A[i]=b
B=c[i]
62
• A=b + c*c-u ….3 steps in RAM model
• A[i]=b[i]+c[i]
x=b[i]
Y=c[i]
z=X+y
A[I]=Z …4STEPS
63
Example
Algo: add(a[],b[])
{
for i=1 to n
C[i]=a[i]+b[i]
end for
}
•Lets us convert in RAM model and check the no of
instruction required.
64
Algo: add(a[],b[])
{
1.i=1
2.If(i>n) goto 9
3.x=a[i]
4.y=b[i]
5.z=x+y
6.c[i]=z
7.i=i+1
8.goto 2
9.}
total :1+n+1+b.n+2n=2+3n+bn=2+n(b+3)
65
Let us analyze matrix multiplication
algorithm
Algorithm matmul(n,n)
1.for i=1 to n
2. for j=1 to n
3. c[i][j]=0
4. for k=1 to n
5. c[i][j] = c[i][j]+a[i][k]*b[k][j]
6. end
7. end
8. end
66
Analyse this in RAM model
• Order of n3
67
Ideal Solution
• Express running time as a function of the input size n (i.e., f(n)).
• Compare different functions corresponding to running times.
• Such an analysis is independent of machine time, programming style,
etc.
68
Determine frequency count
probem1
1. i=1
2. While(i<=n){
3. x=x+1;
4. i=i+1; }
Probem2
1. for (L=1;L<=n;L++)
2. for(j=1;j<=L;j++)
3. x=x+1;
69
DETERMINE FREQUENCY COUNT
1. for (i=1;i<=n;i++)
2. for (j=1;j<=i;j++)
3. for (k=1;k<=n;k++)
4. X=X+1;
probem4
1. for (i=1;i<=n;i++)
2. for (j=1;j<=i;j++)
3. for (k=1;k<=j;k++)
4. X=X+1;
70
Asymptotic notations
⚫ Using this we are classifying functions into classes.
Three notations:
• Theta notation Θ
• Big 0 notation O
• Omega notation Ω
71
O-notation
73
EXAMPLE
T(n)=10n3+5n+7 =O(n3).
Prove that if T(n) = 15n3 + n2 + 4, T(n) = O(n3).
Prove that if T(n) = 15n3 + n2 + 4, T(n) = O(n4).
74
Ω-Omega notation
77
-Theta notation
79
EXAMPLE
T(n)=10n3+5n2+17
10n3<=f1(n)<=(10+5+17)n3
C1=10 c2=32
For all n>=1 so n0=1
T(n)= Θ(n3)
80
Asymptotic notation
82
SPACE COMPLEXITY
The space complexity of a program is the memory needs to run to
completion.
The space requirement depends on
1)The fixed part that is independent of the characteristics of the input
and outputs. This part typically includes the instruction space,
space for simple variable ,space for constant.
2)The variable part: space needed for component variable whose size
is dependent on particular instance, the space needed by reference
variables, the recursion stack space.
83
Examples
Float Abc(float a,float b,float c)
{
Return a+b+b*c+(a+b-c)/(a+b);
}
Sp=0
84
Float sum(float *a,const int n)
{
Float s=0;
For(int i=0;i<n;i++)
{s=s+a[i];
Return s;
}
Sp=0
85
Float Rsum(float *a,const int n)
{
If(n=0) return 0;
Else return(Rsum(a,n-1)+a[n-1]))
}
Sp=4(n+1)
86
ALGORITHMIC STRATEGY
1.DIVIDE AND CONQUER
2.GREEDY STRATEGY
87
Divide-and-Conquer
• The strategy involves three steps at each level of
recursion.
• Divide:- Divide the problem into a number of sub
problems.
• Conquer:- Conquer the sub problems by solving them
recursively. If the sub problem sizes are small enough,
then just solve the sub problems in a straight forward
manner.
• Combine:- Combine the solutions to the sub
problems to form the solution for the original
problem.
88
Control abstraction for divide and conquer
strategy.
Algorithm DAndC (P)
{
1.if small(P) then return S(P) //termination condition
Else
{
2.Divide P into smaller instances P1, P2, P3… Pk k≥1;
3.Apply DAndC to each of these sub problems.
4.Return Combine(DAndC(P1), DAndC (P2), …DAndC (Pk)
}
}
89
computing time of DAndC
T(n) = g(n) when n is small
= T(n1)+ T(n1)+ T(n2)+…….+ T(nk)+ f(n) otherwise
90
Binary Search: DIVIDE AND CONQUER STRATEGY
91
Implementing Binary Search (Contd.)
0 1 2 3 4 5 6 7 8
arr 9 13 17 19 25 29 39 40 47
92
Implementing Binary Search (Contd.)
0 1 2 3 4 5 6 7 8
arr 9 13 17 19 25 29 39 40 47
93
Implementing Binary Search (Contd.)
0 1 2 3 4 5 6 7 8
arr 9 13 17 19 25 29 39 40 47
94
Implementing Binary Search (Contd.)
0 1 2 3 4 5 6 7 8
arr 9 13 17 19 25 29 39 40 47
95
Implementing Binary Search (Contd.)
0 1 2 3 4 5 6 7 8
arr 9 13 17 19 25 29 39 40 47
Element found
96
Implementing Binary Search (Contd.)
• Write an algorithm to implement binary search algorithm.
ALGORITHM BINARYSEARCH(A,N,Data)
{//A is an array of N elements. Data is the element to be searched
1.fori :=0 to N do
read A[i]
2.Enter the element to be searched, Data
3. low := 0
4. high := N – 1
5.while(low<=high)
{
5.1 mid := (low + high)/2
5.2 if (A[mid] = Data){
5.2.1 write “Found”;
5.2.2 exit ;}
5.3. If( Data < A[mid])
5.3.1 high := mid – 1
97
Implementing Binary Search (Contd.)
98
Binary search using recursion
Algorithm binsearch(A, Low, high,Data,N)
{//A is an array of N elements. Data is the element to be
searched.
1.fori :=0 to N do
read A[i]
2.Enter the element to be searched, Data
3. low := 0
4. high := N – 1
5.if(low=high) then{
5.1 if(Data=a[i])then return i;
else return 0;}
else
5.1 mid := (low + high)/2
5.2 if (A[mid] = Data) then return mid;
99
5.3 else if(x<a[mid] then return
binsearch(a,i,mid-1,Data,N)
else return binsearch(a,mid+1,high,data,N);
}
100
Determining the Efficiency of Binary Search
101
Just a minute
• Answer:
• linear
102
Just a minute
• Answer:
• binary
103
Greedy Strategy
104
Principal
Basic idea
•Find the optimal solution piece by piece.
•Iterative make decision one by one and hope
everything works out well at the end.
•Once the decision is made then choice should not
get changed for the subsequent steps.
105
Optimization Problems
• For most optimization problems you want to
find, not just a solution, but the best solution.
• A greedy algorithm sometimes works well for
optimization problems. It works in phases. At
each phase:
• You take the best you can get right now, without
regard for future consequences.
• You hope that by choosing a local optimum at each
step, you will end up at a global optimum.
106
Example: Counting Money
• Suppose you want to count out a certain amount of
money, using the fewest possible bills and coins
• A greedy algorithm to do this would be:
At each step, take the largest possible bill or coin that
does not overshoot
• Example: To make $6.39, you can choose:
• a $5 bill
• a $1 bill, to make $6
• a 25¢ coin, to make $6.25
• A 10¢ coin, to make $6.35
• four 1¢ coins, to make $6.39
• For US money, the greedy algorithm always gives the
optimum solution
107
General principal
• The choice that seems best at the moment is the
one we go with.
108
Greedy Algorithm Failure
• In some (fictional) monetary system, “krons” come in 1 kron, 7 kron, and 10
kron coins
• Using a greedy algorithm to count out 15 krons, you would get
• A 10 kron piece
• Five 1 kron pieces, for a total of 15 krons
• This requires six coins
• A better solution would be to use two 7 kron pieces and one 1 kron piece
• This only requires three coins
109
Analysis
• if the items are already sorted into decreasing order of vi / wi
• then the for-loop takes O(n) time
110
Recurrences and Running Time
• An equation or inequality that describes a function in
terms of its value on smaller inputs.
T(n) = T(n-1) + n
• Recurrences arise when an algorithm contains
recursive calls to itself
• What is the actual running time of the algorithm?
112
Solving recurrences
T(n) = c + T(n/2)
T(n/2) = c + T(n/4)
T(n) = c + T(n/2)
T(n/4) = c + T(n/8)
= c + c + T(n/4)
= c + c + c + T(n/8)
Assume n = 2k
T(n) = c + c + … + c + T(1)
k times
= clgn + T(1)
= 0(lgn)
113
T(n) = n + 2T(n/2)
Assume: n = 2k
T(n) = n + 2T(n/2)
= n + 2(n/2 + 2T(n/4)) T(n/2) = n/2 + 2T(n/4)
= n + n + 4T(n/4)
= n + n + 4(n/4 + 2T(n/8))
= n + n + n + 8T(n/8)
… = in + 2iT(n/2i)
= kn + 2kT(1)
= nlgn + nT(1) =0(nlgn)
114
solve
• T(n)=9T(n/3)+4n6
• T(n)=7t(n/2)+18n2
• T(n) = 2T(n/2) + n
• T(n) = 2T( sqrt(n) ) + lgn
• T(n) = 3T(n/4) + cn2
115