0% found this document useful (0 votes)

36 views115 pages

Module I

module 1

Uploaded by

dusane.pratham

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

36 views115 pages

Module I

module 1

Uploaded by

dusane.pratham

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 115

LECTURE 1

DATA STRUCTURES

Pallavi Joshi
Asst. Professor,
PICT.
Bad Programmer worries about the
Code, While the good Programmers
worry about the Data Structures and
their relationship.
-Linus Torvalds

* Prof. P S Joshi, Comp Department, PICT, Pune

Module 1: Introduction to Analysis
of Algorithms

3
Course Content
Introduction of Algorithms, Analysis of Algorithms,
Complexity of algorithms- Space complexity, Time
complexity.
Data Structures: Abstract Data Types (ADT), Concept
of linear and Non-linear data structures. Use of Array
and Associative/Jagged Arrays.
Searching: Search Techniques, Sequential search,
Sentinel search, Binary search, Fibonacci search.
Sorting- Types of Sorting- Internal and External
Sorting, Stable Sorting, Sorting methods- Bubble sort,
Insertion sort, Selection sort, Quick sort, Merge sort,
Comparison of All Sorting Methods.
Hashing: Types of Hashing, Hash table, Hash functions,
5
6
What is a Data
Data is nothing but a piece of information.
Data input, data manipulation (or data processing),and data output are
the functions of computers.
Hence all information taken as input, processed within a computer,
or provided as output to the user is nothing but data.
It can be a number, a string, or a set of many numbers and strings.
There are two types
1. Atomic Data and
2. Composite Data
⬛Atomic data is the data that we choose to consider as a single,
non-decomposable entity. For example, the integer 1234, day
⬛The opposite of atomic data is composite data. Composite data can be
broken down into subﬁelds that have meaning. For example, a
student’s record, address.
7
8
What is a Data Type
• Data type refers to the kind of data a variable may store.
Whenever we try to implement any algorithm in some
programming language, we need variables.

• Data type is a term that speciﬁes the type of data that a

variable may hold in the programming language

Built-in Data Types

User-deﬁned Data Types

9
What is Data Structure?
• A data structure is a storage that is used to store and
organize data. It is a way of arranging data on a
computer so that it can be accessed and updated
eﬃciently.

• A data structure is not only used for organizing the

data. It is also used for processing, retrieving, and
storing data. There are diﬀerent basic and advanced
types of data structures that are used in almost every10
11
Abstract Data Type

Data abstraction is the separation of logical properties of

the data from details of how the data is represented.
Procedural abstraction means separation of the logical
properties of action from implementation.

Data abstraction focuses on hiding the representation of

data, while procedural abstraction focuses on hiding the
implementation details of operations

12
Data Abstraction:
•Definition: Separates the logical properties of data from its
physical representation.
•Purpose: Hides the details of how data is stored and
manipulated, allowing users to interact with data through a
simplified interface.
•Example: In a database, a user might interact with customer
data through a query language, without needing to know how
the data is physically stored on disk.
Procedural Abstraction:
•Definition: Separates the logic of an operation from its
implementation details.
•Purpose: Allows users to perform an operation without
needing to understand the underlying steps involved.
•Example: A user might call a function to sort a list, without 13
• An ADT is the one in which the set of operations
is defined at a formal, logical level, without being
restricted by the operational details. In other
words, in an ADT we encapsulate the data and
the operations on this data and hide them from
the user.

• Consider the concept of a queue. At least three

data structures will support a queue. We can use
an array, a linked list, or a file. If we place our
queue in an ADT, users should not be aware of 14
Data types and data structures
• Data types and data structures are distinct but related
concepts in programming. A data type defines the kind
of value a variable can hold (e.g., integer, character,
boolean), while a data structure organizes and stores
multiple data elements in a specific way (e.g., array,
linked list, tree) to facilitate efficient access and
modification

• From the surface, both data type and data

structure appear to be the same thing, as both deal with15
TYPE OF DATA
STRUCTURES
DATA
STUCTURE

NON-PRIMIT
PRIMITIVE
IVE

CHA FLOA DOUBL

INT LIST FILES
R T E

NON-LI
LINEAR
NEAR

STRUCTUR LINK- TREE GRAPH

1 STACK QUEUE ARRAY S
E LIST S
6
Types of Data structure

•Primitive and Non Primitive.

•linear and Non linear DS
•Static &dynamic
•Persistent and Ephemeral DS
•Sequential and Direct Access DS

17
Primitive D.S
Primitive data structures define a set of primitive elements
that do not involve any other elements as its subparts —
for example, data structures defined for integers and
characters.
These are generally primary or built-in data types in
programming languages. Eg. int, char, float.

Non Primitive D.S

The data types that are derived from primary data
structures are known as Non Primitive D.S. They are
used to store groups of values.
Eg. array, structure,class, union, link list, stack, queues 18
Linear and Non-linear Data
Structures
⚫ A data structure is said to be linear
if its elements form a sequence or a linear list.
every data element has a unique successor and predecessor.
Two basic ways of representing linear structures in memory.
One way is to have the relationship between the elements by means
of pointers (links), called linked lists.
The other way is using sequential organization, that is, arrays.

⚫ Non-linear data structures

are used to represent the data containing hierarchical or network
relationship among the elements.
Trees and graphs are examples of non-linear data structures.
every data element may have more than one predecessor as well as
successor.
19
Elements do not form any particular linear sequence.
Static and Dynamic Data Structures
A data structure is referred to as a static if it is created
before program execution begins. The variables of static
data structure have user-speciﬁed names. An array is a
static data structure.

A data structure that is created at run-time is called

dynamic data structure. The variables of this type are not
always referenced by a user-deﬁned name. These are
accessed indirectly using their addresses through
pointers.

A linked list is a dynamic data structure when realized

using dynamic memory management and pointers,
whereas an array is a static data structure. Non-linear
data structures are generally implemented in the same 20
Persistent and Ephemeral DS
• Persistent data structure is a data structure that always
preserves the previous version of itself when it is
modified.
• Such data structures are effectively immutable, as their
operations do not update the structure in-place, but
instead always yield a new updated structure
• A data structure is partially persistent if all versions can
be accessed but only the newest version can be modified
• The data structure is fully persistent if every version can
be both accessed and modified.
• An ephemeral data structure is one that supports
operations only on the most recent version. 21
Ephemeral: A modification destroys the version which
we modify. (all older versions)
Persistent: Modifications are nondestructive. Each
modification creates a new version. All version coexist.
We have a big data structure that represent all
versions

22
Partially persistent: Can access any version, but
can modify only the most recent one.
V
1
V
2

V
3

V
4
V
5

23
Fully persistent: Can access and modify any
version any number of times .
V
1

V V V
4 2 5
V V
5 3

24
Conﬂuently persistent: fully persistent and there
is an operation that combines two or more
versions to one new version.
V
1

V V V
4 2 5
V V
5 3

25
Sequential Access and Direct Access Data Structures

• This classiﬁcation is with respect to the access operations

associated with data structures.
• Sequential access means that to access the nth element, we must
access the preceding (n - 1) data elements. A linked list is a
sequential access data structure.
• Direct access means that any element can be accessed without
accessing its predecessor or successor; we can directly access
the nth element. An array is an example of a direct access data
structure.

26
Linear List

These represents a sequence of items and the elements follow a

linear ordering.

The Linear D.S has following properties:

oEach element is followed by one other

element.

oNo two elements are followed by the same

element.

27
Types Of Linear Lists:

ARRAYS:
An array is used to store elements of the same type.

The number of elements that can be stored in a array

can be calculated as-
(upperbound-lowerbound)+1
int a[10]

1 2 3 4 5 6 7 8 9 10

a[0] a[1] ----- ----------------

-------------------------------- -a[9]
28
STACKS
Collection of elements like array with a special
feature that deletion and insertion of elements can be
done only from one end, called the TOP (Top of
stack). Due to this property it is also called LIFO data
structure i.e. Last-In-First-Out data structure.
Push an
Element
3
3
2
1
29
QUEUES

Queues are First -In-First-Out (FIFO) data. In a queue

new elements are added to the queue from one end
called Rear end, and the elements are deleted from
another end called Front end. Eg. the queue in a ticket
FRON REAR
counter. T

B A E C D

30
LINK LIST
A link list is a linear collection of elements called nodes. The
linear order is maintained by pointers. Each node is divided
into two or more parts; one data ﬁeld and another pointer
NULL
START
ﬁeld.
10 3352
Data 10 5689
Data 10 1012
Data 10
Data

2403 3352 5689 1012

31
STRUCTURE
• Collections of related variables (aggregates) of under one
name.
• Can contain variables of different data types
• Commonly used to define records to be stored in files
Eg. struct ADate
{
int month;
int day;
int year;
} Date;

32
Types of Non Linear Lists

TREES
A tree can be deﬁned as ﬁnite set of data items called
nodes.
Trees are non linear type of data structures in which
data items are arranged or stored in a sorted
sequence.
It represents the hierarchical relationship between
various elements. A ROOT
There is a special data item at the top called Root of
tree. B C

D E F
33
GRAPHS

Graph is non-linear data structure capable of

representing kinds of physical structures.
A graph G(V,E) is a set of vertices V and a set of edges
E connecting vertices.

34
Program Representation
1. Algorithm
2. Pseudocode
3. Flowchart

35
ALGORITHM

An Algorithm, can be deﬁned as a ﬁnite sequence of

instructions, that if followed accomplishes a particular
task.

36
How to develop an algorithm
1. Understand the problem.

2. Identify the output of the problem.

3. Identify inputs required by the problem and choose

the associated data structure.

4. Design a logic that will produce the desired output

from the given inputs.

5. Test the algorithm for diﬀerent sets of input data.

6. Repeat steps 1 to 5 till the algorithm produces the 37

Characteristics of an algorithm
1. INPUT: This part of the algorithm reads the data of
the given problem.

2. DEFINITENESS: Each instruction is clear and

unambiguous .for eg add 6 or 7 to x.

3. FINITENESS: The algorithm must come to an end

after a ﬁnite number of steps.

4. EFFECTIVENESS:. It should also be basic enough to

be carried out with pencil and paper. Each
instruction should be deﬁnite and feasible.
38
Approaches to solve a problem:
▪Algorithmic
▪Heuristic

Important
deﬁnitions
Solutions that can be solved with a series of known
actions are called Algorithmic Solutions
Employing a self-learning approach to the solution
of a problems is known as Heuristic Solutions
Examples
Algorithmic solution:

⬛ To make a cup of coﬀee

⬛ To ﬁnd largest of three

numbers

Heuristic solutions:

⬛ how to buy the best stock?

Pseudo code
• Pseudocode is a way to describe the steps of an
algorithm using a combination of natural language and
programming language conventions, without the strict
syntax of any speciﬁc programming language. It's
essentially a human-readable outline of a program's
logic, making it easier to understand and communicate
the core functionality of a piece of code.

Key Characteristic :
❖ Language-independent:
❖ Human-readable:
❖ Focus on logic: 41
How to Write Pseudocode
✔ Always capitalize the initial word
✔ Make only one statement per line.
✔ Indent to show hierarchy, improve readability and
show nested constructs.
✔ Always end multi-line sections using any of the END
keywords (ENDIF, ENDWHILE, etc.).
✔ Keep your statements programming language
independent.
✔ Use the naming domain of the problem, not that of the
implementation. For instance: “Append the last name
to the first name” instead of “name = first+last.”
✔ Keep it simple, concise and readable.
42
Practice question
• 30 Candidates in engg. examination are required to
take 5 single subject papers. For passing the exam, a
candidate must score at least 35 in each subject and
obtain an average of at least 40.The details each
candidate i.e name,rno,m1,m2,m3,m4,m5 is stored in a
file. Write a psedo code to process a file and print the
list of successful candidates with their rollno,name and
total marks.

43
Algorithmic Analysis
• Need ?
A. To estimate resource consumption
• Time & Space (what & How much)
• Registers (low level language)
• Bandwidth (channel capacity)
• Battery power: energy consumption (mob/wireless)

B. Performance comparison of diﬀ algo for given

problem

44
Methodology of analysis

• 1. H/W Platform
• CPU speed
• Memory speed
• Disk speed

• 2. S/W platform
• Programming language compiler
• OS

45
Analysis Methodology
A Posteriori Analysis A priori Analysis
• Platform independent
• Platform dependent (s/w,
h/w) • B4 running algo on
machine
• After execution strategy
• Done on basis of order
• Experimentation
of magnitude
• Adv: exact values in units
• Adv: uniform, easy
of time & space
performance
• Disadv: diﬃcult to comparison
experiment, Non
• Disadv: No exact value,
uniform, performance
apprx
comparison diﬀ.
• Its most suitable 46
A priori Analysis
Concept: order of magnitude
• Statement
• Construct
• O.S. operation
• Method
• Program element

Order of magnitude refers to:

• # times of fundamental operations involved

47
How to analyse
• X=Y+Z ….order of magnitude= 1
• For i=1 to n …..n
• X= y + z ….. 1
• For i=1 to n ….. N
For j=1 to n ….. n
X= y + z ….. F(n) = 1+n+n2

• Objective: of a priori algo is to generate a function of

quadratic polynomial

48
What do you conclude??
• Which algorithm is better?

49
• Note:
Counting iterations will be useful through out the course.

50
The algorithms are correct, but which is the best?
• Measure the running time (number of operations needed).
• Measure the amount of memory used.
• Note that the running time of the algorithms increase as the size of
the input increases.

51
Analysis of algorithms

The theoretical study of computer-program

performance and resource usage.

What’s more important than performance?

• modularity
• correctness
• maintainability
• functionality
• robustness
• user-friendliness
• programmer time
• simplicity
• extensibility
• reliability
Analysis of algorithms
Why study algorithms and performance?

• Algorithms help us to understand scalability.

• Performance often draws the line between what is feasible

and what is impossible.

• Algorithmic mathematics provides a language for talking

about program behavior.

• The lessons of program performance generalize to other

computing resources.

• Speed is fun!
Running Time

• Therunning time depends on the input: an already

sorted sequence is easier to sort.

• Parameterize the running time by the size of the

input, since short sequences are easier to sort than
long ones.

• Generally, we seek upper bounds on the running

time, because everybody likes a guarantee.
Kinds of analyses

Worst-case: (usually)
• T(n) = maximum time of algorithm on any input of
size n.

Average-case: (sometimes)
• T(n) = expected time of algorithm over all inputs of
size n.
• Need assumption of statistical distribution of inputs.

Best-case:
• Cheat with a slow algorithm that works fast on some
input.
ALGORITHM ANALYSIS
An algorithm requires following 2 resources:

1.Memory Space: Space occupied by program

code and the associated data structures.

2.CPU Time: Time spent by the algorithm to

solve the problem.

Thus an Algorithm can be analyzed on two

accounts space and time.

56
Machine-Independent time

The RAM Model

Machine independent algorithm design depends on a

hypothetical computer called Random Acces Machine (RAM).

Assumptions:
• Each simple operation such as +, -, if ...etc takes exactly
one time step.
• Loops and subroutines are not considered simple
operations.
• Each memory acces takes exactly one time step.
Factors governing execution time of a program:

1. The speed of the computer system.

2. The structure of the program.
3. Quality of the compiler that has compiled the
program.
4. The current load on the computer system.

It is thus better we compare the

algorithm based on their relative time
instead of actual execution time.

58
59
Framework for Algorithm analysis
Basic Idea:
•To study mathematical model of computer
•Mentally Executes algorithm on computer model and
evaluate time and not on real computer.

60
Mathematical model of computer
RAM model(Random Access Machine)
It has
Processor+ memory
•Memory is a collection of locations
•has one processor
•executes one instruction at a time
•each instruction takes "unit time“

61
Instruction set of Processor
• One step execution
• Arithmetic and logical operations allowed
A=b+c(1 step instruction)
• Jump and conditional jumps
Goto
If(a>b)then goto
• Pointer instruction
B=*c
• Array operations
A[i]=b
B=c[i]

62
• A=b + c*c-u ….3 steps in RAM model
• A[i]=b[i]+c[i]
x=b[i]
Y=c[i]
z=X+y
A[I]=Z …4STEPS

63
Example
Algo: add(a[],b[])
{
for i=1 to n
C[i]=a[i]+b[i]
end for
}
•Lets us convert in RAM model and check the no of
instruction required.

64
Algo: add(a[],b[])
{
1.i=1
2.If(i>n) goto 9
3.x=a[i]
4.y=b[i]
5.z=x+y
6.c[i]=z
7.i=i+1
8.goto 2
9.}
total :1+n+1+b.n+2n=2+3n+bn=2+n(b+3)

65
Let us analyze matrix multiplication
algorithm
Algorithm matmul(n,n)
1.for i=1 to n
2. for j=1 to n
3. c[i][j]=0
4. for k=1 to n
5. c[i][j] = c[i][j]+a[i][k]*b[k][j]
6. end
7. end
8. end

66
Analyse this in RAM model
• Order of n3

67
Ideal Solution
• Express running time as a function of the input size n (i.e., f(n)).
• Compare different functions corresponding to running times.
• Such an analysis is independent of machine time, programming style,
etc.

68
Determine frequency count
probem1
1. i=1
2. While(i<=n){
3. x=x+1;
4. i=i+1; }

Probem2
1. for (L=1;L<=n;L++)
2. for(j=1;j<=L;j++)
3. x=x+1;

69
DETERMINE FREQUENCY COUNT
1. for (i=1;i<=n;i++)
2. for (j=1;j<=i;j++)
3. for (k=1;k<=n;k++)
4. X=X+1;
probem4
1. for (i=1;i<=n;i++)
2. for (j=1;j<=i;j++)
3. for (k=1;k<=j;k++)
4. X=X+1;
70
Asymptotic notations
⚫ Using this we are classifying functions into classes.
Three notations:
• Theta notation Θ
• Big 0 notation O
• Omega notation Ω

71
O-notation

• For a given function , we denote by the set

of functions

• We use O-notation to give an asymptotic upper bound of

a function, to within a constant factor.
• means that there existes some constant c
s.t. is always for large enough n.
Asymptotic notations

73
EXAMPLE
T(n)=10n3+5n+7 =O(n3).
Prove that if T(n) = 15n3 + n2 + 4, T(n) = O(n3).
Prove that if T(n) = 15n3 + n2 + 4, T(n) = O(n4).

74
Ω-Omega notation

• For a given function , we denote by the

set of functions

• We use Ω-notation to give an asymptotic lower bound on

a function, to within a constant factor.
• means that there exists some constant c s.t.
is always for large enough n.
76
• Prove that if T(n) = 15n3 + n2 + 4, T(n) = Ω (n3)

77
-Theta notation

• For a given function , we denote by the set

of functions

• A function belongs to the set if there exist

positive constants and such that it can be “sand-
wiched” between and or sufficienly large n.
• means that there exists some constant c1
and c2 s.t. for large enough n.
j

79
EXAMPLE

T(n)=10n3+5n2+17
10n3<=f1(n)<=(10+5+17)n3
C1=10 c2=32
For all n>=1 so n0=1
T(n)= Θ(n3)

Solve:T(n) = n3 + n log n =Θ(n3)

80
Asymptotic notation

Graphic examples of and .

Asymptotic Notation Properties

82
SPACE COMPLEXITY
The space complexity of a program is the memory needs to run to
completion.
The space requirement depends on
1)The fixed part that is independent of the characteristics of the input
and outputs. This part typically includes the instruction space,
space for simple variable ,space for constant.
2)The variable part: space needed for component variable whose size
is dependent on particular instance, the space needed by reference
variables, the recursion stack space.

S(P)=c + Sp(c is constant)

To analyze space complexity we will concentrate on Sp

83
Examples
Float Abc(float a,float b,float c)
{
Return a+b+b*c+(a+b-c)/(a+b);
}
Sp=0

84
Float sum(float *a,const int n)
{
Float s=0;
For(int i=0;i<n;i++)
{s=s+a[i];
Return s;
}
Sp=0

85
Float Rsum(float *a,const int n)
{
If(n=0) return 0;
Else return(Rsum(a,n-1)+a[n-1]))
}
Sp=4(n+1)

86
ALGORITHMIC STRATEGY
1.DIVIDE AND CONQUER
2.GREEDY STRATEGY

87
Divide-and-Conquer
• The strategy involves three steps at each level of
recursion.
• Divide:- Divide the problem into a number of sub
problems.
• Conquer:- Conquer the sub problems by solving them
recursively. If the sub problem sizes are small enough,
then just solve the sub problems in a straight forward
manner.
• Combine:- Combine the solutions to the sub
problems to form the solution for the original
problem.
88
Control abstraction for divide and conquer
strategy.
Algorithm DAndC (P)
{
1.if small(P) then return S(P) //termination condition
Else
{
2.Divide P into smaller instances P1, P2, P3… Pk k≥1;
3.Apply DAndC to each of these sub problems.
4.Return Combine(DAndC(P1), DAndC (P2), …DAndC (Pk)

}
}
89
computing time of DAndC
T(n) = g(n) when n is small
= T(n1)+ T(n1)+ T(n2)+…….+ T(nk)+ f(n) otherwise

T(n) denotes the time for DAndC on any input of size

‘n’.
g(n) is the time to compute the answer directly for
small inputs.
f(n) is the time for dividing ‘P’ and combining the
solutions of sub problems.

90
Binary Search: DIVIDE AND CONQUER STRATEGY

• Binary search algorithm:

• Is used for searching large lists
• Searches the element in very few comparisons
• Can be used only if the list to be searched is sorted

91
Implementing Binary Search (Contd.)