DATA AND FILE
STRUCTURES
CHAPTER 1:
INTRODUCTION TO
DATA STRUCTURES
(1)DATA MANAGEMENT
CONCEPTS
Till now we have learnt the basics of programming in C and know how to
write,debug and run simple programs in C Language.
Our Aim has been to design good programs where a good program is
defined as follow:
runs correctly
is easy to read and understand
is easy to debug
is easy to modify
A program is said to be efficient when it executes in minimum time and
with minimum memory space.
In order to write efficient programs we need to apply certain data
management concepts.
The concept of data management is a complex task that includes activities
like data collection,organization of data into appropriate structures and
developing and maintaining routines for quality.
DATA MANAGEMENT
CONCEPTS(cont.)
Data structure is a part of data management.
A data structure is basically a group of data
elements that are put together under one name , to
define a particular way of storing and organizing
data in a computer so that it can be used
efficiently.
Data structure are used in almost every program or
software.
Some example of data structures are:
Arrays
linked lists
queues
stacks
binary trees
hash tables
DATA MANAGEMENT
CONCEPTS(cont.)
Data structures are applied in areas like:
compiler design
operating system
DBMS
Graphics
Numerical Analysis
Artificial Intelligence
For Example:
In DBMS Following data structure are
used
(1) In RDBMS data structure used is arrays
(2) Network data model is graphs
(3)Hierarchical data model is trees
DATA MANAGEMENT
CONCEPTS(cont.)
When selecting a data structure to solve a problem
, the following steps must required.
(1)Analysis of the problem to determine
the basic operation that must be supported.
for example basic operation may include
inserting/deleting/searching a data item from
data structure.
(2)Quantify the resource for each
operation
(3) Select the data structure that best
meets these requirements.
(2)DATA TYPES(PRIMITIVE AND NON
PRIMITIVE)
Each programming language provides various data types.
The data types provided by a programming language are
known as primitive data types or in-built data types.
For example int,char,float are primitive data types in
C.
The data types that are derived from primitive data types
are known as non-primitive (derived) data types.
Example of Non-Primitive data types are
(Arrays,functions,pointers) in C.
In addition to primitive and non-primitive data types ,
programming language allows to defing new data types is
called user-defined data types.
For example structure,unions,enum are user defined
data types in C.
DATA TYPES(cont.)
(3)TYPES OF DATA
STRUCTURES
(1)ARRAYS
An array is a collection of similar data items(elements).
The elements of the array are stored in consecutive memory
locations and referenced by an index.
SYNTAX:
data type name[size] ;
EXAMPLE: int marks [8] ;
The above statement declares an array marks that contains 8
elements.In C , the array index starts from 0. the first element
will be stored in marks[0],second in marks[2], and soon. The 8h
element will be stored in marks[7].
In memory ,array will be stored as shown in
st
nd
rd
th
th
th
th
th
1 ele. 2
below
Marks[
0]
ele. 3
fig: ele.
4
ele.
5
ele.
6
ele.
7
ele.
8
ele.
Marks[1 Marks[ Marks[ Marks[ Marks[ Marks[ Marks[
]
2]
3]
4]
5]
6]
7]
TYPES OF DATA
STRUCTURES(cont.)
(1)ARRAYS(cont.)
Array have following limitation:
(1) Array are of fixed size
(2) Data elements are stored in continuous
memory location which may not be always
available.
(3)Adding and Removing of elements is
problematic because of shifting the elements
from their positions.
However ,these limitation can be solved by using
linked lists.
TYPES OF DATA
STRUCTURES(cont.)
(2)Linked Lists
Linked list is a very flexible ,dynamic data structure in which the
elements can be added to or deleted from anywhere.
In linked list, each element(is called node) is allocated space as it is
added to the list.
Every node in the list points to the next node in the list.
Therefore ,in linked list ,every node contains two types of information:
(1)The value of the node
(2)A Pointer or Link to the next node in the list.
The last node in the list contains a NULL pointer to indicate that it is the
end or tail of list.
The memory for a node is dynamically allocated when it is added to the
list.
The total number of nodes that may be added to the list is limited only by
the amount of memory available.
Fig show linked list 2
of 4 nodes
X
Advantage : provide quick insert and delete operation
Disadvantage: slow search operation and require more memory space.
TYPES OF DATA
STRUCTURES(cont.)
(3)STACKS
In the computers memory stack can be represented
as a linear array.
Every stack has a variable TOP associated with it.
TOP is used to store the address of the top most
element of the stack. it is this position from where
the element will be added or deleted.
There is another variable MAX , which will be used
to store the maximum number of elements that the
stack can store.
AB
ABC
ABCD that stack is empty
IfA TOP=NULL
then itABCD
indicates
E
and if TOP=MAX , then stack is full.
0
1
2
3
TOP=4 5
6
7
STACK REPRESENTION:
TYPES OF DATA
STRUCTURES(cont.)
(3)STACKS(cont.)
In fig. TOP=4 ,so insertion and deletion will be done at this
position.
Here ,the stack can store maximum of 8 elements where the
rang from 0-7.In above stack 3 more elements can still stored.
A stack has basic 3 operations:
(1)push(this operation adds elements to the top of the
stack)
(2)pop(this operation removes the element from top
of the stack)
(3)peep(returns the value of topmost element of the
stack)
Advantage: last in, fist out(LIFO) access.
Disadvantage: slow access to other elements.
TYPES OF DATA
STRUCTURES(cont.)
(4)QUEUE
A queue is FIFO(first-in first-out) data structure in which the elements
that was inserted first is the first to be taken out.
The element in a queue are added at one end called the rear and
removed from the other end called the front.
Consider queue given in below fig.(QUEUE)
1
2
18
14
36
Here ,front=0 and rear=5.if we want to add another element with the
value 45,then rear would be incremented by 1 and rear point the value
45.
See fig.(QUEUE AFTER INSERTION OF NEW ELEMENT)
12
18
14
36
45
Here front=0 and rear =6.everytime new element has to be added, we will re
TYPES OF DATA
STRUCTURES(cont.)
(4)QUEUE(cont.)
Now ,if we want to delete an element from the queue , then the value of
the front will be incremented.
Deletions are done only from this end of the queue.
See fig.(QUEUE AFTER DELETION OF AN ELEMENT)
18
14
36
45
front
rear
Advantage: Provide FIFO data access.
Disadvantage: Slow access to other items.
TYPES OF DATA
STRUCTURES(cont.)
(5)TREES
A binary tree is a data structure which is defined as a collection of
elements called the nodes.
Every node contains a left pointer,a right pointer,and a data
element.
Every binary tree has a root element pointed by a root pointer.
The root element is the topmost element in tree.if root=NULL ,then tree
is empty.`
TYPES OF DATA
STRUCTURES(cont.)
(5)TREES(cont.)
Figure 1.7 shows a binary tree. If the root node R is
not NULL, then the two trees T 1 and T 2 are called
theleft and right sub trees Of R. If r. is non-empty,
then Tl is said to be the left successor of R. Likewise,
if T 2 is non-empty, then it is called the right
successor of R.
In Fig. 1.7, node 2 is the left successor and node 3 is
the right successor of the root node 1. Note that the
left subtree of the root node consists of the nodes,
2,4, 5, 8, and 9. Similarly, the right subtree of the
root node consists ofthe nodes, 3, 6, 7, 10, 11, and
12.
TYPES OF DATA
(6)GRAPHS
STRUCTURES(cont.)
A graph is an abstract data structure that is
used to implement the graph concept from
mathematics.
It is basically a collection of vertices (also called
nodes) and edges that connect these vertices.
A graph is often viewed as a generalization of
the tree structure, where instead of a having a
purely parent-to-child relationship between tree
nodes, any kind of complex relationships
between the nodes can be represented.
In a tree structure, the nodes can have
manychildren but only one parent, a graph on
the otherhand relaxes all such kinds of
restrictions. Figure 1.8 shows a graph with five
nodes.
TYPES OF DATA
STRUCTURES(cont.)
(6)GRAPHS(cont.)
Every node in the graph may represent a city and
the edges connecting the nodes could represent
the roads.
that unlike trees, graphs do not have any root
node. Rather, every node in the graph can be
connected with another node in the graph. When
two nodes are connected via an edge, the two
nodes are known as neighbors. For example, in
Fig. 1.8, node A has two neighbors: Band D.
(4)LINEAR & NON LINEARDATA
STRUCTURE
If the elements of a data structure are stored
sequentially then it is a linear data structure. In
linear data structures, we can traverse either
forward or backward from a node. Examples
include arrays, stacks, queues, and linked
lists.
However, if the elements of a data structure are
not stored in a sequential order, then it is a
nonlinear data structure.
It branches to more than one node and cannot
be traversed in a single run.
Examples include trees and graphs.
(5)PERFORMANCE ANALYSIS AND
MEASURMENT(TIME AND SPACE ANALYSIS
OF ALGORITHMS)
To analyse an algorithm means determining the
amount of resources (such as time and storage)
needed to execute it.
The time complexity of an algorithm is basically
the running time of a program.
the space complexity of an algorithm is the
amount of computer memory that is required
during the program execution.
From time complexity of algorithm, we can
define Worst-case running time, Average-case
running time, Best-case running time of
algorithm.
(5)PERFORMANCE ANALYSIS AND
MEASURMENT(TIME AND SPACE ANALYSIS OF
ALGORITHMS) (Cont.)
Time-Space Trade-of
The best algorithm to solve a particular problem at hand
is, no doubt, the one that requires less memory space
and takes less time to complete its execution.
But practically, designing such an ideal algorithm is not a
easy task.
There can be more than one algorithm to solve a
particular problem. One may require less memory space,
while the other may require less CPU time to execute.
there exists a time space trade-of among algorithms.
So, if space is a big constraint, then one might choose a
program that takes less space at the cost of more CPU
time. On the contrary, if time is a major constraint, then
one might choose a program that takes minimum time to
execute at the cost of more space.
(6) BIG-OH NOTATION
In today's era of massive advancement in computer technology, we are
hardly concerned about the efficiency of algorithms.
If we have two diferent algorithms to solve the same problem where one algorithm
executes in 10 iterations and the other in 20 iterations, the diference between the two
algorithms is not much. However, if the first algorithm executes in 10 iterations and
the other in 1000 iterations, then it is a matter of concern.
We have seen that the number of statements executed in the function for n elements
of the data is a function of the number of elements, expressed as f (n ).
Even if the equation derived for a function may be complex, a dominant factor in the
equation is sufficient to determine the efficiency of the algorithm.
This factor is the Big- Oh, as in on the order of, and is expressed as
0 (n ).
The Big-Oh notation, where the 0 stands for 'order of', is concerned
with what happens for very large values of n.
For example, if a sorting algorithm performs n2 operations to sort
just n elements, then that algorithm would be described as an 0
(n2) algorithm.
When expressing complexity using the Big-Oh notation, constant
multipliers are ignored. So, an 0 (4n) algorithm is equivalent to 0
(n), which is how it should be written.
If f (n) and g (n) are the functions defined on a positive integer
number n, then f(n) = O(g(n)).
(6) BIG-OH NOTATION(Cont.)
That is, f of n is big-oh of g of n.
constant values will be ignored because the main
purpose of the Big-Oh notation is to analyse the
algorithm in a general fashion.
(6) BIG-OH NOTATION(Cont.)
Categories of Algorithms
According to the Big-Oh notation, we have five diferent categories of
algorithms:
Constant time algorithm; running time complexity given as 0 (1)
Linear time algorithm; running time complexity given as 0 (n)
Logarithmic time algorithm; running time complexity given as 0 (log
n)
Polynomial time algorithm; running time complexity given as 0 (nk)
where k > 1
Exponential time algorithm; running time complexity given as 0
( 2n)
(6) BIG-OH NOTATION(Cont.)
Hence, the Big-Oh notation provides a convenient way to
express the worst-case scenario for a given algorithm,
although it can also be used to express the average case.
The Big-Oh notation is derived from f (n) using the following
steps:
Step 1: Set the coefficient of each term to 1.
Step 2: Rank each term starting from the lowest to the highest.
Then, keep the largest term in the function and discard the
rest
Example: Calculate the Big-Oh notation for the
function.
f(n) = n (n + 1)/2
The function can be expanded as n2/2 + n/2
Step 1: Set the coefficient of each term to 1, so now we have
n2 + n.
Step 2: Keep the largest term and discard the rest, so discard n
and the Big-Oh notation can be given as O(F(n) = O(n2).
Limitations of Big-Oh Notation
(6)areBIG-OH
NOTATION(Cont.)
There
certain limitations
with the Big-Oh
notation of expressing the complexity of
algorithms.
These limitations include:
Many algorithms are simply too hard to analyse
mathematically.
There may not be sufficient information to calculate the
behaviour of the algorithm in the average case.
Big-Oh analysis only tells us how the algorithm grows with
the size of the problem, not how efficient it is, as it does not
consider the programming efort.
It ignores important constants. For example, if one algorithm
takes 0 (n2) time to execute and the other takes
O( 100000n2) time to execute, then as per Big-Oh, both
algorithm have equal time complexity. In real-time systems,
this may be a serious consideration.