UNIT-1 Notes
NOTION OF AN ALGORITHM
Algorithm
An algorithm is a step-by-step procedure that takes an input and solves a problem in a finite
amount of time to obtain the required output.
Characteristics of an algorithm:
Input: Zero / more quantities are externally supplied.
Output: At least one quantity is produced.
Definiteness: Each instruction is clear and unambiguous.
Finiteness: If the instructions of an algorithm are traced, then for all cases the algorithm must
terminate after a finite number of steps.
Efficiency: Every instruction must be very basic and must run in a short time.
Example:
The greatest common divisor(GCD) of two nonnegative integers m and n (not-both-zero),
denoted gcd(m, n), is defined as the largest integer that divides both m and n evenly, i.e., with a
remainder of zero.
Euclid’s algorithm is based on applying repeatedly the equality gcd(m, n) = gcd(n, m mod n),
where m mod n is the remainder of the division of m by n, until m mod n is equal to 0. Since gcd(m,
0) = m, the last value of m is also the greatest common divisor of the initial m and n.
gcd(60, 24) can be computed as follows:gcd(60, 24) = gcd(24, 12) = gcd(12, 0) = 12.
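The repeated application of gcd(m, n) = gcd(n, m mod n) can be sketched in Python (the function name `gcd` is illustrative):

```python
def gcd(m, n):
    """Euclid's algorithm: repeatedly apply gcd(m, n) = gcd(n, m mod n)."""
    while n != 0:
        m, n = n, m % n   # replace the pair (m, n) with (n, m mod n)
    return m              # when n reaches 0, gcd(m, 0) = m

print(gcd(60, 24))  # → 12
```

Tracing gcd(60, 24) in this sketch reproduces the chain above: (60, 24) → (24, 12) → (12, 0) → 12.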
a. Natural Language
It is very simple and easy to specify an algorithm using natural language. But such a
specification is often unclear and ambiguous, and we get only a brief, imprecise
description.
Example: An algorithm to perform addition of two numbers.
Step 1: Read the first number, say a.
Step 2: Read the second number, say b.
Step 3: Add the above two numbers and store the result in c.
Step 4: Display the value of c.
Such a specification creates difficulty while actually implementing it. Hence many programmers
prefer to specify an algorithm by means of pseudocode.
b. Pseudocode
Pseudocode is a mixture of a natural language and programming language constructs.
Pseudocode is usually more precise than natural language.
For the assignment operation the left arrow "←" is used; for comments, two slashes "//"; if
conditions and for and while loops are written as in a programming language.
c. Flowchart
A flowchart is a graphical representation of an algorithm using standard symbols (start/stop
states, input/output, processing, condition/decision, display of a result) connected by flow lines.
FIGURE 1.4 Flowchart symbols and example for two-integer addition.
Once an algorithm has been specified then its correctness must be proved.
An algorithm must yield the required result for every legitimate input in a finite amount of
time.
For example, the correctness of Euclid’s algorithm for computing the greatest common
divisor stems from the correctness of the equality gcd(m, n) = gcd(n, m mod n).
A common technique for proving correctness is to use mathematical induction because an
algorithm’s iterations provide a natural sequence of steps needed for such proofs.
The notion of correctness for approximation algorithms is less straightforward than it is for
exact algorithms. The error produced by the algorithm should not exceed a predefined
limit.
(ii) Searching
The searching problem deals with finding a given value, called a search key, in a given set.
E.g., Ordinary Linear search and fast binary search.
A data structure can be defined as a particular scheme of organizing related data items. The nature of
the data items is dictated by a problem at hand; they can range from elementary data types (e.g.,
integers or characters) to data structures (e.g., a one-dimensional array of one-dimensional arrays is
often used for implementing matrices).
Each and every element of an array can be accessed in the same constant amount of time regardless of
where in the array the element in question is located. This feature positively distinguishes arrays from
linked lists . It is also assumed that every element of an array occupies the same amount of computer
storage.
Arrays are used for implementing a variety of other data structures. Prominent among them is the
string, a sequence of characters from an alphabet terminated by a special character indicating the
string's end. Strings composed of zeros and ones are called binary strings or bit strings. Operations
we usually perform on strings differ from those we typically perform on other data types. They include
computing the string length, comparing two strings to determine which one precedes the other
according to the so-called lexicographic order, i.e., as in a dictionary, and concatenating two strings.
A linked list is a sequence of zero or more elements called nodes each containing two kinds of
information: some data and one or more links called pointers to other nodes of the linked list. (A
special pointer called "null" is used to indicate the absence of a node's successor.) In a singly linked
list, each node except the last one contains a single pointer to the next element.
To access a particular node of a linked list, we start with the list's first node and traverse the pointer
chain until the particular node is reached. Thus, the time needed to access an element of a singly
linked list, unlike that of an array, depends on where in the list the element is located. On the positive
side, linked lists do not require any preliminary reservation of the computer memory, and insertions
and deletions can be made quite efficiently in a linked list by reconnecting a few appropriate pointers.
Another extension is the structure called the doubly linked list, in which every node, except the
first and the last, contains pointers to both its successor and its predecessor.
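A singly linked list with sequential access can be sketched in Python (the class and function names here are illustrative):

```python
class Node:
    """One node of a singly linked list: some data plus a pointer to the next node."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next   # None plays the role of the "null" pointer

def traverse(head):
    """Access the nodes by following the pointer chain from the list's first node."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.next
    return items

# Insertion at the front only reconnects one pointer -- no shifting of elements,
# and no preliminary memory reservation is needed.
head = Node(2, Node(3))
head = Node(1, head)      # insert 1 before the old first node
print(traverse(head))     # → [1, 2, 3]
```

Note how accessing the last node requires walking the whole chain, unlike the constant-time access of an array.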
Two special types of lists, stacks and queues, are particularly important. A stack is a list in which
insertions and deletions can be done only at the end. This end is called the top because a stack is
usually visualized not horizontally but vertically. As a result, when elements are added to (pushed
onto) a stack and deleted from (popped off) it, the structure operates in the "last-in-first-out" (LIFO)
fashion, exactly as the stack of plates does if we can remove only the top plate or add another plate to
top of the stack. Stacks have a multitude of applications; in particular, they are indispensable for
implementing recursive algorithms.
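The LIFO behaviour can be illustrated with a Python list whose end serves as the top of the stack:

```python
stack = []            # a Python list used as a stack; the end of the list is the top
stack.append('a')     # push 'a' onto the stack
stack.append('b')     # push 'b' onto the stack
print(stack.pop())    # → 'b'  (last in, first out)
print(stack.pop())    # → 'a'
```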
A queue, on the other hand, is a list from which elements are deleted from one end of the structure,
called the front (this operation is called dequeue), and new elements are added to the other end,
called the rear (this operation is called enqueue). Consequently, a queue operates in the "first -in-
first-out" (FIFO) fashion. Queues also have many important applications, including several
algorithms for graph problems.
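The FIFO behaviour can be illustrated with `collections.deque`, which supports efficient operations at both ends:

```python
from collections import deque

queue = deque()
queue.append('a')        # enqueue at the rear
queue.append('b')        # enqueue at the rear
print(queue.popleft())   # → 'a'  (first in, first out: dequeue from the front)
```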
A priority queue is a collection of data items from a totally ordered universe (most often,
integers or real numbers). The principal operations on a priority queue are finding its largest element, deleting its largest
element, and adding a new element. Of course, a priority queue must be implemented so that the last
two operations yield another priority queue. Straightforward implementations of this data structure
can be based on either an array or a sorted array, but neither of these options yields the most efficient
solution possible. A better implementation of a priority queue is based on an ingenious data structure
called the heap.
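A heap-based priority queue can be sketched with Python's `heapq` module. Note the assumption made explicit in the comments: `heapq` implements a min-heap, so priorities are negated to obtain the "largest element first" behaviour described above.

```python
import heapq

# heapq is a min-heap; negating priorities turns it into a max-priority queue.
pq = []
for priority in [3, 1, 4, 1, 5]:
    heapq.heappush(pq, -priority)   # add a new element

largest = -pq[0]                    # find the largest element without removing it
print(largest)                      # → 5
print(-heapq.heappop(pq))           # → 5  (delete the largest element)
print(-heapq.heappop(pq))           # → 4  (what remains is still a priority queue)
```

Both finding and deleting the largest element take logarithmic time in a heap, which is what makes this implementation more efficient than an unsorted or sorted array.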
Graphs
A graph is a collection of points in the plane called "vertices" or "nodes," some of them connected
by line segments called "edges" or "arcs." Formally, a graph G = (V, E) is defined by a pair of two
sets: a finite set V of items called vertices and a set E of pairs of these items called edges. If these
pairs of vertices are unordered, i.e., a pair of vertices (u, v) is the same as the pair (v, u), we say that
the vertices u and v are adjacent to each other and that they are connected by the undirected edge (u,
v). We call the vertices u and v endpoints of the edge (u, v) and say that u and v are incident to this
edge; we also say that the edge (u, v) is incident to its endpoints u and v. A graph G is called
undirected if every edge in it is undirected.
If a pair of vertices (u, v) is not the same as the pair (v, u), we say that the edge (u, v) is directed from
the vertex u, called the edge's tail, to the vertex v, called the edge's head. We also say that the edge (u,
v) leaves u and enters v. A graph whose every edge is directed is called directed. Directed graphs are
also called digraphs.
It is normally convenient to label vertices of a graph or a digraph with letters, integer numbers, or, if
an application calls for it, character strings.
Graph representations
Graphs for computer algorithms can be represented in two principal ways: the adjacency matrix and
adjacency lists. The adjacency matrix of a graph with n vertices is an n-by-n boolean matrix with one
row and one column for each of the graph's vertices, in which the element in the ith row and the jth
column is equal to 1 if there is an edge from the ith vertex to the jth vertex, and equal to 0 if there is no
such edge.
The adjacency list representation of a graph or a digraph is a collection of linked lists, one for each vertex, that
contain all the vertices adjacent to the list's vertex (i.e., all the vertices connected to it by an edge).
Usually, such lists start with a header identifying a vertex for which the list is compiled.
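Both representations can be sketched in Python for a small undirected graph (the vertex names and edge set here are made up for illustration):

```python
# Undirected graph with vertices a, b, c, d and edges (a,b), (a,c), (b,c), (c,d).
vertices = ['a', 'b', 'c', 'd']
edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd')]
index = {v: i for i, v in enumerate(vertices)}
n = len(vertices)

# Adjacency matrix: n-by-n boolean matrix, element [i][j] = 1 iff edge i-j exists.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1   # undirected: the matrix is symmetric

# Adjacency lists: one list per vertex holding all vertices adjacent to it.
adj = {v: [] for v in vertices}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[index['a']][index['b']])  # → 1 (edge a-b exists)
print(adj['c'])                        # → ['a', 'b', 'd']
```

The matrix answers "is there an edge u-v?" in constant time but uses n² space; the lists use space proportional to the number of edges.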
Weighted graphs
A weighted graph (or weighted digraph) is a graph (or digraph) with numbers assigned to its edges.
These numbers are called weights or costs. An interest in such graphs is motivated by numerous real-
life applications, such as finding the shortest path between two points in a transportation or
communication network or the traveling salesman problem mentioned earlier.
If a weighted graph is represented by its adjacency matrix, then its element A[i, j] will simply
contain the weight of the edge from the ith to the jth vertex if there is such an edge and a special
symbol, e.g., ∞, if there is no such edge. Such a matrix is called the weight matrix or cost matrix.
Adjacency lists for a weighted graph have to include in their nodes not only the name of an adjacent
vertex but also the weight of the corresponding edge.
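A weight matrix can be sketched as follows; `float('inf')` plays the role of the special symbol ∞, and the zero diagonal is a common convention assumed here, not stated in the notes:

```python
INF = float('inf')   # stands for the special symbol ∞ (no such edge)

# Weighted undirected graph on vertices 0, 1, 2 with edges 0-1 (weight 5), 1-2 (weight 7).
n = 3
W = [[INF] * n for _ in range(n)]
for i in range(n):
    W[i][i] = 0      # convention: zero weight from a vertex to itself
for u, v, w in [(0, 1, 5), (1, 2, 7)]:
    W[u][v] = w
    W[v][u] = w      # undirected: the weight matrix is symmetric

print(W[0][1])   # → 5
print(W[0][2])   # → inf  (no edge between vertices 0 and 2)
```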
A cycle is a path of a positive length that starts and ends at the same vertex and does not traverse the
same edge more than once.
Analysis Framework
There are two kinds of efficiency to analyze for any algorithm. They are:
Time efficiency, indicating how fast the algorithm runs, and
Space efficiency, indicating how much extra memory it uses.
The algorithm analysis framework consists of the following:
Measuring an Input’s Size
Units for Measuring Running Time
Orders of Growth
Worst-Case, Best-Case, and Average-Case Efficiencies
TABLE 1.1 Values (approximate) of several functions important for analysis of algorithms

n      √n        log2 n   n      n log2 n   n^2       n^3       2^n       n!
1      1         0        1      0          1         1         2         1
2      1.4       1        2      2          4         8         4         2
4      2         2        4      8          16        64        16        24
8      2.8       3        8      2.4•10^1   64        5.1•10^2  2.6•10^2  4.0•10^4
10     3.2       3.3      10     3.3•10^1   10^2      10^3      10^3      3.6•10^6
16     4         4        16     6.4•10^1   2.6•10^2  4.1•10^3  6.5•10^4  2.1•10^13
10^2   10        6.6      10^2   6.6•10^2   10^4      10^6      1.3•10^30 9.3•10^157
10^3   31        10       10^3   1.0•10^4   10^6      10^9      ┐
10^4   10^2      13       10^4   1.3•10^5   10^8      10^12     │ Very big
10^5   3.2•10^2  17       10^5   1.7•10^6   10^10     10^15     │ computation
10^6   10^3      20       10^6   2.0•10^7   10^12     10^18     ┘
In the worst case, there is no matching element, or the first matching element is the last one
on the list. In the best case, the first element of the list matches the search key.
Worst-case efficiency
The worst-case efficiency of an algorithm is its efficiency for the worst-case input of size n,
for which the algorithm runs the longest among all possible inputs of that size.
For sequential search on an input of size n, the worst-case running time is Cworst(n) = n.
Yet another type of efficiency is called amortized efficiency. It applies not to a single run of
an algorithm but rather to a sequence of operations performed on the same data structure.
Asymptotic notation is a notation used to make meaningful statements about the
efficiency of a program.
The efficiency analysis framework concentrates on the order of growth of an algorithm’s
basic operation count as the principal indicator of the algorithm’s efficiency.
To compare and rank such orders of growth, computer scientists use three notations, they
are:
O - Big oh notation
Ω - Big omega notation
Θ - Big theta notation
Let t(n) and g(n) be any nonnegative functions defined on the set of natural numbers.
The algorithm's running time t(n) is usually indicated by its basic operation count C(n), and g(n)
is some simple function to compare with the count.
(i) O - Big oh notation
A function t(n) is said to be in O(g(n)), denoted t(n) ∈ O(g(n)), if t(n) is bounded above by
some constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some
nonnegative integer n0 such that
t(n) ≤ cg(n) for all n ≥ n0.
Where t(n) and g(n) are nonnegative functions defined on the set of natural numbers.
O = Asymptotic upper bound = Useful for worst case analysis = Loose bound
Example 1: 100n + 5 ∈ O(n2), since 100n + 5 ≤ 105n ≤ 105n2 for all n ≥ 1 (c = 105, n0 = 1).
FIGURE 1.5 Big-oh notation: t(n) ∈ O(g(n)).
(ii) Ω - Big omega notation
A function t(n) is said to be in Ω(g(n)), denoted t(n) ∈ Ω(g(n)), if t(n) is bounded below by
some positive constant multiple of g(n) for all large n, i.e., if there exist some positive constant c
and some nonnegative integer n0 such that
t(n) ≥ cg(n) for all n ≥ n0.
Where t(n) and g(n) are nonnegative functions defined on the set of natural numbers.
Ω = Asymptotic lower bound = Useful for best case analysis = Loose bound
FIGURE 1.6 Big-omega notation: t(n) ∈ Ω(g(n)).
(iii) Θ - Big theta notation
A function t(n) is said to be in Θ(g(n)), denoted t(n) ∈ Θ(g(n)), if t(n) is bounded both
above and below by some positive constant multiples of g(n) for all large n, i.e., if there exist some
positive constants c1 and c2 and some nonnegative integer n0 such that
c2g(n) ≤ t(n) ≤ c1g(n) for all n ≥ n0.
Where t(n) and g(n) are nonnegative functions defined on the set of natural numbers.
Θ = Asymptotic tight bound = Useful for average case analysis
Example 5: Prove that n(n − 1)/2 ∈ Θ(n2).
Proof: First prove the right inequality (the upper bound):
n(n − 1)/2 = (1/2)n2 − (1/2)n ≤ (1/2)n2 for all n ≥ 0.
Second, prove the left inequality (the lower bound):
n(n − 1)/2 = (1/2)n2 − (1/2)n ≥ (1/2)n2 − (1/2)n • (1/2)n = (1/4)n2 for all n ≥ 2.
Hence,
(1/4)n2 ≤ n(n − 1)/2 ≤ (1/2)n2 for all n ≥ 2,
i.e., n(n − 1)/2 ∈ Θ(n2), where c2 = 1/4, c1 = 1/2, and n0 = 2.
Note: asymptotic notations can be thought of as "relational operators" for comparing orders of
growth of functions, similar to the relational operators for numbers: O plays the role of ≤, Ω of ≥,
and Θ of =.
THEOREM: If t1(n) ∈ O(g1(n)) and t2(n) ∈ O(g2(n)), then t1(n) + t2(n) ∈ O(max{g1(n), g2(n)}).
(The analogous assertions are true for the Ω and Θ notations as well.)
PROOF: The proof extends to orders of growth the following simple fact about four arbitrary real
numbers a1, b1, a2, b2: if a1 ≤ b1 and a2 ≤ b2, then a1 + a2 ≤ 2 max{b1, b2}.
Since t1(n) ∈ O(g1(n)), there exist some positive constant c1 and some nonnegative integer
n1 such that t1(n) ≤ c1g1(n) for all n ≥ n1. Similarly, since t2(n) ∈ O(g2(n)), there exist c2 and n2
such that t2(n) ≤ c2g2(n) for all n ≥ n2. Let c3 = max{c1, c2} and consider n ≥ max{n1, n2} so that
both inequalities hold. Then
t1(n) + t2(n) ≤ c1g1(n) + c2g2(n) ≤ c3[g1(n) + g2(n)] ≤ 2c3 max{g1(n), g2(n)}.
Hence, t1(n) + t2(n) ∈ O(max{g1(n), g2(n)}), with the constants c and n0 required by the
definition of O being 2c3 = 2 max{c1, c2} and max{n1, n2}, respectively.
The property implies that the algorithm's overall efficiency is determined by the part
with a higher order of growth, i.e., its least efficient part.
Summation formulas
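The notes leave this heading empty; the two standard identities used in the analyses below are:

```latex
\sum_{i=l}^{u} 1 = u - l + 1
\quad \text{(in particular, } \sum_{i=1}^{n-1} 1 = n - 1 \text{)},
\qquad
\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \approx \frac{1}{2}n^{2} \in \Theta(n^{2}).
```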
EXAMPLE 1: Compute the factorial function F(n) = n! for an arbitrary nonnegative integer n.
Since n! = 1 • . . . • (n − 1) • n = (n − 1)! • n for n ≥ 1, and 0! = 1 by definition, we can compute
F(n) = F(n − 1) • n with the following recursive algorithm. (ND 2015)
ALGORITHM F(n)
//Computes n! recursively
//Input: A nonnegative integer n
//Output: The value of n!
if n = 0 return 1
else return F(n − 1) * n
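The pseudocode above translates directly into Python:

```python
def F(n):
    """Computes n! recursively for a nonnegative integer n."""
    if n == 0:
        return 1          # 0! = 1 by definition
    return F(n - 1) * n   # F(n) = F(n - 1) * n: one multiplication per call

print(F(5))  # → 120
```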
Algorithm analysis
For simplicity, we consider n itself as the indicator of this algorithm's input size.
The basic operation of the algorithm is multiplication, whose number of executions we
denote M(n). Since the function F(n) is computed according to the formula F(n) = F(n −1)•n
for n > 0.
The number of multiplications M(n) needed to compute it must satisfy the equality
M(n) = M(n-1) + 1 for n > 0
M(n − 1) multiplications are spent to compute F(n − 1), and one more multiplication is
needed to multiply the result by n.
Recurrence relations
The last equation defines the sequence M(n) that we need to find. This equation defines
M(n) not explicitly, i.e., as a function of n, but implicitly as a function of its value at another point,
namely n − 1. Such equations are called recurrence relations or recurrences.
To solve the recurrence relation M(n) = M(n − 1) + 1, i.e., to find an explicit formula for
M(n), we also need an initial condition. Thus, the recurrence relation and initial condition for the
algorithm's number of multiplications M(n) are:
M(n) = M(n − 1) + 1 for n > 0,
M(0) = 0.
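Solving this recurrence by the standard method of backward substitutions gives an explicit formula:

```latex
M(n) = M(n-1) + 1
     = [M(n-2) + 1] + 1 = M(n-2) + 2
     = [M(n-3) + 1] + 2 = M(n-3) + 3
     \;\cdots\; = M(n-i) + i \;\cdots\;
     = M(0) + n = n.
```

So the algorithm performs exactly n multiplications to compute n!.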
EXAMPLE 1: Consider the problem of finding the value of the largest element in a list of n
numbers. Assume that the list is implemented as an array for simplicity.
ALGORITHM MaxElement(A[0..n − 1])
//Determines the value of the largest element in a given array
//Input: An array A[0..n − 1] of real numbers
//Output: The value of the largest element in A
maxval ←A[0]
for i ←1 to n − 1 do
if A[i]>maxval
maxval←A[i]
return maxval
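The pseudocode can be sketched in Python (the function name `max_element` is illustrative):

```python
def max_element(A):
    """Determines the value of the largest element in a nonempty array A."""
    maxval = A[0]
    for i in range(1, len(A)):   # i from 1 to n - 1
        if A[i] > maxval:        # the basic operation: one comparison per iteration
            maxval = A[i]
    return maxval

print(max_element([3, 9, 2, 7]))  # → 9
```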
Algorithm analysis
The measure of an input’s size here is the number of elements in the array, i.e., n.
There are two operations in the for loop’s body:
o The comparison A[i]> maxval and
o The assignment maxval←A[i].
The comparison operation is considered as the algorithm’s basic operation, because the
comparison is executed on each repetition of the loop and not the assignment.
The number of comparisons will be the same for all arrays of size n; therefore, there is no
need to distinguish among the worst, average, and best cases here.
Let C(n) denote the number of times this comparison is executed. The algorithm makes
one comparison on each execution of the loop, which is repeated for each value of the
loop’s variable i within the bounds 1 and n − 1, inclusive. Therefore, the sum for C(n) is
calculated as follows:
C(n) = Σ_{i=1}^{n−1} 1,
i.e., sum up 1 repeated n − 1 times:
C(n) = Σ_{i=1}^{n−1} 1 = n − 1 ∈ Θ(n).
EXAMPLE 2: Consider the element uniqueness problem: check whether all the Elements in a
given array of n elements are distinct.
ALGORITHM UniqueElements(A[0..n − 1])
//Determines whether all the elements in a given array are distinct
//Input: An array A[0..n − 1]
//Output: Returns “true” if all the elements in A are distinct and “false” otherwise
for i ←0 to n − 2 do
for j ←i + 1 to n − 1 do
if A[i]= A[j ] return false
return true
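The pseudocode can be sketched in Python (the function name `unique_elements` is illustrative):

```python
def unique_elements(A):
    """Returns True if all the elements in A are distinct and False otherwise."""
    n = len(A)
    for i in range(n - 1):          # i from 0 to n - 2
        for j in range(i + 1, n):   # j from i + 1 to n - 1
            if A[i] == A[j]:        # the basic operation: comparing two elements
                return False
    return True

print(unique_elements([1, 2, 3]))  # → True
print(unique_elements([1, 2, 1]))  # → False
```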
Algorithm analysis
The natural measure of the input’s size here is again n (the number of elements in the array).
Since the innermost loop contains a single operation (the comparison of two elements), we
should consider it as the algorithm’s basic operation.
The number of element comparisons depends not only on n but also on whether there are
equal elements in the array and, if there are, which array positions they occupy. We will
limit our investigation to the worst case only.
One comparison is made for each repetition of the innermost loop, i.e., for each value of the
loop variable j between its limits i + 1 and n − 1; this is repeated for each value of the outer
loop, i.e., for each value of the loop variable i between its limits 0 and n − 2.
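Putting these counts together, the worst-case number of comparisons works out to:

```latex
C_{worst}(n) = \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} 1
             = \sum_{i=0}^{n-2} (n-1-i)
             = (n-1) + (n-2) + \cdots + 1
             = \frac{(n-1)n}{2} \approx \frac{1}{2}n^{2} \in \Theta(n^{2}).
```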
EXAMPLE 3: Consider matrix multiplication. Given two n × n matrices A and B, find the time
efficiency of the definition-based algorithm for computing their product C = AB. By definition, C
is an n × n matrix whose elements are computed as the scalar (dot) products of the rows of matrix A
and the columns of matrix B:
where C[i, j ]= A[i, 0]B[0, j]+ . . . + A[i, k]B[k, j]+ . . . + A[i, n − 1]B[n − 1, j] for every pair of
indices 0 ≤ i, j ≤ n − 1.
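A minimal sketch of this definition-based algorithm in Python (the function name `matrix_product` is illustrative):

```python
def matrix_product(A, B):
    """Definition-based multiplication of two n-by-n matrices: C = AB."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):                 # dot product of row i of A and column j of B
                C[i][j] += A[i][k] * B[k][j]   # one multiplication per innermost iteration
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matrix_product(A, B))  # → [[19, 22], [43, 50]]
```

The three nested loops perform n • n • n = n³ multiplications in total, which is the count the efficiency analysis is based on.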