DATA STRUCTURES and Algorithms
DATA STRUCTURES and Algorithms
Data Structure is a way of collecting and organizing and storing data in a computer in such a
way that we can access and modify it efficiently.
Data Structures is about rendering data elements in terms of some relationship, for better
organization and storage. For example, we have some data which has, player's name "Virat"
and age 26. Here "Virat" is a String data type and 26 is an integer data type.
We can organize this data as a record like Player record, which will have both player's name
and age in it. Now we can collect and store player's records in a file or database as a data
structure. For example: "Dhoni" 30, "Gambhir" 31, "Sehwag" 33.
Qn. What is a difference between a class and a data structure?
In object oriented programming, a class is an extensible program code template for creating
objects, providing initial values for state (member variables) and implementations of behavior
(member functions or methods).
OR. A class is a part of a computer program that a programmer creates to represent a thing
in a way that a computer can understand.
Classes can only be written in the language called object oriented programming.
In simple language, Data Structures are structures programmed to store ordered data, so that
various operations can be performed on it easily. It represents the knowledge of data to be
organized in memory. It should be designed and implemented in such a way that it reduces
the complexity and increases the efficiency.
Tree: A tree refers to a non-linear data structure which organizes data in hierarchical
manner.
A tree data structure can be defined as a collection of data/nodes (starting at the root node)
which is organized in a hierarchical manner. Each node is a data structure consisting of value,
together with a list of references to nodes (the children), with the constraints that no
reference is duplicated.
In a tree data structure, every individual element is called a node. A node stores the actual
data of that particular element and link to the next element in the hierarchical structure.
Uses of a tree
- To store data that has an inherent hierarchical structure. Eg. An O/S may use a tree for
directories, files and folders in its file management system.
1
- Trees are dynamic, so they enable programmers/users to add and delete nodes.
- While compiling program codes, programmers use trees to process the syntax of
statements.
Tree vocabulary
2. Edge
In a tree data structure, the connecting link between any two nodes is called as EDGE. In a
tree with ' N ' number of nodes there will be a maximum of ' N-1 ' number of edges.
3. Parent
In a tree data structure, the node which is predecessor of any node is called as PARENT
NODE. In simple words, the node which has branch from it to any other node is called as
parent node. Parent node can also be defined as “The node which has child / children".
4. Child
In a tree data structure, the node which is descendant of any node is called as CHILD Node. In
simple words, the node which has a link from its parent node is called as child node. In a tree,
any parent node can have any number of child nodes. In a tree, all the nodes except root are
child nodes.
5. Siblings
2
In a tree data structure, nodes which belong to same Parent are called as SIBLINGS. In simple
words, the nodes with same parent are called as Sibling nodes.
6. Leaf
In a tree data structure, the node which does not have a child is called as LEAF Node. In
simple words, a leaf is a node with no child.
In a tree data structure, the leaf nodes are also called External Nodes. External node is also a
node with no child. In a tree, leaf node is also called Terminal ' node.
7. Internal Nodes
In a tree data structure, the node which has at least one child is called as INTERNAL Node. In
simple words, an internal node is a node with at least one child. Internal nodes are also called
as ' Non-Terminal ' nodes.
8. Degree
In a tree data structure, the total number of children of a node is called as DEGREE of that
Node. In simple words, the Degree of a node is total number of children it has. The highest
degree of a node among all the nodes in a tree is called as ' Degree of Tree'
9. Level
In a tree data structure, the root node is said to be at Level 0 and the children of root node
are at Level 1 and the children of the nodes which are at Level 1 will be at Level 2 and so on...
In simple words, in a tree each step from top to bottom is called as a Level and the Level
count starts with '0' and incremented by one at each level (Step).
10. Height
In a tree data structure, the total number of egdes from leaf node to a particular node in the
longest path is called as
HEIGHT of that Node. In a tree, height of the root node is said to be height of the tree. In a
tree, height of all leaf nodes is '0'.
3
11. Depth
In a tree data structure, the total number of edges from root node to a particular node is
called as DEPTH of that Node. In a tree, the total number of edges from root node to a leaf
node in the longest path is said to be Depth of the tree. In simple words, the highest depth of
any leaf node in a tree is said to be depth of that tree. In a tree, depth of the root node is '0'.
12. Path
In a tree data structure, the sequence of Nodes and Edges from one node to another node is
called as PATH between those two Nodes. Length of a Path is total number of nodes in that
path.
13. Sub Tree
In a tree data structure, each child from a node forms a sub tree recursively. Every child node
will form a sub tree on its parent node.
Graph
In computer science, a graph is an abstract data type that is meant to implement the
undirected graph and directed graph concepts from mathematics, specifically the field of
graph theory.
It can also be defined as a collection of nodes and edges that represents relationships. Nodes
are vertices that correspond to objects. Edges are the connections between objects.
Undirected graph is the one with edges which don’t have direction. The edges indicate a two
way relationship, in that each edge can be can be traversed ion both directions. The figure
below shows an undirected graph with a 3 nodes and 3 edges.
Directed graphs are the ones with edges with direction. The edges indicate a one way
relationship, in that each edge can only be traversed in a single direction. See the figures
below.
4
A graph data structure consists of a finite (and possibly mutable) set of vertices or nodes or
points, together with a set of unordered pairs of these vertices for an undirected graph or a
set of ordered pairs for a directed graph. These pairs are known as edges, arcs, or lines for an
undirected graph and as arrows, directed edges, directed arcs, or directed lines for a directed
graph.
Stack
A stack is an abstract data type that serves as a collection of elements, with two principal
operations:
Push, which adds an element to the collection, and pop, which removes the most recently
added element that was not yet removed.
The order in which elements come off a stack gives rise to its alternative name, LIFO (last in,
first out). Additionally, a peek operation may give access to the top without modifying the
stack. The name "stack" for this type of structure comes from the analogy to a set of physical
items stacked on top of each other, which makes it easy to take an item off the top of the
stack, while getting to an item deeper in the stack may require taking off multiple other items
first.
Considered as a linear data structure, or more abstractly a sequential collection, the push and
pop operations occur only at one end of the structure, referred to as the top of the stack. This
makes it possible to implement a stack as a singly linked list and a pointer to the top element.
A stack may be implemented to have a bounded capacity. If the stack is full and does not
contain enough space to accept an entity to be pushed, the stack is then considered to be in
an overflow state. The pop operation removes an item from the top of the stack.
Queue
A queue is a first-in first-out (FIFO) abstract data type that is heavily used in computing.
Uses for queues involve anything where you want things to happen in the order that they
were called, but where the computer can't keep up to speed. For example:
Keyboard Buffer - you want the letters to appear on the screen in the order you press them.
You might notice that when your computer is busy the keys you press don't appear on the
screen until a little while after you press them. When they do appear they appear in the order
you press them.
Queues provide services in computer science, transport, and operations research where
various entities such as data, objects, persons, or events are stored and held to be processed
later. In these contexts, the queue performs the function of a buffer.
Stack is a collection of objects that works in LIFO (Last in First out) mechanism while Queue is
FIFO (First in First out). This means that the object that is inserted first is removed last in a
stack while an object that is inserted first is removed first in a queue.
TYPS OF QUEUES
5
Linear: In this queue form, elements are able to join the queue at one end and can
exit from the queue at the other end. First In First Out (FIFO).
A real life example of this would be queuing to pay at the supermarket. The first person into
the queue is the first person to be served. And if you join the queue last you are going to be
served last (no pushing!). However, there is one difference, in a supermarket queue once the
first person has been served the rest of the queue shuffles forward, making the previously
second person now first, this is possible in a computer but very costly in terms of processing
and slow.
Circular
Circular queue is a linear data structure in which the operations are performed based on FIFO
principle and the last position is connected to first position to make a circle. It is also called
Ring buffer. ircular queues are very common in computer science and they are used to store
things.
- They reuse memory space, meaning they won't overwrite data or run out of memory
(within limits) and drawbacks.
- Involve storing pointers as well as data, taking up more space than linear queues.
Priority
This is based on the order of priority of the elements in the queue. In other words, each
element of this type of queue has different level of priority attached to it; e.g., high priority or
low priority. A&E is an example of a priority queue.
In a computer we use priority queues to handle the execution of processes. Each process will
have a priority attached to it, with the more important processes jumping ahead of the less
important for the processor's attention when they require it. For example: Process Priority;
Web Browser Normal, Media Player Below Normal, OS Security High, Scheduled Virus Scanner
Low,etc
Array
In computer science, an array data structure, or simply an array, is a data structure consisting
of a collection of elements (values or variables), each identified by at least one array index or
key. An array is stored so that the position of each element can be computed from its index
tuple by a mathematical formula. [The simplest type of data structure is a linear array, also
called one-dimensional array.
The memory address of the first element of an array is called first address or foundation
address.
Uses of arrays.
Arrays are often used to implement tables, especially lookup tables; the word table is
sometimes used as a synonym of array.
6
They are also used to implement many other data structures, such as lists and strings,
mathematical vectors and matrices..
One or more large arrays are sometimes used to emulate in-program dynamic memory
allocation , particularly memory pool allocation
Record
A record is a collection of fields, possibly of different data types, typically in fixed number and
sequence. The fields of a record may also be called members, particularly in object-oriented
programming; fields may also be called elements, though these risk confusion with the
elements of a collection.
For example, a date could be stored as a record containing a numeric year field, a month field
represented as a string, and a numeric day-of-month field.
Records are distinguished from arrays by the fact that their number of fields is typically fixed,
each field has a name, and that each field may have a different type.
A record type is a data type that describes such values and variables.
Records can exist in any storage medium, including main memory and mass storage devices
such as magnetic tapes or hard disks. Records are a fundamental component of most data
structures, especially linked data structures. Many computer files are organized as arrays of
logical records, often grouped into larger physical records or blocks for efficiency.
Keys
A record may have zero or more keys. A key is a field or set of fields in the record that serves
as an identifier. A unique key is often called the primary key, or simply the record key. For
example an employee file might contain employee number, name, department, and salary.
The employee number will be unique in the organization and would be the primary key.
Depending on the storage medium and file organization the employee number might be
indexed —that is also stored in a separate file to make lookup faster.
Characteristic Description
Linear In Linear data structures, the data items are arranged in a linear
sequence. Example: Array
Non-Linear In Non-Linear data structures, the data items are not in sequence.
Example: Tree, Graph
Homogeneous In homogeneous data structures, all the elements are of same type.
Example: Array
Non-Homogeneous In Non-Homogeneous data structure, the elements may or may not be of
the same type. Example: Structures
Static Static data structures are those whose sizes and structures associated
memory locations are fixed, at compile time. Example: Array
7
Dynamic Dynamic structures are those which expand or shrink depending upon the
program need and its execution. Also, their associated memory locations
changes. Example: Linked List created using pointers
ALGORITHMS
Define an algorithm. What are the properties of an algorithm? What are the types of
algorithms?
Algorithm is not the complete code or program, it is just the core logic(solution) of a problem,
which can be expressed either as an informal high level description as pseudo code or using a
flowchart.
Every Algorithm must satisfy the following properties:
Input- There should be 0 or more inputs supplied externally to the algorithm.
Output- There should be at least 1 output obtained.
Definiteness- Every step of the algorithm should be clear and well defined.
Finiteness- The algorithm should have finite number of steps.
Correctness- Every step of the algorithm must generate a correct output.
An algorithm is said to be efficient and fast, if it takes less time to execute and consumes less
memory space. The performance of an algorithm is measured on the basis of following
properties:
1. Time Complexity 2. Space Complexity
1. Space Complexity
It’s the amount of memory space required by the algorithm, during the course of its
execution. Space complexity must be taken seriously for multi-user systems and in situations
where limited memory is available.
8
3. Ω Notation: just as the Big O notation provides an upper bound of on a function, Ω (omega)
notation defines a lower bound of an algorithm and it bounds the function only from below.
Properties of an algorithm:-
Recursive algorithm is a method of simplification that divides the problem into sub-problems
of the same nature. The result of one recursion is the input for the next recursion. The
repletion is in the self-similar fashion. The algorithm calls itself with smaller input values and
obtains the results by simply performing the operations on these smaller values. This
parameter is the input while the return value is the output. Generation of factorial,
Fibonacci number series are the examples of recursive algorithms.
RECURSION
Recursion is an approach in which a function calls itself with an argument. Upon reaching a
termination condition, the control returns to the calling function.
Qn. Explain the terms Base case, Recursive case, Run-Time Stack and Tail
Recursion.
Base case: - In this case, the output is known or when using recursion, the termination
condition which restarts the function is called as base case.
Binding time:-
Run-Time Stack: - Run Time stack contains return address, local variables and return value
if any of a recursive function call.
Tail Recursion: - Tail recursion consists of one recursive call with the last statement to be
executed. To find factorial of a given number is an example of tail recursion.
The process of attempting for solving a problem which finds successive approximations for
solution, starting from an initial guess. The result of repeated calculations is a sequence of
9
approximate values for the quantities of interest. An iterative algorithm executes steps in
iterations. It aims to find successive approximation in sequence to reach a solution. They are
most commonly used in linear programs where large numbers of variables are involved.
INTRODUCTION TO SORTING
Sorting is nothing but arranging the data in ascending or descending order. The term sorting
came into picture, as humans realized the importance of searching quickly. Sorting makes it
easier for everyone to arrange data in an order, hence making it easier to search. Sorting is
implemented to test 2 main goals.
There are many different techniques available for sorting, differentiated by their efficiency
and space requirements. Following are some sorting techniques.
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Quick Sort
5. Merge Sort
6. Heap Sort
Qn. Explain quick sort and merge sort algorithms.
Quick sort – Quick sort employs the ‘divide and conquer’ concept by dividing the list of
elements into two sub elements. On dividing, the quick sort procedure is recursively called to
sort the two halves. A “pivot” is used as the center point and elements less than the pivot are
moved to the left or before the pivot and elements greater than pivot are moved to the right.
Merge sort- Merge sort is based on divide and conquer mechanism. The array elements are
divided into partitions (n/2). Each partition is sorted recursively and then merged.
Bubble Sort is a simple algorithm which is used to sort a given set of n elements provided in
form of an array with n number of elements. Bubble Sort compares all the element one by
one and sort them based on their values.
If the given array has to be sorted in ascending order, then bubble sort will start by
comparing the first element of the array with the second element, if the first element is
greater than the second element, it will swap both the elements, and then move on to
compare the second and the third element, and so on.
It is known as bubble sort , because with every complete iteration the largest element in the
given array, bubbles up towards the last place or the highest index, just like a water bubble
rises up to the water surface.
Sorting takes place by stepping through all the elements one-by-one and comparing it with
the adjacent element and swapping them if required.
Following are the steps involved in bubble sort (for sorting a given array in ascending order):
1. Starting with the first element (index = 0), compare the current element with the next
element of the array.
2. If the current element is greater than the next element of the array, swap them.
10
3. If the current element is less than the next element, move to the next element. Repeat
Step 1.
Selection sort is conceptually the simplest sorting algorithm. This algorithm will first find the
smallest element in the array and swap it with the element in the first position, then it will
find the second smallest element and swap it with the element in the second position, and it
will keep on doing this until the entire array is sorted.
It is called selection sort because it repeatedly selects the next-smallest element and swaps it
into the right place.
Variables
In programming, a variable is a value that can change, depending on conditions or on
information passed to the program. Typically, a program consists of instruction(s) that tell the
computer what to do and data that the program uses when it is running. The data consists of
constants or fixed values that never change and variable values (which are usually initialized
to "0" or some default value because the actual values will be supplied by a program's user).
Usually, both constants and variables are defined as certain data types. Each data type
prescribes and limits the form of the data. Examples of data types include: an integer
expressed as a decimal number, or a string of text characters, usually limited in length.
In object-oriented programming, each object contains the data variables of the class it is an
instance of. The object's method(s) are designed to handle the actual values that are supplied
to the object when the object is being used.
Expression
An expression in a programming language is a combination of one or more clearly observable
values, constants, variables, operators, and functions that the programming language
interprets (according to its particular rules of precedence and of association) and computes to
produce ("to return") another value. This process, as for mathematical expressions, is called
evaluation.
In simple settings, the resulting value is usually one of various primitive types, such as
numerical, string, and logical; in more elaborate settings, it can be an arbitrary complex data
type.
For example, 2+3 is an arithmetic and programming expression which evaluates to 5. A
variable is an expression because it denotes a value in memory, so y+6 is an expression. An
example of a relational expression is 4≠4, which evaluates to false.
ASSIGNMENTS
An assignment is a statement in computer programming that is used to set a value to a
variable name. The operator used to do assignment is denoted with an equal sign (=). This
operand works by assigning the value on the right-hand side of the operand to the operand
on the left-hand side. It is possible for the same variable to hold different values at different
instants of time.
A simple assignment statement could be:
int x = 5
This means that x can only take integer values, and currently a value of 5 is assigned to the
variable x.
11
A single variable can hold various values at different times based on its life span and scope.
When a variable which is already assigned a value is assigned another value, the new
assignment value overwrites the previously assigned value. The assignment statement also
varies from programming language to programming language.
TYPE
In computer science and computer programming, a data type or simply type is a classification
of data which tells the compiler or interpreter how the programmer intends to use the data.
Most programming languages support various types of data, for example: real, integer or
Boolean. A data type provides a set of values from which an expression (i.e. variable,
function...) may take its values. This data type defines the operations that can be done on the
data, the meaning of the data, and the way values of that type can be stored.
In the linguistics of natural and computer languages, the terms syntax, semantics and
pragmatics are used to categorize descriptions of language characteristics.
12
Syntax
The syntax of a language describes the structure and composition of allowable phrases and
sentences of the language.
Semantics
The meaning of these syntactic elements must be provided through semantics. In essence,
we may think of providing syntactic elements as inputs to a semantic function, which in turn
provides some representation of the meaning of the elements as output.
Pragmatics
Pragmatics is the third general area of language description, referring to practical aspects of
how constructs and features of a language may be used to achieve various objectives.
Consider, for example, the syntax, semantics and pragmatics of an assignment statement. As
a syntactic construct, an assignment statement may consist of a variable and an expression
(themselves syntactic constructs), separated by the token as an assignment operator.
Exception
An exception is a signal that an error or other unusual condition has occurred. There are a
number of built-in exceptions, which indicate conditions like reading past the end of a file, or
dividing by zero. You can also define your own exceptions.
Visual-based programming language (VPL) is any programming language that lets users
create programs by manipulating program elements graphically rather than by specifying
them textually.
VPLs are also known as dataflow languages or diagrammatic languages. They allow
programming with visual expressions, spatial arrangements of text and graphic symbols are
used either elements of syntax or secondary notations.
They are on the idea of boxes and arrows. Boxes represent entities and arrows represent
relations.
Text-based programming languages are the ones which have you write lines and lines of code
to make a program rather than graphics and sound.
A Programming error refers to an error resulting from bad code in some program involved in
producing the erroneous result
1. Runtime errors. These are errors that occur when the program is running. There are
many errors that follow under these type of errors.
a) logical error which produces wrong output like miscalculation
2. Compile time errors: these are errors which occur while the program is being compiled
for example;
a) Syntax errors: errors due to the fact that the syntax of the language is not respected.
b) Semantic errors: errors due to an improper use of program statements.
Programming Warnings
13
Like an error, the warning function alerts the user of unexpected conditions detected when
running a program. However, warning does not halt the execution of the program. It displays
the specified warning message and then continues.
Reporting a Warning
Use warning in your code to generate a warning message during execution. Specify the
message string as the input argument to warning. For example, warning ('Input must be a
string')
Warnings also differ from errors in that you can disable any warnings that you don't want to
see. You do this by invoking warning with certain control parameters.
Many console applications such as command line interpreters are command line tools, but
numerous text-based user interface (TUI) programs also exist.
As the speed and ease-of-use of GUIs applications have improved over time, the use of
console applications has greatly diminished, but not disappeared. Some users simply prefer
console based applications, while some organizations still rely on existing console
applications to handle key data processing tasks.
SOURCE CODES
In computing, source code is any collection of computer instructions, possibly with comments,
written using a human-readable programming language, usually as plain text.
The source code of a program is specially designed to facilitate the work of computer
programmers, who specify the actions to be performed by a computer mostly by writing
source code. The source code is often transformed by an assembler or compiler into binary
machine code understood by the computer. The machine code might then be stored for
execution at a later time.
Alternatively, source code may be interpreted and thus immediately executed. Most
application software is distributed in a form that includes only executable files. If the source
code were included it would be useful to a user , programmer or a system administrator, any
of whom might wish to study or modify the program.
Purposes
Source code is primarily used as input to the process that produces an executable
program.
It is also used as a method of communicating algorithms between people.
14
Computer programmers often find it helpful to review existing source code to learn
about programming techniques. [
Porting software to other computer platforms is usually prohibitively difficult without
source code. Without the source code for a particular piece of software, portability is
generally computationally expensive.
Possible porting options include binary translation and emulation of the original platform.
Decompilation of an executable program can be used to generate source code, either in
assembly code or in a high-level language.
Programmers frequently adapt source code from one piece of software to use in other
projects, a concept known as software reusability.
15