Introduction to data structures:
A data structure is a specialized format for organizing, processing,
retrieving and storing data. There are several basic and advanced types of
data structures, all designed to arrange data to suit a specific purpose.
Data structures make it easy for users to access and work with the data
they need in appropriate ways. Most importantly, data structures frame the
organization of information so that machines and humans can better
understand it.
Data structures are the building blocks for more sophisticated
applications. They are designed by composing data elements into a
logical unit representing an abstract data type that has relevance to the
algorithm or application. An example of an abstract data type is a
"customer name" that is composed of the character strings for "first name,"
"middle name" and "last name."
data structures are used to implement the physical forms of abstract data
types. Data structures are a crucial part of designing efficient software.
They also play a critical role in algorithm design and how those algorithms
are used within computer programs.
Some examples of how data structures are used include the following:
Storing data. Data structures are used for efficient data persistence, such as
specifying the collection of attributes and corresponding structures used to
store records in a database management system.
Managing resources and services. Core operating system (OS) resources and
services are enabled through the use of data structures such as linked lists for
memory allocation, file directory management and file structure trees, as well
as process scheduling queues.
Data exchange. Data structures define the organization of information shared
between applications, such as TCP/IP packets.
Ordering and sorting. Data structures such as binary search trees -- also
known as an ordered or sorted binary tree -- provide efficient methods of
sorting objects, such as character strings used as tags. With data structures
such as priority queues, programmers can manage items organized according
to a specific priority.
Indexing. Even more sophisticated data structures such as B-trees are used to
index objects, such as those stored in a database.
Searching. Indexes created using binary search trees, B-trees or hash tables
speed the ability to find a specific sought-after item.
Scalability. Big data applications use data structures for allocating and
managing data storage across distributed storage locations, ensuring scalability
and performance. Certain big data programming environments -- such
as Apache Spark -- provide data structures that mirror the underlying structure
of database records to simplify querying.
Characteristics of data structures
Data structures are often classified by their characteristics. The following three
characteristics are examples:
1. Linear or non-linear. This characteristic describes whether the data items are
arranged in sequential order, such as with an array, or in an unordered
sequence, such as with a graph.
2. Homogeneous or heterogeneous. This characteristic describes whether all
data items in a given repository are of the same type. One example is a
collection of elements in an array, or of various types, such as an abstract data
type defined as a structure in C or a class specification in Java.
3. Static or dynamic. This characteristic describes how the data structures are
compiled. Static data structures have fixed sizes, structures and memory
locations at compile time. Dynamic data structures have sizes, structures and
memory locations that can shrink or expand, depending on the use.
Types of data structures
The data structure type used in a particular situation is determined by the
type of operations that will be required or the kinds of algorithms that will be
applied. The various data structure types include the following:
Array. An array stores a collection of items at adjoining memory
locations. Items that are the same type are stored together so the
position of each element can be calculated or retrieved easily by an
index. Arrays can be fixed or flexible in length.
Stack. A stack stores a collection of items in the linear order that
operations are applied. This order could be last in, first out (LIFO) or
first in, first out (FIFO).
Queue. A queue stores a collection of items like a stack; however, the
operation order can only be first in, first out.
Linked list. A linked list stores a collection of items in a linear order.
Each element, or node, in a linked list contains a data item, as well as a
reference, or link, to the next item in the list.
Tree. A tree stores a collection of items in an abstract, hierarchical
way. Each node is associated with a key value, with parent nodes
linked to child nodes -- or subnodes. There is one root node that is the
ancestor of all the nodes in the tree.
Heap. A heap is a tree-based structure in which each parent node's associated
key value is greater than or equal to the key values of any of its children's key
values.
Graph. A graph stores a collection of items in a nonlinear fashion. Graphs are
made up of a finite set of nodes, also known as vertices, and lines that connect
them, also known as edges. These are useful for representing real-world
systems such as computer networks.
Trie. A trie, also known as a keyword tree, is a data structure that stores
strings as data items that can be organized in a visual graph.
Hash table. A hash table -- also known as a hash map -- stores a collection of
items in an associative array that plots keys to values. A hash table uses a hash
function to convert an index into an array of buckets that contain the desired
data item.
Hashing
is a data structure technique where key values are converted into indexes of an
array where the data is stored.
These are considered complex data structures as they can store large amounts of
interconnected data.
Abstract Data type (ADT) is a type (or class) for objects whose behavior
is defined by a set of values and a set of operations. The definition of
ADT only mentions what operations are to be performed but not how
these operations will be implemented. It does not specify how data will be
organized in memory and what algorithms will be used for implementing
the operations. It is called “abstract” because it gives an implementation-
independent view.
The process of providing only the essentials and hiding the details is
known as abstraction.
The user of data type does not need to know how that data type is
implemented, for example, we have been using Primitive values like int,
float, char data types only with the knowledge that these data type can
operate and be performed on without any idea of how they are
implemented.
So a user only needs to know what a data type can do, but not how it will
be implemented. Think of ADT as a black box which hides the inner
structure and design of the data type. Now we’ll define three ADTs
namely List ADT, Stack ADT, Queue ADT.
1. List ADT
Vies of list
The data is generally stored in key sequence in a list which has a
head structure consisting of count, pointers and address of compare
function needed to compare the data in the list.
The data node contains the pointer to a data structure and a self-
referential pointer which points to the next node in the list.
The List ADT Functions is given below:
get() – Return an element from the list at any given position.
insert() – Insert an element at any position of the list.
remove() – Remove the first occurrence of any element from a non-
empty list.
removeAt() – Remove the element at a specified location from a non-
empty list.
replace() – Replace an element at any position by another element.
size() – Return the number of elements in the list.
isEmpty() – Return true if the list is empty, otherwise return false.
isFull() – Return true if the list is full, otherwise return false.
2. Stack ADT
View of stack
In Stack ADT Implementation instead of data being stored in each
node, the pointer to data is stored.
The program allocates memory for the data and address is passed to
the stack ADT.
The head node and the data nodes are encapsulated in the ADT. The
calling function can only see the pointer to the stack.
The stack head structure also contains a pointer to top and count of
number of entries currently in stack.
push() – Insert an element at one end of the stack called top.
pop() – Remove and return the element at the top of the stack, if it is
not empty.
peek() – Return the element at the top of the stack without removing
it, if the stack is not empty.
size() – Return the number of elements in the stack.
isEmpty() – Return true if the stack is empty, otherwise return false.
isFull() – Return true if the stack is full, otherwise return false.
3. Queue ADT
View of Queue
The queue abstract data type (ADT) follows the basic design of the
stack abstract data type.
Each node contains a void pointer to the data and the link pointer to
the next element in the queue. The program’s responsibility is to
allocate memory for storing the data.
enqueue() – Insert an element at the end of the queue.
dequeue() – Remove and return the first element of the queue, if the
queue is not empty.
peek() – Return the element of the queue without removing it, if the
queue is not empty.
size() – Return the number of elements in the queue.
isEmpty() – Return true if the queue is empty, otherwise return false.
isFull() – Return true if the queue is full, otherwise return false.
Features of ADT:
Abstract data types (ADTs) are a way of encapsulating data and
operations on that data into a single unit. Some of the key features
of ADTs include:
Abstraction: The user does not need to know the implementation of
the data structure only essentials are provided.
Better Conceptualization: ADT gives us a better conceptualization
of the real world.
Robust: The program is robust and has the ability to catch errors.
Encapsulation: ADTs hide the internal details of the data and provide
a public interface for users to interact with the data. This allows for
easier maintenance and modification of the data structure.
Data Abstraction: ADTs provide a level of abstraction from the
implementation details of the data. Users only need to know the
operations that can be performed on the data, not how those
operations are implemented.
Data Structure Independence: ADTs can be implemented using
different data structures, such as arrays or linked lists, without
affecting the functionality of the ADT.
Information Hiding: ADTs can protect the integrity of the data by
allowing access only to authorized users and operations. This helps
prevent errors and misuse of the data.
Modularity: ADTs can be combined with other ADTs to form larger,
more complex data structures. This allows for greater flexibility and
modularity in programming.
Overall, ADTs provide a powerful tool for organizing and manipulating
data in a structured and efficient manner.
Abstract data types (ADTs) have several advantages and disadvantages
that should be considered when deciding to use them in software
development. Here are some of the main advantages and disadvantages
of using ADTs:
Advantages:
Encapsulation: ADTs provide a way to encapsulate data and
operations into a single unit, making it easier to manage and modify
the data structure.
Abstraction: ADTs allow users to work with data structures without
having to know the implementation details, which can simplify
programming and reduce errors.
Data Structure Independence: ADTs can be implemented using
different data structures, which can make it easier to adapt to
changing needs and requirements.
Information Hiding: ADTs can protect the integrity of data by
controlling access and preventing unauthorized modifications.
Modularity: ADTs can be combined with other ADTs to form more
complex data structures, which can increase flexibility and modularity
in programming.
Disadvantages:
Overhead: Implementing ADTs can add overhead in terms of memory
and processing, which can affect performance.
Complexity: ADTs can be complex to implement, especially for large
and complex data structures.
Learning Curve: Using ADTs requires knowledge of their
implementation and usage, which can take time and effort to learn.
Limited Flexibility: Some ADTs may be limited in their functionality
or may not be suitable for all types of data structures.
Cost: Implementing ADTs may require additional resources and
investment, which can increase the cost of development.