
Chapter No. 5

Chapter 5 focuses on Data Analytics, emphasizing the importance of model building, statistical concepts, and data visualization in decision-making. It covers methods for data collection and preparation, including surveys, observations, and experiments, as well as techniques for cleaning and transforming data. The chapter also introduces statistical modeling to analyze relationships and make predictions based on data.

Uploaded by Mirza Imran
Copyright: © All Rights Reserved

CHAPTER # 5: Data Analytics COMPUTER PART-I

Chapter No. 5
DATA ANALYTICS
STUDENT LEARNING OUTCOMES [C-11-B-01 to C-11-B-18]
• The role and importance of model building and its real-world applications. (Understanding)
• Build basic statistical models for real-world problems and test their performance. (Understanding)
• Explain experimental design in data science. (Understanding)
• Explain the types, uses, and methods of data visualization. (Understanding)
• Benefits of visualizing data through descriptive statistics. (Understanding)
• Explain and create a data visualization using data visualization software (for example MS Excel,
Google Sheets, Python, Tableau or Matplotlib). (Understanding)

CHAPTER # 4: COMPUTATIONAL STRUCTURES COMPUTER PART-I


Data Analytics
Data Analytics in Python refers to the process of examining, cleaning, transforming, and modeling data using
the Python programming language to discover useful information, draw conclusions, and support decision-making.
5.1 Model Building
Model building is the process of creating simplified representations of complex problems. It helps in understanding,
analyzing, and solving real-world issues by focusing on key elements.
Example: Consider a student preparing for exams. To decide how much time to spend on each subject,
the student creates a study plan. This plan is a model that simplifies their schedule by focusing on
important subjects and ignoring less relevant tasks.
5.1.1 Role of Models in Decision-Making
Models are useful because they give a structured way to think about a problem. They take complicated
information and make it easier to work with. This is especially important in decision-making, where
models help us consider different outcomes before making a choice.
Example: A farmer can use a model to decide the best time to plant crops. By studying weather
patterns and soil conditions, the model helps the farmer predict the right planting time, which increases
the chances of a good harvest.

5.1.2 Applications of Models


Models are tools that help us understand complex problems by simplifying them. They are widely used
in various fields to make predictions and decisions based on available data. Some common fields where
models are applied include weather forecasting, financial predictions, and scientific research.
Weather Forecasting
In weather forecasting, models are used to predict future weather conditions. Meteorologists collect
data such as temperature, humidity, and wind speed, and then use models to analyze this data and
forecast the weather.
Financial Predictions
Models are also very important in the financial world. Banks and companies use models to predict
economic trends, stock prices, and even customer behavior.
Scientific Research
In scientific research, models help scientists test theories, explore new ideas, and make predictions
based on data. With advancements in Machine Learning (ML) and Artificial Intelligence (AI), models
have become even more powerful and accurate in analyzing large amounts of data.
5.2 Basic Statistical Concepts
Statistics is a branch of mathematics that helps us understand and analyze data. By using statistics, we
can summarize large sets of information in a simple way, making it easier to draw conclusions.
5.2.1 Measures of Central Tendency
Measures of central tendency help us find the “center” or typical value in a set of data. There are three
main measures of central tendency: mean, median, and mode. These measures give us a sense of the
average or most common values in a dataset.
Mean
The mean is the average of all the numbers in a data set. To find the mean, we add all the numbers together and divide by
the total number of values.


Example: Imagine 5 students scored 50, 60, 70, 80, and 90 in a test. The mean score is calculated by adding all the
scores and then dividing by the number of students.
$$\text{Mean} = \frac{50 + 60 + 70 + 80 + 90}{5} = \frac{350}{5} = 70$$
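A minimal Python sketch of the same calculation, using only built-in functions (the variable names are illustrative):

```python
# Test scores from the example above
scores = [50, 60, 70, 80, 90]

# Mean: add all the values, then divide by how many values there are
mean = sum(scores) / len(scores)
print(mean)  # 70.0
```

Python's standard `statistics.mean(scores)` gives the same result.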

Median
The median is the middle value when the numbers are arranged in order. If there is an odd number
of values, the median is the exact middle number. If there is an even number of values, the median is
the average of the two middle numbers.
Example: Using the same test scores: 50, 60, 70, 80, and 90. When we arrange these scores in
ascending order (which they already are), the middle value is 70. Therefore, the median score is 70.
Example with Even Numbers: If the scores were 50, 60, 70, and 80, we would take the average of the
two middle scores (60 and 70):
$$\text{Median} = \frac{60 + 70}{2} = 65$$
So, the median is 65.
The median helps us understand the middle point of the data.
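The odd/even rule above can be sketched as a small Python function; `median` here is an illustrative helper written for this example (the standard library also offers `statistics.median`):

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)       # arrange the values in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                 # odd count: the exact middle value
        return ordered[middle]
    # even count: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([50, 60, 70, 80, 90]))  # 70
print(median([50, 60, 70, 80]))      # 65.0
```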

Mode
The mode is the number that appears most often in a data set. There can be more than one mode if
multiple numbers appear with the same highest frequency.
Example: If 5 students scored 50, 60, 70, 70, and 90, the number 70 appears twice, while all other
numbers appear only once. Therefore, the mode is 70.
Example with Multiple Modes: If the scores were 50, 60, 70, 70, 60, and 90, both 60 and 70 appear
twice. So, there are two modes: 60 and 70.
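Finding the mode amounts to counting how often each value occurs and keeping the most frequent ones. A minimal sketch using `collections.Counter`; the helper name `modes` is illustrative (the standard library's `statistics.multimode` behaves similarly):

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)           # value -> number of occurrences
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes([50, 60, 70, 70, 90]))      # [70]
print(modes([50, 60, 70, 70, 60, 90]))  # [60, 70]
```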
5.2.2 Measures of Dispersion
Measures of dispersion tell us how spread out or scattered the data is. Two common measures of
dispersion are variance and standard deviation. These help us understand whether the data points are
close to the average (mean) or far from it.
Variance
The variance shows how much the numbers in a data set differ from the mean. A higher variance
means that the numbers are more spread out, while a lower variance means that the numbers are
closer to the mean. To calculate variance, we use the following mathematical formula:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
Where: xi represents each individual value in the data set, µ is the mean of the data set, and N is the total
number of values in the data set.
Example: Imagine two classes, Class A and Class B, took the same exam.
1. In Class A, the scores are: 50, 52, 55, 57, and 60.
2. In Class B, the scores are: 30, 45, 55, 75, and 90.
The steps involved in the variance calculations are:
Step 1: Variance for Class A (scores: 50, 52, 55, 57, 60)
Step 1.1: Compute the Mean (μ)
$$\mu = \frac{50 + 52 + 55 + 57 + 60}{5} = \frac{274}{5} = 54.8$$
Step 1.2: Compute Each Squared Deviation (xi − μ)²
xi    xi − μ                (xi − μ)²
50    50 − 54.8 = −4.8      23.04
52    52 − 54.8 = −2.8      7.84
55    55 − 54.8 = 0.2       0.04
57    57 − 54.8 = 2.2       4.84
60    60 − 54.8 = 5.2       27.04
Step 1.3: Compute Variance
$$\sigma^2 = \frac{23.04 + 7.84 + 0.04 + 4.84 + 27.04}{5} = \frac{62.8}{5} = 12.56$$

Step 2: Variance for Class B (scores: 30, 45, 55, 75, 90)
Step 2.1: Compute the Mean (μ)
$$\mu = \frac{30 + 45 + 55 + 75 + 90}{5} = \frac{295}{5} = 59$$
Step 2.2: Compute Each Squared Deviation (xi − μ)²
xi    xi − μ            (xi − μ)²
30    30 − 59 = −29     841
45    45 − 59 = −14     196
55    55 − 59 = −4      16
75    75 − 59 = 16      256
90    90 − 59 = 31      961
Step 2.3: Compute Variance
$$\sigma^2 = \frac{841 + 196 + 16 + 256 + 961}{5} = \frac{2270}{5} = 454$$

• Variance of Class A: 12.56
• Variance of Class B: 454
This confirms that Class B has a much higher variance, meaning its scores are more spread out
than those of Class A.
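The three steps (mean, squared deviations, average of the squared deviations) translate directly into Python. This sketch divides by N, matching the formula in this section; the function name is illustrative, and `statistics.pvariance` is the standard-library equivalent:

```python
def population_variance(values):
    """Variance with N in the denominator, as in the formula above."""
    n = len(values)
    mu = sum(values) / n                                  # Step 1: the mean
    squared_deviations = [(x - mu) ** 2 for x in values]  # Step 2: (xi - mu)^2
    return sum(squared_deviations) / n                    # Step 3: average them

class_a = [50, 52, 55, 57, 60]
class_b = [30, 45, 55, 75, 90]
print(round(population_variance(class_a), 2))  # 12.56
print(round(population_variance(class_b), 2))  # 454.0
```

Rounding is used for display because floating-point arithmetic can leave tiny errors in values such as 54.8.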
Standard Deviation
The standard deviation is closely related to the variance but is easier to interpret, because it is expressed in
the same units as the data (for example, marks) rather than in squared units. It tells us how spread out the
numbers are in relation to the mean. The standard deviation is simply the square root of the variance. To
calculate the standard deviation, we use the following mathematical formula:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
Where: xi represents each individual value in the data set, µ is the mean of the data set, and N is the total
number of values in the data set.
Calculating Standard Deviation:
$$\sigma_A = \sqrt{12.56} \approx 3.54 \qquad \sigma_B = \sqrt{454} \approx 21.31$$
The standard deviation for Class A is approximately 3.54, while for Class B it is about 21.31. This means
that Class A's scores are closely packed around the mean, whereas Class B's scores are more widely
scattered. The standard deviation helps us easily understand how much variation there is in the scores.
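Since the standard deviation is the square root of the variance, a sketch only needs `math.sqrt` on top of the variance calculation. The function name is illustrative; `statistics.pstdev` computes the same population standard deviation:

```python
import math

def population_std(values):
    """Standard deviation = square root of the population variance (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)

print(round(population_std([50, 52, 55, 57, 60]), 2))  # 3.54
print(round(population_std([30, 45, 55, 75, 90]), 2))  # 21.31
```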

5.2.3 Introduction to Probability
Probability is the study of how likely an event is to happen. It helps us predict outcomes
based on what we know.
Example: Consider flipping a coin. There are two possible outcomes: heads or tails. Since both
outcomes are equally likely, the probability of getting heads is 50% (or 1/2), and the probability of
getting tails is also 50%.
When all outcomes are equally likely, we can express this mathematically as:
$$P(\text{event}) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}$$
In the case of the coin flip, there is 1 favorable outcome (heads) out of 2 total outcomes:
$$P(\text{heads}) = \frac{1}{2} = 0.5$$
Probability is not just for coin flips. It is used in many areas, such as predicting the weather, making
business decisions, or even in sports such as cricket.
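A small sketch of the counting formula, plus a simulated check that the share of heads in many fair flips comes out close to 1/2. It assumes a fair coin modeled with Python's `random.choice`; the helper name `probability` and the number of flips are illustrative choices:

```python
from fractions import Fraction
import random

def probability(favorable, total):
    """Classical probability: favorable outcomes / total equally likely outcomes."""
    return Fraction(favorable, total)

outcomes = ["heads", "tails"]
p_heads = probability(outcomes.count("heads"), len(outcomes))
print(p_heads)         # 1/2
print(float(p_heads))  # 0.5

# Simulation: flip a fair coin many times and measure the share of heads
random.seed(1)  # fixed seed so the run is repeatable
flips = [random.choice(outcomes) for _ in range(10_000)]
print(flips.count("heads") / len(flips))  # close to 0.5; the exact value varies with the seed
```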

5.1 Data Collection and Preparation


Data Collection refers to the systematic process of gathering, measuring, and analyzing
information from various sources to get a complete and accurate picture of an area of interest. Data
collection is a critical step in any research or data-driven decision-making process, ensuring the accuracy
and reliability of the results obtained.
5.1.1 Data Collection Methods
Data collection refers to the process of gathering relevant information for a particular purpose.
Depending on the nature of the research, different methods can be used for data collection.
These methods are:
• Surveys
• Observations
• Experiments
Choosing the right method depends on the research objective and the type of data required.


Surveys
• Surveys are a commonly used method for collecting large amounts of data in a structured way.
• They involve asking a predefined set of questions to a sample group.
• Surveys can be conducted using various means such as online forms, telephone calls, or face-to-face
interviews.
• Example of a Survey: Suppose a school wants to know students' favorite subjects.
Population: All students in the school.
Sample: 100 randomly selected students.
Questionnaire: “What is your favorite subject?” (Options: Math, Science, English, History)
The collected responses are then analyzed to find which subject is most popular.
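Analyzing the responses can be as simple as counting how many students chose each subject. In this sketch the list of answers is hypothetical (a handful of made-up responses, not the 100-student sample):

```python
from collections import Counter

# Hypothetical answers from a few sampled students (illustrative data only)
responses = ["Math", "Science", "Math", "English", "Science", "Math", "History"]

tally = Counter(responses)                   # subject -> number of students
most_popular, votes = tally.most_common(1)[0]
print(tally)                                 # frequency of each subject
print(most_popular, votes)                   # Math 3
```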

Observations
• Observation involves collecting data by watching or monitoring subjects in their natural
environment.
• This method is useful when researchers want to gather data on behaviors or phenomena
without interference.
• Example: If you are conducting a study to find the average height of students in a class, each
student’s measured height is an observation.
• For instance:
  • Student A’s height = 150 cm (one observation)
  • Student B’s height = 160 cm (another observation)
  • Student C’s height = 155 cm (another observation)
• Each height value is an individual observation collected for analysis.

Experiments
• Experiments involve manipulating one or more variables to determine their effect on another
variable.
• This method is particularly useful in scientific and engineering fields where controlled
environments are necessary for accurate measurement.
• Example: A school teacher wants to test whether providing students with printed notes helps
improve their performance in exams.
• The teacher conducts an experiment with two groups of students, one receiving printed notes
and the other relying solely on lectures.
• After one month, both groups take the same test, and the teacher compares the results to see if
printed notes had a positive impact on performance.

5.2.4 Data Preparation


Data Preparation is the process of cleaning and organizing raw data before analysis. It involves steps like
removing errors, handling missing values, transforming data formats, and normalizing data to
ensure it is accurate and usable. Proper data preparation improves the quality of the analysis and helps in
getting reliable results. It’s an essential step in data science, statistics, and machine learning.
Example: If survey responses contain incomplete information, missing values can be estimated based
on the available data. Proper data preparation ensures that the analysis leads to reliable and valid
results.


5.1.1 Data Cleaning and Transformation


• Data cleaning and transformation are important steps to prepare data for analysis.
• Raw data often has errors, missing values, or may be in the wrong format.
• To ensure accurate results in analysis, it is important to fix these issues before moving forward.
Data Cleaning
Data cleaning means correcting or removing any problems in the data. These problems can include
incorrect entries, missing values, or duplicate data. If these errors are not fixed, the results of the analysis
can be misleading.
Example: Imagine a school collecting data on student scores. Some students may have entered their
names incorrectly, or a few scores may be missing from the records. In this case, data cleaning would
involve correcting any wrong names and finding the missing grades to complete the dataset.

[Figure: Graphical Representation of Data Cleaning for Student Grade Records]
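The cleaning steps described above (fixing badly entered names, dropping duplicates, and spotting missing grades) can be sketched in plain Python. The records below are hypothetical, and the rules used (trim spaces, use title case, treat identical name and score pairs as duplicates) are assumptions made for this illustration:

```python
# Hypothetical raw records: inconsistent name formatting, a duplicate row, a missing score
raw_records = [
    {"name": " ali ", "score": 78},
    {"name": "SARA", "score": None},
    {"name": "Hina", "score": 91},
    {"name": "Ali", "score": 78},   # duplicate of the first row once the name is cleaned
]

def clean(records):
    """Normalize names and drop exact duplicates, keeping the first occurrence."""
    cleaned, seen = [], set()
    for record in records:
        name = record["name"].strip().title()   # fix spacing and capitalization
        key = (name, record["score"])
        if key in seen:                          # skip rows we have already kept
            continue
        seen.add(key)
        cleaned.append({"name": name, "score": record["score"]})
    return cleaned

cleaned = clean(raw_records)
missing = [r["name"] for r in cleaned if r["score"] is None]
print(cleaned)
print("Missing scores:", missing)   # Missing scores: ['Sara']
```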

Data Transformation
Once the data is clean, it is often necessary to transform it into a format that is easy to work with. This
transformation may include converting data into different formats, creating new columns, or organizing
data in a different way. These changes help make the data more suitable for analysis or modeling.
Example: After cleaning the student grade records, it may be necessary to transform this data for better
analysis. For instance, instead of displaying grades for each individual student, the data.


Handling Missing Data


Sometimes, data is incomplete or has missing values. There are different techniques to handle missing data.
One option is to remove the rows with missing values if they are very few. Another option is to fill in the
missing values with an average or with data from similar cases. The choice depends on the type of data and
the amount of missing information.
Example: In the dataset of student grades, if Sara's grade is missing, this creates a challenge in assessing
her performance. To address this issue, several strategies can be employed:
• Imputation: One common method is to estimate the missing value using existing data.
Example: The school can calculate the average score of all students in Sara's class. If the average score
is 87, the school may assign this value to Sara's record temporarily. This approach allows the school to
maintain a complete dataset while making a reasonable assumption about Sara's performance.
• Flagging: The school can also keep track of Sara's missing score by adding a note in the dataset. This
method indicates that Sara's score is not available, making analysts aware of the incomplete data. This
approach ensures transparency while allowing the analysis to proceed without filling in the gap.
• Removal: If the number of missing entries is small, the school might choose to exclude Sara's record
from specific analyses. This decision is acceptable if it does not significantly impact the overall
understanding of student performance. However, it risks losing valuable information about Sara.
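The three strategies can be compared side by side on a tiny dataset. The names and scores here are hypothetical, chosen so that the class average of the known scores is 87 as in the example; `None` marks the missing value:

```python
# Hypothetical class records; None marks Sara's missing score
records = [
    {"name": "Ali", "score": 85},
    {"name": "Sara", "score": None},
    {"name": "Hina", "score": 90},
    {"name": "Omar", "score": 86},
]

known = [r["score"] for r in records if r["score"] is not None]
class_average = sum(known) / len(known)   # (85 + 90 + 86) / 3 = 87.0

# 1. Imputation: fill the gap with the class average (new dicts, originals untouched)
imputed = [dict(r, score=class_average if r["score"] is None else r["score"]) for r in records]

# 2. Flagging: keep the gap, but record explicitly that the score is missing
flagged = [dict(r, score_missing=r["score"] is None) for r in records]

# 3. Removal: leave incomplete records out of this analysis
complete_only = [r for r in records if r["score"] is not None]

print(imputed[1])          # {'name': 'Sara', 'score': 87.0}
print(flagged[1])          # {'name': 'Sara', 'score': None, 'score_missing': True}
print(len(complete_only))  # 3
```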

5.3 Building Statistical Models


Building Statistical Models means using mathematical formulas and algorithms to understand patterns in
data and make predictions or decisions. These models help us analyze relationships between variables,
identify trends, and test hypotheses.

5.3.1 Introduction to Statistical Modeling


Statistical modeling is a way to use data to make sense of the world and predict what will happen in the
future. Think of it like this: if you want to know how much money you'll spend on groceries next month, you
can look at what you spent in the past. By analyzing that data, you can create a model to help you estimate
your future grocery expenses.
5.3.1.1 Model Development
Building a statistical model involves several steps. Let's break them down:
Step 1: Define the Problem
• First, we need to understand the problem.
• Example: If we are trying to predict grocery expenses, we need to know which factors increase our
grocery expenses (e.g., family size, location, or income).
Step 2: Collect Data
• Next, we gather data related to the problem.
• In our example, we will collect data on past spending habits, number of family members, and any
other factors that may affect grocery costs.


Step 3: Choose an Algorithm
• Based on the problem and the data, we choose an algorithm.
• Algorithms are methods that help us create a model.
• Some popular algorithms are linear regression and logistic regression; linear regression is explained
further in this section.
Step 4: Train the Model
• The model is then trained using the data.
• This means the model learns patterns from the data so that it can make predictions.
Step 5: Evaluate the Model
• Finally, we test the model on new data that it did not see during training to check how well it works.
• This step is very important to ensure the model makes good predictions.
5.3.1.2 Linear Regression
Linear regression is a common statistical model used to understand the relationship between two variables.
It is often used to predict one variable based on another.
Example: Imagine you run a small fruit stall in your town, and you want to predict how much money you
will make each day based on the number of customers who visit your stall. The number of customers is the
independent variable (the input we predict from), and the money you earn is the dependent variable (the value
we want to predict). We will use linear regression to understand this relationship and help you predict future earnings.
Step 1: Collecting Data
• To build a linear regression model, we need data.
• Let's assume you've recorded the number of customers and your daily earnings for the last 5 days:

[Table: Customers' data]

Step 2: Understanding the Linear Regression Formula


• The formula for simple linear regression is:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
• Where: Y is the dependent variable (in our case, daily earnings),
• X is the independent variable (the number of customers),
• β₀ is the intercept, which is the starting value of Y when X = 0,
• β₁ is the slope, which shows how much Y changes with each unit increase in X,
• ε is the error term, which accounts for the difference between the predicted and actual values.

Step 3: Building the Linear Regression Model

• When building a linear regression model, our goal is to find the best line that explains how two
things are related: in this case, the number of customers and daily earnings.
• Here's how we get the values for the slope (40) and intercept (100):


• Understanding the Slope (β₁ = 40)

• The slope shows how much extra money we make for every additional customer. Let's use our data to
figure it out:

• In the data, for every 5 extra customers, earnings go up by 200 rupees. So, for each new customer:
$$\beta_1 = \frac{200}{5} = 40$$

• This means every new customer adds 40 rupees to our earnings.

• Understanding the Intercept (β₀ = 100)

• The intercept (β₀) represents the earnings when no customers visit. To find this value, we look at
where the line crosses the vertical axis when the number of customers is zero. In simpler terms, it
tells us what the base earnings are, even if no one shows up.

• We already know the slope (β₁ = 40): for every additional customer, earnings increase by 40 rupees.
Now, to find the intercept, we need to consider how much we earn when there are no customers.

• We can use the equation: Earnings = β₀ + β₁ × Customers

• If we take any data point, say when there are 10 customers, the earnings are 500 rupees.

• Substituting these values into the equation:
$$500 = \beta_0 + (40 \times 10) = \beta_0 + 400 \quad\Rightarrow\quad \beta_0 = 500 - 400 = 100$$
This means that, based on the data, if no customers show up, you'd still expect to make 100 rupees,
perhaps from fixed earnings such as regular orders. So, the intercept value of 100 rupees represents the
expected earnings on a day with zero customers. (This shortcut works because every data point in this
example lies exactly on the line; with real, noisy data, the slope and intercept are estimated with the
least-squares method.)
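For data that do not fall exactly on a line, simple linear regression is usually fitted by ordinary least squares. The original customers table was lost in extraction, so the five data points below are hypothetical, chosen to lie exactly on Earnings = 100 + 40 × Customers; on such data least squares recovers the same slope and intercept as the step-by-step method above. The function name `fit_line` is illustrative:

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of the intercept (b0) and slope (b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean   # the fitted line passes through (x_mean, y_mean)
    return b0, b1

# Hypothetical 5-day record (not the book's table): customers per day and earnings in rupees
customers = [5, 10, 15, 20, 25]
earnings = [300, 500, 700, 900, 1100]

b0, b1 = fit_line(customers, earnings)
print(b0, b1)        # 100.0 40.0
print(b0 + b1 * 12)  # predicted earnings for 12 customers: 580.0
```

On Python 3.10 or later, `statistics.linear_regression(customers, earnings)` returns the same slope and intercept.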


Exercise
Multiple Choice Questions: Choose the correct option.
1. The function used to add an item at the end of a list in Python:
a) insert( ) b) append( ) c) remove( ) d) pop( )
2. What does the 'in' keyword do when used with python list?
a) Adds an item to the list.
b) Removes an item from the list.
c) Checks if an item exists in the list.
d) Returns the length of the list.
3. Which operation removes an item from the top of the stack?
a) Push b) Pop c) Peek d) Add
4. When converting an infix expression to postfix notation, what does a stack help to manage?
a) Order of operands b) Priority of operators c) Parentheses d) Both b and c

5. Which operation is used to add an item to a queue?


a) Dequeue b) Peek c) Enqueue d) Remove
6. In the context of Breadth-First Search (BFS) in a tree, which queue operation is used to
visit nodes level by level?
a) Adding nodes to the end of a stack b) Removing nodes from the end of a list
c) Enqueueing nodes to a queue d) Dequeueing nodes from a stack
7. Which of the following tree traversals visits the root node before its children?


a) In-order b) pre-order c) post-order d) level order


8. Which of the following is true about the height of a tree?
a) The height is the number of edges from the root to the deepest node
b) The height is the number of nodes from the root to the deepest node
c) The height is the number of children of the root node
d) The height is always equal to the number of nodes in the tree
9. Which graph traversal explores all immediate neighbors of a vertex before moving to the
next level?
a) Depth-First Search (DFS)
b) Breadth-First Search (BFS)
c) Depth-Limited Search (DLS)
d) Iterative Deepening Search (IDS)
10. For which scenario would a graph data structure be most appropriate?
a) Managing a to-do list
b) Modeling a line of customers in a store
c) Representing connections in a social network
d) None of these

Short Questions:
1. Explain how the extend() method works in Python lists. Provide an example.
Answer: In Python, the extend() method is used to add the items of one list (or any other iterable) to the
end of another list. This method modifies the original list in place by appending all items from the given iterable.
Let’s look at a simple example of the extend() method.
a = [1, 2, 3]
b = [4, 5]
# Using extend() to add elements of b to a
a.extend(b)
print(a)  # Output: [1, 2, 3, 4, 5]
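A common point of confusion is the difference between extend() and append(). This short sketch (working on copies so the original list a stays unchanged) shows that extend() adds each item separately, append() adds its argument as a single item, and extend() accepts any iterable:

```python
a = [1, 2, 3]
b = [4, 5]

a_extended = a.copy()
a_extended.extend(b)     # adds each item of b individually
print(a_extended)        # [1, 2, 3, 4, 5]

a_appended = a.copy()
a_appended.append(b)     # adds the list b itself as one nested item
print(a_appended)        # [1, 2, 3, [4, 5]]

a_extended.extend("hi")  # any iterable works, e.g. the characters of a string
print(a_extended)        # [1, 2, 3, 4, 5, 'h', 'i']
```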
2. Explain the potential issues which could arise when two variables reference the same list
in a program. Provide an example.
Answer: When two variables reference the same list in Python, they both point to the same object
in memory. This means that changes made through one variable will affect the other, which can lead
to unexpected behavior or bugs, especially if you're not intentionally sharing data.
Example:
list1 = [1, 2, 3]
list2 = list1  # Both variables refer to the same list
list2.append(4)
print(list1)  # Output: [1, 2, 3, 4]
print(list2)  # Output: [1, 2, 3, 4]
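One way to avoid the problem is to give the second variable its own copy of the list. A minimal sketch using list.copy():

```python
list1 = [1, 2, 3]
list2 = list1.copy()   # a new, independent list (list(list1) or list1[:] also work)
list2.append(4)
print(list1)           # [1, 2, 3]
print(list2)           # [1, 2, 3, 4]
print(list1 is list2)  # False: two different objects
```

list.copy() makes a shallow copy: if the list contains other lists, those inner lists are still shared, and copy.deepcopy() is needed to separate them as well.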
3. Define a stack and explain the Last-In, First-Out (LIFO) principle.
Answer: A stack is a simple data structure where you can only add or remove items from one end,
known as the “top”. Both insertion and deletion of elements occur at this top end. A stack operates on
the Last-In, First-Out (LIFO) principle, meaning that the most recently added element is the first one to
be removed.
4. How does the stack help in balancing parentheses in an expression? Describe the process.
Answer: The stack helps balance parentheses by keeping track of opening parentheses
encountered. When a closing parenthesis is met, it checks if the top of the stack contains a
matching opening parenthesis. If it does, they cancel out (pop the stack). If not, or if the stack is
empty, the parentheses are unbalanced. After the whole expression has been scanned, the parentheses
are balanced only if the stack is empty.
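The process described above can be sketched as a small Python function that uses a list as the stack. As an assumption beyond the answer, which only discusses parentheses, this version also checks square brackets and braces; the function name `is_balanced` is illustrative:

```python
def is_balanced(expression):
    """Return True if every bracket in the expression is properly matched."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in expression:
        if ch in "([{":
            stack.append(ch)                 # push each opening bracket
        elif ch in pairs:
            # a closing bracket needs a matching opener on top of the stack
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                         # leftover openers mean unbalanced

print(is_balanced("(a + b) * (c - d)"))  # True
print(is_balanced("((a + b)"))           # False
print(is_balanced("(a + b))"))           # False
```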
5. Differentiate between the Enqueue and Dequeue operations of a queue.
Enqueue operation: When a new task arrives (for example, when you press "Print" on your computer), it is
added to the end of the queue, similar to how a new patient is added to the end of the waiting list. This
process of adding a task to the queue is called enqueue.
Dequeue operation: As soon as the computer is ready to process a task, it removes the first task from the
front of the queue, just like the doctor calling in the next patient from the waiting list. This process of
removing a task from the queue for processing is called dequeue.

6. Name two basic operations performed on a stack.
Answer: Two basic operations performed on a stack are:
• Push: Adds an element to the top of the stack.
• Pop: Removes the top element from the stack.
• These operations follow the Last In, First Out (LIFO) principle.

7. What is the difference between enqueue() and dequeue()?
Enqueue is the operation of adding an element to the end (rear) of a queue data structure. In Python, you
can perform enqueue on a list using the append() method.
Example:
queue = []
queue.append(10)  # Enqueue 10
queue.append(20)  # Enqueue 20
print(queue)  # Output: [10, 20]
Dequeue is the operation of removing an element from the front (beginning) of a queue data structure. In
Python, you can perform dequeue on a list using the pop(0) method.
Example:
queue = [10, 20, 30]
item = queue.pop(0)  # Dequeue operation removes 10
print(item)  # Output: 10
print(queue)  # Output: [20, 30]
Note that pop(0) has to shift every remaining element forward, so for large queues collections.deque
(see Long Question 3) is the faster choice.

Long Questions
1. Discuss the dynamic size property of lists in Python. How does this property
make lists more flexible?
Answer: Dynamic Size Property of Lists in Python and Its Flexibility
Python lists have a dynamic size, which means that, unlike arrays in some other programming
languages, you don’t need to define the size of a list at the time of its creation. You can add or remove
elements freely during the execution of the program, and the list will automatically adjust its size to
accommodate these changes.


How Python Implements Dynamic Size:

• When you add elements using methods like append() or insert(), Python stores them in memory the list
has already reserved; CPython reserves some spare capacity in advance, so most additions do not need a new allocation.
• If the currently allocated memory is full, Python allocates a larger block and moves the existing
elements into it. This process is transparent to the programmer.
• When elements are removed using remove() or pop(), the list size reduces, and Python may
free some memory as needed.
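The over-allocation can be observed with sys.getsizeof, which reports the memory used by the list object itself. This sketch assumes CPython, where the reported size grows in occasional jumps rather than on every append; the exact byte counts depend on the Python version and platform, so they are not shown:

```python
import sys

numbers = []
sizes = []
for i in range(20):
    numbers.append(i)
    sizes.append(sys.getsizeof(numbers))  # bytes used by the list object (CPython)

# Far fewer distinct sizes than appends: most appends reuse spare capacity
print(len(numbers))        # 20
print(sorted(set(sizes)))  # a handful of distinct sizes, not 20

numbers.pop()              # removes the last element (19)
numbers.remove(0)          # removes the first occurrence of the value 0
print(len(numbers))        # 18
```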

Why Is This Property Important?

• Flexibility in Usage: You don’t have to know the number of elements beforehand. This makes lists
very useful when working with data that can change size dynamically, such as user input, data
processing, or results of computations.
• Efficient Memory Management: Python optimizes memory allocation behind the scenes, so the list
grows and shrinks as required without wasting too much memory.
• Ease of Programming: Programmers can focus more on the logic rather than managing memory or
resizing arrays manually, which reduces complexity and chances of errors.
• Supports Various Operations: Operations like insertion, deletion, and appending become simpler
and more natural, which is not always possible in fixed-size data structures.
2. Explain the operations on a stack with a real-life example and Python code.

Answer: A stack is a data structure that follows the Last In, First Out (LIFO) principle. The two main
operations on a stack are:

1. Push: Add an element to the top of the stack.


2. Pop: Remove the element from the top of the stack.

Other common operations include peek (view the top element without removing it) and isEmpty (check if
the stack is empty).
Real-Life Example: Stack of Plates
• Imagine a stack of plates in your kitchen.
• You add a plate on top of the stack (push).
• When you need a plate, you take the topmost plate first (pop).
• You cannot remove a plate from the middle or bottom without taking off the plates above it.

Python code:

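stack = []
# Push operation: add plates to the top
stack.append('Plate 1')
stack.append('Plate 2')
print("Stack after pushes:", stack)  # Output: Stack after pushes: ['Plate 1', 'Plate 2']
# Pop operation: remove the top plate (the last one added)
top_plate = stack.pop()
print("Popped:", top_plate)  # Output: Popped: Plate 2
print("Stack after pop:", stack)  # Output: Stack after pop: ['Plate 1']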
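The answer also mentions peek and isEmpty. A minimal sketch wraps all four operations in a small class backed by a Python list; the class name Stack and the method names (is_empty in Python style rather than isEmpty) are illustrative choices, not a built-in Python type:

```python
class Stack:
    """A minimal list-backed stack (LIFO); names are illustrative."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)   # add to the top

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from an empty stack")
        return self._items.pop()   # remove and return the top item

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at an empty stack")
        return self._items[-1]     # look at the top item without removing it

    def is_empty(self):
        return len(self._items) == 0

plates = Stack()
plates.push("Plate 1")
plates.push("Plate 2")
print(plates.peek())      # Plate 2
print(plates.pop())       # Plate 2
print(plates.is_empty())  # False
```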

3. Write a simple program to implement a queue (insertion and deletion).

from collections import deque

# Create a queue using deque
queue = deque()

# Enqueue operation (insertion)
queue.append('A')
queue.append('B')
queue.append('C')
print("Queue after enqueues:", list(queue))  # Output: Queue after enqueues: ['A', 'B', 'C']

# Dequeue operation (deletion)
item = queue.popleft()
print("Dequeued item:", item)  # Output: Dequeued item: A
print("Queue after dequeue:", list(queue))  # Output: Queue after dequeue: ['B', 'C']

deque provides O(1) time complexity for both enqueue and dequeue, making it ideal for queue
implementation.

4. Define a tree and explain its properties.


Answer: A tree is a data structure that consists of nodes connected in a hierarchy.
It starts from a root node and branches out to child nodes, forming a parent-child relationship.
Trees are used to represent hierarchical data like file systems, organizational charts, and more.
Each node can have zero or more children, but only one parent (except the root, which has none).
Common types of trees include binary trees, where each node has at most two children.
Example:
• A family tree is a classic example of a tree structure in real life.
• It shows relationships between family members across generations.
• The root node could be the oldest known ancestor.
• Each person (node) can have children (child nodes), representing the next generation.
• Each node (person) has one parent (except the root) and can have multiple children.

Properties of Trees
• Root Node: The root is the very first or top node in a tree, like the main folder on a
computer that contains all other folders and files.
• Edges and Nodes: Nodes are the individual elements in the tree, and they are connected by
lines called edges. A node without any child nodes is called a leaf, similar to a file in a folder:
it does not contain anything else.
• Height: The height of a tree is the number of edges on the longest path from the root node down to
the farthest leaf. It tells us how deep or tall the tree is.
• Balanced Trees: A tree is considered balanced if, at every node, the left and right subtrees have
nearly the same height.
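The height and leaf properties can be computed recursively. This sketch uses an illustrative Node class with any number of children and a hypothetical folder tree; height counts edges, matching the definition above, so a single leaf has height 0:

```python
class Node:
    """A tree node with any number of children (illustrative class)."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def height(node):
    """Number of edges on the longest path from this node down to a leaf."""
    if not node.children:          # a leaf has height 0
        return 0
    return 1 + max(height(child) for child in node.children)

def leaves(node):
    """Values of all leaf nodes (nodes without children), left to right."""
    if not node.children:
        return [node.value]
    return [leaf for child in node.children for leaf in leaves(child)]

# Hypothetical folder tree: root -> docs -> notes.txt, and root -> photos
root = Node("root", [
    Node("docs", [Node("notes.txt")]),
    Node("photos"),
])

print(height(root))  # 2
print(leaves(root))  # ['notes.txt', 'photos']
```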
5. What is a graph? Explain the differences between directed and undirected graphs.
Answer: A Graph is a data structure that consists of a set of vertices (or nodes) connected by edges.
Graphs are used to represent networks of connections, where each connection is a relationship
between two vertices. These vertices can represent anything, like cities, people, or even abstract
concepts, and the edges represent the relationships or pathways between them.
Example: In a social network, each person can be connected to many others, forming a graph. There
is no single starting point, and people (vertices) can have multiple connections (edges) that do not
follow a strict parent-child relationship like in a tree.

Directed Graphs:
In a directed graph, edges have a direction, which means each edge goes from one vertex to another in one
specific direction only.

[Figure: Directed Graph]
Example: Consider the graph above. If you want to travel from city A to city B, you can only go in the
direction the road allows. If there is no one-way road going from city A to city B, you cannot
travel directly from city A to B.
Undirected Graphs:
In an undirected graph, edges do not have a direction. This means that if there is a connection between two
vertices, you can travel in both directions.
Example: Consider the graph below. If Person A is friends with Person B, then Person B is also friends with
Person A. There is no restriction on the direction of the friendship, so you can move freely between friends.

[Figure: Undirected Graph]
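Both kinds of graph are commonly stored as an adjacency list, where each vertex maps to the vertices it can reach directly. In this sketch the helper `build_graph` and the vertex names are hypothetical. The only difference between the two cases is whether each edge is also recorded in the reverse direction:

```python
from collections import defaultdict

def build_graph(edges, directed):
    """Adjacency list: each vertex maps to the vertices it can reach directly."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)       # edge u -> v
        if not directed:
            graph[v].append(u)   # undirected: the edge also works v -> u
    return graph

# Hypothetical one-way road from city A to city B
roads = build_graph([("A", "B")], directed=True)
print("B" in roads["A"])   # True:  A can reach B
print("A" in roads["B"])   # False: there is no road back from B to A

# Hypothetical friendship between Person A and Person B
friends = build_graph([("A", "B")], directed=False)
print("B" in friends["A"], "A" in friends["B"])  # True True
```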
