Data Visualization using Python (24SAD28)
1. Bar Plot: Scenario-Based Question
Scenario:
A university is analyzing the results of its student clubs’ fundraising efforts
over the past semester. Each club (e.g., Debate, Robotics, Art) has reported
the total amount of funds raised. As the data analyst, your task is to present
this information in a way that allows the university council to quickly
compare the fundraising performance of each club and identify which clubs
led or lagged in their efforts.
Question:
You are provided with a list of clubs and the corresponding funds raised by
each. Using Matplotlib, create a bar plot to visualize the fundraising results
for each club.
Clearly label the axes, set a relevant title, and use custom colors for
better visual distinction between the clubs.
After visualizing, interpret which club(s) had standout performances
and suggest what the administration should focus on to boost
fundraising in the underperforming clubs.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
# The CSV file name is missing in the source; substitute the actual path.
df = pd.read_csv('[Link]')
# One custom color per club for visual distinction
plt.bar(df['Club'], df['Funds_Raised'],
        color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd'])
plt.xlabel('Student Clubs')
plt.ylabel('Funds Raised (in USD)')
plt.title('Fundraising Performance by Student Clubs')
plt.show()
# Identify the clubs with the highest and lowest funds raised
top_club = df.loc[df['Funds_Raised'].idxmax(), 'Club']
low_club = df.loc[df['Funds_Raised'].idxmin(), 'Club']
print(f"Top performer: {top_club}")
print(f"Lowest performer: {low_club}")
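The interpretation step can also be supported by a quick numeric check. The sketch below is one possible approach and is not part of the original solution: it uses a small made-up table (the club names and amounts are hypothetical) with the same Club and Funds_Raised columns, and labels each club relative to the mean amount raised.

```python
import pandas as pd

# Hypothetical sample data; the real values come from the CSV file.
df = pd.DataFrame({
    "Club": ["Debate", "Robotics", "Art", "Music", "Chess"],
    "Funds_Raised": [1200, 2500, 800, 1500, 600],
})

mean_funds = df["Funds_Raised"].mean()

# Label each club relative to the average amount raised.
df["Performance"] = df["Funds_Raised"].apply(
    lambda amount: "above average" if amount > mean_funds else "below average"
)

# Sort so that leaders appear first and laggards last.
ranked = df.sort_values("Funds_Raised", ascending=False).reset_index(drop=True)
print(ranked)
```

Clubs labeled "below average" are natural candidates for the administration's attention; the mean is only an assumed cutoff, and a median or a fixed target would work equally well.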
Dept. of AI and DS, MCE, Hassan 1
2. Scatter Plot: Scenario-Based Question
Scenario:
A local health organization has collected data on several individuals,
including the number of hours they exercise each week and their
corresponding cholesterol levels. The organization wants to determine if
there is any visible relationship between exercise and cholesterol, with the
aim of informing health recommendations.
Question:
Using the dataset featuring ‘Hours of Exercise per Week’ (x-axis) and
‘Cholesterol Level’ (y-axis) for a group of individuals, create a scatter plot
with Matplotlib.
Make sure to annotate outliers, use appropriate axis labels and title,
and apply a visual style that makes trends easy to spot.
Based on your plot, discuss whether higher activity levels are
associated with lower cholesterol. What further investigations would
you propose based on the scatter plot’s pattern?
Solution:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('scatter_data.csv')
plt.style.use('seaborn-v0_8')
plt.scatter(df['Hours_Exercise_Per_Week'], df['Cholesterol_Level'], color='blue',
            edgecolors='black', s=80)
# Fit a first-degree polynomial to draw the linear trend line
slope, intercept = np.polyfit(df['Hours_Exercise_Per_Week'],
                              df['Cholesterol_Level'], 1)
trend_line = slope * df['Hours_Exercise_Per_Week'] + intercept
plt.plot(df['Hours_Exercise_Per_Week'], trend_line, color='green', linestyle='--',
         label='Trend Line')
# Annotate the individual with the highest cholesterol level as the outlier
outlier_index = df['Cholesterol_Level'].idxmax()
plt.annotate('Outlier',
             (df['Hours_Exercise_Per_Week'][outlier_index],
              df['Cholesterol_Level'][outlier_index]),
             xytext=(30, 20), textcoords='offset points',
             arrowprops=dict(arrowstyle='->', color='red'))
plt.xlabel('Hours of Exercise per Week')
plt.ylabel('Cholesterol Level (mg/dL)')
plt.title('Exercise vs Cholesterol Level')
plt.legend()
plt.grid(True)
plt.show()
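To discuss whether higher activity is associated with lower cholesterol, the visual impression can be checked numerically. The following is a minimal sketch under stated assumptions: the data values are hypothetical, and the cutoff of two standard deviations of the residuals for flagging outliers is an assumed rule of thumb, not something specified in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the same column names as the solution.
df = pd.DataFrame({
    "Hours_Exercise_Per_Week": [1, 2, 3, 4, 5, 6, 7, 8],
    "Cholesterol_Level": [240, 232, 225, 260, 210, 204, 198, 190],
})

x = df["Hours_Exercise_Per_Week"]
y = df["Cholesterol_Level"]

# Pearson correlation: a negative value means cholesterol tends to fall
# as weekly exercise rises.
r = np.corrcoef(x, y)[0, 1]

# Residuals from the fitted trend line; large residuals are outlier candidates.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
threshold = 2 * residuals.std()  # assumed cutoff: 2 standard deviations
outliers = df[residuals.abs() > threshold]

print(f"Pearson r = {r:.2f}")
print(outliers)
```

A correlation only describes association in the sample; diet, age, and medication are examples of factors a follow-up investigation would need to consider.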
3. Histogram Plot: Scenario-Based Question
Scenario:
A fitness center has tracked the number of daily steps taken by each member
for an entire month. The management wants to understand the overall
distribution of activity among members to identify trends, such as how many
members are highly active versus those who are less active.
Question:
Using the collected step count data for all members, create a histogram plot
with Matplotlib to visualize the distribution of daily steps.
Choose an appropriate number of bins to reveal patterns such as
clusters of activity or notable gaps.
Customize the histogram with an informative title, labeled axes, and a
clear color scheme to distinguish the bars.
After plotting, interpret what the histogram shows about the general
activity levels of the fitness center members. What observations or
recommendations can you make for programs to engage the less active
members?
Solution:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('steps_data.csv')
plt.style.use('seaborn-v0_8')
# Eight bins with black edges so adjacent bars are easy to tell apart
plt.hist(df['Daily_Steps'], bins=8, color='skyblue', edgecolor='black')
plt.xlabel('Daily Steps')
plt.ylabel('Number of Members')
plt.title('Distribution of Daily Steps Among Fitness Center Members')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
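The solution fixes the number of bins at 8. One common way to choose a bin count from the data itself is the Freedman-Diaconis rule, which NumPy exposes through bins="fd". The sketch below is illustrative only: the step counts are randomly generated (hypothetical), and the rule is one option among several, not the method required by the question.

```python
import numpy as np

# Hypothetical step counts generated for illustration only.
rng = np.random.default_rng(seed=0)
daily_steps = rng.normal(loc=8000, scale=2500, size=200).clip(min=0)

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3).
q75, q25 = np.percentile(daily_steps, [75, 25])
bin_width = 2 * (q75 - q25) / len(daily_steps) ** (1 / 3)

# NumPy applies the same rule when asked for bins="fd".
edges = np.histogram_bin_edges(daily_steps, bins="fd")
n_bins = len(edges) - 1

print(f"Suggested bin width: {bin_width:.0f} steps, number of bins: {n_bins}")
```

The resulting edges array can be passed directly as the bins argument of plt.hist in place of the fixed value.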
4. Pie Chart: Scenario-Based Question
Scenario:
A company conducted a survey to determine how employees commute to
work. The results are categorized into methods like driving, cycling, public
transport, walking, and others. The HR department wants to present the
relative usage of each commuting method during a company-wide meeting.
Question:
Given the survey data representing the number of employees for each
commuting method, create a pie chart in Matplotlib to illustrate the
proportion of each method.
Ensure every slice of the pie is labeled clearly, and use distinct colors
for each commuting method for clarity.
Add percentage values on each slice for additional insight, and make
one of the less common methods “stand out” by slightly separating its
slice from the center (explode effect).
After creating the plot, explain which commuting methods are most
and least popular and briefly suggest how this information could
inform future company travel policies.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('commute_data.csv')
# Pull the 'Others' slice slightly away from the center (explode effect)
explode = [0.1 if method == 'Others' else 0 for method in df['Commute_Method']]
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#9467bd', '#d62728']
plt.pie(df['Employees'], labels=df['Commute_Method'], autopct='%1.1f%%',
        startangle=90, colors=colors, explode=explode, shadow=True)
plt.axis('equal')  # keep the pie circular
plt.title('Employee Commute Methods')
plt.show()
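The solution hard-codes 'Others' as the slice to explode. A possible variation, sketched below with hypothetical survey counts, computes each method's percentage share and explodes whichever method turns out to be least common, which also answers the most/least popular part of the question.

```python
import pandas as pd

# Hypothetical survey counts; the real values come from commute_data.csv.
df = pd.DataFrame({
    "Commute_Method": ["Driving", "Cycling", "Public Transport", "Walking", "Others"],
    "Employees": [120, 30, 90, 45, 15],
})

# Share of each method as a percentage of all respondents.
df["Share_Percent"] = (df["Employees"] / df["Employees"].sum() * 100).round(1)

most_popular = df.loc[df["Employees"].idxmax(), "Commute_Method"]
least_popular = df.loc[df["Employees"].idxmin(), "Commute_Method"]

# Explode the least common slice, whatever it happens to be.
explode = [0.1 if m == least_popular else 0 for m in df["Commute_Method"]]

print(df)
print(f"Most popular: {most_popular}, least popular: {least_popular}")
```

The explode list can be passed to plt.pie exactly as in the solution.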
5. Linear Plotting: Scenario-Based Question
Scenario:
An environmental agency is monitoring river water levels at four key
locations along a river's course over the rainy season. The data collected
includes the position (in kilometers from the source) and the corresponding
average water level (in meters) at each site. The agency needs a clear way to
observe how water levels change linearly along the river.
Question:
Given the dataset of position vs. average water level for the four locations,
use Matplotlib to create a linear plot.
Label the x-axis as "Distance from Source (km)" and the y-axis as
"Average Water Level (m)".
Add a meaningful title reflecting the study.
After producing the plot, describe how the water level trends along
the river and identify any anomalies or expected patterns based on
geography.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("05_river_water_levels.csv")
# Connect the four measurement sites with a solid line and circular markers
plt.plot(df['Position_km'], df['Avg_Water_Level_m'], marker='o', linestyle='-',
         color='blue')
plt.xlabel("Distance from Source (km)")
plt.ylabel("Average Water Level (m)")
plt.title("River Water Level Distribution Along Course During Rainy Season")
plt.grid(True)
plt.show()
# Interpretation
'''Water level rises to a peak at 10 km, then drops steadily.
This peak may be due to extra rain or a joining stream,
and needs further checking.'''
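One way to make the search for anomalies concrete is to fit a straight line to the four sites and look at the residuals. This is a minimal sketch with hypothetical measurements (not the values in the CSV file); treating the site with the largest absolute residual as the anomaly is an assumption made for illustration.

```python
import numpy as np

# Hypothetical measurements for four sites (not the values in the CSV).
position_km = np.array([0.0, 10.0, 20.0, 30.0])
water_level_m = np.array([2.0, 3.5, 2.6, 2.0])

# Least-squares straight line through the four points.
slope, intercept = np.polyfit(position_km, water_level_m, 1)
fitted = slope * position_km + intercept

# The site with the largest absolute residual departs most from the trend.
residuals = water_level_m - fitted
anomaly_idx = int(np.argmax(np.abs(residuals)))

print(f"Slope: {slope:.4f} m per km")
print(f"Largest deviation at {position_km[anomaly_idx]:.0f} km")
```

With only four points the fit is rough, so a flagged site is a prompt for further checking rather than a conclusion.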
6. Linear Plotting with Line Formatting: Scenario-Based Question
Scenario:
A telecommunications company is tracking the signal strength of its new
wireless tower installations across different cities. The engineering team
wants to visually compare the strength in each city but is particularly
interested in showing which towers performed above the national average
using a distinctive line style and color.
Question:
Provided with the signal strength (y-axis) data across a sequence of cities
(x-axis), use Matplotlib to plot:
The general signal strength using a basic line.
The national average as a dashed red line across the plot for reference.
For any cities where the signal strength exceeds the national average,
change the marker for those points to a green triangle and use a thicker
line segment.
Ensure the plot includes a legend explaining the formatting choices
and a suitable title.
After visualization, discuss which cities outperformed, and what
strategic steps the company might take next in cities below the national
average.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("06_signal_strength.csv")
cities = df['City']
signal_strength = df['Signal_Strength']
national_avg = df['National_Average'].mean()
plt.figure(figsize=(10, 6))
plt.plot(cities, signal_strength, label='Signal Strength', color='blue',
         linestyle='-', marker='o')
# Dashed red reference line at the national average
plt.axhline(national_avg, color='red', linestyle='--', label='National Average')
# Mark every city above the national average with a green triangle
for i, strength in enumerate(signal_strength):
    if strength > national_avg:
        plt.plot(cities[i], strength, marker='^', color='green', markersize=10)
plt.xlabel("City")
plt.ylabel("Signal Strength (dBm)")
plt.title("Wireless Tower Signal Strength Comparison Across Cities")
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# Interpretation
'''Cities B and E outperformed the national average signal strength,
indicating strong network performance in those regions. The company
should prioritize infrastructure upgrades and signal optimization in
underperforming cities like A, C, and D to ensure consistent service quality.'''
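The question also asks for a thicker line segment where the signal exceeds the national average, which the solution above does not draw. The sketch below shows one possible reading of that requirement: a segment is redrawn thicker only when both of its endpoints are above the average. The city names, readings, and average are hypothetical, and the Agg backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical readings; the real ones come from 06_signal_strength.csv.
cities = ["A", "B", "C", "D", "E", "F"]
signal = [-82, -65, -63, -80, -68, -85]
national_avg = -70

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(cities, signal, color="blue", marker="o", label="Signal Strength")
ax.axhline(national_avg, color="red", linestyle="--", label="National Average")

# Redraw a segment thicker when both of its endpoints exceed the average.
thick_segments = []
for i in range(len(cities) - 1):
    if signal[i] > national_avg and signal[i + 1] > national_avg:
        ax.plot(cities[i:i + 2], signal[i:i + 2], color="green", linewidth=4)
        thick_segments.append((cities[i], cities[i + 1]))

# Green triangles for every city above the average, with a legend entry.
above = [(c, s) for c, s in zip(cities, signal) if s > national_avg]
ax.plot([c for c, _ in above], [s for _, s in above], linestyle="None",
        marker="^", color="green", markersize=10, label="Above Average")
ax.legend()
```

Plotting the triangles in a single call with a label gives the legend an entry that explains the green markers.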
7. Customizing Seaborn Plots with Aesthetic Functions
Scenario:
A city’s public health department is exploring trends in air quality,
temperature, and humidity over several years and wants to communicate
these insights through engaging and professional-quality plots at a science
fair. You are tasked with presenting these data in a compelling style, making
sure different elements are visually customized for clarity and impact.
Question:
Given a dataset of daily air quality, temperature, and humidity
measurements, create a multi-variable line or bar plot using Seaborn.
Use Seaborn’s aesthetic functions (set_style(), set_context(), custom color
palettes, etc.) to match the science fair’s branding.
Set background grids, adjust font sizes, and differentiate lines or bars by
color and linestyle.
After preparing the visualization, explain how each aesthetic adjustment
improved interpretability and how such customization helps
communicate your findings to a non-expert audience.
Solution:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("07_air_quality_temperature_humidity.csv")
df['Date'] = pd.to_datetime(df['Date'])
# Aesthetic settings: grid background, larger fonts, custom palette
sns.set_style("whitegrid")
sns.set_context("talk")
custom_palette = sns.color_palette("Set2")
plt.figure(figsize=(12, 7))
# One line per variable, each with its own color and line style
sns.lineplot(x='Date', y='Air_Quality_Index', data=df, label='Air Quality Index',
             color=custom_palette[0], linestyle='-')
sns.lineplot(x='Date', y='Temperature_C', data=df, label='Temperature (°C)',
             color=custom_palette[1], linestyle='--')
sns.lineplot(x='Date', y='Humidity_Percent', data=df, label='Humidity (%)',
             color=custom_palette[2], linestyle='-.')
plt.title("Daily Environmental Conditions Over Time", fontsize=18)
plt.xlabel("Date", fontsize=14)
plt.ylabel("Measurements", fontsize=14)
plt.xticks(rotation=90)
plt.legend(title='Legend', fontsize=12, loc='upper right')
plt.tight_layout()
plt.show()
# Interpretation
'''Distinct line styles (solid, dashed, dash-dot) and colors, together
with a whitegrid background, improve clarity by making it easy to
distinguish between the variables.'''
8. Bokeh Line Graph with Annotations and Legends
Scenario:
A meteorology team tracks temperature changes during a recent heatwave
and wants to highlight key temperature spikes and compare different cities’
trends. They need an interactive plot where significant events are easily
identified, and each city’s data is clearly distinguished in the legend.
Question:
Provided with time-stamped hourly temperature readings for three cities,
plot a Bokeh line graph:
Annotate record-breaking heat events with text or arrows.
Use a legend to distinguish the cities, with each line styled differently.
Customize the title position, font, and background for enhanced clarity.
After producing the plot, analyze which city experienced the most
extreme fluctuations and describe how the use of annotations and legends
supports your analysis.
Solution:
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.models import Label
from bokeh.palettes import Category10
df = pd.read_csv("08_bokeh_temperature_readings.csv")
cities = ["City1", "City2", "City3"]
columns = ["City1_Temp", "City2_Temp", "City3_Temp"]
colors = Category10[3]
hours = df["Hour"]
output_file("heatwave_plot.html")
p = figure(title="Heatwave Temperature Trends Across Cities",
           x_axis_label="Hour of Day", y_axis_label="Temperature (°C)",
           width=800, height=400)
# Title font size, position, and background
p.title.text_font_size = '16pt'
p.title.align = 'center'
p.title.background_fill_color = "#f0f0f0"
for i, (city, col) in enumerate(zip(cities, columns)):
    temps = df[col]
    p.line(hours, temps, line_width=2, color=colors[i], legend_label=city)
    # Annotate each city's highest reading just above the peak
    max_idx = temps.idxmax()
    max_temp = temps[max_idx]
    max_hour = hours[max_idx]
    label = Label(x=max_hour, y=max_temp + 1,
                  text=f"{city} High: {max_temp}°C",
                  text_font_size="10pt", text_color=colors[i])
    p.add_layout(label)
# Legend: position, click-to-hide behavior, and font size
p.legend.location = "top_left"
p.legend.click_policy = "hide"
p.legend.label_text_font_size = '10pt'
show(p)
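To back up the discussion of which city experienced the most extreme fluctuations, the spread of each temperature column can be summarized with pandas. The sketch below uses a few hypothetical readings with the same column names as the solution; using the range (maximum minus minimum) as the deciding measure is an assumption, and the standard deviation is shown alongside it for comparison.

```python
import pandas as pd

# Hypothetical hourly readings using the column names from the solution.
df = pd.DataFrame({
    "Hour": [0, 6, 12, 18],
    "City1_Temp": [30.0, 33.0, 41.0, 36.0],
    "City2_Temp": [28.0, 30.0, 35.0, 32.0],
    "City3_Temp": [25.0, 31.0, 44.0, 33.0],
})

temp_cols = ["City1_Temp", "City2_Temp", "City3_Temp"]

# Range and standard deviation summarize how much each city's readings swing.
summary = pd.DataFrame({
    "range": df[temp_cols].max() - df[temp_cols].min(),
    "std": df[temp_cols].std(),
})
most_variable = summary["range"].idxmax()

print(summary)
print(f"Largest temperature range: {most_variable}")
```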
9. Plotting Different Types of Plots Using Bokeh
Scenario:
The HR department of a large company is preparing a diversity report,
intending to visualize employee demographics, departmental size, average
years of experience, and gender ratios. Multiple plot types are required for
a comprehensive overview.
Question:
Using Bokeh, create a dashboard-style layout that includes:
A bar chart for the number of employees in each department.
A pie (or wedge) chart showing gender distribution.
A scatter plot comparing years of experience and employee age.
A line chart depicting new hires each quarter.
Briefly justify your choice of plot types and explain how Bokeh’s
interactivity (such as tooltips and hover features) enhances the ability
of stakeholders to explore these diverse datasets.
Solution:
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.transform import cumsum
from math import pi
df = pd.read_csv("09_bokeh_dashboard_data.csv")
output_file("hr_dashboard.html")
departments = df['Department'].tolist()  # categorical axes take a list of factors
# 1. Bar Chart: Employees per Department
bar_source = ColumnDataSource(df)
bar_chart = figure(x_range=departments, title="Employees per Department",
                   height=300, width=400, tools="hover",
                   tooltips="@Department: @Employees employees")
bar_chart.vbar(x='Department', top='Employees', width=0.6, color="skyblue",
               source=bar_source)
bar_chart.xgrid.grid_line_color = None
bar_chart.y_range.start = 0
# 2. Pie Chart: Gender Distribution
gender_totals = {
    'Male': df['Gender_Male'].sum(),
    'Female': df['Gender_Female'].sum()
}
gender_df = (pd.Series(gender_totals)
             .reset_index(name='value')
             .rename(columns={'index': 'gender'}))
gender_df['angle'] = gender_df['value'] / gender_df['value'].sum() * 2 * pi
gender_df['color'] = ["dodgerblue", "lightcoral"]
pie_chart = figure(title="Overall Gender Distribution", height=300, width=400,
                   toolbar_location=None, tools="hover",
                   tooltips="@gender: @value")
pie_chart.wedge(x=0, y=1, radius=0.4,
                start_angle=cumsum('angle', include_zero=True),
                end_angle=cumsum('angle'), line_color="white",
                fill_color='color', legend_field='gender', source=gender_df)
pie_chart.axis.visible = False
pie_chart.grid.visible = False
# 3. Scatter Plot: Experience vs Age
# Tooltips are passed to figure() so that only one hover tool is created
scatter_chart = figure(title="Experience vs Age per Department",
                       x_axis_label="Years of Experience",
                       y_axis_label="Average Age", height=300, width=400,
                       tools="hover",
                       tooltips=[("Department", "@Department"),
                                 ("Experience", "@Years_Experience"),
                                 ("Age", "@Age")])
scatter_chart.scatter(x='Years_Experience', y='Age', size=10,
                      color="mediumseagreen", source=bar_source)
# 4. Line Chart: Quarterly New Hires
line_chart = figure(title="Quarterly New Hires per Department",
                    x_range=departments, y_axis_label="New Hires",
                    height=300, width=400, tools="hover",
                    tooltips=[("Department", "@Department"),
                              ("New Hires", "@Quarters_New_Hires")])
line_chart.line(x='Department', y='Quarters_New_Hires', line_width=2,
                color="orange", source=bar_source)
line_chart.scatter(x='Department', y='Quarters_New_Hires', size=8,
                   color="orange", source=bar_source)
# Layout as dashboard
dashboard = gridplot([[bar_chart, pie_chart], [scatter_chart, line_chart]])
show(dashboard)
10. Drawing 3D Plots Using Plotly Libraries
Scenario:
An engineering research team simulates the motion of a drone through 3D
space, recording its position coordinates over time. They want to visualize
the flight path and analyze spatial patterns and altitude changes.
Question:
Using the dataset that contains time-stamped (x, y, z) positions of the drone,
use Plotly to create an interactive 3D line plot of the flight path.
Color the segments by velocity or altitude changes.
Enable rotation and zoom so users can view the trajectory from any
angle.
Add axis labels and an informative title.
After visualizing, interpret what the 3D aspects of the plot reveal about
the drone’s movement that 2D projections cannot, and suggest one
additional insight made possible only by using a 3D Plotly approach.
Solution:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
df = pd.read_csv("10_drone_flight_path.csv")
# Speed between consecutive samples: distance travelled divided by elapsed time
dx = df["X"].diff()
dy = df["Y"].diff()
dz = df["Z"].diff()
dt = df["Time"].diff().replace(0, np.nan)  # avoid division by zero
velocity = np.sqrt(dx**2 + dy**2 + dz**2) / dt
velocity = velocity.fillna(0)
fig = go.Figure(data=go.Scatter3d(
    x=df["X"],
    y=df["Y"],
    z=df["Z"],
    mode='lines+markers',
    # Markers are colored by altitude
    marker=dict(
        size=4,
        color=df["Z"],
        colorscale='Viridis',
        colorbar=dict(title='Altitude'),
    ),
    # Line segments are colored by velocity
    line=dict(
        color=velocity,
        colorscale='Jet',
        width=4,
        colorbar=dict(title="Velocity"),
    ),
    text=[f"Time: {t}s<br>Velocity: {v:.2f} m/s"
          for t, v in zip(df["Time"], velocity)],
    hoverinfo='text'
))
fig.update_layout(
    title="3D Drone Flight Path with Altitude and Velocity",
    scene=dict(
        xaxis_title='X Position',
        yaxis_title='Y Position',
        zaxis_title='Altitude (Z)',
    ),
    width=800,
    height=600,
    margin=dict(l=0, r=0, b=0, t=40)
)
fig.show()
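One way to quantify what the 3D view shows and a 2D projection hides is to compare the length of the full 3D path with the length of its ground track in the X-Y plane. The sketch below uses a tiny hypothetical flight with the same column names as the solution; the coordinate values and the unit of meters are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical flight: move horizontally, climb vertically, move horizontally.
df = pd.DataFrame({
    "Time": [0, 1, 2, 3],
    "X": [0.0, 3.0, 3.0, 3.0],
    "Y": [0.0, 4.0, 4.0, 8.0],
    "Z": [0.0, 0.0, 12.0, 12.0],
})

# Displacement between consecutive samples (the first row has no predecessor).
steps = df[["X", "Y", "Z"]].diff().dropna()

# Path length in 3D versus the length of the X-Y ground track.
length_3d = np.sqrt((steps ** 2).sum(axis=1)).sum()
length_2d = np.sqrt((steps[["X", "Y"]] ** 2).sum(axis=1)).sum()

# Vertical speed per interval; a purely vertical climb is invisible in X-Y.
climb_rate = df["Z"].diff() / df["Time"].diff()

print(f"3D path length: {length_3d} m, ground track: {length_2d} m")
print(f"Maximum climb rate: {climb_rate.max()} m/s")
```

In this made-up flight the vertical climb accounts for more than half of the distance travelled, yet it would collapse to a single point in a top-down 2D plot.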