Data Visualization Lab Manual

The document is a lab manual for a Data Visualization course at Marathwada Institute of Technology, outlining various practical experiments including visualizing global health data, creating interactive sales dashboards, network analysis, geospatial data visualization, and time series visualization. Each experiment includes steps for defining objectives, acquiring and preparing data, selecting visualization techniques, and creating visualizations using tools like Python and Tableau. The manual serves as a comprehensive guide for students to learn and apply data visualization concepts effectively.

Uploaded by

Tushar Thote
Copyright
© © All Rights Reserved

Marathwada Institute of Technology, CIDCO, Chhatrapati Sambhajinagar

MSc(CS/IT) – I sem
Subject: Data Visualization
Lab Manual

INDEX

S.NO NAME OF THE EXPERIMENT DATE SIGNATURE

1. Visualizing Global Health Data: Analysing and presenting health indicators across countries to identify patterns and disparities.

2. Interactive Sales Dashboard: Designing an interactive dashboard to explore sales data and identify trends, regions, and product performance.

3. Network Analysis: Visualizing social networks or organizational structures to reveal connections and influence patterns.

4. Geospatial Data Visualization: Mapping and analyzing geographic data, such as population density, distribution of resources, or climate patterns.

5. Time Series Visualization: Analyzing temporal data, such as stock prices or weather patterns, to identify trends and make predictions.

Practical No. 1: Visualizing Global Health Data: Analysing and presenting health indicators across countries to identify patterns and disparities.

Step 1: Define the objective


A clear objective is crucial for an effective visualization.
• Goal: To analyze the relationship between a country's wealth and its life expectancy.
• Question: Do wealthier countries have a higher life expectancy at birth than less wealthy countries?
• Hypothesis: There is a positive correlation between Gross Domestic Product (GDP) per capita and life expectancy.

Step 2: Acquire and prepare the data


Global health datasets are available from reliable organizations like the World Health
Organization (WHO), the World Bank, and the Institute for Health Metrics and Evaluation
(IHME).
1. Select indicators:
   a. Health indicator: Life expectancy at birth.
   b. Economic indicator: GDP per capita.

2. Source the data: Obtain a dataset containing life expectancy and GDP per capita for multiple countries and years. For this practical, we will use a hypothetical dataset, but real data can be sourced from the WHO Global Health Observatory or World Bank Open Data.

3. Prepare the data: Ensure the data is clean and in the correct format. This includes:
   a. Matching data from different sources by country and year.
   b. Handling missing data points (e.g., using interpolation or filtering).
   c. Removing inconsistencies to ensure data quality.
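
The preparation steps above can be sketched with pandas. This is a minimal illustration, not a prescribed procedure: the two small tables, their column names, and every value in them are hypothetical. It matches rows by country and year, interpolates gaps that lie between two known values, and filters out rows that remain incomplete.

```python
import pandas as pd

# Hypothetical life expectancy table (None marks a missing value)
life = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "life_expectancy": [70.0, None, 72.0, 60.0, 61.0, None],
})

# Hypothetical GDP per capita table; country "C" has no life expectancy data
gdp = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B", "C"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002, 2000],
    "gdp_per_capita": [30000.0, 31000.0, 32000.0, 2000.0, 2100.0, 2200.0, 500.0],
})

# a. Match the two sources by country and year (inner join keeps only matches)
merged = life.merge(gdp, on=["country", "year"], how="inner")
merged = merged.sort_values(["country", "year"])

# b. Interpolate within each country, only where a gap has known values on both sides
merged["life_expectancy"] = merged.groupby("country")["life_expectancy"].transform(
    lambda s: s.interpolate(limit_area="inside")
)

# c. Filter out rows that are still incomplete
clean = merged.dropna(subset=["life_expectancy", "gdp_per_capita"]).reset_index(drop=True)
print(clean)
```

Interpolating per country avoids borrowing values across country boundaries, and `limit_area="inside"` leaves trailing gaps empty rather than extending the last observation.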

Step 3: Choose the right visualization technique


The choice of visualization depends on the data and the message you want to convey.
• Technique: A scatter plot is an excellent choice for visualizing the relationship between two continuous variables, such as GDP per capita and life expectancy. A third variable, such as population size, can be added by turning the scatter plot into a bubble chart, where the size of each bubble represents the population.

Step 4: Create the visualization


Using a tool like Python with libraries (Matplotlib, Plotly) or a business intelligence tool
like Tableau, you can generate your visualization.
Example using Python with Matplotlib and Pandas
A bubble chart can show the relationship between life expectancy, GDP per capita, and population size in a single figure.
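
A possible version of such a bubble chart is sketched below. The six countries and all of their values are hypothetical placeholders, the column names are assumptions, and the logarithmic x-axis is a design choice made here because GDP per capita spans several orders of magnitude.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataset: replace with WHO or World Bank data
df = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C",
                "Country D", "Country E", "Country F"],
    "gdp_per_capita": [800, 2500, 9000, 22000, 48000, 65000],
    "life_expectancy": [58, 64, 71, 76, 81, 83],
    "population_millions": [30, 210, 140, 45, 67, 9],
})

fig, ax = plt.subplots(figsize=(9, 6))

# Bubble area (in points squared) is proportional to population
ax.scatter(
    df["gdp_per_capita"],
    df["life_expectancy"],
    s=df["population_millions"] * 5,
    alpha=0.6,
    edgecolors="black",
)
ax.set_xscale("log")

# Label each bubble with its country name
for row in df.itertuples():
    ax.annotate(row.country, (row.gdp_per_capita, row.life_expectancy),
                textcoords="offset points", xytext=(6, 6), fontsize=9)

ax.set_xlabel("GDP per capita (log scale)")
ax.set_ylabel("Life expectancy at birth (years)")
ax.set_title("Life expectancy vs. GDP per capita (bubble size = population)")
ax.grid(True, linestyle="--", alpha=0.4)

# Pearson correlation as a quick numeric check of the hypothesis
r = df["gdp_per_capita"].corr(df["life_expectancy"])
print(f"Pearson correlation: {r:.2f}")

fig.canvas.draw()  # render in memory; use fig.savefig("chart.png") to write a file
```

A positive correlation coefficient would be consistent with the hypothesis from Step 1, though with invented data the number itself carries no meaning.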

Practical No. 2: Interactive Sales Dashboard: Designing an interactive dashboard to explore sales data and identify trends, regions, and product performance.

• Define Goals & Key Metrics (KPIs):
  - Identify Questions: What do you need to know? (e.g., "Which products are performing best in the East region?", "What is the sales trend over the last quarter?")
  - Choose Metrics: Select relevant Key Performance Indicators (KPIs) such as total sales, sales growth rate, top-selling products, sales by region, and conversion rates.
• Prepare Your Data:
  - Source & Organize Data: Gather sales data from various sources (e.g., CRM, spreadsheets) and organize it in a clean, structured format, ensuring data accuracy.
  - Use Data Models: For complex data, consider creating a data model (e.g., in Excel's Power Pivot or a BI tool) to link different data tables effectively.
• Choose a BI Tool:
  - Select a suitable tool that allows for creating interactive dashboards. Popular options include Tableau, Qlik, and even advanced features in Microsoft Excel.
• Design the Layout & Visualizations:
  - Layout: Arrange your dashboard with a predictable, easy-to-follow pattern, placing summary information and key charts prominently.
  - Interactive Elements: Incorporate filters, slicers, and drill-down capabilities so users can click on elements to view more detailed information or change the data being displayed.
  - Chart Selection:
    - Trends: Use line charts to show sales trends over time.
    - Product Performance: A bar chart or a treemap can effectively compare product performance.
    - Regions: A geographic map or a bar chart can be used to visualize sales by region.
    - Overall Performance: Use KPI cards for key metrics like total revenue and target achievement.
  - Use Color Strategically: Employ clear color cues to tell data stories quickly and highlight important information.
• Test & Refine:
  - Test the dashboard with users to ensure it is intuitive and provides the insights they need. Gather feedback and make necessary adjustments to improve usability and effectiveness.
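
Before any BI tool is involved, the KPIs named above can be computed directly from a sales table. The sketch below assumes a hypothetical table with `date`, `region`, `product`, and `revenue` columns; the five rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03",
                            "2024-02-18", "2024-02-25"]),
    "region": ["East", "West", "East", "West", "East"],
    "product": ["Widget", "Gadget", "Gadget", "Widget", "Gizmo"],
    "revenue": [100, 150, 200, 50, 125],
})

# KPI: total sales
total_sales = sales["revenue"].sum()

# KPI: sales by region, largest first
sales_by_region = sales.groupby("region")["revenue"].sum().sort_values(ascending=False)

# KPI: top-selling products
top_products = sales.groupby("product")["revenue"].sum().nlargest(2)

# KPI: month-over-month sales growth rate
monthly_sales = sales.groupby(sales["date"].dt.to_period("M"))["revenue"].sum()
growth_rate = monthly_sales.pct_change()

print("Total sales:", total_sales)
print(sales_by_region)
print(top_products)
print(growth_rate)
```

Each result maps onto a dashboard element: the total feeds a KPI card, the regional and product totals feed bar charts, and the monthly series feeds a line chart.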
2. Practical Example (Conceptual Diagram)
Imagine a dashboard with the following sections:

graph TD
    A[Dashboard Header] --> B(Filters: Date, Region, Product)
    A --> C(KPI Cards)
    B --> D(Sales Trends - Line Chart)
    B --> E(Regional Performance - Map/Bar Chart)
    B --> F(Product Performance - Bar Chart/Table)
    C -- Total Sales --> D
    C -- Sales Growth --> D
    C -- Top Products --> F
    C -- Sales by Region --> E

• Header: Displays the dashboard title and last updated date.

• Filters: Allow users to select a specific date range, region, or product to view relevant data.

• KPI Cards: Show key summary metrics at a glance, such as total sales for the period, sales growth, and average deal size.

• Sales Trends (Line Chart): A dynamic chart showing sales over the selected time period, allowing users to spot upward or downward trends.

• Regional Performance (Map/Bar Chart): Visualizes sales by geographical area, enabling quick comparison of regional success.

• Product Performance (Bar Chart/Table): Lists products with their corresponding sales, helping identify high-performing and underperforming items.
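
The flow in the diagram, where filters determine what every panel shows, can be approximated in plain pandas. This is only a sketch of the idea: `filter_sales` and `dashboard_panels` are hypothetical helper names, and the sample rows are invented. A real dashboard tool would wire the same logic to interactive widgets.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03",
                            "2024-02-18", "2024-02-25"]),
    "region": ["East", "West", "East", "West", "East"],
    "product": ["Widget", "Gadget", "Gadget", "Widget", "Gizmo"],
    "revenue": [100, 150, 200, 50, 125],
})


def filter_sales(df, start=None, end=None, region=None, product=None):
    """Apply the dashboard filters; a filter left as None is ignored."""
    mask = pd.Series(True, index=df.index)
    if start is not None:
        mask &= df["date"] >= pd.Timestamp(start)
    if end is not None:
        mask &= df["date"] <= pd.Timestamp(end)
    if region is not None:
        mask &= df["region"] == region
    if product is not None:
        mask &= df["product"] == product
    return df[mask]


def dashboard_panels(df):
    """Compute the data behind each panel from the filtered rows."""
    return {
        "kpi_total_sales": df["revenue"].sum(),
        "sales_trend": df.groupby("date")["revenue"].sum(),
        "regional_performance": df.groupby("region")["revenue"].sum(),
        "product_performance": df.groupby("product")["revenue"].sum()
                                 .sort_values(ascending=False),
    }


# Selecting "East" in the region filter updates every panel at once
panels = dashboard_panels(filter_sales(sales, region="East"))
print(panels["kpi_total_sales"])
print(panels["product_performance"])
```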

Practical No. 3: Network Analysis: Visualizing social networks or organizational structures to reveal connections and influence patterns.

A network visualization, often called a sociogram, represents individuals or entities as nodes and their relationships as edges. Key insights are gained by analyzing features such as:
• Centrality: Identifying the most important or influential nodes.
• Clustering: Finding tightly-knit groups or communities.
• Pathways: Tracing how information or influence flows through the network.

Practical application: Organizational influence analysis


This practical exercise demonstrates how to use a network visualization to analyze
communication patterns within a company. The goal is to identify the most influential
employees and understand the informal social structure, which may differ from the
formal organizational chart.
Step 1: Data collection
First, you need data on who communicates with whom. For an internal company
analysis, this data can be collected through a confidential survey.
Survey questions:
• "Which three colleagues do you most frequently interact with on work-related matters?"
• "If you needed advice on a new project, who are the top two people you would go to?"

Step 2: Data preparation


Organize the survey responses into a simple two-column format listing the sender and receiver of each interaction. This is your "edge list" and will look something like this:

Source (Sender)    Target (Receiver)
Jane               Alice
Mike               Jane
Jane               Frank
Alice              David
David              Jane

Step 3: Tool selection


For this visualization, you can use a tool like Gephi (a dedicated network analysis tool)
or Python libraries such as NetworkX and Matplotlib.
Step 4: Building and visualizing the network
Using a Python script with the NetworkX and Matplotlib libraries, you can create the
network visualization.

import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd

# Create a DataFrame from the prepared data
data = {
    'Source': ['Jane', 'Mike', 'Jane', 'Alice', 'David'],
    'Target': ['Alice', 'Jane', 'Frank', 'David', 'Jane']
}
df = pd.DataFrame(data)

# Create a directed graph from the DataFrame
G = nx.from_pandas_edgelist(df, 'Source', 'Target',
                            create_using=nx.DiGraph())

# Calculate node metrics (e.g., centrality)
degree_centrality = nx.degree_centrality(G)
node_size = [v * 1000 for v in degree_centrality.values()]
node_colors = ['red' if node == 'Jane' else 'skyblue' for node in G.nodes()]

# Create the layout
pos = nx.spring_layout(G, seed=42)  # For a reproducible layout

# Draw the network
plt.figure(figsize=(10, 8))
nx.draw(G, pos, with_labels=True, node_color=node_colors,
        node_size=node_size,
        edge_color='gray', arrows=True, arrowsize=20, font_size=12)
plt.title('Organizational Communication Network')
plt.show()

Step 5: Diagram interpretation
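
Reading the diagram can be backed by numbers. The sketch below rebuilds the five-edge list from Step 2 and computes one possible measure for each feature named at the start of this practical: in-degree and betweenness centrality for influence, strongly connected components as a rough stand-in for tightly-knit groups, and a shortest path for information flow. These metric choices are assumptions made for illustration; other measures may suit a real survey better.

```python
import networkx as nx

# Edge list from Step 2: (sender, receiver)
edges = [
    ("Jane", "Alice"),
    ("Mike", "Jane"),
    ("Jane", "Frank"),
    ("Alice", "David"),
    ("David", "Jane"),
]
G = nx.DiGraph(edges)

# Centrality: who receives the most incoming ties?
in_centrality = nx.in_degree_centrality(G)
most_consulted = max(in_centrality, key=in_centrality.get)

# Centrality: who sits on the most shortest paths between others?
betweenness = nx.betweenness_centrality(G)
main_broker = max(betweenness, key=betweenness.get)

# Clustering: groups where everyone can reach everyone else
groups = list(nx.strongly_connected_components(G))
largest_group = max(groups, key=len)

# Pathways: how would a message travel from Mike to David?
path = nx.shortest_path(G, source="Mike", target="David")

print("Most consulted:", most_consulted)
print("Main broker:", main_broker)
print("Largest mutually reachable group:", sorted(largest_group))
print("Path from Mike to David:", " -> ".join(path))
```

On this small edge list, Jane has the most incoming ties and lies on most of the paths between other people, which matches her highlighted position in the drawing.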


Practical No. 4: Geospatial Data Visualization: Mapping and analyzing geographic data, such as population density, distribution of resources, or climate patterns.

1. Data acquisition
This is the first step, where the necessary geographic data is gathered from various sources.
• Vector data: Represents discrete features using points (e.g., specific locations like schools), lines (e.g., roads, rivers), and polygons (e.g., country boundaries, land parcels).
• Raster data: Represents continuous phenomena as a grid of cells, such as satellite imagery, aerial photographs, or digital elevation models (DEMs).
• Attribute data: Non-spatial information related to geographic features, such as population counts for a city or resource levels for a region.
• Sources: Global Positioning System (GPS) devices, remote sensing satellites, public databases (e.g., census data), and crowdsourced mapping platforms.

2. Data processing and storage


Once collected, data is cleaned, transformed, and organized to ensure accuracy and compatibility.
• Integration and cleaning: Combining datasets from different sources and formats. This often involves correcting errors and filling in missing data.
• Geospatial database: The cleaned data is stored in a specialized database, such as PostGIS, that can handle complex geospatial data types and queries.

3. Geospatial analysis
This stage involves applying quantitative techniques to interpret the geographical data. For mapping and analyzing population and resource data, key techniques include:
• Spatial query: Answering location-based questions like "Which cities have a population over 100,000?" or "How many resources are within a 50-mile radius of a particular point?"
• Statistical analysis: Measuring relationships and patterns, for example, correlating high population density with low access to a specific resource.
• Density analysis: Calculating and mapping the concentration of a phenomenon, such as population density, to identify "hotspots" and "coldspots".
• Spatial modeling: Creating predictive models, such as forecasting resource needs based on projected population growth.
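
The two spatial queries quoted above, plus a basic density calculation, can be sketched with the standard library alone. The three cities, their coordinates, populations, and areas are hypothetical. Distances use the haversine great-circle formula with a mean Earth radius of 3958.8 miles, which treats the Earth as a sphere and is therefore an approximation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical point features with attribute data
cities = [
    {"name": "Alpha", "lat": 19.88, "lon": 75.34, "population": 1_200_000, "area_sq_miles": 60.0},
    {"name": "Beta", "lat": 20.20, "lon": 75.50, "population": 80_000, "area_sq_miles": 10.0},
    {"name": "Gamma", "lat": 21.50, "lon": 77.00, "population": 350_000, "area_sq_miles": 35.0},
]

# Attribute query: which cities have a population over 100,000?
large_cities = [c["name"] for c in cities if c["population"] > 100_000]

# Spatial query: which cities lie within 50 miles of a reference point?
ref_lat, ref_lon = 19.88, 75.34
nearby = [c["name"] for c in cities
          if haversine_miles(ref_lat, ref_lon, c["lat"], c["lon"]) <= 50]

# Density analysis: people per square mile
density = {c["name"]: c["population"] / c["area_sq_miles"] for c in cities}

print("Population over 100,000:", large_cities)
print("Within 50 miles:", nearby)
print("Density:", density)
```

A geospatial database such as PostGIS would normally answer these queries with spatial indexes; the loop here only illustrates the logic.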

4. Data visualization
The processed and analyzed data is translated into visual forms that reveal patterns and insights. Common map types include:
• Choropleth maps: Use color shading to represent different values across predefined geographic areas like counties or states. This is ideal for visualizing population density across different regions.
• Heatmaps: Use color gradients to represent the density or intensity of events or attributes, revealing clusters or areas of high concentration. This is effective for showing resource distribution or disease outbreaks.
• Proportional symbol maps: Use symbols of varying sizes to represent the magnitude of a variable at a specific location. For example, a larger circle could indicate a larger population.
• Interactive maps: Allow users to pan, zoom, and click on features to explore data in more detail. Tools like Leaflet or Folium allow for the creation of web-based interactive maps.
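
Two of the map types above rest on a simple encoding rule that can be shown without a mapping library. In this sketch, the class breaks, the four hex shades, and the function names are all assumptions chosen for illustration. A choropleth assigns each value to a class and shades it accordingly, while a proportional symbol map scales the circle's area with the value, so the radius grows with the square root.

```python
import bisect
import math

# Assumed class breaks (people per square mile) and shades from light to dark
BREAKS = [50, 200, 1000]
SHADES = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def choropleth_shade(value):
    """Return the shade of the class that the value falls into."""
    return SHADES[bisect.bisect_right(BREAKS, value)]


def symbol_radius(value, max_value, max_radius=30.0):
    """Scale circle area with the value, so radius follows the square root."""
    return max_radius * math.sqrt(value / max_value)


# Hypothetical regional densities
densities = {"North": 12, "East": 85, "South": 240, "West": 1500}
shades = {region: choropleth_shade(d) for region, d in densities.items()}
print(shades)

# A location with a quarter of the maximum value gets half the radius
print(symbol_radius(25, 100))
```

Scaling the radius directly with the value would exaggerate large values, because readers tend to judge circles by area.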

5. Interpretation and communication


The final, crucial step is to interpret the visualized data to draw conclusions and communicate the findings.
• Identify patterns: The visual representations allow for the easy identification of spatial patterns, such as urban concentration or uneven resource distribution.
• Generate insights: Use the identified patterns to inform decisions, such as where to allocate resources or plan new infrastructure projects.
• Share results: The final maps, reports, and interactive dashboards are shared with stakeholders to guide policy, strategy, or public awareness.

Practical No. 5: Time Series Visualization: Analyzing temporal data, such as stock prices or weather patterns, to identify trends and make predictions.

Time series visualization is the graphical representation of data points ordered chronologically to
identify trends, seasonal patterns, and anomalies. These visualizations are critical for analyzing
temporal data, such as stock prices or weather patterns, which helps in making informed
predictions.
Core components of time series data
Time series analysis often involves breaking the data into four components:
• Trend: The long-term, overall direction of the data, which can be upward, downward, or stable.
• Seasonality: Regular, repeating patterns that occur at fixed intervals, such as daily, weekly, or yearly. For example, retail sales peaking during holiday seasons.
• Cyclicity: Fluctuations that occur over longer, less fixed periods than seasonal patterns, often related to business or economic cycles.
• Irregularity (Noise): The random, unpredictable fluctuations in the data that are not explained by the other components.

Key visualization techniques

Different chart types are used to highlight specific aspects of the data.

• Line charts: The most common visualization, ideal for showing a variable's value over time. They are effective for spotting trends and identifying sudden shifts or anomalies.
• Moving averages: Smooth out short-term noise to reveal the underlying long-term trend more clearly.
• Decomposition plots: Break down the time series into its trend, seasonal, and residual components, providing deeper insight into the data's structure.
• Heatmaps: Excellent for visualizing patterns across two time dimensions, such as hours of the day versus days of the week. Color intensity can indicate the value of the metric.
• Seasonal subseries plots: Organize data by season (e.g., month or quarter) to make recurring seasonal patterns easier to compare and analyze.
• Autocorrelation plots (ACF): Help identify seasonality and lagged relationships by showing the correlation between a time series and a lagged version of itself.
• Box plots: Useful for visualizing the distribution of data grouped by time intervals (e.g., year) to understand the median, quartiles, and outliers.
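
The quantities behind several of these charts can be computed with pandas before anything is plotted. The sketch below builds a synthetic monthly series (a linear trend plus a 12-month sine wave, both invented for illustration) and derives a moving average, two autocorrelation values, and per-month means of the kind a seasonal subseries plot would display.

```python
import math
import pandas as pd

# Synthetic monthly series: linear trend + yearly seasonality (no noise)
index = pd.date_range("2020-01-01", periods=48, freq="MS")
values = [0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
series = pd.Series(values, index=index, name="metric")

# Moving average: a 12-month window spans one full season, leaving the trend
moving_avg = series.rolling(window=12).mean()

# Autocorrelation: difference first so the trend does not dominate
detrended = series.diff().dropna()
acf_lag12 = detrended.autocorr(lag=12)  # one full season apart
acf_lag6 = detrended.autocorr(lag=6)    # half a season apart

# Seasonal subseries: average value for each calendar month
by_month = series.groupby(series.index.month).mean()

print(moving_avg.dropna().head())
print(f"ACF at lag 12: {acf_lag12:.2f}, at lag 6: {acf_lag6:.2f}")
print(by_month)
```

A strong positive autocorrelation at lag 12 together with a strong negative one at lag 6 is the signature of a yearly cycle. On real data the values would be less extreme because of noise.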

Tools for time series visualization


A variety of tools are available, ranging from programming libraries to specialized dashboards.

• Python: The libraries Matplotlib, Seaborn, and Plotly are widely used for creating custom plots. Pandas is essential for data manipulation.
• R: The ggplot2 package is a powerful tool for creating high-quality, customizable time series visualizations.
• Tableau and Microsoft Power BI: Business intelligence platforms with graphical user interfaces that allow for creating interactive and professional time series dashboards.
• Grafana: An open-source analytics platform for creating dashboards to monitor real-time metrics from a variety of data sources.
• Excel: A simple and accessible tool for creating basic time series visualizations like line and bar graphs.
