Rajarata University of Sri Lanka
Department of Computing
DATA VISUALIZATION WITH MATPLOTLIB
Table of Contents
Principles of Analytic Graphics
Activities
Activity 01: Line Chart
Activity 02: Scatter Plot
Activity 3: Using Data in MS Excel
Activity 4: Histogram
Activity 5(i): Bubble Chart
Activity 5(ii)
Activity 6: Emulating ggplot
Activity 7: Multiple Subplots
Activity 8: Exporting Plots
Activity 9(i): Bar Chart
Activity 9 (ii): Differentiate between the networks by applying different colors
Activity 10: Scatter Chart with trend line (Using Plotly Express)
Activity 11: Scatter Chart with multiple subplot (Using Plotly Express)
Activity 12: Animated Chart
Activity 13: Map
Activity 14: Map with geopandas library
Principles of Analytic Graphics
Principle 1: Show Comparisons
Principle 2: Show causality, mechanism, explanation, systemic structure
Principle 3: Show Multivariate data
Principle 4: Integration of evidence
Principle 5: Describe & document the evidence with appropriate labels, scales, sources, etc.
Principle 6: Content is king
Matplotlib is a commonly used visualization library in Python. You need to import matplotlib before you can use it in your code.
If the matplotlib and pandas libraries are not installed, please install them first. You can refer to the link below for guidance.
(Link: https://www.youtube.com/watch?v=YDqcGxo_4WQ)
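Before starting the activities, it can help to confirm that the required libraries are installed. The sketch below is one possible check that uses only the Python standard library. The helper name missing_packages is illustrative and is not part of matplotlib or pandas.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Libraries used in this lab sheet (openpyxl lets pandas read .xlsx files).
required = ["matplotlib", "pandas", "openpyxl", "plotly"]
missing = missing_packages(required)
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Any library reported as missing can usually be installed with pip, for example pip install matplotlib.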
Activities
Activity 01: Line Chart
Enter the code below and check the output.
import matplotlib.pyplot as plt
year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
plt.plot(year, pop)
plt.show()
Output:
Activity 02: Scatter Plot
import matplotlib.pyplot as plt
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78, 77, 85, 86]
plt.scatter(x, y, c="blue")
# To show the plot
plt.show()
Activity 3: Using Data in MS Excel
Note: Install the openpyxl library.
Create sample data in an MS Excel sheet as shown below. Then type the code and check the output.
Country Female Literacy Fertility Population
Afghanistan 53% 1.8 50
Albania 42% 2.4 100
Algeria 57% 3.3 150
Andorra 61% 3.5 200
Angola 41% 1.8 250
Anguilla 44% 2.1 300
Antigua and Barbuda 48% 1.7 350
Argentina 62% 3.2 50
Armenia 28% 2.2 100
Australia 35% 1.7 150
Austria 45% 1.8 200
Azerbaijan 48% 2.4 250
Bahamas 25% 3.3 300
Bahrain 56% 3.5 350
Bangladesh 24% 1.8 50
Barbados 51% 2.2 100
Chad 57% 1.8 150
Chile 42% 2.4 200
China 57% 3.3 250
Colombia 61% 3.5 300
Comoros 41% 1.8 350
Congo 44% 2.2 200
Cook Islands 28% 1.7 250
Costa Rica 35% 1.8 300
Côte d'Ivoire 45% 2.4 150
Croatia 48% 3.3 200
Cuba 25% 3.5 250
Cyprus 56% 1.8 300
Czechia 24% 2.2 350
Denmark 35% 2.4 150
Djibouti 45% 1.7 200
Dominica 48% 3.2 250
Dominican Republic 25% 2.2 450
Ecuador 56% 1.7 350
Egypt 35% 1.8 400
El Salvador 45% 2.4 120
Equatorial Guinea 48% 3.3 350
Eritrea 25% 3.5 150
Estonia 56% 1.8 200
Eswatini 45% 2.2 250
Ethiopia 48% 1.8 100
Fiji 25% 2.4 300
Finland 56% 3.3 350
France 24% 3.5 150
Gabon 35% 1.8 200
Gambia 45% 2.4 250
Georgia 48% 3.3 300
Germany 25% 3.5 120
Ghana 56% 1.8 150
Greece 35% 2.2 200
Grenada 45% 1.8 250
Guatemala 48% 2.2 300
Guinea 25% 1.8 300
Guinea-Bissau 56% 2.4 350
Guyana 24% 1.8 150
Haiti 35% 2.4 200
Holy See 45% 3.3 200
Honduras 48% 3.5 300
Hungary 25% 1.8 120
Iceland 56% 2.2 400
India 35% 1.8 420
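If the Female Literacy values were typed into Excel as text (for example 53%) rather than as numbers formatted as percentages, pandas would read them as strings and the scatter plot would treat them as categories. A minimal sketch of a conversion helper, assuming that situation; the name percent_to_float is hypothetical.

```python
def percent_to_float(value):
    """Convert a text percentage such as '53%' to the fraction 0.53.

    Assumption: numeric cells already hold fractions, so they are
    returned unchanged (only converted to float).
    """
    if isinstance(value, str):
        return float(value.strip().rstrip("%")) / 100
    return float(value)


# Text values are converted; numeric values pass through unchanged.
print(percent_to_float("53%"))
print(percent_to_float(0.42))
```

With pandas, the helper could be applied to a whole column using population_literacy['Female Literacy'].map(percent_to_float).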
import matplotlib.pyplot as plt
import pandas as pd
# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\HpUser\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')
# Rename columns to remove white spaces and inconsistent capitalization
population_literacy.rename(columns={'Country ': 'Country', 'Continent': 'Continent',
    'female literacy': 'Female Literacy', 'fertility': 'Fertility', 'population': 'Population'}, inplace=True)
plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'])
plt.show()
Output
Activity 4: Histogram
import matplotlib.pyplot as plt
import pandas as pd
# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\Hp User\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')
# Rename columns to remove white spaces and inconsistent capitalization
population_literacy.rename(columns={'Country ': 'Country', 'Continent': 'Continent',
    'female literacy': 'Female Literacy', 'fertility': 'Fertility', 'population': 'Population'}, inplace=True)
#plt.scatter(population_literacy['Female Literacy'],population_literacy['Fertility'])
# Fill missing values in the Population column with the median value.
population_literacy['Population'] = population_literacy['Population'].fillna(
    population_literacy['Population'].median())
plt.hist(population_literacy['Population'], bins=5)
plt.show()
Output
Activity 5(i): Bubble Chart
import matplotlib.pyplot as plt
import pandas as pd
# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\Hp User\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')
# Rename columns to remove white spaces and inconsistent capitalization
population_literacy.rename(columns={'Country ': 'Country', 'Continent': 'Continent',
    'female literacy': 'Female Literacy', 'fertility': 'Fertility', 'population': 'Population'}, inplace=True)
plt.scatter(population_literacy['Female Literacy'],population_literacy['Fertility'],
s= population_literacy['Fertility'] ** 3,marker='o',c=population_literacy['Fertility'])
plt.show()
Output
Activity 5(ii)
Change the marker to marker='x' and check the output
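One way to make this change is sketched below. It uses the first five rows of the sample table in place of the Excel file so that it runs on its own, with literacy written as fractions. The function names bubble_sizes and draw_bubble_chart are illustrative only.

```python
def bubble_sizes(fertility, power=3):
    """Marker areas for the bubble chart: each fertility value raised to `power`."""
    return [value ** power for value in fertility]


def draw_bubble_chart(literacy, fertility, marker="x"):
    """Draw the Activity 5 scatter plot with the chosen marker."""
    # Imported here so that bubble_sizes can be used without matplotlib.
    import matplotlib.pyplot as plt

    plt.scatter(literacy, fertility, s=bubble_sizes(fertility),
                marker=marker, c=fertility)
    plt.show()


# First five rows of the sample table.
literacy = [0.53, 0.42, 0.57, 0.61, 0.41]
fertility = [1.8, 2.4, 3.3, 3.5, 1.8]
# Call draw_bubble_chart(literacy, fertility, marker="x") to display the chart.
```

Passing marker="o" instead reproduces the markers used in Activity 5(i).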
Output
Activity 6: Emulating ggplot
import matplotlib.pyplot as plt
import pandas as pd
#Emulate ggplot
plt.style.use('ggplot')
# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\HpUser\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')
plt.scatter(population_literacy['Female Literacy'],population_literacy['Fertility'],
s= population_literacy['Fertility'] ** 4,marker='o',c=population_literacy['Fertility'])
#add Title
plt.title('Female Literacy vs. Fertility')
#Add x axis label
plt.xlabel('Literacy')
#add y axis label
plt.ylabel('# of Children')
plt.show()
Output
Activity 7: Multiple Subplots
import matplotlib.pyplot as plt
import pandas as pd
year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\Hp User\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')
# Create the figure and the array ax containing the subplots
fig, ax = plt.subplots(nrows=2, ncols=2)
# Access the first subplot: upper left
plt.subplot(2,2,1)
plt.plot(year, pop)
# Access the second subplot: upper right
plt.subplot(2,2,2)
plt.scatter(population_literacy['Female Literacy'],population_literacy['Fertility'])
# Access the third subplot: lower left
plt.subplot(2,2,3)
plt.hist(population_literacy['Population'], bins=5)
# Access the fourth subplot: lower right
plt.subplot(2,2,4)
plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'],
s =population_literacy['Fertility'] ** 3, marker='o', c=population_literacy['Fertility'])
plt.show()
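plt.subplots(nrows=2, ncols=2) returns the figure together with a 2 x 2 array of axes, while plt.subplot(2, 2, k) numbers the same cells from 1 to 4, row by row. A minimal sketch of how the two ways of addressing a cell relate. The names subplot_position and draw_grid are hypothetical helpers, not matplotlib functions.

```python
def subplot_position(index, ncols):
    """Map a 1-based plt.subplot index to a (row, column) pair in the axes array."""
    return divmod(index - 1, ncols)


def draw_grid(year, pop):
    """Draw into two cells of a 2 x 2 grid through the axes array."""
    # Imported here so that subplot_position can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(nrows=2, ncols=2)
    row, col = subplot_position(1, ncols=2)  # upper left
    ax[row, col].plot(year, pop)
    row, col = subplot_position(4, ncols=2)  # lower right
    ax[row, col].scatter(year, pop)
    plt.show()


# Show which array cell each subplot number refers to.
for k in range(1, 5):
    print(k, subplot_position(k, ncols=2))
```

Calling draw_grid(year, pop) with the lists from this activity draws the line chart in the upper left cell and a scatter plot in the lower right cell.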
Activity 8: Exporting Plots
import matplotlib.pyplot as plt
import pandas as pd
year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\Hp User\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')
# Create the figure and the array ax containing the subplots
fig, ax = plt.subplots(nrows=2, ncols=2)
# Access the first subplot: upper left
plt.subplot(2,2,1)
plt.plot(year, pop)
# Access the second subplot: upper right
plt.subplot(2,2,2)
plt.scatter(population_literacy['Female Literacy'],population_literacy['Fertility'])
# Access the third subplot: lower left
plt.subplot(2,2,3)
plt.hist(population_literacy['Population'], bins=5)
# Access the fourth subplot: lower right
plt.subplot(2,2,4)
plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'],
s =population_literacy['Fertility'] ** 3, marker='o', c=population_literacy['Fertility'])
plt.savefig('D:\\subplot.png')
plt.show()
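plt.savefig writes the current figure to a file, and the file extension selects the output format. It is called before plt.show() in the listing above because the figure may no longer be available once the plot window has been closed. A minimal sketch of building the output path with pathlib instead of a hard-coded drive letter. The names output_path and save_current_figure, and the plots folder, are assumptions made for illustration.

```python
from pathlib import Path


def output_path(directory, name, extension="png"):
    """Build <directory>/<name>.<extension> without touching the file system."""
    return Path(directory) / f"{name}.{extension}"


def save_current_figure(directory, name, dpi=150):
    """Save the current matplotlib figure, creating the folder if needed."""
    # Imported here so that output_path can be used without matplotlib.
    import matplotlib.pyplot as plt

    target = output_path(directory, name)
    target.parent.mkdir(parents=True, exist_ok=True)
    plt.savefig(target, dpi=dpi, bbox_inches="tight")
    return target


print(output_path("plots", "subplot"))
```

In the activity, save_current_figure("plots", "subplot") could take the place of the plt.savefig line, as long as it is still called before plt.show().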
Activity 9(i): Bar Chart
import pandas as pd
import plotly.express as px
# Change the location of the CSV file according to yours
Phone_Data = pd.read_csv('C:\\Users\\Hp User\\Downloads\\Phone_Data.csv')
# Get total duration for each network
total_duration_by_network = Phone_Data.groupby('network')['duration'].sum()
# Convert the Series into a DataFrame as required by Plotly Express
total_duration_by_network = total_duration_by_network.to_frame('duration').reset_index()
# The index was already reset above, so the DataFrame is passed directly
bar_chart = px.bar(total_duration_by_network, x="network", y="duration")
bar_chart.show()
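The groupby('network')['duration'].sum() step adds up the call durations separately for each network. The same calculation can be sketched in plain Python to show what it computes. The records below are made-up values for illustration, not rows from Phone_Data.csv, and total_by_key is a hypothetical helper.

```python
from collections import defaultdict


def total_by_key(records, key, value):
    """Sum record[value] separately for each distinct record[key]."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[value]
    return dict(totals)


# Made-up call records (network name, call duration).
calls = [
    {"network": "A", "duration": 10.0},
    {"network": "B", "duration": 4.0},
    {"network": "A", "duration": 2.5},
]
print(total_by_key(calls, "network", "duration"))
```

Each key of the resulting dictionary corresponds to one bar in the chart, and its value gives the height of that bar.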
Activity 9 (ii): Differentiate between the networks by applying different colors
import pandas as pd
import plotly.express as px
# Change the location of the CSV file according to yours
Phone_Data = pd.read_csv('C:\\Users\\Hp User\\Downloads\\Phone_Data.csv')
# Get total duration for each network
total_duration_by_network = Phone_Data.groupby('network')['duration'].sum()
# Convert the Series into a DataFrame as required by Plotly Express
total_duration_by_network = total_duration_by_network.to_frame('duration').reset_index()
# color="network" gives each network its own bar color
bar_chart = px.bar(total_duration_by_network, x="network", y="duration", color="network")
bar_chart.show()
Activity 10: Scatter Chart with trend line (Using Plotly Express)
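Scatter plots in Plotly Express support linear and non-linear trend lines. The trendline="ols" option used here fits an ordinary least squares regression line.
Note: Install the statsmodels library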
import plotly.express as px
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()
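An ordinary least squares trend line is the straight line that minimizes the sum of squared vertical distances to the points. A minimal sketch of that calculation in plain Python, for illustration only. The helper fit_line is hypothetical and is not part of Plotly, and the sample points are made up so that they lie exactly on a line.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up points lying on the line y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

For these points the fitted slope and intercept recover the line the data were generated from. With real data such as the tips dataset, the points scatter around the fitted line instead.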
Activity 11: Scatter Chart with multiple subplot (Using Plotly Express)
Scatter charts are easy to plot with Plotly Express. Multiple subplots can be created directly from the scatter function by passing the facet_col and facet_row arguments.
import plotly.express as px
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", color="smoker",
facet_col="sex", facet_row="time")
fig.show()
Activity 12: Animated Chart
import plotly.express as px
df = px.data.gapminder()
fig = px.bar(df, x="continent", y="pop", color="continent",
animation_frame="year", animation_group="country", range_y=[0,4000000000])
fig.show()
Please click Animated Chart_1.wmv to see the animation.
Activity 13: Map
Generate a map from a preloaded dataset in Plotly Express. Use the px.data.gapminder() function to load the Gapminder dataset, which records life expectancy, population, and GDP per capita for countries around the world.
import plotly.express as px
df = px.data.gapminder().query("year==2007")
fig = px.scatter_geo(df, locations="iso_alpha", color="continent",
hover_name="country", size="pop",
projection="natural earth")
fig.show()
Activity 14: Map with geopandas library
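Generate a choropleth map from a preloaded dataset in Plotly Express. Use the px.data.election() and px.data.election_geojson() functions to load the election results and the matching GeoJSON district boundaries.
Note: Install the geopandas library.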
import plotly.express as px
import geopandas as gpd
df = px.data.election()
geo_df = gpd.GeoDataFrame.from_features(
px.data.election_geojson()["features"]
).merge(df, on="district").set_index("district")
fig = px.choropleth(geo_df,
geojson=geo_df.geometry,
locations=geo_df.index,
color="Joly",
projection="mercator")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
Output: