Unit 3 CHP 1

Matplotlib is a Python library used for data visualization and plotting graphs. It provides various plotting functions like bar(), pie(), scatter() etc. to create different types of plots and graphs. Pyplot is a Matplotlib submodule that provides simple functions for adding plot elements like lines, images, text etc. to figures. Matplotlib can be used to create bar graphs, pie charts, line graphs and other visualizations from data. It allows customization of colors, labels, legends and other plot elements.

Matplotlib and Pyplot

1. Matplotlib is a low-level graph plotting library in Python that serves as a visualization
utility. One of the greatest benefits of visualization is that it gives us visual access to
huge amounts of data in an easily digestible form.
2. Matplotlib was created by John D. Hunter in the year 2002.
3. Matplotlib is open source and we can use it freely.
4. Matplotlib is mostly written in Python; a few segments are written in C, Objective-C
and JavaScript for platform compatibility.
5. Pyplot is a Matplotlib submodule that provides simple functions for adding plot
elements, such as lines, images, text, etc. to the axes in the current figure.
6. Most of the Matplotlib utilities lie under the pyplot submodule, and are usually
imported under the plt alias:

import matplotlib.pyplot as plt

7. Matplotlib can be installed using pip. The following command is run in the command
prompt to install Matplotlib.

pip install matplotlib

8. Uses:
a. Matplotlib is extremely powerful because it allows users to create numerous and
diverse plot types.
b. It can be used in a variety of user interfaces such as IPython shells, Python scripts,
Jupyter notebooks, as well as web applications and GUI toolkits.
c. It supports LaTeX-formatted labels and text, and gives fine control over every
aspect of a figure or a plot.
d. It supports high quality output in various formats including PNG, SVG and PDF.
e. One of the key features of Matplotlib is the possibility to use a programmatic
approach in which graphs are created by writing code. You control every aspect of
their appearance instead of manually creating graphs using a graphical user
interface. This is extremely important because programmatically created
graphics can be made reproducible or easily adjusted when data is updated and are
time-saving, as there is no need to redo lengthy and tedious procedures in a GUI.
f. Matplotlib is open source and therefore data scientists and developers can use it for
free.
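
Point 5 above says that pyplot functions act on the axes in the current figure. A minimal sketch of that idea follows. The Agg backend is an assumption made only so that the code runs without a display, and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

fig = plt.figure()               # this figure becomes the "current figure"
plt.plot([1, 2, 3], [2, 4, 1])   # pyplot adds axes to it and draws one line
ax = plt.gca()                   # gca() = "get current axes"

same_figure = plt.gcf() is fig   # gcf() = "get current figure"
n_lines = len(ax.lines)          # line objects drawn on the current axes
print(same_figure, n_lines)
plt.close(fig)
```

Every later pyplot call (title(), xlabel(), bar() and so on) targets these same current objects until a new figure is created.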

Creating a Bar Graph

1) A bar chart or bar graph is a chart or graph that presents categorical data with
rectangular bars with heights or lengths proportional to the values that they represent.
The bars can be plotted vertically or horizontally.
2) A bar graph shows comparisons among discrete categories. One axis of the chart
shows the specific categories being compared, and the other axis represents a
measured value.
3) The Matplotlib API provides the bar() function to draw bar graphs. It takes arguments
such as x, height, width and color.
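
As a small sketch of how these arguments map onto the drawn bars, the code below passes the categories, heights, width and color to bar() and reads the geometry back from the returned bars. The data values and the Agg backend are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]   # x: one entry per bar
values = [3, 8, 1]             # height: one value per bar

# width is given as a fraction of the spacing between categories
bars = plt.bar(categories, values, width=0.5, color="steelblue")

# bar() returns a container of rectangles, one per bar
bar_heights = [float(rect.get_height()) for rect in bars]
bar_widths = [float(rect.get_width()) for rect in bars]
print(bar_heights)
print(bar_widths)
plt.close()
```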

Bar graph

a) Creating simple bar graph using own data

import matplotlib.pyplot as plt

country = ['A', 'B', 'C', 'D', 'E']

gdp_per_capita = [45000, 42000, 52000, 49000, 47000]

plt.bar(country, gdp_per_capita)

plt.show()

The categories and their values are passed as the first and second arguments, as lists
or arrays.
plt.show() starts an event loop, looks for all currently active figure objects, and opens
one or more interactive windows that display your figure or figures.

b) With labels

import matplotlib.pyplot as plt


country = ['A', 'B', 'C', 'D', 'E']
gdp_per_capita = [45000, 42000, 52000, 49000, 47000]
plt.bar(country, gdp_per_capita)
plt.title('Country Vs GDP Per Capita')
plt.xlabel('Country')
plt.ylabel('GDP Per Capita')
plt.show()

c) Horizontal Bars
If you want the bars to be displayed horizontally instead of vertically, use the barh()
function

import matplotlib.pyplot as plt


country = ['A', 'B', 'C', 'D', 'E']
gdp_per_capita = [45000, 42000, 52000, 49000, 47000]
plt.barh(country, gdp_per_capita)
plt.title('Country Vs GDP Per Capita')
plt.xlabel('GDP Per Capita')  # with barh() the values run along the x-axis
plt.ylabel('Country')         # and the categories along the y-axis
plt.show()

d) Color
The bar() and barh() functions take the keyword argument color to set the color of the bars.
You can use any of the supported color names, or a hexadecimal color value.

import matplotlib.pyplot as plt


x = ["A", "B", "C", "D"]
y = [3, 8, 1, 10]
plt.bar(x, y, color = "#4CAF50")
plt.show()

e) From CSV file

import matplotlib.pyplot as plt


import pandas as pd
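# read_csv() already returns a DataFrame, so no extra conversion is needed
df = pd.read_csv('data.csv')
X = list(df.iloc[:, 0])  # all rows of the first column
Y = list(df.iloc[:, 1])  # all rows of the second column
# Plot the data using the bar() function
plt.bar(X, Y, color='g')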
plt.title("innovative companies")
plt.xlabel("Countries")
plt.ylabel("Number of Companies")
plt.show()
To read data from the CSV file, we use the read_csv() function from Pandas, which
returns the data as a DataFrame.

A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a
table with rows and columns.
iloc is an indexer defined on Pandas DataFrames that selects rows and columns by
integer position. Because it is an indexer and not a function, it is used with square
brackets rather than parentheses. With iloc we can retrieve any particular row, column
or value by its index.

Syntax of iloc

The general form is shown below, where row and column can each be an integer, a list
of integers or a slice:

DataFrame.iloc[row, column]
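
The sketch below illustrates position-based selection on a small table. The CSV text is hypothetical and is read from memory, so no data.csv file is needed; the column names are made up for the example.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as data.csv
csv_text = "country,companies\nA,12\nB,7\nC,20\n"
df = pd.read_csv(io.StringIO(csv_text))

first_column = list(df.iloc[:, 0])    # all rows, column 0
second_column = list(df.iloc[:, 1])   # all rows, column 1
first_row = list(df.iloc[0])          # row 0, all columns
single_value = int(df.iloc[2, 1])     # row 2, column 1

print(first_column)
print(second_column)
print(first_row)
print(single_value)
```

The two column lists are exactly what the bar graph example passes to bar() as X and Y.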

f) Saving
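savefig() is a function provided by the matplotlib.pyplot module, and is used to save
the current figure to a file on the local machine. Call it before plt.show(), because
the figure may no longer be available once its window has been closed.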

import matplotlib.pyplot as plt


x = ["A", "B", "C", "D"]
y = [3, 8, 1, 10]
plt.bar(x, y, color = "#4CAF50")
a="g.png"
plt.savefig(a)
plt.show()
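
As a hedged sketch of some common savefig() options, the code below writes the same bar graph as a PNG and as a PDF. The temporary directory, the file names and the Agg backend are assumptions made so that the example runs anywhere; dpi and bbox_inches are optional keyword arguments of savefig().

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

x = ["A", "B", "C", "D"]
y = [3, 8, 1, 10]
plt.bar(x, y, color="#4CAF50")

out_dir = tempfile.mkdtemp()                 # hypothetical output location
png_path = os.path.join(out_dir, "bars.png")
pdf_path = os.path.join(out_dir, "bars.pdf")

# The output format is chosen from the file extension
plt.savefig(png_path, dpi=150, bbox_inches="tight")  # higher resolution, trimmed margins
plt.savefig(pdf_path)
plt.close()

saved = sorted(os.listdir(out_dir))
print(saved)
```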

g) Changing size of the graph


We can change the size of the plot using the figsize argument of the figure() function.
figsize takes a tuple of two values, the width and the height of the figure, both
measured in inches.

The syntax looks like:

figure(figsize=(WIDTH_SIZE, HEIGHT_SIZE))

import matplotlib.pyplot as plt


x = [2,4,6,8]
y = [10,3,20,4]
plt.figure(figsize=(6,2))
plt.plot(x,y)
plt.show()
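
Because figsize is given in inches, the size in pixels also depends on the figure's dpi (dots per inch). The following sketch makes that relationship explicit; the dpi value of 100 and the Agg backend are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 2), dpi=100)   # 6 inches wide, 2 inches high

width_in, height_in = fig.get_size_inches()
width_px = int(round(width_in * fig.dpi))    # pixels = inches * dots per inch
height_px = int(round(height_in * fig.dpi))
print(width_px, height_px)
plt.close(fig)
```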
What is a Pie chart?

1. A pie chart shows the relative sizes of the parts of a data set as slices of a circle.
The complete circle represents 100% and each slice represents one part of the data.

2. It is a circular statistical plot that is used to display only one series of data.

3. The complete area of the pie chart is equal to the total percentage of the given data.

4. In the Pie Chart, the area of slices of the pie is used to represent the percentage of the
parts of the data.

5. The slices of the Pie are commonly known as wedges.

6. The area of a wedge represents the percentage of that part with respect to the
whole data, and is proportional to the length of the arc (and the central angle) of the wedge.
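
The arithmetic behind the wedges can be sketched in a few lines of plain Python. The data values are the ones used in the pie chart examples of this section; the variable names are illustrative.

```python
values = [35, 25, 25, 15]
total = sum(values)

# Share of the whole for each wedge, as a percentage
percentages = [100 * v / total for v in values]
# Central angle of each wedge in degrees (the full circle is 360)
angles = [360 * v / total for v in values]

print(percentages)  # [35.0, 25.0, 25.0, 15.0]
print(angles)       # [126.0, 90.0, 90.0, 54.0]
```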

Matplotlib pie() Function

1. The pie() function in the pyplot module of matplotlib is used to create a pie chart
representing the data in an array.

2. A pie chart looks best when the figure and axes are square, or the aspect of the
Axes is equal, so that the pie is drawn as a circle rather than an ellipse.

3. The basic syntax for the pie() function is given below (Matplotlib itself names the first parameter x; these notes call it data):


matplotlib.pyplot.pie(data, explode, labels, colors, autopct, shadow)

4. pie() Function Parameters:

A. data

This parameter is the array of data values to be plotted. The fractional area of each
slice is data/sum(data). In older Matplotlib versions, if sum(data) < 1 the values were
used directly as fractions, leaving an empty wedge of size 1 - sum(data). In recent
versions the values are normalized by default, and this behaviour has to be requested
with normalize=False.
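
A minimal sketch of the partial-pie case follows, assuming a Matplotlib version in which pie() accepts the normalize argument. The data values and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

fractions = [0.2, 0.3]                 # sums to 0.5, so half of the circle stays empty
plt.pie(fractions, normalize=False)    # use the values directly as fractions

# Each wedge patch stores its start and end angle in degrees
wedges = list(plt.gca().patches)
spans = [w.theta2 - w.theta1 for w in wedges]
print([round(s, 3) for s in spans])    # angular size of each wedge
plt.close()
```

With normalize=True (or with the argument left out in recent versions) the same data would instead fill the whole circle in a 40% / 60% split.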

B. labels
This parameter is a sequence of strings that sets the label of each wedge.

C. autopct

This parameter is a format string (or a function) used to label each wedge with its
percentage value.

D. colors

This parameter is used to provide color to the wedges.

E. shadow

This parameter is used to create the shadow of the wedges.

Examples:

a) simple pie chart

import matplotlib.pyplot as plt


y = [35, 25, 25, 15]
plt.pie(y)
plt.show()

b) With Labels
import matplotlib.pyplot as plt
y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels = mylabels)
plt.show()

c) Start Angle
You can change the start angle by specifying the startangle parameter. The default angle is 0 degrees, which is the positive x-axis; wedges are drawn counterclockwise from it.
import matplotlib.pyplot as plt
y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels = mylabels, startangle = 90)
plt.show()

d) Explode: The explode parameter allows one of the wedges to stand out.

import matplotlib.pyplot as plt


y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]
plt.pie(y, labels = mylabels, explode = myexplode)
plt.show()

e) Shadow
import matplotlib.pyplot as plt
y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]
plt.pie(y, labels = mylabels, explode = myexplode, shadow = True)
plt.show()

f) Colors
You can set the color of each wedge with the colors parameter.
The colors parameter, if specified, must be an array with one value for each wedge:

import matplotlib.pyplot as plt


y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
mycolors = ["black", "hotpink", "b", "#4CAF50"]
plt.pie(y, labels = mylabels, colors = mycolors)
plt.show()

g) Legend
To add a list of explanations for each wedge, use the legend() function:
import matplotlib.pyplot as plt
y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels = mylabels)
plt.legend()
plt.show()

h) Legend With Header

import matplotlib.pyplot as plt


y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels = mylabels)
plt.legend(title = "Four Fruits:")
plt.show()
i) Autopct: used to label the wedges with their percentage value. The label is placed
inside the wedge. If autopct is a format string, the label is fmt % pct.

import matplotlib.pyplot as plt


y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels = mylabels,autopct='%1.2f%%')
plt.show()
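
The sketch below shows what the format string produces and how a function can be passed as autopct instead. The helper name show_count is hypothetical, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

y = [35, 25, 25, 15]
total = sum(y)

# What the format string '%1.2f%%' does with one percentage value
label_from_format = '%1.2f%%' % 35.0
print(label_from_format)

def show_count(pct):
    """Hypothetical helper: label a wedge with its percentage and absolute value."""
    count = round(pct * total / 100)
    return f"{pct:.1f}% ({count})"

print(show_count(25.0))

# pie() calls the function once per wedge with that wedge's percentage
plt.pie(y, autopct=show_count)
plt.close()
```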
Scatter Plot

1. A scatter plot is a diagram where each value in the data set is represented by a dot.

2. With Pyplot, you can use the scatter() function to draw a scatter plot.

3. The scatter() function plots one dot for each observation. It needs two arrays of the
same length, one for the values on the x-axis and one for the values on the y-axis.

4. Examples:

a) Simple scatter plot(single data)

import matplotlib.pyplot as plt

import numpy as np

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]

y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)

plt.show()

b) Scatter plot(multiple data)

import matplotlib.pyplot as plt

import numpy as np

#data one

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]

y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)

#data two

x = [2,2,8,1,15,8,12,9,7,3,11,4,7,14,12]

y = [100,105,84,105,90,99,90,95,94,100,79,112,91,80,85]

plt.scatter(x, y)

plt.show()
c) Color: You can set your own color for each scatter plot with the color or
the c argument

import matplotlib.pyplot as plt

import numpy as np

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]

y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y, c = 'hotpink')

x = [2,2,8,1,15,8,12,9,7,3,11,4,7,14,12]

y = [100,105,84,105,90,99,90,95,94,100,79,112,91,80,85]

plt.scatter(x, y, color = '#88c999')

plt.show()

d) Color Each Dot: You can even set a specific color for each dot by using an array of
colors as the value for the c argument.
Note: Use the c argument for this; the array of colors must have one entry per dot.

import matplotlib.pyplot as plt


import numpy as np
x = [5,7,8,7,2,17,2,9,4]
y = [99,86,87,88,111,86,103,87,94]
colors =["red","green","blue","yellow","pink","black","orange","purple","beige"]
plt.scatter(x, y, c=colors)
plt.show()

e) Size: You can change the size of the dots with the s argument. Just like colors, make
sure the array for sizes has the same length as the arrays for the x- and y-axis

import matplotlib.pyplot as plt


import numpy as np
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
sizes = [20,50,100,200,500,1000,60,90,10,300,600,800,75]
plt.scatter(x, y, s=sizes)
plt.show()
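
The same-length requirement can be checked before plotting. The sketch below, with made-up data, shows that scatter() raises a ValueError when x and y differ in length; the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

x = [5, 7, 8, 7]
y = [99, 86, 87]          # one value short on purpose

lengths_match = len(x) == len(y)   # a cheap check to run before plotting

try:
    plt.scatter(x, y)
    mismatch_error = None
except ValueError as exc:
    # Matplotlib refuses to pair up arrays of different sizes
    mismatch_error = str(exc)

print(lengths_match)
print(mismatch_error)
plt.close()
```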
Area Chart
An area chart is very similar to a line chart, except that the area between the x-axis and the
line is filled in with color or shading. The matplotlib.pyplot.fill_between() function fills the area
between two horizontal curves; when only one curve is given, the area between it and y = 0 is
filled. This function takes parameters such as x, y1, y2, color and alpha.
Examples:
a) Simple area plot:
import matplotlib.pyplot as plt
x=range(1,6)
y=[1,4,6,8,4]
plt.fill_between(x, y)
plt.show()

b) Changing the default color of the area plot


import matplotlib.pyplot as plt
x=range(1,6)
y=[1,4,6,8,4]
plt.fill_between(x, y,color='#bedeaf')
plt.show()

c) Changing the transparency of the color


import matplotlib.pyplot as plt
x=range(1,6)
y=[1,4,6,8,4]
plt.fill_between(x, y,color='green',alpha=0.3)
plt.show()

d) adding a stronger line on top (edge)


import matplotlib.pyplot as plt
x=range(1,6)
y=[1,4,6,8,4]
plt.fill_between(x, y,color='green',alpha=0.3)
plt.plot(x, y, color="green")
plt.show()

e) Adding titles and labels. Also saving the image

import matplotlib.pyplot as plt


x=range(1,6)
y=[1,4,6,8,4]
plt.fill_between(x, y,color='green')
plt.title("An area chart", loc="left")
plt.xlabel("Value of X")
plt.ylabel("Value of Y")
plt.savefig('area.png')
plt.show()
f) Retrieving Data from CSV file

import matplotlib.pyplot as plt


import pandas as pd
# read_csv() already returns a DataFrame, so no extra conversion is needed
df = pd.read_csv('country.csv')
x = list(df.iloc[:, 0])
y = list(df.iloc[:, 1])
plt.fill_between(x, y,color='green')
plt.title("An area chart", loc="left")
plt.xlabel("Year")
plt.ylabel("Population")
plt.show()
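
The examples above fill the area between one curve and the x-axis. As a sketch of the two-curve case, the code below passes a lower and an upper curve to fill_between(). The lower values are made up for illustration, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

x = list(range(1, 6))
lower = [1, 2, 3, 4, 2]   # hypothetical lower curve
upper = [2, 4, 6, 8, 4]   # upper curve

# Fill only the band between the two curves
band = plt.fill_between(x, lower, upper, color="green", alpha=0.3)
plt.plot(x, lower, color="green")
plt.plot(x, upper, color="green")

n_polygons = len(band.get_paths())   # the filled band is drawn as a single polygon
print(n_polygons)
plt.close()
```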
Matplotlib Histogram
1. A histogram represents data that has been grouped into ranges. It is a standard
method for the graphical representation of the distribution of numerical data.
2. It is a type of bar plot where the X-axis represents the bin ranges while the Y-axis
gives information about frequency.
3. In Python, the hist() function in the pyplot module of the Matplotlib library is used to
plot a histogram.
4. The following table shows the parameters accepted by the matplotlib.pyplot.hist()
function:

Parameter   Description
x           array or sequence of arrays
bins        optional; an integer, a sequence or a string
density     optional; a boolean value
range       optional; the lower and upper range of the bins
histtype    optional; the type of histogram to draw [bar, barstacked, step, stepfilled], default is "bar"
align       optional; controls the alignment of the bars [left, right, mid]
weights     optional; an array of weights with the same dimensions as x
bottom      location of the baseline of each bin
rwidth      optional; relative width of the bars with respect to the bin width
color       optional; a color or sequence of color specs
label       optional; a string or sequence of strings to match with multiple datasets
log         optional; sets the histogram axis to a log scale
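
A short sketch of several of these parameters used together follows. It also reads back the values that hist() returns: the count in each bin, the bin edges, and the drawn bars. The chosen range and bin count are assumptions for illustration, as is the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

data = [32, 96, 45, 67, 76, 28, 79, 62, 43, 81, 70, 61,
        95, 44, 60, 69, 71, 23, 69, 54, 76, 67, 88]

# Five equal-width bins covering 20 to 100, drawn as slightly narrowed bars
counts, bin_edges, patches = plt.hist(
    data, bins=5, range=(20, 100), histtype="bar",
    rwidth=0.9, color="#bedeaf", edgecolor="green",
)

counts_list = [int(c) for c in counts]
edges_list = [float(e) for e in bin_edges]
print(counts_list)   # [3, 3, 6, 8, 3]
print(edges_list)    # [20.0, 36.0, 52.0, 68.0, 84.0, 100.0]
plt.close()
```

There is always one more bin edge than there are bins, and the counts add up to the number of data values that fall inside the range.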

Example:
a) Creating Simple histogram
import matplotlib.pyplot as plt
data = [32, 96, 45, 67, 76, 28, 79, 62, 43, 81, 70,61, 95, 44, 60, 69, 71, 23, 69, 54, 76,
67,88]
plt.hist(data)
plt.show()

b) Changing the color and edgecolor of the histogram


import matplotlib.pyplot as plt
data = [32, 96, 45, 67, 76, 28, 79, 62, 43, 81, 70, 61, 95, 44, 60, 69, 71, 23, 69, 54, 76,
67,88]
plt.hist(data,color='#bedeaf', edgecolor='green')
plt.show()

c) Adding Labels and title


import matplotlib.pyplot as plt
data = [32, 96, 45, 67, 76, 28, 79, 62, 43, 81, 70, 61, 95, 44, 60, 69, 71, 23, 69, 54, 76,
67,88]
plt.hist(data)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Basic Histogram')
plt.show()

d) Bins
The bars of a histogram correspond to bins, the value ranges into which the data is
divided. The height of each bar shows how many values from the data fall into that range.

import matplotlib.pyplot as plt


height = [189, 185, 195, 149, 189, 147, 154,
174, 169, 195, 159, 192, 155, 191,
153, 157, 140, 144, 172, 157, 181,
182, 166, 167]
plt.hist(height, edgecolor="red", bins=10)
plt.show()

e) bin width
import matplotlib.pyplot as plt
marks = [1, 2, 3, 2, 1, 2, 3, 2,
1, 4, 5, 4, 3, 2, 5, 4,
5, 4, 5, 3, 2, 1, 5]
plt.hist(marks, bins=[1, 2, 3, 4, 5], edgecolor="black")
plt.show()
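
When bins is a list, its values are the bin edges, so [1, 2, 3, 4, 5] defines four bins and the last one covers both 4 and 5. The sketch below uses numpy.histogram(), which hist() relies on for binning, to compare those edges with edges centered on the whole-number marks. The centered edges are an assumption added for illustration.

```python
import numpy as np

marks = [1, 2, 3, 2, 1, 2, 3, 2,
         1, 4, 5, 4, 3, 2, 5, 4,
         5, 4, 5, 3, 2, 1, 5]

# Edges on the integers: bins are [1,2), [2,3), [3,4) and [4,5] (last bin is closed)
counts_integer_edges, _ = np.histogram(marks, bins=[1, 2, 3, 4, 5])

# Edges halfway between the integers: one bin per mark
counts_centered, _ = np.histogram(marks, bins=[0.5, 1.5, 2.5, 3.5, 4.5, 5.5])

print(counts_integer_edges.tolist())  # [4, 6, 4, 9]
print(counts_centered.tolist())       # [4, 6, 4, 4, 5]
```

Passing the centered edges to plt.hist() as bins would draw a separate bar for each of the five marks.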

f) Data from CSV file and saving the output


import matplotlib.pyplot as plt
import pandas as pd
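# read_csv() already returns a DataFrame, so no extra conversion is needed
df = pd.read_csv('data.csv')
X = list(df.iloc[:, 0])  # all rows of the first column
plt.hist(X, edgecolor="red")
plt.savefig('hist.png')  # save the figure before showing it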
plt.show()
