Informatics Practices (065)
CLASS-XII
Unit 1: Data handling using Pandas
Chapter 4: Data Visualization
Syllabus:
1. Purpose of plotting
2. Drawing and saving following types of plots using Matplotlib – line plot, bar graph,
histogram
3. Customizing plots: adding label, title, and legend in plots.
Purpose of Plotting: When data is shown in the form of pictures, it becomes easy for the
user to understand it.
Data Visualization
"Data visualization" is the representation of data in the form of pictures or graphs.
It reveals patterns, trends, correlations etc. in data and thereby helps decision
makers to make business decisions.
matplotlib
matplotlib is a high-quality plotting library for Python which provides many interfaces
and functions to present data as 2D graphics.
pyplot
pyplot is a sub-module of the matplotlib library: a collection of
functions that allow the user to construct 2D plots easily.
To use pyplot, we first need to import it with an import statement, in one of two ways:
1. import matplotlib.pyplot (with this method every command must be typed as
matplotlib.pyplot.command)
2. import matplotlib.pyplot as plt (now a command can be written as plt.command;
here plt is an alias for the module)
Installing and importing Matplotlib
1. Open the command prompt by typing cmd in the search bar.
2. Change the present working directory to the Scripts directory inside the Python folder
using the cd command.
3. Install matplotlib using: pip install matplotlib
Note: To upgrade the pip version:
Path to python directory\python.exe -m pip install --upgrade pip
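After installation, a quick way to confirm that the library is available is to import it and print its version. This is only a minimal check; the variable name installed_version is chosen here for illustration.

```python
# Confirm that matplotlib can be imported and report the installed version.
import matplotlib
import matplotlib.pyplot as plt  # the alias used throughout this chapter

installed_version = matplotlib.__version__
print("matplotlib version:", installed_version)
```

If the import fails with ModuleNotFoundError, the pip install step most likely ran for a different Python installation.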
Types of charts in python:
1. Line Chart: It displays information as a series of data points called 'markers' connected
by straight line segments.
2. Bar Chart: It presents category-wise data in rectangular bars with length proportional to
the values. It can be horizontal or vertical.
3. Pie Chart: It is a circular chart divided into slices to represent the value/percentage.
I. Line Chart
A line graph is a simple graph that shows the result in the form of lines. To create a line
graph we need x and y coordinates.
The plot() function is used to draw a line chart.
Various properties of the chart that can be set are:
plt.xlabel('time')                        sets the x-axis label
plt.ylabel('speed')                       sets the y-axis label
plt.yticks([5,7,10])                      sets the tick marks that appear on the y-axis
plt.xticks([1,3,4],['abc','def','ghi'])   sets the ticks to appear on the x-axis at points [1,3,4];
                                          the second parameter changes the corresponding labels
                                          to ['abc','def','ghi']
plt.grid()                                displays the gridlines
plt.legend()                              displays the legend using the labels of the corresponding
                                          plots; the legend is drawn only after plot() is called,
                                          since it takes the labels from the plot function
plt.plot(x, y, 'colour code')
Colour name   blue   green   red   cyan   magenta   yellow   black   white
Colour code   'b'    'g'     'r'   'c'    'm'       'y'      'k'     'w'
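The settings listed above can be combined in one small program. The sketch below uses made-up time and speed values and a made-up legend label ('car'); only the function calls themselves come from the list above.

```python
import matplotlib.pyplot as plt

# Illustrative data, invented for this sketch.
time = [1, 2, 3, 4]
speed = [5, 7, 8, 10]

plt.plot(time, speed, 'r', label='car')        # 'r' draws the line in red
plt.xlabel('time')                             # x-axis label
plt.ylabel('speed')                            # y-axis label
plt.xticks([1, 3, 4], ['abc', 'def', 'ghi'])   # ticks at 1, 3, 4 with custom labels
plt.yticks([5, 7, 10])                         # ticks at 5, 7, 10 on the y-axis
plt.grid()                                     # show gridlines
plt.legend()                                   # uses the label given to plot()
```

Add plt.show() as the last line to display the chart on the screen.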
1. Drawing a basic Line Chart using plot() function
The plot() function accepts two datasets: the first one is a list of x-coordinates and the
second a list of the corresponding y-coordinates. The number of values in the x and y
lists must be the same. plt.plot(x,y) is used to draw the graph and the plt.show()
function is used to display the plot on the screen.
import matplotlib.pyplot as plt
x=[1,2,3]
y=[2,3.5,5]
plt.plot(x,y)
plt.show()
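The syllabus also asks for saving a plot. A minimal sketch using plt.savefig() is given below. The file name line_chart.png and the use of the temporary folder are assumptions made for this example, and the "Agg" backend line is only there so that the program runs without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, so no window is needed
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 3.5, 5]
plt.plot(x, y)

# Hypothetical output location; any writable path can be used instead.
out_path = os.path.join(tempfile.gettempdir(), "line_chart.png")
plt.savefig(out_path)   # the file type is taken from the extension (.png here)
plt.close()             # release the figure once it has been saved
```

When plt.show() is also used, it is safer to call plt.savefig() before it, because the figure may already be cleared once the window is closed.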
2. Setting title of the chart and label of X and Y axis
import matplotlib.pyplot as plt
x=[10,20,30,40,50]
y=[60,100,140,190,270]
plt.xlabel("Overs")
plt.ylabel("Runs Scored")
plt.title("Over-wise runs scored by India Vs England")
plt.plot(x,y,"g")
plt.show()
3. Drawing multiline charts
import matplotlib.pyplot as plt
x=[10,20,30,40,50]
y=[60,100,140,190,270]
z=[70,110,145,198,265]
plt.xlabel("Overs")
plt.ylabel("Runs Scored")
plt.title("Over-wise runs scored by India Vs England")
plt.plot(x,y,"g")
plt.plot(x,z,"r")
plt.show()
Note: If no colour is specified, matplotlib gives each line a different colour, chosen
automatically from its default colour cycle.
4. Setting Line styles: linestyle or ls attribute
import matplotlib.pyplot as plt
x=[10,20,30,40,50]
y=[60,100,140,190,270]
z=[70,110,145,198,265]
plt.xlabel("Overs")
plt.ylabel("Runs Scored")
plt.title("Over-wise runs scored by India Vs England")
plt.plot(x,y, 'r',linestyle="dashed")
plt.plot(x,z, 'c',ls="dotted")
plt.show()
linestyle or ls can be 'solid', 'dashed', 'dotted' or 'dashdot'.
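To compare the four line styles side by side, the same data can be drawn once per style. In the sketch below, the upward shift of 30 runs per line is an assumption added only to keep the lines apart.

```python
import matplotlib.pyplot as plt

x = [10, 20, 30, 40, 50]
y = [60, 100, 140, 190, 270]

styles = ['solid', 'dashed', 'dotted', 'dashdot']
for offset, style in enumerate(styles):
    # Shift each copy upward by 30 runs so the four lines do not overlap.
    shifted = [value + 30 * offset for value in y]
    plt.plot(x, shifted, linestyle=style, label=style)

plt.legend()   # the legend shows which style belongs to which line
```

Add plt.show() at the end to view the chart.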
5. Using marker type and marker size attribute in the plot function
import matplotlib.pyplot as plt
x=[10,20,30,40,50]
y=[60,100,140,190,270]
z=[70,110,145,198,265]
plt.xlabel("Overs")
plt.ylabel("Runs Scored")
plt.title("Over-wise runs scored by India Vs England")
plt.plot(x,y,color='r',linestyle="dashed",
marker='x', markersize=15)
plt.plot(x,z,color='c',ls="dotted", marker='h',
markersize=10)
plt.show()
6. Combining colour and marker
import matplotlib.pyplot as plt
x=[10,20,30,40,50]
y=[60,100,140,190,270]
z=[70,110,145,198,265]
plt.xlabel("Overs")
plt.ylabel("Runs Scored")
plt.title("Over-wise runs scored by India Vs England")
plt.plot(x,y,'r*')
plt.plot(x,z,'c+')
plt.show()
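Note:
'+c' and 'c+' work the same; the order of colour and marker does not matter.
Only the points are plotted and no lines are drawn.
If the ls attribute is also used, lines are drawn as well.
A chart that shows only the points, like the one above, is also known as a scatter chart.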
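The effect of the ls attribute on a colour-plus-marker string can be checked as sketched below. The variable names points_only and with_line are chosen here for illustration.

```python
import matplotlib.pyplot as plt

x = [10, 20, 30, 40, 50]
z = [70, 110, 145, 198, 265]

# Colour and marker only: the points are drawn, but no connecting line.
points_only, = plt.plot(x, z, 'c+')

# Same format string plus ls: the markers are joined by a dotted line.
with_line, = plt.plot(x, z, 'c+', ls='dotted')

print(points_only.get_linestyle())
print(with_line.get_linestyle())
```

plot() returns a list of line objects, which is why each name is followed by a comma when it is unpacked.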
II. Bar Chart
• A bar graph is used to represent data in the form of vertical or horizontal bars.
• A bar graph shows comparisons among discrete categories. One axis of the chart
shows the specific categories being compared, and the other axis represents a
measured value.
• bar() method is used to draw Vertical Bar Graphs. barh() is used to draw Horizontal
bar graphs.
Syntax: plt.bar(x-Interval, y-value)
1. Drawing a basic bar chart using bar() method
import matplotlib.pyplot as plt
OverInt=['1-10','11-20','21-30','31-40','41-50']
RunsScored=[60,50,55,42,70]
plt.xlabel("Over Interval")
plt.ylabel("Runs Scored")
plt.title("Over Interval/ runs scored by India Vs England")
plt.bar(OverInt,RunsScored)
plt.show()
2. Changing the width and colour of the bar chart
import matplotlib.pyplot as plt
OverInt=['1-10','11-20','21-30','31-40','41-50']
RunsScored=[60,50,55,42,70]
plt.xlabel("Over Interval")
plt.ylabel("Runs Scored")
plt.title("Over Interval/ runs scored by India Vs England")
plt.bar(OverInt,RunsScored, width=0.3,color='g')
plt.show()
Note: plt.bar(OverInt,RunsScored, 0.3,color='g') also works. The default width is 0.8.
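The width that matplotlib actually uses can be read back from the bars themselves. The sketch below draws the same data twice, once without a width and once with width=0.3; in current matplotlib releases the first call is expected to report 0.8.

```python
import matplotlib.pyplot as plt

OverInt = ['1-10', '11-20', '21-30', '31-40', '41-50']
RunsScored = [60, 50, 55, 42, 70]

default_bars = plt.bar(OverInt, RunsScored)             # no width given
narrow_bars = plt.bar(OverInt, RunsScored, width=0.3)   # explicit width

# bar() returns a container of rectangles; each one knows its own width.
default_width = default_bars[0].get_width()
narrow_width = narrow_bars[0].get_width()
print(default_width, narrow_width)
```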
3. Changing width and colour of each bar in the bar chart
import matplotlib.pyplot as plt
OverInt=['1-10','11-20','21-30','31-40','41-50']
RunsScored=[60,50,55,42,70]
plt.xlabel("Over Interval")
plt.ylabel("Runs Scored")
plt.title("Over Interval/ runs scored by India Vs England")
plt.bar(OverInt,RunsScored,
width=(0.1,0.2,0.3,0.4,0.5), color=('r','g','y','c','m'))
plt.show()
4. Drawing horizontal bar chart – barh() function
import matplotlib.pyplot as plt
OverInt=['1-10','11-20','21-30','31-40','41-50']
RunsScored=[60,50,55,42,70]
plt.ylabel("Over Interval")
plt.xlabel("Runs Scored")
plt.title("Over Interval/ runs scored by India Vs England")
plt.barh(OverInt,RunsScored,height=(0.1,0.2,0.3,
0.4,0.5),color=('r','g','y','c','m'))
plt.show()
Note: The x and y labels have been interchanged, because the bars are now horizontal.
5. Drawing multiple bar charts
import matplotlib.pyplot as plt
import numpy as np
RunsInd=[70,50,50,60,80]
RunsPak=[60,40,35,45,65]
x=np.linspace(1,51,5)
plt.xlabel("Over Interval")
plt.ylabel("Runs Scored")
plt.title("Over Interval/ runs scored by India Vs Pakistan")
plt.bar(x,RunsInd,width=4,color='r',label='Ind')
plt.bar(x+3,RunsPak,width=2,color='g',label='Pak')
plt.legend()
plt.show()
Note:
1. To decide the X positions of the bars, we can use the arange() or linspace() function of
NumPy, which generates one position for each value in the data sequence.
2. Decide the thickness of each bar and accordingly shift the X positions of the second
series, so that the bars do not overlap.
3. Give a different colour to each data series.
4. The width is normally kept the same for all the series being plotted (the example above
uses 4 and 2).
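The points in this note can be sketched with arange() and a single shared width. The interval labels and the width of 0.4 are assumptions chosen for illustration; the runs are the ones used in the example above.

```python
import matplotlib.pyplot as plt
import numpy as np

labels = ['1-10', '11-20', '21-30', '31-40', '41-50']
RunsInd = [70, 50, 50, 60, 80]
RunsPak = [60, 40, 35, 45, 65]

x = np.arange(len(labels))   # one position per over interval: 0, 1, 2, 3, 4
width = 0.4                  # the same thickness for both series

# Shift each series by half a bar so the two bars sit side by side.
plt.bar(x - width / 2, RunsInd, width=width, color='r', label='Ind')
plt.bar(x + width / 2, RunsPak, width=width, color='g', label='Pak')
plt.xticks(x, labels)        # show the interval names under each pair
plt.legend()
```

Because the positions come from the length of the data, the same code works if more over intervals are added.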
III. Histogram
• It was first introduced by Karl Pearson.
• A histogram shows the distribution of values.
• It is an accurate graphical representation of the distribution of numerical data,
showing the number of data points that fall within each specified range of values
(called bins).
• It is an estimate of the distribution of a continuous (quantitative) variable.
• To construct a histogram, the first step is to "bin" the range of values, that is, divide
the entire range of values into a series of intervals.
• The next step is to count how many values fall into each interval. The bins are usually
specified as consecutive, non-overlapping intervals of a variable.
• The bins (intervals) must be adjacent, and are often of equal size.
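The two construction steps (bin the range, then count) can be written out by hand in plain Python. The sketch below uses the age data and the bin edges 5, 10, 15, 20 from the examples in this chapter. It treats each bin as including its lower edge and excluding its upper edge, except the last bin, which also includes its upper edge; that is the convention matplotlib follows.

```python
age = [5, 14, 17, 12, 15, 18, 19, 20, 7, 13, 17, 16, 18, 6, 8]
edges = [5, 10, 15, 20]            # three bins: 5-10, 10-15, 15-20

counts = [0] * (len(edges) - 1)    # one counter per bin
for value in age:
    for i in range(len(counts)):
        lower, upper = edges[i], edges[i + 1]
        is_last = (i == len(counts) - 1)
        # A value belongs to the bin if lower <= value < upper;
        # the last bin also accepts a value equal to its upper edge.
        if lower <= value < upper or (is_last and value == upper):
            counts[i] += 1
            break

print(counts)   # prints [4, 3, 8]
```

These counts are the bar heights that the histogram draws for the three bins.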
Difference between a histogram and a bar chart / graph:
histogram                                  bar chart
A histogram has number ranges.             A bar chart mainly represents categorical data
                                           (data that has some labels associated with it),
                                           usually as rectangular bars with lengths
                                           proportional to the values they represent.
The bins (bars) of a histogram have no     The bars of a bar chart have gaps in between.
gaps, as the number ranges are
consecutive, non-overlapping intervals
of a variable.
Attributes of a Histogram –
1. Title: To display the heading of the histogram.
2. Color: To show the color of the bars.
3. Axis: y-axis and x-axis.
4. Data: The data can be represented as an array.
5. Height and width of bars: These are determined by the analysis. The width of a
bar is its bin (interval).
6. Border color: To display the border color of the bars.
Example: Draw a histogram to show how many
students are there in different age ranges.
(Default bin value, color='magenta')
import matplotlib.pyplot as plt
years=[5,10,15,20]
age=[5,14,17,12,15,18,19,20,7,13,17,16,18,6,8]
plt.xlabel("age")
plt.ylabel("No. of Students")
plt.title("School Name")
plt.hist(age,color="m")
plt.show()
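Note: By default hist() divides the data into 10 bins of equal width between the smallest
and the largest value. It does not use fixed ranges such as 0-10 and 10-20.
In this example the ages run from 5 to 20, so each bin is (20-5)/10 = 1.5 wide:
5-6.5, 6.5-8, 8-9.5 and so on, up to 18.5-20.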
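The default binning can be inspected directly, because hist() returns the bar heights and the bin edges along with the drawn bars. The names counts, edges and patches below are chosen for illustration.

```python
import matplotlib.pyplot as plt

age = [5, 14, 17, 12, 15, 18, 19, 20, 7, 13, 17, 16, 18, 6, 8]

# hist() returns (heights, bin edges, drawn rectangles).
counts, edges, patches = plt.hist(age, color='m')

print(len(counts))   # number of bins used by default
print(edges)         # the bin boundaries, from the minimum to the maximum age
print(counts)        # how many ages fall into each bin
```

With no bins argument, matplotlib uses 10 equal-width bins spanning the data, so here the edges run from 5 to 20 in steps of 1.5.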
Example: Draw a histogram to show how many
students are there in different age ranges.
(Default bin value, color='magenta', histtype='step')
import matplotlib.pyplot as plt
years=[5,10,15,20]
age=[5,14,17,12,15,18,19,20,7,13,17,16,18,6,8]
plt.xlabel("age")
plt.ylabel("No. of Students")
plt.title("School Name")
plt.hist(age, color='m', histtype='step')
plt.show()
Note: histtype can be bar, barstacked, step, stepfilled. Default is bar.
Example: Draw a histogram to show how many students are there in different age
ranges. (Default bin value, color='magenta', orientation='horizontal' or 'vertical')
import matplotlib.pyplot as plt
years=[5,10,15,20]
age=[5,14,17,12,15,18,19,20,7,13,17,16,18,6,8]
plt.ylabel("age")
plt.xlabel("No. of Students")
plt.title("School Name")
plt.hist(age, color='m',
orientation='horizontal')
plt.show()
Note: Default orientation is 'vertical'
Example: Draw a histogram to show how many students are there in given age ranges.
import matplotlib.pyplot as plt
years=[5,10,15,20]
age=[5,14,17,12,15,18,19,20,7,13,17,16,18,6,8]
plt.xlabel("age")
plt.ylabel("No. of Students")
plt.title("School Name")
plt.hist(age, bins=years)
plt.show()
Note: bin1 (5-10) = 5, 6, 7, 8 (4 values); bin2 (10-15) = 12, 13, 14 (3 values);
bin3 (15-20) = 15, 16, 17, 17, 18, 18, 19, 20 (8 values).
bins is an optional parameter; it can be an integer (the number of bins), a sequence
(the bin edges) or a string (the name of a binning rule).
Example: Draw a histogram to show how many
students are there in given age ranges.
import matplotlib.pyplot as plt
years=[5,10,15,20]
plt.xlabel("age")
plt.ylabel("No. of Students")
plt.title("School Name")
plt.hist(years)
plt.show()
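Note: By default hist() uses 10 bins of equal width. Here they span 5 to 20, and only the
four bins that contain 5, 10, 15 and 20 get a bar (of height 1 each).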
Example: Draw a histogram to show how many students are there in age range 0-9,
10-19, 20-29, 30-39
import matplotlib.pyplot as plt
years=[5,10,15,20]
#age=[5,14,17,12,15,18,19,20,7,13,17,16,18,6,8]
plt.xlabel("age")
plt.ylabel("No. of Students")
plt.title("School Name")
plt.hist(years,bins=[0,10,20,30,40])
plt.show()
Note: bin1 (0-9) = 5 (one value), bin2 (10-19) = 10, 15 (two values), bin3 (20-29) = 20 (one value),
bin4 (30-39) = nil (no value)
Example: Draw a histogram to show how many students are there in age range 0-9,
10-19, 20-29, 30-39
import matplotlib.pyplot as plt
years=[5,10,15,20]
plt.xlabel("age")
plt.ylabel("No. of Students")
plt.title("School Name")
plt.hist(years,bins=[0,10,20,30,40],weights=[20,10,
45,33])
plt.show()
Note:
bin1 (0-9) = 5 (1 value), bin2 (10-19) = 10, 15 (2 values), bin3 (20-29) = 20 (1 value),
bin4 (30-39) = nil (no value)
weights: each value adds its weight to its bin instead of 1, so
bin1 = 20, bin2 = 10+45 = 55, bin3 = 33, bin4 = 0
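The weighted bar heights can be confirmed from the values that hist() returns. The sketch below repeats the call from this example and captures its result; the name heights is chosen here for illustration.

```python
import matplotlib.pyplot as plt

years = [5, 10, 15, 20]

# Each value contributes its weight (not 1) to the bin it falls in.
heights, edges, patches = plt.hist(years, bins=[0, 10, 20, 30, 40],
                                   weights=[20, 10, 45, 33])

print(heights)   # one height per bin: 20, 55 (= 10 + 45), 33 and 0
```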
Example: cumulative (bool, optional): Draw a cumulative histogram
to show how many students are there in age range 0-9,
10-19, 20-29, 30-39
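Note: cumulative is False by default. With cumulative=True, each bin shows the running total
of all bins up to and including it, so the last bin gives the total no. of datapoints.
bin1 (0-9) = 5 (1 value), bin2 (10-19) = 10, 15 (2 values), bin3 (20-29) = 20 (1 value),
bin4 (30-39) = nil (no value)
weights: bin1 = 20 (for 1 value), bin2 = 10+45 = 55 (for 2 values), bin3 = 33 (for 1 value);
with these weights the cumulative heights are 20, 20+55 = 75, 75+33 = 108 and 108.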
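The program for this example is not reproduced in these notes. A minimal sketch, assuming the same years, bins and weights as in the previous example with cumulative=True added, could look like this:

```python
import matplotlib.pyplot as plt

years = [5, 10, 15, 20]

# Assumption: same data, bins and weights as the previous example.
totals, edges, patches = plt.hist(years, bins=[0, 10, 20, 30, 40],
                                  weights=[20, 10, 45, 33],
                                  cumulative=True)

print(totals)   # running totals per bin: 20, 75, 108 and 108
```

Each height is the sum of all the weighted bins up to that point, which is why the last two bins are equal: the 30-39 bin adds nothing.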