
TD5Numpy Pandas and Matplotlib


Université Cadi Ayyad de Marrakech, EST d'Essaouira

Program: IDSD, Semester 2. Python for Data Science


Prof. Hanane GRISSETTE ; 2022/2023

TD5: NumPy, Pandas and Matplotlib


Objective: We'll learn to use the NumPy and Pandas libraries for data manipulation
from scratch. We'll cover the syntax and the commonly used functions of each
library. Later, we'll work on a real-life data set.

1 Numpy
Exercise 1: Write a NumPy program to find the set exclusive-or of two arrays. The set exclusive-or is
the sorted, unique values that are in only one (not both) of the input arrays.
Array1 : [0 10 20 40 60 80]
Array2 : [10 30 40 50 70]
Unique values that are in only one (not both) of the input arrays: [0 20 30 50 60 70 80]
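One possible way to approach Exercise 1 is NumPy's np.setxor1d function. The sketch below uses the two sample arrays from the statement; the variable names are only illustrative.

```python
import numpy as np

array1 = np.array([0, 10, 20, 40, 60, 80])
array2 = np.array([10, 30, 40, 50, 70])

# np.setxor1d returns the sorted, unique values found in exactly one of the inputs.
exclusive = np.setxor1d(array1, array2)
print("Unique values that are in only one (not both) of the input arrays:", exclusive)
```

The values 10 and 40 are dropped because they appear in both arrays.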
Exercise 2: Write a NumPy program to reverse an array (the first element becomes the last).
Original array:
[12 13 14 15 16 17 18 .... 37]
Reverse array:
[37 36 35 34 33 32 .... 12]
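A minimal sketch for Exercise 2, assuming the original array holds the integers 12 through 37 as the sample output suggests. Slicing with a step of -1 is one common option; np.flip would work as well.

```python
import numpy as np

# Assumed input: the integers 12..37 (np.arange excludes the stop value 38).
original = np.arange(12, 38)

# A slice with step -1 walks the array from the last element to the first.
reversed_array = original[::-1]

print("Original array:", original)
print("Reverse array:", reversed_array)
```

Note that the slice is a view on the same data, so call .copy() on it if an independent array is needed.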
Exercise 3: Write a NumPy program to find the common values between two arrays.
Expected Output:
Array1 : [0 10 20 40 60]
Array2 : [10 30 40]
Common values between two arrays: [10 40]
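Exercise 3 can be sketched with np.intersect1d, which returns the sorted, unique values present in both inputs. The arrays are the ones from the expected output; the variable names are illustrative.

```python
import numpy as np

array1 = np.array([0, 10, 20, 40, 60])
array2 = np.array([10, 30, 40])

# np.intersect1d keeps only the values that occur in both arrays.
common = np.intersect1d(array1, array2)
print("Common values between two arrays:", common)
```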
Exercise 5: Write a NumPy program to convert a NumPy array to an image, then display the image.
Sample output: (image not reproduced)

Pillow (PIL), the Python Imaging Library, adds image processing capabilities to your Python
interpreter. Import it with: from PIL import Image.
Use the PIL.Image.fromarray() method to convert an array to an image, e.g., Image.fromarray(data, 'RGB')
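The conversion described above could look like the sketch below. It assumes Pillow is installed; the 64x64 gradient is made-up data used only for illustration. The mode argument is left out here because Pillow can infer 'RGB' from an array of shape (height, width, 3) with dtype uint8.

```python
import numpy as np
from PIL import Image

# Hypothetical data: a 64x64 RGB picture stored as unsigned 8-bit integers.
height, width = 64, 64
data = np.zeros((height, width, 3), dtype=np.uint8)
data[:, :, 0] = np.linspace(0, 255, width, dtype=np.uint8)  # red ramps up from left to right
data[:, :, 2] = 128                                         # constant blue channel

# Convert the array to a PIL image; the RGB mode is inferred from shape and dtype.
image = Image.fromarray(data)
print(image.mode, image.size)

# image.show()  # uncomment to open the picture in the default image viewer
```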
Exercise 4: Write a NumPy program to select indices satisfying multiple conditions in a NumPy array.
Sample array:
a = np.array([97, 101, 105, 111, 117])
b = np.array(['a', 'e', 'i', 'o', 'u'])
Note: Select the elements from the second array corresponding to elements in the first array that
are greater than 100 and less than 110.
Expected Output: Elements from the second array corresponding to elements in the first array
that are greater than 100 and less than 110: ['e' 'i']
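One way to combine the two conditions of Exercise 4 is a boolean mask built with the element-wise & operator. The sketch below uses the sample arrays; mask, selected and indices are illustrative names.

```python
import numpy as np

a = np.array([97, 101, 105, 111, 117])
b = np.array(['a', 'e', 'i', 'o', 'u'])

# Each comparison yields a boolean array; & combines them element by element.
# The parentheses are required because & binds tighter than > and <.
mask = (a > 100) & (a < 110)

selected = b[mask]           # elements of b where the mask is True
indices = np.where(mask)[0]  # positions where both conditions hold

print(selected)
print(indices)
```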
Exercise 6: Write a NumPy program to calculate the Euclidean distance.
PS: In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" straight-line
distance between two points in Euclidean space. With this distance, Euclidean space becomes a
metric space. The associated norm is called the Euclidean norm. Older literature refers to the metric
as the Pythagorean metric (source: Wikipedia).
Sample Output: Euclidean distance: 5.196152422706632
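The statement does not give the two points, so the sketch below assumes p = (1, 2, 3) and q = (4, 5, 6), a hypothetical pair whose distance is the square root of 27, about 5.196, which agrees with the sample output. The distance is the Euclidean norm of the difference vector.

```python
import numpy as np

# Assumed sample points (not given in the exercise statement).
p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

# Euclidean distance = sqrt(sum((p_i - q_i)^2)), i.e. the 2-norm of p - q.
distance = np.linalg.norm(p - q)

# The same value computed directly from the definition.
distance_by_hand = np.sqrt(np.sum((p - q) ** 2))

print("Euclidean distance:", distance)
```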

2 Pandas
1. Replace the spaces in my_str with the least frequent character.
Input:
    my_str = 'dbc deb abed gade'

Desired output:
    'dbccdebcabedcgade'  # least frequent is 'c'
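A possible approach counts the characters with value_counts() and then picks the rarest one. In this string 'c' and 'g' both occur once, and the order of tied entries in value_counts() is not something to rely on, so the sketch below breaks the tie explicitly by taking the tied character that appears first in the string. That tie-breaking rule is an assumption chosen to match the desired output.

```python
import pandas as pd

my_str = 'dbc deb abed gade'

# Count every non-space character.
counts = pd.Series(list(my_str.replace(' ', ''))).value_counts()

# Several characters may share the lowest count ('c' and 'g' here);
# keep the one that appears first in the original string.
rarest = counts[counts == counts.min()].index
least_frequent = min(rarest, key=my_str.index)

result = my_str.replace(' ', least_frequent)
print(result)
```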

2. Create a time series starting at '2000-01-01' and covering 10 weekends (Saturdays), with random
numbers as values.
The date_range() method returns a DatetimeIndex defined by a combination of three of the
following four parameters: start (the start date of the generated range), end (the end date of
the generated range), periods (the number of dates generated) and freq (the spacing between
dates, here 'W-SAT' for weekly on Saturdays).
    pd.date_range('2000-01-01', periods=10, freq='W-SAT')

Desired output:
    # values can be random
    2000-01-01    4
    2000-01-08    1
    ......
    2000-02-19    9
    2000-02-26    6
    2000-03-04    6
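Putting the index and the random values together might look like the sketch below. The seed and the value range 1 to 9 are arbitrary choices, so the printed numbers will differ from the desired output above.

```python
import numpy as np
import pandas as pd

# Ten consecutive Saturdays, starting on 2000-01-01 (itself a Saturday).
dates = pd.date_range('2000-01-01', periods=10, freq='W-SAT')

# Random integers between 1 and 9; the seed only makes the run repeatable.
rng = np.random.default_rng(seed=0)
ser = pd.Series(rng.integers(1, 10, size=len(dates)), index=dates)

print(ser)
```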

3. Read and import the BostonHousing dataset as a dataframe.
4. Import the 'crim' and 'medv' columns of the BostonHousing dataset as a dataframe.
5. Import every 50th row of the BostonHousing dataset as a dataframe.
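Exercises 3 to 5 can all be approached with pd.read_csv. Since the location of the BostonHousing file is not given here, the sketch below reads from an in-memory CSV with 200 made-up rows; only the column names 'crim' and 'medv' come from the exercise, and with the real file you would pass its path instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical stand-in for the BostonHousing CSV file: 200 made-up rows.
rows = ["crim,zn,medv"] + [f"{i * 0.01:.2f},{i % 5},{20 + i % 10}" for i in range(200)]
csv_text = "\n".join(rows)

# Exercise 3: read the whole file into a dataframe.
df = pd.read_csv(io.StringIO(csv_text))

# Exercise 4: usecols restricts the import to the listed columns.
df_subset = pd.read_csv(io.StringIO(csv_text), usecols=['crim', 'medv'])

# Exercise 5: keep every 50th row (rows 0, 50, 100, ...) with a stepped slice.
df_every_50th = df.iloc[::50]

print(df.shape, df_subset.shape, df_every_50th.shape)
```

Slicing after reading is the simplest option; for very large files, filtering rows while reading (for example with the skiprows parameter) avoids loading everything first.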
6. Read and import the Cars93 dataset as the df_Cars93 dataframe.
7. Check whether df_Cars93 has any missing values.
8. Count the number of missing values in each column of df_Cars93. Which column has the
maximum number of missing values?
9. Replace the missing values in the Min.Price and Max.Price columns with their respective means.
10. In df_Cars93, use the apply method to replace the missing values in Min.Price with the column's
mean and those in Max.Price with the column's median.
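The missing-value exercises (7 to 10) might be sketched as follows. The small dataframe is a hypothetical stand-in for Cars93 with made-up prices; only the column names come from the exercises.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few Cars93 rows; the values are made up.
df_Cars93 = pd.DataFrame({
    'Manufacturer': ['Acura', 'Audi', 'BMW', 'Buick', 'Cadillac'],
    'Min.Price': [12.9, np.nan, 23.7, 14.2, 10.0],
    'Max.Price': [18.8, np.nan, np.nan, 17.3, 20.0],
})

# Exercise 7: is there at least one missing value anywhere?
has_missing = df_Cars93.isnull().values.any()

# Exercise 8: missing values per column, and the column with the most of them.
missing_per_column = df_Cars93.isnull().sum()
most_missing = missing_per_column.idxmax()

# Exercise 9: fill both price columns with their own mean.
price_cols = ['Min.Price', 'Max.Price']
filled_mean = df_Cars93.copy()
filled_mean[price_cols] = filled_mean[price_cols].fillna(filled_mean[price_cols].mean())

# Exercise 10: apply() receives one column at a time; choose the statistic by column name.
stat_for = {'Min.Price': pd.Series.mean, 'Max.Price': pd.Series.median}
filled_apply = df_Cars93.copy()
filled_apply[price_cols] = filled_apply[price_cols].apply(
    lambda col: col.fillna(stat_for[col.name](col))
)

print(missing_per_column)
```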
11. Get the number of rows and columns, the datatype and the summary statistics of each column of the
df_Cars93 dataframe. Also get the NumPy array and list equivalents of the dataframe.
12. From the Cars93 dataset, find which manufacturer, model and type have the highest Price.
What are the row and column numbers of the cell with the highest Price value?
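Exercises 11 and 12 could be approached as in the sketch below. The three-row dataframe is a hypothetical stand-in for Cars93 with made-up values; the row number returned by idxmax() matches the row position only because the index is the default 0, 1, 2, ... range.

```python
import pandas as pd

# Hypothetical stand-in for a few Cars93 rows; the values are made up.
df_Cars93 = pd.DataFrame({
    'Manufacturer': ['Acura', 'Audi', 'BMW'],
    'Model': ['Integra', '90', '535i'],
    'Type': ['Small', 'Compact', 'Midsize'],
    'Price': [15.9, 29.1, 30.0],
})

# Exercise 11: shape, datatypes, summary statistics, array and list equivalents.
n_rows, n_cols = df_Cars93.shape
print(df_Cars93.dtypes)
print(df_Cars93.describe())
as_array = df_Cars93.to_numpy()
as_list = df_Cars93.values.tolist()

# Exercise 12: locate the highest Price and read the matching row.
top_row = df_Cars93['Price'].idxmax()            # index label of the maximum
top_col = df_Cars93.columns.get_loc('Price')     # integer position of the column
top_car = df_Cars93.loc[top_row, ['Manufacturer', 'Model', 'Type']]
print(top_car)
```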
13. Rename the column Type as CarType in df_Cars93 and replace the ‘.’ in column names with
‘_’.
    import pandas as pd
    df_Cars93 = pd.read_csv('Cars93_miss.csv')
    print(df_Cars93.columns)
    >>>> Index(['Manufacturer', 'Model', 'Type', 'Min.Price', 'Price', 'Max.Price',
    'MPG.city', 'MPG.highway', ...]

The desired output:

    print(df_Cars93.columns)
    >>>> Index(['Manufacturer', 'Model', 'CarType', 'Min_Price', 'Price', 'Max_Price',
    'MPG_city', 'MPG_highway', ......
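A minimal sketch of the renaming, assuming a one-row dataframe that only mimics the first Cars93 column names (the row values are made up). rename() handles the single column, and the vectorised str.replace on the column index handles the dots; regex=False makes the dot a literal character rather than a pattern.

```python
import pandas as pd

# Hypothetical one-row frame that mimics the first Cars93 column names.
df_Cars93 = pd.DataFrame(
    [['Acura', 'Integra', 'Small', 12.9, 15.9, 18.8, 25, 31]],
    columns=['Manufacturer', 'Model', 'Type', 'Min.Price', 'Price',
             'Max.Price', 'MPG.city', 'MPG.highway'],
)

# Rename one column by name.
df_Cars93 = df_Cars93.rename(columns={'Type': 'CarType'})

# Replace every literal '.' in the column names with '_'.
df_Cars93.columns = df_Cars93.columns.str.replace('.', '_', regex=False)

print(df_Cars93.columns)
```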

14. Change the order of the columns of a dataframe:
• In df, interchange columns 'a' and 'c'.
• Create a generic function to interchange two columns, without hardcoding column names.
• Sort the columns in reverse alphabetical order, that is, column 'e' first through column 'a'
last.
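The three column-ordering tasks can be sketched as below. The statement does not define df, so a 4x5 dataframe with columns 'a' to 'e' is assumed; switch_columns is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Assumed input: a small dataframe with columns 'a' through 'e'.
df = pd.DataFrame(np.arange(20).reshape(-1, 5), columns=list('abcde'))

# Task 1: interchange 'a' and 'c' by listing the columns in the wanted order.
swapped = df[list('cbade')]


def switch_columns(frame, col1, col2):
    """Return a new dataframe with the positions of col1 and col2 exchanged."""
    cols = frame.columns.tolist()
    i, j = cols.index(col1), cols.index(col2)
    cols[i], cols[j] = cols[j], cols[i]
    return frame[cols]


# Task 2: the same swap without hardcoding the full column order.
swapped_generic = switch_columns(df, 'a', 'c')

# Task 3: reverse alphabetical order, 'e' first through 'a' last.
reverse_sorted = df[sorted(df.columns, reverse=True)]

print(swapped_generic.columns.tolist())
print(reverse_sorted.columns.tolist())
```

Selecting with a list of column names returns a new dataframe, so df itself keeps its original order.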

3 Matplotlib
Matplotlib is a Python 2D plotting library that produces high-quality charts and figures, which
help us visualize large amounts of data and understand them better. Pandas is a handy data-structure
tool for analyzing large and complex data.
- Matplotlib is a plotting library for Python and for NumPy, its numerical mathematics
extension.
- Pyplot is a state-based interface to Matplotlib that provides a MATLAB-like way of
plotting.
- In this exercise, we use Pandas and Matplotlib to analyze and visualize company sales data.
1. Download the company_sales_data CSV file.
2. Read this file using Pandas, NumPy, or a built-in Matplotlib function.
3. Read the total profit of all months and show it using a line plot.
The total profit is provided for each month. The generated line plot must include the following
properties:
• X label name = Month Number
• Y label name = Total profit
- You may use the function matplotlib.pyplot.xticks() to get or set the current tick locations
and labels of the x-axis. This command affects the current axes.
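A line plot with these labels might be sketched as follows. The dataframe is a hypothetical stand-in for company_sales_data with made-up profits; the column names month_number and total_profit are the ones used in the examples of this sheet. The Agg backend is selected only so that the sketch also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for company_sales_data; the profits are made up.
data_sales = pd.DataFrame({
    'month_number': range(1, 13),
    'total_profit': [200000, 180000, 220000, 225000, 210000, 205000,
                     290000, 350000, 235000, 265000, 400000, 300000],
})

monthList = data_sales['month_number'].tolist()
profitList = data_sales['total_profit'].tolist()

plt.figure()
plt.plot(monthList, profitList)
plt.xlabel('Month Number')
plt.ylabel('Total profit')
plt.xticks(monthList)  # one tick per month instead of the automatic tick locations
# plt.show()  # uncomment when an interactive backend is in use
```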
4. Get the total profit of all months and show a line plot with the following style properties.
The generated line plot must include the following style properties:
• Line style should be dotted and line color should be red
• Show the legend at the lower right location.
• X label name = Month Number
• Y label name = Sold units number
• Add a circle marker.
• Line marker color should be red
• Line width should be 3
The matplotlib API in Python provides these properties as arguments of the plot() function. Here is
an example:
    import matplotlib.pyplot as plt
    profitList = df['total_profit'].tolist()
    monthList = df['month_number'].tolist()
    ......
    plt.plot(monthList, profitList, label='Profit data of last year', color='r',
             marker='o', markerfacecolor='k', linestyle='--', linewidth=3)
    ..........

5. Read the toothpaste sales data of each month and show it using a scatter plot. Also add a grid to
the plot; the gridline style should be '--' (dashed).
FYI: A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two
different numeric variables. The position of each dot on the horizontal and vertical axes
indicates the values for an individual data point. Scatter plots are used to observe relationships
between variables.
The matplotlib API in Python provides the scatter() function. This method is used as follows:
    import matplotlib.pyplot as plt
    monthList = data_sales['month_number'].tolist()
    toothPasteSalesData = data_sales['toothpaste'].tolist()
    plt.scatter(monthList, toothPasteSalesData, label='Tooth paste Sales data')
    ..........

6. Display the number of units sold per month for each product using multiline plots (i.e., a
separate plot line for each product).
The graph should look like this: (figure not reproduced)
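A multiline plot can be built by calling plot() once per product on the same axes. The sketch below is only an illustration: the dataframe is a hypothetical three-month stand-in with made-up unit counts, restricted to the product columns named elsewhere in this sheet, and the Agg backend is chosen so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for company_sales_data: three months of made-up unit counts.
data_sales = pd.DataFrame({
    'month_number': [1, 2, 3],
    'facecream': [2500, 2630, 2140],
    'facewash': [1500, 1200, 1340],
    'toothpaste': [5200, 5100, 4550],
    'bathingsoap': [9200, 6100, 9550],
})

monthList = data_sales['month_number'].tolist()
products = ['facecream', 'facewash', 'toothpaste', 'bathingsoap']

plt.figure()
for product in products:
    # Each call to plot() adds one more line to the current axes.
    plt.plot(monthList, data_sales[product].tolist(), marker='o', label=product)

plt.xlabel('Month Number')
plt.ylabel('Sold units number')
plt.xticks(monthList)
plt.legend(loc='upper left')
# plt.show()  # uncomment when an interactive backend is in use
```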

7. Read the facecream and facewash product sales data and show them using a bar chart. The bar
chart should display the number of units sold per month for each product. Add a separate
bar for each product in the same chart.

FYI: A bar plot or bar chart is a graph that represents categorical data with rectangular
bars whose lengths or heights are proportional to the values they represent. Bar plots can
be drawn horizontally or vertically. A bar chart shows comparisons between discrete
categories.
The matplotlib API in Python provides the bar() function. This method is used as follows:
    import matplotlib.pyplot as plt
    monthList = data_sales['month_number'].tolist()
    faceCremSalesData = data_sales['facecream'].tolist()
    faceWashSalesData = data_sales['facewash'].tolist()
    plt.bar([a - 0.25 for a in monthList], faceCremSalesData, width=0.25,
            label='Face Cream sales data', align='edge')
    ......

8. Read the bathing soap sales data of all months and show it using a bar chart. Save this plot to
your hard disk.
Use the bar() function to create your bar chart, then save the figure. Here is an example:
    import matplotlib.pyplot as plt
    monthList = data_sales['month_number'].tolist()
    bathingsoapSalesData = data_sales['bathingsoap'].tolist()
    plt.bar(monthList, bathingsoapSalesData)
    ......
    plt.savefig('sales_data_of_bathingsoap.png', dpi=150)
    ....

9. Read the total profit of each month and show it using a histogram to see the most common
profit ranges.
FYI: The hist() function in the pyplot module of the matplotlib library is used to plot a
histogram. This method is used as follows:
    import matplotlib.pyplot as plt
    profitList = data_sales['total_profit'].tolist()
    labels = ['low', 'average', 'Good', 'Best']
    profit_range = [150000, 175000, 200000, 225000, 250000, 300000, 350000]
    plt.hist(profitList, profit_range, label='Profit data')
    ......

10. Calculate the total sales data of last year for each product and show it using a pie chart.
Note: In the pie chart, display the number of units sold per year for each product as a percentage.
With Pyplot, you can use the pie() function to draw pie charts:
    import matplotlib.pyplot as plt
    monthList = data_sales['month_number'].tolist()
    labels = [...]
    salesData = [....]
    plt.axis("equal")
    plt.pie(salesData, labels=labels, autopct='%1.1f%%')
    .......
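One way to obtain the per-product totals for the pie chart is to sum each product column. The sketch below assumes a hypothetical three-month stand-in with made-up unit counts and only the product columns named elsewhere in this sheet; it is an illustration, not the expected solution. The Agg backend is chosen so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for company_sales_data: three months of made-up unit counts.
data_sales = pd.DataFrame({
    'facecream': [2500, 2630, 2140],
    'facewash': [1500, 1200, 1340],
    'toothpaste': [5200, 5100, 4550],
    'bathingsoap': [9200, 6100, 9550],
})

# Column-wise sum: one total per product over all months.
totals = data_sales.sum()

plt.figure()
plt.axis("equal")  # equal aspect ratio so the pie is drawn as a circle
# autopct formats each share as a percentage with one decimal place.
plt.pie(totals.tolist(), labels=list(totals.index), autopct='%1.1f%%')
# plt.show()  # uncomment when an interactive backend is in use
```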

11. Read the bathing soap and facewash sales data of all months and display them using subplots.
NB: pyplot.subplots creates a figure and a grid of subplots with a single call, while providing
reasonable control over how the individual plots are created.
    import matplotlib.pyplot as plt
    monthList = []
    bathingsoap = []
    faceWashSalesData = []
    f, axarr = plt.subplots(2, sharex=True)
    axarr[0].plot(monthList, bathingsoap, label='....', color='k', marker='o',
                  linewidth=3)
    axarr[0].set_title('.....')
    axarr[1].plot(monthList, faceWashSalesData, label='.....', color='r', marker='o',
                  linewidth=3)
    axarr[1].set_title('....')
