Data Science Lab Manual 20 Programs

The document is a data science lab manual that includes various analyses and visualizations of cricket player performances, insurance datasets, temperature data, and student performance metrics. It covers statistical calculations such as averages, medians, modes, quartiles, skewness, and regression analysis, along with graphical representations using plots and histograms. Each section provides code snippets and outputs for practical understanding of data analysis techniques in Python.

Uploaded by

omsinghr2

DATA SCIENCE LAB MANUAL

1. Consider the following data of three cricket players in 10 innings T20

Player        Match: 1    2    3    4    5    6    7    8    9    10
Cricketer 1          25   10   55   45   55   78   55   0    49   10
Cricketer 2          47   62   78   45   100  20   100  0    80   10
Cricketer 3          80   17   7    10   45   79   75   75   80   42

a) Find whose average is better.
b) What is the middlemost value (median) of each player?
c) Whose most frequent score (mode) is the best?
d) Draw a simple plot to show the performance of the players.

Solution:
#Cricket Player Performance Analysis
import statistics as st
import matplotlib.pyplot as pt
Matches=[1,2,3,4,5,6,7,8,9,10]
Player1=[25,10,55,45,55,78,55,0,49,10]
Player2=[47,62,78,45,100,20,100,0,80,10]
Player3=[80,17,7,10,45,79,75,75,80,42]
#Player1 Summary
print("Player1 Mean = ",st.mean(Player1))
print("Player1 Median = ",st.median(Player1))
print("Player1 Mode = ",st.mode(Player1))
#Player2 Summary
print("Player2 Mean = ",st.mean(Player2))
print("Player2 Median = ",st.median(Player2))
print("Player2 Mode = ",st.mode(Player2))
#Player3 Summary
print("Player3 Mean = ",st.mean(Player3))
print("Player3 Median = ",st.median(Player3))
print("Player3 Mode = ",st.mode(Player3))
#Performance plot: one line per player, scores against match number
pt.plot(Matches,Player1)
pt.plot(Matches,Player2)
pt.plot(Matches,Player3)
pt.title("Cricket Player Performance")
pt.xlabel("Matches")
pt.ylabel("Scores")
pt.legend(["Player1","Player2","Player3"])
pt.show()

OUTPUT:
Player1 Mean = 38.2
Player1 Median = 47.0
Player1 Mode = 55
Player2 Mean = 54.2
Player2 Median = 54.5
Player2 Mode = 100
Player3 Mean = 51
Player3 Median = 60.0
Player3 Mode = 80
Analysis:
a) Player 2 has the best average (54.2).
b) Player1 Median = 47.0, Player2 Median = 54.5, Player3 Median = 60.0
c) Player 2 has the best most-frequent score (mode = 100).
d) The line plot produced by the code above shows each player's score in every match.
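
The three summaries above can also be computed in one loop. The following is a minimal sketch, not part of the original solution: the dictionary name scores is an assumption, and st.multimode (available from Python 3.8) is used instead of st.mode so that ties become visible. Player 3 scored both 80 and 75 twice, and st.mode reports only the first of them.

```python
import statistics as st

# Scores copied from the problem table; the dictionary layout is an assumption
scores = {
    "Player1": [25, 10, 55, 45, 55, 78, 55, 0, 49, 10],
    "Player2": [47, 62, 78, 45, 100, 20, 100, 0, 80, 10],
    "Player3": [80, 17, 7, 10, 45, 79, 75, 75, 80, 42],
}

for name, runs in scores.items():
    # multimode returns every value that shares the highest frequency
    print(name, "Mean =", st.mean(runs),
          "Median =", st.median(runs),
          "Modes =", st.multimode(runs))

# Player with the highest mean score
best_average = max(scores, key=lambda name: st.mean(scores[name]))
print("Best average:", best_average)
```

Keeping the data in a dictionary means a fourth player can be added without copying the three print statements again.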
2. Consider the insurance dataset and analyze the following:
a) Count the number of males and females.
b) What is the average age of the people?
c) Display a simple gender-wise bar plot.
Solution:
import pandas as pd
import statistics as st
import matplotlib.pyplot as pt
# [Link] stands for the dataset's CSV file name, which is missing from the source
data = pd.read_csv(r"E:\Data Science with Python\DataSet\[Link]")
print(data)
#Gender-wise analysis
ls=data['sex'].tolist()
y1=ls.count('female')
y2=ls.count('male')
print("female Count = ",y1)
print("male Count = ",y2)

#Average age of customers

avgage=data['age'].tolist()
print("Average Age= %.2f " % st.mean(avgage))

#Display bar plot gender-wise

x=["FEMALE","MALE"]
y=[y1,y2]
pt.bar(x,y)
pt.title("Genderwise Insurance Data")
pt.xlabel("Gender")
pt.ylabel("Count")
pt.show()

Analysis:
a) female Count = 662, male Count = 676
b) Average Age= 39.21
c) The gender-wise bar plot is produced by the code above.
3. Consider the insurance dataset and analyze the data region-wise. Also display
a simple region-wise bar chart.

Solution:

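import pandas as pd
import matplotlib.pyplot as pt
# [Link] stands for the dataset's CSV file name, which is missing from the source
data = pd.read_csv(r"E:\Data Science with Python\DataSet\[Link]")
print(data)

#Regionwise count
region=data['region'].tolist()
#Collect the distinct region names in order of first appearance
output=[]
for x in region:
    if x not in output:
        output.append(x)
print(output)
y1=region.count('southwest')
y2=region.count('southeast')
y3=region.count('northwest')
y4=region.count('northeast')
print("Southwest count= ",y1)
print("southeast count= ",y2)
print("northwest count= ",y3)
print("northeast count= ",y4)
pt.title("Regionwise Count")
pt.xlabel("Region")
pt.ylabel("Count")
#Count each region in the same order as the labels in output,
#so that every bar is paired with its own region name
y=[region.count(r) for r in output]
pt.bar(output,y)
pt.show()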

Analysis:
Southwest count= 325
southeast count= 364
northwest count= 325
northeast count= 324
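
Programs 2 and 3 count categories by converting a column to a list and calling count() once per known value. A minimal sketch of an alternative follows, assuming a small hand-made DataFrame in place of the insurance file (the six rows are illustrative and are not the real dataset). The pandas method value_counts() discovers the categories and counts them in one step.

```python
import pandas as pd

# Hypothetical stand-in for the insurance data; only the columns used here
data = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31],
    "sex": ["female", "male", "male", "male", "male", "female"],
    "region": ["southwest", "southeast", "southeast",
               "northwest", "northwest", "southeast"],
})

# value_counts returns a Series indexed by category, sorted by count
gender_counts = data["sex"].value_counts()
region_counts = data["region"].value_counts()
average_age = data["age"].mean()

print(gender_counts)
print(region_counts)
print("Average Age= %.2f" % average_age)
```

Because the labels and the counts come from the same Series, a call such as region_counts.plot(kind="bar") cannot pair a bar with the wrong region name.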
4. Consider the temperature dataset and find, month-wise, the average of the
minimum and maximum temperatures, the minimum temperature, and the maximum
temperature.
Solution:
import pandas as pd
import openpyxl
# [Link] stands for the dataset's Excel file name, which is missing from the source
data=pd.read_excel("E:\\Data Science with Python\\DataSet\\[Link]")
print(data)
#Group by year and month, keeping the original month order (sort=False)
df1 = (data.groupby(["Year", "Month"],sort=False)
       .agg(Avg_of_Max_Temp=("Max", 'mean'),
            Max_temp=("Max",'max'),
            Avg_of_Min_Temp=("Min", 'mean'),
            Min_temp=("Min",'min')))
print(df1)

Analysis:
                Avg_of_Max_Temp  Max_temp  Avg_of_Min_Temp  Min_temp
Year Month
2022 January          29.290323        33        14.838710        11
     February         32.535714        35        16.928571        14
     March            35.451613        39        20.322581        17
     April            36.666667        39        22.300000        19
     May              33.838710        38        21.612903        19
     June             31.533333        36        21.033333        20
     July             28.225806        33        20.451613        19
     August           28.419355        32        20.258065        19
     September        29.533333        32        19.833333        18
     October          29.741935        32        18.677419        14
     November         30.433333        32        16.433333        11
     December         29.870968        33        17.967742        14
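
The grouping step can be tried without the Excel file. The sketch below assumes a four-row DataFrame with the same column names (Year, Month, Max, Min); the temperature values are made up for illustration. Each keyword passed to agg names an output column and gives a (source column, function) pair.

```python
import pandas as pd

# Hypothetical daily readings; the real program loads them from an Excel file
data = pd.DataFrame({
    "Year": [2022, 2022, 2022, 2022],
    "Month": ["January", "January", "February", "February"],
    "Max": [30, 32, 34, 35],
    "Min": [12, 14, 15, 17],
})

# sort=False keeps the months in the order in which they appear in the data
summary = data.groupby(["Year", "Month"], sort=False).agg(
    Avg_of_Max_Temp=("Max", "mean"),
    Max_temp=("Max", "max"),
    Avg_of_Min_Temp=("Min", "mean"),
    Min_temp=("Min", "min"),
)
print(summary)
```

Without sort=False the months would be listed alphabetically, which puts February before January.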
5. Consider the following data and calculate descriptive statistics using
formulas.
22,26,14,30,18,11,35,41,12,32
Solution:
import numpy as np
data=[22,26,14,30,18,11,35,41,12,32]
print("Mean = %.2f"% np.mean(data))
print("Median = ",np.median(data))
print("Max = ",np.max(data))
print("Min = ",np.min(data))
print("First Quartile =",np.quantile(data,0.25))
print("Second Quartile = ",np.quantile(data,0.50))
print("Third Quartile = ",np.quantile(data,0.75))
print("20th Percentile = ",np.percentile(data,20))
print("99th Percentile = ",np.percentile(data,99))
#np.std and np.var use the population formulas (divide by n) by default
print("Standard deviation = %.2f" % np.std(data))
print("Variance = %.2f" % np.var(data))

OUTPUT:
Mean = 24.10
Median = 24.0
Max = 41
Min = 11
First Quartile = 15.0
Second Quartile = 24.0
Third Quartile = 31.5
20th Percentile = 13.6
99th Percentile = 40.46
Standard deviation = 9.83
Variance = 96.69
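
The problem asks for the statistics "using formulas", while the solution calls NumPy functions. As a hedged sketch of what those functions compute, the block below uses only the standard library. The helper name quantile is an assumption, and it implements linear interpolation between sorted values, which is NumPy's default method. The variance divides by n (population variance), which matches the output above.

```python
import math

data = [22, 26, 14, 30, 18, 11, 35, 41, 12, 32]
n = len(data)

# Mean: sum of the values divided by their number
mean = sum(data) / n

# Population variance: mean of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

def quantile(values, q):
    """Quantile q (0..1) by linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)      # fractional index into the sorted list
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

print("Mean = %.2f" % mean)
print("Variance = %.2f" % variance)
print("Standard deviation = %.2f" % std_dev)
print("First Quartile =", quantile(data, 0.25))
print("Third Quartile =", quantile(data, 0.75))
```

For the sample variance, divide by n - 1 instead of n; in NumPy this corresponds to np.var(data, ddof=1).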
6. Find the Quartiles for the following Students Score data and visualize
graphically.
50,50,47,97,49,3,53,42,26,74,82,62,37,15,70,27,36,35,48,52,63,64.
Solution:
import numpy as np
import matplotlib.pyplot as pt
data=[50,50,47,97,49,3,53,42,26,74,82,62,37,15,70,27,36,35,48,52,63,64]
print(data)
print("Quartile 1 = %.2f"%np.quantile(data,0.25))
print("Quartile 2 = %.2f"%np.quantile(data,0.50))
print("Quartile 3 = %.2f"%np.quantile(data,0.75))
pt.figure(figsize=(8,4))
pt.hist(data)
# Vertical line and label for each quartile
pt.axvline(np.quantile(data, 0.25), linestyle='--', color='red')
pt.text(np.quantile(data, 0.25), 4, 'Q1', color='r', ha='right', va='top', rotation=60)
pt.axvline(np.quantile(data, 0.50), linestyle='-', color='red')
pt.text(np.quantile(data, 0.50), 4, 'Q2', color='r', ha='right', va='top', rotation=60)
pt.axvline(np.quantile(data, 0.75), linestyle='--', color='red')
pt.text(np.quantile(data, 0.75), 4, 'Q3', color='r', ha='right', va='top', rotation=60)
pt.show()

OUTPUT:
Quartile 1 = 36.25
Quartile 2 = 49.50
Quartile 3 = 62.75
7. Calculate the skewness for the following data and state your conclusion about the skewness.
85,96,76,108,84,100,86,70,95,84

Solution
# Importing libraries
import matplotlib.pyplot as pt
import statistics as st
import seaborn as sns
# Creating a dataset
dataset =[85,96,76,108,84,100,86,70,95,84]
meandata=st.mean(dataset)
print("Mean = %.2f"%meandata)
modedata=st.mode(dataset)
print("Mode = %.2f"%modedata)
meddata=st.median(dataset)
print("Median = %.2f"%meddata)
# Calculate the skewness as (mean - mode) / standard deviation
stddata=st.stdev(dataset)
print("Standard Deviation =%.2f" % stddata)
sk=(meandata-modedata)/stddata
print("Skewness= %.2f" % sk)
# distplot is deprecated since seaborn 0.11; histplot and displot replace it
sns.distplot(dataset)
pt.show()
OUTPUT:
Mean = 88.40
Mode = 84.00
Median = 85.50

Analysis: The distribution is positively skewed, because mean (88.40) > median (85.50) > mode (84.00).
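
The skewness measure used in this program, (mean - mode) / standard deviation, is Pearson's first coefficient of skewness. The sketch below recomputes it next to Pearson's second coefficient, 3 * (mean - median) / standard deviation, which is often preferred when the mode is not well defined. It assumes the sample standard deviation (st.stdev); the variable names are illustrative.

```python
import statistics as st

dataset = [85, 96, 76, 108, 84, 100, 86, 70, 95, 84]

mean = st.mean(dataset)
median = st.median(dataset)
mode = st.mode(dataset)
std = st.stdev(dataset)          # sample standard deviation (divides by n - 1)

# Pearson's first (mode) and second (median) skewness coefficients
pearson_first = (mean - mode) / std
pearson_second = 3 * (mean - median) / std

print("Pearson first coefficient  = %.2f" % pearson_first)
print("Pearson second coefficient = %.2f" % pearson_second)
if pearson_first > 0 and pearson_second > 0:
    print("Both coefficients are positive: the data is positively skewed.")
```

A positive value means the right tail is longer, a negative value means the left tail is longer, and a value near zero suggests a roughly symmetric distribution.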

8. Consider Student Performance dataset and find skewness for all subjects.
import pandas as pd
import matplotlib.pyplot as plt
# [Link] stands for the dataset's CSV file name, which is missing from the source
data =pd.read_csv(r"E:\Data Science with Python\DataSet\[Link]")
print(data)
print("Skew of Cloud Computing score: %.2f"%data['Cloud Computing'].skew())
print("Skew of Data Science: %.2f"%data['Data Science'].skew())
print("Skew of Computer Networks: %.2f"%data['Computer Network'].skew())

# One histogram per subject, side by side
plt.figure(figsize = (12,6))
plt.subplot(1, 3, 1)
plt.hist(data['Cloud Computing'])
plt.title('Cloud Computing ')

plt.subplot(1, 3, 2)
plt.hist(data['Data Science'])
plt.title('Data Science ')

plt.subplot(1,3,3)
plt.hist(data['Computer Network'])
plt.title('Computer Network ')

plt.show()

OUTPUT:
Skew of Cloud Computing score: -0.28
Skew of Data Science: -0.26
Skew of Computer Networks: -0.29
Analysis:
The distribution of every subject is negatively skewed.
Most students score between 60 and 100.
9. Consider the Student Performance dataset. Find the basic statistics of the Data Science
subject using the pandas describe function, calculate the skewness, and visualize the
distribution.
Solution:
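import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
# [Link] stands for the dataset's CSV file name, which is missing from the source
data =pd.read_csv(r"E:\Data Science with Python\DataSet\[Link]")
print(data)
print(data['Data Science'].describe())
print("Skewness= %.2f"%data['Data Science'].skew())
# Histogram with a fitted normal curve for comparison
# (distplot is deprecated since seaborn 0.11)
sns.distplot(data['Data Science'], fit=norm, color="r")
plt.show()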

OUTPUT:
count 1000.000000
mean 69.169000
std 14.600192
min 17.000000
25% 59.000000
50% 70.000000
75% 79.000000
max 100.000000
Name: Data Science, dtype: float64
Skewness= -0.26
10. Find the regression line for the following data. Conclude your
analysis.

No. of chimpanzees   1   2   3   4   5   6   7   8
No. of hunting       30  45  51  57  60  65  70  71

# Import packages
import numpy as np
import matplotlib.pyplot as plt
# Independent variable - number of chimpanzees
x= np.array([1,2,3,4,5,6,7,8])
# Dependent Variable - percent of successful hunts
y = np.array([30,45,51,57,60,65,70,71])
n = np.size(x)
x_mean = np.mean(x)
y_mean = np.mean(y)
# Least-squares slope: (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
b1=n * np.sum(x*y)-np.sum(x)*np.sum(y)
b2=(n * np.sum(x*x) - (np.sum(x)*np.sum(x)))
b=(b1/b2)
# Intercept: the line passes through the point of means
a= y_mean-b*x_mean
print("Line Slope is : %.4f"%b)
print("Line Intercept is: %.4f"%a)
y_pred=b*x+a
plt.scatter(x, y, color = 'red')
plt.plot(x, y_pred, color = 'green',label='y= %.4f*x+%.4f' % (b, a))
plt.xlabel('Number of Chimpanzees')
plt.ylabel('Number of Hunts')
plt.title("Number of chimpanzees Vs Number of Hunts")
plt.legend()
plt.show()

OUTPUT:

Line Slope is : 5.4405

Line Intercept is: 31.6429
Analysis:
A positive correlation exists between the number of chimpanzees and the number of
hunts.
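
The hand-computed slope and intercept can be cross-checked with NumPy. This is a minimal sketch, assuming np.polyfit with degree 1 as the reference fit and np.corrcoef for the correlation coefficient that the analysis refers to. Neither call appears in the original solution.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([30, 45, 51, 57, 60, 65, 70, 71])

# A degree-1 polynomial fit returns the coefficients highest power first
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

print("Line Slope is : %.4f" % slope)
print("Line Intercept is: %.4f" % intercept)
print("Correlation coefficient r = %.4f" % r)
```

A value of r close to +1 supports the conclusion of a strong positive linear relationship.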
11. Compare the performance of two students in different subjects using a bar
graph. Also comment on the analysis.

Student CC DS ENG CN FE
Student1 85 87 80 84 96
Student2 65 64 55 54 60

import matplotlib.pyplot as plt
import numpy as np
Stud1=[85,87,80,84,96]
Stud2=[65,64,55,54,60]
# create plot
bar_width = 0.35
X = np.arange(5)
p1 = plt.bar(X, Stud1, bar_width, color='b',label='Student1')
# The bars of the second student start where the bars of the first end
p2 = plt.bar(X + bar_width, Stud2, bar_width,color='g',label='Student2')
plt.xlabel('Subject')
plt.ylabel('Scores')
plt.title('Student1 and Student2 Comparison')
# Tick labels follow the column order of the table: CC, DS, ENG, CN, FE
plt.xticks(X + (bar_width/2) , ("CC","DS","ENG","CN","FE"))
plt.legend()
plt.tight_layout()
plt.show()

OUTPUT:
Student 1 scores higher than Student 2 in every subject.

12. Draw a pie chart for the following data with the explode and shadow parameters.
cars  AUDI  BMW  FORD  TESLA  JAGUAR  MERCEDES
data  23    17   35    29     12      41

# Import libraries
from matplotlib import pyplot as plt
# Creating dataset
cars = ['AUDI', 'BMW', 'FORD','TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]
# Creating plot: explode gives the offset of each wedge from the centre
explode = [0.1, 0, 0.1, 0, 0,0.2]
plt.pie(data, labels = cars,autopct='%1.2f%%',
        explode=explode,shadow = True,startangle = 90,counterclock=False)
# show plot
plt.show()

OUTPUT:
13. Consider the following marks data of students and draw a color bar
for the percentage. Also analyze the data. Marks are out of 30; the passing
percentage is 40% and above.
marks= [30,28,22,18,15,5,0,19,22,23]
import matplotlib.pyplot as plt
rollno=["Amita","Roopa","Sonali","Santosh","Manali","Mohan","Pramveer",
        "Hema","Gita","Sohan"]
marks= [30,28,22,18,15,5,0,19,22,23]
# Convert each mark (out of 30) to a percentage rounded to 2 decimals
perls=[]
for i in marks:
    per="%.2f"%(i/30*100)
    perls.append(float(per))
plt.figure(figsize=(10, 5))
# Colour each point by its percentage
plt.scatter(x=rollno, y=marks, c=perls, cmap="cool")
plt.colorbar(label="Percentage", orientation="horizontal")
plt.title("Student Performance in DS Subject")
plt.xlabel("Students")
plt.ylabel("Marks")
plt.show()

OUTPUT:
Analysis:
Mohan and pramveer is failed because their percentage is between 0
to 20. Remaining 6 students Passed DS exam.
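
The pass/fail analysis can be checked in code rather than read off the colour bar. A minimal sketch under the stated rules (marks out of 30, pass at 40% and above); the constant and variable names are assumptions:

```python
names = ["Amita", "Roopa", "Sonali", "Santosh", "Manali",
         "Mohan", "Pramveer", "Hema", "Gita", "Sohan"]
marks = [30, 28, 22, 18, 15, 5, 0, 19, 22, 23]

MAX_MARKS = 30      # marks are out of 30
PASS_PERCENT = 40   # 40% and above is a pass

# Percentage for every student, rounded to 2 decimals as in the program above
percentages = {name: round(mark / MAX_MARKS * 100, 2)
               for name, mark in zip(names, marks)}

passed = [name for name, pct in percentages.items() if pct >= PASS_PERCENT]
failed = [name for name, pct in percentages.items() if pct < PASS_PERCENT]

print("Passed:", passed)
print("Failed:", failed)
```

The threshold of 40% corresponds to 12 marks out of 30, so only students with fewer than 12 marks fail.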

14. Draw a 2 by 2 subplot for the following data of student Deepali in four
different subjects. Comment on your analysis.

Test T1 T2 T3 T4 T5 T6
CC 20 23 25 26 28 30
DS 30 28 25 29 30 28
CN 29 28 25 22 21 19
ENG 15 19 20 15 22 23

import matplotlib.pyplot as plt

Test=['T1','T2','T3','T4','T5','T6']
CC=[20,23,25,26,28,30]
DS=[30,28,25,29,30,28]
CN=[29,28,25,22,21,19]
ENG=[15,19,20,15,22,23]
# One figure with a 2 x 2 grid of axes
fig, ax = plt.subplots(2,2,figsize=(10,6))
ax[0,0].plot(Test,CC,'r-.',label='CC')
ax[0,0].legend()
ax[0,1].plot(Test,DS,'g--',label='DS')
ax[0,1].legend()
ax[1,0].plot(Test,CN,'y.-.',label='CN')
ax[1,0].legend()
ax[1,1].plot(Test,ENG,'b--',label='ENG')
ax[1,1].legend()
ax[0, 0].set_title("Cloud Computing")
ax[0, 1].set_title("Data Science")
ax[1, 0].set_title("Computer Network")
ax[1, 1].set_title("English")
# set spacing
fig.tight_layout()
plt.show()

OUTPUT:
Analysis:
Cloud Computing scores rose steadily from T1 to T6 (20 to 30), whereas Computer Network scores fell (29 to 19).
15. Add text annotations for the following data.
Color red black green yellow blue
Likes 50 80 30 60 70

import matplotlib.pyplot as plt

color=['red','black','green','yellow','blue']
likes=[50,80,30,60,70]
f, ax = plt.subplots()
ax.bar(color,likes,color=color)
# Annotate each bar with its value: xy is the arrow tip, xytext the label position
ax.annotate('50', xy=(0.1, 50), xytext=(0.3, 51.5),
            arrowprops=dict(facecolor='cyan', shrink=0.05,connectionstyle="angle3"))
ax.annotate('80', xy=(1, 80), xytext=(1.2, 80.5),
            arrowprops=dict(facecolor='cyan', shrink=0.1))
ax.annotate('30', xy=(2, 30), xytext=(2.2, 30.5),
            arrowprops=dict(facecolor='cyan', shrink=0.1))
ax.annotate('60', xy=(3, 60), xytext=(3.2, 60.5),
            arrowprops=dict(facecolor='cyan', shrink=0.1))
ax.annotate('70', xy=(4, 70), xytext=(4.2, 70.5),
            arrowprops=dict(facecolor='cyan', shrink=0.1))
plt.title("Color Likes Count")
plt.xlabel("Colors")
plt.ylabel("Likes")
plt.show()
OUTPUT:
16. Draw a histogram comparison for the following data. Also comment on your
analysis.
# importing libraries
import matplotlib.pyplot as plt
# giving two age groups data
age_g1 = [1, 3, 5, 10, 15, 17, 18, 16, 19,
          21, 23, 28, 30, 31, 33, 38, 32,
          40, 45, 43, 49, 55, 53, 63, 66,
          85, 80, 57, 75, 93, 95]

age_g2 = [6, 4, 15, 17, 19, 21, 28, 23, 31,
          36, 39, 32, 50, 56, 59, 74, 79, 34,
          98, 97, 95, 67, 69, 92, 45, 55, 77,
          76, 85]
# plotting first histogram
plt.hist(age_g1, label='Age group1', bins=5, alpha=.7, edgecolor='red')
# plotting second histogram
plt.hist(age_g2, label="Age group2", bins=5, alpha=.7,
         edgecolor='yellow')
plt.legend()
# Showing the plot using plt.show()
plt.show()

OUTPUT:
Age group 1 has the most people in the youngest band (about 0-20 years), whereas
age group 2 is spread more evenly and has more people aged 60 and above than group 1.
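
With bins=5, each call to hist chooses its own bin edges from the range of its data, so the bars of the two groups do not cover exactly the same age bands. A minimal sketch of a like-for-like comparison follows, assuming fixed 20-year bands (the edges list is an assumption, not part of the original program) and using np.histogram to print the counts.

```python
import numpy as np

age_g1 = [1, 3, 5, 10, 15, 17, 18, 16, 19,
          21, 23, 28, 30, 31, 33, 38, 32,
          40, 45, 43, 49, 55, 53, 63, 66,
          85, 80, 57, 75, 93, 95]
age_g2 = [6, 4, 15, 17, 19, 21, 28, 23, 31,
          36, 39, 32, 50, 56, 59, 74, 79, 34,
          98, 97, 95, 67, 69, 92, 45, 55, 77,
          76, 85]

# Shared bin edges (assumed 20-year bands) so both groups use the same bands
edges = [0, 20, 40, 60, 80, 100]
counts_g1, _ = np.histogram(age_g1, bins=edges)
counts_g2, _ = np.histogram(age_g2, bins=edges)

for low, high, c1, c2 in zip(edges[:-1], edges[1:], counts_g1, counts_g2):
    print(f"{low:>3}-{high:<3} group1={c1:2d} group2={c2:2d}")
```

The same edges list can be passed as bins=edges to both plt.hist calls so that the plotted bars line up.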
17. Perform basic operations on a 2D ndarray (accessing, inserting, deleting, and
updating elements) and show additional functions of NumPy arrays.
import numpy as np
#Create 2D arrays
arr=np.array([[1,2,3],[4,5,6],[7,8,9]])
arr1=np.array([[10,11,12],[13,14,15],[16,17,18]])
#Print array
print("Array = ",arr)
#Display dimension of array
print("Dimension of array = ",arr.ndim)
#Display shape of array
print("Shape of array = ",arr.shape)
#Access element 5
print("Accessed Element= ",arr[1,1])
#Insert a new row at position 1 (axis=0 inserts rows)
arr=np.insert(arr,1,[9,4,7],axis=0)
print("After Insertion = ",arr)
#Modify 8 to 88
arr[3,1]=88
print("After Modification = ",arr)
#Delete the row at position 1
print(arr)
arr = np.delete(arr, 1, axis=0)
print("After Deletion = ",arr)
#Additional numpy array functions
print("Transpose of matrix= ",np.transpose(arr))
print("After Concatenation Columnwise of arr and arr1= ",
      np.concatenate((arr,arr1),axis=1))
print("After Vertical stack operation on arr and arr1= ",np.vstack((arr,arr1)))
print("After Horizontal stack operation on arr and arr1= ",np.hstack((arr,arr1)))

OUTPUT:
Array = [[1 2 3]
 [4 5 6]
 [7 8 9]]
Dimension of array = 2
Shape of array = (3, 3)
Accessed Element= 5
After Insertion = [[1 2 3]
 [9 4 7]
 [4 5 6]
 [7 8 9]]
After Modification = [[ 1 2 3]
 [ 9 4 7]
 [ 4 5 6]
 [ 7 88 9]]
[[ 1 2 3]
 [ 9 4 7]
 [ 4 5 6]
 [ 7 88 9]]
After Deletion = [[ 1 2 3]
 [ 4 5 6]
 [ 7 88 9]]
Transpose of matrix= [[ 1 4 7]
 [ 2 5 88]
 [ 3 6 9]]
After Concatenation Columnwise of arr and arr1= [[ 1 2 3 10 11 12]
 [ 4 5 6 13 14 15]
 [ 7 88 9 16 17 18]]
After Vertical stack operation on arr and arr1= [[ 1 2 3]
 [ 4 5 6]
 [ 7 88 9]
 [10 11 12]
 [13 14 15]
 [16 17 18]]
After Horizontal stack operation on arr and arr1= [[ 1 2 3 10 11 12]
 [ 4 5 6 13 14 15]
 [ 7 88 9 16 17 18]]

18. Perform basic operations on a 3D ndarray: accessing, inserting, deleting, and
updating elements.
import numpy as np
arr=np.array([[[1,2,3],[4,5,6]],
              [[7,8,9],[10,11,12]],
              [[13,14,15],[16,17,18]]])
arr1=np.array([[[19,20,21],[22,23,24]],
               [[25,26,27],[28,29,30]],
               [[31,32,33],[34,35,36]]])
#Print dimension and shape
print("Dimension= ",arr.ndim,"Shape = ",arr.shape)
#Access 5
print("Accessing Element 5 =",arr[0,1,1])
#Access 10,11,12
print("Accessing Element [10,11,12] =",arr[1,1,:])
#Insert new block [[19,20,21],[22,23,24]] at the end along axis 0
arr=np.insert(arr,3,[[19,20,21],[22,23,24]],axis=0)
print("After Insertion",arr)
#Modify 8 to 18
arr[1,0,1]=18
print("After Modifying 8 to 18 = ",arr)
#Delete the block at index 2 along axis 0
arr=np.delete(arr,2,axis=0)
print("After deleting 2 row = ",arr)
#Additional functions
print("Transpose of matrix= ",np.transpose(arr))
print("After Concatenation Columnwise of arr and arr1= ",
      np.concatenate((arr,arr1),axis=1))
print("After Vertical stack operation on arr and arr1= ",np.vstack((arr,arr1)))
print("After Horizontal stack operation on arr and arr1= ",np.hstack((arr,arr1)))

OUTPUT:
Dimension= 3 Shape = (3, 2, 3)
Accessing Element 5 = 5
Accessing Element [10,11,12] = [10 11 12]
After Insertion [[[ 1 2 3]
  [ 4 5 6]]

 [[ 7 8 9]
  [10 11 12]]

 [[13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]]]
After Modifying 8 to 18 = [[[ 1 2 3]
  [ 4 5 6]]

 [[ 7 18 9]
  [10 11 12]]

 [[13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]]]
After deleting 2 row = [[[ 1 2 3]
  [ 4 5 6]]

 [[ 7 18 9]
  [10 11 12]]

 [[19 20 21]
  [22 23 24]]]
Transpose of matrix= [[[ 1 7 19]
  [ 4 10 22]]

 [[ 2 18 20]
  [ 5 11 23]]

 [[ 3 9 21]
  [ 6 12 24]]]
After Concatenation Columnwise of arr and arr1= [[[ 1 2 3]
  [ 4 5 6]
  [19 20 21]
  [22 23 24]]

 [[ 7 18 9]
  [10 11 12]
  [25 26 27]
  [28 29 30]]

 [[19 20 21]
  [22 23 24]
  [31 32 33]
  [34 35 36]]]
After Vertical stack operation on arr and arr1= [[[ 1 2 3]
  [ 4 5 6]]

 [[ 7 18 9]
  [10 11 12]]

 [[19 20 21]
  [22 23 24]]

 [[19 20 21]
  [22 23 24]]

 [[25 26 27]
  [28 29 30]]

 [[31 32 33]
  [34 35 36]]]
After Horizontal stack operation on arr and arr1= [[[ 1 2 3]
  [ 4 5 6]
  [19 20 21]
  [22 23 24]]

 [[ 7 18 9]
  [10 11 12]
  [25 26 27]
  [28 29 30]]

 [[19 20 21]
  [22 23 24]
  [31 32 33]
  [34 35 36]]]

19. Create a dataframe in Python for IPL data and apply some basic operations
on the dataframe.

Team    MI    CSK   Devils  MI    CSK   RCB   CSK   CSK   KKR   KKR   KKR
Year    2014  2015  2014    2015  2014  2015  2016  2017  2016  2014  2015
Points  876   789   863     673   741   812   756   788   694   701   804

import pandas as pd
df=pd.DataFrame({"Team":["MI","CSK","Devils","MI","CSK","RCB","CSK",
                         "CSK","KKR","KKR","KKR"],
                 "Rank":[1,2,2,3,3,4,1,1,2,4,1],
                 "Year":[2014,2015,2014,2015,2014,2015,2016,2017,
                         2016,2014,2015],
                 "Points":[876,789,863,673,741,812,756,788,694,
                           701,804]},
                index=["R1","R2","R3","R4","R5","R6","R7","R8",
                       "R9","R10","R11"])
print("DataFrame = ")
print(df)
#Access rows 2,4,6,8 using labels (loc) and using positions (iloc)
print("After Accessing Rows 2,4,6,8 Using Labels = ")
print(df.loc[["R2","R4","R6","R8"]])
print("After Accessing Rows 2,4,6,8 Using Index = ")
print(df.iloc[[1,3,5,7]])
#Access top 3 rows and also bottom 3 rows
print("Top 3 Rows = ")
print(df.head(3))
print("Bottom 3 Rows= ")
print(df.tail(3))
#Access columns Team and Points
print("After Accessing 2 Columns Team and Points= ")
print(df[['Team','Points']])
#Access row 3 and columns 1,3,4 using positions (zero-based)
print("After Accessing row 3 and Columns 1,3,4 using index= ")
print(df.iloc[2,[0,2,3]])
#Access row 3 and columns 1,3,4 using labels
print("After Accessing row 3 and Columns 1,3,4 using labels= ")
print(df.loc["R3",['Team','Year','Points']])
#Update last record with values 'RCB',3,2016,800
df.iloc[10]=['RCB',3,2016,800]
print("After Updating Last Row = ")
print(df)
#Insert new record in dataframe; the new row gets the integer label 11
df.loc[len(df.index)] = ['MI',2,2017,800]
print("After Inserting Last Row= ")
print(df)
#Delete the row labelled 11 from dataframe
df=df.drop([11])
print("After Deleting Last Row = ")
print(df)

OUTPUT:
DataFrame =
Team Rank Year Points
R1 MI 1 2014 876
R2 CSK 2 2015 789
R3 Devils 2 2014 863
R4 MI 3 2015 673
R5 CSK 3 2014 741
R6 RCB 4 2015 812
R7 CSK 1 2016 756
R8 CSK 1 2017 788
R9 KKR 2 2016 694
R10 KKR 4 2014 701
R11 KKR 1 2015 804
After Accessing Rows 2,4,6,8 Using Labels =
Team Rank Year Points
R2 CSK 2 2015 789
R4 MI 3 2015 673
R6 RCB 4 2015 812
R8 CSK 1 2017 788
After Accessing Rows 2,4,6,8 Using Index =
Team Rank Year Points
R2 CSK 2 2015 789
R4 MI 3 2015 673
R6 RCB 4 2015 812
R8 CSK 1 2017 788
Top 3 Rows =
Team Rank Year Points
R1 MI 1 2014 876
R2 CSK 2 2015 789
R3 Devils 2 2014 863
Bottom 3 Rows=
Team Rank Year Points
R9 KKR 2 2016 694
R10 KKR 4 2014 701
R11 KKR 1 2015 804
After Accessing 2 Columns Team and Points=
Team Points
R1 MI 876
R2 CSK 789
R3 Devils 863
R4 MI 673
R5 CSK 741
R6 RCB 812
R7 CSK 756
R8 CSK 788
R9 KKR 694
R10 KKR 701
R11 KKR 804
After Accessing row 3 and Columns 1,3,4 using index=
Team Devils
Year 2014
Points 863
Name: R3, dtype: object
After Accessing row 3 and Columns 1,3,4 using labels=
Team Devils
Year 2014
Points 863
Name: R3, dtype: object
After Updating Last Row =
Team Rank Year Points
R1 MI 1 2014 876
R2 CSK 2 2015 789
R3 Devils 2 2014 863
R4 MI 3 2015 673
R5 CSK 3 2014 741
R6 RCB 4 2015 812
R7 CSK 1 2016 756
R8 CSK 1 2017 788
R9 KKR 2 2016 694
R10 KKR 4 2014 701
R11 RCB 3 2016 800
After Inserting Last Row=
Team Rank Year Points
R1 MI 1 2014 876
R2 CSK 2 2015 789
R3 Devils 2 2014 863
R4 MI 3 2015 673
R5 CSK 3 2014 741
R6 RCB 4 2015 812
R7 CSK 1 2016 756
R8 CSK 1 2017 788
R9 KKR 2 2016 694
R10 KKR 4 2014 701
R11 RCB 3 2016 800
11 MI 2 2017 800
After Deleting Last Row =
Team Rank Year Points
R1 MI 1 2014 876
R2 CSK 2 2015 789
R3 Devils 2 2014 863
R4 MI 3 2015 673
R5 CSK 3 2014 741
R6 RCB 4 2015 812
R7 CSK 1 2016 756
R8 CSK 1 2017 788
R9 KKR 2 2016 694
R10 KKR 4 2014 701
R11 RCB 3 2016 800
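
In the output above, the inserted row is labelled 11 (an integer) while every other row has a string label such as R11, because len(df.index) is used as the new label. A hedged sketch of one way to keep the labels uniform is shown below on the first three rows of the table. The name new_label and the R-number convention for new rows are assumptions, not part of the original program.

```python
import pandas as pd

# First three rows of the IPL table, enough to show the idea
df = pd.DataFrame({"Team": ["MI", "CSK", "Devils"],
                   "Rank": [1, 2, 2],
                   "Year": [2014, 2015, 2014],
                   "Points": [876, 789, 863]},
                  index=["R1", "R2", "R3"])

# Build the next label in the same "R<number>" style as the existing ones
new_label = f"R{len(df.index) + 1}"
df.loc[new_label] = ["MI", 2, 2017, 800]
labels_after_insert = df.index.tolist()
print(df)

# Delete the same row again by its label
df = df.drop(index=new_label)
print(df)
```

With uniform string labels, label-based selections such as df.loc[["R2", "R4"]] keep working for every row, including the newly added one.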
20. Write a program for pandas string functions.
import pandas as pd
import numpy as np
s = pd.Series(['Tom', 'William Rick', 'John',
               'Alber@t', np.nan, '1234','SteveSmith'])
print("Series=")
print(s)
print("Series in lowercase=")
print(s.str.lower())
print("Series in uppercase=")
print(s.str.upper())
s = pd.Series(['Tom ', ' William Rick', 'John',
               'Alber@t'])
print("new series =")
print(s)
print("After Stripping:")
print(s.str.strip())
# join all values of the (unstripped) series with '_' as separator
print(s.str.cat(sep='_'))
time_sentences = ["Monday: The doctor's appointment is at 2:45 pm.",
                  "Tuesday: The dentist's appointment is at 11:30 am.",
                  "Wednesday: At 7:00 pm, there is a basketball game!",
                  "Thursday: Be back home by 11:15 pm at the latest.",
                  "Friday: Take the train at 08:10 am, arrive at 09:00am."]

df = pd.DataFrame(time_sentences, columns=['text'])
print(df)
# find which entries contain the word 'appointment'
print("find which entries contain the word 'appointment")
print(df[df['text'].str.contains('appointment')])
# extract every time-like substring; \d matches a single digit, so a
# two-digit hour such as 11:30 is captured as 1:30
print("extract the entire time, the hours, the minutes, and the period")
print(df['text'].str.extractall(r'(?P<time>\d:\d{1,2})'))

OUTPUT:
C:\Users\ADMIN\PycharmProjects\pythonProject\venv\Scripts\[Link] C:\
Users\ADMIN\PycharmProjects\pythonProject\[Link]
Series=
0 Tom
1 William Rick
2 John
3 Alber@t
4 NaN
5 1234
6 SteveSmith
dtype: object
Series in lowercase=
0 tom
1 william rick
2 john
3 alber@t
4 NaN
5 1234
6 stevesmith
dtype: object
Series in uppercase=
0 TOM
1 WILLIAM RICK
2 JOHN
3 ALBER@T
4 NaN
5 1234
6 STEVESMITH
dtype: object
new series =
0 Tom
1 William Rick
2 John
3 Alber@t
dtype: object
After Stripping:
0 Tom
1 William Rick
2 John
3 Alber@t
dtype: object
Tom _ William Rick_John_Alber@t
text
0 Monday: The doctor's appointment is at 2:45 pm.
1 Tuesday: The dentist's appointment is at 11:30...
2 Wednesday: At 7:00 pm, there is a basketball g...
3 Thursday: Be back home by 11:15 pm at the latest.
4 Friday: Take the train at 08:10 am, arrive at ...
find which entries contain the word 'appointment
text
0 Monday: The doctor's appointment is at 2:45 pm.
1 Tuesday: The dentist's appointment is at 11:30...
extract the entire time, the hours, the minutes, and the period
         time
  match
0 0      2:45
1 0      1:30
2 0      7:00
3 0      1:15
4 0      8:10
  1      9:00

Process finished with exit code 0
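
The pattern \d:\d{1,2} used in this program matches only one digit before the colon, which is why the output shows 1:30 and 1:15 for 11:30 and 11:15, and it does not capture the hours, minutes, or period separately. A minimal sketch of a pattern that does follows. The group names hour, minute, and period are assumptions, and only three of the sentences are repeated here to keep the example short.

```python
import pandas as pd

text = pd.Series([
    "Monday: The doctor's appointment is at 2:45 pm.",
    "Tuesday: The dentist's appointment is at 11:30 am.",
    "Friday: Take the train at 08:10 am, arrive at 09:00am.",
])

# time   = full hh:mm match, with a one- or two-digit hour
# hour / minute = the two parts of the time
# period = am or pm, with an optional space before it
pattern = r"(?P<time>(?P<hour>\d{1,2}):(?P<minute>\d{2}))\s?(?P<period>[ap]m)"

# extractall returns one row per match, indexed by (row, match number)
times = text.str.extractall(pattern)
print(times)
```

Each named group becomes a column of the result, so the hours, minutes, and period can be used directly, for example times["hour"].astype(int).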
