EX.NO: 1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS
| DATE: S]i0/o%
AIM:
In this experiment, we learn how to download and install Python packages and explore their features.
OBJECTIVES:
* To understand the installation of Python open source packages.
* To understand the features of different Python packages.
PROCEDURE:
* Download and install the Anaconda software. (Anaconda is a distribution of the Python programming language. It offers the easiest way to install and manage thousands of Python open source packages and libraries.)
* Using the Anaconda software, we download and install Jupyter Notebook and Python open source packages.
a. After Anaconda installation, open Anaconda Navigator from the Start menu.
b. In Anaconda Navigator, select Environments in the left-hand pane below Home.
c. Just to the right of where you selected, below the "search environments" bar, you should see "base (root)". Click on it.
d. A triangle pointing right should appear; click on it and select "Open Terminal".
e. In the Anaconda Terminal, use the commands below to install Jupyter Notebook and the packages.
Download and install packages using Terminal Commands:
a. Activate Anaconda. This activates your conda environment.
> conda activate
b. Install Jupyter Notebook.
> pip install notebook
c. List all of the installed packages in the active conda environment.
> pip list
d. Install packages.
> pip install <package_name>
Example:
> pip install numpy
> pip install pandas
> pip install scipy
> pip install statsmodels
e. Verify package installation and find the version of an installed package.
> pip show <package_name>
f. Upgrade a package to its latest version.
> pip install --upgrade <package_name>
g. Uninstall packages.
> pip uninstall <package_name>
h. Deactivate the conda environment.
> conda deactivate
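As a cross-check of steps c and e, installed versions can also be read from Python itself. This is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); the package list and the helper name `installed_versions` are illustrative assumptions, not part of the original procedure.

```python
# Minimal sketch: report the installed version of each package (or None if missing).
from importlib.metadata import version, PackageNotFoundError

# Illustrative package list, matching the pip examples above.
PACKAGES = ["numpy", "pandas", "scipy", "statsmodels", "notebook"]


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver else 'not installed'}")
```

This gives the same information as `pip show`, but from inside a notebook cell.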
FEATURES OF PACKAGES:
A. NumPy (Numerical Python):
* NumPy (Numerical Python) is an open source Python library that is used in almost every field of science and engineering.
* It is the core library for numerical and scientific computing.
* The NumPy library contains multidimensional array and matrix data structures. It provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on it.
* NumPy can be used to perform a wide variety of mathematical operations on arrays.
* It adds powerful data structures to Python that enable efficient calculations with arrays and matrices, and it supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices.
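To make "homogeneous" and "efficient operations" concrete, the short sketch below (with made-up price and quantity values) shows that an ndarray holds a single dtype and that arithmetic is applied element-wise without an explicit Python loop.

```python
import numpy as np

# An ndarray is homogeneous: every element shares one dtype.
mixed = np.array([1, 2.5, 3])      # the ints are upcast to float64
print(mixed.dtype)                  # float64

# Operations are vectorized: applied element-wise without a Python loop.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])
totals = prices * quantities        # element-wise multiplication
print(totals)                       # [30. 20. 60.]
print(totals.sum())                 # 110.0
```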
B. SciPy (Scientific Python):
* SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python.
* It provides many user-friendly and efficient numerical routines, for example for numerical integration and optimization.
* It allows users to manipulate and visualize data using a wide range of high-level Python commands.
* The SciPy library supports integration, gradient optimization, special functions, ordinary differential equation solvers, parallel programming tools, and much more.
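A minimal sketch of the integration and optimization features mentioned above, using `scipy.integrate.quad` and `scipy.optimize.minimize_scalar`. The two test functions are illustrative choices with known exact answers, so the numerical results can be checked by hand.

```python
from scipy import integrate, optimize

# Numerical integration: integral of x**2 from 0 to 1 (exact value 1/3).
area, abs_err = integrate.quad(lambda x: x ** 2, 0, 1)
print(area, abs_err)       # value and an estimate of the absolute error

# Scalar optimization: minimum of (x - 2)**2 + 1, exact minimizer x = 2.
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print(res.x, res.fun)      # approximately 2.0 and 1.0
```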
C. Jupyter Notebook:
* Jupyter Notebook is an open-source, web-based interactive environment.
* It allows you to create and share documents that contain live code, mathematical equations, graphics, maps, plots, visualizations, and narrative text.
* It integrates with many programming languages, such as Python, PHP, R, C#, etc., through language kernels.
* Jupyter Notebook allows users to convert notebooks into other formats such as HTML and PDF.
* Jupyter notebooks are platform-independent because they are stored in JSON (JavaScript Object Notation) format, which is a language-independent, text-based file format.
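To illustrate the JSON point, the sketch below writes a tiny notebook by hand with the standard `json` module and reads it back. The structure is a simplified version of the nbformat 4 layout; notebooks saved by Jupyter itself carry more metadata, so treat the exact field set as an assumption. The file name `demo.ipynb` is hypothetical.

```python
import json

# Simplified nbformat-4 style notebook: top-level metadata plus a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo notebook"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Because the file is plain JSON text, any language or platform can read it.
with open("demo.ipynb", "w", encoding="utf-8") as fh:
    json.dump(notebook, fh, indent=1)

with open("demo.ipynb", encoding="utf-8") as fh:
    loaded = json.load(fh)

print([cell["cell_type"] for cell in loaded["cells"]])  # ['markdown', 'code']
```

A file like this should then be convertible with `jupyter nbconvert --to html demo.ipynb`.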
D. Statsmodels:
* Statsmodels is a Python library built specifically for statistics.
* Statsmodels is built on top of NumPy, SciPy, and matplotlib.
* It provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration.
E. Pandas:
* Pandas is an open-source Python library providing high-performance data manipulation and analysis tools using its powerful data structures.
* Fast and efficient DataFrame object with default and customized indexing.
* Tools for loading data into in-memory data objects from different file formats.
* Data alignment and integrated handling of missing data.
* Reshaping and pivoting of data sets.
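Data alignment and missing-data handling are easiest to see with two small Series that have custom, partly overlapping labels (hypothetical values). Arithmetic matches labels rather than positions, and labels present in only one Series become NaN.

```python
import numpy as np
import pandas as pd

# Custom index labels.
s1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
s2 = pd.Series([1, 2, 3], index=["b", "c", "d"])

# Data alignment: values are added label by label, not position by position.
total = s1 + s2
print(total)                   # a NaN, b 21.0, c 32.0, d NaN

# Integrated missing-data handling.
print(total.isna().sum())      # 2
print(total.fillna(0))
```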
Other important Python packages for data science:
1. Plotly: Plotly is a well-known Python data visualization package. It provides interactive graphs that allow us to explore the relationships between variables.
2. Matplotlib: Matplotlib is one of the most widely used Python data visualization packages.
3. BeautifulSoup: This is a popular Python library most commonly known for web crawling and data scraping; it parses HTML and XML documents so that data can be extracted from them.
4. Scrapy: Scrapy is one of the most popular, fast, open-source web crawling frameworks
written in Python.
5. Scikit-learn: Scikit-learn is a machine learning library that provides most of the classical machine learning algorithms you might need.
6. Keras: Keras is a neural network library in Python. It aims to enable fast experimentation with deep learning networks, while being designed to be compact, modular, and extensible.
7. TensorFlow: TensorFlow is a software library (framework) that aims to make machine learning and deep learning concepts as simple as possible.
RESULT:
The Python packages are installed and their features are studied successfully.

EX.NO: 2 WORKING WITH NUMPY ARRAYS
DATE: @}10)23
AIM:
In this exercise, we learn about various functions of the NumPy package to perform mathematical and logical operations on NumPy arrays.
OBJECTIVES:
* To understand the multidimensional array and matrix data structures.
* To understand how to perform advanced operations on multidimensional arrays.
* To understand how to apply statistical operations to n-dimensional arrays.
* To understand axis and shape properties of n-dimensional arrays.
PROCEDURE:
* Open Jupyter Notebook.
* Create new Notebook.
* Import numpy python library.
* Start using different numpy functions.
PROGRAM:
1. Import the numpy package.
> import numpy as np
print(np.__version__)
OUTPUT
1.23.5
2. Functions for creating a NumPy array.
np.array(), np.zeros(), np.ones(), np.empty(), np.arange(), np.linspace(), np.full(), np.eye(), np.repeat()
> a=np.array([[1,2],[3,4],[5,6]])
b=np.zeros((2,4),dtype=int)
c=np.ones((2,4))
d=np.empty((2,4))   # uninitialized memory: the contents are arbitrary
e=np.arange(2,20,2)
f=np.linspace(1,20,num=5)
g=np.full((2,2),3)
h=np.eye(3,3)
i=np.repeat(a,2)    # flattens a, then repeats each element twice
print(a,b,c,d,e,f,g,h,i,sep="\n")
OUTPUT
[[1 2]
 [3 4]
 [5 6]]
[[0 0 0 0]
 [0 0 0 0]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[ 2  4  6  8 10 12 14 16 18]
[ 1.    5.75 10.5  15.25 20.  ]
[[3 3]
 [3 3]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[1 1 2 2 3 3 4 4 5 5 6 6]
3. NumPy Array Indexing & Slicing.
> arr=np.array([11,12,13,14,15])
print(arr)
print("Ele 1 : ",arr[0])
print("Ele 3 : ",arr[2])
print("Ele 1 to 3 : ",arr[0:3])
print("All elements : ",arr[:])
print("All elements except first 2: ",arr[2:])
> arr=np.array([[1,2,3,4,5],[6,7,8,9,10]])
print("Array : ",arr)
print("Dim : ",arr.ndim)
print("Ele (1,1) : ",arr[1][1])
print("Ele (0,0) : ",arr[0][0])
print("Ele (1,3) : ",arr[1][3])
print("First three Ele in first row : ",arr[0][:3])
print("Ele in second row : ",arr[1][:])
print("All the elements in the matrix : \n",arr[:][:])
> arr=np.array([20,21,22,23,24])
print("Last Ele : ",arr[-1])
print("Last three element : ",arr[-3:])
OUTPUT
[11 12 13 14 15]
Ele 1 :  11
Ele 3 :  13
Ele 1 to 3 :  [11 12 13]
All elements :  [11 12 13 14 15]
All elements except first 2:  [13 14 15]
Array :  [[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
Dim :  2
Ele (1,1) :  7
Ele (0,0) :  1
Ele (1,3) :  9
First three Ele in first row :  [1 2 3]
Ele in second row :  [ 6  7  8  9 10]
All the elements in the matrix : 
 [[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
4. Attributes of the NumPy array.
> arr=np.array([[1,2,3],[4,5,6],[7,8,9]])
print("Dimension of the ndarray:",arr.ndim)
print("Size of the ndarray in each dimension:",arr.shape)
print("Total number of elements in the ndarray:",arr.size)
print("The data type of the elements of a NumPy array:",arr.dtype)
print("Size (in bytes) of each element of an ndarray:",arr.itemsize)
OUTPUT
Dimension of the ndarray: 2
Size of the ndarray in each dimension: (3, 3)
Total number of elements in the ndarray: 9
The data type of the elements of a NumPy array: int64
Size (in bytes) of each element of an ndarray: 8
5. Reshaping array.
np.reshape(), ndarray.flatten()
> arr=np.array([1,2,3,4,5,6,7,8,9,10,11,12])
newarr=arr.reshape(2,2,3)
print(newarr)
> a=np.array([[1,2],[3,4]])
arr=a.flatten()
print(arr)
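The reshaping calls above can be explored a little further. This sketch, which goes beyond the original program, shows that `reshape` can infer one dimension from `-1`, that the total element count must match, and that `flatten` always returns a copy while `ravel` returns a view when it can.

```python
import numpy as np

arr = np.arange(1, 13)              # [1 .. 12], 12 elements
print(arr.reshape(3, -1).shape)     # (3, 4): -1 is inferred as 12 / 3

try:
    arr.reshape(5, 3)               # 15 != 12 elements, so this fails
except ValueError as err:
    print("reshape failed:", err)

m = arr.reshape(3, 4)               # m is itself a view of arr
flat_copy = m.flatten()             # always a new, independent array
flat_view = m.ravel()               # a view of m, since m is contiguous
m[0, 0] = 100
print(flat_copy[0], flat_view[0])   # 1 100
```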
6. Joining, Sorting, Splitting Array.
np.concatenate(), np.sort(), np.array_split()
> arr=np.array([3,5,89,34,6,5,34,6])
print("Sorting the array:",np.sort(arr))
print("Splitting the array:",np.array_split(arr,2))
arr1=np.array([[1,2,3],[7,8,9]])
arr2=np.array([[4,5,6],[10,11,12]])
print("Joining two arrays:",np.concatenate((arr1,arr2)))
OUTPUT
Sorting the array: [ 3  5  5  6  6 34 34 89]
Splitting the array: [array([ 3,  5, 89, 34]), array([ 6,  5, 34,  6])]
Joining two arrays: [[ 1  2  3]
 [ 7  8  9]
 [ 4  5  6]
 [10 11 12]]
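`np.concatenate` joins along axis 0 by default, which is why the rows of `arr2` appear below those of `arr1`. The sketch below contrasts axis 0 and axis 1 for the same arrays, and shows that `np.array_split` also accepts sizes that do not divide evenly (the `np.arange(7)` input is an illustrative choice).

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])

rows = np.concatenate((arr1, arr2), axis=0)   # stacked vertically: shape (4, 3)
cols = np.concatenate((arr1, arr2), axis=1)   # side by side:       shape (2, 6)
print(rows.shape, cols.shape)

# array_split tolerates uneven sizes; np.split would raise an error here.
parts = np.array_split(np.arange(7), 3)
print([p.tolist() for p in parts])            # [[0, 1, 2], [3, 4], [5, 6]]
```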
7. Basic mathematical operations on arrays.
> a=np.array([7,3,4,5,1])
b=np.array([3,4,5,6,7])
print("Addition:",np.add(a,b))
print("Multiplication:",np.multiply(a,b))
print("Subtraction:",np.subtract(a,b))
print("Power:",np.power(a,b))
print("Division:",np.divide(a,b))
print("Modulo Division:",np.remainder(a,b))
OUTPUT
Addition: [10  7  9 11  8]
Multiplication: [21 12 20 30  7]
Subtraction: [ 4 -1 -1 -1 -6]
Power: [  343    81  1024 15625     1]
Division: [2.33333333 0.75       0.8        0.83333333 0.14285714]
Modulo Division: [1 3 4 5 1]
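The functions above have operator shortcuts that call the same element-wise routines. NumPy also broadcasts arrays of compatible shapes. This sketch reuses `a` and `b`; the column array `col` is an illustrative addition.

```python
import numpy as np

a = np.array([7, 3, 4, 5, 1])
b = np.array([3, 4, 5, 6, 7])

# The arithmetic operators give the same results as the named functions.
print(np.array_equal(a + b, np.add(a, b)))          # True
print(np.array_equal(a % b, np.remainder(a, b)))    # True

# Broadcasting: a scalar is stretched to match the array's shape.
print(a * 10)                     # [70 30 40 50 10]

# A (3, 1) column and a (5,) row broadcast to a (3, 5) table.
col = np.array([[1], [2], [3]])
print((col * a).shape)            # (3, 5)
```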
8. Creating array from existing data.
np.asarray(), np.copy(), ndarray.view()
> a=[1,2,3,4,5,6,7,8,9]
arr=np.asarray(a)
print(arr)
> a=np.array([1,2,3,4,5,6])
b=a.copy()
a[4]=90
print("Original:",a)
print("Copy:",b)
> a=np.array([1,2,3,4,5,6])
b=a.view()
print("Original:",a)
print("View:",b)
OUTPUT
[1 2 3 4 5 6 7 8 9]
Original: [ 1  2  3  4 90  6]
Copy: [1 2 3 4 5 6]
Original: [1 2 3 4 5 6]
View: [1 2 3 4 5 6]
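The program above does not modify the array after calling `view()`, so the copy and the view look alike. This sketch changes the original after taking both: the copy keeps the old values, while the view reflects the change because it shares the same data buffer.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
c = a.copy()     # independent data buffer
v = a.view()     # new array object sharing a's data buffer

a[4] = 90
print("Copy:", c)   # Copy: [1 2 3 4 5 6]
print("View:", v)   # View: [ 1  2  3  4 90  6]

print(np.shares_memory(a, v), np.shares_memory(a, c))   # True False
```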
9. More useful statistical array operations.
> a=np.array([21,22,34,45,56,67,31,78])
print("Sum:",np.sum(a))
print("Minimum:",np.min(a))
print("Maximum:",np.max(a))
print("Mean:",np.mean(a))
print("Standard Deviation:",np.std(a))
print("Variance:",np.var(a))
print("Exponent:",np.exp(a))
print("Square Root:",np.sqrt(a))
print("Percentile:",np.percentile(a,25))
OUTPUT
Sum: 354
Minimum: 21
Maximum: 78
Mean: 44.25
Standard Deviation: 19.721498421773127
Variance: 388.9375
Exponent: [1.31881573e+09 3.58491285e+09 5.83461743e+14 3.49342711e+19
 2.09165950e+24 1.25236317e+29 2.90488497e+13 7.49841700e+33]
Square Root: [4.58257569 4.69041576 5.83095189 6.70820393 7.48331477 8.18535277
 5.56776436 8.83176087]
Percentile: 28.75
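The objectives mention the axis property. The same statistics can be computed per column (`axis=0`) or per row (`axis=1`) of a 2-D array. This sketch arranges the same eight values as a 2 x 4 matrix (an illustrative choice) and also contrasts the population standard deviation used above (`ddof=0`, NumPy's default) with the sample version (`ddof=1`).

```python
import numpy as np

m = np.array([[21, 22, 34, 45],
              [56, 67, 31, 78]])

print(np.sum(m))            # 354: all elements
print(np.sum(m, axis=0))    # [ 77  89  65 123]: down each column
print(np.mean(m, axis=1))   # [30.5 58. ]: across each row

a = m.ravel()
print(np.std(a))            # population std (ddof=0), as in the program above
print(np.std(a, ddof=1))    # sample std: divides by n - 1 instead of n
```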
10. Iterating array.
> a=np.array([3,4,6,7,78,8,98,9,9])
for x in np.nditer(a):
    print(x)
OUTPUT
3
4
6
7
78
8
98
9
9
11. Save and load a NumPy array.
> a=np.array([[1,2,3,4],[3,4,5,6]])
np.savetxt('one.txt',a,delimiter=',')
np.loadtxt('one.txt',delimiter=',')
OUTPUT
array([[1., 2., 3., 4.],
       [3., 4., 5., 6.]])
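The output above comes back as floats because `savetxt` writes numbers in floating-point text form by default. As a sketch of an alternative, the binary `.npy` format written by `np.save` preserves dtype and shape exactly. A temporary directory is used here so the example leaves no files behind; the file names are illustrative.

```python
import os
import tempfile

import numpy as np

a = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp:
    # Text format: human readable, but integers come back as floats by default.
    txt_path = os.path.join(tmp, "one.txt")
    np.savetxt(txt_path, a, delimiter=",")
    from_txt = np.loadtxt(txt_path, delimiter=",")

    # Binary .npy format: preserves dtype and shape exactly.
    npy_path = os.path.join(tmp, "one.npy")
    np.save(npy_path, a)
    from_npy = np.load(npy_path)

print(from_txt.dtype, from_npy.dtype)
```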
12. Get array unique items.
> a=np.array([11,11,12,13,14,15,16,17,12,13,11,14,18,19,20])
unique_values=np.unique(a)
print(unique_values)
OUTPUT
[11 12 13 14 15 16 17 18 19 20]
13. Reverse an array.
> arr=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
reversed_arr=np.flip(arr)   # with no axis given, flips along every axis
print(reversed_arr)
OUTPUT
[[12 11 10  9]
 [ 8  7  6  5]
 [ 4  3  2  1]]
14. Random number generation.
> a=np.random.randint(1,7,size=10)
print(a)
OUTPUT (random; differs on every run)
[5 2 1 3 5 2 1 6 6 5]
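Because the output above changes on every run, it cannot be reproduced exactly. A minimal sketch using NumPy's Generator API (`np.random.default_rng`) with a fixed seed, an illustrative value of 42, makes the draws repeatable.

```python
import numpy as np

# Generator API: the same seed produces the same sequence of draws.
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

rolls1 = rng1.integers(1, 7, size=10)   # like randint(1, 7): values 1..6
rolls2 = rng2.integers(1, 7, size=10)

print(rolls1)
print(np.array_equal(rolls1, rolls2))   # True: same seed, same sequence
```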
RESULT:
In this exercise, the NumPy package functions are studied and executed successfully.

EX.NO: 3 WORKING WITH PANDAS DATA FRAMES
| DATE: 4] 10)23
AIM:
The aim of this exercise is to acquire the knowledge of pandas package for data
manipulation and analysis.
OBJECTIVE:
* To understand DataFrame object creation for data manipulation with integrated indexing.
* To understand data alignment and integrated handling of missing data.
* To understand reshaping and pivoting of data sets.
* To understand data set merging and joining.
* To understand data filtration.
* To understand the group-by engine, which allows split-apply-combine operations on data sets.
* To understand column and row insertion and deletion in data structures.
* To understand how to convert NumPy data structures into DataFrame objects and DataFrame objects into NumPy data structures.
* To understand reading and writing data between in-memory data structures and different file formats.
PROCEDURE:
* Open Jupyter Notebook.
* Create new Notebook.
* Import pandas python library.
* Start using different pandas functions.
PROGRAM:
1. Import the package.
> import pandas as pd
print(pd.__version__)
OUTPUT
1.5.3
2. Object Creation (DataFrame and Series).
> values=[91,7,2,10,14,15]
myseries=pd.Series(values,index=['a','b','c','d','e','f'])
print(myseries)
> stu_personal=pd.DataFrame({'Rollno':[1001,1002,1003,1004,1005],'Name':['A','B','C','D','E'],'Address':['salem','erode','covai','chennai','namakkal']})
stu_personal
> college=pd.Series(['GCT','GCE','GCT','GCE','GCT'])
mark=pd.Series([456,345,399,421,367])
rollno=[1001,1002,1003,1004,1005]
per=pd.Series([91.2,69,79.8,84.2,73.4])
stu_college=pd.DataFrame({'Rollno':rollno,'College':college,'Mark':mark,'Percentage':per})
stu_college
> stu_fees=pd.DataFrame({'Rollno':[1001,1002,1003,1004,1005],'Fees':[25000,3000,15000,35000,17500]})
stu_fees
OUTPUT:
a    91
b     7
c     2
d    10
e    14
f    15
dtype: int64
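3. Add or delete columns and rows.
> new_row=pd.DataFrame([{'Rollno':1006,'Name':'F','Address':...}])   # the address of roll no. 1006 is not legible in the original record
stu_personal=pd.concat([stu_personal,new_row],ignore_index=True)   # DataFrame.append was removed in pandas 2.0
stu_personal
> stu_personal['Age']=[21,20,22,21,20,22]
stu_personal
> stu_personal.drop(['Age'],axis=1)
> stu_personal.drop(1)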
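One point worth noting about `drop`: it returns a new DataFrame and leaves the original unchanged unless the result is assigned back. The sketch below uses a small hypothetical frame to show this, and adds a row with `pd.concat`, which works in both pandas 1.x and 2.x.

```python
import pandas as pd

df = pd.DataFrame({"Rollno": [1001, 1002], "Name": ["A", "B"]})

# Add a row by concatenating a one-row DataFrame.
df = pd.concat([df, pd.DataFrame([{"Rollno": 1003, "Name": "C"}])], ignore_index=True)

# Add a column.
df["Age"] = [21, 20, 22]

dropped = df.drop(columns=["Age"])   # returns a new DataFrame
print(list(df.columns))              # ['Rollno', 'Name', 'Age']: original unchanged
print(list(dropped.columns))         # ['Rollno', 'Name']

df = df.drop(index=1)                # assign back to make the change stick
print(df["Rollno"].tolist())         # [1001, 1003]
```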
4. Viewing data.
> stu_personal.head()
> stu_college.tail(3)
> stu_personal.index
> stu_college.columns
> stu_personal.to_numpy()
> stu_college.describe()
> stu_personal.sort_index(axis=1, ascending=False)
> stu_college.sort_values(by="Mark")
> stu_college.info()
OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Rollno      5 non-null      int64  
 1   College     5 non-null      object 
 2   Mark        5 non-null      int64  
 3   Percentage  5 non-null      float64
dtypes: float64(1), int64(2), object(1)
memory usage: 288.0+ bytes
   Rollno College  Mark  Percentage
0    1001     GCT   456        91.2
1    1002     GCE   345        69.0
2    1003     GCT   399        79.8
3    1004     GCE   421        84.2
4    1005     GCT   367        73.4
5. Selection.
a. Selection by label.
> stu_college[["Mark"]]
> stu_personal[0:3]
> stu_personal.loc[:, ["Name", "Address"]]
OUTPUT:
b. Selection by position.
> stu_personal.iloc[3]
> stu_personal.iloc[3:5, 0:2]
> stu_personal.iloc[[1, 2, 4], [0, 2]]
> stu_personal.iloc[1:3, :]
OUTPUT:
6. Operations.
> stu_fees['Fees'].mean()
> stu_fees['Fees'].sum()
> stu_fees['Fees'].max()
> stu_fees['Fees'].min()
OUTPUT:
3000
7. Apply.
> new=stu_college['Mark'].apply(lambda num: num + 5)
new
> stu_personal.applymap(lambda x: len(str(x)))   # in pandas 2.1+, DataFrame.map replaces applymap
OUTPUT:
8. Merge two datasets.
> # drop the second Rollno column so that merge() below has a unique key column
stu_per_col=pd.concat([stu_personal,stu_college.drop(columns='Rollno')],axis=1)
stu_per_col
> stu_all=pd.merge(stu_per_col,stu_fees,on="Rollno")
stu_all
OUTPUT:
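`pd.merge` performs an inner join by default, so only roll numbers present in both frames survive. The sketch below uses small hypothetical frames to contrast this with `how="outer"`; `indicator=True` adds a `_merge` column recording which side each row came from.

```python
import pandas as pd

left = pd.DataFrame({"Rollno": [1001, 1002, 1006], "Name": ["A", "B", "F"]})
right = pd.DataFrame({"Rollno": [1001, 1002, 1003], "Fees": [25000, 3000, 15000]})

inner = pd.merge(left, right, on="Rollno")    # default how="inner"
outer = pd.merge(left, right, on="Rollno", how="outer", indicator=True)

print(inner["Rollno"].tolist())     # [1001, 1002]: keys present in both
print(outer[["Rollno", "_merge"]])  # both / left_only / right_only per row
```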
9. Grouping.
> stu_all.groupby(['Address'])[['Fees']].sum()
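The group-by engine follows a split-apply-combine pattern: rows are split by key, a function is applied to each group, and the results are combined into one table. This sketch uses hypothetical fee data and applies several aggregations at once with `agg`.

```python
import pandas as pd

fees = pd.DataFrame({
    "Address": ["salem", "erode", "salem", "erode", "covai"],
    "Fees": [25000, 3000, 15000, 35000, 17500],
})

# split by Address -> apply sum/mean/count -> combine into one table
summary = fees.groupby("Address")["Fees"].agg(["sum", "mean", "count"])
print(summary)
```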
10. Reshaping.
> stacked=stu_all.stack()
stacked
> stacked.unstack()
> pd.pivot_table(stu_all, values=["Fees"], index=["Rollno"], columns=["Address"])
OUTPUT:
11. Correlation.
> stu_all.corr(numeric_only=True)   # text columns must be excluded; pandas 2.0+ raises an error otherwise
OUTPUT:
12. Read and Write dataset.
> stu_all.to_excel('one.xlsx')   # requires the openpyxl package
> pd.read_excel('one.xlsx')
OUTPUT:
13. Cleaning the dataset.
> sample=pd.util.testing.makeMissingDataframe()   # test helper; pandas.util.testing was removed in pandas 2.0
sample
> sample.isnull()
> sample.isnull().sum()
> sample.dropna()
> sample.duplicated()
> sample.drop_duplicates()
> sample.fillna(130.6542)
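Since `makeMissingDataframe` is no longer available in pandas 2.x, the same cleaning steps can be tried on a small hand-made frame. The values below are hypothetical and chosen so that each step has a visible effect. Filling with each column's mean is shown as one common alternative to the fixed constant used above.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 3.0],
    "B": [4.0, 5.0, np.nan, np.nan],
})

print(sample.isnull().sum())           # missing values per column: A 1, B 2
print(sample.dropna())                 # keeps only row 0, the complete row
print(sample.duplicated().tolist())    # [False, False, False, True]
print(sample.drop_duplicates())        # removes the repeated last row
print(sample.fillna(sample.mean()))    # replaces NaN with each column's mean
```

Note that `duplicated` treats NaN values as equal, which is why row 3 counts as a duplicate of row 2.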
RESULT:
In this exercise, the Pandas package functions are studied and executed successfully.
EX.NO: 4 READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND EXPLORING VARIOUS COMMANDS FOR DESCRIPTIVE ANALYTICS ON THE IRIS DATA SET.
DATE: 2b} 10/23
AIM:
The aim of this exercise is to acquire the knowledge of reading datasets from files with different extensions and analyzing the Iris dataset.
OBJECTIVE:
* To understand some of the necessary packages.
* To understand visualization packages like seaborn and matplotlib.
* To understand reading datasets from different file extensions.
* To understand information about the dataset and its missing values.
* To understand dataset description.
* To understand visualization charts like scatter plot and count plot.
* To understand correlation concepts.
* To understand correlation through visualization.
* To understand histogram charts, box plots and the concept of removing outliers.
PROCEDURE:
* Open Jupyter Notebook.
* Create new Notebook.
* Import necessary packages.
* Load the dataset in different extensions.
* Start analyzing the Iris dataset.
PROGRAM:
Step 1: Import necessary packages
> import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Reading data from text files
> df=pd.read_csv("Iris.csv")
df
Step 3: Reading data from Excel files
> df=pd.read_excel("Iris.xlsx")
df
OUTPUT:
Step 4: Reading data from the web
> # Make a request
page=requests.get("https://get.org.in")
soup=BeautifulSoup(page.text,'html.parser')
# Extract title of page
title=soup.title
print(title)
# Extract all menu list items in page
li=soup.find_all('li',class_="menu-item")
for l in li:
    print(l.text)
# Extract all reference links in page
for link in soup.find_all('a'):
    print(link.get('href'))
OUTPUT:
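The scraping steps above depend on a live website whose structure may change. This sketch runs the same BeautifulSoup calls (`soup.title`, `find_all` with `class_`, `get('href')`) on a small hypothetical HTML string, so the parsing can be tried offline.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, so the parsing steps can be tried offline.
html = """
<html><head><title>Demo College</title></head>
<body>
  <ul>
    <li class="menu-item"><a href="/about">About</a></li>
    <li class="menu-item"><a href="/admissions">Admissions</a></li>
  </ul>
  <a href="https://example.com">External</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                       # text inside <title>
menu = [li.get_text(strip=True) for li in soup.find_all("li", class_="menu-item")]
links = [a.get("href") for a in soup.find_all("a")]

print(title)   # Demo College
print(menu)    # ['About', 'Admissions']
print(links)   # ['/about', '/admissions', 'https://example.com']
```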
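Step 5: Exploring various commands for doing descriptive analytics on the Iris dataset
> Display the dataset information
df.info()
OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB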
> Display the column names of the Iris dataset
df.columns
OUTPUT:
Index(['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm',
       'Species'],
      dtype='object')
> Display the shape and the number of rows of the dataset
print(df.shape)
print(len(df))
OUTPUT:
(150, 6)
150
> Know the column data types of the dataset
df.dtypes
OUTPUT:
Id                 int64
SepalLengthCm    float64
SepalWidthCm     float64
PetalLengthCm    float64
PetalWidthCm     float64
Species           object
dtype: object
> Display the statistical summary of the dataset
df.describe()
OUTPUT:
> Count the data by species
df['Species'].value_counts()
OUTPUT:
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64
> Checking missing values
df.isnull().sum()
OUTPUT:
Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64
> Checking duplicate data
df.duplicated().sum()
OUTPUT:
0
Step 6: Understand the data by data visualization
> A count plot displays the number of observations in each category of a categorical variable using bars.
sns.countplot(x='Species', data=df, palette="OrRd")
plt.show()
OUTPUT:
[Figure: count plot of Species (Iris-setosa, Iris-versicolor, Iris-virginica)]
> Relation between variables using scatter plot.
sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm', hue='Species', data=df)
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()
> sns.scatterplot(x='PetalLengthCm', y='PetalWidthCm', hue='Species', data=df)
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()
OUTPUT:
> Heat map: shows the correlation between all numerical variables in the dataset
sns.heatmap(df.drop(columns=['Id']).corr(method='pearson', numeric_only=True), annot=True)
plt.show()
OUTPUT:
> Plot all the columns' relationships using a pair plot
sns.pairplot(df.drop(['Id'], axis=1), hue='Species', height=2)
OUTPUT:
[Figure: pair plot of SepalLengthCm, SepalWidthCm, PetalLengthCm and PetalWidthCm, grouped by Species]
> Data distribution using histogram
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes[0, 0].set_title("Sepal Length")
axes[0, 0].hist(df['SepalLengthCm'], bins=7)
axes[0, 1].set_title("Sepal Width")
axes[0, 1].hist(df['SepalWidthCm'], bins=5)
axes[1, 0].set_title("Petal Length")
axes[1, 0].hist(df['PetalLengthCm'], bins=6)
axes[1, 1].set_title("Petal Width")
axes[1, 1].hist(df['PetalWidthCm'], bins=6)
plt.show()
OUTPUT:
> def graph(y):
    sns.boxplot(x="Species", y=y, data=df)

plt.figure(figsize=(10, 10))
plt.subplot(221)
graph('SepalLengthCm')
plt.subplot(222)
graph('SepalWidthCm')
plt.subplot(223)
graph('PetalLengthCm')
plt.subplot(224)
graph('PetalWidthCm')
plt.show()
OUTPUT:
> Removing outlier data using box plot.
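> def outlier(df):
    num=df.select_dtypes(include='number')   # the IQR rule only applies to numeric columns
    q1=num.quantile(0.25)
    q3=num.quantile(0.75)
    IQR=q3-q1
    # flag a row if any numeric value lies outside [q1 - 1.5*IQR, q3 + 1.5*IQR]
    mask=((num<(q1-1.5*IQR))|(num>(q3+1.5*IQR))).any(axis=1)
    final=df[~mask]
    return final

result=outlier(df)
plt.figure(figsize=(15,15))
sns.boxplot(data=df.drop(['Id'],axis=1))
plt.show()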
OUTPUT:
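The IQR rule used in `outlier` can be checked by hand on a tiny example. In this sketch the values are hypothetical, with one obvious outlier (9.0). With linear interpolation, Q1 = 2.975 and Q3 = 3.225, so the fences are 2.6 and 3.6, and only 9.0 falls outside them.

```python
import pandas as pd

# Hypothetical measurements with one obvious outlier (9.0).
s = pd.Series([2.8, 3.0, 3.1, 3.2, 3.0, 2.9, 3.3, 9.0])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # the box-plot "whisker" fences

outliers = s[(s < lower) | (s > upper)]
cleaned = s[(s >= lower) & (s <= upper)]
print(outliers.tolist())   # [9.0]
print(len(cleaned))        # 7
```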
RESULT:
In this experiment, the descriptive analysis of the Iris dataset has been executed successfully.