Ex. No. 1 Download, install and explore the features of
NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
How to Install Anaconda & Run Jupyter Notebook
Instructions To Install Anaconda and Run Jupyter Notebook
• Download & Install Anaconda Distribution
• Create Anaconda Environment
• Install and Run Jupyter Notebook
Download & Install Anaconda Distribution
Follow the below step-by-step instructions to install Anaconda
distribution.
Download Anaconda Distribution
Go to https://anaconda.com/ and select Anaconda Individual Edition to
download the latest version of Anaconda. This downloads the .exe installer to
the Windows Downloads folder.
Install Anaconda
Double-click the .exe file to start the Anaconda installation. Follow the
installer screens, as shown in the screenshots, to complete the installation.
This completes the installation of the Anaconda distribution. Next, let's see how
to create an environment and install Jupyter Notebook.
Create Anaconda Environment from Navigator
A conda environment is a directory that contains a specific collection of conda
packages that you have installed. For example, you may have one
environment with NumPy 1.7 and its dependencies, and another environment
with NumPy 1.6 for legacy testing.
https://conda.io/docs/using/envs.html
Open Anaconda Navigator
Open Anaconda Navigator from the Windows Start menu or by searching for it.
Anaconda Navigator is a GUI application from which you can manage
Anaconda packages, environments, etc.
Create an Environment to Run Jupyter Notebook
Creating an environment before you proceed is optional but recommended: it
keeps the package installs of the different projects you work on completely
separate. If you already have an environment, you can use it instead.
Select the + Create icon at the bottom of the screen to create an Anaconda
environment.
Install and Run Jupyter Notebook
Once you have created the Anaconda environment, go back to the Home page of
Anaconda Navigator and install Jupyter Notebook from the applications listed in
the right panel.
Installing Jupyter into your environment takes a few seconds. Once the
install completes, open Jupyter from the same screen or via
Anaconda Navigator -> Environments -> your
environment (here, pandas-tutorial) -> Open With Jupyter Notebook.
This opens Jupyter Notebook in the default browser.
Now select New -> PythonX (your Python 3 kernel), enter your code in a cell and
select Run. In Jupyter, code is organized into cells; you can run each cell
independently as long as it does not depend on variables defined in other cells.
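A reasonable first cell is one that checks which of this exercise's packages are available. The sketch below assumes the usual import names (`numpy`, `scipy`, `pandas`, `statsmodels`, and `notebook` as a stand-in for Jupyter); the helper `package_versions` is hypothetical, written for this illustration.

```python
import importlib

# Import names assumed for the packages in this exercise ("notebook" stands in for Jupyter)
PACKAGES = ["numpy", "scipy", "pandas", "statsmodels", "notebook"]

def package_versions(names):
    """Return {name: version string, 'unknown' if no __version__, or None if not importable}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions

for pkg, ver in package_versions(PACKAGES).items():
    print(f"{pkg:12s} {ver if ver else 'not installed'}")
```

A package reported as "not installed" can be added to the active environment from Anaconda Navigator.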
This completes installing Anaconda and running Jupyter Notebook.
RESULT:
Thus Jupyter Notebook environment has been successfully installed with all the
necessary packages using Anaconda distribution.
Ex. No. 2 Working with NumPy arrays
Aim
To create and manipulate array objects using the NumPy module in Python programming
Algorithm
Step 1: Start the program
Step 2: Import the required packages
Step 3: Read the elements through a list/tuple/dictionary
Step 4: Convert the list/tuple/dictionary into an array using built-in methods
Step 5: Check the number of dimensions of the array
Step 6: Compute the shape of the array and, if required, reshape it
Step 7: Perform the required operations such as slicing, iterating, searching,
concatenating and splitting arrays
Step 8: Stop the program
(i) Create a NumPy ndarray Object
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Output:
[1 2 3 4 5]
<class 'numpy.ndarray'>
(ii) Dimensions in Arrays
0-D Arrays
Program
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
Program
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
Program
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Check Number of Dimensions
Program
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
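The number of dimensions normally follows the nesting depth of the input. NumPy's `ndmin` argument to `np.array` can also force a minimum number of dimensions by padding the shape with leading 1s; a short sketch:

```python
import numpy as np

# ndmin pads the shape with leading 1s until the array has at least 5 dimensions
arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr)                                # [[[[[1 2 3 4]]]]]
print("number of dimensions:", arr.ndim)  # 5
print("shape:", arr.shape)                # (1, 1, 1, 1, 4)
```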
(iii) Access Array Elements
Program
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
Program
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
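Indexing also accepts negative positions (counted from the end), and a 2-D array takes a row index followed by a column index. A sketch with a small illustrative array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr[-1])       # negative index counts from the end -> 4

arr2d = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
# For a 2-D array, give the row index, then the column index
print(arr2d[0, 1])   # 1st row, 2nd column -> 2
print(arr2d[1, -1])  # 2nd row, last column -> 10
```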
(iv) Slicing arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
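A slice `start:stop:step` includes `start` and excludes `stop`; either bound may be omitted or negative. A basic slice is a view, so writing into it also changes the original array. A sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])     # from index 4 to the end -> [5 6 7]
print(arr[:4])     # from the start up to (not including) index 4 -> [1 2 3 4]
print(arr[-3:-1])  # negative bounds count from the end -> [5 6]
print(arr[::2])    # every second element -> [1 3 5 7]

# A basic slice is a view: writing to it changes the original array
view = arr[1:3]
view[0] = 99
print(arr)         # [ 1 99  3  4  5  6  7]
```

Use `arr[1:3].copy()` when an independent copy is needed.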
(v) NumPy Array Shape
Program
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
(vi) Reshaping arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)
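`reshape` only succeeds when the new shape holds exactly the same number of elements. One dimension may be given as `-1` and NumPy infers it; a sketch:

```python
import numpy as np

arr = np.arange(1, 13)          # 1..12, same values as above
auto = arr.reshape(2, 3, -1)    # -1 lets NumPy infer that dimension: 12 / (2*3) = 2
print(auto.shape)               # (2, 3, 2)

flat = auto.reshape(-1)         # -1 alone flattens back to 1-D
print(flat.shape)               # (12,)

try:
    arr.reshape(5, 3)           # 15 slots cannot hold 12 elements
except ValueError as err:
    print("cannot reshape:", err)
```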
(vii) Iterating Arrays
Program
import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
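On a 2-D array the same `for` loop yields whole rows rather than single values. NumPy's `np.nditer` visits every scalar element, and `np.ndenumerate` also gives each element's index; a sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# A plain for-loop over a 2-D array yields its rows (1-D arrays)
for row in arr:
    print(row)

# np.nditer visits every scalar element regardless of the number of dimensions
values = [int(x) for x in np.nditer(arr)]
print(values)   # [1, 2, 3, 4, 5, 6]

# np.ndenumerate also yields each element's index tuple
for idx, x in np.ndenumerate(arr):
    print(idx, x)
```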
(viii) Joining NumPy Arrays
Program
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
(ix) Splitting NumPy Arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
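`np.array_split` is used here because it tolerates sizes that do not divide evenly; `np.split` requires an even division. A sketch of the difference:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# array_split gives the leftover elements to the first pieces
parts = np.array_split(arr, 4)
print(parts)   # [array([1, 2]), array([3, 4]), array([5]), array([6])]

# np.split requires an even division and raises ValueError otherwise
try:
    np.split(arr, 4)
except ValueError as err:
    print("np.split failed:", err)
```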
(x) Searching Arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
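`np.where(condition)` returns a tuple with one index array per dimension, which is why the result above is printed as a tuple. The same comparison can be used directly as a boolean mask, and the three-argument form of `np.where` picks between two values element by element. A sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# One-argument np.where: a tuple holding one index array per dimension
idx = np.where(arr == 4)
print(idx[0])              # [3 5 6]

# A boolean mask selects the matching values themselves
print(arr[arr % 2 == 0])   # [2 4 4 4]

# Three-argument np.where chooses elementwise between two alternatives
print(np.where(arr > 3, "big", "small"))
```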
(xi) Sorting Arrays
Program
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
RESULT:
Thus the NumPy array object has been explored successfully using Python
programming.
Exp. No. 3. Working with Pandas data frames
Aim:
To work with DataFrame object using Pandas module in Python Programming
Algorithm:
Step 1: Start the program
Step 2: Import the required packages
Step 3: Create a DataFrame using the built-in constructor.
Step 4: Load data into a DataFrame object, or load files (Excel/CSV) into a
DataFrame.
Step 5: Display the rows and describe the data set using built-in methods.
Step 6: Display the last 5 rows of the DataFrame.
Step 7: Check the maximum number of rows that pandas displays.
(i) Create a simple Pandas DataFrame:
Program
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
(ii) Locate Row
Program
print(df.loc[0])
(iv) Use a list of indexes:
Program
print(df.loc[[0, 1]])
(v) Named Indexes
Program
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
(vi) Locate Named Indexes
print(df.loc["day2"])
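`loc` selects rows by their index label. pandas also provides `iloc`, which selects by integer position, so the same row can be reached either way. A sketch reusing the calories/duration data:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# loc selects by index label, iloc by integer position
print(df.loc["day2"])
print(df.iloc[1])                   # the same row, selected by position

# Both accept a row selector followed by a column selector
print(df.loc["day1", "calories"])   # 420
print(df.iloc[0, 0])                # 420
```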
(vii) Load Files Into a DataFrame
Program
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
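`pd.read_csv` accepts any file-like object as well as a path. The sketch below feeds it an in-memory string through `io.StringIO` so it runs without `data.csv`; the column names and values are hypothetical, not the contents of the lab's file.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for data.csv
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
45,109,175,282.4
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)   # read_csv infers int64 / float64 per column
```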
(viii) Check the number of maximum returned rows:
Program
import pandas as pd
print(pd.options.display.max_rows)
On my system the value is 60, which means that if the DataFrame contains more
than 60 rows, print(df) displays only the headers and the first and
last 5 rows. To display all rows, raise the limit before printing:
import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv')
print(df)
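Setting `pd.options.display.max_rows` changes the limit for the rest of the session. `pd.option_context` changes it only inside a `with` block and restores the previous value afterwards; a sketch on a generated 100-row frame:

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})
before = pd.get_option("display.max_rows")   # 60 unless changed earlier

with pd.option_context("display.max_rows", 200):
    print(pd.get_option("display.max_rows"))  # 200 inside the block
    print(df)                                  # all 100 rows are printed

print(pd.get_option("display.max_rows") == before)  # True: setting restored
```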
(ix) Viewing the Data
Program
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(4))
(x) Print the last 5 rows of the DataFrame:
print(df.tail())
# info() prints the column names, dtypes and non-null counts itself (it returns None)
df.info()
RESULT:
Thus the DataFrame object has been explored successfully using the Pandas module
in Python programming.
Exp. No. 4. Reading data from text files, Excel and the web and exploring
various commands for doing descriptive analytics on the Iris data
set.
Aim:
To perform descriptive analytics on Iris dataset using Python programming
Algorithm
Step 1: Start the program
Step 2: Import the required packages
Step 3: Load files (Excel/CSV/text) from the Iris data set into a DataFrame
Step 4: Display the rows and describe the data set using built-in methods
Step 5: Compare Petal Length and Petal Width
Step 6: Visualize the data set using histograms (with distplot), heatmaps and
box plots
Step 7: Check for missing values and duplicates, and remove outliers
Step 8: Stop the program
Program
import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()
Getting Information about the Dataset
df.shape
df.info()
df.describe()
Checking Missing Values
df.isnull().sum()
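Checking Duplicates
# drop_duplicates(subset="Species") keeps only the first row of each species,
# so the result lists the distinct species in the data set
data = df.drop_duplicates(subset="Species")
data
df.value_counts("Species")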
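With `subset="Species"`, `drop_duplicates` compares only that column. To check for rows that are repeated across all columns, `duplicated()` marks every repeat of an earlier row. A sketch on a small hypothetical frame (not Iris.csv) that contains one exact duplicate:

```python
import pandas as pd

# Hypothetical rows; the third repeats the first exactly
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 5.1, 6.3],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# duplicated() is True for rows identical to an earlier row in every column
print(df.duplicated().sum())                        # 1

# drop_duplicates() without subset removes only fully repeated rows
clean = df.drop_duplicates()
print(clean.shape)                                  # (3, 2)

# With subset="Species", only the first row of each species survives
print(df.drop_duplicates(subset="Species").shape)   # (2, 2)
```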
Data Visualization
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Species', data=df, )
plt.show()
Comparing Sepal Length and Sepal Width
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm',
hue='Species', data=df, )
# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()
import seaborn as sns
import matplotlib.pyplot as plt
sns.pairplot(df.drop(['Id'], axis = 1),
hue='Species', height=2)
Histograms
Program
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(10,10))
axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['SepalLengthCm'], bins=7)
axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['SepalWidthCm'], bins=5)
axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['PetalLengthCm'], bins=6)
axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['PetalWidthCm'], bins=6)
plt.show()
Histograms with Distplot Plot
Program
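# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
# sns.distplot is deprecated since seaborn 0.11 (histplot/kdeplot replace it); it still runs with a warning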
plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "SepalLengthCm").add_legend()
plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "SepalWidthCm").add_legend()
plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "PetalLengthCm").add_legend()
plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "PetalWidthCm").add_legend()
plt.show()
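Handling Correlation
# numeric_only=True skips the text column Species (required in pandas 2.0 and later)
df.corr(method='pearson', numeric_only=True)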
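The Pearson coefficient is the covariance of two columns divided by the product of their spreads, r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²), so it always lies between -1 and 1. The sketch below computes it by hand and checks it against `DataFrame.corr`; the petal values are made up for illustration, not taken from Iris.csv.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Pearson correlation: sum of co-deviations over the product of the deviation norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical petal measurements
sample = pd.DataFrame({"PetalLengthCm": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
                       "PetalWidthCm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]})
r = pearson(sample["PetalLengthCm"], sample["PetalWidthCm"])
print(round(r, 4))
print(sample.corr(method="pearson"))   # the off-diagonal entries equal r
```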
Heatmaps
Program
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df.corr(method='pearson', numeric_only=True).drop(
['Id'], axis=1).drop(['Id'], axis=0),
annot = True);
plt.show()
Box Plots
Program
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
def graph(y):
    sns.boxplot(x="Species", y=y, data=df)
plt.figure(figsize=(10,10))
# Adding the subplot at the specified
# grid position
plt.subplot(221)
graph('SepalLengthCm')
plt.subplot(222)
graph('SepalWidthCm')
plt.subplot(223)
graph('PetalLengthCm')
plt.subplot(224)
graph('PetalWidthCm')
plt.show()
Program
# importing packages
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv('Iris.csv')
sns.boxplot(x='SepalWidthCm', data=df)
plt.show()
Removing Outliers
Program
# Importing (load_boston is not used and was removed in scikit-learn 1.2)
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv('Iris.csv')
# IQR (method= replaces the deprecated interpolation= keyword since NumPy 1.22)
Q1 = np.percentile(df['SepalWidthCm'], 25,
                   method = 'midpoint')
Q3 = np.percentile(df['SepalWidthCm'], 75,
                   method = 'midpoint')
IQR = Q3 - Q1
print("Old Shape: ", df.shape)
# Upper bound
upper = np.where(df['SepalWidthCm'] >= (Q3+1.5*IQR))
# Lower bound
lower = np.where(df['SepalWidthCm'] <= (Q1-1.5*IQR))
# Removing the Outliers (np.where positions equal the labels of the default RangeIndex)
df.drop(upper[0], inplace = True)
df.drop(lower[0], inplace = True)
print("New Shape: ", df.shape)
sns.boxplot(x='SepalWidthCm', data=df)
plt.show()
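The IQR rule keeps values between Q1 - 1.5·IQR and Q3 + 1.5·IQR. The same filter can be written with a boolean mask instead of dropping positions, which also works when the index is not 0..n-1. This is a sketch with a hypothetical helper `remove_iqr_outliers` and made-up widths; note that `between` is inclusive, so unlike the program above it keeps a value lying exactly on a fence.

```python
import numpy as np
import pandas as pd

def remove_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR] (fences inclusive)."""
    q1, q3 = np.percentile(df[column], [25, 75], method="midpoint")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = df[column].between(lower, upper)   # boolean mask, inclusive on both ends
    return df[mask]

# Hypothetical sepal widths with one obvious outlier
sample = pd.DataFrame({"SepalWidthCm": [3.0, 3.1, 2.9, 3.2, 3.0, 4.9]})
print(remove_iqr_outliers(sample, "SepalWidthCm"))
```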
RESULT:
Thus Iris dataset has been explored and descriptively analysed using Python
programming
Exp. No. 5. Use the diabetes data set from UCI and Pima Indians Diabetes
data set for performing the following:
Aim:
To perform various exploratory data analysis on Pima Indians Diabetes dataset
using Python Programming
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
Algorithm
Step 1: Start the program
Step 2: Import the required packages
Step 3: Load files (Excel/CSV/text) from the UCI and Pima Indians Diabetes data
sets into a DataFrame
Step 4: Display the rows and describe the data set using built-in methods
Step 5: Compute Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis
Step 6: Visualize the data set using histograms (with distplot), heatmaps and
box plots
Step 7: Check for missing values and duplicates, and remove outliers using built-in methods
Step 8: Stop the program
Program
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.linear_model import LogisticRegression
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
df = pd.read_csv('C:/Users/praveen/Downloads/diabetes.csv')
count = df['Glucose'].value_counts()
display(count)
df.head()
df.describe()
df.mean()
df.median()
df.mode()
df.var()
df.std()
df.skew()
Pregnancies 0.901674
Glucose 0.173754
BloodPressure -1.843608
SkinThickness 0.109372
Insulin 2.272251
BMI -0.428982
DiabetesPedigreeFunction 1.919911
Age 1.129597
Outcome 0.635017
dtype: float64
df.kurtosis()
Pregnancies 0.159220
Glucose 0.640780
BloodPressure 5.180157
SkinThickness -0.520072
Insulin 7.214260
BMI
DiabetesPedigreeFunction 5.594954
Age 0.643159
Outcome -1.600930
dtype: float64
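pandas reports bias-corrected sample statistics: `skew()` is the adjusted Fisher-Pearson coefficient and `kurtosis()` is excess kurtosis (0 for a normal distribution). A positive skewness such as Insulin's 2.27 indicates a long right tail, and kurtosis above 0 indicates heavier tails than a normal curve. The sketch below restates the two standard formulas and compares them with pandas on a made-up series (not diabetes.csv).

```python
import numpy as np
import pandas as pd

def sample_skew(x):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(z^3), z standardised with ddof=1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    return n / ((n - 1) * (n - 2)) * np.sum(z ** 3)

def sample_excess_kurtosis(x):
    """Bias-corrected excess kurtosis; about 0 for normally distributed data."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * np.sum(z ** 4)
    return term - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

# Hypothetical right-skewed values
s = pd.Series([0, 1, 1, 2, 3, 5, 8, 13, 21])
print(sample_skew(s), s.skew())
print(sample_excess_kurtosis(s), s.kurtosis())
```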
corr = df.corr()
sns.heatmap(corr,
xticklabels=corr.columns,
yticklabels=corr.columns)
sns.countplot(x='Outcome', data=df)
plt.show()
# Computing the %age of diabetic and non-diabetic in the sample
Out1 = (df.Outcome == 1).sum()   # number of diabetic rows
Out0 = (df.Outcome == 0).sum()   # number of non-diabetic rows
Total = Out0 + Out1
PC_of_1 = Out1*100/Total
PC_of_0 = Out0*100/Total
PC_of_1, PC_of_0
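Wrapping a comparison in a list, as in `len([df.Outcome==1])`, measures a one-element list, so it returns 1 whatever the data contain. Summing the boolean Series counts the matching rows, and `value_counts(normalize=True)` gives the proportions directly. A sketch on a hypothetical five-row Outcome column:

```python
import pandas as pd

# Hypothetical Outcome column: 1 = diabetic, 0 = non-diabetic
outcome = pd.Series([1, 0, 0, 1, 0], name="Outcome")

# Wrong: the list wrapper always has exactly one element
print(len([outcome == 1]))                           # 1

# Right: summing a boolean Series counts the True values
n1 = (outcome == 1).sum()
n0 = (outcome == 0).sum()
print(n1 * 100 / (n1 + n0), n0 * 100 / (n1 + n0))   # 40.0 60.0

# Equivalent with value_counts
print(outcome.value_counts(normalize=True) * 100)
```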
plt.figure(dpi = 120,figsize= (5,4))
mask = np.triu(np.ones_like(df.corr(),dtype = bool))
sns.heatmap(df.corr(),mask = mask, fmt = ".2f",annot=True,lw=1,cmap = 'plasma')
plt.yticks(rotation = 0)
plt.xticks(rotation = 90)
plt.title('Correlation Heatmap')
plt.show()
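The aim also lists linear and multiple regression. A minimal ordinary-least-squares sketch using NumPy's `np.linalg.lstsq` on synthetic data is shown below; the predictors only imitate columns such as Glucose and BMI, and nothing is read from diabetes.csv. The hypothetical helper `ols` prepends an intercept column, so the same function fits a one-predictor (bivariate) model and a two-predictor (multiple) model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic predictors (stand-ins for columns such as Glucose and BMI)
x1 = rng.normal(120, 30, n)
x2 = rng.normal(32, 7, n)
# Synthetic response built from known coefficients plus noise
y = 5.0 + 0.3 * x1 + 1.2 * x2 + rng.normal(0, 2, n)

def ols(X, y):
    """Ordinary least squares; returns [intercept, slope_1, ..., slope_k]."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

simple = ols(x1, y)                              # bivariate: y on x1 only
multiple = ols(np.column_stack([x1, x2]), y)     # multiple: y on x1 and x2
print("simple   :", np.round(simple, 3))
print("multiple :", np.round(multiple, 3))
```

Because the data are generated from known coefficients, the multiple-regression estimates should land close to 5.0, 0.3 and 1.2, while the simple model folds the omitted x2 effect into its intercept.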
RESULT:
Thus various exploratory data analysis has been performed on Pima Indians
Diabetes dataset using Python Programming successfully.
Exp. No. 6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
Aim:
To apply various plotting functions on UCI data set using Python Programming
Algorithm
Step 1: Start the program
Step 2: Import the required packages
Step 3: Load Files (excel/csv/ text) into a Data Frame from UCI data set
Step 4: Describe the data set using built in method
Step 5: Compute Frequency, Mean, Median, Mode, Variance and Standard Deviation
Step 6: Visualize the data set using various plotting functions for the
following:
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three-dimensional plotting
Step 7: Analyse the sample data and do the required operations
Step 8: Stop the program
a. Normal curves
Program
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
df=pd.read_csv("C:/Users/praveen/Downloads/dataset_diabetes/diabetic_data.csv")
df.head()
mean =df['time_in_hospital'].mean()
std =df['time_in_hospital'].std()
x_axis = np.arange(1, 10, 0.01)
plt.plot(x_axis, norm.pdf(x_axis, mean, std))
plt.show()
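`norm.pdf(x, mean, std)` evaluates the normal density exp(-(x - mean)² / (2·std²)) / (std·sqrt(2π)). The NumPy sketch below writes that formula out (the helper `normal_pdf` is hypothetical, for illustration) and checks two known properties: the standard normal peaks at 1/sqrt(2π), and the area under the curve is about 1.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Normal density: exp(-(x - mean)^2 / (2 std^2)) / (std * sqrt(2 pi))."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * np.sqrt(2 * np.pi))

peak = float(normal_pdf(0.0, 0.0, 1.0))
print(round(peak, 6))            # 0.398942, the peak of the standard normal

# Riemann sum over a wide range approximates the total area of 1
xs = np.arange(-10, 10, 0.01)
area = float(np.sum(normal_pdf(xs, 0.0, 1.0)) * 0.01)
print(round(area, 4))
```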
b. Density and contour plots
Program
df.time_in_hospital.plot.density(color='green')
plt.title('Density plot for time_in_hospital')
plt.show()
Program
# for 'number_emergency' attribute
# using plot.kde()
df.number_emergency.plot.kde(color='green')
plt.title('KDE-Density plot for number_emergency')
plt.show()
df.num_lab_procedures.plot.density(color='green')
plt.title('Density Plot for num_lab_procedures')
plt.show()
df.num_medications.plot.density(color='green')
plt.title('Density Plot for num_medications')
plt.show()
Program
def func(x, y):
    return np.sin(x) ** 2 + np.cos(y) ** 2
# generate 50 evenly spaced values (np.linspace default): 0 to the mean and 0 to the std
mean =df['time_in_hospital'].mean()
std =df['time_in_hospital'].std()
x = np.linspace(0, mean)
y = np.linspace(0, std)
# Generate combination of grids
X, Y = np.meshgrid(x, y)
Z = func(X, Y)
# Draw rectangular contour plot
plt.contour(X, Y, Z, cmap='gist_rainbow_r');
plt.show()
c. Correlation and scatter plots
Program
import seaborn as sns
plt.figure(figsize=(20,10))
dataplot = sns.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)
plt.show()
d. Histograms
Program
import seaborn as sns
df.hist(figsize=(12,12),layout=(5,3))
# plotting histogram for num_lab_procedures using distplot()
sns.distplot(a=df.num_lab_procedures, kde=False)
# visualizing plot using matplotlib.pyplot library
plt.show()
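`df.hist` and `plt.hist` draw bars from per-bin counts over equal-width bins. NumPy's `np.histogram` returns those counts and bin edges directly; each bin is half-open except the last, which includes the maximum. A sketch on a tiny made-up array:

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3])

# np.histogram returns the count in each bin and the bin edges
counts, edges = np.histogram(values, bins=3)
print(counts)   # [1 2 3]
print(edges)    # 4 edges for 3 equal-width bins from min to max
```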
e. Three dimensional plotting
Program
fig = plt.figure()
ax = plt.axes(projection = '3d')
x = df['number_emergency']
y = df['number_inpatient']
z = df['number_outpatient']
ax.plot3D(x, y, z, 'green')
ax.set_title('3D line plot diabetes dataset')
plt.show()
RESULT:
Thus various plotting functions have been applied on the UCI data set using Python Programming successfully.
Exp. No. 7. Visualizing Geographic Data with Basemap
Aim:
To visualize Geographic Data using BaseMap module in Python Programming
Algorithm:
Step 1: Start the program
Step 2: Import the required packages
Step 3: Visualize geographic data with Basemap
Step 4: Display the base map using the Basemap constructor with latitude
and longitude parameters
Step 5: Display the coastlines and country boundaries using built-in
methods
Step 6: Fill the continents, lakes and map boundary with suitable colours
Step 7: Create a global map with a Cylindrical Equidistant Projection, Orthographic
Projection, Robinson Projection
Step 8: Stop the program
Create a global map with an Orthographic Projection
Program
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);
Program
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
width=8E6, height=8E6,
lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)   # (-122.3, 47.6) is Seattle, USA, inside this North America map
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);
Create a global map with coastlines
Program
fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines()
plt.title("Coastlines", fontsize=20)
plt.show()
Program
fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='dashed', color='red')
plt.title("Coastlines", fontsize=20)
plt.show()
Create a global map with country boundaries
Program
fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='solid', color='black')
m.drawcountries()
plt.title("Country boundaries", fontsize=20)
x, y = m(-122.3, 47.6)   # (-122.3, 47.6) is Seattle, USA
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);
plt.show()
Create a global map with a Mercator Projection
Program
fig = plt.figure(figsize = (10,8))
m = Basemap(projection='merc',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title("Mercator Projection", fontsize=20)
Create a global map with a Cylindrical Equidistant Projection
Program
fig = plt.figure(figsize = (10,8))
m = Basemap(projection='cyl',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title(" Cylindrical Equidistant Projection", fontsize=20)
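The Mercator and cylindrical equidistant maps differ only in how latitude becomes a y coordinate. On a sphere of radius R, cylindrical equidistant uses y = R·φ, while Mercator uses y = R·ln(tan(π/4 + φ/2)), which grows without bound toward the poles; that is why the Mercator map is clipped at llcrnrlat=-80 and urcrnrlat=80. The sketch assumes a spherical Earth with R = 6,371 km; Basemap applies its own offsets and units, so these numbers will not match `m(lon, lat)` directly.

```python
import numpy as np

R = 6_371_000.0   # assumed mean Earth radius in metres (spherical model)

def equirectangular(lon_deg, lat_deg):
    """Cylindrical equidistant: x and y proportional to longitude and latitude."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return R * lon, R * lat

def mercator(lon_deg, lat_deg):
    """Spherical Mercator: y = R * ln(tan(pi/4 + lat/2)), diverging near the poles."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return R * lon, R * np.log(np.tan(np.pi / 4 + lat / 2))

for lat in (0, 40, 80):
    _, y_cyl = equirectangular(0, lat)
    _, y_merc = mercator(0, lat)
    print(f"lat {lat:2d}: cyl y = {y_cyl / 1e6:6.2f} Mm, merc y = {y_merc / 1e6:6.2f} Mm")
```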
Create a global map with Orthographic Projection
Program
fig = plt.figure(figsize = (10,8))
m = Basemap(projection='ortho', lon_0 = 25, lat_0 = 10)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title("Orthographic Projection", fontsize=18)
Create a global map with a Robinson Projection
Program
fig = plt.figure(figsize = (10,8))
m = Basemap(projection='robin',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180,
lon_0 = 0, lat_0 = 0)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title(" Robinson Projection", fontsize=20)
RESULT
Thus Geographic Data has been visualized using BaseMap module in Python
Programming successfully.