Python Machine Learning Workshop Guide

The document summarizes a workshop on machine learning concepts and applications held at P.S.R. Engineering College in Sivakasi. It discusses the installation of Anaconda for Python packages, popular machine learning libraries like NumPy, SciPy, Pandas and scikit-learn. Specific machine learning algorithms covered include k-nearest neighbors classification, linear regression, support vector machines, k-means clustering and principal component analysis. Code examples are provided for k-NN classification of iris data and linear regression on a salary dataset.


Workshop on “Machine Learning Concepts and Applications”

P.S.R. Engineering College,
Sevalpatti, Sivakasi

[Link] Prakash, Associate Professor/PSREC
9/16/2021
Session 2
Python for Machine Learning Algorithms and Applications

[Link] Prakash
Associate Professor/ECE
P.S.R. Engineering College,
Sevalpatti, Sivakasi

Contents
• Anaconda Installation
• NumPy, SciPy, matplotlib, Pandas, OpenCV
• scikit-learn
• K-Nearest Neighbors Classification
• Linear Regression
• SVM Classifier
• K-Means Image Segmentation
• PCA and Logistic Regression Classifier

Anaconda Installation

[Link]

• Anaconda Individual Edition is a widely used Python distribution platform; Anaconda reports over 20 million users worldwide.
• Over 7,500 data science and machine learning packages are available. With the conda install command, thousands of open-source packages can be installed.
NumPy, SciPy, matplotlib, OpenCV and Pandas
• NumPy provides highly optimized multidimensional arrays, the basic data structure of most state-of-the-art algorithms.
• SciPy uses these arrays to provide a set of fast numerical routines. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.
• matplotlib is a feature-rich library for plotting high-quality graphs in Python.
• OpenCV is a popular library for computer vision.
• Pandas is a Python data analysis and manipulation tool, well suited to working with tabular data such as Excel spreadsheets.
• Scalars (0D tensors)
– A tensor that contains only one number is called a scalar.
• Vectors (1D tensors)
– An array of numbers is called a vector, or 1D tensor. A 1D tensor has exactly one axis.
• Matrices (2D tensors)
– An array of vectors is a matrix, or 2D tensor. A matrix has two axes (often referred to as rows and columns).
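The three tensor ranks listed above can be checked directly in NumPy. This is a minimal sketch; the numbers are arbitrary illustrative values and are not taken from the workshop material.

```python
import numpy as np

scalar = np.array(12)                  # 0D tensor: a single number, no axes
vector = np.array([12, 3, 6, 14])      # 1D tensor: exactly one axis
matrix = np.array([[5, 78, 2],
                   [6, 79, 3]])        # 2D tensor: rows and columns

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim is the number of axes; shape is the length along each axis
    print(name, "ndim =", t.ndim, "shape =", t.shape)
```

The `ndim` attribute reports the number of axes, so it matches the "0D", "1D" and "2D" labels used on the slide.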
>>> import numpy as np
>>> a = np.array([0,1,2,3,4,5])
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a.ndim
1
>>> a.shape
(6,)

>>> b = a.reshape((3,2))
>>> b
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> b.ndim
2
>>> b.shape
(3, 2)

c = a.reshape(3, 1, 2)
print(c)
# [[[0 1]]
#
#  [[2 3]]
#
#  [[4 5]]]
print(c.ndim)     # 3
print(c.shape)    # (3, 1, 2)
npdata = np.arange(3)
print(npdata)     # [0 1 2]
npdata = np.arange(40)
npdata.shape = (5, 8)
print(npdata)
# [[ 0  1  2  3  4  5  6  7]
#  [ 8  9 10 11 12 13 14 15]
#  [16 17 18 19 20 21 22 23]
#  [24 25 26 27 28 29 30 31]
#  [32 33 34 35 36 37 38 39]]
SciPy Packages

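import numpy as np
# genfromtxt, sum and isnan are NumPy functions; the top-level SciPy aliases
# (sp.genfromtxt, sp.sum, sp.isnan) used in older tutorials are deprecated
data = np.genfromtxt("web_traffic.tsv", delimiter="\t")
print(data[:10])
print(data.shape)
# web_traffic.tsv is a plain-text file (it opens in WordPad) with 743 hours of web traffic
x = data[:,0]   # time, in hours
y = data[:,1]   # hits per hour
# count the missing values (8 in this dataset)
n = np.sum(np.isnan(y))
print(n)
# keep only the rows where y is present
x = x[~np.isnan(y)]
y = y[~np.isnan(y)]
print(x)
print(y)
print(x.shape)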
import matplotlib.pyplot as plt
# plot the (x,y) points with dots of size 10
plt.scatter(x, y, s=10)
plt.title("Web traffic over the last month")
plt.xlabel("Time")
plt.ylabel("Hits/hour")
# one tick per week (7*24 hours)
plt.xticks([w*7*24 for w in range(10)], ['week %i' % w for w in range(10)])
plt.autoscale(tight=True)
# draw a light grey grid (linestyle='-' is solid; use '--' for dashes)
plt.grid(True, linestyle='-', color='0.75')
plt.show()
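The missing-value filtering in the web traffic example can be tried without the data file. The sketch below uses a small made-up array as a hypothetical stand-in for web_traffic.tsv; the names `missing`, `x_clean` and `y_clean` are illustrative and do not appear in the slides.

```python
import numpy as np

# Hypothetical stand-in for web_traffic.tsv: column 0 = hour, column 1 = hits.
data = np.array([[1, 2272.0],
                 [2, np.nan],
                 [3, 1386.0],
                 [4, 1365.0],
                 [5, np.nan],
                 [6, 1656.0]])

x = data[:, 0]
y = data[:, 1]

missing = np.isnan(y)              # boolean mask, True where hits are missing
print("missing values:", missing.sum())

# ~missing inverts the mask, so only rows with a valid y are kept
x_clean = x[~missing]
y_clean = y[~missing]
print(x_clean)
print(y_clean)
```

Computing the mask once and applying it to both arrays keeps `x` and `y` aligned, which is why the slide filters `x` before overwriting `y`.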
Scikit-learn
• scikit-learn is a widely used Python library for many machine learning tasks, including classification.
• fit(features, labels): the learning step, which fits the parameters of the model.
• predict(features): can only be called after fit; returns a prediction for one or more inputs.
• Install with: conda install scikit-learn
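As a minimal sketch of the fit/predict pattern described above, the snippet below trains a `KNeighborsClassifier` on a tiny made-up dataset. The feature values and labels are hypothetical and serve only to show the call order.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature per sample, two well-separated classes.
features = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
labels = [0, 0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(features, labels)                     # learning step
predictions = model.predict([[1.5], [10.5]])    # only valid after fit
print(predictions)
```

Other scikit-learn estimators follow the same two-step interface, so swapping the model class usually leaves the rest of the code unchanged.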

Cross Validation
• Cross-validation is a re-sampling procedure used to
evaluate machine learning models on a limited data
sample.
• Shuffle the dataset randomly.
• Split the dataset into k groups
• For each unique group:
– Take the group as a hold out or test data set
– Take the remaining groups as a training data set
– Fit a model on the training set and evaluate it on the test set
– Retain the evaluation score and discard the model
• Summarize the skill of the model using the mean and spread of the evaluation scores. Common choices are k = 5 or k = 10.
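The procedure above can be sketched in plain NumPy. The data are two made-up Gaussian blobs, and the model is a simple nearest-centroid classifier chosen only to keep the example short; neither is taken from the workshop, and the helper names `fit_centroids` and `predict_centroids` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two Gaussian blobs, 30 points each.
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
               rng.normal(6.0, 1.0, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)

def fit_centroids(X_train, y_train):
    """'Training' here just stores the mean point of each class."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroids(model, X_test):
    classes, centroids = model
    # distance from every test point to every class centroid
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

k = 5
indices = rng.permutation(len(X))        # shuffle the dataset
folds = np.array_split(indices, k)       # split it into k groups

scores = []
for i in range(k):                       # each group is the test set once
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    model = fit_centroids(X[train_idx], y[train_idx])
    accuracy = np.mean(predict_centroids(model, X[test_idx]) == y[test_idx])
    scores.append(accuracy)              # keep the score, discard the model

print("fold accuracies:", scores)
print("mean accuracy:", np.mean(scores))
```

In practice, `sklearn.model_selection.KFold` or `cross_val_score` handles the splitting; the manual loop is shown only to make each step visible.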
K-Nearest Neighbor Algorithm
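• KNN is a non-parametric algorithm: it makes no assumption about the underlying data distribution (unlike, for example, a Gaussian mixture model).
• K is the number of nearest neighbors consulted for each prediction.
The steps are:
• Calculate the distance from the query point to every training point
• Find the K closest neighbors
• Vote for the label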
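The three steps above can be sketched from scratch in a few lines. This is an illustrative implementation that assumes Euclidean distance and majority voting; the function name `knn_predict` and the sample points are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # 1. Calculate the Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - query, axis=1)
    # 2. Find the indices of the k closest neighbors
    nearest = np.argsort(distances)[:k]
    # 3. Vote: the most common label among those neighbors wins
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical training set: two small clusters labelled "A" and "B".
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))
print(knn_predict(X_train, y_train, np.array([5.0, 4.8])))
```

There is no training phase beyond storing the data, which is why KNN is often described as a lazy learner.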
K Neighbors Classifier for classification of iris data
from matplotlib import pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
data = load_iris()

# the features are the length and width of the sepals and petals
features = data.data
feature_names = data.feature_names
target = data.target
target_names = data.target_names
labels = target_names[target]
from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=1)
X = features
y = target
# random_state is the seed for the random number generator used in the split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))
# Output reported on the slide: accuracy = 0.98
Linear Regression

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Position_Salaries.csv')
# iloc selects rows and columns of a pandas DataFrame by integer position
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Note: the model fitted here is a decision tree regressor; for a straight-line
# fit, sklearn.linear_model.LinearRegression can be used in its place
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)
# predict the salary for position level 6.5 (predict expects a 2D array)
n = np.array([6.5]).reshape(1, 1)
y_pred = regressor.predict(n)
plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# arange returns evenly spaced values within the given range;
# X.min() and X.max() give scalars, which arange expects
X_grid = np.arange(X.min(), X.max(), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Example of Decision Tree Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
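For comparison with the tree-based fit above, a straight-line (ordinary least squares) fit can be sketched in NumPy alone. The level and salary values below are made up for illustration and are not the contents of Position_Salaries.csv.

```python
import numpy as np

# Hypothetical data: position level vs. salary (in thousands).
level = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([45.0, 50.0, 60.0, 80.0, 110.0])

# Design matrix with a column of ones so the model is salary = slope*level + intercept
A = np.column_stack([level, np.ones_like(level)])
(slope, intercept), *_ = np.linalg.lstsq(A, salary, rcond=None)

predicted = slope * 6.5 + intercept      # prediction for level 6.5
print("slope:", slope, "intercept:", intercept)
print("predicted salary at level 6.5:", predicted)
```

A linear model extrapolates smoothly beyond the training range, whereas a decision tree predicts piecewise-constant values; which behaviour is preferable depends on the data.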
SVM Classifier

from sklearn import datasets
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features
y = iris.target
from sklearn.svm import SVC
model = SVC(kernel='linear', C=1E10)  # C is the penalty parameter of the error term
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
# Reported on the slide: accuracy = 0.84 with 2 features, 0.98 with all 4 features (X = iris.data)
K-Means Segmentation of image
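import cv2
import numpy as np
import matplotlib.pyplot as plt
image = cv2.imread("[Link]")  # path to the input image
# OpenCV loads images as BGR; convert to RGB for matplotlib
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# flatten the image into a list of pixels, one row of 3 color values each
pixel_values = image.reshape((-1, 3))
pixel_values = np.float32(pixel_values)
print(pixel_values.shape)
# stop after 100 iterations or when the centers move by less than 0.2
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)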

k = 3
# run k-means 10 times with random initial centers and keep the best result
_, labels, centers = cv2.kmeans(pixel_values, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
# convert back to 8 bit values
centers = np.uint8(centers)
# flatten the labels array
labels = labels.flatten()

# convert all pixels to the color of the centroids
segmented_image = centers[labels.flatten()]
# reshape back to the original image dimension
segmented_image = segmented_image.reshape(image.shape)
# show the original image, then the segmented image
plt.imshow(image)
plt.show()
plt.imshow(segmented_image)
plt.show()
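To show what the k-means call does to the pixels, here is a rough NumPy sketch of the assign-and-update loop on a small synthetic image. It assumes a fixed number of iterations and a single random initialization, so it is simpler than `cv2.kmeans`; the helper `kmeans` and the test image are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # start from k distinct points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each center moves to the mean of its points
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # final assignment so the labels match the returned centers
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Hypothetical 6x6 RGB image: three vertical color bands plus a little noise.
rng = np.random.default_rng(1)
image = np.zeros((6, 6, 3), dtype=np.uint8)
image[:, :2] = (200, 30, 30)
image[:, 2:4] = (30, 200, 30)
image[:, 4:] = (30, 30, 200)
noise = rng.integers(-10, 11, size=image.shape)
image = np.clip(image.astype(int) + noise, 0, 255).astype(np.uint8)

pixel_values = image.reshape(-1, 3).astype(np.float32)
labels, centers = kmeans(pixel_values, k=3)

# replace every pixel by the color of its cluster center
segmented = np.uint8(centers)[labels].reshape(image.shape)
print(segmented.shape)
```

The segmented image contains at most k distinct colors, which is the effect seen when the segmented photograph is displayed.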

Principal Component Analysis
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# load the dataset
dataset = pd.read_csv('[Link]')
# split the dataset into features X (columns 1 to 12) and labels y (column 0)
X = dataset.iloc[:, 1:13].values
y = dataset.iloc[:, 0].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# preprocessing: standardize the features; the scaler is fitted on the training set only
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# next, apply PCA to the training and test sets of X
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
pca = PCA(n_components = 2)
# fit PCA on the training set, then project both sets onto the 2 components
X1_train = pca.fit_transform(X_train)
X1_test = pca.transform(X_test)
print(X_train.shape)
print(X1_train.shape)
# fraction of the total variance captured by each component
variance = pca.explained_variance_ratio_
classifier = LogisticRegression(random_state = 0)
classifier.fit(X1_train, y_train)
y_pred = classifier.predict(X1_test)
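# accuracy on the test set using the 2 principal components
print(classifier.score(X1_test, y_test))
# refit on all standardized features (no PCA) for comparison
classifier.fit(X_train, y_train)
y1_pred = classifier.predict(X_test)
# accuracy on the test set using all features
print(classifier.score(X_test, y_test))
print([Link](X_train))
print([Link](X1_train))
# scatter plot of the training set in the plane of the first two components
plt.figure(figsize=(8,6))
plt.scatter(X1_train[:,0], X1_train[:,1], s=10, c=y_train, cmap='rainbow')
plt.xlabel('First principal component')
plt.ylabel('Second Principal Component')
plt.show()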
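The projection that `PCA(n_components=2)` performs can be sketched with a singular value decomposition in NumPy. The data below are synthetic (three features, two of them strongly correlated) and stand in for the workshop dataset; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 3 features; features 0 and 1 are strongly correlated.
base = rng.normal(size=(100, 1))
X = np.hstack([base,
               2.0 * base + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])

# Standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The principal directions are the right singular vectors of the centered data
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first two principal components
n_components = 2
X_reduced = X_std @ Vt[:n_components].T

print("explained variance ratio:", explained_variance_ratio)
print("reduced shape:", X_reduced.shape)
```

Because two of the three features carry nearly the same information, the first component captures most of the variance, which is the situation in which dropping components costs little accuracy.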
Linear Vs logistic regression

SVM

References
• Python Data Science Handbook – Jake VanderPlas
• Building Machine Learning Systems with Python – Luis Pedro Coelho, Willi Richert
• Deep Learning – Ian Goodfellow, Yoshua Bengio, Aaron Courville
• Statistics and Machine Learning in Python – Edouard Duchesnay, Tommy Löfstedt
• Other web resources
