In [ ]: #A PYTHON TUTORIAL ON PRINCIPAL COMPONENT ANALYSIS AND LOGISTIC REGRESSION:
#APPLICATION TO THE ANALYSIS AND CLASSIFICATION OF ROCK SAMPLES.
#
#Paolo Dell'Aversana - January 2019.
In [ ]: #Remark: a basic knowledge of Python is required.
In [ ]: # INTRODUCTION
# Principal Component Analysis is a statistical approach that converts
# a set of observations of possibly correlated variables (using an orthogonal
# transformation) into a set of values of linearly uncorrelated variables
# (Jolliffe, 2002).
# These are called “principal components”. The optimal matrix transformation is defined in
# such a way that the first principal component has the largest possible variance.
# This means that it accounts for as much of the variability in the data as possible.
# Each succeeding component in turn has the highest variance possible under the constraint
# that it is orthogonal to the preceding components.
# In this tutorial, I explain how to build, step by step, an efficient Python code
# for the PCA of a data set of magmatic rock samples characterized by different
# oxide contents. Finally, I show how to train a Logistic Regression classifier
# using the results of PCA. The logistic regression algorithm performs efficiently on
# the feature subspace obtained through PCA. I modified and readapted the code written by
# Raschka and Mirjalili (2017) for the specific purposes of rock sample analysis.
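In [ ]: #Before moving to the rock data, here is a minimal, optional sketch (an addition to the
#original tutorial) illustrating the idea above on a small synthetic 2-D dataset: the
#eigenvectors of the covariance matrix define uncorrelated components sorted by variance.
#All variable names below (x1, x2, X_toy, Z) are illustrative assumptions.
import numpy as np
rng = np.random.RandomState(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=200)   #x2 is strongly correlated with x1
X_toy = np.column_stack([x1, x2])
vals, vecs = np.linalg.eig(np.cov(X_toy.T))  #eigendecomposition of the covariance matrix
Z = X_toy.dot(vecs)                          #projection onto the principal axes
print(np.round(np.cov(Z.T), 3))              #off-diagonal terms are ~0: the components are uncorrelated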
In [ ]: #We start this tutorial session by transforming our input data from xlsx to csv format.
#This is a useful preliminary step for those who prefer working with csv files,
#but have their data available in Excel worksheets.
In [1]: #First, we import the module for reading the excel file:
import xlrd
#We can access the different sheets of the Excel file.
#We will take the sheet with our data set.
workbook = xlrd.open_workbook('CAL_large_test.xlsx')
workbook
workbook.sheet_names()
#Remark: here, 'CAL_large_test.xlsx' is our input file consisting of several rock samples
#characterized by different distributions of chemical features (oxides).
Out[1]: ['Table DR2']
In [ ]: #Now we can convert our input file from xlsx to csv format, using a function of the pandas library.
In [3]: import pandas as pd
data_xls = pd.read_excel('CAL_large_test.xlsx', 'Table DR2', index_col=None)
#data_xls.to_csv('[Link]', encoding='utf-8')
data_xls.to_csv('[Link]')
#Remark: here, "[Link]" is just a simple name given to our csv file.
In [2]: #After these preliminary operations, we can start the core of the tutorial:
#unsupervised dimensionality reduction via Principal Component Analysis (Raschka and Mirjalili, 2017).
#We go through all the steps of the Principal Component Analysis (PCA).
#We first use the pandas and NumPy libraries, then we'll use a faster approach based on a specific
#PCA class of Scikit-learn.
In [4]: import pandas as pd
#Let us start by reading our input file and showing its first rows, using the pandas read_csv function.
df_rock = pd.read_csv('[Link]')
#the commented line below is just in case we desire to add a header (however, it is already in our file)
#df_rock.columns = ['Sample', 'Rock_type', 'SiO2', 'TiO2', 'Al2O3', 'Fe2O3', 'MnO', 'MgO', 'CaO', 'Na2O', 'K2O', 'P2O5']
df_rock.head()
Out[4]:
Unnamed: 0 Sample Rock_type SiO2 TiO2 Al2O3 Fe2O3 MnO MgO CaO Na2O K2O P2O5
0 0 SAR-00-08 andesite 58.9 1.133 16.5 5.85 0.074 2.25 5.20 5.06 2.89 0.557
1 1 SAR-00-07 andesite 62.9 0.828 16.5 4.97 0.075 1.97 4.35 4.76 2.99 0.343
2 2 SAR-00-13 andesite 60.6 1.015 16.4 5.59 0.082 2.24 4.82 4.77 3.00 0.473
3 3 COTA-05-15 andesite 58.9 1.222 16.9 6.56 0.082 2.88 5.64 4.76 2.41 0.482
4 4 COTA-05-06 andesite 59.5 1.169 16.7 6.09 0.077 2.54 5.24 4.58 2.70 0.490
In [ ]: #Let us check the number of rows and columns of our file
In [5]: df_rock.shape
Out[5]: (174, 13)
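In [ ]: #As an optional additional check (not in the original workflow), we can inspect how many
#samples belong to each rock class, using the 'Rock_type' column shown above:
df_rock['Rock_type'].value_counts()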
In [ ]: #Now, we split our data set into train and test data, after assigning the 10 features to a NumPy array X.
#Using LabelEncoder, we transform the class labels from their original string representation into integers.
In [6]: from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
X, y = df_rock.iloc[:, 3:].values, df_rock.iloc[:, 2].values
X_train, X_test, y_train, y_test = \
train_test_split(X, y, test_size=0.3,
stratify=y,
random_state=0)
le = LabelEncoder()
#Below we fit the label encoder on the class labels and list the rock classes:
y = le.fit_transform(y)
le.classes_
Out[6]: array(['andesite', 'basaltic andesite', 'dacite', 'rhyolite',
'trachyandesite'], dtype=object)
In [81]: #Now we encode the string classes into integers
In [7]: le.transform(['andesite', 'basaltic andesite', 'dacite', 'rhyolite', 'trachyandesite'])
Out[7]: array([0, 1, 2, 3, 4], dtype=int64)
In [ ]: #If we like, we can print our train and test arrays, just to check ...
In [65]: #print(X_train)
In [64]: #print(X_test)
In [8]: #If we like, we can convert our input data to 32-bit floating-point format
import numpy as np
X_train = X_train.astype(np.float32)
X_test = X_test.astype(np.float32)
In [ ]: #Now we standardize the data. This operation makes features with different ranges comparable.
#Remark: if we have NaN's in our data, we can replace them with finite numbers (see below).
In [ ]: #np.nan_to_num(X_train)
#np.nan_to_num(X_test)
In [9]: from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
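In [ ]: #Optional sanity check (an addition for illustration): after standardization, each feature
#of the training set should have a mean close to 0 and a standard deviation close to 1.
#Note that the scaler is fitted on the training set only and then reused on the test set.
print(np.round(X_train_std.mean(axis=0), 3))
print(np.round(X_train_std.std(axis=0), 3))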
In [ ]: #Now we start the process of Principal Component Analysis (PCA)
#First we make Eigendecomposition of the covariance matrix.
In [10]: import numpy as np
cov_mat = np.cov(X_train_std.T)
eigen_vals, eigen_vecs = np.linalg.eig(cov_mat)
print('\nEigenvalues \n%s' % eigen_vals)
Eigenvalues
[6.78285831 1.53376255 0.85976428 0.42218026 0.22690364 0.09508775
0.01612308 0.03970756 0.05683254 0.05011281]
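In [ ]: #A quick consistency check (added here for illustration): the sum of the eigenvalues
#equals the total variance of the standardized data, i.e. the trace of the covariance matrix.
print(np.isclose(np.sum(eigen_vals), np.trace(cov_mat)))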
In [11]: #Now we calculate the total and explained variance, print and plot them.
In [12]: tot = sum(eigen_vals)
var_exp = [(i / tot) for i in sorted(eigen_vals, reverse=True)]
cum_var_exp = np.cumsum(var_exp)
In [13]: print(var_exp)
[0.6726801998183738, 0.15210869080035438, 0.08526588324049861, 0.04186911924860997, 0.02250284172984673,
0.009430190579739013, 0.005636285581098437, 0.004969865534965985, 0.003937939965018102, 0.001598983501494715]
In [14]: print(cum_var_exp)
[0.6726802 0.82478889 0.91005477 0.95192389 0.97442673 0.98385693
0.98949321 0.99446308 0.99840102 1. ]
In [15]: import matplotlib.pyplot as plt
plt.bar(range(1, 11), var_exp, alpha=0.9, align='center', label='individual explained variance')
plt.step(range(1, 11), cum_var_exp, where='mid', label='cumulative explained variance')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal component index')
plt.legend(loc='best')
plt.tight_layout()
# plt.savefig('images/05_02.png', dpi=300)
plt.show()
In [ ]: #From the figure above, we can see that most of the data variance is explained
#by the first 2 or 3 principal components.
In [ ]: #Finally, we perform the feature transformation from the initial multidimensional space
#into a new principal component space.
#We use the projection matrix to transform the data onto the lower-dimensional subspace.
#We start by sorting the eigenpairs in decreasing order of the eigenvalues.
#Then, we collect the two eigenvectors that correspond to the two largest eigenvalues to
#capture most of the variance in this rock-chemical dataset.
In [16]: #We create a list of (eigenvalue, eigenvector) tuples:
eigen_pairs = [(np.abs(eigen_vals[i]), eigen_vecs[:, i])
               for i in range(len(eigen_vals))]
#Then we sort the (eigenvalue, eigenvector) tuples:
eigen_pairs.sort(key=lambda k: k[0], reverse=True)
In [17]: w = np.hstack((eigen_pairs[0][1][:, np.newaxis],
                        eigen_pairs[1][1][:, np.newaxis]))
print('Matrix W:\n', w)
Matrix W:
[[-0.37584162 -0.12101843]
[ 0.36437749 -0.06753255]
[ 0.30961264 -0.25640948]
[ 0.36819809 0.08421512]
[ 0.16022243 -0.32713229]
[ 0.32082161 0.37864227]
[ 0.37085905 0.17790127]
[ 0.08650101 -0.71953609]
[-0.34431539 -0.14348562]
[ 0.31995722 -0.29597502]]
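In [ ]: #Since the eigenvectors of the (symmetric) covariance matrix are orthonormal, the columns
#of W should satisfy W^T W = I. This short check is an addition for illustration purposes:
print(np.round(w.T.dot(w), 3))   #expected to be close to the 2x2 identity matrix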
In [ ]: #We have created a 10×2-dimensional projection matrix W from the top two eigenvectors.
#Using the projection matrix, we can transform a sample x (represented as a 1×10-dimensional row vector)
#onto the PCA subspace, obtaining a two-dimensional sample vector consisting of two new features
#(PC1 and PC2):
In [18]: X_train_std[0].dot(w)
Out[18]: array([-3.65015577, -1.35169258])
In [19]: #By generalizing that procedure, we can transform the entire training dataset onto the
#two principal components.
In [20]: X_train_pca = X_train_std.dot(w)
colors = ['r', 'b', 'g', 'c', 'm', 'y']
markers = ['s', 'x', 'o', 'v', '^', '*']
for l, c, m in zip(np.unique(y_train), colors, markers):
    plt.scatter(X_train_pca[y_train == l, 0],
                X_train_pca[y_train == l, 1],
                c=c, label=l, marker=m)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.legend(loc='upper center')
plt.tight_layout()
# plt.savefig('images/05_03.png', dpi=300)
plt.show()
In [ ]: #We note that there are a few outliers in this new representation, in the sense that the clusters
#are not perfectly separated. However, the different rock classes are pretty well
#distinguished in this new algebraic representation.
In [ ]: #USING SCIKIT-LEARN
#
#We can perform the entire PCA analysis using a dedicated class of scikit-learn.
#Now, let's use the PCA class from scikit-learn on the rock training dataset, classify the transformed
#samples via logistic regression, and visualize the results.
In [21]: from sklearn.decomposition import PCA
pca = PCA()
X_train_pca = pca.fit_transform(X_train_std)
pca.explained_variance_ratio_
Out[21]: array([0.6726802 , 0.15210869, 0.08526588, 0.04186912, 0.02250284,
0.00943019, 0.00563629, 0.00496987, 0.00393794, 0.00159898])
In [ ]: #Let us plot again the explained variance ratio.
In [22]: plt.bar(range(1, 11), pca.explained_variance_ratio_, alpha=0.9, align='center')
plt.step(range(1, 11), np.cumsum(pca.explained_variance_ratio_), where='mid')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.show()
In [ ]: #Then we apply the PCA transformation (with 2 components) to our training and test data sets.
In [23]: pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)
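In [ ]: #As an optional cross-check (not in the original code), the rows of pca.components_ span
#the same directions as the columns of the matrix W computed manually above; individual
#eigenvectors may differ by a sign flip, which does not affect the analysis.
print(np.round(pca.components_.T, 3))
print(np.round(w, 3))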
In [ ]: #We can plot our training data set in the PC1-PC2 space, without any color distinction, for now.
In [24]: plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1])
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.show()
In [ ]: #Then, we'll distinguish the different classes, after training a classifier algorithm
#using the first 2 principal components.
In [25]: from matplotlib.colors import ListedColormap
def plot_decision_regions(X, y, classifier, resolution=0.02):
    # setup marker generator and color map
    colors = ['r', 'b', 'g', 'c', 'm', 'y']
    markers = ['s', 'x', 'o', 'v', '^', '*']
    cmap = ListedColormap(colors[:len(np.unique(y))])
    # plot the decision surface (for the moment we skip this step, just commenting it)
    # x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    # x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    # xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
    #                        np.arange(x2_min, x2_max, resolution))
    # Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    # Z = Z.reshape(xx1.shape)
    # plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    # plt.xlim(xx1.min(), xx1.max())
    # plt.ylim(xx2.min(), xx2.max())
    # plot class samples
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0],
                    y=X[y == cl, 1],
                    alpha=0.6,
                    c=cmap(idx),
                    edgecolor='black',
                    marker=markers[idx],
                    label=cl)
In [ ]: #We use a LOGISTIC REGRESSION CLASSIFIER.
#Now we train a logistic regression classifier using the first 2 principal components.
In [26]: from sklearn.linear_model import LogisticRegression
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)
lr = LogisticRegression()
lr = lr.fit(X_train_pca, y_train)
In [ ]: #Here we plot our results in the PCA space.
In [27]: plot_decision_regions(X_train_pca, y_train, classifier=lr)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.legend(loc='upper center')
plt.tight_layout()
# plt.savefig('images/05_04.png', dpi=300)
plt.show()
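In [ ]: #To quantify how well the classifier trained on the first 2 principal components generalizes,
#we can compute its mean accuracy on the training and test sets with the score method of
#scikit-learn. This evaluation step is a small addition to the original tutorial:
print('Training accuracy: %.3f' % lr.score(X_train_pca, y_train))
print('Test accuracy: %.3f' % lr.score(X_test_pca, y_test))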
In [ ]: #Finally, if we set the n_components parameter of PCA to None, it returns all principal components
#in sorted order instead of performing a dimensionality reduction.
In [28]: pca = PCA(n_components=None)
X_train_pca = pca.fit_transform(X_train_std)
pca.explained_variance_ratio_
Out[28]: array([0.6726802 , 0.15210869, 0.08526588, 0.04186912, 0.02250284,
0.00943019, 0.00563629, 0.00496987, 0.00393794, 0.00159898])
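In [ ]: #A further option (worth verifying against the documentation of the installed scikit-learn
#version): n_components also accepts a float between 0 and 1, in which case PCA keeps the
#smallest number of components needed to explain that fraction of the variance, e.g.:
pca_95 = PCA(n_components=0.95, svd_solver='full')
X_train_pca_95 = pca_95.fit_transform(X_train_std)
print(pca_95.n_components_)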
In [ ]: #This tutorial test ends here.
#We learned how to create an entire Python workflow for PCA.
#Furthermore, we learned how to apply PCA directly and quickly, using dedicated libraries
#from Scikit-learn.
#Finally, we applied a Logistic Regression classifier using the first 2 principal components.
In [ ]: #References:
#
#1. Raschka, S. and Mirjalili, V., 2017. Python Machine Learning:
#Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, 2nd Edition.
#Original code modified by Paolo Dell'Aversana and re-adapted for analysis of geo-chemical data.
#
#2. Jolliffe I.T., 2002. Principal Component Analysis,
#Series: Springer Series in Statistics, 2nd ed., Springer, NY, 2002, XXIX, 487 p. 28 illus.
#ISBN 978-0-387-95442-4.