DSBA
Codebook
Preface
Data Science is the art and science of solving real-world problems and making data-driven decisions. It involves an
amalgamation of three aspects, and a good data scientist has expertise in all three of them:
1) Mathematical/ Statistical understanding
2) Coding/ Technology understanding
3) Domain knowledge
A lack of coding expertise should not become an impediment in your Data Science journey. With consistent effort, you
can become fairly proficient in coding over a period of time. This Codebook is intended to help you become
comfortable with the finer nuances of Python, and it can be used as a handy reference for anything related to data
science code throughout the program journey and beyond.
In this document we have followed the same structure for every topic:
- A brief description of the topic
- Followed by a code example
Please keep in mind that there is no single right way to write code to achieve an intended outcome; there are usually
multiple ways of doing things in Python. The examples presented in this document use just one of the possible
approaches to perform the analysis. Please explore different ways of achieving the same result on your own.
Contents
PREFACE
TABLE OF FIGURES
UNSUPERVISED LEARNING
  Clustering
    Partition Clustering: K-Means
    Hierarchical Clustering: Agglomerative
DIMENSIONALITY REDUCTION TECHNIQUES
  Principal Component Analysis
  Dimensionality reduction using Linear Discriminant Analysis
Table of Figures
Figure 1: A Dendrogram
Unsupervised Learning
Clustering
Clustering is the task of grouping similar data points together without using any labels.
Partition Clustering: K-Means
The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as
the inertia or within-cluster sum-of-squares (see below). This algorithm requires the number of clusters to be specified. It scales
well to a large number of samples and has been used across a large range of application areas in many different fields.
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
kmeans.labels_   # cluster numbers assigned to data points
array([1, 1, 1, 0, 0, 0], dtype=int32)   # output

kmeans.predict([[0, 0], [12, 3]])   # assign new points to the nearest centroid
array([1, 0], dtype=int32)   # output

kmeans.cluster_centers_   # cluster centroids
array([[10., 2.],   # output
       [ 1., 2.]])
Source: scikit-learn
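The inertia mentioned in the description above is available on a fitted estimator as the inertia_ attribute. The short sketch below is an illustrative addition rather than part of the scikit-learn example; the loop range is an arbitrary choice. It prints the inertia for a few values of n_clusters on the same toy data, which is the idea behind the common "elbow method" for choosing the number of clusters.

from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

for k in range(1, 5):
    km = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    print(k, km.inertia_)   # within-cluster sum-of-squares for k clusters

As k increases the inertia always decreases; the "elbow" where the decrease levels off is a common heuristic for picking the number of clusters.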
Hierarchical Clustering: Agglomerative
Hierarchical clustering is a general family of clustering algorithms that build nested clusters by merging or splitting them
successively. This hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the unique cluster that gathers
all the samples, the leaves being the clusters with only one sample.
The AgglomerativeClustering object performs a hierarchical clustering using a bottom-up approach: each observation starts in its
own cluster, and clusters are successively merged together. The linkage criteria determine the metric used for the merge strategy.
from sklearn.cluster import AgglomerativeClustering
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
clustering = AgglomerativeClustering().fit(X)
clustering.labels_   # cluster numbers assigned to data points
array([1, 1, 1, 0, 0, 0])   # output
Source: scikit-learn
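The linkage criterion mentioned above is controlled by the linkage parameter of AgglomerativeClustering. The sketch below is an illustrative addition, not part of the scikit-learn example: it fits the same toy data with each of the available options ('ward' is the default) so you can compare the resulting assignments.

from sklearn.cluster import AgglomerativeClustering
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

for linkage in ('ward', 'complete', 'average', 'single'):
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    print(linkage, labels)   # cluster assignments under each merge strategy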
Dendrogram
Plotting the hierarchical clustering as a dendrogram. The dendrogram illustrates how each cluster is composed by drawing a
U-shaped link between a non-singleton cluster and its children. The top of the U-link indicates a cluster merge. The two legs of the
U-link indicate which clusters were merged. The length of the two legs of the U-link represents the distance between the child
clusters. It is also the cophenetic distance between original observations in the two children clusters.
import numpy as np
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

ytdist = np.array([662., 877., 255., 412., 996., 295., 468., 268.,
                   400., 754., 564., 138., 219., 869., 669.])   # a condensed distance matrix
Z = hierarchy.linkage(ytdist, 'single')   # single-linkage hierarchical clustering
plt.figure()
dn = hierarchy.dendrogram(Z)
plt.show()
Figure 1: A Dendrogram
Source: scipy
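Once the linkage matrix Z has been computed, the tree can also be cut into flat clusters instead of (or in addition to) being plotted. The sketch below is an illustrative addition; the choice of three clusters is arbitrary. It uses scipy's fcluster on the same linkage matrix.

import numpy as np
from scipy.cluster import hierarchy

ytdist = np.array([662., 877., 255., 412., 996., 295., 468., 268.,
                   400., 754., 564., 138., 219., 869., 669.])
Z = hierarchy.linkage(ytdist, 'single')

labels = hierarchy.fcluster(Z, t=3, criterion='maxclust')   # cut the tree into at most 3 flat clusters
print(labels)   # flat cluster label for each of the 6 original observations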
Dimensionality Reduction Techniques
Principal Component Analysis
PCA is used to decompose a multivariate dataset into a set of successive orthogonal components that explain a maximum amount of
the variance. In scikit-learn, PCA is implemented as a transformer object that learns n components in its fit method, and can be used
on new data to project it onto these components.
PCA performs linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower-dimensional
space. The input data is centered but not scaled for each feature before applying the SVD.
Source: scikit-learn
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)
PCA(n_components=2)   # output

print(pca.explained_variance_ratio_)   # proportion of variance explained by each component
[0.99244289 0.00755711]   # output

print(pca.singular_values_)
[6.30061232 0.54980396]   # output
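As noted above, the fitted PCA object can project new data onto the learned components via its transform method. A minimal sketch, continuing from the example above (the new points are made-up values used only for illustration):

new_points = np.array([[0.5, 0.5], [-2.5, -1.5]])   # made-up observations
print(pca.transform(new_points))   # coordinates of the new points in the principal-component space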
Method 2: (using the statsmodels library)

from statsmodels.multivariate.pca import PCA

# df stands for the DataFrame on which to apply PCA; ncomp=2 is an example value
# for the number of principal components required
pc = PCA(df, ncomp=2)
pc.factors   # the reduced-dimension data

# Code to draw the scree plot to decide the number of factors:
import matplotlib.pyplot as plt
from statsmodels.multivariate.factor import Factor

model = Factor(df).fit()   # df is the dataset on which you want to apply PCA
model.plot_scree()
plt.show()
Dimensionality reduction using Linear Discriminant Analysis
LDA can be used to perform supervised dimensionality reduction, by projecting the input data to a linear subspace consisting of the
directions which maximize the separation between classes. The dimension of the output is necessarily less than the number of
classes, so this is, in general, a rather strong dimensionality reduction, and only makes sense in a multiclass setting.
This is implemented in discriminant_analysis.LinearDiscriminantAnalysis.transform. The desired dimensionality can be set
using the n_components constructor parameter.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1, -1, 2], [-2, -1, -1], [-3, -2, -3],
              [1, 1, -2], [2, 1, -3], [3, 2, -2]])
y = np.array([1, 1, 1, 2, 2, 2])
lda = LinearDiscriminantAnalysis(n_components=1)
reduced = lda.fit_transform(X, y)   # project X onto the single discriminant direction
reduced
array([[-3.98646358],   # output
       [-2.84747399],
       [-3.70171618],
       [ 2.27797919],
       [ 3.41696878],
       [ 4.84070578]])
Source: scikit-learn
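Since the description above refers to LinearDiscriminantAnalysis.transform, it is worth noting that the fitted lda object can also project new, unseen observations. A minimal sketch, continuing from the example above (the new point is a made-up value used only for illustration):

lda.transform(np.array([[0, 0, 0]]))   # 1-D coordinate of a new point along the discriminant direction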