Tutorial 2 - Clustering

This document is a Jupyter notebook that explores three clustering algorithms: K-Means, DBSCAN, and agglomerative clustering. It loads and explores a driver dataset, applies K-Means to identify 4 clusters and visualizes them, applies min-max normalization and re-runs K-Means to compare the results, applies DBSCAN before and after normalization, and finally performs agglomerative clustering with a dendrogram to identify clusters in the normalized data.


In [13]:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
pd.set_option('display.float_format', lambda x: '%.3f' % x)
%matplotlib inline
import matplotlib.pyplot as plt

In [9]:

data = pd.read_csv("./driver_dataset.csv", sep='\t')

In [10]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4000 entries, 0 to 3999
Data columns (total 3 columns):
Driver_ID 4000 non-null int64
Distance_Feature 4000 non-null float64
Speeding_Feature 4000 non-null float64
dtypes: float64(2), int64(1)
memory usage: 93.8 KB

In [11]:

data.describe()

Out[11]:

            Driver_ID  Distance_Feature  Speeding_Feature
count        4000.000          4000.000          4000.000
mean   3423312447.500            76.042            10.721
std          1154.845            53.470            13.709
min    3423310448.000            15.520             0.000
25%    3423311447.750            45.248             4.000
50%    3423312447.500            53.330             6.000
75%    3423313447.250            65.632             9.000
max    3423314447.000           244.790           100.000


In [26]:

plt.scatter(data.values[:,1:2], data.values[:,2:3])
plt.xlabel(data.columns[1])
plt.ylabel(data.columns[2])
plt.show()

In [28]:

wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=0)
    kmeans.fit(data)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
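As a cross-check on the elbow curve (not in the original notebook), a silhouette analysis can be run over the same range of k; a minimal sketch, reusing the `data` DataFrame loaded above:

from sklearn.metrics import silhouette_score

# Mean silhouette score for each candidate k (higher is better);
# the elbow at k = 4 should roughly coincide with a strong score here.
for k in range(2, 11):
    km = KMeans(n_clusters=k, init='k-means++', random_state=0)
    print(k, silhouette_score(data, km.fit_predict(data)))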

In [52]:

kmeans = KMeans(n_clusters=4, init='k-means++', random_state=0)

# Note: `data` still includes the Driver_ID column, whose scale (std ~1155 vs.
# 53 and 14 for the two features) dominates the Euclidean distances in this fit.
y_kmeans = kmeans.fit_predict(data)


In [53]:

%matplotlib inline
plt.rcParams["figure.figsize"] = (40, 40)
plt.scatter(data.values[:,1], data.values[:,2], c=y_kmeans)

Out[53]:

<matplotlib.collections.PathCollection at 0x7f381ee64ba8>
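The fitted centroids are available as `kmeans.cluster_centers_`; a minimal sketch (not in the original notebook) overlaying them on the scatter above, assuming the 4-cluster fit from In [52]:

# cluster_centers_ has one row per cluster, columns in the same order as `data`.
centers = kmeans.cluster_centers_
plt.scatter(data.values[:,1], data.values[:,2], c=y_kmeans)
plt.scatter(centers[:,1], centers[:,2], c='red', marker='x', s=200)
plt.show()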

In [47]:

from sklearn import preprocessing


# Performing min-max normalization on the feature columns (Driver_ID excluded)
min_max_scaler = preprocessing.MinMaxScaler()
np_scaled = min_max_scaler.fit_transform(data.values[:,1:])
dataN = pd.DataFrame(np_scaled)
dataN.head()

Out[47]:

       0     1
0  0.243 0.280
1  0.161 0.250
2  0.214 0.270
3  0.175 0.220
4  0.170 0.250
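MinMaxScaler rescales each column to [0, 1] via (x - min) / (max - min); a minimal sketch (not in the original notebook) verifying that by hand against `np_scaled`:

# Manual min-max normalization of the two feature columns.
features = data.iloc[:, 1:]
manual = (features - features.min()) / (features.max() - features.min())
print(np.allclose(manual.values, np_scaled))  # True up to floating-point error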

In [50]:

kmeans = KMeans(n_clusters = 4,init = 'k-means++',random_state =0)


y2_kmeans = kmeans.fit_predict(dataN)
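The two K-Means runs can also be compared quantitatively (not done in the original notebook) with the adjusted Rand index, which scores agreement between two labelings:

from sklearn.metrics import adjusted_rand_score

# 1.0 means identical partitions; a low value means normalization
# substantially changed which points were grouped together.
print(adjusted_rand_score(y_kmeans, y2_kmeans))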


In [59]:

%matplotlib inline
plt.scatter(data.values[:,1], data.values[:,2], c=y2_kmeans)

Out[59]:

<matplotlib.collections.PathCollection at 0x7f381c32eda0>

In [ ]:

#DBSCAN STARTS

In [78]:

from sklearn.cluster import DBSCAN


dbscan = DBSCAN(eps=0.1, metric='euclidean', min_samples=5)

In [79]:

dbsc = dbscan.fit(data)
dbsc.labels_

Out[79]:

array([-1, -1, -1, ..., -1, -1, -1])


In [80]:

plt.scatter(data.values[:,1], data.values[:,2], c=dbsc.labels_)

Out[80]:

<matplotlib.collections.PathCollection at 0x7f38142e7550>
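Every point comes back as noise (-1) because eps=0.1 is tiny relative to the unscaled feature ranges (distances here span tens to hundreds of units), so no point gathers min_samples neighbors. A common way to choose eps, not used in the original notebook, is a k-distance plot; a minimal sketch on the normalized data:

from sklearn.neighbors import NearestNeighbors

# Distance from each point to its 5th-nearest neighbor, sorted ascending;
# a knee in this curve is a reasonable eps candidate.
nn = NearestNeighbors(n_neighbors=5).fit(dataN)
distances, _ = nn.kneighbors(dataN)
plt.plot(np.sort(distances[:, -1]))
plt.ylabel('5th-nearest-neighbor distance')
plt.show()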

In [81]:

dbsc = dbscan.fit(dataN)
dbsc.labels_

Out[81]:

array([0, 0, 0, ..., 1, 1, 1])

In [82]:

plt.scatter(data.values[:,1], data.values[:,2], c=dbsc.labels_)

Out[82]:

<matplotlib.collections.PathCollection at 0x7f381437b198>
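On the normalized data the same eps now finds real clusters. A quick way (not in the original notebook) to summarize the labeling:

# Count points per DBSCAN label; a -1 key, if present, is the noise bucket.
labels, counts = np.unique(dbsc.labels_, return_counts=True)
print(dict(zip(labels, counts)))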


In [66]:

# `model` is a leftover variable from an earlier DBSCAN fit on the unnormalized
# data (this cell ran before the ones above); every point was labeled noise (-1).
model.labels_

Out[66]:

array([-1, -1, -1, ..., -1, -1, -1])

In [ ]:

#AGGLOMERATIVE STARTS

In [67]:

from sklearn.cluster import AgglomerativeClustering as AC


# 'auto' (the default) is assumed for compute_full_tree, whose value was cut off.
aggclus = AC(n_clusters=4, affinity='euclidean', linkage='ward', compute_full_tree='auto')
y_aggclus = aggclus.fit_predict(data.values[:,1:3])

In [68]:

y_aggclus

Out[68]:

array([3, 3, 3, ..., 1, 1, 1])

In [69]:

from scipy.cluster.hierarchy import dendrogram, linkage, cut_tree


from scipy.cluster.hierarchy import fcluster
k = 4
linkage_matrix = linkage(dataN, "ward", metric="euclidean")
ddata = dendrogram(linkage_matrix, color_threshold=1.5)

In [83]:

ddata = dendrogram(linkage_matrix, color_threshold=1.5)
# plt.figure() called *after* dendrogram() opens a new, empty figure
# (hence the blank Figure below); call it before dendrogram() to size the plot.
plt.figure(figsize=(5,7))

Out[83]:

<Figure size 360x504 with 0 Axes>

<Figure size 360x504 with 0 Axes>
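The notebook imports cut_tree and fcluster and sets k = 4 but never uses them; a minimal sketch of extracting flat cluster labels from the linkage_matrix built above:

# Force exactly k clusters from the hierarchy (fcluster labels start at 1).
flat_labels = fcluster(linkage_matrix, t=k, criterion='maxclust')

# Equivalent via cut_tree, which returns a column vector of 0-based labels.
cut_labels = cut_tree(linkage_matrix, n_clusters=k).ravel()

plt.scatter(dataN.values[:,0], dataN.values[:,1], c=flat_labels)
plt.show()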
