SPLEX TME 2
Clustering
The goal of this TME is to learn how to use some popular clustering methods (unsupervised
learning) and how to interpret their results.
We will use the scikit-learn Python library (https://scikit-learn.org), which is already installed
on the computers.
Data (simulated data sets + data sets of TME 1)
We explore two data sets downloadable from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml):
• Breast Cancer Wisconsin (Diagnostic) Data Set (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))
• Mice Protein Expression Data Set (https://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression)
Libraries
You will need to load the following packages:
import matplotlib.pyplot as plt
from sklearn import cluster
from sklearn.cluster import KMeans
from sklearn import metrics
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_classification
from sklearn.datasets import make_blobs
from sklearn.datasets import make_moons
Analysis
Before running the analysis on the Breast and Mice data sets, we will first analyse three simulated
data sets to better understand what the different clustering methods do and why they produce
different clusterings. Generate and visualize the artificial data as follows:
# First simulated data set
plt.title("Two informative features, one cluster per class", fontsize='small')
X1, Y1 = make_classification(n_samples=200, n_features=2, n_redundant=0, n_informative=2,
                             n_clusters_per_class=1)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1, s=25, edgecolor='k')
# Second simulated data set
plt.title("Three blobs", fontsize='small')
X2, Y2 = make_blobs(n_samples=200, n_features=2, centers=3)
plt.scatter(X2[:, 0], X2[:, 1], marker='o', c=Y2, s=25, edgecolor='k')
# Third simulated data set
plt.title("Non-linearly separated data sets", fontsize='small')
X3, Y3 = make_moons(n_samples=200, shuffle=True, noise=None, random_state=None)
plt.scatter(X3[:, 0], X3[:, 1], marker='o', c=Y3, s=25, edgecolor='k')
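Note that the snippet above draws all three data sets on the same axes. A minimal sketch (reusing the variables X1 to Y3 defined above) that shows them side by side in subplots instead:
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
panels = [(X1, Y1, "One cluster per class"),
          (X2, Y2, "Three blobs"),
          (X3, Y3, "Two moons")]
for ax, (X, Y, title) in zip(axes, panels):
    # Same plotting conventions as above, one panel per data set.
    ax.scatter(X[:, 0], X[:, 1], marker='o', c=Y, s=25, edgecolor='k')
    ax.set_title(title, fontsize='small')
plt.show()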
Apply the following clustering methods to the three simulated data sets.
Clustering Methods
1. K-means
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
An example of k-means clustering (where k is the number of clusters you want to produce,
and X is the data matrix):
km = KMeans(n_clusters=k, init='k-means++', max_iter=100, n_init=1)
km.fit(X)
You can also visualize the clustering (and compare it to the true class assignment):
plt.scatter(X[:, 0], X[:, 1], s=10, c=km.labels_)
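For instance, a quick sketch on the three-blob data from above (k = 3 here, matching the number of generated centers):
km = KMeans(n_clusters=3, init='k-means++', max_iter=100, n_init=1)
km.fit(X2)
# Left panel: true classes; right panel: k-means clusters.
fig, (ax_true, ax_km) = plt.subplots(1, 2, figsize=(8, 4))
ax_true.scatter(X2[:, 0], X2[:, 1], s=10, c=Y2)
ax_true.set_title("True classes", fontsize='small')
ax_km.scatter(X2[:, 0], X2[:, 1], s=10, c=km.labels_)
ax_km.set_title("K-means clusters", fontsize='small')
plt.show()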
2. Hierarchical clustering
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
An example of hierarchical clustering (where k is the number of clusters you want to produce,
and X is the data matrix):
for linkage in ('ward', 'average', 'complete'):
    clustering = AgglomerativeClustering(linkage=linkage, n_clusters=k)
    clustering.fit(X)
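The loop above fits the three linkage strategies but does not display the results. A possible visualization on the two-moons data (assuming X3 from above and k = 2):
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, linkage in zip(axes, ('ward', 'average', 'complete')):
    clustering = AgglomerativeClustering(linkage=linkage, n_clusters=2)
    clustering.fit(X3)
    # Color each point by its assigned cluster, one panel per linkage.
    ax.scatter(X3[:, 0], X3[:, 1], s=10, c=clustering.labels_)
    ax.set_title("linkage = %s" % linkage, fontsize='small')
plt.show()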
3. Spectral clustering
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html
An example of spectral clustering (where k is the number of clusters you want to produce,
and X is the data matrix):
spectral = cluster.SpectralClustering(n_clusters=k, eigen_solver='arpack',
                                      affinity="nearest_neighbors")
spectral.fit(X)
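The nearest-neighbours affinity lets spectral clustering follow non-convex shapes. As a quick check on the two-moons data (assuming X3 from above and k = 2):
spectral = cluster.SpectralClustering(n_clusters=2, eigen_solver='arpack',
                                      affinity="nearest_neighbors")
spectral.fit(X3)
# Each moon should end up in its own cluster.
plt.scatter(X3[:, 0], X3[:, 1], s=10, c=spectral.labels_)
plt.title("Spectral clustering on the two moons", fontsize='small')
plt.show()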
4. Analyse the results of clustering in terms of the following scores (a computation sketch follows this list):
• Homogeneity: metrics.homogeneity_score()
• Completeness: metrics.completeness_score()
• V-measure: metrics.v_measure_score()
• Adjusted Rand Index: metrics.adjusted_rand_score()
• Silhouette Coefficient: metrics.silhouette_score()
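A minimal sketch of how these scores can be computed; here X, Y and km are placeholder names for the data matrix, the true labels and a fitted clustering model:
labels = km.labels_
# The first four scores compare the clustering to the true labels;
# the Silhouette Coefficient uses only the data and the cluster labels.
print("Homogeneity:   %.3f" % metrics.homogeneity_score(Y, labels))
print("Completeness:  %.3f" % metrics.completeness_score(Y, labels))
print("V-measure:     %.3f" % metrics.v_measure_score(Y, labels))
print("Adjusted Rand: %.3f" % metrics.adjusted_rand_score(Y, labels))
print("Silhouette:    %.3f" % metrics.silhouette_score(X, labels))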
5. What is an optimal clustering method for each simulated data set?
6. Re-run the clustering methods on the Breast cancer and Mice data sets. Do not include the
class variables in the clustering itself, but compare the obtained clusters with the true class
labels (a sketch of this workflow follows).
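As a sketch of this workflow on the Breast cancer data; the file name wdbc.data, its column layout and the use of pandas are assumptions, so adapt them to how you loaded the data in TME 1:
import pandas as pd

# Assumed layout of wdbc.data: column 0 = ID, column 1 = diagnosis (M/B),
# columns 2-31 = the 30 numeric features; the file has no header row.
data = pd.read_csv("wdbc.data", header=None)
X = data.iloc[:, 2:].values                      # class variable excluded
labels_true = (data.iloc[:, 1] == 'M').astype(int).values

km = KMeans(n_clusters=2, init='k-means++', max_iter=100, n_init=1)
km.fit(X)
print("Adjusted Rand Index vs. true diagnosis: %.3f"
      % metrics.adjusted_rand_score(labels_true, km.labels_))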