Clustering Techniques in I.R.

The document discusses different types of clustering algorithms used in information retrieval, including flat and hierarchical clustering. It describes K-means, a common flat clustering algorithm that aims to minimize the distance of documents from their cluster centers. Hierarchical clustering algorithms create cluster hierarchies and do not require pre-specifying the number of clusters. Agglomerative hierarchical clustering is a bottom-up approach that initially treats each document as a singleton cluster and then successively merges clusters. The document also discusses applications of clustering in information retrieval, such as improving search results and interfaces.

Uploaded by

XY Z

FLAT CLUSTERING & HIERARCHICAL CLUSTERING in I.R.

By:
Suraj Jogani (20117101)
Pankaj Agarwal (20117068)
Aditya Kumar Dubey (20117901)

Electrical Engineering
6th Semester

What is clustering?
• Clustering groups a set of documents into subsets, or clusters.
• The goal of a clustering algorithm is to create clusters that are internally coherent but clearly different from each other:
  • documents within a cluster should be as similar as possible, and
  • documents in one cluster should be as dissimilar as possible from documents in other clusters.
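As an illustrative sketch (not from the slides), this goal can be checked numerically by comparing the average distance between documents in the same cluster with the average distance between documents in different clusters. Documents are assumed to be small numeric vectors (tuples), Euclidean distance is assumed as the dissimilarity measure, and the function names are hypothetical.

```python
import math
from itertools import combinations

def avg_intra_distance(clusters):
    """Mean distance over all pairs of documents sharing a cluster (lower = more coherent)."""
    pairs = [(a, b) for c in clusters for a, b in combinations(c, 2)]
    if not pairs:  # only singleton clusters: no within-cluster pairs
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def avg_inter_distance(clusters):
    """Mean distance over all pairs of documents in different clusters (higher = better separated)."""
    pairs = [(a, b) for c1, c2 in combinations(clusters, 2) for a in c1 for b in c2]
    if not pairs:  # a single cluster: no between-cluster pairs
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Two toy clusterings of the same four 2-D "documents".
good = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
bad = [[(0, 0), (5, 5)], [(0, 1), (5, 6)]]

print(avg_intra_distance(good), avg_inter_distance(good))
print(avg_intra_distance(bad), avg_inter_distance(bad))
```

The `good` grouping has small within-cluster distances and large between-cluster distances, which is what the goal asks for; `bad` mixes distant documents in the same cluster.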

Clustering algorithms

Flat algorithms
• Usually start with a random (partial) partitioning
• Refine it iteratively, e.g. by recomputing the cluster centroids
• Examples: K-Means clustering, model-based clustering

Hierarchical algorithms
• Hierarchical algorithms build an explicit hierarchy of clusters: documents are grouped into clusters, those clusters are in turn grouped into larger clusters, and so on, until a complete hierarchy is formed.
• Bottom-up: agglomerative
• Top-down: divisive

Hard vs soft clustering

Another way to classify clustering algorithms:

• Hard clustering: each document belongs to exactly one cluster.
• Soft clustering: a document can belong to more than one cluster.
  Example: you may want to put a pair of sneakers in two clusters, (i) sports apparel and (ii) shoes.

Soft clustering is not covered further; the rest of this presentation deals only with hard clustering.
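To make the distinction concrete, here is a minimal sketch assuming documents and cluster centers are numeric vectors: a hard assignment picks the single nearest center, while a soft assignment gives every cluster a membership weight. The weighting used here, proportional to exp(-squared distance), is an assumed choice that resembles the responsibilities of a Gaussian mixture with equal variances and priors; the two-feature encoding of the sneakers example is hypothetical.

```python
import math

def hard_assign(doc, centers):
    """Hard clustering: index of the single nearest center (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: math.dist(doc, centers[j]))

def soft_assign(doc, centers):
    """Soft clustering: one membership weight per cluster, summing to 1.

    Weights are proportional to exp(-squared distance); subtracting the
    smallest squared distance first avoids underflow for far-away documents.
    """
    sq = [math.dist(doc, c) ** 2 for c in centers]
    best = min(sq)
    scores = [math.exp(-(s - best)) for s in sq]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical features: (sportiness, footwear).
centers = [(1.0, 0.0),  # "sports apparel"
           (0.0, 1.0)]  # "shoes"
sneakers = (1.0, 1.0)   # both sporty and footwear

print(hard_assign(sneakers, centers))  # forced to pick exactly one cluster
print(soft_assign(sneakers, centers))  # equal membership in both clusters
```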

K-Means
• K-Means is the most important flat clustering algorithm.
• Its objective is to minimize the average squared Euclidean distance of documents from their cluster centers, where a cluster center is defined as the mean, or centroid, of the documents in the cluster.
• The first step of K-Means is to select seeds: the initial cluster centers, which are K randomly selected documents.
• The algorithm then moves the cluster centers around in space to minimize RSS (residual sum of squares): the squared distance of each vector from its centroid, summed over all the vectors.
• It does this by repeating two steps until the centroids stop moving: reassign each document to the cluster with the nearest centroid, then recompute each centroid as the mean of the documents assigned to it.
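Written out, the objective reads as follows. The symbols are common textbook notation rather than the slides' own: $\omega_1, \dots, \omega_K$ are the clusters and $\vec{\mu}(\omega)$ is the centroid of cluster $\omega$.

$$\vec{\mu}(\omega) = \frac{1}{|\omega|}\sum_{\vec{x}\in\omega}\vec{x}, \qquad \mathrm{RSS}_k = \sum_{\vec{x}\in\omega_k}\left|\vec{x}-\vec{\mu}(\omega_k)\right|^2, \qquad \mathrm{RSS} = \sum_{k=1}^{K}\mathrm{RSS}_k$$

Because the number of documents $N$ is fixed, minimizing RSS is the same as minimizing the average squared distance $\mathrm{RSS}/N$. For example, a cluster containing the vectors $(0, 0)$ and $(2, 0)$ has centroid $(1, 0)$, so it contributes $\mathrm{RSS}_k = 1^2 + 1^2 = 2$.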

[Figure: a K-Means example with k = 2]
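In place of the slide's k = 2 illustration, here is a minimal K-Means sketch. It assumes documents are plain numeric vectors (tuples) compared with Euclidean distance; the function names and the `seed` and `max_iter` parameters are illustrative. It picks k random documents as seeds, then alternates reassignment (each document goes to its nearest centroid) and recomputation (each centroid becomes the mean of its cluster) until the centroids no longer move.

```python
import math
import random

def mean(cluster):
    """Centroid: the component-wise mean of the vectors in a cluster."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def rss(clusters, centroids):
    """Residual sum of squares: squared distance of each vector from its centroid, summed."""
    return sum(math.dist(x, mu) ** 2
               for members, mu in zip(clusters, centroids)
               for x in members)

def kmeans(docs, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(docs, k)  # seeds: k randomly selected documents
    for _ in range(max_iter):
        # Reassignment: put each document in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in docs:
            nearest = min(range(k), key=lambda j: math.dist(x, centroids[j]))
            clusters[nearest].append(x)
        # Recomputation: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return clusters, centroids

docs = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centroids = kmeans(docs, k=2)
print(clusters)
print("RSS =", rss(clusters, centroids))
```

Because the seeds are random, different seeds can converge to different local minima of RSS; a common remedy is to run K-Means several times and keep the clustering with the lowest RSS.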

Hierarchical clustering

• Hierarchical clustering outputs a hierarchy, a structure that is more informative than the unstructured set of clusters returned by flat clustering.
• Hierarchical clustering does not require us to prespecify the number of clusters, and most hierarchical algorithms are deterministic.
• Many researchers believe that hierarchical clustering produces better clusters than flat clustering, although there is no consensus on this. Its advantages come at the cost of lower efficiency.

Hierarchical agglomerative clustering

• Hierarchical clustering algorithms are either top-down or bottom-up.
• Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge pairs of clusters until all clusters have been merged into a single cluster that contains all documents.
• Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering (HAC).
• Four agglomerative algorithms are presented here; they differ in how the similarity of two clusters is measured:
  1. Single link
  2. Complete link
  3. Group average
  4. Centroid similarity
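The bottom-up procedure can be sketched as a naive HAC loop. Assumptions not taken from the slides: documents are numeric vectors, Euclidean distance is used as the dissimilarity (so the most similar pair of clusters is the pair with the smallest linkage distance), and "average" averages only the cross-cluster pairs, which is one common variant of group-average linkage (some textbook definitions average over all pairs in the merged cluster instead). The loop recomputes every linkage on each pass, so it only suits small inputs.

```python
import math
from itertools import combinations

def centroid(cluster):
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

# Cluster-to-cluster distance for each merge criterion (smaller = more similar).
LINKAGES = {
    "single": lambda A, B: min(math.dist(a, b) for a in A for b in B),
    "complete": lambda A, B: max(math.dist(a, b) for a in A for b in B),
    "average": lambda A, B: sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B)),
    "centroid": lambda A, B: math.dist(centroid(A), centroid(B)),
}

def hac(docs, linkage="single"):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    link = LINKAGES[linkage]
    clusters = [[d] for d in docs]  # every document starts as a singleton
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters under the chosen linkage.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], link(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return merges

docs = [(0, 0), (0, 1), (5, 5), (5, 7)]
for a, b, dist in hac(docs, "single"):
    print(a, "+", b, "at distance", round(dist, 3))
```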

Single-link and complete-link clustering

Single-link clustering: in single-link (single-linkage) clustering, the similarity of two clusters is the similarity of their most similar members.

Complete-link clustering: in complete-link (complete-linkage) clustering, the similarity of two clusters is the similarity of their most dissimilar members.
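A small worked example of the two definitions, assuming documents are points compared by Euclidean distance, so "most similar" means "closest" and "most dissimilar" means "farthest". The clusters `A` and `B` are made up for illustration.

```python
import math

def single_link(A, B):
    """Distance between the most similar (closest) pair of members."""
    return min(math.dist(a, b) for a in A for b in B)

def complete_link(A, B):
    """Distance between the most dissimilar (farthest) pair of members."""
    return max(math.dist(a, b) for a in A for b in B)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(2.0, 0.0), (10.0, 0.0)]

print(single_link(A, B))    # 1.0: (1, 0) and (2, 0) are the closest pair
print(complete_link(A, B))  # 10.0: (0, 0) and (10, 0) are the farthest pair
```

Single link rates A and B as very close because of one nearby pair, even though B also contains the distant point (10, 0); complete link is dominated by that distant point. This is why single link tends to produce long, chained clusters, while complete link prefers compact ones.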

Divisive clustering

• The top-down variant of hierarchical clustering is called divisive clustering.
• We start at the top with all documents in one cluster and split that cluster using a flat clustering algorithm.
• This procedure is applied recursively until each document is in its own singleton cluster.
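A minimal recursive sketch of top-down clustering, assuming numeric vectors and Euclidean distance, with a simple 2-means split as the flat clustering algorithm (one possible choice; the slides do not name one). Seeding is deterministic here for reproducibility, using the first document and the document farthest from it; that is an assumption, not part of the method. The result is a nested list: each internal node is a [left, right] pair and each leaf is a single document.

```python
import math

def mean(cluster):
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def two_means(points, iters=10):
    """Split points into two non-empty groups with a few rounds of 2-means."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))  # farthest from c0
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    if not left or not right:
        # Degenerate case (e.g. identical documents): split off the first one.
        return points[:1], points[1:]
    return left, right

def divisive(points):
    """Recursively split until every document is in its own singleton cluster."""
    if len(points) == 1:
        return points[0]
    left, right = two_means(points)
    return [divisive(left), divisive(right)]

docs = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(divisive(docs))
```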

Why cluster documents?

• Better user interface
• Better search results
  • Effective "user recall" (relevant documents retrieved) will be higher
• Faster search

Some applications of clustering in information retrieval

Application | What is clustered? | Benefit
Search result clustering | search results | more effective information presentation to user
Scatter-Gather | (subsets of) collection | alternative user interface: "search without typing"
Collection clustering | collection | effective information presentation for exploratory browsing
Language modeling | collection | increased precision and/or recall
Cluster-based retrieval | collection | higher efficiency: faster search
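A minimal sketch of the cluster-based retrieval row, under stated assumptions: the clusters are already computed (for example by K-Means), documents and the query are numeric vectors, and Euclidean distance stands in for the query-document similarity measure. Only the documents in the cluster whose centroid is closest to the query are ranked, which is where the speed-up comes from. The function names are hypothetical.

```python
import math

def mean(vectors):
    return tuple(sum(xs) / len(vectors) for xs in zip(*vectors))

def build_index(clusters):
    """Precompute one centroid per cluster (done once, offline)."""
    return [(mean(c), c) for c in clusters]

def cluster_search(query, index, top_n=3):
    """Pick the nearest centroid, then rank only that cluster's members."""
    _, members = min(index, key=lambda entry: math.dist(query, entry[0]))
    return sorted(members, key=lambda d: math.dist(query, d))[:top_n]

clusters = [[(1, 1), (1, 2), (2, 1)],
            [(8, 8), (9, 8), (8, 9)]]
index = build_index(clusters)
print(cluster_search((8.6, 8.2), index, top_n=2))
```

The trade-off is that a relevant document outside the chosen cluster is never ranked, so the speed gain is bought with some risk to recall.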
Thank you