Advanced IR Clustering Techniques

Uploaded by

rm23082001

Classification Methods & Cluster Hypothesis
Information Retrieval CC4151
Classification Methods

• In the context of information retrieval, a classification is always carried out for a purpose.
• The purpose may be to group the documents so that retrieval is faster, or alternatively to construct a thesaurus automatically.
• There are two main areas of application of classification methods in IR:
(1) keyword clustering;
(2) document clustering.
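
As a rough illustration of keyword clustering for automatic thesaurus construction, the sketch below groups terms that co-occur in at least two documents. The toy documents, the `min_count` threshold, and the function names are assumptions made for this example; linking co-occurring terms into connected components is only one of many possible ways to cluster keywords.

```python
from collections import defaultdict
from itertools import combinations

# Toy collection (hypothetical data, for illustration only).
docs = [
    "car engine wheel",
    "car wheel road",
    "engine car road",
    "apple banana fruit",
    "banana fruit juice",
]

def cooccurrence_counts(documents):
    """Count, for each pair of distinct terms, the documents containing both."""
    counts = defaultdict(int)
    for text in documents:
        terms = sorted(set(text.split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts

def keyword_clusters(documents, min_count=2):
    """Group terms linked by at least `min_count` co-occurrences."""
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    # Register every term, so unlinked terms become singleton clusters.
    for text in documents:
        for term in text.split():
            find(term)
    # Merge the groups of any two terms that co-occur often enough.
    for (a, b), n in cooccurrence_counts(documents).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return sorted(sorted(group) for group in groups.values())

clusters = keyword_clusters(docs)
print(clusters)
```

Terms that end up in the same group are candidates for the same thesaurus class; terms that never co-occur often enough stay in singleton groups.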
Clustering and Cluster Hypothesis

• Clustering is used in information retrieval systems to enhance the efficiency and effectiveness of the retrieval process. It is achieved by partitioning the documents in a collection into classes such that documents that are associated with each other are assigned to the same cluster.
• The cluster hypothesis is an assumption about the nature of the data being handled, and it takes various forms. In information retrieval, it states that documents that are clustered together "behave similarly with respect to relevance to information needs".
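
One simple way to probe the cluster hypothesis on a labelled sample is to check how often a document's most similar neighbour has the same relevance label. The sketch below does this with cosine similarity over raw term counts. The documents, the relevance labels, and the helper names are hypothetical and chosen only to make the idea concrete.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical documents with hand-assigned relevance labels for one query.
docs = {
    "d1": ("cluster hypothesis document relevance", True),
    "d2": ("document cluster relevance retrieval", True),
    "d3": ("cluster retrieval relevance hypothesis", True),
    "d4": ("football match score goal", False),
    "d5": ("goal score football league", False),
}
vectors = {name: Counter(text.split()) for name, (text, _) in docs.items()}

def nearest_neighbour(name):
    """Return the other document most similar to `name`."""
    others = [n for n in vectors if n != name]
    return max(others, key=lambda n: cosine(vectors[name], vectors[n]))

def agreement_rate():
    """Fraction of documents whose nearest neighbour has the same relevance label."""
    hits = sum(docs[n][1] == docs[nearest_neighbour(n)][1] for n in docs)
    return hits / len(docs)

print(agreement_rate())
```

A rate close to 1 on a realistic sample would be consistent with the hypothesis; a rate near the share of the majority label would suggest that similarity says little about relevance for that collection.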
Applications of Clustering
Application               What is clustered?       Benefit
Search result clustering  search results           more effective information presentation to user
Scatter-Gather            (subsets of) collection  alternative user interface: "search without typing"
Collection clustering     collection               effective information presentation for exploratory browsing
Language modeling         collection               increased precision and/or recall
Cluster-based retrieval   collection               higher efficiency: faster search

Search Result Clustering
• By search results we mean the documents that were returned in response to a query.
• The default presentation of search results in information retrieval is a simple list.
• Users scan the list from top to bottom until they have found the information they are looking for. Search result clustering instead groups the search results, so that similar documents appear together.
• It is often easier to scan a few coherent groups than many individual documents.
• This is particularly useful if a search term has different word senses (for example, "jaguar" the animal versus the car).
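
The grouping step can be sketched with a single-pass ("leader") clustering of result snippets: each result joins the first group whose first member is similar enough, otherwise it starts a new group. The snippets, the 0.3 threshold, and the function names are assumptions for this example; the slides do not prescribe a particular algorithm.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical result snippets for the ambiguous query "jaguar".
results = [
    "jaguar car dealer luxury car price",
    "jaguar big cat habitat rainforest",
    "luxury car review jaguar engine price",
    "rainforest big cat jaguar prey",
]

def cluster_results(snippets, threshold=0.3):
    """Single-pass clustering: join the first cluster whose leader is similar enough."""
    vectors = [Counter(s.split()) for s in snippets]
    clusters = []  # each cluster is a list of result indices; index 0 is the leader
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            clusters.append([i])
    return clusters

groups = cluster_results(results)
for group in groups:
    print([results[i] for i in group])
```

With these snippets the car-related and animal-related results fall into separate groups, which is the effect the word-sense remark above describes.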
Scatter-Gather

• Scatter-Gather clusters the whole collection to obtain groups of documents that the user can select, or gather.
• The selected groups are merged and the resulting set is clustered again. This process is repeated until a cluster of interest is found.
• Example: A collection of New York Times news stories is clustered ("scattered") into eight clusters (top row). The user manually gathers three of these into a smaller collection, International Stories, and performs another scattering operation. This process repeats until a small cluster with relevant documents is found (e.g., Trinidad).
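
The scatter, gather, and re-scatter loop can be sketched as follows. The `scatter` step here is a deliberately simple stand-in for a real clustering algorithm (farthest-first seed selection followed by assignment to the most similar seed, using Jaccard similarity), and the user's choices are simulated by the hypothetical `pick` helper. The documents and all names are assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of terms."""
    return len(a & b) / len(a | b) if a | b else 0.0

def scatter(doc_ids, term_sets, k):
    """Partition doc_ids into at most k clusters around farthest-first seeds."""
    seeds = [doc_ids[0]]
    while len(seeds) < min(k, len(doc_ids)):
        candidates = [d for d in doc_ids if d not in seeds]
        # Next seed: the document least similar to the seeds chosen so far.
        seeds.append(min(
            candidates,
            key=lambda d: max(jaccard(term_sets[d], term_sets[s]) for s in seeds),
        ))
    clusters = [[] for _ in seeds]
    for d in doc_ids:
        best = max(range(len(seeds)),
                   key=lambda i: jaccard(term_sets[d], term_sets[seeds[i]]))
        clusters[best].append(d)
    return clusters

def gather(clusters, chosen):
    """Merge the clusters the user selected into one smaller working set."""
    return sorted(d for i in chosen for d in clusters[i])

# Hypothetical mini-collection of news stories.
docs = [
    "world iraq war",
    "world trinidad election",
    "world trinidad carnival",
    "sports football cup",
    "sports tennis cup",
    "world iraq troops",
]
term_sets = [set(text.split()) for text in docs]

def pick(clusters, term):
    """Simulated user: choose every cluster in which some document mentions `term`."""
    return [i for i, c in enumerate(clusters) if any(term in term_sets[d] for d in c)]

round1 = scatter(list(range(len(docs))), term_sets, k=2)   # scatter the collection
working = gather(round1, pick(round1, "world"))            # gather the world stories
round2 = scatter(working, term_sets, k=2)                  # scatter again
final = gather(round2, pick(round2, "trinidad"))           # gather the Trinidad cluster
print([docs[d] for d in final])
```

In an interactive system the `pick` calls would be replaced by the user's selections, and each cluster would be shown with a short summary such as its most frequent terms.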
Collection Clustering

• In the IR sense, collection clustering groups the documents of a whole collection so that they can be presented effectively for exploratory browsing.
• The term "clustered collection" is also used in database storage (for example in MongoDB), where documents are stored ordered by the clustered index key value.
• Such clustered collections have the following benefits compared to non-clustered collections:
  - Faster queries without needing a secondary index, such as queries with range scans and equality comparisons on the clustered index key.
  - A lower storage size, which improves performance for queries and bulk inserts.
  - Additional performance improvements for inserts, updates, deletes, and queries.
Language Modelling

• A common suggestion to users for coming up with good queries is to think of words that would likely appear in a relevant document, and to use those words as the query. The language modelling approach to IR directly models that idea: a document is a good match to a query if the document model is likely to generate the query, which will in turn happen if the document contains the query words often. This approach thus provides a different realization of some of the basic ideas for document ranking.
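
The idea that a document matches a query when its model is likely to generate the query can be sketched with a unigram query-likelihood scorer. The smoothing used here (a linear mix of the document estimate and the collection estimate with weight `lam`) is an assumption added so that a missing query word does not zero out the score; the documents and names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini-collection.
docs = {
    "d1": "clustering groups similar documents together",
    "d2": "language models generate query words from documents",
    "d3": "the query words appear often in relevant documents query",
}

doc_counts = {name: Counter(text.split()) for name, text in docs.items()}
collection = Counter()
for counts in doc_counts.values():
    collection.update(counts)
collection_len = sum(collection.values())

def query_log_likelihood(query, name, lam=0.5):
    """log P(query | document model), mixing document and collection estimates."""
    counts = doc_counts[name]
    doc_len = sum(counts.values())
    score = 0.0
    for term in query.split():
        p_doc = counts[term] / doc_len                 # estimate from this document
        p_coll = collection[term] / collection_len     # estimate from the collection
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term does not occur anywhere in the collection
        score += math.log(p)
    return score

def rank(query):
    """Order documents by how likely their model is to generate the query."""
    return sorted(docs, key=lambda name: query_log_likelihood(query, name), reverse=True)

print(rank("query words"))
```

Here `d3` ranks first for the query "query words" because it contains the query words most often relative to its length, and `d1` ranks last because it contains neither word.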
Example: Finite Automata
Cluster-based Retrieval

• Cluster-based information retrieval is an information retrieval (IR) technique that organizes web documents, extracts features from them, and categorizes them according to their similarity.
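
A minimal sketch of how cluster-based retrieval can make search faster: the query is first compared with one centroid per cluster, and only the documents of the closest cluster are then scored. The pre-built clusters, their labels, and the function names are assumptions for this example; in practice the clusters would come from a clustering step over the collection.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical, already-clustered collection (cluster label -> documents).
clusters = {
    "animals": {"a1": "jaguar cat rainforest prey", "a2": "lion cat savanna prey"},
    "cars": {"c1": "jaguar car engine price", "c2": "sedan car engine fuel"},
}

def centroid(texts):
    """Sum of term-frequency vectors; cosine is unaffected by the missing 1/n scaling."""
    total = Counter()
    for text in texts:
        total.update(text.split())
    return total

centroids = {label: centroid(members.values()) for label, members in clusters.items()}

def cluster_based_search(query):
    """Pick the closest centroid, then rank only that cluster's documents."""
    q = Counter(query.split())
    best = max(centroids, key=lambda label: cosine(q, centroids[label]))
    members = clusters[best]
    ranked = sorted(members,
                    key=lambda d: cosine(q, Counter(members[d].split())),
                    reverse=True)
    return best, ranked

print(cluster_based_search("jaguar engine"))
```

The saving comes from scoring one centroid per cluster plus the members of a single cluster instead of every document; the trade-off is that relevant documents in other clusters are not seen.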
