Module 2

Uploaded by

omkarchandgaonkar2

What is the Fuzzy Set Model?

When we search for something (like on Google), we use keywords. But these keywords don’t
always perfectly describe what we want. Similarly, documents may not have exact keywords even
if they are related.
So, the match is not exact — it’s vague or approximate.
The Fuzzy Set Model is used to handle this vagueness.
• Every query term (like a keyword) is treated as a fuzzy set.
• Each document has a degree of membership in that fuzzy set.
• Instead of saying a document is relevant or not (0 or 1), we say it's partially relevant, like
0.6 or 0.9.
Fuzzy Set Theory
• In normal logic (Boolean), something is either in a set or not (0 or 1).
• In fuzzy logic, it can be partially in. For example:
o Membership = 1 → fully in the set (very relevant)
o Membership = 0 → not in the set at all (not relevant)
o Membership = 0.5 → somewhat in the set (partially relevant)

Why Use Fuzzy Sets in IR?

Because relevance is not always black or white. Documents might be:
• Strongly relevant
• Partially relevant
• Slightly related
Fuzzy sets let us measure this degree and rank results more accurately.
Fuzzy Operations
These help combine or modify sets:

Operation | Meaning
Complement | Opposite of the set (e.g., not relevant documents)
Union | Combines multiple sets (e.g., documents relevant to term A or B)
Intersection | Common elements (e.g., documents relevant to A and B)
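
The three operations can be sketched in a few lines. The notes do not say how the operations are computed, so this sketch assumes the standard fuzzy-set definitions: complement is 1 minus the membership, union takes the maximum, and intersection takes the minimum. The membership values and the names `relevant_to_a` and `relevant_to_b` are made up for illustration.

```python
# Fuzzy sets as dicts: document id -> membership degree in [0, 1].
# All membership values here are illustrative assumptions.
relevant_to_a = {"D1": 0.9, "D2": 0.5, "D3": 0.2}
relevant_to_b = {"D1": 0.3, "D2": 0.8, "D3": 0.0}


def complement(fuzzy_set):
    """NOT: membership becomes 1 - mu (rounded to hide float noise)."""
    return {doc: round(1.0 - mu, 2) for doc, mu in fuzzy_set.items()}


def union(a, b):
    """OR: keep the larger of the two memberships."""
    docs = sorted(a.keys() | b.keys())
    return {doc: max(a.get(doc, 0.0), b.get(doc, 0.0)) for doc in docs}


def intersection(a, b):
    """AND: keep the smaller of the two memberships."""
    docs = sorted(a.keys() | b.keys())
    return {doc: min(a.get(doc, 0.0), b.get(doc, 0.0)) for doc in docs}


not_a = complement(relevant_to_a)
a_or_b = union(relevant_to_a, relevant_to_b)
a_and_b = intersection(relevant_to_a, relevant_to_b)
print(not_a, a_or_b, a_and_b)
```

Unlike Boolean sets, every document stays in the result with some degree, so the output can still be sorted by membership.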

Prof. Harsha Zope

Example
Let’s say you search for:
Query : "healthy snacks"
And we have documents:

Document | Relevance (membership)
D1 | 0.9 (highly relevant)
D2 | 0.5 (partially relevant)
D3 | 0.2 (barely relevant)

The fuzzy model uses these partial scores to sort documents better than just saying "yes" or "no".

• It uses degrees of relevance (not just 0 or 1).

• Helps in giving more realistic and flexible search results.

What is Fuzzy Information Retrieval?

Fuzzy Information Retrieval (FIR) is used when exact keyword matching isn't enough. It helps
the system find relevant documents even when:
• A word is misspelled
• A word is close in meaning
• A word has similar spelling

1. Query Expansion:
o The system takes your search query and adds similar or related terms using a
thesaurus or dictionary.
o This helps find more documents that may be useful (even if they don’t exactly
match your keywords).
Example:
You search for "computer", and the system also looks for:
o "compute"
o "compiter"
o "computter"
...even if those were typos or spelling variations.

2. Handling Spelling Mistakes:

o Fuzzy search helps when you type a word wrong.
o It matches words that are similar in spelling and character positions.
Example:
Search for: comptuer
It finds: computer, commuter, compter, etc.

3. Term-Term Correlation Matrix:

o The system uses a special matrix to see how closely related different words are.
o It calculates a score (like a degree of relevance) for each document.

4. Algebraic Operations (instead of Boolean):

o Instead of simple AND/OR logic (Boolean), fuzzy retrieval uses math operations
like sum and product to calculate how relevant a document is.
o This gives a more gradual score — not just "yes" or "no".

5. Trade-off:
o Increases recall → finds more relevant documents.
o Decreases precision → may include some less relevant ones too.
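
Points 3 and 4 can be made concrete with a small sketch. The notes give no formulas, so this assumes the formulation commonly used in IR textbooks: the correlation between two terms is the number of documents containing both, divided by the number containing either, and a document's membership for a query term is an algebraic sum (1 minus a product) over the terms in the document. The tiny collection `docs` is hypothetical.

```python
# Hypothetical collection: document id -> set of index terms.
docs = {
    "D1": {"computer", "laptop"},
    "D2": {"laptop", "keyboard"},
    "D3": {"computer", "laptop", "keyboard"},
    "D4": {"banana"},
}


def correlation(term_a, term_b, docs):
    """Term-term correlation: n_ab / (n_a + n_b - n_ab)."""
    n_a = sum(term_a in terms for terms in docs.values())
    n_b = sum(term_b in terms for terms in docs.values())
    n_ab = sum(term_a in terms and term_b in terms for terms in docs.values())
    denom = n_a + n_b - n_ab
    return n_ab / denom if denom else 0.0


def membership(query_term, doc_terms, docs):
    """Degree to which a document belongs to the fuzzy set of query_term.

    Algebraic sum: 1 - product over the document's terms of (1 - correlation).
    """
    product = 1.0
    for term in doc_terms:
        product *= 1.0 - correlation(query_term, term, docs)
    return 1.0 - product


for doc_id, terms in docs.items():
    print(doc_id, round(membership("computer", terms, docs), 3))
```

D2 never mentions "computer" but still gets a non-zero score, because its terms often co-occur with "computer" elsewhere in the collection. This is the recall gain, and the precision risk, described in point 5.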

Simple Example:
You search: “computer”
The system will also include documents with:
• "computter" (typo)
• "compute" (related)
• "compiter" (misspelling)
It would probably not include "commuter", which is spelled similarly but has a different meaning, unless the user asks for it.
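
One common way to measure spelling closeness is edit distance. The notes do not name a specific measure, so the sketch below assumes plain Levenshtein distance (insertions, deletions, substitutions) and turns it into a score between 0 and 1. The `vocabulary` list is a hypothetical set of index terms.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map the distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


vocabulary = ["computer", "commuter", "compute", "company"]  # hypothetical
query = "comptuer"
ranked = sorted(vocabulary, key=lambda term: similarity(query, term), reverse=True)
print(ranked[0], similarity(query, ranked[0]))
```

Spelling distance alone cannot tell that "commuter" means something different. It only scores lower here because it is further from the typed word, so a real system would usually combine this score with other evidence.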

Concept | Simple Meaning
Fuzzy Search | Finds similar words, not just exact ones
Query Expansion | Adds related terms to your query
Spelling Match | Helps find results even with typos
Term Correlation | Measures how words relate
Algebraic Matching | Uses math, not just yes/no logic
Recall ↑, Precision ↓ | Finds more, but may be less accurate

What is the Extended Boolean Model?

The Extended Boolean Model is an improved version of the traditional Boolean model. It keeps
the AND/OR/NOT logic from Boolean searches, but also adds ranking and term weights, just like
in the Vector Space Model.

Problem with Traditional Boolean Model:

• Only gives yes/no answers: either a document matches or it doesn’t.
• Doesn’t rank results: all matching documents are treated equally.
• Can give too many or too few results.
• No concept of importance of a word.

What Does the Extended Model Add?

1. Partial Matching:
o Even if a document doesn't match all terms, it can still be somewhat relevant and
appear in results.
2. Term Weighting:
o Words are given weights to show how important they are.
o The weight is usually a number between 0 and 1.
o Higher weight = more important in the document or query.
3. Ranking Results:
o The system calculates a score for each document.
o Documents are shown in order from most to least relevant.

Simple Example
Let’s say you search for:
Query: apple AND juice
Boolean Model:
• Only shows documents that have both words.
• No ranking.
Extended Boolean Model:
• Gives higher score to documents that:
o Have both words many times.
o Have "apple" or "juice" as important keywords.
• Returns documents ranked by relevance.

Term Weights in Documents

• A document that mentions “apple” 10 times and “juice” 1 time:
o Weight of “apple” = 0.9
o Weight of “juice” = 0.2
• These weights tell the system how strongly each term is related to the document.
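
To show how such weights could turn into a score, here is a minimal sketch. The notes do not give a scoring formula, so this assumes the p-norm formulation usually associated with the Extended Boolean Model, with p = 2. The function names `sim_and` and `sim_or` are illustrative; the weights are the ones from the example above.

```python
def sim_or(weights, p=2.0):
    """OR score: p-norm distance from the worst point (all weights 0)."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)


def sim_and(weights, p=2.0):
    """AND score: 1 minus the p-norm distance from the ideal point (all weights 1)."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)


doc_weights = {"apple": 0.9, "juice": 0.2}  # weights from the example above
weights = list(doc_weights.values())

and_score = sim_and(weights)  # query: apple AND juice
or_score = sim_or(weights)    # query: apple OR juice
print(round(and_score, 3), round(or_score, 3))
```

The AND score is lower than the OR score because the weak "juice" weight is penalised, but it is not zero, which is the partial matching described earlier.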

Summary Table:

Feature | Boolean Model | Extended Boolean Model
Logic | AND/OR/NOT | AND/OR/NOT + partial match
Ranking | No | Yes
Term Importance | All equal | Weighted
Result Type | Exact match only | Ranked & flexible
Matching | All or nothing | Allows "close enough"

• The Extended Boolean Model is a mix of Boolean and Vector Space Models.
• It helps improve flexibility, relevance, and user experience.

What is Structured Text Retrieval?
Structured Text Retrieval is a model that allows searching based on both content and structure
of the document.
Instead of just searching for words, you can also search based on:
• Where the word appears (e.g., title, heading, caption)
• How it is formatted (e.g., italic, bold)
• The section or page it appears in (e.g., inside a figure or a table)

Why Do We Need It?

Sometimes, people remember more than just the keywords.
They remember how or where those words appeared in the document.

Example:
Let’s say a user remembers:
“I saw the words atomic holocaust in italics next to a figure with the word earth in its label.”

Normal search:
Using a basic search like:
"atomic holocaust" AND "earth"
This finds any document that contains both phrases, which usually returns too many results.

Structured Text Retrieval:

You can search with structure, like:
same-page( near("atomic holocaust", Figure(label("earth")) ) )
This means:
“Find a page where atomic holocaust is near a figure whose label has the word earth.”

Now the result is much more accurate!

What Does It Support?

Feature | Meaning
Text search | Find words or phrases
Structure | Specify where it appears (title, figure, table, etc.)
Formatting | Italics, bold, headings
Proximity | Words near each other or on the same page

Summary

Concept | Easy Meaning
Structured Retrieval | Search using both words and layout/structure
Classic Search | Only matches words, no structure
Better Precision | Helps narrow down search to exact format/location
Real-life Use | Helpful for users who remember layout, not exact text

2.6.2.1 Model Based on Non-Overlapping Lists

What it means:
• The document is split into parts that do not overlap, like:
o Chapter list
o Section list
o Subsection list
(Each is stored as a separate list)

Example:
• A book has:
o Chapter 1, Chapter 2, ...
o Section 1.1, Section 1.2, ...
o Subsection 1.1.1, 1.1.2, ...
Each list (chapters, sections, etc.) is stored separately, and within each list, the parts don’t
overlap.

How it works:
• An index is built so that you can search for:
o Which chapter contains the word “virus”
o Which sections do not contain any subsection
o Which paragraph stands alone (not inside a section)
Key Points:

Feature | Explanation
Non-overlapping | Text regions in the same list don't overlap
Separate lists | Chapters, sections, etc. are stored independently
Simple queries | Search within or outside certain parts
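
A minimal sketch of the idea, under stated assumptions: each list is stored as (name, start, end) text regions, and a word index maps each word to its positions in the text. The region boundaries, the positions of "virus", and the helper name `regions_containing` are all hypothetical.

```python
# Each list holds (name, start, end) regions; regions within one list never overlap.
chapters = [("Chapter 1", 0, 499), ("Chapter 2", 500, 999)]
sections = [
    ("Section 1.1", 0, 249),
    ("Section 1.2", 250, 499),
    ("Section 2.1", 500, 999),
]

# Hypothetical word index: word -> positions where it occurs in the text.
word_positions = {"virus": [120, 610, 640]}


def regions_containing(word, regions, index):
    """Return the names of regions in one list that contain the word."""
    positions = index.get(word, [])
    return [name for name, start, end in regions
            if any(start <= pos <= end for pos in positions)]


virus_chapters = regions_containing("virus", chapters, word_positions)
virus_sections = regions_containing("virus", sections, word_positions)
print(virus_chapters, virus_sections)
```

Because each list is queried on its own, the same word lookup answers "which chapter" and "which section" without the lists needing to know about each other.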

2.6.2.2 Model Based on Proximal Nodes

What it means:
• The document is structured into hierarchies (like trees):
o Chapter → Section → Paragraph → Line
• These are called nodes.
• Each node points to a part of the text.
Example:
You can define two different hierarchies:
• One based on chapters/sections
• Another based on pages/paragraphs
How queries work:

• A user query may refer to different hierarchies, but the answer is taken from only one
hierarchy.
• For example:

o You can search “paragraphs inside sections”

o But not “lines from both pages and sections”

This rule helps make the search faster (but with less flexibility).

Feature | Explanation
Hierarchical | Chapters → Sections → Paragraphs, etc.
Nodes | Each part is a node that covers some text
One hierarchy per query | Results come from a single structure for speed

Final Summary Table:

Model | Simple Meaning | Key Feature
Non-Overlapping Lists | Document is split into flat, separate parts | No overlapping in the same list
Proximal Nodes | Document is structured like a tree (chapter → section → paragraph) | Queries return results from one hierarchy only

Top Part: Hierarchical Structure
• The document is broken into 4 levels:
1. Chapter
2. Sections
3. Subsections
4. Subsubsections
These are connected like a tree — each chapter has sections, each section has subsections, and so
on.

Bottom Part: Inverted List for the Word ‘holocaust’

• The word ‘holocaust’ is stored in an inverted list.
• It points to all the places (positions) in the document where the word appears.

Example:
holocaust → 10 → 256 → ... → 48,324
This means:
• The word 'holocaust' appears at position 10, then at 256, and so on — all through the
document.

How Searching Works (Query Example):

Suppose we ask:
Find all sections, subsections, or subsubsections that contain the word 'holocaust'.

The system does this in 2 steps:

1. Find the word in the inverted list (like position 10, 256, etc.).
2. Check the hierarchy to see which section, subsection, or subsubsection that position
belongs to.
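
The two steps can be sketched as follows. This is only an illustration: the `Node` class, the text spans of each node, and the tree layout are assumptions, and only the first two positions of the inverted list (10 and 256) are used.

```python
class Node:
    """One node of the hierarchy, covering text positions start..end."""

    def __init__(self, kind, name, start, end, children=None):
        self.kind = kind          # "chapter", "section", "subsection", ...
        self.name = name
        self.start = start
        self.end = end
        self.children = children or []


def nodes_containing(node, position, kinds):
    """Step 2: walk down the tree and collect nodes of the wanted kinds
    whose text span covers the given position."""
    if not (node.start <= position <= node.end):
        return []
    found = [node.name] if node.kind in kinds else []
    for child in node.children:
        found.extend(nodes_containing(child, position, kinds))
    return found


# Hypothetical hierarchy with made-up text spans.
chapter = Node("chapter", "Chapter 1", 0, 999, [
    Node("section", "Section 1.1", 0, 399, [
        Node("subsection", "Subsection 1.1.1", 0, 199),
        Node("subsection", "Subsection 1.1.2", 200, 399),
    ]),
    Node("section", "Section 1.2", 400, 999),
])

# Step 1: positions taken from the inverted list for 'holocaust'.
holocaust_positions = [10, 256]
wanted = {"section", "subsection", "subsubsection"}

matches = []
for pos in holocaust_positions:
    for name in nodes_containing(chapter, pos, wanted):
        if name not in matches:
            matches.append(name)
print(matches)
```

The search only descends into nodes whose span covers the position, so most of the tree is never visited.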

Query Language Features:

You can:
• Search for words using regular expressions
• Search by structure (e.g., “section” or “subsection”)
• Combine both (e.g., section that contains the word "holocaust")

Summary:

Concept | Easy Meaning
Hierarchy | Document is split like a tree: chapter → section → subsection...
Inverted List | Fast lookup: shows where a word appears
Query | You can search by word and structure
Efficient | Fast because it first finds the word, then checks where it is in the structure
