Chapter 2 Modeling
Introduction
Traditional information retrieval systems usually adopt index terms to index and retrieve documents. An index term is a keyword (or group of related words) which has some meaning of its own (usually a noun).
The advantages of using index terms:
Simplicity.
The semantics of the documents and of the user information need can be naturally expressed through sets of index terms.
Ranking algorithms are at the core of information retrieval systems (predicting which documents are relevant and which are not).
A taxonomy of information retrieval models (by user task):
Retrieval (Ad hoc, Filtering):
  Classic models: Boolean, Vector, Probabilistic.
  Set theoretic models: Fuzzy, Extended Boolean.
  Algebraic models: Generalized Vector, Latent Semantic Indexing, Neural Networks.
  Probabilistic models: Inference Network, Belief Network.
  Structured models: Non-overlapping Lists, Proximal Nodes.
Browsing: Flat, Structure Guided, Hypertext.
            Index Terms      Full Text        Full Text + Structure
Retrieval   Classic          Classic          Structured
            Set Theoretic    Set Theoretic
            Algebraic        Algebraic
            Probabilistic    Probabilistic
Browsing    Flat             Flat             Structure Guided
                             Hypertext        Hypertext

Figure 2.2 Retrieval models most frequently associated with distinct combinations of a document logical view and a user task.
Retrieval: Ad hoc and Filtering
Ad hoc: the documents in the collection remain relatively static while new queries are submitted to the system.
Filtering: the queries remain relatively static while new documents come into the system.
Filtering
Typically, the filtering task simply indicates to the user the documents which might be of interest to him.
Routing: rank the filtered documents and show this ranking to the user.
User profiles can be constructed in two ways: the user describes his interests explicitly through keywords, or the system learns the profile from documents the user indicates as relevant.
Models
An information retrieval model is a quadruple [D, Q, F, R(qi, dj)] where:
D: a set composed of logical views (or representations) for the documents in the collection.
Q: a set composed of logical views (or representations) for the user information needs (queries).
F: a framework for modeling document representations, queries, and their relationships.
R(qi, dj): a ranking function which defines an ordering among the documents with regard to the query qi.
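To make the quadruple concrete, here is a minimal Python sketch; the class name IRModel and the vector-based document and query types are illustrative assumptions, not part of the original formulation, and F is left implicit in how D, Q, and R are defined.

```python
from dataclasses import dataclass
from typing import Callable, List

Document = List[float]  # a logical view d_j (e.g., a term-weight vector)
Query = List[float]     # a logical view q_i of a user information need

@dataclass
class IRModel:
    D: List[Document]                      # document representations
    Q: List[Query]                         # query representations
    R: Callable[[Query, Document], float]  # ranking function R(q_i, d_j)

    def rank(self, q: Query) -> List[int]:
        """Return document indices ordered by decreasing R(q, d_j)."""
        scores = [self.R(q, d) for d in self.D]
        return sorted(range(len(self.D)), key=lambda j: scores[j], reverse=True)

# Example: a trivial dot-product ranking function over 2-term vectors.
model = IRModel(D=[[1.0, 0.0], [0.5, 0.5]], Q=[],
                R=lambda q, d: sum(x * y for x, y in zip(q, d)))
print(model.rank([0.0, 1.0]))  # -> [1, 0]
```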
Classic information retrieval models
Basic concepts: each document is described by a set of representative keywords called index terms.
Distinct index terms have varying relevance to a document; this is captured by assigning a numerical weight to each index term of a document.
Define:
ki: a generic index term.
K: the set of all index terms {k1, ..., kt}.
wi,j: a weight associated with index term ki of a document dj.
gi: a function that returns the weight associated with ki in any t-dimensional vector (gi(dj) = wi,j).
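Under this notation a document can simply be stored as its t-dimensional weight vector. The sketch below is illustrative (the variable names are my own, and the weights are taken from the worked example later in this chapter); it shows K, a document dj, and the function gi:

```python
# Index terms K = {k_1, ..., k_t} for a toy collection (t = 6).
K = ["mount", "everest", "earth", "mountain", "kalsubai", "fuji"]

# A document d_j represented by its t-dimensional vector of weights w_{i,j};
# 0.0 means the corresponding index term is absent from the document.
d_j = [0.176, 0.4, 0.2, 0.0, 0.0, 0.0]

def g(i: int, d: list) -> float:
    """g_i(d_j) = w_{i,j}: weight of index term k_i in document d_j (i is 1-based)."""
    return d[i - 1]

print(g(1, d_j))  # weight of k_1 ("mount") in d_j -> 0.176
```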
Boolean model
The Boolean model is a simple retrieval model based on set theory and Boolean algebra.
The retrieval strategy is based on a binary decision criterion (a document is predicted to be either relevant or non-relevant), without any notion of a grading scale; in reality the Boolean model is much more a data retrieval model than an information retrieval model.
Boolean expressions have precise semantics, but it is not simple to translate an information need into a Boolean expression.
Queries can be represented as a disjunction of conjunctive vectors (i.e., in disjunctive normal form, DNF).
Boolean Model
The Boolean model considers that index terms are either present or absent in a document.
The index term weights are therefore binary: wi,j is either 1 or 0.
A query q is composed of index terms linked by three connectives: not, and, or.
Boolean Model
A query is essentially a conventional Boolean expression which can be represented as a disjunction of conjunctive vectors (DNF).
E.g., q = ka ∧ (kb ∨ ¬kc)
In disjunctive normal form, qdnf = (1,1,1) ∨ (1,1,0) ∨ (1,0,0), where each of the components is a binary weighted vector associated with the tuple (ka, kb, kc).
These binary weighted vectors are called the conjunctive components (qcc) of qdnf.
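As an illustration, the following sketch (a hypothetical implementation, not code from the text) retrieves a document exactly when its binary (ka, kb, kc) vector matches one of the conjunctive components of qdnf:

```python
# Conjunctive components of q_dnf for q = k_a AND (k_b OR NOT k_c),
# each tuple giving the binary weights of (k_a, k_b, k_c).
q_dnf = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]

def similarity(doc: tuple) -> int:
    """Boolean-model similarity: 1 if the document's binary term vector
    equals some conjunctive component of the query, otherwise 0."""
    return 1 if doc in q_dnf else 0

print(similarity((1, 1, 0)))  # 1 -> retrieved (k_a and k_b present, k_c absent)
print(similarity((0, 1, 1)))  # 0 -> not retrieved (k_a absent)
```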
Vector model
The vector model assigns non-binary weights to index terms in queries and in documents.
These weights are used to compute the degree of similarity between each document and the query.
The resulting ranking is more precise than that of the Boolean model.
Problem
We think of the documents as a collection C of objects and think of the user query as a specification of a set A of objects. In this scenario, the IR problem can be reduced to the problem of determining which documents are in the set A and which ones are not (i.e., the IR problem can be viewed as a clustering problem).
Intra-cluster: one needs to determine which features better describe the objects in the set A.
Inter-cluster: one needs to determine which features better distinguish the objects in the set A from the remaining objects in the collection C.
tf: intra-cluster similarity is quantified by measuring the raw frequency of a term ki inside a document dj. This term frequency is usually referred to as the tf factor and provides one measure of how well that term describes the document contents (intra-document characterization).
idf: inter-cluster dissimilarity is quantified by measuring the inverse of the frequency of a term ki among the documents in the collection. This factor is usually referred to as the inverse document frequency, or idf factor.

Vector Model
Let N be the total number of documents in the collection.
Let ni be the number of documents in which the index term ki appears.
Let freqi,j be the raw frequency of term ki in the document dj (the number of times the term ki is mentioned in the text of the document dj).
Then the normalized frequency fi,j of term ki in document dj is given by fi,j = freqi,j / maxl freql,j, where the maximum is computed over all terms mentioned in the text of dj.
The inverse document frequency (idf) for ki is given by idfi = log(N / ni).
The weight of term ki in document dj is then wi,j = fi,j × idfi (the tf-idf weighting scheme).
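A minimal Python sketch of these formulas (the function names are my own, and a base-10 logarithm is assumed since it reproduces the idf values used in the example below):

```python
import math

def norm_tf(freq_ij: float, max_freq_j: float) -> float:
    """Normalized frequency f_{i,j} = freq_{i,j} / max_l freq_{l,j}."""
    return freq_ij / max_freq_j if max_freq_j else 0.0

def idf(N: int, n_i: int) -> float:
    """Inverse document frequency idf_i = log10(N / n_i)."""
    return math.log10(N / n_i)

def weight(freq_ij: float, max_freq_j: float, N: int, n_i: int) -> float:
    """tf-idf weight w_{i,j} = f_{i,j} * idf_i."""
    return norm_tf(freq_ij, max_freq_j) * idf(N, n_i)

# "Mount" occurs twice in d_1 (max raw frequency in d_1 is 2)
# and appears in 2 of the 3 documents of the collection:
print(round(weight(2, 2, 3, 2), 3))  # -> 0.176
```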
Text Collection
d1: Mount Everest is Earth's highest mountain above sea level, located in the sub-range of the Himalaya. Mount Everest attracts many climbers, some of them highly experienced mountaineers.
d2: Kalsubai is a mountain in the Western Ghats, located in the Indian state. The mountain range lies within the Kalsubai Harishchandragad Wildlife Sanctuary.
d3: Mount Fuji is a very distinctive feature of the geography of Japan. The mountain stands about 100 km southwest of Tokyo.
Term:  Mount  Everest  Earth  Mountain  Kalsubai  Fuji
n_i:   2      1        1      3         1         1
Raw term frequencies:

freq_i,j   Mount  Everest  Earth  Mountain  Kalsubai  Fuji  Max freq
freq_i,1   2      2        1      1         0         0     2
freq_i,2   0      0        0      2         2         0     2
freq_i,3   1      0        0      1         0         1     1
Normalized frequencies f_i,j = freq_i,j / max_l freq_l,j:

f_i,j   Mount  Everest  Earth  Mountain  Kalsubai  Fuji
f_i,1   1      1        0.5    0.5       0         0
f_i,2   0      0        0      1         1         0
f_i,3   1      0        0      1         0         1
Vector Model

Term:   Mount  Everest  Earth  Mountain  Kalsubai  Fuji
n_i:    2      1        1      3         1         1
idf_i:  0.176  0.4      0.4    0         0.4       0.4
Term weights w_i,j:

w_i,j   Mount  Everest  Earth  Mountain  Kalsubai  Fuji
w_i,1   0.176  0.4      0.2    0         0         0
w_i,2   0      0        0      0         0.4       0
w_i,3   0.176  0        0      0         0         0.4
Query q: Mount Kalsubai

freq_i,q   Mount  Everest  Earth  Mountain  Kalsubai  Fuji  Max freq
           1      0        0      0         1         0     1

f_i,q      1      0        0      0         1         0
w_i,q   Mount  Everest  Earth  Mountain  Kalsubai  Fuji
        0.176  0        0      0         0.4       0
Vector Model: Similarity and Ranking
The degree of similarity between a document dj and the query q is taken as the cosine of the angle between their weight vectors:
sim(dj, q) = (dj · q) / (|dj| × |q|)
For the example collection and query above, the ranking of the documents is d2, d3, d1.
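The sketch below (a hypothetical helper, not code from the text) computes these cosine similarities directly from the w_i,j and w_i,q tables above and reproduces the ranking d2, d3, d1:

```python
import math

# Term-weight vectors taken from the tables above
# (term order: Mount, Everest, Earth, Mountain, Kalsubai, Fuji).
docs = {
    "d1": [0.176, 0.4, 0.2, 0.0, 0.0, 0.0],
    "d2": [0.0,   0.0, 0.0, 0.0, 0.4, 0.0],
    "d3": [0.176, 0.0, 0.0, 0.0, 0.0, 0.4],
}
q = [0.176, 0.0, 0.0, 0.0, 0.4, 0.0]

def cosine(d, q):
    """sim(d_j, q) = (d_j . q) / (|d_j| * |q|)."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm = math.sqrt(sum(w * w for w in d)) * math.sqrt(sum(w * w for w in q))
    return dot / norm if norm else 0.0

ranking = sorted(docs, key=lambda name: cosine(docs[name], q), reverse=True)
for name in ranking:
    print(name, round(cosine(docs[name], q), 3))
# Output: d2 0.915, d3 0.162, d1 0.147 -> ranking d2, d3, d1
```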
Advantages of the Vector Model
Its term-weighting scheme improves retrieval performance.
Its partial matching strategy allows retrieval of documents that approximate the query conditions.
Its cosine ranking formula sorts the documents according to their degree of similarity to the query.
Disadvantage of the Vector Model
Index terms are assumed to be mutually independent; the model does not account for index term dependencies.