
Example Problems Discussion-Vector Space Model

The document discusses problem-solving techniques using the Boolean Model and Vector Space Model for document retrieval. It provides examples of how to represent documents and queries in both models, including calculations for term frequency and weights. The document concludes with a ranking of documents based on their relevance to a given query.

Uploaded by

innokon.siva

3/19/24, 10:26 AM Problem solving on Boolean Model and Vector Space Model - GeeksforGeeks

Problem solving on Boolean Model and Vector Space Model


Boolean Model:

It is a simple retrieval model based on set theory and Boolean algebra. Queries are written as Boolean
expressions, which have precise semantics. The retrieval strategy is based on a binary decision criterion:
a document either matches the query or it does not. The Boolean model records only whether an index term is
present or absent in a document.

Problem Solving:

Consider 5 documents with a vocabulary of 6 terms:

document 1 = 'term1 term3'
document 2 = 'term2 term4 term6'
document 3 = 'term1 term2 term3 term4 term5'
document 4 = 'term1 term3 term6'
document 5 = 'term3 term4'

Our documents in the Boolean model:

             term1  term2  term3  term4  term5  term6
document 1     1      0      1      0      0      0
document 2     0      1      0      1      0      1
document 3     1      1      1      1      1      0
document 4     1      0      1      0      0      1
document 5     0      0      1      1      0      0
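
The table above can be rebuilt from the document strings. The following is a minimal sketch, assuming whitespace tokenization; the names `documents`, `vocabulary`, and `incidence_matrix` are illustrative and do not come from the original article.

```python
# Documents from the example above, keyed by document number.
documents = {
    1: "term1 term3",
    2: "term2 term4 term6",
    3: "term1 term2 term3 term4 term5",
    4: "term1 term3 term6",
    5: "term3 term4",
}
vocabulary = ["term1", "term2", "term3", "term4", "term5", "term6"]


def incidence_matrix(docs, vocab):
    """Return {doc_id: [1 if the term is present else 0, per vocabulary term]}."""
    matrix = {}
    for doc_id, text in docs.items():
        present = set(text.split())  # assumption: terms are separated by whitespace
        matrix[doc_id] = [1 if term in present else 0 for term in vocab]
    return matrix


matrix = incidence_matrix(documents, vocabulary)
for doc_id, row in matrix.items():
    print(f"document {doc_id}: {row}")
```

Each row is a binary vector over the vocabulary, which is all the Boolean model keeps about a document.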

Consider the query: find the documents that contain term1 and term3 but not term2.

term1 ∧ term3 ∧ ¬term2

             term1  ¬term2  term3  term4  term5  term6
document 1     1      1       1      0      0      0
document 2     0      0       0      1      0      1
document 3     1      0       1      1      1      0
document 4     1      1       1      0      0      1
document 5     0      1       1      1      0      0

Evaluating term1 ∧ term3 ∧ ¬term2 for each document:

document 1 : 1 ∧ 1 ∧ 1 = 1
document 2 : 0 ∧ 0 ∧ 0 = 0
document 3 : 1 ∧ 1 ∧ 0 = 0
document 4 : 1 ∧ 1 ∧ 1 = 1
document 5 : 0 ∧ 1 ∧ 1 = 0

Based on the above computation, document 1 and document 4 are relevant to the given query.
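
The same query can be checked in code. This is a small sketch, assuming whitespace tokenization; the helper name `matches` is hypothetical and simply hard-codes the query term1 ∧ term3 ∧ ¬term2.

```python
# Documents from the Boolean model example, keyed by document number.
documents = {
    1: "term1 term3",
    2: "term2 term4 term6",
    3: "term1 term2 term3 term4 term5",
    4: "term1 term3 term6",
    5: "term3 term4",
}


def matches(text):
    """Evaluate term1 AND term3 AND NOT term2 for one document."""
    terms = set(text.split())
    return "term1" in terms and "term3" in terms and "term2" not in terms


relevant = [doc_id for doc_id, text in documents.items() if matches(text)]
print(relevant)  # [1, 4]
```

The result is a set of matching documents with no ordering among them, which is the binary decision criterion of the Boolean model.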

Vector Model:

The procedure and the formulas required for the computation are given in the previous article (part 1).
Consider the following collection of documents.

document1 = 'one two'
document2 = 'three two four'
document3 = 'one two three'
document4 = 'one two'

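The formulas used, as applied in the calculations below (n_t is the number of documents that contain term t):

idf(t) = log2(N / n_t)
tf(t) = n_t / N
weight(t) = tf(t) * idf(t)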

Each term appears in a different number of documents: 'one' in three, 'two' in all four, 'three' in two, and
'four' in one. The total number of documents is N = 4. Therefore, the IDF values of the terms are:

one --> log2(4/3) ≈ 0.4150
two --> log2(4/4) = 0
three --> log2(4/2) = 1
four --> log2(4/1) = 2
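
The IDF values can be reproduced with a few lines of code. This is a minimal sketch, assuming whitespace tokenization and the base-2 logarithm used above; the function name `idf` and the variable names are illustrative.

```python
import math

# The four-document collection from the example.
documents = {
    "document1": "one two",
    "document2": "three two four",
    "document3": "one two three",
    "document4": "one two",
}


def idf(docs):
    """Return {term: log2(N / number of documents containing the term)}."""
    n_docs = len(docs)
    doc_freq = {}
    for text in docs.values():
        for term in set(text.split()):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log2(n_docs / count) for term, count in doc_freq.items()}


idf_values = idf(documents)
for term in ["one", "two", "three", "four"]:
    print(term, round(idf_values[term], 4))
```

A term that occurs in every document, such as 'two', gets an IDF of 0 and therefore cannot help to distinguish documents.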

Representation in the Boolean model:

            one  two  three  four
document1    1    1     0     0
document2    0    1     1     1
document3    1    1     1     0
document4    1    1     0     0


Calculation of term frequency (computed here as the fraction of the N = 4 documents that contain the term):

one --> 3/4 = 0.75
two --> 4/4 = 1
three --> 2/4 = 0.5
four --> 1/4 = 0.25

Calculation of weights (tf * idf):

weight(one) --> 0.75 * 0.4150 ≈ 0.3113
weight(two) --> 1 * 0 = 0
weight(three) --> 0.5 * 1 = 0.5
weight(four) --> 0.25 * 2 = 0.5
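
The weights can be recomputed as follows. This sketch follows the example's own convention, in which the "term frequency" of a term is the fraction of documents that contain it; many tf-idf formulations instead count occurrences within each document, so treat this as an illustration of the worked numbers only. All names are illustrative.

```python
import math

documents = ["one two", "three two four", "one two three", "one two"]
vocabulary = ["one", "two", "three", "four"]
n_docs = len(documents)

weights = {}
for term in vocabulary:
    # Number of documents that contain the term.
    doc_freq = sum(1 for text in documents if term in text.split())
    tf = doc_freq / n_docs               # convention used in this example
    idf = math.log2(n_docs / doc_freq)
    weights[term] = tf * idf

# A document vector holds the term weight where the term is present, else 0.
doc_vectors = [
    [weights[term] if term in text.split() else 0.0 for term in vocabulary]
    for text in documents
]

for term in vocabulary:
    print(term, round(weights[term], 4))
```

Because the weight of 'two' is 0, that column contributes nothing to any document vector.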

Representation of the vector model in terms of weights:

            one     two  three  four
document1   0.3113  0    0      0
document2   0       0    0.5    0.5
document3   0.3113  0    0.5    0
document4   0.3113  0    0      0

Query: 'one three three'

Calculation of weights for the query terms (term frequency within the query):

weight(one) --> 1/3 = 0.333
weight(three) --> 2/3 = 0.667
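
The query weights are relative term counts within the query. A short sketch, assuming whitespace tokenization and the term order one, two, three, four; the variable names are illustrative.

```python
from collections import Counter

query = "one three three"
vocabulary = ["one", "two", "three", "four"]

counts = Counter(query.split())   # occurrences of each term in the query
total = sum(counts.values())      # 3 terms in total

# Terms absent from the query get a count of 0 from Counter.
query_vector = [counts[term] / total for term in vocabulary]
print([round(w, 3) for w in query_vector])  # [0.333, 0.0, 0.667, 0.0]
```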

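Vector representation (over the terms one, two, three, four)

Document vectors: the rows of the weight table above.

Query vector: (0.333, 0, 0.667, 0)

Similarity calculation: [formula and values not recoverable from the source]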

Ranking of the documents (tied documents share the same rank and the next rank is skipped, as is done in
statistics when two items receive the same rank):

document1   2nd
document2   4th
document3   1st
document4   2nd
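
The similarity formula used for this ranking did not survive in the text, so the following sketch assumes cosine similarity, a common choice for the vector space model. The measure, the function name `cosine`, and the variable names are assumptions and not the original author's stated method.

```python
import math

# Document vectors over (one, two, three, four), using tf * idf weights.
doc_vectors = {
    "document1": [0.3113, 0.0, 0.0, 0.0],
    "document2": [0.0, 0.0, 0.5, 0.5],
    "document3": [0.3113, 0.0, 0.5, 0.0],
    "document4": [0.3113, 0.0, 0.0, 0.0],
}
query_vector = [1 / 3, 0.0, 2 / 3, 0.0]


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


scores = {name: cosine(vec, query_vector) for name, vec in doc_vectors.items()}

# Competition ranking: rank = 1 + number of documents with a strictly higher score.
ranks = {
    name: 1 + sum(1 for other in scores.values() if other > score)
    for name, score in scores.items()
}

ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(name, round(scores[name], 4), "rank", ranks[name])
```

Under this assumed measure, document3 still ranks first and documents 1 and 4 tie. However, document2 scores higher than those two, so the ranking listed above appears to rest on a different similarity calculation.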


Since the similarity between document 3 and the query is greater than that of any other document, document 3
is the most relevant to the query.

Last Updated: 30 May, 2021

D deviprajw…
