Chap5 Query Processing

The document discusses the architecture and processes involved in information retrieval, particularly focusing on search engine indexing and query processing techniques. It outlines two main approaches for scoring documents: Document-at-a-Time and Term-at-a-Time, along with various optimization techniques to enhance performance. Additionally, it covers threshold methods for optimizing query processing by determining the minimum score required for documents to be displayed to users.

Uploaded by

hihifi1326

Search Engines

Information Retrieval in Practice

All slides ©Addison Wesley, 2008

With changes by Crista Lopes
Simple Inverted Index
Index Construction
• Simple in-memory indexer

List<Posting>()

[Link](Posting(n))

Write to a file
Architecture
[Figure: architecture diagram with two parts, the preprocessing steps and the querying process. Labeled components: Text Acquisition, Text Transformation, Index Creation, Index, Document Store, Ranking, Evaluation, Log, UI, Web Pages, Local.]
Query Processing
• Document-at-a-time
– Calculates complete scores for documents by
processing all term lists, one document at a time
• Term-at-a-time
– Accumulates scores for documents by processing
term lists one at a time
• Both approaches have optimization
techniques that significantly reduce time
required to generate scores
Document-At-A-Time
Pseudocode Function Descriptions
• getCurrentDocument()
– Returns the document number of the current posting of the inverted
list.
• skipForwardToDocument(d)
– Moves forward in the inverted list until getCurrentDocument() >= d.
This function may read to the end of the list.
• movePastDocument(d)
– Moves forward in the inverted list until getCurrentDocument() > d.
• moveToNextDocument()
– Moves to the next document in the list. Equivalent to
movePastDocument(getCurrentDocument()).
• getNextAccumulator(d)
– Returns the first document number d' >= d that already has an
accumulator.
• removeAccumulatorsBetween(a, b)
– Removes all accumulators for document numbers between a and b.
An accumulator A_d is removed iff a < d < b.
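The function descriptions above can be made concrete with a small in-memory sketch. It assumes a hypothetical layout that the slides do not specify: an inverted list is a sorted Python list of (doc_id, count) postings, and the accumulator table is a plain dict keyed by document number. The class and function names are illustrative only.

```python
from bisect import bisect_left, bisect_right


class InvertedList:
    """Cursor over a sorted list of (doc_id, count) postings (assumed layout)."""

    def __init__(self, postings):
        self.postings = sorted(postings)
        self.doc_ids = [doc_id for doc_id, _ in self.postings]
        self.pos = 0

    def finished(self):
        return self.pos >= len(self.postings)

    def get_current_document(self):
        # Document number of the current posting, or None at the end of the list.
        return None if self.finished() else self.doc_ids[self.pos]

    def skip_forward_to_document(self, d):
        # Advance until the current document is >= d; may reach the end of the list.
        self.pos = max(self.pos, bisect_left(self.doc_ids, d))

    def move_past_document(self, d):
        # Advance until the current document is > d.
        self.pos = max(self.pos, bisect_right(self.doc_ids, d))

    def move_to_next_document(self):
        # Equivalent to move_past_document(get_current_document()).
        if not self.finished():
            self.move_past_document(self.get_current_document())


def get_next_accumulator(accumulators, d):
    # First document number d' >= d that already has an accumulator, or None.
    candidates = [doc_id for doc_id in accumulators if doc_id >= d]
    return min(candidates) if candidates else None


def remove_accumulators_between(accumulators, a, b):
    # Remove A_d for every d with a < d < b (both ends exclusive).
    for doc_id in [doc_id for doc_id in accumulators if a < doc_id < b]:
        del accumulators[doc_id]
```

The cursor never moves backwards, which is why both skip functions take the maximum of the current and the target position.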
Document-At-A-Time
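As a rough illustration of document-at-a-time scoring, the sketch below walks all term lists in parallel and finishes one document's score before moving on. It is not the slides' pseudocode: the index layout (a dict from term to a sorted list of (doc_id, count) postings), the toy scoring function (sum of counts), and the function name are all assumptions made for the example.

```python
import heapq


def document_at_a_time(query_terms, index, k):
    """Return the top-k (score, doc_id) pairs, best first.

    index: dict mapping term -> sorted list of (doc_id, count) (assumed layout).
    Score: sum of the query-term counts in the document (toy feature function).
    """
    lists = [index.get(term, []) for term in query_terms]
    positions = [0] * len(lists)
    top_k = []  # min-heap of (score, doc_id), never larger than k

    while True:
        # The next document to score is the smallest current doc_id in any list.
        current = [lst[p][0] for lst, p in zip(lists, positions) if p < len(lst)]
        if not current:
            break
        d = min(current)

        # Compute the complete score for d from every list that contains it.
        score = 0
        for i, lst in enumerate(lists):
            p = positions[i]
            if p < len(lst) and lst[p][0] == d:
                score += lst[p][1]
                positions[i] += 1

        heapq.heappush(top_k, (score, d))
        if len(top_k) > k:
            heapq.heappop(top_k)  # drop the lowest-scoring candidate

    return sorted(top_k, key=lambda entry: (-entry[0], entry[1]))
```

Only the k best candidates are held in memory at any time, which is the main memory advantage over an accumulator table.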
Term-At-A-Time
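A minimal sketch of term-at-a-time scoring, under the same assumptions as before (a dict from term to a list of (doc_id, count) postings, and a toy score equal to the sum of counts; neither comes from the slides). Each inverted list is read once from start to end, and partial scores are kept in an accumulator table.

```python
import heapq
from collections import defaultdict


def term_at_a_time(query_terms, index, k):
    """Return the top-k (score, doc_id) pairs, best first.

    index: dict mapping term -> list of (doc_id, count) (assumed layout).
    """
    accumulators = defaultdict(int)  # doc_id -> partial score so far

    # Process one complete inverted list before touching the next one.
    for term in query_terms:
        for doc_id, count in index.get(term, []):
            accumulators[doc_id] += count

    # Scores are final only after the last list has been processed.
    best = heapq.nlargest(
        k, accumulators.items(), key=lambda item: (item[1], -item[0])
    )
    return [(score, doc_id) for doc_id, score in best]
```

The accumulator table can grow to one entry per matching document, which is the memory cost mentioned for this approach.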
Optimization Techniques
• Term-at-a-time uses more memory for
accumulators, but accesses disk more
efficiently
• Two classes of optimization
– Read less data from inverted lists
• e.g., skip lists
• better for simple feature functions
– Calculate scores for fewer documents
• e.g., conjunctive processing
• better for complex feature functions
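To illustrate the first class of optimization, here is a small sketch of skip pointers over a sorted array of document numbers. It is a simplification rather than a full skip list: the pointer interval, the function names, and the `postings_read` counter are assumptions introduced only to show how much scanning a jump can avoid.

```python
from bisect import bisect_right


def build_skip_pointers(doc_ids, interval):
    """Record (doc_id, position) for every `interval`-th posting."""
    return [(doc_ids[i], i) for i in range(0, len(doc_ids), interval)]


def skip_forward(doc_ids, skips, pos, d):
    """Move to the first position >= pos whose doc_id is >= d.

    Returns (new_pos, postings_read), where postings_read counts the
    postings scanned one by one after the jump.
    """
    # Jump to the last skip pointer whose doc_id is <= d, never backwards.
    skip_docs = [doc_id for doc_id, _ in skips]
    j = bisect_right(skip_docs, d) - 1
    if j >= 0:
        pos = max(pos, skips[j][1])

    # Finish with a short linear scan inside the block.
    postings_read = 0
    while pos < len(doc_ids) and doc_ids[pos] < d:
        pos += 1
        postings_read += 1
    return pos, postings_read


doc_ids = list(range(0, 100, 2))            # 50 postings: 0, 2, ..., 98
skips = build_skip_pointers(doc_ids, 8)     # one pointer every 8 postings
with_skips = skip_forward(doc_ids, skips, 0, 71)
without_skips = skip_forward(doc_ids, [], 0, 71)
```

In this toy run both calls land on the same posting, but the version with skip pointers scans far fewer postings on the way there.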
Conjunctive Term-at-a-Time
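One way to picture conjunctive term-at-a-time processing is the sketch below, where a document keeps its accumulator only if it appears in every term list. The index layout and the sum-of-counts score are assumptions, and starting with the shortest list is a design choice of this example, not something the slides state.

```python
def conjunctive_term_at_a_time(query_terms, index):
    """Return {doc_id: score} for documents that contain every query term.

    index: dict mapping term -> list of (doc_id, count) (assumed layout).
    """
    # Start from the shortest list so the accumulator table begins small
    # (a choice made for this sketch).
    lists = sorted((index.get(term, []) for term in query_terms), key=len)
    if not lists:
        return {}

    accumulators = dict(lists[0])
    for postings in lists[1:]:
        counts = dict(postings)
        # Keep only accumulators whose document also appears in this list.
        accumulators = {
            doc_id: score + counts[doc_id]
            for doc_id, score in accumulators.items()
            if doc_id in counts
        }
    return accumulators
```

After the first list, the accumulator table can only shrink, so later lists update fewer and fewer documents.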
Conjunctive Document-at-a-Time
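A possible sketch of conjunctive document-at-a-time processing follows. Every list is advanced to the largest current document number, and a document is scored only when all lists agree on it. The data layout, scoring function, and names are assumptions, and the inner linear scan stands in for a real skipForwardToDocument call.

```python
def conjunctive_document_at_a_time(query_terms, index):
    """Return [(doc_id, score)] for documents that contain every query term.

    index: dict mapping term -> sorted list of (doc_id, count) (assumed layout).
    """
    lists = [index.get(term, []) for term in query_terms]
    if not lists or any(not lst for lst in lists):
        return []

    positions = [0] * len(lists)
    results = []
    while all(p < len(lst) for lst, p in zip(lists, positions)):
        # No document smaller than the largest current doc_id can match.
        d = max(lst[p][0] for lst, p in zip(lists, positions))

        # Move every list forward until its current doc_id is >= d.
        for i, lst in enumerate(lists):
            while positions[i] < len(lst) and lst[positions[i]][0] < d:
                positions[i] += 1

        # Score d only if every list is now positioned exactly on it.
        if all(p < len(lst) and lst[p][0] == d for lst, p in zip(lists, positions)):
            score = sum(lst[p][1] for lst, p in zip(lists, positions))
            results.append((d, score))
            positions = [p + 1 for p in positions]
    return results
```

Processing stops as soon as any one list is exhausted, since no later document can contain all the terms.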
Threshold Methods
• Threshold methods use number of top-ranked
documents needed (k) to optimize query
processing
– for most applications, k is small
• For any query, there is a minimum score that each
document needs to reach before it can be shown
to the user
– score of the kth-highest scoring document
– gives threshold τ
– optimization methods estimate τ′ to ignore
documents
Threshold Methods
• For document-at-a-time processing, use score
of lowest-ranked document so far for τ′
– for term-at-a-time, have to use kth-largest score in
the accumulator table
• MaxScore method compares the maximum
score that remaining documents could have to
τ′
– safe optimization in that ranking will be the same
without optimization
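The two ways of obtaining τ′ described above can be sketched as follows. For document-at-a-time, a min-heap of size k holds the best scores seen so far and its smallest element is the estimate. For term-at-a-time, the estimate is the kth-largest value in the accumulator table. The function names and the use of negative infinity before k scores are known are assumptions of this example.

```python
import heapq


def running_threshold(scores_in_doc_order, k):
    """Return the estimate tau' after each document is fully scored."""
    top_k = []       # min-heap holding the k best scores seen so far
    estimates = []
    for score in scores_in_doc_order:
        if len(top_k) < k:
            heapq.heappush(top_k, score)
        elif score > top_k[0]:
            heapq.heapreplace(top_k, score)
        # Until k documents have been scored, nothing can be ruled out.
        estimates.append(top_k[0] if len(top_k) == k else float("-inf"))
    return estimates


def accumulator_threshold(accumulators, k):
    """Return the kth-largest partial score in the accumulator table."""
    if len(accumulators) < k:
        return float("-inf")
    return heapq.nlargest(k, accumulators.values())[-1]


estimates = running_threshold([2.0, 5.0, 1.0, 4.0, 3.0], k=3)
```

In this run the estimate never decreases and ends at 3.0, the true score of the third-highest document, so any document pruned along the way could not have reached the final top three.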
MaxScore Example

• Indexer computes μ_tree
– maximum score for any document containing just “tree”
• Assume k = 3, τ′ is lowest score after first three docs
• Likely that τ′ > μ_tree
– τ′ is the score of a document that contains both query
terms
• Can safely skip over all gray postings
– in the original figure, these are postings for documents that
contain only “tree”
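The pruning test in this example can be sketched for a two-term query as below. This is a simplified illustration, not a full MaxScore implementation: postings are held in dicts from document number to partial score, the names `rare`, `common`, and `mu_common` are hypothetical (`mu_common` plays the role of μ_tree), and the sketch still visits every document number, where a real system would skip ahead in the inverted list.

```python
import heapq


def maxscore_two_terms(rare, common, mu_common, k):
    """Return (top-k list of (score, doc_id), number of documents scored).

    rare, common: dict doc_id -> partial score for each term (assumed layout).
    mu_common: upper bound on any partial score in `common`.
    """
    top_k = []   # min-heap of (score, doc_id)
    scored = 0
    for d in sorted(set(rare) | set(common)):
        threshold = top_k[0][0] if len(top_k) == k else float("-inf")

        # A document with only the common term scores at most mu_common,
        # so it cannot enter the top k once the threshold exceeds that bound.
        if d not in rare and threshold > mu_common:
            continue

        score = rare.get(d, 0.0) + common.get(d, 0.0)
        scored += 1
        if len(top_k) < k:
            heapq.heappush(top_k, (score, d))
        elif score > top_k[0][0]:
            heapq.heapreplace(top_k, (score, d))

    return sorted(top_k, key=lambda entry: (-entry[0], entry[1])), scored


rare = {1: 3.0, 2: 2.5, 3: 4.0, 9: 4.0}
common = {1: 1.0, 2: 1.0, 3: 0.5, 4: 2.0, 5: 1.0, 6: 2.0, 7: 0.5, 8: 1.0, 9: 1.0}
ranking, scored = maxscore_two_terms(rare, common, mu_common=2.0, k=3)
```

With these made-up scores, the threshold after the first three documents is 3.5, which exceeds the bound of 2.0, so documents 4 through 8 are skipped and only four of the nine documents are scored.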
Other Approaches
• Early termination of query processing
– ignore high-frequency word lists in term-at-a-time
– ignore documents at end of lists in doc-at-a-time
– unsafe optimization
• List ordering
– order inverted lists by quality metric (e.g.,
PageRank) or by partial score
– makes unsafe (and fast) optimizations more likely
to produce good documents
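As a toy illustration of why list ordering helps unsafe early termination, the sketch below reads only the first few postings of each list after ordering the list by a static quality score. The `quality` table, the `budget` parameter, and the sum-of-counts score are all hypothetical. The sort is done at query time only to keep the example short; an index built for this purpose would store the lists already ordered.

```python
def early_termination_taat(query_terms, index, quality, budget):
    """Term-at-a-time scoring that reads at most `budget` postings per list.

    index: dict mapping term -> list of (doc_id, count) (assumed layout).
    quality: dict doc_id -> static quality score, higher is better (assumed).
    """
    accumulators = {}
    for term in query_terms:
        # Highest-quality documents first, so truncation drops the weakest ones.
        ordered = sorted(index.get(term, []), key=lambda p: -quality[p[0]])
        for doc_id, count in ordered[:budget]:
            accumulators[doc_id] = accumulators.get(doc_id, 0) + count
    return accumulators


index = {"fish": [(1, 1), (2, 5), (3, 2), (4, 1)]}
quality = {1: 0.9, 2: 0.1, 3: 0.8, 4: 0.5}
partial = early_termination_taat(["fish"], index, quality, budget=2)
```

Document 2 has the highest term count but the lowest quality score, so it is never read. That is exactly what makes the optimization unsafe: the result can differ from the exhaustive ranking.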
