IR System Evaluation Guide

There are three main ways to evaluate an information retrieval system: retrieval effectiveness by measuring the relevance of search results, system quality by measuring indexing and search speeds as well as collection coverage, and user utility by measuring user happiness, return rates, and productivity through A/B testing. Effectiveness can be measured through precision, recall, and F-measure using test collections with representative documents and topics as well as known human-assessed relevance judgments. Good measures are meaningful, easily replicated, and comparable using a single number.


IR Evaluation

There are several aspects along which we can evaluate IR systems:


1. Retrieval effectiveness (standard IR evaluation)
• Relevance of search results
2. System quality
a) Indexing speed (e.g., how many documents per hour?)
b) Search speed (search latency as a function of index size)
c) Coverage (document collection size and diversity)
d) Expressiveness of the query language
3. User utility 
• User happiness based on relevance, speed, and user interface
• User return rate, user productivity (difficult to measure)
• A/B test: a small change to a deployed system, made visible to a fraction of users
• The difference is evaluated using clickthrough log analysis (see the sketch below)
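A minimal sketch of such a clickthrough analysis, assuming a hypothetical log format in which each entry records the experiment bucket and whether the user clicked a result:

```python
from collections import defaultdict

# Hypothetical click log: one entry per query, tagged with the experiment
# bucket ("control" or "treatment") and whether any result was clicked.
click_log = [
    {"bucket": "control", "clicked": True},
    {"bucket": "control", "clicked": False},
    {"bucket": "treatment", "clicked": True},
    {"bucket": "treatment", "clicked": True},
]

def clickthrough_rates(log):
    """Compute clickthrough rate (clicks / impressions) per bucket."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for entry in log:
        impressions[entry["bucket"]] += 1
        clicks[entry["bucket"]] += int(entry["clicked"])
    return {b: clicks[b] / impressions[b] for b in impressions}

print(clickthrough_rates(click_log))
# {'control': 0.5, 'treatment': 1.0}
```

In a real A/B test the difference between buckets would also be checked for statistical significance before drawing conclusions.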
Evaluation Criteria
• Effectiveness
• How “good” are the documents that are returned?
• Efficiency
• Retrieval time, indexing time, index size.
• Usability
• Learnability, flexibility
Reusable Test Collection
• Collection of documents
• Should be representative.
• Sample of information needs.
• Should be randomized and representative.
• Usually formalized as topic statements.
• Known relevance judgments.
• Assessed by humans.
• Binary judgments make evaluation easier (see the sketch below).
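As an illustration, a test collection can be represented minimally as documents, topics, and binary relevance judgments. The structure below is a hypothetical sketch, not a standard file format:

```python
# A minimal, hypothetical test collection: documents, topics
# (formalized information needs), and binary relevance judgments.
documents = {
    "d1": "evaluation of information retrieval systems",
    "d2": "cooking recipes for beginners",
    "d3": "precision and recall in search engines",
}

topics = {
    "t1": "how to evaluate an IR system",
}

# Binary judgments, assessed by humans: (topic_id, doc_id) -> relevant?
judgments = {
    ("t1", "d1"): True,
    ("t1", "d2"): False,
    ("t1", "d3"): True,
}
```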
Good Effectiveness Measures
• Should capture some aspects of what the user wants.
• The measure should be meaningful.
• Should be easily replicated by other researchers.
• Should be easily comparable.
• Expressed as a single number.
Effectiveness Evaluation Measures
• Set-based measures
• Rank-based measures
Set-Based Measures

• The IR system returns a set of retrieved results without any ranking.
• There is no fixed number of results per query.
• Suitable for Boolean search.
Precision and Recall
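The formula slides did not survive conversion; the standard definitions, consistent with the descriptions that follow, are:

```latex
\[
\text{Precision} = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}
\qquad
\text{Recall} = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}
\]
```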
Trade-off between R&P
• Precision
• The ability to retrieve top-ranked documents that are mostly relevant.
• Recall
• The ability to retrieve all of the relevant items (the trade-off is sketched below).
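A small sketch of this trade-off, using hypothetical relevance scores: lowering the score threshold retrieves more documents, which tends to raise recall at the expense of precision:

```python
# Hypothetical scored results: doc_id -> retrieval score,
# plus the set of documents that are actually relevant.
scores = {"d1": 0.9, "d2": 0.8, "d3": 0.6, "d4": 0.4, "d5": 0.2}
relevant = {"d1", "d3", "d5"}

def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant)
    return precision, recall

# Sweep the score threshold: a lower threshold retrieves more documents.
for threshold in (0.7, 0.5, 0.1):
    retrieved = {d for d, s in scores.items() if s >= threshold}
    p, r = precision_recall(retrieved, relevant)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.7: precision=0.50, recall=0.33
# threshold=0.5: precision=0.67, recall=0.67
# threshold=0.1: precision=0.60, recall=1.00
```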
F-measure
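The formula slide is missing here as well; the standard F-measure, the weighted harmonic mean of precision and recall, is:

```latex
\[
F_{\beta} = \frac{(1+\beta^{2})\,P\,R}{\beta^{2}P + R},
\qquad
F_{1} = \frac{2PR}{P + R} \quad (\beta = 1)
\]
```

Setting \(\beta > 1\) weights recall more heavily, while \(\beta < 1\) favors precision.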
