Information Retrieval & Search Engines
Lecture 4: Lucene Query Language
Dr. Mostafa Shokry
10/17/2025 Information Retrieval & Search Engines 1
Lucene Query Language
• Lucene Query Language is a rich and flexible query syntax provided by
Apache Lucene, a high-performance search engine library.
• It extends the Boolean model with field-specific search, wildcards, fuzzy search, proximity search, range queries, and boosting, while still supporting Boolean logic.
Basic Lucene Syntax Features:
1. Field-Specific Search.
2. Wildcards and Fuzzy Matching.
3. Boosting Terms.
4. Proximity Search.
1. Field-Specific Search
• In Lucene, documents consist of fields (such as title, body, author, and tags). You can restrict a term or phrase to one field with the syntax field:term.
Example:
title:"deep learning" AND author:Goodfellow
Matches only documents whose title contains the exact phrase "deep learning" and whose author field contains Goodfellow.
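A minimal Python sketch can make the field restriction concrete. It is not Lucene's API. It assumes each document is a plain dict mapping field names to text, and it uses a lowercase regex tokenizer in place of Lucene's analyzer. The helper names (`tokenize`, `field_matches`) and the three records are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    # Rough stand-in for an analyzer: lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())

def field_matches(doc: dict[str, str], field: str, query: str) -> bool:
    # True if the query's tokens occur consecutively (as a phrase) in doc[field].
    tokens = tokenize(doc.get(field, ""))
    q = tokenize(query)
    return any(tokens[i:i + len(q)] == q for i in range(len(tokens) - len(q) + 1))

# Hypothetical records, made up for illustration.
docs = [
    {"title": "Deep Learning", "author": "Goodfellow"},
    {"title": "Learning Deep Architectures", "author": "Bengio"},
    {"title": "Deep Learning Basics", "author": "Smith"},
]

# title:"deep learning" AND author:Goodfellow
hits = [d["title"] for d in docs
        if field_matches(d, "title", "deep learning")
        and field_matches(d, "author", "Goodfellow")]
print(hits)  # ['Deep Learning']
```

The second record fails because its title has the two words in a different order. The third record fails on the author field.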
2. Wildcards and Fuzzy Matching
• Wildcards are special symbols inside a query term that match many terms sharing a common pattern. They allow partial matches of words.
• Fuzzy matching finds terms that are approximately similar to the search term (within a small edit distance). This helps when users make typos or when terms have slight spelling variations.
Examples:
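| Type | Symbol | Example | Meaning |
|---|---|---|---|
| Wildcard | * | hack* | Zero or more characters: matches hack, hacking, hacker, hacked |
| Wildcard | ? | t?st | Exactly one character: matches test, tast, tist… |
| Fuzzy | ~ | roast~ | Similar spellings within a small edit distance: matches roast, roost, boast |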
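The behaviour in the table can be approximated in a few lines of Python. Wildcards are translated to a regular expression: `*` becomes `.*` and `?` becomes `.`. Fuzzy matching is modelled as an edit-distance threshold. Lucene's FuzzyQuery allows at most 2 edits by default (`roast~1` restricts it to 1), and by default it also counts an adjacent transposition as a single edit. The plain Levenshtein distance used here does not, so treat this as an approximation. The function names and the word list are hypothetical.

```python
import re

def wildcard_regex(pattern: str) -> re.Pattern:
    # Lucene-style wildcards: * = zero or more characters, ? = exactly one character.
    parts = [".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))

def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via dynamic programming (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def fuzzy_match(term: str, candidate: str, max_edits: int = 2) -> bool:
    return edit_distance(term, candidate) <= max_edits

vocabulary = ["hack", "hacker", "hacking", "hacked", "shack", "test", "tast", "toast"]
print([t for t in vocabulary if wildcard_regex("hack*").fullmatch(t)])
# ['hack', 'hacker', 'hacking', 'hacked']
print([t for t in vocabulary if wildcard_regex("t?st").fullmatch(t)])
# ['test', 'tast']
print([t for t in ["roast", "roost", "boast", "coffee", "test"] if fuzzy_match("roast", t)])
# ['roast', 'roost', 'boast']
```

Using `fullmatch` anchors the pattern at both ends, as Lucene does for a wildcard term. That is why "shack" does not match hack*.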
3. Boosting Terms
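• Use ^ followed by a positive number to set the weight of a term. Values above 1 increase its importance and values below 1 decrease it; the default boost is 1.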
Example:
java^2.0 python
A match on “java” counts twice as much toward the score as a match on “python”. Documents matching “java” therefore tend to rank above documents that match only “python”.
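The ranking effect of a boost can be shown with a deliberately simplified scorer. Real Lucene scoring (BM25 by default since Lucene 6) multiplies the boost into a term score that also depends on term frequency, document frequency, and field length. The hypothetical `toy_score` below just adds up the boosts of the matched terms, and the three documents are made up.

```python
def toy_score(doc_terms: set[str], query: dict[str, float]) -> float:
    # Sum the boost of every query term the document contains (toy model, not BM25).
    return sum(boost for term, boost in query.items() if term in doc_terms)

# java^2.0 python : an unboosted term has weight 1.0
query = {"java": 2.0, "python": 1.0}
docs = {  # hypothetical documents, reduced to their sets of terms
    "A": {"java", "tutorial"},
    "B": {"python", "tutorial"},
    "C": {"java", "python"},
}
ranked = sorted(docs, key=lambda d: toy_score(docs[d], query), reverse=True)
print(ranked)  # ['C', 'A', 'B']
```

C matches both terms and scores highest. A outranks B only because of the boost. Under real BM25 scoring, a document in which "python" is very frequent could still outrank one with a single "java" hit.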
| Query | Meaning |
|---|---|
| title:lucene^4.0 content:lucene | A match on "lucene" in the title is weighted 4 times as heavily as a match on "lucene" in the content |
| "machine learning"^3.0 AI | A match on the exact phrase "machine learning" is weighted 3 times as heavily as a match on the term "AI" |
Example:
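(title:big^2.0 data) OR (content:analytics^1.5) OR category:technology^0.8
• "big" in the title is weighted 2x.
• "data" has no field prefix of its own (title: binds only to "big"), so it is searched in the default field with the default weight of 1.
• "analytics" in the content is weighted 1.5x.
• "technology" in the category is weighted 0.8x, slightly below the default.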
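The way field prefixes bind can be illustrated by listing the query as (field, term, boost) clauses and scoring a few made-up documents. This is a toy sum-of-boosts model, not Lucene's scoring. `DEFAULT_FIELD = "content"` is an assumption, because the real default field is whatever the query parser was configured with.

```python
DEFAULT_FIELD = "content"  # assumption: the field the parser uses for unqualified terms

# (title:big^2.0 data) OR (content:analytics^1.5) OR category:technology^0.8
# The title: prefix binds only to "big"; "data" falls through to the default field.
clauses = [
    ("title", "big", 2.0),
    (DEFAULT_FIELD, "data", 1.0),
    ("content", "analytics", 1.5),
    ("category", "technology", 0.8),
]

def toy_score(doc: dict[str, str], clauses: list[tuple[str, str, float]]) -> float:
    # Sum the boosts of clauses whose term appears in the named field (toy model, not BM25).
    return sum(boost for field, term, boost in clauses
               if term in doc.get(field, "").lower().split())

docs = {  # hypothetical documents
    "D1": {"title": "Big Data Platforms", "content": "data storage at scale", "category": "technology"},
    "D2": {"title": "Web Analytics", "content": "analytics dashboards", "category": "marketing"},
    "D3": {"title": "Data Lakes", "content": "big data pipelines", "category": "technology"},
}
scores = {d: round(toy_score(doc, clauses), 2) for d, doc in docs.items()}
print(scores)  # {'D1': 3.8, 'D2': 1.5, 'D3': 1.8}
```

D3 contains "big", but only in its content field. It therefore earns nothing from the title:big^2.0 clause.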
4. Proximity Search
• Find documents where the words of a phrase appear close together, not necessarily adjacent or in the original order.
Example:
"information retrieval"~3
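Matches "information systems for document retrieval" (3 words between the terms). It does not match "retrieval of relevant information": reversing the order costs 2 moves on top of the 2-word gap, so that text needs a slop of at least 4.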
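• What "information retrieval"~3 means:
This is a proximity search. It tells the search engine:
“Find documents where the words information and retrieval can be lined up as the phrase with at most 3 position moves (the slop).”
In the original order, this allows up to 3 words between them. In reverse order, the swap alone costs 2 moves, so at most 1 word may sit between them.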
So the exact phrase "information retrieval" is not required; all of the following match:
"information and content retrieval" (2 words between)
"retrieval of information" (reversed: 2 moves for the swap plus 1 for the word between)
"information retrieval" (the exact phrase, 0 moves)
"information about efficient document retrieval" (3 words between)
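The slop rule can be checked with a brute-force sketch. It follows the description in Lucene's PhraseQuery documentation: slop counts moves of query terms out of their phrase positions, and swapping two words costs two moves. It is a simplification, not Lucene's actual matcher. The regex tokenizer and the helper names are assumptions.

```python
import math
import re
from itertools import product

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def min_slop(text: str, phrase: str) -> float:
    # Smallest slop at which `phrase` matches `text`. Each matched position is shifted
    # back by the term's offset in the phrase; the spread of the shifted positions is
    # the number of moves needed (0 = exact phrase, 2 = two adjacent words swapped).
    tokens = tokenize(text)
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in tokenize(phrase)]
    best = math.inf  # stays infinite if some query term never occurs
    for combo in product(*positions):  # brute force over every placement
        if len(set(combo)) < len(combo):
            continue  # one token cannot stand in for two query terms
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        best = min(best, max(shifted) - min(shifted))
    return best

examples = [
    "information and content retrieval",
    "retrieval of information",
    "information retrieval",
    "information about efficient document retrieval",
    "retrieval of relevant information",
]
for text in examples:
    print(min_slop(text, "information retrieval") <= 3, text)
```

The first four texts need slops of 2, 3, 0, and 3, so they match "information retrieval"~3. The last one needs 4, so it does not.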
Exercise 1: Query:
(title:"cyber attack" OR title:"phishing email") AND year:[2022 TO 2025] AND -author:john
Question: What does this query retrieve?
Documents whose title contains “cyber attack” or “phishing email”, published from 2022 through 2025 (square brackets make both ends inclusive), and not authored by John.
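The same query can be written as a plain Python predicate. This makes each clause explicit, including the inclusive range. In Lucene syntax, square brackets include both bounds and curly braces, as in {2022 TO 2025}, exclude them. The `Doc` type, the helper names, and the five records are hypothetical, and phrase matching uses a simple lowercase tokenizer.

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    year: int
    author: str

def has_phrase(text: str, phrase: str) -> bool:
    # True if the phrase's tokens appear consecutively in the text.
    tokens, q = re.findall(r"\w+", text.lower()), re.findall(r"\w+", phrase.lower())
    return any(tokens[i:i + len(q)] == q for i in range(len(tokens) - len(q) + 1))

def exercise_1(doc: Doc) -> bool:
    topic = has_phrase(doc.title, "cyber attack") or has_phrase(doc.title, "phishing email")
    in_range = 2022 <= doc.year <= 2025                   # year:[2022 TO 2025], both ends included
    not_john = "john" not in doc.author.lower().split()  # -author:john
    return topic and in_range and not_john

docs = [  # hypothetical records
    Doc("Cyber Attack Trends", 2023, "Alice Reed"),
    Doc("Detecting Phishing Email Campaigns", 2025, "John Park"),
    Doc("Cyber Attack Forensics", 2021, "Maria Lopez"),
    Doc("Phishing Email Filters", 2022, "Omar Aziz"),
    Doc("Cyber Security Basics", 2024, "Lena Fox"),
]
print([d.title for d in docs if exercise_1(d)])
# ['Cyber Attack Trends', 'Phishing Email Filters']
```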
Exercise 2:
Write a Lucene query to find documents about “AI ethics” in the title or
abstract, written after 2021, and boost results that mention “bias”.
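+(title:"AI ethics" OR abstract:"AI ethics") +year:[2022 TO *] bias^2
The + marks the topic and year clauses as required. bias^2 is left optional, so it only raises the score of documents that mention bias. Writing AND bias^2 would instead drop every document that does not mention bias.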
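The difference between a required clause and an optional boosted clause can be seen in a small sketch. Required clauses act as filters, while optional clauses only add their boost to a toy score (not Lucene's BM25). The three papers and all function names are hypothetical. The sketch also assumes that the unqualified term "bias" is searched in the abstract, standing in for the parser's default field.

```python
import re

def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def has_phrase(text: str, phrase: str) -> bool:
    t, q = tokens(text), tokens(phrase)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

papers = {  # hypothetical records
    "P1": {"title": "AI Ethics in Practice", "abstract": "Fairness and bias audits", "year": 2023},
    "P2": {"title": "Governance Models", "abstract": "An overview of AI ethics frameworks", "year": 2022},
    "P3": {"title": "AI Ethics Revisited", "abstract": "Bias in hiring", "year": 2020},
}

def about_ai_ethics(p: dict) -> bool:
    return has_phrase(p["title"], "AI ethics") or has_phrase(p["abstract"], "AI ethics")

def after_2021(p: dict) -> bool:
    return p["year"] >= 2022  # year:[2022 TO *]

def mentions_bias(p: dict) -> bool:
    return "bias" in tokens(p["abstract"])  # assumption: abstract acts as the default field

def search(required, optional):
    # Required (+) clauses filter; optional clauses only add their boost to the score.
    scores = {}
    for pid, p in papers.items():
        if all(clause(p) for clause in required):
            scores[pid] = len(required) + sum(boost for clause, boost in optional if clause(p))
    return sorted(scores, key=scores.get, reverse=True)

print(search([about_ai_ethics, after_2021], [(mentions_bias, 2.0)]))  # ['P1', 'P2']
print(search([about_ai_ethics, after_2021, mentions_bias], []))       # ['P1']
```

With bias as an optional boost, P2 is still returned and simply ranks below P1. Making bias required drops P2 entirely.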
Comparison with Boolean Query Language
| Feature | Boolean Query Language | Lucene Query Language |
|---|---|---|
| Operators | AND, OR, NOT | AND, OR, NOT, +, - |
| Phrase search | Yes | Yes |
| Fielded search | No | Yes (e.g., title:term) |
| Wildcards | No | Yes (*, ?) |
| Fuzzy search | No | Yes (term~) |
| Boosting | No | Yes (term^2) |
| Ranking support | Not inherent | Yes (boosting, scoring-based engines) |
| Modern use | Rare in consumer systems | Common (Elasticsearch, Solr, etc.) |
Exercises
Given a document collection with keywords:
| DocID | Content |
|---|---|
| D1 | Information retrieval and search engines |
| D2 | Retrieval techniques in data mining |
| D3 | Information systems and databases |
| D4 | Search algorithms and optimization |
• Which documents match information AND retrieval? D1
• Which documents match search OR optimization? D1, D4
• Which documents match retrieval AND NOT search? D2
• Which documents match (information OR data) AND retrieval? D1, D2
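AND, OR, and NOT map directly onto set intersection, union, and difference over posting lists. The sketch below builds a tiny inverted index over D1 to D4 and reproduces the four answers. The lowercase regex tokenizer and the helper name `posting` are assumptions.

```python
import re

docs = {
    "D1": "Information retrieval and search engines",
    "D2": "Retrieval techniques in data mining",
    "D3": "Information systems and databases",
    "D4": "Search algorithms and optimization",
}

# Inverted index: lowercase term -> set of IDs of documents containing it.
index: dict[str, set[str]] = {}
for doc_id, text in docs.items():
    for term in re.findall(r"\w+", text.lower()):
        index.setdefault(term, set()).add(doc_id)

def posting(term: str) -> set[str]:
    return index.get(term.lower(), set())

all_docs = set(docs)
print(sorted(posting("information") & posting("retrieval")))                      # ['D1']
print(sorted(posting("search") | posting("optimization")))                        # ['D1', 'D4']
print(sorted(posting("retrieval") & (all_docs - posting("search"))))              # ['D2']
print(sorted((posting("information") | posting("data")) & posting("retrieval")))  # ['D1', 'D2']
```

NOT is computed against the full document set. Note that "databases" in D3 is a different token from "data", so D3 is not pulled into the last query.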
Document Collection
| DocID | Title | Author | Year | Content |
|---|---|---|---|---|
| D1 | Intro to IR | Smith | 2020 | Basics of information retrieval and search |
| D2 | Data Mining Methods | Johnson | 2019 | Data patterns and retrieval techniques |
| D3 | Advanced IR | Smith | 2021 | Vector model and ranking techniques |
1. Search for documents written by Smith:
Query:
author:Smith
Matches:
• D1 (Author: Smith)
• D3 (Author: Smith)
2. Search for documents with the phrase "information retrieval":
Query:
"information retrieval"
Matches:
• D1
3. Search for any word starting with “retriev”
Query:
retriev*
This uses a wildcard, so it matches terms such as retrieve, retrieval, and retrieved.
Matches:
• D1 ("retrieval").
• D2 ("retrieval").
4. Search for documents where “data” appears near “retrieval”
Query:
"data retrieval"~5
Matches:
• D2: "Data patterns and retrieval techniques" → “data” and “retrieval” are separated by only two words (“patterns and”), well within the slop of 5
5. Boost the term “information”
Query:
information^2 retrieval
Matches:
• D1: Contains both “information” and “retrieval” → Likely highest score
• D2: Only “retrieval” → Lower score
• D3: Contains neither “information” nor “retrieval” → Not matched
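All five answers can be cross-checked with one small Python sketch over the collection above. It is a toy model, not Lucene. It assumes content is the default field, tokenizes with a lowercase regex, models slop as the spread of (position minus phrase offset) values, and scores with a plain sum of boosts. All helper names are hypothetical.

```python
import re
from itertools import product

DOCS = {
    "D1": {"title": "Intro to IR", "author": "Smith", "year": 2020,
           "content": "Basics of information retrieval and search"},
    "D2": {"title": "Data Mining Methods", "author": "Johnson", "year": 2019,
           "content": "Data patterns and retrieval techniques"},
    "D3": {"title": "Advanced IR", "author": "Smith", "year": 2021,
           "content": "Vector model and ranking techniques"},
}
DEFAULT_FIELD = "content"  # assumption: unqualified terms search the content field

def tokens(doc_id: str, field: str = DEFAULT_FIELD) -> list[str]:
    return re.findall(r"\w+", str(DOCS[doc_id][field]).lower())

def term_query(term: str, field: str = DEFAULT_FIELD) -> list[str]:
    return [d for d in DOCS if term.lower() in tokens(d, field)]

def prefix_query(prefix: str) -> list[str]:
    # retriev* : any token that starts with the prefix
    return [d for d in DOCS if any(t.startswith(prefix) for t in tokens(d))]

def phrase_query(phrase: str, slop: int = 0) -> list[str]:
    # Slop = spread of (position - offset) over one placement of the query terms.
    terms = re.findall(r"\w+", phrase.lower())
    hits = []
    for d in DOCS:
        toks = tokens(d)
        positions = [[i for i, t in enumerate(toks) if t == term] for term in terms]
        for combo in product(*positions):  # empty if any term is missing
            shifted = [pos - off for off, pos in enumerate(combo)]
            if len(set(combo)) == len(combo) and max(shifted) - min(shifted) <= slop:
                hits.append(d)
                break
    return hits

def boosted_or(query: dict[str, float]) -> list[str]:
    # Toy score: sum of boosts of matched terms; documents matching no term are excluded.
    scores = {d: sum(b for term, b in query.items() if term in tokens(d)) for d in DOCS}
    return sorted((d for d in scores if scores[d] > 0), key=scores.get, reverse=True)

print(term_query("Smith", field="author"))                 # 1. ['D1', 'D3']
print(phrase_query("information retrieval"))               # 2. ['D1']
print(prefix_query("retriev"))                             # 3. ['D1', 'D2']
print(phrase_query("data retrieval", slop=5))              # 4. ['D2']
print(boosted_or({"information": 2.0, "retrieval": 1.0}))  # 5. ['D1', 'D2']
```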