Query Space Distance-Based Query Performance Prediction (QSD-QPP): Estimating Query Performance Based on Nearest Neighbors

This repository contains the implementation and the results of Query Space Distance-Based Query Performance Prediction (QSD-QPP), a novel framework for estimating query performance by leveraging the deterministic behavior of retrieval methods. The QSD-QPP framework systematically constructs a high-dimensional Query Space to capture semantic and syntactic relationships between queries and their associated retrieval effectiveness.

This repository is organized as follows:

  1. QSD_QPPpre: A lightweight pre-retrieval instantiation that predicts query performance by interpolating the effectiveness of proximate historical queries within the Query Space.
  2. QSD_QPPpost: An enriched post-retrieval instantiation that incorporates contextual embeddings, document interactions, and historical query associations for more accurate predictions.
  3. Usage: Instructions on how to use the code to predict the performance of a set of target queries.
  4. Citation: If you use this code, please cite our paper.

QSD_QPPpre

The pre-retrieval instantiation, QSD-QPPpre, dynamically constructs a subspace of historical queries that are in close semantic proximity to the input query within a high-dimensional Query Space. By analyzing embedding distances, it identifies the most similar historical queries and interpolates their known performance scores to estimate the effectiveness of the input query. This approach allows efficient and reliable performance prediction without relying on retrieval results.

The workflow for QSD-QPPpre can be described in two primary steps:

  1. Dynamic Subspace Construction:
    The input query is represented as an embedding within the high-dimensional Query Space using a pre-trained language model. Historical queries that fall within a predefined embedding distance threshold (denoted as $\gamma$) are identified to form a localized subspace. This subspace serves as the basis for performance prediction.

  2. Performance Interpolation:
    The retrieval effectiveness of the input query is estimated by aggregating the known performance metrics of the historical queries in the subspace. A distance-based weighting scheme assigns higher importance to queries that are closer to the input query, ensuring that the prediction is informed by the most relevant historical data. This process enables accurate and efficient performance estimation without requiring retrieval-stage computations (see the sketch below).
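As a concrete illustration, here is a minimal sketch of this distance-weighted interpolation, assuming query embeddings are already computed; the names (`hist_emb`, `hist_perf`, `gamma`) are placeholders for this example, not the repository's actual code:

```python
# Minimal sketch of QSD-QPPpre-style interpolation (illustrative, not the repo's code).
# hist_emb:  (n, d) embeddings of historical queries
# hist_perf: (n,)   their known effectiveness scores (e.g. MRR@10)
# q_emb:     (d,)   embedding of the target query
import numpy as np

def predict_performance(q_emb, hist_emb, hist_perf, gamma, eps=1e-8):
    # Euclidean distance from the target query to every historical query
    dists = np.linalg.norm(hist_emb - q_emb, axis=1)
    # Dynamic subspace: historical queries within the gamma threshold
    mask = dists <= gamma
    if not mask.any():
        return float(hist_perf.mean())  # fallback: global mean performance
    # Inverse-distance weighting: closer historical queries count more
    weights = 1.0 / (dists[mask] + eps)
    return float(np.average(hist_perf[mask], weights=weights))
```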

[Figure: QSD-QPPpre overview]

Performance Comparison with Baselines

The table below reports the Pearson $\rho$ (P), Kendall $\tau$ (K), and Spearman $\rho$ (S) correlations of several baselines and our proposed QSD_QPPpre method over four datasets: MS MARCO Dev small (Dev, 6980 queries), TREC DL 2019 (DL19, 43 queries), TREC DL 2020 (DL20, 53 queries), and DL Hard (Hard, 50 queries).

| QPP Method | Dev P | Dev K | Dev S | DL19 P | DL19 K | DL19 S | DL20 P | DL20 K | DL20 S | Hard P | Hard K | Hard S |
|------------|-------|-------|-------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| SCS | 0.021 | 0.058 | 0.085 | 0.471 | 0.262 | 0.354 | 0.447 | 0.310 | 0.448 | 0.247 | 0.159 | 0.240 |
| P_Clarity | 0.052 | 0.007 | 0.009 | 0.109 | 0.119 | 0.139 | 0.069 | 0.052 | 0.063 | 0.095 | 0.209 | 0.272 |
| VAR | 0.067 | 0.081 | 0.119 | 0.290 | 0.141 | 0.187 | 0.047 | 0.051 | 0.063 | 0.023 | 0.014 | 0.001 |
| PMI | 0.030 | 0.033 | 0.048 | 0.155 | 0.065 | 0.079 | 0.021 | 0.012 | 0.003 | 0.093 | 0.027 | 0.042 |
| IDF | 0.117 | 0.138 | 0.200 | 0.440 | 0.276 | 0.389 | 0.413 | 0.236 | 0.345 | 0.200 | 0.197 | 0.275 |
| SCQ | 0.029 | 0.022 | 0.032 | 0.395 | 0.114 | 0.157 | 0.193 | 0.005 | 0.004 | 0.335 | 0.106 | 0.152 |
| ICTF | 0.105 | 0.136 | 0.198 | 0.435 | 0.259 | 0.365 | 0.409 | 0.236 | 0.348 | 0.192 | 0.195 | 0.272 |
| DC | 0.071 | 0.044 | 0.065 | 0.132 | 0.083 | 0.092 | 0.100 | 0.118 | 0.149 | 0.155 | 0.091 | 0.115 |
| CC | 0.085 | 0.066 | 0.076 | 0.079 | 0.068 | 0.023 | 0.172 | 0.065 | 0.089 | 0.155 | 0.093 | 0.111 |
| IEF | 0.110 | 0.090 | 0.118 | 0.140 | 0.090 | 0.134 | 0.110 | 0.025 | 0.037 | 0.018 | 0.071 | 0.139 |
| MRL | 0.022 | 0.046 | 0.067 | 0.176 | 0.079 | 0.140 | 0.093 | 0.078 | 0.117 | 0.046 | 0.052 | 0.038 |
| BertPE | 0.010 | 0.003 | 0.004 | 0.255 | 0.190 | 0.281 | 0.013 | 0.031 | 0.038 | 0.054 | 0.025 | 0.050 |
| QSD_QPPpre | 0.219 | 0.214 | 0.309 | 0.483 | 0.349 | 0.508 | 0.452 | 0.319 | 0.457 | 0.364 | 0.234 | 0.340 |
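For reference, the reported correlations between actual and predicted per-query effectiveness can be computed with SciPy; the two score lists below are hypothetical placeholders:

```python
# Sketch: computing the three correlation measures used in the tables (SciPy).
from scipy.stats import pearsonr, kendalltau, spearmanr

actual    = [0.42, 0.10, 0.77, 0.05, 0.63]  # hypothetical per-query MRR@10
predicted = [0.40, 0.15, 0.70, 0.02, 0.55]  # hypothetical QPP scores

r, _   = pearsonr(actual, predicted)
tau, _ = kendalltau(actual, predicted)
rho, _ = spearmanr(actual, predicted)
print(f"Pearson rho={r:.3f}, Kendall tau={tau:.3f}, Spearman rho={rho:.3f}")
```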

QSD_QPPpost

For this method, the objective is to obtain a more accurate prediction of query performance using additional information from the set of retrieved documents $D_q$. The quality and characteristics of these documents provide valuable signals about potential retrieval performance. Hence, post-retrieval QPP exploits the relationship between the query and the retrieved documents, as well as the historical performance of queries within similar query subspaces, to estimate query performance.

In the post-retrieval QPP scenario, our proposed method, QSD-QPPpost, extends the pre-retrieval approach by integrating additional layers of contextual and historical information to refine the estimation of query effectiveness. It incorporates three core components into an enriched query representation:

  1. Contextualized Query Representation: The individual characteristics of the input query $q$ are captured through its contextualized embedding representation $e(q)$. This ensures that queries with similar semantic subspaces are mapped to proximate regions of the high-dimensional embedding space.

  2. Document Characteristics: Information about the documents retrieved for $q$, denoted as $D_q = \{d_1, d_2, \ldots, d_m\}$, is incorporated through their contextualized embeddings. These embeddings represent the properties of the retrieved documents, which serve as critical signals for inferring the effectiveness of the query.

  3. Historical Query Association: The relationship between $q$ and previously observed queries is captured by measuring the geometric distance between $e(q)$ and the embeddings of historical queries $\{e(q_i) \mid q_i \in Q\}$ in the Query Space QS. Queries with smaller distances to $e(q)$ are considered more relevant, and their known effectiveness scores $M_{q_i}$ contribute more strongly to the prediction of $q$'s retrieval effectiveness.
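To make the composition of these signals concrete, the following is a schematic sketch of how such an enriched representation could feed a regression head. It is only an illustration under assumed dimensions and pooling choices, not the repository's exact model (which is defined in post/train.py):

```python
# Schematic sketch (NOT the repository's exact model) combining the three signals:
# e(q), the embeddings of the retrieved documents D_q, and the distance/effectiveness
# of the nearest historical query. Dimensions and pooling are illustrative assumptions.
import torch
import torch.nn as nn

class EnrichedQPP(nn.Module):
    def __init__(self, emb_dim=384):
        super().__init__()
        # input: [e(q); pooled e(d_1..d_m); distance to nearest q_i; M_{q_i}]
        self.head = nn.Sequential(
            nn.Linear(2 * emb_dim + 2, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # predicted effectiveness, e.g. MRR@10
        )

    def forward(self, q_emb, doc_embs, nn_dist, nn_perf):
        doc_repr = doc_embs.mean(dim=1)  # mean-pool the m retrieved documents
        x = torch.cat([q_emb, doc_repr,
                       nn_dist.unsqueeze(1), nn_perf.unsqueeze(1)], dim=1)
        return self.head(x).squeeze(1)
```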

[Figure: QSD-QPPpost overview]

Results

The table below shows the results of our proposed QSD_QPPpost method compared to the baselines over five datasets: MS MARCO Dev small (Dev, 6980 queries), DL Hard (Hard, 50 queries), TREC DL 2019 (DL19, 43 queries), TREC DL 2020 (DL20, 54 queries), and TREC DL 2021 (DL21, 53 queries). P, K, and S again denote Pearson $\rho$, Kendall $\tau$, and Spearman $\rho$.

| QPP Method | Dev P | Dev K | Dev S | Hard P | Hard K | Hard S | DL19 P | DL19 K | DL19 S | DL20 P | DL20 K | DL20 S | DL21 P | DL21 K | DL21 S |
|------------|-------|-------|-------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| Clarity | 0.149 | 0.258 | 0.345 | 0.149 | 0.099 | 0.126 | 0.271 | 0.229 | 0.332 | 0.360 | 0.215 | 0.296 | 0.111 | 0.070 | 0.094 |
| WIG | 0.154 | 0.170 | 0.227 | 0.331 | 0.260 | 0.348 | 0.310 | 0.158 | 0.226 | 0.204 | 0.117 | 0.166 | 0.197 | 0.195 | 0.270 |
| QF | 0.170 | 0.210 | 0.264 | 0.210 | 0.164 | 0.217 | 0.295 | 0.240 | 0.340 | 0.358 | 0.266 | 0.366 | 0.132 | 0.101 | 0.142 |
| NeuralQPP | 0.193 | 0.171 | 0.227 | 0.173 | 0.111 | 0.134 | 0.289 | 0.159 | 0.224 | 0.248 | 0.129 | 0.179 | 0.134 | 0.221 | 0.188 |
| n($\sigma_\%$) | 0.221 | 0.217 | 0.284 | 0.195 | 0.120 | 0.147 | 0.371 | 0.256 | 0.377 | 0.480 | 0.329 | 0.478 | 0.269 | 0.169 | 0.256 |
| RSD | 0.310 | 0.337 | 0.447 | 0.362 | 0.322 | 0.469 | 0.460 | 0.262 | 0.394 | 0.426 | 0.364 | 0.508 | 0.256 | 0.224 | 0.340 |
| SMV | 0.311 | 0.271 | 0.357 | 0.375 | 0.269 | 0.408 | 0.495 | 0.289 | 0.440 | 0.450 | 0.391 | 0.539 | 0.252 | 0.192 | 0.278 |
| NQC | 0.315 | 0.272 | 0.358 | 0.384 | 0.288 | 0.417 | 0.466 | 0.267 | 0.399 | 0.464 | 0.294 | 0.423 | 0.271 | 0.201 | 0.292 |
| $UEF_{NQC}$ | 0.316 | 0.303 | 0.398 | 0.359 | 0.319 | 0.463 | 0.507 | 0.293 | 0.432 | 0.511 | 0.347 | 0.476 | 0.272 | 0.223 | 0.327 |
| NQA-QPP | 0.451 | 0.364 | 0.475 | 0.386 | 0.297 | 0.418 | 0.348 | 0.164 | 0.255 | 0.507 | 0.347 | 0.496 | 0.258 | 0.185 | 0.265 |
| BERT-QPP | 0.517 | 0.400 | 0.520 | 0.404 | 0.345 | 0.472 | 0.491 | 0.289 | 0.412 | 0.467 | 0.364 | 0.448 | 0.262 | 0.237 | 0.340 |
| qpp-BERT-PL | 0.520 | 0.413 | 0.522 | 0.330 | 0.266 | 0.390 | 0.432 | 0.258 | 0.361 | 0.427 | 0.280 | 0.392 | 0.247 | 0.172 | 0.292 |
| qpp-PRP | 0.302 | 0.311 | 0.412 | 0.090 | 0.061 | 0.063 | 0.321 | 0.181 | 0.229 | 0.189 | 0.157 | 0.229 | 0.027 | 0.004 | 0.015 |
| QSD_QPPpost | 0.555 | 0.421 | 0.544 | 0.434 | 0.412 | 0.508 | 0.519 | 0.318 | 0.459 | 0.462 | 0.318 | 0.448 | 0.322 | 0.266 | 0.359 |

Usage

In this section, we provide instructions on how to use the code to predict the performance of a set of target queries.

First, clone the repository:

git clone https://github.com/sadjadeb/QSD_QPP.git

Then, create a virtual environment and install the requirements:

cd QSD_QPP/
sudo apt-get install virtualenv
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Download the dataset

You need the MSMARCO dataset to run the code. You can download the dataset from here and here. After downloading it, extract the files and place them in the data directory (a small extraction sketch follows the list). Here is the list of the files you need to put in the data directory:

  • collection.tsv

  • queries.train.tsv

  • qrels.train.tsv

  • top1000.train.tar.gz

  • queries.dev.small.tsv

  • qrels.dev.small.tsv

  • top1000.dev.tar.gz

  • msmarco-test2019-queries.tsv

  • 2019qrels-pass.txt

  • msmarco-passagetest2019-top1000.tsv

  • msmarco-test2020-queries.tsv

  • 2020qrels-pass.txt

  • msmarco-passagetest2020-top1000.tsv

  • dl_hard-passage.qrels

  • topics.tsv (from DL-Hard)

  • bm25.run (from DL-Hard)
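The two top1000 archives must be unpacked inside data/; a minimal Python equivalent of `tar -xzf <archive> -C data/` is shown below:

```python
# Sketch: unpack the MS MARCO top1000 archives into the data/ directory.
import tarfile
from pathlib import Path

data_dir = Path("data")
for archive in ["top1000.train.tar.gz", "top1000.dev.tar.gz"]:
    with tarfile.open(data_dir / archive, "r:gz") as tar:
        tar.extractall(path=data_dir)  # leaves the extracted .tsv files in data/
```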

Prepare the data

To create a dictionary that maps each query to its actual performance under BM25 (i.e., MRR@10), run the following command:

python extract_metrics_per_query.py --run /path/to/run/file --qrels /path/to/qrels/file

It will create a file named run-file-name_evaluation-per-query.json in the data/eval_per_query directory.
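The file is plain JSON, so it can be consumed as an ordinary mapping from query id to score; the file name and query id below are hypothetical examples:

```python
# Sketch: loading the per-query evaluation file produced above (names are examples).
import json

with open("data/eval_per_query/run-file-name_evaluation-per-query.json") as f:
    perf_per_query = json.load(f)

print(perf_per_query["1048585"])  # hypothetical query id -> its MRR@10
```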

Then you need to create a file that contains, for each target query, the most similar queries from the train set (a.k.a. historical queries with known retrieval effectiveness). To do so, run the following command:

python find_most_similar_query.py --base_queries /path/to/train-set/queries --target_queries /path/to/desired/queries --model_name /name/of/the/language/model --hits /number/of/hits

Finally, to gather all the data into a single file so it is easier to load, run the following commands:

python post/create_train_pkl_file.py
python post/create_test_pkl_file.py

Inference by the QSD_QPPpre model

In order to predict the performance of a set of target queries, you can follow the process below:

1- First, calculate the performance of the QuerySpace queries using QuerySpacePerformanceCalculator.py. This script receives a set of queries and calculates their performance (i.e., NDCG@10) through the Anserini toolkit.

python QuerySpacePerformanceCalculator.py \
     -queries path to queries (TSV format) \
     -anserini path to Anserini \
     -index path to collection index \
     -qrels path to qrels \
     -nproc number of CPUs \
     -experiment_dir experiment folder \
     -queries_chunk_size chunk size used to split the queries \
     -hits number of documents to retrieve per query when calculating performance

2- To find the most similar queries from the QuerySpace and retrieve them during inference, we first need to index the QuerySpace queries. This can be done using encode_queries.py as below:

python encode_queries.py \
     -model model to create embeddings with (e.g. sentence-transformers/all-MiniLM-L6-v2) \
     -queries path to queries to index (TSV format) \
     -output path to output folder

3- During inference, we can find the top-k most similar queries to a set of target queries from the QuerySpace using the find_most_similar_query.py script as below:

python find_most_similar_query.py \
     -model model to create embeddings with for target queries (e.g. sentence-transformers/all-MiniLM-L6-v2) \
     -faiss_index path to the index of QuerySpace queries \
     -target_queries_path path to target queries \
     -hits number of top-k most similar queries to select
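Taken together, steps 2 and 3 amount to building a FAISS index over the QuerySpace query embeddings and searching it at inference time. The sketch below illustrates the idea, assuming sentence-transformers and faiss are installed; the query texts are placeholders:

```python
# Sketch of steps 2-3: index QuerySpace query embeddings, then search them.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
space_queries = ["what is a query", "how are documents ranked"]  # QuerySpace queries
emb = model.encode(space_queries, convert_to_numpy=True)

index = faiss.IndexFlatL2(emb.shape[1])  # exact L2 index over the embeddings
index.add(emb)

# at inference time: retrieve the top-k nearest QuerySpace queries (here k=2)
target_emb = model.encode(["example target query"], convert_to_numpy=True)
dists, ids = index.search(target_emb, 2)
```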

4- Finally, having the top-k queries within the subspace for each target query, we can estimate its performance by averaging the performance of the subspace queries using inference.py as follows:

python inference.py \
     -top_matched_queries path to top-k matched queries from the QuerySpace for the target queries \
     -QuerySpace_queries path to QuerySpace queries (TSV format) \
     -QuerySpace_queries_performance path to the pickle file containing the retrieval effectiveness of QuerySpace queries \
     -output path to output

Train and Test the QSD_QPPpost model

Training

To train the model, you need to run the following command:

python post/train.py

You can change the model's hyperparameters by editing lines 9-12 of the train.py file.

Testing

To test the model, you need to run the following command:

python post/test.py

Evaluation

To evaluate the model, you need to run the following command:

python evaluation.py --actual /path/to/actual/performance/file --predicted /path/to/predicted/performance/file --target_metric /target/metric

Citation

If you use this code, please cite our paper:

@inproceedings{ebrahimi2024estimating,
    title = {Estimating Query Performance Through Rich Contextualized Query Representations},
    author = {Ebrahimi, Sajad and Khodabakhsh, Maryam and Arabzadeh, Negar and Bagheri, Ebrahim},
    year = {2024},
    month = {03},
    pages = {49-58},
    booktitle={European Conference on Information Retrieval},
    organization={Springer},
    isbn = {978-3-031-56065-1},
    doi = {10.1007/978-3-031-56066-8_6}
}
