Honors Programming
Assignment: Programming Assignment 2
You must earn 70/100 points to pass.
Deadline: Sep 10, 11:59 PM PDT
Programming Assignment 2: Search Engine Competition
This assignment is a search engine competition where you will freely experiment with
different retrieval methods to improve your score. You will be asked to submit your final
performance to us, and if you do well enough, you will be awarded the points. Throughout the
competition, if you want to know more about a certain class or function, you can use the
search toolbar in MeTA's documentation, which provides a brief explanation of the different
modules. You might also find some of MeTA's tutorials useful when experimenting with the
analyzer and writing new functions. If you have questions about the programming
assignment, use the Programming Assignments Forum. This is a great place to ask questions
and also help your fellow classmates.
Downloading the Assignment
Before downloading this assignment, make sure you have installed MeTA and run the
setup.sh script as detailed in Programming Assignment 1.
1. Download Assignment_2.tar.gz and extract it into the parent directory of meta. If meta is in
~/Desktop/, then you should extract the assignment in ~/Desktop/
2. In the terminal, change the directory to Assignment_2 and run the bash script setup.sh.
This can be done using the following two commands:
cd Assignment_2/
./setup.sh
which will move necessary data into correct locations.
3. Recompile MeTA:
cd Assignment_2/build/
cmake .. -DCMAKE_BUILD_TYPE=Release; make -j8
Competition
The competition involves optimizing your own search engine to search the MOOCs dataset.
We have provided you with 100 training queries accompanied by relevance judgments to
quantify the effectiveness of your search engine. After optimizing your search engine based
on the training queries, it will be evaluated by the automated grader based on another set of
538 testing queries. If your search engine is robust enough, you should expect the MAP value
you get on the training queries to be close to the MAP on the testing queries. You can have a
look at the training and testing queries in meta/data/moocs/moocs-queries.txt, but do not
modify the file.
We have created a program for this competition called competition.cpp located in
meta/src/index/tools/. Open competition.cpp, read the code within the main function along
with the comments, and try to understand what is being performed. You should focus on the
two while loops within main. The first loop passes over the 100 training queries and prints the
precision at 10 documents and the MAP. The second while loop passes over the 538 testing
queries and writes the IDs of the top 50 documents corresponding to each query to the
output file output.txt, which is located in Assignment_2/build/Assignment2/.
You are free to try to implement any concept that you feel can lead to better retrieval in the
MOOCs dataset, even if it was not discussed in class. We provide some pointers below that
will help you in achieving higher retrieval performance. The main techniques are
programming-based and require writing new functions, but we also provide pointers to some
techniques that do not require programming. You are highly encouraged to experiment with
the programming-based techniques first.
Programming-Based
Implement pseudo feedback through Rocchio's method. One simple way to do this is to write
a new function in competition.cpp that implements Rocchio feedback. The function should
take a query and a set of positive feedback documents as arguments and return a modified
query. Call the Rocchio function for each query and its top 10 initially retrieved documents.
Then, call the scoring function again on the modified query that was output by Rocchio. This
can be done as follows:
Replace:
auto ranking = ranker->score(*idx, query, 50);
with:
auto ranking = ranker->score(*idx, query, 10);
auto new_query = Rocchio(query, ranking); // You should implement this function
ranking = ranker->score(*idx, new_query, 50);
You should perform pseudo feedback on both the training and testing queries, which means
that you should replace "auto ranking = ranker->score(*idx, query, 50);" in the two while
loops. Also, note that this is only a suggestion. You can implement the feedback in other ways
and with a different number of positive feedback documents.
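
If you want a concrete starting point, below is a minimal, index-agnostic sketch of Rocchio feedback over bag-of-words term vectors (without the negative-feedback term). It assumes you can build a term-count map for the query and for each of the top retrieved documents, for example by analyzing their text with the same analyzer used at indexing time; the helper name rocchio and the weights alpha and beta are illustrative and not part of competition.cpp. To turn the result back into a MeTA query, you could repeat each term in proportion to its weight and pass the resulting string to query.content(...).

#include <string>
#include <unordered_map>
#include <vector>

using term_vector = std::unordered_map<std::string, double>;

// Rocchio pseudo-feedback sketch:
//   new_query = alpha * original_query + beta * centroid(feedback_docs)
term_vector rocchio(const term_vector& query,
                    const std::vector<term_vector>& feedback_docs,
                    double alpha = 1.0, double beta = 0.5)
{
    term_vector modified;

    // keep the original query terms, scaled by alpha
    for (const auto& term_weight : query)
        modified[term_weight.first] += alpha * term_weight.second;

    // add the centroid of the positive feedback documents, scaled by beta
    if (!feedback_docs.empty())
    {
        for (const auto& doc : feedback_docs)
            for (const auto& term_weight : doc)
                modified[term_weight.first]
                    += beta * term_weight.second / feedback_docs.size();
    }
    return modified;
}
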
Write a tuning function, similar to the one you saw in Programming Assignment 1, to optimize
your ranking function over a suitable set of parameters.
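
A tuning function can be as simple as a grid search that keeps the parameter value with the best MAP on the training queries. The sketch below assumes a hypothetical evaluate_map callback that rebuilds the ranker with the given value, runs the 100 training queries, and returns the resulting MAP; everything else is standard C++. For two parameters, nest a second loop or adapt the tuning approach you used in Assignment 1.

#include <functional>
#include <iostream>

// Grid-search sketch: try a range of parameter values and keep the best one.
// evaluate_map is a hypothetical callback that rebuilds the ranker with the
// given parameter, runs the training queries, and returns the MAP.
double tune_parameter(const std::function<double(double)>& evaluate_map,
                      double low, double high, double step)
{
    double best_param = low;
    double best_map = -1.0;
    for (double param = low; param <= high; param += step)
    {
        double map = evaluate_map(param);
        std::cout << "param = " << param << ", MAP = " << map << "\n";
        if (map > best_map)
        {
            best_map = map;
            best_param = param;
        }
    }
    return best_param;
}
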
Write a new scoring function, similar to the PL2 that you implemented in Assignment 1, that
returns a weighted combination of the scores of different retrieval formulas. For example, you
can implement something like:
Score(Q,D) = α·BM25(Q,D) + (1 − α)·DirichletPrior(Q,D), where 0 < α < 1
We have already provided you with a scaffold where you can implement your new function. At
the top of competition.cpp you can find a commented class called new_ranker along with
some functions. Uncomment the code and modify score_one to return the weighted
combination you choose. The code assumes that the ranking function has two parameters
named "param1" and "param2" and that the ranker can be called from config.toml using the
name "newranker". If your ranking function requires a different number of parameters, you
should modify the code accordingly. Also, feel free to change the names of the variables and
parameters; the provided code is there just to guide you through writing your new function.
After implementing the function, uncomment the first line in main:
index::register_ranker<new_ranker>();
Don't forget to point config.toml to your new ranker. For more information on how to
implement your new function, see the last section in MeTA's Search Tutorial and inspect the
code of ranking-experiment.cpp from Assignment 1.
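
To make the combination above concrete, here is a rough, self-contained sketch of the arithmetic a score_one implementation could perform, with alpha playing the role of param1 and the Dirichlet prior mu playing the role of param2. Inside new_ranker::score_one you would read these statistics from the score_data argument (term frequency in the document, document length, average document length, number of documents, document frequency, and the term's collection frequency); check MeTA's score_data header or the commented scaffold for the exact field names, since the plain-parameter form used here is only illustrative.

#include <cmath>

// Weighted combination of a BM25-style and a Dirichlet-prior-style term score:
//   score = alpha * BM25(term) + (1 - alpha) * Dirichlet(term)
double combined_term_score(double term_freq,          // c(w, D)
                           double doc_len,            // |D|
                           double avg_doc_len,        // average |D| in the corpus
                           double num_docs,           // N
                           double doc_freq,           // df(w)
                           double corpus_term_count,  // c(w, C)
                           double total_terms,        // total tokens in the corpus
                           double alpha,              // mixing weight (param1)
                           double mu)                 // Dirichlet prior (param2)
{
    const double k1 = 1.2, b = 0.75; // fixed BM25 constants for this sketch

    // BM25-style term score
    double idf = std::log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5));
    double tf = ((k1 + 1.0) * term_freq)
                / (k1 * ((1.0 - b) + b * doc_len / avg_doc_len) + term_freq);
    double bm25 = idf * tf;

    // Dirichlet-prior-style term score (per-term part of the query likelihood;
    // the document-length constant of Dirichlet smoothing is omitted here)
    double p_wc = corpus_term_count / total_terms;
    double dirichlet = std::log(1.0 + term_freq / (mu * p_wc));

    return alpha * bm25 + (1.0 - alpha) * dirichlet;
}

In config.toml you would then select the new ranker by its id (for example, a line like method = "newranker" under the ranker table, if your config follows MeTA's usual layout) and supply values for param1 and param2, assuming those are the names the scaffold uses.
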
If you want to further improve your ranking functions, you might want to have a look at
section 6 of this paper, which introduces several empirically-driven enhancements to some of
the well-known retrieval functions.
Write a function that expands queries with synonyms. Given a query, your function will use a
thesaurus to augment the query words with their synonyms. It is a good idea to give the
original query words more weight since the synonyms might cause a topic drift in some cases.
One crude way to do this is to duplicate the original query words several times and have each
synonym appear only once in the modified query. In competition.cpp, you should replace:
query.content(content);
with:
query.content(content);
query = expand_query(query); // You should write this function
Make sure to expand the queries in both while loops.
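
A crude expansion along these lines can work directly on the query string: repeat each original word a few times so it keeps most of the weight, and append each synonym once. The sketch below uses a plain std::unordered_map as the thesaurus, which you would load yourself; the function name expand_with_synonyms is illustrative, and you would apply it to the string before (or instead of) the call to query.content(content).

#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Crude synonym expansion: repeat each original word `repeat` times so it
// outweighs the added synonyms, then append each synonym once.
std::string expand_with_synonyms(
    const std::string& query_text,
    const std::unordered_map<std::string, std::vector<std::string>>& thesaurus,
    std::size_t repeat = 3)
{
    std::ostringstream expanded;
    std::istringstream words{query_text};
    std::string word;
    while (words >> word)
    {
        for (std::size_t i = 0; i < repeat; ++i)
            expanded << word << ' ';
        auto it = thesaurus.find(word);
        if (it != thesaurus.end())
            for (const auto& synonym : it->second)
                expanded << synonym << ' ';
    }
    return expanded.str();
}

For example, with a thesaurus entry mapping "car" to "automobile", the query "car insurance" becomes "car car car automobile insurance insurance insurance".
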
Non-Programming Based
Try different ranking functions and different parameters (ideally, you should tune the
parameters). MeTA has several built-in rankers other than BM25. For example, you can use
the Jelinek-Mercer smoothing ranking function by setting the ranker in config.toml to jelinek-
mercer and tuning its parameter lambda. You can check the different built-in rankers in
meta/src/index/ranker/. When you open the cpp file of the ranker, you should find a constant
string called id whose value you can use in config.toml, in addition to the parameters of the
ranking function which are defined in the constructor (see okapi_bm25.cpp and
jelinek_mercer.cpp to gain more insight).
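
For reference, switching rankers is just a change to the ranker table in config.toml. The snippet below shows the general shape, assuming the key is method as in MeTA's sample configuration; the id string and parameter names must match the ones defined in the ranker's cpp file, and the lambda value shown here is only a starting point, not a tuned one.

[ranker]
method = "jelinek-mercer"
lambda = 0.7
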
Index the MOOCs dataset again while experimenting with different tokenization parameters
under the analyzers tag in config.toml. You can try different combinations of text filters and
select the one that gives the best performance on the training queries. See MeTA's Analyzers
and Filters Tutorial for instructions on how to modify the default filtering behavior.
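
As a reminder of where this lives, tokenization is controlled by the analyzers table in config.toml. The snippet below shows a typical unigram-word setup with the default filter chain; the chain name and the available filter types differ between MeTA versions, so follow the Analyzers and Filters Tutorial for the exact syntax of a custom filter chain.

[[analyzers]]
method = "ngram-word"
ngram = 1
filter = "default-unigram-chain"
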
Modify MeTA's default stopwords list, which is located in meta/data/lemur-stopwords.txt.
You can extend or shrink the list and check if you can achieve higher performance. After
modifying the list, make sure to index the dataset again.
After you perform the optimizations, compile MeTA again and run the competition program.
You can do so by executing:
cd Assignment_2/build
cmake .. -DCMAKE_BUILD_TYPE=Release; make -j8
./competition ../config.toml
The results on the training queries, i.e., the precision at 10 documents and the MAP, will be
printed. When you are satisfied with your results, run the grader script:
python grader.py
which will generate a file called submit.txt.
Submit this file under the assignment's My submissions tab. Click + Create submission, and
upload your file.
Note: Always check the feedback from the automated grader on the submissions page after
you submit your output. If the nickname you chose is already in use, the grader will ask you to
submit your output again using a different nickname.
How to submit
When you're ready to submit, you can upload files for each part of the assignment on the "My
submissions" tab.