Applied ML Semantic Search Exercise

Grainger, a leading supplier of MRO products, is conducting a technical exercise for ML Engineer candidates to build a semantic search application using a dataset from Amazon. Candidates are required to create a sample dataset, develop a vector index, and assess search performance metrics. The exercise must be completed within a week, and candidates should be prepared to discuss their methodologies and design decisions during a follow-up interview.

Overview:

Grainger is North America's leading broad line supplier of maintenance, repair
and operating (MRO) products. For nearly 100 years, we have helped
customers access useful information to find the products they need to get
their jobs done.

Market:

Our customer base is diverse. Every business in the US buys the types of
products that Grainger sells, and for that reason Grainger sells into every
industrial segment, from companies developing new uses for
nanotechnology to companies involved in anthracite mining.

Business:

Our success is based on our expertise. It’s our ability to understand the
customer, the products they need, the services they require and the channel
in which they prefer to interact with us that has helped Grainger achieve our
financial strength. We have a proud history of being an early adopter and
innovator of technology, and we’re really excited about the road ahead.

Exercise

This exercise is intended to serve as the technical component of the Grainger
ML Engineer interview process. Based on your performance on this exercise,
you, as the candidate, may be invited to explain and explore your line of
thinking, discuss your approach to the problem, and explain the analysis you
did leading up to the model-building phase and the steps you took to reach a
solution. As in any Machine Learning problem, there are no right or wrong
answers; there are only iteratively better ones.

Low Sensitivity
Some pointers:

- Be ready to explain why you took a certain approach in the case review
  round that follows.
- Do your best to write explainable, modular code.
- Feel free to make assumptions, but be ready to back them up with
  reasoning.
- Better presentation of your results leads to more productive case
  reviews.
- Think of ways to improve your methodologies and be prepared to talk
  about them on the subsequent call.

Problem Statement

Use Amazon’s esci-data to generate a dataset of products and search
queries. The features and volume of the product data are comparable to those
we deal with at Grainger. For this exercise, we are looking for
candidates to build a basic semantic search application and to report the
quality of the solution.

Below are the details for the task:

1. Select the training dataset applicable to Task 1 (Query-Product
Ranking), with the 'us' product locale and the 'E' esci_label.
2. Create a sample dataset of approximately 500 rows covering
around 50 unique queries from the dataset derived in point 1. If this
doesn't yield the desired dataset, you may use the following steps to
generate the sample dataset:
a. Draw a random sample of 50 unique queries from the
dataset derived in point 1.
b. Filter the dataset derived in point 1 to contain only
the queries sampled in point 2.a.
c. Create a sample dataset of 500 rows from the dataset derived
in point 2.b.

3. The goal of this project is to create a vector index for the product
dataset derived in point 2 (i.e., the columns whose names start with the
prefix “product_”) and to assess the quality of that index against the
search queries provided.
4. A solution to this problem will require a vector index, an embedding
function, and quantified metrics about search performance. If
beneficial, the solution may also include some secondary ranking
logic.
5. Choose an external persistent vector embedding storage option
(e.g., LanceDB, Milvus Lite) if in-memory storage is unsuitable.
6. Metrics of particular interest for product search are HITS@N
(N = 1, 5, 10) and Mean Reciprocal Rank (MRR).
7. To accomplish the goal of the project, a candidate will likely have to
iterate over different embedding approaches or indices to improve
search performance.
8. The data science team at Grainger has benchmarked a solution to
this assignment using a typical MacBook (16 GB memory);
alternatively, feel free to use Google Colab.
9. Candidates are expected to provide the following:
a. Dataset derived in point number 2.
b. Repository of working code or a notebook that fulfils the above
tasks with adequate documentation to explain any design
decisions they took. Verifying that the code as submitted can
be run is a requirement of this exercise.
c. Specify and justify any assumptions you are making about the
data or design decisions.
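Steps 1 and 2 can be sketched in plain Python as below. This is a minimal sketch, not a reference solution: it assumes the esci-data examples have already been loaded and joined to the products table (for instance with pandas `merge` on `product_id` and `product_locale`) into a list of dicts, and that the column names `small_version`, `split`, `product_locale`, `esci_label` and `query` match the files you download, so check them against the dataset README. The function names `filter_task1` and `sample_dataset` and the fixed seed are our own, hypothetical choices.

```python
import random


def filter_task1(rows):
    """Step 1: keep Task 1 training rows for the 'us' locale labelled 'E' (Exact).

    Assumes each row is a dict using the esci-data column names.
    """
    return [
        r for r in rows
        if r["small_version"] == 1      # Task 1 uses the reduced ("small") version
        and r["split"] == "train"       # training split only
        and r["product_locale"] == "us"
        and r["esci_label"] == "E"
    ]


def sample_dataset(rows, n_queries=50, n_rows=500, seed=42):
    """Step 2: sample queries, keep only their rows, then sample rows."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # 2.a: sort the unique queries first so the draw does not depend on set ordering
    queries = sorted({r["query"] for r in rows})
    chosen = set(rng.sample(queries, min(n_queries, len(queries))))
    # 2.b: filter to the chosen queries
    subset = [r for r in rows if r["query"] in chosen]
    # 2.c: draw the final rows
    return rng.sample(subset, min(n_rows, len(subset)))
```

Sampling 500 rows in step 2.c can leave some of the 50 queries with few or no rows. Fixing the seed at least makes the sample reproducible, which helps when submitting the dataset for point 9.a.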
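Steps 3 and 4 need an embedding function and a vector index. The sketch below shows only the shape of that interface, under loud assumptions: `embed` is a hypothetical hashed bag-of-words placeholder standing in for whichever sentence-embedding model you evaluate (it captures word overlap, not semantics), and `BruteForceIndex` is an exact in-memory cosine search that could serve as a baseline before moving to a persistent store such as LanceDB or Milvus Lite. The set of `product_*` fields concatenated by `product_text` is also an assumption.

```python
import math
import re
import zlib

DIM = 256  # hypothetical vector width for this toy embedding


def embed(text, dim=DIM):
    """Placeholder embedding: L2-normalised hashed bag of words.

    Stands in for a real sentence-embedding model; it only captures word overlap.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def product_text(row):
    """Concatenate whichever of the assumed product_* text fields are present."""
    fields = ("product_title", "product_brand", "product_color",
              "product_bullet_point", "product_description")
    return " ".join(str(row[f]) for f in fields if row.get(f))


class BruteForceIndex:
    """Exact cosine-similarity search over unit-length vectors, held in memory."""

    def __init__(self):
        self.ids = []
        self.vectors = []
        self._seen = set()

    def add(self, product_id, vector):
        # The sample can list the same product under several queries; index it once.
        if product_id in self._seen:
            return
        self._seen.add(product_id)
        self.ids.append(product_id)
        self.vectors.append(vector)

    def search(self, query_vector, k=10):
        """Return the top-k (product_id, score) pairs, best first."""
        scores = [sum(a * b for a, b in zip(query_vector, v)) for v in self.vectors]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(self.ids[i], scores[i]) for i in order[:k]]
```

Because the vectors are unit length, the dot product equals cosine similarity. Brute force scales linearly with the number of products, which is acceptable for a ~500-row sample and gives an exact reference to compare approximate indices against.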
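Step 4 leaves secondary ranking logic optional. One simple option, our choice rather than anything the brief prescribes, is Reciprocal Rank Fusion (RRF), which merges several ranked lists by summing 1 / (k + rank) for each product. The sketch assumes you already have a vector-search ranking and some second ranking (for example a hypothetical lexical BM25 ranking over product titles); k = 60 is a commonly used default, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of product ids with Reciprocal Rank Fusion.

    Every list adds 1 / (k + rank) to each id it contains (rank is 1-based);
    ids are returned by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, product_id in enumerate(ranking, start=1):
            scores[product_id] += 1.0 / (k + rank)
    # sorted() is stable even with reverse=True, so ties keep first-seen order
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)
```

RRF only uses ranks, so it avoids calibrating cosine scores against lexical scores; whether it actually improves HITS@N or MRR on this sample has to be measured.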
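Step 6 asks for HITS@N and MRR. A minimal sketch, assuming HITS@N means the fraction of queries with at least one relevant ('E'-labelled) product in the top N results (some teams define it per relevant item instead), and that MRR uses the rank of the first relevant product, scoring 0 when none is retrieved. The names `evaluate` and `first_relevant_rank` and the dict-based inputs are hypothetical.

```python
def first_relevant_rank(ranked_ids, relevant_ids):
    """1-based rank of the first relevant product, or None if none was retrieved."""
    for rank, product_id in enumerate(ranked_ids, start=1):
        if product_id in relevant_ids:
            return rank
    return None


def evaluate(results, qrels, ns=(1, 5, 10)):
    """Average HITS@N and MRR over all judged queries.

    results: query -> ranked list of retrieved product ids (best first)
    qrels:   query -> set of relevant ('E'-labelled) product ids
    """
    if not qrels:
        raise ValueError("qrels must contain at least one query")
    hits = {n: 0 for n in ns}
    reciprocal_rank_sum = 0.0
    for query, relevant in qrels.items():
        rank = first_relevant_rank(results.get(query, []), relevant)
        if rank is None:
            continue  # a miss contributes 0 to every metric
        reciprocal_rank_sum += 1.0 / rank
        for n in ns:
            if rank <= n:
                hits[n] += 1
    metrics = {f"hits@{n}": hits[n] / len(qrels) for n in ns}
    metrics["mrr"] = reciprocal_rank_sum / len(qrels)
    return metrics
```

MRR computed this way is implicitly truncated at the number of results retrieved, so report the cut-off (for example MRR@10) alongside the numbers.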

Time/Duration

You have a week to complete the exercise and get back to us with a solution.
We expect that a reasonable submission should take between 2 and 5 hours.

Should you have any questions about the interview, role, company benefits,
or anything else, please feel free to chat with your assigned recruiter, who
will also work with you to schedule the interview.

Thanks for making time for us and the effort that you are putting in to help us
understand your qualifications/expertise/credentials more clearly. We look
forward to meeting you in person.
