Flacky gpu filterable test by generall · Pull Request #8519 · qdrant/qdrant

generall · 2026-03-26T21:09:39Z

Observation:

Test works on CI, but fails on my machine with low precision

Changes:

Measure 4 accuracies: gpu/cpu + with and without index
limit gpu_groups, as it seems GPU with high parallelism tend to drop precision on low-volume data

coderabbitai · 2026-03-26T21:18:02Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 53c849f1-f807-4b05-97a0-ef19a4972e9b

📥 Commits

Reviewing files that changed from the base of the PR and between 71c46fc and 720a83d.

📒 Files selected for processing (1)

lib/segment/tests/integration/gpu_hnsw_test.rs

🚧 Files skipped from review as they are similar to previous changes (1)

lib/segment/tests/integration/gpu_hnsw_test.rs

📝 Walkthrough

Walkthrough

This change refactors the GPU HNSW test in lib/segment/tests/integration/gpu_hnsw_test.rs by replacing an environment logger-based verification approach with a structured accuracy measurement framework. The refactoring introduces helper functions to construct HNSW indices and test segments, adds a measurement function that compares search results against precomputed ground-truth values, and establishes a comparative testing pattern across four index variants (CPU and GPU, each with and without payload indices). The test validates that GPU accuracy approaches CPU accuracy and that payload indexing improves results in both cases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

IvanPleshkov
xzfc
timvisee

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Flacky gpu filterable test' is directly related to the changeset, which refactors a GPU filterable HNSW test to use an accuracy-measurement framework instead of log-based verification.
Description check	✅ Passed	The description accurately explains the observed issue (test passes on CI but fails locally with low precision) and the solution (measure four accuracy combinations and limit GPU groups), which aligns with the substantial refactoring of the test to use accuracy-based assertions.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch flacky-gpu-filterable-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

lib/segment/tests/integration/gpu_hnsw_test.rs (1)
206-206: Reset gpu_groups after the test.

set_gpu_groups_count changes process-wide state. Leaving it at Some(64) makes later GPU tests order-dependent if they share the same test binary.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/segment/tests/integration/gpu_hnsw_test.rs` at line 206, This test sets
global process state via set_gpu_groups_count(Some(gpu_groups)); to avoid
leaking that change to other tests, restore the previous state at the end of the
test (or use a scope guard) by calling set_gpu_groups_count(None) (or restoring
the original value) — ensure the cleanup runs even on panic/failure (e.g., using
RAII/Drop or a test teardown helper) so gpu_groups does not remain set for
subsequent tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/segment/tests/integration/gpu_hnsw_test.rs`:
- Around line 150-153: The test must assert that the GPU HNSW path was actually
used instead of only comparing accuracy: after configuring full_scan_threshold
and performing the search in the GPU HNSW tests, add an explicit assertion that
the GPU/indexed path was exercised (for example, check the runtime indicator
exposed by your codebase such as a Segment/Index method or metric like
is_on_gpu(), used_gpu() or engine.metrics().gpu_search_count > 0); modify the
test cases that use full_scan_threshold (the ones around the shown snippet and
the other ranges called out) to either force the indexed path (adjust
full_scan_threshold appropriately) and then assert the GPU flag/metric is true
so a silent CPU fallback will fail the test.
- Around line 300-309: The assertions currently allow the indexed case to be
worse by up to min_diff; change the comparisons so the indexed hit counts must
exceed the non-indexed ones by min_diff. Replace the check using hits_cpu_idx +
min_diff >= hits_cpu_no_idx with hits_cpu_idx >= hits_cpu_no_idx + min_diff (and
similarly replace hits_gpu_idx + min_diff >= hits_gpu_no_idx with hits_gpu_idx
>= hits_gpu_no_idx + min_diff), and update the assertion messages to reflect the
new ordering (e.g., "Payload index should improve accuracy:
cpu_idx={hits_cpu_idx} < cpu_no_idx={hits_cpu_no_idx} + {min_diff}" adjusted
accordingly).
- Around line 82-97: The test currently compares full ScoredPointOffset lists
for equality; instead extract the returned point identifiers (e.g., map each
ScoredPointOffset to its offset/id) from hnsw_index.search(...) and compare only
IDs to the ground-truth IDs for each query (use a HashSet or sort-and-compare of
IDs) so small score or order differences don’t fail the test; update the closure
in the iterator that calls hnsw_index.search to build a set/vector of result IDs
and compare that to the expected ground_truth IDs (or compute
intersection/recall) rather than comparing ScoredPointOffset structs directly.

---

Nitpick comments:
In `@lib/segment/tests/integration/gpu_hnsw_test.rs`:
- Line 206: This test sets global process state via
set_gpu_groups_count(Some(gpu_groups)); to avoid leaking that change to other
tests, restore the previous state at the end of the test (or use a scope guard)
by calling set_gpu_groups_count(None) (or restoring the original value) — ensure
the cleanup runs even on panic/failure (e.g., using RAII/Drop or a test teardown
helper) so gpu_groups does not remain set for subsequent tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 30c985d0-6edd-4d2f-94cb-571d2a0ef51f

📥 Commits

Reviewing files that changed from the base of the PR and between fb704e5 and 71c46fc.

📒 Files selected for processing (1)

lib/segment/tests/integration/gpu_hnsw_test.rs

coderabbitai · 2026-03-26T21:18:05Z

+    let _ = env_logger::builder()
+        .is_test(true)
+        .filter_level(log::LevelFilter::Trace)
+        .try_init();


⚠️ Potential issue | 🟠 Major

Keep a direct assertion that the GPU HNSW path was exercised.

The previous test verified GPU usage explicitly, but this version only compares accuracy. With full_scan_threshold this low, the indexed cases can pass via the plain-search path, and a silent CPU fallback in the GPU builds would also still satisfy these assertions.

Also applies to: 162-165, 242-252, 277-285

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@lib/segment/tests/integration/gpu_hnsw_test.rs` around lines 150 - 153, The test must assert that the GPU HNSW path was actually used instead of only comparing accuracy: after configuring full_scan_threshold and performing the search in the GPU HNSW tests, add an explicit assertion that the GPU/indexed path was exercised (for example, check the runtime indicator exposed by your codebase such as a Segment/Index method or metric like is_on_gpu(), used_gpu() or engine.metrics().gpu_search_count > 0); modify the test cases that use full_scan_threshold (the ones around the shown snippet and the other ranges called out) to either force the indexed path (adjust full_scan_threshold appropriately) and then assert the GPU flag/metric is true so a silent CPU fallback will fail the test.

* measure filterable hnsw indexed vs unindexs + cpu vs gpu * measure filterable hnsw indexed vs unindexs + cpu vs gpu * limit GPU parallelism * address review

generall added 3 commits March 26, 2026 21:49

measure filterable hnsw indexed vs unindexs + cpu vs gpu

aaaf0e1

measure filterable hnsw indexed vs unindexs + cpu vs gpu

de0977a

limit GPU parallelism

71c46fc

generall requested a review from IvanPleshkov March 26, 2026 21:09

coderabbitai Bot reviewed Mar 26, 2026

View reviewed changes

address review

720a83d

IvanPleshkov approved these changes Apr 7, 2026

View reviewed changes

IvanPleshkov merged commit fbe9781 into dev Apr 7, 2026
17 checks passed

IvanPleshkov deleted the flacky-gpu-filterable-test branch April 7, 2026 22:38

coderabbitai Bot mentioned this pull request Apr 28, 2026

Allow GPU payload HNSW builds without main graph #8832

Merged

timvisee pushed a commit that referenced this pull request May 8, 2026

Flacky gpu filterable test (#8519)

3ae5985

* measure filterable hnsw indexed vs unindexs + cpu vs gpu * measure filterable hnsw indexed vs unindexs + cpu vs gpu * limit GPU parallelism * address review

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flacky gpu filterable test#8519

Flacky gpu filterable test#8519
IvanPleshkov merged 4 commits into
devfrom
flacky-gpu-filterable-test

generall commented Mar 26, 2026

Uh oh!

coderabbitai Bot commented Mar 26, 2026 •

edited

Loading

Walkthrough

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot Mar 26, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

generall commented Mar 26, 2026

Uh oh!

coderabbitai Bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Mar 26, 2026 •

edited

Loading