Skip to content

Flacky gpu filterable test#8519

Merged
IvanPleshkov merged 4 commits into
devfrom
flacky-gpu-filterable-test
Apr 7, 2026
Merged

Flacky gpu filterable test#8519
IvanPleshkov merged 4 commits into
devfrom
flacky-gpu-filterable-test

Conversation

@generall

Copy link
Copy Markdown
Member

Observation:

  • Test works on CI, but fails on my machine with low precision

Changes:

  • Measure 4 accuracies: gpu/cpu + with and without index
  • limit gpu_groups, as it seems GPU with high parallelism tend to drop precision on low-volume data

@generall generall requested a review from IvanPleshkov March 26, 2026 21:09
@coderabbitai

coderabbitai Bot commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 53c849f1-f807-4b05-97a0-ef19a4972e9b

📥 Commits

Reviewing files that changed from the base of the PR and between 71c46fc and 720a83d.

📒 Files selected for processing (1)
  • lib/segment/tests/integration/gpu_hnsw_test.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/tests/integration/gpu_hnsw_test.rs

📝 Walkthrough

Walkthrough

This change refactors the GPU HNSW test in lib/segment/tests/integration/gpu_hnsw_test.rs by replacing an environment logger-based verification approach with a structured accuracy measurement framework. The refactoring introduces helper functions to construct HNSW indices and test segments, adds a measurement function that compares search results against precomputed ground-truth values, and establishes a comparative testing pattern across four index variants (CPU and GPU, each with and without payload indices). The test validates that GPU accuracy approaches CPU accuracy and that payload indexing improves results in both cases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • IvanPleshkov
  • xzfc
  • timvisee
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Flacky gpu filterable test' is directly related to the changeset, which refactors a GPU filterable HNSW test to use an accuracy-measurement framework instead of log-based verification.
Description check ✅ Passed The description accurately explains the observed issue (test passes on CI but fails locally with low precision) and the solution (measure four accuracy combinations and limit GPU groups), which aligns with the substantial refactoring of the test to use accuracy-based assertions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch flacky-gpu-filterable-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
lib/segment/tests/integration/gpu_hnsw_test.rs (1)

206-206: Reset gpu_groups after the test.

set_gpu_groups_count changes process-wide state. Leaving it at Some(64) makes later GPU tests order-dependent if they share the same test binary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/segment/tests/integration/gpu_hnsw_test.rs` at line 206, This test sets
global process state via set_gpu_groups_count(Some(gpu_groups)); to avoid
leaking that change to other tests, restore the previous state at the end of the
test (or use a scope guard) by calling set_gpu_groups_count(None) (or restoring
the original value) — ensure the cleanup runs even on panic/failure (e.g., using
RAII/Drop or a test teardown helper) so gpu_groups does not remain set for
subsequent tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/segment/tests/integration/gpu_hnsw_test.rs`:
- Around line 150-153: The test must assert that the GPU HNSW path was actually
used instead of only comparing accuracy: after configuring full_scan_threshold
and performing the search in the GPU HNSW tests, add an explicit assertion that
the GPU/indexed path was exercised (for example, check the runtime indicator
exposed by your codebase such as a Segment/Index method or metric like
is_on_gpu(), used_gpu() or engine.metrics().gpu_search_count > 0); modify the
test cases that use full_scan_threshold (the ones around the shown snippet and
the other ranges called out) to either force the indexed path (adjust
full_scan_threshold appropriately) and then assert the GPU flag/metric is true
so a silent CPU fallback will fail the test.
- Around line 300-309: The assertions currently allow the indexed case to be
worse by up to min_diff; change the comparisons so the indexed hit counts must
exceed the non-indexed ones by min_diff. Replace the check using hits_cpu_idx +
min_diff >= hits_cpu_no_idx with hits_cpu_idx >= hits_cpu_no_idx + min_diff (and
similarly replace hits_gpu_idx + min_diff >= hits_gpu_no_idx with hits_gpu_idx
>= hits_gpu_no_idx + min_diff), and update the assertion messages to reflect the
new ordering (e.g., "Payload index should improve accuracy:
cpu_idx={hits_cpu_idx} < cpu_no_idx={hits_cpu_no_idx} + {min_diff}" adjusted
accordingly).
- Around line 82-97: The test currently compares full ScoredPointOffset lists
for equality; instead extract the returned point identifiers (e.g., map each
ScoredPointOffset to its offset/id) from hnsw_index.search(...) and compare only
IDs to the ground-truth IDs for each query (use a HashSet or sort-and-compare of
IDs) so small score or order differences don’t fail the test; update the closure
in the iterator that calls hnsw_index.search to build a set/vector of result IDs
and compare that to the expected ground_truth IDs (or compute
intersection/recall) rather than comparing ScoredPointOffset structs directly.

---

Nitpick comments:
In `@lib/segment/tests/integration/gpu_hnsw_test.rs`:
- Line 206: This test sets global process state via
set_gpu_groups_count(Some(gpu_groups)); to avoid leaking that change to other
tests, restore the previous state at the end of the test (or use a scope guard)
by calling set_gpu_groups_count(None) (or restoring the original value) — ensure
the cleanup runs even on panic/failure (e.g., using RAII/Drop or a test teardown
helper) so gpu_groups does not remain set for subsequent tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 30c985d0-6edd-4d2f-94cb-571d2a0ef51f

📥 Commits

Reviewing files that changed from the base of the PR and between fb704e5 and 71c46fc.

📒 Files selected for processing (1)
  • lib/segment/tests/integration/gpu_hnsw_test.rs

Comment thread lib/segment/tests/integration/gpu_hnsw_test.rs
Comment on lines +150 to +153
let _ = env_logger::builder()
.is_test(true)
.filter_level(log::LevelFilter::Trace)
.try_init();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep a direct assertion that the GPU HNSW path was exercised.

The previous test verified GPU usage explicitly, but this version only compares accuracy. With full_scan_threshold this low, the indexed cases can pass via the plain-search path, and a silent CPU fallback in the GPU builds would also still satisfy these assertions.

Also applies to: 162-165, 242-252, 277-285

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/segment/tests/integration/gpu_hnsw_test.rs` around lines 150 - 153, The
test must assert that the GPU HNSW path was actually used instead of only
comparing accuracy: after configuring full_scan_threshold and performing the
search in the GPU HNSW tests, add an explicit assertion that the GPU/indexed
path was exercised (for example, check the runtime indicator exposed by your
codebase such as a Segment/Index method or metric like is_on_gpu(), used_gpu()
or engine.metrics().gpu_search_count > 0); modify the test cases that use
full_scan_threshold (the ones around the shown snippet and the other ranges
called out) to either force the indexed path (adjust full_scan_threshold
appropriately) and then assert the GPU flag/metric is true so a silent CPU
fallback will fail the test.

Comment thread lib/segment/tests/integration/gpu_hnsw_test.rs
@IvanPleshkov IvanPleshkov merged commit fbe9781 into dev Apr 7, 2026
17 checks passed
@IvanPleshkov IvanPleshkov deleted the flacky-gpu-filterable-test branch April 7, 2026 22:38
timvisee pushed a commit that referenced this pull request May 8, 2026
* measure filterable hnsw indexed vs unindexs + cpu vs gpu

* measure filterable hnsw indexed vs unindexs + cpu vs gpu

* limit GPU parallelism

* address review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants