Flacky gpu filterable test#8519
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis change refactors the GPU HNSW test in Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
lib/segment/tests/integration/gpu_hnsw_test.rs (1)
206-206: Resetgpu_groupsafter the test.
set_gpu_groups_countchanges process-wide state. Leaving it atSome(64)makes later GPU tests order-dependent if they share the same test binary.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/segment/tests/integration/gpu_hnsw_test.rs` at line 206, This test sets global process state via set_gpu_groups_count(Some(gpu_groups)); to avoid leaking that change to other tests, restore the previous state at the end of the test (or use a scope guard) by calling set_gpu_groups_count(None) (or restoring the original value) — ensure the cleanup runs even on panic/failure (e.g., using RAII/Drop or a test teardown helper) so gpu_groups does not remain set for subsequent tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/segment/tests/integration/gpu_hnsw_test.rs`:
- Around line 150-153: The test must assert that the GPU HNSW path was actually
used instead of only comparing accuracy: after configuring full_scan_threshold
and performing the search in the GPU HNSW tests, add an explicit assertion that
the GPU/indexed path was exercised (for example, check the runtime indicator
exposed by your codebase such as a Segment/Index method or metric like
is_on_gpu(), used_gpu() or engine.metrics().gpu_search_count > 0); modify the
test cases that use full_scan_threshold (the ones around the shown snippet and
the other ranges called out) to either force the indexed path (adjust
full_scan_threshold appropriately) and then assert the GPU flag/metric is true
so a silent CPU fallback will fail the test.
- Around line 300-309: The assertions currently allow the indexed case to be
worse by up to min_diff; change the comparisons so the indexed hit counts must
exceed the non-indexed ones by min_diff. Replace the check using hits_cpu_idx +
min_diff >= hits_cpu_no_idx with hits_cpu_idx >= hits_cpu_no_idx + min_diff (and
similarly replace hits_gpu_idx + min_diff >= hits_gpu_no_idx with hits_gpu_idx
>= hits_gpu_no_idx + min_diff), and update the assertion messages to reflect the
new ordering (e.g., "Payload index should improve accuracy:
cpu_idx={hits_cpu_idx} < cpu_no_idx={hits_cpu_no_idx} + {min_diff}" adjusted
accordingly).
- Around line 82-97: The test currently compares full ScoredPointOffset lists
for equality; instead extract the returned point identifiers (e.g., map each
ScoredPointOffset to its offset/id) from hnsw_index.search(...) and compare only
IDs to the ground-truth IDs for each query (use a HashSet or sort-and-compare of
IDs) so small score or order differences don’t fail the test; update the closure
in the iterator that calls hnsw_index.search to build a set/vector of result IDs
and compare that to the expected ground_truth IDs (or compute
intersection/recall) rather than comparing ScoredPointOffset structs directly.
---
Nitpick comments:
In `@lib/segment/tests/integration/gpu_hnsw_test.rs`:
- Line 206: This test sets global process state via
set_gpu_groups_count(Some(gpu_groups)); to avoid leaking that change to other
tests, restore the previous state at the end of the test (or use a scope guard)
by calling set_gpu_groups_count(None) (or restoring the original value) — ensure
the cleanup runs even on panic/failure (e.g., using RAII/Drop or a test teardown
helper) so gpu_groups does not remain set for subsequent tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 30c985d0-6edd-4d2f-94cb-571d2a0ef51f
📒 Files selected for processing (1)
lib/segment/tests/integration/gpu_hnsw_test.rs
| let _ = env_logger::builder() | ||
| .is_test(true) | ||
| .filter_level(log::LevelFilter::Trace) | ||
| .try_init(); |
There was a problem hiding this comment.
Keep a direct assertion that the GPU HNSW path was exercised.
The previous test verified GPU usage explicitly, but this version only compares accuracy. With full_scan_threshold this low, the indexed cases can pass via the plain-search path, and a silent CPU fallback in the GPU builds would also still satisfy these assertions.
Also applies to: 162-165, 242-252, 277-285
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/segment/tests/integration/gpu_hnsw_test.rs` around lines 150 - 153, The
test must assert that the GPU HNSW path was actually used instead of only
comparing accuracy: after configuring full_scan_threshold and performing the
search in the GPU HNSW tests, add an explicit assertion that the GPU/indexed
path was exercised (for example, check the runtime indicator exposed by your
codebase such as a Segment/Index method or metric like is_on_gpu(), used_gpu()
or engine.metrics().gpu_search_count > 0); modify the test cases that use
full_scan_threshold (the ones around the shown snippet and the other ranges
called out) to either force the indexed path (adjust full_scan_threshold
appropriately) and then assert the GPU flag/metric is true so a silent CPU
fallback will fail the test.
* measure filterable hnsw indexed vs unindexs + cpu vs gpu * measure filterable hnsw indexed vs unindexs + cpu vs gpu * limit GPU parallelism * address review
Observation:
Changes:
gpu_groups, as it seems GPU with high parallelism tend to drop precision on low-volume data