[score boosting] Value retrievers return multivalues#6195
Conversation
📝 WalkthroughWalkthroughThe pull request updates the logic for retrieving and processing payload values in the query optimization component. In the Suggested reviewers
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
lib/segment/src/types.rs (1)
1453-1456: Well implemented newget_value_clonedmethod.This method extends the
PayloadContainertrait with a convenient way to obtain owned values rather than references, efficiently transforming the collection by cloning each value. This aligns with the PR objective of enhancing value retrieval to support multiple values.Consider adding documentation comments to explain the purpose and behavior of this method:
+ /// Returns a collection of cloned values from the payload at the specified path. + /// + /// Unlike `get_value` which returns references, this method clones each value, + /// allowing for ownership of the values when needed. fn get_value_cloned(&self, path: &JsonPath) -> MultiValue<Value> { self.get_value(path).into_iter().cloned().collect() }lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)
246-253: Consider adding brief doc comments for clarity.This logic correctly returns
Nonewhen there are no values, returns a single value unwrapped, or wraps multiple values in an array. A short doc comment explaining that behavior (single vs. multiple) could make this more discoverable.fn get_payload_value(&self, json_path: &JsonPath, point_id: PointOffsetType) -> Option<Value> { + // Returns None if no values exist; + // returns the single value directly if exactly one value exists; + // returns an array of values if multiple exist. self.payload_retrievers .get(json_path) .and_then(|retriever| { let mut multi_value = retriever(point_id); match multi_value.len() { 0 => None, 1 => Some(multi_value.pop().unwrap()), _ => Some(Value::Array(multi_value.into_iter().collect())), } }) }lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (1)
77-95: Consider extracting flattening logic into a standalone helper.Flattening arrays for non-wildcard keys is correct, but factoring this out into a small utility (e.g.,
flatten_one_level) may keep the code succinct and reusable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(3 hunks)lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs(7 hunks)lib/segment/src/types.rs(1 hunks)
🔇 Additional comments (17)
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (17)
8-8: Import usage looks good.
15-15: Good update to return multiple values.
103-103: Doc comment is helpful.
109-116: Integer index retrieval logic looks good.
120-127: Integer map index retrieval logic looks good.
142-150: DateTime index handling looks good.
155-162: Keyword index retrieval logic is consistent.
166-173: Geo index retrieval logic is consistent.
178-184: Boolean index retrieval logic looks straightforward.
188-195: UUID map index retrieval logic is correct.
199-206: UUID index retrieval logic is correct.
221-221: New import is needed for JSON manipulations.
223-223: Confirmed usage of MultiValue import.
292-294: Tests for numeric arrays in payload are accurate.
310-312: Tests for geo fields in payload are accurate.
363-365: Index-based numeric array tests look correct.
380-382: Index-based geo field tests look correct.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (1)
136-137: Consider adding documentation for the multivalue filtering behavior.While the implementation correctly filters out non-finite float values, adding a brief comment explaining this behavior would improve code readability and make the intent clearer for future maintainers.
.filter_map(|v| Some(Value::Number(Number::from_f64(v)?))) + // filter_map with ? operator silently skips NaN and infinite values .collect()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs(7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (5)
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (5)
8-8: Well-implemented type change to support multivalues.The change from returning a single optional value to returning
MultiValue<Value>aligns well with the PR objective to return entire multivalue sets rather than just the first value.Also applies to: 15-15
77-95: Good implementation of multivalue handling in payload retriever.The updated logic appropriately handles both direct wildcard paths and regular paths with array values by flattening them into a single MultiValue collection. This ensures consistent behavior when retrieving multiple values from payloads.
130-140: Nicely handled potential non-finite float values.The implementation now uses
filter_mapwith optional unwrapping (Number::from_f64(v)?) which elegantly skips NaN or infinite values instead of panicking. This addresses the previous review comment about handling non-finite floats gracefully.
103-212: Well-structured retriever functions with consistent multivalue handling.All indexed variable retrievers have been consistently updated to return
MultiValue<Value>using the pattern ofinto_iter().flatten()to collect all values. This provides a unified approach across different index types.
292-295: Test assertions correctly updated for multivalue returns.The test assertions have been properly updated to check for
MultiValueinstances instead of single optional values, which validates the changes in implementation.Also applies to: 309-312, 363-366, 380-383
* retrievers return multivalues * adjust integration * fix payload retriever * graceful conversion
This changes the behavior of retrievers from extracting always the first value of a multivalue, to returning the multivalue itself.
After retrieving, the evaluation will translate single value as value (
[x]asx), and more than one value as array ([x, y]as[x, y])This table exemplifies the change: