[strict mode | score boosting] Strict mode for formula queries#6317
[strict mode | score boosting] Strict mode for formula queries#6317
Conversation
📝 WalkthroughWalkthroughThe changes update multiple areas of the codebase. The type for the ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (6)
lib/storage/src/issues_subscribers.rs (1)
18-18: Consistent module path update for method call.The method call has been updated to use the fully qualified path
collection::problems::UnindexedField::submit_possible_suspectsinstead of just using the importedUnindexedField.Since you've already imported
UnindexedFieldon line 4, you could simplify this to:- collection::problems::UnindexedField::submit_possible_suspects( + UnindexedField::submit_possible_suspects(lib/collection/src/operations/verification/query.rs (2)
108-136: Strict mode check for unindexed fields inCollectionQueryRequest
This method correctly returns an error when an expression references an unindexed field withunindexed_filtering_retrieve == Some(false). As an optional improvement, consider logging or reporting metrics when such errors occur repeatedly.
160-187: Duplicate strict mode logic inCollectionQueryGroupsRequest
This block mirrors theCollectionQueryRequestlogic. Factoring out a shared helper method could reduce repetition and potential future divergence.lib/collection/src/collection/payload_index_schema.rs (1)
107-110: Delegate toone_unindexed_filter_key
The originalone_unindexed_keyfunction now hands off filter checks to a more specialized method. Consider renaming for clarity since it’s focused on filter usage.lib/collection/src/problems/unindexed_field.rs (2)
405-431:needs_indexmethod
Compares inferred schemas to existing indexes, returningtrueonly if the appropriate schema kind is missing. The TODO regarding parameter comparisons for integer indexes is valid.
433-541:update_from_expressioncoverage
Thoroughly enumerates everyExpressionInternalvariant, extracting relevant JSON paths. Consider unit tests for complex nested scenarios to ensure correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (14)
lib/api/src/rest/schema.rs(1 hunks)lib/collection/Cargo.toml(1 hunks)lib/collection/src/collection/payload_index_schema.rs(3 hunks)lib/collection/src/collection/search.rs(1 hunks)lib/collection/src/lib.rs(1 hunks)lib/collection/src/operations/universal_query/formula.rs(3 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/collection/src/operations/verification/query.rs(3 hunks)lib/collection/src/problems/unindexed_field.rs(5 hunks)lib/collection/src/tests/payload.rs(5 hunks)lib/segment/Cargo.toml(0 hunks)lib/segment/src/lib.rs(0 hunks)lib/storage/src/issues_subscribers.rs(2 hunks)src/issues_setup.rs(1 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/lib.rs
- lib/segment/Cargo.toml
🧰 Additional context used
🧬 Code Definitions (4)
lib/collection/src/tests/payload.rs (1)
lib/collection/src/collection/payload_index_schema.rs (1)
one_unindexed_filter_key(157-162)
lib/collection/src/collection/search.rs (1)
lib/collection/src/problems/unindexed_field.rs (1)
slow_query_threshold(36-38)
lib/storage/src/issues_subscribers.rs (1)
lib/collection/src/problems/unindexed_field.rs (1)
submit_possible_suspects(90-103)
lib/collection/src/collection/payload_index_schema.rs (1)
lib/collection/src/problems/unindexed_field.rs (2)
new(275-286)new(331-336)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (22)
lib/collection/src/lib.rs (1)
12-12: Addition of problems module completes the module relocationThis change makes the new
problemsmodule publicly accessible from the collection crate, which aligns with the PR objective of relocating the module from segment to collection to facilitate evaluation of variables withinExpressionInternal.lib/collection/Cargo.toml (1)
61-61: Appropriate addition of HTTP dependencyAdding the HTTP crate as a workspace dependency is necessary to support the
JsonPathtype usage in the REST API's datetime key expression. This aligns with the PR's goal of enhancing formula query handling.src/issues_setup.rs (1)
4-4: Import path updated to reflect module relocationThe import path has been correctly updated to use the relocated
problemsmodule from the collection crate instead of the segment crate. This change maintains consistency with the architectural changes in this PR.lib/api/src/rest/schema.rs (1)
685-685: Type enhancement for datetime_key from String to JsonPathChanging the type from
StringtoJsonPathprovides better type safety and validation for datetime keys in formula expressions. This change directly supports the PR objective of implementing strict mode for formula queries, by ensuring that datetime keys are properly validated as JSON paths.lib/collection/src/collection/search.rs (1)
320-320: Update to import path for consistent module structure.The code has been updated to use
crate::problems::UnindexedFieldinstead ofsegment::problems::UnindexedField, reflecting the relocation of theproblemsmodule from the segment to the collection crate as mentioned in the PR objectives.lib/collection/src/operations/universal_query/shard_query.rs (1)
404-409: Improved type safety and error handling for datetime keys.The implementation now parses the
dt_keystring into aJsonPathobject instead of using it directly. This change:
- Adds validation at parsing time, rejecting invalid keys
- Provides clear error messages for invalid inputs
- Uses a strongly-typed
JsonPathobject instead of a raw string- Aligns with the PR objective of using
JsonPathdirectly for datetime keysThis is a good improvement for type safety and error handling.
lib/storage/src/issues_subscribers.rs (1)
4-4: Updated import to reflect relocated module.The import has been updated to use
collection::problems::UnindexedFieldinstead ofsegment::problems::UnindexedField, consistent with the module relocation mentioned in the PR objectives.lib/collection/src/tests/payload.rs (1)
72-72: Updated method name to reflect specialized functionality.All instances of
one_unindexed_key()have been updated toone_unindexed_filter_key()to match the renamed method inPayloadIndexSchema. This aligns with the code changes where the original method was refactored to handle both filters and expressions using thePotentiallyUnindexedenum, with specialized wrapper methods for specific use cases.The update is consistent across all occurrences in the test, ensuring continued correct functionality while supporting the PR objective to implement checks for unindexed fields in formula expressions.
Also applies to: 91-91, 114-114, 143-143
lib/collection/src/operations/universal_query/formula.rs (3)
11-11: Check import usage
This newly added import forConditionandGeoPointis straightforward and consistent with the rest of the file’s usage.
32-32: Switch fromDatetime(String)toDatetimeKey(JsonPath)
Replacing the string-based approach with a typedJsonPathclarifies the payload reference while reducing parsing overhead.
90-93: NewDatetimeKeyvariant logic
Inserting the JSON path intopayload_varsand wrapping it inDatetimeExpression::PayloadVariablelooks correct. This allows date-time extraction directly from the payload.lib/collection/src/operations/verification/query.rs (2)
6-6: Added import forCollectionErrorandCollectionResult
No issues detected.
8-8: Importing new query types
BringsCollectionQueryGroupsRequest,CollectionQueryRequest, andQueryinto scope for strict mode checks.lib/collection/src/collection/payload_index_schema.rs (7)
11-11: Added import forExpressionInternal
Enables expression-based logic within payload index checks.
13-13: Importunindexed_field
Aligns with the new unindexed logic integration.
111-119: Introducingone_unindexed_expression_key
Allows external code to detect unindexed keys in expressions. The method name is descriptive, making the API more intuitive.
122-125: New enumPotentiallyUnindexed
A flexible solution for handling both filters and expressions with the same unindexed key detection logic.
131-134: Refactoredone_unindexed_key
Switching to the generic parameterPotentiallyUnindexedensures this function can evolve to handle multiple input types seamlessly.
137-144: Match-based call to update methods
Explicitly differentiating betweenFilterandExpressionwithupdate_from_filter_onceorupdate_from_expressionkeeps the code clean and cohesive.
154-169: Public wrappers for filters and expressions
one_unindexed_filter_keyandone_unindexed_expression_keyprovide clear, dedicated entry points for each type of unindexed key check.lib/collection/src/problems/unindexed_field.rs (2)
10-20: Refactored imports
These imports now reference modules insegment, aligning with the broader reorganization.
171-185: Extended handling ofValueVariants
Correctly matchesString,Integer, andBoolfor inference. This approach is consistent with the existing schema inference logic.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
412-424: Consider improving index comparison in a follow-upThe TODO comment correctly identifies a limitation in the current approach of comparing just the index type (kind) rather than the full index parameters. This could lead to false negatives as described in the comment.
Consider implementing a more precise comparison for parametrized indexes in a follow-up PR. This would involve:
- Enhancing the inferring step to output valid parametrized indexes
- Comparing those directly rather than just comparing the kind
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/problems/unindexed_field.rs(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (4)
lib/collection/src/problems/unindexed_field.rs (4)
10-19: Imports updated to reflect module relocationThe import paths have been updated to reference the
segmentmodule directly, which aligns with the PR objective of relocating theproblemsmodule from thesegmentto thecollectioncrate.
171-171: ValueVariants references updated correctlyReferences to
ValueVariantshave been properly updated to use the fully qualified path from thesegmentmodule, maintaining consistency with the import changes.Also applies to: 182-182, 185-185
397-431: Good refactoring of indexing check logicExtracting the indexing check logic into a separate
needs_indexmethod improves code organization and reusability. This refactoring will make the code easier to maintain.Regarding the TODO comment on lines 412-424: The identified limitation in comparing parametrized indexes could lead to false negatives, particularly with integer indexes that are parametrized with either lookup or range. Consider prioritizing this improvement in a future update.
433-541: Comprehensive implementation of expression analysisThe new
update_from_expressionmethod thoroughly handles various expression types for identifying unindexed fields in formula queries. This implementation:
- Correctly processes different expression types including constants, variables, conditions, and mathematical operations
- Appropriately extracts fields and infers schemas for numeric and geo operations
- Recursively handles composite expressions
- Properly updates the unindexed schema collection
This is a key implementation for the PR's objective of enforcing strict mode for formula queries.
| match var { | ||
| VariableId::Score(_) => return, | ||
| VariableId::Payload(json_path) => { | ||
| key = json_path; | ||
| inferred = vec![ | ||
| PayloadFieldSchema::FieldType(PayloadSchemaType::Integer), | ||
| PayloadFieldSchema::FieldType(PayloadSchemaType::Float), | ||
| ]; | ||
| } | ||
| VariableId::Condition(_) => return, | ||
| } |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Consider adding a test for VariableId handling
The parsing and handling of VariableId types looks correct, but this is a critical part of the formula expression evaluation. Ensure there are comprehensive tests covering the different variable types (Score, Payload, Condition) to validate this logic.
🏁 Script executed:
#!/bin/bash
# Search for tests related to formula expressions and VariableId
echo "Searching for tests related to formula expressions..."
rg -A 3 -B 3 "VariableId" --glob "**/*test*.rs"
echo "Searching for tests that would exercise the update_from_expression method..."
rg -A 3 -B 3 "update_from_expression" --glob "**/*test*.rs"Length of output: 387
ACTION REQUIRED: Add Comprehensive Tests for VariableId Handling
It appears that our test suite currently lacks coverage for the different VariableId variants. We were unable to locate any tests exercising the formula expression update logic, including the handling of Score, Payload, and Condition types. Please add comprehensive tests to ensure that each variant is correctly parsed and processed.
- Ensure tests cover:
VariableId::Score(should bypass further processing)VariableId::Payload(valid payload schema assignment)VariableId::Condition(should also bypass processing)
- Consider updating or creating tests in your test suite to validate the behavior in critical formula evaluation paths.
|
Thank you very much for adding this 👍 Please add an integration test in |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/openapi/test_strictmode.py (1)
1357-1357: Remove debug print statement.There's a debugging print statement that should be removed before merging this PR.
- print(query_fail.json())lib/collection/src/operations/verification/query.rs (2)
80-97: Consider using consistent return type.The
check_custommethod forCollectionPrefetchusesResult<(), crate::operations::types::CollectionError>as its return type, while the same method for other types uses the type aliasCollectionResult<()>. For consistency, consider using the type alias here as well.- ) -> Result<(), crate::operations::types::CollectionError> { + ) -> CollectionResult<()> {
40-78: Consider reducing code duplication in check_custom implementations.The
check_customimplementations forCollectionQueryRequest,CollectionPrefetch, andCollectionQueryGroupsRequestcontain similar logic. Consider extracting the common pattern into a helper function to reduce duplication.You could create a helper trait method that takes a query option and prefetch option:
trait StrictModeHelpers { fn check_query_and_prefetch( &self, query: Option<&Query>, prefetch: Option<&[CollectionPrefetch]>, collection: &Collection, strict_mode_config: &StrictModeConfig, ) -> CollectionResult<()>; } impl StrictModeHelpers for StrictModeVerification { fn check_query_and_prefetch( &self, query: Option<&Query>, prefetch: Option<&[CollectionPrefetch]>, collection: &Collection, strict_mode_config: &StrictModeConfig, ) -> CollectionResult<()> { if let Some(prefetch_items) = prefetch { for prefetch in prefetch_items { Box::pin(prefetch.check_strict_mode(collection, strict_mode_config)).await?; } } if let Some(query) = query { query.check_strict_mode(collection, strict_mode_config)?; } Ok(()) } }Then in each implementation:
async fn check_custom(&self, collection: &Collection, strict_mode_config: &StrictModeConfig) -> CollectionResult<()> { self.check_query_and_prefetch( self.query.as_ref(), Some(&self.prefetch), collection, strict_mode_config ) }This approach would need to be adapted to your specific requirements and may require async syntax adjustments.
Also applies to: 80-118, 120-152
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/operations/verification/query.rs(3 hunks)tests/openapi/test_strictmode.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/openapi/test_strictmode.py
1326-1326: Redefinition of unused collection_name from line 3
(F811)
🔇 Additional comments (3)
tests/openapi/test_strictmode.py (2)
1326-1361: Good addition of test for strict mode formula expressions.This test verifies that formula expressions are properly restricted in strict mode when referencing unindexed fields. It follows the same pattern as other strict mode tests and provides good coverage for the new functionality.
🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_namefrom line 3(F811)
1458-1458: The assertion matches expected behavior.The assertion correctly verifies that unsetting the write rate limit isn't currently supported, as mentioned in the comment on line 1457.
lib/collection/src/operations/verification/query.rs (1)
10-38: Well-implemented formula expression restriction.The implementation correctly checks for unindexed fields in formula expressions when strict mode is enabled with
unindexed_filtering_retrieveset tofalse. The error message is clear and provides helpful guidance on how to resolve the issue.
cb71472 to
c89ca7b
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/operations/verification/query.rs (1)
40-57: Return type inconsistency in check_custom implementations.While the implementation logic is sound, there's an inconsistency in the return types.
CollectionPrefetch::check_customreturnsResult<(), crate::operations::types::CollectionError>while the other implementations returnCollectionResult<()>. SinceCollectionResultis likely a type alias for the same Result type, these should be consistent.- ) -> Result<(), crate::operations::types::CollectionError> { + ) -> CollectionResult<()> {Also applies to: 80-97, 120-132
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/operations/verification/query.rs(3 hunks)tests/openapi/test_strictmode.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/openapi/test_strictmode.py
1326-1326: Redefinition of unused collection_name from line 3
(F811)
🔇 Additional comments (4)
tests/openapi/test_strictmode.py (2)
1326-1366: Good addition of a strict mode test for formula expressions!This test effectively verifies that formula expressions containing unindexed payload fields are rejected when strict mode is enabled with unindexed filtering restrictions. The assertion checking for both "discount_price" and "formula expression" in the error message is comprehensive.
🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_namefrom line 3(F811)
1462-1462: Consistent assertion matching implementation behavior.The assertion is now correctly checking that
write_rate_limitequals 1, which aligns with the comment on line 1461 indicating that write rate limit unset operation is not supported.lib/collection/src/operations/verification/query.rs (2)
10-38: Well-implemented strict mode check for formula expressions.The implementation effectively identifies unindexed fields in formula expressions and generates a helpful error message that includes both the problematic key and suggested actions for resolution.
60-60: Fixed return type in query_limit method.The method now correctly returns
Some(self.limit)instead of justself.limit, which aligns with the declared return type ofOption<usize>and makes it consistent with similar implementations for other structs.
c89ca7b to
4c2bf2f
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/openapi/test_strictmode.py (1)
1326-1326: Redefinition of unused parameterThe function parameter
collection_nameis redefined here but is already imported fromconftestat line 3.-def test_strict_mode_formula_expression(collection_name): +def test_strict_mode_formula_expression():🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_namefrom line 3(F811)
lib/collection/src/operations/verification/query.rs (1)
40-57: Implementation of StrictModeVerification for CollectionQueryRequestThe implementation correctly verifies both prefetch and query fields, passing on the strict mode configuration. The method now returns CollectionResult<()> for consistent error handling throughout the codebase.
Note that there's repeated code across the different implementations (CollectionQueryRequest, CollectionPrefetch, and CollectionQueryGroupsRequest). Consider extracting this common logic to reduce duplication in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/operations/verification/query.rs(3 hunks)tests/openapi/test_strictmode.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/openapi/test_strictmode.py
1326-1326: Redefinition of unused collection_name from line 3
(F811)
🔇 Additional comments (8)
tests/openapi/test_strictmode.py (2)
1326-1364: Well-structured test for formula expression strict mode verificationThe test case is well-designed and follows the established pattern of other strict mode tests in this file:
- First verifies the query works without restrictions
- Then enables strict mode with the relevant setting
- Finally verifies the query fails with appropriate error messages
This test properly validates the new functionality for checking unindexed fields in formula expressions.
🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_namefrom line 3(F811)
1461-1461: Assertion consistent with expected behaviorThe assertion correctly verifies that the write_rate_limit remains 1, which aligns with the comment at line 1460 indicating that unsetting is not currently supported.
lib/collection/src/operations/verification/query.rs (6)
5-5: Updated import to include CollectionResultAppropriate addition of CollectionResult to the import to support the new return type used in the check_strict_mode function.
7-7: Added Query to the import listQuery needs to be imported as it's used in the implementation below.
10-37: Well-implemented formula field verification for strict modeThis implementation correctly checks for unindexed fields in formula expressions, and returns a helpful error message indicating which fields need to be indexed. The error message includes the possible schema types for the field, helping users understand how to fix the issue.
The code follows the existing pattern for strict mode verification while adding this new specific check for formula expressions.
59-61: Simplified query_limit methodThis change ensures the method always returns an Option by wrapping the limit in Some. This is more consistent with the method signature and seems to be a straightforward improvement.
80-96: Implementation of StrictModeVerification for CollectionPrefetchThis implementation correctly checks both prefetch and query components. It follows the same pattern as the implementation for CollectionQueryRequest.
120-131: Implementation of StrictModeVerification for CollectionQueryGroupsRequestThe implementation correctly verifies the query field, passing on the strict mode configuration. Unlike the other implementations, this one doesn't need to check prefetch since that field doesn't exist on this type.
There is still some code duplication across these implementations that could be addressed in the future.
* verify CollectionQuery's Prefetch * remove implementation for REST query * remove implementation for REST query groups
4c2bf2f to
9390319
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/collection/src/operations/verification/query.rs (1)
80-97: CollectionPrefetch implementation looks correctYou've properly implemented the check_custom method for CollectionPrefetch, making sure that all prefetch queries and their formulas are checked for unindexed fields.
Consider using the same return type signature across all implementations:
- ) -> Result<(), crate::operations::types::CollectionError> { + ) -> CollectionResult<()> {Since you've imported
CollectionResultat the top, this would make the code more consistent with the other implementations.lib/collection/src/problems/unindexed_field.rs (2)
433-542: Consider breaking down the update_from_expression method for better maintainability.While the implementation correctly handles all expression types, the method is quite large and complex. Consider refactoring it into smaller helper methods for each category of expressions (mathematical operations, functions, etc.) to improve readability and maintainability.
You could restructure it like this:
pub fn update_from_expression(&mut self, expression: &ExpressionInternal) { match expression { ExpressionInternal::Constant(_) => return, ExpressionInternal::Variable(variable) => self.process_variable(variable), ExpressionInternal::Condition(condition) => { self.update_from_condition(None, condition); return; }, ExpressionInternal::GeoDistance { origin: _, to } => self.process_geo_distance(to), ExpressionInternal::Datetime(_) => return, ExpressionInternal::DatetimeKey(variable) => self.process_datetime_key(variable), // Group mathematical operations expr @ (ExpressionInternal::Mult(_) | ExpressionInternal::Sum(_) | ExpressionInternal::Div { .. }) => self.process_math_operation(expr), // Group mathematical functions expr @ (ExpressionInternal::Neg(_) | ExpressionInternal::Sqrt(_) | ExpressionInternal::Pow { .. } | ExpressionInternal::Exp(_) | ExpressionInternal::Log10(_) | ExpressionInternal::Ln(_) | ExpressionInternal::Abs(_)) => self.process_math_function(expr), ExpressionInternal::Decay { .. } => self.process_decay_function(expression), } } // Helper methods for each expression type...
412-424: Address the TODO for better comparisons of parameterized indexes.There's a TODO comment indicating a potential false negative in the index comparison logic. Integer indexes parameterized with "lookup" or "range" might lead to incorrect assessments of whether a field is properly indexed. Consider implementing the suggested improvement or clarifying when this will be addressed.
Would you like me to suggest a more robust comparison approach that could address this TODO?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (16)
lib/api/src/rest/schema.rs(1 hunks)lib/collection/Cargo.toml(1 hunks)lib/collection/src/collection/payload_index_schema.rs(3 hunks)lib/collection/src/collection/search.rs(1 hunks)lib/collection/src/lib.rs(1 hunks)lib/collection/src/operations/universal_query/formula.rs(3 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/collection/src/operations/verification/query.rs(3 hunks)lib/collection/src/problems/unindexed_field.rs(5 hunks)lib/collection/src/tests/payload.rs(5 hunks)lib/segment/Cargo.toml(0 hunks)lib/segment/src/lib.rs(0 hunks)lib/storage/src/issues_subscribers.rs(2 hunks)src/actix/api/query_api.rs(4 hunks)src/issues_setup.rs(1 hunks)tests/openapi/test_strictmode.py(1 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/lib.rs
- lib/segment/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (10)
- lib/collection/src/lib.rs
- lib/collection/src/tests/payload.rs
- lib/collection/Cargo.toml
- lib/collection/src/collection/search.rs
- lib/collection/src/operations/universal_query/shard_query.rs
- lib/api/src/rest/schema.rs
- src/issues_setup.rs
- lib/collection/src/operations/universal_query/formula.rs
- src/actix/api/query_api.rs
- lib/storage/src/issues_subscribers.rs
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/openapi/test_strictmode.py
1326-1326: Redefinition of unused collection_name from line 3
(F811)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (12)
tests/openapi/test_strictmode.py (1)
1326-1364:Details
✅ Verification successful
Good addition of test for formula expression strict mode!
This new test correctly verifies that formula expressions using unindexed fields are properly restricted when strict mode is enabled with
unindexed_filtering_retrieveset toFalse. The test first confirms the query works without restrictions, then correctly verifies that it fails with specific error messages when strict mode is enabled.
🏁 Script executed:
#!/bin/bash # Check the static analysis warning about collection_name redefinition # Look for the collection_name import grep -n "from .conftest import collection_name" tests/openapi/test_strictmode.py # Check the test function signature grep -n "def test_strict_mode_formula_expression" tests/openapi/test_strictmode.pyLength of output: 270
Test correctly validates the strict mode enforcement for formula expressions.
- The test first confirms that a query using an unindexed field succeeds without strict mode.
- Once strict mode is enabled (with
unindexed_filtering_retrieveset toFalse), the same query fails, and the error message appropriately mentions both"discount_price"and"formula expression".- The use of the
collection_namefixture—imported fromconftestand then injected as a parameter in the test—is intentional and in line with our testing conventions, so the static analysis warning about redefinition is expected.🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_namefrom line 3(F811)
lib/collection/src/operations/verification/query.rs (3)
5-38: Implementation of formula field indexing verification is solidThe implementation correctly adds the check for unindexed fields in formula expressions and provides a helpful error message to guide users. I like that the error message includes both what's wrong and how to fix it.
40-57: Good implementation of formula verification in CollectionQueryRequestThe changes correctly add formula expression checking for CollectionQueryRequest, ensuring that unindexed fields in formulas are properly validated when strict mode is enabled.
120-131:Details
✅ Verification successful
CollectionQueryGroupsRequest implementation is good
The check_custom method implementation for CollectionQueryGroupsRequest is correct, checking unindexed fields in formula queries.
🏁 Script executed:
#!/bin/bash # Check if there's similar duplicate code that could be consolidated # See if the method bodies in check_custom implementations are similar echo "Checking CollectionQueryRequest check_custom:" sed -n '41,57p' lib/collection/src/operations/verification/query.rs echo -e "\nChecking CollectionPrefetch check_custom:" sed -n '81,97p' lib/collection/src/operations/verification/query.rs echo -e "\nChecking CollectionQueryGroupsRequest check_custom:" sed -n '121,131p' lib/collection/src/operations/verification/query.rsLength of output: 2087
CollectionQueryGroupsRequest Implementation Confirmed as Correct
After reviewing the implementation against similar methods in the file, the check for unindexed fields in formula queries is performed appropriately. The absence of a prefetch loop compared to the other implementations is intentional and aligns with the expected behavior for this request type.
- File:
lib/collection/src/operations/verification/query.rs- Lines: 121-131 — The code correctly handles the strict mode check for queries.
lib/collection/src/collection/payload_index_schema.rs (4)
11-13: Clean import adjustments for module relocation.The import changes correctly reflect the relocation of the
problemsmodule fromsegmenttocollectioncrate as mentioned in the PR description. This architectural change helps facilitate the evaluation of variables withinExpressionInternal.
107-119: Well-structured Collection implementation update.The changes to the Collection implementation are clean and maintain backward compatibility while adding support for expression checking. The original method now delegates to the new specialized method, and a parallel method for expressions has been added with a clear, consistent naming convention.
122-125: Good enum design for type-safe handling.The
PotentiallyUnindexedenum is a well-designed approach for handling bothFilterandExpressionInternaltypes in a type-safe manner. This creates a clear abstraction that allows the underlying implementation to handle both types without code duplication.
131-169: Clean method refactoring with good encapsulation.The refactoring of
one_unindexed_keyinto a private method with two public wrapper methods is excellent. This approach:
- Maintains a clean public API
- Provides type-specific methods for clients
- Reuses implementation logic to avoid duplication
- Properly encapsulates the implementation details
This is a good example of following the Open/Closed principle - extending functionality without modifying existing behavior.
lib/collection/src/problems/unindexed_field.rs (4)
6-20: Clean import adjustments reflecting module restructuring.The import changes correctly update the paths following the module relocation. Adding the
httpcrate andVariableIdimports supports the new functionality for formula expression evaluation.
169-189: Consistent path updates in infer_schema_from_match_value.The function has been properly updated to use the fully-qualified path
segment::types::ValueVariantsconsistently with the import changes. This ensures the code continues to work correctly after the module restructuring.
405-431: Good extraction of common logic to needs_index method.Extracting the indexing check logic to a separate method improves code organization by following the DRY principle. This refactoring makes the code more maintainable and allows reuse of this logic in the new
update_from_expressionmethod.
446-456: Add tests for VariableId handling in the update_from_expression method.As noted in a previous review, there appears to be a lack of comprehensive tests for the different
VariableIdvariants in the formula expression evaluation. This is a critical part of the implementation, especially for ensuring the correct handling of Score, Payload, and Condition types.#!/bin/bash # Search for tests that would validate update_from_expression behavior echo "Searching for tests related to update_from_expression..." rg -A 5 -B 5 "update_from_expression" --glob "**/*test*.rs" echo "Searching for tests related to formula expressions..." rg -A 5 -B 5 "ExpressionInternal" --glob "**/*test*.rs" | grep -i test echo "Checking for test cases covering VariableId variants..." rg -A 5 -B 5 "VariableId::Score|VariableId::Payload|VariableId::Condition" --glob "**/*test*.rs"
…t#6317) * move problems.rs to collection crate * change type on datetime_key expression * extract from expression * check for unindexed fields in formula * clippy * dedup code * Only implement verification for `CollectionQuery` types (qdrant#6319) * verify CollectionQuery's Prefetch * remove implementation for REST query * remove implementation for REST query groups * add basic openapi test
Implements the check for unindexed fields in formula expressions.
Even if formulas are only allowed as a reranking step, there is no guarantee that the preselection will be small enough for making reading payloads directly negligible.
With this check, only correctly-indexed payload fields will be allowed in the formula expressions.
A needed change was to move the
problemsmodule fromsegmentto thecollectioncrate, because we needed to evaluate variables insideExpressionInternal.Also realized that we could use
JsonPathdirectly as the type for REST'sdatetime_keyexpression.