[score boosting] support datetime expressions#6196
Conversation
| } | ||
|
|
||
| impl From<chrono::DateTime<chrono::Utc>> for DateTimeWrapper { | ||
| impl FromStr for DateTimePayloadType { |
There was a problem hiding this comment.
this one was moved closer to the type definition, but no changes were made
| impl FromStr for DateTimeExpression { | ||
| type Err = String; | ||
|
|
||
| fn from_str(s: &str) -> Result<Self, Self::Err> { | ||
| // try as date | ||
| DateTimePayloadType::from_str(s) | ||
| .map(DateTimeExpression::Constant) | ||
| // else as payload key | ||
| .or_else(|_| JsonPath::from_str(s).map(DateTimeExpression::PayloadVariable)) | ||
| .map_err(|_| format!("Invalid date time expression: {s}")) | ||
| } | ||
| } |
There was a problem hiding this comment.
Since the expression is a string, it will try to parse as datetime first, then as a payload key
There was a problem hiding this comment.
It is not possible to properly distinguish between the two types, right? Not a fan of this, since it's definitely possible for a user to use date time strings as keys.
Maybe like this:
"history": {
"2025-03-24": 128,
"2025-03-23": 110,
"2025-03-22": 283,
"2025-03-21": 10,
"2025-03-20": 123,
"2025-03-19": 28,
"2025-03-18": 387
}There was a problem hiding this comment.
That is a good point, I didn't think of the case where datetime strings are used as keys, but it does make sense
There was a problem hiding this comment.
Solved by introducing a separate expression: { "datetime_key": <payload key> }
There was a problem hiding this comment.
Thanks. That works!
Or even datetime_payload, though that would be longer. Or the inverse datetime_value.
📝 WalkthroughWalkthroughThis pull request introduces several updates to expression parsing and evaluation across different modules. New variants for date-time expressions are added to various enums in both gRPC and REST interfaces, and a refined type for scoring—transitioning from Possibly related PRs
Suggested reviewers
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)
86-97: Potentially improve error message handling.
Currently, the underlying parse error is discarded. Preserving it can help with debugging.- .map_err(|_| format!("Invalid date time expression: {s}")) + .map_err(|err| format!("Invalid date time expression: {s}, cause: {err}"))lib/segment/src/types.rs (1)
77-91: Deserialize implementation discards parse error.
Preserving the error could facilitate debugging. Consider the following change:- let parse_result = DateTimePayloadType::from_str(str_datetime).ok(); - match parse_result { - Some(datetime) => Ok(datetime), - None => Err(serde::de::Error::custom(format!( - "'{str_datetime}' is not in a supported date/time format, please use RFC 3339" - ))), - } + match DateTimePayloadType::from_str(str_datetime) { + Ok(datetime) => Ok(datetime), + Err(err) => Err(serde::de::Error::custom( + format!("Invalid date/time format '{str_datetime}': {err}") + )), + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
lib/api/src/grpc/conversions.rs(4 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/api/src/rest/schema.rs(1 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(9 hunks)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs(3 hunks)lib/segment/src/types.rs(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (46)
lib/api/src/grpc/qdrant.rs (3)
5185-5185: Tag list updated correctly to accommodate new DateTime variant.The tag list was properly updated to include the new tag value '15', maintaining the sequential numbering pattern required by protobuf definitions. This change ensures the oneof field can correctly reference all variants including the newly added DateTime.
5206-5208: New DateTime variant added for timestamp support.The DateTime variant provides the capability to represent date-time values as either constants or payload keys, aligning with the PR objective to support datetime expressions that parse into microseconds. This implementation will enable the functionality needed for timestamp-based operations.
5210-5238: Tag numbers correctly incremented for all existing variants.All the tag numbers for existing variants have been properly incremented to accommodate the new DateTime variant at position 5. This maintains the sequential ordering required by protobuf while preserving the relative ordering of existing variants.
lib/collection/src/operations/universal_query/shard_query.rs (1)
398-398: Added DateTime expression support correctlyThe new DateTime variant handling is properly implemented in the TryFrom conversion from gRPC Expression to ExpressionInternal. This matches the PR's objective of supporting datetime expressions.
lib/api/src/rest/schema.rs (1)
643-644: Added DateTime variant to REST API schemaThe DateTime variant has been added to the Expression enum, allowing for string-based datetime representations in REST API expressions. The GeoDistance variant is now properly defined in the enum, maintaining API consistency.
lib/api/src/grpc/conversions.rs (5)
8-8: Added ScoreType import for precision improvementThis import supports the change from f32 to f64 precision for handling timestamps in microseconds, as specified in the PR objectives.
17-17: Added DateTimeExpression to importsCorrectly imported the DateTimeExpression enum from parsed_formula to support datetime conversion functions.
2795-2795: Updated constant casting to use ScoreTypeThe constant is now cast to ScoreType instead of directly using f32, supporting the precision improvement mentioned in the PR objectives.
2803-2810: Implemented datetime expression unparsingThe unparse_expression function has been extended to handle both:
- GeoDistance expressions with origin and payload key
- DateTime expressions supporting both constants and payload variables
This implementation properly handles both formats required by the PR.
2833-2833: Updated by_zero_default conversion to f32The by_zero_default value is cast to f32 when used in the Expression variant. This aligns with the internal type changes while maintaining API compatibility.
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (7)
7-7: No functional issues with new import.
12-12: New type alias for improved precision.
This aligns well with the transition from f32 to f64.
31-32: Constant variant updated to f64.
No issues found.
34-38: NewGeoDistance&DateTimevariants added.
Implementation looks correct.
40-40: Info comment only.
No changes required.
46-46: UsingPreciseScorefor zero division fallback.
Consistent with the new f64 approach.
80-84: IntroducedDateTimeExpressionenum.
Design is concise and easy to understand.lib/collection/src/operations/universal_query/formula.rs (7)
8-8: Importing references to new expression variants.
Looks good.
27-31: NewGeoDistanceandDateTimevariants in ExpressionInternal.
Ensure consistent usage with the rest of the system.
58-58: Casting from f32 to f64 for constant.
Implementation is straightforward.
71-74: HandlingGeoDistancein parse_and_convert.
Insertingtoinpayload_varsis correct for variable usage.
75-75: HandlingDateTimeby parsing the string.
Consider adding tests to ensure valid/invalid date-time strings are handled properly.
98-98: Converting by_zero_default from f32 to f64.
No issues found.
170-173: Mapping RESTGeoDistanceandDateTimetoExpressionInternal.
The code is consistent with the new variants.lib/segment/src/types.rs (4)
73-73: Safely creating DateTime from microseconds.
This approach gracefully handles invalid timestamps by returning None.
93-124: Rich date/time parsing logic.
Supports multiple formats. Implementation looks good.
126-130:Displaytrait forDateTimePayloadType.
Straightforward and correct.
132-135: Conversion fromDateTime<Utc>toDateTimePayloadType.
Implementation is correct with minimal overhead.lib/api/src/grpc/proto/points.proto (1)
586-596: Confirm protobuf field numbering and backward compatibility.Re-numbering (or shifting) existing fields and introducing a new field (
date_time = 5) may cause wire incompatibilities if older clients or services still rely on the previous field numbering. If this file has been published previously, please verify that no downstream consumers are broken by these changes. Otherwise, if the fields are genuinely new or safely migrated, the re-numbering is fine.lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (17)
10-12: Imports aligned with new expression and scoring logic.The updated imports for
DateTimeExpression,ParsedExpression,ParsedFormula,PreciseScore, andVariableIdlook consistent with your new extended expression handling approach.
19-19: New import forDateTimePayloadType.Importing
DateTimePayloadTypehere is appropriate for handling date-time parsing logic.
21-21: Switch toPreciseScorefor default value.Defining
DEFAULT_SCOREas aPreciseScore(f64) helps maintain precision. This is a coherent choice, given your expanded date-time operations.
41-41:FriendlyNameimplementation is clear.Defining the friendly name as
"number"forPreciseScorekeeps consistency with its numeric use in expressions.
53-57:FriendlyNameforDateTimePayloadType.Exposing
"datetime"as the friendly name helps produce meaningful error messages for date-time fields. This is a good addition.
100-100: Potential precision loss when converting toScoreType.Casting
PreciseScore(f64) back toScoreType(likely f32) may lose precision. Confirm that this down-cast is intentional and acceptable for your scoring use cases.
108-108: Method returns higher-precision score.Changing the return type of
eval_expressiontoOperationResult<PreciseScore>matches your decision to use double precision for expression evaluations.
116-116: Converting from f32 to f64.This
.map(|score| PreciseScore::from(*score))is a straightforward approach for loading prefetch scores into a more precise representation. Looks consistent with the rest of the logic.
126-126: Check integer vs. float values in JSON.Using
.as_f64()works properly for numbers, but ensure that integer JSON values are correctly interpreted as floats. If you anticipate integer-only fields, this remains safe, but it’s worth verifying no corner cases exist.
142-143: Geo-distance calculation looks straightforward.Calling
Haversine::distanceon lat/long pairs provides a clean solution for computing distance in meters. Keep an eye on any out-of-range or malformed geo points handled upstream.
144-157: New match arm forDateTimeExpression.Parsing the date-time payload or using a constant expression is implemented robustly. The match structure is clear and aligns well with your
DateTimeExpressionenum approach.
158-162: Potential precision limit for very large timestamps.Converting an i64 timestamp to an f64 is acceptable for typical ranges (~±285 years of microseconds) but be aware that extreme future or past timestamps may lose microsecond-level precision. Confirm this is acceptable for your use case.
448-448: Testing constant date-time expression.Using
"2025-03-18"is a valid scenario to verify constant date-time support. The approach ensures correctness of date-time parsing in expressions.
449-457: Test coverage for missing date-time payload variable.These lines confirm that a non-existent payload variable triggers a
VariableTypeError. Clear negative test coverage is always beneficial.
458-458: Comment clarifies fallback behavior.This comment concisely states the test objective for handling default date-time values, aiding maintainability.
459-462: Verifies default date-time usage.This case confirms correct functionality when the payload variable isn’t set but a default date-time exists. Good addition for reliability.
466-466: Test assertion withOperationResult<PreciseScore>.Using
PreciseScorein test assertions ensures alignment with your new floating-point evaluation logic.
e7cb1c0 to
82b104a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/redoc/master/openapi.json (1)
14532-14542: Datetime expression implementation could use more documentation.While the Datetime schema is correctly structured as a wrapper object with a required datetime property, the documentation could be improved to clearly indicate:
- The expected format of the string value
- Whether it accepts a payload key or a constant datetime value or both
- How the datetime string will be converted to microseconds (f64)
According to the PR objectives, this expression should parse strings into floating-point numbers representing microseconds, but this important detail isn't documented in the schema.
Consider enhancing the property description like this:
- "datetime": { - "type": "string" - } + "datetime": { + "description": "Payload field key or constant datetime string to parse into microseconds (f64). Supports ISO 8601 format.", + "type": "string" + }docs/grpc/docs.md (1)
2699-2699: NewdatetimeField Added in ExpressionThe new
datetimefield has been added to the Expression definition with the type[string](#string)and the description "Date-time constant or payload key". This clearly signals the support for datetime expressions in the API. Please ensure that the description succinctly communicates that the field can accept either a date-time constant or be used as a payload key, and that related documentation (in other parts of the system) reflects the updated evaluation precision from f32 to f64.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
docs/grpc/docs.md(1 hunks)docs/redoc/master/openapi.json(2 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/api/src/rest/schema.rs(2 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/api/src/grpc/proto/points.proto
- lib/api/src/rest/schema.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (17)
lib/api/src/grpc/qdrant.rs (3)
5182-5188: Tagged range correctly updated to accommodate new variantThe tag range for the
expression::Variantoneof has been properly extended to include the newDatetimevariant (tag 15). This ensures the protobuf definition remains consistent with all the available variants.
5206-5208: New Datetime variant looks goodThe addition of the
Datetimevariant with a string type is appropriate for representing both datetime constants and payload keys, aligning with the PR objective of supporting datetime expressions for score boosting.
5210-5238: Tag increments correctly appliedAll subsequent variant tags have been properly incremented to accommodate the new
Datetimevariant at tag 5. This maintains the required sequential ordering of tags in the protobuf definition.lib/collection/src/operations/universal_query/shard_query.rs (1)
398-398: LGTM: The datetime variant handler is correctly implemented.The new case for handling
Variant::Datetime(dt_str)in theTryFrom<grpc::Expression>implementation properly maps toExpressionInternal::DateTime(dt_str), which aligns with the PR objective to support datetime expressions for parsing strings into f64 microseconds.docs/redoc/master/openapi.json (2)
14467-14472: New expression types added to the Expression schema.The changes add two new expression types to the existing Expression schema:
GeoDistance- For computing geospatial distances between pointsDatetime- For handling datetime values, converting them to floating-point microseconds as mentioned in the PR objectivesThese additions enhance the query capabilities of the Qdrant API by allowing timestamp-based operations and geo-distance calculations in expressions.
14505-14531: GeoDistance expression implementation looks well-structured.The schema correctly defines the geo_distance operation with required origin and destination parameters. The implementation follows the pattern used by other expression types in the API and properly references the existing GeoPoint schema.
lib/api/src/grpc/conversions.rs (5)
8-8: Importing ScoreType
No concerns; import usage appears consistent.
17-17: New references to DateTimeExpression
Looks good; ensures the date-time variant is recognized.
2795-2795: Potential float cast overflow
Casting toScoreTypemight risk overflow or precision loss ifcis out of range. Consider verifying valid input ranges.
2803-2810: New unparse variants for GeoDistance & DateTime
Implementation looks correct; ensures new enum variants are handled.
2833-2833: Review by_zero_default cast
by_zero_default.map(|v| v as f32)may cause truncation. Verify alignment with new f64-based approach.lib/collection/src/operations/universal_query/formula.rs (6)
8-8: Additional import of PreciseScore
No issues; aligns with high-precision score usage.
27-31: New ExpressionInternal variants
GeoDistance and DateTime expansions appear logically structured.
58-58: Check for potential float conversion
PreciseScore::from(c)might lose precision for large floats. Please verify acceptable input ranges.
71-75: Handling of parsed GeoDistance & DateTime
Implementation looks fine; confirm date-time format coverage.
98-98: Division default cast
Consider potential data loss withby_zero_default.map(PreciseScore::from).
170-175: REST Expression expansions
Neat integration forGeoDistanceandDateTime.
|
Rebased on top of dev, and included fixes for |
3acfcf8 to
a6592aa
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
docs/grpc/docs.md (1)
2718-2718: Clarify the Expected Format for the New "datetime" FieldThe new "datetime" field is correctly added as a string to support date-time expressions. To further improve clarity for users, consider specifying the expected datetime format (for example, ISO 8601) or linking to additional documentation that details the acceptable formats and how these strings will be parsed into microseconds. This additional detail will help ensure consistency and proper usage of the field.
lib/segment/src/types.rs (2)
73-73: Handle out-of-range timestamps more gracefully
Currently, the code relies onchrono::DateTime::from_timestamp_micros(ts)returningNonefor invalid timestamps. Consider logging or surfacing an error iftsfalls outside the acceptable range to aid debugging.
96-107: Multiple parsing attempts for timezones
Chaining several parse attempts is flexible. Consider consolidating or documenting the expected timezone formats for maintainability.lib/collection/src/operations/universal_query/formula.rs (2)
27-31: New expression variants
IntroducingGeoDistanceandDateTimeextends the formula's flexibility. Ensure associated parsing and validation are thoroughly tested.
82-82: Delegated parsing forDateTime
Relying ondt_str.parse()ensures consistent format handling without duplicating logic.Do you need help adding tests for erroneous date/time inputs?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/grpc/docs.md(1 hunks)docs/redoc/master/openapi.json(2 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/api/src/rest/schema.rs(2 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/common/common/src/math.rs(1 hunks)lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(10 hunks)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs(11 hunks)lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs(2 hunks)lib/segment/src/types.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- lib/collection/src/operations/universal_query/shard_query.rs
- lib/api/src/rest/schema.rs
- lib/api/src/grpc/proto/points.proto
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1) (1)
score(269:270)lib/segment/src/types.rs (1) (1)
timestamp(68:70)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (44)
lib/api/src/grpc/qdrant.rs (3)
5185-5186: Tag range update accommodates the new datetime variant.The oneof tag range has been updated to include tag 5 for the new datetime variant and shifts all subsequent tags accordingly. This is the correct approach for extending a Protocol Buffer definition.
5206-5208: New datetime variant adds time-based expression support.This addition aligns with the PR objective to support datetime expressions. The string type will allow for both payload keys and constants, which can then be parsed into microsecond timestamps.
5210-5247: Tag numbers correctly incremented for all existing variants.All subsequent variant tags have been properly incremented to accommodate the new datetime variant at tag 5. This maintains the correct Protocol Buffer structure while adding new functionality.
docs/redoc/master/openapi.json (3)
14467-14540: Well-implemented GeoDistance expression for geospatial queries.The new
GeoDistanceexpression type allows for calculating the geodesic distance between a fixed origin point and a geo-point stored in a payload field. This implementation enhances the API's spatial query capabilities.The structure follows the established pattern of other expressions in the schema, with proper references to existing types like
GeoPoint.
14541-14551: Good addition of Datetime expression type for timestamp handling.This new expression type matches the PR objective to support
{"datetime": payload_key | constant}expressions. It will enable parsing strings into floating-point numbers representing microseconds (f64).The implementation is simple and consistent with the rest of the expression schema structure. This change is important for the improved precision needed when handling timestamps in decay functions, as mentioned in the PR objectives.
14455-14473: Proper integration of new expression types in the Expression schema.The Expression schema has been appropriately updated to include the new
GeoDistanceandDatetimetypes among the existing expression options. This ensures they can be used wherever expressions are supported throughout the API.lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (4)
129-138: Great Implementation of DatetimeIndex SupportThis implementation properly handles datetime retrieval from indexes, maintaining consistency with other index types. The code correctly retrieves the datetime value, converts it from a timestamp to a proper DateTimePayloadType, and then serializes it to a JSON value.
324-337: Comprehensive Test Setup for Datetime IndexGood test setup that covers important datetime formats - fully specified ISO format with timezone ("2023-01-01T00:00:00Z") and date-only format ("2023-01-02"). This demonstrates the system's ability to handle different datetime string representations.
341-341: LGTM - Proper Index RegistrationThe datetime index is correctly added to the test indices map with an appropriate key name.
379-394: Well-Structured Test for Datetime RetrievalThe test properly verifies datetime retrieval functionality with different test cases:
- A point with a single datetime value
- A point with multiple datetime values (verifying it returns the first one)
- A point with no datetime value
This covers the main scenarios for datetime retrieval.
lib/common/common/src/math.rs (2)
20-23: Appropriate type change for improved precision.The change from
f32tof64in theis_closefunction's parameters and constants aligns well with the PR's objective to improve precision for timestamp handling.
26-26: Consistent parameter type updates.The
is_close_tolfunction parameters have been correctly updated to match the precision improvements inis_close.lib/api/src/grpc/conversions.rs (6)
8-8: Appropriate import for precise score handling.The addition of
ScoreTypeimport fromcommon::typessupports the PR's objective to transition from f32 to f64 for expression evaluation.
17-17: Necessary import for new datetime functionality.The inclusion of
DateTimeExpressionin the imports prepares for the new datetime expression parsing capability being added.
2797-2797: Proper type casting for constants.The constant value is now cast to
ScoreTypewhich ensures consistent precision across expression evaluation.
2805-2808: Well-structured implementation of GeoDistance expressions.The GeoDistance variant is properly implemented with consistent type handling for the origin point and destination key.
2809-2812: Support for both constant and payload datetime expressions.The implementation correctly handles both constant datetime values and payload variable references, fulfilling the PR's objective to support datetime expressions.
Consider adding documentation comments to explain the format expected for datetime constants and how they're parsed.
2835-2835:Details
❓ Verification inconclusive
Type conversion maintains compatibility with existing code.
The explicit cast of
by_zero_defaulttof32aligns with the PR's approach of performing calculations with f64 but converting to f32 after evaluation is complete.Verify that this explicit f32 conversion doesn't introduce precision issues when dividing very large microsecond timestamps:
🏁 Script executed:
#!/bin/bash # Check if there are any timestamp-related tests that would validate the division behavior rg -A 5 -B 5 "test.*div" lib/segment/src/index/query_optimization/rescore_formula/ rg -A 5 -B 5 "timestamp.*div|div.*timestamp" lib/segment/src/index/query_optimization/rescore_formula/Length of output: 189
Attention: Further Verification Required for Precision in Timestamp Division
The explicit cast (
as f32) inby_zero_default: by_zero_default.map(|v| v as f32),maintains compatibility with existing code by mirroring our approach—calculations are done in f64, with the final result converted to f32. However, the initial search for tests related to division behavior with very large microsecond timestamps (in the query optimization module) returned no results. This absence means that we currently lack automated verification of potential precision issues.
Action Items:
- Manual Verification: Please confirm manually that the f32 conversion does not cause precision loss when handling very large microsecond timestamps.
- Consider Adding Tests: If feasible, add unit tests or extend current tests to explicitly validate the division behavior in scenarios involving large timestamp values.
lib/segment/src/types.rs (6)
77-77: No immediate concerns
Deserialization logic is consistent and straightforward.
93-93: Implementation ofFromStr
This establishes a clear interface for creatingDateTimePayloadTypefrom strings.
94-95: Appropriate error type
Usingchrono::ParseErroraligns with the underlying parsing library.
126-130: SimpleDisplayimplementation
Delegating the format to the innerchrono::DateTimeis concise and effective.
132-135: DirectFrom<chrono::DateTime<chrono::Utc>>
This provides a clean and type-safe conversion path for date-time values.
108-124:Details
✅ Verification successful
Fallback for naive date/time strings
These additional parse attempts correctly handle formats without explicit time zones. Confirm that treating them as UTC meets domain requirements.
🏁 Script executed:
#!/bin/bash # Optional: check if there are references where naive date-time # strings might need special handling or disclaimers in docs. rg 'from_str\s*\(.*\)' -A 5Length of output: 64088
Fallback Parsing Confirmed: Naive Date/Time Strings as UTC
The fallback chain in
lib/segment/src/types.rsproperly attempts to parse inputs using various formats before converting aNaiveDateTimeto aDateTime<Utc>via.and_utc().into(). The approach is consistent with similar patterns observed in the repository and aligns with the documented intent (see #3529). Please verify that interpreting these naive date/time strings as UTC fully meets the domain requirements. If any adjustments (such as explicit time zone handling) are needed, please address them accordingly.
- Parsing formats:
%Y-%m-%dT%H:%M:%S%.f%Y-%m-%d %H:%M:%S%.f%Y-%m-%d %H:%M%Y-%m-%d(converted viaInto::into)- Conversion to UTC is achieved with
.and_utc().into(), consistently following repository conventions.lib/collection/src/operations/universal_query/formula.rs (5)
8-8: AddedPreciseScoreimport
ImportingPreciseScoreis consistent with the move toward higher-precision calculations.
65-65: Use ofPreciseScore::from(c)
Promoting constants to higher-precision scores reduces rounding issues in subsequent calculations.
105-105: High-precision default for divide-by-zero
Wrappingby_zero_defaultwithPreciseScore::frommaintains uniform precision for safe fallback behavior.
200-205: Extended REST mapping
MappingGeoDistanceandDatetimefrom REST to internal expressions aligns with the new capabilities.
78-81:Details
❓ Verification inconclusive
GeoDistance payload handling
Inserting thetopath intopayload_varsis correct. Validate behavior if multiple expressions share the same path.
🏁 Script executed:
#!/bin/bash # Searching for repeated usage of the same 'to' attribute in code to ensure no collisions: rg 'GeoDistance\s*\{.*to:\s*'Length of output: 31
Action: Confirm and Manually Verify GeoDistance 'to' Field Handling
The
GeoDistancebranch appears to correctly insert thetoattribute intopayload_varsas intended. However, the initial regex search for repeated usage of thetopath did not produce output—likely due to multiline formatting or limitations in the pattern used. Given that the absence of output doesn’t conclusively rule out potential collisions when multiple expressions share the same path, please perform a manual review or run an alternative multiline search for usages of theGeoDistanceblock across the codebase.
- File:
lib/collection/src/operations/universal_query/formula.rs(lines 78-81)- Key Check: Verify that if multiple expressions use the same
topath, no unintended collisions or side effects occur.lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (7)
16-16: Good addition of PreciseScore type alias.This type alias makes the code more maintainable and enhances the precision needed for datetime operations.
38-42: The GeoDistance and DateTime variants are properly structured.The GeoDistance variant has been appropriately restructured and the new DateTime variant has been added correctly to handle datetime expressions.
103-107: Good implementation of DateTimeExpression enum.The enum appropriately handles both constant datetime values and payload variable paths.
109-120: The FromStr implementation follows good pattern matching practices.The implementation correctly attempts to parse the string as a datetime first, then falls back to treating it as a payload key, which aligns with the noted behavior.
182-185: Ensure precision is maintained when converting between numeric types.The conversion from f32 to PreciseScore (f64) is appropriate, but be aware of potential rounding issues when working with floating-point values.
211-224: Type consistency in decay parameters conversion.The decay_lambda_to_params method correctly converts PreciseScore (f64) to ScoreType (f32) when necessary. This is appropriate for maintaining API compatibility while ensuring internal calculations use higher precision.
363-364: Test cases updated with appropriate precision.The test ranges have been updated to use f64 values, which is consistent with the PreciseScore type changes.
Also applies to: 370-371, 377-378
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (8)
10-12: Imports correctly updated to include new types.The imports now include DateTimeExpression and PreciseScore, which is necessary for the new functionality.
21-22: Default constants updated to use PreciseScore.The DEFAULT_SCORE and DEFAULT_DECAY_TARGET constants now use PreciseScore (f64) instead of ScoreType (f32), maintaining type consistency.
42-46: FriendlyName implementations are appropriate.The implementation for PreciseScore and the new implementation for DateTimePayloadType are correctly defined.
Also applies to: 54-58
99-102: Score method appropriately handles type conversion.The method now returns a PreciseScore (f64) cast to ScoreType (f32). This conversion is necessary for backward compatibility but may result in some precision loss, which is acceptable since it happens only after all calculations are complete.
117-118: Score retrieval correctly converts to PreciseScore.The conversion from ScoreType to PreciseScore using PreciseScore::from(*score) ensures that scores are processed with higher precision.
145-163: DateTime evaluation implementation is thorough and well-documented.The implementation correctly handles both constant and payload variable cases, with appropriate error handling. The comments about precision limitations when converting from i64 to f64 are helpful and accurate.
332-345: Decay functions updated to use PreciseScore.All decay functions now use PreciseScore consistently, which is important for maintaining precision throughout all calculations.
485-502: Good test coverage for new datetime functionality.The tests appropriately cover both constant datetime expressions and payload variable cases, including error handling scenarios.
969b153 to
43a8235
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
docs/redoc/master/openapi.json (1)
14541-14551: Datetime expression schema could benefit from additional documentation.The Datetime expression implementation allows for working with datetime values as specified in the PR objectives, but the schema is quite minimal.
Consider enhancing the schema documentation to clarify:
- The expected string format for datetime values
- How the datetime is converted to microseconds (f64)
- Supported constant formats vs. payload key references
"Datetime": { "type": "object", "required": [ "datetime" ], "properties": { "datetime": { "type": "string", + "description": "Payload field containing a datetime string or a constant datetime string. Will be parsed into microseconds (f64) for use in expressions. Supports ISO 8601 and other common datetime formats." } } }docs/grpc/docs.md (1)
2718-2719: Clarify Date-Time Field SemanticsThe new
datetimefield has been added to theExpressionmessage to support datetime expressions. Currently, its description is "Date-time constant or payload key." Since the PR introduces functionality to parse such string inputs into a floating-point microsecond value (f64) for evaluation purposes, it would be beneficial to update the description to clarify the expected format (for example, whether an RFC3339 timestamp is required or if other formats are accepted) and that the string will be converted into a microsecond precision f64 value internally. This added detail can help users understand precisely how to supply and utilize datetime expressions in their queries.lib/api/src/grpc/conversions.rs (1)
2835-2835: Possible precision loss in by_zero_default castCasting
by_zero_defaulttof32may cause precision loss ifScoreTypeisf64(which seems likely given the PR objectives about improving precision for microsecond timestamps).Consider modifying this to avoid potential precision loss:
- by_zero_default: by_zero_default.map(|v| v as f32), + by_zero_default: by_zero_default,Or ensure the consumer of this value can handle the full precision.
lib/api/src/rest/schema.rs (1)
660-676: New structs for GeoDistance and Datetime supportThese structs provide the necessary data structures for the new expression types:
- GeoDistance - measures distance between an origin point and a field value
- Datetime - parses string representations into microsecond values (as f64)
The implementation is clean and consistent with the existing codebase style.
However, I recommend adding documentation comments for these new structs to make their purpose and usage clear, especially since they're part of the public API.
+/// Represents a geographic distance calculation in an expression #[derive(Debug, Serialize, Deserialize, JsonSchema)] pub struct GeoDistance { + /// Parameters for geo distance calculation pub geo_distance: GeoDistanceParams, } +/// Parameters for calculating geographic distance #[derive(Debug, Serialize, Deserialize, JsonSchema)] pub struct GeoDistanceParams { /// The origin geo point to measure from pub origin: GeoPoint, /// Payload field with the destination geo point pub to: JsonPath, } +/// Represents a datetime value or reference in an expression +/// Can be a constant (ISO8601 string) or a payload field reference #[derive(Debug, Serialize, Deserialize, JsonSchema)] pub struct Datetime { + /// String representation of a datetime or a payload field path pub datetime: String, }lib/segment/src/types.rs (3)
73-73: Consider a more explicit error type for out-of-range timestamps.Currently, this function returns
Nonewhen thetsvalue is out of range. Switching to aResult<Self, chrono::ParseError>(or a custom error type) would allow clearer error signaling if you need to distinguish different failure modes.
77-91: Improve error messaging for multiple accepted formats.The error message only references RFC 3339, despite the implementation also parsing multiple date/time formats. Clarifying that other formats are supported can reduce user confusion.
93-124: Refactor repetitive parsing attempts for readability.The code cleanly attempts various parse strategies. For maintainability, consider storing the valid formats in a static list or providing a consolidated error message enumerating all tried formats if parsing fails.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/grpc/docs.md(1 hunks)docs/redoc/master/openapi.json(2 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/api/src/rest/schema.rs(2 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/common/common/src/math.rs(1 hunks)lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(11 hunks)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs(11 hunks)lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs(2 hunks)lib/segment/src/types.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/collection/src/operations/universal_query/shard_query.rs
- lib/common/common/src/math.rs
🧰 Additional context used
🧬 Code Definitions (3)
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (3)
lib/segment/src/index/field_index/numeric_index/mod.rs (3) (3)
builder(456-461)builder_mmap(475-485)new(442-447)lib/segment/src/json_path/mod.rs (2) (2)
json(544-546)new(42-44)lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1) (1)
value(150-154)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)
lib/segment/src/types.rs (2) (2)
from_str(96-123)from_str(184-194)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (2) (2)
score(269-270)new_geo_distance(139-141)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test-consistency
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test
🔇 Additional comments (50)
lib/api/src/grpc/qdrant.rs (3)
5185-5185: Expanded tag range for new datetime expression support.The tag range has been extended to accommodate the new
Datetimevariant in the expression enum. This change is consistent with the PR objective to support datetime expressions.
5206-5208: New datetime expression variant added correctly.The new
Datetimevariant has been properly added with appropriate documentation indicating it can handle both datetime constants and payload keys. This implementation enables the datetime parsing functionality described in the PR objectives.
5210-5247: Tag numbering properly adjusted for all expression variants.All subsequent expression variants have had their tag numbers incremented by 1 to accommodate the new
Datetimevariant. This is the correct approach for maintaining backward compatibility when evolving Protocol Buffer schemas.docs/redoc/master/openapi.json (2)
14467-14472: New expression types added to enhance query capabilities.The changes introduce two new expression types to the API schema:
GeoDistance- For geo-spatial distance calculationsDatetime- For datetime value handling and timestamp operationsThese additions align with the PR objective to support datetime expressions that parse strings into floating-point microseconds (f64).
14514-14540: LGTM: Well-structured GeoDistance expression implementation.The GeoDistance expression is well-defined with a clean structure that:
- Takes an origin point (GeoPoint with lat/lon coordinates)
- References a destination field in the payload
- Provides a clear way to calculate geographic distances in expressions
This implementation enables powerful spatial queries within the expression system.
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (3)
347-360: Comprehensive datetime index test implementation.The test implementation follows the established pattern for creating field indices and properly tests both fully-qualified RFC 3339 timestamps and date-only strings.
364-364: Good addition of datetime index to test fixture.The datetime index is correctly added to the indices map under the "creation" key, maintaining consistency with other test indices.
402-420: Date normalization behavior should be documented.The test correctly verifies that date-only strings like "2023-01-02" are normalized to full RFC 3339 format "2023-01-02T00:00:00Z" during retrieval. However, this important behavior isn't documented in the code.
Consider adding a comment before line 414 that explains this normalization is expected behavior:
match id { 0 => assert_eq!(value, [json!("2023-01-01T00:00:00Z")].into()), + // Note: Date-only string "2023-01-02" is normalized to "2023-01-02T00:00:00Z" 1 => assert_eq!( value, [json!("2023-01-02T00:00:00Z"), json!("2023-01-03T00:00:00Z")].into() ),lib/api/src/grpc/conversions.rs (4)
8-8: Important new dependency addedThe addition of
ScoreTypeis crucial for the datetime support feature - this is part of the shift from f32 to f64 for expressions to maintain precision when handling timestamp microseconds, as mentioned in the PR objectives.
17-17: Adding DateTimeExpression to importsThis import addition correctly enables the new datetime functionality being introduced in the PR.
2797-2797: Type conversion for expression constantsConverting constants to
ScoreTypeensures correct handling of precision for datetime values. This is aligned with the PR's objective to change the expression evaluation type from f32 to f64 for timestamp microsecond precision.
2805-2812: Added new expression variants for GeoDistance and DateTimeThe implementation correctly adds support for:
- GeoDistance with a specified origin point and payload key
- DateTime expressions that can be either constants or payload variables
This matches the PR objective of supporting datetime expressions for turning strings into floating-point microsecond values.
lib/api/src/rest/schema.rs (1)
644-645: Added new Expression enum variantsThe addition of
GeoDistanceandDatetimevariants properly extends the expression functionality to support the new features described in the PR objectives.lib/segment/src/types.rs (2)
126-130: Straightforward Display implementation.Logging the inner
chrono::DateTimeviaself.0is concise and correct.
132-132: Clear conversion from chrono::DateTime.This seamlessly wraps
DateTimeWrapper(dt)and matches usage patterns for Time/Date conversions.lib/collection/src/operations/universal_query/formula.rs (8)
8-8: Good addition ofPreciseScorein imports.This aligns with the shift to higher-precision score handling throughout the code.
27-31: Well-defined new variants for geo-distance and date-time expressions.Introducing
GeoDistance { origin, to }andDateTime(String)is consistent with the new feature. Should future expansions arise, grouping them in a dedicated enum block might improve clarity, but this is clean for now.
65-65: AdoptingPreciseScorefor constants.Switching from
f32directly toPreciseScore::from(c)ensures consistent internal representation for precise computations.
78-81: Correct handling ofGeoDistancevariant.Inserting
tointopayload_varsand creating aParsedExpressionvianew_geo_distanceis correctly implemented and consistent with the rest of the match arm.
82-82: DateTime string parsing logic.Using
dt_str.parse()?can raise a parse error at runtime if the string is unrecognized, but it's aligned with the rest of the code’s error handling strategy.
105-105: Maintaining precision in division defaults.Converting the optional
by_zero_defaulttoPreciseScorekeeps numeric consistency across different formula variants.
200-202: Consistent mapping of REST GeoDistance to internal variant.The transition is intuitive, carrying the
originandtoparameters intoExpressionInternal::GeoDistance.
203-205: New DateTime expression parsing.Straightforward mapping from the REST
Datetimestruct toExpressionInternal::DateTime(datetime).lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (11)
9-9: Good addition of DateTimePayloadType import.The import supports the new DateTime expression functionality while keeping imports focused and minimal.
16-16: Excellent type definition for improved precision.Defining
PreciseScoreasf64is a good approach to maintain precision when dealing with timestamp microseconds, while keeping the code maintainable through a semantic type name. This will help prevent precision loss when handling date calculations.
38-42: Well-structured new expression variants.The new
GeoDistanceandDateTimevariants are correctly implemented in theParsedExpressionenum. The structure with appropriate fields for each variant maintains consistency with the existing pattern in the codebase.
50-50: Appropriate type update for division by zero default.Updating the
by_zero_defaultfield to usePreciseScoreensures type consistency throughout the formula evaluation chain.
69-69: Correct type for decay lambda.The update to use
PreciseScorefor the decay lambda parameter will ensure proper precision in decay functions, which is critical for maintaining accuracy in distance and score calculations.
103-120: Well-designed DateTimeExpression implementation.The new
DateTimeExpressionenum with itsFromStrimplementation provides a clean way to handle both datetime constants and payload variables. The implementation correctly tries to parse as a datetime first, then falls back to interpreting as a payload key, which aligns with the comment in the previous review.
126-126: Type consistency in parameter update.The parameter type update in
new_divmaintains API consistency with the new precision requirements.
182-184: Precise conversion in decay parameters.The explicit conversion from
f32toPreciseScoreusingPreciseScore::fromensures no precision is lost during the type conversion.
211-212: Good return type for decay lambda parameters.The function correctly returns
(ScoreType, ScoreType)which maintains compatibility with callers while allowing for the internal precision improvements.
224-224: Appropriate conversion back to ScoreType.The conversions back to
ScoreTypeare handled correctly at the edges of the API. This pattern of using higher precision internally but converting at the boundaries is a good practice.Also applies to: 239-242, 256-256
363-364: Test ranges updated for PreciseScore.The proptest range values have been properly updated to use
f64instead off32, ensuring the tests properly validate the new precision requirements.Also applies to: 370-371, 377-378
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (14)
10-12: Proper imports for new functionality.The imports have been correctly updated to include
DateTimeExpressionandPreciseScorewhich are essential for the new datetime expression support.
19-19: Added DateTimePayloadType import.The import supports the new datetime expression functionality in the formula scorer.
21-22: Updated constants to use PreciseScore.The constants have been updated to use the more precise
PreciseScoretype, maintaining consistency with the type changes throughout the system.
42-46: Updated FriendlyName implementation for PreciseScore.The
FriendlyNametrait implementation has been correctly updated to usePreciseScoreinstead of the previous type.
54-58: New FriendlyName implementation for DateTimePayloadType.Adding the
FriendlyNameimplementation forDateTimePayloadTypeensures proper error messages when handling datetime values.
99-101: Appropriate conversion in score method.The
scoremethod now correctly converts fromPreciseScoretoScoreTypewhen returning. This maintains backward compatibility while allowing for improved precision in the internal calculations.
109-109: Updated return type for eval_expression.The return type of
eval_expressionhas been updated to usePreciseScore, which is necessary for supporting higher precision in score calculations.
117-117: Good conversion from ScoreType to PreciseScore.Converting the score to
PreciseScoreusingPreciseScore::fromensures no precision is lost during the conversion.
143-144: Correct handling of GeoDistance calculation.The code now properly handles the return type of the Haversine distance calculation, maintaining precision through the computation chain.
145-163: Well-implemented DateTime expression handling.The implementation for the new
DateTimevariant handles both constant and payload variable datetime expressions. The conversion of timestamp toPreciseScoreis done correctly, with helpful comments explaining the precision considerations for microsecond timestamps.
339-352: Updated decay functions for PreciseScore.The decay functions have been properly updated to use
PreciseScorethroughout their calculations, ensuring consistency with the new precision requirements.
498-515: Comprehensive test cases for datetime expressions.The test cases cover both constant and payload variable datetime expressions, including error handling scenarios. This ensures robustness of the new functionality.
466-466: Updated test parameter and result types.The test types have been updated to correctly use
PreciseScoreinstead off32, ensuring that the tests properly validate the new precision requirements.Also applies to: 519-520
532-534: Added datetime test default.The default value for datetime payload variable is appropriately set up in the test fixtures, providing good test coverage.
lib/api/src/grpc/proto/points.proto (2)
587-599: Field indices correctly renumbered.All the subsequent field indices have been properly incremented to accommodate the new
datetimefield, maintaining the protobuf file structure. This approach ensures that the field numbering remains consistent.
586-586:Details
❓ Verification inconclusive
New datetime field added to Expression message.
The new
datetimefield at index 5 correctly enables representing datetime constants or payload keys in expressions.Let's verify that this protocol buffer change is backward compatible:
🏁 Script executed:
#!/bin/bash # Check if the protobuf change preserves backward compatibility # Look for any reference to Expression message field numbers to ensure # they are handled properly in the code rg -A 5 -B 5 "field.*=.*5" --glob "*.rs" | grep -i expressLength of output: 60
Protocol Buffer
datetimeField: Manual Verification RecommendedThe new
datetimefield at index 5 appears to be a backward-compatible addition within the Expression message. Our initial automated search (using the provided shell script) did not reveal any references to field number 5 in the Rust source files, indicating that no conflicting usages were detected. However, in situations where the automatic search produced little or no output, it’s important to manually confirm that existing client code handling the Expression message will correctly ignore the new field when not present.
- Action Items:
- Manually verify that any client code using the Expression message gracefully handles the addition of the optional
datetimefield.- Ensure that the field number (5) is indeed unique and not referenced elsewhere that might impact compatibility.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/redoc/master/openapi.json (1)
14541-14551: Consider enhancing datetime expression documentationThe
DatetimeExpressionimplementation aligns with the PR objective to support datetime parsing into microseconds (f64), but could benefit from additional documentation.Consider adding documentation comments that clarify:
- The expected format of datetime strings (ISO 8601, etc.)
- Whether this accepts both constants and payload keys
- How the parsing into microseconds works
- Examples of valid inputs
"DatetimeExpression": { "type": "object", "required": [ "datetime" ], "properties": { "datetime": { + "description": "A payload key containing a datetime string or a datetime constant in ISO 8601 format. Will be parsed into microseconds since epoch as a float64 value.", "type": "string" } } },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/redoc/master/openapi.json(2 hunks)lib/api/src/rest/schema.rs(2 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: Basic TLS/HTTPS tests
🔇 Additional comments (10)
docs/redoc/master/openapi.json (2)
14464-14472: New expression types added for extended querying capabilitiesThe PR has integrated two new expression types:
GeoDistanceandDatetimeExpression, which extend the query capabilities of the API.
14514-14540: GeoDistance expression implementation looks well-structuredThe implementation of the
GeoDistanceexpression and its parameters is well-designed, requiring both origin and destination parameters for proper distance calculation.lib/api/src/rest/schema.rs (2)
643-644: New expression types added correctlyThe addition of
GeoDistance(GeoDistance)andDatetime(DatetimeExpression)variants to theExpressionenum extends the system's capabilities to handle geographic distance calculations and datetime expressions.
660-676: Well-structured implementation of GeoDistance and Datetime expressionsThe implementation of these new structures follows the existing pattern in the codebase. The
GeoDistancestruct uses a nestedGeoDistanceParamsto clearly separate the parameters from the expression type itself, while theDatetimeExpressionis a simple wrapper around a string representation of a datetime.Both implementations are clean and follow the established patterns in the rest of the code.
lib/collection/src/operations/universal_query/formula.rs (6)
8-8: Appropriate import of PreciseScoreThe addition of
PreciseScoreto the imports is necessary for the type change from f32 to f64, which supports the PR's objective of maintaining precision when handling timestamp microseconds.
27-32: Proper implementation of new ExpressionInternal variantsThe GeoDistance variant has been updated to a structured format with clear field names, and a new DateTime variant has been added to support the datetime expressions feature. This change aligns with the structured approach used in other variants.
65-65: Appropriate use of PreciseScore for constantsConverting constants to
PreciseScoreinstead of using f32 directly supports the transition from f32 to f64 for better precision with timestamp microseconds, as mentioned in the PR objectives.
78-82: Complete implementation of parse_and_convert for new variantsThe implementation correctly:
- For GeoDistance: Adds the "to" field to payload_vars and constructs a ParsedExpression
- For DateTime: Parses the datetime string to create the appropriate ParsedExpression
This ensures proper handling of the new expression types during query processing.
105-105: Consistent type conversion for by_zero_defaultThe update to use
PreciseScore::fromfor the by_zero_default parameter ensures consistency with the shift from f32 to f64 throughout the codebase.
200-205: Proper conversion implementation for new expression typesThe implementation of Fromrest::Expression for ExpressionInternal has been updated to handle the new GeoDistance and Datetime variants correctly, ensuring proper conversion between the REST API representation and the internal representation.
f9a6607 to
918a37a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
170-175: Consider adding documentation about datetime precision decisions.While there's a comment about f64 precision limitations, it would be helpful to explicitly document why the conversion to seconds was chosen (rather than keeping microseconds or using milliseconds) and how this affects use cases like decay functions.
// Convert from i64 to f64. // f64's 53 bits of sign + mantissa for microseconds means a span of exact equivalence of // about 285 years, after which precision starts dropping let float_micros = datetime.timestamp() as PreciseScore; -// Convert to seconds +// Convert to seconds to provide a more manageable scale for common use cases +// such as decay functions that operate better on values closer to 1.0 let float_seconds = float_micros / 1_000_000.0;
504-511: Tests for array handling could be more comprehensive.The tests cover basic array handling but don't test edge cases like arrays with mixed types or deeply nested arrays. Consider adding more test cases to ensure robust handling of complex array structures.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(13 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3) (3)
score(269-270)new_payload_id(162-164)new_geo_distance(139-141)lib/segment/src/types.rs (6) (6)
new(376-389)new(1463-1466)new(2601-2603)new(2825-2827)new(2838-2840)timestamp(68-70)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (8)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (8)
10-12: Correctly updated imports to support new datetime functionality.The changes to imports appropriately include
DateTimeExpressionandPreciseScorefrom the parsed_formula module, which are necessary for the new datetime expression support.
21-22: Constants properly updated to use PreciseScore type.Constants have been correctly updated to use
PreciseScore(f64) instead ofScoreType(f32) to maintain precision when handling timestamp microseconds.
54-58: Good addition of FriendlyName trait implementation for DateTimePayloadType.This implementation provides proper error messages when datetime parsing fails, improving the developer experience.
99-110: Appropriate conversion from PreciseScore to ScoreType with error handling.The score method now correctly converts the high-precision f64 result to f32 after evaluation, with proper error handling for non-finite values. This ensures precision during calculation while maintaining compatibility with the rest of the system.
153-176: Thorough implementation of DateTime expression handling.The implementation correctly:
- Handles both constant and payload variable datetime expressions
- Properly parses string datetime values from payload
- Converts timestamp from microseconds to seconds with appropriate scaling
- Includes comments about precision limitations (~285 years of exact equivalence)
The conversion logic from microseconds to seconds is clear and maintains precision where needed.
135-136: Improved error message for non-number values.The error message now specifies that the value is not a number, making it clearer for debugging.
528-545: Comprehensive test cases for datetime expressions.The tests cover:
- Constant datetime expressions
- Payload variables that don't exist (with appropriate error)
- Payload variables with default values
- Proper timestamp conversion
This ensures the datetime functionality is thoroughly tested.
465-465:Details
✅ Verification successful
Verify precision of GeoDistance calculation.
The expected value for the GeoDistance test has been updated to a higher precision value. Ensure this is consistent with actual geo-distance calculations and doesn't introduce unintended precision issues.
🏁 Script executed:
#!/bin/bash # Check if there are other instances of GeoDistance calculations to verify consistency rg -A 2 "GeoDistance" --type rustLength of output: 4892
GeoDistance Precision Verification Completed
The updated expected value (21926.494151786308) now reflects a higher precision level consistent with how GeoDistance is computed throughout the repository. Based on our search across related modules, the precision update appears intentional and in line with other implementations. Please review the rounding behavior to ensure that no downstream issues arise.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/operations/universal_query/formula.rs (1)
27-31: GeoDistance variant correctly reintroduced.Declaring
originandtoclarifies handling geo-distance expressions. Ensure you have or plan to add tests that validate geo-distance logic (e.g., correct path insertion, expected distance calculation outcomes).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/operations/universal_query/formula.rs(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (4)
lib/collection/src/operations/universal_query/formula.rs (4)
8-8: Good addition ofPreciseScoreandDateTimeExpression.Transitioning from
ScoreType(f32) toPreciseScore(f64) aligns with the PR objective of preserving precision for date-time microseconds. IncludingDateTimeExpressionhere cleanly integrates the new date-time functionality.
65-65: Accurate conversion toPreciseScore.Using
PreciseScore::from(c)ensures continuity with double-precision arithmetic. This change is consistent with the new design, preserving precision to avoid floating-point issues in subsequent calculations.
78-88: Robust date-time parse and variable handling.Your date-time parsing logic with
dt_str.parse()?then selectively insertingjson_pathintopayload_varsis clear and consistent with the API design. Confirm that unexpected or invalid date-time strings properly trigger the error flow, preventing silent failures.Please verify with negative test cases (invalid date-time strings) to ensure the parse errors catch inputs as intended.
206-211:Details
✅ Verification successful
New REST mappings for
GeoDistanceandDateTime.Mapping
rest::Expression::GeoDistanceandrest::Expression::DatetimetoExpressionInternal::GeoDistanceandDateTimeis straightforward. Double-check that the updated field naming matches across all layers (REST, internal, and gRPC) to avoid field mismatch errors at runtime.
🏁 Script executed:
#!/bin/bash # Ensure the naming is consistent across REST, internal, and gRPC definitions. # Searching for "datetime" and "geo_distance" references. rg -A 5 'datetime' rg -A 5 'geo_distance'Length of output: 45754
Naming Consistency Verified Across Layers
The mapping of
rest::Expression::GeoDistancetoExpressionInternal::GeoDistanceandrest::Expression::DatetimetoExpressionInternal::DateTimehas been thoroughly verified. All references in the REST definitions, internal representations, and gRPC schemas (including related documentation) show consistent naming and implementation. No field mismatches were detected.
Why not seconds? It's a float anyhow. I usually don't like it if we diverge from the SI base units. |
| impl FromStr for DateTimeExpression { | ||
| type Err = String; | ||
|
|
||
| fn from_str(s: &str) -> Result<Self, Self::Err> { | ||
| // try as date | ||
| DateTimePayloadType::from_str(s) | ||
| .map(DateTimeExpression::Constant) | ||
| // else as payload key | ||
| .or_else(|_| JsonPath::from_str(s).map(DateTimeExpression::PayloadVariable)) | ||
| .map_err(|_| format!("Invalid date time expression: {s}")) | ||
| } | ||
| } |
There was a problem hiding this comment.
It is not possible to properly distinguish between the two types, right? Not a fan of this, since it's definitely possible for a user to use date time strings as keys.
Maybe like this:
"history": {
"2025-03-24": 128,
"2025-03-23": 110,
"2025-03-22": 283,
"2025-03-21": 10,
"2025-03-20": 123,
"2025-03-19": 28,
"2025-03-18": 387
}There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
lib/api/src/grpc/qdrant.rs (1)
5206-5210: New datetime expression variants added.The addition of
DatetimeandDatetimeKeyvariants appropriately handles both constant datetime values and dynamic references to payload datetime fields. This implementation aligns with the PR objective to support datetime expressions.Consider adding documentation about the expected datetime string format (e.g., ISO 8601) to help API users provide correctly formatted datetime strings.
docs/redoc/master/openapi.json (1)
14517-14527: Definition for the GeoDistance Schema
The newGeoDistanceschema is defined as an object requiring ageo_distanceproperty, which references theGeoDistanceParamsschema. This structure is clear and consistent.Consider adding an optional
"description"field for thegeo_distanceproperty to improve API documentation clarity.docs/grpc/docs.md (1)
2718-2719: Clarify New DateTime Fields in Expression Message
The addition of thedatetimeanddatetime_keyfields correctly extends the Expression table to support date-time expressions. To further improve clarity for API consumers, consider expanding the descriptions to note that the date-time string is expected to represent a constant (or payload key) which is internally parsed into microseconds as an f64 value before any subsequent conversion to f32 is applied. Also, please verify that the updates in related documentation (e.g., any changes inCollectionInfo.PayloadSchemaEntry, if applicable) are consistent and complete.lib/collection/src/operations/universal_query/formula.rs (1)
83-89: ParsingDatetimewith Chrono.
Ensures valid date-time strings become aParsedExpression::Datetime. Consider thorough testing with edge-case timestamps.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
docs/grpc/docs.md(1 hunks)docs/redoc/master/openapi.json(2 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/api/src/rest/schema.rs(2 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(13 hunks)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs(11 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (3)
lib/segment/src/index/query_optimization/optimized_filter.rs (1)
check_condition(34-39)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3)
score(256-257)new_payload_id(149-151)new_geo_distance(126-128)lib/segment/src/types.rs (6)
new(376-389)new(1463-1466)new(2601-2603)new(2825-2827)new(2838-2840)timestamp(68-70)
🔇 Additional comments (48)
lib/api/src/grpc/qdrant.rs (2)
5185-5185: Tag sequence update for new datetime expressions.The tag list has been extended to include tags 18 and 19, accommodating the new datetime expression variants. This is necessary to maintain the protocol buffer structure.
5212-5249: Tag numbers updated for all expression variants.All tag numbers have been consistently incremented to accommodate the new datetime variants. This maintains the proper protocol buffer structure.
docs/redoc/master/openapi.json (4)
14467-14475: New Expression Types Added in the Expressions Array
The JSON references forGeoDistance,DatetimeExpression, andDatetimeKeyExpressionhave been correctly inserted into the expressions array. This addition aligns well with the new PR feature for supporting datetime expressions and extended geo-spatial queries.
14528-14543: Definition for the GeoDistanceParams Schema
This schema mandates both theoriginandtoproperties, ensuring the payload includes all essential data for geo queries. Theoriginproperly references aGeoPoint, and thetoproperty includes a description to inform API consumers.
14544-14554: Definition for the DatetimeExpression Schema
The newDatetimeExpressionschema is structured as an object requiring adatetimefield of type string. This meets the requirement of parsing timestamp values. Ensure that the application logic properly converts this string to a floating-point microsecond value during evaluation.
14555-14565: Definition for the DatetimeKeyExpression Schema
TheDatetimeKeyExpressionis defined with a requireddatetime_keystring. This addition is consistent with the naming conventions and requirements for handling date-time expressions by key.lib/collection/src/operations/universal_query/shard_query.rs (1)
400-401: Support for datetime expressions added correctlyThe implementation properly handles the new datetime variant cases, mapping the gRPC representation to their internal counterparts. The naming convention uses "Datetime" (single word) which aligns with existing naming patterns as discussed in previous PR comments.
lib/api/src/grpc/conversions.rs (4)
8-8: New ScoreType import supports precision improvementThis import facilitates the transition from f32 to f64 for handling timestamp microseconds, as described in the PR objectives.
2797-2797: Constants now cast to ScoreType instead of f32This change supports the transition from f32 to f64 for better precision when handling timestamp microseconds during evaluations.
2805-2816: Datetime expression unparsing implemented correctlyThe implementation properly handles both constant datetime values and payload variables, converting them to the appropriate string representation. This completes the bidirectional conversion needed for datetime expressions.
2839-2839:Details
❓ Verification inconclusive
Verify that conversion to f32 doesn't lose precision
While the PR aims to use f64 for better precision, this line converts values back to f32 in the division expression handling. This could potentially lose precision for very large timestamp values.
🌐 Web query:
What is the range of JavaScript's Number.MAX_SAFE_INTEGER vs. f32 max value?💡 Result:
JavaScript's
Number.MAX_SAFE_INTEGERand 32-bit floating-point (f32) values have fundamentally different ranges due to their underlying data formats:
Feature JavaScript Number.MAX_SAFE_INTEGERf32 Max Safe Integer Value (2^{53} - 1 = 9,007,199,254,740,991) [1][2][7] (2^{24} = 16,777,216) [4][6] Precision Limit Integers within (±(2^{53} - 1)) are represented exactly [2][9] Integers ≤ (2^{24}) can be stored precisely without rounding errors [4][6] Data Type 64-bit double-precision floating point [2][9] 32-bit single-precision floating point [4][6] Common Use Cases General JavaScript number operations [5][10] Low-level systems programming, WebAssembly [4][6] Key Technical Differences
JavaScript's MAX_SAFE_INTEGER leverages the 52-bit mantissa of 64-bit floats (with an implicit 53rd bit). This allows exact integer representation up to (2^{53} - 1), beyond which operations like equality checks fail due to precision loss. For example, (9,007,199,254,740,991 + 1 = 9,007,199,254,740,992) becomes unrepresentable as distinct values [2][7].
f32 Max Value uses a 23-bit mantissa with an implicit 24th bit, limiting its integer precision to (2^{24}) (16,777,216). Beyond this, consecutive integers cannot be uniquely distinguished. For example, trying to store
16,777,217in f32 would alias it to16,777,216due to insufficient precision [4][6].Practical Implications
- Use
BigIntin JavaScript for values exceeding (9,007,199,254,740,991) [2][5]- In systems requiring 32-bit floats, validate conversions using
Number.isSafeInteger()checks or limit operations to the (±16,777,216) range [4][6]- Financial/security applications should enforce hard limits based on these thresholds to prevent silent precision errors [5][9]
While
Number.MAX_SAFE_INTEGERprovides over 100,000x greater precision than f32, both serve distinct roles depending on memory constraints and numerical requirements.Citations:
- 1: https://www.w3schools.com/jsref/jsref_max_safe_integer.asp
- 2: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER
- 3: https://udn.realityripple.com/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER
- 4: https://ziggit.dev/t/determining-lower-upper-bound-for-safe-conversion-from-f32-to-i32/3764
- 5: https://docs.vultr.com/javascript/standard-library/Number/MAX_SAFE_INTEGER
- 6: https://www.orillusion.com/api/classes/f32.html
- 7: https://www.programiz.com/javascript/library/number/max_safe_integer
- 8: wasm-bindgen/wasm-bindgen#1156
- 9: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number
- 10: https://codemia.io/knowledge-hub/path/what_is_javascripts_highest_integer_value_that_a_number_can_go_to_without_losing_precision
- 11: https://help.salesforce.com/s/articleView?id=000882015&language=en_US&type=1
Revisit f32 conversion precision concerns in timestamp division handling
The conversion of by_zero_default to f32 may introduce precision loss for very large values. Given that f32 can only precisely represent integers up to 16,777,216—compared to JavaScript’s Number.MAX_SAFE_INTEGER of 9,007,199,254,740,991—please verify whether the values handled here are guaranteed to remain below f32’s precision threshold. If there’s any possibility of exceeding this limit, reconsider either retaining f64 or enforcing a pre-conversion range check to safeguard against unintended rounding.
- File:
lib/api/src/grpc/conversions.rs- Line: 2839
- Code in question:
by_zero_default: by_zero_default.map(|v| v as f32),lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (10)
9-9: Good addition ofDateTimePayloadType.
This import expansion properly aligns with the new datetime expressions.
16-16: Switching fromf32tof64is appropriate for precision.
UsingPreciseScore = f64minimizes floating-point precision loss, especially valuable for timestamps.
35-36: Refactor toConstant(PreciseScore)improves internal consistency.
Changing the constant variant to usePreciseScoremaintains uniformity across the new expression model.
38-42: IntroducingGeoDistanceandDatetimevariants.
These additions enable location-based and datetime-based score adjustments. Implementation looks coherent with existing patterns.
44-44: Minor style alignment.
The comment “// Nested” is consistent with how you separate terminal vs nested expressions. No further action needed.
50-50: Consistent usage ofOption<PreciseScore>.
Ensuring theby_zero_defaultparameter is typed asPreciseScorematches the new float precision strategy.
69-69: Expanded usage oflambda: PreciseScore.
This change ensures consistent high-precision for decay-related calculations, reducing rounding errors.
133-133: PropTest withf64range for Lin decay.
Using0.000001..1.0fosters robust testing to catch edge cases for linear decay. Nicely done.
157-157: PropTest with negativef64range for Exp decay.
Ensuring coverage of negative lambda values is critical; no issues noted.
364-364: PropTest with large negativef64range for Gauss decay.
Wide range coverage for Gaussian decay ensures better resilience. This seems suitable.lib/api/src/rest/schema.rs (5)
643-645: New variants for geo-distance and date-time handling.
AddingGeoDistance,Datetime, andDatetimeKeyextends the query expression features in a well-structured manner.
661-664:GeoDistancestruct introduction.
Clearly encapsulates the geospatial parameters. Kudos for making the API more expressive.
666-672:GeoDistanceParamsfor explicit parameter grouping.
Defining origin and destination fields fosters clarity and type safety.
674-677:DatetimeExpressionstruct.
Allows a direct string-based date-time expression, laying groundwork for robust date-time parse logic.
679-682:DatetimeKeyExpressionstruct.
Encapsulating the payload-based date-time retrieval is a neat approach to unify key-based expressions.lib/collection/src/operations/universal_query/formula.rs (8)
27-32: Additional variants forGeoDistance,Datetime, andDatetimeKey.
These expansions unify expression handling across collection queries, matching the changes in the REST layer.
66-66: Upcasting constants toPreciseScore.
This approach improves precision and aligns with the codebase’s new float usage.
79-82:GeoDistancelogic is consistent with rest of the system.
Inserting the JSON path intopayload_varsand constructing vianew_geo_distanceis correct.
90-96: Payload-basedDatetimeKeyconversion.
This setup is elegantly consistent. If the user stores date-time strings in payload fields, this variant will parse them out.
119-119: Consistently convertingby_zero_defaulttoPreciseScore.
Preserves uniform float precision in division expressions.
214-216: Mapping RESTGeoDistanceto internal variant.
The bridging code is correctly placed, with no apparent pitfalls.
217-219: Mapping RESTDatetimeto internal expression.
Implementation straightforwardly reuses the date-time string.
220-222: Mapping RESTDatetimeKeyto internal expression.
Implementation aligns well with the newly introduced date-time payload approach.lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (12)
10-11: Good addition of new importsThe changes correctly import
DatetimeExpressionandPreciseScoreused in the updated formula scorer implementation.
21-22: Appropriate constant type changeUpdating constants to use
PreciseScore(f64) instead ofScoreType(f32) is a good change to maintain precision for timestamp calculations.
42-46: Correct trait implementation updateCorrectly modified the
FriendlyNametrait implementation to usePreciseScoreinstead ofScoreType.
54-58: Good addition of FriendlyName implementation for DateTimePayloadTypeThis implementation allows datetime values to be properly described in error messages, enhancing debugging and user experience.
99-109: Good precision handling with clear conversion strategyThe implementation correctly maintains precision by using
PreciseScore(f64) during calculations and only converting tof32at the end. The non-finite number check is also properly updated to handle the conversion.
117-117: Correct method signature updateChanged return type from
ScoreTypetoPreciseScoreto maintain high precision during formula evaluation.
125-125: Good handling of score conversionCorrectly converts retrieved
ScoreTypescores toPreciseScorewhen processing variable expressions.
151-152: Type consistency in geo distance calculationThe geo distance calculation now correctly returns a
PreciseScorevalue, maintaining type consistency throughout the evaluation process.
153-176: Well-implemented datetime expression handlingThe implementation for datetime expressions is thorough and handles both constant and payload variable cases. Good error handling and proper conversion from microseconds to seconds.
I appreciate the detailed comment about f64 precision for timestamp microseconds (spanning about 285 years of exact equivalence) which explains the technical limitations clearly.
351-364: Good update to decay function signaturesThe decay functions (exp_decay, gauss_decay, linear_decay) have been correctly updated to use
PreciseScoreinstead ofScoreTypefor better precision.
488-488: Appropriate test updateUpdated the test case to use
PreciseScorefor expected values, ensuring consistency with the implementation changes.
528-545: Comprehensive test cases for datetime expressionsGood addition of test cases for datetime expressions that cover both constant values and payload variables, including error cases. This ensures the new functionality is properly tested.
lib/api/src/grpc/proto/points.proto (2)
586-587: Good addition of datetime fields to Expression messageAdding
datetimeanddatetime_keyfields enables support for datetime expressions in the API, aligning with the implementation changes in the formula scorer.
588-601: Correct field numbering after adding new fieldsAll subsequent fields have been correctly renumbered to accommodate the new datetime fields. This maintains proper protocol buffer field numbering conventions.
a684943 to
2c3155a
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
lib/api/src/grpc/qdrant.rs (1)
5182-5186: Update of tag indices requires careful API versioning consideration.The increase in tag indices within the
prostattribute to accommodate new datetime expression variants might impact backward compatibility with older clients. Before finalizing this change, ensure proper API versioning strategy is in place to handle clients that might be expecting the original tag indices.Consider consulting your team's API versioning guidelines. In many gRPC systems, shifting existing tag numbers is avoided by either:
- Adding new fields only at the end of existing definitions
- Implementing parallel API version support if breaking changes are necessary
docs/redoc/master/openapi.json (1)
14523-14571: New Schema Definitions for GeoDistance and Datetime Expressions
The added schema definitions for GeoDistance, GeoDistanceParams, DatetimeExpression, and DatetimeKeyExpression correctly specify the required properties and structure. The approach of referencing an existing GeoPoint for the "origin" property and adding a descriptive field for "to" is appropriate. You might consider whether additional validation—for example, format constraints (such as an ISO 8601 regex or"format": "date-time") for the datetime fields—would benefit API consumers by enforcing the expected string format.lib/api/src/grpc/conversions.rs (1)
842-842: Ensure consistent precision handling.The
by_zero_defaultis cast to f32 despite being stored asScoreType(f64). This might cause precision loss when handling large timestamp values.Consider keeping the f64 precision throughout the entire expression evaluation pipeline:
-by_zero_default: by_zero_default.map(|v| v as f32), +by_zero_default: by_zero_default,This would require updating the receiving type in the protocol definition, but would maintain precision consistency.
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
153-176: Add datetime expression handling with microsecond precisionThe implementation correctly handles both constant and payload variable datetime expressions, converting them to seconds with microsecond precision, which is suitable for the scoring use case.
However, the conversion factor should be a named constant for better maintainability.
+// Conversion factor from microseconds to seconds +const MICROSECONDS_TO_SECONDS: PreciseScore = 1_000_000.0; fn eval_expression( &self, expression: &ParsedExpression, point_id: PointOffsetType, ) -> OperationResult<PreciseScore> { match expression { // ... other cases ... ParsedExpression::Datetime(dt_expr) => { // ... existing code ... // Convert to seconds - let float_seconds = float_micros / 1_000_000.0; + let float_seconds = float_micros / MICROSECONDS_TO_SECONDS; Ok(float_seconds) } // ... other cases ... } }
171-175: Consider documenting the precision implications of date conversionWhile the code correctly converts microseconds to seconds, it would be beneficial to add a comment about the chosen approach for datetime handling, explaining why seconds were chosen as the unit and how this impacts scoring.
// Convert from i64 to f64. // f64's 53 bits of sign + mantissa for microseconds means a span of exact equivalence of // about 285 years, after which precision starts dropping let float_micros = datetime.timestamp() as PreciseScore; +// Convert to seconds for consistent scoring scale +// This approach provides sufficient precision for decay functions while keeping +// the values in a reasonable range for typical scoring calculations let float_seconds = float_micros / 1_000_000.0; Ok(float_seconds)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/grpc/docs.md(1 hunks)docs/redoc/master/openapi.json(2 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/api/src/rest/schema.rs(2 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/common/common/src/math.rs(1 hunks)lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(13 hunks)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs(11 hunks)lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs(7 hunks)lib/segment/src/types.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- lib/collection/src/operations/universal_query/shard_query.rs
- docs/grpc/docs.md
- lib/common/common/src/math.rs
- lib/api/src/rest/schema.rs
- lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs
- lib/segment/src/types.rs
- lib/api/src/grpc/proto/points.proto
🧰 Additional context used
🧬 Code Definitions (2)
lib/collection/src/operations/universal_query/formula.rs (3)
lib/collection/src/operations/universal_query/shard_query.rs (8)
from(502-507)from(511-516)from(520-524)from(528-532)from(568-587)from(591-611)from(615-635)from(639-669)lib/api/src/rest/schema.rs (1)
from(1223-1228)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)
new_geo_distance(126-128)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3)
score(256-257)new_payload_id(149-151)new_geo_distance(126-128)lib/segment/src/types.rs (5)
new(376-389)new(1463-1466)new(2601-2603)new(2825-2827)new(2838-2840)
🪛 GitHub Actions: Storage compatibility tests
lib/api/src/grpc/conversions.rs
[error] 7-7: unresolved import common::counter::hardware_data::RealCpuMeasurement: no RealCpuMeasurement in counter::hardware_data
🪛 GitHub Actions: Formatter and linter
lib/api/src/grpc/conversions.rs
[error] 7-7: unresolved import common::counter::hardware_data::RealCpuMeasurement: no RealCpuMeasurement in counter::hardware_data.
🪛 GitHub Actions: Rust tests
lib/api/src/grpc/conversions.rs
[error] 7-7: unresolved import common::counter::hardware_data::RealCpuMeasurement: no RealCpuMeasurement in counter::hardware_data
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
🔇 Additional comments (23)
lib/api/src/grpc/qdrant.rs (2)
5206-5211: Addition of new DateTime expression variants.The new
DatetimeandDatetimeKeyvariants align with the PR objectives to support datetime expressions. The implementation correctly provides mechanisms for both constant datetime values and references to datetime values stored in payload keys.Ensure comprehensive documentation is added explaining the datetime string format that will be supported and how the parsing to microseconds (f64) will be handled in the implementation.
5213-5250:Details
❓ Verification inconclusive
Verify tag number shifts won't break existing clients.
All tag numbers from the original
Multvariant onwards have been incremented to accommodate the new datetime variants. This represents a potentially breaking change for existing clients that expect specific tag numbers for these operations.Consider running the following verification to identify potential compatibility issues:
🏁 Script executed:
#!/bin/bash # Find all client code that might be affected by tag number changes # Look for direct references to Expression tag numbers in tests or client code rg -A 1 -B 1 "tag\s*=\s*[\"']?[5-9][0-9]?[\"']?" --glob "**/*.{rs,proto}" # Check if there are clients in the wild that might be affected # Look for import statements of the Expression proto rg -A 5 -B 5 "Expression|expression::Variant" --glob "**/*.{rs,py,go,js,java}"Length of output: 113104
Important: Confirm Backward Compatibility with Revised Protobuf Tag Numbers
The refactoring in
lib/api/src/grpc/qdrant.rshas inserted new datetime variants (with tags 5 and 6), which shifted the tag numbers for all subsequent variants (e.g.Multis now tag 7,Sumtag 8, etc.). Our grep verification confirms that these tags are referenced throughout our internal gRPC and REST schema files, and no external client code was found that hardcodes these tag numbers. However, if any clients depend on the previous numeric mapping—for example, by using serialized messages directly or referencing tag values manually—this change could break compatibility.Action items:
- Manually verify client compatibility: Double-check that any external or stored serialized data correctly maps to the new tag assignments.
- Review tests and utilities: Ensure that hardcoded tag references (if any exist) are updated accordingly.
- Document the migration: Update changelogs or client guidelines as needed, warning users about this potentially breaking change.
docs/redoc/master/openapi.json (1)
14473-14481: New Expression References Added to the Schema List
The new entries for GeoDistance, DatetimeExpression, and DatetimeKeyExpression have been added correctly into the array. Please ensure these references remain consistent with their corresponding schema definitions later in the file.lib/api/src/grpc/conversions.rs (2)
800-801: Type conversion for precision improvement.The constant value is now cast to
ScoreType(f64) instead of using f32 directly, which improves precision for operations involving timestamps in microseconds.
808-819: Good implementation of the datetime and geo-distance expressions.The new handling for
GeoDistanceandDatetimevariants aligns well with the PR objective of supporting datetime expressions. The code correctly handles both constant datetime values and payload variable references.lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (3)
264-277: Well-structured test data addition.The new payload data structure is well-organized and properly maintains test coverage for the existing functionality while adding support for the new test cases.
364-377: Good test coverage for datetime indexing.The datetime index creation is well implemented, with test cases that verify both explicit ISO 8601 format ("2023-01-01T00:00:00Z") and shorthand format ("2023-01-02").
419-437: Consider adding a comment about datetime normalization behavior.The test demonstrates that short date formats like "2023-01-02" are automatically normalized to full RFC 3339 timestamps "2023-01-02T00:00:00Z" during retrieval, but this behavior isn't documented.
Add a comment near line 430 explaining this normalization behavior:
0 => assert_eq!(value, [json!("2023-01-01T00:00:00Z")].into()), 1 => assert_eq!( value, + // Note: "2023-01-02" is automatically normalized to "2023-01-02T00:00:00Z" [json!("2023-01-02T00:00:00Z"), json!("2023-01-03T00:00:00Z")].into() ),lib/collection/src/operations/universal_query/formula.rs (6)
8-8: Important type imports for precision improvement.The additional imports of
DatetimeExpressionandPreciseScoresupport the PR objective of improving precision when handling timestamps.
27-33: Good addition of new expression types.The new enum variants for
GeoDistance,Datetime, andDatetimeKeyare well-structured and align with the PR objectives. TheGeoDistanceimplementation looks good with proper field types.
66-66: Improved precision with PreciseScore.Converting constants to
PreciseScore(f64) is important for maintaining precision when working with microsecond timestamps.
79-96: Complete implementation of datetime and geo-distance parsing.The
parse_and_convertmethod now properly handles the new expression types:
- For
GeoDistance, it correctly adds the payload key to the tracked variables- For
Datetime, it parses the string into a datetime constant with error handling- For
DatetimeKey, it parses the key and adds it to the tracked payload variables
119-119: Consistent use of PreciseScore for by_zero_default.Converting the
by_zero_defaultoption toPreciseScore(f64) maintains precision consistency in the division operation.
214-222: Complete implementation of expression type conversions.The
From<rest::Expression>implementation properly handles the new expression types, ensuring consistent conversion between the REST API and internal representations.lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (9)
10-12: Properly updated imports to include the newDatetimeExpressiontype andPreciseScoreGood addition of the necessary imports to support datetime expressions and precise scoring.
19-19: LGTM: Added import for DateTimePayloadTypeThe addition of this import enables the datetime functionality introduced in this PR.
21-22: Updated constant types to use PreciseScoreGood update of the constant types from
ScoreType(f32) toPreciseScore(f64) to maintain precision when dealing with datetime values.
54-58: Added FriendlyName implementation for DateTimePayloadTypeGood addition of the FriendlyName trait implementation, which is necessary for error reporting when dealing with datetime values.
99-110: Updated score method to convert PreciseScore to ScoreTypeThe score method now correctly converts the f64 precision score to f32 for final output, with proper error handling for non-finite values.
117-117: Changed return type to use PreciseScoreGood update to use
PreciseScore(f64) as the return type for evaluation to maintain precision during calculations.
125-125: Updated score conversion to use PreciseScoreThe conversion from
ScoreTypetoPreciseScoreis now properly implemented.
351-351: Updated decay function signatures to use PreciseScoreGood conversion of the decay function parameters from
ScoreTypetoPreciseScoreto maintain precision during calculations.Also applies to: 356-356, 361-361
528-545: Added comprehensive tests for datetime expressionsGood test coverage for the new datetime functionality, including tests for constant datetime expressions, payload variable expressions, and error cases.
2f333b0 to
2c3155a
Compare
2c3155a to
eeac2b1
Compare
|
Just updated the description, @timvisee the PRs uses datetime as float seconds |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
lib/api/src/grpc/qdrant.rs (1)
5206-5211: Consider adding datetime format validation.The code accepts any string for datetime values without validation. Consider adding validation or documentation specifying the expected datetime string format to prevent runtime errors.
What are best practices for handling datetime string formats in Rust?docs/redoc/master/openapi.json (1)
14523-14571: Define Schemas for New Expression Types.
The schema definitions for"GeoDistance","GeoDistanceParams","DatetimeExpression", and"DatetimeKeyExpression"are clearly and consistently structured. As a minor enhancement, consider specifying a"format": "date-time"for the datetime strings if they conform to ISO standards. This additional detail can help API consumers with validation and improve documentation clarity. Additionally, verify that the description for the"to"field in"GeoDistanceParams"provides enough context for users.lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)
103-107: Good addition of DatetimeExpression enum, but missing documentation.The new
DatetimeExpressionenum provides a clean way to represent both constant datetime values and payload variables. However, it would benefit from documentation explaining its purpose and usage.Consider adding documentation to explain the purpose and usage of the
DatetimeExpressionenum:/// Represents a datetime expression that can be either a constant datetime value /// or a reference to a datetime field in the payload. #[derive(Debug, Clone, PartialEq)] pub enum DatetimeExpression { /// A constant datetime value Constant(DateTimePayloadType), /// A reference to a datetime field in the payload PayloadVariable(JsonPath), }lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (1)
423-441: Well-implemented datetime retrieval test, but missing documentation about normalization.The test for datetime retrieval properly verifies that datetime values can be retrieved from the index correctly. However, it would be helpful to add a comment explaining that date strings (like "2023-01-02") are intentionally normalized to full RFC 3339 timestamps ("2023-01-02T00:00:00Z").
Add a comment to clarify the datetime normalization behavior:
// Test retrieving a datetime from the index. +// Note: Date strings without time information (e.g., "2023-01-02") are +// normalized to midnight UTC (e.g., "2023-01-02T00:00:00Z") during conversion +// to DateTimePayloadType. let retriever = variable_retriever( &indices, &"creation".try_into().unwrap(), payload_provider.clone(), &hw_counter, );lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)
125-125: Consider optimizing score conversionWhile this conversion works, consider using direct assignment for simplicity and potentially better performance.
- .map(|score| PreciseScore::from(*score)) + .map(|&score| PreciseScore::from(score))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/grpc/docs.md(1 hunks)docs/redoc/master/openapi.json(2 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/api/src/rest/schema.rs(2 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/common/common/src/math.rs(1 hunks)lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(13 hunks)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs(11 hunks)lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs(7 hunks)lib/segment/src/types.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- lib/collection/src/operations/universal_query/shard_query.rs
- lib/common/common/src/math.rs
- lib/segment/src/types.rs
- docs/grpc/docs.md
- lib/api/src/grpc/proto/points.proto
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (3)
lib/segment/src/index/query_optimization/optimized_filter.rs (1)
check_condition(34-39)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3)
score(256-257)new_payload_id(149-151)new_geo_distance(126-128)lib/segment/src/types.rs (5)
new(376-389)new(1463-1466)new(2601-2603)new(2825-2827)new(2838-2840)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (35)
lib/api/src/grpc/qdrant.rs (3)
5206-5211: Good addition of datetime expressions.The new
DatetimeandDatetimeKeyvariants align with the PR objective to support datetime expressions for score boosting. These additions will allow parsing datetime strings into f64 values representing microseconds as described in the PR.
5213-5250:Details
❓ Verification inconclusive
Confirm tag renumbering across all related code.
All existing variants have had their tags increased by several positions to accommodate the new datetime variants. This systematic renumbering appears correct, but ensure that:
- All parsers and serializers are updated to handle the new tag numbers
- Any stored data using the old tag numbers is properly migrated
- Client code expecting specific tag numbers is updated
🏁 Script executed:
#!/bin/bash # Check for potential parsing/conversion code that might need updates echo "Looking for expression conversion code:" rg "fn.*Expression" -t rust echo "Checking for protobuf version and code generation:" rg "prost" -g "Cargo.toml"Length of output: 2325
Ensure consistent tag renumbering across the codebase
The updated tag numbers in
lib/api/src/grpc/qdrant.rs(lines 5213-5250) appear correct. However, please make sure that the renumbering is fully reflected across the system by confirming that:
- All parsers and serializers (for example, those in conversion functions found in files under
lib/segment/src/index/query_optimization/rescore_formula/andlib/collection/src/operations/universal_query/) correctly interpret the new tag values.- Any persisted data using the old tag numbers is either migrated or handled so that no legacy data leads to inconsistencies.
- Client code expecting specific tag numbers is updated accordingly, including tests that validate the end-to-end serialization/deserialization process.
Please review these points (and consider running integration tests) to ensure that the renumbering change does not inadvertently break downstream or storage-related functionality.
5185-5185:Details
❓ Verification inconclusive
Verify protocol compatibility after tag range expansion.
The tag range for the
oneoffield has been expanded to include tags 17, 18, and 19. Since Protocol Buffers rely on stable tags for backward compatibility, ensure this change won't break existing clients or data.
🏁 Script executed:
#!/bin/bash # Check if there are any older versions of API clients that might be affected # Look for imports or usage of the Expression struct in test code echo "Checking for Expression usage in tests:" rg "Expression" -t rust --glob "*/tests/**" --max-depth 3 echo "Checking for potential API client usage:" rg "use.*grpc.*Expression" -t rustLength of output: 540
Protocol Compatibility Check for Expanded Tag Range
The tag range for the oneof field in
lib/api/src/grpc/qdrant.rsnow includes tags 17, 18, and 19. Please ensure that this change does not conflict with any existing clients. Note the following:
- API Client Usage: A search found a reference in
lib/collection/src/operations/universal_query/shard_query.rsusingDecayParamsExpressionfrom the grpc module. Double-check that this usage and any other client interactions remain compatible with the new tag assignments.- Test Coverage: The initial test search for
Expressionusage in tests did not return results. Verify that existing integration tests (or add new ones) cover scenarios involving the modified oneof field to guarantee backward compatibility.Please run a thorough compatibility review and confirm that no unintended breaks occur for existing API clients.
docs/redoc/master/openapi.json (1)
14473-14481: Integrate New Expression Types in Schema Union.
The new references for"GeoDistance","DatetimeExpression", and"DatetimeKeyExpression"are correctly added to the list of expression options. Please ensure that downstream consumers of the API are updated to properly handle these new types and that the ordering in the schema (if significant) aligns with the intended evaluation precedence.lib/api/src/grpc/conversions.rs (5)
8-8: Good addition of the ScoreType import.The addition of the
ScoreTypeimport is appropriate for supporting the new datetime expressions feature, ensuring consistent type handling in your expression evaluation.
17-17: Good import of DatetimeExpression for enhancing formula capabilities.Adding the
DatetimeExpressionimport from theparsed_formulamodule properly supports the new datetime expression functionality.
2800-2800: Well-structured type handling for constants.The change to cast the constant value to
ScoreTypeensures proper type consistency throughout the formula evaluation process.
2808-2819: Well-implemented support for datetime expressions.The implementation of the
GeoDistanceandDatetimevariants in theunparse_expressionfunction is well-structured. Both variants properly convert their respective data types for use in expressions:
GeoDistancecorrectly transforms origin and key dataDatetimeappropriately handles both constant datetime values and payload variablesThis aligns well with the PR objectives to support datetime expressions.
2842-2842: Type consistency improvement for by_zero_default.Casting the
by_zero_defaultvalue tof32ensures proper type consistency between the internal representation and the API.lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (6)
16-16: Good addition of PreciseScore type for improved precision.Defining
PreciseScoreasf64helps maintain precision when handling timestamp microseconds during evaluations, which aligns with the PR objectives. This is especially important for time-based operations that require greater floating-point precision.
36-43: Well-structured updates to ParsedExpression enum.The changes to the
ParsedExpressionenum effectively support the new requirements:
- Using
PreciseScorefor constants improves precision- Properly structuring
GeoDistancewith origin and key fields- Adding a new
Datetimevariant that leverages theDatetimeExpressionenumThese changes align perfectly with the PR objectives to support datetime expressions.
169-171: Properly updated decay_params_to_lambda to use PreciseScore.The changes to convert the midpoint and scale to
PreciseScoreensure that calculations maintain adequate precision throughout the evaluation process, especially important for timestamp-related operations.
198-198: Good type adaptation for decay_lambda_to_params.Updating the function signature to accept
PreciseScorewhile returning(ScoreType, ScoreType)properly handles the transition between high-precision internal calculations and API-compatible output types.
211-211: Appropriate type casting in decay functions.The addition of type casting to
ScoreTypein the return values of thedecay_lambda_to_paramsmethod ensures proper type compatibility while maintaining precision where it matters most in the calculation.Also applies to: 227-229, 243-243
350-351: Well-updated test cases for new PreciseScore type.The test functions have been correctly updated to use
PreciseScoreinstead off32, ensuring that the tests properly validate the updated behavior of the decay functions.Also applies to: 357-358, 364-365
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (2)
264-271: Good addition of test payload with array data.The new
payload3test case enhances test coverage by including scenarios with arrays containing single values, which helps verify correct handling of both single values and arrays.
364-385: Thorough test setup for datetime field index.The addition of a datetime field index in the test setup is well-implemented, providing good coverage of different date formats and multiple values per point.
lib/api/src/rest/schema.rs (3)
643-645: Support for new expression types looks goodThe addition of the
GeoDistance,Datetime, andDatetimeKeyvariants to theExpressionenum provides the necessary infrastructure for handling datetime expressions as specified in the PR objectives.
661-672: Well-structured GeoDistance implementationThe new
GeoDistancestruct and its parameters follow the established pattern of the codebase. The struct contains clear field definitions with descriptive comments.
674-682: Clean implementation of datetime expressionsThe
DatetimeExpressionandDatetimeKeyExpressionstructs provide a clean API for datetime handling, aligning perfectly with the PR objective to support parsing strings into floating-point numbers representing microseconds.lib/collection/src/operations/universal_query/formula.rs (7)
8-9: Type change aligns with precision requirementsThe import of
PreciseScorealigns with the PR objective of changing expression evaluation from f32 to f64 for improved precision when handling timestamp microseconds.
27-32: Good implementation of new expression variantsThe addition of
GeoDistance,Datetime, andDatetimeKeyvariants to theExpressionInternalenum provides the internal representation needed for datetime expressions.
66-66: Precision enhancement for constantsConverting constants to
PreciseScoreensures that all numerical operations maintain higher precision throughout the expression evaluation pipeline.
79-82: Clean GeoDistance implementationThe implementation correctly adds the
tofield topayload_varsand uses the appropriate constructor method for theGeoDistanceexpression.
83-96: Thorough datetime expression handlingThe parsing and conversion of datetime expressions is handled well with appropriate error handling for both constant and payload variable cases.
Two points worth noting:
- For constants, the code correctly parses the datetime string with proper error handling
- For payload variables, it correctly adds the path to
payload_varscollection
119-119: Consistent precision enhancementConverting the by_zero_default value to PreciseScore maintains precision consistency throughout all formula operations.
214-222: Robust handling of datetime expression variantsThe implementation of
From<rest::Expression>for the new datetime variants is clean and follows the established pattern for other expression types.lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (8)
10-12: Import changes align with precision requirementsThe updated imports, including
DatetimeExpressionandPreciseScore, properly support the transition to higher precision scoring.
21-22: Consistent constant type changesThe constants are correctly updated to use
PreciseScoretype instead of literals, maintaining type consistency throughout the codebase.
54-58: Good addition of FriendlyName implementationThe implementation of
FriendlyNameforDateTimePayloadTypehelps with generating user-friendly error messages, consistent with other types.
99-109: Proper score conversion with finite checkThe implementation correctly converts the
PreciseScoreresult tof32before returning, with appropriate validation to ensure the value is finite. This aligns with the PR objectives to maintain precision during computation but return a compatible score type.
153-176: Well-implemented datetime expression handlingThe implementation of datetime expression evaluation is thorough and handles both constant and payload variable cases appropriately. The conversion from timestamp to seconds is implemented correctly, and the code includes helpful comments explaining the precision considerations.
351-351: Consistent decay function type updatesThe decay functions are correctly updated to use
PreciseScorethroughout, maintaining type consistency for all mathematical operations.Also applies to: 356-356, 361-361
528-545: Comprehensive datetime test casesThe test cases for datetime expressions are thorough and test various scenarios including constant expressions, missing variables, and variables with defaults. This ensures the code behaves as expected in different situations.
561-564: Good test default value for datetimeThe default value for testing datetime expressions is well-chosen and properly formatted, which helps ensure robust test coverage.
eeac2b1 to
f957d6b
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (6)
docs/redoc/master/openapi.json (1)
14523-14571: Definition of New Schema Components
The new schema components for "GeoDistance", "GeoDistanceParams", "DatetimeExpression", and "DatetimeKeyExpression" are well-structured. Notable points:
- GeoDistance: It requires a
geo_distanceproperty that refers to "GeoDistanceParams".- GeoDistanceParams: Both
origin(referencingGeoPoint) andto(with an appropriate description) are defined as required.- DatetimeExpression & DatetimeKeyExpression: These components simply require a string-type property.
Consider adding additional descriptions or examples for the datetime properties to clarify the expected format (e.g., ISO 8601) given that these values will be parsed into precise floating-point numbers later in the evaluation process.lib/api/src/grpc/conversions.rs (1)
8-8: Add documentation for the newly imported ScoreType.Consider adding a comment to explain why
ScoreTypeis being imported and what its purpose is, especially since it's being used for a type conversion in the expression unparsing.lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (2)
170-171: Consider making the decay function defaults constants using PreciseScore.Instead of converting the existing f32 constants to PreciseScore at runtime, consider defining new constants with the PreciseScore type for better type consistency.
-const DEFAULT_DECAY_MIDPOINT: f32 = 0.5; -const DEFAULT_DECAY_SCALE: f32 = 1.0; +const DEFAULT_DECAY_MIDPOINT: f32 = 0.5; +const DEFAULT_DECAY_SCALE: f32 = 1.0; +const DEFAULT_DECAY_MIDPOINT_PRECISE: PreciseScore = 0.5; +const DEFAULT_DECAY_SCALE_PRECISE: PreciseScore = 1.0;And then in the function:
-let midpoint = PreciseScore::from(midpoint.unwrap_or(DEFAULT_DECAY_MIDPOINT)); -let scale = PreciseScore::from(scale.unwrap_or(DEFAULT_DECAY_SCALE)); +let midpoint = midpoint.map(PreciseScore::from).unwrap_or(DEFAULT_DECAY_MIDPOINT_PRECISE); +let scale = scale.map(PreciseScore::from).unwrap_or(DEFAULT_DECAY_SCALE_PRECISE);
199-243: Good implementation but consider explaining the type conversions.The cast from
PreciseScore(f64) toScoreTypein the decay function is important for maintaining precision but could benefit from a comment explaining why this conversion is necessary.Consider adding a comment like:
// Convert from PreciseScore (f64) to ScoreType to maintain compatibility with the existing APIs // while still leveraging the higher precision during calculationslib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (1)
423-441: Add clarifying comment about datetime normalization.The test shows that "2023-01-02" is normalized to "2023-01-02T00:00:00Z" - this behavior should be documented to clarify that this is intentional.
Add a comment near line 435:
// Note: Date strings without time components (e.g., "2023-01-02") are intentionally // normalized to midnight UTC (e.g., "2023-01-02T00:00:00Z") during conversion to DateTimePayloadTypelib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)
153-176: Well-implemented datetime expression handlingThe implementation correctly handles both constant and payload variable datetime expressions, with proper error handling and conversion to float seconds.
Consider extracting the magic number 1_000_000.0 to a named constant for better readability.
+ // Conversion factor from microseconds to seconds + const MICROS_TO_SECONDS: PreciseScore = 1_000_000.0; + let float_micros = datetime.timestamp() as PreciseScore; // Convert to seconds - let float_seconds = float_micros / 1_000_000.0; + let float_seconds = float_micros / MICROS_TO_SECONDS;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/grpc/docs.md(1 hunks)docs/redoc/master/openapi.json(2 hunks)lib/api/src/grpc/conversions.rs(4 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/api/src/rest/schema.rs(2 hunks)lib/collection/src/operations/universal_query/formula.rs(6 hunks)lib/collection/src/operations/universal_query/shard_query.rs(1 hunks)lib/common/common/src/math.rs(1 hunks)lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs(13 hunks)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs(11 hunks)lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs(7 hunks)lib/segment/src/types.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- lib/collection/src/operations/universal_query/shard_query.rs
- docs/grpc/docs.md
- lib/common/common/src/math.rs
- lib/segment/src/types.rs
- lib/api/src/rest/schema.rs
- lib/api/src/grpc/proto/points.proto
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (4)
lib/segment/src/index/query_optimization/optimized_filter.rs (1)
check_condition(34-39)lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3)
score(256-257)new_payload_id(149-151)new_geo_distance(126-128)lib/segment/src/types.rs (5)
new(376-389)new(1463-1466)new(2601-2603)new(2825-2827)new(2838-2840)lib/api/src/rest/schema.rs (1)
new(1033-1039)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (22)
lib/api/src/grpc/qdrant.rs (2)
5182-5188: LGTM! Tags updated correctly for new datetime variants.The tag list in the
oneofdefinition has been properly extended to include tags 1-19, which correctly encompasses all variant tags including the newly added datetime expressions.
5203-5252: LGTM! New datetime variants correctly added.The new datetime expression variants are properly implemented:
Datetimefor datetime constants (tag 5)DatetimeKeyfor payload keys with datetime values (tag 6)All subsequent tags have been correctly incremented to maintain proper sequencing. The additions are well-documented with clear comments.
docs/redoc/master/openapi.json (1)
14473-14481:Details
✅ Verification successful
New Schema Reference Additions for Expression Types
The addition of new references for "GeoDistance", "DatetimeExpression", and "DatetimeKeyExpression" correctly extends the available expression schemas. Please ensure that corresponding schema definitions exist under#/components/schemas/and that these new types are documented in the API reference.
Below is a shell script to verify that the new schema definitions actually exist in the JSON file as referenced:
Once you run this script, review the output to confirm that the definitions for "GeoDistance", "DatetimeExpression", and "DatetimeKeyExpression" are present under the
#/components/schemas/section. If they are indeed present and correctly formed, you may update the review comment as follows:
Schema References and Definitions Verified
The new references for"GeoDistance","DatetimeExpression", and"DatetimeKeyExpression"indocs/redoc/master/openapi.jsoncorrectly point to existing schema definitions under#/components/schemas/. Please ensure that these schema definitions are also comprehensively documented in the API reference for clarity.
🏁 Script executed:
#!/bin/bash echo "Checking for GeoDistance definition..." rg '"GeoDistance":' docs/redoc/master/openapi.json echo "Checking for DatetimeExpression definition..." rg '"DatetimeExpression":' docs/redoc/master/openapi.json echo "Checking for DatetimeKeyExpression definition..." rg '"DatetimeKeyExpression":' docs/redoc/master/openapi.jsonLength of output: 678
Schema References and Definitions Verified
The JSON filedocs/redoc/master/openapi.jsonnow clearly includes definitions for"GeoDistance","DatetimeExpression", and"DatetimeKeyExpression"as referenced in the snippet. The new references correctly extend the expression schemas, and the definitions under#/components/schemas/are present. Please also ensure that these types are thoroughly documented within the API reference documentation.lib/api/src/grpc/conversions.rs (3)
800-800: Good conversion to ScoreType for consistent type handling.Converting the constant value to
ScoreTypeensures consistent handling of numeric types throughout the expression system.
808-819: Well-implemented support for datetime expressions.The new
GeoDistanceandDatetimeexpression variants properly handle the different cases (constant or payload variable), enabling the functionality described in the PR objectives for datetime expressions.
842-842:Details
❓ Verification inconclusive
Ensure consistent type conversion.
While casting
by_zero_defaulttof32works, it's worth verifying that this doesn't lead to precision loss since other parts of the code are now usingPreciseScore(f64). Consider whether this should also useScoreTypefor consistency.
🏁 Script executed:
#!/bin/bash # Check if by_zero_default is consistently typed across the codebase # Look for by_zero_default usage in the codebase rg "by_zero_default" --type rustLength of output: 1626
Consistent Type Conversion Recommendation for by_zero_default
- In
lib/api/src/grpc/conversions.rs, the conversion castsby_zero_defaultto anf32(i.e.,by_zero_default.map(|v| v as f32)).- Other parts of the code use
ScoreType(as seen inlib/api/src/rest/schema.rsandlib/collection/src/operations/universal_query/formula.rs) or explicitly work withPreciseScore(which is of typef64).- This inconsistency could lead to precision loss. Please verify whether the GRPC conversion should leverage the
ScoreTypealias—or a similar consistent approach—to match the double-precision expectations elsewhere.lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (4)
16-16: Good type abstraction for improved precision.Defining
PreciseScoreasf64is a good practice as it provides an abstraction layer that makes future type changes easier. This shift from f32 to f64 is critical for maintaining precision when handling timestamp microseconds as mentioned in the PR description.
36-42: Well-structured addition of new expression variants.The changes to the
ParsedExpressionenum properly structure theGeoDistancevariant and add the newDatetimevariant with clear field types.
103-107: Good design for datetime expression handling.The
DatetimeExpressionenum correctly handles both constant datetime values and payload variables, making it flexible for different use cases.
347-367: Well-updated test ranges for PreciseScore.The proptest ranges have been properly updated to use f64 instead of f32, ensuring that the tests validate the full range of the new PreciseScore type.
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (2)
264-271: Good test coverage for array values.Adding a test case for an array with a single value is valuable as it tests an edge case that might behave differently from scalar values or multi-element arrays.
364-381: Good test setup for datetime index.The test properly sets up a datetime index with various datetime formats, including both ISO8601 format and date-only format.
lib/collection/src/operations/universal_query/formula.rs (5)
27-32: LGTM! The new datetime expression variants extend capabilities as needed.The new enum variants
Datetime,DatetimeKey, and the reintroducedGeoDistancestructure provide the necessary foundation for the datetime expression functionality described in the PR objectives.
66-66: Correctly updated to use PreciseScoreThe change from direct float usage to
PreciseScore::from(c)aligns with the PR's goal of improving numeric precision for timestamp handling.
79-96: LGTM! Properly implemented parsing for new expressionsThe implementation for handling GeoDistance, Datetime, and DatetimeKey expressions looks good. The code correctly:
- Inserts the required payload variables into the payload_vars set
- Handles appropriate error conversions for datetime parsing
- Creates the correct ParsedExpression variants
119-119: Correctly updated by_zero_default to use PreciseScoreThis change ensures consistency with the overall transition from f32 to f64 for improved precision.
214-222: LGTM! Properly implemented From conversion for new expression typesThe implementation of the
From<rest::Expression>trait for the new expression types follows the established pattern in the codebase and correctly handles the extraction of fields from the REST types.lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (5)
10-11: LGTM! Updated imports and constants to use PreciseScoreThe transition from ScoreType to PreciseScore for constants and imports is consistent with the PR objective of improving numeric precision for timestamp handling.
Also applies to: 21-22
54-58: Added FriendlyName implementation for DateTimePayloadTypeGood addition that provides proper type name information for error messages when handling datetime values.
99-110: Smart conversion from PreciseScore to ScoreType at the end of processingThis implementation correctly follows the PR objective: using higher precision (f64) during calculations and only converting back to f32 at the very end. The error handling for non-finite numbers is also well implemented.
351-364: Successfully updated decay functions to use PreciseScoreAll decay functions have been updated to use PreciseScore consistently, which will ensure proper precision throughout the scoring process.
528-545: Comprehensive test coverage for datetime expressionsThe added test cases cover all important scenarios for datetime expressions:
- Constant datetime values
- Missing datetime variables
- Default values for datetime variables
This ensures robustness of the implementation.
* use f64 as internal expression score * allow datetime as expression * add test cases * fix schema, generate openapi and grpc docs * fmt * clippy * fix test precision * graceful handling of too large numbers, more tests * rename to DatetimeExpression for consistency * use datetime's timestamp as seconds * capture variable name when parsing * homogenize DateTime into Datetime * use interface distinction between datetime strings and payload keys * fix rebase * fix after rebase
Adds
{ "datetime": <datetime constant> }and{ "datetime_key": <payload key> }expressions which can be used to parse strings into f64'smicrosecondsseconds.A necessary additional change is to switch expression evaluation type from f32 to f64, so that we don't lose precision on timestamp microseconds right away, it will still be converted to f32 but only after finishing the evaluation. This is specially important since timestamp will be most certainly used within a decay function (#6154) to compress its range.