Skip to content

[score boosting] support datetime expressions#6196

Merged
coszio merged 15 commits intodevfrom
score-boosting-with-datetime
Apr 2, 2025
Merged

[score boosting] support datetime expressions#6196
coszio merged 15 commits intodevfrom
score-boosting-with-datetime

Conversation

@coszio
Copy link
Copy Markdown
Contributor

@coszio coszio commented Mar 18, 2025

Adds { "datetime": <datetime constant> } and { "datetime_key": <payload key> } expressions which can be used to parse strings into f64's microseconds seconds.

A necessary additional change is to switch expression evaluation type from f32 to f64, so that we don't lose precision on timestamp microseconds right away, it will still be converted to f32 but only after finishing the evaluation. This is specially important since timestamp will be most certainly used within a decay function (#6154) to compress its range.

Comment thread lib/segment/src/types.rs
}

impl From<chrono::DateTime<chrono::Utc>> for DateTimeWrapper {
impl FromStr for DateTimePayloadType {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one was moved closer to the type definition, but no changes were made

Comment on lines +86 to +120
impl FromStr for DateTimeExpression {
type Err = String;

fn from_str(s: &str) -> Result<Self, Self::Err> {
// try as date
DateTimePayloadType::from_str(s)
.map(DateTimeExpression::Constant)
// else as payload key
.or_else(|_| JsonPath::from_str(s).map(DateTimeExpression::PayloadVariable))
.map_err(|_| format!("Invalid date time expression: {s}"))
}
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the expression is a string, it will try to parse as datetime first, then as a payload key

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not possible to properly distinguish between the two types, right? Not a fan of this, since it's definitely possible for a user to use date time strings as keys.

Maybe like this:

"history": {
  "2025-03-24": 128,
  "2025-03-23": 110,
  "2025-03-22": 283,
  "2025-03-21": 10,
  "2025-03-20": 123,
  "2025-03-19": 28,
  "2025-03-18": 387
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point, I didn't think of the case where datetime strings are used as keys, but it does make sense

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved by introducing a separate expression: { "datetime_key": <payload key> }

Copy link
Copy Markdown
Member

@timvisee timvisee Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That works!

Or even datetime_payload, though that would be longer. Or the inverse datetime_value.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 18, 2025

📝 Walkthrough

Walkthrough

This pull request introduces several updates to expression parsing and evaluation across different modules. New variants for date-time expressions are added to various enums in both gRPC and REST interfaces, and a refined type for scoring—transitioning from ScoreType to PreciseScore—is implemented to improve precision during computations. The changes also address the handling of geo-distance expressions, updating how they are constructed and represented. The protobuf definitions have been updated to accommodate new date-time fields, with corresponding adjustments to the field indices. Additional modifications include improved conversion logic for expressions, updated method signatures for parsing and evaluating expressions, and enhanced datetime parsing with robust error handling and formatting capabilities. Collectively, these modifications aim to extend the expression functionality to cover new types while ensuring consistent type handling and conversion across the system.

Possibly related PRs

  • [score boosting] add more expressions #6054: The changes in the main PR are related to the addition of new variants in the ParsedExpression enum, specifically ParsedExpression::GeoDistance and ParsedExpression::Datetime, which aligns with the enhancements in the retrieved PR that also introduces new expression types, including DatetimeExpression and GeoDistance. Both PRs expand the expression handling capabilities in a similar context.
  • [score boosting] Value retrievers return multivalues #6195: The changes in the main PR, which introduce new variants for handling GeoDistance and Datetime in expression parsing, are related to the modifications in the retrieved PR that enhance the value retrieval process to support multiple values, including datetime values, in the scoring logic. Both PRs involve updates to the handling of expressions and values that include datetime information.
  • [score boosting] Add decay expressions #6154: The changes in the main PR are related to the addition of new variants in the ParsedExpression enum, specifically ParsedExpression::GeoDistance and ParsedExpression::Datetime, which aligns with the modifications in the retrieved PR that also introduces new expression types, including decay expressions and their handling in the Expression schema. Both PRs enhance the expression handling capabilities, indicating a strong connection at the code level.

Suggested reviewers

  • timvisee
  • generall
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)

86-97: Potentially improve error message handling.
Currently, the underlying parse error is discarded. Preserving it can help with debugging.

-            .map_err(|_| format!("Invalid date time expression: {s}"))
+            .map_err(|err| format!("Invalid date time expression: {s}, cause: {err}"))
lib/segment/src/types.rs (1)

77-91: Deserialize implementation discards parse error.
Preserving the error could facilitate debugging. Consider the following change:

- let parse_result = DateTimePayloadType::from_str(str_datetime).ok();
- match parse_result {
-     Some(datetime) => Ok(datetime),
-     None => Err(serde::de::Error::custom(format!(
-         "'{str_datetime}' is not in a supported date/time format, please use RFC 3339"
-     ))),
- }
+ match DateTimePayloadType::from_str(str_datetime) {
+     Ok(datetime) => Ok(datetime),
+     Err(err) => Err(serde::de::Error::custom(
+         format!("Invalid date/time format '{str_datetime}': {err}")
+     )),
+ }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4149e6d and e7cb1c0.

📒 Files selected for processing (9)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/api/src/rest/schema.rs (1 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (9 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3 hunks)
  • lib/segment/src/types.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (46)
lib/api/src/grpc/qdrant.rs (3)

5185-5185: Tag list updated correctly to accommodate new DateTime variant.

The tag list was properly updated to include the new tag value '15', maintaining the sequential numbering pattern required by protobuf definitions. This change ensures the oneof field can correctly reference all variants including the newly added DateTime.


5206-5208: New DateTime variant added for timestamp support.

The DateTime variant provides the capability to represent date-time values as either constants or payload keys, aligning with the PR objective to support datetime expressions that parse into microseconds. This implementation will enable the functionality needed for timestamp-based operations.


5210-5238: Tag numbers correctly incremented for all existing variants.

All the tag numbers for existing variants have been properly incremented to accommodate the new DateTime variant at position 5. This maintains the sequential ordering required by protobuf while preserving the relative ordering of existing variants.

lib/collection/src/operations/universal_query/shard_query.rs (1)

398-398: Added DateTime expression support correctly

The new DateTime variant handling is properly implemented in the TryFrom conversion from gRPC Expression to ExpressionInternal. This matches the PR's objective of supporting datetime expressions.

lib/api/src/rest/schema.rs (1)

643-644: Added DateTime variant to REST API schema

The DateTime variant has been added to the Expression enum, allowing for string-based datetime representations in REST API expressions. The GeoDistance variant is now properly defined in the enum, maintaining API consistency.

lib/api/src/grpc/conversions.rs (5)

8-8: Added ScoreType import for precision improvement

This import supports the change from f32 to f64 precision for handling timestamps in microseconds, as specified in the PR objectives.


17-17: Added DateTimeExpression to imports

Correctly imported the DateTimeExpression enum from parsed_formula to support datetime conversion functions.


2795-2795: Updated constant casting to use ScoreType

The constant is now cast to ScoreType instead of directly using f32, supporting the precision improvement mentioned in the PR objectives.


2803-2810: Implemented datetime expression unparsing

The unparse_expression function has been extended to handle both:

  1. GeoDistance expressions with origin and payload key
  2. DateTime expressions supporting both constants and payload variables

This implementation properly handles both formats required by the PR.


2833-2833: Updated by_zero_default conversion to f32

The by_zero_default value is cast to f32 when used in the Expression variant. This aligns with the internal type changes while maintaining API compatibility.

lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (7)

7-7: No functional issues with new import.


12-12: New type alias for improved precision.
This aligns well with the transition from f32 to f64.


31-32: Constant variant updated to f64.
No issues found.


34-38: New GeoDistance & DateTime variants added.
Implementation looks correct.


40-40: Info comment only.
No changes required.


46-46: Using PreciseScore for zero division fallback.
Consistent with the new f64 approach.


80-84: Introduced DateTimeExpression enum.
Design is concise and easy to understand.

lib/collection/src/operations/universal_query/formula.rs (7)

8-8: Importing references to new expression variants.
Looks good.


27-31: New GeoDistance and DateTime variants in ExpressionInternal.
Ensure consistent usage with the rest of the system.


58-58: Casting from f32 to f64 for constant.
Implementation is straightforward.


71-74: Handling GeoDistance in parse_and_convert.
Inserting to in payload_vars is correct for variable usage.


75-75: Handling DateTime by parsing the string.
Consider adding tests to ensure valid/invalid date-time strings are handled properly.


98-98: Converting by_zero_default from f32 to f64.
No issues found.


170-173: Mapping REST GeoDistance and DateTime to ExpressionInternal.
The code is consistent with the new variants.

lib/segment/src/types.rs (4)

73-73: Safely creating DateTime from microseconds.
This approach gracefully handles invalid timestamps by returning None.


93-124: Rich date/time parsing logic.
Supports multiple formats. Implementation looks good.


126-130: Display trait for DateTimePayloadType.
Straightforward and correct.


132-135: Conversion from DateTime<Utc> to DateTimePayloadType.
Implementation is correct with minimal overhead.

lib/api/src/grpc/proto/points.proto (1)

586-596: Confirm protobuf field numbering and backward compatibility.

Re-numbering (or shifting) existing fields and introducing a new field (date_time = 5) may cause wire incompatibilities if older clients or services still rely on the previous field numbering. If this file has been published previously, please verify that no downstream consumers are broken by these changes. Otherwise, if the fields are genuinely new or safely migrated, the re-numbering is fine.

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (17)

10-12: Imports aligned with new expression and scoring logic.

The updated imports for DateTimeExpression, ParsedExpression, ParsedFormula, PreciseScore, and VariableId look consistent with your new extended expression handling approach.


19-19: New import for DateTimePayloadType.

Importing DateTimePayloadType here is appropriate for handling date-time parsing logic.


21-21: Switch to PreciseScore for default value.

Defining DEFAULT_SCORE as a PreciseScore (f64) helps maintain precision. This is a coherent choice, given your expanded date-time operations.


41-41: FriendlyName implementation is clear.

Defining the friendly name as "number" for PreciseScore keeps consistency with its numeric use in expressions.


53-57: FriendlyName for DateTimePayloadType.

Exposing "datetime" as the friendly name helps produce meaningful error messages for date-time fields. This is a good addition.


100-100: Potential precision loss when converting to ScoreType.

Casting PreciseScore (f64) back to ScoreType (likely f32) may lose precision. Confirm that this down-cast is intentional and acceptable for your scoring use cases.


108-108: Method returns higher-precision score.

Changing the return type of eval_expression to OperationResult<PreciseScore> matches your decision to use double precision for expression evaluations.


116-116: Converting from f32 to f64.

This .map(|score| PreciseScore::from(*score)) is a straightforward approach for loading prefetch scores into a more precise representation. Looks consistent with the rest of the logic.


126-126: Check integer vs. float values in JSON.

Using .as_f64() works properly for numbers, but ensure that integer JSON values are correctly interpreted as floats. If you anticipate integer-only fields, this remains safe, but it’s worth verifying no corner cases exist.


142-143: Geo-distance calculation looks straightforward.

Calling Haversine::distance on lat/long pairs provides a clean solution for computing distance in meters. Keep an eye on any out-of-range or malformed geo points handled upstream.


144-157: New match arm for DateTimeExpression.

Parsing the date-time payload or using a constant expression is implemented robustly. The match structure is clear and aligns well with your DateTimeExpression enum approach.


158-162: Potential precision limit for very large timestamps.

Converting an i64 timestamp to an f64 is acceptable for typical ranges (~±285 years of microseconds) but be aware that extreme future or past timestamps may lose microsecond-level precision. Confirm this is acceptable for your use case.


448-448: Testing constant date-time expression.

Using "2025-03-18" is a valid scenario to verify constant date-time support. The approach ensures correctness of date-time parsing in expressions.


449-457: Test coverage for missing date-time payload variable.

These lines confirm that a non-existent payload variable triggers a VariableTypeError. Clear negative test coverage is always beneficial.


458-458: Comment clarifies fallback behavior.

This comment concisely states the test objective for handling default date-time values, aiding maintainability.


459-462: Verifies default date-time usage.

This case confirms correct functionality when the payload variable isn’t set but a default date-time exists. Good addition for reliability.


466-466: Test assertion with OperationResult<PreciseScore>.

Using PreciseScore in test assertions ensures alignment with your new floating-point evaluation logic.

@coszio coszio force-pushed the score-boosting-with-datetime branch from e7cb1c0 to 82b104a Compare March 18, 2025 19:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
docs/redoc/master/openapi.json (1)

14532-14542: Datetime expression implementation could use more documentation.

While the Datetime schema is correctly structured as a wrapper object with a required datetime property, the documentation could be improved to clearly indicate:

  1. The expected format of the string value
  2. Whether it accepts a payload key or a constant datetime value or both
  3. How the datetime string will be converted to microseconds (f64)

According to the PR objectives, this expression should parse strings into floating-point numbers representing microseconds, but this important detail isn't documented in the schema.

Consider enhancing the property description like this:

- "datetime": {
-   "type": "string"
- }
+ "datetime": {
+   "description": "Payload field key or constant datetime string to parse into microseconds (f64). Supports ISO 8601 format.",
+   "type": "string"
+ }
docs/grpc/docs.md (1)

2699-2699: New datetime Field Added in Expression

The new datetime field has been added to the Expression definition with the type [string](#string) and the description "Date-time constant or payload key". This clearly signals the support for datetime expressions in the API. Please ensure that the description succinctly communicates that the field can accept either a date-time constant or be used as a payload key, and that related documentation (in other parts of the system) reflects the updated evaluation precision from f32 to f64.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82b104a and 3acfcf8.

📒 Files selected for processing (8)
  • docs/grpc/docs.md (1 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/api/src/grpc/proto/points.proto
  • lib/api/src/rest/schema.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (17)
lib/api/src/grpc/qdrant.rs (3)

5182-5188: Tagged range correctly updated to accommodate new variant

The tag range for the expression::Variant oneof has been properly extended to include the new Datetime variant (tag 15). This ensures the protobuf definition remains consistent with all the available variants.


5206-5208: New Datetime variant looks good

The addition of the Datetime variant with a string type is appropriate for representing both datetime constants and payload keys, aligning with the PR objective of supporting datetime expressions for score boosting.


5210-5238: Tag increments correctly applied

All subsequent variant tags have been properly incremented to accommodate the new Datetime variant at tag 5. This maintains the required sequential ordering of tags in the protobuf definition.

lib/collection/src/operations/universal_query/shard_query.rs (1)

398-398: LGTM: The datetime variant handler is correctly implemented.

The new case for handling Variant::Datetime(dt_str) in the TryFrom<grpc::Expression> implementation properly maps to ExpressionInternal::DateTime(dt_str), which aligns with the PR objective to support datetime expressions for parsing strings into f64 microseconds.

docs/redoc/master/openapi.json (2)

14467-14472: New expression types added to the Expression schema.

The changes add two new expression types to the existing Expression schema:

  1. GeoDistance - For computing geospatial distances between points
  2. Datetime - For handling datetime values, converting them to floating-point microseconds as mentioned in the PR objectives

These additions enhance the query capabilities of the Qdrant API by allowing timestamp-based operations and geo-distance calculations in expressions.


14505-14531: GeoDistance expression implementation looks well-structured.

The schema correctly defines the geo_distance operation with required origin and destination parameters. The implementation follows the pattern used by other expression types in the API and properly references the existing GeoPoint schema.

lib/api/src/grpc/conversions.rs (5)

8-8: Importing ScoreType
No concerns; import usage appears consistent.


17-17: New references to DateTimeExpression
Looks good; ensures the date-time variant is recognized.


2795-2795: Potential float cast overflow
Casting to ScoreType might risk overflow or precision loss if c is out of range. Consider verifying valid input ranges.


2803-2810: New unparse variants for GeoDistance & DateTime
Implementation looks correct; ensures new enum variants are handled.


2833-2833: Review by_zero_default cast
by_zero_default.map(|v| v as f32) may cause truncation. Verify alignment with new f64-based approach.

lib/collection/src/operations/universal_query/formula.rs (6)

8-8: Additional import of PreciseScore
No issues; aligns with high-precision score usage.


27-31: New ExpressionInternal variants
GeoDistance and DateTime expansions appear logically structured.


58-58: Check for potential float conversion
PreciseScore::from(c) might lose precision for large floats. Please verify acceptable input ranges.


71-75: Handling of parsed GeoDistance & DateTime
Implementation looks fine; confirm date-time format coverage.


98-98: Division default cast
Consider potential data loss with by_zero_default.map(PreciseScore::from).


170-175: REST Expression expansions
Neat integration for GeoDistance and DateTime.

@coszio
Copy link
Copy Markdown
Contributor Author

coszio commented Mar 18, 2025

Rebased on top of dev, and included fixes for PreciseScore in decay functions

@coszio coszio force-pushed the score-boosting-with-datetime branch from 3acfcf8 to a6592aa Compare March 18, 2025 20:33
@coszio coszio added this to the score boosting milestone Mar 18, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
docs/grpc/docs.md (1)

2718-2718: Clarify the Expected Format for the New "datetime" Field

The new "datetime" field is correctly added as a string to support date-time expressions. To further improve clarity for users, consider specifying the expected datetime format (for example, ISO 8601) or linking to additional documentation that details the acceptable formats and how these strings will be parsed into microseconds. This additional detail will help ensure consistency and proper usage of the field.

lib/segment/src/types.rs (2)

73-73: Handle out-of-range timestamps more gracefully
Currently, the code relies on chrono::DateTime::from_timestamp_micros(ts) returning None for invalid timestamps. Consider logging or surfacing an error if ts falls outside the acceptable range to aid debugging.


96-107: Multiple parsing attempts for timezones
Chaining several parse attempts is flexible. Consider consolidating or documenting the expected timezone formats for maintainability.

lib/collection/src/operations/universal_query/formula.rs (2)

27-31: New expression variants
Introducing GeoDistance and DateTime extends the formula's flexibility. Ensure associated parsing and validation are thoroughly tested.


82-82: Delegated parsing for DateTime
Relying on dt_str.parse() ensures consistent format handling without duplicating logic.

Do you need help adding tests for erroneous date/time inputs?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3acfcf8 and a6592aa.

📒 Files selected for processing (13)
  • docs/grpc/docs.md (1 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
  • lib/common/common/src/math.rs (1 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (10 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (11 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (2 hunks)
  • lib/segment/src/types.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • lib/collection/src/operations/universal_query/shard_query.rs
  • lib/api/src/rest/schema.rs
  • lib/api/src/grpc/proto/points.proto
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1) (1)
  • score (269:270)
lib/segment/src/types.rs (1) (1)
  • timestamp (68:70)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (44)
lib/api/src/grpc/qdrant.rs (3)

5185-5186: Tag range update accommodates the new datetime variant.

The oneof tag range has been updated to include tag 5 for the new datetime variant and shifts all subsequent tags accordingly. This is the correct approach for extending a Protocol Buffer definition.


5206-5208: New datetime variant adds time-based expression support.

This addition aligns with the PR objective to support datetime expressions. The string type will allow for both payload keys and constants, which can then be parsed into microsecond timestamps.


5210-5247: Tag numbers correctly incremented for all existing variants.

All subsequent variant tags have been properly incremented to accommodate the new datetime variant at tag 5. This maintains the correct Protocol Buffer structure while adding new functionality.

docs/redoc/master/openapi.json (3)

14467-14540: Well-implemented GeoDistance expression for geospatial queries.

The new GeoDistance expression type allows for calculating the geodesic distance between a fixed origin point and a geo-point stored in a payload field. This implementation enhances the API's spatial query capabilities.

The structure follows the established pattern of other expressions in the schema, with proper references to existing types like GeoPoint.


14541-14551: Good addition of Datetime expression type for timestamp handling.

This new expression type matches the PR objective to support {"datetime": payload_key | constant} expressions. It will enable parsing strings into floating-point numbers representing microseconds (f64).

The implementation is simple and consistent with the rest of the expression schema structure. This change is important for the improved precision needed when handling timestamps in decay functions, as mentioned in the PR objectives.


14455-14473: Proper integration of new expression types in the Expression schema.

The Expression schema has been appropriately updated to include the new GeoDistance and Datetime types among the existing expression options. This ensures they can be used wherever expressions are supported throughout the API.

lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (4)

129-138: Great Implementation of DatetimeIndex Support

This implementation properly handles datetime retrieval from indexes, maintaining consistency with other index types. The code correctly retrieves the datetime value, converts it from a timestamp to a proper DateTimePayloadType, and then serializes it to a JSON value.


324-337: Comprehensive Test Setup for Datetime Index

Good test setup that covers important datetime formats - fully specified ISO format with timezone ("2023-01-01T00:00:00Z") and date-only format ("2023-01-02"). This demonstrates the system's ability to handle different datetime string representations.


341-341: LGTM - Proper Index Registration

The datetime index is correctly added to the test indices map with an appropriate key name.


379-394: Well-Structured Test for Datetime Retrieval

The test properly verifies datetime retrieval functionality with different test cases:

  1. A point with a single datetime value
  2. A point with multiple datetime values (verifying it returns the first one)
  3. A point with no datetime value

This covers the main scenarios for datetime retrieval.

lib/common/common/src/math.rs (2)

20-23: Appropriate type change for improved precision.

The change from f32 to f64 in the is_close function's parameters and constants aligns well with the PR's objective to improve precision for timestamp handling.


26-26: Consistent parameter type updates.

The is_close_tol function parameters have been correctly updated to match the precision improvements in is_close.

lib/api/src/grpc/conversions.rs (6)

8-8: Appropriate import for precise score handling.

The addition of ScoreType import from common::types supports the PR's objective to transition from f32 to f64 for expression evaluation.


17-17: Necessary import for new datetime functionality.

The inclusion of DateTimeExpression in the imports prepares for the new datetime expression parsing capability being added.


2797-2797: Proper type casting for constants.

The constant value is now cast to ScoreType which ensures consistent precision across expression evaluation.


2805-2808: Well-structured implementation of GeoDistance expressions.

The GeoDistance variant is properly implemented with consistent type handling for the origin point and destination key.


2809-2812: Support for both constant and payload datetime expressions.

The implementation correctly handles both constant datetime values and payload variable references, fulfilling the PR's objective to support datetime expressions.

Consider adding documentation comments to explain the format expected for datetime constants and how they're parsed.


2835-2835:

Details

❓ Verification inconclusive

Type conversion maintains compatibility with existing code.

The explicit cast of by_zero_default to f32 aligns with the PR's approach of performing calculations with f64 but converting to f32 after evaluation is complete.

Verify that this explicit f32 conversion doesn't introduce precision issues when dividing very large microsecond timestamps:


🏁 Script executed:

#!/bin/bash
# Check if there are any timestamp-related tests that would validate the division behavior

rg -A 5 -B 5 "test.*div" lib/segment/src/index/query_optimization/rescore_formula/
rg -A 5 -B 5 "timestamp.*div|div.*timestamp" lib/segment/src/index/query_optimization/rescore_formula/

Length of output: 189


Attention: Further Verification Required for Precision in Timestamp Division

The explicit cast (as f32) in

by_zero_default: by_zero_default.map(|v| v as f32),

maintains compatibility with existing code by mirroring our approach—calculations are done in f64, with the final result converted to f32. However, the initial search for tests related to division behavior with very large microsecond timestamps (in the query optimization module) returned no results. This absence means that we currently lack automated verification of potential precision issues.

Action Items:

  • Manual Verification: Please confirm manually that the f32 conversion does not cause precision loss when handling very large microsecond timestamps.
  • Consider Adding Tests: If feasible, add unit tests or extend current tests to explicitly validate the division behavior in scenarios involving large timestamp values.
lib/segment/src/types.rs (6)

77-77: No immediate concerns
Deserialization logic is consistent and straightforward.


93-93: Implementation of FromStr
This establishes a clear interface for creating DateTimePayloadType from strings.


94-95: Appropriate error type
Using chrono::ParseError aligns with the underlying parsing library.


126-130: Simple Display implementation
Delegating the format to the inner chrono::DateTime is concise and effective.


132-135: Direct From<chrono::DateTime<chrono::Utc>>
This provides a clean and type-safe conversion path for date-time values.


108-124:

Details

✅ Verification successful

Fallback for naive date/time strings
These additional parse attempts correctly handle formats without explicit time zones. Confirm that treating them as UTC meets domain requirements.


🏁 Script executed:

#!/bin/bash
# Optional: check if there are references where naive date-time
# strings might need special handling or disclaimers in docs.
rg 'from_str\s*\(.*\)' -A 5

Length of output: 64088


Fallback Parsing Confirmed: Naive Date/Time Strings as UTC

The fallback chain in lib/segment/src/types.rs properly attempts to parse inputs using various formats before converting a NaiveDateTime to a DateTime<Utc> via .and_utc().into(). The approach is consistent with similar patterns observed in the repository and aligns with the documented intent (see #3529). Please verify that interpreting these naive date/time strings as UTC fully meets the domain requirements. If any adjustments (such as explicit time zone handling) are needed, please address them accordingly.

  • Parsing formats:
    • %Y-%m-%dT%H:%M:%S%.f
    • %Y-%m-%d %H:%M:%S%.f
    • %Y-%m-%d %H:%M
    • %Y-%m-%d (converted via Into::into)
  • Conversion to UTC is achieved with .and_utc().into(), consistently following repository conventions.
lib/collection/src/operations/universal_query/formula.rs (5)

8-8: Added PreciseScore import
Importing PreciseScore is consistent with the move toward higher-precision calculations.


65-65: Use of PreciseScore::from(c)
Promoting constants to higher-precision scores reduces rounding issues in subsequent calculations.


105-105: High-precision default for divide-by-zero
Wrapping by_zero_default with PreciseScore::from maintains uniform precision for safe fallback behavior.


200-205: Extended REST mapping
Mapping GeoDistance and Datetime from REST to internal expressions aligns with the new capabilities.


78-81:

Details

❓ Verification inconclusive

GeoDistance payload handling
Inserting the to path into payload_vars is correct. Validate behavior if multiple expressions share the same path.


🏁 Script executed:

#!/bin/bash
# Searching for repeated usage of the same 'to' attribute in code to ensure no collisions:
rg 'GeoDistance\s*\{.*to:\s*'

Length of output: 31


Action: Confirm and Manually Verify GeoDistance 'to' Field Handling

The GeoDistance branch appears to correctly insert the to attribute into payload_vars as intended. However, the initial regex search for repeated usage of the to path did not produce output—likely due to multiline formatting or limitations in the pattern used. Given that the absence of output doesn’t conclusively rule out potential collisions when multiple expressions share the same path, please perform a manual review or run an alternative multiline search for usages of the GeoDistance block across the codebase.

  • File: lib/collection/src/operations/universal_query/formula.rs (lines 78-81)
  • Key Check: Verify that if multiple expressions use the same to path, no unintended collisions or side effects occur.
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (7)

16-16: Good addition of PreciseScore type alias.

This type alias makes the code more maintainable and enhances the precision needed for datetime operations.


38-42: The GeoDistance and DateTime variants are properly structured.

The GeoDistance variant has been appropriately restructured and the new DateTime variant has been added correctly to handle datetime expressions.


103-107: Good implementation of DateTimeExpression enum.

The enum appropriately handles both constant datetime values and payload variable paths.


109-120: The FromStr implementation follows good pattern matching practices.

The implementation correctly attempts to parse the string as a datetime first, then falls back to treating it as a payload key, which aligns with the noted behavior.


182-185: Ensure precision is maintained when converting between numeric types.

The conversion from f32 to PreciseScore (f64) is appropriate, but be aware of potential rounding issues when working with floating-point values.


211-224: Type consistency in decay parameters conversion.

The decay_lambda_to_params method correctly converts PreciseScore (f64) to ScoreType (f32) when necessary. This is appropriate for maintaining API compatibility while ensuring internal calculations use higher precision.


363-364: Test cases updated with appropriate precision.

The test ranges have been updated to use f64 values, which is consistent with the PreciseScore type changes.

Also applies to: 370-371, 377-378

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (8)

10-12: Imports correctly updated to include new types.

The imports now include DateTimeExpression and PreciseScore, which is necessary for the new functionality.


21-22: Default constants updated to use PreciseScore.

The DEFAULT_SCORE and DEFAULT_DECAY_TARGET constants now use PreciseScore (f64) instead of ScoreType (f32), maintaining type consistency.


42-46: FriendlyName implementations are appropriate.

The implementation for PreciseScore and the new implementation for DateTimePayloadType are correctly defined.

Also applies to: 54-58


99-102: Score method appropriately handles type conversion.

The method now returns a PreciseScore (f64) cast to ScoreType (f32). This conversion is necessary for backward compatibility but may result in some precision loss, which is acceptable since it happens only after all calculations are complete.


117-118: Score retrieval correctly converts to PreciseScore.

The conversion from ScoreType to PreciseScore using PreciseScore::from(*score) ensures that scores are processed with higher precision.


145-163: DateTime evaluation implementation is thorough and well-documented.

The implementation correctly handles both constant and payload variable cases, with appropriate error handling. The comments about precision limitations when converting from i64 to f64 are helpful and accurate.


332-345: Decay functions updated to use PreciseScore.

All decay functions now use PreciseScore consistently, which is important for maintaining precision throughout all calculations.


485-502: Good test coverage for new datetime functionality.

The tests appropriately cover both constant datetime expressions and payload variable cases, including error handling scenarios.

Comment thread lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs Outdated
@coszio coszio force-pushed the score-boosting-with-datetime branch from 969b153 to 43a8235 Compare March 19, 2025 16:10
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
docs/redoc/master/openapi.json (1)

14541-14551: Datetime expression schema could benefit from additional documentation.

The Datetime expression implementation allows for working with datetime values as specified in the PR objectives, but the schema is quite minimal.

Consider enhancing the schema documentation to clarify:

  • The expected string format for datetime values
  • How the datetime is converted to microseconds (f64)
  • Supported constant formats vs. payload key references
"Datetime": {
  "type": "object",
  "required": [
    "datetime"
  ],
  "properties": {
    "datetime": {
      "type": "string",
+     "description": "Payload field containing a datetime string or a constant datetime string. Will be parsed into microseconds (f64) for use in expressions. Supports ISO 8601 and other common datetime formats."
    }
  }
}
docs/grpc/docs.md (1)

2718-2719: Clarify Date-Time Field Semantics

The new datetime field has been added to the Expression message to support datetime expressions. Currently, its description is "Date-time constant or payload key." Since the PR introduces functionality to parse such string inputs into a floating-point microsecond value (f64) for evaluation purposes, it would be beneficial to update the description to clarify the expected format (for example, whether an RFC3339 timestamp is required or if other formats are accepted) and that the string will be converted into a microsecond precision f64 value internally. This added detail can help users understand precisely how to supply and utilize datetime expressions in their queries.

lib/api/src/grpc/conversions.rs (1)

2835-2835: Possible precision loss in by_zero_default cast

Casting by_zero_default to f32 may cause precision loss if ScoreType is f64 (which seems likely given the PR objectives about improving precision for microsecond timestamps).

Consider modifying this to avoid potential precision loss:

-            by_zero_default: by_zero_default.map(|v| v as f32),
+            by_zero_default: by_zero_default,

Or ensure the consumer of this value can handle the full precision.

lib/api/src/rest/schema.rs (1)

660-676: New structs for GeoDistance and Datetime support

These structs provide the necessary data structures for the new expression types:

  1. GeoDistance - measures distance between an origin point and a field value
  2. Datetime - parses string representations into microsecond values (as f64)

The implementation is clean and consistent with the existing codebase style.

However, I recommend adding documentation comments for these new structs to make their purpose and usage clear, especially since they're part of the public API.

+/// Represents a geographic distance calculation in an expression
 #[derive(Debug, Serialize, Deserialize, JsonSchema)]
 pub struct GeoDistance {
+    /// Parameters for geo distance calculation
     pub geo_distance: GeoDistanceParams,
 }

+/// Parameters for calculating geographic distance
 #[derive(Debug, Serialize, Deserialize, JsonSchema)]
 pub struct GeoDistanceParams {
     /// The origin geo point to measure from
     pub origin: GeoPoint,
     /// Payload field with the destination geo point
     pub to: JsonPath,
 }

+/// Represents a datetime value or reference in an expression
+/// Can be a constant (ISO8601 string) or a payload field reference
 #[derive(Debug, Serialize, Deserialize, JsonSchema)]
 pub struct Datetime {
+    /// String representation of a datetime or a payload field path
     pub datetime: String,
 }
lib/segment/src/types.rs (3)

73-73: Consider a more explicit error type for out-of-range timestamps.

Currently, this function returns None when the ts value is out of range. Switching to a Result<Self, chrono::ParseError> (or a custom error type) would allow clearer error signaling if you need to distinguish different failure modes.


77-91: Improve error messaging for multiple accepted formats.

The error message only references RFC 3339, despite the implementation also parsing multiple date/time formats. Clarifying that other formats are supported can reduce user confusion.


93-124: Refactor repetitive parsing attempts for readability.

The code cleanly attempts various parse strategies. For maintainability, consider storing the valid formats in a static list or providing a consolidated error message enumerating all tried formats if parsing fails.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 969b153 and 43a8235.

📒 Files selected for processing (13)
  • docs/grpc/docs.md (1 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
  • lib/common/common/src/math.rs (1 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (11 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (11 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (2 hunks)
  • lib/segment/src/types.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/collection/src/operations/universal_query/shard_query.rs
  • lib/common/common/src/math.rs
🧰 Additional context used
🧬 Code Definitions (3)
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (3)
lib/segment/src/index/field_index/numeric_index/mod.rs (3) (3)
  • builder (456-461)
  • builder_mmap (475-485)
  • new (442-447)
lib/segment/src/json_path/mod.rs (2) (2)
  • json (544-546)
  • new (42-44)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1) (1)
  • value (150-154)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)
lib/segment/src/types.rs (2) (2)
  • from_str (96-123)
  • from_str (184-194)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (2) (2)
  • score (269-270)
  • new_geo_distance (139-141)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consistency
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test
🔇 Additional comments (50)
lib/api/src/grpc/qdrant.rs (3)

5185-5185: Expanded tag range for new datetime expression support.

The tag range has been extended to accommodate the new Datetime variant in the expression enum. This change is consistent with the PR objective to support datetime expressions.


5206-5208: New datetime expression variant added correctly.

The new Datetime variant has been properly added with appropriate documentation indicating it can handle both datetime constants and payload keys. This implementation enables the datetime parsing functionality described in the PR objectives.


5210-5247: Tag numbering properly adjusted for all expression variants.

All subsequent expression variants have had their tag numbers incremented by 1 to accommodate the new Datetime variant. This is the correct approach for maintaining backward compatibility when evolving Protocol Buffer schemas.

docs/redoc/master/openapi.json (2)

14467-14472: New expression types added to enhance query capabilities.

The changes introduce two new expression types to the API schema:

  1. GeoDistance - For geo-spatial distance calculations
  2. Datetime - For datetime value handling and timestamp operations

These additions align with the PR objective to support datetime expressions that parse strings into floating-point microseconds (f64).


14514-14540: LGTM: Well-structured GeoDistance expression implementation.

The GeoDistance expression is well-defined with a clean structure that:

  • Takes an origin point (GeoPoint with lat/lon coordinates)
  • References a destination field in the payload
  • Provides a clear way to calculate geographic distances in expressions

This implementation enables powerful spatial queries within the expression system.

lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (3)

347-360: Comprehensive datetime index test implementation.

The test implementation follows the established pattern for creating field indices and properly tests both fully-qualified RFC 3339 timestamps and date-only strings.


364-364: Good addition of datetime index to test fixture.

The datetime index is correctly added to the indices map under the "creation" key, maintaining consistency with other test indices.


402-420: Date normalization behavior should be documented.

The test correctly verifies that date-only strings like "2023-01-02" are normalized to full RFC 3339 format "2023-01-02T00:00:00Z" during retrieval. However, this important behavior isn't documented in the code.

Consider adding a comment before line 414 that explains this normalization is expected behavior:

            match id {
                0 => assert_eq!(value, [json!("2023-01-01T00:00:00Z")].into()),
+               // Note: Date-only string "2023-01-02" is normalized to "2023-01-02T00:00:00Z"
                1 => assert_eq!(
                    value,
                    [json!("2023-01-02T00:00:00Z"), json!("2023-01-03T00:00:00Z")].into()
                ),
lib/api/src/grpc/conversions.rs (4)

8-8: Important new dependency added

The addition of ScoreType is crucial for the datetime support feature - this is part of the shift from f32 to f64 for expressions to maintain precision when handling timestamp microseconds, as mentioned in the PR objectives.


17-17: Adding DateTimeExpression to imports

This import addition correctly enables the new datetime functionality being introduced in the PR.


2797-2797: Type conversion for expression constants

Converting constants to ScoreType ensures correct handling of precision for datetime values. This is aligned with the PR's objective to change the expression evaluation type from f32 to f64 for timestamp microsecond precision.


2805-2812: Added new expression variants for GeoDistance and DateTime

The implementation correctly adds support for:

  1. GeoDistance with a specified origin point and payload key
  2. DateTime expressions that can be either constants or payload variables

This matches the PR objective of supporting datetime expressions for turning strings into floating-point microsecond values.

lib/api/src/rest/schema.rs (1)

644-645: Added new Expression enum variants

The addition of GeoDistance and Datetime variants properly extends the expression functionality to support the new features described in the PR objectives.

lib/segment/src/types.rs (2)

126-130: Straightforward Display implementation.

Logging the inner chrono::DateTime via self.0 is concise and correct.


132-132: Clear conversion from chrono::DateTime.

This seamlessly wraps DateTimeWrapper(dt) and matches usage patterns for Time/Date conversions.

lib/collection/src/operations/universal_query/formula.rs (8)

8-8: Good addition of PreciseScore in imports.

This aligns with the shift to higher-precision score handling throughout the code.


27-31: Well-defined new variants for geo-distance and date-time expressions.

Introducing GeoDistance { origin, to } and DateTime(String) is consistent with the new feature. Should future expansions arise, grouping them in a dedicated enum block might improve clarity, but this is clean for now.


65-65: Adopting PreciseScore for constants.

Switching from f32 directly to PreciseScore::from(c) ensures consistent internal representation for precise computations.


78-81: Correct handling of GeoDistance variant.

Inserting to into payload_vars and creating a ParsedExpression via new_geo_distance is correctly implemented and consistent with the rest of the match arm.


82-82: DateTime string parsing logic.

Using dt_str.parse()? can raise a parse error at runtime if the string is unrecognized, but it's aligned with the rest of the code’s error handling strategy.


105-105: Maintaining precision in division defaults.

Converting the optional by_zero_default to PreciseScore keeps numeric consistency across different formula variants.


200-202: Consistent mapping of REST GeoDistance to internal variant.

The transition is intuitive, carrying the origin and to parameters into ExpressionInternal::GeoDistance.


203-205: New DateTime expression parsing.

Straightforward mapping from the REST Datetime struct to ExpressionInternal::DateTime(datetime).

lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (11)

9-9: Good addition of DateTimePayloadType import.

The import supports the new DateTime expression functionality while keeping imports focused and minimal.


16-16: Excellent type definition for improved precision.

Defining PreciseScore as f64 is a good approach to maintain precision when dealing with timestamp microseconds, while keeping the code maintainable through a semantic type name. This will help prevent precision loss when handling date calculations.


38-42: Well-structured new expression variants.

The new GeoDistance and DateTime variants are correctly implemented in the ParsedExpression enum. The structure with appropriate fields for each variant maintains consistency with the existing pattern in the codebase.


50-50: Appropriate type update for division by zero default.

Updating the by_zero_default field to use PreciseScore ensures type consistency throughout the formula evaluation chain.


69-69: Correct type for decay lambda.

The update to use PreciseScore for the decay lambda parameter will ensure proper precision in decay functions, which is critical for maintaining accuracy in distance and score calculations.


103-120: Well-designed DateTimeExpression implementation.

The new DateTimeExpression enum with its FromStr implementation provides a clean way to handle both datetime constants and payload variables. The implementation correctly tries to parse as a datetime first, then falls back to interpreting as a payload key, which aligns with the comment in the previous review.


126-126: Type consistency in parameter update.

The parameter type update in new_div maintains API consistency with the new precision requirements.


182-184: Precise conversion in decay parameters.

The explicit conversion from f32 to PreciseScore using PreciseScore::from ensures no precision is lost during the type conversion.


211-212: Good return type for decay lambda parameters.

The function correctly returns (ScoreType, ScoreType) which maintains compatibility with callers while allowing for the internal precision improvements.


224-224: Appropriate conversion back to ScoreType.

The conversions back to ScoreType are handled correctly at the edges of the API. This pattern of using higher precision internally but converting at the boundaries is a good practice.

Also applies to: 239-242, 256-256


363-364: Test ranges updated for PreciseScore.

The proptest range values have been properly updated to use f64 instead of f32, ensuring the tests properly validate the new precision requirements.

Also applies to: 370-371, 377-378

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (14)

10-12: Proper imports for new functionality.

The imports have been correctly updated to include DateTimeExpression and PreciseScore which are essential for the new datetime expression support.


19-19: Added DateTimePayloadType import.

The import supports the new datetime expression functionality in the formula scorer.


21-22: Updated constants to use PreciseScore.

The constants have been updated to use the more precise PreciseScore type, maintaining consistency with the type changes throughout the system.


42-46: Updated FriendlyName implementation for PreciseScore.

The FriendlyName trait implementation has been correctly updated to use PreciseScore instead of the previous type.


54-58: New FriendlyName implementation for DateTimePayloadType.

Adding the FriendlyName implementation for DateTimePayloadType ensures proper error messages when handling datetime values.


99-101: Appropriate conversion in score method.

The score method now correctly converts from PreciseScore to ScoreType when returning. This maintains backward compatibility while allowing for improved precision in the internal calculations.


109-109: Updated return type for eval_expression.

The return type of eval_expression has been updated to use PreciseScore, which is necessary for supporting higher precision in score calculations.


117-117: Good conversion from ScoreType to PreciseScore.

Converting the score to PreciseScore using PreciseScore::from ensures no precision is lost during the conversion.


143-144: Correct handling of GeoDistance calculation.

The code now properly handles the return type of the Haversine distance calculation, maintaining precision through the computation chain.


145-163: Well-implemented DateTime expression handling.

The implementation for the new DateTime variant handles both constant and payload variable datetime expressions. The conversion of timestamp to PreciseScore is done correctly, with helpful comments explaining the precision considerations for microsecond timestamps.


339-352: Updated decay functions for PreciseScore.

The decay functions have been properly updated to use PreciseScore throughout their calculations, ensuring consistency with the new precision requirements.


498-515: Comprehensive test cases for datetime expressions.

The test cases cover both constant and payload variable datetime expressions, including error handling scenarios. This ensures robustness of the new functionality.


466-466: Updated test parameter and result types.

The test types have been updated to correctly use PreciseScore instead of f32, ensuring that the tests properly validate the new precision requirements.

Also applies to: 519-520


532-534: Added datetime test default.

The default value for datetime payload variable is appropriately set up in the test fixtures, providing good test coverage.

lib/api/src/grpc/proto/points.proto (2)

587-599: Field indices correctly renumbered.

All the subsequent field indices have been properly incremented to accommodate the new datetime field, maintaining the protobuf file structure. This approach ensures that the field numbering remains consistent.


586-586:

Details

❓ Verification inconclusive

New datetime field added to Expression message.

The new datetime field at index 5 correctly enables representing datetime constants or payload keys in expressions.

Let's verify that this protocol buffer change is backward compatible:


🏁 Script executed:

#!/bin/bash
# Check if the protobuf change preserves backward compatibility

# Look for any reference to Expression message field numbers to ensure 
# they are handled properly in the code
rg -A 5 -B 5 "field.*=.*5" --glob "*.rs" | grep -i express

Length of output: 60


Protocol Buffer datetime Field: Manual Verification Recommended

The new datetime field at index 5 appears to be a backward-compatible addition within the Expression message. Our initial automated search (using the provided shell script) did not reveal any references to field number 5 in the Rust source files, indicating that no conflicting usages were detected. However, in situations where the automatic search produced little or no output, it’s important to manually confirm that existing client code handling the Expression message will correctly ignore the new field when not present.

  • Action Items:
    • Manually verify that any client code using the Expression message gracefully handles the addition of the optional datetime field.
    • Ensure that the field number (5) is indeed unique and not referenced elsewhere that might impact compatibility.

@coszio coszio marked this pull request as draft March 20, 2025 14:19
@coszio coszio requested review from generall and timvisee March 20, 2025 16:53
@coszio coszio marked this pull request as ready for review March 20, 2025 16:54
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/redoc/master/openapi.json (1)

14541-14551: Consider enhancing datetime expression documentation

The DatetimeExpression implementation aligns with the PR objective to support datetime parsing into microseconds (f64), but could benefit from additional documentation.

Consider adding documentation comments that clarify:

  • The expected format of datetime strings (ISO 8601, etc.)
  • Whether this accepts both constants and payload keys
  • How the parsing into microseconds works
  • Examples of valid inputs
  "DatetimeExpression": {
    "type": "object",
    "required": [
      "datetime"
    ],
    "properties": {
      "datetime": {
+       "description": "A payload key containing a datetime string or a datetime constant in ISO 8601 format. Will be parsed into microseconds since epoch as a float64 value.",
        "type": "string"
      }
    }
  },
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3806e0 and 2facace.

📒 Files selected for processing (3)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: Basic TLS/HTTPS tests
🔇 Additional comments (10)
docs/redoc/master/openapi.json (2)

14464-14472: New expression types added for extended querying capabilities

The PR has integrated two new expression types: GeoDistance and DatetimeExpression, which extend the query capabilities of the API.


14514-14540: GeoDistance expression implementation looks well-structured

The implementation of the GeoDistance expression and its parameters is well-designed, requiring both origin and destination parameters for proper distance calculation.

lib/api/src/rest/schema.rs (2)

643-644: New expression types added correctly

The addition of GeoDistance(GeoDistance) and Datetime(DatetimeExpression) variants to the Expression enum extends the system's capabilities to handle geographic distance calculations and datetime expressions.


660-676: Well-structured implementation of GeoDistance and Datetime expressions

The implementation of these new structures follows the existing pattern in the codebase. The GeoDistance struct uses a nested GeoDistanceParams to clearly separate the parameters from the expression type itself, while the DatetimeExpression is a simple wrapper around a string representation of a datetime.

Both implementations are clean and follow the established patterns in the rest of the code.

lib/collection/src/operations/universal_query/formula.rs (6)

8-8: Appropriate import of PreciseScore

The addition of PreciseScore to the imports is necessary for the type change from f32 to f64, which supports the PR's objective of maintaining precision when handling timestamp microseconds.


27-32: Proper implementation of new ExpressionInternal variants

The GeoDistance variant has been updated to a structured format with clear field names, and a new DateTime variant has been added to support the datetime expressions feature. This change aligns with the structured approach used in other variants.


65-65: Appropriate use of PreciseScore for constants

Converting constants to PreciseScore instead of using f32 directly supports the transition from f32 to f64 for better precision with timestamp microseconds, as mentioned in the PR objectives.


78-82: Complete implementation of parse_and_convert for new variants

The implementation correctly:

  1. For GeoDistance: Adds the "to" field to payload_vars and constructs a ParsedExpression
  2. For DateTime: Parses the datetime string to create the appropriate ParsedExpression

This ensures proper handling of the new expression types during query processing.


105-105: Consistent type conversion for by_zero_default

The update to use PreciseScore::from for the by_zero_default parameter ensures consistency with the shift from f32 to f64 throughout the codebase.


200-205: Proper conversion implementation for new expression types

The implementation of Fromrest::Expression for ExpressionInternal has been updated to handle the new GeoDistance and Datetime variants correctly, ensuring proper conversion between the REST API representation and the internal representation.

@coszio coszio force-pushed the score-boosting-with-datetime branch from f9a6607 to 918a37a Compare March 20, 2025 19:37
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)

170-175: Consider adding documentation about datetime precision decisions.

While there's a comment about f64 precision limitations, it would be helpful to explicitly document why the conversion to seconds was chosen (rather than keeping microseconds or using milliseconds) and how this affects use cases like decay functions.

 // Convert from i64 to f64.
 // f64's 53 bits of sign + mantissa for microseconds means a span of exact equivalence of
 // about 285 years, after which precision starts dropping
 let float_micros = datetime.timestamp() as PreciseScore;
 
-// Convert to seconds
+// Convert to seconds to provide a more manageable scale for common use cases
+// such as decay functions that operate better on values closer to 1.0
 let float_seconds = float_micros / 1_000_000.0;

504-511: Tests for array handling could be more comprehensive.

The tests cover basic array handling but don't test edge cases like arrays with mixed types or deeply nested arrays. Consider adding more test cases to ensure robust handling of complex array structures.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2facace and 918a37a.

📒 Files selected for processing (1)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (13 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3) (3)
  • score (269-270)
  • new_payload_id (162-164)
  • new_geo_distance (139-141)
lib/segment/src/types.rs (6) (6)
  • new (376-389)
  • new (1463-1466)
  • new (2601-2603)
  • new (2825-2827)
  • new (2838-2840)
  • timestamp (68-70)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (8)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (8)

10-12: Correctly updated imports to support new datetime functionality.

The changes to imports appropriately include DateTimeExpression and PreciseScore from the parsed_formula module, which are necessary for the new datetime expression support.


21-22: Constants properly updated to use PreciseScore type.

Constants have been correctly updated to use PreciseScore (f64) instead of ScoreType (f32) to maintain precision when handling timestamp microseconds.


54-58: Good addition of FriendlyName trait implementation for DateTimePayloadType.

This implementation provides proper error messages when datetime parsing fails, improving the developer experience.


99-110: Appropriate conversion from PreciseScore to ScoreType with error handling.

The score method now correctly converts the high-precision f64 result to f32 after evaluation, with proper error handling for non-finite values. This ensures precision during calculation while maintaining compatibility with the rest of the system.


153-176: Thorough implementation of DateTime expression handling.

The implementation correctly:

  1. Handles both constant and payload variable datetime expressions
  2. Properly parses string datetime values from payload
  3. Converts timestamp from microseconds to seconds with appropriate scaling
  4. Includes comments about precision limitations (~285 years of exact equivalence)

The conversion logic from microseconds to seconds is clear and maintains precision where needed.


135-136: Improved error message for non-number values.

The error message now specifies that the value is not a number, making it clearer for debugging.


528-545: Comprehensive test cases for datetime expressions.

The tests cover:

  1. Constant datetime expressions
  2. Payload variables that don't exist (with appropriate error)
  3. Payload variables with default values
  4. Proper timestamp conversion

This ensures the datetime functionality is thoroughly tested.


465-465:

Details

✅ Verification successful

Verify precision of GeoDistance calculation.

The expected value for the GeoDistance test has been updated to a higher precision value. Ensure this is consistent with actual geo-distance calculations and doesn't introduce unintended precision issues.


🏁 Script executed:

#!/bin/bash

# Check if there are other instances of GeoDistance calculations to verify consistency
rg -A 2 "GeoDistance" --type rust

Length of output: 4892


GeoDistance Precision Verification Completed

The updated expected value (21926.494151786308) now reflects a higher precision level consistent with how GeoDistance is computed throughout the repository. Based on our search across related modules, the precision update appears intentional and in line with other implementations. Please review the rounding behavior to ensure that no downstream issues arise.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/collection/src/operations/universal_query/formula.rs (1)

27-31: GeoDistance variant correctly reintroduced.

Declaring origin and to clarifies handling geo-distance expressions. Ensure you have or plan to add tests that validate geo-distance logic (e.g., correct path insertion, expected distance calculation outcomes).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 918a37a and 5c35692.

📒 Files selected for processing (1)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-consensus
  • GitHub Check: test (macos-latest)
  • GitHub Check: test
  • GitHub Check: test (windows-latest)
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (4)
lib/collection/src/operations/universal_query/formula.rs (4)

8-8: Good addition of PreciseScore and DateTimeExpression.

Transitioning from ScoreType (f32) to PreciseScore (f64) aligns with the PR objective of preserving precision for date-time microseconds. Including DateTimeExpression here cleanly integrates the new date-time functionality.


65-65: Accurate conversion to PreciseScore.

Using PreciseScore::from(c) ensures continuity with double-precision arithmetic. This change is consistent with the new design, preserving precision to avoid floating-point issues in subsequent calculations.


78-88: Robust date-time parse and variable handling.

Your date-time parsing logic with dt_str.parse()? then selectively inserting json_path into payload_vars is clear and consistent with the API design. Confirm that unexpected or invalid date-time strings properly trigger the error flow, preventing silent failures.

Please verify with negative test cases (invalid date-time strings) to ensure the parse errors catch inputs as intended.


206-211:

Details

✅ Verification successful

New REST mappings for GeoDistance and DateTime.

Mapping rest::Expression::GeoDistance and rest::Expression::Datetime to ExpressionInternal::GeoDistance and DateTime is straightforward. Double-check that the updated field naming matches across all layers (REST, internal, and gRPC) to avoid field mismatch errors at runtime.


🏁 Script executed:

#!/bin/bash
# Ensure the naming is consistent across REST, internal, and gRPC definitions.
# Searching for "datetime" and "geo_distance" references.
rg -A 5 'datetime'
rg -A 5 'geo_distance'

Length of output: 45754


Naming Consistency Verified Across Layers

The mapping of rest::Expression::GeoDistance to ExpressionInternal::GeoDistance and rest::Expression::Datetime to ExpressionInternal::DateTime has been thoroughly verified. All references in the REST definitions, internal representations, and gRPC schemas (including related documentation) show consistent naming and implementation. No field mismatches were detected.

@timvisee
Copy link
Copy Markdown
Member

f64's microseconds

Why not seconds? It's a float anyhow. I usually don't like it if we diverge from the SI base units.

Comment thread lib/collection/src/operations/universal_query/shard_query.rs Outdated
Comment on lines +86 to +120
impl FromStr for DateTimeExpression {
type Err = String;

fn from_str(s: &str) -> Result<Self, Self::Err> {
// try as date
DateTimePayloadType::from_str(s)
.map(DateTimeExpression::Constant)
// else as payload key
.or_else(|_| JsonPath::from_str(s).map(DateTimeExpression::PayloadVariable))
.map_err(|_| format!("Invalid date time expression: {s}"))
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not possible to properly distinguish between the two types, right? Not a fan of this, since it's definitely possible for a user to use date time strings as keys.

Maybe like this:

"history": {
  "2025-03-24": 128,
  "2025-03-23": 110,
  "2025-03-22": 283,
  "2025-03-21": 10,
  "2025-03-20": 123,
  "2025-03-19": 28,
  "2025-03-18": 387
}

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
lib/api/src/grpc/qdrant.rs (1)

5206-5210: New datetime expression variants added.

The addition of Datetime and DatetimeKey variants appropriately handles both constant datetime values and dynamic references to payload datetime fields. This implementation aligns with the PR objective to support datetime expressions.

Consider adding documentation about the expected datetime string format (e.g., ISO 8601) to help API users provide correctly formatted datetime strings.

docs/redoc/master/openapi.json (1)

14517-14527: Definition for the GeoDistance Schema
The new GeoDistance schema is defined as an object requiring a geo_distance property, which references the GeoDistanceParams schema. This structure is clear and consistent.

Consider adding an optional "description" field for the geo_distance property to improve API documentation clarity.

docs/grpc/docs.md (1)

2718-2719: Clarify New DateTime Fields in Expression Message
The addition of the datetime and datetime_key fields correctly extends the Expression table to support date-time expressions. To further improve clarity for API consumers, consider expanding the descriptions to note that the date-time string is expected to represent a constant (or payload key) which is internally parsed into microseconds as an f64 value before any subsequent conversion to f32 is applied. Also, please verify that the updates in related documentation (e.g., any changes in CollectionInfo.PayloadSchemaEntry, if applicable) are consistent and complete.

lib/collection/src/operations/universal_query/formula.rs (1)

83-89: Parsing Datetime with Chrono.
Ensures valid date-time strings become a ParsedExpression::Datetime. Consider thorough testing with edge-case timestamps.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c35692 and a684943.

📒 Files selected for processing (10)
  • docs/grpc/docs.md (1 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (13 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (11 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (3)
lib/segment/src/index/query_optimization/optimized_filter.rs (1)
  • check_condition (34-39)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3)
  • score (256-257)
  • new_payload_id (149-151)
  • new_geo_distance (126-128)
lib/segment/src/types.rs (6)
  • new (376-389)
  • new (1463-1466)
  • new (2601-2603)
  • new (2825-2827)
  • new (2838-2840)
  • timestamp (68-70)
🔇 Additional comments (48)
lib/api/src/grpc/qdrant.rs (2)

5185-5185: Tag sequence update for new datetime expressions.

The tag list has been extended to include tags 18 and 19, accommodating the new datetime expression variants. This is necessary to maintain the protocol buffer structure.


5212-5249: Tag numbers updated for all expression variants.

All tag numbers have been consistently incremented to accommodate the new datetime variants. This maintains the proper protocol buffer structure.

docs/redoc/master/openapi.json (4)

14467-14475: New Expression Types Added in the Expressions Array
The JSON references for GeoDistance, DatetimeExpression, and DatetimeKeyExpression have been correctly inserted into the expressions array. This addition aligns well with the new PR feature for supporting datetime expressions and extended geo-spatial queries.


14528-14543: Definition for the GeoDistanceParams Schema
This schema mandates both the origin and to properties, ensuring the payload includes all essential data for geo queries. The origin properly references a GeoPoint, and the to property includes a description to inform API consumers.


14544-14554: Definition for the DatetimeExpression Schema
The new DatetimeExpression schema is structured as an object requiring a datetime field of type string. This meets the requirement of parsing timestamp values. Ensure that the application logic properly converts this string to a floating-point microsecond value during evaluation.


14555-14565: Definition for the DatetimeKeyExpression Schema
The DatetimeKeyExpression is defined with a required datetime_key string. This addition is consistent with the naming conventions and requirements for handling date-time expressions by key.

lib/collection/src/operations/universal_query/shard_query.rs (1)

400-401: Support for datetime expressions added correctly

The implementation properly handles the new datetime variant cases, mapping the gRPC representation to their internal counterparts. The naming convention uses "Datetime" (single word) which aligns with existing naming patterns as discussed in previous PR comments.

lib/api/src/grpc/conversions.rs (4)

8-8: New ScoreType import supports precision improvement

This import facilitates the transition from f32 to f64 for handling timestamp microseconds, as described in the PR objectives.


2797-2797: Constants now cast to ScoreType instead of f32

This change supports the transition from f32 to f64 for better precision when handling timestamp microseconds during evaluations.


2805-2816: Datetime expression unparsing implemented correctly

The implementation properly handles both constant datetime values and payload variables, converting them to the appropriate string representation. This completes the bidirectional conversion needed for datetime expressions.


2839-2839:

Details

❓ Verification inconclusive

Verify that conversion to f32 doesn't lose precision

While the PR aims to use f64 for better precision, this line converts values back to f32 in the division expression handling. This could potentially lose precision for very large timestamp values.


🌐 Web query:

What is the range of JavaScript's Number.MAX_SAFE_INTEGER vs. f32 max value?

💡 Result:

JavaScript's Number.MAX_SAFE_INTEGER and 32-bit floating-point (f32) values have fundamentally different ranges due to their underlying data formats:

Feature JavaScript Number.MAX_SAFE_INTEGER f32 Max Safe Integer
Value (2^{53} - 1 = 9,007,199,254,740,991) [1][2][7] (2^{24} = 16,777,216) [4][6]
Precision Limit Integers within (±(2^{53} - 1)) are represented exactly [2][9] Integers ≤ (2^{24}) can be stored precisely without rounding errors [4][6]
Data Type 64-bit double-precision floating point [2][9] 32-bit single-precision floating point [4][6]
Common Use Cases General JavaScript number operations [5][10] Low-level systems programming, WebAssembly [4][6]

Key Technical Differences

  • JavaScript's MAX_SAFE_INTEGER leverages the 52-bit mantissa of 64-bit floats (with an implicit 53rd bit). This allows exact integer representation up to (2^{53} - 1), beyond which operations like equality checks fail due to precision loss. For example, (9,007,199,254,740,991 + 1 = 9,007,199,254,740,992) becomes unrepresentable as distinct values [2][7].

  • f32 Max Value uses a 23-bit mantissa with an implicit 24th bit, limiting its integer precision to (2^{24}) (16,777,216). Beyond this, consecutive integers cannot be uniquely distinguished. For example, trying to store 16,777,217 in f32 would alias it to 16,777,216 due to insufficient precision [4][6].

Practical Implications

  • Use BigInt in JavaScript for values exceeding (9,007,199,254,740,991) [2][5]
  • In systems requiring 32-bit floats, validate conversions using Number.isSafeInteger() checks or limit operations to the (±16,777,216) range [4][6]
  • Financial/security applications should enforce hard limits based on these thresholds to prevent silent precision errors [5][9]

While Number.MAX_SAFE_INTEGER provides over 100,000x greater precision than f32, both serve distinct roles depending on memory constraints and numerical requirements.

Citations:


Revisit f32 conversion precision concerns in timestamp division handling

The conversion of by_zero_default to f32 may introduce precision loss for very large values. Given that f32 can only precisely represent integers up to 16,777,216—compared to JavaScript’s Number.MAX_SAFE_INTEGER of 9,007,199,254,740,991—please verify whether the values handled here are guaranteed to remain below f32’s precision threshold. If there’s any possibility of exceeding this limit, reconsider either retaining f64 or enforcing a pre-conversion range check to safeguard against unintended rounding.

  • File: lib/api/src/grpc/conversions.rs
  • Line: 2839
  • Code in question:
                by_zero_default: by_zero_default.map(|v| v as f32),
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (10)

9-9: Good addition of DateTimePayloadType.
This import expansion properly aligns with the new datetime expressions.


16-16: Switching from f32 to f64 is appropriate for precision.
Using PreciseScore = f64 minimizes floating-point precision loss, especially valuable for timestamps.


35-36: Refactor to Constant(PreciseScore) improves internal consistency.
Changing the constant variant to use PreciseScore maintains uniformity across the new expression model.


38-42: Introducing GeoDistance and Datetime variants.
These additions enable location-based and datetime-based score adjustments. Implementation looks coherent with existing patterns.


44-44: Minor style alignment.
The comment “// Nested” is consistent with how you separate terminal vs nested expressions. No further action needed.


50-50: Consistent usage of Option<PreciseScore>.
Ensuring the by_zero_default parameter is typed as PreciseScore matches the new float precision strategy.


69-69: Expanded usage of lambda: PreciseScore.
This change ensures consistent high-precision for decay-related calculations, reducing rounding errors.


133-133: PropTest with f64 range for Lin decay.
Using 0.000001..1.0 fosters robust testing to catch edge cases for linear decay. Nicely done.


157-157: PropTest with negative f64 range for Exp decay.
Ensuring coverage of negative lambda values is critical; no issues noted.


364-364: PropTest with large negative f64 range for Gauss decay.
Wide range coverage for Gaussian decay ensures better resilience. This seems suitable.

lib/api/src/rest/schema.rs (5)

643-645: New variants for geo-distance and date-time handling.
Adding GeoDistance, Datetime, and DatetimeKey extends the query expression features in a well-structured manner.


661-664: GeoDistance struct introduction.
Clearly encapsulates the geospatial parameters. Kudos for making the API more expressive.


666-672: GeoDistanceParams for explicit parameter grouping.
Defining origin and destination fields fosters clarity and type safety.


674-677: DatetimeExpression struct.
Allows a direct string-based date-time expression, laying groundwork for robust date-time parse logic.


679-682: DatetimeKeyExpression struct.
Encapsulating the payload-based date-time retrieval is a neat approach to unify key-based expressions.

lib/collection/src/operations/universal_query/formula.rs (8)

27-32: Additional variants for GeoDistance, Datetime, and DatetimeKey.
These expansions unify expression handling across collection queries, matching the changes in the REST layer.


66-66: Upcasting constants to PreciseScore.
This approach improves precision and aligns with the codebase’s new float usage.


79-82: GeoDistance logic is consistent with rest of the system.
Inserting the JSON path into payload_vars and constructing via new_geo_distance is correct.


90-96: Payload-based DatetimeKey conversion.
This setup is elegantly consistent. If the user stores date-time strings in payload fields, this variant will parse them out.


119-119: Consistently converting by_zero_default to PreciseScore.
Preserves uniform float precision in division expressions.


214-216: Mapping REST GeoDistance to internal variant.
The bridging code is correctly placed, with no apparent pitfalls.


217-219: Mapping REST Datetime to internal expression.
Implementation straightforwardly reuses the date-time string.


220-222: Mapping REST DatetimeKey to internal expression.
Implementation aligns well with the newly introduced date-time payload approach.

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (12)

10-11: Good addition of new imports

The changes correctly import DatetimeExpression and PreciseScore used in the updated formula scorer implementation.


21-22: Appropriate constant type change

Updating constants to use PreciseScore (f64) instead of ScoreType (f32) is a good change to maintain precision for timestamp calculations.


42-46: Correct trait implementation update

Correctly modified the FriendlyName trait implementation to use PreciseScore instead of ScoreType.


54-58: Good addition of FriendlyName implementation for DateTimePayloadType

This implementation allows datetime values to be properly described in error messages, enhancing debugging and user experience.


99-109: Good precision handling with clear conversion strategy

The implementation correctly maintains precision by using PreciseScore (f64) during calculations and only converting to f32 at the end. The non-finite number check is also properly updated to handle the conversion.


117-117: Correct method signature update

Changed return type from ScoreType to PreciseScore to maintain high precision during formula evaluation.


125-125: Good handling of score conversion

Correctly converts retrieved ScoreType scores to PreciseScore when processing variable expressions.


151-152: Type consistency in geo distance calculation

The geo distance calculation now correctly returns a PreciseScore value, maintaining type consistency throughout the evaluation process.


153-176: Well-implemented datetime expression handling

The implementation for datetime expressions is thorough and handles both constant and payload variable cases. Good error handling and proper conversion from microseconds to seconds.

I appreciate the detailed comment about f64 precision for timestamp microseconds (spanning about 285 years of exact equivalence) which explains the technical limitations clearly.


351-364: Good update to decay function signatures

The decay functions (exp_decay, gauss_decay, linear_decay) have been correctly updated to use PreciseScore instead of ScoreType for better precision.


488-488: Appropriate test update

Updated the test case to use PreciseScore for expected values, ensuring consistency with the implementation changes.


528-545: Comprehensive test cases for datetime expressions

Good addition of test cases for datetime expressions that cover both constant values and payload variables, including error cases. This ensures the new functionality is properly tested.

lib/api/src/grpc/proto/points.proto (2)

586-587: Good addition of datetime fields to Expression message

Adding datetime and datetime_key fields enables support for datetime expressions in the API, aligning with the implementation changes in the formula scorer.


588-601: Correct field numbering after adding new fields

All subsequent fields have been correctly renumbered to accommodate the new datetime fields. This maintains proper protocol buffer field numbering conventions.

@coszio coszio force-pushed the score-boosting-with-datetime branch from a684943 to 2c3155a Compare March 24, 2025 21:38
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
lib/api/src/grpc/qdrant.rs (1)

5182-5186: Update of tag indices requires careful API versioning consideration.

The increase in tag indices within the prost attribute to accommodate new datetime expression variants might impact backward compatibility with older clients. Before finalizing this change, ensure proper API versioning strategy is in place to handle clients that might be expecting the original tag indices.

Consider consulting your team's API versioning guidelines. In many gRPC systems, shifting existing tag numbers is avoided by either:

  1. Adding new fields only at the end of existing definitions
  2. Implementing parallel API version support if breaking changes are necessary
docs/redoc/master/openapi.json (1)

14523-14571: New Schema Definitions for GeoDistance and Datetime Expressions
The added schema definitions for GeoDistance, GeoDistanceParams, DatetimeExpression, and DatetimeKeyExpression correctly specify the required properties and structure. The approach of referencing an existing GeoPoint for the "origin" property and adding a descriptive field for "to" is appropriate. You might consider whether additional validation—for example, format constraints (such as an ISO 8601 regex or "format": "date-time") for the datetime fields—would benefit API consumers by enforcing the expected string format.

lib/api/src/grpc/conversions.rs (1)

842-842: Ensure consistent precision handling.

The by_zero_default is cast to f32 despite being stored as ScoreType (f64). This might cause precision loss when handling large timestamp values.

Consider keeping the f64 precision throughout the entire expression evaluation pipeline:

-by_zero_default: by_zero_default.map(|v| v as f32),
+by_zero_default: by_zero_default,

This would require updating the receiving type in the protocol definition, but would maintain precision consistency.

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)

153-176: Add datetime expression handling with microsecond precision

The implementation correctly handles both constant and payload variable datetime expressions, converting them to seconds with microsecond precision, which is suitable for the scoring use case.

However, the conversion factor should be a named constant for better maintainability.

+// Conversion factor from microseconds to seconds
+const MICROSECONDS_TO_SECONDS: PreciseScore = 1_000_000.0;

 fn eval_expression(
     &self,
     expression: &ParsedExpression,
     point_id: PointOffsetType,
 ) -> OperationResult<PreciseScore> {
     match expression {
         // ... other cases ...
         ParsedExpression::Datetime(dt_expr) => {
             // ... existing code ...
             
             // Convert to seconds
-            let float_seconds = float_micros / 1_000_000.0;
+            let float_seconds = float_micros / MICROSECONDS_TO_SECONDS;
             
             Ok(float_seconds)
         }
         // ... other cases ...
     }
 }

171-175: Consider documenting the precision implications of date conversion

While the code correctly converts microseconds to seconds, it would be beneficial to add a comment about the chosen approach for datetime handling, explaining why seconds were chosen as the unit and how this impacts scoring.

// Convert from i64 to f64.
// f64's 53 bits of sign + mantissa for microseconds means a span of exact equivalence of
// about 285 years, after which precision starts dropping
let float_micros = datetime.timestamp() as PreciseScore;

+// Convert to seconds for consistent scoring scale
+// This approach provides sufficient precision for decay functions while keeping
+// the values in a reasonable range for typical scoring calculations
let float_seconds = float_micros / 1_000_000.0;

Ok(float_seconds)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a684943 and 2c3155a.

📒 Files selected for processing (13)
  • docs/grpc/docs.md (1 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
  • lib/common/common/src/math.rs (1 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (13 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (11 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (7 hunks)
  • lib/segment/src/types.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • lib/collection/src/operations/universal_query/shard_query.rs
  • docs/grpc/docs.md
  • lib/common/common/src/math.rs
  • lib/api/src/rest/schema.rs
  • lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs
  • lib/segment/src/types.rs
  • lib/api/src/grpc/proto/points.proto
🧰 Additional context used
🧬 Code Definitions (2)
lib/collection/src/operations/universal_query/formula.rs (3)
lib/collection/src/operations/universal_query/shard_query.rs (8)
  • from (502-507)
  • from (511-516)
  • from (520-524)
  • from (528-532)
  • from (568-587)
  • from (591-611)
  • from (615-635)
  • from (639-669)
lib/api/src/rest/schema.rs (1)
  • from (1223-1228)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)
  • new_geo_distance (126-128)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3)
  • score (256-257)
  • new_payload_id (149-151)
  • new_geo_distance (126-128)
lib/segment/src/types.rs (5)
  • new (376-389)
  • new (1463-1466)
  • new (2601-2603)
  • new (2825-2827)
  • new (2838-2840)
🪛 GitHub Actions: Storage compatibility tests
lib/api/src/grpc/conversions.rs

[error] 7-7: unresolved import common::counter::hardware_data::RealCpuMeasurement: no RealCpuMeasurement in counter::hardware_data

🪛 GitHub Actions: Formatter and linter
lib/api/src/grpc/conversions.rs

[error] 7-7: unresolved import common::counter::hardware_data::RealCpuMeasurement: no RealCpuMeasurement in counter::hardware_data.

🪛 GitHub Actions: Rust tests
lib/api/src/grpc/conversions.rs

[error] 7-7: unresolved import common::counter::hardware_data::RealCpuMeasurement: no RealCpuMeasurement in counter::hardware_data

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consensus-compose
🔇 Additional comments (23)
lib/api/src/grpc/qdrant.rs (2)

5206-5211: Addition of new DateTime expression variants.

The new Datetime and DatetimeKey variants align with the PR objectives to support datetime expressions. The implementation correctly provides mechanisms for both constant datetime values and references to datetime values stored in payload keys.

Ensure comprehensive documentation is added explaining the datetime string format that will be supported and how the parsing to microseconds (f64) will be handled in the implementation.


5213-5250:

Details

❓ Verification inconclusive

Verify tag number shifts won't break existing clients.

All tag numbers from the original Mult variant onwards have been incremented to accommodate the new datetime variants. This represents a potentially breaking change for existing clients that expect specific tag numbers for these operations.

Consider running the following verification to identify potential compatibility issues:


🏁 Script executed:

#!/bin/bash
# Find all client code that might be affected by tag number changes

# Look for direct references to Expression tag numbers in tests or client code
rg -A 1 -B 1 "tag\s*=\s*[\"']?[5-9][0-9]?[\"']?" --glob "**/*.{rs,proto}"

# Check if there are clients in the wild that might be affected
# Look for import statements of the Expression proto
rg -A 5 -B 5 "Expression|expression::Variant" --glob "**/*.{rs,py,go,js,java}"

Length of output: 113104


Important: Confirm Backward Compatibility with Revised Protobuf Tag Numbers

The refactoring in lib/api/src/grpc/qdrant.rs has inserted new datetime variants (with tags 5 and 6), which shifted the tag numbers for all subsequent variants (e.g. Mult is now tag 7, Sum tag 8, etc.). Our grep verification confirms that these tags are referenced throughout our internal gRPC and REST schema files, and no external client code was found that hardcodes these tag numbers. However, if any clients depend on the previous numeric mapping—for example, by using serialized messages directly or referencing tag values manually—this change could break compatibility.

Action items:

  • Manually verify client compatibility: Double-check that any external or stored serialized data correctly maps to the new tag assignments.
  • Review tests and utilities: Ensure that hardcoded tag references (if any exist) are updated accordingly.
  • Document the migration: Update changelogs or client guidelines as needed, warning users about this potentially breaking change.
docs/redoc/master/openapi.json (1)

14473-14481: New Expression References Added to the Schema List
The new entries for GeoDistance, DatetimeExpression, and DatetimeKeyExpression have been added correctly into the array. Please ensure these references remain consistent with their corresponding schema definitions later in the file.

lib/api/src/grpc/conversions.rs (2)

800-801: Type conversion for precision improvement.

The constant value is now cast to ScoreType (f64) instead of using f32 directly, which improves precision for operations involving timestamps in microseconds.


808-819: Good implementation of the datetime and geo-distance expressions.

The new handling for GeoDistance and Datetime variants aligns well with the PR objective of supporting datetime expressions. The code correctly handles both constant datetime values and payload variable references.

lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (3)

264-277: Well-structured test data addition.

The new payload data structure is well-organized and properly maintains test coverage for the existing functionality while adding support for the new test cases.


364-377: Good test coverage for datetime indexing.

The datetime index creation is well implemented, with test cases that verify both explicit ISO 8601 format ("2023-01-01T00:00:00Z") and shorthand format ("2023-01-02").


419-437: Consider adding a comment about datetime normalization behavior.

The test demonstrates that short date formats like "2023-01-02" are automatically normalized to full RFC 3339 timestamps "2023-01-02T00:00:00Z" during retrieval, but this behavior isn't documented.

Add a comment near line 430 explaining this normalization behavior:

                0 => assert_eq!(value, [json!("2023-01-01T00:00:00Z")].into()),
                1 => assert_eq!(
                    value,
+                   // Note: "2023-01-02" is automatically normalized to "2023-01-02T00:00:00Z"
                    [json!("2023-01-02T00:00:00Z"), json!("2023-01-03T00:00:00Z")].into()
                ),
lib/collection/src/operations/universal_query/formula.rs (6)

8-8: Important type imports for precision improvement.

The additional imports of DatetimeExpression and PreciseScore support the PR objective of improving precision when handling timestamps.


27-33: Good addition of new expression types.

The new enum variants for GeoDistance, Datetime, and DatetimeKey are well-structured and align with the PR objectives. The GeoDistance implementation looks good with proper field types.


66-66: Improved precision with PreciseScore.

Converting constants to PreciseScore (f64) is important for maintaining precision when working with microsecond timestamps.


79-96: Complete implementation of datetime and geo-distance parsing.

The parse_and_convert method now properly handles the new expression types:

  • For GeoDistance, it correctly adds the payload key to the tracked variables
  • For Datetime, it parses the string into a datetime constant with error handling
  • For DatetimeKey, it parses the key and adds it to the tracked payload variables

119-119: Consistent use of PreciseScore for by_zero_default.

Converting the by_zero_default option to PreciseScore (f64) maintains precision consistency in the division operation.


214-222: Complete implementation of expression type conversions.

The From<rest::Expression> implementation properly handles the new expression types, ensuring consistent conversion between the REST API and internal representations.

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (9)

10-12: Properly updated imports to include the new DatetimeExpression type and PreciseScore

Good addition of the necessary imports to support datetime expressions and precise scoring.


19-19: LGTM: Added import for DateTimePayloadType

The addition of this import enables the datetime functionality introduced in this PR.


21-22: Updated constant types to use PreciseScore

Good update of the constant types from ScoreType (f32) to PreciseScore (f64) to maintain precision when dealing with datetime values.


54-58: Added FriendlyName implementation for DateTimePayloadType

Good addition of the FriendlyName trait implementation, which is necessary for error reporting when dealing with datetime values.


99-110: Updated score method to convert PreciseScore to ScoreType

The score method now correctly converts the f64 precision score to f32 for final output, with proper error handling for non-finite values.


117-117: Changed return type to use PreciseScore

Good update to use PreciseScore (f64) as the return type for evaluation to maintain precision during calculations.


125-125: Updated score conversion to use PreciseScore

The conversion from ScoreType to PreciseScore is now properly implemented.


351-351: Updated decay function signatures to use PreciseScore

Good conversion of the decay function parameters from ScoreType to PreciseScore to maintain precision during calculations.

Also applies to: 356-356, 361-361


528-545: Added comprehensive tests for datetime expressions

Good test coverage for the new datetime functionality, including tests for constant datetime expressions, payload variable expressions, and error cases.

Comment thread lib/api/src/grpc/conversions.rs Outdated
@timvisee timvisee force-pushed the score-boosting-with-datetime branch from 2f333b0 to 2c3155a Compare March 25, 2025 10:31
@coszio coszio force-pushed the score-boosting-with-datetime branch from 2c3155a to eeac2b1 Compare March 25, 2025 13:07
@coszio
Copy link
Copy Markdown
Contributor Author

coszio commented Mar 25, 2025

Just updated the description, @timvisee the PRs uses datetime as float seconds

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
lib/api/src/grpc/qdrant.rs (1)

5206-5211: Consider adding datetime format validation.

The code accepts any string for datetime values without validation. Consider adding validation or documentation specifying the expected datetime string format to prevent runtime errors.

What are best practices for handling datetime string formats in Rust?
docs/redoc/master/openapi.json (1)

14523-14571: Define Schemas for New Expression Types.
The schema definitions for "GeoDistance", "GeoDistanceParams", "DatetimeExpression", and "DatetimeKeyExpression" are clearly and consistently structured. As a minor enhancement, consider specifying a "format": "date-time" for the datetime strings if they conform to ISO standards. This additional detail can help API consumers with validation and improve documentation clarity. Additionally, verify that the description for the "to" field in "GeoDistanceParams" provides enough context for users.

lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)

103-107: Good addition of DatetimeExpression enum, but missing documentation.

The new DatetimeExpression enum provides a clean way to represent both constant datetime values and payload variables. However, it would benefit from documentation explaining its purpose and usage.

Consider adding documentation to explain the purpose and usage of the DatetimeExpression enum:

/// Represents a datetime expression that can be either a constant datetime value
/// or a reference to a datetime field in the payload.
#[derive(Debug, Clone, PartialEq)]
pub enum DatetimeExpression {
    /// A constant datetime value
    Constant(DateTimePayloadType),
    /// A reference to a datetime field in the payload
    PayloadVariable(JsonPath),
}
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (1)

423-441: Well-implemented datetime retrieval test, but missing documentation about normalization.

The test for datetime retrieval properly verifies that datetime values can be retrieved from the index correctly. However, it would be helpful to add a comment explaining that date strings (like "2023-01-02") are intentionally normalized to full RFC 3339 timestamps ("2023-01-02T00:00:00Z").

Add a comment to clarify the datetime normalization behavior:

 // Test retrieving a datetime from the index.
+// Note: Date strings without time information (e.g., "2023-01-02") are
+// normalized to midnight UTC (e.g., "2023-01-02T00:00:00Z") during conversion
+// to DateTimePayloadType.
 let retriever = variable_retriever(
     &indices,
     &"creation".try_into().unwrap(),
     payload_provider.clone(),
     &hw_counter,
 );
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)

125-125: Consider optimizing score conversion

While this conversion works, consider using direct assignment for simplicity and potentially better performance.

-                    .map(|score| PreciseScore::from(*score))
+                    .map(|&score| PreciseScore::from(score))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f333b0 and eeac2b1.

📒 Files selected for processing (13)
  • docs/grpc/docs.md (1 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
  • lib/common/common/src/math.rs (1 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (13 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (11 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (7 hunks)
  • lib/segment/src/types.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • lib/collection/src/operations/universal_query/shard_query.rs
  • lib/common/common/src/math.rs
  • lib/segment/src/types.rs
  • docs/grpc/docs.md
  • lib/api/src/grpc/proto/points.proto
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (3)
lib/segment/src/index/query_optimization/optimized_filter.rs (1)
  • check_condition (34-39)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3)
  • score (256-257)
  • new_payload_id (149-151)
  • new_geo_distance (126-128)
lib/segment/src/types.rs (5)
  • new (376-389)
  • new (1463-1466)
  • new (2601-2603)
  • new (2825-2827)
  • new (2838-2840)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
🔇 Additional comments (35)
lib/api/src/grpc/qdrant.rs (3)

5206-5211: Good addition of datetime expressions.

The new Datetime and DatetimeKey variants align with the PR objective to support datetime expressions for score boosting. These additions will allow parsing datetime strings into f64 values representing microseconds as described in the PR.


5213-5250:

Details

❓ Verification inconclusive

Confirm tag renumbering across all related code.

All existing variants have had their tags increased by several positions to accommodate the new datetime variants. This systematic renumbering appears correct, but ensure that:

  1. All parsers and serializers are updated to handle the new tag numbers
  2. Any stored data using the old tag numbers is properly migrated
  3. Client code expecting specific tag numbers is updated

🏁 Script executed:

#!/bin/bash
# Check for potential parsing/conversion code that might need updates
echo "Looking for expression conversion code:"
rg "fn.*Expression" -t rust

echo "Checking for protobuf version and code generation:"
rg "prost" -g "Cargo.toml"

Length of output: 2325


Ensure consistent tag renumbering across the codebase

The updated tag numbers in lib/api/src/grpc/qdrant.rs (lines 5213-5250) appear correct. However, please make sure that the renumbering is fully reflected across the system by confirming that:

  • All parsers and serializers (for example, those in conversion functions found in files under lib/segment/src/index/query_optimization/rescore_formula/ and lib/collection/src/operations/universal_query/) correctly interpret the new tag values.
  • Any persisted data using the old tag numbers is either migrated or handled so that no legacy data leads to inconsistencies.
  • Client code expecting specific tag numbers is updated accordingly, including tests that validate the end-to-end serialization/deserialization process.

Please review these points (and consider running integration tests) to ensure that the renumbering change does not inadvertently break downstream or storage-related functionality.


5185-5185:

Details

❓ Verification inconclusive

Verify protocol compatibility after tag range expansion.

The tag range for the oneof field has been expanded to include tags 17, 18, and 19. Since Protocol Buffers rely on stable tags for backward compatibility, ensure this change won't break existing clients or data.


🏁 Script executed:

#!/bin/bash
# Check if there are any older versions of API clients that might be affected
# Look for imports or usage of the Expression struct in test code

echo "Checking for Expression usage in tests:"
rg "Expression" -t rust --glob "*/tests/**" --max-depth 3

echo "Checking for potential API client usage:"
rg "use.*grpc.*Expression" -t rust

Length of output: 540


Protocol Compatibility Check for Expanded Tag Range

The tag range for the oneof field in lib/api/src/grpc/qdrant.rs now includes tags 17, 18, and 19. Please ensure that this change does not conflict with any existing clients. Note the following:

  • API Client Usage: A search found a reference in lib/collection/src/operations/universal_query/shard_query.rs using DecayParamsExpression from the grpc module. Double-check that this usage and any other client interactions remain compatible with the new tag assignments.
  • Test Coverage: The initial test search for Expression usage in tests did not return results. Verify that existing integration tests (or add new ones) cover scenarios involving the modified oneof field to guarantee backward compatibility.

Please run a thorough compatibility review and confirm that no unintended breaks occur for existing API clients.

docs/redoc/master/openapi.json (1)

14473-14481: Integrate New Expression Types in Schema Union.
The new references for "GeoDistance", "DatetimeExpression", and "DatetimeKeyExpression" are correctly added to the list of expression options. Please ensure that downstream consumers of the API are updated to properly handle these new types and that the ordering in the schema (if significant) aligns with the intended evaluation precedence.

lib/api/src/grpc/conversions.rs (5)

8-8: Good addition of the ScoreType import.

The addition of the ScoreType import is appropriate for supporting the new datetime expressions feature, ensuring consistent type handling in your expression evaluation.


17-17: Good import of DatetimeExpression for enhancing formula capabilities.

Adding the DatetimeExpression import from the parsed_formula module properly supports the new datetime expression functionality.


2800-2800: Well-structured type handling for constants.

The change to cast the constant value to ScoreType ensures proper type consistency throughout the formula evaluation process.


2808-2819: Well-implemented support for datetime expressions.

The implementation of the GeoDistance and Datetime variants in the unparse_expression function is well-structured. Both variants properly convert their respective data types for use in expressions:

  • GeoDistance correctly transforms origin and key data
  • Datetime appropriately handles both constant datetime values and payload variables

This aligns well with the PR objectives to support datetime expressions.


2842-2842: Type consistency improvement for by_zero_default.

Casting the by_zero_default value to f32 ensures proper type consistency between the internal representation and the API.

lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (6)

16-16: Good addition of PreciseScore type for improved precision.

Defining PreciseScore as f64 helps maintain precision when handling timestamp microseconds during evaluations, which aligns with the PR objectives. This is especially important for time-based operations that require greater floating-point precision.


36-43: Well-structured updates to ParsedExpression enum.

The changes to the ParsedExpression enum effectively support the new requirements:

  1. Using PreciseScore for constants improves precision
  2. Properly structuring GeoDistance with origin and key fields
  3. Adding a new Datetime variant that leverages the DatetimeExpression enum

These changes align perfectly with the PR objectives to support datetime expressions.


169-171: Properly updated decay_params_to_lambda to use PreciseScore.

The changes to convert the midpoint and scale to PreciseScore ensure that calculations maintain adequate precision throughout the evaluation process, especially important for timestamp-related operations.


198-198: Good type adaptation for decay_lambda_to_params.

Updating the function signature to accept PreciseScore while returning (ScoreType, ScoreType) properly handles the transition between high-precision internal calculations and API-compatible output types.


211-211: Appropriate type casting in decay functions.

The addition of type casting to ScoreType in the return values of the decay_lambda_to_params method ensures proper type compatibility while maintaining precision where it matters most in the calculation.

Also applies to: 227-229, 243-243


350-351: Well-updated test cases for new PreciseScore type.

The test functions have been correctly updated to use PreciseScore instead of f32, ensuring that the tests properly validate the updated behavior of the decay functions.

Also applies to: 357-358, 364-365

lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (2)

264-271: Good addition of test payload with array data.

The new payload3 test case enhances test coverage by including scenarios with arrays containing single values, which helps verify correct handling of both single values and arrays.


364-385: Thorough test setup for datetime field index.

The addition of a datetime field index in the test setup is well-implemented, providing good coverage of different date formats and multiple values per point.

lib/api/src/rest/schema.rs (3)

643-645: Support for new expression types looks good

The addition of the GeoDistance, Datetime, and DatetimeKey variants to the Expression enum provides the necessary infrastructure for handling datetime expressions as specified in the PR objectives.


661-672: Well-structured GeoDistance implementation

The new GeoDistance struct and its parameters follow the established pattern of the codebase. The struct contains clear field definitions with descriptive comments.


674-682: Clean implementation of datetime expressions

The DatetimeExpression and DatetimeKeyExpression structs provide a clean API for datetime handling, aligning perfectly with the PR objective to support parsing strings into floating-point numbers representing microseconds.

lib/collection/src/operations/universal_query/formula.rs (7)

8-9: Type change aligns with precision requirements

The import of PreciseScore aligns with the PR objective of changing expression evaluation from f32 to f64 for improved precision when handling timestamp microseconds.


27-32: Good implementation of new expression variants

The addition of GeoDistance, Datetime, and DatetimeKey variants to the ExpressionInternal enum provides the internal representation needed for datetime expressions.


66-66: Precision enhancement for constants

Converting constants to PreciseScore ensures that all numerical operations maintain higher precision throughout the expression evaluation pipeline.


79-82: Clean GeoDistance implementation

The implementation correctly adds the to field to payload_vars and uses the appropriate constructor method for the GeoDistance expression.


83-96: Thorough datetime expression handling

The parsing and conversion of datetime expressions is handled well with appropriate error handling for both constant and payload variable cases.

Two points worth noting:

  1. For constants, the code correctly parses the datetime string with proper error handling
  2. For payload variables, it correctly adds the path to payload_vars collection

119-119: Consistent precision enhancement

Converting the by_zero_default value to PreciseScore maintains precision consistency throughout all formula operations.


214-222: Robust handling of datetime expression variants

The implementation of From<rest::Expression> for the new datetime variants is clean and follows the established pattern for other expression types.

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (8)

10-12: Import changes align with precision requirements

The updated imports, including DatetimeExpression and PreciseScore, properly support the transition to higher precision scoring.


21-22: Consistent constant type changes

The constants are correctly updated to use PreciseScore type instead of literals, maintaining type consistency throughout the codebase.


54-58: Good addition of FriendlyName implementation

The implementation of FriendlyName for DateTimePayloadType helps with generating user-friendly error messages, consistent with other types.


99-109: Proper score conversion with finite check

The implementation correctly converts the PreciseScore result to f32 before returning, with appropriate validation to ensure the value is finite. This aligns with the PR objectives to maintain precision during computation but return a compatible score type.


153-176: Well-implemented datetime expression handling

The implementation of datetime expression evaluation is thorough and handles both constant and payload variable cases appropriately. The conversion from timestamp to seconds is implemented correctly, and the code includes helpful comments explaining the precision considerations.


351-351: Consistent decay function type updates

The decay functions are correctly updated to use PreciseScore throughout, maintaining type consistency for all mathematical operations.

Also applies to: 356-356, 361-361


528-545: Comprehensive datetime test cases

The test cases for datetime expressions are thorough and test various scenarios including constant expressions, missing variables, and variables with defaults. This ensures the code behaves as expected in different situations.


561-564: Good test default value for datetime

The default value for testing datetime expressions is well-chosen and properly formatted, which helps ensure robust test coverage.

@coszio coszio force-pushed the score-boosting-with-datetime branch from eeac2b1 to f957d6b Compare April 1, 2025 18:08
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
docs/redoc/master/openapi.json (1)

14523-14571: Definition of New Schema Components
The new schema components for "GeoDistance", "GeoDistanceParams", "DatetimeExpression", and "DatetimeKeyExpression" are well-structured. Notable points:

  • GeoDistance: It requires a geo_distance property that refers to "GeoDistanceParams".
  • GeoDistanceParams: Both origin (referencing GeoPoint) and to (with an appropriate description) are defined as required.
  • DatetimeExpression & DatetimeKeyExpression: These components simply require a string-type property.
    Consider adding additional descriptions or examples for the datetime properties to clarify the expected format (e.g., ISO 8601) given that these values will be parsed into precise floating-point numbers later in the evaluation process.
lib/api/src/grpc/conversions.rs (1)

8-8: Add documentation for the newly imported ScoreType.

Consider adding a comment to explain why ScoreType is being imported and what its purpose is, especially since it's being used for a type conversion in the expression unparsing.

lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (2)

170-171: Consider making the decay function defaults constants using PreciseScore.

Instead of converting the existing f32 constants to PreciseScore at runtime, consider defining new constants with the PreciseScore type for better type consistency.

-const DEFAULT_DECAY_MIDPOINT: f32 = 0.5;
-const DEFAULT_DECAY_SCALE: f32 = 1.0;
+const DEFAULT_DECAY_MIDPOINT: f32 = 0.5;
+const DEFAULT_DECAY_SCALE: f32 = 1.0;
+const DEFAULT_DECAY_MIDPOINT_PRECISE: PreciseScore = 0.5;
+const DEFAULT_DECAY_SCALE_PRECISE: PreciseScore = 1.0;

And then in the function:

-let midpoint = PreciseScore::from(midpoint.unwrap_or(DEFAULT_DECAY_MIDPOINT));
-let scale = PreciseScore::from(scale.unwrap_or(DEFAULT_DECAY_SCALE));
+let midpoint = midpoint.map(PreciseScore::from).unwrap_or(DEFAULT_DECAY_MIDPOINT_PRECISE);
+let scale = scale.map(PreciseScore::from).unwrap_or(DEFAULT_DECAY_SCALE_PRECISE);

199-243: Good implementation but consider explaining the type conversions.

The cast from PreciseScore (f64) to ScoreType in the decay function is important for maintaining precision but could benefit from a comment explaining why this conversion is necessary.

Consider adding a comment like:

// Convert from PreciseScore (f64) to ScoreType to maintain compatibility with the existing APIs
// while still leveraging the higher precision during calculations
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (1)

423-441: Add clarifying comment about datetime normalization.

The test shows that "2023-01-02" is normalized to "2023-01-02T00:00:00Z" - this behavior should be documented to clarify that this is intentional.

Add a comment near line 435:

// Note: Date strings without time components (e.g., "2023-01-02") are intentionally 
// normalized to midnight UTC (e.g., "2023-01-02T00:00:00Z") during conversion to DateTimePayloadType
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)

153-176: Well-implemented datetime expression handling

The implementation correctly handles both constant and payload variable datetime expressions, with proper error handling and conversion to float seconds.

Consider extracting the magic number 1_000_000.0 to a named constant for better readability.

+ // Conversion factor from microseconds to seconds
+ const MICROS_TO_SECONDS: PreciseScore = 1_000_000.0;
+
  let float_micros = datetime.timestamp() as PreciseScore;

  // Convert to seconds
- let float_seconds = float_micros / 1_000_000.0;
+ let float_seconds = float_micros / MICROS_TO_SECONDS;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eeac2b1 and f957d6b.

📒 Files selected for processing (13)
  • docs/grpc/docs.md (1 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/formula.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
  • lib/common/common/src/math.rs (1 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (13 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (11 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (7 hunks)
  • lib/segment/src/types.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • lib/collection/src/operations/universal_query/shard_query.rs
  • docs/grpc/docs.md
  • lib/common/common/src/math.rs
  • lib/segment/src/types.rs
  • lib/api/src/rest/schema.rs
  • lib/api/src/grpc/proto/points.proto
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (4)
lib/segment/src/index/query_optimization/optimized_filter.rs (1)
  • check_condition (34-39)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (3)
  • score (256-257)
  • new_payload_id (149-151)
  • new_geo_distance (126-128)
lib/segment/src/types.rs (5)
  • new (376-389)
  • new (1463-1466)
  • new (2601-2603)
  • new (2825-2827)
  • new (2838-2840)
lib/api/src/rest/schema.rs (1)
  • new (1033-1039)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (22)
lib/api/src/grpc/qdrant.rs (2)

5182-5188: LGTM! Tags updated correctly for new datetime variants.

The tag list in the oneof definition has been properly extended to include tags 1-19, which correctly encompasses all variant tags including the newly added datetime expressions.


5203-5252: LGTM! New datetime variants correctly added.

The new datetime expression variants are properly implemented:

  • Datetime for datetime constants (tag 5)
  • DatetimeKey for payload keys with datetime values (tag 6)

All subsequent tags have been correctly incremented to maintain proper sequencing. The additions are well-documented with clear comments.

docs/redoc/master/openapi.json (1)

14473-14481:

Details

✅ Verification successful

New Schema Reference Additions for Expression Types
The addition of new references for "GeoDistance", "DatetimeExpression", and "DatetimeKeyExpression" correctly extends the available expression schemas. Please ensure that corresponding schema definitions exist under #/components/schemas/ and that these new types are documented in the API reference.


Below is a shell script to verify that the new schema definitions actually exist in the JSON file as referenced:

Once you run this script, review the output to confirm that the definitions for "GeoDistance", "DatetimeExpression", and "DatetimeKeyExpression" are present under the #/components/schemas/ section. If they are indeed present and correctly formed, you may update the review comment as follows:


Schema References and Definitions Verified
The new references for "GeoDistance", "DatetimeExpression", and "DatetimeKeyExpression" in docs/redoc/master/openapi.json correctly point to existing schema definitions under #/components/schemas/. Please ensure that these schema definitions are also comprehensively documented in the API reference for clarity.


🏁 Script executed:

#!/bin/bash
echo "Checking for GeoDistance definition..."
rg '"GeoDistance":' docs/redoc/master/openapi.json
echo "Checking for DatetimeExpression definition..."
rg '"DatetimeExpression":' docs/redoc/master/openapi.json
echo "Checking for DatetimeKeyExpression definition..."
rg '"DatetimeKeyExpression":' docs/redoc/master/openapi.json

Length of output: 678


Schema References and Definitions Verified
The JSON file docs/redoc/master/openapi.json now clearly includes definitions for "GeoDistance", "DatetimeExpression", and "DatetimeKeyExpression" as referenced in the snippet. The new references correctly extend the expression schemas, and the definitions under #/components/schemas/ are present. Please also ensure that these types are thoroughly documented within the API reference documentation.

lib/api/src/grpc/conversions.rs (3)

800-800: Good conversion to ScoreType for consistent type handling.

Converting the constant value to ScoreType ensures consistent handling of numeric types throughout the expression system.


808-819: Well-implemented support for datetime expressions.

The new GeoDistance and Datetime expression variants properly handle the different cases (constant or payload variable), enabling the functionality described in the PR objectives for datetime expressions.


842-842:

Details

❓ Verification inconclusive

Ensure consistent type conversion.

While casting by_zero_default to f32 works, it's worth verifying that this doesn't lead to precision loss since other parts of the code are now using PreciseScore (f64). Consider whether this should also use ScoreType for consistency.


🏁 Script executed:

#!/bin/bash
# Check if by_zero_default is consistently typed across the codebase

# Look for by_zero_default usage in the codebase
rg "by_zero_default" --type rust

Length of output: 1626


Consistent Type Conversion Recommendation for by_zero_default

  • In lib/api/src/grpc/conversions.rs, the conversion casts by_zero_default to an f32 (i.e., by_zero_default.map(|v| v as f32)).
  • Other parts of the code use ScoreType (as seen in lib/api/src/rest/schema.rs and lib/collection/src/operations/universal_query/formula.rs) or explicitly work with PreciseScore (which is of type f64).
  • This inconsistency could lead to precision loss. Please verify whether the GRPC conversion should leverage the ScoreType alias—or a similar consistent approach—to match the double-precision expectations elsewhere.
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (4)

16-16: Good type abstraction for improved precision.

Defining PreciseScore as f64 is a good practice as it provides an abstraction layer that makes future type changes easier. This shift from f32 to f64 is critical for maintaining precision when handling timestamp microseconds as mentioned in the PR description.


36-42: Well-structured addition of new expression variants.

The changes to the ParsedExpression enum properly structure the GeoDistance variant and add the new Datetime variant with clear field types.


103-107: Good design for datetime expression handling.

The DatetimeExpression enum correctly handles both constant datetime values and payload variables, making it flexible for different use cases.


347-367: Well-updated test ranges for PreciseScore.

The proptest ranges have been properly updated to use f64 instead of f32, ensuring that the tests validate the full range of the new PreciseScore type.

lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (2)

264-271: Good test coverage for array values.

Adding a test case for an array with a single value is valuable as it tests an edge case that might behave differently from scalar values or multi-element arrays.


364-381: Good test setup for datetime index.

The test properly sets up a datetime index with various datetime formats, including both ISO8601 format and date-only format.

lib/collection/src/operations/universal_query/formula.rs (5)

27-32: LGTM! The new datetime expression variants extend capabilities as needed.

The new enum variants Datetime, DatetimeKey, and the reintroduced GeoDistance structure provide the necessary foundation for the datetime expression functionality described in the PR objectives.


66-66: Correctly updated to use PreciseScore

The change from direct float usage to PreciseScore::from(c) aligns with the PR's goal of improving numeric precision for timestamp handling.


79-96: LGTM! Properly implemented parsing for new expressions

The implementation for handling GeoDistance, Datetime, and DatetimeKey expressions looks good. The code correctly:

  1. Inserts the required payload variables into the payload_vars set
  2. Handles appropriate error conversions for datetime parsing
  3. Creates the correct ParsedExpression variants

119-119: Correctly updated by_zero_default to use PreciseScore

This change ensures consistency with the overall transition from f32 to f64 for improved precision.


214-222: LGTM! Properly implemented From conversion for new expression types

The implementation of the From<rest::Expression> trait for the new expression types follows the established pattern in the codebase and correctly handles the extraction of fields from the REST types.

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (5)

10-11: LGTM! Updated imports and constants to use PreciseScore

The transition from ScoreType to PreciseScore for constants and imports is consistent with the PR objective of improving numeric precision for timestamp handling.

Also applies to: 21-22


54-58: Added FriendlyName implementation for DateTimePayloadType

Good addition that provides proper type name information for error messages when handling datetime values.


99-110: Smart conversion from PreciseScore to ScoreType at the end of processing

This implementation correctly follows the PR objective: using higher precision (f64) during calculations and only converting back to f32 at the very end. The error handling for non-finite numbers is also well implemented.


351-364: Successfully updated decay functions to use PreciseScore

All decay functions have been updated to use PreciseScore consistently, which will ensure proper precision throughout the scoring process.


528-545: Comprehensive test coverage for datetime expressions

The added test cases cover all important scenarios for datetime expressions:

  1. Constant datetime values
  2. Missing datetime variables
  3. Default values for datetime variables

This ensures robustness of the implementation.

@coszio coszio merged commit ec544ff into dev Apr 2, 2025
17 checks passed
@coszio coszio deleted the score-boosting-with-datetime branch April 2, 2025 15:40
pull Bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
* use f64 as internal expression score

* allow datetime as expression

* add test cases

* fix schema, generate openapi and grpc docs

* fmt

* clippy

* fix test precision

* graceful handling of too large numbers, more tests

* rename to DatetimeExpression for consistency

* use datetime's timestamp as seconds

* capture variable name when parsing

* homogenize DateTime into Datetime

* use interface distinction between datetime strings and payload keys

* fix rebase

* fix after rebase
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants