new: update models for v1.15, add conversions#1038

Merged
generall merged 2 commits into dev from update-models-1.15
Jul 15, 2025

Conversation

@joein

@joein joein commented Jul 8, 2025

Member

No description provided.

@netlify

netlify Bot commented Jul 8, 2025

Deploy Preview for poetic-froyo-8baba7 ready!

Name Link
🔨 Latest commit 64d9b3c
🔍 Latest deploy log https://app.netlify.com/projects/poetic-froyo-8baba7/deploys/68763bf0af77a70008de500f
😎 Deploy Preview https://deploy-preview-1038--poetic-froyo-8baba7.netlify.app

@coderabbitai

coderabbitai Bot commented Jul 8, 2025

Walkthrough

This change set introduces significant enhancements to both the gRPC and REST models, schemas, and conversion logic for Qdrant's client. Notable updates include the addition of phrase matching support in filters, expanded configuration for binary quantization (with new encoding and query encoding enums), and richer text indexing options such as stopwords and stemming algorithms. The protobuf schemas and generated Python files are updated to reflect these new fields and message types. Corresponding REST models, enums, and schema definitions are extended to match. The conversion logic between gRPC and REST is augmented to handle these new features bidirectionally, including detailed error handling for unsupported cases. Test fixtures and validation tests are added to ensure correct handling of the new stopwords representations and other extended features.

Suggested reviewers

  • joein
  • timvisee

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c65764 and 64d9b3c.

📒 Files selected for processing (10)
  • qdrant_client/conversions/conversion.py (7 hunks)
  • qdrant_client/embed/_inspection_cache.py (8 hunks)
  • qdrant_client/grpc/collections_pb2.py (10 hunks)
  • qdrant_client/grpc/collections_pb2.pyi (8 hunks)
  • qdrant_client/grpc/points_pb2.pyi (22 hunks)
  • qdrant_client/http/models/models.py (23 hunks)
  • qdrant_client/proto/collections.proto (18 hunks)
  • qdrant_client/proto/points.proto (9 hunks)
  • tests/conversions/fixtures.py (9 hunks)
  • tests/conversions/test_validate_conversions.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • tests/conversions/test_validate_conversions.py
  • qdrant_client/proto/points.proto
  • qdrant_client/proto/collections.proto
  • tests/conversions/fixtures.py
  • qdrant_client/http/models/models.py
  • qdrant_client/grpc/points_pb2.pyi
  • qdrant_client/conversions/conversion.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
qdrant_client/grpc/collections_pb2.py (1)
qdrant_client/http/models/models.py (5)
  • BinaryQuantizationEncoding (80-83)
  • BinaryQuantizationQueryEncoding (86-90)
  • StopwordsSet (2947-2949)
  • TextIndexParams (3066-3082)
  • SnowballParams (2792-2794)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Redirect rules - poetic-froyo-8baba7
  • GitHub Check: Header rules - poetic-froyo-8baba7
  • GitHub Check: Pages changed - poetic-froyo-8baba7
  • GitHub Check: Python 3.12.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Python 3.9.x on ubuntu-latest test
  • GitHub Check: Python 3.10.x on ubuntu-latest test
🔇 Additional comments (18)
qdrant_client/grpc/collections_pb2.py (1)

1-1075: Collections.proto definitions verified

The qdrant_client/proto/collections.proto file contains the new enum and message definitions as expected:

  • enum BinaryQuantizationEncoding
  • message BinaryQuantizationQueryEncoding
  • message StopwordsSet
  • message StemmingAlgorithm
  • message SnowballParams

The auto-generated collections_pb2.py fully reflects these additions. No further changes required—approving the code.

qdrant_client/grpc/collections_pb2.pyi (7)

183-199: LGTM! Well-structured enum definition.

The new BinaryQuantizationEncoding enum is properly defined with clear value names (OneBit, TwoBits, OneAndHalfBits) that represent different encoding options for binary quantization. The type annotations and class structure follow standard protobuf conventions.


1061-1093: LGTM! Properly structured message class.

The BinaryQuantizationQueryEncoding message class with its nested enum is well-defined. The enum values (Default, Binary, Scalar4Bits, Scalar8Bits) provide clear options for query encoding, and the class structure follows protobuf patterns correctly.


1098-1124: LGTM! Consistent field extensions.

The BinaryQuantization message has been properly extended with new fields for encoding and query_encoding. The method signatures for HasField, ClearField, and WhichOneof are correctly updated to include the new fields, maintaining consistency with the existing pattern.


1783-1789: LGTM! Improved documentation clarity.

The updated docstrings for IntegerIndexParams fields now explicitly mention default values (e.g., "Default is true", "Default is false"), which improves API usability by making the default behavior clear to users.


1852-1871: LGTM! Well-designed stopwords configuration.

The StopwordsSet message provides a flexible approach to stopwords configuration with both languages (for predefined language stopwords) and custom (for user-defined stopwords) fields. The design allows for combining both approaches, which is practical for text processing.


1881-1929: LGTM! Comprehensive text indexing enhancements.

The TextIndexParams message has been properly extended with new fields for advanced text processing:

  • stopwords: Links to the new StopwordsSet for stopword handling
  • phrase_matching: Boolean flag for phrase matching support
  • stemmer: Links to StemmingAlgorithm for stemming configuration

The method signatures are consistently updated to handle the new optional fields.


1933-1965: LGTM! Clean stemming algorithm implementation.

The StemmingAlgorithm and SnowballParams messages provide a clean, extensible design for stemming configuration. The use of a oneof field in StemmingAlgorithm allows for future addition of other stemming algorithms while currently supporting the Snowball algorithm with language specification.

qdrant_client/embed/_inspection_cache.py (10)

80-80: LGTM: MatchPhrase cache entry added correctly.

The new entry for "MatchPhrase" in the CACHE_STR_PATH mapping is properly added with an empty list, consistent with other similar entries.


219-219: LGTM: New text processing cache entries added.

The cache entries for "SnowballParams" and "StopwordsSet" are correctly added to support the new text processing features.

Also applies to: 226-226


1027-1040: LGTM: MatchPhrase schema definition is well-structured.

The new MatchPhrase schema properly defines:

  • A required "phrase" field of type string
  • Clear description for full-text phrase matching
  • Consistent structure with other match schemas

1534-1544: LGTM: Binary quantization enhancements properly implemented.

The new optional fields for binary quantization configuration are well-defined:

  • encoding: References the new BinaryQuantizationEncoding enum
  • query_encoding: References BinaryQuantizationQueryEncoding enum with descriptive comment about asymmetric quantization
  • Both fields are properly nullable with appropriate defaults

1549-1558: LGTM: Binary quantization enums are well-defined.

The new enum schemas provide comprehensive options:

  • BinaryQuantizationEncoding: Supports various bit encodings (one_bit, two_bits, one_and_half_bits)
  • BinaryQuantizationQueryEncoding: Covers different query encoding strategies (default, binary, scalar variants)

2277-2296: LGTM: IntegerIndexParams descriptions improved with explicit defaults.

The updated descriptions now clearly specify default values for boolean flags:

  • lookup: "Default is true."
  • range: "Default is true."
  • is_principal: "Default is false."
  • on_disk: "Default is false."

This improves API documentation clarity.


2326-2361: LGTM: Comprehensive language enum for text processing.

The Language enum provides extensive language support covering major world languages including:

  • European languages (English, French, German, Spanish, etc.)
  • Asian languages (Chinese, Japanese, Arabic, etc.)
  • Regional variants (Hinglish, Tajik, etc.)

The enum is well-structured and comprehensive.


2368-2404: LGTM: Snowball stemming schema properly implemented.

The new schemas are well-designed:

  • Snowball: Simple enum for stemmer type
  • SnowballLanguage: Comprehensive list of supported stemmer languages
  • SnowballParams: Proper object schema requiring both type and language

The structure follows established patterns and provides clear language support.


2405-2426: LGTM: StopwordsSet schema provides flexible configuration.

The schema allows for both predefined and custom stopwords:

  • languages: Array of predefined Language enum values
  • custom: Array of custom stopword strings
  • Both fields are optional, providing flexibility

This design supports various text processing use cases.


2454-2480: LGTM: TextIndexParams enhanced with comprehensive text processing features.

The new optional fields significantly expand text processing capabilities:

  • phrase_matching: Boolean flag for phrase matching support
  • stopwords: Flexible stopwords configuration (Language enum or StopwordsSet)
  • stemmer: Snowball stemming algorithm configuration

All fields are properly nullable with clear descriptions. The integration maintains backward compatibility while adding powerful new features.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
tests/conversions/test_validate_conversions.py (1)

275-275: Replace deprecated datetime.utcnow() with timezone-aware alternative.

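datetime.utcnow() is deprecated as of Python 3.12 and returns a naive datetime. Replace it with datetime.now(timezone.utc), which returns a timezone-aware value. Note that timezone must also be imported from the datetime module for the replacement to work.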

Apply this diff to fix the deprecation warning:

-        datetime.utcnow(),
+        datetime.now(timezone.utc),
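
A minimal standard-library sketch of what the replacement changes. The variable names are illustrative and are not taken from the test file.

```python
from datetime import datetime, timezone

# Timezone-aware replacement for the deprecated datetime.utcnow().
aware_now = datetime.now(timezone.utc)

# The aware value carries its UTC offset explicitly.
assert aware_now.tzinfo is timezone.utc
assert aware_now.utcoffset().total_seconds() == 0

# Stripping tzinfo yields the naive form that utcnow() used to return,
# in case surrounding code still compares against naive datetimes.
naive_now = aware_now.replace(tzinfo=None)
assert naive_now.tzinfo is None
```

Whether the fixture at line 275 needs the aware or the naive form depends on how the test consumes it, which this page does not show.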
🧹 Nitpick comments (4)
qdrant_client/embed/_inspection_cache.py (1)

2295-2295: Remove redundant default value in description.

The description contains duplicate default value information.

-                "description": "If true, store the index on disk. Default: false. Default is false.",
+                "description": "If true, store the index on disk. Default is false.",
qdrant_client/http/models/models.py (3)

80-84: Consider adding documentation for the unconventional encoding value.

The ONE_AND_HALF_BITS encoding value is non-standard and could be confusing. Consider adding a docstring to explain what this encoding represents and how 1.5 bits are achieved in practice.

 class BinaryQuantizationEncoding(str, Enum):
+    """
+    Binary quantization encoding types.
+    
+    - ONE_BIT: Standard 1-bit quantization
+    - TWO_BITS: Standard 2-bit quantization  
+    - ONE_AND_HALF_BITS: [Add explanation of how 1.5 bits encoding works]
+    """
     ONE_BIT = "one_bit"
     TWO_BITS = "two_bits"
     ONE_AND_HALF_BITS = "one_and_half_bits"

1329-1331: Fix redundant default value in description.

The on_disk field description contains duplicate default value information.

     on_disk: Optional[bool] = Field(
-        default=None, description="If true, store the index on disk. Default: false. Default is false."
+        default=None, description="If true, store the index on disk. Default is false."
     )

3585-3587: Consider future extensibility of StemmingAlgorithm union.

The StemmingAlgorithm union currently has only SnowballParameters as a member. This design suggests future stemming algorithms may be added.

If no other stemming algorithms are planned, consider using SnowballParameters directly instead of a union type to simplify the API.
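
For context, a one-member Union adds no runtime indirection, because Python's typing module collapses Union[X] to X. The sketch below uses a simplified stand-in dataclass instead of the real Pydantic model (the field set is an assumption); only the shape of the alias mirrors the comment above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class SnowballParameters:
    """Simplified stand-in; the real class is a Pydantic model."""
    language: str


# A Union with a single argument collapses to that argument.
StemmingAlgorithm = Union[SnowballParameters]
assert StemmingAlgorithm is SnowballParameters


def describe(stemmer: StemmingAlgorithm) -> str:
    # Code annotated against the alias keeps working unchanged
    # if a second member is added to the Union later.
    return f"snowball:{stemmer.language}"


label = describe(SnowballParameters(language="english"))
```

Keeping the alias therefore trades a slightly longer type name for the option to add algorithms later without touching call sites.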

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf515ae and 3b1c967.

📒 Files selected for processing (10)
  • qdrant_client/conversions/conversion.py (7 hunks)
  • qdrant_client/embed/_inspection_cache.py (8 hunks)
  • qdrant_client/grpc/collections_pb2.py (10 hunks)
  • qdrant_client/grpc/collections_pb2.pyi (8 hunks)
  • qdrant_client/grpc/points_pb2.pyi (22 hunks)
  • qdrant_client/http/models/models.py (23 hunks)
  • qdrant_client/proto/collections.proto (18 hunks)
  • qdrant_client/proto/points.proto (9 hunks)
  • tests/conversions/fixtures.py (9 hunks)
  • tests/conversions/test_validate_conversions.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
tests/conversions/fixtures.py (1)
qdrant_client/http/models/models.py (6)
  • FieldCondition (791-812)
  • BinaryQuantization (67-68)
  • BinaryQuantizationEncoding (80-83)
  • BinaryQuantizationQueryEncoding (86-90)
  • TextIndexParams (3066-3082)
  • SnowballParameters (2792-2794)
qdrant_client/grpc/collections_pb2.py (1)
qdrant_client/http/models/models.py (5)
  • BinaryQuantizationEncoding (80-83)
  • BinaryQuantizationQueryEncoding (86-90)
  • StopwordsSet (2947-2949)
  • TextIndexParams (3066-3082)
  • SnowballParameters (2792-2794)
qdrant_client/conversions/conversion.py (1)
qdrant_client/http/models/models.py (8)
  • MatchPhrase (1482-1487)
  • StopwordsSet (2947-2949)
  • Language (1366-1396)
  • SnowballParameters (2792-2794)
  • Snowball (2759-2760)
  • SnowballLanguage (2763-2789)
  • BinaryQuantizationEncoding (80-83)
  • BinaryQuantizationQueryEncoding (86-90)
🪛 GitHub Actions: Integration tests
tests/conversions/test_validate_conversions.py

[warning] 275-275: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal. Use timezone-aware objects instead.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Redirect rules - poetic-froyo-8baba7
  • GitHub Check: Header rules - poetic-froyo-8baba7
  • GitHub Check: Pages changed - poetic-froyo-8baba7
🔇 Additional comments (54)
tests/conversions/test_validate_conversions.py (1)

442-506: Well-structured test coverage for stopwords conversion.

The test function comprehensively covers various stopwords configurations including language enums, custom stopwords, and mixed combinations. The conversion verification pattern is consistent with other tests in the file.

qdrant_client/proto/points.proto (2)

1084-1084: Good addition of phrase matching support.

The new phrase field in the Match message follows the existing pattern and enables phrase text matching in filters, which is a valuable feature enhancement.


1171-1191: Excellent telemetry architecture improvement.

The consolidation of telemetry data into a composite Usage message that includes both HardwareUsage and InferenceUsage provides a more extensible and organized structure for tracking various metrics. The addition of inference usage tracking with model-specific token counts is particularly valuable for monitoring AI workloads.

Also applies to: 833-833, 890-890, 896-896, 902-902, 908-908, 918-918, 924-924, 930-930, 937-937, 956-956, 962-962, 968-968, 974-974, 980-980, 986-986, 992-992, 1003-1003, 1009-1009

qdrant_client/grpc/collections_pb2.py (1)

18-18: Generated protobuf code correctly reflects schema updates.

The generated Python code properly incorporates all the new protobuf definitions including:

  • Binary quantization encoding enums and query encoding settings
  • Text indexing enhancements with stopwords, stemming, and phrase matching support
  • All message types are correctly registered and serialized offsets are properly updated

As this is generated code, no manual modifications should be made to this file.

Also applies to: 36-37, 81-83, 136-136, 159-163, 200-200, 400-406, 563-590, 860-887

tests/conversions/fixtures.py (6)

37-37: LGTM: New phrase matching fixture added.

The new match_phrase fixture correctly demonstrates the phrase matching capability added to the Match message.


50-50: LGTM: Phrase matching field condition fixture.

The field condition correctly uses the phrase match fixture, maintaining consistency with other match condition patterns.


124-144: LGTM: Phrase condition integration in filter.

The phrase condition is properly integrated into the filter structure, following the established pattern for other condition types.


225-251: LGTM: Comprehensive binary quantization encoding fixtures.

The fixtures cover all encoding and query encoding combinations effectively:

  • binary_quantization_w_encodings_0: OneBit with Default query encoding
  • binary_quantization_w_encodings_1: TwoBits with Binary query encoding
  • binary_quantization_w_encodings_2: OneAndHalfBits with Scalar4Bits query encoding
  • binary_quantization_w_encodings_3: Only query encoding with Scalar8Bits

This provides thorough test coverage for the new binary quantization features.


530-536: LGTM: Enhanced text index parameters fixtures.

The new fixtures demonstrate the expanded text indexing capabilities:

  • text_index_params_5: Includes phrase matching, on-disk storage, and Snowball stemmer
  • text_index_params_6: Simple boolean flags for phrase matching and on-disk storage

These fixtures provide good coverage of the new text indexing features.


634-637: LGTM: Fixture list updates are comprehensive.

All new fixtures are properly added to the respective fixture collections:

  • Text index params include the new variants
  • Quantization config fixtures include all binary encoding variations

This ensures the new features will be tested through the conversion validation system.

Also applies to: 651-658, 789-800

qdrant_client/proto/collections.proto (6)

297-301: LGTM: Well-defined binary quantization encoding enum.

The BinaryQuantizationEncoding enum properly defines the three encoding options with appropriate 0-based indexing:

  • OneBit = 0
  • TwoBits = 1
  • OneAndHalfBits = 2

This follows protobuf best practices for enum definitions.


318-323: LGTM: Proper extension of BinaryQuantization message.

The new fields are correctly added with proper numbering and optional modifiers:

  • encoding = 2 (optional BinaryQuantizationEncoding)
  • query_encoding = 3 (optional BinaryQuantizationQueryEncoding)

The field numbering is sequential and the optional modifiers are appropriate for these new features.


488-491: LGTM: Well-structured StopwordsSet message.

The message provides flexible stopwords configuration with:

  • languages = 1 for predefined language stopwords
  • custom = 2 for custom stopword lists

Both fields use repeated string which is appropriate for list-based configuration.


499-501: LGTM: Logical extension of TextIndexParams.

The new fields enhance text indexing capabilities appropriately:

  • stopwords = 6 (optional StopwordsSet)
  • phrase_matching = 7 (optional bool)
  • stemmer = 8 (optional StemmingAlgorithm)

Field numbering continues sequentially from existing fields, and all are properly marked as optional.


504-512: LGTM: Well-designed stemming algorithm hierarchy.

The stemming implementation uses proper protobuf patterns:

  • StemmingAlgorithm with oneof for extensibility
  • SnowballParameters with language string field
  • Field numbering is correct (snowball = 1, language = 1)

This design allows for future stemming algorithms to be added easily.


303-314: Confirm unusual field numbering in BinaryQuantizationQueryEncoding

  • We searched all .proto files, and this is the only occurrence of setting = 4.
  • Within message BinaryQuantizationQueryEncoding, field numbers 1–3 are not reserved.

Please verify whether skipping 1–3 is intentional (e.g. reserving those IDs for future oneof variants).

  • If it is, consider adding a comment or a reserved 1, 2, 3; block to document the rationale.
  • Otherwise, renumber Setting setting = 4; to 1 for consistency.

qdrant_client/embed/_inspection_cache.py (7)

80-80: MatchPhrase schema integration looks good.

The new phrase matching feature is properly integrated with consistent schema definition and appropriate field condition reference.

Also applies to: 585-585, 1027-1040


1535-1544: Binary quantization configuration enhancements are well-implemented.

The new encoding options provide good flexibility for binary quantization with clear enum values and helpful descriptions.

Also applies to: 1549-1558


2326-2361: Language enum is comprehensive and well-formatted.

Good coverage of languages with consistent lowercase formatting.


2368-2404: Snowball stemmer configuration is properly structured.

The schema correctly defines parameters for the Snowball stemming algorithm with appropriate language options.


2405-2426: StopwordsSet schema provides good flexibility.

The schema appropriately supports both predefined language stopwords and custom stopword lists.


2454-2480: TextIndexParams enhancements are well-designed.

The new fields for phrase matching, stopwords, and stemming provide comprehensive text indexing configuration options with clear descriptions and appropriate defaults.


219-219: New schema entries follow consistent patterns.

The SnowballParameters and StopwordsSet entries are properly added to CACHE_STR_PATH.

Also applies to: 226-226

qdrant_client/grpc/points_pb2.pyi (5)

3627-3638: LGTM! Usage field type updated consistently.

The change from HardwareUsage to the new composite Usage type is appropriate and aligns with the broader enhancement to support both hardware and inference usage metrics.


3828-3838: Consistent usage field updates across all response messages.

All response message types have been uniformly updated to use the new composite Usage type instead of HardwareUsage. This maintains API consistency across the codebase.

Also applies to: 3853-3863, 3878-3888, 3903-3913, 3943-3953, 3968-3978, 3993-4003, 4022-4036, 4122-4132, 4147-4157, 4172-4182, 4197-4207, 4222-4232, 4247-4257, 4316-4326, 4341-4351


4266-4282: UpdateBatchResponse now includes usage metrics.

Good addition - this brings UpdateBatchResponse in line with other response types by including usage telemetry. The optional field pattern is correctly implemented.


4603-4641: Phrase matching support added to filters.

The new phrase field is properly integrated into the Match message's match_value oneof group, enabling phrase-based text matching in filter conditions.


4965-5044: Well-structured usage telemetry types added.

The new Usage, InferenceUsage, and ModelUsage message types provide a clean hierarchy for tracking both hardware and model inference metrics. The optional field patterns are correctly implemented.

qdrant_client/grpc/collections_pb2.pyi (7)

183-198: LGTM! Binary quantization encoding enum is well-defined.

The new BinaryQuantizationEncoding enum with values OneBit, TwoBits, and OneAndHalfBits provides clear options for binary quantization encoding methods.


1783-1789: Good documentation improvements for default values.

Adding default value information to the docstrings for lookup, range, is_principal, and on_disk fields improves API clarity.


1061-1092: Well-structured query encoding configuration.

The BinaryQuantizationQueryEncoding message with its nested Setting enum provides flexible options for asymmetric quantization, allowing queries to use different encoding than stored vectors.


1098-1124: Proper extension of BinaryQuantization with encoding options.

The addition of encoding and query_encoding fields enables comprehensive binary quantization configuration. The documentation clearly explains the asymmetric quantization feature.


1852-1871: Flexible stopwords configuration design.

The StopwordsSet message appropriately uses repeated string fields to support both language-based stopwords (languages) and custom stopword lists (custom).


1881-1929: Comprehensive text indexing enhancements.

The additions of stopwords, phrase_matching, and stemmer fields properly extend TextIndexParams with essential NLP features for improved text search capabilities.


1933-1964: Extensible stemming algorithm configuration.

The StemmingAlgorithm message uses the oneof pattern effectively, allowing for future stemming algorithm additions. The SnowballParameters properly captures the language requirement for Snowball stemming.

qdrant_client/http/models/models.py (10)

45-46: LGTM! Telemetry fields added correctly.

The new optional fields runtime_features and hnsw_global_config enhance telemetry capabilities by tracking feature flags and HNSW configuration.


73-77: Well-designed binary quantization configuration fields.

The optional encoding and query_encoding fields provide flexible quantization options. The description for query_encoding effectively explains the accuracy vs. performance trade-off.


761-789: Well-structured feature flags model.

The FeatureFlags model provides clear control over various runtime features with sensible defaults. The documentation for each flag explains its purpose and implementation timeline.


1054-1059: Clear and well-documented HNSW configuration.

The HnswGlobalConfig model provides a simple interface for controlling HNSW healing behavior with a sensible default threshold.


1366-1397: Comprehensive language support enum.

The Language enum provides extensive language coverage including regional variants like Hinglish. The lowercase naming convention is consistent throughout.


1482-1488: Clean implementation of phrase matching.

The MatchPhrase model provides a clear interface for full-text phrase matching, complementing the existing text matching capabilities.


2899-2900: Good improvement to storage type descriptions.

The updated descriptions are more implementation-agnostic, focusing on storage behavior rather than specific technologies. This improves API documentation stability.

Also applies to: 2910-2911


2947-2950: Flexible stopwords configuration model.

The StopwordsSet model elegantly combines predefined language-based stopwords with custom stopword lists, providing maximum flexibility for text indexing.


3072-3082: Excellent text indexing enhancements.

The new fields (phrase_matching, stopwords, stemmer) significantly expand text search capabilities while maintaining backward compatibility through optional fields with clear default behaviors.


3185-3196: Well-designed composite usage tracking model.

The Usage model properly separates hardware and inference usage tracking, providing more granular insights while maintaining backward compatibility through optional fields.

qdrant_client/conversions/conversion.py (9)

677-679: LGTM! Phrase match conversion follows the established pattern.

The implementation correctly handles the new "phrase" match type consistent with other match types in the method.


1579-1586: LGTM! Text index params conversion properly handles new fields.

The implementation correctly adds support for the new fields (phrase_matching, stopwords, on_disk, stemmer) with appropriate null checks using HasField.


1587-1595: Note the dual return type behavior for stopwords conversion.

The convert_stopwords method returns either a single Language enum (when there's exactly one language and no custom words) or a StopwordsSet object. This optimization is valid but ensure that consumers of the REST API can handle both return types for the stopwords field.
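
The dual return type described above can be sketched as follows. All classes here are hypothetical stand-ins written for illustration (the real ones are generated protobuf and Pydantic types), and the assumption that the gRPC strings match the lowercase REST enum values is not confirmed by this review.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class Language(str, Enum):
    # Two members only; the real enum lists many more languages.
    ENGLISH = "english"
    GERMAN = "german"


@dataclass
class GrpcStopwordsSet:
    """Stand-in for the gRPC message: two repeated string fields."""
    languages: List[str] = field(default_factory=list)
    custom: List[str] = field(default_factory=list)


@dataclass
class RestStopwordsSet:
    """Stand-in for the REST model."""
    languages: List[Language] = field(default_factory=list)
    custom: List[str] = field(default_factory=list)


def convert_stopwords(msg: GrpcStopwordsSet) -> Union[Language, RestStopwordsSet]:
    # Exactly one language and no custom words: return the bare enum.
    if len(msg.languages) == 1 and not msg.custom:
        return Language(msg.languages[0])
    # Anything else keeps the full set.
    return RestStopwordsSet(
        languages=[Language(name) for name in msg.languages],
        custom=list(msg.custom),
    )


def languages_of(stopwords: Union[Language, RestStopwordsSet]) -> List[Language]:
    # A consumer has to branch on the type to read the value uniformly.
    if isinstance(stopwords, Language):
        return [stopwords]
    return stopwords.languages
```

The helper languages_of shows the cost of the shortcut: every reader of the field needs an isinstance check.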


1762-1768: LGTM! Binary quantization config properly handles new encoding fields.

The implementation correctly adds support for encoding and query_encoding fields with appropriate null checks.


1770-1809: LGTM! Binary quantization encoding conversions are well-implemented.

The conversion methods correctly map between gRPC and REST enum values with proper error handling for invalid cases.
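
As a rough illustration of this kind of enum mapping with error handling, the sketch below uses hypothetical stand-in enums (`GrpcEncoding`, `RestEncoding`) and an invented `UNSUPPORTED` member to trigger the error path. It is not the code from conversion.py, only one way such a mapping could be written.

```python
from enum import Enum


class GrpcEncoding(Enum):
    # Hypothetical stand-in for the gRPC enum; values are illustrative
    ONE_BIT = 0
    TWO_BITS = 1
    ONE_AND_HALF_BITS = 2
    UNSUPPORTED = 99  # invented member, used only to show the error path


class RestEncoding(Enum):
    # Hypothetical stand-in for the REST enum
    ONE_BIT = "one_bit"
    TWO_BITS = "two_bits"
    ONE_AND_HALF_BITS = "one_and_half_bits"


_GRPC_TO_REST = {
    GrpcEncoding.ONE_BIT: RestEncoding.ONE_BIT,
    GrpcEncoding.TWO_BITS: RestEncoding.TWO_BITS,
    GrpcEncoding.ONE_AND_HALF_BITS: RestEncoding.ONE_AND_HALF_BITS,
}
# The reverse table is derived, so the two directions cannot drift apart
_REST_TO_GRPC = {rest: grpc for grpc, rest in _GRPC_TO_REST.items()}


def grpc_to_rest(model: GrpcEncoding) -> RestEncoding:
    try:
        return _GRPC_TO_REST[model]
    except KeyError:
        # Unmapped values fail loudly instead of returning a wrong encoding
        raise ValueError(f"invalid encoding: {model}") from None


def rest_to_grpc(model: RestEncoding) -> GrpcEncoding:
    return _REST_TO_GRPC[model]
```

Deriving the reverse table from the forward one is a design choice of this sketch; the PR may define both directions explicitly.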


3015-3017: LGTM! Reverse conversion for MatchPhrase is consistent.

The implementation correctly converts REST MatchPhrase to gRPC Match following the established pattern.


3948-3954: LGTM! Reverse conversion for text index params handles all new fields.

The implementation correctly converts all new fields with appropriate null checks, maintaining bidirectional consistency.


3956-3977: LGTM! Reverse conversions maintain consistency with forward conversions.

The stopwords conversion correctly handles both Language and StopwordsSet types, maintaining compatibility with the forward conversion's dual return type behavior.


4097-4147: LGTM! Binary quantization reverse conversions complete the bidirectional support.

The implementation correctly maps REST enum values back to gRPC representations with proper null checks and error handling.

service_debug_feature: bool = Field(..., description="")
recovery_mode: bool = Field(..., description="")
gpu: bool = Field(..., description="")
rocksdb: bool = Field(..., description="")

💡 Verification agent

🧩 Analysis chain

Verify backward compatibility for the new required field.

Adding a required rocksdb field to AppFeaturesTelemetry could break existing clients that don't provide this field. Consider making it optional with a default value for backward compatibility.


🏁 Script executed:

#!/bin/bash
# Check if AppFeaturesTelemetry is used in the codebase and how it's instantiated
rg -A 5 "AppFeaturesTelemetry" --type py

Length of output: 1258


Fix required: Ensure backward compatibility for rocksdb in AppFeaturesTelemetry

Adding a required field will cause Pydantic to raise validation errors if the server omits rocksdb. To avoid breaking existing deployments, make the field optional or give it a default:

• File: qdrant_client/http/models/models.py (around line 58)
• Current declaration:

    rocksdb: bool = Field(..., description="")

• Option 1 – default to False:

-    rocksdb: bool = Field(..., description="")
+    rocksdb: bool = Field(default=False, description="Indicates whether RocksDB is enabled")

• Option 2 – allow missing values:

-    rocksdb: bool = Field(..., description="")
+    rocksdb: Optional[bool] = Field(default=None, description="Indicates whether RocksDB is enabled")

Confirm whether the server will always return this flag; if so, a default may suffice, otherwise make it optional.
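
The difference between a required and an optional field can be sketched without Pydantic. The example below uses stdlib dataclasses as a stand-in, so the failure is a `TypeError` rather than Pydantic's `ValidationError`; the class names and the `old_payload` dictionary are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StrictTelemetry:
    # Hypothetical stand-in: every field is required, like Field(...)
    gpu: bool
    rocksdb: bool


@dataclass
class LenientTelemetry:
    # Hypothetical stand-in: rocksdb may be absent, like Optional[bool] = None
    gpu: bool
    rocksdb: Optional[bool] = None


# Payload from an imagined older server that does not report rocksdb
old_payload = {"gpu": False}

try:
    StrictTelemetry(**old_payload)
    strict_ok = True
except TypeError:
    # The required field is missing, so construction fails
    strict_ok = False

# The lenient variant accepts the same payload and leaves rocksdb unset
lenient = LenientTelemetry(**old_payload)
```

With `None` as the default, a caller can still distinguish "server did not report the flag" from "RocksDB is disabled", which a default of `False` would hide.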

🤖 Prompt for AI Agents
In qdrant_client/http/models/models.py at line 58, the rocksdb field in
AppFeaturesTelemetry is currently required, which can cause validation errors if
the server omits it. To fix this and maintain backward compatibility, make the
rocksdb field optional by either providing a default value like False or by
allowing it to be None. Choose the approach based on whether the server always
returns this flag; if always present, set a default False, otherwise make it
Optional[bool] with a default of None.

Comment on lines +1558 to +1560
if model == grpc.Unknown:
    return None

💡 Verification agent

🧩 Analysis chain

Verify the handling of Unknown tokenizer type.

The method returns None for grpc.Unknown tokenizer type. Please ensure this is the expected behavior and won't cause issues in downstream code that consumes the converted tokenizer type.


🏁 Script executed:

#!/bin/bash
# Description: Check how the tokenizer type is used after conversion and if None is properly handled

# Search for usages of convert_tokenizer_type result
rg -A 5 "convert_tokenizer_type\(" --type py

# Search for TextIndexParams usage to understand how tokenizer field is handled
rg -A 10 "TextIndexParams.*tokenizer" --type py

Length of output: 31281


Duplicate convert_tokenizer_type definitions override intended mapping

The convert_tokenizer_type method is defined twice in qdrant_client/conversions/conversion.py—first for grpc→rest (lines ~1558–1562) and then again for rest→grpc (around line 1600). The second definition shadows the first at runtime, so the grpc→rest logic (including the Unknown → None case) never executes.

• File: qdrant_client/conversions/conversion.py
– grpc→rest definition at lines 1558–1562
– rest→grpc definition at lines 1600+

Please rename or split these methods (e.g. convert_tokenizer_type_grpc_to_rest and convert_tokenizer_type_rest_to_grpc) or otherwise refactor so both mappings are available and the grpc→rest behavior works as intended.
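
The shadowing behaviour described here is plain Python semantics and can be shown in isolation. In the sketch below all class names are hypothetical. Note that it only applies when both definitions live in the same class body; methods with the same name in two separate classes do not collide.

```python
class SingleClassSketch:
    # Hypothetical class with two same-named methods in one body

    @classmethod
    def convert_tokenizer_type(cls, model):
        # First definition (imagined gRPC -> REST direction)
        return "grpc_to_rest"

    @classmethod
    def convert_tokenizer_type(cls, model):  # noqa: F811
        # Second definition rebinds the name and replaces the first
        return "rest_to_grpc"


class GrpcToRestSketch:
    # Separate class: its method name lives in its own namespace
    @classmethod
    def convert_tokenizer_type(cls, model):
        return "grpc_to_rest"


class RestToGrpcSketch:
    @classmethod
    def convert_tokenizer_type(cls, model):
        return "rest_to_grpc"


shadowed = SingleClassSketch.convert_tokenizer_type("word")
forward = GrpcToRestSketch.convert_tokenizer_type("word")
reverse = RestToGrpcSketch.convert_tokenizer_type("word")
```

Whether the rename is needed therefore depends on whether the two definitions in conversion.py actually share a class, which is worth checking before acting on the suggestion.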

🤖 Prompt for AI Agents
In qdrant_client/conversions/conversion.py around lines 1558 to 1562 and again
near line 1600, there are two definitions of the convert_tokenizer_type method
causing the second to override the first, preventing grpc to rest conversion
logic from running. Rename the first method to
convert_tokenizer_type_grpc_to_rest and the second to
convert_tokenizer_type_rest_to_grpc, or otherwise refactor so both conversion
functions coexist without overriding each other, ensuring grpc→rest mapping
including the Unknown to None case works correctly.

@generall generall force-pushed the update-models-1.15 branch from 4c65764 to 64d9b3c Compare July 15, 2025 11:30
@generall generall merged commit 4da06b0 into dev Jul 15, 2025
14 checks passed
joein added a commit that referenced this pull request Jul 18, 2025
* new: update models for v1.15, add conversions

* rename SnowballParameters to SnowballParams

---------

Co-authored-by: Luis Cossío <[email protected]>
@coderabbitai coderabbitai Bot mentioned this pull request Nov 14, 2025
@coderabbitai coderabbitai Bot mentioned this pull request Dec 2, 2025