fix: fix edge case in upsert, fix multivec update, add tests by joein · Pull Request #1030 · qdrant/qdrant-client

joein · 2025-06-24T10:13:33Z

netlify · 2025-06-24T10:13:37Z

✅ Deploy Preview for poetic-froyo-8baba7 ready!

Name	Link
🔨 Latest commit	`8c35854`
🔍 Latest deploy log	https://app.netlify.com/projects/poetic-froyo-8baba7/deploys/685a7d3df4fc6c0008a86f89
😎 Deploy Preview	https://deploy-preview-1030--poetic-froyo-8baba7.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

coderabbitai · 2025-06-24T10:16:43Z

📝 Walkthrough

Walkthrough

The changes update the internal logic of the LocalCollection class in qdrant_client/local/local_collection.py by moving the assignment of updated vector arrays outside conditional blocks in the _add_point method to ensure consistent storage. The _update_named_vectors method's type hint for the vectors parameter is expanded to include multi-dimensional lists, and its logic is simplified to handle different vector types more clearly, including cosine normalization and deletion mask updates. Additionally, a new test function test_update_vectors is added in tests/congruence_tests/test_updates.py to validate vector updates across various vector configurations, comparing local and remote client behavior. No existing public function signatures are removed or altered except for the expanded type hint in _update_named_vectors.

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/congruence_tests/test_updates.py (1)
374-374: Consider using more descriptive test data.

All test scenarios use empty vector dictionaries {} for initial point creation. While this tests the edge case mentioned in the PR objectives, consider adding a comment or using a more descriptive variable name to clarify the intent.
-    points = [models.PointStruct(id=1, vector={})]
+    # Test updating vectors on points initially created with empty vectors
+    empty_vector_points = [models.PointStruct(id=1, vector={})]
Also applies to: 406-406, 442-442, 471-471

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b41e974 and b0d7ffd.

📒 Files selected for processing (2)

qdrant_client/local/local_collection.py (5 hunks)
tests/congruence_tests/test_updates.py (1 hunks)

🧰 Additional context used

🪛 Pylint (3.3.7)

tests/congruence_tests/test_updates.py

[refactor] 363-363: Too many statements (51/50)

(R0915)

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: Redirect rules - poetic-froyo-8baba7
GitHub Check: Python 3.9.x on ubuntu-latest test
GitHub Check: Python 3.13.x on ubuntu-latest test
GitHub Check: Python 3.12.x on ubuntu-latest test
GitHub Check: Python 3.10.x on ubuntu-latest test
GitHub Check: Python 3.11.x on ubuntu-latest test
GitHub Check: Header rules - poetic-froyo-8baba7
GitHub Check: Pages changed - poetic-froyo-8baba7

🔇 Additional comments (5)

qdrant_client/local/local_collection.py (4)

207-207: Critical fix: Ensure resized dense vector arrays are stored back.

This assignment is essential after the potential array resizing operation. Without it, the resized array would remain a local variable and the main self.vectors dictionary would retain the old, smaller array, leading to inconsistent state.

231-231: Critical fix: Ensure extended sparse vector lists are stored back.

This assignment ensures that after potentially extending the sparse vector list with empty vectors, the updated list is stored back to the main self.sparse_vectors dictionary, maintaining storage consistency.

259-259: Critical fix: Ensure extended multivector lists are stored back.

This assignment ensures that after potentially extending the multivector list with empty arrays, the updated list is stored back to the main self.multivectors dictionary, completing the consistent storage pattern across all vector types.

341-370: Excellent refactoring: Improved type safety and streamlined vector update logic.

The changes significantly improve the method:

Type hint expansion: Correctly includes list[list[float]] for multivector support

Streamlined logic: Removes redundant conditional checks and always validates vectors through numpy array conversion

Cleaner separation: Sparse vectors are handled separately with early return, making the flow clearer

Consistent normalization: Cosine normalization is properly applied for both dense and multivectors

Better maintainability: The simplified logic reduces potential edge cases and makes the code easier to understand

The refactoring addresses the multivec update issues mentioned in the PR objectives while improving overall code quality.

tests/congruence_tests/test_updates.py (1)

448-448: Verify multivector format aligns with the fix.

The multivector test uses [[0.2, 0.3], [0.4, 0.5]] format. Given the PR summary mentions fixing multivec update and the AI summary mentions expanded type hints for multi-dimensional lists, this appears to be the correct format for testing the fix.

coderabbitai · 2025-06-24T10:16:46Z

+def test_update_vectors():
+    local_client = init_local()
+    remote_client = init_remote()
+
+    # region unnamed vector in an empty collection
+    vectors_config = models.VectorParams(size=2, distance=models.Distance.DOT)
+    local_client.create_collection(collection_name=COLLECTION_NAME, vectors_config=vectors_config)
+    if remote_client.collection_exists(collection_name=COLLECTION_NAME):
+        remote_client.delete_collection(collection_name=COLLECTION_NAME)
+    remote_client.create_collection(collection_name=COLLECTION_NAME, vectors_config=vectors_config)
+
+    points = [models.PointStruct(id=1, vector={})]
+
+    local_client.upsert(COLLECTION_NAME, points=points)
+    remote_client.upsert(COLLECTION_NAME, points=points, wait=True)
+
+    local_client.update_vectors(
+        COLLECTION_NAME, points=[models.PointVectors(id=1, vector=[0.2, 0.3])]
+    )
+    remote_client.update_vectors(
+        COLLECTION_NAME,
+        points=[models.PointVectors(id=1, vector=[0.2, 0.3])],
+    )
+
+    compare_collections(
+        local_client,
+        remote_client,
+        10,
+        collection_name=COLLECTION_NAME,
+    )
+    local_client.delete_collection(collection_name=COLLECTION_NAME)
+    remote_client.delete_collection(collection_name=COLLECTION_NAME)
+    # endregion
+
+    # region sparse vector in an empty collection
+    sparse_vectors_config = {"sparse": models.SparseVectorParams()}
+    local_client.create_collection(
+        collection_name=COLLECTION_NAME, sparse_vectors_config=sparse_vectors_config
+    )
+    remote_client.create_collection(
+        collection_name=COLLECTION_NAME, sparse_vectors_config=sparse_vectors_config
+    )
+
+    points = [models.PointStruct(id=1, vector={})]
+    local_client.upsert(COLLECTION_NAME, points=points)
+    remote_client.upsert(
+        COLLECTION_NAME,
+        points=points,
+    )
+
+    sparse_points = [
+        models.PointVectors(
+            id=1,
+            vector={"sparse": models.SparseVector(indices=[0, 1], values=[0.2, 0.3])},
+        )
+    ]
+    local_client.update_vectors(COLLECTION_NAME, points=sparse_points)
+    remote_client.update_vectors(COLLECTION_NAME, points=sparse_points)
+    # endregion
+
+    # region multivector in an empty collection
+    local_client.delete_collection(collection_name=COLLECTION_NAME)
+    remote_client.delete_collection(collection_name=COLLECTION_NAME)
+
+    multivectors_config = models.VectorParams(
+        size=2,
+        distance=models.Distance.DOT,
+        multivector_config=models.MultiVectorConfig(
+            comparator=models.MultiVectorComparator.MAX_SIM
+        ),
+    )
+
+    local_client.create_collection(
+        collection_name=COLLECTION_NAME, vectors_config=multivectors_config
+    )
+    remote_client.create_collection(
+        collection_name=COLLECTION_NAME, vectors_config=multivectors_config
+    )
+
+    points = [models.PointStruct(id=1, vector={})]
+    local_client.upsert(COLLECTION_NAME, points=points)
+    remote_client.upsert(
+        COLLECTION_NAME,
+        points=points,
+    )
+    multivector_points = [models.PointVectors(id=1, vector=[[0.2, 0.3], [0.4, 0.5]])]
+    local_client.update_vectors(COLLECTION_NAME, points=multivector_points)
+    remote_client.update_vectors(COLLECTION_NAME, points=multivector_points)
+    compare_collections(
+        local_client,
+        remote_client,
+        10,
+        collection_name=COLLECTION_NAME,
+    )
+    local_client.delete_collection(collection_name=COLLECTION_NAME)
+    remote_client.delete_collection(collection_name=COLLECTION_NAME)
+    # endregion
+
+    # region named vectors
+    named_vectors_config = {"text": models.VectorParams(size=2, distance=models.Distance.DOT)}
+    local_client.create_collection(
+        collection_name=COLLECTION_NAME,
+        vectors_config=named_vectors_config,
+    )
+    remote_client.create_collection(
+        collection_name=COLLECTION_NAME,
+        vectors_config=named_vectors_config,
+    )
+    points = [models.PointStruct(id=1, vector={})]
+
+    local_client.upsert(COLLECTION_NAME, points=points)
+    remote_client.upsert(
+        COLLECTION_NAME,
+        points=points,
+    )
+    named_vector_points = [
+        models.PointVectors(
+            id=1,
+            vector={"text": [0.2, 0.3]},
+        )
+    ]
+
+    local_client.update_vectors(COLLECTION_NAME, points=named_vector_points)
+    remote_client.update_vectors(COLLECTION_NAME, points=named_vector_points)
+    compare_collections(
+        local_client,
+        remote_client,
+        10,
+        collection_name=COLLECTION_NAME,
+    )
+    local_client.delete_collection(collection_name=COLLECTION_NAME)
+    remote_client.delete_collection(collection_name=COLLECTION_NAME)
+    # endregion


🛠️ Refactor suggestion

Refactor into separate test functions to improve maintainability.

This function violates the single responsibility principle by testing multiple vector update scenarios in one function (51 statements). This makes it harder to identify which specific scenario fails and adds unnecessary complexity.

Consider splitting this into separate, focused test functions:

-def test_update_vectors(): +def test_update_vectors_unnamed(): + # Only test unnamed vector scenario + +def test_update_vectors_sparse(): + # Only test sparse vector scenario + +def test_update_vectors_multivector(): + # Only test multivector scenario + +def test_update_vectors_named(): + # Only test named vector scenario

This approach would:

Make failures easier to debug and isolate

Improve test readability and maintainability

Address the static analysis warning about function complexity

Allow for more targeted test execution

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 363-363: Too many statements (51/50)

(R0915)

🤖 Prompt for AI Agents

In tests/congruence_tests/test_updates.py from lines 363 to 495, the test_update_vectors function tests multiple vector update scenarios in one large function, making it complex and hard to maintain. Refactor by splitting this single function into multiple smaller test functions, each focused on one specific vector update scenario such as unnamed vector, sparse vector, multivector, and named vectors. This will improve readability, isolate failures, and comply with the single responsibility principle.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

tests/congruence_tests/test_updates.py (1)

363-504: Function complexity remains unaddressed from previous review.

The function still violates the single responsibility principle by testing multiple vector update scenarios in one large function (54 statements), making it difficult to identify which specific scenario fails when tests break.

The previous suggestion to split this into separate focused test functions (test_update_vectors_unnamed, test_update_vectors_sparse, test_update_vectors_multivector, test_update_vectors_named) would improve maintainability and debugging.

🧹 Nitpick comments (1)

tests/congruence_tests/test_updates.py (1)
433-434: Inconsistent cleanup approach across test sections.

The multivector section explicitly deletes collections before creating new ones, while other sections rely on the if remote_client.collection_exists() check pattern. This inconsistency could lead to maintenance issues.

Consider standardizing the cleanup approach across all sections:
-    local_client.delete_collection(collection_name=COLLECTION_NAME)
-    remote_client.delete_collection(collection_name=COLLECTION_NAME)
-
     multivectors_config = models.VectorParams(
         size=2,
         distance=models.Distance.DOT,
         multivector_config=models.MultiVectorConfig(
             comparator=models.MultiVectorComparator.MAX_SIM
         ),
     )

     local_client.create_collection(
         collection_name=COLLECTION_NAME, vectors_config=multivectors_config
     )
+    if remote_client.collection_exists(collection_name=COLLECTION_NAME):
+        remote_client.delete_collection(collection_name=COLLECTION_NAME)
     remote_client.create_collection(
         collection_name=COLLECTION_NAME, vectors_config=multivectors_config
     )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0d7ffd and 8c35854.

📒 Files selected for processing (1)

tests/congruence_tests/test_updates.py (1 hunks)

🧰 Additional context used

🪛 Pylint (3.3.7)

tests/congruence_tests/test_updates.py

[refactor] 363-363: Too many statements (54/50)

(R0915)

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: Redirect rules - poetic-froyo-8baba7
GitHub Check: Header rules - poetic-froyo-8baba7
GitHub Check: Python 3.11.x on ubuntu-latest test
GitHub Check: Python 3.13.x on ubuntu-latest test
GitHub Check: Python 3.12.x on ubuntu-latest test
GitHub Check: Python 3.9.x on ubuntu-latest test
GitHub Check: Python 3.10.x on ubuntu-latest test
GitHub Check: Pages changed - poetic-froyo-8baba7

🔇 Additional comments (1)

tests/congruence_tests/test_updates.py (1)

422-429: Good fix: sparse vector section now includes proper comparison and cleanup.

The missing collection comparison and cleanup steps that were flagged in the previous review have been properly addressed.

* fix: fix edge case in upsert, fix multivec update, add tests * fix: fix tests

fix: fix edge case in upsert, fix multivec update, add tests

b0d7ffd

coderabbitai Bot reviewed Jun 24, 2025

View reviewed changes

fix: fix tests

8c35854

coderabbitai Bot reviewed Jun 24, 2025

View reviewed changes

generall approved these changes Jul 10, 2025

View reviewed changes

generall merged commit bb7016f into dev Jul 10, 2025
14 checks passed

joein added a commit that referenced this pull request Jul 18, 2025

fix: fix edge case in upsert, fix multivec update, add tests (#1030)

c58790d

* fix: fix edge case in upsert, fix multivec update, add tests * fix: fix tests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: fix edge case in upsert, fix multivec update, add tests#1030

fix: fix edge case in upsert, fix multivec update, add tests#1030
generall merged 2 commits into
devfrom
fix-multivectors-updates

joein commented Jun 24, 2025

Uh oh!

netlify Bot commented Jun 24, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 24, 2025 •

edited

Loading

Walkthrough

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 24, 2025

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

joein commented Jun 24, 2025

Uh oh!

netlify Bot commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for poetic-froyo-8baba7 ready!

Uh oh!

coderabbitai Bot commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

netlify Bot commented Jun 24, 2025 •

edited

Loading

coderabbitai Bot commented Jun 24, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)