Skip to content

fix: fix edge case in upsert, fix multivec update, add tests#1030

Merged
generall merged 2 commits into
devfrom
fix-multivectors-updates
Jul 10, 2025
Merged

fix: fix edge case in upsert, fix multivec update, add tests#1030
generall merged 2 commits into
devfrom
fix-multivectors-updates

Conversation

@joein

@joein joein commented Jun 24, 2025

Copy link
Copy Markdown
Member

@netlify

netlify Bot commented Jun 24, 2025

Copy link
Copy Markdown

Deploy Preview for poetic-froyo-8baba7 ready!

Name Link
🔨 Latest commit 8c35854
🔍 Latest deploy log https://app.netlify.com/projects/poetic-froyo-8baba7/deploys/685a7d3df4fc6c0008a86f89
😎 Deploy Preview https://deploy-preview-1030--poetic-froyo-8baba7.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai

coderabbitai Bot commented Jun 24, 2025

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

The changes update the internal logic of the LocalCollection class in qdrant_client/local/local_collection.py by moving the assignment of updated vector arrays outside conditional blocks in the _add_point method to ensure consistent storage. The _update_named_vectors method's type hint for the vectors parameter is expanded to include multi-dimensional lists, and its logic is simplified to handle different vector types more clearly, including cosine normalization and deletion mask updates. Additionally, a new test function test_update_vectors is added in tests/congruence_tests/test_updates.py to validate vector updates across various vector configurations, comparing local and remote client behavior. No existing public function signatures are removed or altered except for the expanded type hint in _update_named_vectors.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/congruence_tests/test_updates.py (1)

374-374: Consider using more descriptive test data.

All test scenarios use empty vector dictionaries {} for initial point creation. While this tests the edge case mentioned in the PR objectives, consider adding a comment or using a more descriptive variable name to clarify the intent.

-    points = [models.PointStruct(id=1, vector={})]
+    # Test updating vectors on points initially created with empty vectors
+    empty_vector_points = [models.PointStruct(id=1, vector={})]

Also applies to: 406-406, 442-442, 471-471

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b41e974 and b0d7ffd.

📒 Files selected for processing (2)
  • qdrant_client/local/local_collection.py (5 hunks)
  • tests/congruence_tests/test_updates.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/congruence_tests/test_updates.py

[refactor] 363-363: Too many statements (51/50)

(R0915)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Redirect rules - poetic-froyo-8baba7
  • GitHub Check: Python 3.9.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.12.x on ubuntu-latest test
  • GitHub Check: Python 3.10.x on ubuntu-latest test
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Header rules - poetic-froyo-8baba7
  • GitHub Check: Pages changed - poetic-froyo-8baba7
🔇 Additional comments (5)
qdrant_client/local/local_collection.py (4)

207-207: Critical fix: Ensure resized dense vector arrays are stored back.

This assignment is essential after the potential array resizing operation. Without it, the resized array would remain a local variable and the main self.vectors dictionary would retain the old, smaller array, leading to inconsistent state.


231-231: Critical fix: Ensure extended sparse vector lists are stored back.

This assignment ensures that after potentially extending the sparse vector list with empty vectors, the updated list is stored back to the main self.sparse_vectors dictionary, maintaining storage consistency.


259-259: Critical fix: Ensure extended multivector lists are stored back.

This assignment ensures that after potentially extending the multivector list with empty arrays, the updated list is stored back to the main self.multivectors dictionary, completing the consistent storage pattern across all vector types.


341-370: Excellent refactoring: Improved type safety and streamlined vector update logic.

The changes significantly improve the method:

  1. Type hint expansion: Correctly includes list[list[float]] for multivector support
  2. Streamlined logic: Removes redundant conditional checks and always validates vectors through numpy array conversion
  3. Cleaner separation: Sparse vectors are handled separately with early return, making the flow clearer
  4. Consistent normalization: Cosine normalization is properly applied for both dense and multivectors
  5. Better maintainability: The simplified logic reduces potential edge cases and makes the code easier to understand

The refactoring addresses the multivec update issues mentioned in the PR objectives while improving overall code quality.

tests/congruence_tests/test_updates.py (1)

448-448: Verify multivector format aligns with the fix.

The multivector test uses [[0.2, 0.3], [0.4, 0.5]] format. Given the PR summary mentions fixing multivec update and the AI summary mentions expanded type hints for multi-dimensional lists, this appears to be the correct format for testing the fix.

Comment on lines +363 to +495
def test_update_vectors():
local_client = init_local()
remote_client = init_remote()

# region unnamed vector in an empty collection
vectors_config = models.VectorParams(size=2, distance=models.Distance.DOT)
local_client.create_collection(collection_name=COLLECTION_NAME, vectors_config=vectors_config)
if remote_client.collection_exists(collection_name=COLLECTION_NAME):
remote_client.delete_collection(collection_name=COLLECTION_NAME)
remote_client.create_collection(collection_name=COLLECTION_NAME, vectors_config=vectors_config)

points = [models.PointStruct(id=1, vector={})]

local_client.upsert(COLLECTION_NAME, points=points)
remote_client.upsert(COLLECTION_NAME, points=points, wait=True)

local_client.update_vectors(
COLLECTION_NAME, points=[models.PointVectors(id=1, vector=[0.2, 0.3])]
)
remote_client.update_vectors(
COLLECTION_NAME,
points=[models.PointVectors(id=1, vector=[0.2, 0.3])],
)

compare_collections(
local_client,
remote_client,
10,
collection_name=COLLECTION_NAME,
)
local_client.delete_collection(collection_name=COLLECTION_NAME)
remote_client.delete_collection(collection_name=COLLECTION_NAME)
# endregion

# region sparse vector in an empty collection
sparse_vectors_config = {"sparse": models.SparseVectorParams()}
local_client.create_collection(
collection_name=COLLECTION_NAME, sparse_vectors_config=sparse_vectors_config
)
remote_client.create_collection(
collection_name=COLLECTION_NAME, sparse_vectors_config=sparse_vectors_config
)

points = [models.PointStruct(id=1, vector={})]
local_client.upsert(COLLECTION_NAME, points=points)
remote_client.upsert(
COLLECTION_NAME,
points=points,
)

sparse_points = [
models.PointVectors(
id=1,
vector={"sparse": models.SparseVector(indices=[0, 1], values=[0.2, 0.3])},
)
]
local_client.update_vectors(COLLECTION_NAME, points=sparse_points)
remote_client.update_vectors(COLLECTION_NAME, points=sparse_points)
# endregion

# region multivector in an empty collection
local_client.delete_collection(collection_name=COLLECTION_NAME)
remote_client.delete_collection(collection_name=COLLECTION_NAME)

multivectors_config = models.VectorParams(
size=2,
distance=models.Distance.DOT,
multivector_config=models.MultiVectorConfig(
comparator=models.MultiVectorComparator.MAX_SIM
),
)

local_client.create_collection(
collection_name=COLLECTION_NAME, vectors_config=multivectors_config
)
remote_client.create_collection(
collection_name=COLLECTION_NAME, vectors_config=multivectors_config
)

points = [models.PointStruct(id=1, vector={})]
local_client.upsert(COLLECTION_NAME, points=points)
remote_client.upsert(
COLLECTION_NAME,
points=points,
)
multivector_points = [models.PointVectors(id=1, vector=[[0.2, 0.3], [0.4, 0.5]])]
local_client.update_vectors(COLLECTION_NAME, points=multivector_points)
remote_client.update_vectors(COLLECTION_NAME, points=multivector_points)
compare_collections(
local_client,
remote_client,
10,
collection_name=COLLECTION_NAME,
)
local_client.delete_collection(collection_name=COLLECTION_NAME)
remote_client.delete_collection(collection_name=COLLECTION_NAME)
# endregion

# region named vectors
named_vectors_config = {"text": models.VectorParams(size=2, distance=models.Distance.DOT)}
local_client.create_collection(
collection_name=COLLECTION_NAME,
vectors_config=named_vectors_config,
)
remote_client.create_collection(
collection_name=COLLECTION_NAME,
vectors_config=named_vectors_config,
)
points = [models.PointStruct(id=1, vector={})]

local_client.upsert(COLLECTION_NAME, points=points)
remote_client.upsert(
COLLECTION_NAME,
points=points,
)
named_vector_points = [
models.PointVectors(
id=1,
vector={"text": [0.2, 0.3]},
)
]

local_client.update_vectors(COLLECTION_NAME, points=named_vector_points)
remote_client.update_vectors(COLLECTION_NAME, points=named_vector_points)
compare_collections(
local_client,
remote_client,
10,
collection_name=COLLECTION_NAME,
)
local_client.delete_collection(collection_name=COLLECTION_NAME)
remote_client.delete_collection(collection_name=COLLECTION_NAME)
# endregion

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor into separate test functions to improve maintainability.

This function violates the single responsibility principle by testing multiple vector update scenarios in one function (51 statements). This makes it harder to identify which specific scenario fails and adds unnecessary complexity.

Consider splitting this into separate, focused test functions:

-def test_update_vectors():
+def test_update_vectors_unnamed():
+    # Only test unnamed vector scenario
+    
+def test_update_vectors_sparse():  
+    # Only test sparse vector scenario
+    
+def test_update_vectors_multivector():
+    # Only test multivector scenario
+    
+def test_update_vectors_named():
+    # Only test named vector scenario

This approach would:

  • Make failures easier to debug and isolate
  • Improve test readability and maintainability
  • Address the static analysis warning about function complexity
  • Allow for more targeted test execution

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 363-363: Too many statements (51/50)

(R0915)

🤖 Prompt for AI Agents
In tests/congruence_tests/test_updates.py from lines 363 to 495, the
test_update_vectors function tests multiple vector update scenarios in one large
function, making it complex and hard to maintain. Refactor by splitting this
single function into multiple smaller test functions, each focused on one
specific vector update scenario such as unnamed vector, sparse vector,
multivector, and named vectors. This will improve readability, isolate failures,
and comply with the single responsibility principle.

Comment thread tests/congruence_tests/test_updates.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/congruence_tests/test_updates.py (1)

363-504: Function complexity remains unaddressed from previous review.

The function still violates the single responsibility principle by testing multiple vector update scenarios in one large function (54 statements), making it difficult to identify which specific scenario fails when tests break.

The previous suggestion to split this into separate focused test functions (test_update_vectors_unnamed, test_update_vectors_sparse, test_update_vectors_multivector, test_update_vectors_named) would improve maintainability and debugging.

🧹 Nitpick comments (1)
tests/congruence_tests/test_updates.py (1)

433-434: Inconsistent cleanup approach across test sections.

The multivector section explicitly deletes collections before creating new ones, while other sections rely on the if remote_client.collection_exists() check pattern. This inconsistency could lead to maintenance issues.

Consider standardizing the cleanup approach across all sections:

-    local_client.delete_collection(collection_name=COLLECTION_NAME)
-    remote_client.delete_collection(collection_name=COLLECTION_NAME)
-
     multivectors_config = models.VectorParams(
         size=2,
         distance=models.Distance.DOT,
         multivector_config=models.MultiVectorConfig(
             comparator=models.MultiVectorComparator.MAX_SIM
         ),
     )

     local_client.create_collection(
         collection_name=COLLECTION_NAME, vectors_config=multivectors_config
     )
+    if remote_client.collection_exists(collection_name=COLLECTION_NAME):
+        remote_client.delete_collection(collection_name=COLLECTION_NAME)
     remote_client.create_collection(
         collection_name=COLLECTION_NAME, vectors_config=multivectors_config
     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0d7ffd and 8c35854.

📒 Files selected for processing (1)
  • tests/congruence_tests/test_updates.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/congruence_tests/test_updates.py

[refactor] 363-363: Too many statements (54/50)

(R0915)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Redirect rules - poetic-froyo-8baba7
  • GitHub Check: Header rules - poetic-froyo-8baba7
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.12.x on ubuntu-latest test
  • GitHub Check: Python 3.9.x on ubuntu-latest test
  • GitHub Check: Python 3.10.x on ubuntu-latest test
  • GitHub Check: Pages changed - poetic-froyo-8baba7
🔇 Additional comments (1)
tests/congruence_tests/test_updates.py (1)

422-429: Good fix: sparse vector section now includes proper comparison and cleanup.

The missing collection comparison and cleanup steps that were flagged in the previous review have been properly addressed.

@generall generall merged commit bb7016f into dev Jul 10, 2025
14 checks passed
joein added a commit that referenced this pull request Jul 18, 2025
* fix: fix edge case in upsert, fix multivec update, add tests

* fix: fix tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants