fix: fix edge case in upsert, fix multivec update, add tests#1030
Conversation
✅ Deploy Preview for poetic-froyo-8baba7 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
📝 WalkthroughWalkthroughThe changes update the internal logic of the ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/congruence_tests/test_updates.py (1)
374-374: Consider using more descriptive test data.All test scenarios use empty vector dictionaries
{}for initial point creation. While this tests the edge case mentioned in the PR objectives, consider adding a comment or using a more descriptive variable name to clarify the intent.- points = [models.PointStruct(id=1, vector={})] + # Test updating vectors on points initially created with empty vectors + empty_vector_points = [models.PointStruct(id=1, vector={})]Also applies to: 406-406, 442-442, 471-471
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
qdrant_client/local/local_collection.py(5 hunks)tests/congruence_tests/test_updates.py(1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/congruence_tests/test_updates.py
[refactor] 363-363: Too many statements (51/50)
(R0915)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Redirect rules - poetic-froyo-8baba7
- GitHub Check: Python 3.9.x on ubuntu-latest test
- GitHub Check: Python 3.13.x on ubuntu-latest test
- GitHub Check: Python 3.12.x on ubuntu-latest test
- GitHub Check: Python 3.10.x on ubuntu-latest test
- GitHub Check: Python 3.11.x on ubuntu-latest test
- GitHub Check: Header rules - poetic-froyo-8baba7
- GitHub Check: Pages changed - poetic-froyo-8baba7
🔇 Additional comments (5)
qdrant_client/local/local_collection.py (4)
207-207: Critical fix: Ensure resized dense vector arrays are stored back.This assignment is essential after the potential array resizing operation. Without it, the resized array would remain a local variable and the main
self.vectorsdictionary would retain the old, smaller array, leading to inconsistent state.
231-231: Critical fix: Ensure extended sparse vector lists are stored back.This assignment ensures that after potentially extending the sparse vector list with empty vectors, the updated list is stored back to the main
self.sparse_vectorsdictionary, maintaining storage consistency.
259-259: Critical fix: Ensure extended multivector lists are stored back.This assignment ensures that after potentially extending the multivector list with empty arrays, the updated list is stored back to the main
self.multivectorsdictionary, completing the consistent storage pattern across all vector types.
341-370: Excellent refactoring: Improved type safety and streamlined vector update logic.The changes significantly improve the method:
- Type hint expansion: Correctly includes
list[list[float]]for multivector support- Streamlined logic: Removes redundant conditional checks and always validates vectors through numpy array conversion
- Cleaner separation: Sparse vectors are handled separately with early return, making the flow clearer
- Consistent normalization: Cosine normalization is properly applied for both dense and multivectors
- Better maintainability: The simplified logic reduces potential edge cases and makes the code easier to understand
The refactoring addresses the multivec update issues mentioned in the PR objectives while improving overall code quality.
tests/congruence_tests/test_updates.py (1)
448-448: Verify multivector format aligns with the fix.The multivector test uses
[[0.2, 0.3], [0.4, 0.5]]format. Given the PR summary mentions fixing multivec update and the AI summary mentions expanded type hints for multi-dimensional lists, this appears to be the correct format for testing the fix.
| def test_update_vectors(): | ||
| local_client = init_local() | ||
| remote_client = init_remote() | ||
|
|
||
| # region unnamed vector in an empty collection | ||
| vectors_config = models.VectorParams(size=2, distance=models.Distance.DOT) | ||
| local_client.create_collection(collection_name=COLLECTION_NAME, vectors_config=vectors_config) | ||
| if remote_client.collection_exists(collection_name=COLLECTION_NAME): | ||
| remote_client.delete_collection(collection_name=COLLECTION_NAME) | ||
| remote_client.create_collection(collection_name=COLLECTION_NAME, vectors_config=vectors_config) | ||
|
|
||
| points = [models.PointStruct(id=1, vector={})] | ||
|
|
||
| local_client.upsert(COLLECTION_NAME, points=points) | ||
| remote_client.upsert(COLLECTION_NAME, points=points, wait=True) | ||
|
|
||
| local_client.update_vectors( | ||
| COLLECTION_NAME, points=[models.PointVectors(id=1, vector=[0.2, 0.3])] | ||
| ) | ||
| remote_client.update_vectors( | ||
| COLLECTION_NAME, | ||
| points=[models.PointVectors(id=1, vector=[0.2, 0.3])], | ||
| ) | ||
|
|
||
| compare_collections( | ||
| local_client, | ||
| remote_client, | ||
| 10, | ||
| collection_name=COLLECTION_NAME, | ||
| ) | ||
| local_client.delete_collection(collection_name=COLLECTION_NAME) | ||
| remote_client.delete_collection(collection_name=COLLECTION_NAME) | ||
| # endregion | ||
|
|
||
| # region sparse vector in an empty collection | ||
| sparse_vectors_config = {"sparse": models.SparseVectorParams()} | ||
| local_client.create_collection( | ||
| collection_name=COLLECTION_NAME, sparse_vectors_config=sparse_vectors_config | ||
| ) | ||
| remote_client.create_collection( | ||
| collection_name=COLLECTION_NAME, sparse_vectors_config=sparse_vectors_config | ||
| ) | ||
|
|
||
| points = [models.PointStruct(id=1, vector={})] | ||
| local_client.upsert(COLLECTION_NAME, points=points) | ||
| remote_client.upsert( | ||
| COLLECTION_NAME, | ||
| points=points, | ||
| ) | ||
|
|
||
| sparse_points = [ | ||
| models.PointVectors( | ||
| id=1, | ||
| vector={"sparse": models.SparseVector(indices=[0, 1], values=[0.2, 0.3])}, | ||
| ) | ||
| ] | ||
| local_client.update_vectors(COLLECTION_NAME, points=sparse_points) | ||
| remote_client.update_vectors(COLLECTION_NAME, points=sparse_points) | ||
| # endregion | ||
|
|
||
| # region multivector in an empty collection | ||
| local_client.delete_collection(collection_name=COLLECTION_NAME) | ||
| remote_client.delete_collection(collection_name=COLLECTION_NAME) | ||
|
|
||
| multivectors_config = models.VectorParams( | ||
| size=2, | ||
| distance=models.Distance.DOT, | ||
| multivector_config=models.MultiVectorConfig( | ||
| comparator=models.MultiVectorComparator.MAX_SIM | ||
| ), | ||
| ) | ||
|
|
||
| local_client.create_collection( | ||
| collection_name=COLLECTION_NAME, vectors_config=multivectors_config | ||
| ) | ||
| remote_client.create_collection( | ||
| collection_name=COLLECTION_NAME, vectors_config=multivectors_config | ||
| ) | ||
|
|
||
| points = [models.PointStruct(id=1, vector={})] | ||
| local_client.upsert(COLLECTION_NAME, points=points) | ||
| remote_client.upsert( | ||
| COLLECTION_NAME, | ||
| points=points, | ||
| ) | ||
| multivector_points = [models.PointVectors(id=1, vector=[[0.2, 0.3], [0.4, 0.5]])] | ||
| local_client.update_vectors(COLLECTION_NAME, points=multivector_points) | ||
| remote_client.update_vectors(COLLECTION_NAME, points=multivector_points) | ||
| compare_collections( | ||
| local_client, | ||
| remote_client, | ||
| 10, | ||
| collection_name=COLLECTION_NAME, | ||
| ) | ||
| local_client.delete_collection(collection_name=COLLECTION_NAME) | ||
| remote_client.delete_collection(collection_name=COLLECTION_NAME) | ||
| # endregion | ||
|
|
||
| # region named vectors | ||
| named_vectors_config = {"text": models.VectorParams(size=2, distance=models.Distance.DOT)} | ||
| local_client.create_collection( | ||
| collection_name=COLLECTION_NAME, | ||
| vectors_config=named_vectors_config, | ||
| ) | ||
| remote_client.create_collection( | ||
| collection_name=COLLECTION_NAME, | ||
| vectors_config=named_vectors_config, | ||
| ) | ||
| points = [models.PointStruct(id=1, vector={})] | ||
|
|
||
| local_client.upsert(COLLECTION_NAME, points=points) | ||
| remote_client.upsert( | ||
| COLLECTION_NAME, | ||
| points=points, | ||
| ) | ||
| named_vector_points = [ | ||
| models.PointVectors( | ||
| id=1, | ||
| vector={"text": [0.2, 0.3]}, | ||
| ) | ||
| ] | ||
|
|
||
| local_client.update_vectors(COLLECTION_NAME, points=named_vector_points) | ||
| remote_client.update_vectors(COLLECTION_NAME, points=named_vector_points) | ||
| compare_collections( | ||
| local_client, | ||
| remote_client, | ||
| 10, | ||
| collection_name=COLLECTION_NAME, | ||
| ) | ||
| local_client.delete_collection(collection_name=COLLECTION_NAME) | ||
| remote_client.delete_collection(collection_name=COLLECTION_NAME) | ||
| # endregion |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Refactor into separate test functions to improve maintainability.
This function violates the single responsibility principle by testing multiple vector update scenarios in one function (51 statements). This makes it harder to identify which specific scenario fails and adds unnecessary complexity.
Consider splitting this into separate, focused test functions:
-def test_update_vectors():
+def test_update_vectors_unnamed():
+ # Only test unnamed vector scenario
+
+def test_update_vectors_sparse():
+ # Only test sparse vector scenario
+
+def test_update_vectors_multivector():
+ # Only test multivector scenario
+
+def test_update_vectors_named():
+ # Only test named vector scenarioThis approach would:
- Make failures easier to debug and isolate
- Improve test readability and maintainability
- Address the static analysis warning about function complexity
- Allow for more targeted test execution
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 363-363: Too many statements (51/50)
(R0915)
🤖 Prompt for AI Agents
In tests/congruence_tests/test_updates.py from lines 363 to 495, the
test_update_vectors function tests multiple vector update scenarios in one large
function, making it complex and hard to maintain. Refactor by splitting this
single function into multiple smaller test functions, each focused on one
specific vector update scenario such as unnamed vector, sparse vector,
multivector, and named vectors. This will improve readability, isolate failures,
and comply with the single responsibility principle.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/congruence_tests/test_updates.py (1)
363-504: Function complexity remains unaddressed from previous review.The function still violates the single responsibility principle by testing multiple vector update scenarios in one large function (54 statements), making it difficult to identify which specific scenario fails when tests break.
The previous suggestion to split this into separate focused test functions (
test_update_vectors_unnamed,test_update_vectors_sparse,test_update_vectors_multivector,test_update_vectors_named) would improve maintainability and debugging.
🧹 Nitpick comments (1)
tests/congruence_tests/test_updates.py (1)
433-434: Inconsistent cleanup approach across test sections.The multivector section explicitly deletes collections before creating new ones, while other sections rely on the
if remote_client.collection_exists()check pattern. This inconsistency could lead to maintenance issues.Consider standardizing the cleanup approach across all sections:
- local_client.delete_collection(collection_name=COLLECTION_NAME) - remote_client.delete_collection(collection_name=COLLECTION_NAME) - multivectors_config = models.VectorParams( size=2, distance=models.Distance.DOT, multivector_config=models.MultiVectorConfig( comparator=models.MultiVectorComparator.MAX_SIM ), ) local_client.create_collection( collection_name=COLLECTION_NAME, vectors_config=multivectors_config ) + if remote_client.collection_exists(collection_name=COLLECTION_NAME): + remote_client.delete_collection(collection_name=COLLECTION_NAME) remote_client.create_collection( collection_name=COLLECTION_NAME, vectors_config=multivectors_config )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/congruence_tests/test_updates.py(1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/congruence_tests/test_updates.py
[refactor] 363-363: Too many statements (54/50)
(R0915)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Redirect rules - poetic-froyo-8baba7
- GitHub Check: Header rules - poetic-froyo-8baba7
- GitHub Check: Python 3.11.x on ubuntu-latest test
- GitHub Check: Python 3.13.x on ubuntu-latest test
- GitHub Check: Python 3.12.x on ubuntu-latest test
- GitHub Check: Python 3.9.x on ubuntu-latest test
- GitHub Check: Python 3.10.x on ubuntu-latest test
- GitHub Check: Pages changed - poetic-froyo-8baba7
🔇 Additional comments (1)
tests/congruence_tests/test_updates.py (1)
422-429: Good fix: sparse vector section now includes proper comparison and cleanup.The missing collection comparison and cleanup steps that were flagged in the previous review have been properly addressed.
* fix: fix edge case in upsert, fix multivec update, add tests * fix: fix tests
#1029