Python: Adding USearch memory connector #2358
Merged
dluc merged 12 commits into microsoft:main from Aug 23, 2023
This is exciting! We are also working on C# bindings for USearch to allow broader integration with SK 🤗 cc @dluc
Fix: removing cast to `str` due to patch in USearch
@microsoft-github-policy-service agree
Refactor: method naming
Docs: update to fit changes
Docs: clarification
awesome, thank you @ashvardanian - I'll take a look asap (FYI, there's a quick git conflict to fix when you have a chance)
Hey, @dluc! @AleksandrKent has updated the poetry file. It seems to be the only collision. But it will re-appear as soon as you have any other dependency updates, so we should try merging this sooner. Please let us know if anything has to be polished.
awharrison-28 approved these changes Aug 23, 2023
awharrison-28 left a comment:
Thank you for this contribution :)
dluc reviewed Aug 23, 2023
dluc approved these changes Aug 23, 2023
SOE-YoungS pushed a commit to SOE-YoungS/semantic-kernel that referenced this pull request Nov 1, 2023
### Motivation and Context

This PR integrates [USearch](https://github.com/unum-cloud/usearch) as a memory connector for Semantic Kernel (SK).

### Description

The USearch `Index` does not natively support multiple collections, and it stores only embeddings, not the other attributes of a `MemoryRecord`.

The `USearchMemoryStore` class adds these capabilities. It uses the USearch `Index` to store a collection of embeddings under unique IDs, with original collection names mapped to those IDs. Other `MemoryRecord` attributes are stored in a `pyarrow.Table`, which is mapped to each collection.

Note the current behavior when a user removes a record or upserts a new one with an existing ID: the old row is not removed from the `pyarrow.Table`. This is done for performance reasons, but it can cause the table to grow in size.

By default, `USearchMemoryStore` operates as an in-memory store. To enable persistence, pass `__init__` the path to a directory for the persisted files. For each collection, two files are created: `{collection_name}.usearch` and `{collection_name}.parquet`. Changes are written to disk only when `close_async` is called. Because of the interface provided by the base class `MemoryStoreBase`, this happens implicitly when using a context manager, or it may be called explicitly.

Since collection names are used to name the files on disk, all names are converted to lowercase.

To ensure efficient use of memory, you should call `close_async`.

---------

Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>
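
The storage behavior in the Description (rows that are appended but never deleted, writes that reach the disk only on `close_async`, and lowercase file names) may be easier to follow with a small model. The sketch below is a toy written with the standard library only. It is not the `USearchMemoryStore` implementation and uses neither USearch nor `pyarrow`. The class `ToyCollectionStore`, its attributes, and the `.json` file format are all hypothetical names chosen for illustration.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class ToyCollectionStore:
    """Hypothetical stand-in for one collection; NOT the real USearchMemoryStore."""

    def __init__(self, name: str, persist_directory: Optional[Path] = None) -> None:
        # Names double as file names, so they are normalized to lowercase.
        self.name = name.lower()
        self.persist_directory = persist_directory  # None means in-memory only
        self.rows: List[dict] = []          # append-only attribute rows
        self.live_row: Dict[str, int] = {}  # record ID -> index of its current row

    def upsert(self, record_id: str, text: str) -> None:
        # Append a new row and repoint the ID; any older row stays in place.
        self.rows.append({"id": record_id, "text": text})
        self.live_row[record_id] = len(self.rows) - 1

    def remove(self, record_id: str) -> None:
        # Only the ID mapping is dropped; the row itself is left behind.
        self.live_row.pop(record_id, None)

    def get(self, record_id: str) -> Optional[str]:
        index = self.live_row.get(record_id)
        return None if index is None else self.rows[index]["text"]

    async def close_async(self) -> None:
        # Nothing is written to disk before this call.
        if self.persist_directory is None:
            return
        path = self.persist_directory / f"{self.name}.json"
        path.write_text(json.dumps({"rows": self.rows, "live_row": self.live_row}))

    async def __aenter__(self) -> "ToyCollectionStore":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Leaving the context manager triggers the save implicitly.
        await self.close_async()


async def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        async with ToyCollectionStore("Notes", directory) as store:
            store.upsert("a", "first")
            store.upsert("a", "second")  # same ID: the first row becomes stale
            store.upsert("b", "other")
            store.remove("b")            # the row for "b" is not deleted
            stats = {
                "row_count": len(store.rows),
                "live_ids": sorted(store.live_row),
                "current_a": store.get("a"),
                "removed_b": store.get("b"),
                "written_before_close": (directory / "notes.json").exists(),
            }
        stats["written_after_close"] = (directory / "notes.json").exists()
        return stats


result = asyncio.run(demo())
print(result)
```

In this toy, three rows remain after two upserts to the same ID and one removal, which mirrors the table growth the Description warns about. The file appears only once the `async with` block exits.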