
feat: find function #931

Merged
JohannesMessner merged 20 commits into feat-rewrite-v2 from feat-find on Dec 14, 2022


@JohannesMessner JohannesMessner commented Dec 12, 2022

Goals:

Create find() function:

import torch

from docarray import DocumentArray, Document
from docarray.utility import find, find_batched
from docarray.typing import TorchTensor

class MyDoc(Document):
    tensor: TorchTensor

da = DocumentArray[MyDoc](MyDoc(tensor=torch.rand(128)) for _ in range(10))

matches, scores = find(da, MyDoc(tensor=torch.rand(128)), embedding_field='tensor', metric='cosine_sim')

batched_query = DocumentArray[MyDoc](
    [MyDoc(tensor=torch.rand(128)) for _ in range(3)]
)
results = find_batched(da, batched_query, embedding_field='tensor', metric='cosine_sim')
assert len(results) == 3
for matches_i, scores_i in results:
    ...
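
For readers unfamiliar with what a call like `find(..., metric='cosine_sim')` computes, here is a minimal NumPy sketch of cosine-similarity nearest-neighbour search over plain arrays. It is not the implementation from this PR: the function name `cosine_find`, its `limit` parameter, and the choice to return indices rather than documents are all assumptions made for illustration.

```python
import numpy as np


def cosine_find(index: np.ndarray, query: np.ndarray, limit: int = 5):
    """Hypothetical helper: return (indices, scores) of the `limit` rows of
    `index` that are most cosine-similar to the 1-D `query` vector."""
    eps = 1e-12  # guards against division by zero for all-zero vectors
    index_norm = index / np.maximum(
        np.linalg.norm(index, axis=1, keepdims=True), eps
    )
    query_norm = query / max(np.linalg.norm(query), eps)
    scores = index_norm @ query_norm      # one similarity per indexed row
    order = np.argsort(-scores)[:limit]   # highest similarity first
    return order, scores[order]


rng = np.random.default_rng(0)
index = rng.random((10, 128))             # stands in for the `tensor` field of 10 docs
top_idx, top_scores = cosine_find(index, index[3], limit=3)
```

Querying with a vector that is already in the index should rank that row first with a score close to 1, which makes for a convenient sanity check.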

TODO

  • implement find
    • for torch
    • for numpy
    • allow docarray as query
  • refactor: Create classes for backend operations. Edit: Separate PR
  • refactor: let types define the backend they belong to. Edit: Separate PR
  • consider having find and find_batched explicitly
  • user defined callable as distance function. Edit: not doing in first iteration
  • tests
  • Documentation (docstrings)
  • Optional: batching. Edit: not in this PR
  • Optional: nested find. Edit: not doing in first iteration
  • Optional: other features from current docarray. Edit: not doing in first iteration
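
The item about having `find` and `find_batched` explicitly can be illustrated with a batched variant that scores all queries in one matrix product and returns one `(indices, scores)` pair per query, mirroring the `for matches_i, scores_i in results` loop in the description. This is only a sketch under assumptions: `cosine_find_batched` and `limit` are hypothetical names, and it works on raw NumPy arrays rather than on `DocumentArray` objects.

```python
import numpy as np


def cosine_find_batched(index: np.ndarray, queries: np.ndarray, limit: int = 5):
    """Hypothetical helper: for each row of `queries`, return a pair
    (indices, scores) for the `limit` most cosine-similar rows of `index`."""
    eps = 1e-12  # guards against division by zero for all-zero vectors
    index_norm = index / np.maximum(
        np.linalg.norm(index, axis=1, keepdims=True), eps
    )
    queries_norm = queries / np.maximum(
        np.linalg.norm(queries, axis=1, keepdims=True), eps
    )
    sims = queries_norm @ index_norm.T                 # shape (n_queries, n_index)
    order = np.argsort(-sims, axis=1)[:, :limit]       # best matches first, per query
    scores = np.take_along_axis(sims, order, axis=1)   # scores aligned with `order`
    return list(zip(order, scores))


rng = np.random.default_rng(0)
index = rng.random((10, 128))
results = cosine_find_batched(index, index[:3], limit=2)
for query_pos, (indices_i, scores_i) in enumerate(results):
    # each query was taken from the index, so its own row should rank first
    print(query_pos, indices_i, scores_i)
```

Keeping the batched function separate means the single-query case can stay a thin wrapper that passes a batch of one and unpacks the first result.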


Labels

area/core, area/testing, area/typing, DocArray v2 (This issue is part of the rewrite; not to be merged into main), size/xl

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

create a find function that operate on DocumentArray

2 participants