
fix: cache ONNXMiniLM_L6_V2 instance in DefaultEmbeddingFunction #6960

Open

Jah-yee wants to merge 1 commit into chroma-core:main from Jah-yee:fix/6948-cache-onnx-instance

Conversation


@Jah-yee commented Apr 23, 2026

DefaultEmbeddingFunction.__call__ was constructing a fresh ONNXMiniLM_L6_V2 on every call, triggering cold lazy-init of the tokenizer (~5ms) and ONNX session each time. This caused a ~10x slowdown on repeated embed calls.

Fix: create the ONNXMiniLM_L6_V2 instance once in __call__ and cache it on self._ef for subsequent calls.

Fixes #6941

Signed-off-by: Jah-yee [email protected]
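
For context, here is a minimal sketch of the caching pattern described above. Only the ONNXMiniLM_L6_V2 and self._ef names come from the PR; the surrounding class shape and the import path are assumptions and may differ from the actual code in chromadb/api/types.py:

from typing import Optional

# Assumed import path; ONNXMiniLM_L6_V2 lives in chromadb's embedding function utilities.
from chromadb.utils.embedding_functions.onnx_mini_lm_l6_v2 import ONNXMiniLM_L6_V2

class DefaultEmbeddingFunction:
    def __init__(self) -> None:
        # No heavy work here; the ONNX model is only constructed on first use.
        self._ef: Optional[ONNXMiniLM_L6_V2] = None

    def __call__(self, input):
        # Build the ONNX-backed embedder once and reuse it, so the tokenizer
        # and ONNX session lazy-init cost is paid only on the first call.
        if self._ef is None:
            self._ef = ONNXMiniLM_L6_V2()
        return self._ef(input)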


Thank you for your work on this project. I hope this small fix is helpful. Please let me know if there's anything to adjust.

Warmly, RoomWithOutRoof

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@propel-code-bot
Contributor

Cache ONNXMiniLM_L6_V2 Instance in DefaultEmbeddingFunction

This PR fixes a performance issue in DefaultEmbeddingFunction by avoiding repeated construction of ONNXMiniLM_L6_V2 on every __call__. The implementation now initializes a cached embedding function instance once and reuses it for subsequent calls.

The change is limited to chromadb/api/types.py and updates DefaultEmbeddingFunction.__init__ and DefaultEmbeddingFunction.__call__ to store and use self._ef. This aligns with the PR's intent of removing repeated lazy-initialization overhead across successive embedding operations.

This summary was automatically generated by @propel-code-bot
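
As a rough sanity check of the improvement, one might time repeated calls on the fixed build. This snippet is purely illustrative and not part of the PR; the import path is an assumption and absolute timings will vary by machine:

import time

# Assumed import path; DefaultEmbeddingFunction may also be re-exported elsewhere in chromadb.
from chromadb.utils.embedding_functions import DefaultEmbeddingFunction

ef = DefaultEmbeddingFunction()

# The first call pays the one-time tokenizer / ONNX session initialization.
start = time.perf_counter()
ef(["warm-up sentence"])
print(f"first call: {time.perf_counter() - start:.3f}s")

# With the instance cached on self._ef, repeated calls should avoid the
# per-call re-initialization overhead this PR removes.
start = time.perf_counter()
for _ in range(10):
    ef(["repeated sentence"])
print(f"10 repeated calls: {time.perf_counter() - start:.3f}s")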


@propel-code-bot (Bot) left a comment


No issues were found; the caching change is sound and should improve embedding call performance safely.

Status: No Issues Found | Risk: Low

Review Details

📁 1 file reviewed | 💬 0 comments



Development

Successfully merging this pull request may close these issues.

DefaultEmbeddingFunction.__call__ constructs a new ONNXMiniLM_L6_V2 on every call (10× slowdown on repeated embeds)
