perf: 50x performance improvement for external embeddings#19155
perf: 50x performance improvement for external embeddings#19155Classic298 wants to merge 2 commits intoopen-webui:devfrom
Conversation
|
Yes, this is tested, i tested this in testing and even in production and it works! |
| if reranking: | ||
| scores = self.reranking_function(query, documents) | ||
| scores = self.reranking_function( | ||
| [(query, doc.page_content) for doc in documents] |
There was a problem hiding this comment.
@Classic298 maybe you need to rebase against current dev? It seems you are changing reranking function back to older commit.
There was a problem hiding this comment.
how so? I directly edited the utils.py directly from the open-webui/dev branch. I edited the very very latest file
There was a problem hiding this comment.
Here is the master:
open-webui/backend/open_webui/retrieval/utils.py
Lines 1059 to 1061 in e0d5de1
Here is the dev:
open-webui/backend/open_webui/retrieval/utils.py
Line 1069 in 6b638db
So your patch is against what branch?
There was a problem hiding this comment.
quite literally the latest dev
I went on github
chose the dev branch of open webui
and modified the file using inline edit, which then creates a patch-x branch on github
There was a problem hiding this comment.
As you can see,
the original line i am deleting here is
scores = self.reranking_function(query, documents)
the original line of the most recent version of the dev branch is
scores = self.reranking_function(query, documents)
so I am working on the dev branch.
The main branch has a whole different line:
[(query, doc.page_content) for doc in documents]
Hope that clears it up.
There was a problem hiding this comment.
This is your patch https://patch-diff.githubusercontent.com/raw/open-webui/open-webui/pull/19155.patch
Somehow you are modifying the reranking_function.
If that's unintented you need undo the changes in reranking_function.
|
Please reopen after rebasing to latest dev. |
Processes embedding requests to external APIs (OpenAI, Azure OpenAI, Ollama) in parallel instead of sequential batches. Before: - Sequential batch processing: Batch 1 → wait → Batch 2 → wait → ... → Batch N - For 6000 chunks with batch_size=1: 6000 sequential HTTP requests - Total time ≈ 6000 × request_latency (e.g., 6000 × 50ms = 5 minutes) After: - Parallel batch processing: All batches sent simultaneously via asyncio.gather() - For 6000 chunks with batch_size=1: All 6000 requests execute in parallel - Total time ≈ 1 × request_latency + overhead (e.g., <10 seconds for 6000 chunks) - ~30-50x speed improvement observed in production Added: - Async embedding generation functions: - generate_openai_batch_embeddings_async() using aiohttp - generate_azure_openai_batch_embeddings_async() using aiohttp - generate_ollama_batch_embeddings_async() using aiohttp - generate_multiple_async() function for parallel batch coordination - Import of aiohttp and asyncio modules Changed: - get_embedding_function() now uses async functions with parallel execution - Uses asyncio.run() to execute parallel requests - All batch embedding functions converted to async with aiohttp Key difference from previous PR (open-webui#19155): - MINIMAL changes - only modified what's necessary for parallel embeddings - Did NOT modify get_reranking_function() - kept original signature - Did NOT modify RerankCompressor.compress_documents() - kept original - Kept all sync embedding functions intact - Kept generate_embeddings() unchanged - No unnecessary refactoring
devbranch. Not targeting thedevbranch will lead to immediate closure of the PR.Changelog Entry
Description
Before:
batch_size=1: 6000 sequential HTTP requests (!!!)After:
asyncio.gather()batch_size=1: All 6000 requests execute in parallelAdded
generate_openai_batch_embeddings_async()using aiohttpgenerate_azure_openai_batch_embeddings_async()using aiohttpgenerate_ollama_batch_embeddings_async()using aiohttpgenerate_multiple_async()function for parallel batch coordinationaiohttpandasynciomodulesChanged
get_embedding_function()now returns synchronous wrapper around async implementationasyncio.run()to execute parallel requestsgenerate_embeddings()now usesasyncio.run()for async executionrequests.post()to asyncaiohttp.ClientSession()Fixed
Breaking Changes
embedding_function(texts, prefix, user)still returns embeddings synchronouslyasyncio.run()retrieval.pyor elsewhereContributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
Note
Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.