fix: Use get_index() instead of list_indexes() in has_collection() to… by shargyle · Pull Request #19238 · open-webui/open-webui

shargyle · 2025-11-17T18:57:49Z

fix: Use get_index() instead of list_indexes() in has_collection() to handle pagination

Fixes #19233

Replace list_indexes() pagination scan with direct get_index() lookup
in has_collection() method. The previous implementation only checked
the first ~2,000 indexes due to unhandled pagination, causing RAG
queries to fail for indexes beyond the first page.

Benefits:

Handles buckets with any number of indexes (no pagination needed)
~8x faster (0.19s vs 1.53s in testing)
Proper exception handling for ResourceNotFoundException
Scales to millions of indexes
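
For readers who want to see the shape of the change, here is a minimal sketch of a direct-lookup existence check. It is not the merged code: the standalone `has_collection(client, bucket_name, collection_name)` signature, the `error_code` helper, and the set of "not found" error codes are assumptions made for illustration. The only API call used is `get_index()` with the `vectorBucketName` and `indexName` parameters, as in the AWS CLI command quoted later in this description. The error code is read from the `response["Error"]["Code"]` structure that botocore's `ClientError` carries.

```python
from typing import Any

# Assumption: the service reports a missing index with one of these error codes.
NOT_FOUND_CODES = {"NotFoundException", "ResourceNotFoundException"}


def error_code(exc: Exception) -> str:
    """Return the AWS error code carried by a botocore-style exception, or ''."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code", "")


def has_collection(client: Any, bucket_name: str, collection_name: str) -> bool:
    """Check for one index by name instead of scanning a listing."""
    try:
        client.get_index(vectorBucketName=bucket_name, indexName=collection_name)
        return True
    except Exception as exc:
        if error_code(exc) in NOT_FOUND_CODES:
            return False
        raise  # anything else (auth, throttling, network) is a real failure
```

Re-raising non-"not found" errors is a design choice of this sketch; a caller that prefers a plain boolean could log the error and return False instead.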

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request, and describe your changes before submitting a pull request.

This is to ensure large feature PRs are discussed with the community first, before starting work on it. If the community does not want this feature or it is not relevant for Open WebUI as a project, it can be identified in the discussion before working on the feature and submitting the PR.

Before submitting, make sure you've checked the following:

Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
Description: Provide a concise description of the changes made in this pull request down below.
Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
Documentation: Not applicable - this is a bug fix with no user-facing configuration changes.
Dependencies: No new dependencies added.
Testing: Performed manual testing in production environment with 1,100+ S3 Vectors indexes. Verified that has_collection() now correctly finds indexes beyond position 2,000, and RAG queries work as expected. Tested both successful lookups and ResourceNotFoundException handling.
Agentic AI Code: This PR was written with AI assistance but has gone through thorough human review and extensive manual testing in a production environment with 2,000+ indexes.
Code review: Self-review performed, ensuring code follows project standards and handles all exception cases properly.
Title Prefix: Using "fix:" prefix as this corrects a bug in the S3 Vectors implementation.

Changelog Entry

Description

Fixed pagination bug in S3 Vectors has_collection() method that caused RAG queries to fail when the bucket contained more than ~2,000 indexes. The method now uses direct get_index() lookup instead of scanning list_indexes() results, eliminating pagination issues and improving performance by ~8x.

Added

N/A

Changed

Modified has_collection() method in backend/open_webui/retrieval/vector/dbs/s3vector.py to use direct get_index() lookup instead of list_indexes() scan
Improved exception handling to specifically catch ResourceNotFoundException for non-existent indexes

Deprecated

N/A

Removed

N/A

Fixed

Fixed S3 Vectors has_collection() returning False for indexes beyond position ~2,000 due to unhandled pagination
Fixed RAG queries failing with "Collection does not exist" warnings for newly uploaded files when bucket contains 2,000+ total indexes
Fixed performance bottleneck caused by scanning entire list of indexes instead of direct lookup

Security

N/A

Breaking Changes

N/A

Additional Information

Root Cause:
The original implementation used list_indexes() which returns only ~2,000 indexes by default with a nextToken for pagination. Without pagination handling, indexes beyond the first page were never found, causing RAG functionality to break in production environments with many files.
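
The failure mode described above can be reproduced offline. The sketch below contrasts a single `list_indexes()` call that ignores `nextToken` with a loop that follows it. `FakeClient` is a hypothetical in-memory stand-in, and the response shape (`indexes` entries with an `indexName` key, plus an optional `nextToken`) is an assumption about the listing response, not something taken from the Open WebUI code.

```python
from __future__ import annotations

from typing import Any, Iterator


def first_page_names(client: Any, bucket_name: str) -> set[str]:
    """Buggy pattern: one list_indexes() call, nextToken ignored."""
    response = client.list_indexes(vectorBucketName=bucket_name)
    return {idx["indexName"] for idx in response.get("indexes", [])}


def iter_all_names(client: Any, bucket_name: str) -> Iterator[str]:
    """Paginated pattern: keep calling until no nextToken is returned."""
    kwargs = {"vectorBucketName": bucket_name}
    while True:
        response = client.list_indexes(**kwargs)
        for idx in response.get("indexes", []):
            yield idx["indexName"]
        token = response.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


class FakeClient:
    """Hypothetical in-memory stand-in that pages a sorted list of names."""

    def __init__(self, names: list[str], page_size: int) -> None:
        self.names = sorted(names)
        self.page_size = page_size

    def list_indexes(self, vectorBucketName: str, nextToken: str | None = None) -> dict:
        start = int(nextToken) if nextToken else 0
        end = start + self.page_size
        page = {"indexes": [{"indexName": n} for n in self.names[start:end]]}
        if end < len(self.names):
            page["nextToken"] = str(end)
        return page
```

Following `nextToken` would also fix the bug, but it still costs one request per page, which is why this PR uses a direct lookup instead.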

Testing Details:

Successfully reproduced in dev environment with 2,014 total indexes:

Key Finding: list_indexes() returns results in alphabetical order

Returns approximately 1,637 indexes per page (default page size)
Indexes beyond position 1,637 require pagination (nextToken)

Reproduction Steps:

Create 2,000+ dummy indexes to push real file-* indexes beyond the pagination threshold:

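import time

import boto3

client = boto3.client("s3vectors")
bucket_name = "my-vector-bucket"  # placeholder: replace with your vector bucket name
timestamp = int(time.time())

# Use prefix "0000-test-{timestamp}-{number}"
# Digits sort before letters, pushing file-* indexes to page 2
for i in range(2000):
    client.create_index(
        vectorBucketName=bucket_name,
        indexName=f"0000-test-{timestamp}-{i:05d}",
        dataType="float32",
        dimension=3072,
        distanceMetric="cosine",
    )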

Why this works: In a real scenario with 2,000 uploaded files, newer files would similarly be pushed beyond the pagination threshold, replicating production behavior.

Upload a new file via Open WebUI:
- Navigate to Workspace → Knowledge → Upload Files
- Upload any test document
- ✅ File uploads successfully, index created

Attempt to query the file:
- Create new chat
- Ask a question about the uploaded file content
- ❌ Result: "No sources found"

Check logs:

WARNING | Collection 'file-{uuid}' does not exist

Even though AWS CLI confirms the index exists:

aws s3vectors get-index --index-name file-{uuid} --vector-bucket-name {bucket}
# Returns: Index found with correct metadata
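
The ordering argument behind these reproduction steps can be checked without touching AWS. This is a small sketch under stated assumptions: plain lexicographic ordering of index names, the first-page size of 1,637 observed in this PR (not a documented constant), and a hypothetical timestamp and index name.

```python
PAGE_SIZE = 1637  # first-page size observed in this PR, not a documented constant

# Hypothetical names: 2,000 dummy indexes plus one real file-* index.
dummy = [f"0000-test-1700000000-{i:05d}" for i in range(2000)]
real = ["file-example"]

ordered = sorted(dummy + real)  # assumes plain lexicographic ordering
position = ordered.index("file-example")
first_page = set(ordered[:PAGE_SIZE])

print(position)                      # 2000
print("file-example" in first_page)  # False
```

Every dummy name sorts ahead of `file-example`, so the real index lands at position 2000 and never appears on the first page.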

Results:

Before fix: list_indexes() 1.53s, ❌ returns False for indexes beyond position ~1,637, RAG queries fail
After fix: get_index() 0.19s, ✅ finds all indexes regardless of position, ~8x faster
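
The "~8x" figure follows directly from the two timings. The snippet below redoes that arithmetic and includes a small timing helper that could be used to repeat the measurement; `mean_seconds` is a hypothetical helper, not part of this PR.

```python
import time
from typing import Callable


def mean_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Average wall-clock time of fn() over a few runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


before, after = 1.53, 0.19  # seconds, as reported above
speedup = before / after
print(f"{speedup:.1f}x")  # 8.1x
```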

Related Issue: #19233

Screenshots or Videos

N/A - This is a backend bug fix with no UI changes. Testing was performed via:

File upload through Open WebUI interface
Log analysis showing successful index creation and retrieval
RAG query verification showing correct document retrieval
AWS CLI verification of index existence

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


pr-validator-bot · 2025-11-17T18:57:56Z

👋 Welcome and Thank You for Contributing!

We appreciate you taking the time to submit a pull request to Open WebUI!

⚠️ Important: Testing Requirements

We've recently seen an increase in PRs that have significant issues:

PRs that don't actually fix the bug they claim to fix
PRs that don't implement the feature they describe
PRs that break existing functionality
PRs that are clearly AI-generated without proper testing being done by the author
PRs that simply don't work as intended

These untested PRs consume significant time from maintainers and volunteer contributors who review and test PRs in their free time.
Time that could be spent testing other PRs or improving Open WebUI in other ways.

Before marking your PR as "Ready for Review":

Please explicitly confirm:

✅ You have personally tested ALL changes in this PR
✅ How you tested it (specific steps you took to verify it works)
✅ Visual evidence where applicable (screenshots or videos showing the feature/fix working) - if applicable to your specific PR

If you're not certain your PR works exactly as intended, please leave it in DRAFT mode until you've thoroughly tested it.

Thank you for helping us maintain quality and respecting the time of our community! 🙏

⚠️ WARNING

You are trying to merge to the main branch!

This repository does not allow direct merges to the main branch! Please retarget your PR to the dev branch ASAP or your PR will be closed!

tjbck · 2025-11-17T20:32:05Z

@westbrook-ai review wanted here

westbrook-ai · 2025-11-17T21:36:48Z

Hey @shargyle, at first glance these changes definitely make sense. Please ping me when the PR is ready for review and I'll do a more detailed test by connecting an Open WebUI instance to S3 Vectors. Thanks!

Unneeded exception handling removed to match original OWUI code

shargyle · 2025-11-18T20:59:35Z

Validation Results: S3 Vector Embedding Retrieval Fix

Before Fix

Prior to the fix and with 2,000+ indexes present in the S3 vector bucket, the embedding retrieval failed.

After Fix

After implementing the fix included in this PR, the same file was uploaded in a new chat, and the embedding retrieval succeeded.

Log Evidence

The logs confirm successful vector search operations:

13:49:27.266 UTC | INFO
Searching collection file-92cf1db1-a574-4c7d-9d23-3b98c6fc6779 with 1 query vectors, limit=10
Source: open_webui.retrieval.vector.dbs.s3vector:search:316

13:49:27.432 UTC | INFO
Search completed. Found results for 1 queries
Source: open_webui.retrieval.vector.dbs.s3vector:search:382

shargyle · 2025-11-18T21:01:46Z

> Hey @shargyle, at first glance these changes definitely make sense. Please ping me when the PR is ready for review and I'll do a more detailed test by connecting an Open WebUI instance to S3 Vectors. Thanks!

@westbrook-ai I just completed testing our own instance of OWUI w/an S3 vector bucket that has more than 2,000 indexes--the change successfully fixed the issue I reported.

westbrook-ai · 2025-11-18T22:26:15Z

Amazing, thanks - I should be able to set up a test within the next couple days, will report back if I find any issues.

westbrook-ai · 2025-11-19T05:09:53Z

@tjbck confirmed that this PR is good to go, LGTM. Thank you for the fix @shargyle!

tjbck · 2025-11-19T05:19:07Z

Thanks everyone!


shargyle changed the base branch from main to dev November 17, 2025 18:58

shargyle marked this pull request as draft November 17, 2025 19:05

Update s3vector.py

4aa56bf

Unneeded exception handling removed to match original OWUI code

shargyle marked this pull request as ready for review November 18, 2025 21:00

tjbck merged commit 720af63 into open-webui:dev Nov 19, 2025

tjbck merged 2 commits into open-webui:dev from shargyle:fix/s3vectors-pagination-has-collection
