fix: Update milvus.py #19602

Merged
tjbck merged 11 commits into open-webui:dev from Classic298:milvus-test
Dec 2, 2025

Classic298 (Collaborator) commented Nov 30, 2025

  • Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
  • Description: Provide a concise description of the changes made in this pull request down below.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: If necessary, update relevant documentation Open WebUI Docs like environment variables, the tutorials, or other documentation sources.
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Take this as an opportunity to make screenshots of the feature/fix and include it in the PR description.
  • Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • fix: Bug fix or error correction

Changelog Entry

Description

Fixes: #18119
Fixes: #16345
Fixes: #17088
Fixes: #18485

Why This Was Necessary (The Problem)

Root Cause: Milvus's query_iterator() method has a bug where it ignores JSON metadata field filters.
Evidence from Testing:
When querying for metadata["hash"] == "abc123...":

  • query_iterator() returned ALL documents in the collection (e.g., 16, 38, 42 results)
  • ZERO of those results actually had the matching hash
  • This caused false "duplicate content detected" errors

Why It Manifested:

When uploading a second file to a knowledge base:

  • System queries: metadata["hash"] == "hash_of_file2"
  • query_iterator() returns ALL documents (from file1)
  • Duplicate detection sees non-empty results
  • Falsely rejects file2 as duplicate ❌
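
The failure mode above can be reproduced without a Milvus server. The sketch below is a simulation only: `STORED`, `broken_query`, `correct_query`, and `is_duplicate` are hypothetical names invented for illustration, standing in for a filter-ignoring lookup and a filter-respecting one. None of them are pymilvus calls.

```python
# Hypothetical in-memory stand-in for a collection; not the pymilvus API.
STORED = [
    {"id": 1, "metadata": {"hash": "hash_of_file1"}},
    {"id": 2, "metadata": {"hash": "hash_of_file1"}},
]


def broken_query(rows, wanted_hash):
    """Mimics the reported behavior: the hash filter is ignored."""
    return list(rows)


def correct_query(rows, wanted_hash):
    """Returns only rows whose metadata hash matches."""
    return [r for r in rows if r["metadata"].get("hash") == wanted_hash]


def is_duplicate(query_fn, rows, wanted_hash):
    """Duplicate detection as described above: any non-empty result counts."""
    return len(query_fn(rows, wanted_hash)) > 0


print(is_duplicate(broken_query, STORED, "hash_of_file2"))   # True (false positive)
print(is_duplicate(correct_query, STORED, "hash_of_file2"))  # False
```

With the filter ignored, the chunks of file1 alone are enough to make file2 look like a duplicate.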

How The Fix Works (The Mechanism)

1. Proper String Quote Handling
Milvus requires string values in filter expressions to be explicitly quoted:

# Wrong (what json.dumps produces for all values):
metadata["hash"] == "abc123"  # json.dumps adds quotes to everything

# Right (what we now do):
metadata["hash"] == "abc123"  # for strings
metadata["count"] == 5        # for numbers (no quotes)

By checking isinstance(value, str), we add quotes only when needed.
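
A minimal sketch of the `isinstance(value, str)` check, assuming the filter arrives as a dict of metadata keys and values. The helper names `format_value` and `build_filter` are hypothetical, and the escaping of backslashes and embedded quotes is an extra precaution added here, not something this PR describes. Only strings and numbers are covered.

```python
def format_value(value):
    """Render a Python value as a literal for a filter expression."""
    if isinstance(value, str):
        # Assumption: escape backslashes and double quotes before quoting.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    # Numbers are emitted without quotes.
    return str(value)


def build_filter(conditions):
    """Join metadata equality checks with 'and'."""
    return " and ".join(
        f'metadata["{key}"] == {format_value(value)}'
        for key, value in conditions.items()
    )


print(build_filter({"hash": "abc123"}))  # metadata["hash"] == "abc123"
print(build_filter({"count": 5}))        # metadata["count"] == 5
```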

2. Direct Query Method

The collection.query() method (not iterator):

  • Properly applies JSON metadata filters
  • Returns results synchronously in a single call
  • Respects the filter expression exactly

We confirmed this works because:

  • The multitenancy implementation uses query() and has NO issues
  • Testing showed query() returns 0 results when no hash matches (correct!)
  • Testing showed query_iterator() returns all documents ignoring the filter (broken!)
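
The call shape of a direct query might look roughly like the sketch below. The `filter=` keyword is taken from the diff fragment in this thread; everything else is an assumption. `find_by_hash` is a hypothetical helper, and `FakeCollection` is a test double that applies the hash filter itself so the example runs without a Milvus server.

```python
def find_by_hash(collection, content_hash):
    """Return rows whose metadata hash equals content_hash.

    `collection` is any object exposing query(filter=..., output_fields=...).
    """
    filter_string = f'metadata["hash"] == "{content_hash}"'
    return collection.query(filter=filter_string, output_fields=["metadata"])


class FakeCollection:
    """Test double that honors the hash filter; not a Milvus client."""

    def __init__(self, rows):
        self.rows = rows
        self.last_filter = None

    def query(self, filter, output_fields=None):
        self.last_filter = filter
        # Extract the quoted value on the right-hand side of '=='.
        wanted = filter.split("==", 1)[1].strip().strip('"')
        return [r for r in self.rows if r["metadata"].get("hash") == wanted]


fake = FakeCollection([{"id": 1, "metadata": {"hash": "abc"}}])
print(find_by_hash(fake, "nope"))      # []
print(len(find_by_hash(fake, "abc")))  # 1
```

An empty list for a non-matching hash is the behavior the duplicate check depends on.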

3. Limit Adjustment
query() requires a positive limit, while query_iterator() accepted -1 (unlimited):
limit=limit if limit > 0 else 16384 # Milvus max limit
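
The one-liner above can be written as a small helper. This is a sketch, not the merged code: `effective_limit` and `MILVUS_MAX_LIMIT` are hypothetical names, and the `None` branch is an addition, since `None > 0` raises a `TypeError` in Python 3.

```python
MILVUS_MAX_LIMIT = 16384  # per-query cap cited in the PR description


def effective_limit(limit):
    """Map None, zero, or negative limits to the cap; pass positives through."""
    if limit is None or limit <= 0:
        return MILVUS_MAX_LIMIT
    return limit


print(effective_limit(-1))    # 16384
print(effective_limit(10))    # 10
print(effective_limit(None))  # 16384
```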


Additional Information

Tested locally

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.

@pr-validator-bot

👋 Welcome and Thank You for Contributing!

We appreciate you taking the time to submit a pull request to Open WebUI!

⚠️ Important: Testing Requirements

We've recently seen an increase in PRs that have significant issues:

  • PRs that don't actually fix the bug they claim to fix
  • PRs that don't implement the feature they describe
  • PRs that break existing functionality
  • PRs that are clearly AI-generated without proper testing being done by the author
  • PRs that simply don't work as intended

These untested PRs consume significant time from maintainers and volunteer contributors who review and test PRs in their free time.
Time that could be spent testing other PRs or improving Open WebUI in other ways.

Before marking your PR as "Ready for Review":

Please explicitly confirm:

  1. ✅ You have personally tested ALL changes in this PR
  2. How you tested it (specific steps you took to verify it works)
  3. Visual evidence where applicable (screenshots or videos showing the feature/fix working) - if applicable to your specific PR

If you're not certain your PR works exactly as intended, please leave it in DRAFT mode until you've thoroughly tested it.

Thank you for helping us maintain quality and respecting the time of our community! 🙏

@Classic298
Collaborator Author

@hasanbozok @lightbringor @CrunchNick @rbx @fish-not-phish

Could you please test this PR locally too?
I tested it in my testing environment and it fixed the Milvus bug there.
But additional confirmation from all of you is needed, since you all reported it.
PLEASE test it. Otherwise this will take a long time to get merged.
It is in YOUR own interest to test it, so it can get merged more quickly.

"metadata",
],
- limit=limit, # Pass the limit directly; -1 means no limit.
+ limit=limit if limit > 0 else 16384,
Contributor

how does this handle collections with 16384+ items?

Collaborator Author

@Classic298 Classic298 Nov 30, 2025

16384 is the limit Milvus has per query.
Good catch, though. Previously we used query_iterator() here, but that function was part of the issue and was broken for our use case.

Collaborator Author

@tjbck I modified the limit to default to None if no limit is passed. This works because we now use the normal query() here and not query_iterator() as we did previously. query_iterator() did not accept None, which caused issues that were worked around by defaulting to -1.

But since we use query() now, we CAN use None, since Milvus will internally handle "no limit" being set and fetch all results.

This is exactly the same logic as in milvus_multitenancy.py, and it works fine there, just to clarify.
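
One way to express "no limit" is to leave the argument out of the call entirely when it is None. The sketch below assumes that approach; `build_query_kwargs` is a hypothetical helper, and whether the merged code omits the argument or passes None through is not shown in this thread.

```python
def build_query_kwargs(filter_string, limit=None):
    """Assemble keyword arguments for a query call, omitting limit when unset."""
    kwargs = {"filter": filter_string, "output_fields": ["metadata"]}
    if limit is not None:
        # Only forward an explicit limit; None means "let the server decide".
        kwargs["limit"] = limit
    return kwargs


print("limit" in build_query_kwargs('metadata["hash"] == "abc"'))  # False
print(build_query_kwargs('metadata["hash"] == "abc"', limit=50)["limit"])  # 50
```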

@Classic298
Collaborator Author

@hasanbozok @lightbringor @CrunchNick @rbx @fish-not-phish testing NEEDED with large datasets please.

@hasanbozok

@hasanbozok @lightbringor @CrunchNick @rbx @fish-not-phish testing NEEDED with large datasets please.

I'll be testing in 4 hours


- iterator = collection.query_iterator(
- filter=filter_string,
+ results = collection.query(

@tjbck tjbck marked this pull request as draft December 1, 2025 07:40
@rbx

rbx commented Dec 1, 2025

We only have a smaller dataset currently with a few dozen markdown files.
But I can report that your PR works properly with it:

  • All files are stored properly without duplicate alerts.
  • Adding an actual duplicate is properly rejected.
  • Retrieval works correctly.

@hasanbozok

@hasanbozok @lightbringor @CrunchNick @rbx @fish-not-phish testing NEEDED with large datasets please.

@Classic298, it is working according to my testing with our dataset.

@tjbck
Contributor

tjbck commented Dec 2, 2025

This will have to be refactored to use query_iter.

@Classic298 Classic298 marked this pull request as ready for review December 2, 2025 20:28
@Classic298
Collaborator Author

Bibbidi-bobbidi-boo, Force push master, screw review.

@Classic298
Collaborator Author

Eye of newt and toe of frog, Squash the commits and clean the log!!!!!!!!

@tjbck
Contributor

tjbck commented Dec 2, 2025

🤙

@tjbck tjbck merged commit 12f237f into open-webui:dev Dec 2, 2025
0 of 2 checks passed
@Classic298 Classic298 deleted the milvus-test branch December 2, 2025 20:39
puffinjiang pushed a commit to puffinjiang/open-webui that referenced this pull request Dec 9, 2025
lentiann added a commit to ZalaziumGmbh/anox that referenced this pull request Dec 9, 2025