
Conversation

@LRL2-ModelCloud (Collaborator)

No description provided.

@Qubitium Qubitium changed the title pass use_cache=false FIX some models not honoring model.config.use_cache by force pass use_cache=false Dec 10, 2025
@Qubitium Qubitium merged commit 95063e2 into main Dec 10, 2025
6 checks passed
@Qubitium Qubitium deleted the fix-cache branch December 10, 2025 06:29
@Qubitium (Collaborator) commented Dec 10, 2025

@SunMarc We keep encountering what appear to me to be modeling bugs where the modeling code does not honor model.config.use_cache during inference. As you know, GPTQ and AWQ need manual per-layer/module forwarding, so we call the layers ourselves, and since it is a single pass, the k/v cache is wasteful and causes what users perceive as a "memory leak" when it is really just the k/v cache doing its job.

The problem is that we already set/override model.config.use_cache to False on load, yet some model-specific modeling code, for example Qwen3Omni, completely ignores the model.config.use_cache property (never checks it) during the forward passes of its internal modules.
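For context, here is a minimal sketch of the kind of workaround this PR applies. The layer layout and the `forward_layers` helper are illustrative assumptions, not the actual GPTQModel code:

```python
import torch

@torch.no_grad()
def forward_layers(model, hidden_states, attention_mask=None, position_ids=None):
    """Single quantization pass over the decoder layers with caching disabled."""
    # Set the config flag, even though some modeling code ignores it...
    model.config.use_cache = False

    for layer in model.model.layers:  # assumes a LLaMA-style decoder layout
        outputs = layer(
            hidden_states,
            attention_mask=attention_mask,
            position_ids=position_ids,
            use_cache=False,  # ...and force-pass it so no k/v cache is ever allocated
        )
        # Decoder layers typically return a tuple with the hidden states first.
        hidden_states = outputs[0] if isinstance(outputs, tuple) else outputs
    return hidden_states
```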

I am not sure if this is a bug or a feature; I am leaning toward a bug.

  1. model.config.use_cache can be set on the model.
  2. On a generate call, one can also pass a generation config that may contain a use_cache override (see the short example after this list).
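A minimal illustration of those two override points, assuming a standard transformers CausalLM and a tokenized batch `inputs`:

```python
# 1) model-level default
model.config.use_cache = False

# 2) per-call override: kwargs passed to generate() update the generation config
out = model.generate(**inputs, max_new_tokens=8, use_cache=True)
```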

But my thinking is that if model.config.use_cache is set, modeling code should honor it unless overridden by the user on a per-generate call. Is this correct? Thanks

@SunMarc (Contributor) commented Dec 10, 2025

> But my thinking is that if model.config.use_cache is set, modeling code should honor it unless overridden by the user on a per-generate call. Is this correct? Thanks

Yeah, that's how it should work! If you can open an issue that lists the models that have this problem, with a reproducer if possible, I can try to investigate and fix it.

