
Conversation

@LRL2-ModelCloud (Collaborator)

No description provided.

@Qubitium Qubitium changed the title pass use_cache=false FIX some models not honoring model.config.use_cache by force pass use_cache=false Dec 10, 2025
@Qubitium Qubitium merged commit 95063e2 into main Dec 10, 2025
6 checks passed
@Qubitium Qubitium deleted the fix-cache branch December 10, 2025 06:29
@Qubitium (Collaborator) commented Dec 10, 2025

@SunMarc We keep encountering what appear to me to be modeling bugs where the modeling code does not honor model.config.use_cache during inference. As you know, GPTQ and AWQ need manual per-layer/module forwarding, so we call the layers ourselves, and since it is a single pass, the k/v cache is wasteful and causes what users perceive as a "memory leak" when it is really just the k/v cache doing its job.

The problem is that we already set/override model.config.use_cache to False on load, yet some model-specific modeling code, for example Qwen3Omni, completely ignores the model.config.use_cache property (never checks it) during the forward passes of its internal modules.
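For context, here is a minimal sketch of the kind of workaround this PR applies. The layer layout and the `forward_layers` helper are illustrative assumptions, not the actual GPTQModel code:

```python
import torch

@torch.no_grad()
def forward_layers(model, hidden_states, attention_mask=None, position_ids=None):
    """Single quantization pass over the decoder layers with caching disabled."""
    # Set the config flag, even though some modeling code ignores it...
    model.config.use_cache = False

    for layer in model.model.layers:  # assumes a LLaMA-style decoder layout
        outputs = layer(
            hidden_states,
            attention_mask=attention_mask,
            position_ids=position_ids,
            use_cache=False,  # ...and force-pass it so no k/v cache is ever allocated
        )
        # Decoder layers typically return a tuple with the hidden states first.
        hidden_states = outputs[0] if isinstance(outputs, tuple) else outputs
    return hidden_states
```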

I am not sure if this is a bug or a feature; I am leaning toward a bug.

  1. model.config.use_cache can be set on the model.
  2. On a generate call, one can also pass a generation config that may contain a use_cache override (see the short example after this list).
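A minimal illustration of those two override points, assuming a standard transformers CausalLM and a tokenized batch `inputs`:

```python
# 1) model-level default
model.config.use_cache = False

# 2) per-call override: kwargs passed to generate() update the generation config
out = model.generate(**inputs, max_new_tokens=8, use_cache=True)
```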

But my thinking is that if model.config.use_cache is set, modeling code should honor it unless overridden by the user on a per-generate call. Is this correct? Thanks

@SunMarc (Contributor) commented Dec 10, 2025

> But my thinking is that if model.config.use_cache is set, modeling code should honor it unless overridden by the user on a per-generate call. Is this correct? Thanks

Yeah, that's how it should work! If you can open an issue that lists the models that have this problem, with a reproducer if possible, I can try to investigate and fix it.

