GPT-QModel v5.6.0
Notable Changes:
- HF Kernel for CPU: AMX, AVX2, AVX512 optimized by @jiqing-feng in #2232
- Fix: Resolve performance regression during initial forward pass with offload_to_disk by @avtc in #2239
- Auto module tree by @LRL2-ModelCloud in #2204
- Afmoe support by @LRL2-ModelCloud in #2243
- Add dots1 by @Qubitium in #2231
What's Changed
- Update description and code about GPTAQ in README.md by @wayneguow in #2202
- Update test cases for qwen2.5-vl and qwen3-vl by @wayneguow in #2203
- Optimize minimax m2 modelling forward pass by @avtc in #2176
- remove gemm ipex by @LRL2-ModelCloud in #2206
- Bump actions/checkout from 5 to 6 in the github-actions group by @dependabot[bot] in #2207
- Update device-smi dependency version to 0.5.2 by @Qubitium in #2208
- Fix loading an AWQ-quantized model with GPTQModel when it is not actu… by @LRL2-ModelCloud in #2209
- fix exllama v2 post init by @LRL2-ModelCloud in #2211
- [FIX] Add fallback for "module_dir" and "entry key" lookup by @ZX-ModelCloud in #2210
- Update unit_tests.yml by @Qubitium in #2213
- fix mps backend does not implement float64 by @Qubitium in #2216
- [FIX] _apply_quant() not being called with awq by @ZX-ModelCloud in #2218
- Fix AWQ Extension by @LRL2-ModelCloud in #2217
- Auto AWQ kernel selection for Transformers compat by @Qubitium in #2214
- Fix add bias for torch_fuse by @jiqing-feng in #2223
- [CI] Add torch_fused test with Bias by @ZX-ModelCloud in #2222
- [FIX] device_map with cpu only causing CpuOffload hooks to be injected by @ZX-ModelCloud in #2225
- fix awq apply_scale and apply_clip multi thread issue by @LRL2-ModelCloud in #2224
- Fix CI test not passing by @Qubitium in #2226
- Monkeypatch lm-eval latest broken imports by @Qubitium in #2227
- Make test file callable via pytest by @CSY-ModelCloud in #2228
- CI Fix awq weight mean by @LRL2-ModelCloud in #2229
- Fix PyCharm auto-importing the wrong path by @CSY-ModelCloud in #2230
- [FIX] TorchFusedAwqQuantLinear selection by @ZX-ModelCloud in #2233
- [CI] update CI path by @CSY-ModelCloud in #2236
- [Model] Mistral3 support by @LRL2-ModelCloud in #2238
- Update setup.py by @Qubitium in #2240
- Increase MAX_JOBS from 4 to 8 in release.yml by @Qubitium in #2241
- [FIX] non-persistent buffer was saved incorrectly by @ZX-ModelCloud in #2242
New Contributors
- @wayneguow made their first contribution in #2202