GPT-QModel v5.6.0
Notable Changes:
- HF Kernel for CPU: AMX, AVX2, AVX512 optimized by @jiqing-feng in #2232
- Fix: Resolve performance regression during initial forward pass with offload_to_disk by @avtc in #2239
- Auto module tree by @LRL2-ModelCloud in #2204
- Afmoe support by @LRL2-ModelCloud in #2243
- Add dots1 by @Qubitium in #2231
What's Changed
- Update description and code about GPTAQ in README.md by @wayneguow in #2202
- Update test cases for qwen2.5-vl and qwen3-vl by @wayneguow in #2203
- Optimize minimax m2 modelling forward pass by @avtc in #2176
- remove gemm ipex by @LRL2-ModelCloud in #2206
- Bump actions/checkout from 5 to 6 in the github-actions group by @dependabot[bot] in #2207
- Update device-smi dependency version to 0.5.2 by @Qubitium in #2208
- Fix loading an AWQ-quantized model with GPTQModel when it is not actu… by @LRL2-ModelCloud in #2209
- fix exllama v2 post init by @LRL2-ModelCloud in #2211
- [FIX] Add fallback for "module_dir" and "entry key" lookup by @ZX-ModelCloud in #2210
- Update unit_tests.yml by @Qubitium in #2213
- fix mps backend does not implement float64 by @Qubitium in #2216
- [FIX] _apply_quant() not being called with awq by @ZX-ModelCloud in #2218
- Fix AWQ Extension by @LRL2-ModelCloud in #2217
- Auto AWQ kernel selection for Transformers compat by @Qubitium in #2214
- Fix add bias for torch_fuse by @jiqing-feng in #2223
- [CI] Add torch_fused test with Bias by @ZX-ModelCloud in #2222
- [FIX] device_map with cpu only causing CpuOffload hooks to be injected by @ZX-ModelCloud in #2225
- fix awq apply_scale and apply_clip multi thread issue by @LRL2-ModelCloud in #2224
- Fix CI test not passing by @Qubitium in #2226
- Monkeypatch lm-eval latest broken imports by @Qubitium in #2227
- Make test file callable via pytest by @CSY-ModelCloud in #2228
- CI Fix awq weight mean by @LRL2-ModelCloud in #2229
- Fix PyCharm auto-importing the wrong path by @CSY-ModelCloud in #2230
- [FIX] TorchFusedAwqQuantLinear selection by @ZX-ModelCloud in #2233
- [CI] update CI path by @CSY-ModelCloud in #2236
- [Model] Mistral3 support by @LRL2-ModelCloud in #2238
- Update setup.py by @Qubitium in #2240
- Increase MAX_JOBS from 4 to 8 in release.yml by @Qubitium in #2241
- [FIX] non-persistent buffer was saved incorrectly by @ZX-ModelCloud in #2242
New Contributors
- @wayneguow made their first contribution in #2202