GPT-QModel v5.4.0
Notable Changes:
- AWQ Torch Fused Kernel by @Qubitium in #2190
- Make torch fused op compilable by @jiqing-feng in #2182
- [FIX] AWQ MoE by @ZX-ModelCloud in #2171
- add :? capture only syntax by @Qubitium in #2173
What's Changed
- Update latest news section in README.md by @Qubitium in #2166
- run forward pass even for empty subset to produce correct layer outputs by @avtc in #2161
- Reduce AWQ memory usage by @Qubitium in #2167
- AWQ update by @Qubitium in #2168
- Retry partial.to to fix accelerate invalid argument for first MoE layer (reapply) by @avtc in #2169
- AWQ update by @Qubitium in #2172
- adjust retry partial.to by @avtc in #2175
- cleanup awq_get_modules_for_scaling() by @ZX-ModelCloud in #2179
- [FIX] Qwen3 MoE sparse MoE block by @ZX-ModelCloud in #2184
- Add module convert by @LRL2-ModelCloud in #2183
- Cleanup by @Qubitium in #2185
- Update pypcre version to 0.2.5 by @LRL2-ModelCloud in #2186
- Update pypcre version to 0.2.5 by @Qubitium in #2189
- [FIX] version("triton") crash on torch+xpu by @ZX-ModelCloud in #2188
Full Changelog: v5.2.0...v5.4.0