Releases · ModelCloud/GPTQModel
GPT-QModel v5.6.6
Notable Changes:
- Use static cuda ctx for triton kernel launch by @Qubitium in #2269
- Remove random-word dependency by @LRL2-ModelCloud in #2266
- Update PyPcre dependency from 0.2.7 to 0.2.8 by @Qubitium in #2267
What's Changed
- Bump the github-actions group with 2 updates by @dependabot[bot] in #2265
- Update version.py by @Qubitium in #2268
- Ready 5.6.6 by @Qubitium in #2270
Full Changelog: v5.6.2...v5.6.6
GPT-QModel v5.6.4
What's Changed
- Bump the github-actions group with 2 updates by @dependabot[bot] in #2265
- remove random-word dependency by @LRL2-ModelCloud in #2266
- Update pypcre version from 0.2.7 to 0.2.8 by @Qubitium in #2267
- Update version.py by @Qubitium in #2268
Full Changelog: v5.6.2...v5.6.4
GPT-QModel v5.6.2
Notable Changes
- FIX JIT PyTorch extension pack_cpu_ext stall by @ZX-ModelCloud in #2248
- Refactor Kernel External Dependency Validation by @LRL2-ModelCloud in #2249
- FIX some models not honoring model.config.use_cache by force-passing use_cache=False by @LRL2-ModelCloud in #2246
- FIX Incorrect Triton dequant_kernel for 3-bit GPTQ (INT3) leading to Triton compile error / wrong dequantization in #2251
- Support llm-awq by @ZX-ModelCloud in #2252
What's Changed
- Update version.py by @Qubitium in #2247
- Update README.md by @davedgd in #2250
- [CI] add torch 2.9.1 by @CSY-ModelCloud in #2254
- … by @KingdalfGoodman in #2258
- Update license declaration in pyproject.toml by @CSY-ModelCloud in #2259
- Modify setup by @Qubitium in #2260
- Add release notes for version 5.6.2 by @Qubitium in #2261
- fix test_quant_formats.py by @LRL2-ModelCloud in #2262
- [CI] mount dataset dir to /monster/data/model/dataset by @CSY-ModelCloud in #2263
- fix parsing args by @CSY-ModelCloud in #2264
New Contributors
- @KingdalfGoodman made their first contribution in #2258
Full Changelog: v5.6.0...v5.6.2
GPT-QModel v5.6.0
Notable Changes:
- HF Kernel for CPU: AMX, AVX2, AVX512 optimized by @jiqing-feng in #2232 (loading sketch after this list)
- Fix: Resolve performance regression during initial forward pass with offload_to_disk by @avtc in #2239
- Auto module tree by @LRL2-ModelCloud in #2204
- Afmoe support by @LRL2-ModelCloud in #2243
- Add dots1 by @Qubitium in #2231
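The CPU kernel work above is selectable at load time. A minimal loading sketch, assuming BACKEND.TORCH_FUSED is the selector for the fused CPU kernel these notes reference elsewhere, and using a hypothetical model id:

```python
# Minimal sketch: load a quantized checkpoint on CPU so the AMX/AVX-optimized
# kernel path is exercised. BACKEND.TORCH_FUSED and the model id are assumptions.
from gptqmodel import GPTQModel, BACKEND

model = GPTQModel.load(
    "ModelCloud/example-4bit-gptq",  # hypothetical quantized model id
    device="cpu",
    backend=BACKEND.TORCH_FUSED,     # assumed enum member for the fused CPU kernel
)
```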
What's Changed
- Update description and code about GPTAQ in README.md by @wayneguow in #2202
- Update test cases for qwen2.5-vl and qwen3-vl by @wayneguow in #2203
- Optimize minimax m2 modelling forward pass by @avtc in #2176
- remove gemm ipex by @LRL2-ModelCloud in #2206
- Bump actions/checkout from 5 to 6 in the github-actions group by @dependabot[bot] in #2207
- Update device-smi dependency version to 0.5.2 by @Qubitium in #2208
- Fix loading an AWQ-quantized model with GPTQModel when it is not actu… by @LRL2-ModelCloud in #2209
- fix exllama v2 post init by @LRL2-ModelCloud in #2211
- [FIX] Add fallback for "module_dir" and "entry key" lookup by @ZX-ModelCloud in #2210
- Update unit_tests.yml by @Qubitium in #2213
- fix mps backend does not implement float64 by @Qubitium in #2216
- [FIX] _apply_quant() not being called with awq by @ZX-ModelCloud in #2218
- Fix AWQ Extension by @LRL2-ModelCloud in #2217
- Auto AWQ kernel selection for Transformers compat by @Qubitium in #2214
- Fix add bias for torch_fuse by @jiqing-feng in #2223
- [CI] Add torch_fused test with Bias by @ZX-ModelCloud in #2222
- [FIX] device_map with cpu only causing CpuOffload hooks to be injected by @ZX-ModelCloud in #2225
- fix awq apply_scale and apply_clip multi-thread issue by @LRL2-ModelCloud in #2224
- Fix CI test not passing by @Qubitium in #2226
- Monkeypatch lm-eval latest broken imports by @Qubitium in #2227
- make file callable via pytest by @CSY-ModelCloud in #2228
- CI Fix awq weight mean by @LRL2-ModelCloud in #2229
- fix PyCharm auto-importing the wrong path by @CSY-ModelCloud in #2230
- [FIX] TorchFusedAwqQuantLinear selection by @ZX-ModelCloud in #2233
- [CI] update CI path by @CSY-ModelCloud in #2236
- [Model] Mistral3 support by @LRL2-ModelCloud in #2238
- Update setup.py by @Qubitium in #2240
- Increase MAX_JOBS from 4 to 8 in release.yml by @Qubitium in #2241
- [FIX] non-persistent buffer was saved incorrectly by @ZX-ModelCloud in #2242
New Contributors
- @wayneguow made their first contribution in #2202
Full Changelog: v5.4.2...v5.6.0
GPT-QModel v5.4.2
Notable Changes:
- Fix double fwd regression by @Qubitium in #2198
- Add cli: gptqmodel env by @ZX-ModelCloud in #2192
- [CI] compile wheel with python -m build by @CSY-ModelCloud in #2193
What's Changed
- Start v5.5.0 devel branch (odd version) by @Qubitium in #2191
- Update version from 5.5.0 to 5.4.2 patch release by @Qubitium in #2199
- [CI] copy wheel to local dir instead of using http server by @CSY-ModelCloud in #2200
Full Changelog: v5.4.0...v5.4.2
GPT-QModel v5.4.0
Notable Changes:
- AWQ Torch Fused Kernel by @Qubitium in #2190
- Make torch fused op compilable by @jiqing-feng in #2182
- [FIX] AWQ MoE by @ZX-ModelCloud in #2171
- add :? capture-only syntax by @Qubitium in #2173
What's Changed
- Update latest news section in README.md by @Qubitium in #2166
- run forward pass even for empty subset to produce correct layer outputs by @avtc in #2161
- Reduce AWQ memory usage by @Qubitium in #2167
- Awq update by @Qubitium in #2168
- Retry partial.to to fix accelerate invalid argument for first moe layer (reapply) by @avtc in #2169
- Awq update by @Qubitium in #2172
- adjust retry partial.to by @avtc in #2175
- cleanup awq_get_modules_for_scaling() by @ZX-ModelCloud in #2179
- [FIX] qwen3 moe sparse moe block by @ZX-ModelCloud in #2184
- Add module convert by @LRL2-ModelCloud in #2183
- Cleanup by @Qubitium in #2185
- Update pypcre version to 0.2.5 by @LRL2-ModelCloud in #2186
- Update pypcre version to 0.2.5 by @Qubitium in #2189
- [FIX] version("triton") crash on torch+xpu by @ZX-ModelCloud in #2188
Full Changelog: v5.2.0...v5.4.0
GPT-QModel v5.2.0
Notable Changes:
- Minimax M2, Granite Nano, Qwen3-VL, Brumby model support
- AWQ quantization is now out of beta and fully integrated into the quantization life cycle
- New VramStrategy.Balanced property to spread MoE modules across different gpus (see the sketch after this list)
- New pure torch AWQ kernel
- New calibration_concat_separator property
- Fixed HF bug that did not save mtp layers for GLM 4.5/4.6 (air) models
- Fixed multi-gpu cuda asserts due to stream/sync
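A minimal sketch of the two new properties above used together; the placement of vram_strategy in QuantizeConfig, the VramStrategy import path, and the calibration_concat_separator kwarg are inferred from these notes and not verified against the v5.2.0 API:

```python
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.quantization.config import VramStrategy  # assumed import path

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    # Spread MoE expert modules across all visible gpus instead of filling
    # one device first (assumed config-level placement of this option).
    vram_strategy=VramStrategy.Balanced,
)

model = GPTQModel.load("zai-org/GLM-4.6", quant_config)
model.quantize(
    ["calibration sample one", "calibration sample two"],  # toy calibration data
    calibration_concat_separator="\n\n",  # assumed kwarg per these notes
)
model.save("glm-4.6-gptq-4bit")
```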
What's Changed
- try not adding mem guards for marlin kernel launch protection by @Qubitium in #2108
- MoE vram by @Qubitium in #2110
- Fix GLM 4.5/4.6 and Air not saving mtp layer after save (HF bug) by @LRL2-ModelCloud in #2109
- torchao 0.14.1 update by @Qubitium in #2111
- Test refactor by @Qubitium in #2113
- Bump the github-actions group with 2 updates by @dependabot[bot] in #2120
- [FIX] xpu unit test by @ZX-ModelCloud in #2122
- modular by @Qubitium in #2123
- update scores by @Qubitium in #2124
- Fp8 dequant by @Qubitium in #2125
- Model dequant by @Qubitium in #2126
- Fp4 e2m1 by @Qubitium in #2127
- [FIX] ovis2, compatible with transformers v4.57.1 by @ZX-ModelCloud in #2129
- fix cols padding by @LRL2-ModelCloud in #2130
- [FIX] ovis_1_6 quantization by @ZX-ModelCloud in #2131
- Minimax m2 by @Qubitium in #2128
- Fix awq marlin kernel for bf16 by @Qubitium in #2135
- [FIX] incorrect AWQ NODES by @ZX-ModelCloud in #2133
- add support_offload_to_disk check by @LRL2-ModelCloud in #2134
- Add Awq torch kernel by @Qubitium in #2137
- Marin by @Qubitium in #2139
- Marin scores by @Qubitium in #2141
- Fix triton version detection in nogil patcher by @amd-vlarakic in #2144
- Fix qwen2 omni by @LRL2-ModelCloud in #2140
- [MODEL] Add GraniteMoEHybrid by @ZX-ModelCloud in #2142
- Fold AWQ into proper Looper/Layer/Subset Lifecycle by @Qubitium in #2138
- Refine GPT-QModel description in README by @Qubitium in #2145
- fix device_map by @LRL2-ModelCloud in #2146
- [MODEL] Add Qwen3-VL by @techshoww in #2136
- Add calibration_concat_separator by @Qubitium in #2148
- add test_qwen3_vl.py by @LRL2-ModelCloud in #2147
- Fix triton monkeypatch by @Qubitium in #2149
- [MODEL] Add Brumby by @Qubitium in #2150
- Dedup/Cleanup by @Qubitium in #2151
- Prep for 5.2 release by @Qubitium in #2152
- Dedup3 by @Qubitium in #2153
- add missing file by @Qubitium in #2154
- GPTAQ rename by @Qubitium in #2155
- fix ci test by @Qubitium in #2158
- fix setup license by @Qubitium in #2160
- Fix snapshot_download receiving unsupported kwargs by @Qubitium in #2162
- Retry partial.to to fix accelerate invalid argument error for first moe layer for >4 GPU setups by @avtc in #2163
- Comments + Sync by @Qubitium in #2164
- Stats/Logs by @Qubitium in #2165
New Contributors
- @amd-vlarakic made their first contribution in #2144
- @techshoww made their first contribution in #2136
Full Changelog: v5.0.0...v5.2.0
GPT-QModel v5.0.0
Notable Changes:
- New data-parallel quant support for MoE models on multi-gpu using nogil Python (Python >= 3.13t with PYTHON_GIL=0 env)
- New offload_to_disk support enabled by default to massively reduce cpu ram usage
- New Intel optimized and AMD compatible cpu hw accelerated TorchFused kernel
- Packing stage is now 4x faster and inlined with quantization
- Vram pressure for large models reduced during quantization
- act_group_aware is now 16k+ times faster and the default when desc_act=False, for higher quality recovery without the inference penalty of desc_act=True (see the sketch after this list)
- New beta quality AWQ support with full GEMM, GEMM_Fast, and Marlin kernel support
- New LFM, Ling, Qwen3 Omni model support
- Bitblas kernel updated to support the Bitblas 0.1.0.post1 release
- Quantization is now faster with reduced vram usage. Enhanced logging support with LogBar
- And much much more...
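A minimal sketch of the v5.0.0 flow described above, assuming offload_to_disk surfaces as a load-time toggle and that desc_act=False activates the act_group_aware default; kwarg names follow these notes and may differ in the shipped API:

```python
# For data-parallel MoE quantization, run under Python >= 3.13t with
# PYTHON_GIL=0 set in the environment, per the notes above.
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,  # per these notes, act_group_aware becomes the default here
)

model = GPTQModel.load(
    "meta-llama/Llama-3.1-8B",
    quant_config,
    offload_to_disk=True,  # enabled by default per these notes; assumed kwarg name
)
model.quantize(["calibration sample one", "calibration sample two"])  # toy data
model.save("llama-3.1-8b-gptq-4bit")
```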
What's Changed
- rename torch_dtype to dtype to sync with hf transformers by @Qubitium in #1804
- drop support for python < 3.11 by @CSY-ModelCloud in #1805
- hard deprecated ipex in favor of torch_fused by @Qubitium in #1807
- update pyproject.toml by @CSY-ModelCloud in #1808
- [CI] release with 3.13t by @CSY-ModelCloud in #1811
- [QUANTIZATION] Add AWQ support by @ZX-ModelCloud in #1703
- find mapping by @LRL-ModelCloud in #1812
- Update README.md by @Qubitium in #1813
- Update version.py by @Qubitium in #1814
- Turtle in a half shell by @Qubitium in #1809
- note about memory saving by @Qubitium in #1817
- move fail_safe by @LRL-ModelCloud in #1818
- rename turtle method by @Qubitium in #1820
- add threads by @Qubitium in #1821
- remove AWQ mod defs by @ZX-ModelCloud in #1822
- [CI] use new docker by @CSY-ModelCloud in #1823
- Fix awq quantize by @LRL-ModelCloud in #1824
- [CI] use new docker for release source by @CSY-ModelCloud in #1825
- fix awq pack by @LRL-ModelCloud in #1826
- fix loading autoawq models and hf/vllm/sglang loading of newly awq qu… by @Qubitium in #1827
- wrong arg check by @Qubitium in #1828
- fix thread task var scoping by @Qubitium in #1829
- fix call param by @Qubitium in #1830
- fix threads > 1 not considered (unsafe) by @Qubitium in #1832
- cleanup by @Qubitium in #1833
- fix gptqmodel offload paths conflict by @Qubitium in #1834
- Ci test by @Qubitium in #1835
- eora: always diff in fp32 + cleanup by @Qubitium in #1836
- add register_buffer/parameter to NamedModule class by @Qubitium in #1837
- typo by @Qubitium in #1839
- add thread safety to all classes by @Qubitium in #1840
- fix fail_safe by @LRL-ModelCloud in #1844
- update marlin kernel by @ZX-ModelCloud in #1838
- fix fp32 reduce on/off by @Qubitium in #1845
- bypass marlin kernel bias issue by @Qubitium in #1846
- disable marlin atomics by default as it failed ci accuracy test by @Qubitium in #1847
- [FIX] awq marlin by @ZX-ModelCloud in #1816
- cleanup var names by @Qubitium in #1849
- pack per module by @LRL-ModelCloud in #1842
- [CI] use new docker by @CSY-ModelCloud in #1850
- tweak eora test by @Qubitium in #1851
- wait for thread tasks only when every module has completed. by @Qubitium in #1852
- [FIX] Compatible with vllm v0.10.2 by @ZX-ModelCloud in #1855
- move req.txt into toml by @CSY-ModelCloud in #1858
- do not create buffers only to overwrite them by @Qubitium in #1857
- pop states after use by @Qubitium in #1859
- [FIX] multiple "register_buffers" parameters by @ZX-ModelCloud in #1860
- Low memory pack by @Qubitium in #1861
- fix packing ci test by @Qubitium in #1862
- simplify by @Qubitium in #1853
- Fix 3bit packing regression in previous commit by @Qubitium in #1863
- remove deprecated parallel_packing property by @Qubitium in #1864
- Fix qqq quant/offloading by @Qubitium in #1866
- temp disable awq gemm kernel due to failing ci by @Qubitium in #1867
- update vllm compat by @Qubitium in #1869
- fix regression by @Qubitium in #1870
- fix setup.py crashing because torch may not support float8_e8m0fnu by @CSY-ModelCloud in #1871
- [FIX] AwqGEMMQuantLinear skip gptq_v1 convert to v2 by @ZX-ModelCloud in #1872
- Fix awq gemm auto kernel selection order by @Qubitium in #1873
- Update README.md by @Qubitium in #1874
- reduce forwarding to minimal by @Qubitium in #1876
- Update README.md by @Qubitium in #1877
- fix exllama tests by @Qubitium in #1879
- debug print all params/buffers by @Qubitium in #1880
- skip internal loading of non-pkg compatible quantization models, i.e.… by @Qubitium in #1881
- Loader by @Qubitium in #1882
- Cleanup awq by @Qubitium in #1883
- remove broken test by @Qubitium in #1884
- [CI] remove old cuda/torch support for release by @CSY-ModelCloud in #1885
- fix loader by @LRL-ModelCloud in #1886
- fix nvcc warnings about pending cuda > 13.x compat by @Qubitium in #1887
- fix packing speed test by @Qubitium in #1889
- fix licenses warning by @CSY-ModelCloud in #1888
- set licenses to apache by @CSY-ModelCloud in #1890
- [FIX] AwqGEMMQuantLinear should be a PackableQuantLinear by @ZX-ModelCloud in #1891
- skip modules that have no parameters and no buffers since they can't be offloaded by @LRL-ModelCloud in #1892
- skip modules that have no parameters and no buffers since they can't offload by @LRL-ModelCloud in #1894
- Fix device check by @Qubitium in #1896
- [CI] disable test install by @CSY-ModelCloud in #1895
- remove hash feature by @Qubitium in #1897
- fix cuda ext cannot be loaded by @Qubitium in #1898
- lock numpy to 2.2.6 by @CSY-ModelCloud in #1899
- [FIX] test_lm_eval.py by @ZX-ModelCloud in #1900
- Patch fix model save by @Qubitium in #1901
- Ugly patch save 2 by @Qubitium in #1902
- fix potential leak by @Qubitium in #1904
- [FIX] test_integration by @ZX-ModelCloud in #1903
- fix build uploading an empty wheel by @CSY-ModelCloud in #1905
- fix lm_head quant by @LRL-ModelCloud in #1906
- batch tweaks by @Qubitium in #1907
- [FIX] test_kernel_output_torch_fused by @ZX-ModelCloud in ...
GPT-QModel v4.2.5
What's Changed
- Cleanup hyb_act by @Qubitium in #1791
- Remove torch import in setup.py by @Qubitium in #1729
- Refactor: rename hyb_act to act_group_aware by @Qubitium in #1794
- Cleanup by @Qubitium in #1795, #1796
- [CI] Add torch 2.8.0 by @CSY-ModelCloud in #1797
- [CI] torch-2.6.0+cu128-python-3.9 does not exist by @CSY-ModelCloud in #1798
- Fix wf_unsqueeze_zero and wf_unsqueeze_neg_one by @LRL-ModelCloud in #1799
- GAR field save to meta on quant save by @Qubitium in #1800
- Add pyproject.toml by @CSY-ModelCloud in #1801
- [CI] Don't detect arch list when it has already been set & fix build-system requirements by @CSY-ModelCloud in #1802
Full Changelog: v4.2.0...v4.2.5
GPT-QModel v4.2.0
Notable Changes
- Add Qwen3-Next by @Qubitium and @LRL-ModelCloud in #1787
- Add Apertus support by @LRL-ModelCloud in #1767
- Add Kimi k2 support by @LRL-ModelCloud in #1768
- Add Klear support by @LRL-ModelCloud in #1769
- Add FastLLM support by @LRL-ModelCloud in #1771
- Add Nemotron H support by @LRL-ModelCloud in #1773
- Add fail_safe option by @LRL-ModelCloud in #1775 (see the sketch after this list)
- Use threading lock to protect unsafe tensor moves in multi-gpu by @Qubitium in #1778
- Avoid building experimental extensions to reduce wheel size by @Qubitium in #1763
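A minimal sketch of the fail_safe option noted above; whether it lives on quantize() or on the config is an assumption, as is the model id:

```python
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load("Qwen/Qwen3-Next-80B-A3B-Instruct", quant_config)

# fail_safe aims to keep long multi-gpu quant runs going past recoverable
# per-module failures (assumed kwarg placement on quantize()).
model.quantize(["calibration sample"], fail_safe=True)
model.save("qwen3-next-gptq-4bit")
```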
What's Changed
- Fix LlavaQwen2GPTQ by @LRL-ModelCloud in #1772
- Fix Q.to on multi-gpu gptq when proceeding fast and has many experts and gpus by @avtc in #1774
- Bump actions/setup-python from 5 to 6 in the github-actions group by @dependabot[bot] in #1758
- [CI] fix release jobs were skipped by @CSY-ModelCloud in #1759
- ignore compile warns about var declared but not used by @Qubitium in #1760
- allow prebuilt wheel path to be customized via env by @Qubitium in #1761
- add build toggles for all cpp kernels by @Qubitium in #1764
- fix multi gpu inference by @LRL-ModelCloud in #1762
- [CI] reduce wheel download size by @CSY-ModelCloud in #1765
- start 4.2.0-dev cycle by @Qubitium in #1766
- fix klear by @LRL-ModelCloud in #1770
- FIX transformers >= 4.56.1 force-changed torch.default_dtype by @Qubitium in #1779
- fix multi gpu fail_safe by @LRL-ModelCloud in #1780
- fix device instance by @LRL-ModelCloud in #1783
- prepare for 4.2 release by @Qubitium in #1785
Full Changelog: v4.1.0...v4.2.0