Conversation

@CSY-ModelCloud (Collaborator)

@Qubitium Qubitium merged commit 6cb9591 into main Sep 17, 2025
5 checks passed
@Qubitium Qubitium deleted the CSY/drop-py3.9 branch September 17, 2025 01:45
Qubitium pushed a commit that referenced this pull request Sep 17, 2025
* drop support for python < 3.11

* [CI] remove release actions for py < 3.11
Qubitium added a commit that referenced this pull request Sep 18, 2025
* add awq code

Signed-off-by: ZX-ModelCloud <[email protected]>

* add awq code

Signed-off-by: ZX-ModelCloud <[email protected]>

* config add "zero_point" field

Signed-off-by: ZX-ModelCloud <[email protected]>
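A minimal sketch of what a config carrying the new "zero_point" field could look like. Only `zero_point` comes from the commit above; the class name and the other fields are illustrative assumptions, not the project's actual API.

```python
from dataclasses import dataclass, asdict

# Illustrative only: class name and fields other than "zero_point" are assumed.
@dataclass
class AwqQuantizeConfig:
    bits: int = 4
    group_size: int = 128
    zero_point: bool = True  # asymmetric quantization stores a per-group zero point

    def to_dict(self) -> dict:
        # serialize for writing into the model's quantize config JSON
        return asdict(self)

cfg = AwqQuantizeConfig()
```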

* add awq kernels

Signed-off-by: ZX-ModelCloud <[email protected]>

* add AWQuantLinear

Signed-off-by: ZX-ModelCloud <[email protected]>

* add awq_processor.py

Signed-off-by: ZX-ModelCloud <[email protected]>

* loop_processor added pre_quantize(self, module: Module, device: torch.device)

Signed-off-by: ZX-ModelCloud <[email protected]>
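A dependency-free stand-in for the `pre_quantize(self, module, device)` hook named above. The real hook takes a `torch.nn.Module` and a `torch.device`; strings are used here so the sketch stays self-contained, and the class body is an assumption.

```python
# Illustrative stand-in for the loop-processor hook; real types are
# torch.nn.Module / torch.device, and the real body lives in gptqmodel.
class LoopProcessor:
    def __init__(self) -> None:
        self.prepared = []

    def pre_quantize(self, module, device):
        # Record that the module was staged on the target device before
        # the quantization loop touches it.
        self.prepared.append((module, device))
        return module

proc = LoopProcessor()
proc.pre_quantize("model.layers.0.self_attn.q_proj", "cuda:0")
```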

* fix init_quant()

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* Fixed the issue where _module_forward() was too slow to execute

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix OOM

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix save AWQ quantized model

Signed-off-by: ZX-ModelCloud <[email protected]>

* AWQProcessor add log stats

Signed-off-by: ZX-ModelCloud <[email protected]>

* AWQProcessor add calculate_w_wq_diff

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* added awq code

Signed-off-by: ZX-ModelCloud <[email protected]>

* added AWQ format

Signed-off-by: ZX-ModelCloud <[email protected]>

* select_quant_linear() added "quant_method" argument

Signed-off-by: ZX-ModelCloud <[email protected]>
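A hedged sketch of how a dispatcher like `select_quant_linear()` might branch on the new `quant_method` argument; the mapping values and the reduced signature are placeholders, not the library's real return types.

```python
# Placeholder mapping; the real dispatcher returns kernel classes, not strings.
_QUANT_LINEAR_BY_METHOD = {
    "gptq": "TorchQuantLinear",
    "awq": "AWQuantLinear",
}

def select_quant_linear(quant_method: str = "gptq") -> str:
    # Branch on the quantization method so AWQ checkpoints pick AWQ kernels.
    try:
        return _QUANT_LINEAR_BY_METHOD[quant_method]
    except KeyError:
        raise ValueError(f"unsupported quant_method: {quant_method!r}") from None
```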

* added AWQuantLinear_EXLLAMA

Signed-off-by: ZX-ModelCloud <[email protected]>

* added AWQuantLinear_ExllamaV2

Signed-off-by: ZX-ModelCloud <[email protected]>

* added AWQuantLinear_IPEX

Signed-off-by: ZX-ModelCloud <[email protected]>

* added AWQuantLinear_GEMV and AWQuantLinear_GEMVFast

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix setup.py

Signed-off-by: ZX-ModelCloud <[email protected]>

* remove gptqmodel_ext/awq/exllama and exllamav2

Signed-off-by: ZX-ModelCloud <[email protected]>

* add AWQuantLinear_Marlin

Signed-off-by: ZX-ModelCloud <[email protected]>

* Move AWQ's llama model definition

Signed-off-by: ZX-ModelCloud <[email protected]>

* remove hf transformer version check. always true.

Signed-off-by: Qubitium <[email protected]>

* add comments

Signed-off-by: Qubitium <[email protected]>

* add comments

Signed-off-by: Qubitium <[email protected]>

* template for dynamic awq rules

Signed-off-by: Qubitium <[email protected]>

* reset

Signed-off-by: Qubitium <[email protected]>

* fix depth

Signed-off-by: Qubitium <[email protected]>

* cleanup last_module

Signed-off-by: Qubitium <[email protected]>

* fix last non-quantized module not stripped for `!`

Signed-off-by: Qubitium <[email protected]>

* allow non-quantized modules to be part of a subset, for models that have executing but non-quantized modules within the same subset

Signed-off-by: Qubitium <[email protected]>

* fix get_layers_for_scaling()

Signed-off-by: ZX-ModelCloud <[email protected]>

* BaseGPTQModel add awq_get_modules_for_scaling()

Signed-off-by: ZX-ModelCloud <[email protected]>

* unify module declaration with new tree

Signed-off-by: Qubitium <[email protected]>

* fix wrong tree passed

Signed-off-by: Qubitium <[email protected]>

* fix `!` skipped

Signed-off-by: Qubitium <[email protected]>

* fix awq_get_modules_for_scaling()

Signed-off-by: ZX-ModelCloud <[email protected]>

* comment out assert_awq_linear()

Signed-off-by: ZX-ModelCloud <[email protected]>

* If the model uses GQA (Grouped Query Attention), attention out will be skipped.

Signed-off-by: ZX-ModelCloud <[email protected]>
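The GQA skip above can be sketched as a simple head-count check: with fewer key/value heads than query heads, the k/v projection output shape no longer matches the attention-out input, so that scaling group is skipped. The helper name is hypothetical.

```python
def is_gqa(num_attention_heads: int, num_key_value_heads: int) -> bool:
    # GQA shares k/v heads across query-head groups; unequal counts mean the
    # v_proj output shape differs from o_proj's expected input, so the
    # attention-out scaling pair cannot be formed and is skipped.
    return num_attention_heads != num_key_value_heads
```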

* refactor and move dynamic layer modules code inside base. expose `simple_layer_modules()` and `full_layer_modules()` api

Signed-off-by: Qubitium <[email protected]>

* fix: need to use classmethod for helpers

Signed-off-by: Qubitium <[email protected]>

* refactor: moe module list creation

Signed-off-by: Qubitium <[email protected]>

* refactor: moe module list creation

Signed-off-by: Qubitium <[email protected]>

# Conflicts:
#	gptqmodel/models/base.py

* mod qwen3_moe

* use model_config

* Fix missing parameter: fail_safe

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix

* cleanup

* rename attention_out_module to shape_must_match_previous

Signed-off-by: Qubitium <[email protected]>

* dedup: embed_modules. merge with base_modules

Signed-off-by: Qubitium <[email protected]>

* fix moe

* fix qwen3 moe

* dedup: remove `layers_node` property. dynamic generate it from tree

Signed-off-by: Qubitium <[email protected]>
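One way the dropped `layers_node` property could be derived from the module tree, as the commit above describes. The `"#"` per-layer marker and the tree shape are assumptions for illustration; the project's actual tree encoding may differ.

```python
def layers_node_from_tree(tree) -> str:
    # Assumed convention: the path prefix before the per-layer index marker
    # "#" names the container holding the repeated decoder layers.
    path = []
    for node in tree:
        if node == "#":
            break
        path.append(node)
    return ".".join(path)
```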

* fix group

* qwen3-moe support AWQ

Signed-off-by: ZX-ModelCloud <[email protected]>

* add filter_not_quantize_module()

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* full_layer_modules() also needs to generate moe modules

Signed-off-by: ZX-ModelCloud <[email protected]>

* get the first layer to determine layer type

* qwen3_moe declares "shape_must_match_previous"

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix moe modules

* rename BaseGPTQModel to BaseQModel

Signed-off-by: Qubitium <[email protected]>

* rename model defs

Signed-off-by: Qubitium <[email protected]>

* rename model defs

Signed-off-by: Qubitium <[email protected]>

* remove static layer_type

Signed-off-by: Qubitium <[email protected]>

* deprecate old api

Signed-off-by: Qubitium <[email protected]>

* dynamically get base_modules

Signed-off-by: Qubitium <[email protected]>

* use ugly long name for clearer meaning

Signed-off-by: Qubitium <[email protected]>

* dedup llama defs

Signed-off-by: Qubitium <[email protected]>

* fix prop removed elsewhere but missed in base

Signed-off-by: Qubitium <[email protected]>

* build_moe_modules_if_need() add "is_awq_quantize" argument

Signed-off-by: ZX-ModelCloud <[email protected]>

* awq_get_modules_for_scaling() needs to skip the "mlp.gate" module

Signed-off-by: ZX-ModelCloud <[email protected]>
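The exclusion described above can be sketched as a name filter: in MoE blocks, `mlp.gate` is the expert router, not a quantizable linear, so it is dropped from the scaling groups. The helper name is hypothetical.

```python
def filter_scaling_modules(names):
    # "mlp.gate" routes tokens to experts and is not quantized, so it must
    # not appear in any AWQ scaling group.
    return [n for n in names if not n.endswith("mlp.gate")]
```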

* fix error: module2inspect is None

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix model load

* Only the first node needs kwargs

Signed-off-by: ZX-ModelCloud <[email protected]>

* rename `torch_dtype` to `dtype` to sync with hf transformers (#1804)

Signed-off-by: Qubitium <[email protected]>
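A small sketch of backward-compatible handling for the `torch_dtype` to `dtype` rename synced from hf transformers: accept the legacy key but store it under the new one. The helper name is hypothetical.

```python
def normalize_load_kwargs(kwargs: dict) -> dict:
    # Accept the legacy transformers spelling "torch_dtype" but normalize it
    # to the new "dtype" key; an explicit "dtype" always wins.
    kwargs = dict(kwargs)
    if "torch_dtype" in kwargs and "dtype" not in kwargs:
        kwargs["dtype"] = kwargs.pop("torch_dtype")
    return kwargs
```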

* drop support for python < 3.11 (#1805)

* drop support for python < 3.11

* [CI] remove release actions for py < 3.11

* hard deprecated ipex. Intel has deprecated ipex in favor of torch fused kernel for pytorch >= 2.8 (#1807)

Signed-off-by: Qubitium <[email protected]>
# Conflicts:
#	gptqmodel/models/base.py
#	gptqmodel/models/loader.py
#	gptqmodel/utils/importer.py
#	gptqmodel/utils/model.py

* clean

Signed-off-by: Qubitium <[email protected]>

* rename

Signed-off-by: Qubitium <[email protected]>

* rename

Signed-off-by: Qubitium <[email protected]>

* fix group

* update _layers_modules_tree

* Fix awq_get_modules_for_scaling() error regarding "mlp.experts.{i}.down_proj"

Signed-off-by: ZX-ModelCloud <[email protected]>

* update

* fix inp shape error

Signed-off-by: ZX-ModelCloud <[email protected]>

* update

* fix

* fix module shape error

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix

* fix

* fix

* clean

* update

* update

* update

* Adjust the order of q/k/v and gate/up

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix wrong quant_method

Signed-off-by: ZX-ModelCloud <[email protected]>

* add test_awq_moe.py

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

* cleanup

* fix deepseekv2/v3

* cleanup

* fix layer0

* Fix FORMAT.GEMV

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix norm

* fix norm

* rename

* format

* format

* rename

* Fix FORMAT.GEMV_FAST

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

---------

Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: Qubitium <[email protected]>
Co-authored-by: Qubitium <[email protected]>
Co-authored-by: LRL-ModelCloud <[email protected]>
Co-authored-by: CSY-ModelCloud <[email protected]>