[GPTQ] Add support for block-wise quantization #2520

@brian-dellabetta

Description

Is your feature request related to a problem? Please describe.
GPTQ currently does not support QuantizationStrategy.BLOCK. This is most likely because it predates the addition of block-wise quantization strategy to compressed-tensors/llm-compressor and its continued usage in DeepSeek models. I don't see any reason why it would be incompatible with GPTQ, and I'd like to see how it would GPTQ would perform vs. round-to-nearest on a DeepSeek3.2 NVFP4+FP8_BLOCK model

Describe the solution you'd like
Add a case for the block strategy to the GPTQ algorithm here
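A minimal sketch of what the block case would need to compute: one quantization scale per 2-D weight block, analogous to the per-group handling GPTQ already has. The helper name and reshape-based block reduction below are illustrative assumptions, not llm-compressor's actual API; 448 is the max representable value of the FP8 E4M3 format targeted by FP8_BLOCK.

```python
import numpy as np

def blockwise_scales(weight: np.ndarray, block: tuple) -> np.ndarray:
    """Hypothetical helper: compute one absmax-based scale per
    (block_rows x block_cols) block of a 2-D weight matrix."""
    br, bc = block
    rows, cols = weight.shape
    assert rows % br == 0 and cols % bc == 0, "weight must tile evenly"
    # View the weight as (n_row_blocks, br, n_col_blocks, bc) and
    # reduce the absolute max over the within-block dims (1 and 3).
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    absmax = np.abs(blocks).max(axis=(1, 3))
    # Scale so each block's largest magnitude maps to FP8 E4M3 max (448).
    return absmax / 448.0

w = np.arange(16, dtype=np.float32).reshape(4, 4)
scales = blockwise_scales(w, (2, 2))
# one scale per 2x2 block -> scales has shape (2, 2)
```

Inside GPTQ, each column's quantization step would then look up the scale for the block containing that column, rather than a per-channel or per-group scale.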

Additional context
The PR should include some basic benchmarks comparing GPTQ FP8_BLOCK to round-to-nearest, hopefully showing an advantage.

Metadata

Labels

enhancement: New feature or request
good first issue: A good first issue for users wanting to contribute
