[GPTQ] Add support for block-wise quantization #2520

@brian-dellabetta

Description

Is your feature request related to a problem? Please describe.
GPTQ currently does not support QuantizationStrategy.BLOCK. This is most likely because it predates the addition of block-wise quantization strategy to compressed-tensors/llm-compressor and its continued usage in DeepSeek models. I don't see any reason why it would be incompatible with GPTQ, and I'd like to see how it would GPTQ would perform vs. round-to-nearest on a DeepSeek3.2 NVFP4+FP8_BLOCK model

Describe the solution you'd like
Add a case for the block strategy to the GPTQ algorithm here
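A minimal sketch of what the block case would need to compute: one quantization scale per 2-D weight block, analogous to the per-group handling GPTQ already has. The helper name and reshape-based block reduction below are illustrative assumptions, not llm-compressor's actual API; 448 is the max representable value of the FP8 E4M3 format targeted by FP8_BLOCK.

```python
import numpy as np

def blockwise_scales(weight: np.ndarray, block: tuple) -> np.ndarray:
    """Hypothetical helper: compute one absmax-based scale per
    (block_rows x block_cols) block of a 2-D weight matrix."""
    br, bc = block
    rows, cols = weight.shape
    assert rows % br == 0 and cols % bc == 0, "weight must tile evenly"
    # View the weight as (n_row_blocks, br, n_col_blocks, bc) and
    # reduce the absolute max over the within-block dims (1 and 3).
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    absmax = np.abs(blocks).max(axis=(1, 3))
    # Scale so each block's largest magnitude maps to FP8 E4M3 max (448).
    return absmax / 448.0

w = np.arange(16, dtype=np.float32).reshape(4, 4)
scales = blockwise_scales(w, (2, 2))
# one scale per 2x2 block -> scales has shape (2, 2)
```

Inside GPTQ, each column's quantization step would then look up the scale for the block containing that column, rather than a per-channel or per-group scale.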

Additional context
The PR should include some basic benchmarks comparing GPTQ FP8_BLOCK to round-to-nearest, hopefully showing an advantage.

Metadata

Labels

enhancement: New feature or request
good first issue: A good first issue for users wanting to contribute
