
@KingdalfGoodman
Contributor

Summary

Fix the Triton dequant_kernel for GPTQ INT3 (bits=3):

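  • Remove Python if control flow that branched on tensor values. Triton cannot compile this, so the kernel failed with a compile error.
  • Align 3-bit dequantization with the GPTQ CPU packing format: every 32 values are packed into 3 int32 words in a 10-1-10-1-10 pattern (10 whole values, 1 value split across a word boundary, 10 whole values, 1 split value, 10 whole values).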
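The 10-1-10-1-10 layout can be sketched in pure Python as below. This is a minimal sketch that assumes unsigned 3-bit values (0..7) and a single 32-value block; pack_int3_block and unpack_int3_block are hypothetical helper names written for illustration, not functions from PackableQuantLinear or any library.

```python
def pack_int3_block(values):
    """Pack 32 unsigned 3-bit values into 3 uint32 words (10-1-10-1-10 layout).

    Hypothetical helper modeled on the layout described in this PR.
    """
    assert len(values) == 32 and all(0 <= v < 8 for v in values)
    words = [0, 0, 0]
    # Word 0: values 0-9 at bits 0..29, low 2 bits of value 10 at bits 30-31.
    for j in range(10):
        words[0] |= values[j] << (3 * j)
    words[0] |= (values[10] & 0x3) << 30
    # Word 1: bit 2 of value 10 at bit 0, values 11-20 at bits 1..30,
    # low bit of value 21 at bit 31.
    words[1] |= (values[10] >> 2) & 0x1
    for j in range(10):
        words[1] |= values[11 + j] << (3 * j + 1)
    words[1] |= (values[21] & 0x1) << 31
    # Word 2: high 2 bits of value 21 at bits 0-1, values 22-31 at bits 2..31.
    words[2] |= (values[21] >> 1) & 0x3
    for j in range(10):
        words[2] |= values[22 + j] << (3 * j + 2)
    return words


def unpack_int3_block(words):
    """Inverse of pack_int3_block: recover the 32 values from 3 words."""
    w0, w1, w2 = words
    out = [(w0 >> (3 * j)) & 0x7 for j in range(10)]
    # Value 10 straddles words 0 and 1.
    out.append(((w0 >> 30) & 0x3) | ((w1 & 0x1) << 2))
    out += [(w1 >> (3 * j + 1)) & 0x7 for j in range(10)]
    # Value 21 straddles words 1 and 2.
    out.append(((w1 >> 31) & 0x1) | ((w2 & 0x3) << 1))
    out += [(w2 >> (3 * j + 2)) & 0x7 for j in range(10)]
    return out


vals = [(5 * k + 3) % 8 for k in range(32)]
assert unpack_int3_block(pack_int3_block(vals)) == vals
```

Read this way, the three words form one contiguous little-endian 96-bit stream in which value k starts at bit 3k; values 10 and 21 are the only ones that cross a word boundary, which is where a dequantizer has to combine bits from two words.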

With these changes, Triton INT3 inference compiles and produces results numerically consistent with the CPU reference.
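One way to avoid branching on tensor values is to have every element run the same arithmetic. The NumPy sketch below illustrates that idea outside Triton: it joins each word with its neighbor into a 64-bit window, so the two values that straddle a word boundary need no special case. It rests on assumptions not stated in this PR (each row holds one block of 3 uint32 words, scale and zero are scalars, and dequantization is scale * (q - zero)); dequant_int3_branchless is a hypothetical name, and this is not the PR's kernel code. Inside a Triton kernel, tl.where is the usual element-wise replacement for a per-element if.

```python
import numpy as np


def dequant_int3_branchless(qweight, scale, zero):
    """Unpack rows of 3 uint32 words into 32 values each, then dequantize.

    qweight: uint32 array of shape (n_blocks, 3)
    returns: float32 array of shape (n_blocks, 32)
    Hypothetical sketch; assumes w = scale * (q - zero).
    """
    words = qweight.astype(np.uint64)
    k = np.arange(32)
    bit = 3 * k                          # value k starts at bit 3k of the 96-bit block
    w = bit // 32                        # word holding the value's lowest bit
    w_next = np.minimum(w + 1, 2)        # neighbor word, clamped to the last word
    off = (bit % 32).astype(np.uint64)   # bit offset inside word w
    # 64-bit window over two adjacent words: straddling values (k = 10, 21)
    # are read with the same shift-and-mask as every other value.
    window = words[:, w] | (words[:, w_next] << np.uint64(32))
    q = (window >> off) & np.uint64(0x7)
    return (q.astype(np.float32) - zero) * scale
```

The clamp on the neighbor index is safe because every value whose lowest bit lies in the last word fits entirely inside it (value 31 ends at bit 95), so the bits taken from the duplicated word are never used.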

Related Issue

Fixes / relates to: #2251

Notes

  • This is a correctness-first fix: it restores working INT3 Triton inference and matches the CPU reference logic (PackableQuantLinear / pack_block) for 3-bit GPTQ. Performance has not yet been optimized or benchmarked.
  • Tested on INT3 GPTQ models (e.g. Qwen3-4B) on wikitext; perplexity returns to a reasonable range.

@Qubitium
Collaborator

@KingdalfGoodman Thank you!
