Add QQQ #1402
Conversation
Signed-off-by: Qubitium <[email protected]>
Signed-off-by: Qubitium <[email protected]>
Hi @HandH1998, I am officially adding QQQ to GPTQModel. This should allow QQQ to share all the GPTQModel-supported models and auxiliary features. For now, only model loading and inference work; quantization is next. Is it possible for you to write a simple QQQ kernel in torch? This would serve as a fallback to support all hardware platforms, not just Ampere+. You can just copy the existing … Feel free to contact me on SGLang slack (I see you are also an active contributor there!) or on X: qubitium.
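(For readers following the thread: a pure-torch fallback would essentially dequantize the int4 weights with their scales and run a plain matmul. The sketch below assumes an already-unpacked, unshuffled weight layout, which is not QQQ's actual on-disk format, so treat the names and shapes as illustrative assumptions rather than the real kernel contract.)

```python
import torch

def qqq_torch_fallback(x, qweight_int4, scales, group_size=128):
    """Illustrative torch-only fallback: dequantize, then matmul.

    x:            (batch, in_features) fp16/bf16 activations
    qweight_int4: (in_features, out_features) int8 tensor holding values in [-8, 7]
                  (assumed already unpacked/unshuffled -- not QQQ's real layout)
    scales:       (in_features // group_size, out_features) per-group scales
    """
    in_features, out_features = qweight_int4.shape
    if group_size == -1:
        # Per-channel scales: one scale row covering the whole input dimension.
        group_size = in_features
    # Broadcast each group's scale across its rows, then dequantize to x's dtype.
    expanded_scales = scales.repeat_interleave(group_size, dim=0)
    w = qweight_int4.to(x.dtype) * expanded_scales
    return x @ w
```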
Signed-off-by: Qubitium <[email protected]>
Signed-off-by: Qubitium <[email protected]>
@Qubitium Thanks for supporting QQQ in GPTQModel. Since QQQ shuffles the weights offline, the shuffled weights cannot be used directly by torch. If we want to run it in torch, we need to convert the weights back to the normal format online, which will cost a lot of time. Do you think that is OK?
Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: ZX-ModelCloud <[email protected]>
I see the problem. Can the conversion be a one-time cost in the module init/post_init, or does the conversion need to happen at every forward pass? If only a one-time conversion is required, I think that is an acceptable cost. We can also save the unshuffled state as a different … I think torch is a nice kernel to have so people can use it, in contrast to the Marlin kernel, as a stepping stone to spawn off other kernels for the QQQ format on different hardware. But if there is too much work required, we don't need to have it. No worries. We will work on the quant part to get the quantization plumbing connected first. If you have time and you think it's worth it, that can be done later. I have invited you to the repo collaborators so you can push to this branch or others when you see fit.
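(The "one-time conversion" idea above could look roughly like this: undo the offline shuffle once in `post_init()` and cache the result on the module, so the torch fallback never pays the cost per forward pass. `unshuffle_qqq_weight` is a hypothetical placeholder, not an existing API.)

```python
import torch

def unshuffle_qqq_weight(qweight: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder: the real conversion would invert QQQ's offline
    # Marlin-style shuffle. Shown as an identity copy just to keep the sketch runnable.
    return qweight.clone()

class QQQTorchLinear(torch.nn.Module):
    def __init__(self, qweight: torch.Tensor, scales: torch.Tensor):
        super().__init__()
        self.register_buffer("qweight", qweight)  # shuffled layout, as loaded from disk
        self.register_buffer("scales", scales)    # (num_groups, out_features)

    def post_init(self):
        # One-time cost at load time: convert the shuffled on-disk layout back to a
        # plain layout that a naive torch kernel can consume, and cache it.
        self.register_buffer("qweight_plain", unshuffle_qqq_weight(self.qweight))

    def forward(self, x):
        # Every forward pass reuses the cached plain-layout weight; no per-call
        # conversion cost is paid.
        group_size = x.shape[-1] // self.scales.shape[0]
        w = self.qweight_plain.to(x.dtype) * self.scales.repeat_interleave(group_size, dim=0)
        return x @ w
```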
Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: ZX-ModelCloud <[email protected]>
@HandH1998 Both the quantization (group_size -1 and 128) and inference code are in a working state. I have two questions:
Thanks.
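(A rough usage sketch for the two group sizes mentioned above, following GPTQModel's usual quantize flow; -1 means per-channel scales, 128 means grouped scales. How QQQ is actually selected in `QuantizeConfig` — presumably via the new `QUANT_METHOD.QQQ` / `FORMAT.QQQ` options — is not shown, since the final spelling was not settled in this thread.)

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Small illustrative calibration set; real runs would use a proper dataset.
calibration_dataset = [
    "GPTQModel is a toolkit for model quantization.",
    "QQQ is a W4A8 quantization method with Marlin-based kernels.",
]

# group_size=128 shown here; use group_size=-1 for per-channel quantization.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("meta-llama/Llama-2-7b-hf", quant_config)
model.quantize(calibration_dataset)
model.save("Llama-2-7b-qqq-4bit")
```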
@Qubitium
If rotation is enabled, does the modeling code in vllm/sglang (for example) need to be modified to run rotated QQQ? I don't see the …
Signed-off-by: ZX-ModelCloud <[email protected]>
@Qubitium We only apply rotation offline, which means we fuse the rotation matrix into the linear weight. So there is no need to change the inference code.
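(The algebra behind "fuse the rotation into the weight" can be checked in a few lines: for an orthogonal Q, rotating the incoming activations and folding Q into the weight offline cancel exactly, so the layer output — and therefore the inference code path — is unchanged. This only demonstrates the core identity, not QQQ's actual rotation placement.)

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 8)                       # activations
W = torch.randn(16, 8)                      # linear weight (out_features, in_features)
Q, _ = torch.linalg.qr(torch.randn(8, 8))   # random orthogonal rotation

y_ref = x @ W.T               # original layer output
y_rot = (x @ Q) @ (W @ Q).T   # rotated activations, rotation fused into W offline

# (x Q)(W Q)^T = x Q Q^T W^T = x W^T, so the outputs match.
print(torch.allclose(y_ref, y_rot, atol=1e-5))  # True
```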
Signed-off-by: ZX-ModelCloud <[email protected]>
Ok, thanks! We will add rotation for QQQ in our next PR and only enable it for models that have been validated for rotation, such as Llama 2 and Qwen 2.
TODO:
- BACKEND.QQQ
- QUANT_METHOD.QQQ
- FORMAT.QQQ

Ref: https://github.com/HandH1998/QQQ
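(A rough idea of what the enum additions in this TODO amount to; hypothetical sketch only — the real `BACKEND`, `QUANT_METHOD`, and `FORMAT` enums already exist in GPTQModel, and the exact values and casing may differ.)

```python
from enum import Enum

class BACKEND(str, Enum):
    # ...existing backends...
    QQQ = "qqq"

class QUANT_METHOD(str, Enum):
    # ...existing methods...
    QQQ = "qqq"

class FORMAT(str, Enum):
    # ...existing formats...
    QQQ = "qqq"
```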