[BACKEND] Add QBits support #137

CSY-ModelCloud · 2024-07-01T09:25:12Z

No description provided.

Signed-off-by: Cheng Penghui <[email protected]>

# Conflicts:
#   .github/workflows/run_tests.yml
#   examples/benchmark/generation_speed.py
#   examples/benchmark/perplexity.py
#   examples/evaluation/run_language_modeling_task.py
#   examples/evaluation/run_sequence_classification_task.py
#   examples/evaluation/run_text_summarization_task.py
#   examples/quantization/basic_usage_bitblas.py
#   gptqmodel/models/auto.py
#   gptqmodel/models/base.py
#   gptqmodel/nn_modules/qlinear/qlinear_exllama.py
#   gptqmodel/nn_modules/qlinear/qlinear_exllamav2.py
#   gptqmodel/utils/importer.py
#   gptqmodel/utils/model.py
#   tests/test_q4_bitblas.py
#   tests/test_q4_exallama_v2.py
#   tests/test_q4_marlin.py
#   tests/test_q4_triton.py
#   tests/test_quant_formats.py
#   tests/test_sharded.py
#   tests/test_triton.py


Qubitium · 2024-07-05T13:51:01Z

@PenghuiCheng We have merged Qbits support with unit tests in this PR and a few follow-up PRs.

* Support QBits kernel for CPU device

  Signed-off-by: Cheng Penghui <[email protected]>
* fix merge
* format
* fix merge
* rename to meet with latest main style
* rename to meet with latest main style
* fix doc
* revert commented codes
* add warning for fallback to cpu
* remove unneeded var
* fix merge
* get gpu from curl
* update url & use matrix
* revert to main
* update codes with pr comments
* no 2 bit
* set min to 1.4.2
* fix name
* add test
* remove cpu check, model.device is CPU, so it cause wrong type check there
* remove cpu check, model.device is CPU, so it cause wrong type check there
* temp disable cuda check
* add cpu check back
* check module type like main
* fix torch_dtype wrong which caused qbits not work
* check bits support with BITS_DTYPE_MAPPING
* add qbits test
* add qbit test to ci
* remove for now
* delete test_qbits_kernel.py, it can't pass all 4 bit tests
* remove cpu check again.. not sure what it is
* add qbits in format tests
* move test_qbits to test_cpu
* no need container
* setup python
* update cuda check
* set python to 3.10
* fix check
* update runner
* update runner
* disable download other run's artifact
* set --durations=0
* quant_type removed from main
* quant_type removed
* override device=cpu for qbits

  qbits must be explicit and we do not auto switch to qbits when device=cpu.
  we do the reverse, and force device=cpu and backend set to qbits
* Update base.py
* Update qlinear_qbits.py
* qbits supports 2, 3, 4, 8 bits
* Update qlinear_qbits.py
* reverse/rename asym into sym
* ruff
* rename
* rename
* load qbits only as needed
* cleanup
* cleanup
* fix device override for qbits
* cleanup
* cuda has been removed
* format
* fix check condition
* fix qbits RuntimeError
* fix qbits RuntimeError
* remove todo
* add protobuf in req & remove buggy download artifact with runid: actions/download-artifact#295
* ruff

---------

Signed-off-by: Cheng Penghui <[email protected]>
Co-authored-by: Cheng Penghui <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
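The squashed message states that QBits must be requested explicitly, that an explicit QBits request forces `device=cpu`, and that QBits supports 2, 3, 4, and 8 bits. A minimal sketch of that selection rule follows. The names `Backend`, `QBITS_SUPPORTED_BITS`, and `resolve_backend_and_device` are hypothetical, chosen for illustration only; they are not taken from the gptqmodel code merged in this PR.

```python
from enum import Enum


class Backend(Enum):
    # Hypothetical enum for illustration; not the project's actual definition.
    AUTO = "auto"
    QBITS = "qbits"


# Bit widths the squashed commit message lists as supported by QBits.
QBITS_SUPPORTED_BITS = (2, 3, 4, 8)


def resolve_backend_and_device(backend, device, bits):
    """Return the (backend, device) pair to load with (illustrative only)."""
    if backend is Backend.QBITS:
        if bits not in QBITS_SUPPORTED_BITS:
            raise ValueError(f"QBits does not support {bits}-bit weights")
        # Explicit QBits request: force CPU whatever device was asked for.
        return Backend.QBITS, "cpu"
    # No automatic switch to QBits merely because the device is CPU.
    return backend, device
```

Under these assumptions, calling `resolve_backend_and_device(Backend.QBITS, "cuda:0", 4)` overrides the device to CPU, while `Backend.AUTO` with `device="cpu"` is passed through unchanged rather than being switched to QBits.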

PenghuiCheng and others added 17 commits June 27, 2024 14:00

Support QBits kernel for CPU device

bea6ecd

Signed-off-by: Cheng Penghui <[email protected]>

fix merge

3e69d44

format

3536c18

Merge branch 'refs/heads/main' into CSY/pick-qbits

ca8d57b

# Conflicts:
#   gptqmodel/utils/importer.py
#   tests/test_q4_exallama.py

fix merge

cdbbe7a

Merge branch 'refs/heads/main' into CSY/pick-qbits

764a272

Merge branch 'refs/heads/main' into CSY/pick-qbits

e0e94ea

# Conflicts:
#   .github/ISSUE_TEMPLATE/bug_report.md
#   gptqmodel/models/base.py
#   gptqmodel/utils/importer.py
#   gptqmodel/utils/model.py
#   requirements.txt

rename to meet with latest main style

5ee1428

rename to meet with latest main style

bc42759

Merge branch 'refs/heads/main' into CSY/pick-qbits

395ef8e

# Conflicts:
#   requirements.txt

fix doc

93db0ee

Merge branch 'main' into CSY/pick-qbits

6b1e77e

revert commented codes

d5fae59

Merge branch 'refs/heads/main' into CSY/pick-qbits

448bf38

add warning for fallback to cpu

c52f3b2

remove unneeded var

0d966c6

CSY-ModelCloud force-pushed the CSY/pick-qbits branch from 1e129c8 to 0d966c6 Compare July 2, 2024 02:22

CSY-ModelCloud added 12 commits July 2, 2024 10:46

Merge branch 'refs/heads/main' into CSY/pick-qbits

5147ff4

# Conflicts:
#   gptqmodel/models/base.py

fix merge

c28f4f8

Merge branch 'refs/heads/main' into CSY/pick-qbits

c58f584

get gpu from curl

aaa833a

update url & use matrix

b43f3c9

revert to main

f147ce6

update codes with pr comments

2a88c56

no 2 bit

f941eb7

set min to 1.4.2

3997ad5

Merge branch 'refs/heads/main' into CSY/pick-qbits

231d6a2

fix name

55826b3

add test

f9cd70b

Qubitium and others added 20 commits July 4, 2024 23:41

Update qlinear_qbits.py

51b9481

Merge branch 'main' into CSY/pick-qbits

0980e1e

reverse/rename asym into sym

e1b41c6

ruff

3bf1b03

rename

8fd2463

rename

3172013

load qbits only as needed

aac78c3

cleanup

39313f0

cleanup

9e683fc

fix device override for qbits

90e6122

cleanup

026d7e8

cuda has been removed

1f041b4

format

bf11c13

fix check condition

08eeb7c

fix qbits RuntimeError

d6b63d9

fix qbits RuntimeError

4fffa9f

remove todo

2ae26f3

Merge branch 'refs/heads/main' into CSY/pick-qbits

52013d2

add protobuf in req & remove buggy download artifact with runid: actions/download-artifact#295

5574e2d

ruff

d90b13c

Qubitium approved these changes Jul 5, 2024


gptqmodel/nn_modules/qlinear/qlinear_qbits.py (review thread resolved)

Qubitium approved these changes Jul 5, 2024


Qubitium merged commit b39fa13 into main Jul 5, 2024

Qubitium deleted the CSY/pick-qbits branch July 5, 2024 04:58

Qubitium mentioned this pull request Jul 7, 2024

[FEATURE] Intel/Qbits CPU inference #92

Closed

CSY-ModelCloud restored the CSY/pick-qbits branch July 8, 2024 09:22

CSY-ModelCloud deleted the CSY/pick-qbits branch July 9, 2024 02:49


4 participants
