@CSY-ModelCloud
Collaborator

No description provided.

PenghuiCheng and others added 17 commits June 27, 2024 14:00
# Conflicts:
#	.github/workflows/run_tests.yml
#	examples/benchmark/generation_speed.py
#	examples/benchmark/perplexity.py
#	examples/evaluation/run_language_modeling_task.py
#	examples/evaluation/run_sequence_classification_task.py
#	examples/evaluation/run_text_summarization_task.py
#	examples/quantization/basic_usage_bitblas.py
#	gptqmodel/models/auto.py
#	gptqmodel/models/base.py
#	gptqmodel/nn_modules/qlinear/qlinear_exllama.py
#	gptqmodel/nn_modules/qlinear/qlinear_exllamav2.py
#	gptqmodel/utils/importer.py
#	gptqmodel/utils/model.py
#	tests/test_q4_bitblas.py
#	tests/test_q4_exallama_v2.py
#	tests/test_q4_marlin.py
#	tests/test_q4_triton.py
#	tests/test_quant_formats.py
#	tests/test_sharded.py
#	tests/test_triton.py
# Conflicts:
#	gptqmodel/utils/importer.py
#	tests/test_q4_exallama.py
# Conflicts:
#	.github/ISSUE_TEMPLATE/bug_report.md
#	gptqmodel/models/base.py
#	gptqmodel/utils/importer.py
#	gptqmodel/utils/model.py
#	requirements.txt
@Qubitium Qubitium merged commit b39fa13 into main Jul 5, 2024
@Qubitium Qubitium deleted the CSY/pick-qbits branch July 5, 2024 04:58
@Qubitium
Collaborator

Qubitium commented Jul 5, 2024

@PenghuiCheng We have merged Qbits support with unit tests in this PR and a few follow-up PRs.

@CSY-ModelCloud CSY-ModelCloud restored the CSY/pick-qbits branch July 8, 2024 09:22
@CSY-ModelCloud CSY-ModelCloud deleted the CSY/pick-qbits branch July 9, 2024 02:49
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
* Support QBits kernel for CPU device

Signed-off-by: Cheng Penghui <[email protected]>

* fix merge

* format

* fix merge

* rename to meet with latest main style

* rename to meet with latest main style

* fix doc

* revert commented codes

* add warning for fallback to cpu

* remove unneeded var

* fix merge

* get gpu from curl

* update url & use matrix

* revert to main

* update codes with pr comments

* no 2 bit

* set min to 1.4.2

* fix name

* add test

* remove cpu check, model.device is CPU, so it cause wrong type check there

* remove cpu check, model.device is CPU, so it cause wrong type check there

* temp disable cuda check

* add cpu check back

* check module type like main

* fix torch_dtype wrong which caused qbits not work

* check bits support with BITS_DTYPE_MAPPING

* add qbits test

* add qbit test to ci

* remove for now

* delete test_qbits_kernel.py, it can't pass all 4 bit tests

* remove cpu check again.. not sure what it is

* add qbits in format tests

* move test_qbits to test_cpu

* no need container

* setup python

* update cuda check

* set python to 3.10

* fix check

* update runner

* update runner

* disable download other run's artifact

* set --durations=0

* quant_type removed from main

* quant_type removed

* override device=cpu for qbits 

qbits must be requested explicitly: we do not auto-switch to qbits when device=cpu. We do the reverse and force device=cpu when the backend is set to qbits.

* Update base.py

* Update qlinear_qbits.py

* qbits supports 2, 3, 4, 8 bits

* Update qlinear_qbits.py

* reverse/rename asym into sym

* ruff

* rename

* rename

* load qbits only as needed

* cleanup

* cleanup

* fix device override for qbits

* cleanup

* cuda has been removed

* format

* fix check condition

* fix qbits RuntimeError

* fix qbits RuntimeError

* remove todo

* add protobuf in req & remove buggy download artifact with runid: actions/download-artifact#295

* ruff

---------

Signed-off-by: Cheng Penghui <[email protected]>
Co-authored-by: Cheng Penghui <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
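
The "override device=cpu for qbits" and "qbits supports 2, 3, 4, 8 bits" notes in the commit list above can be sketched roughly as follows. This is only an illustration of the described behavior, not the code merged in this PR: the `Backend` enum, `QBITS_SUPPORTED_BITS`, `resolve_backend_and_device`, and `check_qbits_bits` are hypothetical names made up for this example.

```python
from enum import Enum
from typing import Optional, Tuple


class Backend(str, Enum):
    """Hypothetical stand-in for a backend selector (assumed for this sketch)."""
    AUTO = "auto"
    QBITS = "qbits"


# Bit widths taken from the commit note "qbits supports 2, 3, 4, 8 bits".
QBITS_SUPPORTED_BITS = (2, 3, 4, 8)


def resolve_backend_and_device(
    backend: Backend, device: Optional[str]
) -> Tuple[Backend, Optional[str]]:
    """Apply the rule from the commit note.

    - An explicit qbits backend forces device to "cpu".
    - device="cpu" alone never switches the backend to qbits.
    """
    if backend is Backend.QBITS:
        return backend, "cpu"
    return backend, device


def check_qbits_bits(bits: int) -> None:
    """Reject bit widths outside the set listed in the commit note."""
    if bits not in QBITS_SUPPORTED_BITS:
        raise ValueError(
            f"qbits supports {QBITS_SUPPORTED_BITS} bits, got {bits}"
        )
```

With these assumptions, `resolve_backend_and_device(Backend.QBITS, "cuda:0")` yields the qbits backend on `"cpu"`, while `resolve_backend_and_device(Backend.AUTO, "cpu")` leaves the backend untouched.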
4 participants