[BACKEND] Add QBits support #137
Merged
Signed-off-by: Cheng Penghui <[email protected]>
# Conflicts:
#	.github/workflows/run_tests.yml
#	examples/benchmark/generation_speed.py
#	examples/benchmark/perplexity.py
#	examples/evaluation/run_language_modeling_task.py
#	examples/evaluation/run_sequence_classification_task.py
#	examples/evaluation/run_text_summarization_task.py
#	examples/quantization/basic_usage_bitblas.py
#	gptqmodel/models/auto.py
#	gptqmodel/models/base.py
#	gptqmodel/nn_modules/qlinear/qlinear_exllama.py
#	gptqmodel/nn_modules/qlinear/qlinear_exllamav2.py
#	gptqmodel/utils/importer.py
#	gptqmodel/utils/model.py
#	tests/test_q4_bitblas.py
#	tests/test_q4_exallama_v2.py
#	tests/test_q4_marlin.py
#	tests/test_q4_triton.py
#	tests/test_quant_formats.py
#	tests/test_sharded.py
#	tests/test_triton.py

# Conflicts:
#	gptqmodel/utils/importer.py
#	tests/test_q4_exallama.py

# Conflicts:
#	.github/ISSUE_TEMPLATE/bug_report.md
#	gptqmodel/models/base.py
#	gptqmodel/utils/importer.py
#	gptqmodel/utils/model.py
#	requirements.txt

# Conflicts:
#	requirements.txt
Force-pushed from 1e129c8 to 0d966c6
# Conflicts:
#	gptqmodel/models/base.py
Qubitium approved these changes on Jul 5, 2024
Collaborator
@PenghuiCheng We have merged QBits support with unit tests in this PR and a few follow-up PRs.
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request on Jul 19, 2024
* Support QBits kernel for CPU device

  Signed-off-by: Cheng Penghui <[email protected]>
* fix merge
* format
* fix merge
* rename to meet with latest main style
* rename to meet with latest main style
* fix doc
* revert commented codes
* add warning for fallback to cpu
* remove unneeded var
* fix merge
* get gpu from curl
* update url & use matrix
* revert to main
* update codes with pr comments
* no 2 bit
* set min to 1.4.2
* fix name
* add test
* remove cpu check, model.device is CPU, so it cause wrong type check there
* remove cpu check, model.device is CPU, so it cause wrong type check there
* temp disable cuda check
* add cpu check back
* check module type like main
* fix torch_dtype wrong which caused qbits not work
* check bits support with BITS_DTYPE_MAPPING
* add qbits test
* add qbit test to ci
* remove for now
* delete test_qbits_kernel.py, it can't pass all 4 bit tests
* remove cpu check again.. not sure what it is
* add qbits in format tests
* move test_qbits to test_cpu
* no need container
* setup python
* update cuda check
* set python to 3.10
* fix check
* update runner
* update runner
* disable download other run's artifact
* set --durations=0
* quant_type removed from main
* quant_type removed
* override device=cpu for qbits

  qbits must be explicit and we do not auto switch to qbits when device=cpu. we do the reverse, and force device=cpu and backend set to qbits
* Update base.py
* Update qlinear_qbits.py
* qbits supports 2, 3, 4, 8 bits
* Update qlinear_qbits.py
* reverse/rename asym into sym
* ruff
* rename
* rename
* load qbits only as needed
* cleanup
* cleanup
* fix device override for qbits
* cleanup
* cuda has been removed
* format
* fix check condition
* fix qbits RuntimeError
* fix qbits RuntimeError
* remove todo
* add protobuf in req & remove buggy download artifact with runid: actions/download-artifact#295
* ruff

---------

Signed-off-by: Cheng Penghui <[email protected]>
Co-authored-by: Cheng Penghui <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
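For readers skimming the commit list, the "override device=cpu for qbits" note describes a one-way rule: choosing the QBits backend forces the device to CPU, but choosing the CPU device does not select QBits. The snippet below is only a rough sketch of that rule, not the code merged in this PR. `Backend`, `QBITS_SUPPORTED_BITS`, and `resolve_backend_and_device` are hypothetical names made up for illustration, and the bit widths are taken from the later commit note "qbits supports 2, 3, 4, 8 bits".

```python
import warnings
from enum import Enum


class Backend(Enum):
    """Hypothetical backend selector; not GPTQModel's actual enum."""
    AUTO = "auto"
    QBITS = "qbits"


# Assumption: bit widths from the commit note "qbits supports 2, 3, 4, 8 bits".
QBITS_SUPPORTED_BITS = (2, 3, 4, 8)


def resolve_backend_and_device(backend, device, bits):
    """Return the (backend, device) pair to use for loading.

    QBits must be requested explicitly. When it is, the device is forced
    to "cpu". A CPU device alone never switches the backend to QBits.
    """
    if backend is Backend.QBITS:
        if bits not in QBITS_SUPPORTED_BITS:
            raise ValueError(
                f"QBits backend does not support {bits}-bit weights; "
                f"expected one of {QBITS_SUPPORTED_BITS}"
            )
        if device != "cpu":
            # Mirrors the spirit of "add warning for fallback to cpu".
            warnings.warn(
                f"QBits backend runs on CPU only; overriding device={device!r} to 'cpu'"
            )
        return Backend.QBITS, "cpu"

    # No automatic switch: device="cpu" with another backend is left untouched.
    return backend, device
```

Under these assumptions, `resolve_backend_and_device(Backend.QBITS, "cuda:0", 4)` yields the QBits backend on `"cpu"`, while `resolve_backend_and_device(Backend.AUTO, "cpu", 4)` is returned unchanged.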
No description provided.