@ZX-ModelCloud ZX-ModelCloud commented Jun 26, 2024

Resolves #59

The following args will be merged into a single arg, backend: Backend = Backend.AUTO:

use_triton: bool,
disable_exllama: bool = False,
disable_exllamav2: bool = False,
use_marlin: bool = False,
use_bitblas: bool = True,
Reason: These passive binary toggles form a matrix of conditions that is super confusing for users to set correctly, and even project developers have run into multiple bugs because of them. We can't keep adding another binary toggle every time we add a backend/kernel/runtime; the API is becoming unmaintainable and unusable for both end-users and project devs.

Prelim design:

class Backend(Enum):
    AUTO  # choose the fastest one based on quant model compatibility
    CUDA_OLD
    CUDA
    TRITON_V2
    EXLLAMA
    EXLLAMA_V2
    MARLIN
    BITBLAS
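
As a rough illustration of the prelim design above, the enum could be written as runnable Python along these lines. This is only a sketch: the member names come from the list above, but the use of `auto()` for values and the `load_quantized` function (a hypothetical stand-in, not the project's actual loader) are assumptions made for the example.

```python
from enum import Enum, auto


class Backend(Enum):
    AUTO = auto()  # choose the fastest one based on quant model compatibility
    CUDA_OLD = auto()
    CUDA = auto()
    TRITON_V2 = auto()
    EXLLAMA = auto()
    EXLLAMA_V2 = auto()
    MARLIN = auto()
    BITBLAS = auto()


def load_quantized(model_path: str, backend: Backend = Backend.AUTO) -> str:
    """Hypothetical loader stub: one enum arg replaces the boolean toggles."""
    # Reject stray booleans left over from the old use_*/disable_* style.
    if not isinstance(backend, Backend):
        raise TypeError(f"backend must be a Backend member, got {backend!r}")
    # A real loader would pick a kernel here; the stub only reports the choice.
    return f"{model_path} -> {backend.name}"


print(load_quantized("my-model"))                  # default: Backend.AUTO
print(load_quantized("my-model", Backend.MARLIN))  # explicit backend
```

With a single enum argument, exactly one backend is named per call, so contradictory combinations such as use_marlin=True together with use_triton=True can no longer be expressed.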

@ZX-ModelCloud ZX-ModelCloud marked this pull request as ready for review June 27, 2024 05:11
@Qubitium Qubitium merged commit 5b724ac into main Jun 27, 2024
@Qubitium Qubitium deleted the zx_consolidate_backend branch June 27, 2024 06:16
@Qubitium Qubitium changed the title Consolidate Backend [CORE] Consolidate 6+ kernel boolean toggle args to single Backend arg Jun 27, 2024
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
* Consolidate Backend

* change Backend.TRITON_V2 to Backend.TRITON

* Determine the Backend to use when packing the model according to quantize_config.format

* Auto choose the fastest Backend based on quant model compatibility

* fix issue: automatic Backend selection returns incorrect qlinear

* cleanup

* cleanup
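
The "auto choose the fastest Backend" step in the commits above could look roughly like the sketch below. Everything concrete in it is an assumption for illustration: the `PRIORITY` order, the `SUPPORTED_BITS` table, and the `select_backend` helper are hypothetical and are not taken from the merged code, which may check other properties of the quantized model as well.

```python
from enum import Enum, auto


class Backend(Enum):
    AUTO = auto()
    TRITON = auto()
    EXLLAMA_V2 = auto()
    MARLIN = auto()


# Hypothetical speed ranking, fastest first (placeholder, not measured).
PRIORITY = [Backend.MARLIN, Backend.EXLLAMA_V2, Backend.TRITON]

# Hypothetical compatibility table: bit widths each backend can run.
SUPPORTED_BITS = {
    Backend.MARLIN: {4},
    Backend.EXLLAMA_V2: {4},
    Backend.TRITON: {2, 4, 8},
}


def select_backend(bits: int, requested: Backend = Backend.AUTO) -> Backend:
    """Resolve AUTO to the first compatible backend in PRIORITY order."""
    if requested is not Backend.AUTO:
        # An explicit choice is honored, but only if it is compatible.
        if bits not in SUPPORTED_BITS[requested]:
            raise ValueError(f"{requested.name} does not support {bits}-bit")
        return requested
    for candidate in PRIORITY:
        if bits in SUPPORTED_BITS[candidate]:
            return candidate
    raise ValueError(f"no backend supports {bits}-bit models")
```

Keeping the ranking and the compatibility data in plain tables means that adding a backend is a data change rather than another boolean argument.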
Development

Successfully merging this pull request may close these issues.

[FEATURE] Consolidate 6+ related use/disable: bool args in from_quantized into single backend: Backend