@ZX-ModelCloud
Collaborator

No description provided.

@Qubitium changed the title from "remove Backend.CUDA and Backend.CUDA_OLD" to "[REFRACTOR] Remove Backend.CUDA and Backend.CUDA_OLD" on Jul 4, 2024
@Qubitium marked this pull request as ready for review on July 4, 2024, 08:08
@Qubitium
Collaborator

Qubitium commented Jul 4, 2024

As of July 2024, neither the cuda nor the cuda-old kernel has a good use case. Both perform much worse than the exllama and exllama v2 kernels. Their only saving grace is support for more bit widths, but 99.9% of cases use 4-bit. We will not spend time maintaining compatibility for four kernels when we can just keep the fastest two.
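
To illustrate the trade-off described above, here is a minimal sketch of what backend selection could look like once only the two faster kernels remain. Everything in it is an assumption made for illustration: the enum members, the `PREFERRED` order, the `SUPPORTED_BITS` table, and the `select_backend` helper are hypothetical and are not taken from the GPTQModel code base.

```python
from enum import Enum


class Backend(Enum):
    # Hypothetical members; the real GPTQModel enum may differ.
    AUTO = "auto"
    EXLLAMA = "exllama"
    EXLLAMA_V2 = "exllama_v2"


# Preference order, fastest first (an assumption for this sketch).
PREFERRED = (Backend.EXLLAMA_V2, Backend.EXLLAMA)

# Assumption: both remaining kernels handle 4-bit weights only.
SUPPORTED_BITS = {Backend.EXLLAMA_V2: {4}, Backend.EXLLAMA: {4}}


def select_backend(bits: int, requested: Backend = Backend.AUTO) -> Backend:
    """Return a backend that supports `bits`, or raise ValueError."""
    # AUTO tries every remaining kernel in preference order;
    # an explicit request is checked on its own.
    candidates = PREFERRED if requested is Backend.AUTO else (requested,)
    for backend in candidates:
        if bits in SUPPORTED_BITS[backend]:
            return backend
    raise ValueError(f"no remaining backend supports {bits}-bit quantization")
```

Under these assumptions, a 4-bit model resolves to the fastest kernel automatically, while a request for another bit width fails loudly instead of silently falling back to a slower kernel. That is the cost the comment accepts in exchange for maintaining two kernels rather than four.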

@Qubitium merged commit 6f1eb58 into ModelCloud:main on Jul 4, 2024
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
* remove Backend.CUDA and Backend.CUDA_OLD

* fix unit test

* remove cuda_64/ and cuda_256/
2 participants