@matthewdouglas (Member)

This PR fixes 8 failing 4-bit dequantization tests on CPUs with AVX512F support.

The AVX512 code path has an accuracy issue with fp16/fp32 when the blocksize is 2048 or 4096. That combination is an unlikely use case, so we accept falling back all the way to the slower PyTorch implementation for it, instead of adding a more complex fallback to a scalar C++ kernel.
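For reference, a minimal sketch of the kind of guard described above. It is not the merged diff. The function names, the `has_avx512` flag, the high-nibble-first packing order, and the linear placeholder codebook are all assumptions made for illustration. bf16 is not named in the description, so the sketch leaves it on the native path; that is also an assumption.

```python
import math

import torch

# ASSUMPTION: every name below is hypothetical; none is taken from the PR diff.
AFFECTED_DTYPES = (torch.float16, torch.float32)
AFFECTED_BLOCKSIZES = (2048, 4096)


def needs_pytorch_fallback(has_avx512: bool, dtype: torch.dtype, blocksize: int) -> bool:
    """True only for the combinations the description names as inaccurate."""
    # ASSUMPTION: the caller supplies has_avx512; the description does not say
    # how AVX512F is detected.
    return has_avx512 and dtype in AFFECTED_DTYPES and blocksize in AFFECTED_BLOCKSIZES


def dequantize_4bit_reference(packed, absmax, code, blocksize, shape, dtype):
    """Slow pure-PyTorch path: unpack two 4-bit codes per byte, look up, scale per block."""
    n = math.prod(shape)
    packed = packed.reshape(-1)
    # ASSUMPTION: the high nibble holds the first element of each pair.
    idx = torch.stack((packed >> 4, packed & 0x0F), dim=1).reshape(-1)[:n].long()
    # One absmax per block, expanded to one scale per element.
    scales = absmax.to(torch.float32).repeat_interleave(blocksize)[:n]
    return (code.to(torch.float32)[idx] * scales).reshape(shape).to(dtype)


def dequantize_4bit(packed, absmax, code, blocksize, shape, dtype, *, has_avx512, native_kernel):
    """Route the affected combinations to the reference path; everything else stays native."""
    if needs_pytorch_fallback(has_avx512, dtype, blocksize):
        return dequantize_4bit_reference(packed, absmax, code, blocksize, shape, dtype)
    return native_kernel(packed, absmax, code, blocksize, shape, dtype)


def native_kernel_stub(*args):
    # Stand-in for the vectorized CPU kernel; reaching it here means the guard failed.
    raise RuntimeError("native kernel reached")


# Worked example with a PLACEHOLDER codebook (not the real NF4/FP4 tables).
code = torch.linspace(-1.0, 1.0, 16)
packed = torch.tensor([0x0F, 0xF0], dtype=torch.uint8)  # 4-bit codes 0, 15, 15, 0

# Reference path on its own: two blocks of two elements, one absmax each.
ref = dequantize_4bit_reference(
    packed, torch.tensor([2.0, 0.5]), code, blocksize=2, shape=(2, 2), dtype=torch.float32
)
print(ref.tolist())  # [[-2.0, 2.0], [0.5, -0.5]]

# Dispatcher: fp16 + blocksize 2048 with has_avx512=True takes the fallback, not the stub.
out = dequantize_4bit(
    packed, torch.tensor([2.0]), code, blocksize=2048, shape=(2, 2), dtype=torch.float16,
    has_avx512=True, native_kernel=native_kernel_stub,
)
print(out.tolist())  # [[-2.0, 2.0], [2.0, -2.0]]
```

On a real host, `has_avx512` would come from a CPU-capability check. Recent PyTorch releases expose `torch.backends.cpu.get_cpu_capability()`, which can return `"AVX512"`; whether the PR uses that check is an assumption. The trade-off is the one stated in the description: the affected combinations are rare, so the slower path is acceptable and avoids the more complex scalar C++ fallback.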

@matthewdouglas added this to the v0.49.0 milestone on Dec 10, 2025
@matthewdouglas merged commit 5ea4afe into main on Dec 10, 2025
274 of 280 checks passed
@matthewdouglas deleted the cpu-4bit-avx512-workaround branch on Dec 10, 2025, 21:19