I was playing with Deepseek-Lite and noticed that some of its tensors get quantized with `IQ4_NL`, and that after repacking to `IQ4_NL_R4` they do not work for row sizes that are not a multiple of 128 (4 blocks). So, I fixed that (AVX2 and Zen4), and applied the same fix to `Q5_0_R4` and `Q6_0_R4`.
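Not the actual kernel, but a minimal sketch of the shape of the fix, assuming blocks of 32 elements and a main loop that consumes 4 blocks (128 elements) of a row per iteration; `dot_one_block` and `row_dot` are made-up names standing in for the real repacked GEMM code:

```cpp
// Sketch only: handle rows whose length is a multiple of the block size (32)
// but not necessarily a multiple of 128 (4 blocks). The main loop takes 4
// blocks per iteration; the tail loop picks up the 1-3 leftover blocks that
// previously were silently dropped.
#include <cstdio>
#include <vector>

constexpr int kBlockSize     = 32;  // elements per quantization block (assumed)
constexpr int kBlocksPerStep = 4;   // blocks consumed per main-loop iteration (assumed)

// Stand-in for the real per-block dot-product kernel.
static float dot_one_block(const float* x, const float* y) {
    float sum = 0.0f;
    for (int i = 0; i < kBlockSize; ++i) sum += x[i] * y[i];
    return sum;
}

// Row dot product that works for any row size that is a multiple of 32,
// not just multiples of 128.
float row_dot(const float* x, const float* y, int n) {
    const int nblocks = n / kBlockSize;
    const int nsteps  = nblocks / kBlocksPerStep;   // full groups of 4 blocks
    float sum = 0.0f;
    int ib = 0;
    for (int s = 0; s < nsteps; ++s)                // fast path: 4 blocks at a time
        for (int k = 0; k < kBlocksPerStep; ++k, ++ib)
            sum += dot_one_block(x + ib * kBlockSize, y + ib * kBlockSize);
    for (; ib < nblocks; ++ib)                      // tail: remaining 1-3 blocks
        sum += dot_one_block(x + ib * kBlockSize, y + ib * kBlockSize);
    return sum;
}

int main() {
    const int n = 160;                              // 5 blocks: not a multiple of 128
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    std::printf("dot = %g (expected %g)\n", row_dot(x.data(), y.data(), n), 2.0f * n);
}
```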
Quantization error as measured by PPL is surprisingly low for the low-bit quants; even `IQ1_S` is kind of semi-usable. It is not a "true" `IQ1_S` quantization, as quite a few tensors get quantized to `IQ4_NL`, and I changed the attention tensors, which represent a tiny fraction of the overall model size, to be quantized with much higher bpw. We end up using 2.525 bpw for the repeating layers, and `PPL(IQ1_S)/PPL(fp16) - 1 = 49.4%`. But now I understand the hype around the Internet when, the other day, somebody was pretending to have invented 1-bit quantization and quantization mixes by using `IQ1_S` in `llama.cpp` for Deepseek-R1.
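As an aside, the quantization-error figure quoted above is just the relative PPL increase over `fp16`; a trivial sketch of that computation, with placeholder PPL values rather than measurements from this PR:

```cpp
// Relative PPL increase: PPL(quant)/PPL(fp16) - 1. Inputs are placeholders.
#include <cstdio>

int main() {
    const double ppl_fp16  = 6.00;  // placeholder, not a measured value
    const double ppl_iq1_s = 8.96;  // placeholder, chosen to land near 49%
    std::printf("PPL(IQ1_S)/PPL(fp16) - 1 = %.1f%%\n",
                100.0 * (ppl_iq1_s / ppl_fp16 - 1.0));
}
```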