
Conversation

@ikawrakow (Owner)

Closes #167 (Bug: Unable to quantize Falcon 10B 1.58 bitnet model)

I also saw requests for Falcon3-10B-1.58b in the mainline llama.cpp and llamafile repositories, so I decided to add the ability to use this model with ik_llama.cpp.

  1. Get a ternary model in Microsoft's I2_S format. E.g., for Falcon3-10B-1.58b:
     huggingface-cli download tiiuae/Falcon3-10B-Instruct-1.58bit-GGUF
  2. Re-quantize to one of the ternary quantization types in this repository. E.g.:
     ./bin/llama-quantize --allow-requantize path_to_model/ggml-model-i2_s.gguf output.gguf iq2_bn
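
A quick way to verify that the re-quantized file loads and generates is a short llama-cli run. This is just a sketch: the prompt and token count are arbitrary, and output.gguf is the file produced in step 2.

./bin/llama-cli -m output.gguf -p "Tell me about falcons." -n 128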

This works on the CPU and on the GPU (CUDA or Metal).
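
For the GPU backends, the project needs to be built with the corresponding option enabled. A minimal sketch, assuming this fork follows upstream llama.cpp's CMake flags:

cmake -B build -DGGML_CUDA=ON     # or -DGGML_METAL=ON on Apple hardware
cmake --build build --config Release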

Enjoy!

I see the perplexity is quite high (higher than that of the Falcon3 7B Instruct ternary model), so I am not sure how useful this model is in practice.
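
For anyone who wants to reproduce the measurement, the standard perplexity tool can be used. A sketch, assuming a WikiText-2 test file that you supply yourself:

./bin/llama-perplexity -m output.gguf -f wiki.test.raw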

