Fleek
- NVIDIA just dropped benchmarks showing 4-bit inference loses less than 1 point vs BF16 on most tasks. The metric to measure isn't accuracy per request. It's tasks completed per dollar. And on that metric, 4-bit wins by a landslide. Read the full blog 👇
- 8/ Check out the code: github.com/weyl-ai/mdspan… Check out the proofs: github.com/weyl-ai/mdspan… /end
- 8/ Check out more info on nix2gpu: Full blog: weyl.ai/plan/portable-… Repo: github.com/fleek-sh/nix2g… Quickstart in README. Test it and send feedback! /end


