Fleek (@fleek) / X

Fleek

3,499 posts

@fleek

Something new coming soon

Joined October 2018

Fleek
@fleek
Apr 3
please excuse the silence. we've been cooking up something cool and are excited to share more details soon
4.5K
Fleek
@fleek
Jan 29
NVIDIA just dropped benchmarks showing 4-bit inference loses less than 1 point vs BF16 on most tasks. It's not accuracy per request that you should be measuring. It's tasks completed per dollar. And at that metric, 4-bit wins by a landslide. Read the full blog 👇
Fleek
@fleek
Jan 29
Article
NVIDIA Just Killed the "Quantization = Quality Loss" Myth
NVIDIA's new benchmarks show NVFP4 loses less than 1 point on most tasks while delivering 4x FLOPS. The quantization-kills-quality myth is officially dead. There's this take that floats around AI...
8.3K
Fleek
@fleek
Jan 24
1/ Yesterday we announced mdspan-cute: C++23 std::mdspan syntax with CUTLASS cute layouts. One header. Zero overhead. Here's how it works 🧵
2.9K
Fleek
@fleek
Jan 24
Replying to @fleek
7/ Layout algebra is formalized in Lean 4. 26 theorems, 0 sorry. Properties extracted to RapidCheck tests. The art/ directory has 23 SVG visualizations - we drew pictures until we understood.
2K
Fleek
@fleek
Jan 24
8/ Check out the code: github.com/weyl-ai/mdspan… Check out the proofs: github.com/weyl-ai/mdspan… /end
1.8K
Fleek
@fleek
Jan 23
💿 Open Source Release 💿 mdspan-cute: a zero-overhead bridge between C++23 std::mdspan and CUTLASS cute layouts. One header. Swizzled memory. No bank conflicts. Read the blog and check out the repo (links in reply)
2.1K
Fleek
@fleek
Jan 23
Read the blog: weyl.ai/plan/mdspan-cu… Check out the repo: github.com/weyl-ai/mdspan…
mdspan-cute: Zero-Overhead Bridge to CUTLASS | Weyl
From weyl.ai
1.4K
Fleek
@fleek
Jan 22
Replying to @fleek
5/ Quantized RoPE already runs in:
→ LLaMA
→ Mistral
→ Most open source inference stacks
This isn't obscure. It's foundational.
668
Fleek
@fleek
Jan 22
6/ On "bit augmentation": Log/exp is a bijection. Information in = information out. You can't create precision from a reversible transformation. Thermodynamics doesn't allow it.
577
Fleek
@fleek
Jan 20
1/ Yesterday we announced nix2gpu - a NixOS package for portable GPU containers. Portable containers prevent vendor inference lock-in. Here's why it's a big deal. #Nix #AIInfra
2K
Fleek
@fleek
Jan 20
Replying to @fleek
7/ Why it matters: Makes distributed GPU compute easy and deterministic. Philosophy: It's just Linux with libs - complexity is optional. Open-source, MIT-licensed; production-tested on Fleek machines.
828
Fleek
@fleek
Jan 20
8/ Check out more info on nix2gpu: Full blog: weyl.ai/plan/portable-… Repo: github.com/fleek-sh/nix2g… Quickstart in README - test and send feedback! /End
Ruining GPU Market Owners' Day with the Power of Nix | Weyl
From weyl.ai
772