Gemma 4 E2B, compressed by @TheStageAI from 9.3 GB to 1.4 GB, is running on iPhone 16e with tool calls!
The smallest, best-quality checkpoints are now open source! @GoogleDeepMind
The smallest checkpoints for Gemma 4 E2B and E4B for local inference. Results for E2B:
size: 9.3 GB → 1.4 GB
speed: 113 tok/s on Apple M3
quality: -3% on IFEval
runs with: MLX, llama.cpp (coming soon)
Pareto-optimal, open source! Links to the blog post and GitHub repo ⬇️
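As a quick sanity check on the E2B numbers above, the sketch below works out the compression ratio and the time to emit a reply at the quoted decode speed. The function names and the 256-token reply length are illustrative assumptions, not part of the release, and prompt processing time is ignored.

```python
def compression_summary(original_gb, compressed_gb):
    """Return (compression ratio, fraction of the original size removed)."""
    ratio = original_gb / compressed_gb
    saved = 1.0 - compressed_gb / original_gb
    return ratio, saved


def seconds_to_generate(num_tokens, tokens_per_second):
    """Decode time only; ignores prompt processing and model load time."""
    return num_tokens / tokens_per_second


# Sizes and speed quoted in the post; 256 tokens is a hypothetical reply length.
ratio, saved = compression_summary(9.3, 1.4)
reply_s = seconds_to_generate(256, 113)
print(f"{ratio:.1f}x smaller, {saved:.0%} saved, 256 tokens in {reply_s:.1f} s")
```

With the quoted figures this comes to roughly 6.6x smaller, about 85% of the original size removed, and a little over 2 seconds for a 256-token reply on Apple M3.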
Beyoncé heard cursing. TheWhisper heard Arsenal.
The fastest Whisper in the world.
Open-source real-time ASR.
Top 5 on OpenASR benchmarks.
1800 RTFx.
Built for live captions, transcription, and voice apps.
See the repo
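For readers new to the metric: RTFx is the inverse real-time factor, i.e. seconds of audio transcribed per second of compute. A minimal sketch of what 1800 RTFx means in practice, assuming that definition and a hypothetical one-hour recording:

```python
def processing_seconds(audio_seconds, rtfx):
    """Compute time needed to transcribe `audio_seconds` of audio at a given RTFx."""
    return audio_seconds / rtfx


one_hour = 60 * 60  # hypothetical recording length, in seconds
print(processing_seconds(one_hour, 1800))  # 2.0
```

At 1800 RTFx, an hour of audio takes about two seconds of compute. This is batch throughput, so it says nothing by itself about per-utterance latency in a live stream.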
For AI engineers, latency is product.
Wan 2.2 in Elastic Models now generates 5s of video in 34s on H100. Elastic Models is a library of accelerated open-source models.
Also new: TheWhisper at 1800 RTFx on a single H100 and instant FLUX LoRA switching.
Try it
How do you make text-to-music run in real time in production?
The model has to keep audio generation ahead of playback.
Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput.
See the full case study ↓
2.4x Faster Real-Time Text-to-Music Inference at Mirelo AI
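The "generation ahead of playback" constraint can be sketched as a small buffer simulation: each audio chunk has to be ready before the player reaches it. Everything here is a simplified assumption for illustration (sequential chunk generation, fixed chunk length, made-up timings); it is not Mirelo AI's or TheStage AI's pipeline.

```python
from itertools import accumulate


def first_underrun(chunk_audio_s, gen_times_s, prebuffer_chunks=1):
    """Return the index of the first chunk that is late for playback, or None.

    chunk_audio_s: seconds of audio in each chunk.
    gen_times_s: wall-clock seconds spent generating each chunk, in order.
    prebuffer_chunks: chunks that must be ready before playback starts.
    """
    ready = list(accumulate(gen_times_s))      # time at which each chunk is done
    playback_start = ready[prebuffer_chunks - 1]
    for i, ready_at in enumerate(ready):
        needed_at = playback_start + i * chunk_audio_s
        if ready_at > needed_at:
            return i                           # player would stall here
    return None


# Hypothetical timings: 2 s chunks generated in 1.0 s each vs 2.5 s each.
fast = first_underrun(2.0, [1.0] * 8)
slow = first_underrun(2.0, [2.5] * 8)
print(fast, slow)
```

In this toy setup the fast model never stalls, while the slow one falls behind on the second chunk. Higher throughput is what moves a model from the second case to the first, or lets it serve more concurrent streams without stalling.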
Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine.
Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private.
ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization.
SDK demo 👇
Are you a big fan of jacket potato?
This is an open-source, real-time multilingual ASR for live speech.
It stays robust in heavy noise – even at SNR 0 dB.
That’s why it understands speech where people struggle to hear.
Use it for transcription, research, and multilingual apps.
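For context on the noise claim: SNR in decibels is 10 * log10(signal power / noise power), so 0 dB means the noise carries as much power as the speech. The sketch below mixes noise at a target SNR using that standard definition; the helper names and the synthetic sine "signal" are assumptions for illustration, not the project's evaluation code.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(signal) / power(noise))


def scale_noise_to_snr(signal, noise, target_db):
    """Rescale `noise` so that snr_db(signal, scaled_noise) equals target_db."""
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (target_db / 10)))
    return [gain * v for v in noise]


random.seed(0)
signal = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]  # stand-in for speech
noise = [random.gauss(0, 1) for _ in range(200)]
scaled_noise = scale_noise_to_snr(signal, noise, 0.0)
noisy = [s + n for s, n in zip(signal, scaled_noise)]  # what the ASR model would hear
print(abs(snr_db(signal, scaled_noise)) < 1e-9)  # True: noise power matches signal power
```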
At TheStage AI, we shipped @nvidia cuDNN Paged Attention in our Elastic Models library.
We replaced paged FlashAttention for better integration. In our benchmarks, the cuDNN path shows nearly identical quality and latency vs the previous implementation.
Early results on B200: