TheStage AI (@TheStageAI) / X

TheStage AI

98 posts

@TheStageAI

Automated Enterprise Inference Stack & Research Lab

Joined May 2023

Pinned
TheStage AI
@TheStageAI
May 12
TheStage AI Platform is now open to everyone. Automatically accelerate your models and download them to run in the cloud or on smartphones.
3.6M
TheStage AI reposted
Kirill Solodskikh
@GarchFather
Jun 2
Gemma 4 E2B, compressed by @TheStageAI from 9.3 GB to 1.4 GB, is running on iPhone 16e with tool calls! The smallest and best-quality checkpoints are open-sourced! @GoogleDeepMind
234K
TheStage AI
@TheStageAI
Jun 2
Replying to @TheStageAI
Blog post:
TheStage AI – Faster, Cheaper AI Inference
From app.thestage.ai
91
TheStage AI
@TheStageAI
Jun 2
Replying to @TheStageAI
GitHub:
GitHub - TheStageAI/edge-lm: Tiny llms optimised for edge deployment
From github.com
125
TheStage AI
@TheStageAI
Jun 2
The smallest checkpoints for Gemma 4 E2B and E4B for local inference.
Results for E2B:
size: 9.3 GB → 1.4 GB
speed: 113 tok/s on Apple M3
quality: -3% on IFEval
runs with: MLX, llama.cpp (coming)
Pareto-optimal, open source! Links to the blog post and GitHub repo ⬇️
275K
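For a rough sense of scale, the E2B sizes quoted in the post above can be turned into a compression ratio. This is a minimal sketch using only the two numbers from the post; the function name is hypothetical and the result is plain arithmetic, not output from any TheStage AI tool.

```python
def compression_ratio(original_gb: float, compressed_gb: float) -> float:
    """Return how many times smaller the compressed checkpoint is."""
    if compressed_gb <= 0:
        raise ValueError("compressed size must be positive")
    return original_gb / compressed_gb


# Sizes as quoted in the post (GB).
ORIGINAL_GB = 9.3
COMPRESSED_GB = 1.4

ratio = compression_ratio(ORIGINAL_GB, COMPRESSED_GB)
saved = 1 - COMPRESSED_GB / ORIGINAL_GB  # fraction of storage no longer needed

print(f"{ratio:.1f}x smaller, {saved:.0%} less storage")
# prints "6.6x smaller, 85% less storage"
```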
TheStage AI
@TheStageAI
May 17
Try it yourself,
TheStage AI
@TheStageAI
May 12
TheStage AI Platform is now open to everyone. Automatically accelerate your models and download them to run in the cloud or on smartphones.
TheStage AI – Faster, Cheaper AI Inference
From app.thestage.ai
312
TheStage AI
@TheStageAI
Apr 10
Beyoncé heard cursing. TheWhisper heard Arsenal. The fastest Whisper in the world. Open-source real-time ASR. Top 5 on OpenASR benchmarks. 1800 RTFx. Built for live captions, transcription, and voice apps. See the repo
Next-Gen Real-Time Whisper
From github.com
2.7M
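RTFx (inverse real-time factor) is commonly defined as audio duration divided by processing time, so a higher value means faster transcription. Assuming that definition applies to the 1800 RTFx figure in the post above, the sketch below shows what it implies for one hour of audio. The function name is hypothetical, and real throughput depends on hardware and batching.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Time needed to transcribe audio at a given RTFx (audio time / compute time)."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


one_hour = 3600.0  # seconds of audio
needed = processing_seconds(one_hour, 1800.0)

print(needed)  # prints 2.0
```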
TheStage AI
@TheStageAI
Apr 8
For AI engineers, latency is product. Wan 2.2 in Elastic Models now generates 5s of video in 34s on H100. Elastic Models is a library of accelerated open-source models. Also new: TheWhisper at 1800 RTFx on a single H100 and instant FLUX LoRA switching. Try it
Faster, Cheaper AI Inference
From thestage.ai
7.7M
TheStage AI
@TheStageAI
Mar 19
How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput. See the full case study ↓
2.4x Faster Real-Time Text-to-Music Inference at Mirelo AI
From thestage.ai
383
TheStage AI
@TheStageAI
Mar 4
Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine. Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private. ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization. SDK demo 👇
Halo Smart Glasses Run AI Fully On-Device
From digitaltrends.com
2.3K
TheStage AI
@TheStageAI
Jan 22
Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR for live speech. It stays robust in heavy noise – even at SNR 0 dB. That’s why it understands speech where people struggle to hear. Use it for transcription, research, and multilingual apps
Code is open. Learn how it works →
From github.com
131K
TheStage AI
@TheStageAI
Jan 15
At TheStage AI, we shipped @nvidia cuDNN Paged Attention in our Elastic Models library. We replaced paged FlashAttention for better integration. In our benchmarks, the cuDNN path shows nearly identical quality and latency vs the previous implementation. Early results on B200:
427
TheStage AI
@TheStageAI
Jan 13
Replying to @TheStageAI
Multilingual, open-source STT built for real-time streaming ↓
GitHub - TheStageAI/TheWhisper: Optimized Whisper models for streaming and on-device use
From github.com
10K