AI builders: we're open-sourcing a fine-tuned Whisper by @OpenAI with the @TheStageAI optimized inference engine.
Runs on @nvidia GPUs. 2 W power usage on @Apple devices via CoreML+MLX.
Real-time streaming. Electron + ReactJS samples.
Open-source weights on @huggingface.
Kirill Solodskikh
303 posts
Joined October 2022
- If each person trains their own #GPT4 on their messenger chats, then the result of a specific dialogue can be predicted faster than it would occur in real life. You can just approve the result of the conversation. The future of conversation becomes more predictable.
- Can LLMs recognize ASCII art? Our tests show accelerated Elastic Models analyze line-by-line features and combine them using statistical patterns. Try it yourself with DeepSeek-Qwen-14B: 120 tok/s on H100, 40 tok/s on L40s, up to 3× faster. Free API token!
- Our research team took @AIatMeta LLaMA-8B, quantized it with QLIP using post-training int8, applied SmoothQuant, and used pre-defined compiler-compatible NVIDIA configs. Why do this? Up to 2× fewer weights and 3.6× faster on one GPU. Try it with our simple Jupyter Notebook.
- Meet Elastic MusicGen Large, our optimized fork of @metaai's MusicGen, powered by ANNA (@TheStageAI's Automated Neural Network Accelerator): huggingface.co/TheStageAI/Ela… Ye @kanyewest used AI for vocals on "Bully," calling it the "next Auto-Tune." He switched up later, but tracks
- The current situation around top AI conferences like @NeurIPSConf, @CVPR, @iclr_conf, including visa problems, strongly motivates me to think wider and build a proof-of-stake (PoS) conference based on AI + blockchain technology. What we need: 1. Reviewers which got their stake based on their
- Been cooking up some audio tools. Made a quick playground on Hugging Face Spaces for easy testing. It's Elastic MusicGen, our fork of Meta's MusicGen Large by @TheStageAI. huggingface.co/spaces/TheStag… Drop prompts, get tracks in seconds, right in your browser. 11× faster than
- Self-hosted text-to-image on H100 with @TheStageAI Elastic Models, accelerated from FLUX.1-schnell @bfl_ml. Our fastest model S generates a high-quality image in 0.5 s. Precompiled and ready to deploy, with minimal cold start. Tutorial + access token inside if you want to try.
- Replying to @GarchFather: Our work at #CVPR2023, "Integral Neural Networks", is the future of knowledge extraction from large DNNs while significantly reducing computational cost! The @TheStage_ai team will release a Python framework to build INNs efficiently. Project link: inn.thestage.ai
- Quantization delivers speedup but can reduce quality. Our researchers prepared a tutorial showing how ANNA automatically quantizes Flux and accelerates it 2× while keeping quality high. Original model latency: 6.4 s. Check the link. DM or comment for early access.
- How to measure the quality of text-to-image models? Our research team @TheStageAI put together a comprehensive guide to check perceptual quality, sharpness, color, prompt alignment, and more. All the tricky image quality questions researchers usually ask are covered here.
- MLPerf Inference v5.1 by @MLCommons is out. Here's what our team can do. We ran @StabilityAI SDXL on 8×H100 with our stack, ANNA, accelerating inference with high quality. 18.1 img/s. Submitted alongside @Google, @NVIDIA, @nebiusai and more. Proud @TheStageAI made this
- Yo Yo! #CVPR2023 participants! We are preparing a friendly meetup on 23.06 with drinks, food, and talk about our award candidate paper on INNs and the @TheStage_ai team's plans for DNN acceleration. FOR PROMO CODES FOR FREE TICKETS, WRITE ME!
- We believe that everyone will become a model builder! That's why we are creating an automated acceleration and deployment stack which understands AI engineers' needs. We're finally reaching the era of everyone training their own models based on open-source (versus relying on black box generalist APIs) and it is glorious!










