Andi Marafioti (@andimarafioti) / X

Andi Marafioti

2,737 posts

@andimarafioti

leading multimodal research @huggingface (prev @unity)

Bern, Switzerland

Joined April 2022

Pinned
Andi Marafioti
@andimarafioti
Sep 4, 2025
Fuck it. Today, we open source FineVision: the finest curation of datasets for VLMs, over 200 sources!
> 20% improvement across 10 benchmarks
> 17M unique images
> 10B answer tokens
> New capabilities: GUI navigation, pointing, counting
FineVision 10x’s open-source VLMs.
133K
Andi Marafioti
@andimarafioti
Mar 17, 2025
🚀We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR!📄✨ It's lightning fast, processing a page in 0.35 sec on a consumer GPU using < 500MB VRAM⚡ SOTA in document conversion, beating every competing model we tested up to 27x larger🤯 But how? 🧶⬇️
253K
Andi Marafioti
@andimarafioti
Jun 17, 2025
📢 A new open-source OCR model is breaking the internet: Nanonets-OCR-s! Nanonets understands context and semantic structures, transforming documents into clean, structured markdown. It has an Apache 2.0 license, and the authors compare it to Mistral-OCR 🧵 Let's look closer:
172K
Andi Marafioti
@andimarafioti
Jan 31, 2025
Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s🔥 Inspired by our team's effort to open-source DeepSeek's R1 training, we are releasing the training and evaluation code on top of the weights 🫡 Now you can train any of our
99K
Andi Marafioti
@andimarafioti
Jan 23, 2025
Introducing the smollest VLMs yet! 🤏 SmolVLM (256M & 500M) runs on <1GB GPU memory. Fine-tune it on your laptop and run it on your toaster. 🚀 Even the 256M model outperforms our Idefics 80B (Aug '23). How small can we go? 👀
154K
Andi Marafioti
@andimarafioti
Nov 5, 2025
Mixing vision and robotics is incredibly hard, but when it finally works it feels like magic.
93K
Andi Marafioti
@andimarafioti
Oct 23, 2025
Everyone hypes new OCR models, but olmOCR quietly updates every few months, stays SOTA, and costs $178 per 1M pages. Don’t skip it—it even beats DeepSeek-OCR
Ai2
@allen_ai
Oct 22, 2025
Replying to @allen_ai
On olmOCR-Bench, olmOCR 2 scores 82.4 points, up from 78.5 in our previous release—increasing performance across every document category. 📈
76K
Andi Marafioti
@andimarafioti
Oct 23, 2025
Funny enough, a few years ago I was shopping for data labeling companies with a 100k/year budget and did a POC with Scale and a few other vendors. They were by far the worst. I thought our contract just wasn’t big enough for them to care, but maybe that was just their standard.
Ahmad
@TheAhmadOsman
Oct 22, 2025
> today this guy axes FAIR at Meta
> so this is a quick recap of his origin story
> and why he should not be the one
> making that decision
> Alexandr Wang, born January 1997
> age 19, drop out of MIT
> co-found Scale AI
> "what if we label data, but mid?"
> convince every LLM
143K
Andi Marafioti
@andimarafioti
May 14, 2025
Real-time SmolVLM in a web-browser with transformers.js. All running locally with no installs. Just open the website.
87K
Andi Marafioti
@andimarafioti
Apr 7, 2025
Just read the Qwen2.5-Omni technical report from the Qwen team, it's super interesting. Here are my notes. Qwen2.5-Omni is a unified end-to-end model that can perceive text, images, audio, and video — and generate both text and natural speech responses in a streaming fashion.
37K
Andi Marafioti
@andimarafioti
Nov 26, 2024
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput. SmolVLM can be fine-tuned on a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
85K
Andi Marafioti
@andimarafioti
Jul 31, 2025
🚀 We're thrilled to launch four new OCR datasets with 20M images: DoclingMatix, SynthFormulaNet, SynthCodeNet, and SynthChartNet. We used them to train SmolDocling, our ultra‑compact (256M) full-page document conversion VLM with performance rivaling models up to 27× larger.
30K
Andi Marafioti
@andimarafioti
Oct 21, 2025
🚨 New paper out! “FineVision: Open Data Is All You Need” 🥳 We unified 200+ data sources into 24M samples. That’s 17.3M images and 9.5B answer tokens, the largest open VLM dataset ever released. All fully documented, reproducible, and available for everyone. And there's more! 🎢
46K
Andi Marafioti
@andimarafioti
Oct 21, 2024
A warm welcome to Moonshine, a new family of speech-to-text models! Moonshine claims to be as fast and accurate as whisper-base, while being up to 5x faster! 🤯 They achieve this by removing whisper's constraint on 30-second length audios. Instead, Moonshine processes audios of
30K