Andi Marafioti
2,737 posts
Andi Marafioti
@andimarafioti
leading multimodal research @huggingface (prev @unity)
Bern, Switzerland
Joined April 2022
643
Following
7,401
Followers
  • Pinned
    Andi Marafioti
    @andimarafioti
    Sep 4, 2025
    Fuck it. Today, we open source FineVision: the finest curation of datasets for VLMs, over 200 sources! > 20% improvement across 10 benchmarks > 17M unique images > 10B answer tokens > New capabilities: GUI navigation, pointing, counting FineVision 10x’s open-source VLMs.
    133K
  • Andi Marafioti
    @andimarafioti
    Mar 17, 2025
    🚀We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR!📄✨ It's lightning fast, processing a page in 0.35 sec on a consumer GPU using < 500MB VRAM⚡ SOTA in document conversion, beating every competing model we tested up to 27x larger🤯 But how? 🧶⬇️
    253K
  • Andi Marafioti
    @andimarafioti
    Jun 17, 2025
    📢 A new open-source OCR model is breaking the internet: Nanonets-OCR-s! Nanonets understands context and semantic structures, transforming documents into clean, structured markdown. It has an Apache 2.0 license, and the authors compare it to Mistral-OCR 🧵 Let's look closer:
    172K
  • Andi Marafioti
    @andimarafioti
    Jan 31, 2025
    Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s🔥 Inspired by our team's effort to open-source DeepSeek's R1 training, we are releasing the training and evaluation code on top of the weights 🫡 Now you can train any of our
    99K
  • Andi Marafioti
    @andimarafioti
    Jan 23, 2025
    Introducing the smollest VLMs yet! 🤏 SmolVLM (256M & 500M) runs on <1GB GPU memory. Fine-tune it on your laptop and run it on your toaster. 🚀 Even the 256M model outperforms our Idefics 80B (Aug '23). How small can we go? 👀
    154K
  • Andi Marafioti
    @andimarafioti
    Nov 5, 2025
    Mixing vision and robotics is incredibly hard, but when it finally works it feels like magic.
    93K
  • Andi Marafioti
    @andimarafioti
    Oct 23, 2025
    Everyone hypes new OCR models, but olmOCR quietly updates every few months, stays SOTA, and costs $178 per 1M pages. Don’t skip it—it even beats DeepSeek-OCR
    Ai2
    @allen_ai
    Oct 22, 2025
    Replying to @allen_ai
    On olmOCR-Bench, olmOCR 2 scores 82.4 points, up from 78.5 in our previous release—increasing performance across every document category. 📈
    76K
  • Andi Marafioti
    @andimarafioti
    Oct 23, 2025
    Funny enough, a few years ago I was shopping for data labeling companies with a 100k/year budget and did a POC with Scale and a few other vendors. They were by far the worst. I thought our contract just wasn’t big enough for them to care, but maybe that was just their standard.
    Ahmad
    @TheAhmadOsman
    Oct 22, 2025
    > today this guy axes FAIR at Meta > so this is a quick recap of his origin story > and why he should not be the one > making that decision > Alexandr Wang, born January 1997 > age 19, drop out of MIT > co-found Scale AI > "what if we label data, but mid?" > convince every LLM
    143K
  • Andi Marafioti
    @andimarafioti
    May 14, 2025
    Real-time SmolVLM in a web-browser with transformers.js. All running locally with no installs. Just open the website.
    87K
  • Andi Marafioti
    @andimarafioti
    Apr 7, 2025
    Just read the Qwen2.5-Omni technical report from the Qwen team, it's super interesting. Here are my notes. Qwen2.5-Omni is a unified end-to-end model that can perceive text, images, audio, and video — and generate both text and natural speech responses in a streaming fashion.
    37K
  • Andi Marafioti
    @andimarafioti
    Nov 26, 2024
    Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput. SmolVLM can be fine-tuned on Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
    85K
  • Andi Marafioti
    @andimarafioti
    Jul 31, 2025
    🚀 We're thrilled to launch four new OCR datasets with 20M images: DoclingMatix, SynthFormulaNet, SynthCodeNet, and SynthChartNet. We used them to train SmolDocling, our ultra‑compact (256M) full-page document conversion VLM with performance rivaling models up to 27× larger.
    30K
  • Andi Marafioti
    @andimarafioti
    Oct 21, 2025
    🚨 New paper out! “FineVision: Open Data Is All You Need” 🥳 We unified 200+ data sources into 24M samples. That’s 17.3M images and 9.5B answer tokens, the largest open VLM dataset ever released. All fully documented, reproducible, and available for everyone. And there's more! 🎢
    46K
  • Andi Marafioti
    @andimarafioti
    Oct 21, 2024
    A warm welcome to Moonshine, a new family of speech-to-text models! Moonshine claims to be as accurate as whisper-base while being up to 5x faster! 🤯 They achieve this by removing Whisper's constraint of 30-second audio inputs. Instead, Moonshine processes audios of
    30K
