Axel Darmouni
@ADarmouni
3,182 posts
AI Engineer @CentraleSupelec P22 | Data Scientist. Full AI content, mostly LLM-related
Paris, France
axeld5.github.io
Joined November 2019
898 Following · 1,251 Followers
  • Axel Darmouni
    @ADarmouni
    Dec 23, 2024
    🧵 This weekend, I did a fun little side project, inspired by @GoogleDeepmind’s Gemini 2.0 Flash Thinking release. Basically, the idea was: what if we could distill its thinking capabilities into a smaller model, enhancing that model’s reasoning performance? More info below ⬇️
    141K
  • Axel Darmouni
    @ADarmouni
    Mar 23, 2025
    A very small model for powerful document analysis 📖 Read of the day, season 3, day 26: « SmolDocling: An ultra-compact vision-language model for end-to-end multimodal document conversion » by @AsNassar, @andimarafioti et al. from @IBMResearch and @huggingface. The core idea of
    29K
  • Axel Darmouni
    @ADarmouni
    Oct 26, 2025
    Very cool work by @a1zhang and @lateinteraction. Instead of calling an LM to solve a problem, let it agentically call an LM that works over an environment, storing the prompt and context (which evolve over time). The root LM then answers using all info aggregated by the
    42K
  • Axel Darmouni
    @ADarmouni
    Nov 16, 2025
    Impressive that @inria_paris managed to fully pre-train and SFT an LLM in 3 sizes (1.5B, 8B, 24B)! They named it Gaperon, and their report covers the whole pipeline, from data gathering to the training itself. The pre-trained base version matches the leaderboard, albeit with slight
    14K
  • Axel Darmouni
    @ADarmouni
    Jan 2, 2025
    Creating an LLM as the backbone of a chess strategy 🧵 📖 Read of the day, season 3, day 1: « Mastering Board Games by External and Internal Planning with Large Language Models », by Schultz, Adamek et al. from @GoogleDeepMind. The authors of that paper make 3 contributions: 1-
    17K
  • Axel Darmouni
    @ADarmouni
    Sep 6, 2025
    The UI Tars 2 report blew past my expectations. Massively thought-out data collection, a pipeline of CT + SFT + RL, an impressive display of setup, multi-turn online RL, concluded by model merging… all to beat the SoTA of OpenAI and Anthropic on CUA. Is this the ByteDance moment?
    19K
  • Axel Darmouni
    @ADarmouni
    Jul 23, 2024
    I don’t know what’s most amazing about the Llama release:
    - 8B is one of the best models in its category
    - 70B is gpt-4o-mini level
    - 405B is gpt-4o level
    - 405B is meant to be available on all cloud providers for use
    - The whole multimodality section
    The multimodality section is insane
    13K
  • Axel Darmouni
    @ADarmouni
    Sep 29, 2025
    Was looking for something like this. Super cool!
    Ivan Velichko
    @iximiuz
    Sep 22, 2025
    LeetCode, but for Linux, Docker, and Kubernetes? 🧐 Check out my collection of carefully crafted practical problems - with automated checks, helpful hints, and step-by-step solutions: labs.iximiuz.com/challenges A hands-on challenge a day keeps skill rot away.
    8K
  • Axel Darmouni
    @ADarmouni
    Aug 5, 2025
    Aaand @huggingface x @OpenAI already have a finetuning guide at the ready, showing how to finetune it. Can’t love them enough. @QGallouedec, you guys are heroes
    Axel Darmouni
    @ADarmouni
    Aug 5, 2025
    Those models are hilariously good. Curious about how finetuning would go with the reasoning modes; I suppose you’d always pick low?
    26K
  • Axel Darmouni
    @ADarmouni
    Jul 13, 2024
    Repurposing PaliGemma as a multimodal multi-vector encoder 🧵📖 Read of the day, day 104: ColPali: Efficient Document Retrieval with Vision Language Models, by @ManuelFaysse, @sibille_hugues, @tonywu_71 et al. from Illuin Technology (arxiv.org/pdf/2407.01449). The authors of this
    6.8K
  • Axel Darmouni
    @ADarmouni
    Aug 7, 2025
    The hard/fun part of this competition is that you don’t just work on the train/test samples of the ARC programs. You work on ARC-GEN-generated samples as well. While a program may solve the train/test samples, it can fail to capture the ARC task’s complete logic. Which means here you need
    François Chollet
    @fchollet
    Aug 7, 2025
    Kaggle just launched the NeurIPS 2025 Code Golf competition -- the goal is for you to write Python solution programs to ARC-AGI-1 tasks, while keeping the programs as small as possible. Are you better at writing code than frontier models? kaggle.com/competitions/g…
    20K
  • Axel Darmouni
    @ADarmouni
    Jul 13, 2024
    Google presents: PaliGemma, a SoTA 3B VLM 🧵📖 Read of the day, day 103: PaliGemma: A versatile 3B VLM for transfer, by @giffmana, @ASusanoPinto, @AndreasPSteiner, @__kolesnikov__, @brainshawn et al. from @GoogleDeepmind Zurich (arxiv.org/pdf/2407.07726). The authors of this paper
    6.2K
  • Axel Darmouni
    @ADarmouni
    Mar 11, 2025
    Small weekend project I’ve made: turn any textual dataset into an OCR benchmark. The sample images here are taken from the GSM8K test set; the rest is below ⬇️
    14K
  • Axel Darmouni
    @ADarmouni
    Apr 21, 2024
    Fact-checking generated outputs against a corpus of documents is possible at GPT-4 level with only 770M parameters. 🧵📖 Read of the day, day 31: MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents, by @LiyanTang4 et al. from the University of Texas
    7.9K