Alex Ratner
1,847 posts
Alex Ratner
@ajratner
@SnorkelAI @uwcse / prev @StanfordAILab – Interested in data management systems for machine learning, weak supervision, and impactful applications.
Menlo Park, CA
ajratner.github.io
Joined November 2013
697 Following
6,738 Followers
  • Pinned
    Alex Ratner
    @ajratner
    Feb 15
    This week we launched the Open Benchmarks Grant with a $3M initial commitment from @SnorkelAI + partner support from @huggingface @togethercompute @PrimeIntellect @PyTorch @harborframework & others, in order to close the evaluation gap in AI. Our ability to measure AI has been
    Open Benchmarks Grant for Agentic AI | Snorkel AI
    From benchmarks.snorkel.ai
    7.7K
  • Alex Ratner
    @ajratner
    May 29, 2025
    Agentic AI will transform every enterprise–but only if agents are trusted experts. The key: Evaluation & tuning on specialized, expert data. I’m excited to announce two new products to support this–@SnorkelAI Evaluate & Expert Data-as-a-Service–along w/ our $100M Series D! ---
    50K
  • Alex Ratner
    @ajratner
    Dec 18, 2022
    1/ 2023 AI prediction: the gap between generative and predictive AI will widen. Despite product & business model innovation in generative AI, real-world ROI will remain concentrated around predictive AI- leading to frustrated expectations. This gap will all come down to data...
    162K
  • Alex Ratner
    @ajratner
    Apr 2, 2023
    1/ Prediction: Everyone will soon be using foundation models (FMs) like GPT-4. However, they'll be using FMs trained on their own data & workloads: "GPT-You", not GPT-X Tl/dr: - Closed APIs aren't defensible - The durable moat is data - The last mile generates the real value
    87K
  • Alex Ratner
    @ajratner
    Feb 23, 2023
    arxiv.org/abs/2302.10724: ChatGPT is "jack of all trades, master of none"- on avg 25% worse than SOTA. Specialized ML models can be better, faster, and cheaper! But: foundation models like ChatGPT can actually be used to accelerate the development of these specialist models...
    arxiv.org
    ChatGPT: Jack of all trades, master of none
    OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. Several publications on ChatGPT...
    43K
  • Alex Ratner
    @ajratner
    Nov 18, 2022
    1/ Foundation models (FMs) like GPT-3 are amazing at generative, human-in-the-loop tasks. But *adapting* and *deploying* them for real enterprise use cases is still a major challenge. Today, we're excited to share @SnorkelAI's new data-centric approach to bridging this gap.
  • Alex Ratner
    @ajratner
    Dec 3, 2022
    1/ Labeling training data is one of the biggest pain points in ML today (whether training or fine-tuning). As a result- we get lots of claims of "automating it away" :) Ofc there's no such thing- but there are ways to make it 10-100x+ faster, which come down to three questions:
  • Alex Ratner
    @ajratner
    Sep 2, 2023
    1/ Lots of debate on "fine tuning vs. RAG" for LLMs. It's a completely false dichotomy! - FT: For adjusting a model's behavior - RAG: For providing external context Eg to make good diagnoses, a doctor needs specialty training (FT) *and* access to the patient's chart (RAG)
    69K
  • Alex Ratner
    @ajratner
    Aug 20, 2023
    Prediction: Enterprise AI will be 1000's of specialist models per org, not one big generalist. They'll be derived from OSS base models, tuned + distilled for specific use cases & settings. Why? - Specialists >> generalists on acc/perf - OSS FMs + fine-tuning = good enough now
    43K
  • Alex Ratner
    @ajratner
    May 6, 2023
    The future of AI is many smaller "specialist" models that are faster *and* cheaper on specific datasets & use cases- not one large "generalist" model. Very excited to share our work on doing this via more efficient distillation, with @GoogleAI & led by the amazing @cydhsieh !!
    Cheng-Yu Hsieh
    @cydhsieh
    May 5, 2023
    Excited to introduce Distilling Step-by-Step! ⚗️🪜 A simple mechanism to train small task-specific models to outperform LLMs, by leveraging less data needed by standard finetuning/distillation. Camera-ready paper and code release coming soon! 📜: arxiv.org/abs/2305.02301 🧵(1/n)
    35K
  • Alex Ratner
    @ajratner
    Jan 13, 2025
    arxiv.org/abs/2408.16737 - example of a noisier dataset with *greater diversity and coverage* outperforming an equivalent-cost dataset that is higher-quality but narrower. Key idea: AI datasets are about quality *and* coverage/diversity- a key tradeoff space to be explored
    arxiv.org
    Smaller, Weaker, Yet Better: Training LLM Reasoners via...
    Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is...
    16K
  • Alex Ratner
    @ajratner
    Aug 15, 2023
    The ChatGPT hype cycle: - Stage 1: "GPT-X is out-of-the-box magic!" - Stage 2: "We need to use our data" (where we are now) - Stage 3: "We need to develop our data" From 2 -> 3, enterprises will realize not enough to just dump in a data lake... use case-specific dev is key!👇🧵
    326K
  • Alex Ratner
    @ajratner
    Jan 15, 2023
    1/ In classical ML, development was "model-centric" (tuning/editing your model). As deep learning models emerged, "data-centric" dev (labeling/curating your data) became increasingly important. In the foundation model era: data-centric is the *only* viable type of development.
    50K
  • Alex Ratner
    @ajratner
    Jun 11, 2025
    Scale alone is not enough for AI data. Quality and complexity are equally critical. Excited to support all of these for LLM developers with @SnorkelAI Data-as-a-Service, and to share our new leaderboard! — Our decade-plus of research and work in AI data has a simple point:
    496K
