Baseten (@baseten) / X

2,407 posts

Baseten

@baseten

Inference is everything.

San Francisco and New York

Joined March 2021

Pinned
Baseten
@baseten
May 13
Intelligence should be defined by the people closest to the work. Intelligence should be owned by all of us. Let’s build a many model future!
Tuhin Srivastava
@tuhinone
May 13
Article
A many model future
Obsessives have always moved the world forward. They are responsible for our most beloved products, proudest scientific achievements, most moving art, the greatest leaps in what we're capable of....
Baseten
@baseten
16h
The new AgentPerf benchmark by @ArtificialAnlys shows that @NVIDIAAI Blackwell delivers the best performance for demanding agentic workloads. With NVIDIA, we're continuously investing in making your coding agents run fast, scale seamlessly, and cost less.
NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark
From blogs.nvidia.com
Baseten
@baseten
18h
We're thrilled to be working with the Harvey team to push open models to frontier-level performance for legal AI. Shout out to @gabepereyra for the great article. LAB was key to our joint work post-training open-weight models for legal agents.
Gabe Pereyra
@gabepereyra
22h
Article
Extending Harvey’s Legal Agent Bench to In-House Contracting
Creating a benchmark to evaluate agents’ ability to negotiate contracts end-to-end Authors: @nikogrupen @gabepereyra @ItsJulioPereyra Last month we released LAB, our benchmark for measuring...
Baseten
@baseten
19h
Congrats to the MiniMax team on the open-source launch of M3! Very few <500bn parameter models can tackle coding, agentic workloads, and multimodal tasks, all with a 1M-token context window, but M3 does it all. Dig in here: baseten.co/library/minima…
Baseten
@baseten
21h
Join Baseten, Lovable, and ElevenLabs to hack on the future of healthcare.
Alex Ker 🔭
@thealexker
21h
Most AI demos built for healthcare don't survive in real clinical or operational environments. The data is messy, the workflows are fragmented, margin for error is near zero. That's why I'm stoked to host a 1.5-day Healthcare x AI Hackathon with @HealthcareAIGuy in NYC: a
Baseten
@baseten
23h
We've heard from customers that they ship model updates >50% more often with rolling deploys than their previous solutions. No downtime, parallel GPU fleet, or off-hours babysitting. Rolling deploys are autoscaling-aware, and you can pause, inspect, or roll back at any step.
Sid Shanker
@sidpshanker
23h
Article
Rolling deployments for zero-downtime model updates
We heard from customers coming from other inference platforms that while updating their models, they were stuck choosing between blue-green deployments and hard cutovers. Canary deployments require a...
Baseten
@baseten
Jun 11
Great to see @baseten’s own @oneill_c and @part_harry_ sitting down with @cursor_ai’s @sjwhitmore to talk about the many things their 128(!) agents are doing (and occasionally arguing about), compaction, and the future.
Sam Whitmore
@sjwhitmore
Jun 11
We're trying a new experiment at @cursor_ai - interviewing devs we admire. I chatted with @oneill_c & @part_harry_ from @baseten about how they use coding agents. We discussed their current dev workflows & some predictions for the future. Check it out below!
Baseten
@baseten
Jun 11
We are excited to announce that we have partnered with @_inception_ai to make Mercury 2 available on Baseten. This makes us the first inference platform to bring Inception’s diffusion LLM to production. Inception’s dLLM architecture fixes the bottlenecks of sequential token
Baseten
@baseten
Jun 11
Article
Mercury 2, the first reasoning diffusion LLM, is now on Baseten
Authors: Marylise Tauzia (Baseten), Lucas Bunzel (Inception), Kumar Chellapilla (Inception), Bola Malek (Baseten), Sid Sharma (Inception) TL;DR: Inception's Mercury 2 is now live on Baseten, making us...
Baseten
@baseten
Jun 10
The longer the context, the more memory your LLM needs. We introduce research techniques to compress that memory 200x on the fly without changing the base model.
Charlie O'Neill
@oneill_c
Jun 10
1/ You can shrink a language model's KV cache by 200×, in a single forward pass, and it still answers correctly. At 256k context that's 36 GiB of cache down to ~360 MiB, with no change to the base model. Here's how we did it 👇
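For a sense of where figures like these come from, here is a minimal sketch of the standard KV cache size estimate (keys plus values, per layer, per KV head, per token). The model shape below (36 layers, 8 KV heads, head dimension 128, 16-bit values) is an assumption chosen for illustration because it gives exactly 36 GiB at 256k tokens; this excerpt does not state which model was used. The compression method itself is not modeled here, only the arithmetic of a flat ratio.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Estimate KV cache size for one sequence.

    The leading factor of 2 counts one key tensor and one value tensor per layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def compressed_mib(full_bytes, ratio):
    """Cache size in MiB after a flat compression ratio (illustrative only)."""
    return full_bytes / ratio / MIB


# Hypothetical model shape, not taken from the thread.
full = kv_cache_bytes(num_layers=36, num_kv_heads=8, head_dim=128, seq_len=256 * 1024)

print(f"uncompressed: {full / GIB:.1f} GiB")
print(f"at 200x:      {compressed_mib(full, 200):.1f} MiB")
```

Under these assumptions, a flat 200× on 36 GiB comes to roughly 184 MiB, so the ~360 MiB quoted above presumably rests on accounting that is not visible in this truncated post.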
Baseten
@baseten
Jun 9
Replying to @baseten
respan.ai
AI Gateway for Production LLM Routing | Respan
OpenAI-compatible gateway with failover, response caching, per-key limits, and production tracing on one platform.
Baseten
@baseten
Jun 9
Baseten is live on the Respan Gateway. Congratulations to the @RespanAI team on their Gateway launch as they bring observability, evals, and routing to agents. Try Baseten Model APIs now on Respan.
Baseten reposted
Sarah Sachs
@sarahmsachs
Jun 8
Model selection isn't just a fancy term for "looking at benchmarks". If you're just auto-updating and going off twitter vibes, you're not really adding any value to your business or your customers. To do this well, it means you need to deeply understand your use cases, how much
Charlie O'Neill
@oneill_c
Jun 8
Working in the Training team at Baseten, I often see companies agonize over which model to use. So many people worry about how to keep up with benchmarks and new releases. But with post-training and specialization, and as we see a rising tide in the intelligence of many
How to choose an AI model with Gamma and Notion · Luma
From luma.com