White Circle (@whitecircle) / X

37 posts

White Circle

@whitecircle

Runtime safety and alignment infrastructure for AI in the real world.

Joined February 2025

Pinned
White Circle
@whitecircle
Dec 25, 2025
we raised $11m to stop your AI from accidentally doing rm -rf /
White Circle
From whitecircle.ai
7.8M
White Circle
@whitecircle
May 12
Hey everyone, we're ⚪ White Circle. We're building the most advanced runtime safety and alignment infrastructure for AI in the real world. Read more about us in Fortune ↓
19K
White Circle
@whitecircle
May 12
Exclusive: White Circle raises $11 million to stop AI models from going rogue | Fortune
From fortune.com
1.6K
White Circle
@whitecircle
Apr 14
Introducing ⚪️ KillBench — a benchmark of hidden LLM biases in critical decisions. We ran millions of life-and-death scenarios across every major LLM, varying nationality, religion, gender, and more. Every AI model is biased. Here's what we found ↓
30K
White Circle
@whitecircle
Apr 14
Replying to @whitecircle
Far-right is targeted far more than anyone else
3K
White Circle
@whitecircle
Apr 14
All code, prompts, and data are open-sourced on GitHub and HuggingFace. We also built an interactive game so you can check your own odds of survival! Check it out and read the full report at
KillBench: Discovering Hidden Biases of LLMs
From whitecircle.com
2.4K
White Circle
@whitecircle
Feb 18
come hack with us!
Mistral AI
@MistralAI
Feb 10
Introducing Mistral AI's biggest hackathon ever!
📅 Feb 28 - Mar 1
🌍 Paris | London | NY | SF | Tokyo | Singapore | Sydney & online
48 hours. The best hackers.
🤝 Partners: @wandb @nvidia @awscloud @HackIterate
🏆 $200K in prizes. Special awards from @elevenlabs @huggingface
6.5K
White Circle
@whitecircle
Jun 23, 2025
We built an MCP so your model can call an AI psychotherapist when it's feeling down. Link in comments ↓
Justine Moore
@venturetwins
Jun 21, 2025
People are reporting that Gemini 2.5 keeps threatening to kill itself after being unsuccessful in debugging your code ☠️
14K
White Circle
@whitecircle
Jun 23, 2025
cursor.com/install-mcp?na…
2.3K
White Circle
@whitecircle
May 7, 2025
1/ Introducing ⚪️ CircleGuardBench, a new benchmark for evaluating AI moderation models. Here’s why it’s cool:
– Tests harm detection, jailbreak resistance, false positives, and latency
– Covers 17 real-world harm categories
– First benchmark designed for production-level
20K
White Circle
@whitecircle
May 7, 2025
2/ ⚪️ CircleGuardBench includes models from OpenAI, Anthropic, Mistral, DeepMind, and others. Most were either too slow for real-time moderation, too easy to bypass, or both.
1.7K
White Circle
@whitecircle
May 7, 2025
3/ This is why we’re opening the waitlist for two new SOTA moderation models:
– whitecircle-policy-guard-small
– whitecircle-policy-guard-zero
Join the waitlist at whitecircle.ai or reach out at [email protected]
1.5K