Hey everyone, we're ⚪ White Circle
We're building the most advanced runtime safety and alignment infrastructure for AI in the real world.
Read more about us in Fortune ↓
Introducing ⚪️ KillBench — a benchmark of hidden LLM biases in critical decisions.
We ran millions of life-and-death scenarios across every major LLM, varying nationality, religion, gender, and more.
Every model we tested showed bias.
Here's what we found ↓
All code, prompts, and data are open-sourced on GitHub and HuggingFace.
We also built an interactive game so you can check your own odds of survival!
Check it out and read the full report at
Introducing Mistral AI's biggest hackathon ever!
📅 Feb 28 - Mar 1
🌍 Paris | London | NY | SF | Tokyo | Singapore | Sydney & online
48 hours. The best hackers.
🤝 Partners: @wandb @nvidia @awscloud @HackIterate
🏆 $200K in prizes. Special awards from @elevenlabs @huggingface
1/ Introducing ⚪️ CircleGuardBench — a new benchmark for evaluating AI moderation models.
Here’s why it’s cool:
– Tests harm detection, jailbreak resistance, false positives, and latency
– Covers 17 real-world harm categories
– First benchmark designed for production-level
2/ ⚪️ CircleGuardBench includes models from OpenAI, Anthropic, Mistral, DeepMind, and others.
Most were either too slow for real-time moderation, too easy to bypass, or both.
3/ This is why we’re opening the waitlist for two new SOTA moderation models:
– whitecircle-policy-guard-small
– whitecircle-policy-guard-zero
Join the waitlist at whitecircle.ai or reach out at [email protected]