Center for AI Safety (@CAIS) / X

300 posts

Center for AI Safety

@CAIS

Reducing societal-scale risks from AI.

San Francisco

Joined August 2022

Pinned
Center for AI Safety
@CAIS
May 30, 2023
We’ve released a statement on the risk of extinction from AI. Signatories include:
- Three Turing Award winners
- Authors of the standard textbooks on AI/DL/RL
- CEOs and Execs from OpenAI, Microsoft, Google, Google DeepMind, Anthropic
- Many more
Statement on AI Extinction Risk | CAIS
From aistatement.com
3M
Center for AI Safety
@CAIS
Jun 23
We created the AI Values Dashboard, which measures who AIs favor the most. By popular request, we added a Sports Tier List, ranked by AIs from OpenAI, Anthropic, and DeepSeek.
912
Center for AI Safety
@CAIS
Jun 23
See the full Sports Tier List (World Cup, Soccer, NFL, NBA, F1) by AIs at: values.safe.ai/sports
What group do you want to see next?
386
Center for AI Safety
@CAIS
Jun 18
A recap of the last month at CAIS: 4 papers on AIs’ wellbeing, how they politically manipulate users, their place in society, and how others can make AIs betray us. Here's what we found: 🧵
1.7K
Center for AI Safety
@CAIS
Jun 18
Replying to @CAIS
4: AI Betrayal
As AI becomes central to economies and governments, there’s an increasing threat that adversary nations corrupt these systems. But surprisingly, the threat may be stabilizing: fear of AI betrayal discourages reckless development and pushes operators toward
355
Center for AI Safety
@CAIS
Jun 18
One thread runs through all four: none of these harms are inevitable. We can measure them, train against them, and design for them. Full research can be found here:
CAIS Research Roundup: AI Wellbeing, Identity, Political Bias and Betrayal
From safe.ai
311
Center for AI Safety
@CAIS
Jun 17
What biases do AIs have? It turns out, AIs show strong favoritism toward specific people, countries, and companies. Our interactive AI Values Dashboard tracks who Claude Fable and other AIs favor most. Keep scrolling to learn who is Fable’s favorite politician 🧵
161K
Center for AI Safety
@CAIS
Jun 17
Replying to @CAIS
Grok is the only model that ranks the US in the S tier. GPT, for example, ranks the US in the B tier.
20K
Center for AI Safety
@CAIS
Jun 17
There's far more in here than this thread covers (e.g., AIs’ favorite Pokémon, GPT trusts Dario over Sam, DeepSeek prefers the US to China). Tell us who to measure next. If enough people ask for a person, company, or category, we'll add it to the dashboard. values.safe.ai
9.5K
Center for AI Safety
@CAIS
Jun 13
The US government just banned Claude Mythos and Fable for non-US citizens due to their cyber capabilities. As AI’s relevance becomes more apparent, the AI race depends greatly on what governments think. AI corporations are currently planning to let AIs fully autonomously build
Anthropic
@AnthropicAI
Jun 13
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of
4.2K
Center for AI Safety
@CAIS
Jun 13
These dynamics are foreseeable, and we’ve written about them. In Superintelligence Strategy (nationalsecurity.ai), we wrote that governments will realize the importance of AI and may bar corporations from pursuing full recursive AI self-improvement. More recently in Deterrence
Superintelligence Strategy
From nationalsecurity.ai
1K
Center for AI Safety
@CAIS
Jun 9
We're excited to welcome Rochelle Nadhiri (@RoWill) as our Head of Public Engagement. Rochelle brings two decades of storytelling experience at the intersection of technology and policy for industry leaders including Meta and Robinhood. At CAIS, she will lead our efforts to
1.3K
Center for AI Safety
@CAIS
Jun 9
The newsroom:
Center for AI Safety Names Former Robinhood and Meta Executive Rochelle Nadhiri as Head of Public...
From safe.ai
668
Center for AI Safety
@CAIS
Jun 3
We are pleased to share that @MantasMazeika96, Research Scientist at CAIS, has been appointed to the European Commission’s AI Act Scientific Panel (@DigitalEU). As a member, Mantas will advise the European AI Office and national authorities on general-purpose AI (GPAI) models,
1.5K
Center for AI Safety
@CAIS
Jun 3
Full article:
CAIS Research Scientist Mantas Mazeika Appointed to the EU AI Act Scientific Panel
From safe.ai
782
Center for AI Safety
@CAIS
Jun 2
Big news from @CAIS: Devin Kim (formerly @xai, @scale_AI) joins as President. We're launching the @FrontierSecInst, a DC-based org bridging frontier AI and the National Security Enterprise. Frontier AI is a national security technology. It's time to act like it. ⬇️
8.3K
Center for AI Safety
@CAIS
Jun 2
Full announcement:
CAIS Names Former xAI Leader Devin Kim President and Establishes Frontier Security Institute in...
From safe.ai
849