Should control down-weight negative net-sabotage-value threats?
Fabien Roger | 16 Jan 2026 4:18 UTC | 24 points | 0 comments | 10 min read
Total utilitarianism is fine
Abhimanyu Pallavi Sudhir | 16 Jan 2026 0:32 UTC | 2 points | 1 comment | 3 min read
Test your interpretability techniques by de-censoring Chinese models
Khoi Tran, aryaj, Senthooran Rajamanoharan and Neel Nanda | 15 Jan 2026 16:33 UTC | 50 points | 4 comments | 20 min read
Reflections on TA-ing Harvard’s first AI safety course
Roy Rinberg | 15 Jan 2026 16:28 UTC | 60 points | 2 comments | 9 min read
I Made a Judgment Calibration Game for Beginners (Calibrate)
Luise Woehlke | 15 Jan 2026 15:04 UTC | 12 points | 1 comment | 1 min read
Corrigibility Scales To Value Alignment
PeterMcCluskey | 15 Jan 2026 0:05 UTC | 7 points | 5 comments | 5 min read | (bayesianinvestor.com)
Deeper Reviews for the top 15 (of the 2024 Review)
Raemon | 14 Jan 2026 23:59 UTC | 42 points | 0 comments | 5 min read
If we get primary cruxes right, secondary cruxes will be solved automatically
Jordan Arel | 14 Jan 2026 22:44 UTC | 1 point | 1 comment | 4 min read
Boltzmann Tulpas
Mariven | 14 Jan 2026 21:45 UTC | 20 points | 2 comments | 13 min read | (mariven.substack.com)
Status In A Tribe Of One
J Bostock | 14 Jan 2026 20:44 UTC | 7 points | 1 comment | 2 min read
Quantifying Love and Hatred
RobinHa | 14 Jan 2026 20:40 UTC | 8 points | 7 comments | 1 min read
Why we are excited about confession!
Boaz Barak, Gabriel Wu and Manas Joglekar | 14 Jan 2026 20:37 UTC | 82 points | 11 comments | 9 min read | (alignment.openai.com)
Why Motivated Reasoning?
johnswentworth | 14 Jan 2026 19:55 UTC | 60 points | 12 comments | 5 min read
The Many Ways of Knowing
Gordon Seidoh Worley | 14 Jan 2026 17:00 UTC | 14 points | 1 comment | 5 min read | (www.uncertainupdates.com)
GD Roundup #4 - inference, monopolies, and AI Jesus
Raymond Douglas | 14 Jan 2026 15:43 UTC | 32 points | 0 comments | 6 min read
Backyard cat fight shows Schelling points preexist language
jchan | 14 Jan 2026 14:10 UTC | 114 points | 13 comments | 3 min read
Parameters Are Like Pixels
omegastick | 14 Jan 2026 13:45 UTC | 13 points | 6 comments | 2 min read | (dumbideas.xyz)
The Evolution of Agentic AI Evaluation
Dinkar Juyal | 14 Jan 2026 6:35 UTC | −6 points | 0 comments | 11 min read
If researchers shared their #1 idea daily, we’d navigate existential challenges far more effectively
Jordan Arel | 14 Jan 2026 6:25 UTC | 5 points | 3 comments | 2 min read
How Much of AI Labs’ Research Is Safety?
Lennart Finke | 14 Jan 2026 1:40 UTC | 14 points | 6 comments | 3 min read