Should control down-weight negative net-sabotage-value threats?

Fabien Roger, 16 Jan 2026 4:18 UTC
24 points
0 comments, 10 min read

Total utilitarianism is fine

Abhimanyu Pallavi Sudhir, 16 Jan 2026 0:32 UTC
2 points
1 comment, 3 min read

Test your interpretability techniques by de-censoring Chinese models

15 Jan 2026 16:33 UTC
50 points
4 comments, 20 min read

Reflections on TA-ing Harvard’s first AI safety course

Roy Rinberg, 15 Jan 2026 16:28 UTC
60 points
2 comments, 9 min read

I Made a Judgment Calibration Game for Beginners (Calibrate)

Luise Woehlke, 15 Jan 2026 15:04 UTC
12 points
1 comment, 1 min read

Corrigibility Scales To Value Alignment

PeterMcCluskey, 15 Jan 2026 0:05 UTC
7 points
5 comments, 5 min read
(bayesianinvestor.com)

Deeper Reviews for the top 15 (of the 2024 Review)

Raemon, 14 Jan 2026 23:59 UTC
42 points
0 comments, 5 min read

If we get primary cruxes right, secondary cruxes will be solved automatically

Jordan Arel, 14 Jan 2026 22:44 UTC
1 point
1 comment, 4 min read

Boltzmann Tulpas

Mariven, 14 Jan 2026 21:45 UTC
20 points
2 comments, 13 min read
(mariven.substack.com)

Status In A Tribe Of One

J Bostock, 14 Jan 2026 20:44 UTC
7 points
1 comment, 2 min read

Quantifying Love and Hatred

RobinHa, 14 Jan 2026 20:40 UTC
8 points
7 comments, 1 min read

Why we are excited about confession!

14 Jan 2026 20:37 UTC
82 points
11 comments, 9 min read
(alignment.openai.com)

Why Motivated Reasoning?

johnswentworth, 14 Jan 2026 19:55 UTC
60 points
12 comments, 5 min read

The Many Ways of Knowing

Gordon Seidoh Worley, 14 Jan 2026 17:00 UTC
14 points
1 comment, 5 min read
(www.uncertainupdates.com)

GD Roundup #4 - inference, monopolies, and AI Jesus

Raymond Douglas, 14 Jan 2026 15:43 UTC
32 points
0 comments, 6 min read

Backyard cat fight shows Schelling points preexist language

jchan, 14 Jan 2026 14:10 UTC
114 points
13 comments, 3 min read

Parameters Are Like Pixels

omegastick, 14 Jan 2026 13:45 UTC
13 points
6 comments, 2 min read
(dumbideas.xyz)

The Evolution of Agentic AI Evaluation

Dinkar Juyal, 14 Jan 2026 6:35 UTC
−6 points
0 comments, 11 min read

If researchers shared their #1 idea daily, we’d navigate existential challenges far more effectively

Jordan Arel, 14 Jan 2026 6:25 UTC
5 points
3 comments, 2 min read

How Much of AI Labs’ Research Is Safety?

Lennart Finke, 14 Jan 2026 1:40 UTC
14 points
6 comments, 3 min read