Outcome Engineering


An ongoing exploration, discovery, and invention of what comes next for software engineering and product development in a world of agentic AI.

Read the manifesto →
Most recent: How to self-host OpenClaw on a VPS
All must-reads →

Accountability hardens while the agent surface explodes

Regulators and enterprises stop accepting “the model did it” as an excuse—right as agent tooling becomes harder to contain. The UK’s Financial Reporting Council makes the line explicit: audit firms remain responsible for failures even when AI is involved, and “human oversight and accountability” is not optional governance garnish (FRC says auditors can’t blame AI for audit failures after publishing ‘world’s first’ auditor AI guidance). This is a preview of how outcome engineering will be judged: by who can prove control, not who can demo capability.

That posture is already spreading beyond audits. EU institutions ban fully AI-generated images and video in official communications to preserve trust and reduce deepfake risk (EU institutions ban fully AI-generated images and videos in official communications). And Microsoft’s Copilot terms lean hard into accuracy disclaimers—“for entertainment purposes only,” with explicit human verification expectations and tighter usage governance (Microsoft: Copilot is for entertainment purposes only). These aren’t abstract “AI ethics” signals; they’re product requirements. If you can’t show The Gate—permissioning, review, and traceability—institutions default to bans or liability shifts.

The problem is that the agent surface is widening faster than most teams’ immune systems. VentureBeat reports roughly 500,000 exposed OpenClaw instances running locally with no enterprise kill switch—an incident-response nightmare when an agent is both distributed and autonomous (OpenClaw has 500,000 instances and no enterprise kill switch). In parallel, OpenAI patches a ChatGPT flaw that could silently leak conversation data—another reminder that “secure by default” is a myth in consumer-grade AI tooling (A hard truth for the AI era: don’t assume AI tools are secure by default — OpenAI patches ChatGPT data-leak flaw). Then TechCrunch ties a Mercor breach to a LiteLLM supply-chain compromise, with Lapsus$ claiming data theft—showing how quickly open-source agent plumbing becomes a breach path (Mercor hit by supply-chain attack tied to LiteLLM; Lapsus$ claims data theft).

The response is starting to look like a control plane, not a policy doc. Portkey open-sources an AI gateway after processing two trillion tokens a day, explicitly positioning self-hosted governance, routing, and control for production AI (Portkey open-sources its AI gateway after processing 2 trillion tokens a day). Simon Willison’s Datasette ecosystem ships “small” features that are actually governance primitives: per-purpose API keys and internal prompt logging that make model usage attributable and reviewable (datasette-llm 0.1a4, datasette-llm-usage 0.2a0). This is The Documentation and Audit the Outcomes turning into runtime infrastructure.
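The governance primitives described above (per-purpose API keys, prompt logging) reduce to a single logging chokepoint that every model call must pass through. The sketch below is illustrative only: `PurposeKey`, `call_model`, and `AUDIT_LOG` are invented names, not Datasette's or Portkey's actual APIs, and the model call is stubbed out.

```python
import json
import time
import uuid
from dataclasses import dataclass

# Hypothetical governance primitives: a purpose-scoped key plus an
# append-only log of every prompt/response, keyed to who asked and why.

@dataclass
class PurposeKey:
    key_id: str   # what gets attributed in the audit log
    purpose: str  # e.g. "support-summaries"; never reused across purposes

AUDIT_LOG: list[str] = []  # in production, an append-only store

def call_model(key: PurposeKey, prompt: str, model: str = "some-model") -> str:
    """Route a model call through the logging chokepoint."""
    entry = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "key_id": key.key_id,
        "purpose": key.purpose,
        "model": model,
        "prompt": prompt,  # logged so usage is reviewable later
    }
    # Placeholder: a real implementation would call the provider here.
    response = f"[stubbed response to {len(prompt)} chars]"
    entry["response"] = response
    AUDIT_LOG.append(json.dumps(entry))
    return response

key = PurposeKey(key_id="pk_support_01", purpose="support-summaries")
call_model(key, "Summarize ticket #123")
```

The point of the chokepoint is that attribution is structural, not procedural: a prompt that never passed through `call_model` simply has no key to run under, so every recorded use answers "who, for what, and when" by construction.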

Ground Truth keeps refusing to be centralized, too. Four major chatbots can’t agree when fact-checking political claims, underscoring why multi-model critique needs explicit evidence handling rather than vibes (4 AI chatbots tried to fact-check Rubio on Iran. They couldn’t agree). If you’re shipping agents into regulated or high-stakes domains, the “truth layer” is now your architecture.
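One way to make "explicit evidence handling rather than vibes" concrete: discard any verdict that cites no source, and when the remaining models disagree, report the disagreement instead of averaging it away. A minimal Python sketch under those assumptions; `Verdict`, `aggregate`, and the quorum threshold are all illustrative, not any product's API.

```python
from collections import Counter
from typing import NamedTuple

class Verdict(NamedTuple):
    model: str
    label: str     # "true" | "false" | "unverifiable"
    evidence: str  # source the model must cite; empty means its vote is dropped

def aggregate(verdicts: list[Verdict], quorum: float = 0.75):
    """Return (label, tally); 'disputed' when no label clears the quorum."""
    usable = [v for v in verdicts if v.evidence]  # no citation, no vote
    if not usable:
        return ("no-evidence", {})
    counts = Counter(v.label for v in usable)
    label, n = counts.most_common(1)[0]
    if n / len(usable) >= quorum:
        return (label, dict(counts))
    return ("disputed", dict(counts))  # surface disagreement, don't bury it

verdicts = [
    Verdict("model-a", "false", "wire-service report"),
    Verdict("model-b", "true", "press briefing transcript"),
    Verdict("model-c", "unverifiable", ""),  # dropped: no evidence cited
    Verdict("model-d", "false", "agency report"),
]
print(aggregate(verdicts))  # → ('disputed', {'false': 2, 'true': 1})
```

The design choice is the refusal path: a fact-check layer that can say "disputed" or "no-evidence" is auditable, while one that always emits a single answer hides exactly the disagreement the chatbot comparison exposed.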

Watch for the next competitive wedge: products that can prove accountability end-to-end—identity, logs, kill switches, and outcome audits—will out-ship products that only improve model quality.

All daily briefs →

Who's instigating and driving conversations

Reach

  1. Simon Willison 4166
  2. David Gewirtz 1884
  3. Efosa Udinmwen 824
  4. Craig Hale 756
  5. Charles Lamanna 742
  6. Dario Amodei 739
  7. Mark Samuels 706
  8. Colin Wood 684
  9. Sam Altman 648
  10. Lenny Rachitsky 610

How many later articles echo yours, weighted by day volume and article score.

First Mover

  1. Sam Altman 65%
  2. Chris Metinko 59%
  3. Tomasz Tunguz 54%
  4. Bruce Schneier 49%
  5. Nathan Lambert 47%
  6. Lance Whitney 46%
  7. Dario Amodei 44%
  8. Zac Bowden 44%
  9. Steven Vaughan-Nichols 43%
  10. Meghan Bobrowsky 42%

Fraction of similar articles published after yours — rewards being early.

Coverage

  1. Mark Samuels 80
  2. Lance Whitney 79
  3. Craig Hale 75
  4. Gavriel Cohen 75
  5. Ritoban Mukherjee 74
  6. Charles Lamanna 71
  7. Chris Metinko 70
  8. Todd Bishop 69
  9. Jagmeet Singh 68
  10. Lenny Rachitsky 67

Sum of daily percentile ranks across reach and first mover — higher means consistently top-ranked.

Organizations

Reach

  1. Anthropic 12233
  2. NVIDIA 9345
  3. OpenAI 6073
  4. Microsoft 5257
  5. Google 2150
  6. Meta 1706
  7. U.S. Department of Defense 1664
  8. Amazon 1382
  9. 1Password 1138
  10. Amazon Web Services 858

How many later articles echo yours, weighted by day volume and article score.

First Mover

  1. Future of Life Institute 66%
  2. Department of Veterans Affairs 63%
  3. Amazon Web Services 52%
  4. Alibaba 49%
  5. Grammarly 49%
  6. Perplexity 47%
  7. UK government 46%
  8. Snowflake 44%
  9. YouTube 43%
  10. Mistral AI 41%

Fraction of similar articles published after yours — rewards being early.

Coverage

  1. Datasette 95
  2. OpenClaw 81
  3. Oracle 75
  4. Stanford University 74
  5. 1Password 67
  6. GitHub 64
  7. Future of Life Institute 63
  8. Amazon Web Services 63
  9. White House 62
  10. Mistral AI 61

Sum of daily percentile ranks across reach and first mover — higher means consistently top-ranked.

Domains

Reach

  1. siliconangle.com 18564
  2. venturebeat.com 10495
  3. fortune.com 9355
  4. zdnet.com 5329
  5. futurism.com 5176
  6. thenewstack.io 4598
  7. infoworld.com 4305
  8. simonwillison.net 4166
  9. techradar.com 3900
  10. bloomberg.com 3092

How many later articles echo yours, weighted by day volume and article score.

First Mover

  1. theinformation.com 54%
  2. tomtunguz.com 54%
  3. reddit.com 51%
  4. musicbusinessworldwide.com 49%
  5. schneier.com 49%
  6. interconnects.ai 47%
  7. axios.com 46%
  8. futurism.com 44%
  9. devinterrupted.substack.com 41%
  10. 404media.co 40%

Fraction of similar articles published after yours — rewards being early.

Coverage

  1. lennysnewsletter.com 67
  2. zdnet.com 66
  3. devinterrupted.substack.com 66
  4. technologyreview.com 64
  5. federalnewsnetwork.com 62
  6. simonwillison.net 61
  7. cnbc.com 61
  8. venturebeat.com 60
  9. fastcompany.com 59
  10. schneier.com 59

Sum of daily percentile ranks across reach and first mover — higher means consistently top-ranked.

Share of trailing 7-day coverage per frontier lab

[Chart: Anthropic, OpenAI, Google, Meta, DeepSeek, Mistral, xAI; weekly ticks 02-11 through 04-01]

Per-article sentiment with 7-day net approval

[Chart: Building, Governing, Overall; scale +1 to -1; weekly ticks 02-11 through 04-01]

Trailing 7-day balance of creation vs oversight principles

[Chart: Building, Governing; scale +50 to -50; weekly ticks 02-11 through 04-01]
All data →