An ongoing exploration, discovery, and invention of what comes next for software engineering and product development in a world of agentic AI development
Regulators and enterprises stop accepting “the model did it” as an excuse—right as agent tooling becomes harder to contain. The UK’s Financial Reporting Council makes the line explicit: audit firms remain responsible for failures even when AI is involved, and “human oversight and accountability” is not optional governance garnish (FRC says auditors can’t blame AI for audit failures after publishing ‘world’s first’ auditor AI guidance). This is a preview of how outcome engineering will be judged: by who can prove control, not who can demo capability.
That posture is already spreading beyond audits. EU institutions ban fully AI-generated images and video in official communications to preserve trust and reduce deepfake risk (EU institutions ban fully AI-generated images and videos in official communications). And Microsoft’s Copilot terms lean hard into accuracy disclaimers—“for entertainment purposes only,” with explicit human verification expectations and tighter usage governance (Microsoft: Copilot is for entertainment purposes only). These aren’t abstract “AI ethics” signals; they’re product requirements. If you can’t show The Gate—permissioning, review, and traceability—institutions default to bans or liability shifts.
The problem is that the agent surface is widening faster than most teams’ immune systems. VentureBeat reports roughly 500,000 exposed OpenClaw instances running locally with no enterprise kill switch—an incident-response nightmare when an agent is both distributed and autonomous (OpenClaw has 500,000 instances and no enterprise kill switch). In parallel, OpenAI patches a ChatGPT flaw that could silently leak conversation data—another reminder that “secure by default” is a myth in consumer-grade AI tooling (A hard truth for the AI era: don’t assume AI tools are secure by default — OpenAI patches ChatGPT data-leak flaw). Then TechCrunch ties a Mercor breach to a LiteLLM supply-chain compromise, with Lapsus$ claiming data theft—showing how quickly open-source agent plumbing becomes a breach path (Mercor hit by supply-chain attack tied to LiteLLM; Lapsus$ claims data theft).
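What an "enterprise kill switch" actually implies architecturally is worth making concrete. A hedged sketch follows, with entirely hypothetical names (`KillSwitch`, `Agent`); this is not OpenClaw's API — the reporting's point is that no such control exists there. The pattern is minimal: every agent checks a centrally revocable flag before each action, so one operator decision halts the whole fleet.

```python
import threading

# Hypothetical sketch of a fleet-wide kill switch: agents consult a
# shared, centrally revocable flag before every action. Without a
# control plane like this, a distributed autonomous agent cannot be
# stopped during incident response.

class KillSwitch:
    def __init__(self):
        self._active = threading.Event()
        self._active.set()  # fleet enabled by default

    def revoke(self):
        # One operator decision disables every agent holding this switch.
        self._active.clear()

    def allows(self) -> bool:
        return self._active.is_set()

class Agent:
    def __init__(self, name: str, switch: KillSwitch):
        self.name = name
        self.switch = switch
        self.actions = 0

    def step(self) -> bool:
        if not self.switch.allows():
            return False  # refuse to act once the switch is thrown
        self.actions += 1
        return True

switch = KillSwitch()
fleet = [Agent(f"agent-{i}", switch) for i in range(3)]
assert all(a.step() for a in fleet)   # fleet runs normally
switch.revoke()
assert not any(a.step() for a in fleet)  # every agent halts at once
```

The design choice that matters: the switch lives outside the agents, so containment does not depend on each instance behaving well.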
The response is starting to look like a control plane, not a policy doc. Portkey open-sources an AI gateway after processing two trillion tokens a day, explicitly positioning self-hosted governance, routing, and control for production AI (Portkey open-sources its AI gateway after processing 2 trillion tokens a day). Simon Willison’s Datasette ecosystem ships “small” features that are actually governance primitives: per-purpose API keys and internal prompt logging that make model usage attributable and reviewable (datasette-llm 0.1a4, datasette-llm-usage 0.2a0). This is The Documentation and Audit the Outcomes turning into runtime infrastructure.
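The governance primitives named above — per-purpose API keys and internal prompt logging — reduce to a simple pattern. The sketch below uses hypothetical names (`GovernedGateway`, `issue_key`, `audit`); it is not the Datasette or Portkey API, just an illustration of how purpose-scoped keys plus an append-only log make every model call attributable and reviewable.

```python
import hashlib
import time

# Hypothetical governance-primitive sketch: each API key is bound to one
# declared purpose, and every prompt is logged against that key, so
# usage can be audited per purpose after the fact.

class GovernedGateway:
    def __init__(self):
        self._keys = {}   # key -> declared purpose
        self._log = []    # append-only audit log

    def issue_key(self, purpose: str) -> str:
        key = hashlib.sha256(f"{purpose}:{time.time()}".encode()).hexdigest()[:16]
        self._keys[key] = purpose
        return key

    def complete(self, key: str, prompt: str) -> str:
        purpose = self._keys.get(key)
        if purpose is None:
            raise PermissionError("unknown key: calls must be attributable")
        self._log.append({
            "ts": time.time(),
            "key": key,
            "purpose": purpose,
            "prompt": prompt,
        })
        # A real gateway would forward to a model here; we stub the reply.
        return f"[model reply for purpose={purpose}]"

    def audit(self, purpose: str):
        # Review every prompt ever sent under a given purpose.
        return [entry for entry in self._log if entry["purpose"] == purpose]

gw = GovernedGateway()
key = gw.issue_key("fact-checking")
gw.complete(key, "Verify claim X against source Y")
assert len(gw.audit("fact-checking")) == 1  # the call is on the record
```

This is "The Documentation and Audit the Outcomes" as runtime behavior: the log is produced by the gateway itself, not reconstructed after an incident.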
Ground Truth keeps refusing to be centralized, too. Four major chatbots can’t agree when fact-checking political claims, underscoring why multi-model critique needs explicit evidence handling rather than vibes (4 AI chatbots tried to fact-check Rubio on Iran. They couldn’t agree). If you’re shipping agents into regulated or high-stakes domains, the “truth layer” is now your architecture.
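"Explicit evidence handling rather than vibes" can be sketched concretely. The example below is hypothetical (the `Verdict` structure and `adjudicate` rule are illustrative, with no real model calls): a verdict without a cited source simply doesn't count, and disagreement among evidence-backed verdicts escalates to a human instead of being averaged away.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical multi-model adjudication sketch: uncited verdicts are
# discarded, and only unanimous evidence-backed verdicts resolve a
# claim; anything else escalates rather than papering over dissent.

@dataclass
class Verdict:
    model: str
    label: str                 # "true" / "false" / "unverifiable"
    evidence: Optional[str]    # citation the model must supply

def adjudicate(verdicts: list) -> str:
    backed = [v for v in verdicts if v.evidence]  # drop uncited "vibes"
    if not backed:
        return "escalate: no evidence-backed verdicts"
    counts = Counter(v.label for v in backed)
    label, n = counts.most_common(1)[0]
    if n == len(backed):
        return label  # unanimous among evidence-backed verdicts
    return "escalate: models disagree"

result = adjudicate([
    Verdict("model-a", "false", "treasury.gov/report"),
    Verdict("model-b", "false", "iaea.org/statement"),
    Verdict("model-c", "true", None),  # uncited, so it is ignored
])
print(result)  # → false
```

The rule is deliberately conservative: when the chatbots can't agree, the system says so instead of picking a winner.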
Watch for the next competitive wedge: products that can prove accountability end-to-end—identity, logs, kill switches, and outcome audits—will out-ship products that only improve model quality.