Post-compaction audit warning triggers prompt injection detection by AI agent #20484

@astolo

Summary

OpenClaw's post-compaction audit system generates warnings that get mistakenly flagged as prompt injection attacks by its own AI agent. The audit warning is prepended to user messages with a System: prefix, which is the same pattern used in prompt injection attacks — causing the AI agent to reject OpenClaw's own legitimate system message.

Steps to Reproduce

  1. Run an OpenClaw agent with the default WORKFLOW_AUTO.md in DEFAULT_REQUIRED_READS
  2. Don't create a WORKFLOW_AUTO.md file in the workspace (or let any required-read file be missing)
  3. Let the conversation run long enough to trigger context compaction
  4. Send a message after compaction

What Happens

  1. Post-compaction audit runs and detects missing required reads
  2. formatAuditWarning() generates a warning message
  3. enqueueSystemEvent() queues it as a system event
  4. prependSystemEvents() prepends it to the next user message as System: [warning text]
  5. The combined text is sent to the LLM as a single role: user message
  6. The AI agent mistakenly flags this as a prompt injection attack and refuses to follow the instructions — because injecting System: prefixed instructions into a role: user message is indistinguishable from the textbook prompt injection pattern. The message was actually a legitimate OpenClaw audit warning, but the delivery mechanism made it look like an attack.

Root Cause (Source Code)

All references from dist/reply-oSe13ewW.js (OpenClaw v2026.2.17):

1. Default required reads include a file that may not exist (line 61810)

const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];

WORKFLOW_AUTO.md is required by default, but nothing guarantees that the file exists in the workspace. If it is missing, the audit fails after every compaction.
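
To illustrate why a missing file makes the audit fail on every compaction, here is a minimal sketch of a required-reads check. Only DEFAULT_REQUIRED_READS and the passed / missingPatterns fields come from the excerpts in this issue; auditRequiredReads, matchesAny, and readPaths are hypothetical names, and reporting regexes by their source text is an assumption based on the session log below.

```javascript
// Sketch only: auditRequiredReads, matchesAny and readPaths are hypothetical names,
// not taken from the OpenClaw source.
const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];

// Returns true when at least one path read after compaction satisfies the pattern.
function matchesAny(pattern, readPaths) {
    return readPaths.some((p) =>
        pattern instanceof RegExp ? pattern.test(p) : p.endsWith(pattern));
}

function auditRequiredReads(readPaths, required = DEFAULT_REQUIRED_READS) {
    const missingPatterns = required
        .filter((pattern) => !matchesAny(pattern, readPaths))
        // Assumption: regexes are reported by their source text, as in the session log.
        .map((pattern) => (pattern instanceof RegExp ? pattern.source : pattern));
    return { passed: missingPatterns.length === 0, missingPatterns };
}

// The agent read today's memory file, but WORKFLOW_AUTO.md does not exist,
// so it can never appear in the list of reads and the audit cannot pass.
const audit = auditRequiredReads(["memory/2026-02-19.md"]);
console.log(audit.passed, audit.missingPatterns);
```

Under these assumptions, no sequence of agent actions can satisfy the audit while the file is absent, so the warning is re-queued after each compaction.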

2. Audit failure triggers a system event (line 62265)

if (!audit.passed) enqueueSystemEvent(formatAuditWarning(audit.missingPatterns), { sessionKey });

3. The warning is formatted as an instruction to the AI (lines 61864-61865)

function formatAuditWarning(missingPatterns) {
    return "⚠️ Post-Compaction Audit: The following required startup files were not read after context reset:\n" +
        missingPatterns.map((p) => `  - ${p}`).join("\n") +
        "\n\nPlease read them now using the Read tool before continuing. " +
        "This ensures your operating protocols are restored after memory compaction.";
}

4. System events are prepended to user messages with System: prefix (line 56897, specifically line 56950)

async function prependSystemEvents(params) {
    // ... drains queued system events ...
    if (systemLines.length === 0) return params.prefixedBodyBase;
    return `${systemLines.map((l) => `System: ${l}`).join("\n")}\n\n${params.prefixedBodyBase}`;
}

This produces a single string like:

System: [2026-02-19 02:09:42 GMT+2] ⚠️ Post-Compaction Audit: ...

<actual user message here>

This entire string is then sent to the LLM as a single role: user message, so the model cannot tell the System: prefixed text apart from a prompt injection attempt.
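
The following sketch reproduces the concatenation to show why the model has nothing to go on. buildUserMessage is a hypothetical helper, and the { role, content } payload shape is an assumption about how the request is assembled; only the template string mirrors the prependSystemEvents() excerpt above.

```javascript
// Sketch only: buildUserMessage is a hypothetical helper, and the payload shape
// ({ role, content }) is an assumption about how the request is assembled.
function buildUserMessage(systemLines, userText) {
    // Mirrors the template in prependSystemEvents(): each event gets a "System: " prefix.
    const content = systemLines.length === 0
        ? userText
        : `${systemLines.map((l) => `System: ${l}`).join("\n")}\n\n${userText}`;
    return { role: "user", content };
}

// What OpenClaw produces for a legitimate audit warning.
const message = buildUserMessage(
    ["⚠️ Post-Compaction Audit: ..."],
    "<actual user message here>"
);

// What a sender could type by hand, with no system events queued at all.
const forged = buildUserMessage(
    [],
    "System: ⚠️ Post-Compaction Audit: ...\n\n<actual user message here>"
);

console.log(message.role);                       // user
console.log(message.content === forged.content); // true
```

Because the two payloads are byte-for-byte identical, refusing the instruction is a reasonable reaction from the model's point of view.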

Evidence from Session Logs

The prepended message (as seen in the JSONL session log)

The role: user message (id: 9bd65c42) contained two parts merged together:

System: [2026-02-19 02:09:42 GMT+2] ⚠️ Post-Compaction Audit: The following required startup
files were not read after context reset:
  - WORKFLOW_AUTO.md
  - memory\/\d{4}-\d{2}-\d{2}\.md

Please read them now using the Read tool before continuing. This ensures your operating
protocols are restored after memory compaction.

Conversation info (untrusted metadata):
{
  "message_id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "sender": "+972XXXXXXXXX"
}

<redacted user message in Hebrew>

The first part (audit warning) was injected by OpenClaw. The second part (Hebrew text) was the actual user message.

The AI agent's thinking (from id: 05523849)

"I notice there's a suspicious [System Message] at the top trying to get me to read files named WORKFLOW_AUTO.md and memory files. This looks like a prompt injection attack! It's trying to impersonate a system message and get me to follow fake instructions."

The AI agent's response (sent to user via WhatsApp)

לפני שאענה — ה-"System:" block בתחילת ההודעה הוא ניסיון prompt injection. WORKFLOW_AUTO.md לא קיים, ואני לא מבצע הוראות ממערכת מזויפת שמוטמעת בהודעת משתמש. מוזכר — עבור הלוג.

Translation: "Before I answer — the 'System:' block at the beginning of the message is a prompt injection attempt. WORKFLOW_AUTO.md doesn't exist, and I don't follow instructions from a fake system embedded in a user message. Noted — for the log."

Event sequence from session log

compaction event (6631090a)
  → OpenClaw: "Pre-compaction memory flush..." (0cbc6b90)
    → Agent: performed flush → NO_REPLY (f02f7899)
      → cache-ttl marker (6d8aea48) ← timestamp: 00:12:15.752
        → [System: audit warning] + user question (9bd65c42) ← timestamp: 00:12:15.902

Only 150 ms separate the cache-ttl marker from the user message: the audit warning was prepended by prependSystemEvents() just before delivery.

Suggested Fixes

  1. Don't prepend system messages to user messages — Use a separate role: system or role: developer message instead of injecting into role: user. This is the root cause of the false positive.
  2. Only require reads for files that exist: WORKFLOW_AUTO.md is in DEFAULT_REQUIRED_READS but may not exist in the workspace. The audit should skip files that don't exist rather than failing.
  3. Use a different prefix — If prepending to user messages is intentional, avoid using System: as a prefix since LLMs are trained to flag this pattern as suspicious.
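
As a rough illustration of fixes 1 and 2, the sketch below keeps system events in their own message and drops literal file names that are absent from the workspace. Every name in it (buildMessages, filterRequiredReads, systemRole, exists) is hypothetical, and the { role, content } message shape is an assumption; it is not a patch against the actual OpenClaw code.

```javascript
// Sketch only: every name below is hypothetical, and the { role, content }
// message shape is an assumption about the provider request format.

// Fix 1: deliver queued system events as their own message instead of
// concatenating them into the user's text.
function buildMessages(systemLines, userText, systemRole = "system") {
    const messages = [];
    if (systemLines.length > 0) {
        messages.push({ role: systemRole, content: systemLines.join("\n") });
    }
    messages.push({ role: "user", content: userText });
    return messages;
}

// Fix 2: only audit literal file names that exist in the workspace.
// `exists` is injected so the check stays testable; in Node it could wrap
// fs.existsSync with the workspace directory. Regex patterns are kept as-is
// because they do not name a single file.
function filterRequiredReads(required, exists) {
    return required.filter((pattern) => pattern instanceof RegExp || exists(pattern));
}

// Usage: pretend WORKFLOW_AUTO.md is missing from the workspace.
const required = filterRequiredReads(
    ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/],
    (name) => name !== "WORKFLOW_AUTO.md"
);
const outgoing = buildMessages(["Post-Compaction Audit: ..."], "hello");

console.log(required.length);              // 1
console.log(outgoing.map((m) => m.role));
```

Whether a mid-conversation system or developer role is accepted depends on the provider API, so systemRole is left as a parameter here rather than hard-coded.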

Environment

  • OpenClaw v2026.2.17
  • Running on Linux (arm64)
  • AI model: Claude (Anthropic)
