Post-compaction audit warning triggers prompt injection detection by AI agent

## Summary

OpenClaw's post-compaction audit system generates warnings that get mistakenly flagged as prompt injection attacks by its own AI agent. The audit warning is prepended to user messages with a `System:` prefix, which is the same pattern used in prompt injection attacks — causing the AI agent to reject OpenClaw's own legitimate system message.

## Steps to Reproduce

1. Run an OpenClaw agent with the default `WORKFLOW_AUTO.md` in `DEFAULT_REQUIRED_READS`
2. Don't create a `WORKFLOW_AUTO.md` file in the workspace (or let any required-read file be missing)
3. Let the conversation run long enough to trigger context compaction
4. Send a message after compaction

## What Happens

1. Post-compaction audit runs and detects missing required reads
2. `formatAuditWarning()` generates a warning message
3. `enqueueSystemEvent()` queues it as a system event
4. `prependSystemEvents()` prepends it to the next user message as `System: [warning text]`
5. The combined text is sent to the LLM as a single `role: user` message
6. The AI agent **mistakenly flags this as a prompt injection attack** and refuses to follow the instructions. `System:`-prefixed instructions inside a `role: user` message are indistinguishable from the textbook prompt injection pattern, so a legitimate OpenClaw audit warning looks like an attack purely because of how it is delivered.

## Root Cause (Source Code)

All references from `dist/reply-oSe13ewW.js` (OpenClaw v2026.2.17):

### 1. Default required reads include a file that may not exist (line 61810)

```javascript
const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];
```

`WORKFLOW_AUTO.md` is required by default, but nothing guarantees that the file exists in the workspace. If it is missing, the audit fails after every compaction.
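One way the audit could tolerate a missing file is to drop literal file names that are absent from the workspace before checking them. The sketch below is only an illustration under that assumption: `filterExistingRequiredReads` and its `workspaceDir` parameter are hypothetical names, not part of OpenClaw, and regex entries are passed through untouched because they describe a family of files rather than a single path.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];

// Hypothetical helper (not in OpenClaw): keeps regex patterns as-is and
// drops literal file names that do not exist under workspaceDir.
function filterExistingRequiredReads(requiredReads, workspaceDir) {
    return requiredReads.filter((entry) => {
        if (entry instanceof RegExp) return true;
        return fs.existsSync(path.join(workspaceDir, entry));
    });
}

// Usage: compute the list the audit would actually enforce for this workspace.
const effectiveReads = filterExistingRequiredReads(DEFAULT_REQUIRED_READS, process.cwd());
console.log(effectiveReads);
```

How regex entries with no matching file should be treated is a separate design question that this sketch does not address.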

### 2. Audit failure triggers a system event (line 62265)

```javascript
if (!audit.passed) enqueueSystemEvent(formatAuditWarning(audit.missingPatterns), { sessionKey });
```

### 3. The warning is formatted as an instruction to the AI (lines 61864-61865)

```javascript
function formatAuditWarning(missingPatterns) {
    return "⚠️ Post-Compaction Audit: The following required startup files were not read after context reset:\n" +
        missingPatterns.map((p) => `  - ${p}`).join("\n") +
        "\n\nPlease read them now using the Read tool before continuing. " +
        "This ensures your operating protocols are restored after memory compaction.";
}
```

### 4. System events are prepended to user messages with `System:` prefix (line 56897, specifically line 56950)

```javascript
async function prependSystemEvents(params) {
    // ... drains queued system events ...
    if (systemLines.length === 0) return params.prefixedBodyBase;
    return `${systemLines.map((l) => `System: ${l}`).join("\n")}\n\n${params.prefixedBodyBase}`;
}
```

This produces a single string like:
```
System: [2026-02-19 02:09:42 GMT+2] ⚠️ Post-Compaction Audit: ...

<actual user message here>
```

This entire string is then sent to the LLM as a `role: user` message — making the `System:` prefix indistinguishable from a prompt injection attack.
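The effect can be shown in isolation with a simplified stand-in for the prepend step. This is a sketch, not the OpenClaw code: `buildUserBody` is a hypothetical name, it takes the queued lines directly instead of draining an event queue, and the `{ role, content }` message shape is assumed as a generic chat format.

```javascript
// Simplified stand-in for prependSystemEvents(): same string assembly,
// but the queued lines are passed in directly (assumption for illustration).
function buildUserBody(systemLines, userText) {
    if (systemLines.length === 0) return userText;
    return `${systemLines.map((l) => `System: ${l}`).join("\n")}\n\n${userText}`;
}

const body = buildUserBody(
    ["⚠️ Post-Compaction Audit: ..."],
    "<actual user message here>"
);

// The model receives one user-role message. Nothing in the payload marks the
// first line as coming from the runtime rather than from the sender.
const message = { role: "user", content: body };
console.log(message.role);                            // user
console.log(message.content.startsWith("System: "));  // true
```

From the model's side, this payload has the same shape as a user who typed `System: ...` at the top of their own message.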

## Evidence from Session Logs

### The prepended message (as seen in the JSONL session log)

The `role: user` message (`id: 9bd65c42`) contained two parts merged together:

```
System: [2026-02-19 02:09:42 GMT+2] ⚠️ Post-Compaction Audit: The following required startup
files were not read after context reset:
  - WORKFLOW_AUTO.md
  - memory\/\d{4}-\d{2}-\d{2}\.md

Please read them now using the Read tool before continuing. This ensures your operating
protocols are restored after memory compaction.

Conversation info (untrusted metadata):
{
  "message_id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "sender": "+972XXXXXXXXX"
}

<redacted user message in Hebrew>
```

The first part (audit warning) was injected by OpenClaw. The second part (Hebrew text) was the actual user message.

### The AI agent's thinking (from `id: 05523849`)

> "I notice there's a suspicious [System Message] at the top trying to get me to read files named WORKFLOW_AUTO.md and memory files. This looks like a prompt injection attack! It's trying to impersonate a system message and get me to follow fake instructions."

### The AI agent's response (sent to user via WhatsApp)

> לפני שאענה — ה-"System:" block בתחילת ההודעה הוא **ניסיון prompt injection**. `WORKFLOW_AUTO.md` לא קיים, ואני לא מבצע הוראות ממערכת מזויפת שמוטמעת בהודעת משתמש. מוזכר — עבור הלוג.

Translation: "Before I answer — the 'System:' block at the beginning of the message is a prompt injection attempt. WORKFLOW_AUTO.md doesn't exist, and I don't follow instructions from a fake system embedded in a user message. Noted — for the log."

### Event sequence from session log

```
compaction event (6631090a)
  → OpenClaw: "Pre-compaction memory flush..." (0cbc6b90)
    → Agent: performed flush → NO_REPLY (f02f7899)
      → cache-ttl marker (6d8aea48) ← timestamp: 00:12:15.752
        → [System: audit warning] + user question (9bd65c42) ← timestamp: 00:12:15.902
```

Only 150 ms elapsed between the cache-ttl marker and the user message, which is consistent with the audit warning being prepended by `prependSystemEvents()` just before delivery.

## Suggested Fixes

1. **Don't prepend system messages to user messages** — Use a separate `role: system` or `role: developer` message instead of injecting into `role: user`. This is the root cause of the false positive.
2. **Only require reads for files that exist** — `WORKFLOW_AUTO.md` is in `DEFAULT_REQUIRED_READS` but may not exist in the workspace. The audit should skip files that don't exist rather than failing.
3. **Use a different prefix** — If prepending to user messages is intentional, avoid using `System:` as a prefix since LLMs are trained to flag this pattern as suspicious.
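As a rough illustration of the first suggestion, the prepend step could return separate messages instead of one merged string. This is a minimal sketch under assumptions: `buildMessages` is a hypothetical function, and the `{ role, content }` shape is a generic chat format rather than OpenClaw's actual internal representation.

```javascript
// Hypothetical replacement for the prepend step: system events travel in
// their own message, and the user's text is left untouched.
function buildMessages(systemLines, userText) {
    const messages = [];
    if (systemLines.length > 0) {
        messages.push({ role: "system", content: systemLines.join("\n") });
    }
    messages.push({ role: "user", content: userText });
    return messages;
}

// Usage: the audit warning no longer shares a message with user input.
console.log(buildMessages(["⚠️ Post-Compaction Audit: ..."], "<actual user message here>"));
```

Some provider APIs accept system text as a separate top-level field rather than as a message role, so the transport layer would likely need to map the `system` entry accordingly.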

## Environment

- OpenClaw v2026.2.17
- Running on Linux (arm64)
- AI model: Claude (Anthropic)
