[Regression] v2026.2.14: MEDIA: parser in pi-embedded causes ENOENT spam on tool results #17141

@kbecking

Summary

Version 2026.2.14 introduced a regression that causes constant "Media failed: ENOENT" errors when tool results contain the text "MEDIA:" (e.g., in session transcripts, documentation, or code examples).

Environment

  • OpenClaw version: 2026.2.14
  • Previous working version: 2026.2.12
  • OS: Linux (Ubuntu 24.04)
  • Channel: WhatsApp

Root Cause

New code in dist/pi-embedded-8DITBEle.js scans tool result content for MEDIA: tokens without a startsWith("MEDIA:") guard.

Affected Code (v2026.2.14)

File: dist/pi-embedded-8DITBEle.js (lines ~411-420)

for (const entry of content) {
    if (entry.type === "text" && typeof entry.text === "string") {
        MEDIA_TOKEN_RE.lastIndex = 0;
        let match;
        while ((match = MEDIA_TOKEN_RE.exec(entry.text)) !== null) {
            const p = match[1]?.replace(/^[`"'[{(]+/, "").replace(/[`"'\]})\\,]+$/, "").trim();
            if (p && p.length <= 4096) paths.push(p);
        }
    }
}

The Problem

  • Uses regex /\bMEDIA:\s*?([^\n]+)?/gi which matches MEDIA: anywhere in text (not just line-start)
  • Processes tool results (like memory_search returning session transcripts)
  • Session transcripts contain instructional text with examples: "MEDIA:https://example.com/image.jpg"
  • Regex matches those examples → system attempts to fetch them → ENOENT errors
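The false positives can be reproduced outside OpenClaw with a few lines of JavaScript. This is a minimal sketch: the regex and the two cleanup `replace` calls are copied from the snippet above, while the `scanUnguarded` wrapper and the sample tool result text are made up for illustration.

```javascript
// Regex and cleanup steps copied from the affected code in this issue.
const MEDIA_TOKEN_RE = /\bMEDIA:\s*?([^\n]+)?/gi;

// Hypothetical wrapper around the scanning loop, for illustration only.
function scanUnguarded(text) {
  const paths = [];
  MEDIA_TOKEN_RE.lastIndex = 0;
  let match;
  while ((match = MEDIA_TOKEN_RE.exec(text)) !== null) {
    const p = match[1]
      ?.replace(/^[`"'[{(]+/, "")
      .replace(/[`"'\]})\\,]+$/, "")
      .trim();
    if (p && p.length <= 4096) paths.push(p);
  }
  return paths;
}

// Made-up tool result: one line of documentation and one line of code,
// neither of which is an actual media directive.
const toolResult = [
  'Docs: attach an image with "MEDIA:https://example.com/image.jpg"',
  'if (text.includes("MEDIA:")) return nextPayload;',
].join("\n");

console.log(scanUnguarded(toolResult));
// → ["https://example.com/image.jpg", ")) return nextPayload;"]
```

Both lines only mention MEDIA: mid-line, yet each yields a "path". The second result is the same kind of code fragment that shows up in the ENOENT messages under Symptoms.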

Why v2026.2.12 Doesn't Have This

This code doesn't exist in v2026.2.12:

$ grep -r "MEDIA_TOKEN_RE.lastIndex" v2026.2.12/dist/*.js
(no output)

$ grep -r "MEDIA_TOKEN_RE.lastIndex" v2026.2.14/dist/*.js
v2026.2.14/dist/pi-embedded-8DITBEle.js:            MEDIA_TOKEN_RE.lastIndex = 0;

Comparison

v2026.2.12 (working):

  • Only splitMediaFromOutput parses MEDIA: directives (with startsWith guard) ✅
  • Tool results are not scanned for media tokens
  • No false positives

v2026.2.14 (broken):

  • splitMediaFromOutput still has guard for assistant output ✅
  • NEW: pi-embedded scans tool results WITHOUT guard ❌
  • Result: example URLs in tool output trigger fetch attempts

Reproduction Steps

  1. Use memory_search tool configured with sources: ["memory", "sessions"]
  2. Have session transcripts containing the text MEDIA: (e.g., from previous agent instructions or documentation)
  3. Run any query that returns those transcripts
  4. Observe ENOENT errors for random text fragments

Symptoms

Constant error messages like:

⚠️ Media failed: ENOENT: no such file or directory, open ')) return nextPayload;'
⚠️ Media failed: ENOENT: no such file or directory, open '^>]+>(\s*\([^)]*\))?$/i;'
⚠️ Media failed: ENOENT: no such file or directory, open 'tokens from text content blocks (all OpenClaw tools).'
⚠️ Media failed: Failed to fetch media from https://example.com/image.jpg

Examples of text fragments being treated as file paths:

  • Code snippets: ')) return nextPayload;'
  • Regex patterns: '^>]+>(\s*\([^)]*\))?$/i;'
  • Documentation text: tokens from text content blocks
  • Example URLs from instructional text

Impact

  • High: Error spam disrupts normal operations
  • Tool results containing documentation/examples become unusable
  • Session transcripts trigger the bug every time memory_search is used
  • Users must roll back to v2026.2.12 to avoid the issue

Fix Suggestion

Add a startsWith("MEDIA:") guard to the new code in pi-embedded, similar to how splitMediaFromOutput handles it:

for (const entry of content) {
    if (entry.type === "text" && typeof entry.text === "string") {
        // Split into lines and check each line
        const lines = entry.text.split('\n');
        for (const line of lines) {
            // Guard: only process lines that start with MEDIA:
            if (!line.trimStart().startsWith("MEDIA:")) {
                continue;
            }
            
            MEDIA_TOKEN_RE.lastIndex = 0;
            let match;
            while ((match = MEDIA_TOKEN_RE.exec(line)) !== null) {
                const p = match[1]?.replace(/^[`"'[{(]+/, "").replace(/[`"'\]})\\,]+$/, "").trim();
                if (p && p.length <= 4096) paths.push(p);
            }
        }
    }
}

Alternatively, filter tool results before scanning them, or add metadata to distinguish between agent-generated directives and tool result content.
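A rough sketch of the "filter tool results before scanning" alternative. Everything here is hypothetical: I don't know how pi-embedded identifies the tool that produced a result, so the `toolName` argument, the `MEDIA_CAPABLE_TOOLS` allowlist, and the tool name in it are assumptions, not existing OpenClaw APIs.

```javascript
// Hypothetical allowlist of tools whose output may legitimately carry
// MEDIA: directives. The tool name is a placeholder, not a real OpenClaw tool.
const MEDIA_CAPABLE_TOOLS = new Set(["example_image_tool"]);

// Return only the text blocks that should be scanned for MEDIA: tokens.
// Results from any other tool (e.g. memory_search) are skipped entirely.
function textBlocksToScan(toolName, content) {
  if (!MEDIA_CAPABLE_TOOLS.has(toolName)) return [];
  return content.filter(
    (entry) => entry.type === "text" && typeof entry.text === "string"
  );
}

const transcript = [{ type: "text", text: 'Example: "MEDIA:https://example.com/image.jpg"' }];

console.log(textBlocksToScan("memory_search", transcript).length);      // 0
console.log(textBlocksToScan("example_image_tool", transcript).length); // 1
```

This would stop memory_search transcripts from being scanned at all, but it still needs the startsWith guard for the tools that remain on the list.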

Workaround

Roll back to v2026.2.12:

npm install -g openclaw@2026.2.12 --force
systemctl --user restart openclaw-gateway

Related Code

For reference, the working splitMediaFromOutput function has this guard (line ~1606 in deliver-Dlw-4HTg.js):

if (!line.trimStart().startsWith("MEDIA:")) {
    keptLines.push(line);
    lineOffset += line.length + 1;
    continue;
}

This same pattern should be applied to the new tool result scanning code.
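For anyone picking this up, a regression check along these lines might help. It is only a sketch: `collectMediaPaths` is a hypothetical standalone version of the guarded loop, not the actual pi-embedded function, and the sample content is made up.

```javascript
// Same regex as the affected code in this issue.
const MEDIA_TOKEN_RE = /\bMEDIA:\s*?([^\n]+)?/gi;

// Hypothetical helper: the tool result scan with the startsWith guard applied.
function collectMediaPaths(content) {
  const paths = [];
  for (const entry of content) {
    if (entry.type !== "text" || typeof entry.text !== "string") continue;
    for (const line of entry.text.split("\n")) {
      // Guard: only lines that begin with MEDIA: count as directives.
      if (!line.trimStart().startsWith("MEDIA:")) continue;
      MEDIA_TOKEN_RE.lastIndex = 0;
      let match;
      while ((match = MEDIA_TOKEN_RE.exec(line)) !== null) {
        const p = match[1]
          ?.replace(/^[`"'[{(]+/, "")
          .replace(/[`"'\]})\\,]+$/, "")
          .trim();
        if (p && p.length <= 4096) paths.push(p);
      }
    }
  }
  return paths;
}

// One real directive surrounded by text that merely mentions MEDIA:.
const content = [{
  type: "text",
  text: [
    'Example: "MEDIA:https://example.com/image.jpg"',
    "MEDIA:/tmp/chart.png",
    'if (text.includes("MEDIA:")) return nextPayload;',
  ].join("\n"),
}];

console.log(collectMediaPaths(content)); // → ["/tmp/chart.png"]
```

Only the line that actually starts with MEDIA: produces a path. The quoted example URL and the code fragment are ignored.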
