
[Bug]: Voice message binary leaks into context after transcription #7333

@Diaspar4u


Summary

Voice messages (OGG audio) leak 300KB+ of raw binary into context as <file mime="text/plain"> blocks after successful transcription.

Root Cause

In src/media-understanding/apply.ts:367, audio files were only skipped from file extraction if they failed the text heuristic:

if (!forcedTextMimeResolved && kind === "audio" && !textLike) { continue; }

Why this fails for OGG:

  • looksLikeUtf8Text() samples the first 4KB and returns true if more than 85% of the sampled characters are printable
  • OGG files start with OggS magic bytes (valid ASCII: 0x4F 0x67 0x67 0x53)
  • Compressed audio data often has >85% printable bytes in the sample
  • When textLike is true, the skip is bypassed → binary becomes a file block
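The failure chain above can be illustrated with a simplified sketch. The names looksLikeTextSketch, shouldSkipAudioBuggy and asciiBytes are hypothetical. The sketch assumes the heuristic counts printable ASCII plus tab, LF and CR over the first 4096 bytes, which may not match the project's actual looksLikeUtf8Text(). The input is a synthetic OggS-prefixed buffer, not a real voice message. It only shows that once the heuristic says "text", the pre-fix condition no longer skips the audio file.

```typescript
// Hypothetical sketch: approximates the heuristic described in this issue.
// The project's real looksLikeUtf8Text() may classify bytes differently.
function looksLikeTextSketch(
  data: Uint8Array,
  sampleSize = 4096, // "samples first 4KB"
  threshold = 0.85, // "returns true if >85% printable"
): boolean {
  const sample = data.subarray(0, sampleSize);
  if (sample.length === 0) return false;
  let printable = 0;
  for (let i = 0; i < sample.length; i++) {
    const b = sample[i];
    // Assumption: printable ASCII (0x20-0x7E) plus tab, LF and CR count as text.
    if ((b >= 0x20 && b <= 0x7e) || b === 0x09 || b === 0x0a || b === 0x0d) {
      printable++;
    }
  }
  return printable / sample.length > threshold;
}

// The pre-fix skip condition from apply.ts, extracted into a hypothetical helper.
function shouldSkipAudioBuggy(
  kind: string,
  forcedTextMimeResolved: boolean,
  textLike: boolean,
): boolean {
  return !forcedTextMimeResolved && kind === "audio" && !textLike;
}

// Helper to build a byte buffer from an ASCII string.
function asciiBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0xff;
  return out;
}

// Synthetic buffer: "OggS" magic followed by printable filler. Not a real OGG file.
const fakeOgg = asciiBytes("OggS" + "x".repeat(1000));
const textLike = looksLikeTextSketch(fakeOgg);
// textLike is true, so the pre-fix condition does not skip this audio file,
// and it falls through to file extraction.
console.log(textLike, shouldSkipAudioBuggy("audio", false, textLike)); // true false
```

Whether a particular real OGG sample crosses the threshold depends on its bytes; the point here is only that the skip is conditional on the heuristic.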

Fix

Remove the && !textLike clause so that audio files are always skipped during file extraction:

if (!forcedTextMimeResolved && kind === "audio") { continue; }

Audio should be transcribed or skipped entirely, never included as raw binary.
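A minimal sketch of the fixed behavior, assuming a simplified attachment shape. AttachmentSketch, MediaKind, shouldSkipFileExtraction and selectForFileExtraction are hypothetical names, not the project's API; only the audio rule from the fix is modeled, and the loop mirrors the continue in the extraction code.

```typescript
// Hypothetical types: the real apply.ts works on its own attachment structures.
type MediaKind = "audio" | "image" | "video" | "document";

interface AttachmentSketch {
  name: string;
  kind: MediaKind;
  forcedTextMimeResolved: boolean;
  textLike: boolean;
}

// Fixed skip condition from the issue: textLike no longer matters for audio.
function shouldSkipFileExtraction(a: AttachmentSketch): boolean {
  return !a.forcedTextMimeResolved && a.kind === "audio";
}

// Mirrors the `continue` in the extraction loop: skipped attachments never
// become <file> blocks. Only the audio rule from the issue is modeled here.
function selectForFileExtraction(attachments: AttachmentSketch[]): AttachmentSketch[] {
  const selected: AttachmentSketch[] = [];
  for (const a of attachments) {
    if (shouldSkipFileExtraction(a)) continue;
    selected.push(a);
  }
  return selected;
}

// An OGG voice note that the heuristic misclassified as text is still skipped.
const voice: AttachmentSketch = {
  name: "voice.ogg", kind: "audio", forcedTextMimeResolved: false, textLike: true,
};
const notes: AttachmentSketch = {
  name: "notes.txt", kind: "document", forcedTextMimeResolved: false, textLike: true,
};

console.log(selectForFileExtraction([voice, notes]).map((a) => a.name)); // [ 'notes.txt' ]
```

The forcedTextMimeResolved escape hatch is kept as in the proposed fix, so an attachment with an explicitly forced text MIME type still goes through file extraction.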

Reproduction

  1. Send a voice message via Telegram or WhatsApp
  2. Transcription succeeds (the transcript appears)
  3. The raw OGG binary also appears as a <file mime="text/plain"> block
