Add Unicode variation selector detection to content scanner (Glassworm attack vector) #320

@danielmeppiel

Context

Follow-up from the excellent community feedback in #312 (comment by @raye-deng): our content scanner currently detects tag characters and bidi overrides, but is missing Unicode variation selectors — the specific mechanism used in the Glassworm supply-chain attacks (March 2026) that compromised Wasmer, multiple npm packages, and 72 VS Code extensions.

Variation selectors are particularly insidious because they attach to the preceding visible character: the byte stream carries extra payload bytes, but the rendered text looks unchanged, so humans and most diff viewers overlook them. AST-based tools (ESLint, SonarQube, Semgrep) typically miss them too, because they work on the parsed token stream rather than on the raw code points.
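
To make the "invisible payload" point concrete, here is a minimal sketch that lists variation selectors hiding in a string. The helper names (`is_variation_selector`, `hidden_selectors`) and the sample string are hypothetical and are not part of the existing scanner.

```python
from __future__ import annotations


def is_variation_selector(ch: str) -> bool:
    """True for VS1-16 (U+FE00-FE0F) and VS17-256 (U+E0100-E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def hidden_selectors(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every variation selector in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_variation_selector(ch)
    ]


# Hypothetical sample: a visible "a" followed by VS17 and VS18.
# Most renderers display it exactly like a plain "a".
tainted = "a" + "\U000E0100" + "\U000E0101"

print(len(tainted))                  # code points, not visible glyphs
print(len(tainted.encode("utf-8")))  # each supplementary selector is 4 bytes in UTF-8
print(hidden_selectors(tainted))
```

The string renders as a single letter, yet it carries two extra code points (eight extra UTF-8 bytes) that a visual review would not reveal.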

Changes

Add the following ranges to the content scanner's _SUSPICIOUS_RANGES detection table:

| Range | Name | Severity | Rationale |
|---|---|---|---|
| U+E0100–E01EF | VS17-256 (SSP, Plane 14) | critical | No legitimate use in prompt files. 240 invisible chars that can encode arbitrary data. |
| U+FE00–FE0D | VS1-14 (BMP) | warning | Rare CJK typography variants. Unusual in prompt files. |
| U+FE0E | VS15 (text presentation) | warning | Forces text rendering. Uncommon in prompts. |
| U+FE0F | VS16 (emoji presentation) | info | Extremely common with emoji; only shown with --verbose to avoid noise. |

Key design decisions

  • VS16 (U+FE0F) is info-level: Many emoji sequences include this character (❤️ = ❤ + U+FE0F). Flagging it as warning/critical would generate noise on virtually every prompt file with emoji. Info level means it only appears with --verbose.
  • No architecture changes: Extends the existing _SUSPICIOUS_RANGES table and _CHAR_LOOKUP O(1) dict. No changes needed to the audit command, install security gate, or compile/pack scanning — they all use ContentScanner generically.
  • Strip behavior: apm audit --strip will remove warning/info-level variation selectors (VS1-16) but preserve critical ones (VS17-256) for manual review — consistent with existing strip behavior.
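
The severity and strip decisions above can be sketched as follows. This is an illustration under assumptions, not the actual `apm audit` implementation: `selector_severity`, `strip_selectors`, and `reported` are hypothetical helpers, and the `verbose` parameter stands in for the `--verbose` flag.

```python
from __future__ import annotations


def selector_severity(ch: str) -> str | None:
    """Map a character to the severity proposed in this issue, or None."""
    cp = ord(ch)
    if 0xE0100 <= cp <= 0xE01EF:   # VS17-256
        return "critical"
    if 0xFE00 <= cp <= 0xFE0E:     # VS1-14 and VS15
        return "warning"
    if cp == 0xFE0F:               # VS16 (emoji presentation)
        return "info"
    return None


def strip_selectors(text: str) -> str:
    """Drop warning/info selectors (VS1-16); keep critical ones for manual review."""
    return "".join(
        ch for ch in text
        if selector_severity(ch) not in ("warning", "info")
    )


def reported(text: str, verbose: bool = False) -> list[tuple[int, str]]:
    """Return (index, severity) pairs; info findings only appear when verbose."""
    shown = {"critical", "warning"}
    if verbose:
        shown.add("info")
    findings = []
    for i, ch in enumerate(text):
        severity = selector_severity(ch)
        if severity in shown:
            findings.append((i, severity))
    return findings


# Hypothetical sample: a heart emoji (U+2764 + VS16) plus a hidden VS17.
sample = "\u2764\ufe0f ok\U000e0100"
print(reported(sample))
print(reported(sample, verbose=True))
print(strip_selectors(sample) == "\u2764 ok\U000e0100")
```

In this sketch the VS16 after the heart is hidden from the default report and removed by stripping, while the VS17 is always reported and is left in place.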

Scope

  • Scanner ranges in content_scanner.py
  • Unit tests for detection and strip behavior
  • End-to-end audit command tests with injected fixtures
  • Security documentation update with Glassworm reference
  • CHANGELOG entry

Closes via PR.

cc @raye-deng — thank you for the detailed analysis!
