Context
Follow-up from the excellent community feedback in #312 (comment by @raye-deng): our content scanner currently detects tag characters and bidi overrides, but is missing Unicode variation selectors — the specific mechanism used in the Glassworm supply-chain attacks (March 2026) that compromised Wasmer, multiple npm packages, and 72 VS Code extensions.
Variation selectors are particularly insidious because they attach to visible characters: the byte stream carries invisible payload bytes that humans and most diff viewers ignore. AST-based tools (ESLint, SonarQube, Semgrep) typically do not flag them, because the selectors sit inside string literals or comments that parse as ordinary, valid tokens.
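To make the risk concrete, here is a minimal sketch of how arbitrary bytes can ride along behind a single visible character. The byte-to-selector mapping and the helper names (`hide`, `reveal`) are assumptions chosen for illustration; they are not taken from the Glassworm samples or from this codebase.

```python
def _byte_to_selector(b: int) -> str:
    # Hypothetical mapping: bytes 0-15 -> U+FE00..U+FE0F (VS1-16),
    # bytes 16-255 -> U+E0100..U+E01EF (VS17-256).
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def _selector_to_byte(ch: str):
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None  # not a variation selector


def hide(carrier: str, payload: bytes) -> str:
    """Append one invisible selector per payload byte to a visible carrier."""
    return carrier + "".join(_byte_to_selector(b) for b in payload)


def reveal(text: str) -> bytes:
    """Recover the payload by decoding every variation selector in the text."""
    decoded = (_selector_to_byte(ch) for ch in text)
    return bytes(b for b in decoded if b is not None)


stego = hide("a", b"hi")
print(len(stego))     # 3 code points, although most renderers show only "a"
print(reveal(stego))  # b'hi'
```

Most renderers draw only the carrier character, so the extra code points are easy to miss in review unless a tool inspects the raw text.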
Changes
Add the following ranges to the content scanner's _SUSPICIOUS_RANGES detection table:
| Range | Name | Severity | Rationale |
| --- | --- | --- | --- |
| U+E0100–E01EF | VS17-256 (SSP, Plane 14) | critical | No legitimate use in prompt files. 240 invisible chars that can encode arbitrary data. |
| U+FE00–FE0D | VS1-14 (BMP) | warning | Rare CJK typography variants. Unusual in prompt files. |
| U+FE0E | VS15 (text presentation) | warning | Forces text rendering. Uncommon in prompts. |
| U+FE0F | VS16 (emoji presentation) | info | Extremely common with emoji; shown only with --verbose to avoid noise. |
Key design decisions
- VS16 (U+FE0F) is info-level: Many common emoji include this character (❤️ = ❤ + U+FE0F). Flagging it as warning or critical would generate noise in nearly every prompt file that contains emoji. At info level it appears only with --verbose.
- No architecture changes: Extends the existing _SUSPICIOUS_RANGES table and the _CHAR_LOOKUP O(1) dict. No changes are needed to the audit command, install security gate, or compile/pack scanning, since they all use ContentScanner generically.
- Strip behavior: apm audit --strip will remove warning- and info-level variation selectors (VS1-16) but preserve critical ones (VS17-256) for manual review, consistent with existing strip behavior.
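The strip rule described above can be sketched as below. The function name `strip_variation_selectors` and its return shape are assumptions for illustration only; this is not the actual implementation behind apm audit --strip.

```python
import re

# VS1-16 (warning/info level): removed by the strip pass.
_STRIPPABLE = re.compile("[\uFE00-\uFE0F]")
# VS17-256 (critical level): deliberately left in place for manual review.
_CRITICAL = re.compile("[\U000E0100-\U000E01EF]")


def strip_variation_selectors(text: str):
    """Remove VS1-16 and report how many critical selectors remain."""
    cleaned = _STRIPPABLE.sub("", text)
    remaining_critical = len(_CRITICAL.findall(cleaned))
    return cleaned, remaining_critical


cleaned, remaining = strip_variation_selectors("\u2764\ufe0f ok\U000E0101")
print(repr(cleaned))  # VS16 is gone, the critical selector is still present
print(remaining)      # 1
```

One side effect worth noting: removing VS16 can change how some emoji render (for example, U+2764 may fall back to its text presentation), so a strip pass is not purely cosmetic.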
Scope
content_scanner.py
Closes via PR.
cc @raye-deng — thank you for the detailed analysis!