Bug: MaskingContext.unmask() can corrupt output when placeholder names share a prefix (e.g. <NS_1> and <NS_10>) #639

@kuishou68

Description

MaskingContext.unmask() in app/masking/context.py iterates self._placeholder_map in dict insertion order, with no regard to placeholder length, and calls str.replace on each placeholder.

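When one placeholder is a literal substring of another (for example an undelimited NS_1 inside NS_10), the shorter one may be replaced first, matching inside the longer one and producing corrupted output like originalValue0 instead of the original value for NS_10. Fully delimited tokens such as <NS_1> and <NS_10> do not contain each other, because the closing > has to match as well, so whether this triggers depends on the placeholder format actually emitted.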

Reproduction

from app.masking.context import MaskingContext
from app.masking.policy import MaskingPolicy

policy = MaskingPolicy(enabled=True)
ctx = MaskingContext(policy=policy, placeholder_map={
    "<NS_1>": "host-a",
    "<NS_10>": "host-b",
})

# Simulate unmasking a string that contains <NS_10>
text = "Alert on <NS_10> and <NS_1>"
print(ctx.unmask(text))
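# With the unmask() loop shown under Root Cause, these exact keys print
# 'Alert on host-b and host-a': "<NS_1>" cannot match inside "<NS_10>"
# because the closing ">" is part of the key.
# Keys without a terminating delimiter ("NS_1", "NS_10") on "Alert on NS_10 and NS_1":
# If NS_1 is processed first: 'Alert on host-a0 and host-a'  ← WRONG
# Expected:                   'Alert on host-b and host-a'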

Root Cause

# app/masking/context.py  ~line 95
def unmask(self, text: str) -> str:
    result = text
    for placeholder, original in self._placeholder_map.items():   # ← insertion order, not longest-first
        if placeholder in result:
            result = result.replace(placeholder, original)
    return result

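str.replace treats placeholders as plain substrings, so the loop is only safe when no key is a literal substring of another. An undelimited key such as NS_1 is a substring of NS_10, and since lower-numbered placeholders are presumably inserted first, the shorter key is visited first and matches inside the longer one. The delimited token <NS_1> is not a substring of <NS_10> (the character after the 1 is 0, not >), so keys of exactly that form do not collide through this path.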
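
The substring behaviour can be checked in isolation, without MaskingContext. The sketch below is a minimal stand-in for the loop, not the project's code: `naive_unmask` is a hypothetical helper, and the undelimited keys `NS_1` / `NS_10` are an assumption used only to show the case where one key really is a substring of another.

```python
def naive_unmask(text: str, placeholder_map: dict[str, str]) -> str:
    """Hypothetical stand-in for the loop: replace keys in dict insertion order."""
    result = text
    for placeholder, original in placeholder_map.items():
        result = result.replace(placeholder, original)
    return result


# Undelimited keys: "NS_1" is a literal substring of "NS_10", so it matches inside it.
bare = naive_unmask("Alert on NS_10 and NS_1", {"NS_1": "host-a", "NS_10": "host-b"})
print(bare)  # Alert on host-a0 and host-a

# Delimited keys: "<NS_1>" does not occur inside "<NS_10>" ('0' follows the '1', not '>').
delimited = naive_unmask(
    "Alert on <NS_10> and <NS_1>", {"<NS_1>": "host-a", "<NS_10>": "host-b"}
)
print(delimited)  # Alert on host-b and host-a
```

The first call shows the corruption; the second shows that a closing delimiter in the key is enough to prevent it for this pair.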

Fix

Sort placeholders by descending length before iterating so longer (more-specific) placeholders are always substituted first:

def unmask(self, text: str) -> str:
    if not text or not self._placeholder_map:
        return text
    result = text
    for placeholder, original in sorted(
        self._placeholder_map.items(), key=lambda kv: len(kv[0]), reverse=True
    ):
        if placeholder in result:
            result = result.replace(placeholder, original)
    return result

This is an O(n log n) sort on the number of placeholders (typically small), so the added runtime cost should be negligible.
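
As a rough regression check, the longest-first ordering can be exercised as a free function. This is a sketch under assumptions: `unmask_longest_first` is a hypothetical name, and the keys are deliberately undelimited (`NS_1` .. `NS_12`), since that is the format in which one key is a literal substring of another.

```python
def unmask_longest_first(text: str, placeholder_map: dict[str, str]) -> str:
    """Hypothetical free-function version of the proposed fix."""
    if not text or not placeholder_map:
        return text
    result = text
    # Longest keys first, so a key is never consumed by a shorter key it contains.
    for placeholder, original in sorted(
        placeholder_map.items(), key=lambda kv: len(kv[0]), reverse=True
    ):
        result = result.replace(placeholder, original)
    return result


# NS_1 .. NS_12 in ascending insertion order, the order that breaks the naive loop.
mapping = {f"NS_{i}": f"host-{i}" for i in range(1, 13)}
restored = unmask_longest_first("Alert on NS_10, NS_1 and NS_12", mapping)
print(restored)  # Alert on host-10, host-1 and host-12
```

Sorting removes the dependence on insertion order, but the replacements are still applied one after another. If an original value could itself contain placeholder-like text, a later iteration might rewrite it, and a single-pass substitution would be the safer design in that case.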

Severity

P0, directly related to #479. If the placeholder format allows one key to be a literal substring of another, any investigation with ≥10 masked identifiers of the same kind can silently produce corrupted de-anonymised output that reaches the user or downstream systems.
