Description
MaskingContext.unmask() in app/masking/context.py iterates self._placeholder_map in dict insertion order, which takes no account of key length, and calls str.replace on each placeholder.
When placeholders share a name prefix, such as the NS_1 and NS_10 stems of <NS_1> and <NS_10>, a shorter key that is a literal substring of a longer one may be replaced first. It then matches inside the longer placeholder and produces corrupted output like originalValue0> instead of the longer placeholder's original value.
Reproduction
from app.masking.context import MaskingContext
from app.masking.policy import MaskingPolicy
policy = MaskingPolicy(enabled=True)
ctx = MaskingContext(policy=policy, placeholder_map={
    "<NS_1>": "host-a",
    "<NS_10>": "host-b",
})
# Simulate unmasking a string that contains <NS_10>
text = "Alert on <NS_10> and <NS_1>"
print(ctx.unmask(text))
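# Expected: 'Alert on host-b and host-a'
# With these exact keys str.replace prints the expected string in either order:
# "<NS_1>" ends in ">" and so does not occur inside "<NS_10>".
# The 'host-a0>' corruption needs a shorter key without its closing delimiter
# (e.g. "<NS_1") that is replaced before "<NS_10>".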
Root Cause
# app/masking/context.py ~line 95
def unmask(self, text: str) -> str:
    result = text
    # dict iteration follows insertion order, not key length
    for placeholder, original in self._placeholder_map.items():
        if placeholder in result:
            result = result.replace(placeholder, original)
    return result
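str.replace treats placeholders as plain substrings, and dict iteration follows insertion order regardless of key length. When a shorter key is a literal substring of a longer one and is visited first, it matches inside the longer placeholder and consumes part of it. The match is literal, so the closing > counts: <NS_1> itself does not occur inside <NS_10>, and the collision arises when the shorter key lacks its terminating delimiter (NS_1 against NS_10, for example).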
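Whether a given placeholder map is exposed can be checked directly with Python's `in` operator. The sketch below is illustrative only: `shadowed_pairs` is a hypothetical helper, not part of app/masking, and the bare `NS_1` / `NS_10` keys are an assumed delimiter-free format used to show a positive case.

```python
from itertools import permutations


def shadowed_pairs(placeholder_map: dict[str, str]) -> list[tuple[str, str]]:
    """Return (shorter, longer) key pairs where shorter is a substring of longer.

    Hypothetical diagnostic helper; not part of the project's API.
    """
    keys = list(placeholder_map)
    return [
        (short, long)
        for short, long in permutations(keys, 2)
        if len(short) < len(long) and short in long
    ]


# Keys with a closing delimiter: "<NS_1>" does not occur inside "<NS_10>".
bracketed = {"<NS_1>": "host-a", "<NS_10>": "host-b"}
# Assumed delimiter-free keys: "NS_1" occurs inside "NS_10".
bare = {"NS_1": "host-a", "NS_10": "host-b"}

print(shadowed_pairs(bracketed))  # []
print(shadowed_pairs(bare))       # [('NS_1', 'NS_10')]
```

A non-empty result means insertion-order replacement can corrupt the longer key of each listed pair.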
Fix
Sort placeholders by descending length before iterating so longer (more-specific) placeholders are always substituted first:
def unmask(self, text: str) -> str:
    if not text or not self._placeholder_map:
        return text
    result = text
    for placeholder, original in sorted(
        self._placeholder_map.items(), key=lambda kv: len(kv[0]), reverse=True
    ):
        if placeholder in result:
            result = result.replace(placeholder, original)
    return result
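This is an O(n log n) sort over the placeholders on every call; the map is typically small, so the runtime cost should be negligible. Ordering among keys of equal length does not matter, because two distinct strings of the same length cannot contain one another.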
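The effect of the ordering can be seen in isolation with two free functions. This is a minimal sketch, not the project's code: `unmask_naive` and `unmask_longest_first` are hypothetical stand-ins for the method before and after the change, and the delimiter-free keys `NS_1` / `NS_10` are an assumption chosen so that the shorter key really is a substring of the longer one.

```python
def unmask_naive(text: str, placeholder_map: dict[str, str]) -> str:
    """Replace placeholders in dict insertion order (the reported behaviour)."""
    for placeholder, original in placeholder_map.items():
        text = text.replace(placeholder, original)
    return text


def unmask_longest_first(text: str, placeholder_map: dict[str, str]) -> str:
    """Replace placeholders longest key first, as in the proposed fix."""
    ordered = sorted(placeholder_map.items(), key=lambda kv: len(kv[0]), reverse=True)
    for placeholder, original in ordered:
        text = text.replace(placeholder, original)
    return text


# Assumed delimiter-free keys, chosen so the shorter key is a true substring.
placeholder_map = {"NS_1": "host-a", "NS_10": "host-b"}
text = "Alert on NS_10 and NS_1"

print(unmask_naive(text, placeholder_map))          # Alert on host-a0 and host-a
print(unmask_longest_first(text, placeholder_map))  # Alert on host-b and host-a
```

Both helpers omit the `if placeholder in result` guard, since str.replace already returns the string unchanged when there is no match.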
Severity
P0, directly related to #479. Any investigation with ≥10 masked identifiers of the same kind can silently produce corrupted de-anonymised output that reaches the user or downstream systems whenever one placeholder is a literal substring of another.