Fixed #463 PII can be leaked if nested query string keys are out of order#663
Conversation
brianr
left a comment
There was a problem hiding this comment.
@danielmorell This part I'm not understanding:
As part of this change we also no longer encode all query strings that we send in the payload in the RFC3986 percent encoding sytanx. Instead we send the query string with the more human readable non-encoded characters.
Doesn't that mean that other characters that previously would have been encoded (and need to be encoded), will no longer be encoded?
Codex's review noted something similar:
- [P0] Keep percent-encoding when rebuilding query — src/Scrubber.php:233-236
Calling urldecode() here means any percent-encoded characters (for example %26 representing &, %2B
representing +, etc.) are decoded before we hand the string back. Because we re-insert the string
without re-encoding, a query like token=abc%26def now becomes token=abc&def, which splits the value
into a brand-new parameter and permanently corrupts the payload we send. This regresses behaviour
for every query string that contains encoded separators, so we need to drop the urldecode() (and
accompanying str_replace) and keep the percent-encoding intact.
|
This one is a tough call. The reason is We could add a check to see if the original string has any percent encoding. If it does we would then return the percent encoded version. Otherwise we would return the decoded version. Some examples of what each of these would produce... Thoughts? |
|
We now check the original string to see if it contains any valid percent encoded hexidcimal characters. If, it does we will output a string with all characters that should be percent encoded in a URL percent encoded. However, if the string does not contain any percent encoded characters. Even if we scrub the string as query string we will ensure we don't percent encode any characters that could be percent encoded. This helps keep scrubbed content as close as possible to the same format we recieved it in. |
Description of the change
This PR resolves an issue where out of order nested query string values could fail to be properly scrubbed.
The reason was we were parsing then rebuilding the query string to see if they matched. If they did we would decode and scrub the string as if it was a query string.
However, this caused issues because the nested keys would be order together, when the encoded string did not strictly require that. This solves that problem by partially decoding the string and comparing that with the decoded and rebuilt query string with a sorted key order.
As part of this change we also no longer encode all query strings that we send in the payload in the RFC3986 percent encoding sytanx. Instead we send the query string with the more human readable non-encoded characters.
Type of change
Related issues
Checklists
Development
Code review