Fix parsing the Forwarded header#2173
Fix parsing the Forwarded header#2173asvetlov merged 1 commit intoaio-libs:masterfrom vfaronov:master
Conversation
This fixes #2170 by parsing Forwarded more carefully, while staying about as fast, simple, and robust in the face of potential injection attacks. (Speed was measured with IPython's %timeit on my laptop on a few typical and pathological header values.) In particular: - commas and semicolons are allowed inside quoted-strings; - empty forwarded-pairs (as in "for=_1;;by=_2") are allowed; - non-standard parameters are allowed (although this alone could be easily done in the previous parser). This still doesn't parse valid headers containing obs-text, which was an intentional decision in the previous parser (see comments) that I did not change. Also, the previous parser used to bail out of forwarded-elements containing duplicate parameter names. No rationale was given in the code, and I don't think this is important, so the new parser doesn't enforce this.
| r'\1', value[1:-1]) | ||
| fvparams[param.lower()] = value | ||
| elems = [] | ||
| for field_value in self._message.headers.getall(hdrs.FORWARDED, ()): |
There was a problem hiding this comment.
@fafhrd91 @asvetlov
It is easily possible to build FSM which will parse this structure using regex module. Unfortunatelly built-in regexps are not able to capture all repeating matches.
Pros:
- Less code
- Less error-prone
- Faster
- Ability to parse other structures using higly-efficient FSMs
Cons:
- Dependency on regex module
- More difficult to understand (unsure, maybe more simplier)
- Non-pythonic way.
(Anyway, maybe merge as shown here, but re-write later, I can do that)
There was a problem hiding this comment.
@socketpair please merge the PR first than feel free to create a new one for regexps based approach.
There was a problem hiding this comment.
@asvetlov how can I sure that other members agree with my decision to merge ? (sorry, I'm asking BEFORE making something awful)
There was a problem hiding this comment.
@socketpair I’m not sure an FSM can handle my test_single_forwarded_header_injection1 case, which necessarily involves a form of backtracking. And I think this case is important to handle, at least until deployment experience shows that all reverse proxies take precautions against that.
There was a problem hiding this comment.
@socketpair the trick borrowed from CPython works: declare intention to merge (LGTM or github pull request approval), wait a day and do merge tomorrow if nobody objects.
| r'\1', value[1:-1]) | ||
| fvparams[param.lower()] = value | ||
| elems = [] | ||
| for field_value in self._message.headers.getall(hdrs.FORWARDED, ()): |
There was a problem hiding this comment.
@socketpair please merge the PR first than feel free to create a new one for regexps based approach.
Codecov Report
@@ Coverage Diff @@
## master #2173 +/- ##
==========================================
+ Coverage 97.11% 97.14% +0.02%
==========================================
Files 39 39
Lines 7878 7889 +11
Branches 1366 1367 +1
==========================================
+ Hits 7651 7664 +13
+ Misses 101 100 -1
+ Partials 126 125 -1
Continue to review full report at Codecov.
|
|
Thanks |
What do these changes do?
Parse the
Forwardedheader more carefully, while staying about as fast, simple, and robust in the face of potential injection attacks. (Speed was measured with IPython’s%timeiton my laptop on a few typical and pathological header values.)Are there changes in behavior for the user?
Most valid
Forwardedheaders that used to be parsed incorrectly by aiohttp are now parsed correctly. In particular:for=_1;;by=_2) are allowed;Valid headers containing obs-text are still not parsed correctly, as this was an intentional decision in the previous parser (see comments) that I did not change.
Also, the previous parser used to bail out of (invalid) forwarded-elements containing duplicate parameter names. No rationale was given in the code, and I don’t think this is important, so the new parser doesn't enforce this.
Related issue number
#2170
Checklist
CONTRIBUTORS.txtchangesfolder<issue_id>.<type>for example (588.bug)issue_idchange it to the pr id after creating the pr.feature: Signifying a new feature..bugfix: Signifying a bug fix..doc: Signifying a documentation improvement..removal: Signifying a deprecation or removal of public API..misc: A ticket has been closed, but it is not of interest to users.