Make METHRE RFC-7230 compliant#3235
Make METHRE RFC-7230 compliant#3235asvetlov merged 1 commit intoaio-libs:masterfrom kxepal:fix-methodre
Conversation
|
After review of #3233 I found this inconsistentcy between our regex and the spec. If fix is ok, will finish the rest bits. |
|
@kxepal could you look on failed tests? |
|
Oh, I didn't run full test suite, just test_http_parser. Now should be fixed. |
Codecov Report
@@ Coverage Diff @@
## master #3235 +/- ##
=======================================
Coverage 98.07% 98.07%
=======================================
Files 43 43
Lines 7856 7856
Branches 1354 1354
=======================================
Hits 7705 7705
Misses 59 59
Partials 92 92
Continue to review full report at Codecov.
|
aiohttp/http_parser.py
Outdated
|
|
||
| ASCIISET = set(string.printable) | ||
| METHRE = re.compile('[A-Z0-9$-_.]+') | ||
| METHRE = re.compile("[!#$%&'*+\-.^_`|~0-9A-Za-z]+") |
There was a problem hiding this comment.
I think It might make sense to put a comment with that rule somewhere nearby:
# method = token
# tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
# "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
# token = 1*tchar
There was a problem hiding this comment.
There is git blame with that information, but having a copy in source code wouldn't hurt anyone. Added that one.
There was a problem hiding this comment.
Yeah, I know about blame. But it's visually more convenient to see it next to the code. Like it's done in yarl, for example.
aiohttp/http_parser.py
Outdated
| # tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / | ||
| # "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA | ||
| # token = 1*tchar | ||
| METHRE = re.compile("[!#$%&'*+\-.^_`|~0-9A-Za-z]+") |
There was a problem hiding this comment.
I'd use a raw-string literal r''
There was a problem hiding this comment.
Also, + and * might also need escaping and | needs escaping (otherwise it divides groups in [])
There was a problem hiding this comment.
Oh, and . might stand for any single character.
There was a problem hiding this comment.
Probably they have restricted meanings within [], but IMHO better safe than sorry.
There was a problem hiding this comment.
All true, but not for [] sets.
>>> import re
>>> re.match('[.]', 't')
>>> re.match('[.]', 't') is None
True
There was a problem hiding this comment.
digits can be replaced with \d (to match style below)
There was a problem hiding this comment.
No, they cannot. \d means all the unicode numbers, not just ascii ones:
>>> from hypothesis.strategies import from_regex
>>> thing = from_regex('^\d$').example()
>>> re.match('\d', thing)
<_sre.SRE_Match object; span=(0, 1), match='𝟙'>
>>> thing
'𝟙'
>>> unicodedata.category(thing)
'Nd'
>>> unicodedata.name(thing)
'MATHEMATICAL DOUBLE-STRUCK DIGIT ONE'
| def test_http_request_parser_bad_method(parser): | ||
| with pytest.raises(http_exceptions.BadStatusLine): | ||
| parser.feed_data(b'!12%()+=~$ /get HTTP/1.1\r\n\r\n') | ||
| parser.feed_data(b'=":<G>(e),[T];?" /get HTTP/1.1\r\n\r\n') |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Why? These characters are valid.
There was a problem hiding this comment.
Oh, it's a negative test. This reads confusing. Let's change it to parametrized test, smth like:
@pytest.mark.parametrize(
'invalid_byte',
list(b'=":<>(),[];?"'),
)
@pytest.raises(http_exceptions.BadStatusLine)
def test_http_request_parser_bad_method(parser, invalid_byte):
valid_method = b'Get'
invalid_method = valid_method + invalid_byte
invalid_request_line = invalid_method + b' /get HTTP/1.1\r\n\r\n'
parser.feed_data(invalid_request_line)There was a problem hiding this comment.
with pytest.raises at least.
Personally I dont insist on pytest parametrization
Definition of method is:
```
method = token
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
"^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
token = 1*tchar
```
So we had two issues:
1. Not all the characters were allowed.
2. Actually, we did allowed too much characters since `$-_` parsed as:
"all characters in range between code 36 ($) and code 95 (_)" instead of
"characters with codes 36, 45 and 95". So we did match methods like
`[GET]` which are malformed according the spec.
|
@kxepal please merge when ready |
|
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs. |
Definition of method is:
So he had two issues:
$-_parsed as:"all characters in range between code 36 ($) and code 95 (_)" instead of
"characters with codes 36, 45 and 95". So we did match methods like
[GET]which are malformed according the spec.