Skip to content

Make METHRE RFC-7230 compliant#3235

Merged
asvetlov merged 1 commit intoaio-libs:masterfrom
kxepal:fix-methodre
Sep 8, 2018
Merged

Make METHRE RFC-7230 compliant#3235
asvetlov merged 1 commit intoaio-libs:masterfrom
kxepal:fix-methodre

Conversation

@kxepal
Copy link
Copy Markdown
Member

@kxepal kxepal commented Sep 2, 2018

Definition of method is:

    method = token
    tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
            "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
    token = 1*tchar

So he had two issues:

  1. Not all the characters were allowed.
  2. Actually, we did allowed too much characters since $-_ parsed as:
    "all characters in range between code 36 ($) and code 95 (_)" instead of
    "characters with codes 36, 45 and 95". So we did match methods like
    [GET] which are malformed according the spec.

@kxepal kxepal requested a review from asvetlov September 2, 2018 04:54
@kxepal
Copy link
Copy Markdown
Member Author

kxepal commented Sep 2, 2018

After review of #3233 I found this inconsistentcy between our regex and the spec. If fix is ok, will finish the rest bits.

@asvetlov
Copy link
Copy Markdown
Member

asvetlov commented Sep 2, 2018

@kxepal could you look on failed tests?

@kxepal
Copy link
Copy Markdown
Member Author

kxepal commented Sep 2, 2018

Oh, I didn't run full test suite, just test_http_parser. Now should be fixed.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Sep 2, 2018

Codecov Report

Merging #3235 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3235   +/-   ##
=======================================
  Coverage   98.07%   98.07%           
=======================================
  Files          43       43           
  Lines        7856     7856           
  Branches     1354     1354           
=======================================
  Hits         7705     7705           
  Misses         59       59           
  Partials       92       92
Impacted Files Coverage Δ
aiohttp/http_parser.py 98.06% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b71302...7351a1f. Read the comment docs.


ASCIISET = set(string.printable)
METHRE = re.compile('[A-Z0-9$-_.]+')
METHRE = re.compile("[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think It might make sense to put a comment with that rule somewhere nearby:

#     method = token
#     tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
#             "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
#     token = 1*tchar

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is git blame with that information, but having a copy in source code wouldn't hurt anyone. Added that one.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know about blame. But it's visually more convenient to see it next to the code. Like it's done in yarl, for example.

# tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
# "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
# token = 1*tchar
METHRE = re.compile("[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use a raw-string literal r''

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, + and * might also need escaping and | needs escaping (otherwise it divides groups in [])

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and . might stand for any single character.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably they have restricted meanings within [], but IMHO better safe than sorry.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All true, but not for [] sets.

>>> import re
>>> re.match('[.]', 't')
>>> re.match('[.]', 't') is None
True

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

digits can be replaced with \d (to match style below)

Copy link
Copy Markdown
Member Author

@kxepal kxepal Sep 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they cannot. \d means all the unicode numbers, not just ascii ones:

>>> from hypothesis.strategies import from_regex
>>> thing = from_regex('^\d$').example()
>>> re.match('\d', thing)
<_sre.SRE_Match object; span=(0, 1), match='𝟙'>
>>> thing
'𝟙'
>>> unicodedata.category(thing)
'Nd'
>>> unicodedata.name(thing)
'MATHEMATICAL DOUBLE-STRUCK DIGIT ONE'

def test_http_request_parser_bad_method(parser):
with pytest.raises(http_exceptions.BadStatusLine):
parser.feed_data(b'!12%()+=~$ /get HTTP/1.1\r\n\r\n')
parser.feed_data(b'=":<G>(e),[T];?" /get HTTP/1.1\r\n\r\n')

This comment was marked as outdated.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? These characters are valid.

Copy link
Copy Markdown
Member

@webknjaz webknjaz Sep 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it's a negative test. This reads confusing. Let's change it to parametrized test, smth like:

@pytest.mark.parametrize(
    'invalid_byte',
    list(b'=":<>(),[];?"'),
)
@pytest.raises(http_exceptions.BadStatusLine)
def test_http_request_parser_bad_method(parser, invalid_byte):
    valid_method = b'Get'
    invalid_method = valid_method + invalid_byte
    invalid_request_line = invalid_method + b' /get HTTP/1.1\r\n\r\n'
    parser.feed_data(invalid_request_line)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with pytest.raises at least.
Personally I dont insist on pytest parametrization

Definition of method is:
```
    method = token
    tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
            "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
    token = 1*tchar
```

So we had two issues:
1. Not all the characters were allowed.
2. Actually, we did allowed too much characters since `$-_` parsed as:
"all characters in range between code 36 ($) and code 95 (_)" instead of
"characters with codes 36, 45 and 95". So we did match methods like
`[GET]` which are malformed according the spec.
@asvetlov
Copy link
Copy Markdown
Member

asvetlov commented Sep 2, 2018

@kxepal please merge when ready

@asvetlov asvetlov merged commit de4daf3 into aio-libs:master Sep 8, 2018
@lock
Copy link
Copy Markdown

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

bot:chronographer:provided There is a change note present in this PR outdated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants