-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
Manually parse cookies #4780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Manually parse cookies #4780
Conversation
Iterate over split cookie chunks to skip invalid values instead of failing fast like `SimpleCookie.load()` does
|
Relevant CPython issues: python/cpython#71861, python/cpython#92936, python/cpython#86111 |
Follow the CPython http.cookies implementation, adding the `bad` group to catch the remaining invalid values. Skips invalid values instead of returning immediately.
dirkf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like what I had in mind, thanks. But the proof of the coding is in the testing!
Regarding RFC6265, it's possible that, as with the previous cookie RFCs, the real world doesn't match. I haven't studied the changes in the spec.
|
All right, I am composing a testing suite for the cookie parsing, derived from the CPython tests. |
- Use re.search instead of `re.match` to skip invalid parts - Continue when encountering bad attributes - Decode attribute value - Reset morsel after invalid cookie
dirkf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supposing that the tests all pass this is nice.
Perhaps consider casting it as a separate class, which might eventually get moved out of extractor/common.py? According to the docs and CPython 3.10 source, s/t like this ought to work:
class ImprovedSimpleCookie(http.cookies.SimpleCookie):
# your code from l.3639-3678, re-indented as necessary
def load(self, data):
if not isinstance(data, str):
super().load(data)
return
# your code from l.3681-3724
...
def _get_cookies(self, url):
""" Return a SimpleCookie with the cookies for the url """
return ImprovedSimpleCookie(self._downloader._calc_cookies(url))
Although it isn't used in yt-dl/dlp (I think), the not-str case is a dict of (cookie_name, value) which the original Cookie/http.cookies code (2.6+) should handle.
|
Extracting the class immediately seems to make more sense, potentially placing it into |
|
Yes, that file isn't in yt-dl but looks plausible: take advice from @pukkandan. |
That makes sense |
Extract the `InfoExtractor._make_simple_cookie()` function into the `cookies.LenientSimpleCookie` subclass and move tests accordingly
|
Late now, but |
|
It is not a defect but a design choice as the comment in the code suggests, and lenient describes the behavior of the cookie accurately. |
IMPORTANT: PRs without the template will be CLOSED
Description of your pull request and other information
Iterate over split cookie chunks to skip invalid values instead of failing fast like
http.cookies.SimpleCookiedoes. This allows to preserve all cookie values, even invalid ones.Fixes #4776
Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?