Skip to content

Conversation

@Grub4K
Copy link
Member

@Grub4K Grub4K commented Aug 27, 2022

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Iterate over split cookie chunks to skip invalid values instead of failing fast like http.cookies.SimpleCookie does. This allows to preserve all cookie values, even invalid ones.

Fixes #4776

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Iterate over split cookie chunks to skip invalid values instead of failing fast like `SimpleCookie.load()` does
@coletdjnz coletdjnz self-requested a review August 29, 2022 21:36
@coletdjnz
Copy link
Member

coletdjnz commented Aug 30, 2022

Relevant CPython issues: python/cpython#71861, python/cpython#92936, python/cpython#86111

Grub4K added 2 commits August 30, 2022 12:41
Follow the CPython http.cookies implementation, adding the `bad` group to catch the remaining invalid values. Skips invalid values instead of returning immediately.
@Grub4K Grub4K requested review from dirkf and removed request for coletdjnz August 30, 2022 14:56
@pukkandan pukkandan linked an issue Aug 30, 2022 that may be closed by this pull request
7 tasks
@pukkandan pukkandan added enhancement New feature or request bug Bug that is not site-specific and removed enhancement New feature or request labels Aug 30, 2022
Copy link
Contributor

@dirkf dirkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like what I had in mind, thanks. But the proof of the coding is in the testing!

Regarding RFC6265, it's possible that, as with the previous cookie RFCs, the real world doesn't match. I haven't studied the changes in the spec.

@Grub4K
Copy link
Member Author

Grub4K commented Sep 2, 2022

All right, I am composing a testing suite for the cookie parsing, derived from the CPython tests.
Will push the tests to test/test_InfoExtractor.py when they are done.

- Use re.search instead of `re.match` to skip invalid parts
- Continue when encountering bad attributes
- Decode attribute value
- Reset morsel after invalid cookie
@Grub4K Grub4K requested a review from dirkf September 9, 2022 16:11
Copy link
Contributor

@dirkf dirkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supposing that the tests all pass this is nice.

Perhaps consider casting it as a separate class, which might eventually get moved out of extractor/common.py? According to the docs and CPython 3.10 source, s/t like this ought to work:

class ImprovedSimpleCookie(http.cookies.SimpleCookie):
    # your code from l.3639-3678, re-indented as necessary
    def load(self, data):
        if not isinstance(data, str):
            super().load(data)
            return
        # your code from l.3681-3724
...
    def _get_cookies(self, url):
        """ Return a SimpleCookie with the cookies for the url """
        return ImprovedSimpleCookie(self._downloader._calc_cookies(url))

Although it isn't used in yt-dl/dlp (I think), the not-str case is a dict of (cookie_name, value) which the original Cookie/http.cookies code (2.6+) should handle.

@Grub4K
Copy link
Member Author

Grub4K commented Sep 12, 2022

Extracting the class immediately seems to make more sense, potentially placing it into cookies.py (with the tests moved to test_cookies.py)? Or should the class reside within common.py for now?

@dirkf
Copy link
Contributor

dirkf commented Sep 12, 2022

Yes, that file isn't in yt-dl but looks plausible: take advice from @pukkandan.

@pukkandan
Copy link
Member

Extracting the class immediately seems to make more sense, potentially placing it into cookies.py (with the tests moved to test_cookies.py)?

That makes sense

Extract the `InfoExtractor._make_simple_cookie()` function into the `cookies.LenientSimpleCookie` subclass and move tests accordingly
@Grub4K Grub4K requested a review from pukkandan September 12, 2022 23:51
@pukkandan pukkandan merged commit 8817a80 into yt-dlp:master Sep 16, 2022
@Grub4K Grub4K deleted the fix-get-cookies branch September 16, 2022 17:05
@dirkf
Copy link
Contributor

dirkf commented Sep 16, 2022

Late now, but Lenient doesn't properly describe what is being done, ie fixing a defect in the parsing implemented in SimpleCookie.load(). Any of Fixed, Better, Improved, Correct would be more accurate ...

@Grub4K
Copy link
Member Author

Grub4K commented Sep 16, 2022

It is not a defect but a design choice as the comment in the code suggests, and lenient describes the behavior of the cookie accurately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Bug that is not site-specific

Projects

None yet

Development

Successfully merging this pull request may close these issues.

InfoExtractor._get_cookies fails if values contain quotes [crunchyroll] Endless loop when trying to download an episode

4 participants