Manually parse cookies #4780

Grub4K · 2022-08-27T16:53:12Z

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Iterate over the split cookie chunks and skip invalid values instead of failing fast as `http.cookies.SimpleCookie` does. This makes it possible to preserve all cookie values, even invalid ones.
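As a rough illustration of the approach described above (not the code from this PR), the sketch below splits a cookie header on `;` and skips any chunk that does not look like a valid `key=value` pair. The function name `parse_cookie_header` and the `_VALID_KEY` rule are hypothetical and deliberately simpler than CPython's own character sets.

```python
import re

# Assumed, simplified rule for a valid cookie name (not CPython's exact set).
_VALID_KEY = re.compile(r"[\w!#$%&'*+\-.^`|~]+", re.ASCII)


def parse_cookie_header(header):
    """Return a dict of the valid cookies in `header`, skipping invalid chunks."""
    cookies = {}
    for chunk in header.split(';'):
        key, sep, value = chunk.strip().partition('=')
        key = key.strip()
        if not sep or not _VALID_KEY.fullmatch(key):
            # Skip this chunk only; later chunks are still parsed.
            continue
        cookies[key] = value.strip()
    return cookies


print(parse_cookie_header('a=1; b d=2; =x; c=3'))  # {'a': '1', 'c': '3'}
```

The point of the sketch is that one malformed chunk (`b d=2`, `=x`) no longer prevents the cookies after it from being read.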

Fixes #4776

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Iterate over split cookie chunks to skip invalid values instead of failing fast like `SimpleCookie.load()` does

yt_dlp/extractor/common.py

coletdjnz · 2022-08-30T06:42:48Z

Relevant CPython issues: python/cpython#71861, python/cpython#92936, python/cpython#86111

Follow the CPython http.cookies implementation, adding the `bad` group to catch the remaining invalid values. Skips invalid values instead of returning immediately.
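The `bad` group idea can be sketched as follows. This is a simplified, assumed pattern, not the one used in CPython or in this PR: the first alternative matches a well-formed pair, and the `bad` alternative swallows everything up to the next `;` so the loop can continue instead of returning. The names `_COOKIE` and `parse_with_bad_group` are hypothetical.

```python
import re

# Simplified pattern: either a well-formed `key=value` pair, or a `bad` run
# that consumes everything up to the next separator.
_COOKIE = re.compile(r"""
    \s*
    (?:
        (?P<key>[\w!#$%&'*+\-.^`|~]+)   # assumed set of legal key characters
        \s*=\s*
        (?P<val>[^;\s]*)                # assumed: value ends at ';' or whitespace
        \s*(?:;|$)
    |
        (?P<bad>[^;]*)(?:;|$)           # anything else up to the next ';'
    )
    """, re.ASCII | re.VERBOSE)


def parse_with_bad_group(header):
    """Return (cookies, skipped) where skipped lists the invalid chunks."""
    cookies, skipped = {}, []
    pos = 0
    while pos < len(header):
        # The `bad` alternative guarantees a match that consumes input.
        match = _COOKIE.match(header, pos)
        pos = match.end()
        if match.group('bad') is not None:
            skipped.append(match.group('bad').strip())
            continue
        cookies[match.group('key')] = match.group('val')
    return cookies, skipped


print(parse_with_bad_group('a=1; b d=2; c=3'))
```

Because the fallback alternative always matches, an invalid chunk is recorded and skipped rather than ending the parse.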

dirkf

That looks like what I had in mind, thanks. But the proof of the coding is in the testing!

Regarding RFC6265, it's possible that, as with the previous cookie RFCs, the real world doesn't match. I haven't studied the changes in the spec.

Grub4K · 2022-09-02T16:31:12Z

All right, I am composing a testing suite for the cookie parsing, derived from the CPython tests.
Will push the tests to test/test_InfoExtractor.py when they are done.

- Use `re.search` instead of `re.match` to skip invalid parts
- Continue when encountering bad attributes
- Decode attribute value
- Reset morsel after invalid cookie

dirkf

Supposing that the tests all pass this is nice.

Perhaps consider casting it as a separate class, which might eventually get moved out of extractor/common.py? According to the docs and the CPython 3.10 source, something like this ought to work:

```python
class ImprovedSimpleCookie(http.cookies.SimpleCookie):
    # your code from l.3639-3678, re-indented as necessary
    def load(self, data):
        if not isinstance(data, str):
            super().load(data)
            return
        # your code from l.3681-3724
...
    def _get_cookies(self, url):
        """ Return a SimpleCookie with the cookies for the url """
        return ImprovedSimpleCookie(self._downloader._calc_cookies(url))
```

Although it isn't used in yt-dl/dlp (I think), the non-str case is a dict of (cookie_name, value) pairs, which the original Cookie/http.cookies code (2.6+) should handle.
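A minimal runnable sketch of that delegation, assuming only the standard library: the class name `DelegatingCookie` is hypothetical, and the str branch simply defers to the stock parser where a real subclass would do its own parsing.

```python
from http.cookies import SimpleCookie


class DelegatingCookie(SimpleCookie):
    """Hypothetical subclass: custom handling for str, stock handling otherwise."""

    def load(self, data):
        if not isinstance(data, str):
            # Mapping input: the base class assigns each key/value pair.
            super().load(data)
            return
        # A real implementation would parse `data` itself here; this
        # sketch simply defers to the stock parser.
        super().load(data)


cookie = DelegatingCookie({'session': 'abc123'})
print(cookie['session'].value)  # abc123
```

Passing the mapping to the constructor works because the base class constructor forwards its input to `load()`.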

Grub4K · 2022-09-12T19:05:21Z

Extracting the class immediately seems to make more sense, potentially placing it into cookies.py (with the tests moved to test_cookies.py)? Or should the class reside within common.py for now?

dirkf · 2022-09-12T20:05:49Z

Yes, that file isn't in yt-dl but looks plausible: take advice from @pukkandan.

pukkandan · 2022-09-12T22:33:22Z

> Extracting the class immediately seems to make more sense, potentially placing it into cookies.py (with the tests moved to test_cookies.py)?

That makes sense

Extract the `InfoExtractor._make_simple_cookie()` function into the `cookies.LenientSimpleCookie` subclass and move tests accordingly

dirkf · 2022-09-16T20:04:49Z

Late now, but Lenient doesn't properly describe what is being done, i.e. fixing a defect in the parsing implemented in `SimpleCookie.load()`. Any of Fixed, Better, Improved, Correct would be more accurate ...

Grub4K · 2022-09-16T20:49:18Z

It is not a defect but a design choice as the comment in the code suggests, and lenient describes the behavior of the cookie accurately.

Manually parse cookies

e17b01a

coletdjnz self-requested a review August 29, 2022 21:36

dirkf reviewed Aug 30, 2022

yt_dlp/extractor/common.py (outdated)

Grub4K added 2 commits August 30, 2022 12:41

Use internal http.cookies._unquote for unquoting

7c5953a

Comply with CPython http.cookies parsing

4c0868e

Grub4K requested review from dirkf and removed request for coletdjnz August 30, 2022 14:56

pukkandan linked an issue Aug 30, 2022 that may be closed by this pull request

[crunchyroll] Endless loop when trying to download an episode #3778

Closed

7 tasks

pukkandan added the enhancement and bug labels and removed the enhancement label Aug 30, 2022

dirkf reviewed Sep 1, 2022

Grub4K added 3 commits September 3, 2022 16:03

Merge branch 'yt-dlp:master' into fix-get-cookies

f36a674

Fix cookie parsing edge cases

0586f8c

Implement manual cookie parsing tests

9503c47

Grub4K requested a review from dirkf September 9, 2022 16:11

dirkf reviewed Sep 12, 2022

Extract cookie parsing into subclass

df2a873

Grub4K requested a review from pukkandan September 12, 2022 23:51

Grub4K and others added 2 commits September 16, 2022 18:30

Split CPython and extended tests

befd24b

cleanup

a4e1964

pukkandan merged commit 8817a80 into yt-dlp:master Sep 16, 2022

Grub4K deleted the fix-get-cookies branch September 16, 2022 17:05

Grub4K mentioned this pull request Sep 25, 2022

Can't download from Crunchyroll #5023

Closed

10 tasks

dirkf mentioned this pull request Oct 5, 2022

dplay.py several bugs latin1' codec can't encode and other bug errors about requests.py, client.py, common.py ytdl-org/youtube-dl#31279

Closed

6 tasks

nburns mentioned this pull request Jan 3, 2024

gh-92936: allow double quote in cookie values python/cpython#113663

Merged

This was referenced Aug 8, 2025

gh-92936: update http.cookies docs post GH-113663 python/cpython#137566

Merged

SimpleCookie() fails for json-like values with embedded double-quotes python/cpython#92936

Closed
