bpo-32861: urllib.robotparser fix incomplete __str__ methods. #5711
serhiy-storchaka merged 3 commits into python:master from michael-lazar:fix-issue-32861
The RobotFileParser's string representation was incomplete and missing some valid rule lines.
serhiy-storchaka left a comment
LGTM. I have added just a few suggestions. And the two unnecessary trailing newlines should be kept in maintained releases.
Please add a news entry.
  def __str__(self):
-     return ''.join([str(entry) + "\n" for entry in self.entries])
+     ret = [str(entry) for entry in self.entries]
This may be faster:
    entries = self.entries
    if self.default_entry is not None:
        entries = entries + [self.default_entry]
    return '\n\n'.join(map(str, entries))

  ret = []
  for agent in self.useragents:
-     ret.extend(["User-agent: ", agent, "\n"])
+     ret.append("User-agent: {0}".format(agent))
f-strings can be used in 3.6+.
  for line in self.rulelines:
-     ret.extend([str(line), "\n"])
- return ''.join(ret)
+     ret.append(str(line))
Or just
    ret.extend(map(str, self.rulelines))
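Taken together, the three review suggestions above can be sketched as follows. This is a minimal illustration, not the stdlib source: `RuleLine` and `Entry` here are simplified stand-ins that only mirror the names in the diff, and `render` is a hypothetical helper standing in for `RobotFileParser.__str__`.

```python
# Simplified stand-ins for the robotparser classes (assumption: only the
# attributes visible in the diff are modelled; this is not the stdlib code).
class RuleLine:
    def __init__(self, path, allowance):
        self.path = path
        self.allowance = allowance

    def __str__(self):
        return ("Allow" if self.allowance else "Disallow") + ": " + self.path


class Entry:
    def __init__(self, useragents, rulelines):
        self.useragents = useragents
        self.rulelines = rulelines

    def __str__(self):
        ret = []
        for agent in self.useragents:
            # f-string instead of str.format (available in 3.6+)
            ret.append(f"User-agent: {agent}")
        # extend with a lazy map instead of appending in a loop
        ret.extend(map(str, self.rulelines))
        # joining with "\n" leaves no trailing newline
        return "\n".join(ret)


def render(entries, default_entry=None):
    """Hypothetical helper mirroring the suggested parser-level __str__."""
    if default_entry is not None:
        # builds a new list, so the caller's list is not mutated
        entries = entries + [default_entry]
    return "\n\n".join(map(str, entries))


bot = Entry(["examplebot"], [RuleLine("/tmp/", False)])
wildcard = Entry(["*"], [RuleLine("/", True)])
print(render([bot], wildcard))
```

Because every level joins with a separator rather than appending "\n" to each piece, the result ends with the last rule line and carries no trailing newlines.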
All suggestions have been implemented, I have no strong opinions on any of them. I also added a news entry. Is backporting part of this PR, or do you merge this into 3.8 and then create separate issues for the other python branches? Is that something that I can help with?
serhiy-storchaka left a comment
LGTM.
Just add your credits.
@@ -0,0 +1,3 @@
+The urllib.robotparser's ``__str__`` representation now includes wildcard
+entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
+newlines that were being appended to the end of the string.
Please add "Patch by yourname." and add your name into Misc/ACKS.
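The behavior described in the news entry can be checked with a short sketch like the one below. It assumes a Python version that includes this fix; the robots.txt content and the `examplebot` agent name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content with a named agent, a wildcard entry,
# and the Crawl-delay and Request-rate fields.
ROBOTS_TXT = """\
User-agent: examplebot
Disallow: /tmp/

User-agent: *
Crawl-delay: 3
Request-rate: 9/30
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

text = str(parser)
print(text)
```

On an interpreter with the fix, the printed text should contain the `User-agent: *` entry along with its `Crawl-delay` and `Request-rate` lines, and should not end with a newline.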
Cool, I added my name to the news entry. I'm already in the ACKS file so all good there.
Thanks @michael-lazar for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6, 3.7.
…GH-5711) The urllib.robotparser's __str__ representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra newlines that were being appended to the end of the string. (cherry picked from commit bd08a0a) Co-authored-by: Michael Lazar <[email protected]>
GH-6795 is a backport of this pull request to the 3.7 branch.
Sorry, @michael-lazar and @serhiy-storchaka, I could not cleanly backport this to
GH-6796 is a backport of this pull request to the 3.6 branch.
…H-5711) (GH-6795) The urllib.robotparser's __str__ representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. (cherry picked from commit bd08a0a) Co-authored-by: Michael Lazar <[email protected]>
…ythonGH-5711) (pythonGH-6795) The urllib.robotparser's __str__ representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. (cherry picked from commit bd08a0a) Co-authored-by: Michael Lazar <[email protected]> (cherry picked from commit c3fa1f2) Co-authored-by: Miss Islington (bot) <[email protected]>
…ythonGH-5711) (pythonGH-6795) The robotparser's __str__ representation now includes wildcard entries. (cherry picked from commit c3fa1f2) Co-authored-by: Michael Lazar <[email protected]>.
…H-5711) (GH-6795) (GH-6818) The urllib.robotparser's __str__ representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. (cherry picked from commit c3fa1f2) Co-authored-by: Michael Lazar <[email protected]>
GH-6795) (GH-6817) The robotparser's __str__ representation now includes wildcard entries. (cherry picked from commit c3fa1f2) Co-authored-by: Michael Lazar <[email protected]>.
https://bugs.python.org/issue32861