Always use encoding="utf-8-sig" when reading text files #387
Merged
pawamoy merged 3 commits into mkdocstrings:main on Jul 21, 2025
Conversation
Contributor (Author):
I don't think this PR has anything to do with the reported Mypy errors, unless I'm missing something. (I only ran
Member:
Thanks! You can rebase on main to get rid of the mypy warnings 👍
Changed the encoding from `utf8` to `utf-8-sig` throughout the code base when reading files, in order to ignore a possible byte-order mark (a.k.a. BOM, code point U+FEFF) at the start of the file.

As per the Python documentation:

> In some areas, it is also convention to use a “BOM” at the start of
> UTF-8 encoded files; the name is misleading since UTF-8 is not
> byte-order dependent. The mark simply announces that the file is
> encoded in UTF-8. For reading such files, use the ‘utf-8-sig’ codec
> to automatically skip the mark if present.

https://docs.python.org/3/howto/unicode.html#reading-and-writing-unicode-data

So this change won't affect reading UTF8-encoded files without a BOM.
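As an illustration of the behavior described in the commit message, here is a minimal standalone sketch. It is not code from this PR: the file names `with_bom.py` and `without_bom.py` and the sample source are hypothetical, and only standard-library calls are used. It writes the same module source once with a UTF-8 BOM and once without, then reads both back with the two codecs.

```python
import codecs
import tempfile
from pathlib import Path

# Hypothetical module source, used only for this illustration.
source = "x = 1\n"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; write raw bytes so the BOM is under our control.
    with_bom = Path(tmp, "with_bom.py")
    without_bom = Path(tmp, "without_bom.py")
    with_bom.write_bytes(codecs.BOM_UTF8 + source.encode("utf-8"))
    without_bom.write_bytes(source.encode("utf-8"))

    # Plain "utf-8" keeps the BOM as a leading U+FEFF character.
    bom_as_utf8 = with_bom.read_text(encoding="utf-8")
    # "utf-8-sig" skips the BOM when it is present...
    bom_as_sig = with_bom.read_text(encoding="utf-8-sig")
    # ...and reads a BOM-less file exactly as "utf-8" would.
    plain_as_sig = without_bom.read_text(encoding="utf-8-sig")

print(repr(bom_as_utf8[0]))        # the stray BOM character
print(bom_as_sig == source)        # BOM stripped
print(plain_as_sig == source)      # BOM-less file unchanged
```

With plain `utf-8`, the decoded text starts with U+FEFF, which downstream consumers of the text would then have to handle themselves. With `utf-8-sig`, both files decode to the same string.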
5b3816d to 603088f
Member:
Oh, can you please add a test that runs on Windows only, asserting the fix works? It should check that trying to load a BOM'd module with UTF8 raises a
Changed the encoding from `utf8` to `utf-8-sig` when reading files, in order to ignore a possible byte-order mark (a.k.a. BOM, code point U+FEFF) at the start of the file.

As per the Python documentation:
https://docs.python.org/3/howto/unicode.html#reading-and-writing-unicode-data

So this change won't affect reading UTF8-encoded files without a BOM.
Fixes #386.