Closed
Labels: bug (Something isn't working)
Description
Griffe fails to parse Python files that begin with a UTF-8 byte-order mark (a.k.a. BOM, code point U+FEFF).
Minimal reproducer with an otherwise empty Python module:
from griffe import GriffeLoader
from pathlib import Path
loader = GriffeLoader(search_paths=[Path('.')])
file = Path('empty_except_bom.py')
file.write_text('', encoding='utf-8-sig')
module = loader.load(file.stem)
Raises:
SyntaxError: invalid non-printable character U+FEFF
Full traceback
Could not load package Package(name='empty_except_bom', path=WindowsPath('C:/home/projects/MPh/docs/Griffe_bug_UTF8_BOM/empty_except_bom.py'), stubs=None)
Traceback (most recent call last):
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 531, in _load_module
return self._load_module_path(module_name, module_path, submodules=submodules, parent=parent)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 555, in _load_module_path
module = self._visit_module(module_name, module_path, parent)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 634, in _visit_module
module = visit(
module_name,
...<7 lines>...
modules_collection=self.modules_collection,
)
File "C:\scratch\repos\other\Griffe\src\_griffe\agents\visitor.py", line 113, in visit
).get_module()
~~~~~~~~~~^^
File "C:\scratch\repos\other\Griffe\src\_griffe\agents\visitor.py", line 204, in get_module
top_node = compile(self.code, mode="exec", filename=str(self.filepath), flags=ast.PyCF_ONLY_AST, optimize=1)
File "C:\home\projects\MPh\docs\Griffe_bug_UTF8_BOM\empty_except_bom.py", line 1
^
SyntaxError: invalid non-printable character U+FEFF
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 179, in load
top_module = self._load_package(package, submodules=submodules)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 508, in _load_package
top_module = self._load_module(package.name, package.path, submodules=submodules)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 533, in _load_module
raise LoadingError(f"Syntax error: {error}") from error
_griffe.exceptions.LoadingError: Syntax error: invalid non-printable character U+FEFF (empty_except_bom.py, line 1)
Traceback (most recent call last):
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 531, in _load_module
return self._load_module_path(module_name, module_path, submodules=submodules, parent=parent)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 555, in _load_module_path
module = self._visit_module(module_name, module_path, parent)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 634, in _visit_module
module = visit(
module_name,
...<7 lines>...
modules_collection=self.modules_collection,
)
File "C:\scratch\repos\other\Griffe\src\_griffe\agents\visitor.py", line 113, in visit
).get_module()
~~~~~~~~~~^^
File "C:\scratch\repos\other\Griffe\src\_griffe\agents\visitor.py", line 204, in get_module
top_node = compile(self.code, mode="exec", filename=str(self.filepath), flags=ast.PyCF_ONLY_AST, optimize=1)
File "C:\home\projects\MPh\docs\Griffe_bug_UTF8_BOM\empty_except_bom.py", line 1
^
SyntaxError: invalid non-printable character U+FEFF
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\home\projects\MPh\docs\Griffe_bug_UTF8_BOM\demo_bug.py", line 20, in <module>
module = loader.load(file.stem)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 179, in load
top_module = self._load_package(package, submodules=submodules)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 508, in _load_package
top_module = self._load_module(package.name, package.path, submodules=submodules)
File "C:\scratch\repos\other\Griffe\src\_griffe\loader.py", line 533, in _load_module
raise LoadingError(f"Syntax error: {error}") from error
_griffe.exceptions.LoadingError: Syntax error: invalid non-printable character U+FEFF (empty_except_bom.py, line 1)
Environment information
❯ griffe --debug-info
- __System__: Windows-11-10.0.22621-SP0
- __Python__: cpython 3.13.4 (C:\scratch\venvs\Griffe\Scripts\python.exe)
- __Environment variables__:
- `PYTHONPATH`: `C:\home\tools;C:\polybox\work\tools`
- __Installed packages__:
- `griffe` v1.7.4.dev1172+g441b3b7

UTF-8 BOMs aren't used a lot, but they are supported by the Python interpreter. I, for one, use them routinely, as they prevent editing mishaps on Windows: some editors default to ANSI encoding when a file contains no non-ASCII characters yet, which causes trouble once a Unicode character is added. (The situation has improved in recent years, though, and most Windows editors/IDEs now default to UTF-8 too, just like on other platforms.)
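For illustration, here is a minimal standard-library sketch of why the BOM survives into the parsed text and one way it could be dropped. It is not Griffe's actual code or the eventual fix. The helper name `read_source` and the file name `with_bom.py` are hypothetical. It assumes the source is read as text with the plain `utf-8` codec, which keeps the BOM as a leading U+FEFF character, whereas `utf-8-sig` strips it.

```python
import ast
import codecs
import tempfile
from pathlib import Path


def read_source(path: Path) -> str:
    """Hypothetical helper: read Python source, dropping a leading UTF-8 BOM."""
    # 'utf-8-sig' also decodes BOM-less UTF-8; it only strips a leading BOM.
    return path.read_text(encoding="utf-8-sig")


with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "with_bom.py"
    # Writing with 'utf-8-sig' prepends the three BOM bytes EF BB BF.
    file.write_text("x = 1\n", encoding="utf-8-sig")

    has_bom = file.read_bytes().startswith(codecs.BOM_UTF8)
    kept = file.read_text(encoding="utf-8")  # BOM kept as '\ufeff'
    stripped = read_source(file)             # BOM removed

# The BOM-free text parses normally.
tree = ast.parse(stripped)

# Parsing the text that still carries U+FEFF is what fails in the report above.
try:
    ast.parse(kept)
    bom_parse_failed = False
except SyntaxError:
    bom_parse_failed = True
```

Passing the raw bytes of the file to `compile()` instead of decoded text would be another option, since the interpreter handles the BOM itself when it decodes source bytes.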
PR to follow. Related issue: #99.