fix(storage): consolidated parse handling#6260
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4a02591e3e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR consolidates storage parsing input-handling to avoid reopening files based on filename-like content or stream metadata (per #3911), improving consistency across storage formats.
Changes:
- Added shared helpers in
translate.storage.baseto normalize inputs (prepare_input,get_input_name) and distinguish path arguments vs stream-provided content. - Updated several storage parsers (TS, subtitles, INI, FPO/CPO) to use the shared helpers and avoid path-based reopening when input originates from a handle.
- Added/expanded tests to cover stream-vs-path behavior and charset handling for PO text streams.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| translate/storage/base.py | Introduces shared input-normalization helpers used by storage parsers. |
| translate/storage/ts.py | Uses new base helpers to avoid interpreting stream content as a path. |
| translate/storage/subtitles.py | Reworks _parsefile to use prepared input + preserves filename metadata without reopening. |
| translate/storage/ini.py | Uses prepared input; supports text streams as content without treating them as paths. |
| translate/storage/fpo.py | Aligns filename inference via get_input_name before delegating to cpo. |
| translate/storage/cpo.py | Adds text-stream encoding support based on PO header charset; uses prepared input. |
| tests/translate/storage/test_subtitles.py | Adds regression test ensuring stream content is used even when .name points to an existing file. |
| tests/translate/storage/test_ini.py | Adds test ensuring StringIO input is parsed as content. |
| tests/translate/storage/test_fpo.py | Adds tests for stream content handling and declared-header charset encoding. |
| tests/translate/storage/test_cpo.py | Adds tests for stream content handling and declared-header charset encoding. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2dca180e6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR standardizes storage parsing input handling to avoid reopening files based on filename-like strings embedded in already-provided content (per #3911), by introducing shared helpers in translate.storage.base and updating several storage implementations + tests accordingly.
Changes:
- Add
base.prepare_input(),base.get_input_name(), and path helpers to centralize “path vs content vs handle” decisions. - Update TS, subtitles, INI, CPO, and FPO parsing paths to use the consolidated input-handling helpers and preserve
.filenamemetadata safely. - Add regression tests ensuring stream content is parsed as content (not reopened by
.name), and thatTranslationStore.parsefile()ownership semantics remain consistent.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| translate/storage/base.py | Introduces shared helpers for materializing inputs and extracting filename metadata without misclassifying stream content as paths. |
| translate/storage/ts.py | Uses the shared helpers to prevent reopening a path when content came from a stream (even if .name is set). |
| translate/storage/subtitles.py | Consolidates _parsefile input handling and preserves caller-provided filename metadata while parsing stream content. |
| translate/storage/ini.py | Switches to shared input preparation; adds support for parsing text-stream content as content rather than a path. |
| translate/storage/cpo.py | Uses shared input preparation and tightens type rules for handle vs path inputs before libgettextpo parsing. |
| translate/storage/fpo.py | Aligns filename metadata extraction with the new helper behavior when delegating to CPO parsing. |
| tests/translate/storage/test_base.py | Adds coverage for TranslationStore.parsefile() handle ownership/closure semantics. |
| tests/translate/storage/test_ts.py | Adds regression test ensuring named streams don’t trigger path reopening. |
| tests/translate/storage/test_subtitles.py | Adds regression test ensuring named streams use stream content and still save back to the named path. |
| tests/translate/storage/test_ini.py | Adds coverage for parsing INI content from a text stream. |
| tests/translate/storage/test_fpo.py | Adds coverage around stream inputs being treated as content (and failing predictably) for FPO. |
| tests/translate/storage/test_cpo.py | Adds coverage for stream inputs and ensures bytes that “look like paths” aren’t reopened. |
| pyproject.toml | Broadens typos-tool exclude pattern for .xlf test fixtures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Avoid parsing files based on name in the passed content. See translate#3911
There was a problem hiding this comment.
Pull request overview
This PR consolidates storage parsing behavior to avoid reopening filesystem paths based solely on filename metadata present on in-memory/stream inputs, aligning behavior across multiple storage backends and adding regression tests for the clarified semantics.
Changes:
- Introduces
base.prepare_input,base.get_input_name, and path helpers to distinguish “path arguments” vs “content from handles” and preserve filename metadata safely. - Updates TS, subtitles, INI, FPO/CPO parsing to use the new input-preparation flow (avoiding unintended reopen-by-name behavior).
- Adds targeted tests covering named streams, ownership/closing semantics, and path-vs-content edge cases.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| translate/storage/base.py | Adds shared input-prep and filename-extraction helpers; updates parsefile docs and is used by updated parsers. |
| translate/storage/ts.py | Uses base input helpers to avoid reopening named streams while preserving filename metadata. |
| translate/storage/subtitles.py | Updates _parsefile and parse to use base helpers and avoid reopening based on stream .name. |
| translate/storage/ini.py | Makes parsing behavior explicit: str/PathLike treated as paths; streams treated as content via prepare_input. |
| translate/storage/fpo.py | Preserves filename metadata via get_input_name; clarifies path-vs-bytes expectations in docstring. |
| translate/storage/cpo.py | Uses prepare_input and makes path-vs-content requirements explicit; safer temp-file materialization. |
| tests/translate/storage/test_ts.py | Adds regression test ensuring named byte streams aren’t reopened as paths. |
| tests/translate/storage/test_subtitles.py | Adds regression test ensuring named streams use stream content and still save to the named path. |
| tests/translate/storage/test_ini.py | Adds test ensuring text streams are parsed as content. |
| tests/translate/storage/test_fpo.py | Adds tests asserting bytes/text-stream inputs are not misinterpreted as filesystem paths. |
| tests/translate/storage/test_cpo.py | Adds tests for bytes/text-stream behavior and path-vs-content handling. |
| tests/translate/storage/test_base.py | Adds tests for parsefile handle-closing semantics and prepare_input close-on-error behavior. |
| pyproject.toml | Broadens typos-tool exclude pattern for .xlf test fixtures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def is_path_input(value: object) -> TypeGuard[str | os.PathLike[str]]: | ||
| """Return whether the value should be treated as a filesystem path.""" | ||
| return isinstance(value, (str, os.PathLike)) | ||
|
|
||
|
|
||
| def path_input_str(value: str | os.PathLike[str]) -> str: | ||
| """Convert a path-like value to a normalized string path.""" | ||
| return os.fspath(value) | ||
|
|
| prepared = base.prepare_input(inputfile) | ||
| content = prepared.data | ||
| if not prepared.from_handle and base.is_path_input(content): | ||
| content = base.path_input_str(content) | ||
| if not self._looks_like_xml(content): |
Avoid parsing files based on name in the passed content.
See #3911