Skip to content

fix(storage): consolidated parse handling#6260

Merged
nijel merged 1 commit intotranslate:masterfrom
nijel:parsing
Apr 23, 2026
Merged

fix(storage): consolidated parse handling#6260
nijel merged 1 commit intotranslate:masterfrom
nijel:parsing

Conversation

@nijel
Copy link
Copy Markdown
Member

@nijel nijel commented Apr 22, 2026

Avoid parsing files based on name in the passed content.

See #3911

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a02591e3e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread translate/storage/cpo.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates storage parsing input-handling to avoid reopening files based on filename-like content or stream metadata (per #3911), improving consistency across storage formats.

Changes:

  • Added shared helpers in translate.storage.base to normalize inputs (prepare_input, get_input_name) and distinguish path arguments vs stream-provided content.
  • Updated several storage parsers (TS, subtitles, INI, FPO/CPO) to use the shared helpers and avoid path-based reopening when input originates from a handle.
  • Added/expanded tests to cover stream-vs-path behavior and charset handling for PO text streams.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
translate/storage/base.py Introduces shared input-normalization helpers used by storage parsers.
translate/storage/ts.py Uses new base helpers to avoid interpreting stream content as a path.
translate/storage/subtitles.py Reworks _parsefile to use prepared input + preserves filename metadata without reopening.
translate/storage/ini.py Uses prepared input; supports text streams as content without treating them as paths.
translate/storage/fpo.py Aligns filename inference via get_input_name before delegating to cpo.
translate/storage/cpo.py Adds text-stream encoding support based on PO header charset; uses prepared input.
tests/translate/storage/test_subtitles.py Adds regression test ensuring stream content is used even when .name points to an existing file.
tests/translate/storage/test_ini.py Adds test ensuring StringIO input is parsed as content.
tests/translate/storage/test_fpo.py Adds tests for stream content handling and declared-header charset encoding.
tests/translate/storage/test_cpo.py Adds tests for stream content handling and declared-header charset encoding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread translate/storage/base.py Outdated
Comment thread translate/storage/cpo.py Outdated
Comment thread translate/storage/cpo.py Outdated
Comment thread translate/storage/ts.py
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2dca180e6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread translate/storage/subtitles.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR standardizes storage parsing input handling to avoid reopening files based on filename-like strings embedded in already-provided content (per #3911), by introducing shared helpers in translate.storage.base and updating several storage implementations + tests accordingly.

Changes:

  • Add base.prepare_input(), base.get_input_name(), and path helpers to centralize “path vs content vs handle” decisions.
  • Update TS, subtitles, INI, CPO, and FPO parsing paths to use the consolidated input-handling helpers and preserve .filename metadata safely.
  • Add regression tests ensuring stream content is parsed as content (not reopened by .name), and that TranslationStore.parsefile() ownership semantics remain consistent.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
translate/storage/base.py Introduces shared helpers for materializing inputs and extracting filename metadata without misclassifying stream content as paths.
translate/storage/ts.py Uses the shared helpers to prevent reopening a path when content came from a stream (even if .name is set).
translate/storage/subtitles.py Consolidates _parsefile input handling and preserves caller-provided filename metadata while parsing stream content.
translate/storage/ini.py Switches to shared input preparation; adds support for parsing text-stream content as content rather than a path.
translate/storage/cpo.py Uses shared input preparation and tightens type rules for handle vs path inputs before libgettextpo parsing.
translate/storage/fpo.py Aligns filename metadata extraction with the new helper behavior when delegating to CPO parsing.
tests/translate/storage/test_base.py Adds coverage for TranslationStore.parsefile() handle ownership/closure semantics.
tests/translate/storage/test_ts.py Adds regression test ensuring named streams don’t trigger path reopening.
tests/translate/storage/test_subtitles.py Adds regression test ensuring named streams use stream content and still save back to the named path.
tests/translate/storage/test_ini.py Adds coverage for parsing INI content from a text stream.
tests/translate/storage/test_fpo.py Adds coverage around stream inputs being treated as content (and failing predictably) for FPO.
tests/translate/storage/test_cpo.py Adds coverage for stream inputs and ensures bytes that “look like paths” aren’t reopened.
pyproject.toml Broadens typos-tool exclude pattern for .xlf test fixtures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread translate/storage/base.py Outdated
Comment thread translate/storage/ini.py Outdated
Comment thread translate/storage/fpo.py
Avoid parsing files based on name in the passed content.

See translate#3911
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates storage parsing behavior to avoid reopening filesystem paths based solely on filename metadata present on in-memory/stream inputs, aligning behavior across multiple storage backends and adding regression tests for the clarified semantics.

Changes:

  • Introduces base.prepare_input, base.get_input_name, and path helpers to distinguish “path arguments” vs “content from handles” and preserve filename metadata safely.
  • Updates TS, subtitles, INI, FPO/CPO parsing to use the new input-preparation flow (avoiding unintended reopen-by-name behavior).
  • Adds targeted tests covering named streams, ownership/closing semantics, and path-vs-content edge cases.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
translate/storage/base.py Adds shared input-prep and filename-extraction helpers; updates parsefile docs and is used by updated parsers.
translate/storage/ts.py Uses base input helpers to avoid reopening named streams while preserving filename metadata.
translate/storage/subtitles.py Updates _parsefile and parse to use base helpers and avoid reopening based on stream .name.
translate/storage/ini.py Makes parsing behavior explicit: str/PathLike treated as paths; streams treated as content via prepare_input.
translate/storage/fpo.py Preserves filename metadata via get_input_name; clarifies path-vs-bytes expectations in docstring.
translate/storage/cpo.py Uses prepare_input and makes path-vs-content requirements explicit; safer temp-file materialization.
tests/translate/storage/test_ts.py Adds regression test ensuring named byte streams aren’t reopened as paths.
tests/translate/storage/test_subtitles.py Adds regression test ensuring named streams use stream content and still save to the named path.
tests/translate/storage/test_ini.py Adds test ensuring text streams are parsed as content.
tests/translate/storage/test_fpo.py Adds tests asserting bytes/text-stream inputs are not misinterpreted as filesystem paths.
tests/translate/storage/test_cpo.py Adds tests for bytes/text-stream behavior and path-vs-content handling.
tests/translate/storage/test_base.py Adds tests for parsefile handle-closing semantics and prepare_input close-on-error behavior.
pyproject.toml Broadens typos-tool exclude pattern for .xlf test fixtures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread translate/storage/base.py
Comment on lines +93 to +101
def is_path_input(value: object) -> TypeGuard[str | os.PathLike[str]]:
"""Return whether the value should be treated as a filesystem path."""
return isinstance(value, (str, os.PathLike))


def path_input_str(value: str | os.PathLike[str]) -> str:
"""Convert a path-like value to a normalized string path."""
return os.fspath(value)

Comment thread translate/storage/ts.py
Comment on lines +68 to +72
prepared = base.prepare_input(inputfile)
content = prepared.data
if not prepared.from_handle and base.is_path_input(content):
content = base.path_input_str(content)
if not self._looks_like_xml(content):
@nijel nijel merged commit 47b437b into translate:master Apr 23, 2026
30 checks passed
@nijel nijel deleted the parsing branch April 23, 2026 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants