Skip to content

fix(cli): opensre tests crashes with FileNotFoundError in bundled binary (#1078)#1090

Merged
muddlebee merged 4 commits intoTracer-Cloud:mainfrom
mayankbharati-ops:fix/1078-discover-graceful-missing-sources
Apr 30, 2026
Merged

fix(cli): opensre tests crashes with FileNotFoundError in bundled binary (#1078)#1090
muddlebee merged 4 commits intoTracer-Cloud:mainfrom
mayankbharati-ops:fix/1078-discover-graceful-missing-sources

Conversation

@mayankbharati-ops
Copy link
Copy Markdown
Contributor

Closes #1078.

Summary

opensre tests and opensre tests list crash with FileNotFoundError when run from the PyInstaller-bundled binary (the released binary, not source-checkout runs). The reported crash is one of three of the same class — fixed all three.

Root cause

packaging/opensre.spec only collects app/ data files (collect_data_files("app")) and explicitly filters tests out of the submodule list (_is_runtime_submodule). At runtime the bundled temp extraction path therefore has no tests/synthetic/rds_postgres/, no tests/e2e/rca/, and no Makefile. Three call sites in the test-discovery code path read directly from those paths without an existence guard:

Location Operation that crashes Crash in bundled binary?
_discover_rds_synthetic_scenarios scenarios_dir.iterdir() the reported #1078 crash
discover_make_targets MAKEFILE_PATH.read_text() ✅ identical class of bug
run_synthetic_suite from tests.synthetic.rds_postgres.run_suite import … ModuleNotFoundError traceback
discover_rca_files RCA_DIR.glob("*.md") ❌ safe today (Path.glob returns empty), guard added for documentation + forward compatibility

The reported crash was introduced by 9a1b5c3d "add synthetic RDS postgres RCA suite (#194)" on 2026-04-01 (answers @muddlebee's regression-commit question).

Fix

# app/cli/tests/discover.py
def _discover_rds_synthetic_scenarios() -> list[TestCatalogItem]:
    scenarios_dir = REPO_ROOT / "tests" / "synthetic" / "rds_postgres"
    items: list[TestCatalogItem] = []
    if not scenarios_dir.is_dir():
        return items   # PyInstaller bundle has no tests/ tree
    ...

def discover_make_targets() -> list[TestCatalogItem]:
    if not MAKEFILE_PATH.is_file():
        return []   # PyInstaller bundle has no Makefile
    ...

def discover_rca_files() -> list[TestCatalogItem]:
    if not RCA_DIR.is_dir():
        return items   # explicit guard, currently a no-op on CPython
    ...
# app/cli/commands/tests.py
def run_synthetic_suite(...):
    try:
        from tests.synthetic.rds_postgres.run_suite import main as run_suite_main
    except ModuleNotFoundError as exc:
        raise OpenSREError(
            "The synthetic RDS PostgreSQL suite is not available in this build.",
            suggestion="Run from a source checkout (clone the repo) — synthetic scenarios are not bundled.",
        ) from exc
    ...

is_dir() / is_file() return False on a missing path without raising, so they're cleaner than exists() followed by a separate type check.

Test coverage

tests/cli/test_discover.py (+5 tests, all in a new TestDiscoverGracefulOnMissingSource class):

  • test_rds_synthetic_returns_empty_when_dir_missing — the literal reported crash
  • test_make_targets_returns_empty_when_makefile_missing — same-class regression
  • test_rca_files_returns_empty_when_dir_missing — covers the explicit guard
  • test_load_test_catalog_does_not_crash_with_no_sources — full degradation contract
  • test_rds_synthetic_still_discovers_when_dir_present — sanity that the guard doesn't break the source-checkout path

tests/cli/test_cli_inventory.py (+1 test):

  • test_tests_synthetic_surfaces_clean_error_when_suite_not_bundled — patches builtins.__import__ to deny tests.synthetic.*, asserts the CLI exits non-zero with a clean message and no Traceback / ModuleNotFoundError reaches the user

Live e2e verification

Wrote a probe that monkeypatches REPO_ROOT, MAKEFILE_PATH, and RCA_DIR to an empty tmp dir (the exact same file layout PyInstaller produces — no tests/, no Makefile, no tests/e2e/rca) and invokes the real Click CLI in-process. 17 / 17 live checks pass:

Baseline (source tree) Bundled-binary sim (empty REPO_ROOT)
opensre tests list exit 0, full catalog exit 0, empty output
opensre --json tests list valid JSON, full catalog valid JSON, empty []
opensre tests list --category synthetic finds synthetic:001-replication-lag exit 0, empty (degraded)
opensre tests synthetic --scenario X runs the suite exits non-zero with OpenSREError and actionable suggestion — no ModuleNotFoundError / Traceback leaks
load_test_catalog() direct call populated catalog empty catalog, no exception

User-facing degraded message:

Error: The synthetic RDS PostgreSQL suite is not available in this build.

Suggestion: Run 'opensre tests synthetic' from a source checkout
(clone https://github.com/Tracer-Cloud/opensre) — the synthetic
scenarios under 'tests/synthetic/rds_postgres/' are not bundled
into the released binary.

Verification — final state

  • pytest tests/cli/test_discover.py tests/cli/test_cli_inventory.py: 25 / 25 pass (3 pre-existing + 22 including new)
  • Full unit suite: 4151 pass / 2 skipped / 1 xfailed (no regressions)
  • ruff check app/ tests/: clean
  • ruff format --check app/ tests/: clean
  • mypy app/cli/tests/ app/cli/commands/: clean on the touched modules; the 3 errors mypy reports are pre-existing missing-stub warnings for pymysql in app/integrations/ — unrelated to this PR

Scope

File Change
app/cli/tests/discover.py +14 / -1 (3 existence guards + docstring update)
app/cli/commands/tests.py +14 / -1 (try/except around the synthetic-suite import)
tests/cli/test_discover.py +85 / -1 (5 new regression tests)
tests/cli/test_cli_inventory.py +39 / -0 (1 new bundled-binary CLI test)

No public API changes. No state-shape changes. No behaviour change for source-checkout users — the existing source-tree happy path is verified by both the existing test test_load_test_catalog_includes_make_targets_and_rca_fixtures and the new sanity test test_rds_synthetic_still_discovers_when_dir_present.

…les (Tracer-Cloud#1078)

`opensre tests` and `opensre tests list` crashed with `FileNotFoundError`
when run from the PyInstaller-bundled binary, because
`packaging/opensre.spec` only collects `app/` data files and excludes
the `tests/` tree, the Makefile, and `tests/e2e/rca/`. The discover
helpers in `app/cli/tests/discover.py` then ran `iterdir()` /
`read_text()` on paths that don't exist at runtime.

The reported crash was in `_discover_rds_synthetic_scenarios`
(introduced by 9a1b5c3 "add synthetic RDS postgres RCA suite (Tracer-Cloud#194)"),
but the same class of bug is present in `discover_make_targets` —
`MAKEFILE_PATH.read_text()` raises identically in the bundled binary.
A third spot, `run_synthetic_suite` in `app/cli/commands/tests.py`,
imports `from tests.synthetic.rds_postgres.run_suite` and would surface
a raw `ModuleNotFoundError` traceback instead of an actionable message.

Changes:
- `_discover_rds_synthetic_scenarios`: early `is_dir()` guard, returns []
- `discover_make_targets`: early `is_file()` guard on the Makefile, returns []
- `discover_rca_files`: explicit `is_dir()` guard added for documentation
  + future-proofing (current `Path.glob` already returns empty on a
  missing parent on CPython, but the contract was implicit)
- `run_synthetic_suite`: catches `ModuleNotFoundError` from the
  `tests.synthetic` import and raises `OpenSREError` with a "run from a
  source checkout" suggestion instead of a raw traceback

Tests:
- `tests/cli/test_discover.py`: 5 new tests covering each missing-source
  case + a fully-degraded `load_test_catalog` regression + a
  source-checkout sanity test that proves the existence guards don't
  break the happy path
- `tests/cli/test_cli_inventory.py`: 1 new test simulating a bundled
  binary by patching `__import__` to deny `tests.synthetic.*` — confirms
  `opensre tests synthetic` exits non-zero with a clean message and no
  `Traceback` / `ModuleNotFoundError` reaches the user

Live verification: in-process Click invocation with REPO_ROOT,
MAKEFILE_PATH, and RCA_DIR all redirected to an empty tmp dir
(matches the PyInstaller bundle layout). 17/17 e2e checks pass:
`opensre tests list` exits 0 with empty output, `--json tests list`
produces a valid empty array, `tests synthetic` surfaces the structured
error. Full unit suite: 4151 pass / 2 skipped / 1 xfailed.

Closes Tracer-Cloud#1078
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR fixes FileNotFoundError / ModuleNotFoundError crashes in the PyInstaller-bundled opensre binary by adding three is_dir()/is_file() existence guards in the test-discovery path and a try/except ModuleNotFoundError around the lazy tests.synthetic.rds_postgres.run_suite import. Source-checkout behaviour is unchanged. Regression coverage is thorough: 5 new unit tests and 1 CLI-level test covering all fixed paths and the source-checkout happy path.

Confidence Score: 5/5

Safe to merge — all fixes are well-targeted guards with no behaviour change on source-checkout runs, and regression coverage is complete.

All findings are P2 or lower. The core logic is correct, the exception scope (ModuleNotFoundError vs broad ImportError) is intentionally narrow and the right call, tests cover both the empty-return and the happy path, and pre-existing tests still pass with the new guards because MagicMock.is_file() is truthy by default.

No files require special attention.

Important Files Changed

Filename Overview
app/cli/tests/discover.py Adds three existence guards (is_dir/is_file) before iterdir(), read_text(), and glob() calls to prevent FileNotFoundError crashes in PyInstaller bundles; well-scoped and consistent.
app/cli/commands/tests.py Wraps the lazy import of tests.synthetic.rds_postgres.run_suite in a try/except ModuleNotFoundError to surface a clean OpenSREError instead of a raw traceback when the tests tree is absent in a bundled binary.
tests/cli/test_discover.py Adds 5 regression tests in TestDiscoverGracefulOnMissingSource using monkeypatch to simulate a missing tests/ tree; covers both the empty-return and the source-checkout happy path.
tests/cli/test_cli_inventory.py Adds 1 CLI-level regression test that patches builtins.import to block tests.synthetic.* and asserts no raw ModuleNotFoundError/Traceback reaches the user.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["opensre tests synthetic --scenario X"] --> B["run_synthetic_suite()"]
    B --> C{try import\ntests.synthetic.rds_postgres.run_suite}
    C -- "ModuleNotFoundError\n(bundled binary)" --> D["raise OpenSREError\n'suite not available in this build'\n+ actionable suggestion"]
    C -- "import OK\n(source checkout)" --> E["capture_test_synthetic_started()"]
    E --> F["run_suite_main(argv)"]

    G["opensre tests list"] --> H["load_test_catalog()"]
    H --> I["_discover_rds_synthetic_scenarios()"]
    I --> J{scenarios_dir.is_dir?}
    J -- "False (bundled)" --> K["return []"]
    J -- "True (source)" --> L["iterdir() → catalog items"]

    H --> M["discover_make_targets()"]
    M --> N{MAKEFILE_PATH.is_file?}
    N -- "False (bundled)" --> O["return []"]
    N -- "True (source)" --> P["read_text() → catalog items"]

    H --> Q["discover_rca_files()"]
    Q --> R{RCA_DIR.is_dir?}
    R -- "False (bundled)" --> S["return items (empty)"]
    R -- "True (source)" --> T["glob('*.md') → catalog items"]
Loading

Reviews (1): Last reviewed commit: "fix(cli): graceful degradation when bund..." | Re-trigger Greptile

… EC2 guards (Tracer-Cloud#1078)

Live PyInstaller bundle verification revealed the previous fix was
incomplete: the ``tests.synthetic.rds_postgres`` Python package is
bundled transitively (the static analyzer follows the literal
``from tests.synthetic… import …`` statement in
``app/cli/commands/tests.py`` regardless of the spec's
``collect_submodules`` filter), but the per-scenario data files under
``tests/synthetic/rds_postgres/<scenario>/`` are NOT bundled. So
``opensre tests synthetic`` was still crashing in the actual binary —
now with a ``FileNotFoundError`` from inside the scenario loader,
which my ``except ModuleNotFoundError`` did not catch.

Plus an audit of the rest of ``app/`` turned up the same class of bug
in ``app/cli/commands/deploy.py:323,329`` — ``opensre deploy ec2``
and ``opensre deploy ec2 --down`` directly import from
``tests.deployment.ec2.infrastructure_sdk`` without a guard.

Changes:

- ``run_synthetic_suite`` now pre-checks that
  ``REPO_ROOT / tests / synthetic / rds_postgres`` exists *before*
  the import, so the structured error fires regardless of whether the
  Python package or the data dir is the missing piece.
- Both ``ModuleNotFoundError`` catches narrow on ``exc.name`` so an
  unrelated transitive missing dep (e.g. ``psycopg``, ``boto3``) bubbles
  up as the real cause instead of being mis-reported as "not bundled".
- ``deploy_ec2`` wraps both the destroy and deploy imports in the same
  ``try/except`` pattern, raising a structured ``OpenSREError`` with a
  ``pip install -e .`` suggestion. New helper
  ``_ec2_deploy_not_bundled_error`` keeps the message in one place.
- Suggestion text on both errors updated from "clone the repo" to
  ``pip install -e .`` so pipx/Homebrew users get an actionable path.
- Test helper ``_patch_discover_paths`` extracted to dedupe
  ``monkeypatch.setattr("app.cli.tests.discover.X", ...)`` repetition.

Test additions:

- ``test_rds_synthetic_returns_empty_when_dir_present_but_empty``
- ``test_rds_synthetic_skips_underscore_and_pycache_entries``
- ``test_rds_synthetic_enriches_display_name_from_scenario_yml``
- ``test_rds_synthetic_tolerates_malformed_scenario_yml``
- ``test_tests_synthetic_clean_error_when_data_dir_missing`` — covers
  the actual bundled-binary failure mode that live verification found
- ``test_tests_synthetic_unrelated_module_not_found_propagates`` —
  pins the narrow-catch contract: ``psycopg`` missing must not show
  the "not bundled" message
- ``test_deploy_ec2_clean_error_when_not_bundled`` (parametrized for
  both ``deploy ec2`` and ``deploy ec2 --down``)
- ``test_deploy_ec2_unrelated_module_not_found_propagates``
- Existing ``test_tests_synthetic_clean_error_when_module_not_bundled``
  tightened: ``exit_code == 1``, asserts contractual error strings.

Live PyInstaller bundle verification (rebuild + run from ``dist/opensre``):

- ``./dist/opensre tests list`` → exit 0, empty stdout
- ``./dist/opensre --json tests list`` → exit 0, ``[]``
- ``./dist/opensre tests synthetic --scenario 001-replication-lag``
  → exit 1, structured error: "The synthetic RDS PostgreSQL suite is
  not available in this build" + ``pip install -e .`` suggestion. No
  ``Traceback`` / ``FileNotFoundError`` / ``ModuleNotFoundError``
  reaches the user. ← this is the failure mode the previous fix missed.

Full unit suite: 4160 pass / 2 skipped / 1 xfailed (no regressions).
``ruff check``, ``ruff format --check`` clean. ``mypy`` clean on touched
modules (the 3 errors mypy reports are pre-existing missing-stub
warnings for ``pymysql`` in ``app/integrations/`` — unrelated).
@mayankbharati-ops
Copy link
Copy Markdown
Contributor Author

Update — pushed 3136509f after a much deeper round of investigation.

I ran 4 parallel review/audit passes (codebase audit, real PyInstaller build verification, test-coverage gap analysis, independent code review) and the build verifier caught a P0 my first commit missed: opensre tests synthetic was still crashing in the actual bundled binary.

What deeper testing revealed

  1. The previous fix was incomplete. PyInstaller's static analyzer follows the literal from tests.synthetic.rds_postgres.run_suite import main statement in app/cli/commands/tests.py and bundles the Python package transitively, regardless of the spec's collect_submodules(filter=…) exclusion. So in the real bundled binary:

    • The Python module IS bundled
    • The per-scenario data files (tests/synthetic/rds_postgres/<scenario>/) are NOT bundled
    • run_suite.main crashed inside the scenario loader with FileNotFoundError from iterdir()
    • My try/except ModuleNotFoundError did not catch it
  2. An audit found the same class of bug elsewhere. app/cli/commands/deploy.py:323,329opensre deploy ec2 and opensre deploy ec2 --down directly import from tests.deployment.ec2.infrastructure_sdk without a guard, same pattern.

  3. The ModuleNotFoundError catch was too broad. A missing transitive dep (e.g. psycopg if a user has the synthetic package but not the database driver) would have produced a misleading "not bundled" error.

What changed in 3136509f

  • run_synthetic_suite now pre-checks the data dir before the import — catches the actual bundled-binary failure mode the build verifier found.
  • Both ModuleNotFoundError catches narrow on exc.name so unrelated missing deps bubble up as the real cause instead of being mis-reported.
  • deploy_ec2 (both branches) wrapped with the same structured OpenSREError pattern.
  • Suggestion text changed from "clone the repo" → pip install -e . for pipx/Homebrew users.
  • Test helper _patch_discover_paths to dedupe monkeypatch.setattr("app.cli.tests.discover.X", ...) repetition (per @muddlebee's prior fixture-dedup nit on PR refactor(utils): extract shared HTTP-post helper for delivery modules #952).

Live PyInstaller bundle verification (rebuilt dist/opensre)

Command Result
./dist/opensre tests list exit 0, empty stdout ✅
./dist/opensre --json tests list exit 0, []
./dist/opensre tests synthetic --scenario 001-replication-lag exit 1, structured error "The synthetic RDS PostgreSQL suite is not available in this build" + pip install -e . suggestion. No Traceback / FileNotFoundError / ModuleNotFoundError reaches the user. ✅

This is the failure mode the previous commit missed — confirmed fixed in the actual binary.

Test additions (+9 new tests, all passing)

  • Empty rds_postgres/ dir → returns []
  • __pycache__ / _template / stray top-level files → skipped correctly
  • Happy scenario.yml → enriches display name with [failure_mode]
  • Malformed scenario.yml → falls back to bare ID (no crash)
  • Missing data dir → structured error with pip install -e . suggestion (the bundled-binary case)
  • psycopg transitive missing → real cause shown, not "not bundled"
  • deploy ec2 + deploy ec2 --down (parametrized) → structured error
  • boto3 transitive missing on deploy ec2 → real cause shown

Final state

  • pytest tests/cli/test_discover.py tests/cli/test_cli_inventory.py34 / 34 pass (was 25)
  • Full unit suite — 4160 pass / 2 skipped / 1 xfailed (was 4151, +9 new)
  • ruff check, ruff format --check clean
  • mypy app/cli/... clean on touched modules

Follow-up worth tracking

The audit also surfaced that the EC2 deploy guard is defense-in-depth in the current packaging/opensre.spec configuration — PyInstaller bundles tests.deployment.ec2.* because of the literal from tests… import, so the import succeeds and the guard's except branch is unreachable from a fresh binary. The guard is still correct (and unit-tested for the case it does catch — a non-standard install missing those modules), but the "right" long-term fix is to also bundle the data files for the synthetic suite via the spec, so opensre tests synthetic can actually run from a binary instead of always degrading. Happy to file that as a follow-up issue if interest.

@Ghraven
Copy link
Copy Markdown

Ghraven commented Apr 29, 2026

Hey @mayankbharati-ops, this is a really thorough fix! I was looking at the same issue (#1078) and you've gone much deeper than just the rds_postgres guard — catching the ModuleNotFoundError at the run_synthetic_suite level and guarding discover_make_targets and discover_rca_files too is the right approach for a packaged binary.

One small thing I noticed in tests/cli/test_discover.py — the new tests use tmp_path: object as the type annotation in a couple of places (e.g. test_discover_rds_scenarios_missing_dir). Since tmp_path is a pytest.fixture it should be pathlib.Path, not object, both for accuracy and so type checkers can infer its methods. Might be worth a quick update there.

Also just curious — the _ec2_deploy_not_bundled_error helper in deploy.py is a nice touch. Is the plan to use it as the error surfaced in both ec2 and ec2 --down paths? Looks like both call sites would benefit from it.

Really solid work on this one — the test coverage for the degradation paths is especially useful. 🙏

…#1078 review nit)

Per @Ghraven's PR Tracer-Cloud#1090 review — three of the new bundled-binary tests
had ``tmp_path: object`` (one) or no annotation (two), which loses
the actual ``pathlib.Path`` type and stops type-checkers from inferring
``.mkdir()`` / ``/`` operator on the parameter. Added the
``from pathlib import Path`` import and tightened all three
signatures to ``tmp_path: Path``.
@mayankbharati-ops
Copy link
Copy Markdown
Contributor Author

@Ghraven thanks for the close read! Both points addressed:

1. tmp_path typing — fixed in 440111cf. Three tests in tests/cli/test_cli_inventory.py (test_tests_synthetic_clean_error_when_data_dir_missing, test_tests_synthetic_clean_error_when_module_not_bundled, test_tests_synthetic_unrelated_module_not_found_propagates) had tmp_path: object or no annotation. Tightened all three to tmp_path: Path with the from pathlib import Path import. You're right that object was wrong — type-checkers can't infer .mkdir() / / on tmp_path without the proper annotation, and the consistency with tests/cli/test_discover.py matters.

2. _ec2_deploy_not_bundled_error reuse — both call sites in app/cli/commands/deploy.py already route through it:

# line 351 — destroy path
if exc.name is None or not exc.name.startswith("tests.deployment.ec2"):
    raise
raise _ec2_deploy_not_bundled_error() from exc

# line 363 — deploy path  
if exc.name is None or not exc.name.startswith("tests.deployment.ec2"):
    raise
raise _ec2_deploy_not_bundled_error() from exc

Same OpenSREError instance gets surfaced from both opensre deploy ec2 and opensre deploy ec2 --down, with the message in one place. The narrow exc.name.startswith("tests.deployment.ec2") check on each side keeps unrelated transitive missing deps (e.g. boto3) from getting mis-reported as "EC2 not bundled" — pinned by test_deploy_ec2_unrelated_module_not_found_propagates.

Tests still 34/34 green, lint/format clean, full suite 4160 pass. Appreciate the eyeballs! 🙏

Comment thread app/cli/commands/tests.py Outdated
# of which failure mode their bundle produces.
from app.cli.tests.discover import REPO_ROOT

scenarios_dir = REPO_ROOT / "tests" / "synthetic" / "rds_postgres"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path duplicated from discover.py:320 — both construct REPO_ROOT / "tests" / "synthetic" / "rds_postgres" independently. Extract a SYNTHETIC_SCENARIOS_DIR constant in discover.py alongside MAKEFILE_PATH and RCA_DIR, then import it here. One place to update if the layout ever changes.

Comment thread app/cli/commands/tests.py Outdated
)


def _synthetic_suite_not_bundled_error() -> OpenSREError:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper is defined after its two call sites (lines 127 and 137). Move _synthetic_suite_not_bundled_error above run_synthetic_suite so readers see the helper before the code that uses it — consistent with how _ec2_deploy_not_bundled_error is ordered in deploy.py.

Comment thread tests/cli/test_cli_inventory.py Outdated
assert "Traceback" not in output


def test_deploy_ec2_unrelated_module_not_found_propagates() -> None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Narrow-catch contract is only verified for --down (destroy). The non---down deploy path has its own try/except block in deploy_ec2 but no parallel test. Add a variant that patches tests.deployment.ec2.infrastructure_sdk.deploy_remote with an unrelated ModuleNotFoundError(name='boto3') and asserts the 'EC2 deployment is not available' message is absent.

Three nits from @muddlebee's 2026-04-30 review:

1. Path duplication on ``app/cli/commands/tests.py:125`` — the
   ``REPO_ROOT / "tests" / "synthetic" / "rds_postgres"`` literal was
   computed in both ``run_synthetic_suite`` and
   ``_discover_rds_synthetic_scenarios``. Extracted a
   ``SYNTHETIC_SCENARIOS_DIR`` constant in ``app/cli/tests/discover.py``
   alongside ``MAKEFILE_PATH`` and ``RCA_DIR``; both call sites now
   import it. Single source of truth if the layout ever changes.

2. ``_synthetic_suite_not_bundled_error`` was defined *after* its two
   call sites in ``run_synthetic_suite``. Moved above for forward-
   declaration consistency with how ``_ec2_deploy_not_bundled_error``
   sits above ``deploy_ec2`` in ``deploy.py``.

3. ``test_deploy_ec2_unrelated_module_not_found_propagates`` only
   exercised the ``--down`` (destroy_remote) path. The non-``--down``
   (deploy_remote) path has its own try/except in ``deploy_ec2`` but
   no parallel test. Parametrized the test over both ``argv`` and the
   patched module name so a regression in either branch surfaces
   independently.

Test plumbing: ``_patch_discover_paths`` helper extended with a
``synthetic_dir`` kwarg so tests can patch ``SYNTHETIC_SCENARIOS_DIR``
directly (the new constant is bound at module load, so patching
``REPO_ROOT`` after import doesn't propagate). Existing tests updated
to use the new kwarg; semantics unchanged.

Verification:

- ``pytest tests/cli/test_discover.py tests/cli/test_cli_inventory.py``:
  35 / 35 pass (was 34, +1 from the new ``deploy_remote`` parametrize).
- Full unit suite: 4161 pass / 2 skipped / 1 xfailed (was 4160).
- ``ruff check`` and ``ruff format --check`` clean.
@mayankbharati-ops
Copy link
Copy Markdown
Contributor Author

@muddlebee thanks for the close review — pushed 1e4dcaef addressing all three nits:

1. Path duplication (app/cli/commands/tests.py:125) — Extracted SYNTHETIC_SCENARIOS_DIR constant in app/cli/tests/discover.py alongside MAKEFILE_PATH and RCA_DIR. Both call sites (_discover_rds_synthetic_scenarios and run_synthetic_suite) now import the same constant. Single source of truth if the layout ever moves.

2. Helper-after-call-site (app/cli/commands/tests.py:151) — Moved _synthetic_suite_not_bundled_error above run_synthetic_suite. Now matches the order of _ec2_deploy_not_bundled_errordeploy_ec2 in deploy.py.

3. deploy_remote test gap (tests/cli/test_cli_inventory.py:364) — Parametrized test_deploy_ec2_unrelated_module_not_found_propagates over both (argv, patched_module) so the narrow-catch contract is independently exercised on both branches:

@pytest.mark.parametrize(
    ("argv", "patched_module"),
    [
        (["deploy", "ec2", "--down"], "tests.deployment.ec2.infrastructure_sdk.destroy_remote"),
        (["deploy", "ec2"],            "tests.deployment.ec2.infrastructure_sdk.deploy_remote"),
    ],
)

A regression in either deploy_ec2's try/except is now caught independently.

Test plumbing detail: the new SYNTHETIC_SCENARIOS_DIR constant is bound at module load, so monkeypatching REPO_ROOT after import no longer affects it (correctly). I extended the _patch_discover_paths helper with a synthetic_dir kwarg so tests patch the new constant directly, matching how they already patch MAKEFILE_PATH / RCA_DIR.

Verification:

  • pytest tests/cli/test_discover.py tests/cli/test_cli_inventory.py35/35 pass (was 34, +1 from the new deploy_remote parametrize).
  • Full unit suite — 4161 / 2 skipped / 1 xfailed (was 4160).
  • ruff check, ruff format --check clean.

@muddlebee muddlebee merged commit 67dc64d into Tracer-Cloud:main Apr 30, 2026
7 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

🧑‍💻 @mayankbharati-ops has entered the contributor hall of fame. Merged. Done. Shipped. Go touch grass (then come back with another PR). 🌱


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

@muddlebee
Copy link
Copy Markdown
Collaborator

@mayankbharati-ops this is really nice, extensive error handling and path checks. 👍

muddlebee pushed a commit that referenced this pull request Apr 30, 2026
…1099) (#1101)

The post-merge ``test (windows-latest)`` job on commit 67dc64d (PR #1090
landing) is failing with two distinct symptoms in the install.sh test
suites added by PR #1064:

1. ``tests/cli/test_install_sh_path.py`` — 12 tests fail because the
   subprocess call ``subprocess.run(["bash", "-c", script])`` resolves
   ``bash`` to ``wsl.exe`` on the GitHub Actions ``windows-latest``
   runner, and that runner has no installed WSL distribution. Every
   ``_run`` call exits 1 with a literal "Windows Subsystem for Linux
   has no installed distributions" message, so none of the asserted
   rc files (``.zshrc``/``.bashrc``/``config.fish``) ever get written.

2. ``tests/cli/test_install_sh_resolution.py`` — module fails to import
   on Windows: the existing
   ``@pytest.mark.skipif(os.geteuid() == 0, reason=...)`` decorator
   evaluates ``os.geteuid()`` at decorator-application time, but
   ``os.geteuid`` does not exist on Windows. ``AttributeError`` fires
   *before* pytest can read any markers, blocking collection of every
   test in the module.

Both files exclusively test ``install.sh``, which is a POSIX shell
script (zsh/bash/fish RC files) with no Windows analogue, so the
correct fix for the Windows runner is to skip them.

Changes:

- ``test_install_sh_path.py``: module-level
  ``pytestmark = pytest.mark.skipif(sys.platform == "win32", ...)``
  with a comment that names the failure mode.

- ``test_install_sh_resolution.py``: same module-level skipif, *and*
  defensively replace ``os.geteuid() == 0`` with
  ``_RUNNING_AS_ROOT = hasattr(os, "geteuid") and os.geteuid() == 0``
  evaluated once at module load. ``hasattr`` short-circuits the ``and``
  on Windows so ``os.geteuid()`` is never called there — the module
  imports cleanly even before pytest reads ``pytestmark``.

No source-checkout behaviour change: skipif is a no-op on macOS/Linux,
and the existing root-only test still skips correctly on those
platforms (verified by running ``pytest tests/cli/test_install_sh_path.py
tests/cli/test_install_sh_resolution.py`` locally — 15 / 15 pass).
Full unit suite: 4164 pass / 2 skipped / 1 xfailed.

Closes #1099.
gitsofaryan pushed a commit to gitsofaryan/opensre that referenced this pull request May 3, 2026
…inary (Tracer-Cloud#1078) (Tracer-Cloud#1090)

* fix(cli): graceful degradation when bundled binary lacks repo data files (Tracer-Cloud#1078)

`opensre tests` and `opensre tests list` crashed with `FileNotFoundError`
when run from the PyInstaller-bundled binary, because
`packaging/opensre.spec` only collects `app/` data files and excludes
the `tests/` tree, the Makefile, and `tests/e2e/rca/`. The discover
helpers in `app/cli/tests/discover.py` then ran `iterdir()` /
`read_text()` on paths that don't exist at runtime.

The reported crash was in `_discover_rds_synthetic_scenarios`
(introduced by 9a1b5c3 "add synthetic RDS postgres RCA suite (Tracer-Cloud#194)"),
but the same class of bug is present in `discover_make_targets` —
`MAKEFILE_PATH.read_text()` raises identically in the bundled binary.
A third spot, `run_synthetic_suite` in `app/cli/commands/tests.py`,
imports `from tests.synthetic.rds_postgres.run_suite` and would surface
a raw `ModuleNotFoundError` traceback instead of an actionable message.

Changes:
- `_discover_rds_synthetic_scenarios`: early `is_dir()` guard, returns []
- `discover_make_targets`: early `is_file()` guard on the Makefile, returns []
- `discover_rca_files`: explicit `is_dir()` guard added for documentation
  + future-proofing (current `Path.glob` already returns empty on a
  missing parent on CPython, but the contract was implicit)
- `run_synthetic_suite`: catches `ModuleNotFoundError` from the
  `tests.synthetic` import and raises `OpenSREError` with a "run from a
  source checkout" suggestion instead of a raw traceback

Tests:
- `tests/cli/test_discover.py`: 5 new tests covering each missing-source
  case + a fully-degraded `load_test_catalog` regression + a
  source-checkout sanity test that proves the existence guards don't
  break the happy path
- `tests/cli/test_cli_inventory.py`: 1 new test simulating a bundled
  binary by patching `__import__` to deny `tests.synthetic.*` — confirms
  `opensre tests synthetic` exits non-zero with a clean message and no
  `Traceback` / `ModuleNotFoundError` reaches the user

Live verification: in-process Click invocation with REPO_ROOT,
MAKEFILE_PATH, and RCA_DIR all redirected to an empty tmp dir
(matches the PyInstaller bundle layout). 17/17 e2e checks pass:
`opensre tests list` exits 0 with empty output, `--json tests list`
produces a valid empty array, `tests synthetic` surfaces the structured
error. Full unit suite: 4151 pass / 2 skipped / 1 xfailed.

Closes Tracer-Cloud#1078

* fix(cli): pre-check synthetic data dir + narrow ModuleNotFoundError + EC2 guards (Tracer-Cloud#1078)

Live PyInstaller bundle verification revealed the previous fix was
incomplete: the ``tests.synthetic.rds_postgres`` Python package is
bundled transitively (the static analyzer follows the literal
``from tests.synthetic… import …`` statement in
``app/cli/commands/tests.py`` regardless of the spec's
``collect_submodules`` filter), but the per-scenario data files under
``tests/synthetic/rds_postgres/<scenario>/`` are NOT bundled. So
``opensre tests synthetic`` was still crashing in the actual binary —
now with a ``FileNotFoundError`` from inside the scenario loader,
which my ``except ModuleNotFoundError`` did not catch.

Plus an audit of the rest of ``app/`` turned up the same class of bug
in ``app/cli/commands/deploy.py:323,329`` — ``opensre deploy ec2``
and ``opensre deploy ec2 --down`` directly import from
``tests.deployment.ec2.infrastructure_sdk`` without a guard.

Changes:

- ``run_synthetic_suite`` now pre-checks that
  ``REPO_ROOT / tests / synthetic / rds_postgres`` exists *before*
  the import, so the structured error fires regardless of whether the
  Python package or the data dir is the missing piece.
- Both ``ModuleNotFoundError`` catches narrow on ``exc.name`` so an
  unrelated transitive missing dep (e.g. ``psycopg``, ``boto3``) bubbles
  up as the real cause instead of being mis-reported as "not bundled".
- ``deploy_ec2`` wraps both the destroy and deploy imports in the same
  ``try/except`` pattern, raising a structured ``OpenSREError`` with a
  ``pip install -e .`` suggestion. New helper
  ``_ec2_deploy_not_bundled_error`` keeps the message in one place.
- Suggestion text on both errors updated from "clone the repo" to
  ``pip install -e .`` so pipx/Homebrew users get an actionable path.
- Test helper ``_patch_discover_paths`` extracted to dedupe
  ``monkeypatch.setattr("app.cli.tests.discover.X", ...)`` repetition.

Test additions:

- ``test_rds_synthetic_returns_empty_when_dir_present_but_empty``
- ``test_rds_synthetic_skips_underscore_and_pycache_entries``
- ``test_rds_synthetic_enriches_display_name_from_scenario_yml``
- ``test_rds_synthetic_tolerates_malformed_scenario_yml``
- ``test_tests_synthetic_clean_error_when_data_dir_missing`` — covers
  the actual bundled-binary failure mode that live verification found
- ``test_tests_synthetic_unrelated_module_not_found_propagates`` —
  pins the narrow-catch contract: ``psycopg`` missing must not show
  the "not bundled" message
- ``test_deploy_ec2_clean_error_when_not_bundled`` (parametrized for
  both ``deploy ec2`` and ``deploy ec2 --down``)
- ``test_deploy_ec2_unrelated_module_not_found_propagates``
- Existing ``test_tests_synthetic_clean_error_when_module_not_bundled``
  tightened: ``exit_code == 1``, asserts contractual error strings.

Live PyInstaller bundle verification (rebuild + run from ``dist/opensre``):

- ``./dist/opensre tests list`` → exit 0, empty stdout
- ``./dist/opensre --json tests list`` → exit 0, ``[]``
- ``./dist/opensre tests synthetic --scenario 001-replication-lag``
  → exit 1, structured error: "The synthetic RDS PostgreSQL suite is
  not available in this build" + ``pip install -e .`` suggestion. No
  ``Traceback`` / ``FileNotFoundError`` / ``ModuleNotFoundError``
  reaches the user. ← this is the failure mode the previous fix missed.

Full unit suite: 4160 pass / 2 skipped / 1 xfailed (no regressions).
``ruff check``, ``ruff format --check`` clean. ``mypy`` clean on touched
modules (the 3 errors mypy reports are pre-existing missing-stub
warnings for ``pymysql`` in ``app/integrations/`` — unrelated).

* test(cli): tighten ``tmp_path`` annotations to ``Path`` (Tracer-Cloud#1078 review nit)

Per @Ghraven's PR Tracer-Cloud#1090 review — three of the new bundled-binary tests
had ``tmp_path: object`` (one) or no annotation (two), which loses
the actual ``pathlib.Path`` type and stops type-checkers from inferring
``.mkdir()`` / ``/`` operator on the parameter. Added the
``from pathlib import Path`` import and tightened all three
signatures to ``tmp_path: Path``.

* refactor(cli): address muddlebee's PR Tracer-Cloud#1090 review nits

Three nits from @muddlebee's 2026-04-30 review:

1. Path duplication on ``app/cli/commands/tests.py:125`` — the
   ``REPO_ROOT / "tests" / "synthetic" / "rds_postgres"`` literal was
   computed in both ``run_synthetic_suite`` and
   ``_discover_rds_synthetic_scenarios``. Extracted a
   ``SYNTHETIC_SCENARIOS_DIR`` constant in ``app/cli/tests/discover.py``
   alongside ``MAKEFILE_PATH`` and ``RCA_DIR``; both call sites now
   import it. Single source of truth if the layout ever changes.

2. ``_synthetic_suite_not_bundled_error`` was defined *after* its two
   call sites in ``run_synthetic_suite``. Moved above for forward-
   declaration consistency with how ``_ec2_deploy_not_bundled_error``
   sits above ``deploy_ec2`` in ``deploy.py``.

3. ``test_deploy_ec2_unrelated_module_not_found_propagates`` only
   exercised the ``--down`` (destroy_remote) path. The non-``--down``
   (deploy_remote) path has its own try/except in ``deploy_ec2`` but
   no parallel test. Parametrized the test over both ``argv`` and the
   patched module name so a regression in either branch surfaces
   independently.

Test plumbing: ``_patch_discover_paths`` helper extended with a
``synthetic_dir`` kwarg so tests can patch ``SYNTHETIC_SCENARIOS_DIR``
directly (the new constant is bound at module load, so patching
``REPO_ROOT`` after import doesn't propagate). Existing tests updated
to use the new kwarg; semantics unchanged.

Verification:

- ``pytest tests/cli/test_discover.py tests/cli/test_cli_inventory.py``:
  35 / 35 pass (was 34, +1 from the new ``deploy_remote`` parametrize).
- Full unit suite: 4161 pass / 2 skipped / 1 xfailed (was 4160).
- ``ruff check`` and ``ruff format --check`` clean.
gitsofaryan pushed a commit to gitsofaryan/opensre that referenced this pull request May 3, 2026
…racer-Cloud#1099) (Tracer-Cloud#1101)

The post-merge ``test (windows-latest)`` job on commit 67dc64d (PR Tracer-Cloud#1090
landing) is failing with two distinct symptoms in the install.sh test
suites added by PR Tracer-Cloud#1064:

1. ``tests/cli/test_install_sh_path.py`` — 12 tests fail because the
   subprocess call ``subprocess.run(["bash", "-c", script])`` resolves
   ``bash`` to ``wsl.exe`` on the GitHub Actions ``windows-latest``
   runner, and that runner has no installed WSL distribution. Every
   ``_run`` call exits 1 with a literal "Windows Subsystem for Linux
   has no installed distributions" message, so none of the asserted
   rc files (``.zshrc``/``.bashrc``/``config.fish``) ever get written.

2. ``tests/cli/test_install_sh_resolution.py`` — module fails to import
   on Windows: the existing
   ``@pytest.mark.skipif(os.geteuid() == 0, reason=...)`` decorator
   evaluates ``os.geteuid()`` at decorator-application time, but
   ``os.geteuid`` does not exist on Windows. ``AttributeError`` fires
   *before* pytest can read any markers, blocking collection of every
   test in the module.

Both files exclusively test ``install.sh``, which is a POSIX shell
script (zsh/bash/fish RC files) with no Windows analogue, so the
correct fix for the Windows runner is to skip them.

Changes:

- ``test_install_sh_path.py``: module-level
  ``pytestmark = pytest.mark.skipif(sys.platform == "win32", ...)``
  with a comment that names the failure mode.

- ``test_install_sh_resolution.py``: same module-level skipif, *and*
  defensively replace ``os.geteuid() == 0`` with
  ``_RUNNING_AS_ROOT = hasattr(os, "geteuid") and os.geteuid() == 0``
  evaluated once at module load. ``hasattr`` short-circuits the ``and``
  on Windows so ``os.geteuid()`` is never called there — the module
  imports cleanly even before pytest reads ``pytestmark``.

No source-checkout behaviour change: skipif is a no-op on macOS/Linux,
and the existing root-only test still skips correctly on those
platforms (verified by running ``pytest tests/cli/test_install_sh_path.py
tests/cli/test_install_sh_resolution.py`` locally — 15 / 15 pass).
Full unit suite: 4164 pass / 2 skipped / 1 xfailed.

Closes Tracer-Cloud#1099.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FileNotFoundError: missing 'tests/synthetic/rds_postgres' directory when running 'opensre tests'

3 participants