
Conversation

@d-chambers d-chambers commented Aug 25, 2025

Description

In one of our Sintela DAS datasets there are a small number of files whose size is not an integer multiple of the packet size. These files would cause the indexing of the entire directory to crash. This PR fixes that issue by checking for the condition and raising the correct error when it is found, then expanding the types of exceptions caught when scanning.

It also fixes a minor annoyance: Ctrl+C did not correctly kill the progress bar while indexing.
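The failure mode can be sketched with a minimal size check. This is an illustrative sketch, not the PR's code: the function name, signature, and the local `InvalidFiberFileError` stand-in are assumptions; the actual fix lives in the Sintela binary utils (`_get_number_of_packets`).

```python
import os


class InvalidFiberFileError(ValueError):
    """Stand-in for dascore.exceptions.InvalidFiberFileError."""


def check_packet_alignment(path, header_size, channels, samples, bytes_per_sample=4):
    """Raise InvalidFiberFileError if file size is not a whole number of packets."""
    file_size = os.path.getsize(path)
    packet_size = header_size + channels * samples * bytes_per_sample
    num_packets, remainder = divmod(file_size, packet_size)
    if remainder:
        msg = f"{path}: size {file_size} is not divisible by packet size {packet_size}."
        raise InvalidFiberFileError(msg)
    return num_packets
```

Raising a specific, catchable error here is what lets the scan loop warn and skip the bad file instead of crashing the whole directory index.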

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • Accept a custom rich Progress instance/class for progress utilities and scanning.
  • Bug Fixes

    • Scanning is more robust: corrupt files and missing optional dependencies are skipped with clearer errors.
    • Sintela binary files with extra/misaligned bytes now raise a specific error.
    • KeyboardInterrupt during scanning stops progress cleanly.
    • Reduced repeated-work via internal caching for faster repeated scans.
  • Documentation

    • Updated progress-related docstrings.
  • Tests

    • Added tests for keyboard interrupt handling and Sintela binary validation.

coderabbitai bot commented Aug 25, 2025

Walkthrough

Adds optional rich Progress support to progress utilities and scan; scan now handles KeyboardInterrupt, unknown formats, corrupt files, and missing optional dependencies with fiber_io_hint caching; Sintela binary utils now raise InvalidFiberFileError for mis-sized files; tests added for interrupt and corrupted Sintela files.

Changes

Cohort / File(s) Summary
Compat import
dascore/compat.py
Added from rich.progress import Progress to exports.
Scan core changes
dascore/io/core.py
scan now accepts PROGRESS_LEVELS | Progress; imports Progress and InvalidFiberFileError. Wrapped main loop in KeyboardInterrupt handling (stops progress if supported), uses IOResourceManager per source, skips UnknownFiberFormatError, caches fiber_io by input_type, treats directory vs file inputs separately, and catches OSError/InvalidFiberFileError/ValueError/TypeError to warn-and-skip and MissingOptionalDependencyError to accumulate missing deps. Docstring updated.
Sintela binary robustness
dascore/io/sintela_binary/utils.py
Imported InvalidFiberFileError. _get_number_of_packets no longer takes version, computes packet count from header and size, raises InvalidFiberFileError if remainder != 0. _read_remaining_header asserts version == "3" and calls updated packet counter.
Progress utilities
dascore/utils/progress.py
Type hints widened to PROGRESS_LEVELS | Progress. get_progress_instance returns passed Progress instance unchanged; otherwise constructs a Progress. track accepts Progress or level string and guards isinstance(progress, str) for mode checks. Docstrings updated.
Scan interrupt test
tests/test_io/test_io_core.py
Added imports (rich.progress as prog, dascore) and TestScan.test_keyboard_interrupt which defines a Progress subclass that raises KeyboardInterrupt from track and asserts dc.scan(..., progress=Progress()) propagates the interrupt.
Sintela corruption test
tests/test_io/test_sintela_binary/test_sintela_binary.py
New test class TestScanSintelaBinary with a class-scoped fixture that appends extra bytes to a Sintela V3 asset and test_extra_bytes_raises asserting InvalidFiberFileError when scanning the corrupted file.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant SC as dc.scan
  participant PR as Progress (get_progress_instance / track)
  participant RM as IOResourceManager
  participant FI as fiber_io
  participant FS as Filesystem

  U->>SC: scan(patch_sources, progress)
  SC->>PR: get_progress_instance(progress)
  PR-->>SC: Progress instance

  loop each patch_source
    SC->>RM: enter IOResourceManager(patch_source)
    RM-->>SC: (fiber_io, resource) or UnknownFiberFormatError
    alt Unknown format
      SC-->>SC: log/skip source
    else Got fiber_io
      SC->>SC: cache fiber_io by input_type
      alt resource is directory
        SC->>FI: fiber_io.scan(directory)
        FI-->>SC: attrs or MissingOptionalDependencyError
      else resource is file
        SC->>FI: fiber_io.scan(file)
        alt Error (OSError / InvalidFiberFileError / ValueError / TypeError)
          FI-->>SC: error -> warn and continue
        else MissingOptionalDependencyError
          FI-->>SC: record missing dep and continue
        else Success
          FI-->>SC: attrs
        end
      end
      SC-->>SC: append attrs to output
    end
  end

  opt KeyboardInterrupt from Progress
    PR-->>SC: raises KeyboardInterrupt
    SC->>PR: stop() if supported
    SC-->>U: propagate KeyboardInterrupt
  end

  SC-->>U: return list[PatchAttrs]
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my whiskers at progress that flows,
Halt on a hiccup where the interrupt goes.
If Sintela spills an extra byte, I cry,
“Invalid!” — then I cache and hop on by.
Small paws, steady scans, we ship with grace. 🐇✨



📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7957c22 and ed3de10.

📒 Files selected for processing (1)
  • dascore/io/core.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dascore/io/core.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)



codecov bot commented Aug 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.87%. Comparing base (384a547) to head (ed3de10).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #520   +/-   ##
=======================================
  Coverage   99.87%   99.87%           
=======================================
  Files         121      121           
  Lines        9874     9887   +13     
=======================================
+ Hits         9862     9875   +13     
  Misses         12       12           
Flag        Coverage Δ
unittests   99.87% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (12)
tests/test_io/test_io_core.py (1)

14-17: Minor: Redundant imports (dascore and dc)

Both dascore and dascore as dc are imported; only one is necessary. You can use dc for both attribute patching and functional calls.

Apply this diff for brevity:

-import rich.progress as prog
-
-import dascore
-import dascore as dc
+import rich.progress as prog
+import dascore as dc

And adjust the monkeypatch line below:

-        monkeypatch.setattr(dascore, "_debug", False)
+        monkeypatch.setattr(dc, "_debug", False)
tests/test_io/test_sintela_binary/test_sintela_binary.py (1)

18-35: Solid negative test; consider asserting on message and adding a positive control

This proves mis-sized files raise InvalidFiberFileError. Two optional enhancements:

  • Assert the error message contains a hint (e.g., “not divisible by packet size”) to lock in UX.
  • Add a positive control test using the unmodified .raw file to assert scan succeeds, guarding against false positives in fixture setup.

Example tweak inside the test:

-        with pytest.raises(InvalidFiberFileError):
+        with pytest.raises(InvalidFiberFileError, match="divisible|packet size"):
             fiber_io.scan(extra_bytes_file)
dascore/utils/progress.py (2)

57-61: Docstring nit: clarify that a Progress instance or subclass is accepted

Minor clarity: explicitly say “instance or subclass” to reflect the behavior after the above change.

Suggested wording:

  • “…can also accept a rich.progress.Progress instance or subclass.”

Also, “emmit” -> “emit” below (typo).

To update the typo outside the changed hunk, adjust this line:

# The minimum length to emit a progress bar.

36-37: Optional: consider constructing columns via compat to keep all rich touchpoints centralized

You import Progress from compat but still construct columns via rich.progress as prog. This is fine, but if you’re aiming to isolate ‘rich’ behind compat, consider mirroring columns or providing a small factory there. Not required for this PR.

dascore/io/core.py (8)

23-23: Decouple runtime from rich by avoiding a hard Progress import

With from future import annotations, you can reference Progress as a string literal and avoid importing it at runtime. This keeps rich fully optional and prevents accidental ImportError during import if rich isn’t installed. Also aligns with type-only usage here.

Apply:

-from dascore.compat import Progress
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from dascore.compat import Progress

And update the annotation to a forward ref:

-def scan(..., progress: PROGRESS_LEVELS | Progress = "standard", ...) -> list[dc.PatchAttrs]:
+def scan(..., progress: PROGRESS_LEVELS | "Progress" = "standard", ...) -> list[dc.PatchAttrs]:

Also applies to: 790-790


363-365: Catching Exception in _get_format risks masking bugs; add debug logging

Swallowing every exception improves resilience to third-party FiberIOs, but completely silences programmer errors. Suggest logging at debug level to aid diagnosis without spamming users.

Minimal change:

-                except Exception:
-                    continue
+                except Exception as ex:
+                    # Keep robust, but surface details in debug logs.
+                    import logging as _logging
+                    _logging.getLogger(__name__).debug(
+                        "get_format suppressed for %s: %r",
+                        getattr(fiber_io, "name", "?"),
+                        ex,
+                    )
+                    continue

811-813: Docstring typo: “acceted” → “accepts”; clarify phrasing

Small doc polish for correctness.

-        "basic" is best for low latency scenarios. Can also acceted a subclass
-        of rich.progress.Progress.
+        "basic" is best for low-latency scenarios. Can also accept a subclass
+        of rich.progress.Progress.

882-887: Include exception detail in warning for failed scans

Currently the warning hides the cause, making triage harder. Appending the exception message keeps logs actionable without being noisy.

-                    except (OSError, InvalidFiberFileError, ValueError, TypeError):
-                        warnings.warn(f"Failed to scan {resource}", UserWarning)
+                    except (OSError, InvalidFiberFileError, ValueError, TypeError) as ex:
+                        warnings.warn(f"Failed to scan {resource}: {ex}", UserWarning)
                         continue

888-890: Deriving package name from ex.msg is brittle

ImportError/MissingOptionalDependencyError usually carries .name; .msg parsing can misidentify the package. Prefer name, then fall back to message parsing.

-                    except MissingOptionalDependencyError as ex:
-                        missing_optional_deps[ex.msg.split(" ")[0]] += 1
+                    except MissingOptionalDependencyError as ex:
+                        pkg = getattr(ex, "name", None)
+                        if not pkg:
+                            # Fallback: first token from message or str(ex)
+                            txt = getattr(ex, "msg", "") or str(ex)
+                            pkg = (txt.split()[0] if txt else "unknown")
+                        missing_optional_deps[pkg] += 1
                         continue

891-892: Avoid re-parsing PatchAttrs objects

If source already yields PatchAttrs, re-creating via from_dict is unnecessary.

-                for attr in source:
-                    out.append(dc.PatchAttrs.from_dict(attr))
+                for attr in source:
+                    out.append(attr if isinstance(attr, dc.PatchAttrs)
+                               else dc.PatchAttrs.from_dict(attr))

893-895: Comment typo

Minor typo and casing.

-    # Ensure ctl + c exists scan.
+    # Ensure Ctrl+C exits scan.

654-662: Align scan_to_df progress type with scan’s new annotation

scan_to_df passes progress through to scan but its type hint still excludes a Progress instance. Align for consistency.

-def scan_to_df(..., progress: PROGRESS_LEVELS = "standard", ...) -> pd.DataFrame:
+def scan_to_df(..., progress: PROGRESS_LEVELS | "Progress" = "standard", ...) -> pd.DataFrame:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 384a547 and 7957c22.

📒 Files selected for processing (6)
  • dascore/compat.py (1 hunks)
  • dascore/io/core.py (6 hunks)
  • dascore/io/sintela_binary/utils.py (3 hunks)
  • dascore/utils/progress.py (4 hunks)
  • tests/test_io/test_io_core.py (2 hunks)
  • tests/test_io/test_sintela_binary/test_sintela_binary.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_io/test_sintela_binary/test_sintela_binary.py (3)
dascore/exceptions.py (1)
  • InvalidFiberFileError (14-15)
dascore/io/sintela_binary/core.py (1)
  • SintelaBinaryV3 (30-80)
dascore/utils/downloader.py (1)
  • fetch (40-58)
dascore/compat.py (1)
tests/test_io/test_io_core.py (1)
  • Progress (382-387)
tests/test_io/test_io_core.py (2)
dascore/utils/progress.py (1)
  • track (39-84)
dascore/io/core.py (2)
  • scan (483-500)
  • scan (784-898)
dascore/io/sintela_binary/utils.py (2)
dascore/exceptions.py (1)
  • InvalidFiberFileError (14-15)
dascore/utils/misc.py (1)
  • get_buffer_size (751-768)
dascore/utils/progress.py (2)
tests/test_io/test_io_core.py (2)
  • Progress (382-387)
  • track (385-387)
dascore/io/indexer.py (2)
  • update (90-95)
  • update (309-346)
dascore/io/core.py (3)
tests/test_io/test_io_core.py (1)
  • Progress (382-387)
dascore/exceptions.py (3)
  • InvalidFiberFileError (14-15)
  • UnknownFiberFormatError (18-19)
  • MissingOptionalDependencyError (98-99)
dascore/utils/io.py (2)
  • IOResourceManager (89-137)
  • source (98-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.10)
🔇 Additional comments (5)
dascore/compat.py (1)

15-15: rich is already a declared dependency—no fallback needed
The search shows "rich", at line 53 in your project’s dependency file, confirming that rich is an explicit runtime requirement. The guarded import shim is therefore unnecessary.

tests/test_io/test_io_core.py (1)

379-395: Test coverage looks good; it exercises the exact regression around Ctrl+C and Progress integration

The test cleanly validates propagation of KeyboardInterrupt through the progress path and ensures the bar is active by flipping _debug. Nicely scoped.

dascore/io/core.py (3)

36-36: Good: add InvalidFiberFileError import

Pulling in InvalidFiberFileError enables robust handling of mis-sized/corrupt files during scan. Matches the PR intent.


848-896: Graceful Ctrl+C handling looks good

Wrapping the scan loop and stopping progress ensures a clean exit and matches the new test behavior. Re-raising preserves expected KeyboardInterrupt semantics.
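The control flow being approved here can be sketched as follows. The helper name and the `scan_one` callable are illustrative stand-ins for the per-source scanning logic, not dascore's actual API.

```python
def scan_with_interrupt_handling(sources, progress, scan_one):
    """Run scan_one over sources; on Ctrl+C, stop the progress bar and re-raise."""
    out = []
    try:
        for source in sources:
            out.extend(scan_one(source))
    except KeyboardInterrupt:
        # Stop the bar so the terminal is left in a clean state, then
        # preserve normal Ctrl+C semantics by re-raising.
        stop = getattr(progress, "stop", None)
        if callable(stop):
            stop()
        raise
    return out
```

Guarding `stop` with `getattr` keeps the handler safe if a progress object without a stop method is ever passed through.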


871-873: Skip Protocol Verified – No Changes Needed

I reviewed the _iter_filesystem and _iterate_scan_inputs implementations:

  • _iter_filesystem explicitly handles a "skip" signal right after yielding a directory, yielding None then returning to stop recursion.
  • _iterate_scan_inputs uses yield from _iter_filesystem(...), which per PEP 380 correctly forwards .send("skip") calls into the subgenerator.
  • The code in dascore/io/core.py sends "skip" on the outer generator (not the track wrapper), ensuring the signal reaches _iter_filesystem.
  • There’s a dedicated test (test_skip_signal_directory) confirming that anything after a "skip"-sent directory is indeed skipped.

Since the end-to-end protocol is already implemented and covered by tests, no action is required here.
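The PEP 380 behavior this relies on — .send() passing through yield from into the subgenerator — is easy to demonstrate in isolation. The generator names here are illustrative, modeled loosely on the skip protocol described above.

```python
def inner():
    """Subgenerator: yields a directory, then honors a 'skip' sent back in."""
    received = yield "dir"
    if received == "skip":
        yield None  # acknowledge the skip
        return      # stop descending into this directory
    yield "contents"


def outer():
    # Per PEP 380, .send() on the outer generator reaches inner() directly.
    yield from inner()
    yield "next_dir"


gen = outer()
assert next(gen) == "dir"
assert gen.send("skip") is None  # inner stops; directory skipped
assert next(gen) == "next_dir"   # iteration continues past it
```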

Comment on lines +109 to 121
def _get_number_of_packets(fid, header, size):
"""Get the number of packets in the file."""
# get filesize without depending on input being a file.
file_size = get_buffer_size(fid)
samples = header["num_samples"]
channels = header["num_channels"]
assert version == "3", "Only version 3 of Sintela binary currently supported."
packet_size = size + (channels * samples * 4)
num_packets, remainder = divmod(file_size, packet_size)
assert not remainder, "Sintela binary file size not divisible by packet size."
if remainder:
msg = "Sintela binary file size not divisible by packet size."
raise InvalidFiberFileError(msg)
return num_packets


🛠️ Refactor suggestion

Avoid magic constant for sample size; derive from dtype and rename param for clarity

packet_size = size + (channels * samples * 4) bakes in 4 bytes/sample. This couples the packet math to V3’s float32 and will silently break if dtype ever changes. Also, the size parameter is actually the header size — name it accordingly.

Apply this diff to use the header’s dtype and clarify naming:

-def _get_number_of_packets(fid, header, size):
+def _get_number_of_packets(fid, header, header_size):
     """Get the number of packets in the file."""
     # get filesize without depending on input being a file.
     file_size = get_buffer_size(fid)
     samples = header["num_samples"]
     channels = header["num_channels"]
-    packet_size = size + (channels * samples * 4)
+    # Derive bytes/sample from header["dtype"] to avoid magic constants.
+    dtype_size = np.dtype(header["dtype"]).itemsize
+    packet_size = header_size + (channels * samples * dtype_size)
     num_packets, remainder = divmod(file_size, packet_size)
     if remainder:
         msg = "Sintela binary file size not divisible by packet size."
         raise InvalidFiberFileError(msg)
     return num_packets

Note: this assumes header["dtype"] is populated before calling this function; see next comment for ordering fix.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _get_number_of_packets(fid, header, size):
"""Get the number of packets in the file."""
# get filesize without depending on input being a file.
file_size = get_buffer_size(fid)
samples = header["num_samples"]
channels = header["num_channels"]
assert version == "3", "Only version 3 of Sintela binary currently supported."
packet_size = size + (channels * samples * 4)
num_packets, remainder = divmod(file_size, packet_size)
assert not remainder, "Sintela binary file size not divisible by packet size."
if remainder:
msg = "Sintela binary file size not divisible by packet size."
raise InvalidFiberFileError(msg)
return num_packets
def _get_number_of_packets(fid, header, header_size):
"""Get the number of packets in the file."""
# get filesize without depending on input being a file.
file_size = get_buffer_size(fid)
samples = header["num_samples"]
channels = header["num_channels"]
# Derive bytes/sample from header["dtype"] to avoid magic constants.
dtype_size = np.dtype(header["dtype"]).itemsize
packet_size = header_size + (channels * samples * dtype_size)
num_packets, remainder = divmod(file_size, packet_size)
if remainder:
msg = "Sintela binary file size not divisible by packet size."
raise InvalidFiberFileError(msg)
return num_packets
🤖 Prompt for AI Agents
In dascore/io/sintela_binary/utils.py around lines 109 to 121, the function uses
a hardcoded 4 bytes/sample and a misleading parameter name `size`; replace the
magic constant by deriving the per-sample byte size from header["dtype"] (e.g.
numpy.dtype(header["dtype"]).itemsize) and rename the `size` parameter to
`header_size` (and update call sites) so packet_size is computed as header_size
+ (channels * samples * itemsize); also ensure callers populate header["dtype"]
before this function is invoked.

Comment on lines +132 to 134
assert version == "3", "only 3 support for now,"
header["num_packets"] = _get_number_of_packets(fid, header, header_size)
header["dtype"] = "<f4"

💡 Verification agent

🧩 Analysis chain

Replace asserts with explicit exceptions and set dtype before computing packet count

  • assert version == "3" is a runtime assertion that can be stripped with -O and yields a generic AssertionError. Prefer explicit, stable exceptions with actionable messages.
  • Call _get_number_of_packets after setting header["dtype"] so packet sizing uses the derived itemsize.

Apply this diff:

-    assert version == "3", "only 3 support for now,"
-    header["num_packets"] = _get_number_of_packets(fid, header, header_size)
-    header["dtype"] = "<f4"
+    if version != "3":
+        raise InvalidFiberFileError(
+            f"Unsupported Sintela binary version: {version}. Only '3' is supported."
+        )
+    header["dtype"] = "<f4"
+    header["num_packets"] = _get_number_of_packets(fid, header, header_size)

Additionally, consider replacing the earlier size assertion with a proper exception for consistency and better user feedback:

# Replace:
assert header_size == expected_size
# With:
if header_size != expected_size:
    raise InvalidFiberFileError(
        f"Header size {header_size} does not match expected {expected_size} for version {version}."
    )

Please run a quick search to confirm _get_number_of_packets isn’t called elsewhere with the old signature:


🏁 Script executed:

#!/bin/bash
rg -nP --glob '!**/site-packages/**' '\b_get_number_of_packets\s*\(' -C2

Length of output: 862


🏁 Script executed:

#!/bin/bash
# Locate and display the start of the Sintela binary utils file for inspection
file="dascore/io/sintela_binary/utils.py"
if [ -f "$file" ]; then
  echo "Found $file; printing lines 1–200:"
  sed -n '1,200p' "$file"
else
  echo "File not found at expected path. Searching for any utils.py in sintela_binary:"
  find . -type f | grep 'sintela_binary/utils.py'
fi

Length of output: 6706


Enforce explicit version checking and reorder dtype assignment

The only call to _get_number_of_packets is in _read_remaining_header, so updating this block won’t break elsewhere. Replace both assert statements with InvalidFiberFileError and set header["dtype"] before computing packet count to ensure correct itemsize usage.

• File: dascore/io/sintela_binary/utils.py
• Function: _read_remaining_header
• Lines: ~125–135

-    assert header_size == expected_size
+    if header_size != expected_size:
+        raise InvalidFiberFileError(
+            f"Header size {header_size} does not match expected {expected_size} for version {version}."
+        )
 
-    assert version == "3", "only 3 support for now,"
-    header["num_packets"] = _get_number_of_packets(fid, header, header_size)
-    header["dtype"] = "<f4"
+    if version != "3":
+        raise InvalidFiberFileError(
+            f"Unsupported Sintela binary version: {version}. Only '3' is supported."
+        )
+    header["dtype"] = "<f4"
+    header["num_packets"] = _get_number_of_packets(fid, header, header_size)
📝 Committable suggestion


Suggested change
assert version == "3", "only 3 support for now,"
header["num_packets"] = _get_number_of_packets(fid, header, header_size)
header["dtype"] = "<f4"
# Previously:
# expected_size = ...
# assert header_size == expected_size
# assert version == "3", "only 3 support for now,"
# header["num_packets"] = _get_number_of_packets(fid, header, header_size)
# header["dtype"] = "<f4"
if header_size != expected_size:
raise InvalidFiberFileError(
f"Header size {header_size} does not match expected {expected_size} for version {version}."
)
if version != "3":
raise InvalidFiberFileError(
f"Unsupported Sintela binary version: {version}. Only '3' is supported."
)
header["dtype"] = "<f4"
header["num_packets"] = _get_number_of_packets(fid, header, header_size)
🤖 Prompt for AI Agents
In dascore/io/sintela_binary/utils.py around lines 132 to 134, replace the two
assert statements with explicit InvalidFiberFileError raises (check version
explicitly and raise InvalidFiberFileError with a clear message if unsupported)
and assign header["dtype"] = "<f4" before calling _get_number_of_packets so
packet-size calculations use the correct itemsize; ensure the error is raised
(not asserted) and the dtype is set prior to computing header["num_packets"].

Comment on lines +15 to +21
def get_progress_instance(progress: PROGRESS_LEVELS | Progress = "standard"):
"""
Get the Rich progress bar instance based on complexity level.
"""
# If a progress class is passed in, just use it.
if isinstance(progress, Progress):
return progress

🛠️ Refactor suggestion

Support both Progress instances and subclasses (classes) as input

Current logic only detects an instance of Progress. If a user passes a subclass (class object) intending you to construct it, we’ll ignore it and build the base Progress instead.

Apply this diff to handle both:

 def get_progress_instance(progress: PROGRESS_LEVELS | Progress = "standard"):
@@
-    # If a progress class is passed in, just use it.
-    if isinstance(progress, Progress):
-        return progress
+    # Accept either an instance or a subclass (class) of Progress.
+    if isinstance(progress, Progress):
+        return progress
+    if isinstance(progress, type) and issubclass(progress, Progress):
+        ProgressClass = progress
+    else:
+        ProgressClass = Progress
@@
-    if progress == "basic":
+    if progress == "basic":
         # set the refresh rate very low and eliminate the spinner
         kwargs["refresh_per_second"] = 0.25
         progress_list = progress_list[1:]
-    return Progress(*progress_list, **kwargs)
+    return ProgressClass(*progress_list, **kwargs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_progress_instance(progress: PROGRESS_LEVELS | Progress = "standard"):
"""
Get the Rich progress bar instance based on complexity level.
"""
# If a progress class is passed in, just use it.
if isinstance(progress, Progress):
return progress
def get_progress_instance(progress: PROGRESS_LEVELS | Progress = "standard"):
"""
Get the Rich progress bar instance based on complexity level.
"""
# Accept either an instance or a subclass (class) of Progress.
if isinstance(progress, Progress):
return progress
if isinstance(progress, type) and issubclass(progress, Progress):
ProgressClass = progress
else:
ProgressClass = Progress
if progress == "basic":
# set the refresh rate very low and eliminate the spinner
kwargs["refresh_per_second"] = 0.25
progress_list = progress_list[1:]
return ProgressClass(*progress_list, **kwargs)
🤖 Prompt for AI Agents
In dascore/utils/progress.py around lines 15 to 21, the current check only
handles Progress instances and ignores when a Progress subclass (class object)
is passed; update the logic to also detect when progress is a class (use
inspect.isclass or isinstance(progress, type)) and issubclass(progress,
Progress), and in that case instantiate and return progress() (or
progress(*default_args) if needed); keep the existing instance check and string
handling as-is.
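The instance-vs-subclass dispatch the suggestion describes can be shown with a minimal stand-in class (this `Progress` is a placeholder, not `rich.progress.Progress`, and the kwargs are illustrative):

```python
class Progress:
    """Minimal stand-in for rich.progress.Progress."""

    def __init__(self, *columns, **kwargs):
        self.columns = columns
        self.kwargs = kwargs


def get_progress_instance(progress="standard"):
    # An instance is used as-is.
    if isinstance(progress, Progress):
        return progress
    # A subclass (class object) is constructed; anything else
    # falls back to the base class.
    if isinstance(progress, type) and issubclass(progress, Progress):
        progress_class = progress
    else:
        progress_class = Progress
    kwargs = {}
    if progress == "basic":
        # Slow the refresh rate for the "basic" level.
        kwargs["refresh_per_second"] = 0.25
    return progress_class(**kwargs)
```

The `isinstance(progress, type)` guard matters: calling `issubclass` on a non-class argument (such as the string `"standard"`) would raise a `TypeError`.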

yield from sequence
return
update = 1.0 if progress == "standard" else 5.0
update = 1.0 if isinstance(progress, str) and progress == "standard" else 5.0

🛠️ Refactor suggestion

Align update-rate semantics: only “basic” should slow refresh; instances/classes should get the standard rate

Currently, any non-string progress (including instances) uses the slower 5s update period. That’s surprising. Make “basic” the only slow case; everything else gets the standard 1s.

Apply this diff:

-    update = 1.0 if isinstance(progress, str) and progress == "standard" else 5.0
+    update = 5.0 if (isinstance(progress, str) and progress == "basic") else 1.0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
update = 1.0 if isinstance(progress, str) and progress == "standard" else 5.0
update = 5.0 if (isinstance(progress, str) and progress == "basic") else 1.0
🤖 Prompt for AI Agents
In dascore/utils/progress.py around line 76, the update rate logic currently
treats any non-string progress (including instances and classes) as the slow
5.0s case; change it so only the string "basic" uses 5.0s and everything else
(including progress instances or classes and other strings) uses the standard
1.0s. Replace the conditional to check specifically for isinstance(progress,
str) and progress == "basic" to set update = 5.0, otherwise set update = 1.0.
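The suggested semantics reduce to a one-line predicate; as a hypothetical helper (the function name `update_period` is illustrative, not dascore's API):

```python
def update_period(progress) -> float:
    # Only the "basic" string level slows the refresh; instances,
    # subclasses, and "standard" all keep the 1-second period.
    return 5.0 if (isinstance(progress, str) and progress == "basic") else 1.0
```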

@d-chambers d-chambers merged commit d6e1c58 into master Aug 25, 2025
20 checks passed
@d-chambers d-chambers deleted the sintela_binary_bad_file branch August 25, 2025 21:02