
Conversation

Contributor

@d-chambers d-chambers commented Aug 23, 2025

Description

This PR adds support for Sintela's binary format, version 3.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • Support for reading Sintela binary v3 with coordinates, metadata (including gauge length), and correct data typing.
    • Expanded valid data types to include "phase_difference" and "phase_rate".
    • Utilities to determine buffer size and to attempt memory-mapping file-like objects.
  • Chores

    • Registered the Sintela v3 reader in the plugin registry and added a sample Sintela v3 dataset to the data registry.
  • Tests

    • Added common IO tests for Sintela v3 and tests for the new buffer/memmap utilities.

@d-chambers d-chambers added the IO (Work for reading/writing different formats) label Aug 23, 2025
Contributor

coderabbitai bot commented Aug 23, 2025

Walkthrough

Adds a Sintela binary v3 reader (package export, core reader, and utils), registers it as a fiber_io entry point, updates VALID_DATA_TYPES with "phase_difference" and "phase_rate", extends the data registry with a test file, and adds file-buffer/memmap utilities plus tests.

Changes

  • Data type constants (dascore/constants.py): Adds "phase_difference" and "phase_rate" to VALID_DATA_TYPES immediately after "phase".
  • Sintela binary reader package & export (dascore/io/sintela_binary/__init__.py): Adds the package initializer and re-exports SintelaBinaryV3.
  • Sintela binary reader implementation (dascore/io/sintela_binary/core.py): Adds SintelaPatchAttrs and SintelaBinaryV3 (format detection, scan, read) leveraging the reader utilities.
  • Sintela binary utilities (dascore/io/sintela_binary/utils.py): New utilities for parsing v3 headers, validating them, computing coordinates, loading/memory-mapping data, and constructing Patch objects.
  • Entry point registration (pyproject.toml): Registers SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3", adds GDR_DAS__V1, and re-adds SILIXA_H5__V1.
  • Data registry (dascore/data_registry.txt): Adds the dataset entry sintela_binary_v3_test_1.raw with its SHA256 hash and remote URL.
  • Misc utilities (dascore/utils/misc.py): Adds get_buffer_size(fid: IOBase) and maybe_mem_map(fid: IOBase, dtype="<u1") for file-like buffer sizing and memmap fallback.
  • Tests, misc (tests/test_utils/test_misc.py): Adds tests for get_buffer_size and maybe_mem_map covering real files and BytesIO scenarios.
  • Tests, IO (tests/test_io/test_common_io.py): Imports SintelaBinaryV3 and adds SintelaBinaryV3(): ("sintela_binary_v3_test_1.raw",) to COMMON_IO_READ_TESTS.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant IO as SintelaBinaryV3
  participant U as sintela_binary.utils
  Note over User,IO: Format detection (v3)
  User->>IO: get_format(resource)
  IO->>U: _read_base_header(resource)
  U-->>IO: base_header
  IO->>U: _get_complete_header(resource, base_header)
  U-->>IO: complete_header (validated v3)
  IO-->>User: ("Sintela_Binary","3") or False
sequenceDiagram
  autonumber
  actor User
  participant IO as SintelaBinaryV3
  participant U as sintela_binary.utils
  participant DC as dascore
  Note over User,IO: Read flow (v3)
  User->>IO: read(resource, time?, distance?)
  IO->>U: _get_attrs_coords_header(resource, attr_cls)
  U-->>IO: (attrs, coord_mgr, header)
  IO->>U: _load_data(resource, header)
  U-->>IO: raw_data (memmap/ndarray)
  IO->>U: _get_patch(data, coords, attrs, slices?)
  U-->>IO: Patch
  IO->>DC: dc.spool(patch)
  DC-->>User: BaseSpool
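
For orientation, here is a minimal usage sketch of the read flow above. It assumes the registry test file sintela_binary_v3_test_1.raw is available locally, and the printed attribute is illustrative only.

import dascore as dc

# Format detection dispatches to SintelaBinaryV3 for this file.
spool = dc.spool("sintela_binary_v3_test_1.raw")
patch = spool[0]  # the v3 reader returns a one-patch spool
print(patch.dims)  # expect ("time", "distance") per the coord manager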

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I twitch my ears at header bytes,
Hop through packets, parse their lights.
Gauge and phase in tidy rows,
I memmap, slice, and forward those.
New v3 trails — thump-thump, code goes! 🐇📦


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5c442db and ed5c3b4.

📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)



codecov bot commented Aug 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.87%. Comparing base (271bcc8) to head (ed5c3b4).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master     #516    +/-   ##
========================================
  Coverage   99.87%   99.87%            
========================================
  Files         118      121     +3     
  Lines        9735     9874   +139     
========================================
+ Hits         9723     9862   +139     
  Misses         12       12            
Flag        Coverage Δ
unittests   99.87% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dascore/io/sintela_binary/core.py (1)

69-81: read(): wrong keyword name prevents using SintelaPatchAttrs

Passing attr_cls= won’t bind to _get_patch’s attr_class parameter, so the reader will produce generic PatchAttrs.

-        patch = _get_patch(
-            resource, time=time, distance=distance, attr_cls=SintelaPatchAttrs
-        )
+        patch = _get_patch(
+            resource, time=time, distance=distance, attr_class=SintelaPatchAttrs
+        )

Also update the docstring:

-        """Read a single file with Silixa H5 data inside."""
+        """Read a single Sintela v3 binary file and return a one-patch spool."""
🧹 Nitpick comments (12)
dascore/data_registry.txt (1)

33-33: Checksum Verified – Asset Integrity Confirmed

  • The SHA256 of sintela_binary_v3_test_1.raw matches the declared 4c0afae1ab60b73ddcb3c4f7ac55b976d4d18d1ffc9ea6cd2dfa6a881b697101.
  • Consider adding a mirror URL alongside the primary GitHub link for improved availability.
  • It’s still recommended (as a CI enhancement) to verify the SHA256 locally on first download to guard against silent upstream changes.

File: dascore/data_registry.txt (line 33)

dascore/constants.py (1)

66-68: New data types align with Sintela v3 – ensure consistent casing across readers

Adding "phase_difference" and "phase_rate" is great. Please confirm all producers/consumers use lowercase values; in the new Sintela utils, "Strain" and "Strain Rate" appear with capitalization which will not match these constants.

Would you like me to generate a quick grep to find inconsistent data_type strings across the repo?
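
As a sketch of one possible fix, a small mapping could normalize header spellings before validation. The helper below is hypothetical; only the "Strain"/"Strain Rate" spellings come from the comment above, the others are assumed.

# Hypothetical normalization helper; keys are raw header spellings.
_DATA_TYPE_MAP = {
    "Strain": "strain",
    "Strain Rate": "strain_rate",
    "Phase Difference": "phase_difference",  # assumed spelling
    "Phase Rate": "phase_rate",  # assumed spelling
}

def _normalize_data_type(raw: str) -> str:
    """Map a raw header string onto a VALID_DATA_TYPES entry."""
    return _DATA_TYPE_MAP.get(raw, raw.strip().lower().replace(" ", "_"))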

dascore/io/sintela_binary/utils.py (5)

102-106: Avoid shadowing imported name ‘array’

Local variable array shadows the imported function array from dascore.compat. Rename for clarity.

 def _read_base_header(fid):
     """Return the first 3 elements of the sintela header."""
-    array = np.fromfile(fid, dtype=base_header_dtypes, count=1)
-    out = {x: y for x, y in zip(array.dtype.names, array[0])}
+    buf = np.fromfile(fid, dtype=base_header_dtypes, count=1)
+    out = {x: y for x, y in zip(buf.dtype.names, buf[0])}
     return out

120-131: Prefer explicit exceptions over asserts for header validation

Asserts may be stripped under optimization and are not user-friendly. Raise a clear exception if header sizes don’t match.

-    expected_size = _HEADER_SIZES.get(version, -1)
-    assert header_size == expected_size
+    expected_size = _HEADER_SIZES.get(version, -1)
+    if header_size != expected_size:
+        raise ValueError(
+            f"Header size {header_size} does not match expected {expected_size} "
+            f"for version {version}."
+        )
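
For background, assert statements are compiled away under python -O, so header validation would silently pass; a small illustration with example values:

# With `python -O`, the assert below is stripped entirely, so an invalid
# header_size would slip through; the explicit raise never is.
header_size, expected_size = 96, 96  # example values
assert header_size == expected_size  # skipped under -O
if header_size != expected_size:  # always enforced
    raise ValueError(f"Header size {header_size} != expected {expected_size}")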

141-149: Guard against invalid sample_rate and ensure timedelta conversion is supported

If sample_rate is 0/NaN or extremely small, this will raise/overflow. Also ensure dc.to_timedelta64(float) is supported.

-    timestep = dc.to_timedelta64(1 / header["sample_rate"])
+    sr = float(header["sample_rate"])
+    if not np.isfinite(sr) or sr <= 0:
+        raise ValueError(f"Invalid sample_rate: {sr}")
+    timestep = dc.to_timedelta64(1 / sr)

If to_timedelta64 does not support floats in your environment, I can switch to np.timedelta64(int(round(1e9/sr)), "ns") to avoid dependency on that helper.
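
Concretely, the nanosecond fallback mentioned above would look like this sketch (sr is a sample rate in Hz; 1000.0 is just an example value):

import numpy as np

sr = 1000.0  # Hz, example value
# 1e9 / 1000 Hz = 1_000_000 ns per sample, i.e. a 1 ms time step.
timestep = np.timedelta64(int(round(1e9 / sr)), "ns")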


152-166: Confirm channel indexing convention (off-by-one risk)

Distance start is start_channel * dx. If start_channel is 1-based in Sintela v3, this will shift distance by one channel (should be (start_channel - 1) * dx). Please confirm against the v3 spec or sample header.

I can adapt this function once we verify the convention.
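
To make the off-by-one concrete, a tiny illustration with made-up numbers:

start_channel, dx = 1, 2.0  # placeholder values
zero_based_start = start_channel * dx        # 2.0 if channels are 0-based
one_based_start = (start_channel - 1) * dx   # 0.0 if channels are 1-based
# Every distance value shifts by one dx between the two interpretations.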


203-211: Attr construction: let PatchAttrs derive dims from CoordManager

Minor: since you pass cm into attrs, you can drop explicit dims injection earlier; PatchAttrs can derive dims from coords. This reduces duplication and mismatch risk.

No code change required here if you apply the earlier change in _get_attr_dict to omit dims.

dascore/io/sintela_binary/core.py (3)

23-28: Correct class and module docstrings; fix copy/paste artifacts

  • "Patch Attributes for Silixa hdf5 format." and module references to Silixa/Terra15 are misleading here.
 class SintelaPatchAttrs(dc.PatchAttrs):
-    """Patch Attributes for Silixa hdf5 format."""
+    """Patch Attributes for Sintela binary format."""

I’ve included broader docstring fixes below for get_format/scan/read.


37-55: Update get_format docstring and parameter description

Docstrings still mention Silixa/Terra15.

     def get_format(self, resource: BinaryReader, **kwargs) -> tuple[str, str] | bool:
         """
-        Return name and version string if Silixa hdf5 else False.
+        Return (name, version) if this is a Sintela v3 binary file, else False.
 
         Parameters
         ----------
         resource
-            A path to the file which may contain terra15 data.
+            A path or binary stream to the candidate Sintela file.
         """

56-66: scan docstring mentions Silixa; also ensure resource offset reset

Minor docstring fix. The BinaryReader type-caster resets the offset, so this is fine functionally.

-    def scan(self, resource: BinaryReader, **kwargs) -> list[dc.PatchAttrs]:
-        """Scan a Silixa HDF5 file, return summary information on the contents."""
+    def scan(self, resource: BinaryReader, **kwargs) -> list[dc.PatchAttrs]:
+        """Scan a Sintela v3 binary file and return summary attributes."""
dascore/io/sintela_binary/__init__.py (2)

4-4: Explicitly export re-exported symbol to satisfy Ruff F401 and make API intent clear

SintelaBinaryV3 is re-exported for public use, but Ruff flags it as "imported but unused". Declare __all__ so the intent is explicit and the lint warning is silenced.

Apply this diff:

 """
 Sintela binary reader.
 """
 from .core import SintelaBinaryV3
+
+__all__ = ["SintelaBinaryV3"]

1-3: Optional: clarify the docstring to mention the supported version

Minor clarity win since this package may eventually host multiple versions.

-"""
-Sintela binary reader.
-"""
+"""
+Sintela binary reader (version 3).
+"""
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 72c7878 and 04d9bbf.

📒 Files selected for processing (7)
  • dascore/constants.py (1 hunks)
  • dascore/data_registry.txt (1 hunks)
  • dascore/io/sintela_binary/__init__.py (1 hunks)
  • dascore/io/sintela_binary/core.py (1 hunks)
  • dascore/io/sintela_binary/utils.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_io/test_common_io.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
dascore/io/sintela_binary/__init__.py (1)
dascore/io/sintela_binary/core.py (1)
  • SintelaBinaryV3 (30-80)
tests/test_io/test_common_io.py (1)
dascore/io/sintela_binary/core.py (1)
  • SintelaBinaryV3 (30-80)
dascore/io/sintela_binary/core.py (4)
dascore/io/core.py (1)
  • FiberIO (443-581)
dascore/utils/io.py (1)
  • BinaryReader (33-51)
dascore/io/sintela_binary/utils.py (3)
  • _get_attrs_coords_header (203-210)
  • _get_patch (213-224)
  • _read_base_header (102-106)
dascore/core/attrs.py (2)
  • PatchAttrs (65-287)
  • get (174-179)
dascore/io/sintela_binary/utils.py (4)
dascore/compat.py (1)
  • array (30-36)
dascore/core/attrs.py (2)
  • PatchAttrs (65-287)
  • get (174-179)
dascore/core/coordmanager.py (2)
  • get_coord_manager (1036-1124)
  • step (1019-1021)
dascore/utils/time.py (1)
  • to_timedelta64 (141-173)
🪛 Ruff (0.12.2)
dascore/io/sintela_binary/__init__.py

4-4: .core.SintelaBinaryV3 imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

🔇 Additional comments (2)
pyproject.toml (1)

123-123: Entry point looks correct; ensure public re-export path exists

SINTELA_BINARY__V3 mapping is good. Please confirm dascore/io/sintela_binary/__init__.py re-exports SintelaBinaryV3 so from dascore.io.sintela_binary import SintelaBinaryV3 works (tests rely on this pattern).

tests/test_io/test_common_io.py (1)

39-39: LGTM: importing from the package namespace keeps tests aligned with the public API

This import ensures tests use the stable, documented path rather than reaching into module internals. No changes needed.

Comment on lines 185 to 201
def _load_data(fid, header):
    """Use numpy's memmap to get array information."""
    header_size = header["header_size"]
    num_channels = header["num_channels"]
    num_samples = header["num_samples"]
    dtype = np.dtype(header["dtype"])
    packet_size = header_size + num_channels * num_samples * dtype.itemsize
    # Memory map entire file as raw bytes. We do this to later skip the headers
    fid.seek(0)
    raw = np.memmap(fid, dtype=np.uint8, mode="r")
    # Compute how many blocks
    block_count = raw.size // packet_size
    # Create a view that skips headers
    data = raw.reshape(block_count, packet_size)[:, header_size:]
    data = data.view(dtype).reshape(-1, num_channels)
    return data


🛠️ Refactor suggestion

Memmap with file handles and block count consistency

  • np.memmap expects a filename or file descriptor; passing a buffered handle can be platform-dependent. Using fid.name is safer; fall back to reading into memory for streams.
  • Validate that block_count matches header["num_packets"] and raise if not.
-def _load_data(fid, header):
+def _load_data(fid, header):
     """Use numpy's memmap to get array information."""
     header_size = header["header_size"]
     num_channels = header["num_channels"]
     num_samples = header["num_samples"]
     dtype = np.dtype(header["dtype"])
     packet_size = header_size + num_channels * num_samples * dtype.itemsize
-    # Memory map entire file as raw bytes. We do this to later skip the headers
-    fid.seek(0)
-    raw = np.memmap(fid, dtype=np.uint8, mode="r")
+    # Memory map entire file as raw bytes when possible (file on disk).
+    fid.seek(0)
+    try:
+        filename = fid.name  # may not exist for streams
+        raw = np.memmap(filename, dtype=np.uint8, mode="r")
+    except Exception:
+        # Fallback: read into memory
+        raw = np.frombuffer(fid.read(), dtype=np.uint8)
+        fid.seek(0)
     # Compute how many blocks
-    block_count = raw.size // packet_size
+    block_count, remainder = divmod(raw.size, packet_size)
+    if remainder:
+        raise ValueError(
+            f"Raw size {raw.size} not divisible by packet size {packet_size}; "
+            f"remainder {remainder}."
+        )
+    if "num_packets" in header and block_count != header["num_packets"]:
+        raise ValueError(
+            f"Header num_packets={header['num_packets']} does not match "
+            f"computed {block_count}."
+        )
     # Create a view that skips headers
     data = raw.reshape(block_count, packet_size)[:, header_size:]
     data = data.view(dtype).reshape(-1, num_channels)
     return data
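
For reference, the header-skipping view used above can be exercised on synthetic bytes; the sizes below are made up and only demonstrate the reshape/view mechanics:

import numpy as np

# Synthetic layout: 4 header bytes, then 3 channels x 2 samples of float32.
header_size, num_channels, num_samples = 4, 3, 2
dtype = np.dtype("<f4")
packet_size = header_size + num_channels * num_samples * dtype.itemsize
raw = np.zeros(2 * packet_size, dtype=np.uint8)  # two zero-filled packets
data = raw.reshape(2, packet_size)[:, header_size:]  # drop each header
data = data.view(dtype).reshape(-1, num_channels)  # reinterpret payload bytes
assert data.shape == (2 * num_samples, num_channels)  # (4, 3)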

Comment on lines +59 to 60
SintelaBinaryV3(): ("sintela_binary_v3_test_1.raw",),
GDR_V1(): ("gdr_1.h5",),

💡 Verification agent

🧩 Analysis chain

Verify plumbing: registry entry and entry point exist for the new reader

The new test case depends on:

  • data_registry entry for “sintela_binary_v3_test_1.raw”
  • the SINTELA_BINARY__V3 entry point wiring the reader
  • expected class attributes in core.py (name/version/extensions)

Run the script below to confirm all integration points are present.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Check public export in __init__.py"
rg -nC1 -F '__all__ = ["SintelaBinaryV3"]' dascore/io/sintela_binary/__init__.py || {
  echo "Missing __all__ export for SintelaBinaryV3"; exit 1; }

echo "Check data registry contains the test asset"
rg -n '^sintela_binary_v3_test_1\.raw\b' dascore/data_registry.txt

echo "Check pyproject entry point wiring"
rg -nP 'SINTELA_BINARY__V3.*dascore\.io\.sintela_binary\.core:SintelaBinaryV3' pyproject.toml

echo "Check core class attributes"
rg -nC2 -P 'class\s+SintelaBinaryV3' dascore/io/sintela_binary/core.py
rg -nP 'name\s*=\s*"Sintela_Binary"' dascore/io/sintela_binary/core.py
rg -nP 'version\s*=\s*"3"' dascore/io/sintela_binary/core.py
rg -nP 'preferred_extensions\s*=\s*\("raw",\)' dascore/io/sintela_binary/core.py

echo "All good."

Length of output: 285


Add missing __all__ export for SintelaBinaryV3

The integration check failed immediately because SintelaBinaryV3 isn’t publicly exported. Without this, the reader won’t be discoverable, causing import errors in downstream code.

• File: dascore/io/sintela_binary/__init__.py
– Add or extend the module’s __all__ to include "SintelaBinaryV3"

 __all__ = [
-    # …other exports…
+    # …other exports…
  "SintelaBinaryV3",
 ]

Once that’s in place, please rerun the integration script to confirm the remaining points:

  • dascore/data_registry.txt contains sintela_binary_v3_test_1.raw
  • pyproject.toml has the SINTELA_BINARY__V3 = dascore.io.sintela_binary.core:SintelaBinaryV3 entry-point
  • core.py defines class SintelaBinaryV3 with the correct name, version, and preferred_extensions attributes
🤖 Prompt for AI Agents
In tests/test_io/test_common_io.py around lines 59-60, the test fails because
the SintelaBinaryV3 reader class isn't exported from
dascore/io/sintela_binary/__init__.py; open that __init__.py and add or extend
the module-level __all__ to include "SintelaBinaryV3" (or append it if __all__
already exists), save and rerun the integration check to confirm the class is
discoverable and the downstream checks (data_registry, pyproject entry point,
and core.py class attributes) pass.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dascore/utils/misc.py (1)

771-788: Fallback reads from current position, not the whole file — inconsistent with memmap semantics.

np.memmap(path) maps the entire file starting at offset 0. The fallback path currently reads from the current file position, which can silently yield truncated arrays if fid.tell() != 0. Reset to start, read all, then restore the position. Also consider catching OSError for mapping failures (eg, permissions).

Apply this diff:

 def maybe_mem_map(fid: IOBase, dtype="<u1") -> np.ndarray | np.memmap:
@@
-    try:
-        raw = np.memmap(fid.name, dtype=dtype, mode="r")
-    except (AttributeError, TypeError, ValueError):
-        # Fallback: read into memory
-        current = fid.tell()
-        raw = np.frombuffer(fid.read(), dtype=dtype)
-        fid.seek(current)
-    return raw
+    try:
+        return np.memmap(fid.name, dtype=dtype, mode="r")
+    except (AttributeError, TypeError, ValueError, OSError):
+        # Fallback: read the entire buffer and restore the original position.
+        cur = None
+        try:
+            cur = fid.tell()
+            fid.seek(0, os.SEEK_SET)
+        except Exception:
+            # Non-seekable streams: read whatever is available.
+            pass
+        out = np.frombuffer(fid.read(), dtype=dtype)
+        if cur is not None:
+            with contextlib.suppress(Exception):
+                fid.seek(cur, os.SEEK_SET)
+        return out
🧹 Nitpick comments (3)
dascore/utils/misc.py (1)

751-769: Clarify return type and harden size detection (fd-backed streams, SEEK constants).

  • Add a return annotation (int).
  • Use os.SEEK_* for readability/portability.
  • If fid.name is not a path-like (eg, some temp files expose an int fd), Path(path) will raise TypeError. Fall back to os.fstat(fid.fileno()).st_size.

Apply this diff:

-def get_buffer_size(fid: IOBase):
+def get_buffer_size(fid: IOBase) -> int:
@@
-    path = getattr(fid, "name", None)
-    if path is None:
-        cur = fid.tell()
-        fid.seek(0, 2)  # end
-        file_size = fid.tell()
-        fid.seek(cur, 0)
-    else:
-        file_size = Path(path).stat().st_size
-    return file_size
+    path = getattr(fid, "name", None)
+    if path is None:
+        cur = fid.tell()
+        fid.seek(0, os.SEEK_END)
+        file_size = fid.tell()
+        fid.seek(cur, os.SEEK_SET)
+        return file_size
+    # If `name` isn't a filesystem path (eg, int fd), fall back to fstat.
+    try:
+        return Path(path).stat().st_size
+    except TypeError:
+        return os.fstat(fid.fileno()).st_size
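
A quick usage sketch of the two helpers, assuming they land with the signatures shown in this PR (the BytesIO payload is arbitrary):

from io import BytesIO

import numpy as np

from dascore.utils.misc import get_buffer_size, maybe_mem_map

bio = BytesIO(b"\x00\x01\x02\x03")
assert get_buffer_size(bio) == 4  # byte count; file position is preserved
arr = maybe_mem_map(bio)  # BytesIO has no .name, so falls back to ndarray
assert isinstance(arr, np.ndarray) and not isinstance(arr, np.memmap)
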
tests/test_utils/test_misc.py (2)

357-362: Good type assertion for file-backed memmap.

Asserts np.memmap is returned for real files. Consider also asserting array.size > 0 to guard against empty mappings.


363-370: Add a test for non-zero file pointer to catch partial reads in fallback.

Currently, if fid.tell() != 0, the fallback path can return a truncated array. Add a test that writes without seeking back to 0 before calling maybe_mem_map. This will safeguard the intended semantics once the fix lands.

Proposed test addition:

 class TestMaybeMemMap:
@@
     def test_bytes_io(self):
         """Ensure non-files return arrays."""
         bio = BytesIO()
         bio.write(b"1234")
         bio.seek(0)
         array = maybe_mem_map(bio)
         assert isinstance(array, np.ndarray)
+    
+    def test_bytes_io_nonzero_position(self):
+        """Fallback should read entire buffer even if pointer is not at 0."""
+        bio = BytesIO()
+        bio.write(b"abcdef")
+        # Intentionally do not seek back to 0; pointer is at end.
+        arr_end_pos = maybe_mem_map(bio)
+        # Now reset to start and compare lengths; both should see full content.
+        bio.seek(0)
+        arr_full = maybe_mem_map(bio)
+        assert arr_end_pos.size == arr_full.size == 6
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 04d9bbf and 00e8833.

📒 Files selected for processing (4)
  • dascore/io/sintela_binary/core.py (1 hunks)
  • dascore/io/sintela_binary/utils.py (1 hunks)
  • dascore/utils/misc.py (2 hunks)
  • tests/test_utils/test_misc.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • dascore/io/sintela_binary/utils.py
  • dascore/io/sintela_binary/core.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_misc.py (2)
dascore/utils/misc.py (2)
  • get_buffer_size (751-768)
  • maybe_mem_map (771-787)
tests/conftest.py (1)
  • terra15_v5_path (207-209)
dascore/utils/misc.py (3)
tests/test_io/test_io_core.py (3)
  • read (46-47)
  • read (65-67)
  • read (89-92)
dascore/io/core.py (2)
  • read (469-478)
  • read (584-648)
dascore/io/pickle/core.py (1)
  • read (50-53)
🔇 Additional comments (3)
dascore/utils/misc.py (1)

14-14: Import of IOBase is appropriate for public APIs working with file-like objects.

Good call surfacing IOBase for type annotations on the new helpers.

tests/test_utils/test_misc.py (2)

334-343: Solid coverage for file-backed size detection.

Validates parity with filesystem stat; this anchors behavior for regular files.


344-352: Nice BytesIO coverage and pointer preservation check.

Confirms size is stable regardless of position; aligns with the helper’s contract.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 2515ea5 and 5c442db.

📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: LintCode
pyproject.toml

[error] 127-127: TOML parse error in pyproject.toml: invalid basic string at line 127, column 28.


[error] 127-127: ruff: TOML parse error in pyproject.toml: invalid basic string at line 127, column 28.


[error] 127-127: TOML decode error in pyproject.toml (fix-future-annotations): Illegal character newline at line 127, column 28.

pyproject.toml Outdated
Comment on lines 127 to 128
SILIXA_H5__V1 = "dascore.io
SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"

💡 Verification agent

🧩 Analysis chain

Fix broken TOML string for SILIXA entry point (parsing failure).

The value for SILIXA_H5__V1 is split across lines, creating an unterminated string and causing the TOML parse errors reported by CI. Keep the entry-point target on a single line and point it to the correct module:object.

Apply this minimal fix (adjust the right-hand side if your actual module/object differ):

-SILIXA_H5__V1 = "dascore.io
-SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"
+SILIXA_H5__V1 = "dascore.io.silixa_h5.core:SilixaH5V1"
+SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"

Run the following to (a) confirm the correct Silixa target, (b) ensure the Sintela class exists, and (c) validate the TOML parses:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for Silixa IO class candidates..."
rg -nP -C2 '(class\s+SilixaH5V1\b|class\s+SilixaH5\b|class\s+Silixa.*V1\b)' -- dascore

echo "Searching for SintelaBinaryV3 class..."
rg -nP -C2 'class\s+SintelaBinaryV3\b' -- dascore

echo "Checking for matching module paths..."
fd -a -t f 'silixa*' dascore/io || true
fd -a -t f 'sintela*' dascore/io || true

echo "Validating pyproject.toml parses..."
python - <<'PY'
import tomllib, sys
with open("pyproject.toml","rb") as f:
    tomllib.load(f)
print("pyproject.toml: TOML OK")
PY

Length of output: 2476


Fix broken TOML string and correct SILIXA entry-point path

The TOML parser is erroring on an unterminated string at line 127 because the SILIXA_H5__V1 value is split across lines. The entry-point target must be on a single line and match the actual module path (dascore.io.silixah5.core) and class name (SilixaH5V1) as defined in dascore/io/silixah5/core.py (line 26). The SINTELA_BINARY__V3 entry is already correct.

– File: pyproject.toml
– Lines: 127–128

-SILIXA_H5__V1 = "dascore.io
-SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"
+SILIXA_H5__V1       = "dascore.io.silixah5.core:SilixaH5V1"
+SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"

After applying, re-run the TOML validation to ensure the file parses cleanly:

python - <<'PY'
import tomllib
with open("pyproject.toml", "rb") as f:
    tomllib.load(f)
print("pyproject.toml: OK")
PY

🤖 Prompt for AI Agents
In pyproject.toml around lines 127 to 128, the TOML string for SILIXA_H5__V1 is
broken across lines and the entry-point path/class are incorrect; replace the
two-line value with a single-line correct entry pointing to
dascore.io.silixah5.core:SilixaH5V1 (e.g. SILIXA_H5__V1 =
"dascore.io.silixah5.core:SilixaH5V1"), leaving SINTELA_BINARY__V3 as-is, then
re-run the provided tomllib validation command to confirm the file parses
cleanly.

@d-chambers d-chambers merged commit 384a547 into master Aug 25, 2025
20 checks passed
@d-chambers d-chambers deleted the sintela_binary branch August 25, 2025 13:10
This was referenced Sep 26, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 11, 2025

Labels

IO: Work for reading/writing different formats

2 participants