
Conversation

Contributor

@d-chambers d-chambers commented Aug 23, 2025

Description

This PR adds support for Sintela's binary format, version 3.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • Support for reading Sintela binary v3 with coordinates, metadata (including gauge length), and correct data typing.
    • Expanded valid data types to include "phase_difference" and "phase_rate".
    • Utilities to determine buffer size and to attempt memory-mapping file-like objects.
  • Chores

    • Registered the Sintela v3 reader in the plugin registry and added a sample Sintela v3 dataset to the data registry.
  • Tests

    • Added common IO tests for Sintela v3 and tests for the new buffer/memmap utilities.

@d-chambers d-chambers added the IO (Work for reading/writing different formats) label Aug 23, 2025
Contributor

coderabbitai bot commented Aug 23, 2025

Walkthrough

Adds a Sintela binary v3 reader (package export, core reader, and utils), registers it as a fiber_io entry point, updates VALID_DATA_TYPES with "phase_difference" and "phase_rate", extends the data registry with a test file, and adds file-buffer/memmap utilities plus tests.

Changes

  • Data type constants (dascore/constants.py): Adds "phase_difference" and "phase_rate" to VALID_DATA_TYPES immediately after "phase".
  • Sintela binary reader package & export (dascore/io/sintela_binary/__init__.py): Adds the package initializer and re-exports SintelaBinaryV3.
  • Sintela binary reader implementation (dascore/io/sintela_binary/core.py): Adds SintelaPatchAttrs and SintelaBinaryV3 (format detection, scan, read) leveraging the reader utilities.
  • Sintela binary utilities (dascore/io/sintela_binary/utils.py): New utilities for parsing v3 headers, validating them, computing coordinates, loading/memory-mapping data, and constructing Patch objects.
  • Entry point registration (pyproject.toml): Registers SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3", adds GDR_DAS__V1, and re-adds SILIXA_H5__V1.
  • Data registry (dascore/data_registry.txt): Adds the dataset entry sintela_binary_v3_test_1.raw with its SHA256 hash and remote URL.
  • Misc utilities (dascore/utils/misc.py): Adds get_buffer_size(fid: IOBase) and maybe_mem_map(fid: IOBase, dtype="<u1") for file-like buffer sizing and memmap fallback.
  • Tests, misc (tests/test_utils/test_misc.py): Adds tests for get_buffer_size and maybe_mem_map covering real files and BytesIO scenarios.
  • Tests, IO (tests/test_io/test_common_io.py): Imports SintelaBinaryV3 and adds SintelaBinaryV3(): ("sintela_binary_v3_test_1.raw",) to COMMON_IO_READ_TESTS.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant IO as SintelaBinaryV3
  participant U as sintela_binary.utils
  Note over User,IO: Format detection (v3)
  User->>IO: get_format(resource)
  IO->>U: _read_base_header(resource)
  U-->>IO: base_header
  IO->>U: _get_complete_header(resource, base_header)
  U-->>IO: complete_header (validated v3)
  IO-->>User: ("Sintela_Binary","3") or False
sequenceDiagram
  autonumber
  actor User
  participant IO as SintelaBinaryV3
  participant U as sintela_binary.utils
  participant DC as dascore
  Note over User,IO: Read flow (v3)
  User->>IO: read(resource, time?, distance?)
  IO->>U: _get_attrs_coords_header(resource, attr_cls)
  U-->>IO: (attrs, coord_mgr, header)
  IO->>U: _load_data(resource, header)
  U-->>IO: raw_data (memmap/ndarray)
  IO->>U: _get_patch(data, coords, attrs, slices?)
  U-->>IO: Patch
  IO->>DC: dc.spool(patch)
  DC-->>User: BaseSpool
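
For orientation, here is a minimal usage sketch of the read flow above. It assumes the registry test file sintela_binary_v3_test_1.raw is available locally, and the printed attribute is illustrative only.

import dascore as dc

# Format detection dispatches to SintelaBinaryV3 for this file.
spool = dc.spool("sintela_binary_v3_test_1.raw")
patch = spool[0]  # the v3 reader returns a one-patch spool
print(patch.dims)  # expect ("time", "distance") per the coord manager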

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I twitch my ears at header bytes,
Hop through packets, parse their lights.
Gauge and phase in tidy rows,
I memmap, slice, and forward those.
New v3 trails — thump-thump, code goes! 🐇📦


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5c442db and ed5c3b4.

📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)



codecov bot commented Aug 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.87%. Comparing base (271bcc8) to head (ed5c3b4).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master     #516    +/-   ##
========================================
  Coverage   99.87%   99.87%            
========================================
  Files         118      121     +3     
  Lines        9735     9874   +139     
========================================
+ Hits         9723     9862   +139     
  Misses         12       12            
Flag        Coverage Δ
unittests   99.87% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dascore/io/sintela_binary/core.py (1)

69-81: read(): wrong keyword name prevents using SintelaPatchAttrs

Passing attr_cls= won’t bind to _get_patch’s attr_class parameter, so the reader will produce generic PatchAttrs.

-        patch = _get_patch(
-            resource, time=time, distance=distance, attr_cls=SintelaPatchAttrs
-        )
+        patch = _get_patch(
+            resource, time=time, distance=distance, attr_class=SintelaPatchAttrs
+        )

Also update the docstring:

-        """Read a single file with Silixa H5 data inside."""
+        """Read a single Sintela v3 binary file and return a one-patch spool."""
🧹 Nitpick comments (12)
dascore/data_registry.txt (1)

33-33: Checksum Verified – Asset Integrity Confirmed

  • The SHA256 of sintela_binary_v3_test_1.raw matches the declared 4c0afae1ab60b73ddcb3c4f7ac55b976d4d18d1ffc9ea6cd2dfa6a881b697101.
  • Consider adding a mirror URL alongside the primary GitHub link for improved availability.
  • It’s still recommended (as a CI enhancement) to verify the SHA256 locally on first download to guard against silent upstream changes.

File: dascore/data_registry.txt (line 33)

dascore/constants.py (1)

66-68: New data types align with Sintela v3 – ensure consistent casing across readers

Adding "phase_difference" and "phase_rate" is great. Please confirm all producers/consumers use lowercase values; in the new Sintela utils, "Strain" and "Strain Rate" appear with capitalization which will not match these constants.

Would you like me to generate a quick grep to find inconsistent data_type strings across the repo?
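
As a sketch of one possible fix, a small mapping could normalize header spellings before validation. The helper below is hypothetical; only the "Strain"/"Strain Rate" spellings come from the comment above, the others are assumed.

# Hypothetical normalization helper; keys are raw header spellings.
_DATA_TYPE_MAP = {
    "Strain": "strain",
    "Strain Rate": "strain_rate",
    "Phase Difference": "phase_difference",  # assumed spelling
    "Phase Rate": "phase_rate",  # assumed spelling
}

def _normalize_data_type(raw: str) -> str:
    """Map a raw header string onto a VALID_DATA_TYPES entry."""
    return _DATA_TYPE_MAP.get(raw, raw.strip().lower().replace(" ", "_"))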

dascore/io/sintela_binary/utils.py (5)

102-106: Avoid shadowing imported name ‘array’

Local variable array shadows the imported function array from dascore.compat. Rename for clarity.

 def _read_base_header(fid):
     """Return the first 3 elements of the sintela header."""
-    array = np.fromfile(fid, dtype=base_header_dtypes, count=1)
-    out = {x: y for x, y in zip(array.dtype.names, array[0])}
+    buf = np.fromfile(fid, dtype=base_header_dtypes, count=1)
+    out = {x: y for x, y in zip(buf.dtype.names, buf[0])}
     return out

120-131: Prefer explicit exceptions over asserts for header validation

Asserts may be stripped under optimization and are not user-friendly. Raise a clear exception if header sizes don’t match.

-    expected_size = _HEADER_SIZES.get(version, -1)
-    assert header_size == expected_size
+    expected_size = _HEADER_SIZES.get(version, -1)
+    if header_size != expected_size:
+        raise ValueError(
+            f"Header size {header_size} does not match expected {expected_size} "
+            f"for version {version}."
+        )
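
For background, assert statements are compiled away under python -O, so header validation would silently pass; a small illustration with example values:

# With `python -O`, the assert below is stripped entirely, so an invalid
# header_size would slip through; the explicit raise never is.
header_size, expected_size = 96, 96  # example values
assert header_size == expected_size  # skipped under -O
if header_size != expected_size:  # always enforced
    raise ValueError(f"Header size {header_size} != expected {expected_size}")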

141-149: Guard against invalid sample_rate and ensure timedelta conversion is supported

If sample_rate is 0/NaN or extremely small, this will raise/overflow. Also ensure dc.to_timedelta64(float) is supported.

-    timestep = dc.to_timedelta64(1 / header["sample_rate"])
+    sr = float(header["sample_rate"])
+    if not np.isfinite(sr) or sr <= 0:
+        raise ValueError(f"Invalid sample_rate: {sr}")
+    timestep = dc.to_timedelta64(1 / sr)

If to_timedelta64 does not support floats in your environment, I can switch to np.timedelta64(int(round(1e9/sr)), "ns") to avoid dependency on that helper.
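
Concretely, the nanosecond fallback mentioned above would look like this sketch (sr is a sample rate in Hz; 1000.0 is just an example value):

import numpy as np

sr = 1000.0  # Hz, example value
# 1e9 / 1000 Hz = 1_000_000 ns per sample, i.e. a 1 ms time step.
timestep = np.timedelta64(int(round(1e9 / sr)), "ns")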


152-166: Confirm channel indexing convention (off-by-one risk)

Distance start is start_channel * dx. If start_channel is 1-based in Sintela v3, this will shift distance by one channel (should be (start_channel - 1) * dx). Please confirm against the v3 spec or sample header.

I can adapt this function once we verify the convention.
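
To make the off-by-one concrete, a tiny illustration with made-up numbers:

start_channel, dx = 1, 2.0  # placeholder values
zero_based_start = start_channel * dx        # 2.0 if channels are 0-based
one_based_start = (start_channel - 1) * dx   # 0.0 if channels are 1-based
# Every distance value shifts by one dx between the two interpretations.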


203-211: Attr construction: let PatchAttrs derive dims from CoordManager

Minor: since you pass cm into attrs, you can drop explicit dims injection earlier; PatchAttrs can derive dims from coords. This reduces duplication and mismatch risk.

No code change required here if you apply the earlier change in _get_attr_dict to omit dims.

dascore/io/sintela_binary/core.py (3)

23-28: Correct class and module docstrings; fix copy/paste artifacts

  • "Patch Attributes for Silixa hdf5 format." and module references to Silixa/Terra15 are misleading here.
 class SintelaPatchAttrs(dc.PatchAttrs):
-    """Patch Attributes for Silixa hdf5 format."""
+    """Patch Attributes for Sintela binary format."""

I’ve included broader docstring fixes below for get_format/scan/read.


37-55: Update get_format docstring and parameter description

Docstrings still mention Silixa/Terra15.

     def get_format(self, resource: BinaryReader, **kwargs) -> tuple[str, str] | bool:
         """
-        Return name and version string if Silixa hdf5 else False.
+        Return (name, version) if this is a Sintela v3 binary file, else False.
 
         Parameters
         ----------
         resource
-            A path to the file which may contain terra15 data.
+            A path or binary stream to the candidate Sintela file.
         """

56-66: scan docstring mentions Silixa; also ensure resource offset reset

Minor docstring fix. The BinaryReader type-caster resets the offset, so this is fine functionally.

-    def scan(self, resource: BinaryReader, **kwargs) -> list[dc.PatchAttrs]:
-        """Scan a Silixa HDF5 file, return summary information on the contents."""
+    def scan(self, resource: BinaryReader, **kwargs) -> list[dc.PatchAttrs]:
+        """Scan a Sintela v3 binary file and return summary attributes."""
dascore/io/sintela_binary/__init__.py (2)

4-4: Explicitly export re-exported symbol to satisfy Ruff F401 and make API intent clear

SintelaBinaryV3 is re-exported for public use, but Ruff flags it as "imported but unused". Declare __all__ so the intent is explicit and the lint warning is silenced.

Apply this diff:

 """
 Sintela binary reader.
 """
 from .core import SintelaBinaryV3
+
+__all__ = ["SintelaBinaryV3"]

1-3: Optional: clarify the docstring to mention the supported version

Minor clarity win since this package may eventually host multiple versions.

-"""
-Sintela binary reader.
-"""
+"""
+Sintela binary reader (version 3).
+"""
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 72c7878 and 04d9bbf.

📒 Files selected for processing (7)
  • dascore/constants.py (1 hunks)
  • dascore/data_registry.txt (1 hunks)
  • dascore/io/sintela_binary/__init__.py (1 hunks)
  • dascore/io/sintela_binary/core.py (1 hunks)
  • dascore/io/sintela_binary/utils.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_io/test_common_io.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
dascore/io/sintela_binary/__init__.py (1)
dascore/io/sintela_binary/core.py (1)
  • SintelaBinaryV3 (30-80)
tests/test_io/test_common_io.py (1)
dascore/io/sintela_binary/core.py (1)
  • SintelaBinaryV3 (30-80)
dascore/io/sintela_binary/core.py (4)
dascore/io/core.py (1)
  • FiberIO (443-581)
dascore/utils/io.py (1)
  • BinaryReader (33-51)
dascore/io/sintela_binary/utils.py (3)
  • _get_attrs_coords_header (203-210)
  • _get_patch (213-224)
  • _read_base_header (102-106)
dascore/core/attrs.py (2)
  • PatchAttrs (65-287)
  • get (174-179)
dascore/io/sintela_binary/utils.py (4)
dascore/compat.py (1)
  • array (30-36)
dascore/core/attrs.py (2)
  • PatchAttrs (65-287)
  • get (174-179)
dascore/core/coordmanager.py (2)
  • get_coord_manager (1036-1124)
  • step (1019-1021)
dascore/utils/time.py (1)
  • to_timedelta64 (141-173)
🪛 Ruff (0.12.2)
dascore/io/sintela_binary/__init__.py

4-4: .core.SintelaBinaryV3 imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

🔇 Additional comments (2)
pyproject.toml (1)

123-123: Entry point looks correct; ensure public re-export path exists

SINTELA_BINARY__V3 mapping is good. Please confirm dascore/io/sintela_binary/__init__.py re-exports SintelaBinaryV3 so from dascore.io.sintela_binary import SintelaBinaryV3 works (tests rely on this pattern).

tests/test_io/test_common_io.py (1)

39-39: LGTM: importing from the package namespace keeps tests aligned with the public API

This import ensures tests use the stable, documented path rather than reaching into module internals. No changes needed.

Comment on lines 185 to 201
def _load_data(fid, header):
    """Use numpy's memmap to get array information."""
    header_size = header["header_size"]
    num_channels = header["num_channels"]
    num_samples = header["num_samples"]
    dtype = np.dtype(header["dtype"])
    packet_size = header_size + num_channels * num_samples * dtype.itemsize
    # Memory map entire file as raw bytes. We do this to later skip the headers
    fid.seek(0)
    raw = np.memmap(fid, dtype=np.uint8, mode="r")
    # Compute how many blocks
    block_count = raw.size // packet_size
    # Create a view that skips headers
    data = raw.reshape(block_count, packet_size)[:, header_size:]
    data = data.view(dtype).reshape(-1, num_channels)
    return data


🛠️ Refactor suggestion

Memmap with file handles and block count consistency

  • np.memmap expects a filename or file descriptor; passing a buffered handle can be platform-dependent. Using fid.name is safer; fall back to reading into memory for streams.
  • Validate that block_count matches header["num_packets"] and raise if not.
-def _load_data(fid, header):
+def _load_data(fid, header):
     """Use numpy's memmap to get array information."""
     header_size = header["header_size"]
     num_channels = header["num_channels"]
     num_samples = header["num_samples"]
     dtype = np.dtype(header["dtype"])
     packet_size = header_size + num_channels * num_samples * dtype.itemsize
-    # Memory map entire file as raw bytes. We do this to later skip the headers
-    fid.seek(0)
-    raw = np.memmap(fid, dtype=np.uint8, mode="r")
+    # Memory map entire file as raw bytes when possible (file on disk).
+    fid.seek(0)
+    try:
+        filename = fid.name  # may not exist for streams
+        raw = np.memmap(filename, dtype=np.uint8, mode="r")
+    except Exception:
+        # Fallback: read into memory
+        raw = np.frombuffer(fid.read(), dtype=np.uint8)
+        fid.seek(0)
     # Compute how many blocks
-    block_count = raw.size // packet_size
+    block_count, remainder = divmod(raw.size, packet_size)
+    if remainder:
+        raise ValueError(
+            f"Raw size {raw.size} not divisible by packet size {packet_size}; "
+            f"remainder {remainder}."
+        )
+    if "num_packets" in header and block_count != header["num_packets"]:
+        raise ValueError(
+            f"Header num_packets={header['num_packets']} does not match "
+            f"computed {block_count}."
+        )
     # Create a view that skips headers
     data = raw.reshape(block_count, packet_size)[:, header_size:]
     data = data.view(dtype).reshape(-1, num_channels)
     return data
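
For reference, the header-skipping view used above can be exercised on synthetic bytes; the sizes below are made up and only demonstrate the reshape/view mechanics:

import numpy as np

# Synthetic layout: 4 header bytes, then 3 channels x 2 samples of float32.
header_size, num_channels, num_samples = 4, 3, 2
dtype = np.dtype("<f4")
packet_size = header_size + num_channels * num_samples * dtype.itemsize
raw = np.zeros(2 * packet_size, dtype=np.uint8)  # two zero-filled packets
data = raw.reshape(2, packet_size)[:, header_size:]  # drop each header
data = data.view(dtype).reshape(-1, num_channels)  # reinterpret payload bytes
assert data.shape == (2 * num_samples, num_channels)  # (4, 3)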

Comment on lines +59 to 60
SintelaBinaryV3(): ("sintela_binary_v3_test_1.raw",),
GDR_V1(): ("gdr_1.h5",),

💡 Verification agent

🧩 Analysis chain

Verify plumbing: registry entry and entry point exist for the new reader

The new test case depends on:

  • data_registry entry for “sintela_binary_v3_test_1.raw”
  • the SINTELA_BINARY__V3 entry point wiring the reader
  • expected class attributes in core.py (name/version/extensions)

Run the script below to confirm all integration points are present.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Check public export in __init__.py"
rg -nC1 -F '__all__ = ["SintelaBinaryV3"]' dascore/io/sintela_binary/__init__.py || {
  echo "Missing __all__ export for SintelaBinaryV3"; exit 1; }

echo "Check data registry contains the test asset"
rg -n '^sintela_binary_v3_test_1\.raw\b' dascore/data_registry.txt

echo "Check pyproject entry point wiring"
rg -nP 'SINTELA_BINARY__V3.*dascore\.io\.sintela_binary\.core:SintelaBinaryV3' pyproject.toml

echo "Check core class attributes"
rg -nC2 -P 'class\s+SintelaBinaryV3' dascore/io/sintela_binary/core.py
rg -nP 'name\s*=\s*"Sintela_Binary"' dascore/io/sintela_binary/core.py
rg -nP 'version\s*=\s*"3"' dascore/io/sintela_binary/core.py
rg -nP 'preferred_extensions\s*=\s*\("raw",\)' dascore/io/sintela_binary/core.py

echo "All good."

Length of output: 285


Add missing __all__ export for SintelaBinaryV3

The integration check failed immediately because SintelaBinaryV3 isn’t publicly exported. Without this, the reader won’t be discoverable, causing import errors in downstream code.

• File: dascore/io/sintela_binary/__init__.py
– Add or extend the module’s __all__ to include "SintelaBinaryV3"

 __all__ = [
-    # …other exports…
+    # …other exports…
  "SintelaBinaryV3",
 ]

Once that’s in place, please rerun the integration script to confirm the remaining points:

  • dascore/data_registry.txt contains sintela_binary_v3_test_1.raw
  • pyproject.toml has the SINTELA_BINARY__V3 = dascore.io.sintela_binary.core:SintelaBinaryV3 entry-point
  • core.py defines class SintelaBinaryV3 with the correct name, version, and preferred_extensions attributes
🤖 Prompt for AI Agents
In tests/test_io/test_common_io.py around lines 59-60, the test fails because
the SintelaBinaryV3 reader class isn't exported from
dascore/io/sintela_binary/__init__.py; open that __init__.py and add or extend
the module-level __all__ to include "SintelaBinaryV3" (or append it if __all__
already exists), save and rerun the integration check to confirm the class is
discoverable and the downstream checks (data_registry, pyproject entry point,
and core.py class attributes) pass.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dascore/utils/misc.py (1)

771-788: Fallback reads from current position, not the whole file — inconsistent with memmap semantics.

np.memmap(path) maps the entire file starting at offset 0. The fallback path currently reads from the current file position, which can silently yield truncated arrays if fid.tell() != 0. Reset to start, read all, then restore the position. Also consider catching OSError for mapping failures (eg, permissions).

Apply this diff:

 def maybe_mem_map(fid: IOBase, dtype="<u1") -> np.ndarray | np.memmap:
@@
-    try:
-        raw = np.memmap(fid.name, dtype=dtype, mode="r")
-    except (AttributeError, TypeError, ValueError):
-        # Fallback: read into memory
-        current = fid.tell()
-        raw = np.frombuffer(fid.read(), dtype=dtype)
-        fid.seek(current)
-    return raw
+    try:
+        return np.memmap(fid.name, dtype=dtype, mode="r")
+    except (AttributeError, TypeError, ValueError, OSError):
+        # Fallback: read the entire buffer and restore the original position.
+        cur = None
+        try:
+            cur = fid.tell()
+            fid.seek(0, os.SEEK_SET)
+        except Exception:
+            # Non-seekable streams: read whatever is available.
+            pass
+        out = np.frombuffer(fid.read(), dtype=dtype)
+        if cur is not None:
+            with contextlib.suppress(Exception):
+                fid.seek(cur, os.SEEK_SET)
+        return out
🧹 Nitpick comments (3)
dascore/utils/misc.py (1)

751-769: Clarify return type and harden size detection (fd-backed streams, SEEK constants).

  • Add a return annotation (int).
  • Use os.SEEK_* for readability/portability.
  • If fid.name is not a path-like (eg, some temp files expose an int fd), Path(path) will raise TypeError. Fall back to os.fstat(fid.fileno()).st_size.

Apply this diff:

-def get_buffer_size(fid: IOBase):
+def get_buffer_size(fid: IOBase) -> int:
@@
-    path = getattr(fid, "name", None)
-    if path is None:
-        cur = fid.tell()
-        fid.seek(0, 2)  # end
-        file_size = fid.tell()
-        fid.seek(cur, 0)
-    else:
-        file_size = Path(path).stat().st_size
-    return file_size
+    path = getattr(fid, "name", None)
+    if path is None:
+        cur = fid.tell()
+        fid.seek(0, os.SEEK_END)
+        file_size = fid.tell()
+        fid.seek(cur, os.SEEK_SET)
+        return file_size
+    # If `name` isn't a filesystem path (eg, int fd), fall back to fstat.
+    try:
+        return Path(path).stat().st_size
+    except TypeError:
+        return os.fstat(fid.fileno()).st_size
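
A quick usage sketch of the two helpers, assuming they land with the signatures shown in this PR (the BytesIO payload is arbitrary):

from io import BytesIO

import numpy as np

from dascore.utils.misc import get_buffer_size, maybe_mem_map

bio = BytesIO(b"\x00\x01\x02\x03")
assert get_buffer_size(bio) == 4  # byte count; file position is preserved
arr = maybe_mem_map(bio)  # BytesIO has no .name, so falls back to ndarray
assert isinstance(arr, np.ndarray) and not isinstance(arr, np.memmap)
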
tests/test_utils/test_misc.py (2)

357-362: Good type assertion for file-backed memmap.

Asserts np.memmap is returned for real files. Consider also asserting array.size > 0 to guard against empty mappings.


363-370: Add a test for non-zero file pointer to catch partial reads in fallback.

Currently, if fid.tell() != 0, the fallback path can return a truncated array. Add a test that writes without seeking back to 0 before calling maybe_mem_map. This will safeguard the intended semantics once the fix lands.

Proposed test addition:

 class TestMaybeMemMap:
@@
     def test_bytes_io(self):
         """Ensure non-files return arrays."""
         bio = BytesIO()
         bio.write(b"1234")
         bio.seek(0)
         array = maybe_mem_map(bio)
         assert isinstance(array, np.ndarray)
+    
+    def test_bytes_io_nonzero_position(self):
+        """Fallback should read entire buffer even if pointer is not at 0."""
+        bio = BytesIO()
+        bio.write(b"abcdef")
+        # Intentionally do not seek back to 0; pointer is at end.
+        arr_end_pos = maybe_mem_map(bio)
+        # Now reset to start and compare lengths; both should see full content.
+        bio.seek(0)
+        arr_full = maybe_mem_map(bio)
+        assert arr_end_pos.size == arr_full.size == 6
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 04d9bbf and 00e8833.

📒 Files selected for processing (4)
  • dascore/io/sintela_binary/core.py (1 hunks)
  • dascore/io/sintela_binary/utils.py (1 hunks)
  • dascore/utils/misc.py (2 hunks)
  • tests/test_utils/test_misc.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • dascore/io/sintela_binary/utils.py
  • dascore/io/sintela_binary/core.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_misc.py (2)
dascore/utils/misc.py (2)
  • get_buffer_size (751-768)
  • maybe_mem_map (771-787)
tests/conftest.py (1)
  • terra15_v5_path (207-209)
dascore/utils/misc.py (3)
tests/test_io/test_io_core.py (3)
  • read (46-47)
  • read (65-67)
  • read (89-92)
dascore/io/core.py (2)
  • read (469-478)
  • read (584-648)
dascore/io/pickle/core.py (1)
  • read (50-53)
🔇 Additional comments (3)
dascore/utils/misc.py (1)

14-14: Import of IOBase is appropriate for public APIs working with file-like objects.

Good call surfacing IOBase for type annotations on the new helpers.

tests/test_utils/test_misc.py (2)

334-343: Solid coverage for file-backed size detection.

Validates parity with filesystem stat; this anchors behavior for regular files.


344-352: Nice BytesIO coverage and pointer preservation check.

Confirms size is stable regardless of position; aligns with the helper’s contract.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 2515ea5 and 5c442db.

📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: LintCode
pyproject.toml

[error] 127-127: TOML parse error in pyproject.toml: invalid basic string at line 127, column 28.


[error] 127-127: ruff: TOML parse error in pyproject.toml: invalid basic string at line 127, column 28.


[error] 127-127: TOML decode error in pyproject.toml (fix-future-annotations): Illegal character newline at line 127, column 28.

pyproject.toml Outdated
Comment on lines 127 to 128
SILIXA_H5__V1 = "dascore.io
SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"

💡 Verification agent

🧩 Analysis chain

Fix broken TOML string for SILIXA entry point (parsing failure).

The value for SILIXA_H5__V1 is split across lines, creating an unterminated string and causing the TOML parse errors reported by CI. Keep the entry-point target on a single line and point it to the correct module:object.

Apply this minimal fix (adjust the right-hand side if your actual module/object differ):

-SILIXA_H5__V1 = "dascore.io
-SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"
+SILIXA_H5__V1 = "dascore.io.silixa_h5.core:SilixaH5V1"
+SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"

Run the following to (a) confirm the correct Silixa target, (b) ensure the Sintela class exists, and (c) validate the TOML parses:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for Silixa IO class candidates..."
rg -nP -C2 '(class\s+SilixaH5V1\b|class\s+SilixaH5\b|class\s+Silixa.*V1\b)' -- dascore

echo "Searching for SintelaBinaryV3 class..."
rg -nP -C2 'class\s+SintelaBinaryV3\b' -- dascore

echo "Checking for matching module paths..."
fd -a -t f 'silixa*' dascore/io || true
fd -a -t f 'sintela*' dascore/io || true

echo "Validating pyproject.toml parses..."
python - <<'PY'
import tomllib, sys
with open("pyproject.toml","rb") as f:
    tomllib.load(f)
print("pyproject.toml: TOML OK")
PY

Length of output: 2476


Fix broken TOML string and correct SILIXA entry-point path

The TOML parser is erroring on an unterminated string at line 127 because the SILIXA_H5__V1 value is split across lines. The entry-point target must be on a single line and match the actual module path (dascore.io.silixah5.core) and class name (SilixaH5V1) as defined in dascore/io/silixah5/core.py (line 26). The SINTELA_BINARY__V3 entry is already correct.

– File: pyproject.toml
– Lines: 127–128

-SILIXA_H5__V1 = "dascore.io
-SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"
+SILIXA_H5__V1       = "dascore.io.silixah5.core:SilixaH5V1"
+SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"

After applying, re-run the TOML validation to ensure the file parses cleanly:

python - <<'PY'
import tomllib
with open("pyproject.toml", "rb") as f:
    tomllib.load(f)
print("pyproject.toml: OK")
PY

🤖 Prompt for AI Agents
In pyproject.toml around lines 127 to 128, the TOML string for SILIXA_H5__V1 is
broken across lines and the entry-point path/class are incorrect; replace the
two-line value with a single-line correct entry pointing to
dascore.io.silixah5.core:SilixaH5V1 (e.g. SILIXA_H5__V1 =
"dascore.io.silixah5.core:SilixaH5V1"), leaving SINTELA_BINARY__V3 as-is, then
re-run the provided tomllib validation command to confirm the file parses
cleanly.

@d-chambers d-chambers merged commit 384a547 into master Aug 25, 2025
20 checks passed
@d-chambers d-chambers deleted the sintela_binary branch August 25, 2025 13:10
This was referenced Sep 26, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 11, 2025

Labels

IO: Work for reading/writing different formats

2 participants