add Sintela binary v3 support #516
Conversation
Walkthrough

Adds a Sintela binary v3 reader (package export, core reader, and utils), registers it as a fiber_io entry point, updates VALID_DATA_TYPES with "phase_difference" and "phase_rate", extends the data registry with a test file, and adds file-buffer/memmap utilities plus tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant IO as SintelaBinaryV3
    participant U as sintela_binary.utils
    Note over User,IO: Format detection (v3)
    User->>IO: get_format(resource)
    IO->>U: _read_base_header(resource)
    U-->>IO: base_header
    IO->>U: _get_complete_header(resource, base_header)
    U-->>IO: complete_header (validated v3)
    IO-->>User: ("Sintela_Binary","3") or False
```

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant IO as SintelaBinaryV3
    participant U as sintela_binary.utils
    participant DC as dascore
    Note over User,IO: Read flow (v3)
    User->>IO: read(resource, time?, distance?)
    IO->>U: _get_attrs_coords_header(resource, attr_cls)
    U-->>IO: (attrs, coord_mgr, header)
    IO->>U: _load_data(resource, header)
    U-->>IO: raw_data (memmap/ndarray)
    IO->>U: _get_patch(data, coords, attrs, slices?)
    U-->>IO: Patch
    IO->>DC: dc.spool(patch)
    DC-->>User: BaseSpool
```
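For orientation, a minimal usage sketch of the read flow above, assuming dascore's standard public spool API; the file path is hypothetical:

```python
import dascore as dc

# dc.spool() consults each registered FiberIO's get_format() to detect the
# file type, then dispatches to SintelaBinaryV3.read() as diagrammed above.
spool = dc.spool("sintela_binary_v3_test_1.raw")  # hypothetical local path
patch = spool[0]  # this reader yields a single-patch spool
print(patch.attrs.data_type)  # e.g. "phase_difference" or "phase_rate"
```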
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           master     #516   +/-   ##
========================================
  Coverage   99.87%   99.87%
========================================
  Files         118      121     +3
  Lines        9735     9874   +139
========================================
+ Hits         9723     9862   +139
  Misses         12       12
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dascore/io/sintela_binary/core.py (1)
69-81: read(): wrong keyword name prevents using SintelaPatchAttrs

Passing `attr_cls=` won't bind to `_get_patch`'s `attr_class` parameter, so the reader will produce generic PatchAttrs.

```diff
-    patch = _get_patch(
-        resource, time=time, distance=distance, attr_cls=SintelaPatchAttrs
-    )
+    patch = _get_patch(
+        resource, time=time, distance=distance, attr_class=SintelaPatchAttrs
+    )
```

Also update the docstring:

```diff
-    """Read a single file with Silixa H5 data inside."""
+    """Read a single Sintela v3 binary file and return a one-patch spool."""
```
🧹 Nitpick comments (12)
dascore/data_registry.txt (1)
33-33: Checksum verified – asset integrity confirmed

- The SHA256 of `sintela_binary_v3_test_1.raw` matches the declared `4c0afae1ab60b73ddcb3c4f7ac55b976d4d18d1ffc9ea6cd2dfa6a881b697101`.
- Consider adding a mirror URL alongside the primary GitHub link for improved availability.
- It's still recommended (as a CI enhancement) to verify the SHA256 locally on first download to guard against silent upstream changes.

File: dascore/data_registry.txt (line 33)
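Not part of the diff, but a minimal sketch of that local verification step using only the stdlib (the filename and expected hash come from the registry entry quoted above):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in 1 MiB chunks and return its hex SHA256."""
    digest = hashlib.sha256()
    with open(path, "rb") as fid:
        for chunk in iter(lambda: fid.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "4c0afae1ab60b73ddcb3c4f7ac55b976d4d18d1ffc9ea6cd2dfa6a881b697101"
assert sha256_of("sintela_binary_v3_test_1.raw") == expected
```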
dascore/constants.py (1)
66-68: New data types align with Sintela v3 – ensure consistent casing across readers

Adding "phase_difference" and "phase_rate" is great. Please confirm all producers/consumers use lowercase values; in the new Sintela utils, "Strain" and "Strain Rate" appear with capitalization, which will not match these constants.
Would you like me to generate a quick grep to find inconsistent data_type strings across the repo?
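For illustration, one way the utils could map header strings onto the registered constants (a sketch only; `_normalize_data_type` is a hypothetical helper, not part of this PR):

```python
def _normalize_data_type(raw: str) -> str:
    """Map header strings like 'Strain Rate' to lowercase snake_case."""
    return raw.strip().lower().replace(" ", "_")

assert _normalize_data_type("Strain Rate") == "strain_rate"
assert _normalize_data_type("Phase Difference") == "phase_difference"
```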
dascore/io/sintela_binary/utils.py (5)
102-106: Avoid shadowing imported name 'array'

Local variable `array` shadows the function `array` imported from `dascore.compat`. Rename for clarity.

```diff
 def _read_base_header(fid):
     """Return the first 3 elements of the sintela header."""
-    array = np.fromfile(fid, dtype=base_header_dtypes, count=1)
-    out = {x: y for x, y in zip(array.dtype.names, array[0])}
+    buf = np.fromfile(fid, dtype=base_header_dtypes, count=1)
+    out = {x: y for x, y in zip(buf.dtype.names, buf[0])}
     return out
```
120-131: Prefer explicit exceptions over asserts for header validation

Asserts may be stripped under optimization and are not user-friendly. Raise a clear exception if header sizes don't match.

```diff
-    expected_size = _HEADER_SIZES.get(version, -1)
-    assert header_size == expected_size
+    expected_size = _HEADER_SIZES.get(version, -1)
+    if header_size != expected_size:
+        raise ValueError(
+            f"Header size {header_size} does not match expected {expected_size} "
+            f"for version {version}."
+        )
```
141-149: Guard against invalid sample_rate and ensure timedelta conversion is supported

If `sample_rate` is 0/NaN or extremely small, this will raise/overflow. Also ensure `dc.to_timedelta64(float)` is supported.

```diff
-    timestep = dc.to_timedelta64(1 / header["sample_rate"])
+    sr = float(header["sample_rate"])
+    if not np.isfinite(sr) or sr <= 0:
+        raise ValueError(f"Invalid sample_rate: {sr}")
+    timestep = dc.to_timedelta64(1 / sr)
```

If `to_timedelta64` does not support floats in your environment, I can switch to `np.timedelta64(int(round(1e9/sr)), "ns")` to avoid dependency on that helper.
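For reference, a sketch of that nanosecond-based alternative (assumes sample_rate has already been validated as a positive finite float):

```python
import numpy as np

sr = 1000.0  # hypothetical validated sample rate in Hz
# Round the sample period to integer nanoseconds and build the timestep directly.
timestep = np.timedelta64(int(round(1e9 / sr)), "ns")  # 1_000_000 ns = 1 ms
```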
152-166: Confirm channel indexing convention (off-by-one risk)

Distance start is `start_channel * dx`. If `start_channel` is 1-based in Sintela v3, this will shift distance by one channel (it should be `(start_channel - 1) * dx`). Please confirm against the v3 spec or sample header; a sketch of the two conventions follows. I can adapt this function once we verify the convention.
203-211: Attr construction: let PatchAttrs derive dims from CoordManager

Minor: since you pass `cm` into attrs, you can drop the explicit dims injection earlier; PatchAttrs can derive dims from coords. This reduces duplication and mismatch risk. No code change required here if you apply the earlier change in `_get_attr_dict` to omit dims.
dascore/io/sintela_binary/core.py (3)
23-28: Correct class and module docstrings; fix copy/paste artifacts

"Patch Attributes for Silixa hdf5 format." and module references to Silixa/Terra15 are misleading here.

```diff
 class SintelaPatchAttrs(dc.PatchAttrs):
-    """Patch Attributes for Silixa hdf5 format."""
+    """Patch Attributes for Sintela binary format."""
```

I've included broader docstring fixes below for get_format/scan/read.
37-55: Update get_format docstring and parameter description

Docstrings still mention Silixa/Terra15.

```diff
 def get_format(self, resource: BinaryReader, **kwargs) -> tuple[str, str] | bool:
     """
-    Return name and version string if Silixa hdf5 else False.
+    Return (name, version) if this is a Sintela v3 binary file, else False.

     Parameters
     ----------
     resource
-        A path to the file which may contain terra15 data.
+        A path or binary stream to the candidate Sintela file.
     """
```
56-66: scan docstring mentions Silixa; also ensure resource offset reset

Minor docstring fix. The BinaryReader type-caster resets the offset, so this is fine functionally.

```diff
-    def scan(self, resource: BinaryReader, **kwargs) -> list[dc.PatchAttrs]:
-        """Scan a Silixa HDF5 file, return summary information on the contents."""
+    def scan(self, resource: BinaryReader, **kwargs) -> list[dc.PatchAttrs]:
+        """Scan a Sintela v3 binary file and return summary attributes."""
```

dascore/io/sintela_binary/__init__.py (2)
4-4: Explicitly export re-exported symbol to satisfy Ruff F401 and make API intent clear

`SintelaBinaryV3` is re-exported for public use, but Ruff flags it as "imported but unused". Declare `__all__` so the intent is explicit and the lint warning is silenced. Apply this diff:

```diff
 """
 Sintela binary reader.
 """
 from .core import SintelaBinaryV3
+
+__all__ = ["SintelaBinaryV3"]
```
1-3: Optional: clarify the docstring to mention the supported version

Minor clarity win since this package may eventually host multiple versions.

```diff
-"""
-Sintela binary reader.
-"""
+"""
+Sintela binary reader (version 3).
+"""
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- dascore/constants.py (1 hunks)
- dascore/data_registry.txt (1 hunks)
- dascore/io/sintela_binary/__init__.py (1 hunks)
- dascore/io/sintela_binary/core.py (1 hunks)
- dascore/io/sintela_binary/utils.py (1 hunks)
- pyproject.toml (1 hunks)
- tests/test_io/test_common_io.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
dascore/io/sintela_binary/__init__.py (1)
- dascore/io/sintela_binary/core.py (1)
  - SintelaBinaryV3 (30-80)

tests/test_io/test_common_io.py (1)
- dascore/io/sintela_binary/core.py (1)
  - SintelaBinaryV3 (30-80)

dascore/io/sintela_binary/core.py (4)
- dascore/io/core.py (1)
  - FiberIO (443-581)
- dascore/utils/io.py (1)
  - BinaryReader (33-51)
- dascore/io/sintela_binary/utils.py (3)
  - _get_attrs_coords_header (203-210)
  - _get_patch (213-224)
  - _read_base_header (102-106)
- dascore/core/attrs.py (2)
  - PatchAttrs (65-287)
  - get (174-179)

dascore/io/sintela_binary/utils.py (4)
- dascore/compat.py (1)
  - array (30-36)
- dascore/core/attrs.py (2)
  - PatchAttrs (65-287)
  - get (174-179)
- dascore/core/coordmanager.py (2)
  - get_coord_manager (1036-1124)
  - step (1019-1021)
- dascore/utils/time.py (1)
  - to_timedelta64 (141-173)
🪛 Ruff (0.12.2)
dascore/io/sintela_binary/__init__.py
4-4: .core.SintelaBinaryV3 imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
🔇 Additional comments (2)
pyproject.toml (1)
123-123: Entry point looks correct; ensure public re-export path exists

The SINTELA_BINARY__V3 mapping is good. Please confirm `dascore/io/sintela_binary/__init__.py` re-exports `SintelaBinaryV3` so `from dascore.io.sintela_binary import SintelaBinaryV3` works (tests rely on this pattern).
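As a sketch of why this wiring matters: fiber IO plugins are typically discovered through package entry points, roughly as below (the group name "dascore.fiber_io" is an assumption based on the pyproject section being edited, not confirmed from the source):

```python
from importlib.metadata import entry_points

# Load registered fiber IO plugins; SINTELA_BINARY__V3 should resolve to
# the SintelaBinaryV3 class if the entry point and re-export are correct.
for ep in entry_points(group="dascore.fiber_io"):
    if ep.name == "SINTELA_BINARY__V3":
        reader_cls = ep.load()
        assert reader_cls.__name__ == "SintelaBinaryV3"
```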
tests/test_io/test_common_io.py (1)

39-39: LGTM: importing from the package namespace keeps tests aligned with the public API

This import ensures tests use the stable, documented path rather than reaching into module internals. No changes needed.
```python
def _load_data(fid, header):
    """Use numpy's memmap to get array information."""
    header_size = header["header_size"]
    num_channels = header["num_channels"]
    num_samples = header["num_samples"]
    dtype = np.dtype(header["dtype"])
    packet_size = header_size + num_channels * num_samples * dtype.itemsize
    # Memory map entire file as raw bytes. We do this to later skip the headers
    fid.seek(0)
    raw = np.memmap(fid, dtype=np.uint8, mode="r")
    # Compute how many blocks
    block_count = raw.size // packet_size
    # Create a view that skips headers
    data = raw.reshape(block_count, packet_size)[:, header_size:]
    data = data.view(dtype).reshape(-1, num_channels)
    return data
```
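To make the packet arithmetic concrete, a worked example with hypothetical header values:

```python
import numpy as np

# Hypothetical header: 64-byte packet header, 100 channels x 10 samples of float32.
header = {"header_size": 64, "num_channels": 100, "num_samples": 10, "dtype": "<f4"}
itemsize = np.dtype(header["dtype"]).itemsize                        # 4 bytes
payload = header["num_channels"] * header["num_samples"] * itemsize  # 4000 bytes
packet_size = header["header_size"] + payload                        # 4064 bytes
# A 40640-byte file therefore holds 10 packets, each contributing 10 rows
# of 100 channels after the 64-byte header is sliced off.
```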
🛠️ Refactor suggestion
Memmap with file handles and block count consistency

- `np.memmap` expects a filename or file descriptor; passing a buffered handle can be platform-dependent. Using `fid.name` is safer; fall back to reading into memory for streams.
- Validate that `block_count` matches `header["num_packets"]` and raise if not.
```diff
 def _load_data(fid, header):
     """Use numpy's memmap to get array information."""
     header_size = header["header_size"]
     num_channels = header["num_channels"]
     num_samples = header["num_samples"]
     dtype = np.dtype(header["dtype"])
     packet_size = header_size + num_channels * num_samples * dtype.itemsize
-    # Memory map entire file as raw bytes. We do this to later skip the headers
-    fid.seek(0)
-    raw = np.memmap(fid, dtype=np.uint8, mode="r")
+    # Memory map entire file as raw bytes when possible (file on disk).
+    fid.seek(0)
+    try:
+        filename = fid.name  # may not exist for streams
+        raw = np.memmap(filename, dtype=np.uint8, mode="r")
+    except Exception:
+        # Fallback: read into memory
+        raw = np.frombuffer(fid.read(), dtype=np.uint8)
+        fid.seek(0)
     # Compute how many blocks
-    block_count = raw.size // packet_size
+    block_count, remainder = divmod(raw.size, packet_size)
+    if remainder:
+        raise ValueError(
+            f"Raw size {raw.size} not divisible by packet size {packet_size}; "
+            f"remainder {remainder}."
+        )
+    if "num_packets" in header and block_count != header["num_packets"]:
+        raise ValueError(
+            f"Header num_packets={header['num_packets']} does not match "
+            f"computed {block_count}."
+        )
     # Create a view that skips headers
     data = raw.reshape(block_count, packet_size)[:, header_size:]
     data = data.view(dtype).reshape(-1, num_channels)
     return data
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _load_data(fid, header):
    """Use numpy's memmap to get array information."""
    header_size = header["header_size"]
    num_channels = header["num_channels"]
    num_samples = header["num_samples"]
    dtype = np.dtype(header["dtype"])
    packet_size = header_size + num_channels * num_samples * dtype.itemsize
    # Memory map entire file as raw bytes when possible (file on disk).
    fid.seek(0)
    try:
        filename = fid.name  # may not exist for streams
        raw = np.memmap(filename, dtype=np.uint8, mode="r")
    except Exception:
        # Fallback: read into memory
        raw = np.frombuffer(fid.read(), dtype=np.uint8)
        fid.seek(0)
    # Compute how many blocks
    block_count, remainder = divmod(raw.size, packet_size)
    if remainder:
        raise ValueError(
            f"Raw size {raw.size} not divisible by packet size {packet_size}; "
            f"remainder {remainder}."
        )
    if "num_packets" in header and block_count != header["num_packets"]:
        raise ValueError(
            f"Header num_packets={header['num_packets']} does not match "
            f"computed {block_count}."
        )
    # Create a view that skips headers
    data = raw.reshape(block_count, packet_size)[:, header_size:]
    data = data.view(dtype).reshape(-1, num_channels)
    return data
```
| SintelaBinaryV3(): ("sintela_binary_v3_test_1.raw",), | ||
| GDR_V1(): ("gdr_1.h5",), |
💡 Verification agent
🧩 Analysis chain
Verify plumbing: registry entry and entry point exist for the new reader
The new test case depends on:
- data_registry entry for “sintela_binary_v3_test_1.raw”
- the `SINTELA_BINARY__V3` entry point wiring the reader
- expected class attributes in `core.py` (name/version/extensions)
Run the script below to confirm all integration points are present.
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail

echo "Check public export in __init__.py"
rg -nC1 -F '__all__ = ["SintelaBinaryV3"]' dascore/io/sintela_binary/__init__.py || {
  echo "Missing __all__ export for SintelaBinaryV3"; exit 1; }

echo "Check data registry contains the test asset"
rg -n '^sintela_binary_v3_test_1\.raw\b' dascore/data_registry.txt

echo "Check pyproject entry point wiring"
rg -nP 'SINTELA_BINARY__V3.*dascore\.io\.sintela_binary\.core:SintelaBinaryV3' pyproject.toml

echo "Check core class attributes"
rg -nC2 -P 'class\s+SintelaBinaryV3' dascore/io/sintela_binary/core.py
rg -nP 'name\s*=\s*"Sintela_Binary"' dascore/io/sintela_binary/core.py
rg -nP 'version\s*=\s*"3"' dascore/io/sintela_binary/core.py
rg -nP 'preferred_extensions\s*=\s*\("raw",\)' dascore/io/sintela_binary/core.py
echo "All good."
```

Length of output: 285
Add missing __all__ export for SintelaBinaryV3
The integration check failed immediately because SintelaBinaryV3 isn’t publicly exported. Without this, the reader won’t be discoverable, causing import errors in downstream code.
• File: dascore/io/sintela_binary/__init__.py
– Add or extend the module’s __all__ to include "SintelaBinaryV3"
```diff
 __all__ = [
     # …other exports…
+    "SintelaBinaryV3",
 ]
```

Once that's in place, please rerun the integration script to confirm the remaining points:
- `dascore/data_registry.txt` contains `sintela_binary_v3_test_1.raw`
- `pyproject.toml` has the `SINTELA_BINARY__V3 = dascore.io.sintela_binary.core:SintelaBinaryV3` entry point
- `core.py` defines `class SintelaBinaryV3` with the correct `name`, `version`, and `preferred_extensions` attributes
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    SintelaBinaryV3(): ("sintela_binary_v3_test_1.raw",),
    GDR_V1(): ("gdr_1.h5",),
```

```python
# File: dascore/io/sintela_binary/__init__.py
__all__ = [
    # …other exports…
    "SintelaBinaryV3",
]
```
🤖 Prompt for AI Agents
In tests/test_io/test_common_io.py around lines 59-60, the test fails because
the SintelaBinaryV3 reader class isn't exported from
dascore/io/sintela_binary/__init__.py; open that __init__.py and add or extend
the module-level __all__ to include "SintelaBinaryV3" (or append it if __all__
already exists), save and rerun the integration check to confirm the class is
discoverable and the downstream checks (data_registry, pyproject entry point,
and core.py class attributes) pass.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dascore/utils/misc.py (1)
771-788: Fallback reads from current position, not the whole file — inconsistent with memmap semantics.

np.memmap(path) maps the entire file starting at offset 0. The fallback path currently reads from the current file position, which can silently yield truncated arrays if fid.tell() != 0. Reset to start, read all, then restore the position. Also consider catching OSError for mapping failures (e.g., permissions).
Apply this diff:
```diff
 def maybe_mem_map(fid: IOBase, dtype="<u1") -> np.ndarray | np.memmap:
@@
-    try:
-        raw = np.memmap(fid.name, dtype=dtype, mode="r")
-    except (AttributeError, TypeError, ValueError):
-        # Fallback: read into memory
-        current = fid.tell()
-        raw = np.frombuffer(fid.read(), dtype=dtype)
-        fid.seek(current)
-    return raw
+    try:
+        return np.memmap(fid.name, dtype=dtype, mode="r")
+    except (AttributeError, TypeError, ValueError, OSError):
+        # Fallback: read the entire buffer and restore the original position.
+        cur = None
+        try:
+            cur = fid.tell()
+            fid.seek(0, os.SEEK_SET)
+        except Exception:
+            # Non-seekable streams: read whatever is available.
+            pass
+        out = np.frombuffer(fid.read(), dtype=dtype)
+        if cur is not None:
+            with contextlib.suppress(Exception):
+                fid.seek(cur, os.SEEK_SET)
+        return out
```
🧹 Nitpick comments (3)
dascore/utils/misc.py (1)
751-769: Clarify return type and harden size detection (fd-backed streams, SEEK constants).

- Add a return annotation (int).
- Use os.SEEK_* for readability/portability.
- If fid.name is not a path-like (e.g., some temp files expose an int fd), Path(path) will raise TypeError. Fall back to os.fstat(fid.fileno()).st_size.
Apply this diff:
```diff
-def get_buffer_size(fid: IOBase):
+def get_buffer_size(fid: IOBase) -> int:
@@
-    path = getattr(fid, "name", None)
-    if path is None:
-        cur = fid.tell()
-        fid.seek(0, 2)  # end
-        file_size = fid.tell()
-        fid.seek(cur, 0)
-    else:
-        file_size = Path(path).stat().st_size
-    return file_size
+    path = getattr(fid, "name", None)
+    if path is None:
+        cur = fid.tell()
+        fid.seek(0, os.SEEK_END)
+        file_size = fid.tell()
+        fid.seek(cur, os.SEEK_SET)
+        return file_size
+    # If `name` isn't a filesystem path (e.g., int fd), fall back to fstat.
+    try:
+        return Path(path).stat().st_size
+    except TypeError:
+        return os.fstat(fid.fileno()).st_size
```

tests/test_utils/test_misc.py (2)
357-362: Good type assertion for file-backed memmap.

Asserts np.memmap is returned for real files. Consider also asserting array.size > 0 to guard against empty mappings.
363-370: Add a test for non-zero file pointer to catch partial reads in fallback.

Currently, if fid.tell() != 0, the fallback path can return a truncated array. Add a test that writes without seeking back to 0 before calling maybe_mem_map. This will safeguard the intended semantics once the fix lands.

Proposed test addition:

```diff
 class TestMaybeMemMap:
@@
     def test_bytes_io(self):
         """Ensure non-files return arrays."""
         bio = BytesIO()
         bio.write(b"1234")
         bio.seek(0)
         array = maybe_mem_map(bio)
         assert isinstance(array, np.ndarray)
+
+    def test_bytes_io_nonzero_position(self):
+        """Fallback should read entire buffer even if pointer is not at 0."""
+        bio = BytesIO()
+        bio.write(b"abcdef")
+        # Intentionally do not seek back to 0; pointer is at end.
+        arr_end_pos = maybe_mem_map(bio)
+        # Now reset to start and compare lengths; both should see full content.
+        bio.seek(0)
+        arr_full = maybe_mem_map(bio)
+        assert arr_end_pos.size == arr_full.size == 6
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- dascore/io/sintela_binary/core.py (1 hunks)
- dascore/io/sintela_binary/utils.py (1 hunks)
- dascore/utils/misc.py (2 hunks)
- tests/test_utils/test_misc.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- dascore/io/sintela_binary/utils.py
- dascore/io/sintela_binary/core.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_misc.py (2)
- dascore/utils/misc.py (2)
  - get_buffer_size (751-768)
  - maybe_mem_map (771-787)
- tests/conftest.py (1)
  - terra15_v5_path (207-209)
dascore/utils/misc.py (3)
- tests/test_io/test_io_core.py (3)
  - read (46-47)
  - read (65-67)
  - read (89-92)
- dascore/io/core.py (2)
  - read (469-478)
  - read (584-648)
- dascore/io/pickle/core.py (1)
  - read (50-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.11)
🔇 Additional comments (3)
dascore/utils/misc.py (1)
14-14: Import of IOBase is appropriate for public APIs working with file-like objects.

Good call surfacing IOBase for type annotations on the new helpers.
tests/test_utils/test_misc.py (2)
334-343: Solid coverage for file-backed size detection.

Validates parity with filesystem stat; this anchors behavior for regular files.
344-352: Nice BytesIO coverage and pointer preservation check.

Confirms size is stable regardless of position; aligns with the helper's contract.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- pyproject.toml (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: LintCode
pyproject.toml
[error] 127-127: TOML parse error in pyproject.toml: invalid basic string at line 127, column 28.
[error] 127-127: ruff: TOML parse error in pyproject.toml: invalid basic string at line 127, column 28.
[error] 127-127: TOML decode error in pyproject.toml (fix-future-annotations): Illegal character newline at line 127, column 28.
pyproject.toml (Outdated)
| SILIXA_H5__V1 = "dascore.io | ||
| SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3" |
💡 Verification agent
🧩 Analysis chain
Fix broken TOML string for SILIXA entry point (parsing failure).
The value for SILIXA_H5__V1 is split across lines, creating an unterminated string and causing the TOML parse errors reported by CI. Keep the entry-point target on a single line and point it to the correct module:object.
Apply this minimal fix (adjust the right-hand side if your actual module/object differ):
-SILIXA_H5__V1 = "dascore.io
-SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"
+SILIXA_H5__V1 = "dascore.io.silixa_h5.core:SilixaH5V1"
+SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"Run the following to (a) confirm the correct Silixa target, (b) ensure the Sintela class exists, and (c) validate the TOML parses:
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail

echo "Searching for Silixa IO class candidates..."
rg -nP -C2 '(class\s+SilixaH5V1\b|class\s+SilixaH5\b|class\s+Silixa.*V1\b)' -- dascore

echo "Searching for SintelaBinaryV3 class..."
rg -nP -C2 'class\s+SintelaBinaryV3\b' -- dascore

echo "Checking for matching module paths..."
fd -a -t f 'silixa*' dascore/io || true
fd -a -t f 'sintela*' dascore/io || true

echo "Validating pyproject.toml parses..."
python - <<'PY'
import tomllib, sys
with open("pyproject.toml","rb") as f:
    tomllib.load(f)
print("pyproject.toml: TOML OK")
PY
```

Length of output: 2476
Fix broken TOML string and correct SILIXA entry-point path
The TOML parser is erroring on an unterminated string at line 127 because the SILIXA_H5__V1 value is split across lines. The entry-point target must be on a single line and match the actual module path (dascore.io.silixah5.core) and class name (SilixaH5V1) as defined in dascore/io/silixah5/core.py (line 26). The SINTELA_BINARY__V3 entry is already correct.
– File: pyproject.toml
– Lines: 127–128
-SILIXA_H5__V1 = "dascore.io
-SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"
+SILIXA_H5__V1 = "dascore.io.silixah5.core:SilixaH5V1"
+SINTELA_BINARY__V3 = "dascore.io.sintela_binary.core:SintelaBinaryV3"After applying, re-run the TOML validation to ensure the file parses cleanly:
```bash
python - <<'PY'
import tomllib
with open("pyproject.toml", "rb") as f:
    tomllib.load(f)
print("pyproject.toml: OK")
PY
```

🧰 Tools
🪛 GitHub Actions: LintCode
[error] 127-127: TOML parse error in pyproject.toml: invalid basic string at line 127, column 28.
[error] 127-127: ruff: TOML parse error in pyproject.toml: invalid basic string at line 127, column 28.
[error] 127-127: TOML decode error in pyproject.toml (fix-future-annotations): Illegal character newline at line 127, column 28.
🤖 Prompt for AI Agents
In pyproject.toml around lines 127 to 128, the TOML string for SILIXA_H5__V1 is
broken across lines and the entry-point path/class are incorrect; replace the
two-line value with a single-line correct entry pointing to
dascore.io.silixah5.core:SilixaH5V1 (e.g. SILIXA_H5__V1 =
"dascore.io.silixah5.core:SilixaH5V1"), leaving SINTELA_BINARY__V3 as-is, then
re-run the provided tomllib validation command to confirm the file parses
cleanly.
Description
This PR adds support for Sintela's binary format, version 3.
Checklist
I have (if applicable):
Summary by CodeRabbit
New Features
Chores
Tests