Prodml fbe #571
Conversation
Walkthrough: Adds empty-data guards across many IO readers to short-circuit when selections yield no data, and renames several internal helpers from `_get_patch` to `_get_patches`.
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
  Coverage   99.92%   99.92%
==========================================
  Files         127      128       +1
  Lines       10622    10878     +256
==========================================
+ Hits        10614    10870     +256
  Misses          8        8

☔ View full report in Codecov by Sentry.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dascore/io/prodml/utils.py (1)
141-156: Guard against single-element time arrays (division-by-zero risk). If `array_len == 1`, `(end - start) / (array_len - 1)` divides by zero. Fall back to using the time array directly.

```diff
-    step = (end - start) / (array_len - 1)
-    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
+    if array_len == 1:
+        time_coord = get_coord(data=np.asarray(time_array[:]), units="s")
+    else:
+        step = (end - start) / (array_len - 1)
+        time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
```
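The guarded computation can be sketched outside of dascore. Here `build_time_coord` is a hypothetical stand-in for the real `get_coord` call, meant only to show why the singleton branch is needed:

```python
import numpy as np

def build_time_coord(start, end, array_len):
    # Hypothetical stand-in for the guarded get_coord logic above.
    if array_len == 1:
        # One sample: no step can be derived, so use the lone timestamp.
        return np.array([start], dtype="datetime64[us]")
    step = (end - start) / (array_len - 1)
    # Evenly spaced timestamps covering [start, end].
    return start + np.arange(array_len) * step
```

Without the guard, calling this with `array_len=1` would raise a division error before any coordinate is built.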
🧹 Nitpick comments (4)
dascore/io/terra15/core.py (1)
73-74: Consider checking before patch creation for efficiency. The empty-data guard works correctly but checks `patch.data.size` after the patch has been created by `_read_terra15`. Other IO modules in this PR check before creating the patch, which is more efficient. If feasible, consider adding the empty-data check within `_read_terra15` or checking the data before patch construction.

dascore/io/sintela_binary/utils.py (1)
224-229: Move the empty-data check before Patch creation. The current implementation creates a Patch object before checking if the data is empty. This is inefficient because Patch creation can be expensive for memory-mapped data. Move the empty-data guard before the Patch instantiation to avoid unnecessary allocation.

Apply this diff:

```diff
     if time is not None or distance is not None:
         cm, data = cm.select(data, time=time, distance=distance)
-    patch = dc.Patch(data=array(data), coords=cm, attrs=patch_attrs)
     if not data.size:  # no data in this slice.
         return []
+    patch = dc.Patch(data=array(data), coords=cm, attrs=patch_attrs)
     return [patch]
```

dascore/io/dasdae/core.py (1)
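The saving from reordering the guard is easy to demonstrate with a counting stand-in for `dc.Patch` (a hypothetical class used only for illustration, assuming numpy):

```python
import numpy as np

class CountingPatch:
    """Hypothetical stand-in for dc.Patch that counts constructions."""
    constructed = 0

    def __init__(self, data):
        CountingPatch.constructed += 1
        self.data = np.asarray(data)

def get_patches(data, time_mask):
    data = data[time_mask]
    if not data.size:  # guard fires before any Patch is built
        return []
    return [CountingPatch(data)]
```

With the guard first, an out-of-range selection never pays for object construction at all.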
120-123: Document empty-patch guard rationale

- Add an inline comment above the `if not patch.data.size and kwargs` guard explaining that empty patches are preserved on full reads (for round-trip support) but skipped only when filters (kwargs) are provided.
- Add a test for `dc.read(path, time=…)` (or other selection kwargs) to assert that empty patches are correctly filtered out under selection.

dascore/io/prodml/utils.py (1)
264-269: Avoid eager loading of FBE data. Returning `node[:]` loads full arrays into memory. Return the dataset and let downstream selection handle slicing/laziness, matching raw behavior.

```diff
-    node = node_info.node
-    return node[:]
+    return node_info.node
```
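The lazy-versus-eager distinction can be illustrated with numpy's memmap as a rough analogy (the HDF5 node itself is not used here; this only sketches deferred versus materialized reads):

```python
import os
import tempfile
import numpy as np

# Write a small binary file, then open it without loading it into memory.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.arange(1000, dtype=np.float64).tofile(path)

lazy = np.memmap(path, dtype=np.float64, mode="r")  # like returning the dataset
eager = np.array(lazy)  # like node[:]: a full in-memory copy
```

Slicing `lazy` touches only the requested bytes, while `eager` has already paid the full read cost, which is exactly what the review suggests avoiding.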
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)

- dascore/data_registry.txt (1 hunks)
- dascore/io/ap_sensing/core.py (2 hunks)
- dascore/io/ap_sensing/utils.py (1 hunks)
- dascore/io/dasdae/core.py (1 hunks)
- dascore/io/dashdf5/core.py (1 hunks)
- dascore/io/febus/utils.py (3 hunks)
- dascore/io/gdr/core.py (1 hunks)
- dascore/io/h5simple/core.py (1 hunks)
- dascore/io/neubrex/core.py (2 hunks)
- dascore/io/optodas/utils.py (1 hunks)
- dascore/io/prodml/core.py (3 hunks)
- dascore/io/prodml/utils.py (6 hunks)
- dascore/io/segy/core.py (1 hunks)
- dascore/io/silixah5/core.py (2 hunks)
- dascore/io/silixah5/utils.py (1 hunks)
- dascore/io/sintela_binary/core.py (2 hunks)
- dascore/io/sintela_binary/utils.py (2 hunks)
- dascore/io/tdms/core.py (1 hunks)
- dascore/io/terra15/core.py (1 hunks)
- tests/test_io/test_common_io.py (1 hunks)
- tests/test_io/test_dasdae/test_dasdae.py (1 hunks)
🪛 Ruff (0.13.3)

dascore/io/prodml/utils.py

302-302: `zip()` without an explicit `strict=` parameter. Add an explicit value for parameter `strict=` (B905).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: Run benchmarks
🔇 Additional comments (21)
dascore/data_registry.txt (1)
34-34: LGTM! The new data registry entry for `prodml_fbe_1.h5` follows the established format and aligns with the PR's objective to add FBE support.

dascore/io/optodas/utils.py (1)

87-88: LGTM! The empty-data guard correctly prevents patch creation when selection yields no data, returning an empty list consistent with the function's return type.
dascore/io/gdr/core.py (1)
59-60: LGTM! The empty-data guard appropriately short-circuits processing when trimming results in no data, avoiding unnecessary patch creation.

dascore/io/h5simple/core.py (1)

41-42: LGTM with observation. The empty-data guard correctly prevents patch creation when trimming yields no data. Note that this implementation checks `new_cm.size` (coordinate manager size) rather than `new_data.size` (data array size) as seen in other readers. Both approaches should be equivalent when coordinates and data are properly aligned.

dascore/io/tdms/core.py (1)

62-63: LGTM! The empty-data guard appropriately short-circuits when coordinate selection yields no data, consistent with the pattern applied across other IO modules.
62-63: LGTM!The empty-data guard appropriately short-circuits when coordinate selection yields no data, consistent with the pattern applied across other IO modules.
dascore/io/segy/core.py (2)
50-51: LGTM!The empty-data guard correctly prevents patch creation when filtering yields no data.
54-54: LGTM!Explicitly wrapping the patch in a list (
[patch]) improves clarity and aligns with the pattern used in other IO modules.dascore/io/dashdf5/core.py (1)
71-72: LGTM! The empty-data guard appropriately returns an empty spool when coordinate selection yields no data, consistent with the unified pattern applied across IO modules in this PR.
dascore/io/neubrex/core.py (1)
74-75: LGTM! The empty-data guards are properly placed after data trimming and consistently return empty spools when no data remains. This aligns with the PR's objective to handle out-of-range selections gracefully across all IO readers.
Also applies to: 121-122
tests/test_io/test_dasdae/test_dasdae.py (1)
133-142: LGTM! Clear documentation of the breaking change. The xfail marker and explanatory comment properly document why empty patches can no longer round-trip. The note about this being an obscure edge case that won't affect real-world usage is helpful context.
dascore/io/sintela_binary/core.py (1)
18-18: LGTM! The import and call site updates correctly adapt to the renamed `_get_patches` function, which now returns a list instead of a single Patch.

Also applies to: 77-79
dascore/io/silixah5/utils.py (1)
104-117: LGTM! The refactoring correctly:

- Renames the function to `_get_patches`, reflecting the list return type
- Checks for empty data before creating the Patch object (efficient)
- Wraps the result in a list for consistent return type
- Follows the same pattern as other readers in this PR
dascore/io/silixah5/core.py (1)
14-14: LGTM! The changes correctly update the import and call site to use the refactored `_get_patches` function. The variable naming (`patches`) now accurately reflects that it's a list.

Also applies to: 65-68
tests/test_io/test_common_io.py (1)
353-365: LGTM! The test correctly verifies that reading with a time window entirely after the file's time range returns an empty spool. The logic is sound:
- Scans file to determine max time
- Adds a one-second offset to ensure the query is beyond available data
- Asserts empty spool is returned
This aligns with the PR's objective to handle out-of-range selections gracefully.
dascore/io/febus/utils.py (3)
231-237: Good early bail-out on empty time selection. Avoids min/max on empty arrays and unnecessary work. LGTM.

263-268: Graceful handling when distance is fully sliced out. Returning a 2D zero-length-distance array is consistent with empty-spool behavior. LGTM.

277-279: Skip creating patches for empty data. Correctly avoids empty patches. LGTM.
dascore/io/prodml/core.py (2)
54-57: Scan integration looks good. Iterating the generator and updating coords on attrs is correct. LGTM.

67-68: Read path aligned with new utils. Delegates to `_read_prodml` and spools. LGTM.
dascore/io/ap_sensing/utils.py (1)
98-99: Empty selection guard is correct. Returns `[]` when slicing removes all data. Matches other IO behaviors. LGTM.
dascore/io/ap_sensing/core.py (1)
64-66: Updated to `_get_patches`. Call site matches the new list-returning helper. LGTM.
CodSpeed Performance Report

Merging #571 will degrade performance by 16.27%.

This looks bad, but the real issue isn't that the code is 16% slower; rather, I added another test data file. Predictably, the benchmarks that scan all the test data files take longer when they have to scan more files. Running the benchmark again after removing the new file, the results on this branch and on master are effectively the same, so the performance change really isn't significant.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dascore/io/prodml/utils.py (1)
130-155: Single-sample time array: guard divide-by-zero. If the time array length is 1, computing the step with `(array_len - 1)` divides by zero. Handle the singleton case explicitly.

Apply this diff:

```diff
-    step = (end - start) / (array_len - 1)
-    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
+    if array_len == 1:
+        time_coord = get_coord(data=np.asarray(time_array[:]).astype("datetime64[us]"))
+        return time_coord
+    step = (end - start) / (array_len - 1)
+    time_coord = get_coord(start=start, stop=end + step, step=step, units="s")
```
♻️ Duplicate comments (2)
dascore/io/prodml/utils.py (2)
218-229: Raw units source: read from the RawData dataset, not the group. RawDataUnit typically lives on the RawData dataset. Use dataset attrs (with group as fallback) to populate units.

```diff
 def _get_raw_node_attr_coords(node_info, d_coord, base_info):
     """Get the raw data information."""
     info = dict(base_info)
     t_coord = _get_time_coord(node_info.node)
-    info.update(_get_data_unit_and_type(node_info.node))
+    # Get type from group; units from dataset (fallback to group).
+    info.update(_get_data_unit_and_type(node_info.node))
+    info.update(_get_data_unit_and_type(node_info.node["RawData"]))
     coords = dc.get_coord_manager(
         coords={"time": t_coord, "distance": d_coord},
         dims=_get_dims_from_attrs(node_info.node["RawData"].attrs),
     )
     return ProdMLRawPatchAttrs(**info), coords
```

Based on learnings
231-252: FBE distance slice: ensure integer indices. Cast StartLocusIndex and NumberOfLoci to int before slicing. This prevents type issues when attrs are bytes/strings or numpy scalars.

```diff
 def _get_processed_node_attr_coords(node_info, d_coord, base_info):
@@
-    start_ind = node_info.parent_node.attrs.get("StartLocusIndex", 0)
-    total_inds = node_info.parent_node.attrs["NumberOfLoci"]
+    start_ind = int(unbyte(node_info.parent_node.attrs.get("StartLocusIndex", 0)))
+    total_inds = int(unbyte(node_info.parent_node.attrs["NumberOfLoci"]))
@@
     distance = d_coord[start_ind : start_ind + total_inds]
```
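Why the cast matters: slice arithmetic works with native ints but not with byte-string attrs. A quick check, assuming attrs may arrive as numpy scalars or bytes:

```python
import numpy as np

d_coord = list(range(10))
start_ind = np.int64(2)   # numpy scalar, as HDF5 attrs often arrive
total = b"3"              # byte string, as some writers store attrs

# start_ind + total would raise TypeError; casting first is safe.
start = int(start_ind)
n = int(total)            # int() accepts bytes digit strings directly
sliced = d_coord[start : start + n]
```

Normalizing to `int` up front makes the slice expression robust regardless of how the file's writer stored the attribute.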
🧹 Nitpick comments (3)
dascore/io/prodml/utils.py (3)
254-269: Avoid eager read of FBE data; keep it lazy and consistent with raw. Return the dataset instead of materializing it; selection will slice as needed.

```diff
 @register_func(_NODE_DATA_PROCESSORS, key="fbe")
 def _get_fbe_data(node_info):
     """Get the data from a fbe node."""
-    node = node_info.node
-    return node[:]
+    return node_info.node
```
285-296: Normalize dim names: include a 'Distance' mapping. Some files use 'Distance' in Dimensions; map it to 'distance' for consistency.

```diff
 def _get_dims_from_attrs(attrs):
@@
-    map_ = {"locus": "distance", "Locus": "distance", "Time": "time"}
+    map_ = {"locus": "distance", "Locus": "distance", "Distance": "distance", "Time": "time"}
```
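A runnable sketch of the suggested mapping; `normalize_dims` is a hypothetical helper (the real function also reads the `Dimensions` attribute from the HDF5 node):

```python
def normalize_dims(raw_dims):
    # Hypothetical helper mirroring the suggested map_ above;
    # unknown names fall back to a lowercased form.
    map_ = {
        "locus": "distance",
        "Locus": "distance",
        "Distance": "distance",
        "Time": "time",
    }
    return tuple(map_.get(d, d.lower()) for d in raw_dims)
```

This keeps downstream code working with a single canonical `("time", "distance")` pair no matter which spelling the writer used.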
299-306: Good: strict zip added; consider a single pass to avoid double traversal. `zip(strict=True)` is correct. To avoid scanning nodes twice, iterate once and compute attrs/data per node.

Example refactor:

```python
def _read_prodml(fi, distance=None, time=None):
    out = []
    acq = fi["Acquisition"]
    base_info = maybe_get_items(acq.attrs, _ROOT_ATTRS)
    d_coord = _get_distance_coord(acq)
    for info in _yield_data_nodes(fi):
        attr_func = _NODE_ATTRS_PROCESSORS[info.patch_type]
        data_func = _NODE_DATA_PROCESSORS[info.patch_type]
        attrs, cm = attr_func(info, d_coord, base_info)
        data = data_func(info)
        if time is not None or distance is not None:
            cm, data = cm.select(array=data, time=time, distance=distance)
        if data.size:
            out.append(dc.Patch(data=data, attrs=attrs, coords=cm))
    return out
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- dascore/io/dasdae/core.py (2 hunks)
- dascore/io/dasdae/utils.py (1 hunks)
- dascore/io/prodml/utils.py (6 hunks)
- dascore/io/sintela_binary/utils.py (2 hunks)
- tests/test_io/test_dasdae/test_dasdae.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/test_io/test_dasdae/test_dasdae.py
🔇 Additional comments (6)
dascore/io/sintela_binary/utils.py (3)
216-218: LGTM! Signature aligns with the broader refactoring. The function rename to `_get_patches` (plural) and the addition of `time` and `distance` parameters are consistent with similar changes across other IO modules (e.g., `ap_sensing/utils.py`, `silixah5/utils.py`). This standardizes the API for selective data loading.

224-227: LGTM! Early return prevents empty patches efficiently. The empty-data guard correctly returns an empty list when slicing yields no data, avoiding unnecessary Patch creation and memory allocation. This placement (after slicing the memmap, before loading into memory) maintains optimal memory efficiency.

229-229: All references to `_get_patch` have been removed; call sites now use `_get_patches`, returning lists.

dascore/io/dasdae/utils.py (1)
186-192: LGTM! Clean helper for filtering selection kwargs. The implementation correctly identifies effectively empty kwargs by filtering out file metadata keys and None values. The explicit `bool()` conversion makes the emptiness check clear and readable.

dascore/io/dasdae/core.py (2)

24-24: LGTM! Import of the helper function. The import is correctly placed with other utility imports.

121-124: LGTM! Correctly filters empty patches from selection results. The logic properly skips patches with empty data when selection criteria were applied (non-empty kwargs), while preserving empty patches when no selection was performed. This prevents returning empty patches from unsuccessful band selections while maintaining backward compatibility for direct reads. The condition `not patch.data.size and not _kwargs_empty(kwargs)` correctly captures the intended behavior: skip only when both the data is empty AND meaningful selection criteria were provided.
The mapping under review in dascore/io/prodml/utils.py:

```python
_FBE_PARENT_ATTRS = {
    "RawReference": "raw_reference",
    "TransformSize": "transform_size",
    "TransformType": "transform_type",
    "WindowFunction": "window_function",
    "WindowOverlap": "window_overlap",
    "WindowSize": "window_size",
}
```
Field mismatch: map WindowFunction to window_type

ProdMLFbePatchAttrs defines `window_type`, but the mapping uses `window_function`. This leaves `window_type` unset.

```diff
 _FBE_PARENT_ATTRS = {
     "RawReference": "raw_reference",
     "TransformSize": "transform_size",
     "TransformType": "transform_type",
-    "WindowFunction": "window_function",
+    "WindowFunction": "window_type",
     "WindowOverlap": "window_overlap",
     "WindowSize": "window_size",
 }
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dascore/io/prodml/utils.py (1)
134-143: Fix divide-by-zero when there is only one time sample. `array_len == 1` makes `(array_len - 1)` zero; guard and return a direct coord.

Apply this diff:

```diff
     array_len = len(time_array)
     assert array_len > 0, "Missing time array in ProdML file."
-    time_attrs = time_array.attrs
+    # Special case: a single timestamp -> return array-based coord.
+    if array_len == 1:
+        time_arr = time_array[:].astype("datetime64[us]")
+        return get_coord(data=time_arr)
+    time_attrs = time_array.attrs
     start_str = unbyte(time_attrs["PartStartTime"]).split("+")[0]
     start = dc.to_datetime64(start_str.rstrip("Z"))
     end_str = unbyte(time_attrs["PartEndTime"]).split("+")[0]
     end = dc.to_datetime64(end_str.rstrip("Z"))
     step = (end - start) / (array_len - 1)
```
♻️ Duplicate comments (2)
dascore/io/prodml/utils.py (2)
218-229: Read RawDataUnit from the RawData dataset, not the group. Many ProdML files store units on the RawData dataset. Keep `data_type` from the group; fetch units from the dataset.

Apply this diff:

```diff
 def _get_raw_node_attr_coords(node_info, d_coord, base_info):
     """Get the raw data information."""
     info = dict(base_info)
     t_coord = _get_time_coord(node_info.node)
-    info.update(_get_data_unit_and_type(node_info.node))
+    # data_type from group (eg RawDescription), units from dataset (RawDataUnit)
+    info.update(_get_data_unit_and_type(node_info.node))
+    info.update(_get_data_unit_and_type(node_info.node["RawData"]))
     coords = dc.get_coord_manager(
         coords={"time": t_coord, "distance": d_coord},
         dims=_get_dims_from_attrs(node_info.node["RawData"].attrs),
     )
     return ProdMLRawPatchAttrs(**info), coords
```
231-252: Cast StartLocusIndex/NumberOfLoci to int for the FBE distance slice. Avoids byte/str/NumPy-scalar issues in slicing; the previous review noted this.

Apply this diff:

```diff
-    start_ind = node_info.parent_node.attrs.get("StartLocusIndex", 0)
-    total_inds = node_info.parent_node.attrs["NumberOfLoci"]
+    start_ind = int(node_info.parent_node.attrs.get("StartLocusIndex", 0))
+    total_inds = int(node_info.parent_node.attrs["NumberOfLoci"])
     # Put the coords back together.
     distance = d_coord[start_ind : start_ind + total_inds]
```

Also verify WindowOverlap=0 is preserved: `maybe_get_items` skips falsy values. If needed, set it explicitly:

```diff
-    out.update(maybe_get_items(node_info.parent_node.attrs, _FBE_PARENT_ATTRS))
+    out.update(maybe_get_items(node_info.parent_node.attrs, _FBE_PARENT_ATTRS))
+    if "WindowOverlap" in node_info.parent_node.attrs:
+        out["window_overlap"] = int(node_info.parent_node.attrs["WindowOverlap"])
```
🧹 Nitpick comments (4)
dascore/io/dasdae/utils.py (1)
186-193: LGTM! Clean implementation of empty kwargs detection. The logic correctly distinguishes between metadata keys and actual selection criteria, which aligns well with the empty-data guard pattern described in the PR summary.
Consider these optional enhancements:
- Add type hints for clarity:

```diff
-def _kwargs_empty(kwargs):
+def _kwargs_empty(kwargs: dict) -> bool:
     """Determine if the keyword arguments are effectively empty."""
```
- Extract the metadata keys as a module-level constant for reusability:

```python
# Near the top of the file
_METADATA_KEYS = {"file_version", "file_format", "path"}

def _kwargs_empty(kwargs: dict) -> bool:
    """Determine if the keyword arguments are effectively empty."""
    out = {i: v for i, v in kwargs.items() if v is not None and i not in _METADATA_KEYS}
    return not bool(out)
```

dascore/io/sintela_binary/utils.py (1)
216-218: Unused `**kwargs` parameter. The `**kwargs` parameter is declared but never used in the function body. If it's intended for future extensibility or API consistency with similar functions in the codebase, consider adding a comment to document this intention.

dascore/io/prodml/utils.py (2)
119-125: HDF5 attr types: cast to native ints/floats to avoid surprises. Attrs can be NumPy scalars/bytes; explicit casts make math/slicing robust.

Apply this diff:

```diff
-    step = attrs["SpatialSamplingInterval"]
-    num_dist_channels = attrs["NumberOfLoci"]
-    start = attrs["StartLocusIndex"] * step
+    step = float(attrs["SpatialSamplingInterval"])
+    num_dist_channels = int(attrs["NumberOfLoci"])
+    start_ind = int(attrs.get("StartLocusIndex", 0))
+    start = start_ind * step
     stop = start + num_dist_channels * step
```
298-309: Avoid double traversal of HDF5; compute attrs and data in one pass. The current zip() walks `_yield_data_nodes` twice. A single pass reduces IO and complexity.

Apply this diff:

```diff
 def _read_prodml(fi, distance=None, time=None):
     """Read the prodml values into a patch."""
     out = []
-    iterator = zip(_yield_prodml_attrs_coords(fi), _yield_data_nodes(fi), strict=True)
-    for (attrs, cm), info in iterator:
-        data_func = _NODE_DATA_PROCESSORS[info.patch_type]
-        data = data_func(info)
+    acq = fi["Acquisition"]
+    base_info = maybe_get_items(acq.attrs, _ROOT_ATTRS)
+    d_coord = _get_distance_coord(acq)
+    for info in _yield_data_nodes(fi):
+        attr_func = _NODE_ATTRS_PROCESSORS[info.patch_type]
+        attrs, cm = attr_func(info, d_coord, base_info)
+        data = _NODE_DATA_PROCESSORS[info.patch_type](info)
         if time is not None or distance is not None:
             cm, data = cm.select(array=data, time=time, distance=distance)
         if data.size:
             out.append(dc.Patch(data=data, attrs=attrs, coords=cm))
     return out
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
dascore/io/dasdae/core.py(2 hunks)dascore/io/dasdae/utils.py(1 hunks)dascore/io/prodml/utils.py(6 hunks)dascore/io/sintela_binary/utils.py(2 hunks)tests/test_io/test_dasdae/test_dasdae.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test_io/test_dasdae/test_dasdae.py
- dascore/io/dasdae/core.py
🔇 Additional comments (2)
dascore/io/sintela_binary/utils.py (2)
220-228: Verify coordinate consistency between attrs and the explicit coords parameter. The `patch_attrs` object is created at line 220 before any slicing occurs. When slicing is applied at line 225, `cm` is updated to the sliced coordinate manager, but `patch_attrs` retains the original (unsliced) coordinates. At line 228, the Patch is constructed with both `coords=cm` (sliced) and `attrs=patch_attrs` (containing the original coords).

This differs from the pattern in `ap_sensing/utils.py` (lines 89-100) and `silixah5/utils.py` (lines 103-116), where `attrs["coords"]` is explicitly updated after slicing, and then the attrs object is validated. Please verify that the Patch constructor correctly handles this scenario, ensuring that the explicit `coords` parameter takes precedence and that `patch.attrs.coords` is synchronized with `patch.coords` to avoid inconsistencies.

Consider refactoring to match the pattern in similar functions by keeping attrs as a dict, updating `attrs["coords"]` after slicing, and then validating:

```diff
 def _get_patches(
     resource, attr_class=PatchAttrs, extras=None, time=None, distance=None, **kwargs
 ):
     """Get the patch from the sintela file."""
-    patch_attrs, cm, header = _get_attrs_coords_header(resource, attr_class, extras)
+    _, cm, header = _get_attrs_coords_header(resource, attr_class, extras)
+    attrs = _get_attr_dict(header, extras)
+    attrs["coords"] = cm
     data = _load_data(resource, header)
     # Apply slicing if needed. This is done before patch creation so a memmap
     # can be sliced before loading all data into memory.
     if time is not None or distance is not None:
         cm, data = cm.select(data, time=time, distance=distance)
+        attrs["coords"] = cm
     if not data.size:  # no data in this slice.
         return []
+    patch_attrs = attr_class.model_validate(attrs)
     patch = dc.Patch(data=array(data), coords=cm, attrs=patch_attrs)
     return [patch]
```
226-229: LGTM: Empty data handling and list-based return.

The empty data check at lines 226-227 correctly returns an empty list when the sliced data has no elements, avoiding unnecessary patch creation. The return type change at line 229 (wrapping the patch in a list) aligns with the broader repository refactoring to standardize patch retrieval across multiple IO modules, as seen in similar functions in ap_sensing/utils.py and silixah5/utils.py.
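The empty-data guard pattern discussed above can be sketched in isolation. The `get_patches` function below is a hypothetical, simplified reader (not DASCore's actual implementation) showing how an out-of-range selection short-circuits into an empty list:

```python
import numpy as np


def get_patches(data, time_slice=None):
    """Return a list of patch-like dicts, or an empty list when the
    selection yields no samples (hypothetical simplified reader)."""
    if time_slice is not None:
        data = data[time_slice]
    if not data.size:
        # Short-circuit: out-of-bounds selections produce an empty list,
        # which callers turn into an empty spool.
        return []
    return [{"data": np.asarray(data)}]
```

Callers that concatenate reader output into a spool then get an empty spool for free when every reader returns `[]`.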
_FBE_PARENT_ATTRS = {
    "RawReference": "raw_reference",
    "TransformSize": "transform_size",
    "TransformType": "transform_type",
    "WindowFunction": "window_function",
    "WindowOverlap": "window_overlap",
    "WindowSize": "window_size",
}
window_type never populated (mapping mismatch).
_FBE_PARENT_ATTRS maps WindowFunction to window_function, but the attrs model defines window_type, so the value is lost.
Apply this diff:
_FBE_PARENT_ATTRS = {
"RawReference": "raw_reference",
"TransformSize": "transform_size",
"TransformType": "transform_type",
- "WindowFunction": "window_function",
+ "WindowFunction": "window_type",
"WindowOverlap": "window_overlap",
"WindowSize": "window_size",
}

Optionally, add a field for transform_type in ProdMLFbePatchAttrs if you want it validated instead of becoming an extra.
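To illustrate why the right-hand side of each entry must match the model field name, here is a minimal stand-in for the mapping-driven extraction. `map_attrs` is a hypothetical simplification of DASCore's `maybe_get_items`, not the real helper:

```python
def map_attrs(source_attrs, mapping):
    """Pull keys named in ``mapping`` from a raw attrs dict, renaming
    them to model field names; keys absent from the source are skipped."""
    return {new: source_attrs[old] for old, new in mapping.items() if old in source_attrs}


# The right-hand values must match the attrs-model field names exactly;
# otherwise the value silently becomes an extra or is dropped.
fbe_mapping = {
    "WindowFunction": "window_type",
    "WindowSize": "window_size",
}

raw = {"WindowFunction": "hann", "WindowSize": 256, "Unrelated": 1}
mapped = map_attrs(raw, fbe_mapping)
```

With the corrected mapping, `mapped` carries `window_type` rather than a `window_function` key the model never declared.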
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  _FBE_PARENT_ATTRS = {
      "RawReference": "raw_reference",
      "TransformSize": "transform_size",
      "TransformType": "transform_type",
-     "WindowFunction": "window_function",
+     "WindowFunction": "window_type",
      "WindowOverlap": "window_overlap",
      "WindowSize": "window_size",
  }
🤖 Prompt for AI Agents
In dascore/io/prodml/utils.py around lines 48 to 55, the mapping uses
"WindowFunction": "window_function" which doesn't match the attrs model that
defines window_type, so the window type value is dropped; update the mapping to
"WindowFunction": "window_type" so the attribute is populated correctly, and
optionally add a transform_type field to the ProdMLFbePatchAttrs model if you
want transform_type validated rather than treated as an extra.
Actionable comments posted: 0
♻️ Duplicate comments (2)
dascore/io/prodml/utils.py (2)
224-229: Still missing raw data units (read from the RawData dataset).

Line 225 pulls units from the group, but ProdML files typically store RawDataUnit on the RawData dataset. That leaves data_units empty for raw patches. Please fetch units from node_info.node["RawData"].attrs, falling back to the group only if absent.

Apply this diff:

- info.update(_get_data_unit_and_type(node_info.node))
+ raw_dataset = node_info.node["RawData"]
+ info.update(_get_data_unit_and_type(raw_dataset))
+ if "data_units" not in info:
+     info.update(_get_data_unit_and_type(node_info.node))
245-248: Convert FBE indices to integers before slicing.
StartLocusIndex/NumberOfLoci often arrive as byte strings or zero-d arrays. Using them directly in distance[start_ind:...] raises TypeError. Decode/collapse first (e.g., int(unbyte(...)) and int(np.asarray(...).item())) before slicing.

Apply this diff:

- start_ind = node_info.parent_node.attrs.get("StartLocusIndex", 0)
- total_inds = node_info.parent_node.attrs["NumberOfLoci"]
+ start_raw = node_info.parent_node.attrs.get("StartLocusIndex", 0)
+ total_raw = node_info.parent_node.attrs["NumberOfLoci"]
+ start_ind = int(np.asarray(unbyte(start_raw)) if isinstance(start_raw, (bytes, np.bytes_)) else np.asarray(start_raw).item())
+ total_inds = int(np.asarray(unbyte(total_raw)) if isinstance(total_raw, (bytes, np.bytes_)) else np.asarray(total_raw).item())
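A helper like the following (hypothetical, not part of DASCore) shows the normalization the diff performs, collapsing the scalar shapes HDF5 attributes commonly take into a plain Python int:

```python
import numpy as np


def to_int(value):
    """Normalize an HDF5-style attribute scalar (bytes, numpy bytes_,
    0-d array, or plain int) into a python int."""
    if isinstance(value, (bytes, np.bytes_)):
        # HDF5 string attributes often surface as raw bytes.
        value = value.decode("utf-8")
    # np.asarray(...).item() collapses 0-d arrays and numpy scalars.
    return int(np.asarray(value).item())
```

Once normalized, the value can be used safely as a slice index.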
🧹 Nitpick comments (1)
dascore/io/dasdae/utils.py (1)
190-197: LGTM! Consider adding a type hint.

The function logic is correct and handles all edge cases properly. The approach of filtering kwargs before checking emptiness is clear and maintainable.

Optionally, you could add a type hint for the kwargs parameter to improve type safety:

-def _kwargs_empty(kwargs) -> bool:
+def _kwargs_empty(kwargs: dict) -> bool:

Or, if you want to be more specific:

from typing import Any

def _kwargs_empty(kwargs: dict[str, Any]) -> bool:
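A self-contained sketch of the suggested signature follows; the filter set and behavior are assumptions based on the discussion, not the exact DASCore helper:

```python
from typing import Any

# Hypothetical set of keyword names that select data; mirrors the
# constant the reader module defines for this purpose.
_LOAD_KWARGS = {"time", "distance"}


def kwargs_empty(kwargs: dict[str, Any]) -> bool:
    """Return True when none of the data-selecting keywords are present."""
    return not (set(kwargs) & _LOAD_KWARGS)
```

The specific hint `dict[str, Any]` documents that only string keys are expected while leaving values unconstrained.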
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dascore/io/dasdae/utils.py (2 hunks)
dascore/io/prodml/utils.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dascore/io/prodml/utils.py (4)
dascore/core/coordmanager.py (5)
get_coord (1039-1058), step (1068-1070), update (240-308), get_coord_manager (1127-1215), select (594-636)
dascore/core/coords.py (9)
  get_coord (1515-1703), update (738-751), update (965-967), data (606-608), select (323-329), select (997-1017), select (1149-1174), select (1320-1346), select (1431-1456)
dascore/utils/misc.py (4)
  iterate (257-269), maybe_get_items (438-460), register_func (35-58), unbyte (370-374)
dascore/core/attrs.py (4)
  PatchAttrs (65-301), items (181-183), get (174-179), update (234-255)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: Run benchmarks
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
🔇 Additional comments (1)
dascore/io/dasdae/utils.py (1)
15-16: LGTM! The constant is well-defined with a clear purpose. Using a set is appropriate for efficient membership testing.
Actionable comments posted: 0
♻️ Duplicate comments (2)
dascore/io/prodml/utils.py (2)
74-85: Rename window_function to window_type to match the expected field name.

Past reviews flagged that WindowFunction should map to window_type, not window_function. The class definition should use window_type to align with this expectation.

Apply this diff:

 class ProdMLFbePatchAttrs(ProdMLRawPatchAttrs):
     """Patch attrs for fbe (frequency band extracted) data in Prodml."""

     raw_reference: UTF8Str = ""
     transform_size: float = np.nan
     transform_type: UTF8Str = ""
     window_size: int | None = None
-    window_function: UTF8Str = ""
+    window_type: UTF8Str = ""
     window_overlap: int | None = None
     start_frequency: float = 0
     end_frequency: float = np.inf

Also update the mapping at line 52:

 _FBE_PARENT_ATTRS = {
     "RawReference": "raw_reference",
     "TransformSize": "transform_size",
     "TransformType": "transform_type",
-    "WindowFunction": "window_function",
+    "WindowFunction": "window_type",
     "WindowOverlap": "window_overlap",
     "WindowSize": "window_size",
 }
220-231: Read units from the RawData dataset for consistency.

Line 225 reads units from the group, but line 228 reads dimensions from the RawData dataset. For consistency and correctness (as flagged in past reviews), units should also be read from the dataset.

Apply this diff:

 @register_func(_NODE_ATTRS_PROCESSORS, key="raw")
 def _get_raw_node_attr_coords(node_info, d_coord, base_info):
     """Get the raw data information."""
     info = dict(base_info)
     t_coord = _get_time_coord(node_info.node)
     info.update(_get_data_unit_and_type(node_info.node))
+    # Override with dataset-level attributes if present
+    info.update(_get_data_unit_and_type(node_info.node["RawData"]))
     coords = dc.get_coord_manager(
         coords={"time": t_coord, "distance": d_coord},
         dims=_get_dims_from_attrs(node_info.node["RawData"].attrs),
     )
     return ProdMLRawPatchAttrs(**info), coords
🧹 Nitpick comments (2)
dascore/io/prodml/utils.py (2)
245-257: Simplify the type conversion logic.

The current type conversion for start_ind and total_inds is unnecessarily verbose. Since unbyte returns a string for bytes, you can simplify the conversion.

Apply this diff:

-    start_ind = int(
-        np.asarray(unbyte(start_raw))
-        if isinstance(start_raw, bytes | np.bytes_)
-        else np.asarray(start_raw).item()
-    )
-    total_inds = int(
-        np.asarray(unbyte(total_raw))
-        if isinstance(total_raw, bytes | np.bytes_)
-        else np.asarray(total_raw).item()
-    )
+    start_ind = int(unbyte(start_raw) if isinstance(start_raw, bytes | np.bytes_) else start_raw)
+    total_inds = int(unbyte(total_raw) if isinstance(total_raw, bytes | np.bytes_) else total_raw)
266-274: Consider replacing assert with an explicit exception.

Line 271 uses assert for validation, which can be disabled with Python's -O flag. For production code, prefer raising an explicit exception.

 @register_func(_NODE_DATA_PROCESSORS, key="raw")
 def _get_raw_data(node_info):
     """Get the data from a raw node."""
     node = node_info.node
     data_node = [x for x in list(node) if x.lower().endswith("data")]
-    assert len(data_node) == 1, "more than one data node found."
+    if len(data_node) != 1:
+        msg = f"Expected exactly one data node, found {len(data_node)} in {node_info.name}"
+        raise ValueError(msg)
     data = node[data_node[0]]
     return data
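The pattern runs like this in isolation; `get_single_data_node` is a hypothetical stand-alone version of the lookup, not DASCore's function:

```python
def get_single_data_node(node_names, group_name="Raw[0]"):
    """Return the unique child name ending in 'data', raising ValueError
    (rather than using assert, which python -O strips) when the
    invariant is violated."""
    matches = [n for n in node_names if n.lower().endswith("data")]
    if len(matches) != 1:
        msg = f"Expected exactly one data node, found {len(matches)} in {group_name}"
        raise ValueError(msg)
    return matches[0]
```

Unlike an assert, the ValueError survives optimized runs and carries a message naming the offending group.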
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dascore/io/prodml/utils.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dascore/io/prodml/utils.py (4)
dascore/core/coordmanager.py (5)
get_coord (1039-1058), step (1068-1070), keys (1027-1029), update (240-308), get_coord_manager (1127-1215)
dascore/core/coords.py (4)
  get_coord (1515-1703), update (738-751), update (965-967), data (606-608)
dascore/utils/misc.py (4)
  iterate (257-269), maybe_get_items (438-460), register_func (35-58), unbyte (370-374)
dascore/core/attrs.py (4)
  PatchAttrs (65-301), items (181-183), get (174-179), update (234-255)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: Run benchmarks
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
🔇 Additional comments (4)
dascore/io/prodml/utils.py (4)
27-55: LGTM! Well-structured attribute mappings.

The root and FBE attribute mappings handle multiple naming conventions (e.g., PulseWidthUnit, PulseWidthUnits, PulseWidth.uom), which accommodates different ProdML schema versions.
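The multiple-spelling handling can be modeled with a small fallback helper. The names below are hypothetical and only sketch the idea, not DASCore's exact code:

```python
def first_present(attrs, *names, default=None):
    """Return the value for the first attribute name present among
    several schema-variant spellings."""
    for name in names:
        if name in attrs:
            return attrs[name]
    return default


# Different ProdML schema versions spell the same attribute differently.
attrs = {"PulseWidth.uom": "ns"}
unit = first_present(attrs, "PulseWidthUnit", "PulseWidthUnits", "PulseWidth.uom")
```

Ordering the candidate names from most to least common keeps the lookup predictable across schema versions.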
160-203: LGTM! Well-structured node traversal with empty-data handling.

The iterator-based approach cleanly separates raw and FBE node traversal, and the empty data check (lines 181-182) ensures only valid raw nodes are yielded.
315-329: LGTM! Correct empty-data handling.

The function correctly skips creating patches when selections yield empty data (line 327), which aligns with the PR's objective to return empty spools for out-of-bounds selections.
282-294: LGTM! Single-pass iteration resolves the past zip concern.

The refactored single-pass iteration over _yield_data_nodes eliminates the previous concern about missing strict=True on zip.
Description
This PR adds support for FBE (frequency band extracted) patches in ProdML files. It also tests that slicing out of bounds returns empty spools for all commonly tested FiberIO instances.
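The behavior under test can be modeled independently of any file format. `select_patches` below is a simplified stand-in for a FiberIO reader's selection path, not DASCore's API:

```python
import numpy as np


def select_patches(patches, t_min, t_max):
    """Filter (times, data) pairs to a time window; a fully
    out-of-range window yields an empty list, mirroring the
    empty-spool behavior this PR tests."""
    out = []
    for times, data in patches:
        mask = (times >= t_min) & (times <= t_max)
        if not mask.any():
            continue  # nothing in range -> contribute no patch
        out.append((times[mask], data[mask]))
    return out
```

A spool built from the returned list is simply empty when every patch falls outside the window, rather than raising an error.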
Checklist
I have (if applicable):
Summary by CodeRabbit
New Features
Bug Fixes
Behavior Changes
Tests
Chores