
Conversation

@d-chambers
Contributor

@d-chambers d-chambers commented Dec 9, 2025

Description

This PR tries to make the Febus parser more flexible so that it can also handle older (version 1) files. Closes #587.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings and/or appropriate doc page.
  • included tests. See testing guidelines.
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • Added support for the Febus1 data format
    • Added valencia_febus_example.h5 example dataset to the registry
  • Improvements

    • More robust Febus time handling with caching and spacing helpers
    • Time-extent mismatches now emit warnings instead of failing
    • Waterfall plotting uses a centralized Tukey-fence utility for outlier handling
  • Other

    • Consistent warning behavior across the application; tests updated accordingly


@coderabbitai
Contributor

coderabbitai bot commented Dec 9, 2025

Walkthrough

Adds Febus1 to the public Febus IO surface, introduces tukey_fence utility, refactors Febus timing and time-slicing logic with cached helpers, adds module-level UserWarning filtering, appends a data registry entry, and updates tests to exercise the new API and utilities.

Changes

Cohort / File(s) Change Summary
Febus IO public API
dascore/io/febus/__init__.py, dascore/io/febus/core.py
Exposes Febus1 alongside Febus2 in the Febus package; minor docstring capitalization fix for Febus1.
Febus timing & time-slicing refactor
dascore/io/febus/utils.py
Adds cached helpers _get_block_time(feb) and _get_sample_spacing(feb, n_time_samps); changes _get_time_overlap_samples signature to (feb, n_time_samps, tstep=None) and updates _get_time_coord, _get_data_new_cm, and time-filtering logic; replaces strict extent assertion with a UserWarning and adjusts stop-index logic.
Outlier utility & viz usage
dascore/utils/misc.py, dascore/viz/waterfall.py
Adds tukey_fence(data, fence_multiplier=1.5) to compute Tukey fences; replaces inline IQR fence computation in waterfall with tukey_fence and removes unused IQR constant.
Module/test warning behavior
dascore/__init__.py, tests/conftest.py
Adds warnings import and a warnings.filterwarnings("default", category=UserWarning) call to ensure UserWarnings are emitted consistently.
Data registry & tests updates
dascore/data_registry.txt, tests/test_io/test_common_io.py, tests/test_utils/test_misc.py, tests/conftest.py
Appends valencia_febus_example.h5 entry to data registry; updates tests to include Febus1 in COMMON_IO_READ_TESTS and to import/use tukey_fence; adjusts test iteration to use iterate(data_files) and adds tests for Tukey fence behavior.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description identifies the problem (#587) and references the closing issue, but no checklist items are marked complete even though the changes appear to address every category. Check the corresponding boxes to reflect what was completed: docstrings added, tests included, and whether the ready_for_review tag was added.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main objective of the PR: adding support for and testing of Valencia Febus files (specifically version 1 format).
Linked Issues check ✅ Passed Changes implement handling for missing BlockRate attribute and provide fallback timing utilities, addressing the core KeyError in issue #587 when reading Valencia (version 1) Febus files.
Out of Scope Changes check ✅ Passed All changes are scoped to Febus file handling or supporting utilities; warning management and visualization tukey_fence integration are reasonable supporting changes for stability.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


@coderabbitai coderabbitai bot added bug Something isn't working IO Work for reading/writing different formats labels Dec 9, 2025
@codecov

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.92%. Comparing base (1dccf58) to head (b871d03).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #589   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         129      129           
  Lines       11241    11272   +31     
=======================================
+ Hits        11233    11264   +31     
  Misses          8        8           
Flag Coverage Δ
unittests 99.92% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_io/test_common_io.py (1)

252-264: Use exact type comparison to test both IO format directions

The isinstance check creates asymmetric testing: because Febus1 subclasses Febus2, files from Febus1 are never tested against Febus2's format detection. Replace with exact type comparison:

-            if isinstance(other_io, type(io_instance)):
+            if type(other_io) is type(io_instance):
                 continue

This ensures each IO class is tested against all others, regardless of inheritance relationships.
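The asymmetry is easy to reproduce with stand-in classes; the names mirror the PR's IO classes, but the bodies here are illustrative only:

```python
class Febus2:
    """Stand-in for the base Febus IO class."""

class Febus1(Febus2):
    """Stand-in for the subclass added in this PR."""

feb1, feb2 = Febus1(), Febus2()

# isinstance lets the subclass slip through: a Febus1 instance passes the
# check against Febus2, so that cross-format pairing is silently skipped.
assert isinstance(feb1, type(feb2))

# Exact type comparison skips only the identical class, so both
# cross-format directions are exercised.
assert type(feb1) is not type(feb2)
```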

🧹 Nitpick comments (5)
dascore/__init__.py (1)

22-24: Consider scoping the warning filter more narrowly.

This module-level filter affects all UserWarning instances globally, including warnings from third-party libraries. While the intent to reduce warning spam is good, this could inadvertently suppress important warnings from other packages that users depend on.

Consider using a more targeted approach, such as filtering only warnings from DASCore modules:

warnings.filterwarnings("once", category=UserWarning, module=r"dascore\..*")
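How the module argument scopes a filter can be sketched with warn_explicit, which lets us fake the emitting module; "ignore" is used here instead of "once" so the outcome is deterministic, and the module names are illustrative:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # baseline: record everything
    # Scope the filter to dascore modules via a regex on the emitting module.
    warnings.filterwarnings("ignore", category=UserWarning, module=r"dascore\..*")
    # Emitted as if from inside dascore -> matched and suppressed.
    warnings.warn_explicit(
        "noisy", UserWarning, "utils.py", 1, module="dascore.io.febus.utils"
    )
    # Emitted from an unrelated module -> falls through to "always".
    warnings.warn_explicit(
        "important", UserWarning, "other.py", 1, module="thirdparty.lib"
    )

assert [str(w.message) for w in caught] == ["important"]
```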
dascore/io/febus/utils.py (2)

155-168: Parameter naming is misleading.

The parameter n_blocks is confusing because:

  • At line 183, n_time_samps (time samples per block) is passed
  • At line 311, data_shape[1] (also time samples per block) is passed

The parameter actually represents the number of time samples per block, not the number of blocks. Consider renaming for clarity:

-def _get_time_overlap_samples(feb, n_blocks, tstep=None):
+def _get_time_overlap_samples(feb, n_time_samps, tstep=None):
     """Determine the number of redundant samples in the time dimension."""
-    tstep = tstep if tstep is not None else _get_sample_spacing(feb, n_blocks)
+    tstep = tstep if tstep is not None else _get_sample_spacing(feb, n_time_samps)

30-57: Cache decorator with HDF5 objects won't be reused across file opens.

The @cache decorator hashes the feb namedtuple by object identity. Since h5py Group/Dataset objects are only hashable by id, each file open creates new instances with different hashes. The cache won't benefit repeated file reads—only multiple calls within a single file's processing. Given that these are inexpensive operations, consider removing @cache or switching to a file-path-based cache key if performance becomes critical.

tests/test_io/test_common_io.py (1)

135-141: Using iterate in _get_flat_io_test avoids string-iteration pitfalls

Wrapping fetch_name_list in iterate() makes _get_flat_io_test robust to both scalar and sequence entries in COMMON_IO_READ_TESTS, preventing accidental iteration over characters for plain string values.

dascore/utils/misc.py (1)

931-941: tukey_fence implementation matches expected IQR‑fence behavior

The utility correctly computes Tukey fences using nan‑aware percentiles and clamps to the observed min/max; it returns a 2‑element NumPy array suitable for consumers like _get_scale. Consider adding small unit tests (e.g., constant data, simple ranges, all‑NaN) to lock in behavior.
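A sketch consistent with the behavior described above (nan-aware percentiles, fences clamped to the observed range); the actual implementation in dascore/utils/misc.py may differ in detail:

```python
import numpy as np

def tukey_fence(data, fence_multiplier=1.5):
    """Return [lower, upper] Tukey fences, clamped to the data's range."""
    q1, q3 = np.nanpercentile(data, [25, 75])
    iqr = q3 - q1
    lower = max(q1 - fence_multiplier * iqr, np.nanmin(data))
    upper = min(q3 + fence_multiplier * iqr, np.nanmax(data))
    return np.array([lower, upper])

# Constant data collapses to a zero-width fence.
assert np.allclose(tukey_fence(np.full(5, 3.0)), [3.0, 3.0])
# The outlier is fenced off; the lower fence clamps to the data minimum.
assert np.allclose(tukey_fence(np.array([1.0, 2.0, 3.0, 4.0, 100.0])), [1.0, 7.0])
```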

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ae947a and b2bbc3f.

📒 Files selected for processing (9)
  • dascore/__init__.py (2 hunks)
  • dascore/data_registry.txt (1 hunks)
  • dascore/io/febus/__init__.py (1 hunks)
  • dascore/io/febus/core.py (1 hunks)
  • dascore/io/febus/utils.py (7 hunks)
  • dascore/utils/misc.py (1 hunks)
  • dascore/viz/waterfall.py (2 hunks)
  • tests/conftest.py (2 hunks)
  • tests/test_io/test_common_io.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
dascore/viz/waterfall.py (1)
dascore/utils/misc.py (1)
  • tukey_fence (931-941)
tests/test_io/test_common_io.py (2)
dascore/io/febus/core.py (2)
  • Febus1 (98-104)
  • Febus2 (46-95)
dascore/utils/misc.py (1)
  • iterate (257-269)
dascore/io/febus/__init__.py (1)
dascore/io/febus/core.py (2)
  • Febus1 (98-104)
  • Febus2 (46-95)
🪛 Ruff (0.14.8)
dascore/io/febus/utils.py

198-198: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: Run benchmarks
🔇 Additional comments (8)
dascore/data_registry.txt (1)

35-35: LGTM!

The new data registry entry for the Valencia Febus example file follows the established format and is consistent with existing entries.

tests/conftest.py (1)

7-7: LGTM!

Good approach: the production code uses "once" filter to avoid spamming users, while tests override to "default" to ensure warning assertions are deterministic and not affected by execution order.

Also applies to: 39-41

dascore/io/febus/utils.py (2)

60-79: LGTM with same caching caveat.

The logic for selecting between the two potential time steps is sound. The same HDF5 object caching concern from _get_block_time applies here.


311-315: LGTM!

The updated overlap handling correctly uses the new function signature and properly handles the edge case where excess_rows == 0 (resulting in slice(0, None) which captures all rows).
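The excess_rows == 0 case reduces to a classic negative-index pitfall; the sketch below is independent of the PR's exact slice construction:

```python
import numpy as np

data = np.arange(6)

# Naive symmetric trimming breaks at zero: -0 == 0, so data[0:-0] is empty.
assert data[slice(0, -0)].size == 0

# Mapping an excess of 0 to None keeps all rows while still trimming
# symmetrically when excess > 0.
for excess, expected in [(0, 6), (2, 2)]:
    trimmed = data[slice(excess, -excess or None)]
    assert trimmed.size == expected
```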

dascore/io/febus/core.py (1)

98-104: LGTM!

Minor docstring capitalization fix for consistency.

dascore/io/febus/__init__.py (1)

9-9: Expose Febus1 via package API looks correct

Importing Febus1 alongside Febus2 cleanly exposes the legacy Febus v1 format without altering existing behavior.

dascore/viz/waterfall.py (1)

15-15: Centralizing Tukey fence logic via tukey_fence is consistent and maintains behavior

Importing and delegating to tukey_fence(data) for the default relative/no-scale case keeps the 1.5·IQR fence semantics while consolidating logic in a single utility; the returned 2‑element array integrates cleanly with the existing scale handling.

Also applies to: 78-79

tests/test_io/test_common_io.py (1)

30-31: Adding Febus1 to common IO tests is appropriate

Importing Febus1 and registering it in COMMON_IO_READ_TESTS with valencia_febus_example.h5 ensures the legacy Febus v1 path is exercised by the shared read/scan/slice test suite, which aligns with the Valencia support objective.

Also applies to: 59-86

@codspeed-hq

codspeed-hq bot commented Dec 9, 2025

CodSpeed Performance Report

Merging #589 will not alter performance

Comparing valencia_febus (b871d03) with master (7ae947a)

Summary

✅ 54 untouched
⏩ 2 skipped [1]

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2bbc3f and b871d03.

📒 Files selected for processing (3)
  • dascore/__init__.py (2 hunks)
  • dascore/io/febus/utils.py (7 hunks)
  • tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (2)
  • suppress_warnings (82-94)
  • tukey_fence (931-941)
dascore/io/febus/utils.py (1)
dascore/utils/misc.py (2)
  • tukey_fence (931-941)
  • _maybe_unpack (463-468)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: Run benchmarks
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
🔇 Additional comments (10)
dascore/__init__.py (2)

3-5: LGTM!

The warnings import is properly placed and necessary for the module-level warning configuration.


22-24: "once" is more restrictive than "default" and hides warnings globally.

"once" prints the first occurrence of a warning only, regardless of line number, whereas "default" prints the first occurrence per (module + line number) combination. This means warnings matching the same message will be suppressed across the entire application with "once", not per location. If different modules emit the same UserWarning, only the first occurrence anywhere in dascore will be shown. Consider whether "default" better preserves visibility into warnings from different code paths.

Likely an incorrect or invalid review comment.
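Whatever the verdict on the review comment itself, the semantic difference between the two actions is easy to verify; warn_explicit fakes one message emitted from two distinct source locations:

```python
import warnings

def count_emitted(action):
    """Count warnings shown for one message emitted from two locations."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        warnings.warn_explicit("dup", UserWarning, "a.py", 1, module="mod_a")
        warnings.warn_explicit("dup", UserWarning, "b.py", 9, module="mod_b")
    return len(caught)

assert count_emitted("default") == 2  # deduplicated per (module, lineno)
assert count_emitted("once") == 1     # deduplicated globally per message/category
```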

tests/test_utils/test_misc.py (2)

27-29: LGTM!

The new imports for suppress_warnings and tukey_fence are properly integrated into the alphabetically sorted import list.


261-296: LGTM! Comprehensive test coverage for Tukey fence.

The test class provides excellent coverage of the tukey_fence function:

  • Constant data edge case
  • Simple range with proper fence clamping
  • All-NaN handling with appropriate warning suppression
  • Outlier detection with correct bound calculation

The test logic and expected values are mathematically correct.

dascore/io/febus/utils.py (6)

5-20: LGTM on imports.

The additions of warnings, cache, and tukey_fence are appropriate for the new Febus v1 file handling.


30-57: Good fallback handling for missing BlockRate.

The graceful handling of missing BlockRate attribute (line 36) properly addresses the Valencia file issue. The use of tukey_fence to handle timing variability is a sensible approach for real-world data.

One consideration: @cache is unbounded and uses argument identity/hash. Since feb contains HDF5 object references that hash by identity, this works correctly within a single file session. If memory becomes a concern with many files, consider @lru_cache(maxsize=...) instead.


155-168: LGTM on overlap calculation refactor.

The updated signature makes the function more flexible by allowing pre-computed tstep to be passed in, reducing redundant calculations. The even-number rounding for expected_samples ensures symmetric overlap trimming.


190-198: Good change from assertion to warning.

Converting the extent mismatch from a hard assertion to a UserWarning allows Valencia files to be processed despite metadata inconsistencies. The stacklevel=2 is correctly set per the previous review feedback.


289-292: LGTM on edge case handling.

The conditional (1 if len(times) else 0) correctly handles the case where no time blocks match the filter, preventing an off-by-one error in the slice calculation.


311-315: LGTM on updated call site.

The updated call to _get_time_overlap_samples correctly passes data_shape[1] as the time dimension, and the slice handling properly addresses the edge case where skip_rows == 0.

Comment on lines +60 to +79
@cache
def _get_sample_spacing(feb: _FebusSlice, n_time_samps: int):
"""
Determine the temporal sample spacing (in seconds).
"""
# Note: This is a bit dicey, but we are trying to account for variability
# seen in real Febus Files. In some files the zone Spacing attr indicates one
# sample rate, while zone.attrs['SamplingRate'] indicates another. It
# varies as to which one is actually right, so we try to figure that
# out here.
ts_1 = feb.zone.attrs["Spacing"][1] / 1_000 # value in ms, convert to s.
# In most cases sample_rate is either bogus or in Hz. It isn't even mentioned
# in some Febus documentation.
ts_2 = _maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"])
# Get the block time. This doesn't account for overlap, so it wont be exact.
block_time = _get_block_time(feb)
# Get candidate times, return the closet to the block_time.
ts_array = np.array([ts_1, ts_2])
block_time_array = ts_array * n_time_samps
return ts_array[np.argmin(np.abs(block_time_array - block_time))]

⚠️ Potential issue | 🟡 Minor

Potential division by zero on line 73.

If SamplingRate is 0 (or contains 0), the division 1.0 / feb.zone.attrs["SamplingRate"] will raise ZeroDivisionError. Consider adding a guard or using np.inf as a fallback.

🔎 Suggested fix:
-    ts_2 = _maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"])
+    sr = feb.zone.attrs["SamplingRate"]
+    ts_2 = _maybe_unpack(1.0 / sr) if np.all(sr != 0) else np.inf
🤖 Prompt for AI Agents
In dascore/io/febus/utils.py around lines 60 to 79, the computation ts_2 =
_maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"]) can raise ZeroDivisionError
if SamplingRate is 0 (or contains zero); guard against zero or missing
SamplingRate by checking the value(s) before dividing and use a safe fallback
(e.g., set ts_2 to np.inf or np.array of np.inf matching shape) when
SamplingRate is 0 or falsy, or wrap the division in a try/except and assign
np.inf on error; ensure the fallback works with the later array math (ts_array *
n_time_samps) so types/shapes remain compatible.

@d-chambers d-chambers merged commit 1ebc98f into master Dec 18, 2025
23 checks passed
@d-chambers d-chambers deleted the valencia_febus branch December 18, 2025 04:51


Development

Successfully merging this pull request may close these issues.

can't create index file for PubDAS's Valencia data
