
Conversation

@d-chambers
Contributor

@d-chambers d-chambers commented Dec 9, 2025

Description

This PR tries to make the Febus parser more flexible so that it can also handle older (version 1) files. Closes #587.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings and/or appropriate doc page.
  • included tests. See testing guidelines.
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • Added support for the Febus1 data format
    • Added valencia_febus_example.h5 example dataset to the registry
  • Improvements

    • More robust Febus time handling with caching and spacing helpers
    • Time-extent mismatches now emit warnings instead of failing
    • Waterfall plotting uses a centralized Tukey-fence utility for outlier handling
  • Other

    • Consistent warning behavior across the application; tests updated accordingly


@coderabbitai
Contributor

coderabbitai bot commented Dec 9, 2025

Walkthrough

Adds Febus1 to the public Febus IO surface, introduces tukey_fence utility, refactors Febus timing and time-slicing logic with cached helpers, adds module-level UserWarning filtering, appends a data registry entry, and updates tests to exercise the new API and utilities.

Changes

Cohort / File(s) Change Summary
Febus IO public API
dascore/io/febus/__init__.py, dascore/io/febus/core.py
Exposes Febus1 alongside Febus2 in the Febus package; minor docstring capitalization fix for Febus1.
Febus timing & time-slicing refactor
dascore/io/febus/utils.py
Adds cached helpers _get_block_time(feb) and _get_sample_spacing(feb, n_time_samps); changes _get_time_overlap_samples signature to (feb, n_time_samps, tstep=None) and updates _get_time_coord, _get_data_new_cm, and time-filtering logic; replaces strict extent assertion with a UserWarning and adjusts stop-index logic.
Outlier utility & viz usage
dascore/utils/misc.py, dascore/viz/waterfall.py
Adds tukey_fence(data, fence_multiplier=1.5) to compute Tukey fences; replaces inline IQR fence computation in waterfall with tukey_fence and removes unused IQR constant.
Module/test warning behavior
dascore/__init__.py, tests/conftest.py
Adds warnings import and a warnings.filterwarnings("default", category=UserWarning) call to ensure UserWarnings are emitted consistently.
Data registry & tests updates
dascore/data_registry.txt, tests/test_io/test_common_io.py, tests/test_utils/test_misc.py, tests/conftest.py
Appends valencia_febus_example.h5 entry to data registry; updates tests to include Febus1 in COMMON_IO_READ_TESTS and to import/use tukey_fence; adjusts test iteration to use iterate(data_files) and adds tests for Tukey fence behavior.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description identifies the problem (#587) and references the closing issue, but no checklist items are marked complete even though the changes appear to address every category. Check the corresponding boxes to reflect what was completed: docstrings added, tests included, and whether the ready_for_review tag was added.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main objective of the PR: adding support for and testing of Valencia Febus files (specifically version 1 format).
Linked Issues check ✅ Passed Changes implement handling for missing BlockRate attribute and provide fallback timing utilities, addressing the core KeyError in issue #587 when reading Valencia (version 1) Febus files.
Out of Scope Changes check ✅ Passed All changes are scoped to Febus file handling or supporting utilities; warning management and visualization tukey_fence integration are reasonable supporting changes for stability.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


@coderabbitai coderabbitai bot added bug Something isn't working IO Work for reading/writing different formats labels Dec 9, 2025
@codecov

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.92%. Comparing base (1dccf58) to head (b871d03).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #589   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         129      129           
  Lines       11241    11272   +31     
=======================================
+ Hits        11233    11264   +31     
  Misses          8        8           
Flag Coverage Δ
unittests 99.92% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_io/test_common_io.py (1)

252-264: Use exact type comparison to test both IO format directions

The isinstance check creates asymmetric testing: because Febus1 subclasses Febus2, files from Febus1 are never tested against Febus2's format detection. Replace with exact type comparison:

-            if isinstance(other_io, type(io_instance)):
+            if type(other_io) is type(io_instance):
                 continue

This ensures each IO class is tested against all others, regardless of inheritance relationships.
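The asymmetry is easy to reproduce with stand-in classes; the names mirror the PR's IO classes, but the bodies here are illustrative only:

```python
class Febus2:
    """Stand-in for the base Febus IO class."""

class Febus1(Febus2):
    """Stand-in for the subclass added in this PR."""

feb1, feb2 = Febus1(), Febus2()

# isinstance lets the subclass slip through: a Febus1 instance passes the
# check against Febus2, so that cross-format pairing is silently skipped.
assert isinstance(feb1, type(feb2))

# Exact type comparison skips only the identical class, so both
# cross-format directions are exercised.
assert type(feb1) is not type(feb2)
```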

🧹 Nitpick comments (5)
dascore/__init__.py (1)

22-24: Consider scoping the warning filter more narrowly.

This module-level filter affects all UserWarning instances globally, including warnings from third-party libraries. While the intent to reduce warning spam is good, this could inadvertently suppress important warnings from other packages that users depend on.

Consider using a more targeted approach, such as filtering only warnings from DASCore modules:

warnings.filterwarnings("once", category=UserWarning, module=r"dascore\..*")
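How the module argument scopes a filter can be sketched with warn_explicit, which lets us fake the emitting module; "ignore" is used here instead of "once" so the outcome is deterministic, and the module names are illustrative:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # baseline: record everything
    # Scope the filter to dascore modules via a regex on the emitting module.
    warnings.filterwarnings("ignore", category=UserWarning, module=r"dascore\..*")
    # Emitted as if from inside dascore -> matched and suppressed.
    warnings.warn_explicit(
        "noisy", UserWarning, "utils.py", 1, module="dascore.io.febus.utils"
    )
    # Emitted from an unrelated module -> falls through to "always".
    warnings.warn_explicit(
        "important", UserWarning, "other.py", 1, module="thirdparty.lib"
    )

assert [str(w.message) for w in caught] == ["important"]
```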
dascore/io/febus/utils.py (2)

155-168: Parameter naming is misleading.

The parameter n_blocks is confusing because:

  • At line 183, n_time_samps (time samples per block) is passed
  • At line 311, data_shape[1] (also time samples per block) is passed

The parameter actually represents the number of time samples per block, not the number of blocks. Consider renaming for clarity:

-def _get_time_overlap_samples(feb, n_blocks, tstep=None):
+def _get_time_overlap_samples(feb, n_time_samps, tstep=None):
     """Determine the number of redundant samples in the time dimension."""
-    tstep = tstep if tstep is not None else _get_sample_spacing(feb, n_blocks)
+    tstep = tstep if tstep is not None else _get_sample_spacing(feb, n_time_samps)

30-57: Cache decorator with HDF5 objects won't be reused across file opens.

The @cache decorator hashes the feb namedtuple by object identity. Since h5py Group/Dataset objects are only hashable by id, each file open creates new instances with different hashes. The cache won't benefit repeated file reads—only multiple calls within a single file's processing. Given that these are inexpensive operations, consider removing @cache or switching to a file-path-based cache key if performance becomes critical.

tests/test_io/test_common_io.py (1)

135-141: Using iterate in _get_flat_io_test avoids string-iteration pitfalls

Wrapping fetch_name_list in iterate() makes _get_flat_io_test robust to both scalar and sequence entries in COMMON_IO_READ_TESTS, preventing accidental iteration over characters for plain string values.

dascore/utils/misc.py (1)

931-941: tukey_fence implementation matches expected IQR‑fence behavior

The utility correctly computes Tukey fences using nan‑aware percentiles and clamps to the observed min/max; it returns a 2‑element NumPy array suitable for consumers like _get_scale. Consider adding small unit tests (e.g., constant data, simple ranges, all‑NaN) to lock in behavior.
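A sketch consistent with the behavior described above (nan-aware percentiles, fences clamped to the observed range); the actual implementation in dascore/utils/misc.py may differ in detail:

```python
import numpy as np

def tukey_fence(data, fence_multiplier=1.5):
    """Return [lower, upper] Tukey fences, clamped to the data's range."""
    q1, q3 = np.nanpercentile(data, [25, 75])
    iqr = q3 - q1
    lower = max(q1 - fence_multiplier * iqr, np.nanmin(data))
    upper = min(q3 + fence_multiplier * iqr, np.nanmax(data))
    return np.array([lower, upper])

# Constant data collapses to a zero-width fence.
assert np.allclose(tukey_fence(np.full(5, 3.0)), [3.0, 3.0])
# The outlier is fenced off; the lower fence clamps to the data minimum.
assert np.allclose(tukey_fence(np.array([1.0, 2.0, 3.0, 4.0, 100.0])), [1.0, 7.0])
```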

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ae947a and b2bbc3f.

📒 Files selected for processing (9)
  • dascore/__init__.py (2 hunks)
  • dascore/data_registry.txt (1 hunks)
  • dascore/io/febus/__init__.py (1 hunks)
  • dascore/io/febus/core.py (1 hunks)
  • dascore/io/febus/utils.py (7 hunks)
  • dascore/utils/misc.py (1 hunks)
  • dascore/viz/waterfall.py (2 hunks)
  • tests/conftest.py (2 hunks)
  • tests/test_io/test_common_io.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
dascore/viz/waterfall.py (1)
dascore/utils/misc.py (1)
  • tukey_fence (931-941)
tests/test_io/test_common_io.py (2)
dascore/io/febus/core.py (2)
  • Febus1 (98-104)
  • Febus2 (46-95)
dascore/utils/misc.py (1)
  • iterate (257-269)
dascore/io/febus/__init__.py (1)
dascore/io/febus/core.py (2)
  • Febus1 (98-104)
  • Febus2 (46-95)
🪛 Ruff (0.14.8)
dascore/io/febus/utils.py

198-198: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: Run benchmarks
🔇 Additional comments (8)
dascore/data_registry.txt (1)

35-35: LGTM!

The new data registry entry for the Valencia Febus example file follows the established format and is consistent with existing entries.

tests/conftest.py (1)

7-7: LGTM!

Good approach: the production code uses "once" filter to avoid spamming users, while tests override to "default" to ensure warning assertions are deterministic and not affected by execution order.

Also applies to: 39-41

dascore/io/febus/utils.py (2)

60-79: LGTM with same caching caveat.

The logic for selecting between the two potential time steps is sound. The same HDF5 object caching concern from _get_block_time applies here.


311-315: LGTM!

The updated overlap handling correctly uses the new function signature and properly handles the edge case where excess_rows == 0 (resulting in slice(0, None) which captures all rows).
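The excess_rows == 0 case reduces to a classic negative-index pitfall; the sketch below is independent of the PR's exact slice construction:

```python
import numpy as np

data = np.arange(6)

# Naive symmetric trimming breaks at zero: -0 == 0, so data[0:-0] is empty.
assert data[slice(0, -0)].size == 0

# Mapping an excess of 0 to None keeps all rows while still trimming
# symmetrically when excess > 0.
for excess, expected in [(0, 6), (2, 2)]:
    trimmed = data[slice(excess, -excess or None)]
    assert trimmed.size == expected
```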

dascore/io/febus/core.py (1)

98-104: LGTM!

Minor docstring capitalization fix for consistency.

dascore/io/febus/__init__.py (1)

9-9: Expose Febus1 via package API looks correct

Importing Febus1 alongside Febus2 cleanly exposes the legacy Febus v1 format without altering existing behavior.

dascore/viz/waterfall.py (1)

15-15: Centralizing Tukey fence logic via tukey_fence is consistent and maintains behavior

Importing and delegating to tukey_fence(data) for the default relative/no-scale case keeps the 1.5·IQR fence semantics while consolidating logic in a single utility; the returned 2‑element array integrates cleanly with the existing scale handling.

Also applies to: 78-79

tests/test_io/test_common_io.py (1)

30-31: Adding Febus1 to common IO tests is appropriate

Importing Febus1 and registering it in COMMON_IO_READ_TESTS with valencia_febus_example.h5 ensures the legacy Febus v1 path is exercised by the shared read/scan/slice test suite, which aligns with the Valencia support objective.

Also applies to: 59-86

@codspeed-hq

codspeed-hq bot commented Dec 9, 2025

CodSpeed Performance Report

Merging #589 will not alter performance

Comparing valencia_febus (b871d03) with master (7ae947a)

Summary

✅ 54 untouched
⏩ 2 skipped [1]

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2bbc3f and b871d03.

📒 Files selected for processing (3)
  • dascore/__init__.py (2 hunks)
  • dascore/io/febus/utils.py (7 hunks)
  • tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (2)
  • suppress_warnings (82-94)
  • tukey_fence (931-941)
dascore/io/febus/utils.py (1)
dascore/utils/misc.py (2)
  • tukey_fence (931-941)
  • _maybe_unpack (463-468)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: Run benchmarks
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
🔇 Additional comments (10)
dascore/__init__.py (2)

3-5: LGTM!

The warnings import is properly placed and necessary for the module-level warning configuration.


22-24: "once" is more restrictive than "default" and hides warnings globally.

"once" prints the first occurrence of a warning only, regardless of line number, whereas "default" prints the first occurrence per (module + line number) combination. This means warnings matching the same message will be suppressed across the entire application with "once", not per location. If different modules emit the same UserWarning, only the first occurrence anywhere in dascore will be shown. Consider whether "default" better preserves visibility into warnings from different code paths.

Likely an incorrect or invalid review comment.
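Whatever the verdict on the review comment itself, the semantic difference between the two actions is easy to verify; warn_explicit fakes one message emitted from two distinct source locations:

```python
import warnings

def count_emitted(action):
    """Count warnings shown for one message emitted from two locations."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        warnings.warn_explicit("dup", UserWarning, "a.py", 1, module="mod_a")
        warnings.warn_explicit("dup", UserWarning, "b.py", 9, module="mod_b")
    return len(caught)

assert count_emitted("default") == 2  # deduplicated per (module, lineno)
assert count_emitted("once") == 1     # deduplicated globally per message/category
```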

tests/test_utils/test_misc.py (2)

27-29: LGTM!

The new imports for suppress_warnings and tukey_fence are properly integrated into the alphabetically sorted import list.


261-296: LGTM! Comprehensive test coverage for Tukey fence.

The test class provides excellent coverage of the tukey_fence function:

  • Constant data edge case
  • Simple range with proper fence clamping
  • All-NaN handling with appropriate warning suppression
  • Outlier detection with correct bound calculation

The test logic and expected values are mathematically correct.

dascore/io/febus/utils.py (6)

5-20: LGTM on imports.

The additions of warnings, cache, and tukey_fence are appropriate for the new Febus v1 file handling.


30-57: Good fallback handling for missing BlockRate.

The graceful handling of missing BlockRate attribute (line 36) properly addresses the Valencia file issue. The use of tukey_fence to handle timing variability is a sensible approach for real-world data.

One consideration: @cache is unbounded and uses argument identity/hash. Since feb contains HDF5 object references that hash by identity, this works correctly within a single file session. If memory becomes a concern with many files, consider @lru_cache(maxsize=...) instead.


155-168: LGTM on overlap calculation refactor.

The updated signature makes the function more flexible by allowing pre-computed tstep to be passed in, reducing redundant calculations. The even-number rounding for expected_samples ensures symmetric overlap trimming.


190-198: Good change from assertion to warning.

Converting the extent mismatch from a hard assertion to a UserWarning allows Valencia files to be processed despite metadata inconsistencies. The stacklevel=2 is correctly set per the previous review feedback.


289-292: LGTM on edge case handling.

The conditional (1 if len(times) else 0) correctly handles the case where no time blocks match the filter, preventing an off-by-one error in the slice calculation.


311-315: LGTM on updated call site.

The updated call to _get_time_overlap_samples correctly passes data_shape[1] as the time dimension, and the slice handling properly addresses the edge case where skip_rows == 0.

Comment on lines +60 to +79
@cache
def _get_sample_spacing(feb: _FebusSlice, n_time_samps: int):
"""
Determine the temporal sample spacing (in seconds).
"""
# Note: This is a bit dicey, but we are trying to account for variability
# seen in real Febus Files. In some files the zone Spacing attr indicates one
# sample rate, while zone.attrs['SamplingRate'] indicates another. It
# varies as to which one is actually right, so we try to figure that
# out here.
ts_1 = feb.zone.attrs["Spacing"][1] / 1_000 # value in ms, convert to s.
# In most cases sample_rate is either bogus or in Hz. It isn't even mentioned
# in some Febus documentation.
ts_2 = _maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"])
# Get the block time. This doesn't account for overlap, so it wont be exact.
block_time = _get_block_time(feb)
# Get candidate times, return the closet to the block_time.
ts_array = np.array([ts_1, ts_2])
block_time_array = ts_array * n_time_samps
return ts_array[np.argmin(np.abs(block_time_array - block_time))]

⚠️ Potential issue | 🟡 Minor

Potential division by zero on line 73.

If SamplingRate is 0 (or contains 0), the division 1.0 / feb.zone.attrs["SamplingRate"] will raise ZeroDivisionError. Consider adding a guard or using np.inf as a fallback.

🔎 Suggested fix:
-    ts_2 = _maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"])
+    sr = feb.zone.attrs["SamplingRate"]
+    ts_2 = _maybe_unpack(1.0 / sr) if np.all(sr != 0) else np.inf
🤖 Prompt for AI Agents
In dascore/io/febus/utils.py around lines 60 to 79, the computation ts_2 =
_maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"]) can raise ZeroDivisionError
if SamplingRate is 0 (or contains zero); guard against zero or missing
SamplingRate by checking the value(s) before dividing and use a safe fallback
(e.g., set ts_2 to np.inf or np.array of np.inf matching shape) when
SamplingRate is 0 or falsy, or wrap the division in a try/except and assign
np.inf on error; ensure the fallback works with the later array math (ts_array *
n_time_samps) so types/shapes remain compatible.

@d-chambers d-chambers merged commit 1ebc98f into master Dec 18, 2025
23 checks passed
@d-chambers d-chambers deleted the valencia_febus branch December 18, 2025 04:51


Development

Successfully merging this pull request may close these issues.

can't create index file for PubDAS's Valencia data
