support and testing for valencia febus files #589
Conversation
Walkthrough

Adds Febus1 to the public Febus IO surface, introduces a tukey_fence utility, refactors Febus timing and time-slicing logic with cached helpers, adds module-level UserWarning filtering, appends a data registry entry, and updates tests to exercise the new API and utilities.
Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@ Coverage Diff @@
## master #589 +/- ##
=======================================
Coverage 99.92% 99.92%
=======================================
Files 129 129
Lines 11241 11272 +31
=======================================
+ Hits 11233 11264 +31
Misses 8 8
Flags with carried forward coverage won't be shown.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_io/test_common_io.py (1)
252-264: Use exact type comparison to test both IO format directions.

The isinstance check creates asymmetric testing: because Febus1 subclasses Febus2, files from Febus1 are never tested against Febus2's format detection. Replace with exact type comparison:

```diff
-            if isinstance(other_io, type(io_instance)):
+            if type(other_io) is type(io_instance):
                 continue
```

This ensures each IO class is tested against all others, regardless of inheritance relationships.
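The asymmetry can be seen with two stand-in classes (illustration only, not the real IO classes):

```python
class Febus2:
    """Stand-in for the v2 IO class (illustration only)."""

class Febus1(Febus2):
    """Stand-in for the v1 IO class, which subclasses Febus2."""

v1, v2 = Febus1(), Febus2()

# isinstance is one-directional under inheritance: a Febus1 *is a* Febus2,
# so the Febus1-vs-Febus2 pairing would be skipped by `continue`, while
# the reverse pairing would not.
print(isinstance(v1, type(v2)))   # True  (pair skipped)
print(isinstance(v2, type(v1)))   # False (pair tested)

# Exact type comparison treats both directions the same.
print(type(v1) is type(v2))       # False
print(type(v2) is type(v1))       # False
```

With `type(a) is type(b)`, only genuine self-pairings are skipped, so every cross-format combination gets exercised.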
🧹 Nitpick comments (5)
dascore/__init__.py (1)
22-24: Consider scoping the warning filter more narrowly.

This module-level filter affects all UserWarning instances globally, including warnings from third-party libraries. While the intent to reduce warning spam is good, this could inadvertently suppress important warnings from other packages that users depend on. Consider using a more targeted approach, such as filtering only warnings from DASCore modules:

```python
warnings.filterwarnings("once", category=UserWarning, module=r"dascore\..*")
```

dascore/io/febus/utils.py (2)
155-168: Parameter naming is misleading.

The parameter n_blocks is confusing because:

- At line 183, n_time_samps (time samples per block) is passed.
- At line 311, data_shape[1] (also time samples per block) is passed.

The parameter actually represents the number of time samples per block, not the number of blocks. Consider renaming for clarity:

```diff
-def _get_time_overlap_samples(feb, n_blocks, tstep=None):
+def _get_time_overlap_samples(feb, n_time_samps, tstep=None):
     """Determine the number of redundant samples in the time dimension."""
-    tstep = tstep if tstep is not None else _get_sample_spacing(feb, n_blocks)
+    tstep = tstep if tstep is not None else _get_sample_spacing(feb, n_time_samps)
```
30-57: Cache decorator with HDF5 objects won't be reused across file opens.

The @cache decorator hashes the feb namedtuple by object identity. Since h5py Group/Dataset objects are only hashable by id, each file open creates new instances with different hashes. The cache won't benefit repeated file reads, only multiple calls within a single file's processing. Given that these are inexpensive operations, consider removing @cache or switching to a file-path-based cache key if performance becomes critical.
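The identity-hashing behavior described above can be demonstrated with stand-in objects (a simplified sketch; Handle and Feb are hypothetical stand-ins for h5py objects and the _FebusSlice namedtuple):

```python
from collections import namedtuple
from functools import cache

class Handle:
    """Stand-in for an h5py Group/Dataset: hashable only by identity."""

# Simplified stand-in for the _FebusSlice namedtuple.
Feb = namedtuple("Feb", "zone")

calls = []

@cache
def sample_spacing(feb):
    calls.append(feb)       # record real (non-cached) executions
    return 0.001            # placeholder result

feb_a = Feb(zone=Handle())
sample_spacing(feb_a)
sample_spacing(feb_a)       # same namedtuple, same hash -> cache hit
print(len(calls))           # 1

feb_b = Feb(zone=Handle())  # "re-opened file": new handle, new hash
sample_spacing(feb_b)       # cache miss despite equivalent content
print(len(calls))           # 2
```

Within one file session the repeated calls hit the cache; a fresh open never does, which matches the reviewer's point that the cache only pays off inside a single file's processing.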
135-141: Using iterate in _get_flat_io_test avoids string-iteration pitfalls.

Wrapping fetch_name_list in iterate() makes _get_flat_io_test robust to both scalar and sequence entries in COMMON_IO_READ_TESTS, preventing accidental iteration over characters for plain string values.
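The pitfall and the normalization can be sketched as follows (a simplified stand-in for dascore.utils.misc.iterate, not its actual source):

```python
def iterate(obj):
    """Normalize obj to a tuple, treating a lone string as one item
    rather than iterating over its characters."""
    if obj is None:
        return ()
    if isinstance(obj, str):
        return (obj,)
    try:
        return tuple(obj)
    except TypeError:
        return (obj,)

print(iterate("file.h5"))         # ('file.h5',)
print(iterate(["a.h5", "b.h5"]))  # ('a.h5', 'b.h5')
print(tuple("file.h5"))           # without the guard: per-character iteration
```

Without the string guard, a scalar entry like "file.h5" would expand into seven one-character "file names", which is exactly the bug the wrapper prevents.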
931-941: tukey_fence implementation matches expected IQR-fence behavior.

The utility correctly computes Tukey fences using nan-aware percentiles and clamps to the observed min/max; it returns a 2-element NumPy array suitable for consumers like _get_scale. Consider adding small unit tests (e.g., constant data, simple ranges, all-NaN) to lock in behavior.
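The behavior described can be sketched as a minimal stand-in (the signature and clamping are assumptions inferred from the review text, not DASCore's actual source):

```python
import numpy as np

def tukey_fence(data, multiplier=1.5):
    """Minimal sketch of an IQR-based Tukey fence: [Q1 - k*IQR, Q3 + k*IQR],
    clamped to the observed data range, using nan-aware percentiles."""
    q1, q3 = np.nanpercentile(data, [25, 75])
    iqr = q3 - q1
    low = max(q1 - multiplier * iqr, np.nanmin(data))
    high = min(q3 + multiplier * iqr, np.nanmax(data))
    return np.array([low, high])

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one obvious outlier
print(tukey_fence(data))  # [1. 7.]
```

Here Q1 = 2, Q3 = 4, so the raw fence is [-1, 7]; the lower bound clamps to the data minimum of 1, and the outlier at 100 falls outside the upper fence of 7.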
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- dascore/__init__.py (2 hunks)
- dascore/data_registry.txt (1 hunks)
- dascore/io/febus/__init__.py (1 hunks)
- dascore/io/febus/core.py (1 hunks)
- dascore/io/febus/utils.py (7 hunks)
- dascore/utils/misc.py (1 hunks)
- dascore/viz/waterfall.py (2 hunks)
- tests/conftest.py (2 hunks)
- tests/test_io/test_common_io.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
dascore/viz/waterfall.py (1)
- dascore/utils/misc.py (1): tukey_fence (931-941)

tests/test_io/test_common_io.py (2)
- dascore/io/febus/core.py (2): Febus1 (98-104), Febus2 (46-95)
- dascore/utils/misc.py (1): iterate (257-269)

dascore/io/febus/__init__.py (1)
- dascore/io/febus/core.py (2): Febus1 (98-104), Febus2 (46-95)
🪛 Ruff (0.14.8)
dascore/io/febus/utils.py
198-198: No explicit stacklevel keyword argument found. Set stacklevel=2. (B028)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: Run benchmarks
🔇 Additional comments (8)
dascore/data_registry.txt (1)
35-35: LGTM! The new data registry entry for the Valencia Febus example file follows the established format and is consistent with existing entries.
tests/conftest.py (1)
7-7: LGTM! Good approach: the production code uses the "once" filter to avoid spamming users, while tests override to "default" to ensure warning assertions are deterministic and not affected by execution order.

Also applies to: 39-41
dascore/io/febus/utils.py (2)
60-79: LGTM with same caching caveat. The logic for selecting between the two potential time steps is sound. The same HDF5 object caching concern from _get_block_time applies here.
311-315: LGTM! The updated overlap handling correctly uses the new function signature and properly handles the edge case where excess_rows == 0 (resulting in slice(0, None), which captures all rows).
98-104: LGTM! Minor docstring capitalization fix for consistency.
dascore/io/febus/__init__.py (1)
9-9: Exposing Febus1 via the package API looks correct. Importing Febus1 alongside Febus2 cleanly exposes the legacy Febus v1 format without altering existing behavior.
15-15: Centralizing Tukey fence logic via tukey_fence is consistent and maintains behavior. Importing and delegating to tukey_fence(data) for the default relative/no-scale case keeps the 1.5·IQR fence semantics while consolidating logic in a single utility; the returned 2-element array integrates cleanly with the existing scale handling.

Also applies to: 78-79
tests/test_io/test_common_io.py (1)
30-31: Adding Febus1 to common IO tests is appropriate. Importing Febus1 and registering it in COMMON_IO_READ_TESTS with valencia_febus_example.h5 ensures the legacy Febus v1 path is exercised by the shared read/scan/slice test suite, which aligns with the Valencia support objective.

Also applies to: 59-86
CodSpeed Performance Report

Merging #589 will not alter performance.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- dascore/__init__.py (2 hunks)
- dascore/io/febus/utils.py (7 hunks)
- tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_misc.py (1)
- dascore/utils/misc.py (2): suppress_warnings (82-94), tukey_fence (931-941)

dascore/io/febus/utils.py (1)
- dascore/utils/misc.py (2): tukey_fence (931-941), _maybe_unpack (463-468)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: Run benchmarks
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
🔇 Additional comments (10)
dascore/__init__.py (2)
3-5: LGTM! The warnings import is properly placed and necessary for the module-level warning configuration.
22-24: "once" is more restrictive than "default" and hides warnings globally.

"once" prints the first occurrence of a warning only, regardless of line number, whereas "default" prints the first occurrence per (module + line number) combination. This means warnings matching the same message will be suppressed across the entire application with "once", not per location. If different modules emit the same UserWarning, only the first occurrence anywhere in dascore will be shown. Consider whether "default" better preserves visibility into warnings from different code paths.
Likely an incorrect or invalid review comment.
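The difference between the two actions can be checked directly with warnings.catch_warnings (a standalone demo, unrelated to dascore's modules; it relies on the two warn calls sitting on distinct source lines):

```python
import warnings

# "once": the first matching message is shown; repeats of the same
# (message, category) are suppressed regardless of where they occur.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("once")
    warnings.warn("febus demo warning", UserWarning)  # shown
    warnings.warn("febus demo warning", UserWarning)  # suppressed globally
n_once = len(caught)

# "default": suppression is per source location, so the same message
# emitted from a different line is shown again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")
    warnings.warn("febus demo warning", UserWarning)  # shown
    warnings.warn("febus demo warning", UserWarning)  # shown (new location)
n_default = len(caught)

print(n_once, n_default)  # 1 2
```

This illustrates the reviewer's point: "once" keys on the message text itself, so distinct code paths emitting the same UserWarning would collapse into a single report.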
tests/test_utils/test_misc.py (2)
27-29: LGTM! The new imports for suppress_warnings and tukey_fence are properly integrated into the alphabetically sorted import list.
261-296: LGTM! Comprehensive test coverage for Tukey fence. The test class provides excellent coverage of the tukey_fence function:
- Constant data edge case
- Simple range with proper fence clamping
- All-NaN handling with appropriate warning suppression
- Outlier detection with correct bound calculation
The test logic and expected values are mathematically correct.
dascore/io/febus/utils.py (6)
5-20: LGTM on imports. The additions of warnings, cache, and tukey_fence are appropriate for the new Febus v1 file handling.
30-57: Good fallback handling for missing BlockRate. The graceful handling of a missing BlockRate attribute (line 36) properly addresses the Valencia file issue. The use of tukey_fence to handle timing variability is a sensible approach for real-world data.

One consideration: @cache is unbounded and uses argument identity/hash. Since feb contains HDF5 object references that hash by identity, this works correctly within a single file session. If memory becomes a concern with many files, consider @lru_cache(maxsize=...) instead.
155-168: LGTM on overlap calculation refactor. The updated signature makes the function more flexible by allowing a pre-computed tstep to be passed in, reducing redundant calculations. The even-number rounding for expected_samples ensures symmetric overlap trimming.
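Why even rounding matters can be illustrated with a toy calculation (an illustrative sketch only, not DASCore's implementation; the function name and formula here are assumptions):

```python
def overlap_samples(block_duration, tstep, n_time_samps):
    """Estimate redundant samples when consecutive blocks overlap.
    The expected sample count is rounded to an even number so the
    overlap can be trimmed symmetrically from both ends of a block."""
    expected = int(round(block_duration / tstep / 2)) * 2  # force even
    return max(n_time_samps - expected, 0)

# A block holding 1000 samples at 1 ms spacing, but only ~0.9 s of
# unique data, carries 100 redundant samples: trim 50 from each side.
n_overlap = overlap_samples(0.9, 0.001, 1000)
print(n_overlap, n_overlap // 2)  # 100 50
```

Because the overlap count is always even, splitting it in half never leaves an odd leftover sample on one side of the block.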
190-198: Good change from assertion to warning. Converting the extent mismatch from a hard assertion to a UserWarning allows Valencia files to be processed despite metadata inconsistencies. The stacklevel=2 is correctly set per the previous review feedback.
289-292: LGTM on edge case handling. The conditional (1 if len(times) else 0) correctly handles the case where no time blocks match the filter, preventing an off-by-one error in the slice calculation.
311-315: LGTM on updated call site. The updated call to _get_time_overlap_samples correctly passes data_shape[1] as the time dimension, and the slice handling properly addresses the edge case where skip_rows == 0.
```python
@cache
def _get_sample_spacing(feb: _FebusSlice, n_time_samps: int):
    """
    Determine the temporal sample spacing (in seconds).
    """
    # Note: This is a bit dicey, but we are trying to account for variability
    # seen in real Febus files. In some files the zone Spacing attr indicates one
    # sample rate, while zone.attrs['SamplingRate'] indicates another. It
    # varies as to which one is actually right, so we try to figure that
    # out here.
    ts_1 = feb.zone.attrs["Spacing"][1] / 1_000  # value in ms, convert to s.
    # In most cases sample_rate is either bogus or in Hz. It isn't even mentioned
    # in some Febus documentation.
    ts_2 = _maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"])
    # Get the block time. This doesn't account for overlap, so it won't be exact.
    block_time = _get_block_time(feb)
    # Get candidate times, return the closest to the block_time.
    ts_array = np.array([ts_1, ts_2])
    block_time_array = ts_array * n_time_samps
    return ts_array[np.argmin(np.abs(block_time_array - block_time))]
```
Potential division by zero on line 73.

If SamplingRate is 0 (or contains 0), the division 1.0 / feb.zone.attrs["SamplingRate"] will raise ZeroDivisionError. Consider adding a guard or using np.inf as a fallback.

🔎 Suggested fix:

```diff
-    ts_2 = _maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"])
+    sr = feb.zone.attrs["SamplingRate"]
+    ts_2 = _maybe_unpack(1.0 / sr) if np.all(sr != 0) else np.inf
```

🤖 Prompt for AI Agents
In dascore/io/febus/utils.py around lines 60 to 79, the computation ts_2 =
_maybe_unpack(1.0 / feb.zone.attrs["SamplingRate"]) can raise ZeroDivisionError
if SamplingRate is 0 (or contains zero); guard against zero or missing
SamplingRate by checking the value(s) before dividing and use a safe fallback
(e.g., set ts_2 to np.inf or np.array of np.inf matching shape) when
SamplingRate is 0 or falsy, or wrap the division in a try/except and assign
np.inf on error; ensure the fallback works with the later array math (ts_array *
n_time_samps) so types/shapes remain compatible.
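The guarded selection the fix enables can be sketched with scalars (pick_spacing is a hypothetical simplification mirroring the reviewed snippet, not the actual DASCore function; the suggested fix uses np.all to also cover array-valued attributes):

```python
import numpy as np

def pick_spacing(spacing_ms, sampling_rate, n_time_samps, block_time):
    """Choose between two candidate sample spacings: one derived from the
    Spacing attr (ms) and one from SamplingRate (Hz). A zero SamplingRate
    becomes an infinite candidate period instead of raising
    ZeroDivisionError, so argmin simply prefers the other candidate."""
    ts_1 = spacing_ms / 1_000                                    # ms -> s
    ts_2 = 1.0 / sampling_rate if sampling_rate != 0 else np.inf
    ts_array = np.array([ts_1, ts_2])
    block_time_array = ts_array * n_time_samps
    return ts_array[np.argmin(np.abs(block_time_array - block_time))]

# Bogus SamplingRate of 0: falls back to the Spacing-derived candidate.
print(pick_spacing(1.0, 0, n_time_samps=1000, block_time=1.0))      # 0.001
# Sane SamplingRate of 500 Hz with a ~2 s block: picks 1/500 = 0.002 s.
print(pick_spacing(1.0, 500.0, n_time_samps=1000, block_time=2.0))  # 0.002
```

The infinite fallback works cleanly with the later array math: np.abs(inf - block_time) is inf, so argmin can never select the degenerate candidate.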
Description
This PR tries to make the Febus parser more flexible so that it can also handle older (version 1) files. Closes #587.
Checklist
I have (if applicable):
Summary by CodeRabbit
New Features
Improvements
Other