-
Notifications
You must be signed in to change notification settings - Fork 28
Hampel Filter #543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hampel Filter #543
Conversation
WalkthroughAdds a Hampel despiking filter implementation, exports it via Changes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #543 +/- ##
=======================================
Coverage 99.92% 99.92%
=======================================
Files 124 125 +1
Lines 10336 10435 +99
=======================================
+ Hits 10328 10427 +99
Misses 8 8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
✅ Documentation built: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (8)
docs/recipes/despiking.qmd (3)
3-5: Fix Quarto front‑matter key.Use
warning: false(notwarn) underexecute:to suppress warnings.execute: - warn: false + warning: false
69-71: Use Quarto callouts instead of a fenced{warning}block.Replace with a callout so it renders correctly.
-```{warning} -The Hampel filter can be quite slow if large windows are used. We recommend using window sizes that correspond to <10 samples for most cases. -``` +::: {.callout-warning} +The Hampel filter can be quite slow if large windows are used. We recommend using window sizes that correspond to <10 samples for most cases. +:::
26-26: Minor doc nits (typo, unused import, spacing).
- Remove unused
matplotlib.pyplot as plt.- Fix spacing in indexing.
- Grammar: “distortion” not “distorting”.
-import matplotlib.pyplot as plt @@ -data [75, :] = -1400 +data[75, :] = -1400 @@ -The resulting signal had some of the spikes removed with minimal distorting to the original signal. +The resulting signal had some of the spikes removed with minimal distortion to the original signal.Also applies to: 45-46, 66-66
tests/test_proc/test_hampel.py (2)
227-238: Cover the “some dimensions size == 1” separable guard.Current comment is contradictory and the test doesn’t hit that branch. Force one dim to 1 sample to exercise the non‑separable path.
- # Test case 3: not all(s > 1 for s in size) (some dimensions have size 1) - # This will use the separable path since it has 2 dimensions with size > 1 - result3 = patch_with_spikes.hampel_filter( - time=0.6, distance=3.0, threshold=3.5, separable=True - ) + # Test case 3: not all(s > 1 for s in size) (some dimensions have size 1) + # Force distance window to 1 sample to ensure separable path is skipped + dist_step = patch_with_spikes.get_coord("distance").step + result3 = patch_with_spikes.hampel_filter( + time=0.6, distance=0.1 * dist_step, threshold=3.5, separable=True + )
208-217: Avoid testing private helper directly.Importing
_get_hampel_window_sizein tests ties to a private API and may break with internal refactors. Prefer validating via publichampel_filterbehavior (e.g., parity adjustment, min‑samples error).dascore/proc/hampel.py (3)
41-49: Warnings should set stacklevel.Without
stacklevel, warnings point inside the library rather than the caller. Setstacklevel=2.- warnings.warn(msg, UserWarning) + warnings.warn(msg, UserWarning, stacklevel=2)
163-186: Compute in float to avoid integer overflow/precision loss; cast back at the end.Median/MAD on integer arrays can overflow and degrade precision. Do math in float64, then cast to input dtype.
- # First build axis windows - data = patch.data + # First build axis windows + data = patch.data + dataf = np.asarray(data, dtype=np.float64) @@ - size = _get_hampel_window_size(patch, kwargs, samples) + size = _get_hampel_window_size(patch, kwargs, samples) @@ - med = _separable_median(data, size, mode) - abs_med_diff = np.abs(data - med) + med = _separable_median(dataf, size, mode) + abs_med_diff = np.abs(dataf - med) mad = _separable_median(abs_med_diff, size, mode) else: - # Use standard 2D median filter (more accurate but slower) - med, abs_med_diff, mad = _calculate_standard_median_and_mad(data, size, mode) + # Use standard 2D median filter (more accurate but slower) + med, abs_med_diff, mad = _calculate_standard_median_and_mad(dataf, size, mode) @@ - out = np.where(thresholded > threshold, med, data) - return patch.update(data=out) + out = np.where(thresholded > threshold, med, dataf) + out = out.astype(data.dtype, copy=False) + return patch.update(data=out)
124-127: Docstring link formatting.Use plain text or a proper path; current Markdown is malformed in docstrings.
- - The (Despiking recipe)[`docs/recipes/despiking.qmd`]. + - Despiking recipe: docs/recipes/despiking.qmd
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
dascore/core/patch.py(1 hunks)dascore/proc/__init__.py(1 hunks)dascore/proc/hampel.py(1 hunks)docs/recipes/despiking.qmd(1 hunks)scripts/_templates/_quarto.yml(1 hunks)tests/test_proc/test_hampel.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
dascore/proc/__init__.py (1)
dascore/proc/hampel.py (1)
hampel_filter(78-186)
dascore/proc/hampel.py (4)
dascore/proc/filter.py (1)
median_filter(201-249)dascore/exceptions.py (1)
ParameterError(26-27)dascore/utils/patch.py (1)
get_dim_axis_value(558-643)dascore/core/coords.py (1)
get_sample_count(753-789)
tests/test_proc/test_hampel.py (4)
dascore/exceptions.py (1)
ParameterError(26-27)dascore/examples.py (1)
get_example_patch(659-706)dascore/proc/coords.py (1)
update_coords(218-243)dascore/proc/hampel.py (1)
_get_hampel_window_size(15-51)
dascore/core/patch.py (1)
dascore/proc/hampel.py (1)
hampel_filter(78-186)
🪛 Ruff (0.13.1)
dascore/proc/hampel.py
48-48: No explicit stacklevel keyword argument found
Set stacklevel=2
(B028)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_build_docs
🔇 Additional comments (3)
dascore/proc/__init__.py (1)
17-17: Public export OK.Importing
hampel_filterhere makes it available viadascore.proc. No concerns.dascore/core/patch.py (1)
326-326: API consistency: Patch.hampel_filter binding LGTM.Matches existing pattern for exposing processing funcs on Patch.
scripts/_templates/_quarto.yml (1)
136-139: Nav entry added for despiking — file exists; build not verified.
docs/recipes/despiking.qmd found and referenced in scripts/_templates/_quarto.yml (line 136). Run a site build to confirm the page renders in the sidebar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
dascore/proc/hampel.py (7)
118-123: Docstring: clarify odd-window behavior for samples=True vs False.Currently reads as if even window lengths are always auto-adjusted. In code, adjustment only happens when
samples=False; withsamples=Trueyou raiseParameterError. Please update the docs for accuracy.Apply this diff:
- If the selected window lengths don't represent odd numbers in each - dimension they will be adjusted to be an odd length. This ensures - a clean median calculation is possible. + When samples=False, even window lengths are bumped to the next odd + value to ensure a clean median calculation. When samples=True, an + even sample count raises a ParameterError.
176-185: Enable separable path when ≥2 dims have window > 1 (handles extra singleton dims).Requiring all dims to have size > 1 can unnecessarily block the fast separable path when an unrelated dimension has a window of 1. Prefer “at least two dims > 1”.
Apply this diff:
- if separable and len(size) > 1 and all(s > 1 for s in size): + if separable and sum(s > 1 for s in size) >= 2:
171-172: Fast no-op: early-return when all window sizes are 1.Avoid two median_filter passes when no dimension is selected.
Apply this diff:
size = _get_hampel_window_size(patch, kwargs, samples) +# No-op if no dimension was selected (all ones). +if all(s == 1 for s in size): + return patch
192-194: Preserve integer semantics by rounding before casting back to int.Casting float medians to int truncates toward zero. Round to nearest to avoid bias.
Apply this diff:
out = np.where(thresholded > threshold, med, dataf) -# Cast back to original dtype -out = out.astype(data.dtype, copy=False) +# Cast back to original dtype (round if original was integer) +if np.issubdtype(data.dtype, np.integer): + out = np.rint(out) +out = out.astype(data.dtype, copy=False) return patch.update(data=out)
162-165: Validate finite thresholds (reject NaN/Inf).A non-finite threshold silently disables replacement or yields undefined behavior. Fail fast.
Apply this diff:
if threshold <= 0: msg = "hampel_filter threshold must be greater than zero" raise ParameterError(msg) +if not np.isfinite(threshold): + raise ParameterError("hampel_filter `threshold` must be finite.")
15-21: Minor: move warnings import to module scope.Avoid function-local import; keeps imports consistent with the rest of the module.
Apply this diff:
-from __future__ import annotations - -import numpy as np +from __future__ import annotations +import warnings +import numpy as np from scipy.ndimage import median_filter- import warningsAlso applies to: 41-49
23-39: Optional: guard against windows larger than the coordinate length.Consider passing
enforce_lt_coord=Trueto surface overly large windows early (scipy will still run, but catching this could prevent surprises).Apply this diff:
- samps = coord.get_sample_count(val, samples=samples) + samps = coord.get_sample_count(val, samples=samples, enforce_lt_coord=True)tests/test_proc/test_hampel.py (2)
256-260: Remove duplicate shape assertions.Redundant checks; keep one pair.
Apply this diff:
- assert result_time.data.shape == patch_with_spikes.data.shape - assert result_distance.data.shape == patch_with_spikes.data.shape - assert result_time.data.shape == patch_with_spikes.data.shape - assert result_distance.data.shape == patch_with_spikes.data.shape + assert result_time.data.shape == patch_with_spikes.data.shape + assert result_distance.data.shape == patch_with_spikes.data.shape
278-282: Enforce finite threshold and add NaN/Inf testshampel_filter only checks "threshold <= 0" (dascore/proc/hampel.py ~lines 162–164); NaN/Inf pass that check and silently disable detection. Add tests and validate finiteness.
- Add tests in tests/test_proc/test_hampel.py (near the existing bad-threshold test) to assert ParameterError for np.inf and np.nan (diff below).
- Update dascore/proc/hampel.py to raise ParameterError when not np.isfinite(threshold) (check before the <= 0 test) with a message containing "must be finite".
Apply this diff:
def test_bad_threshold_raises(self, patch_with_spikes): """Ensure a bad threshold raises ParameterError.""" with pytest.raises(ParameterError, match="must be greater than zero"): patch_with_spikes.hampel_filter(time=0.6, threshold=0) + with pytest.raises(ParameterError, match="must be finite"): + patch_with_spikes.hampel_filter(time=0.6, threshold=np.inf) + with pytest.raises(ParameterError, match="must be finite"): + patch_with_spikes.hampel_filter(time=0.6, threshold=np.nan)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dascore/proc/hampel.py(1 hunks)tests/test_proc/test_hampel.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_proc/test_hampel.py (6)
dascore/exceptions.py (1)
ParameterError(26-27)dascore/examples.py (1)
get_example_patch(659-706)dascore/core/patch.py (2)
data(221-223)shape(226-228)dascore/proc/coords.py (1)
update_coords(218-243)dascore/proc/hampel.py (1)
hampel_filter(78-194)dascore/core/coordmanager.py (1)
step(1036-1038)
dascore/proc/hampel.py (4)
dascore/proc/filter.py (1)
median_filter(201-249)dascore/exceptions.py (1)
ParameterError(26-27)dascore/utils/patch.py (1)
get_dim_axis_value(558-643)dascore/core/coords.py (1)
get_sample_count(753-789)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_build_docs
🔇 Additional comments (3)
dascore/proc/hampel.py (1)
162-165: Resolved: threshold validation added (good).The
threshold > 0guard addresses the earlier review note; implementation and error type/message look good.tests/test_proc/test_hampel.py (2)
59-72: Nice: targeted spike reduction assertion.Good invariants and readable checks.
312-323: Solid multi-dimensional effectiveness checks.Clear comparisons of multi-d vs single-d filtering; assertions are pragmatic.
Also applies to: 365-375
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
dascore/proc/hampel.py (3)
107-113: Docstring nit: use “Warnings” section header.NumPy-style docs prefer “Warnings” (plural).
Apply this doc-only tweak:
- Warning - ------- + Warnings + --------
81-85: Expose edge-handling parameters (mode, cval).Hard-coding reflect limits control at boundaries. Recommend surfacing
mode(andcvalfor constant mode) to match scipy.ndimage behavior and existing proc.filter API style.Minimal change set:
@@ def hampel_filter( patch: PatchType, *, threshold: float, samples=False, - separable=False, + separable=False, + mode: str = "reflect", + cval: float = 0.0, **kwargs, ): @@ - # For now we just hardcode mode as it is probably the only one that - # makes sense in a DAS data context. - mode = "reflect" + # Edge handling; default mirrors current behavior. @@ - if separable and sum(s > 1 for s in size) >= 2: + if separable and sum(s > 1 for s in size) >= 2: # Use separable filtering for multi-dimensional windows # This is faster but provides an approximation - med = _separable_median(dataf, size, mode) + med = _separable_median(dataf, size, mode, cval) abs_med_diff = np.abs(dataf - med) - mad = _separable_median(abs_med_diff, size, mode) + mad = _separable_median(abs_med_diff, size, mode, cval) else: # Use standard 2D median filter (more accurate but slower) - med, abs_med_diff, mad = _calculate_standard_median_and_mad(dataf, size, mode) + med, abs_med_diff, mad = _calculate_standard_median_and_mad( + dataf, size, mode, cval + )Update helpers:
@@ -def _separable_median(data, size, mode): +def _separable_median(data, size, mode, cval): @@ - med = median_filter(med, size=tuple(axis_size), mode=mode) + med = median_filter(med, size=tuple(axis_size), mode=mode, cval=cval) @@ -def _calculate_standard_median_and_mad(data, size, mode): +def _calculate_standard_median_and_mad(data, size, mode, cval): @@ - med = median_filter(data, size=size, mode=mode) + med = median_filter(data, size=size, mode=mode, cval=cval) @@ - mad = median_filter(abs_med_diff, size=size, mode=mode) + mad = median_filter(abs_med_diff, size=size, mode=mode, cval=cval)Also applies to: 167-170, 54-66, 69-74, 174-183
183-189: Optional: configurable MAD floor.Dividing by eps makes near-constant windows very aggressive. Consider an optional
mad_floor(e.g., relative to global/median MAD) to tune sensitivity on low-variance data.If you want this, I can wire it as
mad_safe = np.where(mad < mad_floor, mad_floor, mad)with a sensible default (e.g.,mad_floor=None→ current behavior).tests/test_proc/test_hampel.py (3)
95-98: Make change-count comparison robust for floats.Equality on floats can be brittle. Use
np.iscloseor a tolerance.Apply:
- diff_low = np.sum(result_low.data != patch_with_spikes.data) - diff_high = np.sum(result_high.data != patch_with_spikes.data) + diff_low = np.count_nonzero(~np.isclose(result_low.data, patch_with_spikes.data)) + diff_high = np.count_nonzero(~np.isclose(result_high.data, patch_with_spikes.data))
188-193: Comment mismatch with behavior.This value yields an even sample count that gets bumped to odd; update comment to reflect that.
- # We need to find a value that gives odd samples when converted - # Let's use a value that will result in an odd number of samples + # Choose a value that yields an even sample count which will be bumped to odd
232-237: Comment doesn’t match separable guard.Current guard is
sum(s > 1) >= 2, not “not all(s > 1)”. Tweak comment for clarity.- # Use a larger time window but force distance to minimum to test separable guard + # Separable path engages when at least two dims have size > 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dascore/proc/hampel.py(1 hunks)tests/test_proc/test_hampel.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
dascore/proc/hampel.py (4)
dascore/proc/filter.py (1)
median_filter(201-249)dascore/exceptions.py (1)
ParameterError(26-27)dascore/utils/patch.py (2)
get_dim_axis_value(558-643)patch_function(179-285)dascore/core/coords.py (1)
get_sample_count(753-789)
tests/test_proc/test_hampel.py (5)
dascore/exceptions.py (1)
ParameterError(26-27)dascore/examples.py (1)
get_example_patch(659-706)dascore/proc/coords.py (1)
update_coords(218-243)dascore/proc/hampel.py (1)
hampel_filter(78-189)dascore/core/coordmanager.py (1)
step(1036-1038)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_build_docs
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.13)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
🔇 Additional comments (7)
dascore/proc/hampel.py (2)
24-39: Good input validation and useful perf warning.
- Min-sample and odd-length enforcement are correct and user-friendly.
- The large-window warning is a nice touch for discoverability.
Also applies to: 41-49
162-165: Threshold validation addresses prior review.The strict check for finite, >0 thresholds resolves the earlier concern.
tests/test_proc/test_hampel.py (5)
116-156: Nice parity check for 1D separable vs standard.Asserting identical arrays in the single-dimension case is a solid invariant.
157-165: Good perf warning test.Covers the large-window warning with a clear matcher.
276-284: Edge-case threshold tests are thorough.Covers zero, inf, and nan with precise message matching.
285-342: Effective multi-channel spike scenarios.Strong coverage for distance/time-channel and mixed spikes; assertions are clear and resilient.
Also applies to: 344-393, 394-442
47-49: Confirmed: update_coords accepts time_step. update_coords delegates to coords.update and the kwarg time_step is used throughout tests/docs (e.g. tests/test_proc/test_hampel.py:48; tests/test_io/test_segy.py:94; docs/recipes/low_freq_proc.qmd:144). No change required.
Description
This PR implements a
hampel_fitlerfor spike removal. It also adds corresponding documentation.Checklist
I have (if applicable):
Summary by CodeRabbit
New Features
Documentation
Tests