
Conversation

@d-chambers
Contributor

@d-chambers d-chambers commented Sep 27, 2025

Description

This PR speeds up the hampel_filter by allocating fewer arrays and introducing bottleneck as an optional dependency. It also adds a moving module for simplifying window operations.
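For readers unfamiliar with the technique: a Hampel filter replaces any sample that deviates from its local window median by more than threshold × MAD (median absolute deviation). A minimal pure-Python 1-D sketch of the idea (illustrative only — not DASCore's implementation, which operates on 2-D patches via scipy/bottleneck):

```python
from statistics import median

def hampel_1d(values, window=3, threshold=3.0):
    """Replace outliers with the local median (simplified 1-D Hampel)."""
    half = window // 2
    out = list(values)
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        local = values[lo:hi]
        med = median(local)
        mad = median(abs(v - med) for v in local)
        # Guard against a zero MAD so the ratio below stays meaningful.
        mad = mad or 1e-12
        if abs(values[i] - med) / mad > threshold:
            out[i] = med
    return out

print(hampel_1d([1.0, 1.0, 100.0, 1.0, 1.0]))  # spike replaced by local median
```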

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features
    • Added moving-window utilities with automatic backend selection and convenience helpers.
    • Hampel filter now defaults to an approximate (faster) mode with an exact option; default threshold updated.
    • optional_import gains on_missing control ("raise", "warn", "ignore").
  • Bug Fixes
    • Conditional deduplication reduces duplicate results from large directory indexes.
  • Documentation
    • Updated despiking guidance to reflect Hampel changes and performance notes.
  • Chores
    • Added optional bottleneck dependency and updated dev extras.
  • Tests
    • Expanded moving-window and Hampel tests, standardized many benchmark names, and added redundant-index chunking tests.
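The on_missing behavior added to optional_import can be pictured with a small stand-alone sketch (hypothetical implementation; DASCore's real helper lives in dascore/utils/misc.py and raises MissingOptionalDependencyError):

```python
import importlib
import warnings

def optional_import(name, on_missing="raise"):
    """Import a module if available; otherwise raise, warn, or ignore.

    Sketch of the pattern only; the real DASCore helper differs in detail.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        msg = f"optional dependency {name!r} is not installed"
        if on_missing == "raise":
            raise ImportError(msg)
        if on_missing == "warn":
            warnings.warn(msg)
        elif on_missing != "ignore":
            raise ValueError(f"bad on_missing value: {on_missing!r}")
        return None

print(optional_import("math"))  # module object when present
print(optional_import("definitely_not_a_module_xyz", on_missing="ignore"))  # None
```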

@coderabbitai
Contributor

coderabbitai bot commented Sep 27, 2025

Walkthrough

Renames many benchmark tests; refactors the Hampel filter to add an approximate (default) fast path and an exact path using new moving-window utilities; adds a moving-window module and re-exports; extends optional_import with on_missing behaviors; adds conditional spool deduplication and related tests and docs/pyproject changes.
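The engine-selection idea in the new moving-window module can be pictured with a toy dispatcher (hypothetical names and engines; the real registry lives in dascore/utils/moving.py and dispatches between scipy and bottleneck):

```python
from statistics import median

# "fast" stands in for bottleneck, "reference" for scipy; neither name is real.
def _move_median_reference(data, window):
    """Valid-window moving median over a Python list."""
    return [median(data[i:i + window]) for i in range(len(data) - window + 1)]

ENGINES = {
    "reference": {"median": _move_median_reference},
    "fast": {},  # pretend the fast engine lacks this operation
}

def moving_window(data, window, operation="median", engine="auto"):
    """Dispatch to the first engine implementing the requested operation."""
    order = ["fast", "reference"] if engine == "auto" else [engine]
    for name in order:
        func = ENGINES.get(name, {}).get(operation)
        if func is not None:
            return func(data, window)
    raise ValueError(f"no engine provides {operation!r}")

print(moving_window([1, 9, 2, 8, 3], window=3))  # [2, 8, 3]
```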

Changes

  • Benchmark renames — benchmarks/test_io_benchmarks.py, benchmarks/test_patch_benchmarks.py, benchmarks/test_spool_benchmarks.py
    Many test methods renamed to drop the _performance suffix; a few new test variants added (e.g., Hampel non-approximate); @pytest.mark.usefixtures("cleanup_mpl") added to some plotting tests. No behavioral changes to test logic.
  • Hampel refactor — dascore/proc/hampel.py, tests/test_proc/test_hampel.py, docs/recipes/despiking.qmd
    Reworked Hampel internals: new approximate: bool = True API, added _hampel_separable (fast) and _hampel_non_separable (exact); _separable_median now writes to an external out array and uses move_median; tests adapted to interior comparisons and immutability; docs updated.
  • Moving-window utilities & exports — dascore/utils/moving.py, dascore/utils/__init__.py, tests/test_utils/test_moving.py
    New moving-window module with engine selection (auto/scipy/bottleneck) and helpers moving_window, move_median, move_mean, move_std, move_sum, move_min, move_max; these are re-exported from dascore.utils; comprehensive tests added.
  • optional_import enhancement — dascore/utils/misc.py, tests/test_utils/test_misc.py
    optional_import gains on_missing: Literal["raise", "warn", "ignore"] and can return None, warn, or ignore instead of raising; warn_or_raise updated to accept "ignore"; tests added for the ignore path.
  • Spool deduplication & tests — dascore/core/spool.py, tests/test_clients/test_dirspool.py
    Conditional deduplication added in _get_patches_from_index when the joined index size exceeds 10; new fixture and test verify chunking with redundant index entries.
  • Project metadata — pyproject.toml
    Added bottleneck to optional dependencies and adjusted dev extras to use dascore[profile_base].
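The conditional spool deduplication can be sketched in miniature (illustrative stand-in for _get_patches_from_index; only the size-10 threshold comes from this PR's summary):

```python
def maybe_dedupe(rows, threshold=10):
    """Drop duplicate rows only when the joined index is large.

    Small joins skip the extra pass entirely; large joins deduplicate
    by the full row contents.
    """
    if len(rows) <= threshold:
        return list(rows)
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [{"path": "a.h5", "t0": 0}] * 12 + [{"path": "b.h5", "t0": 1}]
print(len(maybe_dedupe(rows)))  # 2
```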

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check ✅ Passed — The title "Hampel speed" accurately refers to the performance improvements made to the hampel_filter, a core focus of this PR, though it omits the new moving-window module, which is also a significant feature. Per the criteria, the title is partially related to the changeset and thus meets the minimum requirement for relevance.
  • Description Check ✅ Passed — The PR description follows the repository template, including both "## Description" and "## Checklist" sections, and concisely explains the two main changes: reducing array allocations for hampel_filter performance and introducing the bottleneck dependency along with a new moving module. Although none of the checklist items are checked off, the structure is in place and the content aligns with the documented objectives.
  • Docstring Coverage ✅ Passed — Docstring coverage is 100.00%, above the required 80.00% threshold.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hampel_speed

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot added the proc Related to processing module label Sep 27, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (3)
benchmarks/test_patch_benchmarks.py (3)

205-208: Silence Ruff ARG002 by using the fixture without a param

Use the fixture via a decorator and drop the unused function arg.

Apply this diff:

-    @pytest.mark.benchmark
-    def test_waterfall(self, example_patch, cleanup_mpl):
+    @pytest.mark.usefixtures("cleanup_mpl")
+    @pytest.mark.benchmark
+    def test_waterfall(self, example_patch):
         """Timing for waterfall patch."""
         example_patch.viz.waterfall()

220-224: Same fix for wiggle plot test

Avoid the unused argument while still activating the fixture.

Apply this diff:

-    @pytest.mark.benchmark
-    def test_wiggle(self, example_patch, cleanup_mpl):
+    @pytest.mark.usefixtures("cleanup_mpl")
+    @pytest.mark.benchmark
+    def test_wiggle(self, example_patch):
         """Time wiggle plot visualization."""
         patch = example_patch.select(distance=(0, 100))  # Subset for performance
         patch.viz.wiggle()

129-140: Consider parametrizing separable for Hampel to reduce duplication

One parametrized test can cover both separable=True/False.

Apply this diff:

-    @pytest.mark.benchmark
-    def test_hampel_filter_non_separable(self, example_patch):
-        """Time the Hampel filter."""
-        example_patch.hampel_filter(
-            threshold=3.0, distance=3, time=5, samples=True, separable=False
-        )
-
-    @pytest.mark.benchmark
-    def test_hampel_filter(self, example_patch):
-        """Time the Hampel filter."""
-        example_patch.hampel_filter(
-            threshold=3.0, distance=3, time=5, samples=True, separable=True
-        )
+    @pytest.mark.parametrize("separable", [False, True])
+    @pytest.mark.benchmark
+    def test_hampel_filter(self, example_patch, separable):
+        """Time the Hampel filter (both non-separable and separable)."""
+        example_patch.hampel_filter(
+            threshold=3.0, distance=3, time=5, samples=True, separable=separable
+        )

Also applies to: 135-140

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b94e78 and 2716153.

📒 Files selected for processing (10)
  • benchmarks/test_io_benchmarks.py (1 hunks)
  • benchmarks/test_patch_benchmarks.py (9 hunks)
  • benchmarks/test_spool_benchmarks.py (4 hunks)
  • dascore/proc/hampel.py (5 hunks)
  • dascore/utils/__init__.py (1 hunks)
  • dascore/utils/misc.py (5 hunks)
  • dascore/utils/moving.py (1 hunks)
  • tests/test_proc/test_hampel.py (3 hunks)
  • tests/test_utils/test_misc.py (1 hunks)
  • tests/test_utils/test_moving.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
tests/test_utils/test_moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/moving.py (8)
  • _get_available_engines (100-103)
  • move_max (215-217)
  • move_mean (195-197)
  • move_median (190-192)
  • move_min (210-212)
  • move_std (200-202)
  • move_sum (205-207)
  • moving_window (138-186)
dascore/utils/misc.py (1)
dascore/exceptions.py (1)
  • MissingOptionalDependencyError (98-99)
dascore/utils/__init__.py (1)
dascore/utils/moving.py (7)
  • move_max (215-217)
  • move_mean (195-197)
  • move_median (190-192)
  • move_min (210-212)
  • move_std (200-202)
  • move_sum (205-207)
  • moving_window (138-186)
dascore/proc/hampel.py (2)
dascore/utils/moving.py (1)
  • move_median (190-192)
dascore/utils/patch.py (1)
  • get_patch_window_size (677-762)
dascore/utils/moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/misc.py (1)
  • optional_import (300-344)
benchmarks/test_io_benchmarks.py (2)
dascore/exceptions.py (1)
  • MissingOptionalDependencyError (98-99)
dascore/io/core.py (1)
  • scan_to_df (654-702)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (1)
  • optional_import (300-344)
tests/test_proc/test_hampel.py (2)
dascore/core/patch.py (2)
  • data (272-283)
  • shape (286-299)
dascore/proc/hampel.py (1)
  • hampel_filter (98-198)
benchmarks/test_patch_benchmarks.py (1)
dascore/proc/hampel.py (1)
  • hampel_filter (98-198)
🪛 Ruff (0.13.1)
tests/test_utils/test_moving.py

69-69: Unused method argument: operation

(ARG002)


192-192: Loop control variable description not used within loop body

Rename unused description to _description

(B007)

dascore/utils/moving.py

55-55: Unpacked variable module_name is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


103-103: Consider ['scipy', *bottle_list] instead of concatenation

Replace with ['scipy', *bottle_list]

(RUF005)


172-172: Avoid specifying long messages outside the exception class

(TRY003)


174-174: Avoid specifying long messages outside the exception class

(TRY003)

benchmarks/test_patch_benchmarks.py

205-205: Unused method argument: cleanup_mpl

(ARG002)


220-220: Unused method argument: cleanup_mpl

(ARG002)

🪛 GitHub Actions: LintCode
tests/test_utils/test_moving.py

[error] 95-95: E501 Line too long (95 > 88)


[error] 105-105: NPY002 Replace legacy np.random.rand call with np.random.Generator


[error] 118-118: E501 Line too long (98 > 88)

dascore/utils/moving.py

[error] 53-53: E501 Line too long (109 > 88)


[error] 75-75: E501 Line too long (114 > 88)


[error] 103-103: RUF005 Consider ["scipy", *bottle_list] instead of concatenation


[error] 190-190: E501 Line too long (108 > 88)


[error] 195-195: E501 Line too long (106 > 88)


[error] 200-200: E501 Line too long (105 > 88)


[error] 205-205: E501 Line too long (105 > 88)


[error] 210-210: E501 Line too long (105 > 88)


[error] 215-215: E501 Line too long (105 > 88)

🔇 Additional comments (4)
benchmarks/test_spool_benchmarks.py (1)

42-42: Benchmark test renames look good

Names are cleaner while preserving behavior and markers. No issues spotted.

Also applies to: 47-47, 54-54, 59-59, 64-64, 78-78, 87-87, 96-96, 103-103

dascore/utils/__init__.py (1)

4-12: Public API re-exports: verify doc discovery/__all__

If your docs or API checks rely on __all__, ensure these names are added there; otherwise, the re-exports are fine.

Would you like me to add/update an __all__ list here to make the public surface explicit?

tests/test_utils/test_moving.py (2)

171-175: Engine “invalid_engine” behavior assumption

This test expects a warning and fallback for an unknown engine. Confirm moving_window/_select_engine indeed warns (not raises) on invalid engine strings; otherwise adjust test or implementation accordingly.


120-121: Break long function signature to satisfy E501

Wrap parameters across lines.

Apply this diff:

-    def test_operation_engine_combinations(self, test_data, operation, engine, available_engines):
+    def test_operation_engine_combinations(
+        self, test_data, operation, engine, available_engines
+    ):
         """Test all operation/engine combinations."""

Likely an incorrect or invalid review comment.

Comment on lines 184 to 198
is_int = data.dtype.kind == "i"
dataf = data.copy() if not is_int else data.astype(np.float32)

# Convert to float64 to avoid integer overflow/precision loss
dataf = data.astype(np.float64, copy=False)

# Local median and MAD via median filters.
# Apply hampel filtering using appropriate method
if separable and sum(s > 1 for s in size) >= 2:
# Use separable filtering for multi-dimensional windows
# This is faster but provides an approximation
med = _separable_median(dataf, size, mode)
abs_med_diff = np.abs(dataf - med)
mad = _separable_median(abs_med_diff, size, mode)
# Use separable filtering for multi-dimensional windows (faster approximation)
dataf = _hampel_separable(dataf, size, mode, threshold)
else:
# Use standard 2D median filter (more accurate but slower)
med, abs_med_diff, mad = _calculate_standard_median_and_mad(dataf, size, mode)
# Handle mad values of 0 so denominator doesn't blow up.
mad_safe = np.where(mad == 0.0, np.finfo(float).eps, mad)
# Hampel test and replacement.
thresholded = abs_med_diff / mad_safe
out = np.where(thresholded > threshold, med, dataf)
dataf = _hampel_non_separable(dataf, size, mode, threshold)
# Cast back to original dtype (round if original was integer)
if np.issubdtype(data.dtype, np.integer):
out = np.rint(out)
out = out.astype(data.dtype, copy=False)
dataf = np.rint(dataf)
out = dataf.astype(data.dtype, copy=False)
return patch.update(data=out)
Contributor


⚠️ Potential issue | 🔴 Critical

Handle unsigned integer inputs without crashing

Right now only signed integers are promoted to float. Patches with unsigned types (e.g., uint16) keep their dtype, so _hampel_* hits np.finfo(dataf.dtype) and raises ValueError, and arithmetic wraps around. Please cast all integer dtypes before filtering.

-    is_int = data.dtype.kind == "i"
-    dataf = data.copy() if not is_int else data.astype(np.float32)
+    is_int = np.issubdtype(data.dtype, np.integer)
+    dataf = data.astype(np.float32) if is_int else data.copy()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
is_int = data.dtype.kind == "i"
dataf = data.copy() if not is_int else data.astype(np.float32)
# Convert to float64 to avoid integer overflow/precision loss
dataf = data.astype(np.float64, copy=False)
# Local median and MAD via median filters.
# Apply hampel filtering using appropriate method
if separable and sum(s > 1 for s in size) >= 2:
# Use separable filtering for multi-dimensional windows
# This is faster but provides an approximation
med = _separable_median(dataf, size, mode)
abs_med_diff = np.abs(dataf - med)
mad = _separable_median(abs_med_diff, size, mode)
# Use separable filtering for multi-dimensional windows (faster approximation)
dataf = _hampel_separable(dataf, size, mode, threshold)
else:
# Use standard 2D median filter (more accurate but slower)
med, abs_med_diff, mad = _calculate_standard_median_and_mad(dataf, size, mode)
# Handle mad values of 0 so denominator doesn't blow up.
mad_safe = np.where(mad == 0.0, np.finfo(float).eps, mad)
# Hampel test and replacement.
thresholded = abs_med_diff / mad_safe
out = np.where(thresholded > threshold, med, dataf)
dataf = _hampel_non_separable(dataf, size, mode, threshold)
# Cast back to original dtype (round if original was integer)
if np.issubdtype(data.dtype, np.integer):
out = np.rint(out)
out = out.astype(data.dtype, copy=False)
dataf = np.rint(dataf)
out = dataf.astype(data.dtype, copy=False)
return patch.update(data=out)
is_int = np.issubdtype(data.dtype, np.integer)
dataf = data.astype(np.float32) if is_int else data.copy()
# Apply hampel filtering using appropriate method
if separable and sum(s > 1 for s in size) >= 2:
# Use separable filtering for multi-dimensional windows (faster approximation)
dataf = _hampel_separable(dataf, size, mode, threshold)
else:
# Use standard 2D median filter (more accurate but slower)
dataf = _hampel_non_separable(dataf, size, mode, threshold)
# Cast back to original dtype (round if original was integer)
if np.issubdtype(data.dtype, np.integer):
dataf = np.rint(dataf)
out = dataf.astype(data.dtype, copy=False)
return patch.update(data=out)
🤖 Prompt for AI Agents
In dascore/proc/hampel.py around lines 184-198, the current check only treats
signed integer dtypes as needing promotion to float, so unsigned integer patches
(e.g., uint16) are left as-is and cause np.finfo errors or wraparound; change
the dtype check to detect any integer dtype (use np.issubdtype(data.dtype,
np.integer)) and cast those to a floating dtype (e.g., np.float32) before
calling the _hampel_* functions, then after filtering round if original was
integer and cast back to the original dtype; preserve the existing
separable/non-separable flow and ensure copy semantics remain correct.

@coderabbitai coderabbitai bot added the patch related to Patch class label Sep 27, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2716153 and 7a443cc.

📒 Files selected for processing (7)
  • benchmarks/test_patch_benchmarks.py (9 hunks)
  • dascore/proc/hampel.py (7 hunks)
  • dascore/utils/misc.py (5 hunks)
  • dascore/utils/moving.py (1 hunks)
  • docs/recipes/despiking.qmd (1 hunks)
  • tests/test_proc/test_hampel.py (4 hunks)
  • tests/test_utils/test_moving.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
dascore/utils/misc.py (1)
dascore/exceptions.py (1)
  • MissingOptionalDependencyError (98-99)
dascore/proc/hampel.py (2)
dascore/utils/moving.py (1)
  • move_median (193-197)
dascore/utils/patch.py (1)
  • get_patch_window_size (677-762)
tests/test_utils/test_moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/moving.py (8)
  • _get_available_engines (103-106)
  • move_max (228-232)
  • move_mean (200-204)
  • move_median (193-197)
  • move_min (221-225)
  • move_std (207-211)
  • move_sum (214-218)
  • moving_window (141-189)
dascore/utils/moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/misc.py (1)
  • optional_import (300-344)
tests/test_proc/test_hampel.py (2)
dascore/core/patch.py (3)
  • data (272-283)
  • ndim (229-231)
  • shape (286-299)
dascore/proc/hampel.py (1)
  • hampel_filter (98-211)
benchmarks/test_patch_benchmarks.py (3)
dascore/proc/coords.py (1)
  • snap_coords (20-53)
dascore/proc/hampel.py (1)
  • hampel_filter (98-211)
dascore/viz/waterfall.py (1)
  • waterfall (41-116)
🪛 Ruff (0.13.1)
dascore/utils/moving.py

57-57: Unpacked variable module_name is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


175-175: Avoid specifying long messages outside the exception class

(TRY003)


177-177: Avoid specifying long messages outside the exception class

(TRY003)

tests/test_proc/test_hampel.py

191-191: Loop control variable spike_val not used within loop body

Rename unused spike_val to _spike_val

(B007)

Comment on lines +197 to +210
# Need to convert ints to float for calculations to avoid roundoff error.
# There were issues using np.issubdtype not working so this uses kind.
is_int = data.dtype.kind in {"i", "u"}
dataf = data.copy() if not is_int else data.astype(np.float32)
# Apply hampel filtering using appropriate method
if approximate:
dataf = _hampel_separable(dataf, size, mode, threshold)
else:
# Use standard 2D median filter (more accurate but slower)
med, abs_med_diff, mad = _calculate_standard_median_and_mad(dataf, size, mode)
# Handle mad values of 0 so denominator doesn't blow up.
mad_safe = np.where(mad == 0.0, np.finfo(float).eps, mad)
# Hampel test and replacement.
thresholded = abs_med_diff / mad_safe
out = np.where(thresholded > threshold, med, dataf)
dataf = _hampel_non_separable(dataf, size, mode, threshold)
# Cast back to original dtype (round if original was integer)
if np.issubdtype(data.dtype, np.integer):
out = np.rint(out)
out = out.astype(data.dtype, copy=False)
dataf = np.rint(dataf)
out = dataf.astype(data.dtype, copy=False)
Contributor


⚠️ Potential issue | 🔴 Critical

Do not downcast integer patches to float32

Casting every integer patch to np.float32 destroys precision for any value above 2**24. For example, an untouched sample with value 16_777_217 becomes 16_777_216 after the round-trip through the filter, so we silently corrupt data even when no spike replacement occurs. We need to promote integers to a dtype that can exactly represent the original range (float64 is exact up to 2**53) before filtering.
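The precision loss is easy to demonstrate with a plain-Python float32 round-trip (struct stands in for a NumPy float32 cast here):

```python
import struct

def to_float32(x):
    """Round-trip a Python float through IEEE-754 binary32."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(to_float32(16_777_216.0))  # 16777216.0 — exactly representable (2**24)
print(to_float32(16_777_217.0))  # 16777216.0 — silently rounded down
print(to_float32(float(2**53 - 1)))  # float64 holds this exactly; float32 cannot
```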

-    # Need to convert ints to float for calculations to avoid roundoff error.
-    # There were issues using np.issubdtype not working so this uses kind.
-    is_int = data.dtype.kind in {"i", "u"}
-    dataf = data.copy() if not is_int else data.astype(np.float32)
+    # Promote integer inputs to a floating dtype that preserves their precision.
+    is_int = data.dtype.kind in {"i", "u"}
+    dataf = data.copy() if not is_int else data.astype(np.float64, copy=False)
📝 Committable suggestion


Suggested change
# Need to convert ints to float for calculations to avoid roundoff error.
# There were issues using np.issubdtype not working so this uses kind.
is_int = data.dtype.kind in {"i", "u"}
dataf = data.copy() if not is_int else data.astype(np.float32)
# Apply hampel filtering using appropriate method
if approximate:
dataf = _hampel_separable(dataf, size, mode, threshold)
else:
# Use standard 2D median filter (more accurate but slower)
med, abs_med_diff, mad = _calculate_standard_median_and_mad(dataf, size, mode)
# Handle mad values of 0 so denominator doesn't blow up.
mad_safe = np.where(mad == 0.0, np.finfo(float).eps, mad)
# Hampel test and replacement.
thresholded = abs_med_diff / mad_safe
out = np.where(thresholded > threshold, med, dataf)
dataf = _hampel_non_separable(dataf, size, mode, threshold)
# Cast back to original dtype (round if original was integer)
if np.issubdtype(data.dtype, np.integer):
out = np.rint(out)
out = out.astype(data.dtype, copy=False)
dataf = np.rint(dataf)
out = dataf.astype(data.dtype, copy=False)
# Promote integer inputs to a floating dtype that preserves their precision.
is_int = data.dtype.kind in {"i", "u"}
dataf = data.copy() if not is_int else data.astype(np.float64, copy=False)
# Apply hampel filtering using appropriate method
if approximate:
dataf = _hampel_separable(dataf, size, mode, threshold)
else:
# Use standard 2D median filter (more accurate but slower)
dataf = _hampel_non_separable(dataf, size, mode, threshold)
# Cast back to original dtype (round if original was integer)
if np.issubdtype(data.dtype, np.integer):
dataf = np.rint(dataf)
out = dataf.astype(data.dtype, copy=False)
🤖 Prompt for AI Agents
In dascore/proc/hampel.py around lines 197 to 210, integer patches are being
cast to np.float32 which loses precision for large integers; change the
promotion so integer dtypes are converted to np.float64 (not float32) before
filtering, keep non-integer arrays untouched, run the existing filtering
branches, then perform the same rounding and cast-back logic to the original
dtype; ensure you only promote when data.dtype is integer and preserve copy
semantics.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
dascore/utils/moving.py (5)

57-57: Clean up naming and fix Ruff unused-variable warning

Rename the unpacked module_name to _ and disambiguate func_name to avoid confusion with _get_engine_function’s parameter name. No behavior change.

-    module_name, func_name = OPERATION_REGISTRY[operation]["scipy"]
+    _module_name, scipy_func_name = OPERATION_REGISTRY[operation]["scipy"]
     func = _get_engine_function("scipy", operation)
 
-    if func_name == "uniform_filter1d":
+    if scipy_func_name == "uniform_filter1d":
 ...
-    elif func_name == "median_filter":
+    elif scipy_func_name == "median_filter":

Also applies to: 61-61, 70-70


60-69: Cross‑engine boundary/NaN semantics diverge (document or align)

  • SciPy path uses padding (e.g., reflect) via uniform_filter1d; scaling by window yields a padded-window sum.
  • Bottleneck uses sliding windows with min_count semantics and NaN‑aware reductions.
    Results can differ at the edges and in presence of NaNs. Either document this explicitly or add a boundary/min_count option at the top API and route it per engine for predictable behavior.

If you want, I can draft tests asserting expected cross‑engine behavior at boundaries and with NaNs.

Also applies to: 86-89


105-110: Return type annotation mismatch (tuple vs list)

Function returns a tuple; update the annotation to match to keep type checkers quiet.

-@cache
-def _get_available_engines() -> list[str]:
+@cache
+def _get_available_engines() -> tuple[str, ...]:
     """Get list of available engines (cached)."""
     bottle_list = [] if bn is None else ["bottleneck"]
     return tuple(["scipy", *bottle_list])

112-117: Cache engine function lookup and fix param naming

Name the param operation (it is not a function name) and cache the resolution. This also removes shadowing with local func_name elsewhere.

-def _get_engine_function(engine: str, func_name: str) -> Callable | None:
-    """Get and cache engine function."""
-    module_name, func_name = OPERATION_REGISTRY[func_name][engine]
+@cache
+def _get_engine_function(engine: str, operation: str) -> Callable:
+    """Get and cache engine function."""
+    module_name, func_name = OPERATION_REGISTRY[operation][engine]
     mod = _get_module(module_name)
     return getattr(mod, func_name)

174-186: Normalize operation case and validate/normalize axis

  • Accept case‑insensitive operations.
  • Raise ParameterError (not IndexError) on bad axis and normalize negatives.
     data = np.asarray(data)
     # Validate inputs
     if window <= 0:
         raise ParameterError("Window size must be positive")
-    if operation not in OPERATION_REGISTRY:
+    operation = operation.lower()
+    if operation not in OPERATION_REGISTRY:
         raise ParameterError(f"Unknown operation: {operation}")
-    if window > data.shape[axis]:
+    if not (-data.ndim <= axis < data.ndim):
+        raise ParameterError(
+            f"Axis {axis} is out of bounds for array with ndim={data.ndim}"
+        )
+    axis = axis % data.ndim
+    if window > data.shape[axis]:
         msg = (
             f"Window size ({window}) larger than data size ({data.shape[axis]}) "
             f"along axis {axis}"
         )
         raise ParameterError(msg)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a443cc and 1651d35.

📒 Files selected for processing (1)
  • dascore/utils/moving.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dascore/utils/moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/misc.py (1)
  • optional_import (300-344)
🪛 Ruff (0.13.1)
dascore/utils/moving.py

57-57: Unpacked variable module_name is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


177-177: Avoid specifying long messages outside the exception class

(TRY003)


179-179: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
dascore/utils/moving.py (3)

24-50: Good registry design and engine mapping

Clear split of SciPy vs Bottleneck, with explicit None for unsupported ops (std on SciPy). This keeps dispatch simple and discoverable.


63-65: Preserving complex inputs through SciPy path — nice

Only promoting integer/boolean types to float avoids stripping imaginary parts. Good catch.


119-141: Engine fallback UX is clear

Message and fallback behavior are sensible; raising on unsupported op/engine combos is appropriate.

Do we want a short note in the docstring that std requires Bottleneck and will error if only SciPy is installed?

Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
dascore/proc/hampel.py (2)

45-74: Minor alloc cuts in separable path (optional)

Preallocate a boolean mask and use np.copyto(where=…) to avoid temporary arrays/fancy indexing.

-    outlier_mask = abs_med_diff > threshold
-    dataf[outlier_mask] = med[outlier_mask]
+    # Preallocate mask once
+    mask = np.empty(dataf.shape, dtype=bool)
+    np.greater(abs_med_diff, threshold, out=mask)
+    # Copy only where needed
+    np.copyto(dataf, med, where=mask)

76-95: Avoid extra arrays in exact path; compute in-place

Saves two temporaries in hot path without changing behavior.

-    # Handle mad values of 0 so denominator doesn't blow up
-    mad_safe = np.where(mad == 0.0, np.finfo(dataf.dtype).eps, mad)
-
-    # Hampel test and replacement
-    thresholded = abs_med_diff / mad_safe
-
-    # Use in-place assignment for consistency
-    outlier_mask = thresholded > threshold
+    # Stabilize denominator in-place
+    eps = np.finfo(dataf.dtype).eps
+    mad[mad == 0.0] = eps
+    # Reuse abs_med_diff buffer
+    np.divide(abs_med_diff, mad, out=abs_med_diff)
+    outlier_mask = abs_med_diff > threshold
     dataf[outlier_mask] = med[outlier_mask]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 280c56d and 048e255.

📒 Files selected for processing (2)
  • dascore/core/spool.py (1 hunks)
  • dascore/proc/hampel.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dascore/proc/hampel.py (2)
dascore/utils/moving.py (1)
  • move_median (202-206)
dascore/utils/patch.py (1)
  • get_patch_window_size (677-762)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: Run benchmarks
🔇 Additional comments (4)
dascore/proc/hampel.py (4)

12-12: Import looks good

Re-export via utils.moving is clear; usage below is consistent.


204-206: Confirm rounding policy for integer casts

np.rint uses bankers’ rounding (ties to even), and so does np.round, so switching between them does not change tie handling. If “away from zero” semantics are wanted, an explicit formulation such as np.sign(x) * np.floor(np.abs(x) + 0.5) is needed. Otherwise, please confirm current behavior is intended.
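A quick demonstration of the tie handling, using toy values only:

```python
import numpy as np

vals = np.array([0.5, 1.5, 2.5, -0.5, -1.5])

# np.rint resolves ties to the nearest even integer (so does np.round).
banker = np.rint(vals)

# One explicit "away from zero" formulation, for comparison.
away = np.sign(vals) * np.floor(np.abs(vals) + 0.5)
```

With ties-to-even, 0.5 and 2.5 both round to the even neighbor (0 and 2), while the away-from-zero variant gives 1 and 3.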


193-197: Integer round-trip corruption: promote to float64 (not float32) and use robust dtype check

float32 cannot exactly represent integers > 2**24; current round‑trip corrupts untouched samples. Use np.issubdtype and promote integers to float64 before filtering.

-    # Need to convert ints to float for calculations to avoid roundoff error.
-    # There were issues using np.issubdtype not working so this uses kind.
-    is_int = data.dtype.kind in {"i", "u"}
-    dataf = data.copy() if not is_int else data.astype(np.float32)
+    # Promote integer inputs to float64 to avoid precision loss during filtering.
+    is_int = np.issubdtype(data.dtype, np.integer)
+    dataf = data.copy() if not is_int else data.astype(np.float64, copy=True)

Note: If your data can exceed 2**53 (rare for amplitude arrays), we should avoid whole‑array round‑trips for integer inputs and only write back changed indices. If that’s a real use case here, I can propose a mask‑based write‑back variant.
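The precision loss is easy to demonstrate; the 2**24 boundary comes from float32's 24-bit significand:

```python
import numpy as np

x = np.array([2**24 + 1], dtype=np.int64)

# float32 cannot represent 2**24 + 1; the round-trip silently drops the +1.
lossy = x.astype(np.float32).astype(np.int64)

# float64 represents every integer up to 2**53 exactly.
exact = x.astype(np.float64).astype(np.int64)
```

This is why promoting to float64 (rather than float32) keeps untouched integer samples bit-identical through the filter.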


16-35: Confirm mode handling in move_median
The move_median signature accepts a mode kwarg and forwards it to moving_window. Under the SciPy backend mode is honored by median_filter, and under the Bottleneck backend it’s filtered out (no TypeError) to use Bottleneck’s default edge behavior.

@codspeed-hq
Copy link

codspeed-hq bot commented Sep 29, 2025

CodSpeed Performance Report

Merging #557 will not alter performance

Comparing hampel_speed (dd1a655) with master (9b94e78)

Summary

✅ 1 untouched
🆕 49 new
⏩ 48 skipped¹

Benchmarks breakdown

Benchmark BASE HEAD Change
🆕 test_get_format N/A 96 ms N/A
🆕 test_read N/A 15.8 s N/A
🆕 test_scan N/A 254 ms N/A
🆕 test_scan_df N/A 349.2 ms N/A
🆕 test_spool N/A 16.5 s N/A
🆕 test_first N/A 6 ms N/A
🆕 test_last N/A 6.2 ms N/A
🆕 test_max N/A 14.8 ms N/A
🆕 test_mean N/A 32.9 ms N/A
🆕 test_median N/A 141.7 ms N/A
🆕 test_min N/A 14.8 ms N/A
🆕 test_std N/A 48.1 ms N/A
🆕 test_sum N/A 29.2 ms N/A
🆕 test_decimate N/A 208.9 ms N/A
🆕 test_hampel_filter N/A 284.6 ms N/A
🆕 test_hampel_filter_non_approximate N/A 338 ms N/A
🆕 test_interpolate N/A 92.7 ms N/A
🆕 test_median_filter N/A 299.3 ms N/A
🆕 test_pass_filter N/A 251.4 ms N/A
🆕 test_roll N/A 13.4 ms N/A
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Footnotes

  1. 48 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov
Copy link

codecov bot commented Sep 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.92%. Comparing base (9b94e78) to head (dd1a655).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master     #557    +/-   ##
========================================
  Coverage   99.92%   99.92%            
========================================
  Files         126      127     +1     
  Lines       10481    10586   +105     
========================================
+ Hits        10473    10578   +105     
  Misses          8        8            
Flag Coverage Δ
unittests 99.92% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

62-69: Pin bottleneck>=1.4.1 in extras (NumPy 2.x / Python 3.13 support).

Unversioned extras can pull incompatible wheels—update pyproject.toml and note in CHANGELOG:

-    "bottleneck",
+    "bottleneck>=1.4.1",
🧹 Nitpick comments (7)
tests/test_utils/test_moving.py (4)

20-29: Stabilize test surface by avoiding internal registry import in parametrization

Importing OPERATION_REGISTRY couples tests to internal implementation and can shift test coverage unintentionally if new ops land (e.g., “std”). Prefer an explicit, curated list for this module’s scope.

-TEST_OPERATIONS = list(OPERATION_REGISTRY.keys())
+# Keep this list explicit to avoid accidental coverage drift when registry changes.
+TEST_OPERATIONS = ["median", "mean", "sum", "min", "max", "std"]

175-182: Rename test or docstring: it raises, not warns

The docstring says “warning,” but the test asserts a ParameterError. Align wording to avoid confusion.

-    """Test warning for window larger than data."""
+    """Test error for window larger than data."""

65-73: Guard interior slice for small windows and document behavior

Logic is fine; consider naming the half-window and add a brief comment for readability. No behavior change.

-        interior_slice = slice(window // 2, -window // 2 if window > 2 else None)
+        h = window // 2  # half-window; avoid edges where padding differs by engine
+        interior_slice = slice(h, -h if window > 2 else None)

179-182: Remove unreachable assertion inside raises block

The isinstance check never runs because moving_window raises. Drop it to keep the test tight.

-        with pytest.raises(ParameterError, match="larger than data size"):
-            result = moving_window(data, large_window, "median")
-            assert isinstance(result, np.ndarray)
+        with pytest.raises(ParameterError, match="larger than data size"):
+            moving_window(data, large_window, "median")
dascore/proc/hampel.py (3)

203-207: Prevent overflow on cast-back to narrow integer types

Round then clip to the target dtype’s bounds before astype to avoid wraparound.

-    if np.issubdtype(data.dtype, np.integer):
-        dataf = np.rint(dataf)
-    out = dataf.astype(data.dtype, copy=False)
+    if np.issubdtype(data.dtype, np.integer):
+        dataf = np.rint(dataf, out=dataf)
+        info = np.iinfo(data.dtype)
+        np.clip(dataf, info.min, info.max, out=dataf)
+    out = dataf.astype(data.dtype, copy=False)
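A minimal sketch of the round-then-clip pattern suggested above; int8 is chosen only to keep the bounds small, and the values are illustrative:

```python
import numpy as np

dataf = np.array([300.7, -10.2, 42.4])
info = np.iinfo(np.int8)

np.rint(dataf, out=dataf)                      # round in place
np.clip(dataf, info.min, info.max, out=dataf)  # clamp to [-128, 127] first
out = dataf.astype(np.int8, copy=False)        # now safe: no wraparound
```

Without the clip, casting 301.0 to int8 would wrap (or be undefined) rather than saturate at 127.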

84-93: Reduce allocations in exact path by mutating MAD in place

You can avoid creating mad_safe by fixing zeros in-place and dividing with out to reuse abs_med_diff.

-    mad_safe = np.where(mad == 0.0, np.finfo(dataf.dtype).eps, mad)
-    # Hampel test and replacement
-    thresholded = abs_med_diff / mad_safe
-    # Use in-place assignment for consistency
-    outlier_mask = thresholded > threshold
+    eps = np.finfo(dataf.dtype).eps
+    mad[mad == 0.0] = eps
+    # Reuse abs_med_diff buffer for thresholding
+    np.divide(abs_med_diff, mad, out=abs_med_diff)
+    outlier_mask = abs_med_diff > threshold

147-151: Doc nit: quantify performance claims and qualify with data shape/engine

If you have benchmark numbers, consider citing ranges (e.g., “~3–4x on 2D patches of shape … with bottleneck 1.3.x”). This helps set expectations.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 048e255 and 7420d17.

📒 Files selected for processing (4)
  • dascore/core/spool.py (1 hunks)
  • dascore/proc/hampel.py (7 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_utils/test_moving.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dascore/core/spool.py
🧰 Additional context used
🧬 Code graph analysis (2)
dascore/proc/hampel.py (2)
dascore/utils/moving.py (1)
  • move_median (202-206)
dascore/utils/patch.py (1)
  • get_patch_window_size (677-762)
tests/test_utils/test_moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/moving.py (7)
  • _get_available_engines (113-116)
  • move_max (237-241)
  • move_mean (209-213)
  • move_median (202-206)
  • move_min (230-234)
  • move_sum (223-227)
  • moving_window (150-198)
🔇 Additional comments (1)
dascore/proc/hampel.py (1)

28-33: No change needed for mode kwarg
The Bottleneck wrapper (_apply_bottleneck_operation) explicitly drops unsupported keys (mode, origin, cval), so passing mode into move_median(..., engine="auto") won’t reach the Bottleneck function or raise a TypeError.

Likely an incorrect or invalid review comment.

Comment on lines +193 to +197
# Need to convert ints to float for calculations to avoid roundoff error.
# There were issues using np.issubdtype not working so this uses kind.
is_int = data.dtype.kind in {"i", "u"}
dataf = data.copy() if not is_int else data.astype(np.float32)
# Apply hampel filtering using appropriate method

⚠️ Potential issue | 🔴 Critical

Critical: promote integer inputs to float64 (not float32) and use np.issubdtype

Casting integers to float32 loses precision above 2**24 and silently corrupts untouched samples. Also include unsigned integers via np.issubdtype.

-    # Need to convert ints to float for calculations to avoid roundoff error.
-    # There were issues using np.issubdtype not working so this uses kind.
-    is_int = data.dtype.kind in {"i", "u"}
-    dataf = data.copy() if not is_int else data.astype(np.float32)
+    # Promote integer inputs to a float dtype that preserves integer precision.
+    is_int = np.issubdtype(data.dtype, np.integer)
+    dataf = data.astype(np.float64, copy=False) if is_int else data.copy()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Need to convert ints to float for calculations to avoid roundoff error.
# There were issues using np.issubdtype not working so this uses kind.
is_int = data.dtype.kind in {"i", "u"}
dataf = data.copy() if not is_int else data.astype(np.float32)
# Apply hampel filtering using appropriate method
# Promote integer inputs to a float dtype that preserves integer precision.
is_int = np.issubdtype(data.dtype, np.integer)
dataf = data.astype(np.float64, copy=False) if is_int else data.copy()
# Apply hampel filtering using appropriate method
🤖 Prompt for AI Agents
In dascore/proc/hampel.py around lines 193 to 197, the current logic casts
integer inputs to float32 and detects integers by dtype.kind; change this to use
np.issubdtype(data.dtype, np.integer) (which covers signed and unsigned ints) to
detect integer types and promote integer arrays to float64 instead of float32 to
avoid precision loss; ensure you only cast when needed (leave non-integer dtypes
untouched or copy only when necessary) so downstream calculations use float64
precision.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (8)
tests/test_utils/test_moving.py (7)

9-18: Include move_std import to exercise std convenience path (gated by engine).

You’re testing all convenience funcs except std; import it so we can add a conditional test without breaking when Bottleneck isn’t installed.

 from dascore.utils.moving import (
     OPERATION_REGISTRY,
     _get_available_engines,
     move_max,
     move_mean,
     move_median,
     move_min,
+    move_std,
     move_sum,
     moving_window,
 )

136-147: Test move_std convenience function when Bottleneck is available.

Std is supported via the Bottleneck engine; add a gated check to cover this path and its equivalence to moving_window.

-    def test_convenience_functions(self, test_data):
+    def test_convenience_functions(self, test_data, available_engines):
         """Test that convenience functions work correctly."""
         data = test_data["1d"]
         window = 3
 
         for operation, func in TEST_CONVENIENCE_FUNCS.items():
             result = func(data, window)
             self._validate_basic_result(result, data)
 
             # Check equivalence with generic function
             expected = moving_window(data, window, operation)
             np.testing.assert_array_equal(result, expected)
+
+        # Also validate std where supported (Bottleneck)
+        if "bottleneck" in available_engines:
+            result_std = move_std(data, window, engine="bottleneck")
+            self._validate_basic_result(result_std, data)
+            expected_std = moving_window(data, window, "std", engine="bottleneck")
+            np.testing.assert_array_equal(result_std, expected_std)

42-64: Strengthen property check: assert identity sum == mean * window.

Inequality (sum ≥ mean) only holds for non-negative data; here we can make a stricter, window-aware identity to reduce false positives. Pass window into the helper and assert equality on finite positions.

-    def _test_finite_properties(self, results, data):
+    def _test_finite_properties(self, results, data, window):
         """Test mathematical properties of operations."""
         # Find where all results are finite
         all_finite = np.ones(len(data), dtype=bool)
         for result in results.values():
             all_finite &= np.isfinite(result)
 
         if not np.any(all_finite):
             return  # Skip if no finite values
 
         finite_indices = np.where(all_finite)[0]
 
         # Test properties only where values are finite
         mean_vals = results["mean"][finite_indices]
         sum_vals = results["sum"][finite_indices]
         min_vals = results["min"][finite_indices]
         max_vals = results["max"][finite_indices]
 
-        # Sum should be >= mean (for positive window size)
-        assert np.all(sum_vals >= mean_vals)
+        # Identity: sum equals mean * window
+        np.testing.assert_allclose(sum_vals, mean_vals * window, rtol=1e-12, atol=0.0)
         # Min should be <= max
         assert np.all(min_vals <= max_vals)

211-223: Propagate window to property helper.

Pass the window so the strengthened identity can be checked.

     def test_operation_properties(self, test_data):
         """Test mathematical properties of operations."""
         data = test_data["1d"]
         window = 3
 
         results = {}
         for operation in ["median", "mean", "sum", "min", "max"]:
             results[operation] = moving_window(data, window, operation)
 
-        # Test properties where both results are finite
-        self._test_finite_properties(results, data)
+        # Test properties where both results are finite
+        self._test_finite_properties(results, data, window)

175-181: Remove unreachable assertions inside pytest.raises context and align name.

Code after the raising call won’t execute; drop it. Optionally rename the test to “raises” for clarity.

     def test_large_window_warning(self, test_data):
         """Test warning for window larger than data."""
         data = test_data["1d"]
         large_window = len(data) + 1
         with pytest.raises(ParameterError, match="larger than data size"):
-            result = moving_window(data, large_window, "median")
-            assert isinstance(result, np.ndarray)
+            moving_window(data, large_window, "median")

170-174: Also assert the function returns a valid result when falling back after warn.

Capture the output and check shape to ensure the fallback engine path works.

     def test_unavailable_engine_warns(self, test_data):
         """Test unavailable engine warns."""
-        with pytest.warns(UserWarning, match="not available"):
-            moving_window(test_data["1d"], 3, "median", engine="invalid_engine")
+        with pytest.warns(UserWarning, match="not available"):
+            out = moving_window(test_data["1d"], 3, "median", engine="invalid_engine")
+        assert isinstance(out, np.ndarray)
+        assert out.shape == test_data["1d"].shape

115-135: Decouple skip logic from exception message text.

Relying on the substring “not available” is brittle; skip on any ParameterError here since inputs are otherwise valid.

-        except ParameterError as e:
-            if "not available" in str(e):
-                pytest.skip(f"Operation {operation} not available in {engine}")
-            else:
-                raise
+        except ParameterError as e:
+            pytest.skip(f"{operation} not supported on engine={engine}: {e}")
tests/test_core/test_spool.py (1)

178-194: Strengthen the test with a sanity check that duplicates exist on the dedup subset.

Add a quick pre-check so the test fails loudly if the fixture stops producing duplicates on the expected columns.

 def test_drop_duplicates_large_joined_dataframe(self, large_spool_with_duplicates):
     """Test that duplicates are removed when joined dataframe > 10 rows."""
     spool = large_spool_with_duplicates
+    # Sanity: duplicates must exist on the dedup subset to exercise the path.
+    df = spool.get_contents()
+    subset = [c for c in ("network", "station", "data_type", "tag") if c in df.columns]
+    assert subset, "Expected group columns not present in contents"
+    assert len(df) > 10, "Joined DataFrame must exceed threshold to hit dedup path"
+    assert df.duplicated(subset=subset).any(), (
+        "Fixture did not create duplicates on the dedup subset"
+    )

     # Access a patch to trigger _get_patches_from_index
     # This should execute the drop_duplicates code path when len(joined) > 10
     patch = spool[0]
     assert isinstance(patch, dc.Patch)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 7420d17 and c82d728.

📒 Files selected for processing (2)
  • tests/test_core/test_spool.py (2 hunks)
  • tests/test_utils/test_moving.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/moving.py (7)
  • _get_available_engines (113-116)
  • move_max (237-241)
  • move_mean (209-213)
  • move_median (202-206)
  • move_min (230-234)
  • move_sum (223-227)
  • moving_window (150-198)
tests/test_core/test_spool.py (4)
dascore/examples.py (1)
  • get_example_patch (659-706)
tests/conftest.py (2)
  • patch (375-377)
  • spool (569-571)
dascore/proc/basic.py (1)
  • update_attrs (96-133)
dascore/core/spool.py (1)
  • spool (661-690)
🔇 Additional comments (1)
tests/test_core/test_spool.py (1)

128-156: Ignore this suggestion: the fixture’s patches share the default dims attribute, so they already duplicate on (“network”, “station”, “dims”, “data_type”, “tag”) and exercise the dedup path correctly.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_clients/test_dirspool.py (2)

367-372: Strengthen assertions to validate dedup effectiveness, not just success.

Check the chunk spans the full time extent of the (potentially duplicated) contents.

 def test_chunk_redundant_index(self, directory_spool_redundant_index):
     """Ensure redundant indices are handled effectively with chunking"""
     spool = directory_spool_redundant_index.chunk(time=None)
     patch = spool[0]
     assert isinstance(patch, dc.Patch)
+    contents = directory_spool_redundant_index.get_contents()
+    assert patch.attrs.time_min == contents["time_min"].min()
+    assert patch.attrs.time_max == contents["time_max"].max()

71-84: Harden directory_spool_redundant_index fixture to avoid flakiness

  • Exclude the index file (.dascore_index.h5) by globbing only *.hdf5 data files.
  • Advance each file’s mtime explicitly with os.utime to guarantee duplicates on filesystems with coarse timestamp resolution.
  • (Optional) set scope="module" to reduce per-test setup overhead.

Apply this diff:

+import os
 @pytest.fixture(scope="module")
 def directory_spool_redundant_index(random_spool, tmp_path_factory):
     """Force a spool to be indexed many times with same files."""
     path = Path(tmp_path_factory.mktemp("redundant_index_spool"))
     dascore.examples.spool_to_directory(random_spool, path, "dasdae")
     spool = dc.spool(path).update()

-    # Touch each file, re-index to saturate index with duplicates.
-    for _ in range(12):
-        for file_path in path.glob("*"):
-            file_path.touch()
-        spool = spool.update()
+    # Touch only data files, force monotonic mtime to create duplicates robustly.
+    for i in range(12):
+        for file_path in path.glob("*.hdf5"):
+            st = file_path.stat()
+            os.utime(file_path, (st.st_atime, st.st_mtime + i + 1))
+        spool = spool.update()
     return spool
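Why os.utime over touch(): it lets the fixture set the timestamp explicitly rather than relying on filesystem clock resolution. A standalone sketch (temporary file, arbitrary +5 second offset):

```python
import os
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "demo.hdf5"
path.write_bytes(b"")

st = path.stat()
# Bump mtime forward by a fixed offset; guaranteed to register a change
# even on filesystems with 1-2 s timestamp granularity.
os.utime(path, (st.st_atime, st.st_mtime + 5))
new_mtime = path.stat().st_mtime
```

A plain touch() can produce the same mtime twice in a row on coarse-resolution filesystems, which would make the duplicate-index fixture flaky.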
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between c82d728 and 4d6c62b.

📒 Files selected for processing (4)
  • dascore/core/spool.py (1 hunks)
  • pyproject.toml (2 hunks)
  • tests/test_clients/test_dirspool.py (2 hunks)
  • tests/test_utils/test_moving.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • dascore/core/spool.py
  • pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_clients/test_dirspool.py (7)
dascore/examples.py (2)
  • random_spool (553-576)
  • spool_to_directory (635-655)
tests/conftest.py (3)
  • random_spool (447-451)
  • spool (569-571)
  • patch (375-377)
dascore/core/spool.py (4)
  • spool (663-692)
  • update (237-249)
  • chunk (94-141)
  • chunk (501-531)
dascore/io/indexer.py (2)
  • update (90-95)
  • update (309-346)
dascore/clients/filespool.py (1)
  • update (71-82)
dascore/clients/dirspool.py (1)
  • update (105-112)
dascore/utils/chunk.py (1)
  • chunk (429-471)
tests/test_utils/test_moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/moving.py (8)
  • _get_available_engines (113-116)
  • move_max (237-241)
  • move_mean (209-213)
  • move_median (202-206)
  • move_min (230-234)
  • move_std (216-220)
  • move_sum (223-227)
  • moving_window (150-198)
🔇 Additional comments (15)
tests/test_utils/test_moving.py (15)

1-31: Well-structured test module with comprehensive imports and configuration.

The module imports are clean and appropriate. The test configuration constants (TEST_OPERATIONS, TEST_ENGINES, TEST_CONVENIENCE_FUNCS) provide good coverage for parametrized testing across different operations, engines, and convenience functions.


33-42: Robust basic validation helper method.

The _validate_basic_result helper properly validates fundamental properties: numpy array type, shape preservation, and appropriate numeric dtypes. The dtype check covers float, int, and complex types which is comprehensive.


43-65: Well-designed finite properties test with mathematical validation.

This helper method intelligently handles cases where operations might produce non-finite values and validates mathematical properties (sum ≥ mean for positive windows, min ≤ max) only where all results are finite. The logic is sound and defensive.


66-96: Thorough engine comparison with reasonable tolerance for edge effects.

The comparison method properly handles interior values to avoid edge effects, validates results within reasonable bounds, and allows for 50% tolerance in relative differences to accommodate different edge handling strategies between engines. This is a pragmatic approach to cross-engine validation.


99-108: Good test data fixtures with deterministic random generation.

The fixtures provide comprehensive test scenarios including 1D/2D arrays, large arrays with deterministic randomness, and edge cases (single element, constant values). Using np.random.default_rng(42) ensures reproducible test runs while satisfying modern NumPy practices.


116-136: Comprehensive parametrized testing with proper error handling.

The test properly handles unavailable engines/operations by skipping rather than failing, which is appropriate for optional dependencies like bottleneck. The exception handling distinguishes between availability issues and actual errors.


137-149: Excellent equivalence testing for convenience functions.

This test validates that convenience functions produce identical results to the generic moving_window interface, which is crucial for API consistency. The approach ensures the convenience functions are proper aliases without behavioral differences.


150-158: Good multi-axis support validation.

Testing operations on different axes ensures the moving window operations work correctly across array dimensions. The test appropriately validates shape preservation for 2D arrays.


160-170: Proper input validation testing.

The tests correctly validate that invalid window sizes and unknown operations raise ParameterError with appropriate error messages. This ensures the API provides clear feedback for invalid inputs.


185-201: Solid edge case testing with meaningful validation.

The edge cases cover important scenarios (single element, constant values) and include the important validation that window size 1 should return the input unchanged. This is mathematically correct behavior.


202-211: Good dtype compatibility testing.

Testing with different numeric types (int32, float32, float64) ensures the moving window operations handle various input dtypes correctly. The test appropriately uses a smaller subset for performance.


213-224: Excellent mathematical property validation.

This test validates fundamental mathematical relationships between different operations (median, mean, sum, min, max) by delegating to the well-designed _test_finite_properties helper. This ensures operations behave consistently with mathematical expectations.


225-240: Strong numerical accuracy testing with known results.

The test validates exact expected values for specific cases, which is crucial for ensuring correctness. Testing both median and mean with known inputs and expected outputs provides confidence in the implementations.


241-262: Robust cross-engine consistency testing.

This test properly handles optional bottleneck dependency and validates that different engines produce consistent results for the same operations. The graceful handling of unavailable operations and use of the comparison helper method ensures reliable cross-engine validation.


263-270: Appropriate bottleneck-specific testing.

The test specifically validates bottleneck's std operation (which scipy engine doesn't support) with proper import skipping. This ensures bottleneck-specific functionality is tested when available while gracefully handling its absence.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
tests/test_utils/test_moving.py (6)

69-73: Make interior slicing robust; skip tiny/even-window edge cases

For consistent “interior” comparisons across engines, bail early for very small windows and use a symmetric slice without the special-case None. This avoids pulling edge-influenced values (e.g., for window == 2).

-        interior_slice = slice(window // 2, -window // 2 if window > 2 else None)
+        # Skip comparison for tiny windows where a stable interior is ill-defined.
+        if window < 3:
+            return
+        # Symmetric interior (works for odd/even windows and avoids edges)
+        interior_slice = slice(window // 2, -window // 2)

83-87: Avoid hard-coded value bounds; remove or derive from data

Asserting results lie in [-10, 10] is brittle and unrelated to input distribution or operation (e.g., “sum” scales with window). Either drop the guard or compute operation-aware bounds. Given you already compare relative means, the simplest fix is to remove this block.

-            # Check that both give reasonable results (within data range)
-            data_min, data_max = -10, 10  # Reasonable range
-            assert np.all((vals1 >= data_min) & (vals1 <= data_max))
-            assert np.all((vals2 >= data_min) & (vals2 <= data_max))
+            # Value-range guard removed; relative agreement is checked below.

21-30: Include std convenience check when Bottleneck is available

You already guard engine availability elsewhere; add std here to increase wrapper coverage.

 TEST_CONVENIENCE_FUNCS = {
     "median": move_median,
     "mean": move_mean,
     "sum": move_sum,
     "min": move_min,
     "max": move_max,
 }
+
+# Add std wrapper when bottleneck is available.
+if "bottleneck" in _get_available_engines():
+    TEST_CONVENIENCE_FUNCS["std"] = move_std

Also applies to: 137-149


43-65: Strengthen property test: relate sum and mean on interior

sum >= mean is weak and data-sign dependent. Prefer sum ≈ mean * window on interior indices to avoid edge effects. Pass window into the helper and assert allclose on the interior slice.

-    def _test_finite_properties(self, results, data):
+    def _test_finite_properties(self, results, data, window):
         """Test mathematical properties of operations."""
@@
-        # Sum should be >= mean (for positive window size)
-        assert np.all(sum_vals >= mean_vals)
+        # On interior indices, sum should equal mean * window (within tolerance).
+        left = window // 2
+        right = -window // 2
+        interior = slice(left, right if window >= 3 else None)
+        np.testing.assert_allclose(
+            sum_vals[interior], mean_vals[interior] * window, rtol=1e-6, atol=1e-12
+        )
         # Min should be <= max
         assert np.all(min_vals <= max_vals)

And update the call site:

-        self._test_finite_properties(results, data)
+        self._test_finite_properties(results, data, window)

Also applies to: 215-226


150-158: Broaden multi-axis coverage to more ops

Axis handling bugs often differ by op; consider parametrizing over a small set like median/mean/sum/min/max.

-        for axis in [0, 1]:
-            result = move_median(data, window, axis=axis)
-            assert result.shape == data.shape
+        for axis in [0, 1]:
+            for func in (move_median, move_mean, move_sum, move_min, move_max):
+                result = func(data, window, axis=axis)
+                assert result.shape == data.shape

208-213: Assert dtype promotion for integer inputs to mean

Mean over integer arrays should produce a floating dtype; assert it to catch regressions.

-            result = move_mean(data, 3)
-            assert isinstance(result, np.ndarray)
+            result = move_mean(data, 3)
+            assert isinstance(result, np.ndarray)
+            # Mean should promote to floating dtype for integer inputs.
+            assert result.dtype.kind == "f"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d6c62b and 9ebffc9.

📒 Files selected for processing (1)
  • tests/test_utils/test_moving.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_utils/test_moving.py (2)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/utils/moving.py (8)
  • _get_available_engines (113-116)
  • move_max (237-241)
  • move_mean (209-213)
  • move_median (202-206)
  • move_min (230-234)
  • move_std (216-220)
  • move_sum (223-227)
  • moving_window (150-198)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: Run benchmarks
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
🔇 Additional comments (1)
tests/test_utils/test_moving.py (1)

171-177: select_engine warns and falls back as expected
The function issues a UserWarning containing “not available” and then returns the first available engine (“scipy”), so the test’s warning and return-value assertions are valid.
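The warn-and-fall-back behavior described here can be sketched as follows (a hypothetical `select_engine`; the real function lives in `dascore/utils/moving.py` and its signature may differ):

```python
import warnings

def select_engine(requested, available=("scipy",)):
    """Return the requested engine, or warn and fall back to the first available."""
    if requested not in available:
        warnings.warn(
            f"Engine {requested!r} not available; falling back to {available[0]!r}",
            UserWarning,
        )
        return available[0]
    return requested
```

With `requested="invalid_engine"`, this emits a `UserWarning` containing "not available" and returns `"scipy"`, which is the behavior the test's `pytest.warns` and return-value assertions rely on.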

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
tests/test_utils/test_moving.py (6)

45-45: Include unsigned integer dtype in basic result check

Some ops (min/max/sum) can return unsigned kinds for unsigned inputs. Accept "u" as well.

-        assert result.dtype.kind in ["f", "i", "c"]  # float, int, or complex
+        assert result.dtype.kind in ("f", "i", "u", "c")  # float, (u)int, or complex

147-154: Make skip condition robust to error-message wording

Relying only on "not available" is brittle. Broaden to common phrasings and lowercase once.

-        except ParameterError as e:
-            if "not available" in str(e):
-                pytest.skip(f"Operation {operation} not available in {engine}")
-            else:
-                raise
+        except ParameterError as e:
+            msg = str(e).lower()
+            if any(s in msg for s in ("not available", "unsupported", "unknown operation")):
+                pytest.skip(f"{operation} not supported by engine={engine}")
+            raise

169-178: Strengthen multi‑axis checks by asserting equivalence with generic API

Currently only shape is asserted. Compare values to moving_window for each op/axis.

-        for axis in [0, 1]:
-            for func in (move_median, move_mean, move_sum, move_min, move_max):
-                result = func(data, window, axis=axis)
-                assert result.shape == data.shape
+        for axis in [0, 1]:
+            for operation, func in TEST_CONVENIENCE_FUNCS.items():
+                result = func(data, window, axis=axis)
+                expected = moving_window(data, window, operation, axis=axis)
+                assert result.shape == data.shape
+                np.testing.assert_array_equal(result, expected)

191-197: Also verify fallback equivalence on invalid engine

After warning, ensure output matches the automatic engine selection.

     def test_unavailable_engine_warns(self, test_data):
         """Test unavailable engine warns."""
         with pytest.warns(UserWarning, match="not available"):
-            out = moving_window(test_data["1d"], 3, "median", engine="invalid_engine")
+            out = moving_window(test_data["1d"], 3, "median", engine="invalid_engine")
         # It should also return an array.
         assert isinstance(out, np.ndarray)
         assert out.shape == test_data["1d"].shape
+        # And it should be equivalent to the auto-selected engine.
+        expected = moving_window(test_data["1d"], 3, "median", engine="auto")
+        np.testing.assert_array_equal(out, expected)

288-295: Strengthen bottleneck std test: use constant data and check zeros

Shape-only assertion can miss regressions. Constant input should yield zeros.

     def test_bottleneck_std(self, test_data):
         """
         Ensure the bottleneck std works (current scipy engine doesnt have this).
         """
         pytest.importorskip("bottleneck")
-        out = move_std(test_data["1d"], 10, axis=0, engine="bottleneck")
-        assert out.shape == test_data["1d"].shape
+        data = test_data["constant"]
+        out = move_std(data, 3, axis=0, engine="bottleneck")
+        assert out.shape == data.shape
+        assert np.allclose(out, 0.0)

23-23: Optional: build TEST_ENGINES dynamically to avoid skip noise

Derive engines from runtime availability and append "auto".

-TEST_ENGINES = ["scipy", "bottleneck", "auto"]
+TEST_ENGINES = [*_get_available_engines(), "auto"]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ebffc9 and 0781e5e.

📒 Files selected for processing (1)
  • tests/test_utils/test_moving.py (1 hunks)
🔇 Additional comments (1)
tests/test_utils/test_moving.py (1)

37-39: Good addition: comprehensive, engine-aware test scaffold

Solid coverage across ops/engines with helpful helpers. Nice use of fixtures and interior checks.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
tests/test_utils/test_moving.py (7)

62-76: Unify “interior” indexing with the helper and widen valid region

Use the same symmetric interior definition as _compare_engine_results to avoid off‑by‑one inconsistencies and cover more valid samples.

-        if window >= 3 and len(data) >= window:
-            # True interior: start from window-1 to end-(window-1)
-            interior_start = window - 1
-            interior_end = len(data) - (window - 1)
-            if interior_end > interior_start:
-                interior_slice = slice(interior_start, interior_end)
+        if window >= 3 and len(data) >= window:
+            # Symmetric interior consistent with _compare_engine_results
+            interior_slice = slice(window // 2, len(data) - (window // 2))
+            if interior_slice.stop - interior_slice.start > 0:
                 interior_finite = all_finite[interior_slice]
                 if np.any(interior_finite):
                     mean_interior = mean_vals[interior_slice][interior_finite]
                     sum_interior = sum_vals[interior_slice][interior_finite]
                     np.testing.assert_allclose(
                         sum_interior, mean_interior * window, rtol=1e-6, atol=1e-12
                     )

151-154: Make engine-unavailable skip more robust to message phrasing

Avoid coupling to a single substring; handle common variants.

-        except ParameterError as e:
-            if "not available" in str(e):
-                pytest.skip(f"Operation {operation} not available in {engine}")
-            else:
-                raise
+        except ParameterError as e:
+            msg = str(e).lower()
+            if any(p in msg for p in ("not available", "unsupported", "unavailable")):
+                pytest.skip(f"Operation {operation} not available in {engine}")
+            else:
+                raise

191-197: Also assert behavior matches the auto-engine fallback

Strengthen this by verifying output equals the auto-selected engine result.

     def test_unavailable_engine_warns(self, test_data):
         """Test unavailable engine warns."""
         with pytest.warns(UserWarning, match="not available"):
             out = moving_window(test_data["1d"], 3, "median", engine="invalid_engine")
-        # It should also return an array.
-        assert isinstance(out, np.ndarray)
-        assert out.shape == test_data["1d"].shape
+        # It should also return an array and match auto-engine output.
+        assert isinstance(out, np.ndarray)
+        assert out.shape == test_data["1d"].shape
+        expected = moving_window(test_data["1d"], 3, "median", engine="auto")
+        np.testing.assert_array_equal(out, expected)

174-178: Parametrize axis (including negative) for clearer test matrix

Parametrization improves reporting and adds axis=-1 coverage.

-    def test_multi_axis_operations(self, test_data):
+    @pytest.mark.parametrize("axis", (0, 1, -1))
+    def test_multi_axis_operations(self, test_data, axis):
         """Test operations on different axes."""
         data = test_data["2d"]
         window = 2
-
-        for axis in [0, 1]:
-            for func in (move_median, move_mean, move_sum, move_min, move_max):
-                result = func(data, window, axis=axis)
-                assert result.shape == data.shape
+        for func in (move_median, move_mean, move_sum, move_min, move_max):
+            result = func(data, window, axis=axis)
+            assert result.shape == data.shape

276-283: Tighten cross‑engine agreement for additive ops on the interior

For mean/sum, engines should agree element‑wise on interior. Keep the existing lenient check, but add an allclose assert.

         for operation in operations_to_test:
             try:
                 result_scipy = moving_window(data, window, operation, engine="scipy")
                 result_bn = moving_window(data, window, operation, engine="bottleneck")
 
                 # Compare interior values (avoiding edge effects)
                 self._compare_engine_results(result_scipy, result_bn, window)
+                interior = slice(window // 2, -window // 2)
+                if operation in ("mean", "sum"):
+                    np.testing.assert_allclose(
+                        result_scipy[interior], result_bn[interior], rtol=1e-7, atol=1e-12
+                    )
 
             except ParameterError:
                 # Skip if operation not available in one engine
                 continue

228-236: Broaden dtype coverage and simplify promotion check

Include more integer types and assert promotion generically.

-        dtypes = [np.int32, np.float32, np.float64]
+        dtypes = [np.int32, np.int64, np.uint8, np.float32, np.float64]
         for dtype in dtypes:
             data = base_data.astype(dtype)
             result = move_mean(data, 3)
             assert isinstance(result, np.ndarray)
             # Mean should promote to floating dtype for integer inputs.
-            if dtype in [np.int32]:
-                assert result.dtype.kind == "f"
+            if np.issubdtype(dtype, np.integer):
+                assert result.dtype.kind == "f"

107-114: Lax engine‑comparison tolerance risks false negatives

Comparing interior means with a 50% tolerance may miss real mismatches. The per-element allclose suggested in test_engine_consistency will catch genuine divergences; this guard can remain as a coarse sanity check.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0781e5e and dd1a655.

📒 Files selected for processing (1)
  • tests/test_utils/test_moving.py (1 hunks)
🔇 Additional comments (1)
tests/test_utils/test_moving.py (1)

37-39: Solid, comprehensive test scaffold

Good coverage across ops, engines, axes, validation, and numerical properties. Nice use of importorskip for optional bn.

@d-chambers d-chambers merged commit 23c13a0 into master Sep 29, 2025
24 checks passed
@d-chambers d-chambers deleted the hampel_speed branch September 29, 2025 13:49
@coderabbitai coderabbitai bot mentioned this pull request Oct 27, 2025
@coderabbitai coderabbitai bot mentioned this pull request Dec 22, 2025

Labels

patch (related to Patch class), proc (related to processing module)


2 participants