
Conversation

@d-chambers d-chambers commented Sep 20, 2025

Description

This PR fixes a bug in patch reverse operations (which are currently incorrect) and makes spool equality comparisons more robust to circular references (just in case).

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • Reverse arithmetic support for Patch-like objects (e.g., scalar/array − Patch, **, /, //, %).
    • Added a public deep-equality utility for robust comparisons of nested structures, arrays, and DataFrames.
  • Bug Fixes

    • More reliable spool equality handling for mixed and nested types.
    • Strain-rate conversions now validate step_multiple and raise clear errors for non-positive values.
    • Deprecation warnings now point to the correct call site for clearer diagnostics.
  • Tests

    • Expanded coverage for patch ops, spool equality, strain validations, and deep-equality utility.


coderabbitai bot commented Sep 20, 2025

Walkthrough

Updates warning stacklevel in CoordManager; implements explicit reverse arithmetic operators on Patch; refactors BaseSpool equality to use a new deep_equality_check utility; adds that utility; enforces positive step_multiple in strain-rate transforms; and expands tests for these behaviors.

Changes

Cohort / File(s) Summary of changes
CoordManager warning level
dascore/core/coordmanager.py
Increased deprecation warning stacklevel in CoordManager.__getitem__ from 2 to 6.
Patch reverse operators & tests
dascore/core/patch.py, tests/test_core/test_patch.py
Replaced reverse-operator aliasing with explicit __rsub__, __rpow__, __rtruediv__, __rfloordiv__, __rmod__ methods that call dascore.utils.array.apply_ufunc with appropriate NumPy ufuncs; added tests for reverse subtraction with scalar and array operands.
Spool equality refactor & tests
dascore/core/spool.py, tests/test_core/test_spool.py
Replaced in-method deep-dict/object comparison with a call to deep_equality_check; added tests exercising spool equality across mixed types, objects with __dict__, arrays, and DataFrames.
Deep equality utility & tests
dascore/utils/misc.py, tests/test_utils/test_misc.py
Added public deep_equality_check(obj1, obj2, visited=None) handling mappings, objects (__dict__), numpy arrays, pandas DataFrames, and circular references; added extensive unit tests for many edge cases.
Strain validation & tests
dascore/transform/strain.py, tests/test_transform/test_strain.py
Added validation requiring step_multiple > 0 in velocity_to_strain_rate and velocity_to_strain_rate_edgeless (raises ParameterError on non-positive); expanded tests for negative/zero/odd/even and valid even cases.

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Fix spool equality, patch reverse operation" is concise and directly reflects the two principal changes in the diff (spool equality robustness and correction of patch reverse operations), making it clear for a teammate scanning history. Minor nit: "operation" could be pluralized to "operations" since several reverse operator methods changed.
Description Check ✅ Passed The PR description follows the repository template with a Description and Checklist and succinctly explains the intent: fix patch reverse operations and harden spool equality against circular references. However, the checklist items are all unchecked, no related issue or documentation links are provided, and the added tests and docstring/changelog updates are not mentioned. The core intent is clear, but the description would benefit from these small additions.
Docstring Coverage ✅ Passed Docstring coverage is 80.95% which is sufficient. The required threshold is 80.00%.


@coderabbitai coderabbitai bot added bug Something isn't working patch related to Patch class proc Related to processing module labels Sep 20, 2025

codecov bot commented Sep 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.92%. Comparing base (25b4b75) to head (886e3e4).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #542   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         124      124           
  Lines       10336    10375   +39     
=======================================
+ Hits        10328    10367   +39     
  Misses          8        8           
Flag Coverage Δ
unittests 99.92% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
dascore/utils/misc.py (1)

842-842: Consider type consistency in base case.

Line 842 performs direct equality comparison (return obj1 == obj2) for non-dict objects. However, this might not handle numpy arrays consistently since their equality returns an array, not a boolean. Consider standardizing the return value handling similar to lines 866-872.

-        if not isinstance(obj1, dict) or not isinstance(obj2, dict):
-            # Non-dict comparison, use direct comparison
-            return obj1 == obj2
+        if not isinstance(obj1, dict) or not isinstance(obj2, dict):
+            # Non-dict comparison, handle arrays and other types
+            try:
+                equal = obj1 == obj2
+                # Handle numpy arrays and other array-like objects
+                if hasattr(equal, "all"):
+                    return equal.all()
+                return bool(equal)
+            except (ValueError, TypeError):
+                # For objects that can't be compared, fall back to False
+                return False
tests/test_utils/test_misc.py (1)

523-523: Consider using a more specific exception message.

The message "Cannot compare" is quite generic. Based on the static analysis hint, consider using a more specific error message that explains why comparison failed.

Apply this diff:

-            def __eq__(self, other):
-                raise ValueError("Cannot compare")
+            def __eq__(self, other):
+                raise ValueError("BadComparisonObj instances cannot be compared")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25b4b75 and e10bb6e.

📒 Files selected for processing (9)
  • dascore/core/coordmanager.py (1 hunks)
  • dascore/core/patch.py (1 hunks)
  • dascore/core/spool.py (2 hunks)
  • dascore/transform/strain.py (2 hunks)
  • dascore/utils/misc.py (1 hunks)
  • tests/test_core/test_patch.py (1 hunks)
  • tests/test_core/test_spool.py (1 hunks)
  • tests/test_transform/test_strain.py (2 hunks)
  • tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
dascore/core/spool.py (1)
dascore/utils/misc.py (1)
  • deep_equality_check (789-879)
dascore/core/patch.py (1)
dascore/utils/array.py (1)
  • apply_ufunc (448-495)
dascore/utils/misc.py (2)
dascore/core/coordmanager.py (1)
  • equals (746-756)
dascore/proc/basic.py (1)
  • equals (108-153)
dascore/transform/strain.py (1)
dascore/exceptions.py (1)
  • ParameterError (26-27)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (1)
  • deep_equality_check (789-879)
tests/test_core/test_patch.py (2)
dascore/examples.py (1)
  • get_example_patch (659-706)
dascore/core/patch.py (1)
  • data (237-239)
tests/test_core/test_spool.py (3)
dascore/examples.py (2)
  • random_spool (553-576)
  • get_example_patch (659-706)
tests/conftest.py (3)
  • random_spool (447-451)
  • patch (375-377)
  • spool (569-571)
dascore/core/spool.py (1)
  • spool (656-685)
tests/test_transform/test_strain.py (4)
tests/conftest.py (1)
  • patch (375-377)
dascore/exceptions.py (1)
  • ParameterError (26-27)
dascore/transform/strain.py (2)
  • velocity_to_strain_rate (21-121)
  • velocity_to_strain_rate_edgeless (128-210)
dascore/core/patch.py (1)
  • attrs (227-229)
🪛 Ruff (0.13.1)
tests/test_utils/test_misc.py

523-523: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
🔇 Additional comments (20)
dascore/transform/strain.py (2)

102-105: LGTM! Proper input validation for step_multiple.

The addition of validation to ensure step_multiple is positive is appropriate and prevents invalid states early. The error message is clear and consistent with other parameter validation in the codebase.


180-183: LGTM! Consistent validation across both functions.

Good consistency in adding the same validation to the edgeless variant. This ensures both functions have uniform input validation.

dascore/core/spool.py (2)

32-32: Good refactor: Centralized deep equality logic.

Importing deep_equality_check from dascore.utils.misc promotes code reuse and centralizes the equality checking logic.


86-90: Clean replacement with robust equality checker.

The replacement of inline dictionary comparison with deep_equality_check is cleaner and provides better handling of complex nested structures, circular references, and special types like DataFrames and numpy arrays.

tests/test_core/test_patch.py (2)

653-662: LGTM! Important test for reverse subtraction operator.

This test validates that the new __rsub__ implementation correctly handles scalar - patch, ensuring the result is scalar - patch.data rather than patch.data - scalar. This is crucial for mathematical correctness.


663-672: LGTM! Comprehensive test with array operand.

Good addition of array-based reverse subtraction test, ensuring the operator works correctly with different operand types.

dascore/utils/misc.py (1)

789-879: Well-designed deep equality utility with robust circular reference handling.

The implementation is comprehensive and handles various edge cases well:

  • Proper circular reference detection using object IDs
  • Support for multiple data types (dicts, DataFrames, numpy arrays, objects with __dict__)
  • Clean try-finally pattern to ensure visited set cleanup

A few observations:

  1. The docstring examples are helpful and demonstrate key use cases
  2. The short-circuit on object identity (if val1 is val2) is a good optimization
  3. The fallback behavior for comparison errors is reasonable
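The id-pair bookkeeping described above can be sketched as follows; `deep_eq` is an illustrative simplification, not the dascore.utils.misc implementation:

```python
def deep_eq(a, b, visited=None):
    """Cycle-safe deep equality via a visited set of id pairs (sketch)."""
    visited = set() if visited is None else visited
    pair = (id(a), id(b))
    if pair in visited:
        # This pair is already being compared higher in the recursion,
        # so treat it as equal to break the cycle.
        return True
    visited.add(pair)
    try:
        if isinstance(a, dict) and isinstance(b, dict):
            if a.keys() != b.keys():
                return False
            return all(deep_eq(a[k], b[k], visited) for k in a)
        return bool(a == b)
    finally:
        # Clean up so sibling branches can re-compare the same objects.
        visited.discard(pair)

# Circular structures no longer recurse forever:
d1, d2 = {}, {}
d1["self"], d2["self"] = d1, d2
```

With this pattern, `deep_eq(d1, d2)` terminates and returns True instead of raising RecursionError.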
dascore/core/coordmanager.py (1)

211-211: Verify stacklevel value change for coords deprecation

CoordManager.__getitem__ (dascore/core/coordmanager.py:211) warns with stacklevel=6, which attributes the warning four frames further up the call stack; confirm this is intentional and that it points to user code rather than skipping relevant DASCore-internal call sites.
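How `stacklevel` shifts a warning's reported location can be shown with a small self-contained example (generic Python behavior, not DASCore code):

```python
import warnings

def _inner():
    # stacklevel counts frames: 1 is this line, 2 is _inner's caller,
    # 3 is the caller of that caller, and so on. Each internal wrapper
    # between user code and warnings.warn needs the level bumped by one.
    warnings.warn("deprecated", DeprecationWarning, stacklevel=3)

def _wrapper():
    _inner()

def user_code():
    _wrapper()  # with stacklevel=3 the warning is attributed to this call

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()
```

A stacklevel that over- or undershoots the user's frame is harmless at runtime but produces a misleading file/line in the warning, which is what this comment asks to verify.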

tests/test_utils/test_misc.py (4)

12-12: Good addition of pandas import for DataFrame testing.

The pandas import is correctly added to support the comprehensive DataFrame comparison tests in the new TestDeepEqualityCheck class.


20-20: Correctly importing the new deep_equality_check function.

The import aligns with the new public API function added in dascore/utils/misc.py.


387-596: Comprehensive test coverage for deep_equality_check function.

This test class provides excellent coverage of all the edge cases and code paths in the deep_equality_check function:

  • Circular reference detection and handling ✓
  • Non-dict vs dict comparisons ✓
  • Nested dictionary comparisons ✓
  • pandas DataFrame comparisons using .equals() method ✓
  • Objects with __dict__ attributes ✓
  • NumPy array comparisons ✓
  • Exception handling for objects that raise ValueError on comparison ✓
  • Mixed complex data structures ✓

The tests are well-structured and cover both positive and negative cases. The use of custom classes like BadComparisonObj effectively tests the exception handling paths.


574-587: Good edge case coverage for mixed equals method scenarios.

This test effectively covers the case where one object has an equals method (DataFrame) and the other doesn't (custom object), ensuring the comparison falls through to the general comparison path correctly.

dascore/core/patch.py (2)

150-170: Excellent implementation of explicit reverse operators.

The implementation correctly replaces the problematic aliasing approach (__rsub__ = __sub__) with explicit methods that ensure proper operand ordering. Key strengths:

  1. Correct operand order: other - self, other ** self, etc.
  2. Consistent with existing pattern: Uses dascore.utils.array.apply_ufunc like forward operators
  3. Proper documentation: Each method includes clear docstrings
  4. Complete coverage: All necessary reverse operators are implemented

This fixes the reverse operation issue mentioned in the PR objectives where the current reverse behavior was incorrect.


148-148: Verify the aliasing for commutative operations.

The __radd__ = __add__ and __rmul__ = __mul__ aliases are correct since addition and multiplication are commutative operations. This is appropriate and efficient.

Also applies to: 154-154
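Why the aliasing is safe for commutative operators but was a bug for subtraction can be checked directly (hypothetical toy class, not Patch):

```python
import numpy as np

class Wrapped:
    """Toy wrapper showing reverse-operator aliasing semantics."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __add__(self, other):
        return self.data + other

    # Safe: a + b == b + a, so reusing __add__ for the reverse is correct.
    __radd__ = __add__

    def __sub__(self, other):
        return self.data - other

    # Buggy pattern (kept for demonstration): "other - self" silently
    # computes self - other instead.
    __rsub__ = __sub__

w = Wrapped([1, 2, 3])
ok = 10 + w     # [11, 12, 13] either way
wrong = 10 - w  # yields [-9, -8, -7] instead of [9, 8, 7]
```

This is exactly the incorrect reverse behavior the PR replaces with explicit `__rsub__`-style methods.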

tests/test_transform/test_strain.py (3)

102-129: Excellent comprehensive validation tests for step_multiple parameter.

These tests provide thorough coverage of the input validation logic added to the strain transformation functions:

  1. Negative values: Correctly tests that negative step_multiple raises ParameterError with "must be positive" message
  2. Zero values: Validates zero is also rejected with the same message
  3. Odd values: Ensures odd values still trigger the existing "must be even" validation
  4. Valid cases: Confirms positive even values work correctly and produce expected data_type

The tests align perfectly with the validation logic shown in the relevant code snippets from dascore/transform/strain.py.


201-213: Good parallel testing for edgeless variant.

The tests for velocity_to_strain_rate_edgeless correctly mirror the validation tests for the main function, ensuring both implementations have consistent input validation for the positive constraint on step_multiple.


189-200: Validate the test logic for function equivalence.

The test verifies that both strain-rate conversion functions produce equivalent results for even step multiples. This is a valuable integration test that ensures the new input validation doesn't break the existing behavior equivalence between the two approaches.

tests/test_core/test_spool.py (3)

658-792: Excellent targeted coverage for spool equality edge cases.

This new test class provides comprehensive coverage for the nuanced equality scenarios that the deep_equality_check function handles within spool comparisons:

  1. Non-dict comparisons (Lines 661-677): Tests string comparisons that should hit the non-dict comparison path
  2. Object with __dict__ differences (Lines 679-694): Tests objects with different internal state
  3. Object with __dict__ equality (Lines 696-711): Tests objects with same internal state via recursive comparison
  4. Mixed data types (Lines 713-739): Comprehensive testing with integers, lists, NumPy arrays
  5. DataFrame comparisons (Lines 740-762): Tests pandas DataFrame equality using the .equals() method
  6. Specific line coverage (Lines 764-791): Targeted tests to ensure specific code paths are exercised

The tests are well-structured and use appropriate fixtures and assertions. The approach of adding attributes to spool instances is a clever way to test the deep equality logic without interfering with the core spool functionality.


713-738: Verify NumPy array comparison behavior.

The test correctly validates that NumPy arrays with different values ([1, 2, 4] vs [1, 2, 3]) are detected as unequal. This ensures the deep_equality_check function properly handles the .all() method on NumPy array comparison results.


740-762: Good DataFrame equality testing using pandas equals() method.

The test correctly demonstrates that DataFrames with identical data are considered equal via the .equals() method, while DataFrames with different data are correctly identified as unequal. The pandas .equals() method treats NaNs in the same location as equal and compares both shape and elements, making it the appropriate comparison for DataFrames in this context.
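The DataFrame comparison pitfalls discussed in this review can be reproduced in a few lines (generic pandas/NumPy behavior, not DASCore code):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan]})
df2 = pd.DataFrame({"a": [1.0, np.nan]})

# Elementwise == returns a DataFrame, not a bool, and NaN != NaN there.
elementwise = df1 == df2        # a DataFrame of [True, False]
# Calling .all() on that DataFrame reduces per column, yielding a Series,
# which is the non-bool leak warned about below.
per_column = elementwise.all()  # a pandas Series, still not a bool

# .equals() compares shape and values and treats aligned NaNs as equal.
assert df1.equals(df2)

# For plain NumPy arrays, np.array_equal gives a single bool directly.
assert np.array_equal(np.array([1, 2, 3]), np.array([1, 2, 3]))
```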


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
dascore/utils/misc.py (3)

853-856: Generalize to Mapping for broader dict-like support

This function currently restricts deep traversal to dict. Using collections.abc.Mapping covers other dict-like types (e.g., defaultdict, OrderedDict) without changing semantics.

Apply this diff:

-        if not isinstance(obj1, dict) or not isinstance(obj2, dict):
+        if not isinstance(obj1, Mapping) or not isinstance(obj2, Mapping):
             # Non-dict comparison, handle arrays and other types
             return _robust_equality_check(obj1, obj2)
@@
-            elif isinstance(val1, dict) and isinstance(val2, dict):
+            elif isinstance(val1, Mapping) and isinstance(val2, Mapping):
                 if not deep_equality_check(val1, val2, visited):
                     return False

Also applies to: 864-866


831-836: Address Ruff TRY300 (minor): add an else and normalize bool conversion

Pure style, but also makes the intent explicit and avoids returning non-bool accidentally.

Apply this diff if you don’t take the larger refactor above:

-            if hasattr(equal, "all"):
-                return equal.all()
-            return equal
+            if hasattr(equal, "all"):
+                return bool(equal.all())
+            else:
+                return bool(equal)

847-851: Cycle detection note (optional): consider symmetric pair check

Current visited pairs are ordered (id(obj1), id(obj2)). In rare symmetric cycles, you could also short-circuit when the reversed pair was visited to reduce redundant work.

Example tweak:

-    if pair_id in visited:
+    if pair_id in visited or (obj2_id, obj1_id) in visited:
         return True
tests/test_utils/test_misc.py (3)

387-658: Add a top-level pandas equality test to prevent regressions

deep_equality_check(df1, df2) (not wrapped in dicts) currently returns a non-bool without the fix above. Add a direct test to lock in correct behavior.

Apply this addition:

def test_top_level_dataframe_equality():
    df1 = pd.DataFrame({"a": [1, 2, 3]})
    df2 = pd.DataFrame({"a": [1, 2, 3]})
    assert deep_equality_check(df1, df2) is True

def test_top_level_dataframe_inequality():
    df1 = pd.DataFrame({"a": [1, 2, 3]})
    df2 = pd.DataFrame({"a": [1, 2, 4]})
    assert deep_equality_check(df1, df2) is False

546-548: Remove redundant local imports of pandas

pd is imported at the module top (Line 12). Re-importing inside tests is unnecessary.

Apply this diff:

-        import pandas as pd
+        # pd already imported at module scope
@@
-        import pandas as pd
+        # pd already imported at module scope

Also applies to: 575-577


639-645: Avoid brittle references to source line numbers in test docstrings

Docstrings referencing exact source line numbers will rot with unrelated edits. Describe behavior instead.

Apply this diff:

-    def test_non_array_equal_comparison(self):
-        """Test line 834: return equal path for non-array objects."""
+    def test_non_array_equal_comparison(self):
+        """Non-array comparison returns correct boolean for simple types."""
@@
-    def test_circular_reference_return_true(self):
-        """Test line 850: circular reference detection returns True."""
+    def test_circular_reference_return_true(self):
+        """Circular reference detection returns True without infinite recursion."""

Also applies to: 646-658

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e10bb6e and 5bfbebb.

📒 Files selected for processing (2)
  • dascore/utils/misc.py (1 hunks)
  • tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (1)
  • deep_equality_check (789-881)
dascore/utils/misc.py (2)
dascore/core/coordmanager.py (1)
  • equals (746-756)
dascore/proc/basic.py (1)
  • equals (108-153)
🪛 Ruff (0.13.1)
tests/test_utils/test_misc.py

523-523: Avoid specifying long messages outside the exception class

(TRY003)


617-617: Avoid specifying long messages outside the exception class

(TRY003)


631-631: Avoid specifying long messages outside the exception class

(TRY003)

dascore/utils/misc.py

835-835: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
🔇 Additional comments (1)
tests/test_utils/test_misc.py (1)

375-385: Nice edge case for BytesIO pointer position

Good addition; verifies fallback reads entire buffer regardless of current file pointer.

Comment on lines 828 to 839
def _robust_equality_check(obj1, obj2):
    """Robust equality check to also handle arrays."""
    try:
        equal = obj1 == obj2
        # Handle numpy arrays and other array-like objects
        if hasattr(equal, "all"):
            return equal.all()
        return equal
    except (ValueError, TypeError):
        # For objects that can't be compared, fall back to False
        return False


⚠️ Potential issue

Always return a bool; handle pandas/NumPy explicitly in _robust_equality_check

When comparing top-level pandas objects (e.g., DataFrame vs DataFrame), obj1 == obj2 yields a DataFrame of booleans and .all() returns a Series, not a bool. That leaks a non-bool return from deep_equality_check. Also, prefer equals() for pandas and np.array_equal for NumPy.

Apply this diff to harden and normalize returns:

-    def _robust_equality_check(obj1, obj2):
-        """Robust equality check to also handle arrays."""
-        try:
-            equal = obj1 == obj2
-            # Handle numpy arrays and other array-like objects
-            if hasattr(equal, "all"):
-                return equal.all()
-            return equal
-        except (ValueError, TypeError):
-            # For objects that can't be compared, fall back to False
-            return False
+    def _robust_equality_check(obj1, obj2):
+        """Robust equality check to also handle pandas/NumPy and array-likes."""
+        # Prefer object-provided equality when available (e.g., pandas)
+        if hasattr(obj1, "equals") and hasattr(obj2, "equals"):
+            try:
+                return bool(obj1.equals(obj2))
+            except Exception:
+                return False
+        # NumPy arrays: exact structural equality
+        if isinstance(obj1, np.ndarray) and isinstance(obj2, np.ndarray):
+            try:
+                return bool(np.array_equal(obj1, obj2))
+            except Exception:
+                return False
+        # Fallback to Python/array-like equality
+        try:
+            equal = obj1 == obj2
+            # Convert array-like results to a bool
+            if hasattr(equal, "to_numpy"):  # pandas objects
+                try:
+                    return bool(np.asarray(equal.to_numpy()).all())
+                except Exception:
+                    return False
+            if hasattr(equal, "all"):  # NumPy arrays or similar
+                try:
+                    return bool(equal.all())
+                except Exception:
+                    return False
+            return bool(equal)
+        except (ValueError, TypeError):
+            # For objects that can't be compared, fall back to False
+            return False


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
dascore/utils/misc.py (1)

828-845: Strengthen _robust_equality_check for pandas/NumPy; always return bool (duplicate of earlier feedback)

Prefer equals() and np.array_equal() when available; keeps NaN semantics consistent and avoids DataFrame->Series truth pitfalls. Current code coerces to bool but still computes elementwise first.

     def _robust_equality_check(obj1, obj2):
         """Robust equality check to also handle arrays."""
-        try:
+        # Prefer object-provided/vectorized equality when available
+        if hasattr(obj1, "equals") and hasattr(obj2, "equals"):
+            try:
+                return bool(obj1.equals(obj2))
+            except Exception:
+                return False
+        if isinstance(obj1, np.ndarray) and isinstance(obj2, np.ndarray):
+            try:
+                return bool(np.array_equal(obj1, obj2))
+            except Exception:
+                return False
+        try:
             equal = obj1 == obj2
             # Handle numpy arrays and other array-like objects
             if hasattr(equal, "all"):
                 result = equal.all()
                 # Handle case where .all() returns a Series (e.g., pandas DataFrame)
                 # In such cases, call .all() again to get a boolean
                 if hasattr(result, "all"):
                     result = result.all()
                 # Ensure we return a Python bool, not numpy.bool_
                 return bool(result)
             return bool(equal)
         except (ValueError, TypeError):
             # For objects that can't be compared, fall back to False
             return False
🧹 Nitpick comments (5)
dascore/utils/misc.py (3)

886-887: Make cleanup resilient

Use discard to avoid a rare KeyError if visited were externally mutated while recursing.

-    finally:
-        visited.remove(pair_id)
+    finally:
+        visited.discard(pair_id)

789-826: Docstring claims top-level dict handling — align implementation or adjust docs

After applying the above refactor, the docstring will be accurate. If not, narrow the claim.


846-858: Optional: guard against unhashable/unsupported Mapping subtypes

If custom Mapping implementations have surprising behavior, consider collections.abc.Mapping check (already used) plus try/except around key iteration. Low risk; keep as-is if no known offenders.

tests/test_utils/test_misc.py (2)

387-684: Great coverage; add two targeted cases to lock desired semantics

  • Top-level objects with __dict__ should compare by attributes (not only when nested).
  • Top-level DataFrames with NaNs should follow DataFrame.equals (NaNs equal).
@@ class TestDeepEqualityCheck:
+    def test_top_level_object_with_dict(self):
+        class T:
+            def __init__(self, x): self.x = x
+        assert deep_equality_check(T(1), T(1)) is True
+        assert deep_equality_check(T(1), T(2)) is False
+
+    def test_top_level_dataframe_nan_semantics(self):
+        df1 = pd.DataFrame({"a": [1.0, np.nan]})
+        df2 = pd.DataFrame({"a": [1.0, np.nan]})
+        # pandas.equals treats NaNs as equal; deep_equality_check should too
+        assert deep_equality_check(df1, df2) is True

483-494: Top-level DataFrame tests: consider asserting NaN and index/dtype parity

As-is they only cover simple equality; extend to cover NaNs and differing index/dtype to mirror equals.
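A hedged illustration (hypothetical frames, not the PR's fixtures) of the parity this suggestion asks tests to cover: DataFrame.equals requires matching dtype and index, not just matching values.

```python
import pandas as pd

base = pd.DataFrame({"a": [1.0, 2.0]})
other_dtype = pd.DataFrame({"a": [1, 2]})                  # int64 vs float64
other_index = pd.DataFrame({"a": [1.0, 2.0]}, index=[1, 2])

print(base.equals(other_dtype))   # dtype mismatch: False
print(base.equals(other_index))   # index mismatch: False
print(base.equals(base.copy()))   # same values/index/dtype: True
```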

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5bfbebb and 886e3e4.

📒 Files selected for processing (2)
  • dascore/utils/misc.py (1 hunks)
  • tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
dascore/utils/misc.py (2)
dascore/core/coordmanager.py (1)
  • equals (746-756)
dascore/proc/basic.py (1)
  • equals (108-153)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (1)
  • deep_equality_check (789-887)
🪛 Ruff (0.13.1)
tests/test_utils/test_misc.py

535-535: Avoid specifying long messages outside the exception class

(TRY003)


627-627: Avoid specifying long messages outside the exception class

(TRY003)


641-641: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: test_code (windows-latest, 3.12)
  • GitHub Check: test_code_min_deps (windows-latest, 3.13)
  • GitHub Check: test_code (windows-latest, 3.10)
  • GitHub Check: test_code_min_deps (windows-latest, 3.12)
  • GitHub Check: test_code (macos-latest, 3.11)
  • GitHub Check: test_code (windows-latest, 3.11)
  • GitHub Check: test_code (macos-latest, 3.13)
  • GitHub Check: test_code (macos-latest, 3.10)
  • GitHub Check: test_code (windows-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.11)
  • GitHub Check: test_code (ubuntu-latest, 3.12)
  • GitHub Check: test_code_min_deps (macos-latest, 3.13)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
  • GitHub Check: test_code (ubuntu-latest, 3.10)
  • GitHub Check: test_code_min_deps (macos-latest, 3.12)
  • GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
🔇 Additional comments (4)
tests/test_utils/test_misc.py (3)

12-12: LGTM: pandas import added for new equality tests


608-619: Array-like .all() path covered — nice

These verify the _robust_equality_check array branch.


634-647: Exception paths are exercised — good negative coverage

Covers ValueError/TypeError fallbacks to False.

dascore/utils/misc.py (1)

859-862: Resolved — BaseSpool.__eq__ delegates to deep_equality_check. Confirmed: dascore/core/spool.py returns deep_equality_check(my_dict, other_dict) (line 90).
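The confirmed delegation pattern can be sketched with toy stand-ins (ToySpool and the simplified helper below are hypothetical, not the actual dascore source): __eq__ hands the instances' attribute dicts to a deep equality helper instead of doing ad-hoc comparisons.

```python
from collections.abc import Mapping


def deep_equality_check(d1, d2):
    """Toy stand-in for the real helper: recursive mapping comparison."""
    if not (isinstance(d1, Mapping) and isinstance(d2, Mapping)):
        return d1 == d2
    if set(d1) != set(d2):
        return False
    return all(deep_equality_check(d1[k], d2[k]) for k in d1)


class ToySpool:
    def __init__(self, **state):
        self.__dict__.update(state)

    def __eq__(self, other):
        if not isinstance(other, ToySpool):
            return NotImplemented
        # Delegate: compare attribute dicts deeply rather than field-by-field.
        return deep_equality_check(self.__dict__, other.__dict__)


print(ToySpool(a=1, b={"c": 2}) == ToySpool(a=1, b={"c": 2}))  # True
```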

Comment on lines +859 to +862
if not isinstance(obj1, Mapping) or not isinstance(obj2, Mapping):
# Non-dict comparison, handle arrays and other types
return _robust_equality_check(obj1, obj2)


⚠️ Potential issue

Handle top-level pandas/NumPy/objects with __dict__ before fallback to avoid inconsistent semantics

Currently, top-level DataFrames/ndarrays/objects with __dict__ go through _robust_equality_check, while nested ones use equals/recursive __dict__, leading to different NaN and attribute semantics. Gate these types first, then branch to Mapping vs. fallback.

 def deep_equality_check(obj1, obj2, visited=None):
@@
-    try:
-        if not isinstance(obj1, Mapping) or not isinstance(obj2, Mapping):
-            # Non-dict comparison, handle arrays and other types
-            return _robust_equality_check(obj1, obj2)
+    try:
+        # Fast paths: identity and type-specialized equality
+        if obj1 is obj2:
+            return True
+        # Prefer object-provided equality (e.g., pandas)
+        if hasattr(obj1, "equals") and hasattr(obj2, "equals"):
+            try:
+                return bool(obj1.equals(obj2))
+            except Exception:
+                pass
+        # NumPy arrays
+        if isinstance(obj1, np.ndarray) and isinstance(obj2, np.ndarray):
+            try:
+                return bool(np.array_equal(obj1, obj2))
+            except Exception:
+                return False
+        # Objects with __dict__: compare attribute dicts deeply
+        if hasattr(obj1, "__dict__") and hasattr(obj2, "__dict__"):
+            return deep_equality_check(obj1.__dict__, obj2.__dict__, visited)
+
+        # Mapping path
+        if isinstance(obj1, Mapping) and isinstance(obj2, Mapping):
             # ...
-        if (set1 := set(obj1)) != set(obj2):
+        if (set1 := set(obj1)) != set(obj2):
             return False
         for key in set1:
             val1, val2 = obj1[key], obj2[key]
             # Check for object identity first to handle self-references
             if val1 is val2:
                 continue
             elif isinstance(val1, Mapping) and isinstance(val2, Mapping):
                 if not deep_equality_check(val1, val2, visited):
                     return False
             # this is primarily for dataframes which have equals method.
             elif hasattr(val1, "equals") and hasattr(val2, "equals"):
                 if not val1.equals(val2):
                     return False
             # Handle object comparison carefully to avoid infinite recursion
             elif hasattr(val1, "__dict__") and hasattr(val2, "__dict__"):
                 # For objects with __dict__, use recursive comparison
                 if not deep_equality_check(val1.__dict__, val2.__dict__, visited):
                     return False
             else:
                 if not _robust_equality_check(val1, val2):
                     return False
-        return True
+        # Non-mapping fallback
+        return True if isinstance(obj1, Mapping) and isinstance(obj2, Mapping) else _robust_equality_check(obj1, obj2)
🤖 Prompt for AI Agents
In dascore/utils/misc.py around lines 859 to 862, top-level pandas objects
(DataFrame/Series), NumPy ndarrays, and objects with a __dict__ are currently
falling through to _robust_equality_check which differs from how nested
instances are compared; detect and handle these types before the Mapping check:
first check for pandas DataFrame/Series and use their .equals (or appropriate
pandas comparison) to compare, then check for numpy.ndarray and compare
elementwise with NaN-aware semantics (e.g., numpy functions that treat NaNs as
equal), then if the object has a __dict__ compare their __dict__ recursively
using the same equality logic; only after these guards run, fall back to the
Mapping vs fallback branch as before.
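The NaN-aware ndarray comparison the prompt alludes to is available directly in NumPy: np.array_equal accepts equal_nan=True (NumPy >= 1.19) so NaNs in matching positions compare equal, unlike the default semantics. A short sketch:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0])

default = np.array_equal(a, b)                    # False: NaN != NaN
nan_aware = np.array_equal(a, b, equal_nan=True)  # True

print(default, nan_aware)  # False True
```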

@d-chambers d-chambers merged commit 272d0e7 into master Sep 20, 2025
22 checks passed
@d-chambers d-chambers deleted the bug_review branch September 20, 2025 12:28
@coderabbitai coderabbitai bot mentioned this pull request Oct 4, 2025
4 tasks

Labels

bug: Something isn't working
patch: related to Patch class
proc: Related to processing module
