Fix spool equality, patch reverse operation #542
Conversation
Walkthrough

Updates the warning stacklevel in CoordManager; implements explicit reverse arithmetic operators on Patch; refactors BaseSpool equality to use a new deep_equality_check utility; adds that utility; enforces a positive step_multiple in the strain-rate transforms; and expands tests for these behaviors.
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #542 +/- ##
=======================================
Coverage 99.92% 99.92%
=======================================
Files 124 124
Lines 10336 10375 +39
=======================================
+ Hits 10328 10367 +39
Misses 8 8
Actionable comments posted: 0
🧹 Nitpick comments (2)
dascore/utils/misc.py (1)
842-842: Consider type consistency in base case.

Line 842 performs a direct equality comparison (return obj1 == obj2) for non-dict objects. However, this might not handle numpy arrays consistently, since their equality returns an array, not a boolean. Consider standardizing the return value handling similar to lines 866-872.

-    if not isinstance(obj1, dict) or not isinstance(obj2, dict):
-        # Non-dict comparison, use direct comparison
-        return obj1 == obj2
+    if not isinstance(obj1, dict) or not isinstance(obj2, dict):
+        # Non-dict comparison, handle arrays and other types
+        try:
+            equal = obj1 == obj2
+            # Handle numpy arrays and other array-like objects
+            if hasattr(equal, "all"):
+                return equal.all()
+            return bool(equal)
+        except (ValueError, TypeError):
+            # For objects that can't be compared, fall back to False
+            return False

tests/test_utils/test_misc.py (1)
523-523: Consider using a more specific exception message.

The message "Cannot compare" is quite generic. Based on the static analysis hint, consider using a more specific error message that explains why the comparison failed.

Apply this diff:

-    def __eq__(self, other):
-        raise ValueError("Cannot compare")
+    def __eq__(self, other):
+        raise ValueError("BadComparisonObj instances cannot be compared")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- dascore/core/coordmanager.py (1 hunks)
- dascore/core/patch.py (1 hunks)
- dascore/core/spool.py (2 hunks)
- dascore/transform/strain.py (2 hunks)
- dascore/utils/misc.py (1 hunks)
- tests/test_core/test_patch.py (1 hunks)
- tests/test_core/test_spool.py (1 hunks)
- tests/test_transform/test_strain.py (2 hunks)
- tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
dascore/core/spool.py (1)
dascore/utils/misc.py (1)
deep_equality_check(789-879)
dascore/core/patch.py (1)
dascore/utils/array.py (1)
apply_ufunc(448-495)
dascore/utils/misc.py (2)
dascore/core/coordmanager.py (1)
equals(746-756)
dascore/proc/basic.py (1)
equals(108-153)
dascore/transform/strain.py (1)
dascore/exceptions.py (1)
ParameterError(26-27)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (1)
deep_equality_check(789-879)
tests/test_core/test_patch.py (2)
dascore/examples.py (1)
get_example_patch(659-706)
dascore/core/patch.py (1)
data(237-239)
tests/test_core/test_spool.py (3)
dascore/examples.py (2)
random_spool(553-576)
get_example_patch(659-706)
tests/conftest.py (3)
random_spool(447-451)
patch(375-377)
spool(569-571)
dascore/core/spool.py (1)
spool(656-685)
tests/test_transform/test_strain.py (4)
tests/conftest.py (1)
patch(375-377)
dascore/exceptions.py (1)
ParameterError(26-27)
dascore/transform/strain.py (2)
velocity_to_strain_rate(21-121)
velocity_to_strain_rate_edgeless(128-210)
dascore/core/patch.py (1)
attrs(227-229)
🪛 Ruff (0.13.1)
tests/test_utils/test_misc.py
523-523: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: test_code (windows-latest, 3.10)
- GitHub Check: test_code (windows-latest, 3.13)
- GitHub Check: test_code (windows-latest, 3.12)
- GitHub Check: test_code (windows-latest, 3.11)
- GitHub Check: test_code (macos-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.12)
- GitHub Check: test_code (ubuntu-latest, 3.13)
- GitHub Check: test_code (macos-latest, 3.12)
- GitHub Check: test_code (macos-latest, 3.10)
- GitHub Check: test_code (macos-latest, 3.13)
- GitHub Check: test_code (ubuntu-latest, 3.11)
- GitHub Check: test_code (ubuntu-latest, 3.10)
- GitHub Check: test_code_min_deps (windows-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.12)
- GitHub Check: test_code_min_deps (windows-latest, 3.13)
- GitHub Check: test_code_min_deps (macos-latest, 3.12)
- GitHub Check: test_code_min_deps (ubuntu-latest, 3.13)
🔇 Additional comments (20)
dascore/transform/strain.py (2)
102-105: LGTM! Proper input validation for step_multiple.

The addition of validation to ensure step_multiple is positive is appropriate and prevents invalid states early. The error message is clear and consistent with other parameter validation in the codebase.

180-183: LGTM! Consistent validation across both functions.

Good consistency in adding the same validation to the edgeless variant. This ensures both functions have uniform input validation.
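For context, a minimal sketch of the guard these comments describe is below. The helper name _check_step_multiple is illustrative only (the real checks live inline in velocity_to_strain_rate and velocity_to_strain_rate_edgeless, and the exact messages may differ); ParameterError is the actual dascore exception.

    from dascore.exceptions import ParameterError

    def _check_step_multiple(step_multiple: int) -> None:
        """Hypothetical helper: reject non-positive or odd step_multiple values early."""
        if step_multiple <= 0:
            raise ParameterError("step_multiple must be positive.")
        if step_multiple % 2 != 0:
            raise ParameterError("step_multiple must be even.")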
dascore/core/spool.py (2)
32-32: Good refactor: centralized deep equality logic.

Importing deep_equality_check from dascore.utils.misc promotes code reuse and centralizes the equality checking logic.

86-90: Clean replacement with robust equality checker.

The replacement of the inline dictionary comparison with deep_equality_check is cleaner and provides better handling of complex nested structures, circular references, and special types like DataFrames and numpy arrays.

tests/test_core/test_patch.py (2)
653-662: LGTM! Important test for the reverse subtraction operator.

This test validates that the new __rsub__ implementation correctly handles scalar - patch, ensuring the result is scalar - patch.data rather than patch.data - scalar. This is crucial for mathematical correctness.

663-672: LGTM! Comprehensive test with an array operand.

Good addition of an array-based reverse subtraction test, ensuring the operator works correctly with different operand types.
dascore/utils/misc.py (1)
789-879: Well-designed deep equality utility with robust circular reference handling.

The implementation is comprehensive and handles various edge cases well:
- Proper circular reference detection using object IDs
- Support for multiple data types (dicts, DataFrames, numpy arrays, objects with __dict__)
- Clean try-finally pattern to ensure visited set cleanup

A few observations:
- The docstring examples are helpful and demonstrate key use cases
- The short-circuit on object identity (if val1 is val2) is a good optimization
- The fallback behavior for comparison errors is reasonable
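For readers who have not opened dascore/utils/misc.py, a stripped-down sketch of the visited-pair pattern praised above follows. It is only an outline of the cycle handling; the real deep_equality_check also handles DataFrames, NumPy arrays, and objects with __dict__.

    # Simplified outline of the circular-reference handling; not the actual code.
    def _deep_eq_sketch(obj1, obj2, visited=None):
        visited = set() if visited is None else visited
        pair_id = (id(obj1), id(obj2))
        if pair_id in visited:  # this pair is already being compared -> cycle
            return True
        visited.add(pair_id)
        try:
            if obj1 is obj2:
                return True
            if not isinstance(obj1, dict) or not isinstance(obj2, dict):
                return obj1 == obj2
            if set(obj1) != set(obj2):
                return False
            return all(_deep_eq_sketch(obj1[k], obj2[k], visited) for k in obj1)
        finally:
            visited.remove(pair_id)  # clean up so sibling branches re-compare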
dascore/core/coordmanager.py (1)
211-211: Verify the stacklevel value change for the coords deprecation.

CoordManager.__getitem__ (dascore/core/coordmanager.py:211) warns with stacklevel=6, which will attribute the warning 4 frames up; confirm this is intentional and that it points to user code (not skipping relevant DASCore internal call sites).
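As background on what stacklevel controls (plain Python behavior, not DASCore code): each increment attributes the warning one frame further up the call stack, so the chosen value has to match the number of internal frames between warnings.warn and the user's call site.

    import warnings

    def _internal():
        # stacklevel=1 would point at this line; stacklevel=2 points at the
        # caller of _internal(); larger values walk further up toward user code.
        warnings.warn("this option is deprecated", DeprecationWarning, stacklevel=2)

    def public_api():
        _internal()

    public_api()  # the warning is reported at the _internal() call inside public_api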
tests/test_utils/test_misc.py (4)
12-12: Good addition of the pandas import for DataFrame testing.

The pandas import is correctly added to support the comprehensive DataFrame comparison tests in the new TestDeepEqualityCheck class.

20-20: Correctly importing the new deep_equality_check function.

The import aligns with the new public API function added in dascore/utils/misc.py.
387-596: Comprehensive test coverage for the deep_equality_check function.

This test class provides excellent coverage of all the edge cases and code paths in the deep_equality_check function:
- Circular reference detection and handling ✓
- Non-dict vs dict comparisons ✓
- Nested dictionary comparisons ✓
- pandas DataFrame comparisons using the .equals() method ✓
- Objects with __dict__ attributes ✓
- NumPy array comparisons ✓
- Exception handling for objects that raise ValueError on comparison ✓
- Mixed complex data structures ✓

The tests are well-structured and cover both positive and negative cases. The use of custom classes like BadComparisonObj effectively tests the exception handling paths, as sketched below.
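A rough illustration of the exception-handling case mentioned above (not a verbatim copy of the test file; it assumes the fallback-to-False behavior described in the review for comparisons that raise):

    from dascore.utils.misc import deep_equality_check

    class BadComparisonObj:
        """Object whose __eq__ always raises, to exercise the fallback path."""

        def __eq__(self, other):
            raise ValueError("Cannot compare")

    # The comparison raises inside the equality helper, which is expected to
    # swallow the error and treat the values as unequal.
    assert deep_equality_check(BadComparisonObj(), BadComparisonObj()) is False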
574-587: Good edge case coverage for mixed equals-method scenarios.

This test effectively covers the case where one object has an equals method (DataFrame) and the other doesn't (custom object), ensuring the comparison falls through to the general comparison path correctly.

dascore/core/patch.py (2)
150-170: Excellent implementation of explicit reverse operators.

The implementation correctly replaces the problematic aliasing approach (__rsub__ = __sub__) with explicit methods that ensure proper operand ordering. Key strengths:
- Correct operand order: other - self, other ** self, etc.
- Consistent with the existing pattern: uses dascore.utils.array.apply_ufunc like the forward operators
- Proper documentation: each method includes clear docstrings
- Complete coverage: all necessary reverse operators are implemented

This fixes the reverse operation issue mentioned in the PR objectives, where the current reverse behavior was incorrect.

148-148: Verify the aliasing for commutative operations.

The __radd__ = __add__ and __rmul__ = __mul__ aliases are correct since addition and multiplication are commutative operations. This is appropriate and efficient.

Also applies to: 154-154
tests/test_transform/test_strain.py (3)
102-129: Excellent, comprehensive validation tests for the step_multiple parameter.

These tests provide thorough coverage of the input validation logic added to the strain transformation functions:
- Negative values: correctly tests that a negative step_multiple raises ParameterError with the "must be positive" message
- Zero values: validates that zero is also rejected with the same message
- Odd values: ensures odd values still trigger the existing "must be even" validation
- Valid cases: confirms positive even values work correctly and produce the expected data_type

The tests align with the validation logic shown in the relevant code snippets from dascore/transform/strain.py; a minimal test pattern is sketched below.
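The sketch below shows the general shape of such a test; it assumes a patch carrying velocity data and calls the transform functionally, so the real tests (which use project fixtures) may differ in setup:

    import pytest
    import dascore as dc
    from dascore.exceptions import ParameterError
    from dascore.transform.strain import velocity_to_strain_rate

    def test_negative_step_multiple_raises():
        patch = dc.get_example_patch()  # placeholder patch; real tests use fixtures
        with pytest.raises(ParameterError, match="must be positive"):
            velocity_to_strain_rate(patch, step_multiple=-2)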
201-213: Good parallel testing for the edgeless variant.

The tests for velocity_to_strain_rate_edgeless correctly mirror the validation tests for the main function, ensuring both implementations have consistent input validation for the positive constraint on step_multiple.

189-200: Validate the test logic for function equivalence.

The test verifies that both strain-rate conversion functions produce equivalent results for even step multiples. This is a valuable integration test that ensures the new input validation doesn't break the existing behavior equivalence between the two approaches.
tests/test_core/test_spool.py (3)
658-792: Excellent targeted coverage for spool equality edge cases.

This new test class provides comprehensive coverage for the nuanced equality scenarios that the deep_equality_check function handles within spool comparisons:
- Non-dict comparisons (lines 661-677): tests string comparisons that should hit the non-dict comparison path
- Object with __dict__ differences (lines 679-694): tests objects with different internal state
- Object with __dict__ equality (lines 696-711): tests objects with the same internal state via recursive comparison
- Mixed data types (lines 713-739): comprehensive testing with integers, lists, and NumPy arrays
- DataFrame comparisons (lines 740-762): tests pandas DataFrame equality using the .equals() method
- Specific line coverage (lines 764-791): targeted tests to ensure specific code paths are exercised

The tests are well-structured and use appropriate fixtures and assertions. The approach of adding attributes to spool instances is a clever way to test the deep equality logic without interfering with the core spool functionality.
713-738: Verify NumPy array comparison behavior.

The test correctly validates that NumPy arrays with different values ([1, 2, 4] vs [1, 2, 3]) are detected as unequal. This ensures the deep_equality_check function properly handles the .all() method on NumPy array comparison results.
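The reason the .all() handling matters is plain NumPy behavior: == is elementwise, so the raw comparison never yields a single boolean.

    import numpy as np

    a = np.array([1, 2, 4])
    b = np.array([1, 2, 3])

    elementwise = a == b            # array([ True,  True, False])
    print(bool(elementwise.all()))  # False: the scalar answer the equality check needs
    # Using the raw array in `if a == b:` would raise
    # "The truth value of an array with more than one element is ambiguous".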
740-762: Good DataFrame equality testing using the pandas equals() method.

The test correctly demonstrates that DataFrames with identical data are considered equal via the .equals() method, while DataFrames with different data are correctly identified as unequal. The pandas .equals() method treats NaNs in the same location as equal and compares both shape and elements, which is the appropriate comparison method for DataFrames in this context.
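To make the NaN point concrete (plain pandas behavior, independent of DASCore):

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({"a": [1.0, np.nan]})
    df2 = pd.DataFrame({"a": [1.0, np.nan]})

    print(df1.equals(df2))           # True: equals() treats aligned NaNs as equal
    print((df1 == df2).all().all())  # False: elementwise ==, because NaN != NaN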
Actionable comments posted: 1
🧹 Nitpick comments (6)
dascore/utils/misc.py (3)
853-856: Generalize to Mapping for broader dict-like support.

This function currently restricts deep traversal to dict. Using collections.abc.Mapping covers other dict-like types (e.g., defaultdict, OrderedDict) without changing semantics.

Apply this diff:

-    if not isinstance(obj1, dict) or not isinstance(obj2, dict):
+    if not isinstance(obj1, Mapping) or not isinstance(obj2, Mapping):
         # Non-dict comparison, handle arrays and other types
         return _robust_equality_check(obj1, obj2)
@@
-    elif isinstance(val1, dict) and isinstance(val2, dict):
+    elif isinstance(val1, Mapping) and isinstance(val2, Mapping):
         if not deep_equality_check(val1, val2, visited):
             return False

Also applies to: 864-866
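A quick standard-library demonstration of what the Mapping check buys: dict-like objects that are not dict subclasses still get deep traversal.

    from collections.abc import Mapping
    from types import MappingProxyType

    proxy = MappingProxyType({"a": 1})  # read-only, dict-like view

    print(isinstance(proxy, dict))      # False: a strict dict check would skip it
    print(isinstance(proxy, Mapping))   # True: the Mapping check still covers it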
831-836: Address Ruff TRY300 (minor): add an else and normalize the bool conversion.

Pure style, but it also makes the intent explicit and avoids returning a non-bool accidentally.

Apply this diff if you don't take the larger refactor above:

-    if hasattr(equal, "all"):
-        return equal.all()
-    return equal
+    if hasattr(equal, "all"):
+        return bool(equal.all())
+    else:
+        return bool(equal)
847-851: Cycle detection note (optional): consider a symmetric pair check.

Current visited pairs are ordered (id(obj1), id(obj2)). In rare symmetric cycles, you could also short-circuit when the reversed pair was visited to reduce redundant work.

Example tweak:

-    if pair_id in visited:
+    if pair_id in visited or (obj2_id, obj1_id) in visited:
         return True

tests/test_utils/test_misc.py (3)
387-658: Add a top-level pandas equality test to prevent regressions.

deep_equality_check(df1, df2) (not wrapped in dicts) currently returns a non-bool without the fix above. Add a direct test to lock in the correct behavior.

Apply this addition:

    def test_top_level_dataframe_equality():
        df1 = pd.DataFrame({"a": [1, 2, 3]})
        df2 = pd.DataFrame({"a": [1, 2, 3]})
        assert deep_equality_check(df1, df2) is True

    def test_top_level_dataframe_inequality():
        df1 = pd.DataFrame({"a": [1, 2, 3]})
        df2 = pd.DataFrame({"a": [1, 2, 4]})
        assert deep_equality_check(df1, df2) is False
546-548: Remove redundant local imports of pandas.

pd is imported at the module top (line 12). Re-importing inside tests is unnecessary.

Apply this diff:

-    import pandas as pd
+    # pd already imported at module scope
@@
-    import pandas as pd
+    # pd already imported at module scope

Also applies to: 575-577
639-645: Avoid brittle references to source line numbers in test docstrings.

Docstrings referencing exact source line numbers will rot with unrelated edits. Describe behavior instead.

Apply this diff:

-    def test_non_array_equal_comparison(self):
-        """Test line 834: return equal path for non-array objects."""
+    def test_non_array_equal_comparison(self):
+        """Non-array comparison returns correct boolean for simple types."""
@@
-    def test_circular_reference_return_true(self):
-        """Test line 850: circular reference detection returns True."""
+    def test_circular_reference_return_true(self):
+        """Circular reference detection returns True without infinite recursion."""

Also applies to: 646-658
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- dascore/utils/misc.py (1 hunks)
- tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (1)
deep_equality_check(789-881)
dascore/utils/misc.py (2)
dascore/core/coordmanager.py (1)
equals(746-756)
dascore/proc/basic.py (1)
equals(108-153)
🪛 Ruff (0.13.1)
tests/test_utils/test_misc.py
523-523: Avoid specifying long messages outside the exception class
(TRY003)
617-617: Avoid specifying long messages outside the exception class
(TRY003)
631-631: Avoid specifying long messages outside the exception class
(TRY003)
dascore/utils/misc.py
835-835: Consider moving this statement to an else block
(TRY300)
🔇 Additional comments (1)
tests/test_utils/test_misc.py (1)
375-385: Nice edge case for BytesIO pointer position.

Good addition; verifies the fallback reads the entire buffer regardless of the current file pointer.
    def _robust_equality_check(obj1, obj2):
        """Robust equality check to also handle arrays."""
        try:
            equal = obj1 == obj2
            # Handle numpy arrays and other array-like objects
            if hasattr(equal, "all"):
                return equal.all()
            return equal
        except (ValueError, TypeError):
            # For objects that can't be compared, fall back to False
            return False
Always return a bool; handle pandas/NumPy explicitly in _robust_equality_check
When comparing top-level pandas objects (e.g., DataFrame vs DataFrame), obj1 == obj2 yields a DataFrame of booleans and .all() returns a Series, not a bool. That leaks a non-bool return from deep_equality_check. Also, prefer equals() for pandas and np.array_equal for NumPy.
Apply this diff to harden and normalize returns:
-def _robust_equality_check(obj1, obj2):
-    """Robust equality check to also handle arrays."""
-    try:
-        equal = obj1 == obj2
-        # Handle numpy arrays and other array-like objects
-        if hasattr(equal, "all"):
-            return equal.all()
-        return equal
-    except (ValueError, TypeError):
-        # For objects that can't be compared, fall back to False
-        return False
+def _robust_equality_check(obj1, obj2):
+    """Robust equality check to also handle pandas/NumPy and array-likes."""
+    # Prefer object-provided equality when available (e.g., pandas)
+    if hasattr(obj1, "equals") and hasattr(obj2, "equals"):
+        try:
+            return bool(obj1.equals(obj2))
+        except Exception:
+            return False
+    # NumPy arrays: exact structural equality
+    if isinstance(obj1, np.ndarray) and isinstance(obj2, np.ndarray):
+        try:
+            return bool(np.array_equal(obj1, obj2))
+        except Exception:
+            return False
+    # Fallback to Python/array-like equality
+    try:
+        equal = obj1 == obj2
+        # Convert array-like results to a bool
+        if hasattr(equal, "to_numpy"):  # pandas objects
+            try:
+                return bool(np.asarray(equal.to_numpy()).all())
+            except Exception:
+                return False
+        if hasattr(equal, "all"):  # NumPy arrays or similar
+            try:
+                return bool(equal.all())
+            except Exception:
+                return False
+        return bool(equal)
+    except (ValueError, TypeError):
+        # For objects that can't be compared, fall back to False
+        return False
🧰 Tools
🪛 Ruff (0.13.1)
835-835: Consider moving this statement to an else block
(TRY300)
🤖 Prompt for AI Agents
In dascore/utils/misc.py around lines 828 to 839, _robust_equality_check can
return non-bool types (e.g., pandas DataFrame/Series) when using obj1 == obj2;
update it to always return a bool by detecting pandas objects and
numpy/array-likes explicitly: if either obj is a pandas DataFrame/Series use
pandas.equals, if either is a numpy ndarray or has dtype/shape use
numpy.array_equal, otherwise fall back to the existing try/except but coerce any
array-like truth to a single bool (e.g., using .all()) and ensure exceptions
return False.
Actionable comments posted: 1
♻️ Duplicate comments (1)
dascore/utils/misc.py (1)
828-845: Strengthen _robust_equality_check for pandas/NumPy; always return a bool (duplicate of earlier feedback).

Prefer equals() and np.array_equal() when available; this keeps NaN semantics consistent and avoids DataFrame-to-Series truth pitfalls. The current code coerces to bool but still computes elementwise first.

 def _robust_equality_check(obj1, obj2):
     """Robust equality check to also handle arrays."""
-    try:
+    # Prefer object-provided/vectorized equality when available
+    if hasattr(obj1, "equals") and hasattr(obj2, "equals"):
+        try:
+            return bool(obj1.equals(obj2))
+        except Exception:
+            return False
+    if isinstance(obj1, np.ndarray) and isinstance(obj2, np.ndarray):
+        try:
+            return bool(np.array_equal(obj1, obj2))
+        except Exception:
+            return False
+    try:
         equal = obj1 == obj2
         # Handle numpy arrays and other array-like objects
         if hasattr(equal, "all"):
             result = equal.all()
             # Handle case where .all() returns a Series (e.g., pandas DataFrame)
             # In such cases, call .all() again to get a boolean
             if hasattr(result, "all"):
                 result = result.all()
             # Ensure we return a Python bool, not numpy.bool_
             return bool(result)
         return bool(equal)
     except (ValueError, TypeError):
         # For objects that can't be compared, fall back to False
         return False
🧹 Nitpick comments (5)
dascore/utils/misc.py (3)
886-887: Make cleanup resilient.

Use discard to avoid a rare KeyError if visited were externally mutated while recursing.

-    finally:
-        visited.remove(pair_id)
+    finally:
+        visited.discard(pair_id)

789-826: Docstring claims top-level dict handling; align the implementation or adjust the docs.

After applying the above refactor, the docstring will be accurate. If not, narrow the claim.

846-858: Optional: guard against unhashable/unsupported Mapping subtypes.

If custom Mapping implementations have surprising behavior, consider the collections.abc.Mapping check (already used) plus try/except around key iteration. Low risk; keep as-is if no known offenders.

tests/test_utils/test_misc.py (2)
387-684: Great coverage; add two targeted cases to lock in the desired semantics.

- Top-level objects with __dict__ should compare by attributes (not only when nested).
- Top-level DataFrames with NaNs should follow DataFrame.equals (NaNs equal).

    @@ class TestDeepEqualityCheck:
    +    def test_top_level_object_with_dict(self):
    +        class T:
    +            def __init__(self, x):
    +                self.x = x
    +        assert deep_equality_check(T(1), T(1)) is True
    +        assert deep_equality_check(T(1), T(2)) is False
    +
    +    def test_top_level_dataframe_nan_semantics(self):
    +        df1 = pd.DataFrame({"a": [1.0, np.nan]})
    +        df2 = pd.DataFrame({"a": [1.0, np.nan]})
    +        # pandas.equals treats NaNs as equal; deep_equality_check should too
    +        assert deep_equality_check(df1, df2) is True
483-494: Top-level DataFrame tests: consider asserting NaN and index/dtype parity.

As-is they only cover simple equality; extend to cover NaNs and differing index/dtype to mirror equals.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- dascore/utils/misc.py (1 hunks)
- tests/test_utils/test_misc.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
dascore/utils/misc.py (2)
dascore/core/coordmanager.py (1)
equals(746-756)
dascore/proc/basic.py (1)
equals(108-153)
tests/test_utils/test_misc.py (1)
dascore/utils/misc.py (1)
deep_equality_check(789-887)
🪛 Ruff (0.13.1)
tests/test_utils/test_misc.py
535-535: Avoid specifying long messages outside the exception class
(TRY003)
627-627: Avoid specifying long messages outside the exception class
(TRY003)
641-641: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (4)
tests/test_utils/test_misc.py (3)
12-12: LGTM: pandas import added for new equality tests
608-619: Array-like .all() path covered; nice.

These verify the _robust_equality_check array branch.

634-647: Exception paths are exercised; good negative coverage.

Covers ValueError/TypeError fallbacks to False.
dascore/utils/misc.py (1)
859-862: Resolved: BaseSpool.__eq__ delegates to deep_equality_check. Confirmed: dascore/core/spool.py returns deep_equality_check(my_dict, other_dict) (line 90).
    if not isinstance(obj1, Mapping) or not isinstance(obj2, Mapping):
        # Non-dict comparison, handle arrays and other types
        return _robust_equality_check(obj1, obj2)
Handle top-level pandas/NumPy/objects-with-__dict__ before the fallback to avoid inconsistent semantics.

Currently, top-level DataFrames/ndarrays/objects with __dict__ go through _robust_equality_check, while nested ones use equals/recursive __dict__, leading to different NaN and attribute semantics. Gate these types first, then branch to Mapping vs. fallback.
 def deep_equality_check(obj1, obj2, visited=None):
@@
-    try:
-        if not isinstance(obj1, Mapping) or not isinstance(obj2, Mapping):
-            # Non-dict comparison, handle arrays and other types
-            return _robust_equality_check(obj1, obj2)
+    try:
+        # Fast paths: identity and type-specialized equality
+        if obj1 is obj2:
+            return True
+        # Prefer object-provided equality (e.g., pandas)
+        if hasattr(obj1, "equals") and hasattr(obj2, "equals"):
+            try:
+                return bool(obj1.equals(obj2))
+            except Exception:
+                pass
+        # NumPy arrays
+        if isinstance(obj1, np.ndarray) and isinstance(obj2, np.ndarray):
+            try:
+                return bool(np.array_equal(obj1, obj2))
+            except Exception:
+                return False
+        # Objects with __dict__: compare attribute dicts deeply
+        if hasattr(obj1, "__dict__") and hasattr(obj2, "__dict__"):
+            return deep_equality_check(obj1.__dict__, obj2.__dict__, visited)
+
+        # Mapping path
+        if isinstance(obj1, Mapping) and isinstance(obj2, Mapping):
             # ...
-            if (set1 := set(obj1)) != set(obj2):
+            if (set1 := set(obj1)) != set(obj2):
                 return False
             for key in set1:
                 val1, val2 = obj1[key], obj2[key]
                 # Check for object identity first to handle self-references
                 if val1 is val2:
                     continue
                 elif isinstance(val1, Mapping) and isinstance(val2, Mapping):
                     if not deep_equality_check(val1, val2, visited):
                         return False
                 # this is primarily for dataframes which have equals method.
                 elif hasattr(val1, "equals") and hasattr(val2, "equals"):
                     if not val1.equals(val2):
                         return False
                 # Handle object comparison carefully to avoid infinite recursion
                 elif hasattr(val1, "__dict__") and hasattr(val2, "__dict__"):
                     # For objects with __dict__, use recursive comparison
                     if not deep_equality_check(val1.__dict__, val2.__dict__, visited):
                         return False
                 else:
                     if not _robust_equality_check(val1, val2):
                         return False
-        return True
+        # Non-mapping fallback
+        return True if isinstance(obj1, Mapping) and isinstance(obj2, Mapping) else _robust_equality_check(obj1, obj2)
🤖 Prompt for AI Agents
In dascore/utils/misc.py around lines 859 to 862, top-level pandas objects
(DataFrame/Series), NumPy ndarrays, and objects with a __dict__ are currently
falling through to _robust_equality_check which differs from how nested
instances are compared; detect and handle these types before the Mapping check:
first check for pandas DataFrame/Series and use their .equals (or appropriate
pandas comparison) to compare, then check for numpy.ndarray and compare
elementwise with NaN-aware semantics (e.g., numpy functions that treat NaNs as
equal), then if the object has a __dict__ compare their __dict__ recursively
using the same equality logic; only after these guards run, fall back to the
Mapping vs fallback branch as before.
Description
This PR fixes a bug in patch reverse operations (they are currently incorrect) and makes the spool equality comparisons more robust to circular references (just in case).
Checklist
I have (if applicable):
Summary by CodeRabbit
New Features
Bug Fixes
Tests