-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Allow mean with time dtypes
#10227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow mean with time dtypes
#10227
Conversation
Closes pydata#5897 Closes pydata#6995 Closes pydata#10217
- Changed numeric_only from True to False in mean() aggregations - Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types - Only filters data variables, preserves all coordinates - Added comprehensive tests for datetime mean with edge cases including NaT handling - Tests cover Dataset, DataArray, groupby, and mixed type scenarios This enables mean() to work with datetime64 and timedelta64 types as requested in PR pydata#10227 while preventing errors from string variables.
- Changed numeric_only from True to False in mean() aggregations - Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types - Only filters data variables, preserves all coordinates - Added comprehensive tests for datetime mean with edge cases including NaT handling - Tests cover Dataset, DataArray, groupby, and mixed type scenarios This enables mean() to work with datetime64 and timedelta64 types as requested in PR pydata#10227 while preventing errors from string variables.
|
@dcherian I had Claude Code try and help, let's see how it does |
Add pytest.mark.skipif decorator to test_mean_preserves_non_string_object_arrays to skip the test in minimal dependency environments where cftime is not installed.
- Revert numeric_only back to True for mean() to prevent strings from being included - Add datetime64/timedelta64 types to numeric_only checks in Dataset.reduce() and flox path - Also check for object arrays containing datetime-like objects (cftime dates) - This allows mean() to work with datetime types while excluding strings that would cause errors
Auto-formatted the multi-line condition for better readability
- Adapted datetime/timedelta type checks to use pd.api.types.is_extension_array_dtype - Maintained datetime mean functionality with updated code structure 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Auto-formatted multi-line conditions for better readability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
The generated mean() methods for DatasetGroupBy and ResampleGroupBy were incorrectly passing numeric_only=False in the non-flox path, causing string variables to fail during reduction. Changed to numeric_only=True to match the flox path behavior. This fixes test_groupby_dataset_reduce_ellipsis failures. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Created a canonical helper function to check if a dtype can be used in numeric aggregations. This replaces complex repeated conditionals in dataset.py and groupby.py with a single, well-documented function. The helper checks for: - Numeric types (int, float, complex) - Boolean type - Datetime types (datetime64, timedelta64) - Object arrays containing datetime-like objects (e.g., cftime) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work Max & Claude Max
EDIT: oooo can claude max make unit tests for the linked issues?
- Document datetime mean behavior in generate_aggregations.py - Simplify test assertion using isnull().item() instead of pd.notna - Remove redundant test_mean_dataarray_datetime test - Add comprehensive tests for linked issues: - Issue pydata#5897: Test mean with cftime objects - Issue pydata#6995: Test groupby_bins with datetime64 mean - Issue pydata#10217: Test groupby_bins mean on time series data 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
The test was unconditionally using dask operations which caused failures in the "all-but-dask" CI environment. Now the dask-specific tests are only run when dask is available. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Split test_mean_with_cftime_objects into two tests: - Base test runs without dask - Dask-specific test uses @requires_dask decorator This follows xarray's standard pattern for dependency-specific tests and is cleaner than using if has_dask conditionals. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
dcherian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems right to me! I can't approve my own PR, so I',m happy for you to address comments and merge
- Created xarray/tests/CLAUDE.md with comprehensive guidelines for handling optional dependencies in tests - Updated cftime tests to use @requires_cftime decorator instead of pytest.importorskip, following xarray's standard patterns - This ensures consistent handling across CI environments 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Ensure all Python code blocks have complete, valid syntax for blackdoc. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
As suggested by @dcherian in review comment r2337966239, directly use _contains_cftime_datetimes(var._data) instead of the more complex check. This is cleaner since _contains_cftime_datetimes already handles the object dtype check internally. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Made CLAUDE.md more concise by removing verbose decorator listings - Moved _is_numeric_aggregatable_dtype import to top of groupby.py as suggested by @dcherian in review comment r2337968851 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Move datetime/timedelta note to global _NUMERIC_ONLY_NOTES constant (addresses comment r2337970143) - Merge test_mean_preserves_non_string_object_arrays into test_mean_with_cftime_objects (addresses comment r2337128692) - Both changes address reviewer feedback about simplification
…tterns Update testing guidelines to explicitly discourage pytest.mark.skipif in parametrize, recommending splitting tests or using @requires decorators
Co-authored-by: Claude <[email protected]>
|
ok, let's see how this goes. I added some instructions for better convention coverage; let's see if that helps with future changes |
* Allow `mean` with time dtypes Closes pydata#5897 Closes pydata#6995 Closes pydata#10217 * Allow mean() to work with datetime dtypes - Changed numeric_only from True to False in mean() aggregations - Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types - Only filters data variables, preserves all coordinates - Added comprehensive tests for datetime mean with edge cases including NaT handling - Tests cover Dataset, DataArray, groupby, and mixed type scenarios This enables mean() to work with datetime64 and timedelta64 types as requested in PR pydata#10227 while preventing errors from string variables. * Skip cftime test when cftime not available Add pytest.mark.skipif decorator to test_mean_preserves_non_string_object_arrays to skip the test in minimal dependency environments where cftime is not installed. * Fix string/datetime handling in mean() aggregations - Revert numeric_only back to True for mean() to prevent strings from being included - Add datetime64/timedelta64 types to numeric_only checks in Dataset.reduce() and flox path - Also check for object arrays containing datetime-like objects (cftime dates) - This allows mean() to work with datetime types while excluding strings that would cause errors * Trigger CI workflow * Format groupby.py numeric_only condition Auto-formatted the multi-line condition for better readability * Apply formatting changes to dataset.py and test_groupby.py - Auto-formatted multi-line conditions for better readability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix numeric_only parameter in groupby mean for non-flox path The generated mean() methods for DatasetGroupBy and ResampleGroupBy were incorrectly passing numeric_only=False in the non-flox path, causing string variables to fail during reduction. Changed to numeric_only=True to match the flox path behavior. This fixes test_groupby_dataset_reduce_ellipsis failures. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Simplify dtype checking with _is_numeric_aggregatable_dtype helper Created a canonical helper function to check if a dtype can be used in numeric aggregations. This replaces complex repeated conditionals in dataset.py and groupby.py with a single, well-documented function. The helper checks for: - Numeric types (int, float, complex) - Boolean type - Datetime types (datetime64, timedelta64) - Object arrays containing datetime-like objects (e.g., cftime) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Address review feedback from @dcherian - Document datetime mean behavior in generate_aggregations.py - Simplify test assertion using isnull().item() instead of pd.notna - Remove redundant test_mean_dataarray_datetime test - Add comprehensive tests for linked issues: - Issue pydata#5897: Test mean with cftime objects - Issue pydata#6995: Test groupby_bins with datetime64 mean - Issue pydata#10217: Test groupby_bins mean on time series data 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Fix test_mean_with_cftime_objects for non-dask environments The test was unconditionally using dask operations which caused failures in the "all-but-dask" CI environment. Now the dask-specific tests are only run when dask is available. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Use standard @requires_dask decorator for dask-specific tests Split test_mean_with_cftime_objects into two tests: - Base test runs without dask - Dask-specific test uses @requires_dask decorator This follows xarray's standard pattern for dependency-specific tests and is cleaner than using if has_dask conditionals. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Add testing guidelines and use proper decorators - Created xarray/tests/CLAUDE.md with comprehensive guidelines for handling optional dependencies in tests - Updated cftime tests to use @requires_cftime decorator instead of pytest.importorskip, following xarray's standard patterns - This ensures consistent handling across CI environments 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix CLAUDE.md blackdoc formatting Ensure all Python code blocks have complete, valid syntax for blackdoc. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Simplify cftime check in _is_numeric_aggregatable_dtype As suggested by @dcherian in review comment r2337966239, directly use _contains_cftime_datetimes(var._data) instead of the more complex check. This is cleaner since _contains_cftime_datetimes already handles the object dtype check internally. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Streamline CLAUDE.md and move import to top of file - Made CLAUDE.md more concise by removing verbose decorator listings - Moved _is_numeric_aggregatable_dtype import to top of groupby.py as suggested by @dcherian in review comment r2337968851 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Address review comments: clarify numeric_only scope and merge tests - Move datetime/timedelta note to global _NUMERIC_ONLY_NOTES constant (addresses comment r2337970143) - Merge test_mean_preserves_non_string_object_arrays into test_mean_with_cftime_objects (addresses comment r2337128692) - Both changes address reviewer feedback about simplification * Clarify that @requires decorators should be used instead of skipif patterns Update testing guidelines to explicitly discourage pytest.mark.skipif in parametrize, recommending splitting tests or using @requires decorators * Fix CLAUDE.md blackdoc formatting Co-authored-by: Claude <[email protected]> --------- Co-authored-by: Maximilian Roos <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Claude <[email protected]>
Closes #5897
Closes #6995
Closes #10217
I don't have the bandwidth to complete it so much appreciated if someone else can take this on.
This needs more testing.