Allow `mean` with time dtypes #10227

dcherian · 2025-04-15T21:20:50Z

I don't have the bandwidth to complete it so much appreciated if someone else can take this on.

This needs more testing.

Closes pydata#5897 Closes pydata#6995 Closes pydata#10217

- Changed numeric_only from True to False in mean() aggregations - Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types - Only filters data variables, preserves all coordinates - Added comprehensive tests for datetime mean with edge cases including NaT handling - Tests cover Dataset, DataArray, groupby, and mixed type scenarios This enables mean() to work with datetime64 and timedelta64 types as requested in PR pydata#10227 while preventing errors from string variables.

max-sixty · 2025-09-08T20:26:00Z

@dcherian I had Claude Code try and help, let's see how it does

Add pytest.mark.skipif decorator to test_mean_preserves_non_string_object_arrays to skip the test in minimal dependency environments where cftime is not installed.

- Revert numeric_only back to True for mean() to prevent strings from being included - Add datetime64/timedelta64 types to numeric_only checks in Dataset.reduce() and flox path - Also check for object arrays containing datetime-like objects (cftime dates) - This allows mean() to work with datetime types while excluding strings that would cause errors

Auto-formatted the multi-line condition for better readability

- Adapted datetime/timedelta type checks to use pd.api.types.is_extension_array_dtype - Maintained datetime mean functionality with updated code structure 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Auto-formatted multi-line conditions for better readability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

for more information, see https://pre-commit.ci

The generated mean() methods for DatasetGroupBy and ResampleGroupBy were incorrectly passing numeric_only=False in the non-flox path, causing string variables to fail during reduction. Changed to numeric_only=True to match the flox path behavior. This fixes test_groupby_dataset_reduce_ellipsis failures. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Created a canonical helper function to check if a dtype can be used in numeric aggregations. This replaces complex repeated conditionals in dataset.py and groupby.py with a single, well-documented function. The helper checks for: - Numeric types (int, float, complex) - Boolean type - Datetime types (datetime64, timedelta64) - Object arrays containing datetime-like objects (e.g., cftime) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

xarray/core/common.py

xarray/core/_aggregations.py

xarray/tests/test_groupby.py

dcherian

nice work Max & Claude Max

EDIT: oooo can claude max make unit tests for the linked issues?

- Document datetime mean behavior in generate_aggregations.py - Simplify test assertion using isnull().item() instead of pd.notna - Remove redundant test_mean_dataarray_datetime test - Add comprehensive tests for linked issues: - Issue pydata#5897: Test mean with cftime objects - Issue pydata#6995: Test groupby_bins with datetime64 mean - Issue pydata#10217: Test groupby_bins mean on time series data 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

The test was unconditionally using dask operations which caused failures in the "all-but-dask" CI environment. Now the dask-specific tests are only run when dask is available. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Split test_mean_with_cftime_objects into two tests: - Base test runs without dask - Dask-specific test uses @requires_dask decorator This follows xarray's standard pattern for dependency-specific tests and is cleaner than using if has_dask conditionals. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

xarray/core/groupby.py

xarray/util/generate_aggregations.py

xarray/tests/test_groupby.py

dcherian

Seems right to me! I can't approve my own PR, so I',m happy for you to address comments and merge

- Created xarray/tests/CLAUDE.md with comprehensive guidelines for handling optional dependencies in tests - Updated cftime tests to use @requires_cftime decorator instead of pytest.importorskip, following xarray's standard patterns - This ensures consistent handling across CI environments 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

for more information, see https://pre-commit.ci

Ensure all Python code blocks have complete, valid syntax for blackdoc. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

@dcherian

As suggested by @dcherian in review comment r2337966239, directly use _contains_cftime_datetimes(var._data) instead of the more complex check. This is cleaner since _contains_cftime_datetimes already handles the object dtype check internally. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

@dcherian

- Made CLAUDE.md more concise by removing verbose decorator listings - Moved _is_numeric_aggregatable_dtype import to top of groupby.py as suggested by @dcherian in review comment r2337968851 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Move datetime/timedelta note to global _NUMERIC_ONLY_NOTES constant (addresses comment r2337970143) - Merge test_mean_preserves_non_string_object_arrays into test_mean_with_cftime_objects (addresses comment r2337128692) - Both changes address reviewer feedback about simplification

@requires

…tterns Update testing guidelines to explicitly discourage pytest.mark.skipif in parametrize, recommending splitting tests or using @requires decorators

Co-authored-by: Claude <[email protected]>

max-sixty · 2025-09-11T01:11:13Z

ok, let's see how this goes.

I added some instructions for better convention coverage; let's see if that helps with future changes

@dcherian

* Allow `mean` with time dtypes Closes pydata#5897 Closes pydata#6995 Closes pydata#10217 * Allow mean() to work with datetime dtypes - Changed numeric_only from True to False in mean() aggregations - Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types - Only filters data variables, preserves all coordinates - Added comprehensive tests for datetime mean with edge cases including NaT handling - Tests cover Dataset, DataArray, groupby, and mixed type scenarios This enables mean() to work with datetime64 and timedelta64 types as requested in PR pydata#10227 while preventing errors from string variables. * Skip cftime test when cftime not available Add pytest.mark.skipif decorator to test_mean_preserves_non_string_object_arrays to skip the test in minimal dependency environments where cftime is not installed. * Fix string/datetime handling in mean() aggregations - Revert numeric_only back to True for mean() to prevent strings from being included - Add datetime64/timedelta64 types to numeric_only checks in Dataset.reduce() and flox path - Also check for object arrays containing datetime-like objects (cftime dates) - This allows mean() to work with datetime types while excluding strings that would cause errors * Trigger CI workflow * Format groupby.py numeric_only condition Auto-formatted the multi-line condition for better readability * Apply formatting changes to dataset.py and test_groupby.py - Auto-formatted multi-line conditions for better readability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix numeric_only parameter in groupby mean for non-flox path The generated mean() methods for DatasetGroupBy and ResampleGroupBy were incorrectly passing numeric_only=False in the non-flox path, causing string variables to fail during reduction. Changed to numeric_only=True to match the flox path behavior. This fixes test_groupby_dataset_reduce_ellipsis failures. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Simplify dtype checking with _is_numeric_aggregatable_dtype helper Created a canonical helper function to check if a dtype can be used in numeric aggregations. This replaces complex repeated conditionals in dataset.py and groupby.py with a single, well-documented function. The helper checks for: - Numeric types (int, float, complex) - Boolean type - Datetime types (datetime64, timedelta64) - Object arrays containing datetime-like objects (e.g., cftime) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Address review feedback from @dcherian - Document datetime mean behavior in generate_aggregations.py - Simplify test assertion using isnull().item() instead of pd.notna - Remove redundant test_mean_dataarray_datetime test - Add comprehensive tests for linked issues: - Issue pydata#5897: Test mean with cftime objects - Issue pydata#6995: Test groupby_bins with datetime64 mean - Issue pydata#10217: Test groupby_bins mean on time series data 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Fix test_mean_with_cftime_objects for non-dask environments The test was unconditionally using dask operations which caused failures in the "all-but-dask" CI environment. Now the dask-specific tests are only run when dask is available. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Use standard @requires_dask decorator for dask-specific tests Split test_mean_with_cftime_objects into two tests: - Base test runs without dask - Dask-specific test uses @requires_dask decorator This follows xarray's standard pattern for dependency-specific tests and is cleaner than using if has_dask conditionals. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Add testing guidelines and use proper decorators - Created xarray/tests/CLAUDE.md with comprehensive guidelines for handling optional dependencies in tests - Updated cftime tests to use @requires_cftime decorator instead of pytest.importorskip, following xarray's standard patterns - This ensures consistent handling across CI environments 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix CLAUDE.md blackdoc formatting Ensure all Python code blocks have complete, valid syntax for blackdoc. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Simplify cftime check in _is_numeric_aggregatable_dtype As suggested by @dcherian in review comment r2337966239, directly use _contains_cftime_datetimes(var._data) instead of the more complex check. This is cleaner since _contains_cftime_datetimes already handles the object dtype check internally. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Streamline CLAUDE.md and move import to top of file - Made CLAUDE.md more concise by removing verbose decorator listings - Moved _is_numeric_aggregatable_dtype import to top of groupby.py as suggested by @dcherian in review comment r2337968851 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Address review comments: clarify numeric_only scope and merge tests - Move datetime/timedelta note to global _NUMERIC_ONLY_NOTES constant (addresses comment r2337970143) - Merge test_mean_preserves_non_string_object_arrays into test_mean_with_cftime_objects (addresses comment r2337128692) - Both changes address reviewer feedback about simplification * Clarify that @requires decorators should be used instead of skipif patterns Update testing guidelines to explicitly discourage pytest.mark.skipif in parametrize, recommending splitting tests or using @requires decorators * Fix CLAUDE.md blackdoc formatting Co-authored-by: Claude <[email protected]> --------- Co-authored-by: Maximilian Roos <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Claude <[email protected]>

Allow mean with time dtypes

9ffe4b0

Closes pydata#5897 Closes pydata#6995 Closes pydata#10217

github-actions bot added topic-documentation topic-NamedArray Lightweight version of Variable labels Apr 15, 2025

dcherian mentioned this pull request Apr 15, 2025

groupby_bins fails().mean() on time series data #10217

Closed

5 tasks

dcherian added topic-groupby contrib-help-wanted labels Apr 15, 2025

dcherian removed the topic-documentation label Jul 11, 2025

max-sixty and others added 9 commits September 8, 2025 13:29

Skip cftime test when cftime not available

75ad92e

Add pytest.mark.skipif decorator to test_mean_preserves_non_string_object_arrays to skip the test in minimal dependency environments where cftime is not installed.

Trigger CI workflow

ad0ddca

Format groupby.py numeric_only condition

20ec051

Auto-formatted the multi-line condition for better readability

Apply formatting changes to dataset.py and test_groupby.py

ebda6c7

- Auto-formatted multi-line conditions for better readability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

8903fe0

for more information, see https://pre-commit.ci