Skip to content

Conversation

@dcherian
Copy link
Contributor

@dcherian dcherian commented Apr 15, 2025

Closes #5897
Closes #6995
Closes #10217

I don't have the bandwidth to complete it so much appreciated if someone else can take this on.

This needs more testing.

@github-actions github-actions bot added topic-documentation topic-NamedArray Lightweight version of Variable labels Apr 15, 2025
max-sixty added a commit to max-sixty/xarray that referenced this pull request Sep 7, 2025
- Changed numeric_only from True to False in mean() aggregations
- Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types
- Only filters data variables, preserves all coordinates
- Added comprehensive tests for datetime mean with edge cases including NaT handling
- Tests cover Dataset, DataArray, groupby, and mixed type scenarios

This enables mean() to work with datetime64 and timedelta64 types as requested in PR pydata#10227 while preventing errors from string variables.
- Changed numeric_only from True to False in mean() aggregations
- Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types
- Only filters data variables, preserves all coordinates
- Added comprehensive tests for datetime mean with edge cases including NaT handling
- Tests cover Dataset, DataArray, groupby, and mixed type scenarios

This enables mean() to work with datetime64 and timedelta64 types as requested in PR pydata#10227 while preventing errors from string variables.
@max-sixty
Copy link
Collaborator

@dcherian I had Claude Code try and help, let's see how it does

max-sixty and others added 9 commits September 8, 2025 13:29
Add pytest.mark.skipif decorator to test_mean_preserves_non_string_object_arrays
to skip the test in minimal dependency environments where cftime is not installed.
- Revert numeric_only back to True for mean() to prevent strings from being included
- Add datetime64/timedelta64 types to numeric_only checks in Dataset.reduce() and flox path
- Also check for object arrays containing datetime-like objects (cftime dates)
- This allows mean() to work with datetime types while excluding strings that would cause errors
Auto-formatted the multi-line condition for better readability
- Adapted datetime/timedelta type checks to use pd.api.types.is_extension_array_dtype
- Maintained datetime mean functionality with updated code structure

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Auto-formatted multi-line conditions for better readability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
The generated mean() methods for DatasetGroupBy and ResampleGroupBy
were incorrectly passing numeric_only=False in the non-flox path,
causing string variables to fail during reduction. Changed to
numeric_only=True to match the flox path behavior.

This fixes test_groupby_dataset_reduce_ellipsis failures.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Created a canonical helper function to check if a dtype can be used in
numeric aggregations. This replaces complex repeated conditionals in
dataset.py and groupby.py with a single, well-documented function.

The helper checks for:
- Numeric types (int, float, complex)
- Boolean type
- Datetime types (datetime64, timedelta64)
- Object arrays containing datetime-like objects (e.g., cftime)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Copy link
Contributor Author

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work Max & Claude Max

EDIT: oooo can claude max make unit tests for the linked issues?

- Document datetime mean behavior in generate_aggregations.py
- Simplify test assertion using isnull().item() instead of pd.notna
- Remove redundant test_mean_dataarray_datetime test
- Add comprehensive tests for linked issues:
  - Issue pydata#5897: Test mean with cftime objects
  - Issue pydata#6995: Test groupby_bins with datetime64 mean
  - Issue pydata#10217: Test groupby_bins mean on time series data

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@max-sixty max-sixty marked this pull request as ready for review September 10, 2025 17:44
max-sixty and others added 2 commits September 10, 2025 10:50
The test was unconditionally using dask operations which caused failures
in the "all-but-dask" CI environment. Now the dask-specific tests are
only run when dask is available.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Split test_mean_with_cftime_objects into two tests:
- Base test runs without dask
- Dask-specific test uses @requires_dask decorator

This follows xarray's standard pattern for dependency-specific tests
and is cleaner than using if has_dask conditionals.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Copy link
Contributor Author

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems right to me! I can't approve my own PR, so I',m happy for you to address comments and merge

max-sixty and others added 9 commits September 10, 2025 15:47
- Created xarray/tests/CLAUDE.md with comprehensive guidelines for
  handling optional dependencies in tests
- Updated cftime tests to use @requires_cftime decorator instead of
  pytest.importorskip, following xarray's standard patterns
- This ensures consistent handling across CI environments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Ensure all Python code blocks have complete, valid syntax for blackdoc.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
As suggested by @dcherian in review comment r2337966239, directly use
_contains_cftime_datetimes(var._data) instead of the more complex check.
This is cleaner since _contains_cftime_datetimes already handles the
object dtype check internally.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Made CLAUDE.md more concise by removing verbose decorator listings
- Moved _is_numeric_aggregatable_dtype import to top of groupby.py
  as suggested by @dcherian in review comment r2337968851

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Move datetime/timedelta note to global _NUMERIC_ONLY_NOTES constant
  (addresses comment r2337970143)
- Merge test_mean_preserves_non_string_object_arrays into
  test_mean_with_cftime_objects (addresses comment r2337128692)
- Both changes address reviewer feedback about simplification
…tterns

Update testing guidelines to explicitly discourage pytest.mark.skipif
in parametrize, recommending splitting tests or using @requires decorators
@max-sixty
Copy link
Collaborator

max-sixty commented Sep 11, 2025

ok, let's see how this goes.

I added some instructions for better convention coverage; let's see if that helps with future changes

@max-sixty max-sixty enabled auto-merge (squash) September 11, 2025 01:35
@max-sixty max-sixty merged commit 98cfee5 into pydata:main Sep 11, 2025
36 checks passed
shoyer pushed a commit to shoyer/xarray that referenced this pull request Sep 16, 2025
* Allow `mean` with time dtypes

Closes pydata#5897
Closes pydata#6995
Closes pydata#10217

* Allow mean() to work with datetime dtypes

- Changed numeric_only from True to False in mean() aggregations
- Added Dataset.mean() override to filter out string variables while preserving datetime/timedelta types
- Only filters data variables, preserves all coordinates
- Added comprehensive tests for datetime mean with edge cases including NaT handling
- Tests cover Dataset, DataArray, groupby, and mixed type scenarios

This enables mean() to work with datetime64 and timedelta64 types as requested in PR pydata#10227 while preventing errors from string variables.

* Skip cftime test when cftime not available

Add pytest.mark.skipif decorator to test_mean_preserves_non_string_object_arrays
to skip the test in minimal dependency environments where cftime is not installed.

* Fix string/datetime handling in mean() aggregations

- Revert numeric_only back to True for mean() to prevent strings from being included
- Add datetime64/timedelta64 types to numeric_only checks in Dataset.reduce() and flox path
- Also check for object arrays containing datetime-like objects (cftime dates)
- This allows mean() to work with datetime types while excluding strings that would cause errors

* Trigger CI workflow

* Format groupby.py numeric_only condition

Auto-formatted the multi-line condition for better readability

* Apply formatting changes to dataset.py and test_groupby.py

- Auto-formatted multi-line conditions for better readability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix numeric_only parameter in groupby mean for non-flox path

The generated mean() methods for DatasetGroupBy and ResampleGroupBy
were incorrectly passing numeric_only=False in the non-flox path,
causing string variables to fail during reduction. Changed to
numeric_only=True to match the flox path behavior.

This fixes test_groupby_dataset_reduce_ellipsis failures.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Simplify dtype checking with _is_numeric_aggregatable_dtype helper

Created a canonical helper function to check if a dtype can be used in
numeric aggregations. This replaces complex repeated conditionals in
dataset.py and groupby.py with a single, well-documented function.

The helper checks for:
- Numeric types (int, float, complex)
- Boolean type
- Datetime types (datetime64, timedelta64)
- Object arrays containing datetime-like objects (e.g., cftime)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Address review feedback from @dcherian

- Document datetime mean behavior in generate_aggregations.py
- Simplify test assertion using isnull().item() instead of pd.notna
- Remove redundant test_mean_dataarray_datetime test
- Add comprehensive tests for linked issues:
  - Issue pydata#5897: Test mean with cftime objects
  - Issue pydata#6995: Test groupby_bins with datetime64 mean
  - Issue pydata#10217: Test groupby_bins mean on time series data

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Fix test_mean_with_cftime_objects for non-dask environments

The test was unconditionally using dask operations which caused failures
in the "all-but-dask" CI environment. Now the dask-specific tests are
only run when dask is available.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Use standard @requires_dask decorator for dask-specific tests

Split test_mean_with_cftime_objects into two tests:
- Base test runs without dask
- Dask-specific test uses @requires_dask decorator

This follows xarray's standard pattern for dependency-specific tests
and is cleaner than using if has_dask conditionals.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Add testing guidelines and use proper decorators

- Created xarray/tests/CLAUDE.md with comprehensive guidelines for
  handling optional dependencies in tests
- Updated cftime tests to use @requires_cftime decorator instead of
  pytest.importorskip, following xarray's standard patterns
- This ensures consistent handling across CI environments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix CLAUDE.md blackdoc formatting

Ensure all Python code blocks have complete, valid syntax for blackdoc.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Simplify cftime check in _is_numeric_aggregatable_dtype

As suggested by @dcherian in review comment r2337966239, directly use
_contains_cftime_datetimes(var._data) instead of the more complex check.
This is cleaner since _contains_cftime_datetimes already handles the
object dtype check internally.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Streamline CLAUDE.md and move import to top of file

- Made CLAUDE.md more concise by removing verbose decorator listings
- Moved _is_numeric_aggregatable_dtype import to top of groupby.py
  as suggested by @dcherian in review comment r2337968851

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Address review comments: clarify numeric_only scope and merge tests

- Move datetime/timedelta note to global _NUMERIC_ONLY_NOTES constant
  (addresses comment r2337970143)
- Merge test_mean_preserves_non_string_object_arrays into
  test_mean_with_cftime_objects (addresses comment r2337128692)
- Both changes address reviewer feedback about simplification

* Clarify that @requires decorators should be used instead of skipif patterns

Update testing guidelines to explicitly discourage pytest.mark.skipif
in parametrize, recommending splitting tests or using @requires decorators

* Fix CLAUDE.md blackdoc formatting

Co-authored-by: Claude <[email protected]>

---------

Co-authored-by: Maximilian Roos <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

2 participants