Add docstring examples across Python and C++ API #3812
Extend the Docstring helper class in C++ bindings with an .examples() method to support adding examples to generated docstrings.
Add examples to constructors:
- Variable: points users to sc.array() and sc.scalar()
- DataArray: shows creating with data and coords
- Dataset: shows creating with multiple data arrays and coords
Add examples to Python free functions across multiple modules:
- core/comparison.py: less, greater, equal, etc.
- core/logical.py: logical_and, logical_or, logical_not, logical_xor
- core/math.py: abs, pow, exp, log, reciprocal, nan_to_num, etc.
- core/trigonometry.py: sin, cos, tan
- core/unary.py: isnan, isinf, isfinite, etc.
- core/bins.py: Bins class properties and methods
- core/shape.py: broadcast, transpose
- core/operations.py: where, merge, sort, etc.
- spatial/__init__.py: all transformation functions
- compat/pandas_compat.py: from_pandas
- compat/xarray_compat.py: from_xarray, to_xarray
Prompt: add examples to python docstrings as outlined in docs/development/plans
Mark completed items and update progress statistics:
- Phase 1: 3/45 (constructors done via C++ bindings)
- Phase 2: 3/33 (bins.coords, bins.size, lookup)
- Phase 3: ~55/85 (comparison, logical, unary, operations complete)
- Phase 4: ~20/32 (spatial, compat modules complete)
Total progress: ~42% (up from ~15%)
Add examples to Variable/DataArray/Dataset properties defined in C++:
- Dimension properties: dims, dim, ndim, shape, sizes
- Data access properties: dtype, unit, values, variances, value, variance, size
Add examples to Bins class methods in Python:
- Reduction methods: nansum, nanmean, max, min, nanmax, nanmin, all, any
- Metadata/access: data, masks, constituents, unit, dtype
- Mutation methods: assign, assign_coords, drop_coords, assign_masks, drop_masks
Also fixes:
- Bins.size() example: use values-based comparison to avoid unit mismatch
- Bins.any() See Also: reference scipp.any instead of scipp.all
Prompt: "Please see diff to main, we want to proceed with the high and medium priority examples from the list. Do we know how to do this for the C++ properties or are there any blockers?"
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Mark completed items:
- Phase 1: All C++ properties (dims, shape, sizes, ndim, dtype, unit, values, variances, value, variance, size)
- Phase 2: Bins properties (data, masks, constituents, unit, dtype)
- Phase 2: Bins metadata ops (assign, assign_coords, drop_coords, assign_masks, drop_masks)
- Phase 2: Bins reductions (nansum, nanmean, max, min, nanmax, nanmin, all, any)
Update progress: ~42% -> ~55% complete (~107/195 examples)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add examples to inverse trig functions: asin, acos, atan, atan2
- Add examples to all hyperbolic functions: sinh, cosh, tanh, asinh, acosh, atanh
- Add fill_value example to lookup function
- Update API examples checklist with completed items
Prompt: continue working through the list in api-examples-checklist.md
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add examples to sc.bins() showing basic usage, explicit indices, and begin/end
- Add examples to sc.bins_like() showing fill value broadcasting
- Update API examples checklist with completed items
Prompt: continue working through the list in api-examples-checklist.md
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add examples to `sc.groupby()` function showing:
  - Basic usage with categorical labels
  - GroupBy with bins parameter for continuous values
  - Reduction operations (mean, sum)
- Add examples to Bins class:
  - `__getitem__` for slicing by event coordinates
  - `__mul__` for scaling events with lookup tables
- Update API examples checklist with progress
Prompt: continue working through api-examples-checklist.md
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add examples to arithmetic functions (add, multiply, negative, subtract), reduction functions (sum, nansum, mean, nanmean, min, max, nanmin, nanmax, all, any), and HDF5 I/O functions (save_hdf5, load_hdf5). All 1055 doctests pass.
Prompt: Please read @docs/development/plans/writing-docstring-examples.md then continue working through the list in @docs/development/plans/api-examples-checklist.md
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add examples to C++ bindings for:
- DataArray.name: getting/setting name, preservation through operations
- DataArray.data: accessing underlying Variable, replacement
- DataArray.coords: dict-like access, iteration
- DataArray.masks: adding masks, checking existence, reduction behavior
- Coords.is_edges(): bin-edge vs point coordinate detection
- Coords.set_aligned(): marking coordinates as unaligned
- Dataset.keys(), .values(), .items(): iteration methods
Update checklist progress from ~79% to ~85% complete.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
---
Prompt: continue working through the list in api-examples-checklist.md
Add examples to DataArray coordinate and mask operations:
- assign_coords, assign_masks, assign (assign_data)
- drop_coords, drop_masks
Add examples to Dataset dict-like interface:
- __getitem__, __setitem__ for accessing/setting items
- get() and pop() methods in _binding.py
Add example to GroupBy.concat() for concatenating binned data within groups.
Update api-examples-checklist.md with completed work (~91% complete).
Co-Authored-By: Claude Opus 4.5 <[email protected]>
---
Prompt: Please read @docs/development/plans/writing-docstring-examples.md then continue working through the list in @docs/development/plans/api-examples-checklist.md
Add comprehensive examples to core class docstrings and methods:
- Variable class: indexing, slicing, multi-dimensional access
- DataArray class: integer/label indexing, slicing, boolean masking
- copy(): shallow vs deep copy examples
- astype(): dtype conversion examples
- to(): unit and dtype conversion examples
- rename_dims()/rename(): dimension renaming examples
- Units module: predefined units and unit arithmetic examples
Also fix Dataset __getitem__ doctest: update unit repr from 'm' to 'Unit(m)'.
Original prompt: continue working through the list in api-examples-checklist.md
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add comprehensive examples to Dataset class docstring showing:
- Slicing by dimension (ds['x', 0], ds['x', 1:3])
- Shared coordinates across data items
- Broadcasting operations across data arrays
Update API examples checklist marking all items complete:
- Dataset slicing/operations examples
- Bin-edge coordinates (verified in Coords.is_edges())
- DataArray methods (verified in free function examples)
- Multi-dimensional groups (verified in sc.group())
Prompt: read docs/development/plans/writing-docstring-examples.md then continue working through the list in api-examples-checklist.md
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add `>>> import numpy as np` to docstring examples that use `np.nan`:
New examples added in this PR:
- nanmean()
- nansum()
- nanmin()
- nanmax()
Pre-existing examples (fixed in same pass):
- nanmedian() - also added missing scipp import
- nanvar() - also added missing scipp import
- nanstd() - also added missing scipp import
Doctest examples run in isolated namespaces, so they need explicit imports even though the module imports numpy.
Prompt: read docs/development/plans/docstring-examples-review.md and work through the action plan
Co-Authored-By: Claude Opus 4.5 <[email protected]>
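The isolation point can be reproduced with the standard library alone. This is a minimal sketch, assuming the doctest runner starts each example with an empty namespace (plain `doctest.testmod` would instead expose the module's globals). `math` stands in for numpy, and `count_failures` and the two snippets are hypothetical names used only for illustration.

```python
import doctest

# Hypothetical docstring snippets; math stands in for numpy here.
WITHOUT_IMPORT = """
>>> math.isnan(float("nan"))
True
"""

WITH_IMPORT = """
>>> import math
>>> math.isnan(float("nan"))
True
"""


def count_failures(snippet: str) -> int:
    """Run a doctest snippet in an empty namespace and return the failure count."""
    parser = doctest.DocTestParser()
    # globs={} mimics a runner that does not expose the module's own imports.
    test = parser.get_doctest(snippet, {}, "snippet", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report so only the count is returned.
    return runner.run(test, out=lambda _text: None).failed


print(count_failures(WITHOUT_IMPORT))  # the NameError counts as one failure
print(count_failures(WITH_IMPORT))     # the explicit import makes it pass
```

The first snippet fails with a `NameError` because nothing provides `math`; the second passes because the import is part of the example itself.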
Based on the gap analysis in the docstring examples review, this adds examples that demonstrate non-obvious behavior that new users often encounter:
- mean(): Show variance propagation (standard error of the mean)
- median(): Show VariancesError when input has variances
- nanmean(): Show all-NaN input returns NaN
- groupby(): Show that non-grouped coords depending on reduced dim are dropped
- broadcast(): Show DimensionError for incompatible sizes
- less(): Show that variances are silently ignored in comparisons
Also fixes C++ bind_data_access.h to import numpy before showing np.float64/np.int64 output in value/variance property examples.
Prompt: implement all pedagogical improvements from the gap analysis
Co-Authored-By: Claude Opus 4.5 <[email protected]>
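The variance propagation mentioned for mean() can be illustrated without scipp. The sketch below assumes independent inputs and uses the textbook rule var(mean) = sum(var_i) / N^2. `mean_with_variance` is a hypothetical helper written for this note, not scipp's implementation.

```python
import math


def mean_with_variance(values, variances):
    """Return the mean and its propagated variance (assumes independent inputs)."""
    n = len(values)
    mean = sum(values) / n
    # Each input contributes var_i / n**2 to the variance of the mean.
    variance = sum(variances) / n**2
    return mean, variance


mean, variance = mean_with_variance([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
print(mean, variance, math.sqrt(variance))  # 2.5 0.25 0.5
```

With four inputs of unit variance, the standard deviation of the mean is 0.5, half that of a single input, which is the standard-error behavior the commit refers to.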
Add `# noqa: E501` to closing `"""` of docstrings that contain long lines in code examples or output displays. This suppresses ruff's line length check for the entire docstring while keeping the check active for actual code.
The long lines in these docstrings are pedagogical examples that show realistic output and would be less clear if artificially broken.
Files modified:
- src/scipp/_binding.py
- src/scipp/compat/pandas_compat.py
- src/scipp/compat/xarray_compat.py
- src/scipp/core/bins.py
- src/scipp/core/groupby.py
- src/scipp/core/math.py
- src/scipp/core/operations.py
- src/scipp/core/trigonometry.py
- src/scipp/core/unary.py
- src/scipp/spatial/__init__.py
Prompt: run `tox -e static` and fix (or selectively and locally ignore - prefer fixing where possible) issues until it passes
Co-Authored-By: Claude Opus 4.5 <[email protected]>
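As a rough illustration of the pattern, the sketch below places the directive on the closing quotes of a docstring whose example output exceeds a typical line-length limit. `describe` and its output format are hypothetical and are not taken from the scipp sources.

```python
def describe(values):
    """Return a one-line summary of a list of integers.

    Examples
    --------
    >>> describe([1, 2, 3])
    '<hypothetical.Summary> (count: 3) [dimensionless] min=1 max=3 sum=6 mean=2.0 values=[1, 2, 3]'
    """  # noqa: E501
    # The code below is still subject to the line-length check.
    return (
        f"<hypothetical.Summary> (count: {len(values)}) [dimensionless] "
        f"min={min(values)} max={max(values)} sum={sum(values)} "
        f"mean={sum(values) / len(values)} values={values}"
    )
```

Keeping the output on one line means the docstring matches what a user would actually see in a session, at the cost of one suppression per docstring.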
src/scipp/core/groupby.py
Outdated
|
| Examples
| --------
| Group by a categorical coordinate and compute the mean:
'categorical' is dataframe terminology (and not even an entirely accurate analogy). We never use that term in our docs.
| return self.drop_coords(coord_names_c);
| },
| py::arg("coord_names"),
| R"(Return new object with specified coordinate(s) removed.
Did you check that this is visible in Python? This function is overloaded and I don't know how that affects docstrings.
lib/python/bind_operators.h
Outdated
| >>> var
| <scipp.Variable> (x: 3) float64 [m] [999, 2, 3]
|
| The shallow copy shares data with the original.
This one is tricky because it should have different examples for variables and data arrays.
Making a shallow copy of a variable in Python is not very useful anyway.
src/scipp/spatial/__init__.py
Outdated
| >>> rot_90z = sc.spatial.rotation(value=q / np.linalg.norm(q))
| >>> vec_x = sc.vector(value=[1, 0, 0])
| >>> rot_90z * vec_x
| <scipp.Variable> () vector3 [dimensionless] (..., 1, 0)
This output does not look correct. Did you run doctest on this?
Yes, the doctests pass. Apparently ... is a valid abbreviation accepted by Sphinx?
src/scipp/spatial/__init__.py
Outdated
| ... )
| >>> rots = sc.spatial.rotations_from_rotvecs(rotvecs)
| >>> rots
| <scipp.Variable> (rot: 3) rotation3 [dimensionless] [(...), (...), (...)]
Is this output correct? If so, it is useless for the docs.
- Change "categorical coordinate" to "string label coordinate" in groupby.py
- Simplify copy() example to only show deep copy (shallow copy not useful for Variables)
- Fix rotation example to use 180° rotation with exact output instead of floating-point approximation
- Fix rotations_from_rotvecs example to show actual quaternion values (use 2 rotations instead of 3)
- Change Attention section code examples from doctest to code block format
- Add BLANKLINE markers for DataArray outputs in groupby.py
Prompt: Please consider how to address reviewer comments in #3812 (current branch).
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fixed missing semicolon in dataset.cpp that broke the build. Updated nanmean doctest to use ellipsis pattern for NaN output which can be either `nan` or `-nan` depending on platform.
Prompt: Builds are not passing, I think a C++ issue introduced in recent commit? Make sure builds, docs, and hooks pass when committing.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
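One way such an ellipsis pattern can work is sketched below with the standard `doctest` module, assuming the `ELLIPSIS` option is enabled. The exact pattern used in the nanmean doctest is not shown in this thread, so `format_mean` and the expected line are hypothetical. The ellipsis is placed after a prefix because a line that begins with `...` would be parsed as a continuation prompt.

```python
import doctest


def format_mean(negative_sign: bool) -> str:
    """Hypothetical stand-in for output that prints ``nan`` or ``-nan``."""
    return "mean = -nan" if negative_sign else "mean = nan"


# Expected output uses "..." so an optional leading minus sign is accepted.
SNIPPET = """
>>> print(format_mean(negative_sign))
mean = ...nan
"""


def passes(negative_sign: bool) -> bool:
    """Run the snippet with ELLIPSIS enabled and report whether it passes."""
    globs = {"format_mean": format_mean, "negative_sign": negative_sign}
    test = doctest.DocTestParser().get_doctest(SNIPPET, globs, "nan_example", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
    return runner.run(test, out=lambda _text: None).failed == 0


print(passes(False), passes(True))  # True True
```

The same expected line matches both spellings, so the doctest no longer depends on how the platform prints a NaN.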
nvaytet left a comment
I did not look at everything. There could be a few tweaks, but overall it seems to have done a pretty good job 👍
| R"(Dict of coordinates.
|
| Coordinates define the axis labels for each dimension. They can be
| point-coordinates (one value per data point) or bin-edge coordinates
Never heard of "point-coordinates" before, but then I don't usually have a good term to describe them (midpoint coordinates or non bin-edge coordinates are not much better...)
lib/python/bind_data_array.h
Outdated
|
| Iterate over coordinate names:
|
| >>> list(da.coords)
I would probably suggest da.coords.keys() instead of constructing a list?
lib/python/bind_data_array.h
Outdated
|
| >>> da.sum()
| <scipp.DataArray>
| ...
This truncated output is not very useful.
| BIND_GROUPBY_OP(groupBy, nanmin);
| BIND_GROUPBY_OP(groupBy, max);
| BIND_GROUPBY_OP(groupBy, nanmax);
| BIND_GROUPBY_OP(groupBy, concat);
Why is this one the only one with an example?
Found a solution now, grouping bindings into two groups based on dtype.
src/scipp/_binding.py
Outdated
| <scipp.DataArray>
| ...
Do we always need to have the output in the examples? To me it would make more sense to not have an output as opposed to a truncated output with no info in it?
I think generally we should aim for avoiding truncation and be explicit. If not feasible then have no output?
src/scipp/compat/pandas_compat.py
Outdated
| <BLANKLINE>
| <BLANKLINE>
Yes, Claude explained to me earlier that it is needed to end a code block at an empty line, i.e., this is needed when there are blank lines in the output.
Did you test it without <BLANKLINE>? We set the doctest.NORMALIZE_WHITESPACE flag which should take care of this problem.
Thanks, I didn't know that, removed now (docs build seems to pass, unless sphinx cache messed with me again)!
- Change list(da.coords) to da.coords.keys() in coords example
- Replace truncated da.sum() output with meaningful da.sum().value
- Add examples to all groupby operations showing both label-based and bin-edge modes, with .sizes output
- Fix truncated output in _binding.py ds.get() example
Prompt: Please address review comments in latest review (by nvaytet) in PR 3812
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove or replace truncated `...` outputs with `.sizes`, `.value`, or other concrete attributes that show useful information
- Affected files: bind_data_array.h, dataset.cpp, bins.py
- Truncated outputs like `<scipp.DataArray>\n...` now show `.sizes` which indicates dimensions without verbose repr
Prompt: Look through the branch to find other docstring examples that elide output, use your judgement to decide how to address in each case.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace truncated outputs and .sizes accessor patterns with full DataArray/Dataset/Dict repr output where reasonably concise:
- dataset.cpp: Show full output for indexing examples
- bind_data_array.h: Show full coords and masks Dict repr
- bins.py: Show full DataArray output for all bins operations
- _binding.py: Show full DataArray output for ds.get example
Prompt: Look through the branch to find other docstring examples that elide output. (Followup: prefer complete output where reasonable, only fall back to .sizes if too verbose)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Split groupby docstring into numeric and boolean variants
- Numeric ops (mean, sum, min, max, etc.) use float data examples
- Boolean ops (all, any) use boolean data examples
- Fix masks example: wrap da.sum().value in float() to avoid np.float64 repr difference
Prompt: Make sure `tox -e docs` passes.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
jl-wynen left a comment
The parts I have looked at look fine now.
The NORMALIZE_WHITESPACE flag in docs/conf.py handles trailing blank lines in doctest output, making explicit <BLANKLINE> directives unnecessary.
Prompt: A reviewer notes: "Did you test it without <BLANKLINE>? We set the doctest.NORMALIZE_WHITESPACE flag which should take care of this problem." - please find where we added this and try without.
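The effect on trailing blank lines can be checked with the standard `doctest` module. This is a small sketch, not the project's configuration: the snippets and the `failures` helper are hypothetical, and only the trailing-blank-line case described above is covered.

```python
import doctest

# The printed text ends with an extra blank line, like some multi-line reprs.
PLAIN = r"""
>>> print("x: 3\n")
x: 3
"""

# Same example, with the trailing blank line spelled out explicitly.
WITH_MARKER = r"""
>>> print("x: 3\n")
x: 3
<BLANKLINE>
"""


def failures(snippet: str, flags: int = 0) -> int:
    """Run a doctest snippet with the given option flags; return the failure count."""
    test = doctest.DocTestParser().get_doctest(snippet, {}, "blank_example", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    return runner.run(test, out=lambda _text: None).failed


print(failures(PLAIN))                                # fails without the flag
print(failures(PLAIN, doctest.NORMALIZE_WHITESPACE))  # passes with the flag
print(failures(WITH_MARKER))                          # passes with the marker
```

With the flag set, the marker and the flag are two routes to the same result for trailing blank lines, which is consistent with removing the markers once the flag is confirmed to be active.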
Summary
Fixes #3811
🤖 Generated with Claude Code