
Add docstring examples across Python and C++ API #3812

Merged
SimonHeybrock merged 26 commits into main from api-examples
Jan 16, 2026

@SimonHeybrock
Member

Summary

  • Add comprehensive docstring examples to Python API modules: bins, groupby, trigonometry, hyperbolic, math, operations, reduction, shape, unary, comparison, logical, arithmetic, assignments, dimensions
  • Add examples to C++ bindings: Variable, DataArray, Dataset properties and methods, Coords/Masks operations, Bins methods
  • Add examples to spatial module (vectors, rotations, transforms, affine transforms)
  • Add examples to IO, units, and pandas/xarray compatibility modules
  • Examples follow NumPy-style formatting and demonstrate common usage patterns

Fixes #3811

🤖 Generated with Claude Code

SimonHeybrock and others added 17 commits January 14, 2026 13:53
Extend the Docstring helper class in C++ bindings with an .examples()
method to support adding examples to generated docstrings.

Add examples to constructors:
- Variable: points users to sc.array() and sc.scalar()
- DataArray: shows creating with data and coords
- Dataset: shows creating with multiple data arrays and coords

Add examples to Python free functions across multiple modules:
- core/comparison.py: less, greater, equal, etc.
- core/logical.py: logical_and, logical_or, logical_not, logical_xor
- core/math.py: abs, pow, exp, log, reciprocal, nan_to_num, etc.
- core/trigonometry.py: sin, cos, tan
- core/unary.py: isnan, isinf, isfinite, etc.
- core/bins.py: Bins class properties and methods
- core/shape.py: broadcast, transpose
- core/operations.py: where, merge, sort, etc.
- spatial/__init__.py: all transformation functions
- compat/pandas_compat.py: from_pandas
- compat/xarray_compat.py: from_xarray, to_xarray

Prompt: add examples to python docstrings as outlined in docs/development/plans

Mark completed items and update progress statistics:
- Phase 1: 3/45 (constructors done via C++ bindings)
- Phase 2: 3/33 (bins.coords, bins.size, lookup)
- Phase 3: ~55/85 (comparison, logical, unary, operations complete)
- Phase 4: ~20/32 (spatial, compat modules complete)

Total progress: ~42% (up from ~15%)

Add examples to Variable/DataArray/Dataset properties defined in C++:
- Dimension properties: dims, dim, ndim, shape, sizes
- Data access properties: dtype, unit, values, variances, value, variance, size

Add examples to Bins class methods in Python:
- Reduction methods: nansum, nanmean, max, min, nanmax, nanmin, all, any
- Metadata/access: data, masks, constituents, unit, dtype
- Mutation methods: assign, assign_coords, drop_coords, assign_masks, drop_masks

Also fixes:
- Bins.size() example: use values-based comparison to avoid unit mismatch
- Bins.any() See Also: reference scipp.any instead of scipp.all

Prompt: "Please see diff to main, we want to proceed with the high and medium
priority examples from the list. Do we know how to do this for the C++
properties or are there any blockers?"

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Mark completed items:
- Phase 1: All C++ properties (dims, shape, sizes, ndim, dtype, unit,
  values, variances, value, variance, size)
- Phase 2: Bins properties (data, masks, constituents, unit, dtype)
- Phase 2: Bins metadata ops (assign, assign_coords, drop_coords,
  assign_masks, drop_masks)
- Phase 2: Bins reductions (nansum, nanmean, max, min, nanmax, nanmin,
  all, any)

Update progress: ~42% -> ~55% complete (~107/195 examples)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add examples to inverse trig functions: asin, acos, atan, atan2
- Add examples to all hyperbolic functions: sinh, cosh, tanh, asinh, acosh, atanh
- Add fill_value example to lookup function
- Update API examples checklist with completed items

Prompt: continue working through the list in api-examples-checklist.md

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add examples to sc.bins() showing basic usage, explicit indices, and begin/end
- Add examples to sc.bins_like() showing fill value broadcasting
- Update API examples checklist with completed items

Prompt: continue working through the list in api-examples-checklist.md

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add examples to `sc.groupby()` function showing:
  - Basic usage with categorical labels
  - GroupBy with bins parameter for continuous values
  - Reduction operations (mean, sum)
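As a conceptual sketch of what grouping by labels followed by a mean reduction computes, the following uses plain Python only. It is not scipp's API; `group_mean` and its arguments are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def group_mean(values, labels):
    """Hypothetical helper: average the values that share the same label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    # One output entry per distinct label, in sorted label order.
    return {label: sum(items) / len(items) for label, items in sorted(groups.items())}


result = group_mean([1.0, 2.0, 3.0, 4.0], ["a", "b", "a", "b"])
print(result)  # {'a': 2.0, 'b': 3.0}
```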

- Add examples to Bins class:
  - `__getitem__` for slicing by event coordinates
  - `__mul__` for scaling events with lookup tables

- Update API examples checklist with progress

Prompt: continue working through api-examples-checklist.md

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add examples to arithmetic functions (add, multiply, negative, subtract),
reduction functions (sum, nansum, mean, nanmean, min, max, nanmin, nanmax,
all, any), and HDF5 I/O functions (save_hdf5, load_hdf5).

All 1055 doctests pass.

Prompt: Please read @docs/development/plans/writing-docstring-examples.md
then continue working through the list in
@docs/development/plans/api-examples-checklist.md

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add examples to C++ bindings for:
- DataArray.name: getting/setting name, preservation through operations
- DataArray.data: accessing underlying Variable, replacement
- DataArray.coords: dict-like access, iteration
- DataArray.masks: adding masks, checking existence, reduction behavior
- Coords.is_edges(): bin-edge vs point coordinate detection
- Coords.set_aligned(): marking coordinates as unaligned
- Dataset.keys(), .values(), .items(): iteration methods

Update checklist progress from ~79% to ~85% complete.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

---
Prompt: continue working through the list in api-examples-checklist.md

Add examples to DataArray coordinate and mask operations:
- assign_coords, assign_masks, assign (assign_data)
- drop_coords, drop_masks

Add examples to Dataset dict-like interface:
- __getitem__, __setitem__ for accessing/setting items
- get() and pop() methods in _binding.py

Add example to GroupBy.concat() for concatenating binned data within groups.

Update api-examples-checklist.md with completed work (~91% complete).

Co-Authored-By: Claude Opus 4.5 <[email protected]>

---
Prompt: Please read @docs/development/plans/writing-docstring-examples.md then
continue working through the list in @docs/development/plans/api-examples-checklist.md

Add comprehensive examples to core class docstrings and methods:
- Variable class: indexing, slicing, multi-dimensional access
- DataArray class: integer/label indexing, slicing, boolean masking
- copy(): shallow vs deep copy examples
- astype(): dtype conversion examples
- to(): unit and dtype conversion examples
- rename_dims()/rename(): dimension renaming examples
- Units module: predefined units and unit arithmetic examples

Also fix Dataset __getitem__ doctest: update unit repr from 'm' to 'Unit(m)'.

Original prompt: continue working through the list in api-examples-checklist.md

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add comprehensive examples to Dataset class docstring showing:
- Slicing by dimension (ds['x', 0], ds['x', 1:3])
- Shared coordinates across data items
- Broadcasting operations across data arrays

Update API examples checklist marking all items complete:
- Dataset slicing/operations examples
- Bin-edge coordinates (verified in Coords.is_edges())
- DataArray methods (verified in free function examples)
- Multi-dimensional groups (verified in sc.group())

Prompt: read docs/development/plans/writing-docstring-examples.md then
continue working through the list in api-examples-checklist.md

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add `>>> import numpy as np` to docstring examples that use `np.nan`:

New examples added in this PR:
- nanmean()
- nansum()
- nanmin()
- nanmax()

Pre-existing examples (fixed in same pass):
- nanmedian() - also added missing scipp import
- nanvar() - also added missing scipp import
- nanstd() - also added missing scipp import

Doctest examples run in isolated namespaces, so they need explicit
imports even though the module imports numpy.
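A minimal sketch of the effect described above, using only the standard-library `doctest` module. It assumes the runner hands each example an empty namespace (here forced with `globs={}`); how the project's own doctest setup builds its namespaces is not shown in this thread. `count_failures` is a hypothetical helper.

```python
import doctest
import math  # imported by this module, but not visible inside the examples below


def count_failures(snippet: str) -> int:
    """Run a doctest snippet in an empty namespace and return the failure count."""
    test = doctest.DocTestParser().get_doctest(
        snippet, globs={}, name="snippet", filename=None, lineno=0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return results.failed


WITHOUT_IMPORT = """
>>> math.isnan(float("nan"))
True
"""

WITH_IMPORT = """
>>> import math
>>> math.isnan(float("nan"))
True
"""

print(count_failures(WITHOUT_IMPORT))  # 1: the NameError counts as a failure
print(count_failures(WITH_IMPORT))     # 0
```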

Prompt: read docs/development/plans/docstring-examples-review.md and work through the action plan

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Based on the gap analysis in the docstring examples review, this adds
examples that demonstrate non-obvious behavior that new users often
encounter:

- mean(): Show variance propagation (standard error of the mean)
- median(): Show VariancesError when input has variances
- nanmean(): Show all-NaN input returns NaN
- groupby(): Show that non-grouped coords depending on reduced dim are dropped
- broadcast(): Show DimensionError for incompatible sizes
- less(): Show that variances are silently ignored in comparisons
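The variance propagation mentioned for mean() can be sketched in plain Python. This assumes independent (uncorrelated) input uncertainties, in which case the variance of the mean is the sum of the input variances divided by N squared. `mean_with_variance` is a hypothetical helper, not a scipp function.

```python
def mean_with_variance(values, variances):
    """Return (mean, variance of the mean) for independent inputs."""
    n = len(values)
    mean = sum(values) / n
    # Var(mean) = sum(Var(x_i)) / n**2 when the x_i are uncorrelated.
    variance_of_mean = sum(variances) / n**2
    return mean, variance_of_mean


mean, var = mean_with_variance([1.0, 2.0, 3.0, 4.0], [4.0, 4.0, 4.0, 4.0])
print(mean, var)  # 2.5 1.0
```

With equal input variances this reduces to the familiar standard error: the standard deviation of the mean is the input standard deviation divided by the square root of N (here 2.0 / 2 = 1.0).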

Also fixes C++ bind_data_access.h to import numpy before showing
np.float64/np.int64 output in value/variance property examples.

Prompt: implement all pedagogical improvements from the gap analysis

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add `# noqa: E501` to closing `"""` of docstrings that contain
long lines in code examples or output displays. This suppresses
ruff's line length check for the entire docstring while keeping
the check active for actual code.

The long lines in these docstrings are pedagogical examples that
show realistic output and would be less clear if artificially
broken.
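A sketch of the pattern, assuming a line-length limit of 88 characters (ruff's default; the project's configured limit is not shown here). The function and its output are made up for illustration.

```python
def long_repr() -> str:
    """Return a long one-line summary string.

    Examples
    --------
    >>> long_repr()
    '<example.Thing> (x: 3, y: 4, z: 5) float64 [dimensionless] values=[...] variances=[...]'
    """  # noqa: E501
    # The comment on the closing quotes covers the whole docstring,
    # so the long expected-output line above is not reported.
    return (
        "<example.Thing> (x: 3, y: 4, z: 5) float64 [dimensionless] "
        "values=[...] variances=[...]"
    )
```

Lines outside the docstring, such as the `return` statement, are still checked, which is why it is wrapped.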

Files modified:
- src/scipp/_binding.py
- src/scipp/compat/pandas_compat.py
- src/scipp/compat/xarray_compat.py
- src/scipp/core/bins.py
- src/scipp/core/groupby.py
- src/scipp/core/math.py
- src/scipp/core/operations.py
- src/scipp/core/trigonometry.py
- src/scipp/core/unary.py
- src/scipp/spatial/__init__.py

Prompt: run `tox -e static` and fix (or selectively and locally
ignore - prefer fixing where possible) issues until it passes

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Examples
--------
Group by a categorical coordinate and compute the mean:
Member

'categorical' is dataframe terminology (and not even an entirely accurate analogy). We never use that term in our docs.

return self.drop_coords(coord_names_c);
},
py::arg("coord_names"),
R"(Return new object with specified coordinate(s) removed.
Member

Did you check that this is visible in Python? This function is overloaded and I don't know how that affects docstrings.

Member Author

Yes it is!

>>> var
<scipp.Variable> (x: 3) float64 [m] [999, 2, 3]

The shallow copy shares data with the original.
Member

This one is tricky because it should have different examples for variables and data arrays.
Making a shallow copy of a variable in Python is not very useful anyway.
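For readers unfamiliar with the distinction, here is an analogy using the standard-library `copy` module on a plain dict. It is not scipp's `copy()`; the dict layout is invented for illustration.

```python
import copy

original = {"unit": "m", "values": [1.0, 2.0, 3.0]}

shallow = copy.copy(original)   # new dict, same underlying list
deep = copy.deepcopy(original)  # new dict and a new list

# Writing through the shallow copy changes the shared buffer.
shallow["values"][0] = 999.0

print(original["values"])  # [999.0, 2.0, 3.0]
print(deep["values"])      # [1.0, 2.0, 3.0]
```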

>>> rot_90z = sc.spatial.rotation(value=q / np.linalg.norm(q))
>>> vec_x = sc.vector(value=[1, 0, 0])
>>> rot_90z * vec_x
<scipp.Variable> () vector3 [dimensionless] (..., 1, 0)
Member

This output does not look correct. Did you run doctest on this?

Member Author

Yes, the doctests pass. Apparently ... is a valid abbreviation accepted by Sphinx?
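For context, `...` in expected output is handled by doctest's `ELLIPSIS` option flag rather than by Sphinx itself. The sketch below assumes that flag is enabled in the project's doctest configuration (not shown in this thread) and uses a hypothetical actual output with a small floating-point residue.

```python
import doctest

checker = doctest.OutputChecker()

expected = "<scipp.Variable> () vector3 [dimensionless] (..., 1, 0)\n"
# Hypothetical actual output: the x component is tiny but not zero.
actual = "<scipp.Variable> () vector3 [dimensionless] (2.22045e-16, 1, 0)\n"

strict = checker.check_output(expected, actual, 0)
relaxed = checker.check_output(expected, actual, doctest.ELLIPSIS)
print(strict, relaxed)  # False True
```

With the flag set, `...` matches any substring, so the test passes without checking the elided component at all.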

... )
>>> rots = sc.spatial.rotations_from_rotvecs(rotvecs)
>>> rots
<scipp.Variable> (rot: 3) rotation3 [dimensionless] [(...), (...), (...)]
Member

Is this output correct? If so, it is useless for the docs.

SimonHeybrock and others added 3 commits January 15, 2026 09:36
- Change "categorical coordinate" to "string label coordinate" in groupby.py
- Simplify copy() example to only show deep copy (shallow copy not useful for Variables)
- Fix rotation example to use 180° rotation with exact output instead of floating-point approximation
- Fix rotations_from_rotvecs example to show actual quaternion values (use 2 rotations instead of 3)
- Change Attention section code examples from doctest to code block format
- Add BLANKLINE markers for DataArray outputs in groupby.py
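The reasoning behind the rotation change can be sketched without scipp. The code below rotates a vector by a unit quaternion in pure Python; the scalar-last `(x, y, z, w)` layout and the helper names are assumptions of this sketch. A 90° rotation involves sin and cos of 45°, which are not exactly representable, whereas the 180° quaternion about z is exactly `(0, 0, 1, 0)`.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rotate(q, v):
    """Rotate v by the unit quaternion q = (x, y, z, w), scalar last."""
    axis, w = q[:3], q[3]
    t = cross(axis, v)
    u = cross(axis, t)
    # v' = v + 2 w (q_vec x v) + 2 q_vec x (q_vec x v)
    return tuple(vi + 2.0 * w * ti + 2.0 * ui for vi, ti, ui in zip(v, t, u))


vec_x = (1.0, 0.0, 0.0)

# 90 degrees about z: only approximately (0, 1, 0) in floating point.
half = math.pi / 4
rot_90z = (0.0, 0.0, math.sin(half), math.cos(half))
print(rotate(rot_90z, vec_x))

# 180 degrees about z: every intermediate value is exact.
rot_180z = (0.0, 0.0, 1.0, 0.0)
print(rotate(rot_180z, vec_x))  # (-1.0, 0.0, 0.0)
```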

Prompt: Please consider how to address reviewer comments in #3812 (current branch).

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Fixed missing semicolon in dataset.cpp that broke the build.
Updated nanmean doctest to use ellipsis pattern for NaN output
which can be either `nan` or `-nan` depending on platform.

Prompt: Builds are not passing, I think a C++ issue introduced in recent commit? Make sure builds, docs, and hooks pass when committing.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Member

@nvaytet nvaytet left a comment

I did not look at everything. There could be a few tweaks, but overall it seems to have done a pretty good job 👍

R"(Dict of coordinates.

Coordinates define the axis labels for each dimension. They can be
point-coordinates (one value per data point) or bin-edge coordinates
Member

Never heard of "point-coordinates" before, but then I don't usually have a good term to describe them (midpoint coordinates or non bin-edge coordinates are not much better...)
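Whatever the term, the two kinds of coordinate differ by length along the dimension: one value per data point, or one more value than data points for bin edges. A plain-Python sketch of that rule follows; the `is_edges` helper here is hypothetical and only mirrors the idea, not the signature of `Coords.is_edges()`.

```python
def is_edges(coord_length: int, data_length: int) -> bool:
    """Hypothetical helper: True for bin edges, False for one value per point."""
    if coord_length == data_length + 1:
        return True
    if coord_length == data_length:
        return False
    raise ValueError("coordinate length does not match the data length")


counts = [10, 20, 30]
centers = [0.5, 1.5, 2.5]     # one value per data point
edges = [0.0, 1.0, 2.0, 3.0]  # one more value than data points

print(is_edges(len(centers), len(counts)))  # False
print(is_edges(len(edges), len(counts)))    # True
```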


Iterate over coordinate names:

>>> list(da.coords)
Member

I would probably suggest da.coords.keys() instead of constructing a list?


>>> da.sum()
<scipp.DataArray>
...
Member

This truncated output is not very useful.

BIND_GROUPBY_OP(groupBy, nanmin);
BIND_GROUPBY_OP(groupBy, max);
BIND_GROUPBY_OP(groupBy, nanmax);
BIND_GROUPBY_OP(groupBy, concat);
Member

Why is this one the only one with an example?

Member Author

Found a solution now, grouping bindings into two groups based on dtype.

Comment on lines +98 to +99
<scipp.DataArray>
...
Member

Do we always need to have the output in the examples? To me it would make more sense to not have an output as opposed to a truncated output with no info in it?

Member Author

I think generally we should aim for avoiding truncation and be explicit. If not feasible then have no output?

Comment on lines +180 to +181
<BLANKLINE>
<BLANKLINE>
Member

Is this sphinx syntax?

Member Author

Yes, Claude explained to me earlier that it is needed to end a codeblock at an empty line, i.e., this is needed when there are blank lines in the output.

Member

Did you test it without <BLANKLINE>? We set the doctest.NORMALIZE_WHITESPACE flag which should take care of this problem.

Member Author

Thanks, I didn't know that, removed now (docs build seems to pass, unless sphinx cache messed with me again)!
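The interaction can be checked directly with the standard-library `doctest.OutputChecker`. The sketch below uses a made-up output string that ends in two blank lines; it only covers trailing blank lines, which is the case discussed here.

```python
import doctest

checker = doctest.OutputChecker()

got = "Dimensions: Sizes[x:2, ]\n\n\n"  # hypothetical repr ending in blank lines
plain_want = "Dimensions: Sizes[x:2, ]\n"
marked_want = "Dimensions: Sizes[x:2, ]\n<BLANKLINE>\n<BLANKLINE>\n"

# Without any flag, the trailing blank lines must be spelled out.
print(checker.check_output(plain_want, got, 0))   # False
print(checker.check_output(marked_want, got, 0))  # True

# With NORMALIZE_WHITESPACE, all runs of whitespace compare equal.
print(checker.check_output(plain_want, got, doctest.NORMALIZE_WHITESPACE))  # True
```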

SimonHeybrock and others added 4 commits January 15, 2026 13:43
- Change list(da.coords) to da.coords.keys() in coords example
- Replace truncated da.sum() output with meaningful da.sum().value
- Add examples to all groupby operations showing both label-based
  and bin-edge modes, with .sizes output
- Fix truncated output in _binding.py ds.get() example

Prompt: Please address review comments in latest review (by nvaytet) in PR 3812

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Remove or replace truncated `...` outputs with `.sizes`, `.value`,
  or other concrete attributes that show useful information
- Affected files: bind_data_array.h, dataset.cpp, bins.py
- Truncated outputs like `<scipp.DataArray>\n...` now show
  `.sizes` which indicates dimensions without verbose repr

Prompt: Look through the branch to find other docstring examples that
elide output, use your judgement to decide how to address in each case.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Replace truncated outputs and .sizes accessor patterns with full
DataArray/Dataset/Dict repr output where reasonably concise:

- dataset.cpp: Show full output for indexing examples
- bind_data_array.h: Show full coords and masks Dict repr
- bins.py: Show full DataArray output for all bins operations
- _binding.py: Show full DataArray output for ds.get example

Prompt: Look through the branch to find other docstring examples that
elide output. (Followup: prefer complete output where reasonable, only
fall back to .sizes if too verbose)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Split groupby docstring into numeric and boolean variants
  - Numeric ops (mean, sum, min, max, etc.) use float data examples
  - Boolean ops (all, any) use boolean data examples
- Fix masks example: wrap da.sum().value in float() to avoid
  np.float64 repr difference

Prompt: Make sure `tox -e docs` passes.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Member

@jl-wynen jl-wynen left a comment

The parts I have looked at look fine now.

The NORMALIZE_WHITESPACE flag in docs/conf.py handles trailing blank
lines in doctest output, making explicit <BLANKLINE> directives
unnecessary.

Prompt: A reviewer notes: "Did you test it without <BLANKLINE>?
We set the doctest.NORMALIZE_WHITESPACE flag which should take care
of this problem." - please find where we added this and try without.

@SimonHeybrock SimonHeybrock merged commit a5ca95f into main Jan 16, 2026
4 checks passed
@SimonHeybrock SimonHeybrock deleted the api-examples branch January 16, 2026 10:06

Development

Successfully merging this pull request may close these issues.

Add more examples in docs

3 participants