Skip to content

Document exclusive-end chunk_ids range and broaden type annotation in pai_utils#89

Open
lonexreb wants to merge 1 commit intoNVlabs:mainfrom
lonexreb:docs/clarify-pai-utils-chunk-range-exclusive-end
Open

Document exclusive-end chunk_ids range and broaden type annotation in pai_utils#89
lonexreb wants to merge 1 commit intoNVlabs:mainfrom
lonexreb:docs/clarify-pai-utils-chunk-range-exclusive-end

Conversation

@lonexreb
Copy link
Copy Markdown
Contributor

@lonexreb lonexreb commented May 4, 2026

Problem

PhysicalAIAVDatasetLocalInterface.__init__ in src/alpamayo_r1/data/pai_utils.py accepts a wider variety of chunk_ids inputs than its annotation and docstring suggest:

  • The annotation is list[int] | None, but the body has explicit branches for str (range form), int (single id), and any iterable. Static checkers reject the documented "0-9" range usage outright.

  • The docstring reads:

    chunk_ids: List of chunk IDs to load, or a range string (e.g. "0-9").

    …without spelling out whether the end is inclusive. The implementation does range(chunk_start, chunk_end), which is exclusive of the end — matching the semantics in scripts/download_pai.py (called out explicitly in Accept multiple --chunk-ids without quoting in download_pai.py #81) and scripts/curate_pai_samples.py (made explicit in Document exclusive-end --chunk range in curate_pai_samples.py #87). A user reading the example "0-9" would reasonably guess "0..9 inclusive" and silently lose chunk 9.

Fix

Pure docs + annotation. No runtime behavior change.

Verification

The new annotation accepts every concrete shape the body's isinstance chain handles. Range example matches the existing implementation (range(chunk_start, chunk_end)).

PhysicalAIAVDatasetLocalInterface.__init__ accepts a wider variety of
chunk_ids inputs than its annotation and docstring suggest:

- The annotation is `list[int] | None` but the body branches on `str`
  (range form), `int` (single id), and any iterable. Static checkers
  reject the documented `"0-9"` range usage.

- The docstring says `range string (e.g. "0-9")` without spelling out
  whether the end is inclusive. The implementation does
  `range(chunk_start, chunk_end)`, which is **exclusive** of the end --
  matching the semantics in scripts/download_pai.py (PR NVlabs#81 docs the
  same) and scripts/curate_pai_samples.py (PR NVlabs#87). A user reading
  "0-9" would reasonably guess inclusive and silently lose chunk 9.

This commit:

- Broadens the type hint to
  `list[int] | tuple[int, ...] | int | str | None`, matching the actual
  isinstance branches in the body.

- Replaces the one-line docstring with a bullet list of accepted forms,
  spells out the exclusive-end semantics with a worked example
  (`"0-10"` -> chunks `0..9`), and cross-references the two scripts that
  share the convention so they stay in lockstep.

Pure docs + annotation. No runtime behavior change.

Signed-off-by: lonexreb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant