Document exclusive-end chunk_ids range and broaden type annotation in pai_utils#89
Open
lonexreb wants to merge 1 commit intoNVlabs:mainfrom
Open
Conversation
PhysicalAIAVDatasetLocalInterface.__init__ accepts a wider variety of chunk_ids inputs than its annotation and docstring suggest: - The annotation is `list[int] | None` but the body branches on `str` (range form), `int` (single id), and any iterable. Static checkers reject the documented `"0-9"` range usage. - The docstring says `range string (e.g. "0-9")` without spelling out whether the end is inclusive. The implementation does `range(chunk_start, chunk_end)`, which is **exclusive** of the end -- matching the semantics in scripts/download_pai.py (PR NVlabs#81 docs the same) and scripts/curate_pai_samples.py (PR NVlabs#87). A user reading "0-9" would reasonably guess inclusive and silently lose chunk 9. This commit: - Broadens the type hint to `list[int] | tuple[int, ...] | int | str | None`, matching the actual isinstance branches in the body. - Replaces the one-line docstring with a bullet list of accepted forms, spells out the exclusive-end semantics with a worked example (`"0-10"` -> chunks `0..9`), and cross-references the two scripts that share the convention so they stay in lockstep. Pure docs + annotation. No runtime behavior change. Signed-off-by: lonexreb <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
PhysicalAIAVDatasetLocalInterface.__init__insrc/alpamayo_r1/data/pai_utils.pyaccepts a wider variety ofchunk_idsinputs than its annotation and docstring suggest:The annotation is
list[int] | None, but the body has explicit branches forstr(range form),int(single id), and any iterable. Static checkers reject the documented"0-9"range usage outright.The docstring reads:
…without spelling out whether the end is inclusive. The implementation does
range(chunk_start, chunk_end), which is exclusive of the end — matching the semantics inscripts/download_pai.py(called out explicitly in Accept multiple --chunk-ids without quoting in download_pai.py #81) andscripts/curate_pai_samples.py(made explicit in Document exclusive-end --chunk range in curate_pai_samples.py #87). A user reading the example"0-9"would reasonably guess "0..9 inclusive" and silently lose chunk 9.Fix
Pure docs + annotation. No runtime behavior change.
list[int] | tuple[int, ...] | int | str | None, matching the actualisinstancebranches in the body."0-10"→ chunks0..9), and cross-reference the two scripts that share the convention so they stay in lockstep with Accept multiple --chunk-ids without quoting in download_pai.py #81 and Document exclusive-end --chunk range in curate_pai_samples.py #87.Verification
The new annotation accepts every concrete shape the body's
isinstancechain handles. Range example matches the existing implementation (range(chunk_start, chunk_end)).