-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[data] Add iterator batch_format=None support, which will yield batches in the current batch format with zero copies #33562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
| batch_size: Optional[Union[int, Literal["default"]]] = "default", | ||
| compute: Optional[Union[str, ComputeStrategy]] = None, | ||
| batch_format: Literal["default", "pandas", "pyarrow", "numpy"] = "default", | ||
| batch_format: Optional[str] = "default", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep it as Optional[Literal] for full explicitness of supported batch formats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like that is hard to maintain (per the inconsistencies in the code already), so opted to go unify on the shorter signature.
Signed-off-by: Eric Liang <[email protected]>
Done |
| ``"numpy"`` to select ``numpy.ndarray`` for tensor datasets and | ||
| ``Dict[str, numpy.ndarray]`` for tabular datasets. Default is "default". | ||
| ``Dict[str, numpy.ndarray]`` for tabular datasets, or None to return | ||
| the underlying block exactly as is with no additional formatting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like batch_size=None a good bit more than adding another literal string!
…es in the current batch format with zero copies (ray-project#33562) This PR is a cleanup of ray-project#33536 It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format. Signed-off-by: elliottower <[email protected]>
…project#324… (ray-project#33601) The failure in rllib should have been fixed by ray-project#33562 Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`. Signed-off-by: elliottower <[email protected]>
…es in the current batch format with zero copies (ray-project#33562) This PR is a cleanup of ray-project#33536 It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format. Signed-off-by: Jack He <[email protected]>
…project#324… (ray-project#33601) The failure in rllib should have been fixed by ray-project#33562 Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`. Signed-off-by: Jack He <[email protected]>
Why are these changes needed?
This PR is a cleanup of #33536
It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format.