[data] Add iterator batch_format=None support, which will yield batches in the current batch format with zero copies #33562

ericl · 2023-03-21T23:29:20Z

Why are these changes needed?

This PR is a cleanup of #33536

It uses None instead of "zero-copy" as the batch format, since None already has a similar meaning for batch_size, where it means a system-chosen batch size. Here None likewise means the system-chosen optimal batch format.

Signed-off-by: Eric Liang <[email protected]>
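The semantics described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `toy_iter_batches`, the `"rows"`/`"columns"` format names, and the dict-of-lists block are hypothetical stand-ins and are not Ray's implementation or API. It shows the idea that a named format triggers a conversion, while `batch_format=None` hands back the underlying block object untouched.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Toy stand-in for a columnar block (hypothetical; not a Ray block type).
Block = Dict[str, List[int]]


def _as_rows(block: Block) -> List[dict]:
    """Convert a columnar block into a list of row dicts (allocates new objects)."""
    keys = list(block)
    return [dict(zip(keys, vals)) for vals in zip(*block.values())]


def _as_columns(block: Block) -> Block:
    """Return a copy of the columnar block (allocates new lists)."""
    return {k: list(v) for k, v in block.items()}


# Hypothetical format names used only for this illustration.
_CONVERTERS: Dict[str, Callable[[Block], object]] = {
    "rows": _as_rows,
    "columns": _as_columns,
}


def toy_iter_batches(
    blocks: List[Block], batch_format: Optional[str] = "columns"
) -> Iterator[object]:
    """Yield one batch per block, converting only when a format is named."""
    for block in blocks:
        if batch_format is None:
            # No conversion: yield the very same object, i.e. zero copies.
            yield block
        else:
            yield _CONVERTERS[batch_format](block)


blocks = [{"a": [1, 2], "b": [3, 4]}]
same = next(toy_iter_batches(blocks, batch_format=None))
copied = next(toy_iter_batches(blocks, batch_format="columns"))
print(same is blocks[0], copied is blocks[0])
```

In this toy model the identity check (`is`) is what distinguishes the zero-copy path from a converting path that happens to produce equal data.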

amogkam · 2023-03-21T23:32:11Z

python/ray/data/dataset.py

        batch_size: Optional[Union[int, Literal["default"]]] = "default",
        compute: Optional[Union[str, ComputeStrategy]] = None,
-        batch_format: Literal["default", "pandas", "pyarrow", "numpy"] = "default",
+        batch_format: Optional[str] = "default",


keep it as Optional[Literal] for full explicitness of supported batch formats?

I feel like that is hard to maintain (per the inconsistencies already in the code), so I opted to unify on the shorter signature.
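One way to reason about the trade-off in this exchange: with `Optional[str]` a type checker no longer rejects unknown format names, so the check has to happen at runtime. The sketch below assumes a hypothetical helper, `validate_batch_format`, and a single shared tuple of names copied from the `Literal` in the diff above; neither is taken from Ray's code.

```python
from typing import Optional

# Assumed list of names, copied from the Literal in the diff above.
# Keeping it in one place is the point of the sketch.
VALID_BATCH_FORMATS = ("default", "pandas", "pyarrow", "numpy")


def validate_batch_format(batch_format: Optional[str]) -> Optional[str]:
    """Hypothetical runtime check for an Optional[str] batch_format argument."""
    if batch_format is None:
        # None is accepted and means "no conversion requested".
        return None
    if batch_format not in VALID_BATCH_FORMATS:
        raise ValueError(
            f"Unsupported batch_format {batch_format!r}; "
            f"expected None or one of {VALID_BATCH_FORMATS}."
        )
    return batch_format


print(validate_batch_format("numpy"))
print(validate_batch_format(None))
```

A single constant like this could be reused by every function that accepts the argument, which is one possible answer to the maintenance concern raised here.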

amogkam · 2023-03-21T23:34:09Z

Let's also keep the documentation changes from https://github.com/ray-project/ray/pull/33536/files#diff-988f3832ac94d085daf61260175e2580920ebd1521dc760f58b426b94379d5b7L235?

Signed-off-by: Eric Liang <[email protected]>

ericl · 2023-03-22T02:14:07Z

> Let's also keep the documentation changes from https://github.com/ray-project/ray/pull/33536/files#diff-988f3832ac94d085daf61260175e2580920ebd1521dc760f58b426b94379d5b7L235?

Done

clarkzinzow · 2023-03-22T16:05:29Z

python/ray/data/dataset.py

                ``"numpy"`` to select ``numpy.ndarray`` for tensor datasets and
-                ``Dict[str, numpy.ndarray]`` for tabular datasets. Default is "default".
+                ``Dict[str, numpy.ndarray]`` for tabular datasets, or None to return
+                the underlying block exactly as is with no additional formatting.


Nice, I like batch_format=None a good bit more than adding another literal string!

#33601) The failure in rllib should have been fixed by #33562 Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`.

ericl added 6 commits March 21, 2023 14:54

wip

5d15807

wip

43ba159

Signed-off-by: Eric Liang <[email protected]>

wip

26b8d92

Signed-off-by: Eric Liang <[email protected]>

update

79cd16c

Signed-off-by: Eric Liang <[email protected]>

update

d876c83

Signed-off-by: Eric Liang <[email protected]>

fix

b310e35

Signed-off-by: Eric Liang <[email protected]>

ericl requested review from c21, clarkzinzow, jianoaix, jjyao and scv119 as code owners March 21, 2023 23:29

ericl assigned c21, amogkam and jianoaix Mar 21, 2023

amogkam approved these changes Mar 21, 2023

ericl mentioned this pull request Mar 22, 2023

[Data] Deprecate dataset_format #33437

Merged

8 tasks

add zero copy docs

4e6aec4

Signed-off-by: Eric Liang <[email protected]>

ericl requested review from a team and maxpumperla as code owners March 22, 2023 02:14

clarkzinzow approved these changes Mar 22, 2023

ericl merged commit 68afa43 into ray-project:master Mar 22, 2023

jianoaix mentioned this pull request Mar 22, 2023

Revert "[Datasets] Revert "Enable streaming executor by default (#324… #33601

Merged

8 tasks

ericl mentioned this pull request Mar 25, 2023

[Datasets] Add a new "zero_copy" batch format #32662

Closed
