[feature request] __getbatch__ for dataset to use with dataloader #18096

@Roulbac

Description

DataLoader calls __getitem__ once for every index in the current batch.
For datasets that can serve a list of indices (or a native Python slice object) in a single call, add an optional __getbatch__ method to the PyTorch Dataset class.
If DataLoader sees that the dataset implements this method, it should fetch the batch corresponding to the list of indices with one __getbatch__ call rather than many __getitem__ calls.
This avoids the overhead of repeated __getitem__ calls in cases where, for example, a file on disk has to be opened and closed on every call even though the underlying data structure supports slicing or lists of indices, as HDF5 files do.
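
To make the request concrete, here is a rough sketch of the dispatch I have in mind. It is plain Python and does not touch torch; `CountingDataset`, `fetch_batch`, and the `open_calls` counter are made-up names for illustration, and `__getbatch__` is the proposed hook, not something DataLoader supports today.

```python
from typing import Any, List, Sequence


class CountingDataset:
    """Toy map-style dataset that counts how often its backing store is opened."""

    def __init__(self, values: Sequence[int]) -> None:
        self._values = list(values)
        self.open_calls = 0  # stands in for an expensive file open/close

    def __len__(self) -> int:
        return len(self._values)

    def _open(self) -> List[int]:
        # Hypothetical helper: pretend this opens a file on disk.
        self.open_calls += 1
        return self._values

    def __getitem__(self, index: int) -> int:
        # One "open" per sample, which is what happens today.
        return self._open()[index]

    def __getbatch__(self, indices: Sequence[int]) -> List[int]:
        # Proposed optional hook: one "open" for the whole batch.
        store = self._open()
        return [store[i] for i in indices]


def fetch_batch(dataset: Any, indices: Sequence[int]) -> List[Any]:
    """Use __getbatch__ when the dataset defines it, else fall back to __getitem__."""
    if hasattr(dataset, "__getbatch__"):
        return dataset.__getbatch__(indices)
    return [dataset[i] for i in indices]


ds = CountingDataset(range(10))
batch = fetch_batch(ds, [1, 3, 5])
print(batch, ds.open_calls)
```

With the hook present the backing store is opened once per batch instead of once per sample, and datasets that only implement __getitem__ keep working unchanged.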

Labels

feature: A request for a proper, new feature.
high priority
module: dataloader: Related to torch.utils.data.DataLoader and Sampler
triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module
