Closed
Labels
enhancement: Not as big of a feature, but technically not a bug. Should be easy to fix
triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Description
I'm using the IterableDataset class to iterate through a list of tar files. Roughly, it works like this:
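    import tarfile

    import torch
    from torch.utils.data import IterableDataset


    class TarChunkDataset(IterableDataset):  # class name is a placeholder; the original snippet omits it
        def __init__(self, chunk_list):
            self.chunk_list = chunk_list
            NUM_FILES_IN_TAR_FILE = 4096
            self.length = NUM_FILES_IN_TAR_FILE * len(chunk_list)

        def __iter__(self):
            worker_info = torch.utils.data.get_worker_info()
            if worker_info is None:
                # Single-process loading: this process reads every chunk.
                chunk_list = self.chunk_list
            else:
                # Multi-process loading: each worker takes every num_workers-th chunk.
                start_index = worker_info.id
                num_workers = worker_info.num_workers
                chunk_list = self.chunk_list[start_index::num_workers]
            while True:  # cycle over the chunks indefinitely
                for chunk in chunk_list:  # e.g. 000009.tar
                    tar = tarfile.open(chunk)
                    for file in tar.getmembers():
                        # do something; reading the raw member is only a placeholder
                        data = tar.extractfile(file)
                        yield data

        def __len__(self):
            return self.length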
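The strided slice in __iter__ is what keeps workers from reading the same tar file twice. A minimal sketch of that split in plain Python, with no PyTorch involved; the helper name shard_chunks and the ten file names are made up for illustration:

```python
def shard_chunks(chunk_list, worker_id, num_workers):
    """Return the chunks one worker would read, using the same strided slice."""
    return chunk_list[worker_id::num_workers]


# Hypothetical chunk names, for illustration only.
chunks = [f"{i:06d}.tar" for i in range(10)]
num_workers = 4

# One shard per worker id, as each worker would compute it independently.
shards = [shard_chunks(chunks, w, num_workers) for w in range(num_workers)]

# Flatten and sort the shards to check that every chunk is assigned exactly once.
assigned = sorted(c for shard in shards for c in shard)

print(shards[1])
print(assigned == chunks)
```

When the number of chunks is not a multiple of the number of workers, some workers get one chunk more than others, so the shards are disjoint but not equal in size.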
Since I know the chunk size a priori, I know the length of my dataset. I want to use that length to keep track of how far I am through an epoch, and because it plays nicely with PyTorch Ignite. In my opinion, whether to implement __len__ on an IterableDataset should be left to the user.
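To illustrate the use case, here is a rough sketch of epoch tracking with a known length over an iterator that never ends. It uses a plain-Python stand-in rather than a real IterableDataset or DataLoader; the class EndlessSamples and the small sizes (3 chunks of 4 files instead of 4096) are assumptions made only for this example:

```python
from itertools import islice


class EndlessSamples:
    """Stand-in for the dataset above: iterates forever but reports a known length."""

    def __init__(self, num_chunks, files_per_chunk):
        self.num_chunks = num_chunks
        self.files_per_chunk = files_per_chunk

    def __iter__(self):
        while True:  # cycle over the chunks indefinitely, like the original
            for chunk in range(self.num_chunks):
                for index in range(self.files_per_chunk):
                    yield (chunk, index)

    def __len__(self):
        return self.num_chunks * self.files_per_chunk


dataset = EndlessSamples(num_chunks=3, files_per_chunk=4)
epoch_size = len(dataset)

# Take exactly one epoch's worth of samples and record the fraction completed.
progress = []
for step, sample in enumerate(islice(dataset, epoch_size), start=1):
    progress.append(step / epoch_size)

print(f"epoch size: {epoch_size}, final progress: {progress[-1]:.0%}")
```

Because the iterator itself has no end, the reported length is the only thing that defines where an epoch stops, which is why having __len__ available matters here.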