Conversation

@dmitriy-serdyuk
Contributor

Fixes #8652, fixes #8957

Contributor

@facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Jun 27, 2018

@pytorchbot retest this please

@alsrgv

alsrgv commented Aug 3, 2018

This change makes DistributedSampler very slow. To reproduce, take the MNIST example and add a DistributedSampler:

    train_dataset = \
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ]))
    train_sampler = torch.utils.data.distributed.DistributedSampler(
        train_dataset, num_replicas=1, rank=0)
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, sampler=train_sampler,
        **kwargs)

The reason seems to be that it creates a separate 0d tensor for each index element. I was able to make it fast again by adding a cast to int.
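The slowdown can be reproduced in isolation: iterating over (or calling `list()` on) a 1d tensor now yields 0-dimensional tensors, while `.tolist()` yields plain Python ints. A minimal sketch of the difference (variable names are illustrative):

```python
import torch

g = torch.Generator()
g.manual_seed(0)
perm = torch.randperm(10, generator=g)

# list(tensor) produces one 0d tensor per element -- every dataset index
# becomes a tensor object, which makes downstream indexing slow.
as_tensors = list(perm)
print(type(as_tensors[0]))  # <class 'torch.Tensor'>

# tensor.tolist() produces plain Python ints, restoring the old behavior.
as_ints = perm.tolist()
print(type(as_ints[0]))  # <class 'int'>
```

Both lists contain the same permutation of 0..9; only the element type differs.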

cc @dmitriy-serdyuk, @apaszke

@fmassa
Member

fmassa commented Aug 3, 2018

Thanks for the report @alsrgv!
The better fix is to replace this line with something like

indices = torch.randperm(len(self.dataset), generator=g).tolist()

since list(tensor) now returns a list of 0d tensors, whereas what we want is a list of Python numbers (obtained via tensor.tolist()).

Can you send a PR fixing this?
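Applied to the sampler, the fix is a one-line change in `__iter__`. A simplified sketch of just the index computation (epoch handling and padding are elided; this is not the full upstream class):

```python
import torch

class SimplifiedDistributedSampler:
    """Illustrative sketch of the shuffled-index computation only."""

    def __init__(self, dataset_len, num_replicas=1, rank=0, seed=0):
        self.dataset_len = dataset_len
        self.num_replicas = num_replicas
        self.rank = rank
        self.seed = seed

    def __iter__(self):
        g = torch.Generator()
        g.manual_seed(self.seed)
        # .tolist() yields Python ints; list(tensor) would yield 0d tensors
        # and trigger the slowdown described above.
        indices = torch.randperm(self.dataset_len, generator=g).tolist()
        # Each replica takes a strided slice of the shuffled indices.
        return iter(indices[self.rank::self.num_replicas])
```

Iterating this sampler now produces plain ints, so the DataLoader's indexing path stays fast.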

@bearpelican

Fixes it for me!

alsrgv pushed a commit to alsrgv/pytorch that referenced this pull request Aug 8, 2018
@alsrgv

alsrgv commented Aug 8, 2018

@fmassa, thanks for the suggestion. Submitted as #10361

facebook-github-bot pushed a commit that referenced this pull request Aug 9, 2018
Summary: Pull Request resolved: #10361

Differential Revision: D9240798

Pulled By: ezyang

fbshipit-source-id: dc4cfe79612f711bbcff34a147877df6a5f7b89f
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Aug 10, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
facebook-github-bot pushed a commit that referenced this pull request Aug 22, 2018
Summary:
Since #8958 was merged, BatchSampler samples 0d tensors from WeightedRandomSampler instead of integers, which significantly reduces performance. This PR fixes it the same way #10361 fixed DistributedSampler.
Pull Request resolved: #10636

Differential Revision: D9423869

Pulled By: zou3519

fbshipit-source-id: f94da2d4cccf70e63beea6cfc3d1230b5610ae44
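The WeightedRandomSampler fix follows the same pattern: append `.tolist()` to the sampled index tensor before iterating, so BatchSampler receives Python ints. A hedged sketch of the sampling step (simplified class, not the exact upstream code):

```python
import torch

class SimplifiedWeightedSampler:
    """Sketch of weighted sampling via torch.multinomial only."""

    def __init__(self, weights, num_samples, replacement=True):
        self.weights = torch.as_tensor(weights, dtype=torch.double)
        self.num_samples = num_samples
        self.replacement = replacement

    def __iter__(self):
        samples = torch.multinomial(self.weights, self.num_samples,
                                    self.replacement)
        # .tolist() converts the 1d index tensor into Python ints,
        # avoiding the per-element 0d tensors that slowed BatchSampler down.
        return iter(samples.tolist())
```

Iterating yields ints drawn in proportion to the given weights.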
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018

Successfully merging this pull request may close these issues: "BatchSampler docstring is outdated" and "Batch Sampler and Dataset indexing too restricted".
