Unify the output pathname of archive reader and extractor#65424
Unify the output pathname of archive reader and extractor#65424ejguan wants to merge 1 commit intogh/ejguan/92/basefrom
Conversation
[ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit de83421 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
|
@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Summary: Pull Request resolved: #65424 This PR is re-implementation for https://github.com/facebookexternal/torchdata/pull/93 Same PR has landed into torchdata https://github.com/facebookexternal/torchdata/pull/157 Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D31090447 Pulled By: ejguan fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65
…5932) * Unify the output pathname of archive reader and extractor (#65424) Summary: Pull Request resolved: #65424 This PR is re-implementation for https://github.com/facebookexternal/torchdata/pull/93 Same PR has landed into torchdata https://github.com/facebookexternal/torchdata/pull/157 Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D31090447 Pulled By: ejguan fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65 * [DatePipe] add deprecation warnings for DataPipes that will solely exist in TorchData (#65827) Summary: Pull Request resolved: #65827 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31272794 Pulled By: NivekT fbshipit-source-id: 8da8266184b4df050422904cbc5fca6d7c3d2e02 * [DataPipe] Fixes an issue where TarArchiveReader closes stream when read into a buffer (#65877) Summary: Pull Request resolved: #65877 Fixes #65808 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31296041 Pulled By: NivekT fbshipit-source-id: cdcad3a333ae9781d6063678a122a128955b0ff4 Co-authored-by: Erjia Guan <[email protected]>
|
I would consider the new behavior a bug or at least a bad UX. Let's say I have the following setup Using This gets even worse if we have nested archives: This will now also result |
It should be |
from torch.utils.data.datapipes.iter import TarArchiveReader, FileLister, FileLoader
dp = FileLister("/tmp/foo")
dp = FileLoader(dp)
dp = TarArchiveReader(dp)
print(tuple(dp))from torch.utils.data.datapipes.iter import TarArchiveReader, FileLister, FileLoader
dp = FileLister("/tmp/foo")
dp = FileLoader(dp)
dp = TarArchiveReader(dp)
dp = TarArchiveReader(dp)
print(tuple(dp))To get the your path, we need the following setup: from torch.utils.data.datapipes.iter import TarArchiveReader, FileLister, FileLoader
dp = FileLister("/tmp/foo")
dp = FileLoader(dp)
dp = TarArchiveReader(dp)
print(tuple(dp))
Nope. Rolling back the changes of this PR, we get the following output for the three setups above:
|
|
I hope this is not something that varies across python version. Let me take a deeper look. |
|
@pmeier Thanks for that. I will send a patch shortly and fix the tests for TorchText |
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) cc: @Nayef211 Differential Revision: [D31797765](https://our.internmc.facebook.com/intern/diff/D31797765) [ghstack-poisoned]
Summary: Pull Request resolved: #73 Previous implementation of `ArchiveReader` has a bug. Please take a reference from pytorch/pytorch#65424 (comment) Reviewed By: NivekT Differential Revision: D31797765 fbshipit-source-id: 494e1a49b43d5a846de971a67586089e6d7ebafc
Stack from ghstack:
This PR is re-implementation for https://github.com/facebookexternal/torchdata/pull/93
Same PR has landed into torchdata https://github.com/facebookexternal/torchdata/pull/157
Differential Revision: D31090447