-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Open
Labels
enhancementNot as big of a feature, but technically not a bug. Should be easy to fixNot as big of a feature, but technically not a bug. Should be easy to fixmodule: datatorch.utils.datatorch.utils.dataneeds researchWe need to decide whether or not this merits inclusion, based on research worldWe need to decide whether or not this merits inclusion, based on research worldtriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate moduleThis issue has been looked at a team member, and triaged and prioritized into an appropriate module
Description
🐛 Bug
The base classes for torch.utils.data.Dataset and torch.utils.data.Sampler do not define a __len__ method, which means that downstream code relying on PyTorch (with typehints) cannot assume that the class is supposed to have a __len__ method (adhering to the "Sized" protocol).
There's a fairly lengthy comment about why this is the case:
pytorch/torch/utils/data/sampler.py
Lines 27 to 52 in dfdc1db
| # NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ] | |
| # | |
| # Many times we have an abstract class representing a collection/iterable of | |
| # data, e.g., `torch.utils.data.Sampler`, with its subclasses optionally | |
| # implementing a `__len__` method. In such cases, we must make sure to not | |
| # provide a default implementation, because both straightforward default | |
| # implementations have their issues: | |
| # | |
| # + `return NotImplemented`: | |
| # Calling `len(subclass_instance)` raises: | |
| # TypeError: 'NotImplementedType' object cannot be interpreted as an integer | |
| # | |
| # + `raise NotImplementedError()`: | |
| # This prevents triggering some fallback behavior. E.g., the built-in | |
| # `list(X)` tries to call `len(X)` first, and executes a different code | |
| # path if the method is not found or `NotImplemented` is returned, while | |
| # raising an `NotImplementedError` will propagate and and make the call | |
| # fail where it could have use `__iter__` to complete the call. | |
| # | |
| # Thus, the only two sensible things to do are | |
| # | |
| # + **not** provide a default `__len__`. | |
| # | |
| # + raise a `TypeError` instead, which is what Python uses when users call | |
| # a method that is not defined on an object. | |
| # (@ssnl verifies that this works on at least Python 3.7.) |
I admit that I'm not quite sure what this comment means -- in my small explorations, I didn't encounter either problem (e.g. subclass instances work just fine when returning a NotImplemented). That said, I'd like to offer two additional potential solutions:
- Make these classes
abc.ABCs and make__len__an abstractmethod:
@abc.abstractmethod
def __len__(self) -> int:
...- Make the base classes inherit from
typing.Sized.
edran, aaossa, alexgiving, lucasbasquerotto, nairbv and 2 more
Metadata
Metadata
Assignees
Labels
enhancementNot as big of a feature, but technically not a bug. Should be easy to fixNot as big of a feature, but technically not a bug. Should be easy to fixmodule: datatorch.utils.datatorch.utils.dataneeds researchWe need to decide whether or not this merits inclusion, based on research worldWe need to decide whether or not this merits inclusion, based on research worldtriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate moduleThis issue has been looked at a team member, and triaged and prioritized into an appropriate module