Skip to content

[types] torch.utils.data.{Dataset, Sampler} are not Sized #47055

@alanhdu

Description

@alanhdu

🐛 Bug

The base classes for torch.utils.data.Dataset and torch.utils.data.Sampler do not define a __len__ method, which means that downstream code relying on PyTorch (with typehints) cannot assume that the class is supposed to have a __len__ method (adhering to the "Sized" protocol).

There's a fairly lengthy comment about why this is the case:

# NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]
#
# Many times we have an abstract class representing a collection/iterable of
# data, e.g., `torch.utils.data.Sampler`, with its subclasses optionally
# implementing a `__len__` method. In such cases, we must make sure to not
# provide a default implementation, because both straightforward default
# implementations have their issues:
#
# + `return NotImplemented`:
# Calling `len(subclass_instance)` raises:
# TypeError: 'NotImplementedType' object cannot be interpreted as an integer
#
# + `raise NotImplementedError()`:
# This prevents triggering some fallback behavior. E.g., the built-in
# `list(X)` tries to call `len(X)` first, and executes a different code
# path if the method is not found or `NotImplemented` is returned, while
# raising an `NotImplementedError` will propagate and and make the call
# fail where it could have use `__iter__` to complete the call.
#
# Thus, the only two sensible things to do are
#
# + **not** provide a default `__len__`.
#
# + raise a `TypeError` instead, which is what Python uses when users call
# a method that is not defined on an object.
# (@ssnl verifies that this works on at least Python 3.7.)

I admit that I'm not quite sure what this comment means -- in my small explorations, I didn't encounter either problem (e.g. subclass instances work just fine when returning a NotImplemented). That said, I'd like to offer two additional potential solutions:

  • Make these classes abc.ABCs and make __len__ an abstractmethod:
@abc.abstractmethod
def __len__(self) -> int:
    ...
  • Make the base classes inherit from typing.Sized.

cc @VitalyFedyunin @ejguan

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNot as big of a feature, but technically not a bug. Should be easy to fixmodule: datatorch.utils.dataneeds researchWe need to decide whether or not this merits inclusion, based on research worldtriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate module

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions