[proposal] "Name" string attribute for modules, parameters, buffers, tensors for more pleasant debugging (including distributed FSDP2 logging) #104247

@vadimkantorov

🚀 The feature, motivation and pitch

When debugging complex tree-structured models, it is useful to be able to read a module's .name and understand where that module sits within the whole model tree.

This idea is currently used only for parameters within state_dict() serialization. I suggest that a .name attribute/property would be useful for debugging and for formatting/debug-printing in general. Such a .name could be used in __str__ implementations of modules and tensors, and for producing ONNX graphs that are easier to visualize. Access to the module name might also be useful for finer-grained logic hacks in module hooks (maybe not the best coding practice in all cases, but okay for hacks).
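For context, here is a rough sketch of how a hook can learn its module's name today, without any .name attribute: the dotted path from named_modules() is bound into the hook with functools.partial. The helper `record_shape` and the list `records` are hypothetical names used only for illustration.

```python
import functools

import torch
from torch import nn


def record_shape(path, module, inputs, output, records):
    # `path` is bound up front via functools.partial, because the hook
    # itself only receives (module, inputs, output) from PyTorch.
    records.append((path, tuple(output.shape)))


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
records = []

# Register one hook per submodule; skip the root, whose path is "".
handles = [
    module.register_forward_hook(
        functools.partial(record_shape, path, records=records)
    )
    for path, module in model.named_modules()
    if path
]

model(torch.randn(3, 4))

# Remove the hooks so they do not fire on later forward passes.
for handle in handles:
    handle.remove()

for path, shape in records:
    print(path, shape)
```

With a per-module name available on the module itself, the partial binding would no longer be needed, since the hook could read the name from its `module` argument.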

I propose to:

  1. introduce a .name property (backed by a ._name attribute, which may not exist if we torch.load an old model file) or just a plain .name attribute if deserializing old module objects that lack some attributes is not a problem. It should be an empty string by default.
  2. introduce an instance method .set_names_recursively() (modulo bikeshedding) on torch.nn.Module that would walk the module tree and assign names similar to the keys currently produced by state_dict()
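The two steps above can be approximated today with a free function, shown here as a minimal sketch rather than the proposed implementation. The attribute name `_debug_name` and the helper `get_name` are hypothetical, chosen to avoid clashing with attributes that may already exist on tensors or modules. The `Net` class is only a toy model for the demonstration.

```python
import torch
from torch import nn


def set_names_recursively(root: nn.Module, attr: str = "_debug_name") -> None:
    # named_modules() yields ("", root) first, then dotted paths
    # such as "encoder.0" for nested submodules.
    for path, module in root.named_modules():
        setattr(module, attr, path)
    # Parameters and buffers are yielded under dotted keys of the same
    # form as state_dict() keys, e.g. "encoder.0.weight".
    for path, param in root.named_parameters():
        setattr(param, attr, path)
    for path, buf in root.named_buffers():
        setattr(buf, attr, path)


def get_name(obj, attr: str = "_debug_name") -> str:
    # Fall back to an empty string, mirroring the proposed default and
    # covering objects that never had a name assigned.
    return getattr(obj, attr, "")


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8))
        self.head = nn.Linear(8, 2)


model = Net()
set_names_recursively(model)

print(repr(get_name(model)))                 # the root gets the empty path
print(get_name(model.encoder[0]))            # a nested module
print(get_name(model.head.weight))           # a parameter
print(get_name(model.encoder[1].running_mean))  # a buffer
```

Note that the names live on the Python objects themselves, so a tensor produced by an operation on a named parameter or buffer does not inherit the name.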

I also propose to support setting/getting such an attribute on any tensor. Most tensors won't have it assigned, and the user would need to set it manually for it to be useful, but that is still worthwhile (e.g. it can be set on tensors that are about to be returned from a function). Propagating tensor names through operations is a more complex task, and I propose to keep it out of scope: the feature is already useful for debugging if only manually set tensor names are supported. Alternatively, a tensor.name() / tensor.name(value) method could be used instead of the attribute, so that the setter returns self for fluency and convenience and one can write return (x + 1).name("mytensorplus1"). The downside is that name(...) would return either a string or a Tensor (or something else), depending on its argument.
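The fluent getter/setter idea can be sketched with a free function, since no such method exists on torch.Tensor. The function `name`, the sentinel `_UNSET`, and the backing attribute `_debug_name` are all hypothetical. The sketch also shows the trade-off mentioned above: the return type depends on whether a value is passed.

```python
import torch

# Sentinel that distinguishes "no argument given" from any real value.
_UNSET = object()


def name(tensor, value=_UNSET):
    if value is _UNSET:
        # Getter mode: return the stored name, or "" if none was set.
        return getattr(tensor, "_debug_name", "")
    # Setter mode: store the name and return the tensor for chaining.
    tensor._debug_name = value
    return tensor


def plus_one(x):
    # Name the result right at the return statement.
    return name(x + 1, "mytensorplus1")


y = plus_one(torch.zeros(3))
print(name(y))

# Names are not propagated: the result of an op is a new, unnamed tensor.
z = y * 2
print(repr(name(z)))
```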

Another consideration is that the ONNX exporter already produces some automatic module/op names, and some code might have taken a dependency on them. So the ONNX naming behavior should be unchanged unless .set_names_recursively() was explicitly called (i.e. empty names would be overridden by the existing automatic names).

Also, if .name values are set, state_dict() should probably use them. However, some people are probably already monkey-patching .name attributes onto tensors/modules themselves, so it might be surprising if state_dict() started using them; some bikeshedding of the attribute name may be required.

e.g. https://github.com/facebookresearch/fvcore/blob/main/fvcore/nn/jit_analysis.py does something similar with respect to module names

cc @ezyang @albanD

Labels: feature (A request for a proper, new feature.), needs design (We want to add this feature but we need to figure out how first), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
