Typed Graph References for Dataclasses

Dataclasses naturally model tree-structured data, but many domains require graph structures where objects reference each other. Currently, there’s no standard way to express “this field references another dataclass” with full type safety.

The Problem

  from dataclasses import dataclass

  @dataclass
  class Network:
      cidr: str

  @dataclass
  class Subnet:
      network: ???  # How do we express "reference to a Network"?
      cidr: str

Common workarounds each have drawbacks:

  • String identifiers (network_id: str): no type safety

  • Forward references (network: "Network"): the type checker sees only the class; nothing marks the field as a reference

  • The class itself (network: type[Network]): semantically wrong, since we want a reference to an instance, not to the class
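The forward-reference drawback is easy to see with nothing but the standard library. Once the string annotation is resolved, a framework gets back the Network class itself, exactly what it would get for a plain `network: Network`, so nothing in the hint says "this is a reference". A minimal check:

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Network:
    cidr: str


@dataclass
class Subnet:
    network: "Network"  # forward reference written as a string
    cidr: str


# get_type_hints() evaluates the string against the module globals and
# returns the Network class itself; no "reference" marker survives.
hints = get_type_hints(Subnet)
print(hints["network"] is Network)  # True
```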

A Proposal: Ref[T] and Attr[T, "name"]

I’ve been exploring typed reference markers that work with the existing type system:

  from dataclasses import dataclass

  from graph_refs import Attr, Ref, RefList

  # Network, Gateway, and Instance are dataclasses defined elsewhere.

  @dataclass
  class Subnet:
      network: Ref[Network]            # Reference to a Network
      gateway_id: Attr[Gateway, "Id"]  # Reference to Gateway's Id attribute

  @dataclass
  class LoadBalancer:
      targets: RefList[Instance]       # List of references

This enables:

  • Type checkers verify reference targets
  • IDE autocomplete for valid targets
  • Static dependency graph analysis
  • Framework introspection via get_refs() and get_dependencies()
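Since the graph_refs package is no longer published, the following is only a sketch of how get_refs() and get_dependencies() might work, not their actual implementation. It assumes Ref and RefList are generic aliases over typing.Annotated that carry a hypothetical RefMarker object (close to the Annotated approach adopted later in the thread); the function signatures are illustrative assumptions too.

```python
from dataclasses import dataclass, fields
from typing import Annotated, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


class RefMarker:
    """Hypothetical metadata object that marks a field as a reference."""


_REF = RefMarker()

# Generic aliases: Ref[Network] evaluates to Annotated[Network, _REF], so
# type checkers see plain Network while frameworks can still find the marker.
Ref = Annotated[T, _REF]
RefList = Annotated[list[T], _REF]


@dataclass
class Network:
    cidr: str


@dataclass
class Subnet:
    network: Ref[Network]
    cidr: str


@dataclass
class Instance:
    name: str


@dataclass
class LoadBalancer:
    targets: RefList[Instance]


def get_refs(cls: type) -> dict[str, type]:
    """Map each reference field of a dataclass to its target class (hypothetical)."""
    hints = get_type_hints(cls, include_extras=True)  # keep Annotated metadata
    refs: dict[str, type] = {}
    for f in fields(cls):
        hint = hints[f.name]
        if get_origin(hint) is not Annotated:
            continue
        base, *metadata = get_args(hint)
        if not any(isinstance(m, RefMarker) for m in metadata):
            continue
        # For RefList the base is list[Target]; unwrap it to the element type.
        if get_origin(base) is list:
            (base,) = get_args(base)
        refs[f.name] = base
    return refs


def get_dependencies(cls: type) -> set[type]:
    """Return the set of classes that cls references (hypothetical)."""
    return set(get_refs(cls).values())


print(get_refs(Subnet))
print(get_dependencies(LoadBalancer))
```

Because Ref[Network] is just Annotated[Network, ...], a type checker should treat the field as a plain Network, while the marker stays visible at runtime through get_type_hints(..., include_extras=True).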

Use Cases

This pattern appears in infrastructure-as-code, configuration management, entity relationships, and workflow systems — anywhere objects form a graph rather than a tree:

  @entity
  class Order:
      customer: Ref[Customer]
      items: RefList[Product]

  @task
  class ProcessData:
      depends_on: RefList[Task]
      output: Attr[Storage, "Path"]

Implementation

I’ve published two packages exploring this: graph-refs and graph-refs-dataclasses.

The graph-refs-dataclasses repo includes an example that demonstrates a complete mini-DSL with a custom decorator, dependency ordering, and JSON serialization.
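Dependency ordering of this kind can be approximated with the standard library's graphlib.TopologicalSorter. This is a hypothetical sketch, not code from that repo: Ref is again assumed to be an Annotated alias with a made-up RefMarker, referenced() and creation_order() are illustrative names, and every referenced object is assumed to appear in the input list.

```python
from dataclasses import dataclass, fields
from graphlib import TopologicalSorter
from typing import Annotated, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


class RefMarker:
    """Hypothetical marker: the annotated field points at another object."""


Ref = Annotated[T, RefMarker()]


@dataclass
class Network:
    cidr: str


@dataclass
class Subnet:
    network: Ref[Network]
    cidr: str


@dataclass
class Instance:
    subnet: Ref[Subnet]
    name: str


def referenced(obj: object) -> list[object]:
    """Return the objects held in obj's Ref-marked fields."""
    hints = get_type_hints(type(obj), include_extras=True)
    out = []
    for f in fields(obj):
        hint = hints[f.name]
        if get_origin(hint) is Annotated and any(
            isinstance(m, RefMarker) for m in get_args(hint)[1:]
        ):
            out.append(getattr(obj, f.name))
    return out


def creation_order(objs: list[object]) -> list[object]:
    """Order objects so each one comes after every object it references."""
    sorter = TopologicalSorter()
    for obj in objs:
        # Key nodes by id(): non-frozen dataclasses with eq=True are unhashable.
        sorter.add(id(obj), *(id(dep) for dep in referenced(obj)))
    by_id = {id(o): o for o in objs}
    return [by_id[i] for i in sorter.static_order()]


net = Network("10.0.0.0/16")
sub = Subnet(net, "10.0.1.0/24")
vm = Instance(sub, "web")
print([type(o).__name__ for o in creation_order([vm, sub, net])])
# ['Network', 'Subnet', 'Instance']
```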

Design Principles

  • Zero runtime cost: type markers carry no instance data; their benefit comes at development time
  • Minimal surface area: five primitives that compose with existing types
  • Compatibility first: works with get_type_hints(), mypy, pyright, and dataclasses

Questions for Discussion

  1. Is there interest in standardizing vocabulary for inter-dataclass references?
  2. Are the semantics of Ref[T] vs type[T] clear and useful?
  3. Should Attr[T, "name"] validate that T has an attribute called name?
  4. Could @dataclass_transform (PEP 681) gain parameters to better support reference-aware decorators?
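For question 3, one possible answer is a runtime check at class-definition time. The sketch below rests on several assumptions: it expresses Attr as Annotated metadata (Annotated[str, Attr(Gateway, "Id")]) instead of the Attr[Gateway, "Id"] syntax, uses a hypothetical validate_attrs decorator, and treats the target's dataclass fields as the only valid attribute names. That last assumption would not fit frameworks where attributes such as Id exist only after deployment, and it says nothing about what a static type checker could verify.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Attr:
    """Hypothetical marker: the field holds attribute `name` of a `target` object."""
    target: type
    name: str


def validate_attrs(cls: type) -> type:
    """Class decorator: reject Attr markers that name a field the target lacks."""
    hints = get_type_hints(cls, include_extras=True)
    for field_name, hint in hints.items():
        if get_origin(hint) is not Annotated:
            continue
        for meta in get_args(hint)[1:]:
            if not isinstance(meta, Attr):
                continue
            # Assumption: valid attribute names are the target's dataclass fields.
            known = {f.name for f in fields(meta.target)} if is_dataclass(meta.target) else set()
            if meta.name not in known:
                raise TypeError(
                    f"{cls.__name__}.{field_name}: "
                    f"{meta.target.__name__} has no attribute {meta.name!r}"
                )
    return cls


@dataclass
class Gateway:
    Id: str


@validate_attrs
@dataclass
class Subnet:
    gateway_id: Annotated[str, Attr(Gateway, "Id")]  # passes: Gateway has Id
```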

I’d welcome feedback on the concept, API design, or alternative approaches.

I would also like to thank @ericvsmith for laying the foundation here. Thanks, Eric!

You’ll have to elaborate on that. How exactly does the type checker treat Ref[Network] differently from just Network?

Ref[Network] doesn’t behave (much) differently from Network.

The value is for frameworks to inspect type hints and treat Ref fields differently, such as by serializing them as references or building dependency graphs. It is more a convention than a type system feature.
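As an illustration of "serializing as references", a framework could emit a pointer for Ref-marked fields instead of nesting the referenced object. In this hypothetical sketch the pointer is {"ref": <name>} and every referenced object is assumed to carry a name field; neither convention comes from the original packages, and RefMarker and to_dict are made-up names.

```python
import json
from dataclasses import dataclass, fields
from typing import Annotated, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


class RefMarker:
    """Hypothetical marker for reference fields."""


Ref = Annotated[T, RefMarker()]


def is_ref(hint: object) -> bool:
    """True if the hint is Annotated[..., RefMarker()]."""
    return get_origin(hint) is Annotated and any(
        isinstance(m, RefMarker) for m in get_args(hint)[1:]
    )


def to_dict(obj: object) -> dict:
    """Serialize a dataclass, emitting Ref fields as {"ref": <target name>}."""
    hints = get_type_hints(type(obj), include_extras=True)
    out = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        # Assumption: every referenced object has a `name` field.
        out[f.name] = {"ref": value.name} if is_ref(hints[f.name]) else value
    return out


@dataclass
class Network:
    name: str
    cidr: str


@dataclass
class Subnet:
    name: str
    network: Ref[Network]
    cidr: str


net = Network("main", "10.0.0.0/16")
sub = Subnet("private", net, "10.0.1.0/24")
print(json.dumps(to_dict(sub)))
# {"name": "private", "network": {"ref": "main"}, "cidr": "10.0.1.0/24"}
```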

Should Ref[T] affect type checking somehow, or is the introspection use case enough?

I believe there’s already a mechanism for attaching non-type metadata to type hints. It sounds like it would be better to use that for these kinds of purposes rather than add a new type notation that type checkers would need to be taught about, even if just to ignore it.
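The mechanism referred to here is typing.Annotated (PEP 593). Metadata such as a Ref() marker rides along with the real type: type checkers see Network, get_type_hints() strips the metadata by default, and include_extras=True keeps it for frameworks. A small demonstration, with Ref assumed to be a bare marker class:

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints


class Ref:
    """Marker object carried as Annotated metadata; it holds no data."""


@dataclass
class Network:
    cidr: str


@dataclass
class Subnet:
    network: Annotated[Network, Ref()]  # type checkers see plain Network
    cidr: str


# By default get_type_hints() strips Annotated metadata...
plain = get_type_hints(Subnet)
print(plain["network"] is Network)  # True

# ...while include_extras=True keeps it for frameworks to inspect.
extra = get_type_hints(Subnet, include_extras=True)
base, marker = get_args(extra["network"])
print(base is Network, isinstance(marker, Ref))  # True True
```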

Every variable is a reference in python, so what is it that you’re trying to achieve that network: Network doesn’t?


Thanks @gcewing, I was able to use Annotated to achieve what I need, so I have deleted graph-refs and graph-refs-dataclasses.

@Eneg Annotated[T, Ref()] is just a semantic distinction between a reference to a class and an actual dependency reference. This only matters if it helps your implementation; in my case, it does.