Dataclasses naturally model tree-structured data, but many domains require graph structures where objects reference each other. Currently, there’s no standard way to express “this field references another dataclass” with full type safety.
The Problem
from dataclasses import dataclass

@dataclass
class Network:
    cidr: str

@dataclass
class Subnet:
    network: ???  # How do we express "reference to a Network"?
    cidr: str
Common workarounds each have drawbacks:
- String identifiers (network_id: str): no type safety
- Forward references (network: "Network"): the type checker sees the class, not a reference relationship
- The class itself (network: type[Network]): semantically wrong (we want an instance reference, not the class)
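To make the first drawback concrete, here is a minimal sketch of the string-identifier workaround. The `networks` registry and the identifier values are made up for illustration; the point is only that a misspelled identifier is still a valid `str`, so nothing catches it until something looks it up.

```python
from dataclasses import dataclass


@dataclass
class Network:
    cidr: str


@dataclass
class Subnet:
    network_id: str  # string-identifier workaround
    cidr: str


# Hypothetical registry mapping identifiers to Network objects.
networks = {"net-main": Network(cidr="10.0.0.0/16")}

# The identifier is misspelled, but it is still a str, so a type
# checker has nothing to complain about.
subnet = Subnet(network_id="net-mian", cidr="10.0.1.0/24")

# The mistake only surfaces when the reference is resolved at runtime.
dangling = subnet.network_id not in networks
```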
A Proposal: Ref[T] and Attr[T, "name"]
I’ve been exploring typed reference markers that work with the existing type system:
from dataclasses import dataclass

from graph_refs import Attr, Ref, RefList

@dataclass
class Subnet:
    network: Ref[Network]            # Reference to a Network
    gateway_id: Attr[Gateway, "Id"]  # Reference to Gateway's Id attribute

@dataclass
class LoadBalancer:
    targets: RefList[Instance]       # List of references
This enables:
- Type checkers verify reference targets
- IDE autocomplete for valid targets
- Static dependency graph analysis
- Framework introspection via get_refs() and get_dependencies()
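As a rough illustration of the introspection point, the sketch below shows one way such markers could be discovered at runtime with the standard `typing` helpers. It is not the graph-refs implementation: `Ref` and `RefList` are redefined here as bare generic classes, and `find_refs` is a hypothetical helper standing in for whatever `get_refs()` actually does. It also says nothing about how a static type checker would treat these annotations.

```python
from dataclasses import dataclass, fields
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


class Ref(Generic[T]):
    """Hypothetical marker: the field refers to an instance of T."""
    __slots__ = ()


class RefList(Generic[T]):
    """Hypothetical marker: the field refers to a list of T instances."""
    __slots__ = ()


@dataclass
class Network:
    cidr: str


@dataclass
class Instance:
    name: str


@dataclass
class Subnet:
    network: Ref[Network]
    cidr: str


@dataclass
class LoadBalancer:
    targets: RefList[Instance]


def find_refs(cls):
    """Return {field_name: target_class} for Ref/RefList fields (hypothetical helper)."""
    hints = get_type_hints(cls)
    found = {}
    for f in fields(cls):
        hint = hints[f.name]
        # get_origin(Ref[Network]) is Ref; plain types such as str give None.
        if get_origin(hint) in (Ref, RefList):
            (target,) = get_args(hint)
            found[f.name] = target
    return found


subnet_refs = find_refs(Subnet)          # maps "network" to the Network class
balancer_refs = find_refs(LoadBalancer)  # maps "targets" to the Instance class
```

Because the markers are only ever used in annotations, no marker object is created per instance; the lookup cost is paid once per class, whenever a framework chooses to introspect it.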
Use Cases
This pattern appears in infrastructure-as-code, configuration management, entity relationships, and workflow systems — anywhere objects form a graph rather than a tree:
@entity
class Order:
    customer: Ref[Customer]
    items: RefList[Product]

@task
class ProcessData:
    depends_on: RefList[Task]
    output: Attr[Storage, "Path"]
Implementation
I’ve published two packages exploring this:
- lex00/graph-refs (GitHub), "Typed graph references for Python dataclasses": the type markers (Ref, Attr, RefList, RefDict, ContextRef) plus introspection helpers (get_refs, get_dependencies)
- lex00/graph-refs-dataclasses (GitHub): runtime machinery for building declarative DSLs with a "no-parens" pattern, where references are written as bare class names rather than function calls
The example in the graph-refs-dataclasses repo demonstrates a complete mini-DSL with a custom decorator, dependency ordering, and JSON serialization.
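For readers who want a feel for dependency ordering without opening the repo, here is a minimal sketch of the idea using only the standard library. It is an assumption-laden illustration, not the code from either package: `Ref` is a stand-in marker, and `dependencies` and `creation_order` are hypothetical helper names. The ordering itself comes from `graphlib.TopologicalSorter` (Python 3.9+).

```python
from dataclasses import dataclass, fields
from graphlib import TopologicalSorter
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


class Ref(Generic[T]):
    """Stand-in marker for this sketch, not the graph-refs class."""
    __slots__ = ()


@dataclass
class Network:
    cidr: str


@dataclass
class Subnet:
    network: Ref[Network]
    cidr: str


@dataclass
class Instance:
    subnet: Ref[Subnet]
    name: str


def dependencies(cls):
    """Collect the classes that cls points to through Ref[...] fields."""
    hints = get_type_hints(cls)
    return {
        get_args(hints[f.name])[0]
        for f in fields(cls)
        if get_origin(hints[f.name]) is Ref
    }


def creation_order(classes):
    """Order classes so that every referenced class precedes its referrers."""
    graph = {cls: dependencies(cls) for cls in classes}
    return list(TopologicalSorter(graph).static_order())


order = creation_order([Instance, Subnet, Network])
names = [cls.__name__ for cls in order]
```

With this linear chain the result is Network, then Subnet, then Instance. A reference cycle would make `static_order()` raise `graphlib.CycleError`, which is one place where a real framework would need a policy of its own.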
Design Principles
- Zero runtime cost: type markers carry no instance data; their value is at development time
- Minimal surface area: five primitives that compose with existing types
- Compatibility first: works with get_type_hints(), mypy, pyright, and dataclasses
Questions for Discussion
- Is there interest in standardizing vocabulary for inter-dataclass references?
- Are the semantics of Ref[T] vs type[T] clear and useful?
- Should Attr[T, "name"] validate that T has an attribute called name?
- Could @dataclass_transform (PEP 681) gain parameters to better support reference-aware decorators?
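To ground the question about validating Attr, here is one possible shape for an eager runtime check. Everything in it is hypothetical: `Attr` is reimplemented with a custom `__class_getitem__`, and `AttrSpec` is an invented record type for the result. It does not reflect how graph-refs behaves, and a static type checker would not understand this form without further support.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass(frozen=True)
class AttrSpec:
    """Hypothetical record of 'attribute `name` on class `target`'."""
    target: type
    name: str


class Attr:
    """Hypothetical marker that validates its arguments when subscripted."""

    def __class_getitem__(cls, params):
        target, name = params
        # Validate only when the target is already a concrete dataclass;
        # forward references (strings) cannot be checked at this point.
        if is_dataclass(target) and name not in {f.name for f in fields(target)}:
            raise TypeError(f"{target.__name__} has no field {name!r}")
        return AttrSpec(target, name)


@dataclass
class Gateway:
    Id: str


ok = Attr[Gateway, "Id"]  # passes: Gateway declares a field named Id

try:
    Attr[Gateway, "Idd"]  # misspelled field name
except TypeError as exc:
    error = str(exc)
else:
    error = None
```

The trade-off is that eager validation needs the target class to exist when the annotation is evaluated, which is awkward for exactly the cyclic graphs this proposal is meant to describe. Deferring the check to a later introspection pass would avoid that, at the price of later error reporting.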
I’d welcome feedback on the concept, API design, or alternative approaches.
I would like to thank @ericvsmith for the foundation here as well, thanks Eric!