Describe the bug
Today, the schemas of the DeepCompile C++ ops are defined as follows. In particular, every op that returns a tensor is declared as returning a brand-new tensor, which is incorrect: wait_allgather() and release_param() actually return their first tensor input without copying it.
m.def("allgather_param(Tensor a, int graph_id, int id) -> Tensor");
m.def("prefetch_params_fused(int graph_id, Tensor[] params, int[] ids) -> ()");
m.def("wait_allgather(Tensor a, int graph_id, int id) -> Tensor");
m.def("release_param(Tensor a, int graph_id, int id, int n_users) -> Tensor");
m.def("reduce_grad(Tensor a, int graph_id, int id) -> Tensor");
m.def("free_tensors(Tensor[] a) -> ()");
m.def("offload_tensor(Tensor a, int id, int id) -> Tensor");
m.def("reload_tensor(Tensor a, int id, int id) -> Tensor");
m.def("wait_offload(Tensor a, int id, int id) -> Tensor");
m.def("wait_reload(Tensor a, int id, int id) -> Tensor");
m.def("offload_parameter(Tensor a, int id, int id) -> ()");
m.def("reload_parameter(Tensor a, int id, int id) -> ()");
m.def("end_backward(int graph_id) -> ()");
This causes those tensors to be freed earlier than expected by the extra `del` statements that Inductor emits in its generated code, which eventually makes the training loss become NaN.
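A minimal sketch of how the aliasing could be declared, assuming PyTorch's standard schema alias annotations (this is an illustration, not necessarily the fix adopted for this issue): putting the first tensor argument and the return value in the same alias set tells the compiler stack that the op hands back its input rather than a fresh tensor.

```cpp
// Sketch only: alias annotations for the two ops that return their first
// tensor input unchanged. "Tensor(a)" places the argument and the result in
// the same alias set, so Inductor will not treat the output as an
// independently allocated tensor that it may free separately from the input.
m.def("wait_allgather(Tensor(a) a, int graph_id, int id) -> Tensor(a)");
m.def("release_param(Tensor(a) a, int graph_id, int id, int n_users) -> Tensor(a)");
```

The other direction would presumably be to keep the current `-> Tensor` declarations and have the ops return real copies, at the cost of extra allocations.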
To Reproduce
- Run `deepspeed --num_gpus=N openvla-like.py -c`
Expected behavior
With DeepCompile, the loss curves of the eager and Inductor backends should match.