Parallelize Forward / Backward by Depth #547

Currently, Forward and Backward run sequentially in layer ID order. In principle, all Forward (or Backward) steps at the same depth in the DAG can execute in parallel, since no layer at a given depth consumes the output of another layer at that same depth.
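A minimal sketch of the grouping step, in Python for illustration only. It assumes the net is described by a hypothetical `inputs` dict mapping each layer name to the layers it consumes; this is not Caffe's actual `Net` API. A layer's depth is taken as one more than the deepest layer feeding it, so everything at depth d depends only on layers at smaller depths.

```python
from collections import defaultdict


def depth_levels(inputs):
    """Group the layers of a DAG by depth.

    inputs: hypothetical dict mapping layer name -> list of layer names it consumes.
    A layer with no inputs has depth 0; otherwise its depth is
    1 + the maximum depth of its inputs (longest path from a source).
    Returns a list of lists: levels[d] holds every layer at depth d.
    """
    depth = {}

    def visit(name, stack=()):
        if name in depth:
            return depth[name]
        if name in stack:
            raise ValueError(f"cycle detected at {name!r}")
        parents = inputs.get(name, [])
        d = 0 if not parents else 1 + max(visit(p, stack + (name,)) for p in parents)
        depth[name] = d
        return d

    for name in inputs:
        visit(name)

    levels = defaultdict(list)
    for name, d in depth.items():
        levels[d].append(name)
    # Depths are contiguous from 0, so range(len(levels)) covers every level.
    return [sorted(levels[d]) for d in range(len(levels))]


# Hypothetical net: two parallel convolutions feeding a concat.
net = {
    "data": [],
    "conv1a": ["data"],
    "conv1b": ["data"],
    "concat": ["conv1a", "conv1b"],
    "fc": ["concat"],
    "loss": ["fc", "data"],
}
print(depth_levels(net))
# [['data'], ['conv1a', 'conv1b'], ['concat'], ['fc'], ['loss']]
```

Forward would walk these levels in order and Backward in reverse order; only `conv1a` and `conv1b` share a level here, so they are the only candidates for concurrent execution in this example.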

For DAG models in which a single layer's operations do not saturate the host / device, this should improve performance.

As I understand it, this would be done with batched cuBLAS calls and [streams](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#asynchronous-concurrent-execution) for concurrent kernel execution at each depth of the model.
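The scheduling structure can be sketched as a CPU-side analogue in Python; this is not CUDA code. Each layer at a depth is submitted concurrently (standing in for enqueuing it on its own `cudaStream_t`), and the whole level must finish before the next one starts (standing in for synchronizing those streams at the depth boundary). The `step` callable is a hypothetical per-layer Forward or Backward function.

```python
from concurrent.futures import ThreadPoolExecutor


def run_by_depth(levels, step, reverse=False, max_workers=4):
    """Execute step(layer) for every layer, one depth level at a time.

    levels: list of lists of layer names, as produced by a depth grouping.
    step: hypothetical per-layer Forward/Backward callable.
    Layers within a level run concurrently; each level is fully finished
    before the next is submitted (a barrier between depths).
    reverse=True walks from the deepest level to the shallowest, as a
    backward pass would.
    Returns the per-level results in execution order.
    """
    order = reversed(levels) if reverse else levels
    executed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in order:
            # Submit every independent layer at this depth at once.
            futures = [pool.submit(step, name) for name in level]
            # Barrier: block until all layers at this depth have completed.
            executed.append([f.result() for f in futures])
    return executed
```

Python threads will not speed up CPU-bound pure-Python steps because of the GIL; the point of the sketch is the level-by-level barrier, which is the same structure the streams-based version would need.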
