Backpropagation can work on arbitrary directed acyclic networks. Does the current implementation support a blob being consumed by two different layers? I see that each layer initializes bottom_diff to zero and then accumulates its gradient into it; the second layer's initialization would overwrite the gradient contributed by the first layer acting on the same blob. Or am I missing something? If not, is simply changing = to += a good way of solving the problem?
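To illustrate the concern, here is a minimal NumPy sketch (not Caffe's actual code; the array names are hypothetical) contrasting the two behaviors when one blob's diff receives gradients from two consumer layers:

```python
import numpy as np

# Gradients that two hypothetical consumer layers would write
# into the shared blob's diff during their backward passes.
grad_from_layer_a = np.array([1.0, 2.0, 3.0])
grad_from_layer_b = np.array([0.5, 0.5, 0.5])

# Overwrite semantics ("="): each layer zero-initializes and writes,
# so the second backward pass discards the first layer's contribution.
blob_diff = grad_from_layer_a.copy()
blob_diff = grad_from_layer_b.copy()  # layer A's gradient is lost

# Accumulate semantics ("+="): contributions sum, which is what the
# chain rule requires for a value used by two downstream layers.
blob_diff = np.zeros(3)
blob_diff += grad_from_layer_a
blob_diff += grad_from_layer_b
print(blob_diff)  # [1.5 2.5 3.5]
```

The accumulated result is the elementwise sum of both layers' gradients, whereas the overwrite version keeps only the last writer's values.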