PReLU is slow #2406
Closed
Did some benchmarking and found that the PReLU layer is slower than a conv layer in backpropagation:
I0502 15:58:03.235862 19023 caffe.cpp:276] conv1 forward: 23.0175 ms.
I0502 15:58:03.235872 19023 caffe.cpp:279] conv1 backward: 26.5506 ms.
I0502 15:58:03.235882 19023 caffe.cpp:276] prelu1 forward: 5.75406 ms.
I0502 15:58:03.235893 19023 caffe.cpp:279] prelu1 backward: 111.537 ms.
A small change in prelu_layer.cu speeds up PReLU 3-4 times.

Change:

PReLUParamBackward<Dtype><<<CAFFE_GET_BLOCKS(count),
    CAFFE_CUDA_NUM_THREADS>>>(
    cdim, top_diff + top[0]->offset(n),
    bottom_data + bottom[0]->offset(n), multiplier_.mutable_gpu_diff());

to:

PReLUParamBackward<Dtype><<<CAFFE_GET_BLOCKS(cdim),
    CAFFE_CUDA_NUM_THREADS>>>(
    cdim, top_diff + top[0]->offset(n),
    bottom_data + bottom[0]->offset(n), multiplier_.mutable_gpu_diff());

I ran the Caffe tests after this change and they passed.
Other than this, I think the rest of the code can be optimized further, because backpropagation is done one output at a time. Is someone working on that?