In GPU mode with conv_mode: LOWERED_CCNMM, we first need to remove all-zero columns and rows from the feature map matrix col_buffer_. This concatenation step temporarily uses the corresponding CPU routine.
We plan to replace it with a GPU routine. If you implement one, please open a pull request.
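
For readers who want to contribute, the operation can be illustrated with a minimal CPU-side sketch. It assumes a dense row-major float matrix; the names CompactedMatrix and remove_zero_rows_and_cols are hypothetical and are not the routine used in the branch, which may differ in layout and interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of Caffe): the compacted matrix plus
// the original indices of the rows and columns that survived.
struct CompactedMatrix {
  std::vector<float> data;     // row-major, kept_rows.size() x kept_cols.size()
  std::vector<int> kept_rows;  // original indices of rows with a nonzero entry
  std::vector<int> kept_cols;  // original indices of columns with a nonzero entry
};

// Removes all-zero rows and columns from a dense row-major matrix `m`
// of shape rows x cols, concatenating what remains into a smaller matrix.
CompactedMatrix remove_zero_rows_and_cols(const std::vector<float>& m,
                                          int rows, int cols) {
  std::vector<bool> row_nonzero(rows, false);
  std::vector<bool> col_nonzero(cols, false);

  // Pass 1: mark every row and column that holds at least one nonzero value.
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      if (m[static_cast<std::size_t>(r) * cols + c] != 0.0f) {
        row_nonzero[r] = true;
        col_nonzero[c] = true;
      }
    }
  }

  CompactedMatrix out;
  for (int r = 0; r < rows; ++r) {
    if (row_nonzero[r]) out.kept_rows.push_back(r);
  }
  for (int c = 0; c < cols; ++c) {
    if (col_nonzero[c]) out.kept_cols.push_back(c);
  }

  // Pass 2: copy the surviving entries into a contiguous buffer.
  out.data.reserve(out.kept_rows.size() * out.kept_cols.size());
  for (int r : out.kept_rows) {
    for (int c : out.kept_cols) {
      out.data.push_back(m[static_cast<std::size_t>(r) * cols + c]);
    }
  }
  return out;
}
```

A GPU version would likely follow the same two passes: a parallel reduction to flag nonzero rows and columns, then a prefix sum over the flags to compute destination indices for the copy. That is a suggestion for contributors, not a description of existing code.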
Code branch: https://github.com/wenwei202/caffe/tree/scnn