I just wanted to point out that there are probably many people out there who would be interested in a GPU implementation of the forward pass for SigmoidCrossEntropyLossLayer. Right now so much time is spent shuffling data back and forth between host and device that much of the benefit of using the GPU at all is lost.
I would love to do this myself, but I know too little about the internals: GPU memory allocation and deallocation, writing GPU loops (kernels), and the responsibilities around diff_ all trouble me.
A PR addressing this would be an obvious improvement to Caffe.
@slayton58
I know your time is valuable, but could you take a look when you get a chance?