Hi everyone,
Recently I have found that when training with multiple GPUs using DataParallelTable, the performance is slightly worse than when training with a single GPU. It seems the problem is that the running mean and variance of batch normalization are not synchronized across the GPUs. Does anybody know a solution to this issue? Thank you very much.
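For context, here is a minimal sketch of one possible workaround I have been considering: averaging the BatchNorm running statistics over the per-GPU replicas before evaluation. It assumes you can get a Lua table `replicas` holding each GPU's copy of the network (how to obtain them depends on your DataParallelTable setup), and that your nn version exposes `running_mean` / `running_var` on the BN modules (older versions used `running_std` instead). This is only an illustration of the idea, not a tested fix.

```lua
-- Sketch: average BatchNorm running statistics across GPU replicas.
-- `replicas` is assumed to be a table of per-GPU copies of the same network.
local function syncBatchNormStats(replicas)
   -- Collect the BN modules of every replica, in matching order.
   local bnPerReplica = {}
   for i, net in ipairs(replicas) do
      bnPerReplica[i] = net:findModules('nn.SpatialBatchNormalization')
   end
   local nBN = #bnPerReplica[1]
   for b = 1, nBN do
      -- Accumulate the statistics on the CPU to avoid cross-device adds.
      local meanAvg = bnPerReplica[1][b].running_mean:float():zero()
      local varAvg  = bnPerReplica[1][b].running_var:float():zero()
      for i = 1, #replicas do
         meanAvg:add(bnPerReplica[i][b].running_mean:float())
         varAvg:add(bnPerReplica[i][b].running_var:float())
      end
      meanAvg:div(#replicas)
      varAvg:div(#replicas)
      -- Write the averaged statistics back into every replica.
      for i = 1, #replicas do
         bnPerReplica[i][b].running_mean:copy(meanAvg)
         bnPerReplica[i][b].running_var:copy(varAvg)
      end
   end
end
```

Of course this only makes the statistics consistent after the fact; it does not synchronize the batch statistics used during the forward pass, which may be the real source of the gap. Any pointers would be appreciated.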