
Batch Normalization for Multi-GPU / Data Parallelism #7439

@kiranvaidhya

Description


Where is the batch normalization implementation for multi-GPU scenarios? How does one keep track of the mean, variance, offset, and scale in the context of the multi-GPU example given in the CIFAR-10 tutorial?
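For concreteness, here is a rough sketch of the kind of pattern I am asking about, adapted from the CIFAR-10 multi-GPU tutorial. The names (`tower_model`, `build_towers`) are placeholders, and the choice of `tf.layers.batch_normalization` with the moving-average update ops taken from a single tower is only my guess at the intended approach, not an official recipe:

```python
# Rough sketch (my own guess, not an official TensorFlow recipe) of batch
# normalization in a multi-tower, data-parallel setup, adapted from the
# CIFAR-10 multi-GPU tutorial. `tower_model` is a stand-in for the
# tutorial's per-tower model function.
import tensorflow as tf

def tower_model(images, is_training):
    """Tiny placeholder model; gamma/beta and the moving mean/variance are
    created by tf.layers.batch_normalization inside the shared scope."""
    net = tf.layers.conv2d(images, 64, 3, padding='same')
    net = tf.layers.batch_normalization(net, training=is_training)
    net = tf.nn.relu(net)
    net = tf.reduce_mean(net, axis=[1, 2])  # global average pooling
    return tf.layers.dense(net, 10)

def build_towers(image_splits, label_splits, is_training, num_gpus):
    tower_losses = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(num_gpus):
            with tf.device('/gpu:%d' % i), tf.name_scope('tower_%d' % i):
                logits = tower_model(image_splits[i], is_training)
                tower_losses.append(
                    tf.losses.sparse_softmax_cross_entropy(
                        labels=label_splits[i], logits=logits))
                # Share offset/scale and the moving averages across towers.
                tf.get_variable_scope().reuse_variables()
    # Each tower computes batch statistics from its own shard, so run the
    # moving-average update ops from one tower only (tower_0 here) to avoid
    # racy double updates of the shared moving mean/variance.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope='tower_0')
    total_loss = tf.add_n(tower_losses) / num_gpus
    with tf.control_dependencies(update_ops):
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(total_loss)
    return train_op, total_loss
```

Whether updating the moving averages from a single tower (versus somehow averaging the statistics across GPUs) is the intended approach is exactly the kind of thing I would like the documentation to spell out.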

Why has the question on StackOverflow been left unanswered for so long?

For all the beauty it brings with TensorBoard and the like, it's kind of appalling to see TensorFlow so far behind Torch in terms of modeling capability. I'd be really glad if someone took responsibility and came up with a decent batch normalization implementation for all cases. Even if it already exists, could someone please write proper documentation for it?

There are so many open issues pertaining to batch normalization in TensorFlow. It's important that you straighten this out: batch normalization enables very fast convergence for very deep networks, and it is REALLY important for modern-day deep learning research.

PS: Please forgive my outburst. I've been a Torch user for more than a year, and I had very high hopes for TensorFlow.
