
alexnet example

Tobias Kind edited this page Jul 2, 2016 · 14 revisions

The AlexNet demo is a timing benchmark for AlexNet inference. The network was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012; the accompanying paper was published at NIPS 2012. All three authors worked at the University of Toronto and later joined the Google Research team. The paper, Imagenet classification with deep convolutional neural networks, is highly cited, with around 6000 citations for a single paper. For comparison, an average computer science paper is cited around 10 times.

The code was released as cuda-convnet and cuda-convnet2, high-performance C++/CUDA implementations of convolutional neural networks; cuda-convnet2 also supports multi-GPU setups.


The code below shows examples and outputs from a CPU-only experiment (no vGPU support in VMs). A detailed discussion with Tesla K40 and GeForce Titan X results can be found on the dedicated TF benchmark page.

We can run a very short CPU-only demo with a batch size of 10 and 10 batches to see how it performs. The commands below come from a docker run; inside the Docker container we can then run:

cd tensorflow
python tensorflow/models/image/alexnet/alexnet_benchmark.py --batch_size 10  --num_batches 10
root@fb729273837c:/tensorflow# python tensorflow/models/image/alexnet/alexnet_benchmark.py --batch_size 10  --num_batches 10
conv1   [10, 55, 55, 64]
pool1   [10, 27, 27, 64]
conv2   [10, 27, 27, 192]
pool2   [10, 13, 13, 192]
conv3   [10, 13, 13, 384]
conv4   [10, 13, 13, 256]
conv5   [10, 13, 13, 256]
pool5   [10, 6, 6, 256]
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
2015-11-26 20:50:38.064129: Forward across 10 steps, 0.170 +/- 0.058 sec / batch
2015-11-26 20:50:54.550445: Forward-backward across 10 steps, 0.746 +/- 0.250 sec / batch
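From the summary lines above, per-image throughput can be estimated by dividing the batch size by the mean time per batch. This is a quick sanity check on the reported numbers, not part of the benchmark script:

```python
# Throughput estimate from the benchmark summary lines (batch size 10):
# "Forward across 10 steps, 0.170 +/- 0.058 sec / batch"
# "Forward-backward across 10 steps, 0.746 +/- 0.250 sec / batch"
batch_size = 10
forward_sec_per_batch = 0.170
fwd_bwd_sec_per_batch = 0.746

forward_ips = batch_size / forward_sec_per_batch
fwd_bwd_ips = batch_size / fwd_bwd_sec_per_batch
print("Forward: %.1f images/sec" % forward_ips)          # ~58.8 images/sec
print("Forward-backward: %.1f images/sec" % fwd_bwd_ips) # ~13.4 images/sec
```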

The longer demo, which also has reference data for GPUs, can be invoked in the Docker image by changing the batch size and the number of batches. We call it with the command-line options --batch_size 128 --num_batches 100:

python tensorflow/models/image/alexnet/alexnet_benchmark.py --batch_size 128  --num_batches 100

or simply

python tensorflow/models/image/alexnet/alexnet_benchmark.py

which will produce the following output:

root@fb729273837c:/tensorflow# python tensorflow/models/image/alexnet/alexnet_benchmark.py --batch_size 128  --num_batches 100
conv1   [128, 55, 55, 64]
pool1   [128, 27, 27, 64]
conv2   [128, 27, 27, 192]
pool2   [128, 13, 13, 192]
conv3   [128, 13, 13, 384]
conv4   [128, 13, 13, 256]
conv5   [128, 13, 13, 256]
pool5   [128, 6, 6, 256]
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
2015-11-26 21:24:30.956213: step 10, duration = 2.271
2015-11-26 21:24:53.889235: step 20, duration = 2.324
2015-11-26 21:25:17.470194: step 30, duration = 2.380
2015-11-26 21:25:40.652514: step 40, duration = 2.396
2015-11-26 21:26:03.827588: step 50, duration = 2.367
2015-11-26 21:26:26.532876: step 60, duration = 2.177
2015-11-26 21:26:49.768012: step 70, duration = 2.293
2015-11-26 21:27:12.705549: step 80, duration = 2.270
2015-11-26 21:27:35.671724: step 90, duration = 2.283
2015-11-26 21:27:56.601975: Forward across 100 steps, 2.285 +/- 0.239 sec / batch
2015-11-26 21:31:22.048707: step 10, duration = 9.731
2015-11-26 21:32:59.365643: step 20, duration = 9.663
2015-11-26 21:34:36.980601: step 30, duration = 9.600
2015-11-26 21:36:15.138785: step 40, duration = 10.290
2015-11-26 21:37:53.282469: step 50, duration = 10.078
2015-11-26 21:39:32.931147: step 60, duration = 9.891
2015-11-26 21:41:12.224409: step 70, duration = 9.843
2015-11-26 21:42:50.520298: step 80, duration = 9.710
2015-11-26 21:44:27.532264: step 90, duration = 9.859
2015-11-26 21:45:56.325980: Forward-backward across 100 steps, 9.721 +/- 0.993 sec / batch
root@fb729273837c:/tensorflow# 
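From the two summary lines in the output above, the relative cost of adding the backward pass on this CPU can be computed directly (simple arithmetic on the reported numbers):

```python
# Mean sec/batch from the benchmark summary lines above (batch size 128).
forward = 2.285           # "Forward across 100 steps, 2.285 +/- 0.239 sec / batch"
forward_backward = 9.721  # "Forward-backward across 100 steps, 9.721 +/- 0.993 sec / batch"

slowdown = forward_backward / forward
throughput_fwd = 128 / forward
print("Forward-backward is %.1fx slower than forward-only" % slowdown)  # ~4.3x
print("Forward throughput: %.1f images/sec" % throughput_fwd)           # ~56.0 images/sec
```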

The TensorFlow AlexNet forward run is the most efficient of all three TF benchmarks (MNIST, CIFAR10, and AlexNet), reaching up to 90% CPU utilization, as shown below. The benchmark is synthetic and has a very small memory footprint: no real external image data is read or processed.

(Figure: tensorflow-alexnet-forward-cpu-only)


The TensorFlow AlexNet forward-backward run is much slower: CPU utilization drops to about 30%, and the per-batch time increases roughly four-fold (9.7 s vs. 2.3 s per batch in the run above).

(Figure: tensorflow-alexnet-forward-backward-cpu-only)


For a more memory-intensive benchmark we can use a larger batch size, which occupies more RAM on the CPU and GPU. If swap memory is used, efficiency drops to almost zero; here PCIe-based SSDs, SSD RAIDs, or large terabyte-scale RAM disks can help.

python tensorflow/models/image/alexnet/alexnet_benchmark.py --batch_size 512  --num_batches 3  
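To get a feel for why a larger batch size inflates memory use, we can estimate the activation memory implied by the layer shapes the benchmark prints, scaled to a batch size of 512. This is a rough sketch assuming 4-byte float32 activations and counting only the printed layers; real usage also includes weights, gradients, and framework overhead:

```python
# Layer output shapes as printed by the benchmark, scaled to batch size 512.
# Assumes float32 (4 bytes/element); ignores weights, gradients, and workspace.
layers = [
    ("conv1", (512, 55, 55, 64)),
    ("pool1", (512, 27, 27, 64)),
    ("conv2", (512, 27, 27, 192)),
    ("pool2", (512, 13, 13, 192)),
    ("conv3", (512, 13, 13, 384)),
    ("conv4", (512, 13, 13, 256)),
    ("conv5", (512, 13, 13, 256)),
    ("pool5", (512, 6, 6, 256)),
]

total_bytes = 0
for name, (n, h, w, c) in layers:
    size = 4 * n * h * w * c  # bytes for this layer's activations
    total_bytes += size
    print("%s: %7.1f MB" % (name, size / 2.0**20))

print("total activations: %.1f MB" % (total_bytes / 2.0**20))  # ~1.1 GB
```

At batch size 512 the activations alone already exceed a gigabyte, which explains why only a few batches (--num_batches 3) are run for this configuration.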

LINKS

  • AlexNet paper - Imagenet classification with deep convolutional neural networks
  • Parallel GPUs - One weird trick for parallelizing convolutional neural networks by Alex Krizhevsky
  • Convnet bench - Convnet benchmarks by Soumith Chintala
  • CCV - another convolutional network library
  • Jetson TK1 - Nvidia Jetson TK1 Reviewed
  • JTK1 power - power draw of Jetson TK1 boards
  • Nvidia - ACCELERATED COMPUTING: THE PATH FORWARD - by Jen-Hsun Huang
