
Identify the critical parts of computation time in GPU mode #102

@kloudkl

Description


There are three motivations to do this.

First, pull #99 referenced the benchmark results of pull #85. As noted in the latter, the experiments conducted in GPU mode were not very accurate because the batch size was set to 1 due to the limited memory of the GPU in question. This severely reduced the data throughput and probably distorted the layer-wise distribution of computation time. To make fairer comparisons, new benchmarks should use devices with more memory.

The second objective is to compare and analyze the distributions of computation time of Caffe [1] and DeCAF [2]. During training on the ImageNet dataset, nearly 60% of the computation time in DeCAF, which can only run on the CPU, was spent on the last three fully connected layers. This is not necessarily the case for Caffe, especially in GPU mode.

The third, more practical, purpose is to help future optimization efforts avoid the root of all evil (#81). This relates to the first motivation and is the most valuable of the three.
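To make the profiling goal concrete, a minimal per-layer timing harness can be sketched as follows. This is illustrative Python, not Caffe's actual API: the layer names and workloads are stand-ins, and a real GPU benchmark would additionally need to synchronize the device (e.g. via cudaDeviceSynchronize) before reading the clock, since kernel launches are asynchronous.

```python
import time

def benchmark(layers, n_iters=10):
    """Accumulate per-layer forward time over n_iters passes.

    layers: list of (name, callable) pairs standing in for a network's
    forward computations. Returns {name: total_seconds}.
    """
    totals = {name: 0.0 for name, _ in layers}
    for _ in range(n_iters):
        for name, fn in layers:
            start = time.perf_counter()
            fn()  # on a GPU, synchronize here before stopping the timer
            totals[name] += time.perf_counter() - start
    return totals

if __name__ == "__main__":
    # Hypothetical layers with different simulated costs.
    layers = [
        ("conv1", lambda: sum(i * i for i in range(20000))),
        ("fc6",   lambda: sum(i * i for i in range(60000))),
    ]
    totals = benchmark(layers, n_iters=5)
    grand = sum(totals.values())
    for name, t in totals.items():
        print(f"{name}: {100.0 * t / grand:.1f}% of forward time")
```

Reporting each layer's share of the total, as above, is what makes the comparison between the CPU-only DeCAF distribution and Caffe's GPU distribution meaningful, independent of absolute hardware speed.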

[1] Yangqing Jia. Caffe: An Open Source Convolutional Architecture for Fast Feature Embedding. http://caffe.berkeleyvision.org/. 2013.
[2] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. arXiv:1310.1531 [cs.CV]. 2013.
