[1] enables varying the input/output size in order to perform multiscale, multiview image processing, which bolsters classification confidence and supports localisation and object detection. I wonder whether and how this could be implemented in Caffe?
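For context, the reason the output size has to vary with the input is just the usual convolution/pooling size arithmetic. A rough standalone sketch (no Caffe dependencies; the layer parameters and input widths below are illustrative, not taken from the paper):

```cpp
// Standalone sketch of how output spatial size follows input size through
// conv/pool stages, which is what multiscale inference relies on.
#include <cstdio>

int out_size(int in, int kernel, int stride, int pad) {
  return (in + 2 * pad - kernel) / stride + 1;
}

int main() {
  const int scales[] = {231, 461};         // two example input widths
  for (int in : scales) {
    int s = out_size(in, 11, 4, 0);        // e.g. an 11x11/4 first conv
    s = out_size(s, 2, 2, 0);              // followed by 2x2/2 pooling
    std::printf("input %d -> feature map %d\n", in, s);
  }
  return 0;
}
```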
One possibility would be to set blob sizes to their maximum expected values and then account for the actual input size during computation at each layer. I am not familiar enough with the Caffe sources to predict the overhead this approach would incur; I imagine it could lead to redundant memory copies and involved index arithmetic to access the right data.
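To make that concrete, here is a minimal standalone sketch (not actual Caffe code; `MaxSizedBlob` and `forward_copy` are hypothetical names chosen for illustration) of what "allocate at the maximum, compute on the actual size" would mean at a single layer, and the row-stride bookkeeping it forces on every access:

```cpp
// Minimal sketch: the buffer is sized for the largest expected input, and
// each operation only touches the active sub-region of it.
#include <vector>
#include <cstdio>

struct MaxSizedBlob {
  int max_h, max_w;        // allocated (maximum) spatial extent
  int h, w;                // actual extent of the current input
  std::vector<float> data;
  MaxSizedBlob(int mh, int mw)
      : max_h(mh), max_w(mw), h(mh), w(mw), data(mh * mw, 0.f) {}
  // Index arithmetic must use the *allocated* width as the row stride,
  // which is the bookkeeping every layer would need to get right.
  float& at(int y, int x) { return data[y * max_w + x]; }
};

// Example "layer": copies only the active h x w region from bottom to top.
void forward_copy(const MaxSizedBlob& bottom, MaxSizedBlob& top) {
  top.h = bottom.h;
  top.w = bottom.w;
  for (int y = 0; y < bottom.h; ++y)
    for (int x = 0; x < bottom.w; ++x)
      top.at(y, x) = bottom.data[y * bottom.max_w + x];
}

int main() {
  MaxSizedBlob bottom(512, 512), top(512, 512);
  bottom.h = 224; bottom.w = 224;   // a smaller "actual" input on this pass
  forward_copy(bottom, top);
  std::printf("active region: %d x %d of %d x %d allocated\n",
              top.h, top.w, top.max_h, top.max_w);
  return 0;
}
```

Since the active region is no longer contiguous in memory, GPU kernels would either need the same stride handling or a repacking step, which is presumably where the redundant copying I mentioned would come from.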
What other possibilities are there? I would be happy to put together a PR should we work out a decent solution.
[1] Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. arXiv:1312.6229 [cs.CV].