DEFEATnet -- A Deep Conventional Image Representation for Image Classification
IEEE Transactions on Circuits and Systems for Video Technology, 2015
ABSTRACT To study the underlying reasons for the successes of conventional image representations and deep neural networks in image representation, we propose a DEep FEATure extraction, encoding, and pooling network (DEFEATnet) architecture, which is a marriage between conventional image representation approaches and deep neural networks. In particular, each layer of DEFEATnet consists of three components: feature extraction, feature encoding, and pooling. The primary advantage of DEFEATnet is twofold: i) it preserves the prior knowledge (e.g., translation invariance) gained from extracting, encoding, and pooling handcrafted features, as in conventional feature representation approaches; ii) it represents object parts at different granularities by gradually enlarging the local receptive fields across layers, as in deep neural networks. Moreover, DEFEATnet is a generalized framework that can readily incorporate any type of local feature as well as any well-designed feature encoding and pooling method. Since prior knowledge is preserved in DEFEATnet, it is especially useful for image representation on small/medium-sized datasets, where deep neural networks usually fail due to the lack of sufficient training data. Promising experimental results show that DEFEATnets outperform shallow conventional image representation approaches by a large margin when the same type of features, feature encoding, and pooling are used. The extensive experiments also demonstrate the effectiveness of the deep architecture of DEFEATnet in improving the robustness of the image representation.
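The abstract does not specify the concrete features, codebooks, or pooling operators, but a minimal sketch of one stacked extract-encode-pool layer might look like the following, assuming dense patch extraction, hard vector-quantization encoding against a learned dictionary, and max pooling over spatial cells as stand-ins for the handcrafted components; the function names (e.g., defeat_layer) and parameter choices are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def extract_patches(feat_map, patch):
    """Densely extract local patches (the 'feature extraction' stage).

    feat_map: (H, W, C) array; patch: receptive-field side length.
    Returns (n_patches, patch*patch*C) descriptors and the patch grid shape.
    """
    H, W, C = feat_map.shape
    gh, gw = H - patch + 1, W - patch + 1
    descs = np.stack([
        feat_map[i:i + patch, j:j + patch].ravel()
        for i in range(gh) for j in range(gw)
    ])
    return descs, (gh, gw)

def encode_vq(descs, dictionary):
    """Hard vector-quantization encoding against a codebook
    (a simple stand-in for the feature-encoding stage)."""
    # squared distance to every codeword, then one-hot assignment
    d2 = ((descs[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    codes = np.zeros((descs.shape[0], dictionary.shape[0]))
    codes[np.arange(descs.shape[0]), d2.argmin(1)] = 1.0
    return codes

def pool_max(codes, grid, cell):
    """Max-pool codes over non-overlapping spatial cells, producing the
    next layer's smaller feature map with a larger effective receptive field."""
    gh, gw = grid
    K = codes.shape[1]
    cmap = codes.reshape(gh, gw, K)
    oh, ow = gh // cell, gw // cell
    out = np.zeros((oh, ow, K))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = cmap[i * cell:(i + 1) * cell,
                             j * cell:(j + 1) * cell].max(axis=(0, 1))
    return out

def defeat_layer(feat_map, dictionary, patch, cell):
    """One layer in the spirit of DEFEATnet: extract -> encode -> pool."""
    descs, grid = extract_patches(feat_map, patch)
    codes = encode_vq(descs, dictionary)
    return pool_max(codes, grid, cell)

# Toy usage: two stacked layers on a random 'image'; later layers see
# progressively larger image regions, mirroring the growing receptive fields.
rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))                    # layer-1 input
D1 = rng.random((64, 5 * 5 * 3))               # 64 codewords for 5x5 patches
x1 = defeat_layer(x, D1, patch=5, cell=2)      # -> (14, 14, 64)
D2 = rng.random((128, 3 * 3 * 64))             # layer 2 codebook
x2 = defeat_layer(x1, D2, patch=3, cell=2)     # -> (6, 6, 128)
print(x1.shape, x2.shape)
```

In practice the dictionaries would be learned from training patches (e.g., by k-means) and the encoder could be swapped for any well-designed encoding method, which is the flexibility the framework claims.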