In general, image restoration involves mapping low-quality images to their high-quality counterparts. The optimal mapping is usually non-linear and can be learned by machine learning. Recently, deep convolutional neural networks have proven promising for learning such mappings. It is desirable for an image processing network to perform well on three vital tasks, namely, super-resolution, denoising, and deblocking. It is commonly recognized that these tasks are strongly correlated, so it is important to harness the inter-task correlations. To this end, we propose the cross-scale residual network to exploit scale-related features and the inter-task correlations among the three tasks. The proposed network can extract features at multiple spatial scales and establish multiple temporal feature reuse. Our experiments show that the proposed approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations for multiple image restoration tasks.
The choice of activation function is essential for building state-of-the-art neural networks. At present, the most widely used effective activation function is ReLU. However, ReLU has the problems of non-zero mean, negative missing, and unbounded output, and thus has potential disadvantages in the optimization process. It is therefore desirable to propose a novel activation function that overcomes these three deficiencies. This paper proposes a new nonlinear activation function, namely "Soft-Root-Sign" (SRS), which is smooth, non-monotonic, and bounded. In contrast to ReLU, SRS adaptively adjusts a pair of independent trainable parameters to provide a zero-mean output, resulting in better generalization performance and faster learning speed. It also prevents the distribution of the output from being scattered in the non-negative real number space and corrects it to the positive real number space, making it more compatible with batch normalization (BN) and less sensitive to initialization. In addition, the bounded property of SRS distinguishes it from most state-of-the-art activation functions. We evaluated SRS on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modeling. Experimental results show that the proposed activation function SRS is superior to ReLU and other state-of-the-art nonlinearities. An ablation study further verifies its compatibility with BN and its adaptability to different initializations.
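The abstract above does not state the SRS formula. The sketch below assumes the form SRS(x) = x / (x/α + e^(−x/β)), with α and β as the pair of parameters; both the formula and the default values 2.0 and 3.0 are assumptions made here for illustration, and in a real network the two parameters would be trainable rather than plain floats.

```python
import math


def srs(x: float, alpha: float = 2.0, beta: float = 3.0) -> float:
    """Soft-Root-Sign sketch: x / (x / alpha + exp(-x / beta)).

    The formula and the defaults for alpha and beta are assumptions,
    not taken from the abstract. Both parameters must be positive.
    """
    return x / (x / alpha + math.exp(-x / beta))


if __name__ == "__main__":
    # Print a few values to see the negative dip and the saturation near alpha.
    for value in (-10.0, -3.0, -1.0, 0.0, 1.0, 10.0):
        print(value, round(srs(value), 4))
```

With these assumed defaults, the output dips below zero for moderately negative inputs, returns toward zero for very negative inputs (hence non-monotonic), and approaches α for large positive inputs (hence bounded).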
Temporal action localization in untrimmed videos is an important but difficult task. Difficulties are encountered in the application of existing methods when modeling temporal structures of videos. In the present study, we developed a novel method, referred to as Gemini Network, for effective modeling of temporal structures and achieving high-performance temporal action localization. The significant improvements afforded by the proposed method are attributable to three major factors. First, the developed network utilizes two subnets for effective modeling of temporal structures. Second, three parallel feature extraction pipelines are used to prevent interference between the extractions of different stage features. Third, the proposed method utilizes auxiliary supervision, with the auxiliary classifier losses affording additional constraints for improving the modeling capability of the network. As a demonstration of its effectiveness, the Gemini Network was used to achieve state-of-the-art temporal action localization performance on two challenging datasets, namely, THUMOS14 and ActivityNet.
2018 24th International Conference on Pattern Recognition (ICPR), 2018
We investigate generative adversarial networks as an effective solution to the crowd counting problem. These networks not only learn the mapping from a crowd image to the corresponding density map, but also learn a loss function to train this mapping. There are many challenges to the task of crowd counting, such as severe occlusions in extremely dense crowd scenes, perspective distortion, and high visual similarity between pedestrians and background elements. To address these problems, we propose a multi-scale generative adversarial network to generate high-quality crowd density maps for scenes of arbitrary crowd density. We utilize the adversarial loss from the discriminator to improve the quality of the estimated density map, which is critical for accurately predicting crowd counts. The proposed multi-scale generator can extract hierarchical features at multiple levels from the crowd image. The results show that the proposed method performs better than current state-of-the-art methods.
IEEE International Conference on Acoustics Speech and Signal Processing, 2002
The performance of telephone-based speaker verification systems can be severely degraded by the acoustic mismatch caused by telephone handsets. This paper proposes to combine a handset selector with stochastic feature transformation to reduce the mismatch. Specifically, a GMM-based handset selector is trained to identify the most likely handset used by the claimants, and then handset-specific stochastic feature transformations are applied to the distorted feature vectors. To overcome the non-linear distortion introduced by telephone handsets, a 2nd-order stochastic feature transformation is proposed. Estimation algorithms based on the stochastic matching technique and the EM algorithm are derived. Experimental results based on 150 speakers of the HTIMIT corpus show that the handset selector is able to identify the handsets accurately (98.3%), and that both linear and non-linear transformations reduce the error rate significantly (from 12.37% to 5.49%).
This study presents a divide-and-conquer (DC) approach based on feature space decomposition for classification. For large-scale datasets, typical approaches employ truncated kernel methods on the feature space or DC approaches on the sample space. However, this does not guarantee separability between classes, owing to overfitting. To overcome such problems, this work proposes a novel DC approach on feature spaces consisting of three steps. First, we divide the feature space into several subspaces using the decomposition method proposed in this paper. Subsequently, these feature subspaces are sent to individual local classifiers for training. Finally, the outcomes of the local classifiers are fused to generate the final classification results. Experiments on large-scale datasets are carried out for performance evaluation. The results show that the proposed DC method achieves lower error rates than state-of-the-art fast SVM solvers, e.g., reducing error rates by 10.53% and 7.53% on the RCV1 and covtype datasets, respectively.
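The three steps described above (divide the feature space, train local classifiers, fuse their outputs) can be sketched roughly as follows. This is only an illustration of the pipeline shape: the contiguous-chunk split, the nearest-centroid local classifier, and the majority-vote fusion are stand-ins chosen here, not the decomposition method, classifiers, or fusion rule of the paper, and all function names are hypothetical.

```python
from collections import Counter


def split_features(n_features, n_subspaces):
    """Step 1 (stand-in): partition feature indices into contiguous chunks."""
    size = -(-n_features // n_subspaces)  # ceiling division
    return [list(range(i, min(i + size, n_features)))
            for i in range(0, n_features, size)]


def fit_centroids(X, y, idx):
    """Step 2 (stand-in): nearest-centroid model restricted to features idx."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(idx))
        for k, j in enumerate(idx):
            acc[k] += row[j]
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict_local(centroids, row, idx):
    """Predict the label whose centroid is closest in the subspace idx."""
    def sq_dist(center):
        return sum((row[j] - center[k]) ** 2 for k, j in enumerate(idx))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_predict(X_train, y_train, X_test, n_subspaces=2):
    """Step 3 (stand-in): fuse local predictions by majority vote."""
    subspaces = split_features(len(X_train[0]), n_subspaces)
    models = [(idx, fit_centroids(X_train, y_train, idx)) for idx in subspaces]
    predictions = []
    for row in X_test:
        votes = Counter(predict_local(c, row, idx) for idx, c in models)
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

Each local model only ever sees its own slice of the features, which is what lets the subproblems be trained independently before fusion.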
The generalization performance of SVM-type classifiers suffers severely from the 'curse of dimensionality'. In some real-world applications, the dimensionality of the measurement is significantly larger than the number of training samples available. In this paper, a classification scheme is proposed and compared with existing techniques for such scenarios. The proposed scheme includes two parts: (i) feature selection and transformation based on Fisher discriminant criteria and (ii) a hybrid classifier combining Kernel Ridge Regression with Support Vector Machine to predict the label of the data. The first part is named Successively Orthogonal Discriminant Analysis (SODA), which is applied after Fisher-score-based feature selection as a preliminary processing step for dimensionality reduction. At this step, SODA maximizes the ratio of between-class scatter to within-class scatter to obtain an orthogonal transformation matrix that maps the features to a new low-dimensional feature space where the class separability is maximized. The techniques are tested on high-dimensional data from a microwave measurement system and are compared with existing techniques.
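As a rough illustration of the Fisher-score-based feature selection step mentioned above, the sketch below scores each feature by the ratio of between-class scatter to within-class scatter and keeps the top-ranked ones. It uses one common per-feature definition of the Fisher score, which is an assumption here; the SODA transformation and the hybrid classifier are not shown, and the function names are hypothetical.

```python
def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        mean_all = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            between += len(values) * (mean_c - mean_all) ** 2
            within += sum((v - mean_c) ** 2 for v in values)
        # A feature with zero within-class scatter separates perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

Features whose class means are far apart relative to their spread within each class score highest, so they are the ones passed on to the later transformation stage.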
Recurrent neural networks have become popular models for system identification and time series prediction. Nonlinear autoregressive models with exogenous inputs (NARX) neural network models are a popular subclass of recurrent networks and have been used in many applications. Although embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show that using intelligent memory order selection through pruning and good initial heuristics significantly improves the generalization and predictive performance of these nonlinear systems on problems as diverse as grammatical inference and time series prediction.
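To make the notion of memory order concrete, the sketch below builds the lagged input vectors a NARX model would consume, under one common convention in which the output at time t depends on the previous `output_order` outputs and the previous `input_order` exogenous inputs. The convention, the function name, and the parameter names are assumptions for illustration; the pruning-based order selection itself is not shown.

```python
def narx_regressors(u, y, input_order, output_order):
    """Build (features, targets) for a NARX-style predictor.

    features[i] = [y[t-1], ..., y[t-output_order],
                   u[t-1], ..., u[t-input_order]]
    targets[i]  = y[t]
    """
    start = max(input_order, output_order)
    features, targets = [], []
    for t in range(start, len(y)):
        past_outputs = [y[t - k] for k in range(1, output_order + 1)]
        past_inputs = [u[t - k] for k in range(1, input_order + 1)]
        features.append(past_outputs + past_inputs)
        targets.append(y[t])
    return features, targets
```

Reducing either order shortens every feature vector, which is the sense in which selecting the memory order changes the size of the model that has to be fitted.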
Convolutional neural networks (CNNs) inherently suffer from massively redundant computation (FLOPs) due to the dense connection pattern between feature maps and convolution kernels. Recent research has investigated the sparse relationship between channels; however, it has ignored the spatial relationship within a channel. In this paper, we present a novel convolutional operator, namely comb convolution, to exploit the intra-channel sparse relationship among neurons. The proposed convolutional operator eliminates nearly 50% of connections by inserting uniform mappings into standard convolutions and removing about half of the spatial connections in the convolutional layer. Notably, our work is orthogonal and complementary to existing methods that reduce channel-wise redundancy. Thus, it has great potential to further increase efficiency by integrating comb convolution into existing architectures. Experimental results demonstrate that by simply replacing standard convolutions with comb convolutions on state-of-the-art CNN architectures (e.g., VGGNets, Xception and SE-Net), we can achieve a 50% reduction in FLOPs while still maintaining accuracy.
Group convolution works well with many deep convolutional neural networks (CNNs) and can effectively compress the model by reducing the number of parameters and the computational cost. With this operation, however, feature maps of different groups cannot communicate, which restricts their representation capability. To address this issue, we propose a novel operation named Hierarchical Group Convolution (HGC) for creating computationally efficient neural networks. Unlike standard group convolution, which blocks inter-group information exchange and induces severe performance degradation, HGC can hierarchically fuse the feature maps from each group and leverage the inter-group information effectively. Taking advantage of the proposed method, we introduce a family of compact networks called HGCNets. Compared to networks using standard group convolution, HGCNets achieve a large improvement in accuracy at the same model size and complexity level. Extensive experimental results on the CIFAR dataset demonstrate that HGCNets achieve performance comparable to prior CNN architectures designed for mobile devices, such as MobileNet and ShuffleNet, with significantly fewer parameters and lower computational cost. Preprint. Under review.
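The parameter saving of standard group convolution follows from simple counting: each output channel connects only to the input channels in its own group. The sketch below computes the weight count (bias terms ignored) for a square kernel; the function name is hypothetical and the HGC operation itself is not shown.

```python
def conv_params(c_in, c_out, kernel_size, groups=1):
    """Weight count of a 2D convolution with a square kernel, bias ignored.

    Each of the c_out filters sees only c_in / groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel_size * kernel_size


if __name__ == "__main__":
    print(conv_params(64, 128, 3))            # standard convolution
    print(conv_params(64, 128, 3, groups=4))  # 4 groups: one quarter the weights
```

The same division by the number of groups is what cuts off information flow between groups, which is the limitation the abstract sets out to address.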
The quest for better data analysis and artificial intelligence has led to more and more data being collected and stored. As a consequence, more data are exposed to malicious entities. This paper examines the problem of privacy in machine learning for classification. We utilize Ridge Discriminant Component Analysis (RDCA) to desensitize data with respect to a privacy label. Based on five experiments, we show that desensitization by RDCA can effectively protect privacy (i.e., low accuracy on the privacy label) with a small loss in utility. On the HAR and CMU Faces datasets, the use of desensitized data results in random-guess-level accuracies for privacy at a cost of average drops of 5.14% and 0.04%, respectively, in utility accuracy. For the Semeion Handwritten Digit dataset, accuracies on the privacy-sensitive digits are almost zero, while the accuracies on the utility-relevant digits drop by 7.53% on average. This presents a promising solution to the problem of privacy in machine learning for classification.
Object detection methods have been applied in several aerial and traffic surveillance applications. However, object detection accuracy decreases in low-resolution (LR) images owing to feature loss. To address this problem, we propose a single network, SRODNet, that incorporates both super-resolution (SR) and object detection (OD). First, a modified residual block (MRB) is proposed in the SR network to recover the feature information of LR images, and this network is jointly optimized with YOLOv5 to benefit from hierarchical features for small object detection. Moreover, the proposed model focuses on minimizing the computational cost of network optimization. We evaluated the proposed model, both quantitatively and qualitatively, on standard datasets such as VEDAI-VISIBLE, VEDAI-IR, DOTA, and Korean highway traffic (KoHT). The experimental results show that the proposed method achieves higher vehicle detection accuracy than conventional methods.
IEEE Transactions on Neural Networks and Learning Systems, 2021
Recently, differentiable neural architecture search methods have significantly reduced the search cost by constructing a super network and relaxing the architecture representation by assigning architecture weights to the candidate operations. All the existing methods determine the importance of each operation directly from the architecture weights. However, architecture weights cannot accurately reflect the importance of each operation; that is, the operation with the highest weight might not be related to the best performance. To alleviate this deficiency, we propose a simple yet effective solution to neural architecture search, termed exploiting operation importance for effective neural architecture search (EoiNAS), in which a new indicator is proposed to fully exploit the operation importance and guide the model search. Based on this new indicator, we propose a gradual operation pruning strategy to further improve the search efficiency and accuracy. Experimental results have demonstrated the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.50% on CIFAR-10, which significantly outperforms state-of-the-art methods. When transferred to ImageNet, it achieves a top-1 error of 25.6%, comparable to the state-of-the-art performance under the mobile setting.
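The relaxation that the abstract criticizes can be illustrated with a small sketch. It assumes the usual softmax-weighted mixture of candidate operations used by differentiable search methods, followed by picking the operation with the largest architecture weight; the softmax form is an assumption, since the abstract does not specify it, and the new importance indicator proposed by EoiNAS is not described in the abstract and is not implemented here. All names are hypothetical.

```python
import math


def softmax(weights):
    """Numerically stable softmax over a list of architecture weights."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, ops, arch_weights):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    probs = softmax(arch_weights)
    return sum(p * op(x) for p, op in zip(probs, ops))


def select_by_weight(arch_weights):
    """Baseline selection rule: keep the op with the largest weight."""
    return max(range(len(arch_weights)), key=lambda i: arch_weights[i])
```

The baseline rule in `select_by_weight` looks only at the weight values, which is exactly the practice the abstract argues can pick an operation that is not the best-performing one.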
Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. By using the one-vs-rest strategy, mLASSO and mEN identified 87 and ...
Advances in Multimedia Information Processing — PCM 2002, 2002
This paper investigates kernel-based probabilistic neural networks for speaker verification in clean and noisy environments. In particular, it compares the performance and characteristics of speaker verification systems that use probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs) and elliptical basis function networks (EBFNs) as speaker models. Experimental evaluations based on 138 speakers of the YOHO corpus and its noisy variants were conducted. The original PDBNN training algorithm was also modified to make PDBNNs appropriate for speaker verification. Experimental evaluations, based on 138 speakers and the visualization of decision boundaries, indicate that GMM- and PDBNN-based speaker models are superior to the EBFN ones in terms of performance and generalization capability. This work also finds that PDBNNs and GMMs are more robust than EBFNs in verifying speakers in noisy environments.
In general, image restoration involves mapping from low quality images to their high-quality coun... more In general, image restoration involves mapping from low quality images to their high-quality counterparts. Such optimal mapping is usually non-linear and learnable by machine learning. Recently, deep convolutional neural networks have proven promising for such learning processing. It is desirable for an image processing network to support well with three vital tasks, namely, super-resolution, denoising, and deblocking. It is commonly recognized that these tasks have strong correlations. Therefore, it is imperative to harness the inter-task correlations. To this end, we propose the cross-scale residual network to exploit scale-related features and the inter-task correlations among the three tasks. The proposed network can extract multiple spatial scale features and establish multiple temporal feature reusage. Our experiments show that the proposed approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations for multiple image restoration tasks.
The choice of activation function is essential for building state-of-the-art neural networks. At ... more The choice of activation function is essential for building state-of-the-art neural networks. At present, the most widely-used activation function with effectiveness is ReLU. However, ReLU has the problems of non-zero mean, negative missing, and unbounded output, thus it has potential disadvantages in the optimization process. To this end, it is desirable to propose a novel activation function to overcome the above three deficiencies. This paper proposes a new nonlinear activation function, namely "Soft-Root-Sign" (SRS), which is smooth, non-monotonic, and bounded. In contrast to ReLU, SRS adaptively adjusts a pair of independent trainable parameters to provide a zeromean output, resulting in better generalization performance and faster learning speed. It also prevents the distribution of the output from being scattered in the non-negative real number space and corrects it to the positive real number space, making it more compatible with batch normalization (BN) and less sensitive to initialization. In addition, the bounded property of SRS distinguishes itself from most state-of-the-art activation functions. We evaluated SRS on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modeling. Experimental results show that the proposed activation function SRS is superior to ReLU and other state-of-the-art nonlinearities. Ablation study further verifies its compatibility with BN and its adaptability for different initialization.
Temporal action localization in untrimmed videos is an important but difficult task. Difficulties... more Temporal action localization in untrimmed videos is an important but difficult task. Difficulties are encountered in the application of existing methods when modeling temporal structures of videos. In the present study, we developed a novel method, referred to as Gemini Network, for effective modeling of temporal structures and achieving high-performance temporal action localization. The significant improvements afforded by the proposed method are attributable to three major factors. First, the developed network utilizes two subnets for effective modeling of temporal structures. Second, three parallel feature extraction pipelines are used to prevent interference between the extractions of different stage features. Third, the proposed method utilizes auxiliary supervision, with the auxiliary classifier losses affording additional constraints for improving the modeling capability of the network. As a demonstration of its effectiveness, the Gemini Network was used to achieve state-of-the-art temporal action localization performance on two challenging datasets, namely, THUMOS14 and ActivityNet.
Temporal action localization in untrimmed videos is an important but difficult task. Difficulties... more Temporal action localization in untrimmed videos is an important but difficult task. Difficulties are encountered in the application of existing methods when modeling temporal structures of videos. In the present study, we developed a novel method, referred to as Gemini Network, for effective modeling of temporal structures and achieving high-performance temporal action localization. The significant improvements afforded by the proposed method are attributable to three major factors. First, the developed network utilizes two subnets for effective modeling of temporal structures. Second, three parallel feature extraction pipelines are used to prevent interference between the extractions of different stage features. Third, the proposed method utilizes auxiliary supervision, with the auxiliary classifier losses affording additional constraints for improving the modeling capability of the network. As a demonstration of its effectiveness, the Gemini Network was used to achieve state-of-the-art temporal action localization performance on two challenging datasets, namely, THUMOS14 and ActivityNet.
2018 24th International Conference on Pattern Recognition (ICPR), 2018
We investigate generative adversarial networks as an effective solution to the crowd counting pro... more We investigate generative adversarial networks as an effective solution to the crowd counting problem. These networks not only learn the mapping from crowd image to corresponding density map, but also learn a loss function to train this mapping. There are many challenges to the task of crowd counting, such as severe occlusions in extremely dense crowd scenes, perspective distortion, and high visual similarity between pedestrians and background elements. To address these problems, we proposed multi-scale generative adversarial network to generate highquality crowd density maps of arbitrary crowd density scenes. We utilized the adversarial loss from discriminator to improve the quality of the estimated density map, which is critical to accurately predict crowd counts. The proposed multi-scale generator can extract multiple hierarchy features from the crowd image. The results showed that the proposed method provided better performance compared to current state-of-the-art methods .
IEEE International Conference on Acoustics Speech and Signal Processing, 2002
The performance of telephone-based speaker verification systems can be severely degraded by the a... more The performance of telephone-based speaker verification systems can be severely degraded by the acoustic mismatch caused by telephone handsets. This paper proposes to combine a handset selector with stochastic feature transformation to reduce the mismatch. Specifically, a GMM-based handset selector is trained to identify the most likely handset used by the claimants, and then handset-specific stochastic feature transformations are applied to the distorted feature vectors. To overcome the non-linear distortion introduced by telephone handsets, a 2nd-order stochastic feature transformation is proposed. Estimation algorithms based on the stochastic matching technique and the EM algorithm are derived. Experimental results based on 150 speakers of the HTIMIT corpus show that the handset selector is able to identify the handsets accurately (98.3%), and that both linear and non-linear transformation reduce the error rate significantly (from 12.37% to 5.49%).
This study presents a divide-and-conquer (DC) approach based on feature space decomposition for c... more This study presents a divide-and-conquer (DC) approach based on feature space decomposition for classification. When large-scale datasets are present, typical approaches usually employed truncated kernel methods on the feature space or DC approaches on the sample space. However, this did not guarantee separability between classes, owing to overfitting. To overcome such problems, this work proposes a novel DC approach on feature spaces consisting of three steps. Firstly, we divide the feature space into several subspaces using the decomposition method proposed in this paper. Subsequently, these feature subspaces are sent into individual local classifiers for training. Finally, the outcomes of local classifiers are fused together to generate the final classification results. Experiments on large-scale datasets are carried out for performance evaluation. The results show that the error rates of the proposed DC method decreased comparing with the state-of-the-art fast SVM solvers, e.g., reducing error rates by 10.53% and 7.53% on RCV1 and covtype datasets respectively.
The generalization performance of SVM-type classifiers severely suffers from the 'curse of dimens... more The generalization performance of SVM-type classifiers severely suffers from the 'curse of dimensionality'. For some real world applications, the dimensionality of the measurement is sometimes significantly larger compared to the amount of training data samples available. In this paper, a classification scheme is proposed and compared with existing techniques for such scenarios. The proposed scheme includes two parts: (i) feature selection and transformation based on Fisher discriminant criteria and (ii) a hybrid classifier combining Kernel Ridge Regression with Support Vector Machine to predict the label of the data. The first part is named Successively Orthogonal Discriminant Analysis (SODA), which is applied after Fisher score based feature selection as a preliminary processing for dimensionality reduction. At this step, SODA maximizes the ratio of between-class-scatter and within-class-scatter to obtain an orthogonal transformation matrix which maps the features to a new low dimensional feature space where the class separability is maximized. The techniques are tested on high dimensional data from a microwave measurements system and are compared with existing techniques.
Recurrent neural networks have become popular models for system identification and time series prediction. Nonlinear autoregressive models with exogenous inputs (NARX) neural network models are a popular subclass of recurrent networks and have been used in many applications. Although embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show that using intelligent memory order selection through pruning and good initial heuristics significantly improves the generalization and predictive performance of these nonlinear systems on problems as diverse as grammatical inference and time series prediction.
Convolutional neural networks (CNNs) inherently suffer from massively redundant computation (FLOPs) due to the dense connection pattern between feature maps and convolution kernels. Recent research has investigated the sparse relationship between channels but has ignored the spatial relationship within a channel. In this paper, we present a novel convolutional operator, namely comb convolution, to exploit the intra-channel sparse relationship among neurons. The proposed operator eliminates nearly 50% of connections by inserting uniform mappings into standard convolutions and removing about half of the spatial connections in a convolutional layer. Notably, our work is orthogonal and complementary to existing methods that reduce channel-wise redundancy. Thus, it has great potential to further increase efficiency by integrating comb convolution into existing architectures. Experimental results demonstrate that by simply replacing standard convolutions with comb convolutions in state-of-the-art CNN architectures (e.g., VGGNets, Xception and SE-Net), we can achieve a 50% FLOPs reduction while still maintaining accuracy.
Group convolution works well with many deep convolutional neural networks (CNNs) and can effectively compress a model by reducing the number of parameters and the computational cost. With this operation, however, feature maps of different groups cannot communicate, which restricts their representation capability. To address this issue, we propose a novel operation named Hierarchical Group Convolution (HGC) for creating computationally efficient neural networks. Unlike standard group convolution, which blocks inter-group information exchange and induces severe performance degradation, HGC hierarchically fuses the feature maps from each group and leverages inter-group information effectively. Taking advantage of the proposed method, we introduce a family of compact networks called HGCNets. Compared to networks using standard group convolution, HGCNets show a large improvement in accuracy at the same model size and complexity level. Extensive experimental results on the CIFAR dataset demonstrate that HGCNets achieve performance comparable to prior CNN architectures designed for mobile devices, such as MobileNet and ShuffleNet, with significantly fewer parameters and lower computational cost. Preprint. Under review.
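The compression that standard group convolution provides can be checked with simple arithmetic. The sketch below counts weights and multiply-accumulate operations for a 2-D convolution with square kernels; it covers plain group convolution only, not HGC, whose fusion structure the abstract does not specify. The function names are hypothetical and bias terms are ignored.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 2-D convolution with a square kernel (bias ignored).

    With `groups` groups, each output channel only sees c_in / groups inputs.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


def conv_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """Multiply-accumulates: every weight is applied once per output position."""
    return conv_params(c_in, c_out, kernel, groups) * h_out * w_out


dense = conv_params(64, 128, 3)              # 73728 weights
grouped = conv_params(64, 128, 3, groups=4)  # 18432 weights
```

Both counts shrink by exactly the group factor (4 in this example), and that saving comes from each output channel seeing only the input channels of its own group, which is the lack of inter-group communication the abstract describes.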
The quest for better data analysis and artificial intelligence has led to more and more data being collected and stored. As a consequence, more data are exposed to malicious entities. This paper examines the problem of privacy in machine learning for classification. We utilize Ridge Discriminant Component Analysis (RDCA) to desensitize data with respect to a privacy label. Based on five experiments, we show that desensitization by RDCA can effectively protect privacy (i.e., low accuracy on the privacy label) with a small loss in utility. On the HAR and CMU Faces datasets, the use of desensitized data results in random-guess-level accuracies for privacy at a cost of average drops of 5.14% and 0.04% in utility accuracy, respectively. For the Semeion Handwritten Digit dataset, accuracies on the privacy-sensitive digits are almost zero, while accuracies on the utility-relevant digits drop by 7.53% on average. This presents a promising solution to the problem of privacy in machine learning for classification.
Recently, differentiable neural architecture search methods have significantly reduced the search cost by constructing a super network and relaxing the architecture representation through architecture weights assigned to the candidate operations. All existing methods determine the importance of each operation directly from the architecture weights. However, architecture weights cannot accurately reflect the importance of each operation; that is, the operation with the highest weight might not be related to the best performance. To alleviate this deficiency, we propose a simple yet effective solution to neural architecture search, termed exploiting operation importance for effective neural architecture search (EoiNAS), in which a new indicator is proposed to fully exploit the operation importance and guide the model search. Based on this new indicator, we propose a gradual operation pruning strategy to further improve search efficiency and accuracy. Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.50% on CIFAR-10, which significantly outperforms state-of-the-art methods. When transferred to ImageNet, it achieves a top-1 error of 25.6%, comparable to state-of-the-art performance under the mobile setting.
Object detection methods have been applied in several aerial and traffic surveillance applications. However, object detection accuracy decreases in low-resolution (LR) images owing to feature loss. To address this problem, we propose a single network, SRODNet, that incorporates both super-resolution (SR) and object detection (OD). First, a modified residual block (MRB) is proposed in the SR network to recover the feature information of LR images, and this network is jointly optimized with YOLOv5 to benefit from hierarchical features for small object detection. Moreover, the proposed model focuses on minimizing the computational cost of network optimization. We evaluated the proposed model on standard datasets such as VEDAI-VISIBLE, VEDAI-IR, DOTA, and Korean highway traffic (KoHT), both quantitatively and qualitatively. The experimental results show that the proposed method improves vehicular detection accuracy more than conventional methods.
IEEE Transactions on Neural Networks and Learning Systems, 2021
Recently, differentiable neural architecture search methods have significantly reduced the search cost by constructing a super network and relaxing the architecture representation through architecture weights assigned to the candidate operations. All existing methods determine the importance of each operation directly from the architecture weights. However, architecture weights cannot accurately reflect the importance of each operation; that is, the operation with the highest weight might not be related to the best performance. To alleviate this deficiency, we propose a simple yet effective solution to neural architecture search, termed exploiting operation importance for effective neural architecture search (EoiNAS), in which a new indicator is proposed to fully exploit the operation importance and guide the model search. Based on this new indicator, we propose a gradual operation pruning strategy to further improve search efficiency and accuracy. Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.50% on CIFAR-10, which significantly outperforms state-of-the-art methods. When transferred to ImageNet, it achieves a top-1 error of 25.6%, comparable to state-of-the-art performance under the mobile setting.
Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. By using the one-vs-rest strategy, mLASSO and mEN identified 87 and ...
Advances in Multimedia Information Processing — PCM 2002, 2002
This paper investigates kernel-based probabilistic neural networks for speaker verification in clean and noisy environments. In particular, it compares the performance and characteristics of speaker verification systems that use probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs) and elliptical basis function networks (EBFNs) as speaker models. Experimental evaluations based on 138 speakers of the YOHO corpus and its noisy variants were conducted. The original PDBNN training algorithm was also modified to make PDBNNs appropriate for speaker verification. Experimental evaluations, based on 138 speakers and the visualization of decision boundaries, indicate that GMM- and PDBNN-based speaker models are superior to the EBFN ones in terms of performance and generalization capability. This work also finds that PDBNNs and GMMs are more robust than EBFNs in verifying speakers in noisy environments.
Papers by Sun-yuan Kung