Communications in Computer and Information Science, 2020
Wildfires are one of the disasters that are difficult to detect early and cause significant damage to human life, ecological systems, and infrastructure. There have been several research attempts to detect wildfires with convolutional neural networks (CNNs) in video surveillance systems. However, most of these methods focus only on flame detection, so they are still not sufficient to prevent loss of life and reduce economic and material damage. To tackle this issue, we present a deep learning-based method for detecting wildfires at an early stage by identifying flames and smoke simultaneously. To realize the proposed idea, a large wildfire dataset is acquired from the web. A lightweight yet powerful architecture is adopted to balance efficiency and accuracy, and focal loss is utilized to deal with the class imbalance issue. Experimental results demonstrate the effectiveness of the proposed method and validate its suitability for early wildfire detection in a video surveillance system.
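The abstract notes that focal loss is used to handle the class imbalance between flame, smoke, and background examples. Below is a minimal binary focal loss sketch in PyTorch; the `alpha` and `gamma` values are the common defaults from the focal loss literature, not necessarily the ones used in the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples so training
    focuses on hard, under-represented ones (e.g. fire/smoke pixels)."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class re-weighting
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```

With `gamma=0` and `alpha=0.5` this reduces (up to a constant) to ordinary cross-entropy; increasing `gamma` progressively suppresses the loss contribution of well-classified examples.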
Communications in Computer and Information Science, 2020
Applications of interior and architectural design involve various tasks, ranging from planning desired floor colors and textures to deciding furnishing arrangement styles, all depending on the choices of the designers themselves. In this era of artificial intelligence, computer vision based applications are therefore very popular. Many research studies have addressed interior design applications using virtual reality (VR) technology; however, VR based applications do not provide a realistic interior design experience to the user. In this study, we therefore present an Augmented Reality (AR) based end-to-end systematic approach for interior design, initialized by deep matting of an indoor scene. In our proposed application, the user can choose various colors and textures to change the interior of a region of interest (ROI) in an indoor environment. The application consists of different modules working jointly for efficient interactive interior design: it allows the user to select a region of interest (wall or floor) and then choose a color or texture to map onto the ROI. The final results give a realistic experience to the users, as our joint modules estimate the global illumination changes on the ROI. Our interactive interior design application is thus user-friendly and works efficiently, producing realistic-looking outputs.
In recent years visual object tracking has become a very active research area, with an increasing number of tracking algorithms proposed each year. This is because tracking has wide applications in various real-world problems such as human-computer interaction, autonomous vehicles, robotics, surveillance, and security, to name a few. In the current study, we review the latest trends and advances in the tracking area and evaluate the robustness of different trackers based on their feature extraction methods. The first part of this work comprises a comprehensive survey of recently proposed trackers. We broadly categorize trackers into Correlation Filter based Trackers (CFTs) and Non-CFTs, and further classify each category into various types based on architecture and tracking mechanism. In the second part, we experimentally evaluate 24 recent trackers for robustness and compare handcrafted and deep feature based trackers. We observe that trackers using deep features performed better, though in some cases a fusion of both increased performance significantly. To overcome the drawbacks of existing benchmarks, a new benchmark, Object Tracking and Temple Color (OTTC), is also proposed and used in the evaluation of the different algorithms. We analyze the performance of trackers over eleven different challenges in OTTC and three other benchmarks. Our study concludes that Discriminative Correlation Filter (DCF) based trackers perform better than the others, and reveals that the inclusion of different types of regularization over DCF often boosts tracking performance. Finally, we sum up our study by pointing out some insights and indicating future trends in the visual object tracking field.
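For readers unfamiliar with the correlation filter family surveyed here, the sketch below shows the classic closed-form filter learned in the Fourier domain (MOSSE-style), the ancestor of the DCF trackers discussed above; the function names and the regularizer value are illustrative.

```python
import numpy as np

def train_filter(patches, target_response, lam=1e-2):
    """Closed-form correlation filter in the Fourier domain:
    H* = sum(G . conj(F)) / (sum(F . conj(F)) + lambda)."""
    G = np.fft.fft2(target_response)          # desired Gaussian response
    num = np.zeros_like(G)
    den = np.zeros_like(G)
    for p in patches:                          # training patches of one target
        F = np.fft.fft2(p)
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / (den + lam)

def detect(H_conj, patch):
    """Correlate the filter with a search patch; the response peak
    gives the estimated translation of the target."""
    return np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))
```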
The spread of satellite imagery has increased the number of services that use it. In particular, 3D visualization services of the whole Earth, such as Google Earth™ and Virtual Earth™, and 3D GIS services for several cities provide realistic geometry information of buildings and terrain over wide areas. These services can be used in various fields such as urban planning, road improvement, entertainment, military simulation, and emergency response. Research on effectively extracting building and terrain information from high-resolution satellite images is therefore required. In this paper, we present a system for the effective extraction of building models from a single high-resolution satellite image, after examining the requirements for building model extraction. The proposed system utilizes geometric features of the satellite image and the geometric relationships among the building, its shadow, and the positions of the sun and the satellite to minimize user interaction. Finally, the extracted 3D buildings demonstrate that models can be effectively extracted from a single high-resolution satellite image.
Moving object detection (MOD) is a fundamental step in many high-level vision-based applications, such as human activity analysis, visual object tracking, autonomous vehicles, surveillance, and security. Most existing MOD algorithms suffer performance degradation in complex scenes captured by static cameras that contain camouflaged objects, shadows, dynamic backgrounds, and varying illumination conditions. To handle these challenges appropriately, we propose a Generative Adversarial Network (GAN) based moving object detection algorithm, called MOD_GAN. In the proposed algorithm, scene-specific GANs are trained in an unsupervised MOD setting, enabling the algorithm to learn to generate background sequences from uniformly distributed random noise samples. In addition to the adversarial loss, norm-based losses in the image space and the discriminator feature space are minimized during training between the generated images and the training data. These additional losses enable the generator to learn subtle background details, resulting in more realistic complex scene generation. During testing, a novel back-propagation based algorithm is used to generate images with statistics similar to the test images: more appropriate random noise samples are searched by directly minimizing the loss function between the test and generated images in both the image and discriminator feature spaces. The network is not updated in this step; only the input noise samples are iteratively modified to minimize the loss function. Moreover, motion information is used to ensure that this loss is computed only on small-motion pixels. A novel dataset containing outdoor time-lapse images from dawn to dusk with a full illumination variation cycle is also proposed to better compare MOD algorithms in outdoor scenes. Extensive experiments on five benchmark datasets and comparisons with 30 existing methods demonstrate the strength of the proposed algorithm.
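The test-time procedure described above, freezing the trained generator and optimizing only the input noise to match a test frame, can be sketched as follows in PyTorch. Only the image-space loss is shown and the optimizer settings are assumptions; the paper additionally uses a discriminator feature-space term and restricts the loss to small-motion pixels.

```python
import torch

def search_latent(generator, test_img, z_dim=100, steps=500, lr=0.01):
    # Generator weights are assumed already trained; freeze them.
    for p in generator.parameters():
        p.requires_grad_(False)
    z = torch.randn(1, z_dim, requires_grad=True)    # noise sample to optimize
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Image-space reconstruction loss between generated and test frame.
        loss = torch.mean((generator(z) - test_img) ** 2)
        loss.backward()                               # gradients reach z only
        opt.step()
    return generator(z).detach()                      # background estimate
```

Subtracting the returned background estimate from the test frame and thresholding the residual would then yield the moving object mask.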
Foreground segmentation algorithms suffer performance degradation in the presence of challenges such as dynamic backgrounds and varying illumination conditions. To handle these challenges, we present a foreground segmentation method based on a generative adversarial network (GAN), aiming to segment foreground objects under the two aforementioned major challenges in real background scenes. Our GAN model is trained on background image samples with dynamic changes; at test time, it generates a background sample matching the conditions of each test sample via a back-propagation technique. The generated background sample is then subtracted from the given test sample to segment the foreground objects. A comparison of our proposed method with five state-of-the-art methods highlights its strength for foreground segmentation in the presence of challenging dynamic background scenarios.
In many high-level vision applications, such as tracking and surveillance, background estimation is a fundamental step. In the past, background estimation was usually based on low-level hand-crafted features such as raw color components, gradients, or local binary patterns. These existing algorithms suffer performance degradation in the presence of challenges such as dynamic backgrounds, photometric variations, camera jitter, and shadows. To handle these challenges for accurate background estimation, we propose a unified method based on a Generative Adversarial Network (GAN) and image inpainting: an unsupervised visual feature learning hybrid GAN based on context prediction, followed by a semantic inpainting network for texture optimization. We also propose a solution for arbitrary region inpainting using center region inpainting and Poisson blending. The proposed algorithm is compared with existing algorithms for background estimation on the SBM.net dataset and for foreground segmentation on
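The Poisson blending step mentioned for arbitrary region inpainting can be illustrated with OpenCV's built-in seamless cloning, which solves the Poisson blending equations internally. The file names and the patch placement below are hypothetical, for illustration only.

```python
import cv2
import numpy as np

# Hypothetical inputs: an estimated background and an inpainted patch
# that should be pasted back without visible seams.
background = cv2.imread("estimated_background.png")
patch = cv2.imread("inpainted_region.png")

mask = 255 * np.ones(patch.shape[:2], dtype=np.uint8)   # blend the whole patch
center = (background.shape[1] // 2, background.shape[0] // 2)  # (x, y) placement

# seamlessClone solves the Poisson equations so gradients, not raw colors,
# are transferred, hiding the boundary between patch and background.
blended = cv2.seamlessClone(patch, background, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("blended_background.png", blended)
```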
Person re-identification (re-ID) is the task of associating images of a person captured by different cameras with non-overlapping fields of view. A fundamental and still open issue in re-ID is the extraction of powerful features from low-resolution surveillance videos. To address this, a novel two-stream convolutional recurrent model with an attentive pooling mechanism is presented for person re-ID in videos. Each stream of the model is a Siamese network aimed at extracting and matching the most discriminative feature maps, and attentive pooling is used to select the most informative video frames. The outputs of the two streams are fused into one combined feature map, which helps to deal with major re-ID challenges such as pose and illumination variation, cluttered backgrounds, and occlusion. The proposed technique is evaluated on three challenging datasets: MARS, PRID-2011, and iLIDS-VID. Experimental evaluation shows that it performs better than existing state-of-the-art supervised video-based person re-ID models. The implementation is available at https://github.com/re-identification/Person_RE-ID.git.
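Attentive temporal pooling of per-frame features can be sketched as below; this is a simplified single-stream version in PyTorch, with an illustrative scoring layer rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    """Temporal attention over per-frame features: informative frames
    receive higher weights before the sequence is collapsed to one vector."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)   # per-frame relevance score

    def forward(self, frame_feats):           # (batch, time, feat_dim)
        weights = torch.softmax(self.score(frame_feats), dim=1)  # (B, T, 1)
        return (weights * frame_feats).sum(dim=1)                # (B, feat_dim)

# Usage: pool = AttentivePooling(feat_dim=256)
#        clip_feature = pool(torch.randn(4, 16, 256))  # 16-frame clips
```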
Communications in Computer and Information Science
Moving Object Segmentation (MOS) is an important topic in computer vision. MOS becomes a challenging problem in the presence of dynamic backgrounds and moving-camera videos, such as those from Pan-Tilt-Zoom (PTZ) cameras. The MOS problem has been addressed with unsupervised and supervised learning strategies. Recently, new ideas for solving MOS with semi-supervised learning have emerged, inspired by the theory of Graph Signal Processing (GSP). These new algorithms are usually composed of several steps: segmentation, background initialization, feature extraction, graph construction, graph signal sampling, and a semi-supervised learning algorithm inspired by the reconstruction of graph signals. In this work, we summarize and explain the theoretical foundations as well as the technical details of MOS using GSP. We also propose two architectures for MOS using semi-supervised learning and a new evaluation procedure for GSP-based MOS algorithms. The GSP-based algorithms are evaluated on the Change Detection (CDNet2014) dataset for MOS, outperforming numerous state-of-the-art (SOTA) methods under several challenging conditions.
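The core semi-supervised step, recovering a smooth graph signal from a few labeled (sampled) nodes, can be sketched with the classic harmonic solution below. This is a minimal illustration assuming a dense adjacency matrix `W` over pixels or superpixels, not the exact reconstruction algorithm of the surveyed methods.

```python
import numpy as np

def propagate_labels(W, labels, labeled_idx):
    """Semi-supervised graph signal recovery: fix known node labels and
    solve the Laplacian system for the rest (f_u = -L_uu^{-1} L_ul f_l).
    Assumes every connected component contains at least one labeled node."""
    L = np.diag(W.sum(axis=1)) - W                  # combinatorial Laplacian
    n = W.shape[0]
    u = np.setdiff1d(np.arange(n), labeled_idx)     # unlabeled node indices
    f = np.zeros(n)
    f[labeled_idx] = labels
    f[u] = np.linalg.solve(L[np.ix_(u, u)],
                           -L[np.ix_(u, labeled_idx)] @ labels)
    return f   # thresholding f gives a foreground/background decision
```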
Facial expression recognition (FER) is an important task for various computer vision applications. The task becomes challenging when it requires the detection and encoding of macro- and micro-patterns of facial expressions. We present a two-stage texture feature extraction framework based on local binary pattern (LBP) variants and evaluate its significance in recognizing posed and non-posed facial expressions. We focus on the parametric limitations of the LBP variants and investigate their effects on optimal FER. The size of the local neighborhood is an important parameter of the LBP technique for its extraction in images. To make the LBP adaptive, we exploit the granulometric information of the facial images to find the local neighborhood size for the extraction of center-symmetric LBP (CS-LBP) features. Our two-stage texture representations consist of an LBP variant and the adaptive CS-LBP features. Among the presented two-stage texture feature extractions, the combination of binarized statistical image features and adaptive CS-LBP features was found to achieve high FER rates. Evaluation of the adaptive texture features shows competitive performance compared with the non-adaptive features, and higher performance than other state-of-the-art approaches.
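A plain (non-adaptive) CS-LBP descriptor compares center-symmetric pixel pairs rather than each neighbor against the center, halving the code length. Below is a minimal NumPy sketch for an 8-neighbor circle approximated on the pixel grid; the fixed `radius` is exactly the parameter the paper makes adaptive via granulometry, and the threshold assumes intensities scaled to [0, 1].

```python
import numpy as np

def cs_lbp(img, radius=1, threshold=0.01):
    """Center-symmetric LBP: 8 neighbours -> 4 opposite pairs -> codes 0..15."""
    img = img.astype(np.float32)
    r = radius
    # Four center-symmetric neighbour pairs at offsets (dy, dx) and (-dy, -dx).
    pairs = [((0, r), (0, -r)), ((r, r), (-r, -r)),
             ((r, 0), (-r, 0)), ((r, -r), (-r, r))]
    h, w = img.shape
    codes = np.zeros((h - 2 * r, w - 2 * r), dtype=np.uint8)
    for bit, ((dy1, dx1), (dy2, dx2)) in enumerate(pairs):
        a = img[r + dy1:h - r + dy1, r + dx1:w - r + dx1]
        b = img[r + dy2:h - r + dy2, r + dx2:w - r + dx2]
        codes |= ((a - b > threshold).astype(np.uint8) << bit)
    return codes   # histogram of codes over a region forms the feature
```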
2016 23rd International Conference on Pattern Recognition (ICPR), 2016
Computing a background model from a given sequence of video frames is a prerequisite for many computer vision applications. Recently, this problem has been posed as learning a low-dimensional subspace from high-dimensional data, and many contemporary subspace segmentation methods have been proposed to overcome the limitations of methods developed for simple background scenes. Unfortunately, because they lack motion information and do not preserve the intrinsic geometric structure of video data, most existing algorithms do not recover a reliable low-rank component for complex scenes, such as those with a background largely occluded by foreground objects, redundant frames included to cope with intermittent object motion, sudden lighting variations, or camera jitter. To overcome these difficulties, we propose a motion-aware regularization of graphs on the low-rank component for video background modeling. We compute optical flow and use this information to build a motion-aware matrix. To learn the locality and similarity information within a video, we compute inter-frame and intra-frame graphs, which we use to preserve geometric information in the low-rank component. Finally, we use the linearized alternating direction method with parallel splitting and adaptive penalty to combine the preceding steps and recover the background model. Experimental evaluations on challenging sequences demonstrate promising results over state-of-the-art methods.
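The motion-aware matrix is derived from dense optical flow; a minimal sketch using OpenCV's Farnebäck flow is shown below. Treating low-magnitude pixels as reliable background evidence follows the abstract's description, but the threshold and the hard binary weighting are assumptions.

```python
import cv2
import numpy as np

def motion_mask(prev_gray, curr_gray, thresh=0.5):
    """Binary motion-aware weights from dense optical flow:
    pixels with small flow magnitude are treated as static background."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)          # per-pixel flow magnitude
    return (mag < thresh).astype(np.float32)    # 1 = static, 0 = moving
```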
2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019
Complete moving object detection plays a vital role in many computer vision applications, for instance depth estimation, scene understanding, object interaction, semantic segmentation, and accident detection and avoidance for vehicles moving on a highway. However, it becomes challenging in the presence of dynamic backgrounds, camouflage, bootstrapping, varying illumination conditions, and noise. Over the past decade, robust subspace learning based methods have addressed the moving object detection problem with excellent performance. However, the moving objects detected by these methods are incomplete, as they are unable to recover the occluded parts; indeed, complete or occlusion-free moving object detection is still challenging for these methods. In the current work, we address this challenge by proposing a conditional Generative Adversarial Network (cGAN) conditioned on non-occluded moving object pixels during training. It therefore learns the subspace spanned by the moving objects, covering all the dynamic variations and semantic information. At test time, our proposed Complete cGAN (CcGAN) is able to generate complete, occlusion-free moving objects in challenging conditions. Our proposed method is evaluated on the SABS benchmark dataset and compared with 14 state-of-the-art methods, including both robust subspace and deep learning based methods. Our experiments demonstrate the superiority of the proposed model over both types of existing methods.
Foreground segmentation is a critical problem in many artificial intelligence and computer vision based applications. However, robust foreground segmentation with high precision is still a challenging problem in complex scenes. Many of the existing algorithms process the input data in RGB space only, where foreground segmentation performance is likely degraded by challenges like shadows, color camouflage, illumination changes, out-of-range camera sensors, and bootstrapping. Cameras capturing RGBD data are active visual sensors that provide depth information along with the RGB of the given input images. Therefore, to address this challenging problem, we propose a foreground segmentation algorithm based on conditional generative adversarial networks using RGB and depth data. The goal of our proposed model is to perform robust foreground segmentation in various complex scenes with high accuracy. For this purpose, we train our GAN based CNN model with RGBD input data conditioned on ground-truth information in an adversarial fashion. During training, the model learns foreground segmentation on the basis of a cross-entropy loss and a Euclidean distance loss, discriminating between real and fake samples. During testing, RGBD input is given to the trained generator network, which performs robust foreground segmentation. Our proposed method is evaluated on two RGBD benchmark datasets, SBM-RGBD and MULTIVISION Kinect. Extensive experimental evaluations and comparative analysis of our proposed model against eleven existing methods confirm its superior performance.
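A minimal sketch of the generator objective described above, an adversarial (real vs. fake) term plus a Euclidean distance to the ground-truth mask, is shown below in PyTorch. The relative weight `lam` is an assumption, and mean squared error stands in for the squared Euclidean term.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()   # adversarial (real vs. fake) term
mse = nn.MSELoss()   # squared Euclidean distance to the ground-truth mask

def generator_loss(disc_fake_out, fake_mask, gt_mask, lam=100.0):
    """Generator objective: fool the discriminator while keeping the
    predicted segmentation close to the ground truth."""
    adv = bce(disc_fake_out, torch.ones_like(disc_fake_out))
    return adv + lam * mse(fake_mask, gt_mask)
```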
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021
Background-foreground separation and appearance generation is a fundamental step in many computer vision applications. Existing methods like Robust Subspace Learning (RSL) suffer performance degradation in the presence of challenges like bad weather, illumination variations, occlusion, dynamic backgrounds, and intermittent object motion. In the current work, we propose a more accurate deep neural network based model for background-foreground separation and complete appearance generation of the foreground objects. Our proposed model, the Guided Attention based Adversarial Model (GAAM), can efficiently extract pixel-level boundaries of the foreground objects for improved appearance generation. Unlike RSL methods, our model extracts binary information of foreground objects in the form of an attention map, which guides our generator network to segment the foreground objects from complex background information. A wide range of experiments performed on the benchmark CDnet2014 dataset demonstrates the excellent performance of our proposed model.
Communications in Computer and Information Science, 2020
Dynamic Background Modeling (DBM) is a crucial task in many computer vision based applications such as human activity analysis, traffic monitoring, surveillance, and security. DBM is extremely challenging in scenarios with illumination changes, camouflage, intermittent object motion, or shadows. In this study, we propose an end-to-end framework based on a Generative Adversarial Network that can generate dynamic background information for DBM in an unsupervised manner. Our proposed model handles DBM in the presence of the challenges mentioned above by generating data similar to the desired information. During training, the primary aim of the model is to learn all the dynamic changes of scene-specific background information; during testing, inverse mapping of the data to a latent-space representation generates dynamic backgrounds similar to the test data. In experimental evaluations on the SBM.net and SBI benchmark datasets, our proposed model outperformed eight existing DBM methods in many challenging scenarios.
2020 IEEE International Conference on Image Processing (ICIP), 2020
Dynamic Background Subtraction (BS) is a fundamental problem in many vision-based applications. BS in real, complex environments faces several challenging conditions such as illumination variations, shadows, camera jitter, and bad weather. In this study, we aim to address the challenges of BS in complex scenes by exploiting conditional least squares adversarial networks. During training, a scene-specific conditional least squares adversarial network with two additional regularizations, an L1 loss and a perceptual loss, is employed to learn the dynamic background variations. The model takes video frames conditioned on the corresponding ground truth as input to learn the dynamic changes in complex scenes. Testing is then performed on unseen video frames, for which the generator performs dynamic background subtraction. The proposed method, with its three loss terms (least squares adversarial loss, L1 loss, and perceptual loss), is evaluated on two benchmark datasets, CDnet2014 and BMC. The results show improved performance on both datasets compared with 10 existing state-of-the-art methods.
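The three-term generator objective can be sketched as follows in PyTorch: a least squares (LSGAN) adversarial term pushing the discriminator output towards 1, plus L1 and VGG-feature (perceptual) reconstruction terms. The loss weights and the choice of VGG layer are assumptions, not the paper's values, and inputs are assumed to be 3-channel, ImageNet-normalized tensors.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

# Frozen VGG-16 feature extractor for the perceptual term.
vgg_feats = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_feats.parameters():
    p.requires_grad = False

l1 = nn.L1Loss()

def generator_loss(disc_fake_out, fake_bg, real_bg, w_l1=100.0, w_per=10.0):
    """Least squares adversarial loss + L1 + perceptual reconstruction."""
    adv = torch.mean((disc_fake_out - 1.0) ** 2)       # LSGAN generator term
    per = l1(vgg_feats(fake_bg), vgg_feats(real_bg))   # perceptual distance
    return adv + w_l1 * l1(fake_bg, real_bg) + w_per * per
```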
Proceedings of the Symposium on Applied Computing, 2017
Background subtraction is a powerful mechanism for moving object detection. In addition to the well-known limitations posed by dynamic background scenes and abrupt lighting changes, jitter-induced motion is a great challenge when designing a robust background subtraction mechanism, and background subtraction becomes even more difficult in its presence. Although robust principal component analysis (RPCA) provides a potential solution for moving object detection, many existing RPCA methods for background subtraction still produce abundant false positives under these challenges. In this paper, we propose a background subtraction algorithm based on continuous learning of a low-rank matrix using image pixels represented on a Minimum Spanning Tree (MST). First, an efficient MST is constructed to estimate the minimax path among the spatial pixels of the input image. Then, a robust smoothing constraint is employed on these pixels for outlier removal. The low-rank matrix is updated using the MST-based observed pixels. Finally, we apply a Markov random field (MRF) to label the absolute value of the sparse error. Our experiments show that the proposed algorithm achieves promising results on dynamic background and camera jitter sequences compared with state-of-the-art methods.
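Building an MST over image pixels can be sketched with SciPy: a 4-connected grid graph weighted by intensity differences, reduced to its minimum spanning tree. This is a minimal illustration of the structure used to approximate minimax paths, not the paper's exact construction.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def pixel_mst(gray):
    """4-connected pixel graph with intensity-difference edge weights,
    reduced to its minimum spanning tree."""
    h, w = gray.shape
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, weights = [], [], []
    for (dy, dx) in [(0, 1), (1, 0)]:   # right and down neighbours
        a = idx[:h - dy, :w - dx].ravel()
        b = idx[dy:, dx:].ravel()
        d = np.abs(gray[:h - dy, :w - dx].ravel().astype(np.float64)
                   - gray[dy:, dx:].ravel().astype(np.float64)) + 1e-6
        rows.append(a); cols.append(b); weights.append(d)
    W = csr_matrix((np.concatenate(weights),
                    (np.concatenate(rows), np.concatenate(cols))),
                   shape=(h * w, h * w))
    return minimum_spanning_tree(W)     # sparse matrix holding MST edges
```

The small `1e-6` offset keeps zero-difference edges explicit, since zero-weight entries would vanish from the sparse representation.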
In this chapter, we present OR-PCA with its application to background/foreground segmentation. First, we give an overview of stochastic RPCA (also known as OR-PCA); then, background/foreground segmentation using stochastic RPCA is presented.
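As a rough illustration of the stochastic (online) flavor of RPCA, the sketch below processes one frame at a time: it alternately fits a low-rank coefficient and a sparse outlier vector, then refreshes the basis from accumulated statistics. This is a simplified sketch of the OR-PCA idea with illustrative regularization weights, not the chapter's exact algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage operator used for the sparse (L1) term."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def orpca_step(z, L, A, B, lam1=0.1, lam2=0.1, inner=10):
    """One stochastic RPCA update for a single vectorized frame z:
    z ~ L r + e, with low-rank basis L and sparse foreground e."""
    d, r_dim = L.shape
    e = np.zeros(d)
    for _ in range(inner):                       # alternate r / e fits
        r = np.linalg.solve(L.T @ L + lam1 * np.eye(r_dim), L.T @ (z - e))
        e = soft_threshold(z - L @ r, lam2)      # sparse foreground part
    A += np.outer(r, r)                          # accumulated statistics
    B += np.outer(z - e, r)
    L = B @ np.linalg.inv(A + lam1 * np.eye(r_dim))  # closed-form basis update
    return L, A, B, L @ r, e                     # background = L @ r
```

Thresholding the magnitude of `e` per pixel then yields the foreground segmentation for the frame.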