
Miguel Gonzalez-Mendoza

Tecnológico de Monterrey, Computer Sciences, Faculty Member


Miguel González Mendoza holds a PhD degree and a Postdoc in Artificial Intelligence from INSA and LAAS-CNRS Toulouse, France, obtained in 2003 and 2004 respectively. Since 2004 he has worked as a research professor at Tecnológico de Monterrey, Mexico. His research activities focus on machine learning, semantic web and big data applications, areas in which he has supervised 9 PhD and 21 MSc theses, published more than 100 peer-reviewed scientific publications, participated in and conducted more than 20 national (CONACYT-funded) and international (European-funded) research and innovation projects, and chaired 4 international congresses. President of the Mexican Society for Artificial Intelligence (2017-2018). Member of the Mexican National Research System (SNI) rank II (Jan 2016), member since 2006. Head of the Graduate Programs on Computer Sciences at Tecnológico de Monterrey, Mexico, 2005-2016.



Papers by Miguel Gonzalez-Mendoza

Experimental Large-Scale Jet Flames' Geometrical Features Extraction for Risk Management Using Infrared Images and Deep Learning Segmentation Methods

Jet fires are relatively small and have the least severe effects among the diverse fire accidents that can occur in industrial plants; however, they are usually involved in a process known as the domino effect, which leads to more severe events, such as explosions or the initiation of another fire, making the analysis of such fires an important part of risk analysis. This research work explores the application of deep learning models in an alternative approach that uses the semantic segmentation of jet fire flames to extract main geometrical attributes, relevant for fire risk assessments. A comparison is made between traditional image processing methods and some state-of-the-art deep learning models. It is found that the best approach is a deep learning architecture known as UNet, along with its two improvements, Attention UNet and UNet++. The models are then used to segment a group of vertical jet flames of varying pipe outlet diameters to extract their main geometrical characteris...

On the in vivo recognition of kidney stones using machine learning

arXiv (Cornell University), Jan 21, 2022

Determining the kidney stone type allows urologists to prescribe a treatment to avoid recurrence of renal lithiasis. An automated in-vivo image-based classification method would be an important step towards an immediate identification of the kidney stone type required as a first phase of the diagnosis. In the literature it was shown on ex-vivo data (i.e., in very controlled scene and image acquisition conditions) that an automated kidney stone classification is indeed feasible. This pilot study compares the kidney stone recognition performances of six shallow machine learning methods and three deep-learning architectures which were tested with in-vivo images of the four most frequent urinary calculi types acquired with an endoscope during standard ureteroscopies. This contribution details the database construction and the design of the tested kidney stone classifiers. Even if the best results were obtained by the Inception v3 architecture (weighted precision, recall and F1-score of 0.97, 0.98 and 0.97, respectively), it is also shown that choosing an appropriate colour space and texture features allows a shallow machine learning method to approach closely the performances of the most promising deep-learning methods (the XGBoost classifier led to weighted precision, recall and F1-score values of 0.96).
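The weighted precision, recall and F1-score quoted above can be illustrated with a minimal sketch in plain Python. It assumes the usual support-weighted average (each class's score weighted by its number of true samples); the function name `weighted_prf` and the toy labels are hypothetical, and this is not the authors' evaluation code.

```python
from collections import Counter

def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the classes in y_true."""
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    total = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n in support.items():
        p = tp[cls] / predicted[cls] if predicted[cls] else 0.0
        r = tp[cls] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total         # class weight = share of true samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1

# Toy example with three hypothetical stone classes labeled 0, 1 and 2.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
print(weighted_prf(y_true, y_pred))
```

With this weighting, the weighted recall equals the overall accuracy, which is one reason weighted precision and F1 are usually reported alongside it.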

Lightweight Low-Resolution Face Recognition for Surveillance Applications

2020 25th International Conference on Pattern Recognition (ICPR), 2021

A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021

The optimization of Binary Neural Networks (BNNs) relies on approximating the real-valued weights with their binarized representations. Current techniques for weight updating use the same approaches as traditional Neural Networks (NNs) with the extra requirement of using an approximation to the derivative of the sign function, as it is the Dirac delta function, for back-propagation; thus, efforts are focused on adapting full-precision techniques to work on BNNs. In the literature, only one previous effort has tackled the problem of directly training the BNNs with bit-flips by using the first raw moment estimate of the gradients and comparing it against a threshold for deciding when to flip a weight (Bop). In this paper, we take an approach parallel to Adam which also uses the second raw moment estimate to normalize the first raw moment before doing the comparison with the threshold; we call this method Bop2ndOrder. We present two versions of the proposed optimizer: a biased one and a bias-corrected one, each with its own applications. Also, we present a complete ablation study of the hyperparameter space, as well as the effect of using schedulers on each of them. For these studies, we tested the optimizer on CIFAR10 using the BinaryNet architecture. Also, we tested it on ImageNet 2012 with the XnorNet and BiRealNet architectures for accuracy. In both datasets our approach proved to converge faster, was robust to changes of the hyperparameters, and achieved better accuracy values.
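The flip rule described above can be sketched roughly as follows. This is a minimal illustration of the biased variant only, assuming exponential moving averages for both moments and a flip when the normalized moment exceeds the threshold and has the same sign as the weight (the sign condition follows the original Bop rule). The function name `bop2nd_step`, the hyperparameter names `gamma`, `sigma`, `tau`, `eps`, and their default values are hypothetical; the exact update in the paper may differ.

```python
import math

def bop2nd_step(weights, grads, m, v, gamma=0.1, sigma=0.01, tau=0.5, eps=1e-8):
    """One biased Bop2ndOrder-style step on a flat list of binary weights (+1/-1).

    m, v: running first and second raw moment estimates of the gradients.
    Returns the updated (weights, m, v) as new lists.
    """
    new_w, new_m, new_v = [], [], []
    for w, g, m_i, v_i in zip(weights, grads, m, v):
        m_i = (1 - gamma) * m_i + gamma * g          # first raw moment
        v_i = (1 - sigma) * v_i + sigma * g * g      # second raw moment
        score = m_i / (math.sqrt(v_i) + eps)         # Adam-like normalization
        # Flip only when the signal is strong enough and points "against" w.
        if abs(score) >= tau and (score > 0) == (w > 0):
            w = -w
        new_w.append(w)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_w, new_m, new_v

# Toy step: three binary weights, zero-initialized moments.
w, m, v = bop2nd_step([1.0, -1.0, 1.0], [1.0, 1.0, 0.0], [0.0] * 3, [0.0] * 3)
print(w)
```

In this toy step only the first weight flips: the second has a strong signal but of opposite sign to the weight, and the third receives no gradient.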

Mejoras al algoritmo de trayectorias densas para el reconocimiento de acciones en video

Research in Computing Science, 2018

English title: Improvements to the Dense Trajectories Algorithm for Action Recognition. Abstract. The ability to detect people and their actions autonomously and efficiently is one of the main objectives of intelligent video-protection systems. Action recognition is an important part of such systems. In this work, we explore several alternatives to improve both the accuracy and the execution time of one of the most widely used methods: dense trajectories. We propose replacing the Farneback optical flow algorithm with DisOF, which reduces the time needed to extract the dense trajectories by 50%. We also analyze how the noise caused by trajectories not associated with the object of interest can be reduced by estimating the anatomical points of the human body; in this way, more than half of the trajectories are discarded without a significant loss of accuracy. In addition, we explore incorporating the spatial relationships between trajectories through the spatial pyramid technique, finding that it is possible to improve the effectiveness of the results. Keywords: dense trajectories, action recognition, computer vision, pose estimation, spatial relationships.

Drowsy driver detection using wavelets and support vector machines

Proceedings of the First …, 2006

Surveillance temps–réel des systèmes Homme–Machine. Application à l’assistance à la conduite automobile

Foreword. The work presented in this thesis is the result of research carried out at the Laboratoire d'Analyse et d'Architecture des Systèmes of the Centre National de la Recherche Scientifique (LAAS-CNRS), in the research groups Diagnostic, Supervision et Conduite qualitatifs (DISCO) and Microsystèmes et Intégration des Systèmes (MIS). I thank Mr. Jean-Claude LAPRIE and Mr. Malik GHALLAB, former and current directors of LAAS, for welcoming me to this institution. I express all my gratitude and appreciation to my thesis supervisor, Mr. André TITLI, Professor at the Institut National des Sciences Appliquées de Toulouse (INSA) and scientific staff member at LAAS-CNRS, for his trust and friendship throughout these years of work. His very particular way of listening, discussing, and proposing clear and precise ideas wisely guided the steps of this young researcher, while giving me a high level of autonomy and a broader way of appreciating the world. I also thank him for keeping me on the path of knowledge-based representation, notably fuzzy inference systems, in which I saw less interest than in data-based representation. At LAAS, I express my gratitude to: Mr. Joseph AGUILAR-MARTIN, director of the European laboratory LEA-SICA and former director of the DISCO group, for always helping and supporting me in carrying out the stays in Catalonia and in attending various conferences. His great enthusiasm, good humor, and availability created a good atmosphere among our colleagues. Ms. Anne-Marie GUE, director of the MIS group, for always helping me carry out the project-related missions over these years.
Ms. Louise TRAVE-MASSUYES, director of the DISCO group, for likewise supporting me during the various missions and for inviting me to take part in the MONET network. I also thank her for her availability and her comments regarding diagnosis. Mr. Alfredo SANTANA DIAZ, for our collaborations on detecting the decline in driver vigilance and for introducing me to the world of wavelets. Mr. Sébastien BEYOU, DEA intern, for his contribution to this thesis work. My colleagues in the DISCO and MIS groups who, with their good humor, helped make my stay at LAAS more pleasant:

Boosting Self-supervised Video-based Human Action Recognition Through Knowledge Distillation

LatinX in AI at Neural Information Processing Systems Conference 2022, Nov 28, 2022

Deep learning architectures lead the state-of-the-art in several computer vision, natural language processing, and reinforcement learning tasks due to their ability to extract multi-level representations without human engineering. The model's performance is affected by the amount of labeled data used in training. Hence, novel approaches like self-supervised learning (SSL) extract the supervisory signal using unlabeled data. Although SSL reduces the dependency on human annotations, there are still two main drawbacks. First, high computational resources are required to train a large-scale model from scratch. Second, knowledge from an SSL model is commonly fine-tuned to a target model, which forces them to share the same parameters and architecture and makes it task-dependent. This paper explores how SSL benefits from knowledge distillation in constructing an efficient and non-task-dependent training framework. The experimental design compared the training process of an SSL algorithm trained from scratch and boosted by knowledge distillation in a teacher-student paradigm using the video-based human action recognition dataset UCF101. Results show that knowledge distillation accelerates the convergence of a network and removes the reliance on model architectures.

TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural Networks for Real-Time Handgun Detection in Video

arXiv (Cornell University), Nov 16, 2021

Timely handgun detection is a crucial problem to improve public safety; nevertheless, the effectiveness of many surveillance systems still depends on finite human attention. Much of the previous research on handgun detection is based on static image detectors, leaving aside valuable temporal information that could be used to improve object detection in videos. To improve the performance of surveillance systems, a real-time temporal handgun detection system should be built. Using Temporal Yolov5, an architecture based on Quasi-Recurrent Neural Networks, temporal information is extracted from video to improve the results of handgun detection. Moreover, two publicly available datasets are proposed, labeled with hands, guns, and phones: one containing 2199 static images to train static detectors, and another with 5960 frames of videos to train temporal modules. Additionally, we explore two temporal data augmentation techniques based on Mosaic and Mixup. The resulting systems are three temporal architectures: one focused on reducing inference time with a mAP 50:95 of 55.9, another on having a good balance between inference time and accuracy with a mAP 50:95 of 59, and a last one specialized in accuracy with a mAP 50:95 of 60.2. Temporal Yolov5 achieves real-time detection in the small and medium architectures. Moreover, it takes advantage of temporal features contained in videos to perform better than Yolov5 in our temporal dataset, making TYolov5 suitable for real-world applications. The source code is publicly available at https://github.com/MarioDuran/TYolov5.

Experimental large-scale jet flames’ geometrical features extraction for risk management using infrared images and deep learning segmentation methods

Journal of Loss Prevention in the Process Industries

Jet fires are relatively small and have the least severe effects among the diverse fire accidents that can occur in industrial plants; however, they are usually involved in a process known as the domino effect, which leads to more severe events, such as explosions or the initiation of another fire, making the analysis of such fires an important part of risk analysis. This research work explores the application of deep learning models in an alternative approach that uses the semantic segmentation of jet fire flames to extract the flame's main geometrical attributes, relevant for fire risk assessments. A comparison is made between traditional image processing methods and some state-of-the-art deep learning models. It is found that the best approach is a deep learning architecture known as UNet, along with its two improvements, Attention UNet and UNet++. The models are then used to segment a group of vertical jet flames of varying pipe outlet diameters to extract their main geometrical characteristics. Attention UNet obtained the best general performance in the approximation of both height and area of the flames, while also showing a statistically significant difference between it and UNet++. UNet obtained the best overall performance for the approximation of the lift-off distances; however, there is not enough data to prove a statistically significant difference between Attention UNet and UNet++. The only instance where UNet++ outperformed the other models was while obtaining the lift-off distances of the jet flames with 0.01275 m pipe outlet diameter. In general, the explored models show good agreement between the experimental and predicted values for relatively large turbulent propane jet flames, released in sonic and subsonic regimes, thus making these radiation zone segmentation models a suitable approach for different jet flame risk management scenarios.
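To make the geometrical attributes mentioned above concrete, here is a minimal sketch of how height, area and lift-off distance could be read from a binary segmentation mask of a vertical jet flame. The conventions are assumptions made for illustration, not the paper's procedure: the mask is a list of rows with 1 marking flame pixels, row indices increase downward, the pipe outlet sits at a known row below the flame, and `pixel_size_m` (metres per pixel) is a hypothetical calibration value.

```python
def flame_geometry(mask, outlet_row, pixel_size_m):
    """Height, area and lift-off distance of a vertical flame from a binary mask.

    mask: list of rows, 1 = flame pixel, 0 = background (row 0 is the image top).
    outlet_row: row index of the pipe outlet (assumed below the flame).
    pixel_size_m: assumed side length of one pixel in metres.
    """
    flame_rows = [i for i, row in enumerate(mask) if any(row)]
    if not flame_rows:
        return None  # no flame pixels were segmented
    top, bottom = min(flame_rows), max(flame_rows)
    n_pixels = sum(sum(1 for px in row if px) for row in mask)
    return {
        # vertical extent of the segmented flame
        "height_m": (bottom - top + 1) * pixel_size_m,
        # flame pixel count times the area of one pixel
        "area_m2": n_pixels * pixel_size_m ** 2,
        # row gap between the pipe outlet and the flame base
        "lift_off_m": (outlet_row - bottom) * pixel_size_m,
    }

# Toy 6x4 mask with the outlet on the last row and 0.5 m pixels.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(flame_geometry(mask, outlet_row=5, pixel_size_m=0.5))
```

A real pipeline would also need camera calibration and a convention for where pixel edges fall; the row-difference convention used here is the simplest choice.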

Action recognition by key trajectories

Pattern Analysis and Applications, 2022

Human action recognition is an active field of research that intends to explain what a subject is doing in an input video. Cutting-edge approaches are based on deep learning architectures. Recent research, however, indicates that hand-crafted features are complementary and can help boost classification accuracy when combined. We introduce the key trajectories approach, which is based on the popular hand-crafted method of improved dense trajectories. Our work explores how pose estimation can be used to find meaningful key points to reduce computational time and undesired noise, and to guarantee a stable frame processing rate. Furthermore, we tested how feature-tracking behaves with dense inverse search and with a frame to frame subject key point estimati...

Comparing Machine Learning Based Segmentation Models on Jet Fire Radiation Zones

Advances in Computational Intelligence, 2021

Risk assessment is relevant in any workplace; however, there is a degree of unpredictability when dealing with flammable or hazardous materials, so that detection of fire accidents by itself may not be enough. An example of this is the impingement of jet fires, where the heat fluxes of the flame could reach nearby equipment and dramatically increase the probability of a domino effect with catastrophic results. Because of this, the characterization of such fire accidents is important from a risk management point of view. One such characterization would be the segmentation of different radiation zones within the flame, so this paper presents exploratory research regarding several traditional computer vision and Deep Learning segmentation approaches to solving this specific problem. A data set of propane jet fires is used to train and evaluate the different approaches and, given the difference in the distribution of the zones and background of the images, different loss functions that seek to alleviate data imbalance are also explored. Additionally, different metrics are correlated to a manual ranking performed by experts to make an evaluation that closely resembles the experts' criteria. The Hausdorff Distance and Adjusted Rand Index were the metrics with the highest correlation, and the best results were obtained from the UNet architecture with a Weighted Cross-Entropy Loss. These results can be used in future research to extract more geometric information from the segmentation masks or could even be implemented on other types of fire accidents.

A Study on the Performance of Unconstrained Very Low Resolution Face Recognition: Analyzing Current Trends and New Research Directions

IEEE Access, 2021

In the past decade, research in the face recognition area has advanced tremendously, particularly in uncontrolled scenarios (face recognition in the wild). This advancement has been achieved partly due to the massive popularity and effectiveness of deep convolutional neural networks and the availability of larger unconstrained datasets. However, several face recognition challenges remain in the context of very low resolution homogeneous (same domain) and heterogeneous (different domain) face recognition. In this survey, we study the seminal and novel methods to tackle the very low resolution face recognition problem and provide an in-depth analysis of their design, effectiveness, and efficiency for a real-time surveillance application. Furthermore, we analyze the advantage of employing deep learning convolutional neural networks, while presenting future research directions for effective deep learning network design in this context.

ShuffleFaceNet: A Lightweight Face Architecture for Efficient and Highly-Accurate Face Recognition

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019

The recent success of convolutional neural networks has led to the development of a variety of new effective and efficient architectures. However, few of them have been designed for the specific case of face recognition. Inspired by the state-of-the-art ShuffleNetV2 model, a lightweight face architecture is presented in this paper. The proposal, named ShuffleFaceNet, introduces significant modifications in order to improve face recognition accuracy. First, the Global Average Pooling layer is replaced by a Global Depth-wise Convolution layer, and Parametric Rectified Linear Unit is used as a non-linear activation function. Under the same experimental conditions, ShuffleFaceNet achieves significantly higher accuracy than the original ShuffleNetV2, maintaining the same speed and compact storage. In addition, extensive experiments conducted on three challenging benchmark face datasets show that our proposal improves not only state-of-the-art lightweight models but also very deep face recognition models.

Recopilación de bases de datos de estacionamientos para aplicaciones en visión computacional

Research in Computing Science, 2018

Abstract. A parking lot is a very well-structured environment on which surveillance systems have usually focused. However, prior knowledge of the parking lot's structure is often ignored by researchers who use traditional datasets to train their algorithms. Even if these algorithms are correct and complete, the models trained or benchmarked on this kind of data tend to fall far behind or to be misleading in nature. In this article we propose a task-based approach, in which we carefully break down the complex task of detecting behaviors in parking lots into much more tractable parts. Then, for each part, we propose a series of datasets currently available in the literature that can help master the problem, each from a different perspective. One of the main references for this article has been the work of [5], in which a much broader approach to autonomous driving was taken. Keywords: computer vision in parking lots, outdoor object detection, outdoor object tracking, vehicle tracking, parking lot datasets, parking lots.

Deep Learning System for Vehicular Re-Routing and Congestion Avoidance

Applied Sciences, 2019

Delays in transportation due to congestion generated by public and private transportation are common in many urban areas of the world. To make transportation systems more efficient, intelligent transportation systems (ITS) are currently being developed. One of the objectives of ITS is to detect congested areas and redirect vehicles away from them. However, most existing approaches only react once the traffic jam has occurred and, therefore, the delay has already spread to more areas of the traffic network. We propose a vehicle redirection system to avoid congestion that uses a model based on deep learning to predict the future state of the traffic network. The model uses the information obtained from the previous step to determine the zones with possible congestion, and redirects the vehicles that are about to cross them. Alternative routes are generated using the entropy-balanced k Shortest Path algorithm (EBkSP). The proposal uses information obtained in real time by a set of prob...

The Traffic Status and Pollutant Status Ontologies for the Smart City Domain

Advances in Computational Intelligence, 2018

Vocabulary must be well defined to promote syntactical and semantic interoperability of cloud-based IoT (Internet of Things) architectures in order to develop applications for Smart City environments. Ontologies are used to represent knowledge within a domain; through them, it is possible to define and classify things, actions, features and relations, among other aspects. This work describes the development of two ontologies, one for traffic status and one for pollution status.

Robust Parking Block Segmentation from a Surveillance Camera Perspective

Applied Sciences, 2020

Parking block regions host dangerous behaviors that can be detected from a surveillance camera perspective. However, these regions are often occluded, subject to ground bumpiness or steep slopes, and thus they are hard to segment. Firstly, the paper proposes a pyramidal solution that takes advantage of satellite views of the same scene, based on a deep Convolutional Neural Network (CNN). Training a CNN from the surveillance camera perspective is nearly impossible due to the combinatorial explosion generated by multiple points of view. However, CNNs have shown great promise in previous work on satellite images. Secondly, even though there are many datasets for occupancy detection in parking lots, none of them were designed to tackle the parking block segmentation problem directly. Given the lack of a suitable dataset, we also propose APKLOT, a dataset of roughly 7000 polygons for segmenting parking blocks from the satellite perspective and from the camera perspective. Moreover, our meth...

Predicting Soccer Results Through Sentiment Analysis: A Graph Theory Approach

More than four out of 10 sports fans consider themselves soccer fans, making the game the world's most popular sport. Sports are season-based and constantly changing over time; likewise, statistics vary according to the sport and league. Understanding sports communities in Social Networks and identifying fans' expertise is a key indicator for soccer prediction. This research proposes a Machine Learning Model using polarity on a dataset of 3,000 tweets taken during the last game week of the English Premier League 19/20 season. The end goal is to achieve a flexible mechanism, which automates the process of gathering the corpus of tweets before a match, and classifies its sentiment to find the probability of a winning game by evaluating the network centrality. Keywords: Graph theory, Machine learning, Sentiment analysis, Social networks, Sports analytics. 1.1 Review on Social Network Analysis: Spread Influence. Some research studies, such as the one developed by Yan [16], evaluate the influence of users, represented as nodes, on other entities under the Social Network

Automatic Detection of Social Isolation based on Human behavior Analysis

Computación y Sistemas

Social isolation is a problem that is accentuated in the stage of old age. This condition puts the physical and mental integrity of older adults at risk. This paper presents a predictive model for the automatic detection of social isolation in older adults. The predictive model was implemented in a mobile application that monitors communication and mobility activities performed by an older adult. The mobile application was also generated for a caregiver who is responsible for receiving notifications about the specific level of social isolation of the older adult. The predictive model was evaluated using an experimental group of older adults.

Experimental Large-Scale Jet Flames' Geometrical Features Extraction for Risk Management Using Infrared Images and Deep Learning Segmentation Methods

Jet fires are relatively small and have the least severe effects among the diverse fire accidents... more Jet fires are relatively small and have the least severe effects among the diverse fire accidents that can occur in industrial plants; however, they are usually involved in a process known as the domino effect, that leads to more severe events, such as explosions or the initiation of another fire, making the analysis of such fires an important part of risk analysis. This research work explores the application of deep learning models in an alternative approach that uses the semantic segmentation of jet fires flames to extract main geometrical attributes, relevant for fire risk assessments. A comparison is made between traditional image processing methods and some state-of-the-art deep learning models. It is found that the best approach is a deep learning architecture known as UNet, along with its two improvements, Attention UNet and UNet++. The models are then used to segment a group of vertical jet flames of varying pipe outlet diameters to extract their main geometrical characteris...

On the in vivo recognition of kidney stones using machine learning

arXiv (Cornell University), Jan 21, 2022

Determining the kidney stones type allows urologists to prescribe a treatment to avoid recurrence... more Determining the kidney stones type allows urologists to prescribe a treatment to avoid recurrence of renal lithiasis. An automated in-vivo image-based classification method would be an important step towards an immediate identification of the kidney stone type required as a first phase of the diagnosis. In the literature it was shown on ex-vivo data (i.e., in very controlled scene and image acquisition conditions) that an automated kidney stone classification is indeed feasible. This pilot study compares the kidney stone recognition performances of six shallow machine learning methods and three deep-learning architectures which were tested with in-vivo images of the four most frequent urinary calculi types acquired with an endoscope during standard ureteroscopies. This contribution details the database construction and the design of the tested kidney stones classifiers. Even if the best results were obtained by the Inception v3 architecture (weighted precision, recall and F1-score of 0.97, 0.98 and 0.97, respectively), it is also shown that choosing an appropriate colour space and texture features allows a shallow machine learning method to approach closely the performances of the most promising deep-learning methods (the XGBoost classifier led to weighted precision, recall and F1-score values of 0.96).

Lightweight Low-Resolution Face Recognition for Surveillance Applications

2020 25th International Conference on Pattern Recognition (ICPR), 2021

A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021

The optimization of Binary Neural Networks (BNNs) relies on approximating the real-valued weights with their binarized representations. Current techniques for weight updating use the same approaches as traditional Neural Networks (NNs), with the extra requirement of using an approximation to the derivative of the sign function (as it is the Dirac delta function) for back-propagation; thus, efforts are focused on adapting full-precision techniques to work on BNNs. In the literature, only one previous effort has tackled the problem of directly training BNNs with bit-flips, by using the first raw moment estimate of the gradients and comparing it against a threshold to decide when to flip a weight (Bop). In this paper, we take an approach parallel to Adam, which also uses the second raw moment estimate to normalize the first raw moment before doing the comparison with the threshold; we call this method Bop2ndOrder. We present two versions of the proposed optimizer: a biased one and a bias-corrected one, each with its own applications. We also present a complete ablation study of the hyperparameter space, as well as the effect of using schedulers on each hyperparameter. For these studies, we tested the optimizer on CIFAR10 using the BinaryNet architecture. We also tested it on ImageNet 2012 with the XnorNet and BiRealNet architectures for accuracy. On both datasets our approach converged faster, was robust to changes in the hyperparameters, and achieved better accuracy.
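The flip rule described in this abstract can be illustrated with a small sketch. It follows only what the abstract states (a first raw moment normalized by the square root of a second raw moment, then compared with a threshold) and covers the biased variant only. The hyperparameter names `gamma`, `sigma`, `tau` and `eps`, their default values, and the sign-agreement condition (carried over from the original Bop rule as commonly described) are assumptions, not details taken from the paper.

```python
import math


def bop2nd_step(weights, grads, m, v, gamma=0.1, sigma=0.01, tau=0.5, eps=1e-10):
    """One biased Bop2ndOrder-style update on lists of floats.

    weights: binary weights (+1.0 or -1.0)
    grads:   gradients of the loss with respect to each weight
    m, v:    running first and second raw moment estimates
    Returns the updated (weights, m, v).
    """
    new_w, new_m, new_v = [], [], []
    for w, g, m_i, v_i in zip(weights, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = (1 - gamma) * m_i + gamma * g
        v_i = (1 - sigma) * v_i + sigma * g * g
        # Normalize the first moment by the second, as Adam does.
        ratio = m_i / (math.sqrt(v_i) + eps)
        # Assumed flip condition: the normalized signal exceeds the
        # threshold and has the same sign as the current weight.
        flip = abs(ratio) > tau and (ratio > 0) == (w > 0)
        new_w.append(-w if flip else w)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_w, new_m, new_v
```

Because the weights are flipped directly, no latent real-valued weights are kept; the only optimizer state is the pair of moment estimates.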

Mejoras al algoritmo de trayectorias densas para el reconocimiento de acciones en video

Research in Computing Science, 2018

Improvements to the Dense Trajectories Algorithm for Action Recognition. Abstract. The ability to detect people and their actions autonomously and efficiently is one of the main objectives of intelligent video-protection systems, and action recognition is an important part of it. In this work, we explore several alternatives to improve both the accuracy and the execution time of one of the most widely used methods: dense trajectories. We propose replacing the Farneback optical flow algorithm with DisOF, which reduces the time needed to extract the trajectories by 50%. We also analyze how the noise caused by trajectories not associated with the object of interest can be reduced by estimating the anatomical points of the human body; in this way, more than half of the trajectories were discarded without a significant loss of accuracy. In addition, we explore incorporating the spatial relationships between trajectories through the spatial pyramid technique, finding that it is possible to improve the effectiveness of the results. Keywords: dense trajectories, action recognition, computer vision, pose estimation, spatial relationships.

Drowsy driver detection using wavelets and support vector machines

Proceedings of the First …, 2006

Surveillance temps–réel des systèmes Homme–Machine. Application à l’assistance à la conduite automobile

Foreword. The work presented in this thesis is the result of research carried out at the Laboratoire d'Analyse et d'Architecture des Systèmes of the Centre National de la Recherche Scientifique, LAAS-CNRS, in the research groups Diagnostic, Supervision et Conduite qualitatifs (DISCO) and Microsystèmes et Intégration des Systèmes (MIS). I thank Mr. Jean-Claude LAPRIE and Mr. Malik GHALLAB, former and current directors of LAAS, for welcoming me to this institution. I express all my gratitude and appreciation to my thesis advisor, Mr. André TITLI, Professor at the Institut National des Sciences Appliquées de Toulouse, INSA, and scientific staff member at LAAS-CNRS, for his trust and friendship throughout these years of work. His very particular way of listening, discussing, and proposing clear and precise ideas wisely guided the steps of this young researcher, while giving me a great degree of autonomy and a broader way of appreciating the world. I also thank him for keeping me on the path of knowledge-based representation, in particular fuzzy inference systems, in which I saw less interest than in data-driven representation. At LAAS I express my gratitude to: Mr. Joseph AGUILAR-MARTIN, director of the European laboratory LEA-SICA and former head of the DISCO group, for always helping and supporting me in carrying out my stays in Catalonia and in attending various conferences. His great enthusiasm, good humor, and availability created a good atmosphere among our colleagues. Ms. Anne-Marie GUE, head of the MIS group, for always helping me carry out the project missions over these years.
Ms. Louise TRAVE-MASSUYES, head of the DISCO group, for likewise supporting me during the various missions and for inviting me to take part in the MONET network. I also thank her for her availability and her comments regarding diagnosis. Mr. Alfredo SANTANA DIAZ, for our collaborations on detecting the decline of driver vigilance and for introducing me to the world of wavelets. Mr. Sébastien BEYOU, DEA intern, for his contribution to this thesis work. My colleagues in the DISCO and MIS groups who, with their good humor, helped make my stay at LAAS more pleasant:

Boosting Self-supervised Video-based Human Action Recognition Through Knowledge Distillation

LatinX in AI at Neural Information Processing Systems Conference 2022, Nov 28, 2022

Deep learning architectures lead the state-of-the-art in several computer vision, natural language processing, and reinforcement learning tasks due to their ability to extract multi-level representations without human engineering. The model's performance is affected by the amount of labeled data used in training. Hence, novel approaches like self-supervised learning (SSL) extract the supervisory signal using unlabeled data. Although SSL reduces the dependency on human annotations, there are still two main drawbacks. First, high computational resources are required to train a large-scale model from scratch. Second, knowledge from an SSL model is commonly transferred to a target model by fine-tuning, which forces them to share the same parameters and architecture and makes the transfer task-dependent. This paper explores how SSL benefits from knowledge distillation in constructing an efficient and non-task-dependent training framework. The experimental design compared the training process of an SSL algorithm trained from scratch and boosted by knowledge distillation in a teacher-student paradigm using the video-based human action recognition dataset UCF101. Results show that knowledge distillation accelerates the convergence of a network and removes the reliance on model architectures.

TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural Networks for Real-Time Handgun Detection in Video

arXiv (Cornell University), Nov 16, 2021

Timely handgun detection is a crucial problem to improve public safety; nevertheless, the effectiveness of many surveillance systems still depends on finite human attention. Much of the previous research on handgun detection is based on static image detectors, leaving aside valuable temporal information that could be used to improve object detection in videos. To improve the performance of surveillance systems, a real-time temporal handgun detection system should be built. Using Temporal Yolov5, an architecture based on Quasi-Recurrent Neural Networks, temporal information is extracted from video to improve the results of handgun detection. Moreover, two publicly available datasets are proposed, labeled with hands, guns, and phones: one containing 2199 static images to train static detectors, and another with 5960 frames of videos to train temporal modules. Additionally, we explore two temporal data augmentation techniques based on Mosaic and Mixup. The resulting systems are three temporal architectures: one focused on reducing inference time with a mAP 50:95 of 55.9, another having a good balance between inference time and accuracy with a mAP 50:95 of 59, and a last one specialized in accuracy with a mAP 50:95 of 60.2. Temporal Yolov5 achieves real-time detection in the small and medium architectures. Moreover, it takes advantage of temporal features contained in videos to perform better than Yolov5 on our temporal dataset, making TYolov5 suitable for real-world applications. The source code is publicly available at https://github.com/MarioDuran/TYolov5.

Experimental large-scale jet flames’ geometrical features extraction for risk management using infrared images and deep learning segmentation methods

Journal of Loss Prevention in the Process Industries

Jet fires are relatively small and have the least severe effects among the diverse fire accidents that can occur in industrial plants; however, they are usually involved in a process known as the domino effect, which leads to more severe events, such as explosions or the initiation of another fire, making the analysis of such fires an important part of risk analysis. This research work explores the application of deep learning models in an alternative approach that uses the semantic segmentation of jet fire flames to extract the flame's main geometrical attributes, relevant for fire risk assessments. A comparison is made between traditional image processing methods and some state-of-the-art deep learning models. It is found that the best approach is a deep learning architecture known as UNet, along with its two improvements, Attention UNet and UNet++. The models are then used to segment a group of vertical jet flames of varying pipe outlet diameters to extract their main geometrical characteristics. Attention UNet obtained the best general performance in the approximation of both height and area of the flames, while also showing a statistically significant difference between it and UNet++. UNet obtained the best overall performance for the approximation of the lift-off distances; however, there is not enough data to prove a statistically significant difference between Attention UNet and UNet++. The only instance where UNet++ outperformed the other models was in obtaining the lift-off distances of the jet flames with 0.01275 m pipe outlet diameter. In general, the explored models show good agreement between the experimental and predicted values for relatively large turbulent propane jet flames released in sonic and subsonic regimes, thus making these radiation zone segmentation models a suitable approach for different jet flame risk management scenarios.
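To make the link between a segmentation mask and the three attributes named above (height, area, lift-off distance) concrete, here is a minimal sketch in plain Python. It is not the paper's extraction procedure. It assumes a binary mask of a vertical flame with row 0 at the top of the image, a known pipe outlet row below the flame, and a uniform pixel scale; the names `flame_geometry`, `metres_per_pixel` and `outlet_row` are hypothetical.

```python
def flame_geometry(mask, metres_per_pixel, outlet_row):
    """Estimate flame height, area and lift-off distance from a binary mask.

    mask:             list of rows, each a list of 0/1 flame labels
    metres_per_pixel: assumed uniform image scale
    outlet_row:       assumed row index of the pipe outlet (below the flame)
    Returns a dict of measurements, or None if no flame pixel is present.
    """
    flame_rows = [r for r, row in enumerate(mask) if any(row)]
    if not flame_rows:
        return None
    top, bottom = min(flame_rows), max(flame_rows)
    pixel_count = sum(1 for row in mask for p in row if p)
    return {
        # Vertical extent of the segmented flame.
        "height_m": (bottom - top + 1) * metres_per_pixel,
        # Projected flame area in the image plane.
        "area_m2": pixel_count * metres_per_pixel ** 2,
        # Gap between the outlet row and the lowest flame row.
        "lift_off_m": max(outlet_row - bottom, 0) * metres_per_pixel,
    }
```

With a real infrared image, the scale and outlet position would have to come from the experimental setup; the sketch only shows how each attribute follows from the mask once those are known.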

Action recognition by key trajectories

Pattern Analysis and Applications, 2022

Human action recognition is an active field of research that intends to explain what a subject is doing in an input video. Cutting-edge approaches are based on deep learning architectures. Recent research, however, indicates that hand-crafted features are complementary and can help boost classification accuracy when combined. We introduce the key trajectories approach, which is based on the popular hand-crafted method improved dense trajectories. Our work explores how pose estimation can be used to find meaningful key points to reduce computational time and undesired noise, and to guarantee a stable frame processing rate. Furthermore, we tested how feature-tracking behaves with dense inverse search and with a frame to frame subject key point estimati...

Comparing Machine Learning Based Segmentation Models on Jet Fire Radiation Zones

Advances in Computational Intelligence, 2021

Risk assessment is relevant in any workplace; however, there is a degree of unpredictability when dealing with flammable or hazardous materials, so that detection of fire accidents by itself may not be enough. An example of this is the impingement of jet fires, where the heat fluxes of the flame could reach nearby equipment and dramatically increase the probability of a domino effect with catastrophic results. Because of this, the characterization of such fire accidents is important from a risk management point of view. One such characterization would be the segmentation of different radiation zones within the flame, so this paper presents exploratory research regarding several traditional computer vision and Deep Learning segmentation approaches to solving this specific problem. A data set of propane jet fires is used to train and evaluate the different approaches and, given the difference in the distribution of the zones and background of the images, different loss functions that seek to alleviate data imbalance are also explored. Additionally, different metrics are correlated to a manual ranking performed by experts to make an evaluation that closely resembles the experts' criteria. The Hausdorff Distance and Adjusted Rand Index were the metrics with the highest correlation, and the best results were obtained from the UNet architecture with a Weighted Cross-Entropy Loss. These results can be used in future research to extract more geometric information from the segmentation masks or could even be implemented on other types of fire accidents.
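As an illustration of how a weighted cross-entropy loss counters class imbalance, the sketch below scales each pixel's log-loss by a weight attached to its true class, so rare radiation zones contribute more than the dominant background. This is a generic formulation in plain Python, not the paper's training code; the function name, the class weights, and the choice to normalize by the sum of the applied weights are assumptions.

```python
import math


def weighted_cross_entropy(probs, targets, class_weights):
    """Weighted cross-entropy over a flat list of pixels.

    probs:         per-pixel lists of predicted class probabilities
    targets:       per-pixel true class indices
    class_weights: one weight per class (larger for rarer classes)
    """
    total = 0.0
    weight_sum = 0.0
    for p, t in zip(probs, targets):
        w = class_weights[t]
        total += -w * math.log(p[t])  # weighted negative log-likelihood
        weight_sum += w
    # Normalizing by the applied weights keeps the loss scale comparable
    # across batches with different class mixes (an assumed design choice).
    return total / weight_sum
```

With weights `[1.0, 3.0]`, an error on a class-1 pixel counts three times as much as the same error on a class-0 pixel.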

A Study on the Performance of Unconstrained Very Low Resolution Face Recognition: Analyzing Current Trends and New Research Directions

IEEE Access, 2021

In the past decade, research in the face recognition area has advanced tremendously, particularly in uncontrolled scenarios (face recognition in the wild). This advancement has been achieved partly due to the massive popularity and effectiveness of deep convolutional neural networks and the availability of larger unconstrained datasets. However, several face recognition challenges remain in the context of very low resolution homogeneous (same domain) and heterogeneous (different domain) face recognition. In this survey, we study the seminal and novel methods to tackle the very low resolution face recognition problem and provide an in-depth analysis of their design, effectiveness, and efficiency for a real-time surveillance application. Furthermore, we analyze the advantage of employing deep learning convolutional neural networks, while presenting future research directions for effective deep learning network design in this context.

ShuffleFaceNet: A Lightweight Face Architecture for Efficient and Highly-Accurate Face Recognition

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019

The recent success of convolutional neural networks has led to the development of a variety of new effective and efficient architectures. However, few of them have been designed for the specific case of face recognition. Inspired by the state-of-the-art ShuffleNetV2 model, a lightweight face architecture is presented in this paper. The proposal, named ShuffleFaceNet, introduces significant modifications in order to improve face recognition accuracy. First, the Global Average Pooling layer is replaced by a Global Depth-wise Convolution layer, and the Parametric Rectified Linear Unit is used as the non-linear activation function. Under the same experimental conditions, ShuffleFaceNet achieves significantly higher accuracy than the original ShuffleNetV2, while maintaining the same speed and compact storage. In addition, extensive experiments conducted on three challenging benchmark face datasets show that our proposal improves on not only state-of-the-art lightweight models but also very deep face recognition models.

Recopilación de bases de datos de estacionamientos para aplicaciones en visión computacional

Research in Computing Science, 2018

Abstract. A parking lot is a very well-structured environment on which surveillance systems have usually focused. However, prior knowledge of the parking lot's structure is often ignored by researchers who use traditional datasets to train their algorithms. Even if these algorithms are sound and complete, the models trained or compared using this kind of data tend to fall far behind or to be misleading in nature. In this article we propose a task-based approach in which we carefully break down the complex task of detecting behaviors in parking lots into much more tractable parts. Then, for each part, we propose a series of datasets currently available in the literature that can help master the problem, each from a different perspective. One of the main references for this article has been the work of [5], in which a much broader approach to autonomous driving was taken. Keywords: computer vision in parking lots, outdoor object detection, outdoor object tracking, vehicle tracking, parking lot datasets, parking lots.

Deep Learning System for Vehicular Re-Routing and Congestion Avoidance

Applied Sciences, 2019

Delays in transportation due to congestion generated by public and private transportation are common in many urban areas of the world. To make transportation systems more efficient, intelligent transportation systems (ITS) are currently being developed. One of the objectives of ITS is to detect congested areas and redirect vehicles away from them. However, most existing approaches only react once the traffic jam has occurred and, therefore, the delay has already spread to more areas of the traffic network. We propose a vehicle redirection system to avoid congestion that uses a model based on deep learning to predict the future state of the traffic network. The model uses the information obtained from the previous step to determine the zones with possible congestion, and redirects the vehicles that are about to cross them. Alternative routes are generated using the entropy-balanced k Shortest Path algorithm (EBkSP). The proposal uses information obtained in real time by a set of prob...

The Traffic Status and Pollutant Status Ontologies for the Smart City Domain

Advances in Computational Intelligence, 2018

Vocabulary must be well defined to promote syntactic and semantic interoperability of cloud-based IoT (Internet of Things) architectures in order to develop applications for Smart City environments. Ontologies are used to represent knowledge within a domain; through them, it is possible to define and classify things, actions, features and relations, among other aspects. This work describes the development of two ontologies, one for traffic status and one for pollution status.

Robust Parking Block Segmentation from a Surveillance Camera Perspective

Applied Sciences, 2020

Parking block regions host dangerous behaviors that can be detected from a surveillance camera perspective. However, these regions are often occluded, subject to ground bumpiness or steep slopes, and thus they are hard to segment. Firstly, the paper proposes a pyramidal solution that takes advantage of satellite views of the same scene, based on a deep Convolutional Neural Network (CNN). Training a CNN from the surveillance camera perspective is nearly impossible due to the combinatorial explosion generated by multiple points of view. However, CNNs showed great promise in previous works on satellite images. Secondly, even though there are many datasets for occupancy detection in parking lots, none of them were designed to tackle the parking block segmentation problem directly. Given the lack of a suitable dataset, we also propose APKLOT, a dataset of roughly 7000 polygons for segmenting parking blocks from the satellite perspective and from the camera perspective. Moreover, our meth...

Predicting Soccer Results Through Sentiment Analysis: A Graph Theory Approach

More than four out of 10 sports fans consider themselves soccer fans, making the game the world's most popular sport. Sports are season-based and change constantly over time, and statistics vary according to the sport and league. Understanding sports communities in social networks and identifying fans' expertise is a key indicator for soccer prediction. This research proposes a machine learning model using polarity on a dataset of 3,000 tweets taken during the last game week of the English Premier League 19/20 season. The end goal is to achieve a flexible mechanism that automates the process of gathering the corpus of tweets before a match and classifies its sentiment to find the probability of a winning game by evaluating the network centrality. Keywords: Graph theory • Machine learning • Sentiment analysis • Social networks • Sports analytics 1.1 Review on Social Network Analysis: Spread Influence Some research studies, such as the one developed by Yan [16], evaluate the influence of users, represented as nodes, on other entities under the Social Network

Automatic Detection of Social Isolation based on Human behavior Analysis

Computación y Sistemas

Social isolation is a problem that is accentuated in old age. This condition puts the physical and mental integrity of older adults at risk. This paper presents a predictive model for the automatic detection of social isolation in older adults. The predictive model was implemented in a mobile application that monitors communication and mobility activities performed by an older adult. The mobile application was also developed for a caregiver, who is responsible for receiving notifications about the specific level of social isolation of the older adult. The predictive model was evaluated using an experimental group of older adults.