Papers by Javier Gonzalez-Jimenez
Lecture Notes in Networks and Systems, Nov 19, 2022
This work presents GadenTools, a toolkit designed to ease the development and integration of mobile robotic olfaction applications by enabling convenient and user-friendly access to Gaden's realistic gas dispersion simulations. It is based on an easy-to-use Python API and includes an extensive tutorial developed with Jupyter Notebook and Google Colab technologies. A detailed set of examples illustrates aspects ranging from basic access to sensory data and the generation of ground-truth images to the more advanced implementation of plume-tracking algorithms, all in an online web editor with no installation requirements. All the resources, including the source code, are made available in an online open repository.
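As a rough illustration of the kind of access such an API provides, the sketch below shows point queries against a pre-computed dispersion log; the Simulation class and its method names are invented placeholders, not the actual GadenTools interface.

```python
# Hypothetical sketch of the point queries a plume-tracking loop makes against
# a pre-computed gas dispersion log. The Simulation class and its methods are
# invented placeholders, not the real GadenTools API.
import numpy as np

class Simulation:
    """Toy stand-in: maps (timestep, grid cell) -> (concentration, wind)."""
    def __init__(self, frames):
        self.frames = frames

    def concentration_at(self, t, cell):
        return self.frames.get(t, {}).get(cell, (0.0, np.zeros(3)))[0]

    def wind_at(self, t, cell):
        return self.frames.get(t, {}).get(cell, (0.0, np.zeros(3)))[1]

# Sample the plume along a straight robot path at timestep 10
sim = Simulation({10: {(1, 2, 0): (35.2, np.array([0.4, -0.1, 0.0]))}})
for x in range(4):
    print(f"x={x}: {sim.concentration_at(10, (x, 2, 0)):.1f} ppm")
```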

Actas de las XXXVII Jornadas de Automática 7, 8 y 9 de septiembre de 2016, Madrid, Feb 25, 2022
This work proposes a method that combines intensity- and depth-image descriptors to robustly detect loop closures in SLAM. The robustness of the method, provided by jointly exploiting information of different natures, allows revisited places to be detected in situations where methods based only on intensity or on depth struggle (e.g., poor lighting conditions or lack of geometry). In addition, the method has been designed with efficiency in mind, relying on the FAST detector to extract features from the observations and on the binary BRIEF descriptor. Loop detection is completed with a binary Bag of Words. The performance of the proposed method has been evaluated under real conditions, obtaining very satisfactory results.
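A simplified sketch of the intensity branch described above, using OpenCV's FAST detector and BRIEF descriptors compared with Hamming distance; the binary Bag of Words used for place retrieval in the paper is replaced here by brute-force matching, and the depth branch is omitted.

```python
# FAST keypoints + binary BRIEF descriptors, compared with Hamming distance.
# Brute-force matching stands in for the paper's binary Bag of Words.
import cv2

fast = cv2.FastFeatureDetector_create(threshold=25)
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()  # needs opencv-contrib
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def describe(gray_img):
    kps = fast.detect(gray_img, None)
    kps, desc = brief.compute(gray_img, kps)
    return desc

def similarity(img_a, img_b):
    da, db = describe(img_a), describe(img_b)
    if da is None or db is None:
        return 0.0
    matches = matcher.match(da, db)
    return len(matches) / max(len(da), len(db))  # crude loop-closure score
```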
Highlights: • An anchoring system keeps track of objects for robot planning, learning and execution. • Builds an online graph-based 3D world model of objects in the environment. • Can integrate arbitrary object recognition methods. • Exploits context between objects to improve object recognition results.
Frontiers in Artificial Intelligence and Applications, 2018

Springer eBooks, 2016
The suitable operation of mobile robots when providing Ambient Assisted Living (AAL) services calls for robust object recognition capabilities. Probabilistic Graphical Models (PGMs) have become the de facto choice in recognition systems aiming to efficiently exploit contextual relations among objects while dealing with the uncertainty inherent to the robot workspace. However, these models can perform incoherently when operating long-term outside the laboratory, e.g., while recognizing objects in peculiar configurations or belonging to new types. In this work we propose a recognition system that resorts to PGMs and common-sense knowledge, represented in the form of an ontology, to detect those inconsistencies and learn from them. The utilization of the ontology carries additional advantages, e.g., the possibility of verbalizing the robot's knowledge. A primary demonstration of the system's capabilities has been carried out with very promising results.
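A toy illustration of the inconsistency-detection idea: the ontology lists which object categories are plausible in each room type, and confident recognitions that fall outside that list are flagged as learning opportunities. The category tables and thresholds below are invented for the example.

```python
# Flag recognitions that contradict ontology-style common-sense knowledge.
PLAUSIBLE = {
    "kitchen": {"microwave", "sink", "chair", "fridge"},
    "bathroom": {"toilet", "sink", "towel"},
}

def check_consistency(room, recognized_category, confidence, min_conf=0.7):
    if confidence < min_conf:
        return "uncertain"        # too unsure to judge
    if recognized_category in PLAUSIBLE.get(room, set()):
        return "consistent"
    return "inconsistent"         # candidate for relearning / human feedback

print(check_consistency("bathroom", "microwave", 0.85))  # -> "inconsistent"
```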

This article deals with the problem of room categorization, i.e., the classification of a room as being a bathroom, kitchen, living room, bedroom, etc., by an autonomous robot operating in home environments. For that, we propose a room categorization system based on a Bayesian probabilistic framework that combines object detections and their semantics. For detecting objects we resort to a state-of-the-art CNN, Mask R-CNN, while the meaning or semantics of those detections is provided by an ontology. Such an ontology encodes the relations between object and room categories, that is, in which room types the different object categories are typically found (toilets in bathrooms, microwaves in kitchens, etc.). The Bayesian framework is in charge of fusing both sources of information and providing a probability distribution over the set of categories the room can belong to. The proposed system has been evaluated in houses from the Robot@Home dataset, validating its effectiveness under real-world conditions.
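The fusion step can be sketched as a sequential Bayesian update in which each detected object reweights a distribution over room categories with ontology-style likelihoods P(object | room); the numbers below are illustrative, not taken from the paper.

```python
# Sequential Bayesian update of a room-category distribution from detections.
import numpy as np

ROOMS = ["kitchen", "bathroom", "bedroom", "living_room"]
# P(object observed | room category), illustrative values only
LIKELIHOOD = {
    "microwave": {"kitchen": 0.80, "bathroom": 0.01, "bedroom": 0.04, "living_room": 0.15},
    "toilet":    {"kitchen": 0.01, "bathroom": 0.90, "bedroom": 0.02, "living_room": 0.07},
    "bed":       {"kitchen": 0.01, "bathroom": 0.01, "bedroom": 0.90, "living_room": 0.08},
}

def room_posterior(detected_objects, prior=None):
    post = np.array(prior if prior else [1.0 / len(ROOMS)] * len(ROOMS))
    for obj in detected_objects:
        lik = np.array([LIKELIHOOD[obj][r] for r in ROOMS])
        post *= lik
        post /= post.sum()          # renormalize after each observation
    return dict(zip(ROOMS, post))

print(room_posterior(["microwave"]))  # probability mass concentrates on "kitchen"
```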

In general, the problems of object and room categorization for robotic applications have been addressed separately. The current trend, however, is towards a joint modelling of both issues in order to leverage their mutual contextual relations: object → room (e.g., the detection of a microwave indicates that the room is likely to be a kitchen) and room → object (e.g., if the robot is in a bathroom, it is probable to find a toilet). Probabilistic Graphical Models (PGMs) are typically employed to conveniently cope with such relations, relying on inference processes to hypothesize about object and room categories. In this work we present a Conditional Random Field (CRF) model, a particular type of PGM, to jointly categorize objects and rooms from RGB-D images, exploiting object-object and object-room relations. The learning phase of the proposed CRF uses Human Knowledge (HK) to eliminate the need to gather real training data. Concretely, HK is acquired through elicitation and codified into an ontology, which is exploited to effortlessly generate an arbitrary number of representative synthetic samples for training. The performance of the proposed CRF model has been assessed using the NYU2 dataset, achieving a success rate of ~70% when categorizing both objects and rooms.
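The synthetic-sample idea can be sketched as sampling scenes from a small, hand-written table of typical object/room co-occurrences standing in for the elicited ontology; the probabilities and the subsequent CRF training are not reproduced here.

```python
# Generate synthetic training scenes from elicited co-occurrence knowledge.
import random

rng = random.Random(0)

TYPICAL_OBJECTS = {  # P(object present | room), illustrative values
    "kitchen":  {"microwave": 0.7, "sink": 0.9, "chair": 0.5},
    "bathroom": {"toilet": 0.95, "sink": 0.9, "towel": 0.6},
}

def synth_scene(room):
    objects = [o for o, p in TYPICAL_OBJECTS[room].items() if rng.random() < p]
    return {"room": room, "objects": objects}

# An arbitrary number of representative samples, no real data collection needed
training_set = [synth_scene(room) for room in TYPICAL_OBJECTS for _ in range(100)]
```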

Lecture Notes in Computer Science, 2019
In robotics, semantic mapping refers to the construction of a rich representation of the environment that includes the high-level information needed by the robot to accomplish its tasks. Building a semantic map requires algorithms to process sensor data at different levels: geometric, topological and object detections/categories, which must be integrated into a unified model. This paper describes a robotic architecture that successfully builds such semantic maps for indoor environments. For this purpose, within a ROS-based ecosystem, we apply a state-of-the-art Convolutional Neural Network (CNN), concretely YOLOv3, for detecting objects in images. The detection results are placed within a geometric map of the environment making use of a number of modules of the architecture: robot localization, camera extrinsic calibration, data from a depth camera, etc. We demonstrate the suitability of the proposed framework by building semantic maps of several home environments from the Robot@Home dataset, using Unity 3D as a tool to visualize the maps as well as to support future robotic developments.
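Anchoring a 2D detection into the map boils down to back-projecting the detected pixel with its depth value and chaining the camera extrinsics and the robot pose, roughly as in the sketch below; the intrinsic and transform values are placeholders.

```python
# Back-project a detected pixel with its depth and express it in the map frame.
import numpy as np

def pixel_to_map(u, v, depth, K, T_robot_cam, T_map_robot):
    # Pinhole back-projection of pixel (u, v) at the measured depth
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    p_cam = np.array([x, y, depth, 1.0])
    # Chain camera -> robot -> map transforms (homogeneous 4x4 matrices)
    return (T_map_robot @ T_robot_cam @ p_cam)[:3]

K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
point_in_map = pixel_to_map(400, 250, 2.1, K, np.eye(4), np.eye(4))
```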

IEEE Robotics and Automation Letters, Jul 1, 2019
In order to fuse measurements from multiple sensors mounted on a mobile robot, they must be expressed in a common reference frame through their relative spatial transformations. In this paper, we present a method to estimate the full 6DoF extrinsic calibration parameters of multiple heterogeneous sensors (Lidars, depth and RGB cameras) suitable for automatic execution on a mobile robot. Our method computes the 2D calibration parameters (x, y, yaw) through a motion-based approach, while for the remaining three parameters (z, pitch, roll) it requires the observation of the ground plane for a short period of time. What sets this proposal apart from others is that: i) all calibration parameters are initialized in closed form, and ii) the scale ambiguity inherent to motion estimation from a monocular camera is explicitly handled, enabling the combination of these sensors with metric ones (Lidars, stereo rigs, etc.) within the same optimization framework. We provide a formal definition of the problem, as well as of the contributed method, for which a C++ implementation has been made publicly available. The suitability of the method has been assessed in simulation and with real data from indoor and outdoor scenarios. Finally, improvements over state-of-the-art motion-based calibration proposals are shown through experimental evaluation.
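For the ground-plane half of the calibration, the geometric intuition is that a plane fitted to the sensor's ground points directly constrains the sensor height and the pitch/roll angles; the sketch below uses one common Euler convention and is not the closed-form initialization derived in the paper.

```python
# Recover (z, pitch, roll) from a ground plane fitted in the sensor frame,
# given its unit normal n and signed distance d.
import numpy as np

def z_pitch_roll_from_ground(n, d):
    n = n / np.linalg.norm(n)
    if n[2] < 0:                 # make the normal point "up"
        n, d = -n, -d
    # ZYX Euler convention: world z-axis expressed in the sensor frame is n
    pitch = np.arctan2(-n[0], np.hypot(n[1], n[2]))
    roll = np.arctan2(n[1], n[2])
    z = abs(d)                   # sensor height above the ground plane
    return z, pitch, roll

print(z_pitch_roll_from_ground(np.array([0.05, -0.02, 0.998]), -0.62))
```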

arXiv (Cornell University), Apr 18, 2023
Gas source localization (GSL) with an autonomous robot is a problem with many prospective applications, from finding pipe leaks to emergency-response scenarios. In this work, we present a new method to perform GSL in realistic indoor environments featuring obstacles and turbulent flow. Given the highly complex relationship between the source position and the measurements available to the robot (the single-point gas concentration and the wind vector), we propose an observation model that derives from contrasting the online, real-time simulation of gas dispersion from any candidate source location against a gas concentration map built from sensor readings. To integrate both into a probabilistic estimation framework in a convenient and well-grounded manner, we introduce the concept of probabilistic gas-hit maps, which provide a higher level of abstraction to model the time-dependent nature of gas dispersion. Results from both simulated and real experiments show the capability of our proposal to deal with source localization in complex indoor environments.
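A toy version of the gas-hit map abstraction: each grid cell keeps the probability of observing a gas hit (a concentration above a threshold) and is blended toward incoming binary readings; the actual probabilistic model in the paper is more elaborate.

```python
# Per-cell probability of observing a gas "hit", updated from readings.
import numpy as np

class GasHitMap:
    def __init__(self, shape, prior=0.1, learning_rate=0.3):
        self.p = np.full(shape, prior)
        self.lr = learning_rate

    def update(self, cell, concentration, threshold=50.0):
        hit = 1.0 if concentration > threshold else 0.0
        # Blend the old estimate toward the new binary observation
        self.p[cell] = (1 - self.lr) * self.p[cell] + self.lr * hit

hit_map = GasHitMap((20, 20))
hit_map.update((5, 7), concentration=120.0)   # strong reading -> probability rises
```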
Computer Vision and Image Understanding, Dec 1, 2022
Lecture Notes in Computer Science, 2021

INTED Proceedings, Mar 1, 2019
This paper presents our experience in developing a tutorial on a hot Computer Vision problem, object recognition employing Machine Learning techniques, which we consider a didactically interesting resource for both the practitioner and lecturer communities. The recognition of objects is an innate ability of human beings: when we look at a picture, we are able to effortlessly detect elements like animals, signs, or objects of interest. In the Computer Vision field, this process is carried out by Machine Learning tools, aiming to retrieve information about the content of an image. The possible applications of object recognition are numerous, to name a few: image panoramas, robot localization, face detection/recognition, autonomous driving, or pedestrian detection. Given the relevance of this problem, numerous approaches have been explored to address it. Among the most popular techniques, in both real applications and academia, are those recognizing objects by means of hand-crafted features. Typical features are descriptors of keypoints extracted according to a certain criterion (e.g., the Scale-Invariant Feature Transform, SIFT), or those geometrically (size, orientation, shape, etc.) or visually (colour, texture, etc.) describing the regions to be recognized. The number of software tools developed to carry out this task is large, including well-known software libraries such as OpenCV or scikit-learn. However, object recognition is far from being a one-step task, and most of these tools lack a comprehensive step-by-step description of the pipeline needed to accomplish it. This pipeline usually includes the management and analysis of the data used to train the recognition model, the pre-processing of such data to convert them into a usable form, the fitting of the model and, finally, its validation. The tutorial presented in this work encompasses those steps, giving detailed information and hints about how to complete each one. It is accompanied by a public implementation of the object recognition pipeline (https://github.com/jotaraul/object_recognition_in_python) using popular Python tools (Pandas, seaborn and scikit-learn), so it can be of use to Computer Vision practitioners and lecturers aiming to gain insight into, or illustrate, good practices for the design of a successful object recognition system. It is also open to any contribution from those communities. Finally, we also provide some directions and experiences regarding its utilization in academia.
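A condensed version of the pipeline the tutorial walks through with scikit-learn (data split, pre-processing, model fitting and validation); the synthetic feature vectors below stand in for the hand-crafted descriptors discussed above.

```python
# Minimal object-recognition pipeline: split, scale, fit, validate.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))        # descriptor vectors (placeholder data)
y = rng.integers(0, 3, size=300)      # object category labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```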
Computer Vision and Pattern Recognition, 2017

Visual or image-based self-localization refers to the recovery of a camera's position and orientation in the world based on the images it records. In this paper, we deal with the problem of self-localization using a sequence of images. This application is of interest in settings where GPS-based systems are unavailable or imprecise, such as indoors or in dense cities. Unlike typical approaches, we do not restrict the problem to sequence-to-sequence or sequence-to-graph localization. Instead, the image sequences are localized in an image database consisting of images taken at known locations, but with no explicit ordering. We build upon the Gaussian Process Particle Filter framework, proposing two improvements that enable localization when using databases covering large areas: 1) an approximation to Gaussian Process regression is applied, allowing execution on large databases, and 2) we introduce appearance-based particle sampling as a way to combat particle deprivation and bad initialization of the particle filter. Extensive experimental validation is performed using two new datasets which are made available as part of this publication.
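The appearance-based particle sampling idea can be sketched as drawing particles at database locations with probability proportional to appearance similarity to the current image; the descriptors and similarity measure below are placeholders.

```python
# Draw particles at database locations weighted by appearance similarity.
import numpy as np

def sample_particles(query_desc, db_descs, db_locations, n_particles, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Cosine similarity between the query and every database image descriptor
    sims = db_descs @ query_desc / (
        np.linalg.norm(db_descs, axis=1) * np.linalg.norm(query_desc) + 1e-9)
    weights = np.maximum(sims, 0.0)
    if weights.sum() == 0:
        weights[:] = 1.0            # fall back to uniform sampling
    weights /= weights.sum()
    idx = rng.choice(len(db_locations), size=n_particles, p=weights)
    noise = rng.normal(scale=0.5, size=(n_particles, db_locations.shape[1]))
    return db_locations[idx] + noise  # jitter around the chosen locations
```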

Lecture Notes in Computer Science, 2019
Depth cameras, typically in RGB-D configurations, are common devices in mobile robotic platforms given their appealing features: high frequency and resolution, low price and power requirements, among others. These sensors may come with significant, non-linear errors in the depth measurements that jeopardize robot tasks like free-space detection, environment reconstruction or visual robot-human interaction. This paper presents a method to calibrate such systematic errors with the help of a second, more precise range sensor, in our case a radial laser scanner. Contrary to what it may seem at first, this is not a serious limitation in practice, since these two sensors are often mounted jointly on many mobile robotic platforms, as they complement each other well. Moreover, the laser scanner can be used only for the calibration process and removed afterwards. The main contributions of the paper are: i) the calibration is formulated from a probabilistic perspective as a Maximum Likelihood Estimation problem, and ii) the proposed method can be easily executed automatically by mobile robotic platforms. To validate the proposed approach we evaluated both the local distortion of 3D planar reconstructions and global shifts in the measurements, obtaining considerably more accurate results. A C++ open-source implementation of the presented method has been released for the benefit of the community.
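A minimal illustration of the underlying idea, assuming paired readings are available: fit a smooth correction mapping the depth camera's raw measurements to the laser's reference ranges. A simple polynomial least-squares fit stands in for the probabilistic (MLE) model of the paper.

```python
# Fit a correction for systematic depth errors against laser reference ranges.
import numpy as np

measured = np.array([0.8, 1.5, 2.4, 3.1, 4.0])        # depth-camera readings (m)
reference = np.array([0.82, 1.55, 2.52, 3.28, 4.31])  # laser ground truth (m)

coeffs = np.polyfit(measured, reference, deg=2)        # least-squares fit
correct = np.poly1d(coeffs)
print(correct(2.4))    # corrected depth for a raw 2.4 m reading
```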

IEEE Robotics and Automation Letters, Oct 1, 2017
This work addresses the fundamental problem of pose graph optimization (PGO), which is pervasive in the context of SLAM and widely known as SE(d)-synchronization in the mathematical community. Our contribution is twofold. First, we provide a novel, elegant, and compact matrix formulation of the maximum likelihood estimation (MLE) for this problem, drawing interesting connections with the connection Laplacian of a graph. Second, even though the MLE problem is nonconvex and computationally intractable in general, we exploit recent advances in convex relaxations of PGO and Riemannian techniques for low-rank optimization to yield an a posteriori certifiably globally optimal algorithm [A. Bandeira, "A note on probably certifiably correct algorithms," Comptes Rendus Mathematique, vol. 354, pp. 329–333, 2016] that is also fast and scalable. This work builds upon fairly demanding mathematical machinery, but beyond the theoretical basis presented, we demonstrate its performance through extensive experimentation on common large-scale SLAM datasets. The proposed framework, Cartan-Sync, is up to one order of magnitude faster than the state-of-the-art SE-Sync [D. M. Rosen et al., "A certifiably correct algorithm for synchronization over the special Euclidean group," in Proc. Int. Workshop Algorithmic Found. Robot., 2016] in some important scenarios (e.g., the KITTI dataset). We make the code for Cartan-Sync available at bitbucket.org/jesusbriales/cartan-sync, along with some examples and guides for friendly use by researchers in the field, hoping to promote further adoption and exploitation of these techniques in the robotics community.
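For context, the generic MLE objective behind this family of methods (SE-Sync and, in reformulated matrix form, Cartan-Sync) is the one below, where (R_i, t_i) are the unknown poses, the tilded quantities the relative measurements, and κ_ij, τ_ij the noise concentration weights; the paper's specific connection-Laplacian matrix formulation is a rewriting of this objective and is not reproduced here.

```latex
% Generic pose-graph MLE under isotropic Langevin/Gaussian noise
\min_{\{(R_i, t_i)\} \subset \mathrm{SE}(d)}
  \sum_{(i,j) \in \mathcal{E}}
    \kappa_{ij} \bigl\| R_j - R_i \tilde{R}_{ij} \bigr\|_F^2
  + \tau_{ij} \bigl\| t_j - t_i - R_i \tilde{t}_{ij} \bigr\|_2^2
```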
Most approaches to visual odometry estimate the camera motion based on point features; consequently, their performance deteriorates in low-textured scenes where it is difficult to find a reliable set of them. This paper extends a popular semi-direct approach to monocular visual odometry known as SVO [1] to work with line segments, hence obtaining a more robust system capable of dealing with both textured and structured environments. The proposed odometry system allows for the fast tracking of line segments since it eliminates the need to continuously extract and match features between subsequent frames. The method, of course, has a higher computational burden than the original SVO, but it still runs at frequencies of 60 Hz on a personal computer while performing robustly in a wider variety of scenarios.
IEEE Robotics and Automation Letters, Apr 1, 2023