Journal Papers by Caleb Rascon
The Journal of the Acoustical Society of America, 2018
The Acoustic Interactions for Robot Audition corpus is introduced for research on sound source localization and separation, and for multiuser speech recognition. Its aim is to evaluate and train Robot Audition techniques, as well as Auditory Scene Analysis in general. It was recorded in six real-life environments with different noise presence and reverberation time, using two array configurations: an equilateral triangle, and a three-dimensional 16-microphone array set over a hollow plastic body. It includes clean speech data for static sources and tracking information for mobile sources. It is freely available at https://aira.iimas.unam.mx/.

Journal of Intelligent & Fuzzy Systems, 2018
In this paper, a strategy for incorporating a flexible and reliable high-level inference module in service robots is presented. This module is a part of the robot's cognitive architecture which coordinates perception, inference and action within the robot's communication and interaction cycle. The present approach relies on an explicit representation of the structure of the task performed by the robot. There are three kinds of inferences that the robot can use opportunistically along the task: (1) diagnosis, (2) decision making and (3) planning; each kind can be used in specific situations of the task structure or performed in arbitrary situations for recovery purposes when there is an interaction failure. In this latter case the three kinds of inference are performed sequentially in what we call the daily-life inference cycle. The inference cycle allows the incorporation of basic emotions in the robot's behavior. A case study incorporating these functionalities in the robot Golem-III is presented. The paper is concluded with a reflection on the opportunistic use of inference schemes to support flexible and robust behavior, including the expression of emotions, in service robots.
Robotics and Autonomous Systems, 2017
Sound source localization (SSL) in a robotic platform has been essential in the overall scheme of robot audition. It allows a robot to locate a sound source by sound alone. It has an important impact on other robot audition modules, such as source separation, and it enriches human–robot interaction by complementing the robot's perceptual capabilities. The main objective of this review is to thoroughly map the current state of the SSL field for the reader and provide a starting point to SSL in robotics. To this effect, we present: the evolution and historical context of SSL in robotics; an extensive review and classification of SSL techniques and popular tracking methodologies; different facets of SSL as well as its state-of-the-art; evaluation methodologies used for SSL; and a set of challenges and research motivations.
EURASIP Journal on Audio, Speech, and Music Processing, 2015
Estimating the directions of arrival (DOAs) of multiple simultaneous mobile sound sources is an important step for various audio signal processing applications. In this contribution, we present an approach that improves upon our previous work and is now able to estimate the DOAs of multiple mobile speech sources, while being light in resources, both hardware-wise (only using three microphones) and software-wise. This approach takes advantage of the fact that simultaneous speech sources do not completely overlap each other. To evaluate the performance of this approach, a multi-DOA estimation evaluation system was developed based on a corpus collected from different acoustic scenarios named Acoustic Interactions for Robot Audition (AIRA).
Lecture Notes in Electrical Engineering, 2013
Knowledge of how many users there are in the environment, and where they are located, is essential for natural and efficient Human-Robot Interaction (HRI). However, carrying out the estimation of multiple Directions-of-Arrival (multi-DOA) on a mobile robotic platform involves a greater challenge, as the mobility of the service robot needs to be considered when proposing a solution. This needs to strike a balance with the performance of the DOA estimation, specifically the number of users the system can detect, which is usually limited by the number of microphones used. In this contribution, a small, lightweight and easily carried hardware system (based on a 3-microphone triangular system) is used, and a fast multi-DOA estimator is proposed that is able to estimate more users than the number of microphones employed.

We present the use of direction of arrival (DOA) of sound sources as an index during the interaction between humans and service robots. These indices follow the notion defined by the theory of interpretation of signs by Peirce. This notion establishes a strong physical relation between signs (DOAs) and objects being signified in specific contexts. With this in mind, we have modeled the call at a distance to a robot as indexical in nature. These indices can be later interpreted as the position of the user and the user herself/himself. The relation between the call and the emitter is formalized in our framework of development of service robots based on the SitLog programming language. In particular, we create a set of behaviours based on direction of arrival information to be used in the programming of tasks for service robots. Based on these behaviours, we have implemented four tasks which heavily rely on them: following a person, taking attendance of a class, playing Marco-Polo, and acting as a waiter in a restaurant.
Applied Spectroscopy, 2009
Frequency displacement, or spectral shift, is commonly observed in industrial spectral measurements. It can be caused by many factors such as sensor de-calibration or by external influences, which include changes in temperature. The presence of frequency displacement in spectral measurements can cause difficulties when statistical techniques, such as independent component analysis (ICA), are used to analyze them. Using simulated spectral measurements, this paper initially highlights the effect that frequency displacement has on ICA. A post-processing technique, employing particle swarm optimization (PSO), is then proposed that enables ICA to become robust to frequency displacement in spectral measurements. The capabilities of the proposed approach are illustrated using several simulated examples and using tablet data from a pharmaceutical application.

International Journal of Advanced Robotic Systems, 2015
In this paper, we present a concept of service robot and a framework for its functional specification and implementation. The present discussion is grounded in Newell’s system levels hierarchy which suggests organizing robotics research in three different layers, corresponding to Marr’s computational, algorithmic and implementation levels, as follows: (1) the service robot proper, which is the subject of the present paper, (2) perception and action algorithms, and (3) the systems programming level. The concept of a service robot is articulated in practice through the introduction of a conceptual model for particular service robots; this consists of the specification of a set of basic robotic behaviours and a number of mechanisms for assembling such behaviours during the execution of complex tasks. The model involves an explicit representation of the task structure, allowing for deliberative reasoning and task management. The model also permits distinguishing between a robot’s competence and performance, along the lines of Chomsky’s corresponding distinction. We illustrate how this model can be realized in practice with two composition modes that we call static and dynamic; these are illustrated with the Restaurant Test and the General Purpose Service Robot Test of the RoboCup@Home competition, respectively. The present framework and methodology have been implemented in the robot Golem-II+, which is also described. The paper is concluded with an overall reflection upon the present concept of a service robot and its associated functional specifications, and the potential impact of such a conceptual model in the study, development and application of service robots in general.

International Journal of Advanced Robotic Systems, 2013
In this paper we present SitLog: a declarative situation-oriented logical language for programming situated service robot tasks. The formalism is task and domain independent, and can be used in a wide variety of settings. SitLog can also be seen as a behaviour engineering specification and interpretation formalism to support action selection by autonomous agents during the execution of complex tasks. The language combines the recursive transition network formalism, extended with functions to express dynamic and contextualized task structures, with a functional language to express control and content information. The SitLog interpreter is written in Prolog and SitLog’s programs closely follow the Prolog notation, permitting the declarative specification and direct interpretation of complex applications in a modular and compact form. We discuss the structure and representation of service robot tasks in practical settings and how these can be expressed in SitLog. The present framework has been tested in the service robot Golem-II+ using the specification and programming of the typical tasks which require completion in the RoboCup@Home Competition.
Conference Papers by Caleb Rascon

International Conference on Unmanned Aircraft Systems (ICUAS), 2018
This paper presents the first version of the AIRA-UAS corpus. It is a set of recordings produced by the ego-noise of an Unmanned Aerial Vehicle (UAV) performing different aerial maneuvers. We also recorded audio produced by other drones flying near the UAV, capturing the audio signals on board. The aim of this corpus is to provide an evaluation mechanism for sound source localization and separation algorithms, where the sound data capture process is carried out on board a UAV. We argue that this corpus will be useful for the development of UAV applications focusing on search & rescue operations as well as for detection of unauthorized drone operation. In addition, we also argue that our corpus may prove useful to assess the level at which the noise produced by drones affects the welfare of human beings and wildlife.
Knowledge of how many users there are in the environment, and where they are located, is essential for natural and efficient Human-Robot Interaction (HRI). However, carrying out the estimation of multiple Directions-of-Arrival (multi-DOA) on a mobile robotic platform involves a greater challenge, as the mobility of the service robot needs to be considered when proposing a solution. This needs to strike a balance with the performance of the DOA estimation, specifically the number of users the system can detect, which is usually limited by the number of microphones used. In this paper, a lightweight hardware system (based on a 3-microphone triangular system) is used, and a fast multi-DOA estimator is proposed that is able to estimate more users than the number of microphones employed.
Lecture Notes in Computer Science, 2010
The orientation of conversational robots to face their interlocutors is essential for natural and efficient Human-Robot Interaction (HRI). In this paper, progress towards this objective is presented: a service robot able to detect the direction of a user, and orient itself towards him/her, in a complex auditory environment, using only voice and a 3-microphone system. This functionality is integrated within Spoken HRI using dialogue models and a cognitive architecture. The paper further discusses applications where robotic orientation benefits HRI, such as a tour-guide robot capable of guiding a poster session and a “Marco Polo” game where a robot aims to follow a user purely by sound.
The ability to use spectral data within a control loop is beginning to be considered in many areas, particularly in the Pharmaceutical Industry. However, typical spectral analysis tools, such as Classical Least Squares, are very fragile when handling frequency shifts which may occur in spectral measuring devices as a result of poor calibration or external influences. This paper shows that Particle Swarm Optimisation can be used to offset the effect of shift in measured spectra and improve the performance of any control system which may use this measurement.
A possible solution for the current rate of animal extinction in the world is the use of new technologies in their monitoring in order to tackle problems in the reduction of their populations in a timely manner. In this work we present a system for the identification of the Turdus migratorius bird species based on their singing. The core of the system is based on turn-level features extracted from the audio signal of the bird songs. These features were adapted from the recognition of human emotion in speech, which are based on Support Vector Machines. The resulting system is a prototype module of acoustic identification of birds whose goal is to monitor birds in their environment, and, in the future, estimate their populations.
Lecture Notes in Computer Science, 2013
In this work, we present the speech recognition module of a service robot that performs various tasks, such as being a party host, receiving multiple commands or giving a guided tour. These tasks take place in diverse acoustic environments, e.g., a home or a supermarket, in which speech is one of the main modalities of interaction. Our approach relies on three strategies: 1) making the recognizer aware of the task context, 2) providing prompting strategies to guide the recognition, and 3) calibrating the audio setting specific to the environment. We provide an evaluation with recordings from real interactions with a service robot in different environments.

Lecture Notes in Computer Science, 2010
In this paper, we present the development of a tour-guide robot that conducts a poster session through spoken Spanish. The robot is able to navigate around its environment, visually identify informational posters, and explain sections of the posters that users request via pointing gestures. We specify the task by means of dialogue models. A dialogue model defines conversational situations, expectations and robot actions. Dialogue models are integrated into a novel cognitive architecture that allows us to coordinate both human–robot interaction and robot capabilities in a flexible and simple manner. Our robot also incorporates a confidence score on visual outcomes, the history of the conversation and error prevention strategies. Our initial evaluation of the dialogue structure shows the reliability of the overall approach, and the suitability of our dialogue model and architecture to represent complex human–robot interactions, with promising results.
Independent Component Analysis (ICA) is widely used for Blind Source Separation in generic spectra, which are themselves obtained from sensors that can be decalibrated or are too sensitive to ambient changes. This usually results in frequency displacement, or lag, that ICA will face during its source extraction. Experiments were done that show that ICA is not well-equipped to handle such displacement, and that it is only able to extract the same components as before being lagged given only an insignificant amount of displacement. Other experiments showed that the amount of lag that ICA can handle varies depending on the width of the components intended to be extracted.
Papers by Caleb Rascon

Sensors
Although a significant amount of work has been carried out for visual perception in the context of unmanned aerial vehicles (UAVs), not so much has been done regarding auditory perception. The latter can complement the observation of the environment that surrounds a UAV by providing additional information that can be used to detect, classify, and localize audio sources of interest. Motivated by the usefulness of auditory perception for UAVs, we present a literature review that discusses the audio techniques and microphone configurations reported in the literature. A categorization of techniques is proposed based on the role a UAV plays in the auditory perception (is it the one being perceived or is it the perceiver?), as well as a set of objectives that are more popularly aimed to be accomplished in the current literature (detection, classification, and localization). This literature review aims to provide a concise landscape of the most relevant works on auditory perception in the ...
relevance for future personal domestic applications. It is the largest international annual competition for autonomous service robots and is part of the RoboCup initiative. A set of benchmark tests is used to evaluate the robots' abilities and performance in a realistic non-standardized home environment setting. Focus lies on the following domains but is not limited to: Human-Robot Interaction and Cooperation, Navigation and Mapping in dynamic environments, Computer Vision and Object Recognition under natural light conditions, Object Manipulation, Adaptive Behaviors, Behavior Integration, Ambient Intelligence, Standardization and System Integration. It is collocated with the RoboCup symposium.