Papers by Perfecto Herrera
Systems and methods for determining similarity between two or more audio pieces are disclosed. An illustrative method for determining musical similarities includes extracting one or more descriptors from each audio piece, generating a vector for each of the audio pieces, extracting one or more audio features from each of the audio pieces, calculating values for each audio feature, calculating a distance between a vector containing the normalized values and the vectors containing the audio pieces, and outputting a response to a user or ...
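A rough sketch of the kind of pipeline the abstract outlines (descriptor extraction, per-feature normalization, vector distance), not the patented method itself; the toy descriptors and function names below are hypothetical.

```python
# Hedged sketch: toy descriptors and Euclidean distance over normalized vectors.
import numpy as np

def extract_features(audio: np.ndarray, sr: int) -> np.ndarray:
    """Toy descriptors: RMS energy, zero-crossing rate, spectral centroid."""
    rms = np.sqrt(np.mean(audio ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(audio)))) / 2
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, zcr, centroid])

def similarity_ranking(collection: dict, query: str) -> list:
    """Normalize each descriptor across the collection, then rank by distance to the query piece."""
    names = list(collection)
    X = np.vstack([collection[n] for n in names])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # per-feature normalization
    q = X[names.index(query)]
    dists = np.linalg.norm(X - q, axis=1)                 # Euclidean distance
    return sorted(zip(names, dists), key=lambda p: p[1])
```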
Boletín de la Asociación Española de Documentación Musical, 2011
Article information: "Tecnologías para el análisis del contenido musical de archivos sonoros y para la generación de nuevos metadatos" (Technologies for the analysis of the musical content of sound archives and for the generation of new metadata).
Structuring Music through Markup Language: Designs and Architectures, 2013
In this chapter, the authors discuss an approach to music representation that supports collaborative composition given current practices based on digital audio. A music work is represented as a directed graph that encodes sequences and layers of sound samples. The authors discuss graph grammars as a general framework for this representation. From a grammar perspective, they analyze the use of XML for storing production rules, music structures, and references to audio files. The authors describe an ...
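As a hedged illustration of the representation described (a directed graph of sound samples serialized to XML), the following sketch uses an invented element layout; the chapter's actual schema and production rules are not reproduced here.

```python
# Sketch: a music work as a directed graph of sample nodes, with "sequence"
# edges for temporal order and "layer" edges for simultaneity.
import xml.etree.ElementTree as ET

class MusicGraph:
    def __init__(self):
        self.nodes = {}    # node id -> audio file reference
        self.edges = []    # (src, dst, kind) with kind in {"sequence", "layer"}

    def add_sample(self, node_id, audio_file):
        self.nodes[node_id] = audio_file

    def connect(self, src, dst, kind="sequence"):
        self.edges.append((src, dst, kind))

    def to_xml(self) -> str:
        root = ET.Element("musicwork")
        for nid, path in self.nodes.items():
            ET.SubElement(root, "sample", id=nid, file=path)
        for src, dst, kind in self.edges:
            ET.SubElement(root, "edge", src=src, dst=dst, kind=kind)
        return ET.tostring(root, encoding="unicode")

g = MusicGraph()
g.add_sample("intro", "samples/intro.wav")
g.add_sample("verse", "samples/verse.wav")
g.add_sample("drums", "samples/drums.wav")
g.connect("intro", "verse", kind="sequence")   # verse follows intro
g.connect("verse", "drums", kind="layer")      # drums play over the verse
print(g.to_xml())
```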

International Computer Music Conference Proceedings, 2007
We present a system that produces expectations based on the observation of a rhythmic music signal at a constant tempo. The algorithms we use are causal, in order to fit cognitive constraints more closely and to allow a future real-time implementation. In a first step, an acoustic front-end based on the aubio library extracts onsets and beats from the incoming signal. The extracted onsets are then encoded in a symbolic way using an unsupervised scheme: each hit is assigned a timbre cluster based on its timbre features, while its inter-onset interval with respect to the previous hit is computed as a proportion of the extracted tempo period and assigned an inter-onset interval cluster. In a later step, the representation of each hit is sent to an expectation module, which learns the statistics of the symbolic sequence. Hence, at each musical hit, the system produces both what and when expectations regarding the next musical hit. For evaluating our system, we consider a weighted average F-measure that takes into account the uncertainty associated with the unsupervised encoding of the musical sequence. We then present a preliminary experiment involving generated musical material and propose a roadmap in the context of this novel application field.
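A minimal sketch of the expectation step only, assuming the hits have already been encoded as (timbre cluster, inter-onset-interval cluster) symbols; a first-order Markov model stands in here for the paper's sequence-learning component.

```python
# Sketch of an expectation module over symbolic hits (timbre cluster, IOI cluster).
from collections import defaultdict, Counter

class ExpectationModel:
    def __init__(self):
        self.transitions = defaultdict(Counter)   # symbol -> Counter of next symbols

    def observe(self, sequence):
        for prev, nxt in zip(sequence, sequence[1:]):
            self.transitions[prev][nxt] += 1

    def expect(self, current):
        """Return the (what, when) expectation: most likely next timbre and IOI cluster."""
        counts = self.transitions.get(current)
        if not counts:
            return None
        (timbre, ioi), _ = counts.most_common(1)[0]
        return timbre, ioi

# Each symbol is a (timbre cluster id, inter-onset-interval cluster id) pair.
hits = [(0, 2), (1, 1), (0, 2), (1, 1), (0, 2), (2, 4)]
model = ExpectationModel()
model.observe(hits)
print(model.expect((0, 2)))   # expected next hit after a (0, 2) event -> (1, 1)
```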
We speak of music technology, even though it may seem a pleonasm since music is technē by definition, when we refer to a branch of knowledge and human activity devoted to researching and developing tools that support the creation, dissemination, and reception of music. In this talk we present the current state of research and education in music technology in Catalonia and highlight some innovative and pioneering proposals and results that lead us to think we are beginning to have an educational and research infrastructure capable of making us an international point of reference in this field.

This paper presents Mood Cloud 2.0, an application that allows users to visualize and browse music by mood. With the first version of Mood Cloud [1], we could visualize in real time the mood predictions of different Support Vector Machine models (one for each 'basic' mood). This helped us understand how well we can predict the evolution of mood over time. Version 2.0 adds a new 2D visualization based on social network data and adds retrieval features. In this representation, we can visualize one's collection, observe the mood evolution of a song over time, and draw a path to make a playlist or retrieve a song based on its evolution in time. This 2D space is flexible: one can choose between different templates, the most innovative one being the representation extracted from social networks, called the semantic mood space. The 2D semantic mood space was obtained using Self-Organizing Maps on tag data from last.fm. Each song in one's collection is mapped into the semantic mood space using its tags. Other modes and representations are proposed. If tags are not available, we can use the autotagger function, which automatically adds tags to the piece and thus places it in the semantic space. This technique is also used to evaluate the mood evolution of a song by dividing it into segments of a few seconds. Additionally, pre-computed audio mood models are available (the updated models from Mood Cloud 1.0), which are state-of-the-art mood classification algorithms. For these models, the 2D representation can be changed using different axes. We allow the user to change the two dimensions, selecting between the existing audio models in Mood Cloud 1.0 (happy, sad, aggressive, relax and party). One can visualize one's collection in the aggressive/sad or relaxed/happy spaces, for instance. With both the autotagger and the mood models, any collection can be mapped into and browsed in a 2D space. By analyzing the songs in windows of a few seconds, we can visualize, in the same space, the instantaneous mood and its evolution during the song. Finally, drawing a path in that space can be used to make a playlist or to search for a song with this particular mood evolution in time.
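A loose sketch of how songs could be placed in a pre-computed 2D mood space from their tags; the tag coordinates below are invented, whereas in the paper the space itself comes from Self-Organizing Maps trained on last.fm tag data.

```python
# Sketch: place a song at the weighted average of its mood tags' positions in
# a pre-computed 2D semantic mood space. Tag names, positions, weights made up.
import numpy as np

tag_coords = {
    "happy":      np.array([0.9, 0.8]),
    "sad":        np.array([0.1, 0.2]),
    "aggressive": np.array([0.8, 0.1]),
    "relaxed":    np.array([0.2, 0.9]),
}

def place_song(tag_weights: dict) -> np.ndarray:
    """Weighted average of tag positions; weights could be tag counts or autotagger probabilities."""
    known = {t: w for t, w in tag_weights.items() if t in tag_coords}
    total = sum(known.values())
    return sum(w * tag_coords[t] for t, w in known.items()) / total

# Whole-song placement, then per-segment placement to trace mood evolution.
print(place_song({"happy": 3, "relaxed": 1}))
segments = [{"sad": 1.0}, {"sad": 0.6, "happy": 0.4}, {"happy": 1.0}]
trajectory = [place_song(seg) for seg in segments]
print(np.round(trajectory, 2))
```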
This document proposes to use the Friend of a Friend (FOAF) definition to recommend music depending on the user's musical tastes and to filter music-related newsfeeds. One of the goals of the project is to explore music content discovery based on both user profiling (FOAF descriptions) and content-based descriptions (extracted from the audio itself).
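A minimal sketch of the profile-driven filtering idea, assuming the FOAF profile has already been parsed into plain artist and genre interests; the actual project works on FOAF/RDF descriptions together with audio-based content descriptors.

```python
# Sketch: keep newsfeed items that mention an artist or genre from the user's profile.
def filter_newsfeed(profile, items):
    interests = {term.lower() for term in profile["artists"] + profile["genres"]}
    kept = []
    for item in items:
        text = (item["title"] + " " + item["body"]).lower()
        if any(term in text for term in interests):
            kept.append(item)
    return kept

profile = {"artists": ["Radiohead"], "genres": ["ambient"]}
feed = [
    {"title": "Radiohead announce tour", "body": "New dates in Europe."},
    {"title": "Pop charts update", "body": "This week's singles."},
]
print([item["title"] for item in filter_newsfeed(profile, feed)])
```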
We aim to model infants' perception and representation of the temporal information present in infant-directed speech and singing, using connectionist computational models (neural networks). In our approach, we consider the sound patterning present in both speech and singing in terms of timing and accent. The model receives audio as input. Subsequently, different features are computed by different processes operating in parallel. Finally, we compute a representational transition, which learns categorical structured representations, in terms of communication purposes, from unstructured examples. In addition, we propose experiments to perform with the model. With these experiments we aim to study the development of representations from undifferentiated whole sounds to the relations between the attributes that compose those sounds.
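A very loose sketch of the overall modelling idea (parallel timing and accent features feeding a classifier that learns categories from examples); the features, the two categories, and the training data below are all invented, and the authors' actual network is not specified here.

```python
# Invented illustration: timing and accent features computed in parallel, then
# a tiny logistic classifier learns two made-up communicative categories.
import numpy as np

rng = np.random.default_rng(0)

def timing_features(onsets):
    """Inter-onset intervals summarized as mean and variability."""
    iois = np.diff(onsets)
    return np.array([iois.mean(), iois.std()])

def accent_features(energies):
    """Accent pattern summarized as mean level and peak-to-mean ratio."""
    return np.array([energies.mean(), energies.max() / (energies.mean() + 1e-9)])

def encode(onsets, energies):
    return np.concatenate([timing_features(onsets), accent_features(energies)])

# Toy data: category 0 = slow/soft patterns, category 1 = fast/loud patterns (hypothetical).
X = np.array([encode(np.cumsum(rng.uniform(0.4, 0.6, 8)), rng.uniform(0.2, 0.4, 8)) for _ in range(20)]
             + [encode(np.cumsum(rng.uniform(0.1, 0.2, 8)), rng.uniform(0.6, 1.0, 8)) for _ in range(20)])
y = np.array([0] * 20 + [1] * 20)

# Logistic regression trained by gradient descent stands in for the learned transition.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= 0.01 * X.T @ grad / len(y)
    b -= 0.01 * grad.mean()
print("training accuracy:", ((p > 0.5) == y).mean())
```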
The aim of this paper is to discuss possible ways of describing some music constructs in a dual context: that of a specific software application (a tool for content-based management and editing of samples and short audio phrases), and that of the current standard for multimedia content description, MPEG-7. Different musical layers (melodic, rhythmic, and instrumental) are examined in terms of usable descriptors and description schemes. After discussing some MPEG-7 limitations regarding those specific layers (and given the needs of a specific application context), some proposals for overcoming them are presented.
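As one concrete example of a melodic descriptor in the spirit of MPEG-7's MelodyContour (a coarse five-level contour of interval direction and size), the sketch below uses a simplified mapping; the interval thresholds are my own assumption, not the normative MPEG-7 definition.

```python
# Simplified five-level melodic contour from a sequence of MIDI pitches.
def melody_contour(midi_pitches):
    """Map successive pitch intervals to the five contour levels -2..+2."""
    contour = []
    for prev, cur in zip(midi_pitches, midi_pitches[1:]):
        interval = cur - prev
        if interval == 0:
            contour.append(0)
        elif abs(interval) <= 2:                  # small step (up to a whole tone)
            contour.append(1 if interval > 0 else -1)
        else:                                     # larger leap
            contour.append(2 if interval > 0 else -2)
    return contour

# "Twinkle, twinkle, little star" opening, as MIDI note numbers
print(melody_contour([60, 60, 67, 67, 69, 69, 67]))   # [0, 2, 0, 1, 0, -1]
```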
This paper presents an application for performing melodic transformations on monophonic audio phrases. The system first extracts a melodic description from the audio. This description is presented to the user and can be stored and loaded in an MPEG-7-based format. A set of high-level transformations can then be applied to the melodic description. These high-level transformations are mapped onto a set of low-level signal transformations and then applied to the audio signal. The algorithms for description extraction and audio transformation are also presented.
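A hedged sketch of the high-level-to-low-level mapping idea: a "transpose" operation on the melodic description becomes per-note pitch-shift ratios for a signal-processing back end. The data structures are illustrative, not the paper's MPEG-7-based format.

```python
# Sketch: melodic description as timed notes; transposition mapped to pitch-shift ratios.
from dataclasses import dataclass

@dataclass
class Note:
    onset: float      # seconds
    duration: float   # seconds
    midi_pitch: int

def transpose(description, semitones):
    """High-level transformation applied to the melodic description."""
    return [Note(n.onset, n.duration, n.midi_pitch + semitones) for n in description]

def to_pitch_shift_ratios(original, transformed):
    """Low-level mapping: frequency ratio to apply to each note's audio segment."""
    return [(o.onset, o.duration, 2 ** ((t.midi_pitch - o.midi_pitch) / 12))
            for o, t in zip(original, transformed)]

melody = [Note(0.0, 0.5, 60), Note(0.5, 0.5, 62), Note(1.0, 1.0, 64)]
up_a_fourth = transpose(melody, 5)
print(to_pitch_shift_ratios(melody, up_a_fourth))   # ratio of about 1.335 for each note
```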
The aim of this work is to study how a pitch detection algorithm can help in the task of locating solos in a musical excerpt. Output parameters of the pitch detection algorithm are studied, and enhancements for the task of solo location are proposed. A solo is defined as a section of a piece where an instrument is in the foreground compared to the other instruments and to the other sections of the piece.
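A simple illustration of the underlying cue (not the paper's algorithm): sustained high pitch-detection confidence suggests a single pitched instrument is dominant, which can mark a candidate solo section. The window length and threshold are arbitrary choices for the sketch.

```python
# Sketch: locate spans where smoothed pitch confidence stays above a threshold.
import numpy as np

def locate_solo_candidates(pitch_confidence, hop_s=0.01, win=200, threshold=0.7):
    """Return (start_s, end_s) spans of sustained high pitch confidence."""
    kernel = np.ones(win) / win
    smoothed = np.convolve(pitch_confidence, kernel, mode="same")
    active = smoothed > threshold
    spans, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            spans.append((start * hop_s, i * hop_s))
            start = None
    if start is not None:
        spans.append((start * hop_s, len(active) * hop_s))
    return spans

# Toy confidence curve: low, then high (a solo-like region), then low again.
conf = np.concatenate([np.full(500, 0.3), np.full(800, 0.9), np.full(500, 0.4)])
print(locate_solo_candidates(conf))   # one span covering the high-confidence middle region
```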
With the rapid growth of audio databases, many music retrieval applications have employed metadata descriptions to facilitate better handling of huge databases. Music structure gives each music piece a unique identity. Therefore, structural description is capable of providing a powerful way of interacting with audio content, and serves as a link between low-level and higher-level descriptions of audio.
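An illustrative sketch of a structural description used as navigation metadata: labeled, timed segments sitting between low-level features and higher-level semantics. The segment labels and times are made up.

```python
# Sketch: a song's structure as labeled segments, usable for content navigation.
from dataclasses import dataclass

@dataclass
class Segment:
    label: str
    start: float   # seconds
    end: float     # seconds

structure = [
    Segment("intro", 0.0, 12.5),
    Segment("verse", 12.5, 45.0),
    Segment("chorus", 45.0, 75.0),
    Segment("verse", 75.0, 105.0),
    Segment("chorus", 105.0, 135.0),
]

def jump_to(structure, label, occurrence=1):
    """Use the structural description to navigate, e.g. skip to the 2nd chorus."""
    hits = [s for s in structure if s.label == label]
    return hits[occurrence - 1].start if len(hits) >= occurrence else None

print(jump_to(structure, "chorus", occurrence=2))   # 105.0
```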