2009
In this article we describe the approach we follow to analyze the performance of a singer when singing a reference song. The idea is to rate the performance of a singer in the same way a music tutor would, not only giving a score but also giving feedback on how the user has performed with regard to expression, tuning, and tempo/timing characteristics. We also discuss what visual feedback is most relevant for the user. Segmentation at an intra-note level is done using an algorithm based on untrained HMMs, with probabilistic models built out of a set of heuristic rules that determine regions and their probability of being expressive features. A real-time karaoke-like system is presented in which a user can sing and simultaneously visualize feedback and results of the performance. The technology can be applied to a wide range of applications, from pure entertainment to more serious, education-oriented uses.
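As a rough illustration of the segmentation idea, the sketch below decodes intra-note states with an untrained HMM whose observation probabilities come from heuristic rules on pitch and energy slopes. The state set, rule shapes, and all thresholds are illustrative assumptions, not the authors' published models.

```python
# Untrained-HMM intra-note segmentation: per-frame observation
# probabilities come from heuristic rules (no training), and Viterbi
# picks the most likely state sequence. All constants are assumptions.
import numpy as np

STATES = ["attack", "sustain", "transition", "release"]

def rule_probs(f0_cents, energy_db):
    """Heuristic per-frame state probabilities from pitch/energy slopes."""
    df0 = np.gradient(f0_cents)   # cents per frame
    de = np.gradient(energy_db)   # dB per frame
    p = np.empty((len(f0_cents), len(STATES)))
    p[:, 0] = np.clip(de / 3.0, 0.05, 1.0)                   # attack: rising energy
    p[:, 1] = np.clip(1.0 - np.abs(df0) / 50.0, 0.05, 1.0)   # sustain: stable pitch
    p[:, 2] = np.clip(np.abs(df0) / 100.0, 0.05, 1.0)        # transition: pitch glide
    p[:, 3] = np.clip(-de / 3.0, 0.05, 1.0)                  # release: falling energy
    return p / p.sum(axis=1, keepdims=True)

def viterbi(obs_probs, self_loop=0.9):
    """Most likely state path under a fixed (untrained) transition matrix."""
    n, k = obs_probs.shape
    trans = np.full((k, k), (1 - self_loop) / (k - 1))
    np.fill_diagonal(trans, self_loop)
    log_d = np.log(obs_probs[0] / k)
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        scores = log_d[:, None] + np.log(trans)
        back[t] = scores.argmax(axis=0)
        log_d = scores.max(axis=0) + np.log(obs_probs[t])
    path = [int(log_d.argmax())]
    for t in range(n - 1, 1 - 1, -1):
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]
```

Because the transition matrix and the rules are fixed by hand, no training data is needed, which matches the "untrained HMM" framing of the abstract.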
2010
Computer evaluation of singing has traditionally been based exclusively on tuning and tempo. This article presents a tool for the automatic evaluation of singing voice performances that considers not only tuning and tempo but also the expression of the voice. For this purpose, the system performs analysis at the note and intra-note levels. Note-level analysis outputs traditional note pitch, note onset, and note duration information, while intra-note-level analysis is in charge of locating and categorizing the expression of each note's attacks, sustains, transitions, releases, and vibratos. Segmentation is done using an algorithm based on untrained HMMs with probabilistic models built out of a set of heuristic rules. A graphical tool for the evaluation and fine-tuning of the system is presented. The interface gives feedback about analysis descriptors and rule probabilities.
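For the note-level side, a minimal sketch of the kind of descriptors such a tool might report, assuming sung and reference notes are already aligned one-to-one; the tuple format, alignment step, and summary statistics are hypothetical:

```python
# Illustrative note-level comparison against a reference score: tuning
# error in cents plus onset/duration error. These are assumptions for
# the sketch, not the paper's actual descriptors.
import numpy as np

def note_level_scores(sung, reference):
    """sung/reference: lists of (onset_s, duration_s, midi_pitch) tuples,
    assumed already aligned one-to-one."""
    sung, ref = np.asarray(sung, float), np.asarray(reference, float)
    cents = 100.0 * (sung[:, 2] - ref[:, 2])   # tuning error per note
    onset_err = sung[:, 0] - ref[:, 0]         # timing error (s)
    dur_ratio = sung[:, 1] / ref[:, 1]         # rhythm: duration ratio
    return {"tuning_rms_cents": float(np.sqrt(np.mean(cents ** 2))),
            "onset_mae_s": float(np.mean(np.abs(onset_err))),
            "median_duration_ratio": float(np.median(dur_ratio))}
```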
2017
The process of speech production changes between speaking and singing due to excitation, vocal tract articulatory positioning, and cognitive motor planning while singing. Singing not only deviates from typical spoken speech but also varies across styles of singing, owing to different genres of music, the singing quality of the individual, and different languages and cultures. Because of this variation, it is important to establish a baseline system for differentiating between certain aspects of singing. In this study, we establish a classification system that automatically estimates the singing quality of candidates from an American TV singing show based on their singing speech acoustics and lip and eye movements. We employ three classifiers, Logistic Regression, Naive Bayes, and k-nearest neighbor (k-NN), and compare the performance of each using unimodal and multimodal features. We also compare performance based on different modalities (speech, lip, eye struct...
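A compact sketch of the three-classifier comparison using scikit-learn; the feature matrices below are random placeholders standing in for the study's speech, lip, and eye descriptors, and the label and cross-validation setup are assumptions:

```python
# Unimodal vs. multimodal comparison across the three named classifiers.
# All data here is synthetic placeholder material.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
speech = rng.normal(size=(200, 13))   # e.g., acoustic statistics (placeholder)
lip = rng.normal(size=(200, 6))       # lip-movement features (placeholder)
eye = rng.normal(size=(200, 4))       # eye-movement features (placeholder)
y = rng.integers(0, 2, size=200)      # singing-quality label (placeholder)

for name, X in [("speech only", speech),
                ("multimodal", np.hstack([speech, lip, eye]))]:
    for clf in (LogisticRegression(max_iter=1000), GaussianNB(),
                KNeighborsClassifier(n_neighbors=5)):
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name:11s} {type(clf).__name__:22s} acc={acc:.2f}")
```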
Psychology of Music, 2000
In psychological and cross-cultural (e.g. ethnomusicological) research, the analysis of song-singing has always been an intricate and serious obstacle. Singing is a transient and mostly unstable patterning of vocal sounds that is organised by applying more or less linguistic and musical rules. Traditionally, a sung performance has been analysed by mere listening and by using Western musical notation to represent its structure. Since this method neglects any in-between categories with respect to pitch and time, it proves to be culturally biased. However, acoustic measures as used in speech analysis have had limited application and were primarily used to quantify isolated parameters of sung performances. For analysing and representing the organisation of pitch in relation to the syllables of the lyrics, and its temporal structure, we devised a computer-aided method in combination with a new symbolic representation. The computer program provides detailed acoustic measures on pit...
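One way to sidestep the Western-notation bias the authors describe is to keep pitch continuous, for instance in cents relative to a performance-specific reference tone. The sketch below is a minimal, assumption-laden version of that idea, not the authors' program; the reference-tone choice (median voiced f0) is ours:

```python
# Culture-neutral pitch representation: continuous cents relative to an
# estimated reference tone, so in-between pitch categories survive.
import numpy as np

def f0_to_relative_cents(f0_hz):
    f0 = np.asarray(f0_hz, float)
    voiced = f0 > 0
    ref_hz = np.median(f0[voiced])    # assumed reference tone
    cents = np.full_like(f0, np.nan)  # unvoiced frames stay NaN
    cents[voiced] = 1200.0 * np.log2(f0[voiced] / ref_hz)
    return cents                      # continuous, not quantized to notes
```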
2009
Detecting distinct features in modern pop music is an important problem with significant applications in areas such as multimedia entertainment. Such features can be used, for example, to give a visually coherent representation of the sound. We propose to integrate a singing voice detector with a multimedia, multi-touch game in which the user has to perform simple tasks at certain key points in the music. While the ultimate goal is to automatically create visual content in response to features extracted from the music, here we focus on the detection of voice segments in songs. The presented solution extracts the Mel-Frequency Cepstral Coefficients of the sound and uses a Hidden Markov Model to infer whether the sound contains voice. The classification rate obtained is high compared with other singing voice detectors that use Mel-Frequency Cepstral Coefficients.
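A minimal sketch of the MFCC-plus-HMM pipeline, assuming librosa and hmmlearn are available; the input file name and the heuristic used to label the "voice" state are placeholders, and a real detector would be trained on labelled segments rather than fit unsupervised:

```python
# MFCC + HMM vocal detection sketch: fit a 2-state Gaussian HMM over
# MFCC frames and read the decoded state sequence as voice / no voice.
import numpy as np
import librosa
from hmmlearn import hmm

y, sr = librosa.load("song.wav")                         # placeholder input
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T     # frames x coeffs

model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
model.fit(mfcc)
states = model.predict(mfcc)

# Heuristic assumption: the state with the higher mean first coefficient
# (roughly frame energy) is the "voice" state; labelled data would be
# the proper way to assign state meanings.
voice_state = int(np.argmax(model.means_[:, 0]))
voiced_frames = states == voice_state
print(f"voiced fraction: {voiced_frames.mean():.2f}")
```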
CMMR 2017 - 13th International Symposium on Computer Music Multidisciplinary Research - Music Technology with Swing. 25-28 September, 2017
In this paper we present a database of fundamental frequency series for singing performances to facilitate comparative analysis of algorithms developed for singing assessment. A large number of recordings were collected during conservatory entrance exams, in which candidates reproduce melodies (after listening to the target melody played on the piano), in addition to some other tasks related to rhythm and individual pitch perception. Leaving out the samples where the jury members' grades did not all agree, we obtained a collection of 1018 singing and 2599 piano performances as instances of 40 distinct melodies. A state-of-the-art fundamental frequency (f0) detection algorithm is used to extract an f0 time series from each of these recordings to form the dataset. The dataset is shared to support research in singing assessment. Together with the dataset, we provide a flexible singing assessment system that can serve as a baseline for the comparison of assessment algorithms.
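A baseline assessment in this spirit can be as simple as a DTW distance between the candidate's f0 series and the target melody's f0 series in cents. The sketch below is not the shared system itself; it assumes unvoiced frames have already been removed and uses a plain length-normalized DTW cost:

```python
# DTW-based singing assessment baseline on f0 series in cents.
import numpy as np

def dtw_cost(a, b):
    """Classic O(n*m) DTW on 1-D sequences; returns the alignment cost
    normalized by the combined sequence length."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = abs(a[i - 1] - b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def assess(candidate_f0_hz, target_f0_hz):
    """Lower is better; both inputs assumed voiced-only f0 samples."""
    to_cents = lambda f: 1200.0 * np.log2(np.asarray(f, float) / 440.0)
    return dtw_cost(to_cents(candidate_f0_hz), to_cents(target_f0_hz))
```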
Applied Sciences
This paper introduces a new method for detecting the onsets, offsets, and transitions of notes in real-time solo singing performances. It identifies onsets and offsets by finding the transitions from one note to another based on trajectory changes in the fundamental frequency. The accuracy of our approach is compared with eight well-known algorithms. It was tested on two datasets containing 130 singing files, with a total duration of more than seven hours and more than 41,000 onset annotations. The analysis metrics used include the average, the F-measure score, and ANOVA. The proposed algorithm was observed to determine onsets and offsets more accurately than the other algorithms and, unlike them, can also detect the transitions between notes.
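The trajectory-based idea can be sketched as follows: onsets and offsets fall at voicing boundaries, and a transition is flagged wherever the contour (in cents) jumps by more than a threshold between frames. The hop size and threshold below are illustrative assumptions, not the paper's tuned values:

```python
# Note segmentation from an f0 contour: voicing boundaries give
# onsets/offsets, large frame-to-frame pitch jumps give transitions.
import numpy as np

def segment_notes(f0_hz, hop_s=0.01, jump_cents=80.0):
    f0 = np.asarray(f0_hz, float)
    voiced = f0 > 0
    cents = np.where(voiced,
                     1200.0 * np.log2(np.maximum(f0, 1e-6) / 440.0),
                     np.nan)
    onsets = np.flatnonzero(voiced[1:] & ~voiced[:-1]) + 1
    offsets = np.flatnonzero(~voiced[1:] & voiced[:-1]) + 1
    dc = np.abs(np.diff(cents))                  # NaN across voicing gaps
    transitions = np.flatnonzero(np.nan_to_num(dc) > jump_cents) + 1
    return onsets * hop_s, offsets * hop_s, transitions * hop_s
```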
2021
Dynamics are one of the fundamental tools of expressivity in a performance. While the usage of this tool is highly subjective, a systematic methodology for deriving loudness markings from a performance can be highly beneficial. With this goal in mind, this paper is a first step towards a methodology for automatically transcribing dynamic markings from vocal rock and pop performances. To this end, we take commercial recordings of some popular songs, apply source separation, and compare them to karaoke versions of the same songs. The dynamic variations in the original commercial recordings are found to be structurally very similar to those in the aligned karaoke/multi-track versions of the same tracks. We compare and show the differences between tracks using statistical analysis, with the eventual goal of using the transcribed markings as guiding tools to help students adapt to a specific interpretation of a given piece of music. We perform a qualitative analysis of the...
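As a hedged sketch of what transcribing dynamic markings could look like, the snippet below bins a separated vocal's short-time loudness into quantiles and maps each bin to a coarse marking; the quantile binning and marking set are our assumptions, not the paper's method:

```python
# Quantile-based mapping from short-time loudness to dynamic markings.
import numpy as np

MARKINGS = ["pp", "p", "mp", "mf", "f", "ff"]

def transcribe_dynamics(rms):
    """rms: short-time RMS values of the (source-separated) vocal track."""
    db = 20.0 * np.log10(np.maximum(np.asarray(rms, float), 1e-8))
    edges = np.quantile(db, np.linspace(0, 1, len(MARKINGS) + 1)[1:-1])
    return [MARKINGS[int(np.searchsorted(edges, v))] for v in db]
```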
Applied Sciences
Few datasets of singing files are adequately annotated. One available dataset that includes a variety of vocal techniques (n = 17) and several singers (m = 20), with many WAV files (p = 3560), is the VocalSet dataset. However, although the dataset covers several categories, including technique, singer, tempo, and loudness, these are not annotated. This study therefore aims to annotate VocalSet to make it a more powerful dataset for researchers. The annotations generated for the VocalSet audio files include the fundamental frequency contour, note onsets, note offsets, transitions between notes, note F0, note duration, MIDI pitch, and lyrics. This paper describes the generated dataset and explains our approaches to creating and testing the annotations. Moreover, four different methods of defining the onset/offset are compared.
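A hypothetical per-note record mirroring the annotation fields the paper lists; the field names are ours, not the released dataset's schema:

```python
# Illustrative container for one annotated note.
from dataclasses import dataclass

@dataclass
class NoteAnnotation:
    onset_s: float        # note onset time (seconds)
    offset_s: float       # note offset time (seconds)
    transition_s: float   # time of transition into the next note
    f0_hz: float          # representative note F0
    midi_pitch: int       # quantized MIDI pitch
    lyric: str            # lyric syllable sung on the note

    @property
    def duration_s(self) -> float:
        return self.offset_s - self.onset_s
```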
Journal of the Acoustical Society of America, 1999
When a musician gives a recital or concert, the music performed generally includes accompaniment. To render a good performance, the soloist and the accompanist must know the musical score and must follow the other musician's performance. Both performing and rehearsing are limited by constraints on the time and money available for bringing musicians together. Computer systems that automatically provide musical accompaniment offer an inexpensive, readily available alternative. Effective computer accompaniment requires software that can listen to live performers and follow along in a musical score. This work presents an implemented system and method for automatically accompanying a singer given a musical score. Specifically, I offer a method for robust, real-time detection of a singer's score position and tempo. Robust score following requires combining information obtained both from analyzing a complex signal (the singer's performance) and from processing symbolic notation (the score). Unfortunately, the mapping from the available information to score position does not define a function. Consequently, this work investigated a statistical characterization of a singer's score position and a model that combines the available musical information to produce a probabilistic position estimate. By making careful assumptions and estimating statistics from a set of actual vocal performances, a useful approximation of this model can be implemented in software and executed in real time during a musical performance. As part of this project, a metric was defined for evaluating the system's ability to follow a singer, and this metric was used to assess the system's ability to track vocal performances. The presented evaluation includes a characterization of how tracking ability can be improved by using several different measurements from the sound signal rather than only one type of measurement. The examined measurements include fundamental pitch, spectral features dependent upon the score's phonetic content, and amplitude changes correlated with the start of a musical note. The evaluation results demonstrate how incorporating multiple measurements of the same signal can improve the accuracy of performance tracking, for individual performances as well as on average. Overall improvement of the performance tracking system through incremental specification, development, and evaluation is facilitated by the formal statistical approach to the problem.
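The statistical position estimate can be sketched as a forward-filtered HMM over score positions, combining a tempo-driven motion model with a pitch observation likelihood. All constants and model shapes below are illustrative assumptions, not the thesis's actual models, which also incorporate phonetic and amplitude measurements:

```python
# Probabilistic score following sketch: forward filtering over score
# positions, combining a motion model (tempo) with a pitch likelihood.
import numpy as np

def follow(score_midi, obs_midi_frames, sigma=1.0, move=(0.2, 0.6, 0.2)):
    """score_midi: expected pitch per score position;
    obs_midi_frames: detected pitch per audio frame.
    Returns the MAP score position after each frame."""
    score = np.asarray(score_midi, float)
    n = len(score)
    belief = np.full(n, 1.0 / n)   # P(position) before any audio
    path = []
    for obs in obs_midi_frames:
        # Motion model: stay, advance one, or advance two positions.
        moved = np.zeros(n)
        for step, w in enumerate(move):
            if step == 0:
                moved += w * belief
            else:
                moved[step:] += w * belief[:-step]
        # Observation model: Gaussian likelihood around the expected pitch.
        like = np.exp(-0.5 * ((obs - score) / sigma) ** 2)
        belief = moved * like
        belief /= belief.sum() + 1e-12
        path.append(int(belief.argmax()))   # MAP position after this frame
    return path
```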