2021
Dynamics play a fundamental role in shaping the expressivity of any performance. While the usage of this tool can vary from artist to artist, and also from performance to performance, a systematic methodology to derive dynamics in musically meaningful terms such as piano and forte can offer valuable feedback in the context of vocal music education. To this end, we make use of commercial recordings of some popular rock and pop songs from the Smule vocal balanced dataset and transcribe them with dynamic markings with the help of a music teacher. Further, we compare the dynamics of the source-separated original recordings with the aligned karaoke versions to find the variations in dynamics. We compare and present the differences using statistical analysis, with the goal of providing the dynamic markings as guiding tools for students to learn and adapt to a specific interpretation of a piece of music.
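The abstract does not detail how loudness is converted into markings; purely as an illustration of the idea, the sketch below computes frame-wise RMS loudness in dB for a (source-separated) vocal track and maps levels to coarse dynamic labels. The function names and threshold values are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def frame_rms_db(y, frame_length=2048, hop_length=512):
    """Frame-wise RMS level in dB for a mono signal y (floats in [-1, 1])."""
    n_frames = 1 + max(0, (len(y) - frame_length) // hop_length)
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = y[i * hop_length : i * hop_length + frame_length]
        rms[i] = np.sqrt(np.mean(frame ** 2) + 1e-12)
    return 20.0 * np.log10(rms)

def label_dynamics(level_db, thresholds=(-34.0, -26.0, -18.0, -10.0)):
    """Map a dB level to a coarse dynamic marking.

    The thresholds are illustrative; in practice they would be calibrated per
    recording, or against a teacher's reference annotations."""
    names = ("pp", "p", "mf", "f", "ff")
    return names[int(np.searchsorted(thresholds, level_db))]

# Example: label a phrase by the median level of its frames (stand-in audio).
levels = frame_rms_db(np.random.uniform(-0.1, 0.1, 44100))
print(label_dynamics(np.median(levels)))
```

A comparison between the original and karaoke versions could then be run on such per-phrase labels or on the underlying dB curves.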
2021
Dynamics are one of the fundamental tools of expressivity in a performance. While the usage of this tool is highly subjective, a systematic methodology to derive loudness markings based on a performance can be highly beneficial. With this goal in mind, this paper is a first step towards developing a methodology to automatically transcribe dynamic markings from vocal rock and pop performances. To this end, we make use of commercial recordings of some popular songs, apply source separation, and compare them to the karaoke versions of the same songs. The dynamic variations in the original commercial recordings are found to be structurally very similar to the aligned karaoke/multi-track versions of the same tracks. We compare and show the differences between tracks using statistical analysis, with the eventual goal of using the transcribed markings as guiding tools to help students adapt to a specific interpretation of a given piece of music. We perform a qualitative analysis of the...
2010
In this article we describe the approach we follow to analyze the performance of a singer when singing a reference song. The idea is to rate the performance of a singer in the same way that a music tutor would, not only giving a score but also giving feedback about how the user has performed with regard to expression, tuning and tempo/timing characteristics. We also discuss what visual feedback is most relevant for the user. Segmentation at an intra-note level is done using an algorithm based on untrained HMMs, with probabilistic models built out of a set of heuristic rules that determine regions and their probability of being expressive features. A real-time karaoke-like system is presented where a user can sing and simultaneously visualize feedback and results of the performance. The technology can be applied to a wide set of applications that range from pure entertainment to more serious, education-oriented uses.
Psychology of Music, 2000
In psychological and cross-cultural (e.g. ethnomusicological) research, the analysis of song-singing has always been an intricate and serious obstacle. Singing is a transient and mostly unstable patterning of vocal sounds that is organised by applying more or less linguistic and musical rules. Traditionally, a sung performance has been analysed by mere listening and by using western musical notation to represent its structure. Since this method neglects any in-between categories with respect to pitch and time, it proves to be culturally biased. However, acoustic measures as used in speech analysis have had limited application and were primarily used to quantify isolated parameters of sung performances. For analysing and representing the organisation of pitch in relation to the syllables of the lyrics, and its temporal structure, we devised a computer-aided method in combination with a new symbolic representation. The computer program provides detailed acoustic measures on pit...
2015
We present a new dataset for singing analysis and modelling, and an exploratory analysis of pitch accuracy and pitch trajectories. Shortened versions of three pieces from The Sound of Music were selected: “Edelweiss”, “Do-Re-Mi” and “My Favourite Things”. 39 participants sang three repetitions of each excerpt without accompaniment, resulting in a dataset of 21762 notes in 117 recordings. To obtain pitch estimates we used the Tony software’s automatic transcription and manual correction tools. Pitch accuracy was measured in terms of pitch error and interval error. We show that singers’ pitch accuracy correlates significantly with self-reported singing skill and musical training. Larger intervals led to larger errors, and the tritone interval in particular led to average errors of one third of a semitone. Note duration (or inter-onset interval) had a significant effect on pitch accuracy, with greater accuracy on longer notes. To model drift in the tonal centre over time, we present a s...
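For readers who want to reproduce this kind of measure, a minimal sketch of the two accuracy metrics named here (pitch error against the score note, and interval error between consecutive notes) follows, working in cents (100 cents = 1 semitone). The note frequencies are made up for illustration; the dataset itself provides Tony-corrected note tracks.

```python
import numpy as np

def cents(f_hz, ref_hz):
    """Signed deviation of f_hz from ref_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * np.log2(np.asarray(f_hz, float) / np.asarray(ref_hz, float))

sung_hz  = np.array([261.0, 296.5, 330.8])      # illustrative sung note pitches
score_hz = np.array([261.63, 293.66, 329.63])   # C4, D4, E4 targets

pitch_error = cents(sung_hz, score_hz)           # per-note error vs. the score
sung_iv  = np.diff(1200.0 * np.log2(sung_hz))    # sung interval sizes in cents
score_iv = np.diff(1200.0 * np.log2(score_hz))   # score interval sizes in cents
interval_error = sung_iv - score_iv              # per-interval error
```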
This paper describes the challenges that arise when attempting to automatically extract pitch-related performance data from recordings of the singing voice. The first section of the paper provides an overview of the history of analyzing recorded performances. The second section describes an algorithm for automatically extracting performance data from recordings of the singing voice where a score of the performance is available. The algorithm first identifies note onsets and offsets. Once the onsets and offsets have been determined, intonation, vibrato, and dynamic characteristics can be calculated for each note.
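The abstract only outlines the algorithm (onset/offset detection followed by per-note descriptors), so the following is a hedged sketch of what the per-note stage might look like on a frame-wise f0 track in cents. The descriptor formulas (median intonation, FFT-based vibrato rate, standard-deviation-based vibrato extent) are common simplifications, not the paper's exact definitions.

```python
import numpy as np

def note_descriptors(f0_cents, times, onset, offset, vibrato_band=(4.0, 8.0)):
    """Per-note intonation and vibrato descriptors from a frame-wise f0 track.

    f0_cents and times are frame-wise arrays (f0 already converted to cents);
    onset and offset are note boundaries in seconds, assumed to come from a
    preceding onset/offset detection step. Simplified, illustrative formulas."""
    sel = (times >= onset) & (times < offset)
    seg = np.asarray(f0_cents, float)[sel]
    hop = float(np.median(np.diff(times)))         # frame period in seconds
    intonation = float(np.median(seg))             # central pitch of the note
    detrended = seg - intonation
    # Vibrato rate: dominant modulation frequency of the detrended contour
    # within a plausible vibrato band (here 4-8 Hz).
    spectrum = np.abs(np.fft.rfft(detrended * np.hanning(len(detrended))))
    freqs = np.fft.rfftfreq(len(detrended), d=hop)
    band = (freqs >= vibrato_band[0]) & (freqs <= vibrato_band[1])
    vib_rate = float(freqs[band][np.argmax(spectrum[band])]) if band.any() else 0.0
    vib_extent = float(2.0 * np.std(detrended))    # rough peak-to-peak estimate
    return {"intonation_cents": intonation,
            "vibrato_rate_hz": vib_rate,
            "vibrato_extent_cents": vib_extent}
```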
This paper evaluates the utility of the Discrete Cosine Transform (DCT) for characterizing singing voice fundamental frequency (F0) trajectories. Specifically, it focuses on the use of the 1st and 2nd DCT coefficients as approximations of slope and curvature. It also considers the impact of vocal vibrato on the DCT calculations, including the influence of segmentation on the consistency of the reported DCT coefficient values. These characterizations are useful for describing similarities in the evolution of the fundamental frequency in different notes. Such descriptors can be applied in the areas of performance analysis and singing synthesis.
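As a small worked example of the descriptors discussed here, the sketch below takes the 1st and 2nd coefficients of an orthonormal type-II DCT of a (mean-removed) f0 segment; for smooth segments these behave like overall slope and curvature. The use of scipy.fft.dct is an implementation choice for illustration, not necessarily the paper's tooling.

```python
import numpy as np
from scipy.fft import dct

def dct_shape_descriptors(f0_segment):
    """Return (slope-like, curvature-like) descriptors of an f0 note segment.

    Uses the 1st and 2nd coefficients of an orthonormal type-II DCT; the segment
    is mean-removed first so the 0th (DC) coefficient does not dominate."""
    x = np.asarray(f0_segment, dtype=float)
    x = x - x.mean()
    c = dct(x, type=2, norm="ortho")
    return c[1], c[2]

# Illustrative check: a rising contour yields a large-magnitude 1st coefficient,
# a U-shaped contour a large-magnitude 2nd coefficient.
rising = np.linspace(0.0, 100.0, 64)                 # cents
u_shape = (np.linspace(-1.0, 1.0, 64) ** 2) * 100.0  # cents
print(dct_shape_descriptors(rising))
print(dct_shape_descriptors(u_shape))
```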
2008
Background in music psychology. Physical movement plays an important role in musical perception and production. It is generally agreed among professional singers and vocal teachers, for example, that there are relationships between the kinematics of a singer's body and the quality of their voice. Thus, we might expect to find relationships between quantifiable indicators of a singer's vocal performance and
CMMR 2017 - 13th International Symposium on Computer Music Multidisciplinary Research - Music Technology with Swing. 25-28 September, 2017
In this paper we present a database of fundamental frequency series for singing performances to facilitate comparative analysis of algorithms developed for singing assessment. A large number of recordings have been collected during conservatory entrance exams, which involve candidates’ reproduction of melodies (after listening to the target melody played on the piano) in addition to some other rhythm and individual pitch perception tasks. Leaving out the samples where the jury members’ grades did not all agree, we obtained a collection of 1018 singing and 2599 piano performances as instances of 40 distinct melodies. A state-of-the-art fundamental frequency (f0) detection algorithm is used to derive an f0 time series for each of these recordings to form the dataset. The dataset is shared to support research in singing assessment. Together with the dataset, we provide a flexible singing assessment system that can serve as a baseline for comparison of assessment algorithms.
Applied Sciences
There are insufficient datasets of singing files that are adequately annotated. One of the available datasets that includes a variety of vocal techniques (n = 17) and several singers (m = 20) with several WAV files (p = 3560) is the VocalSet dataset. However, although several categories, including techniques, singers, tempo, and loudness, are in the dataset, they are not annotated. Therefore, this study aims to annotate VocalSet to make it a more powerful dataset for researchers. The annotations generated for the VocalSet audio files include the fundamental frequency contour, note onset, note offset, transitions between notes, note F0, note duration, MIDI pitch, and lyrics. This paper describes the generated dataset and explains our approaches to creating and testing the annotations. Moreover, four different methods to define the onset/offset are compared.
Musicae Scientiae, 2010
Multimedia tools and applications, 2024
This paper describes SingDistVis, an information visualization technique for fundamental frequency (F0) trajectories of large-scale singing data where numerous singers sing the same song. SingDistVis allows users to explore F0 trajectories interactively by combining two views: OverallView and DetailedView. OverallView visualizes a distribution of the F0 trajectories of the song in a time-frequency heatmap. When a user specifies an interesting part, DetailedView zooms in on the specified part and visualizes singing assessment (rating) results. Here, it displays high-rated performances in red and low-rated performances in blue. When the user clicks on a particular performance, the audio source is played and its F0 trajectory through the song is displayed in OverallView. We selected heatmap-based visualization for OverallView to provide an overview of a large-scale F0 dataset, and polyline-based visualization for DetailedView to provide a more precise representation of a small number of particular F0 trajectories. This paper introduces a subjective experiment using 1,000 singing voices to determine suitable visualization parameters. Then, this paper presents user evaluations in which participants were asked to compare visualization results of four types of Overview+Detail designs; the presented design achieved better evaluations than the other designs on all seven questions. Finally, this paper describes a user experiment in which eight participants compared SingDistVis with a baseline implementation in exploring singing voices of interest; the proposed SingDistVis achieved better evaluations on nine of the questions.
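As a rough, non-authoritative re-creation of the OverallView idea (a time-frequency heatmap of many aligned F0 trajectories), one could histogram per-frame pitches across singers, as sketched below; the bin sizes and plotting choices are arbitrary, and the actual SingDistVis rendering and interaction are not reproduced here.

```python
import numpy as np
import matplotlib.pyplot as plt

def f0_heatmap(f0_tracks, times, fmin_midi=40, fmax_midi=90, bins_per_semitone=2):
    """Time-frequency heatmap of many aligned F0 trajectories.

    f0_tracks: array of shape (n_singers, n_frames), F0 in Hz (0 = unvoiced);
    times: shared frame times in seconds. Illustrative only."""
    midi = np.where(f0_tracks > 0,
                    69.0 + 12.0 * np.log2(np.maximum(f0_tracks, 1e-6) / 440.0),
                    np.nan)
    f_edges = np.linspace(fmin_midi, fmax_midi,
                          (fmax_midi - fmin_midi) * bins_per_semitone + 1)
    heat = np.zeros((len(f_edges) - 1, f0_tracks.shape[1]))
    for t in range(f0_tracks.shape[1]):
        col = midi[:, t]
        heat[:, t], _ = np.histogram(col[~np.isnan(col)], bins=f_edges)
    plt.imshow(heat, origin="lower", aspect="auto",
               extent=[times[0], times[-1], fmin_midi, fmax_midi])
    plt.xlabel("time (s)")
    plt.ylabel("pitch (MIDI)")
    plt.show()
```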
Applied Sciences
This paper introduces a new method for detecting onsets, offsets, and transitions of the notes in real-time solo singing performances. It identifies onsets and offsets by finding the transitions from one note to another based on trajectory changes in the fundamental frequency. The accuracy of our approach is compared with eight well-known algorithms. It was tested on two datasets containing 130 singing files, with a total duration of more than seven hours and more than 41,000 onset annotations. The analysis metrics used include the Average, the F-Measure Score, and ANOVA. The proposed algorithm was observed to determine onsets and offsets more accurately than the other algorithms. Additionally, unlike the other algorithms, the proposed algorithm can detect the transitions between notes.
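The paper's detection method is not spelled out in this abstract, so the sketch below is only a toy illustration of the underlying idea of deriving note boundaries from F0 trajectory changes: flag a boundary whenever the track jumps by more than a threshold relative to the running note pitch. The threshold and smoothing values are invented for the example.

```python
import numpy as np

def f0_change_onsets(f0_hz, times, step_cents=80.0, min_gap=0.1):
    """Naive onset/transition picker based on F0 trajectory jumps.

    Toy illustration only, not the algorithm evaluated in the paper."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    cents = np.full_like(f0, np.nan)
    cents[voiced] = 1200.0 * np.log2(f0[voiced] / 440.0)
    onsets, last, ref = [], -np.inf, None
    for t, c, v in zip(times, cents, voiced):
        if not v:
            ref = None                      # unvoiced gap ends the current note
            continue
        if ref is None or abs(c - ref) > step_cents:
            if t - last >= min_gap:         # debounce closely spaced detections
                onsets.append(t)
                last = t
            ref = c
        else:
            ref = 0.9 * ref + 0.1 * c       # slowly track the current note pitch
    return onsets
```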
AES 121st Convention, 2006
We present a procedure to automatically describe musical articulation gestures used in singing voice performances. We detail a method to characterize temporal evolution of fundamental frequency and energy contours by a set of piece-wise fitting techniques. Based on this, we propose a meaningful parameterization that allows reconstructing contours from a compact set of parameters at different levels. We test the characterization method by applying it to fundamental frequency contours of manually segmented transitions between adjacent notes, and train several classifiers with manually labeled examples. We show the recognition accuracy for different parameterizations and levels of representation.
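To make the idea of a compact, reconstructable contour parameterization concrete, here is a minimal piece-wise linear version: fit a line per segment and keep only (slope, intercept) pairs plus the breakpoints. The paper's fitting functions and representation levels differ; this is illustration only.

```python
import numpy as np

def piecewise_linear_params(contour, n_pieces=3):
    """Compact piece-wise linear parameterization of an f0 or energy contour.

    Splits the contour into n_pieces equal-length segments and fits a line to
    each; assumes the contour has at least 2 samples per segment."""
    x = np.arange(len(contour))
    bounds = np.linspace(0, len(contour), n_pieces + 1).astype(int)
    params = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        slope, intercept = np.polyfit(x[a:b], np.asarray(contour, float)[a:b], 1)
        params.append((slope, intercept))
    return bounds, params

def reconstruct(bounds, params, length):
    """Rebuild an approximate contour from the piece-wise parameters."""
    x = np.arange(length)
    y = np.empty(length)
    for (a, b), (slope, intercept) in zip(zip(bounds[:-1], bounds[1:]), params):
        y[a:b] = slope * x[a:b] + intercept
    return y
```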
Background in music theory and analysis. Polyphonic vocal intonation practices have been addressed in a number of studies on vocal acoustics. Our research both builds on this work and supplements it with a theoretical paradigm based on work done in the areas of sensory consonance and tonal attraction.

Background in computing. Recent work in the field of music information retrieval has discussed the main obstacles related to tracking pitches in a polyphonic signal and has provided some techniques for working around these problems. Our method for analyzing the pitch content of recorded performances draws extensively on this work and on the knowledge made available to us by the musical scores of the pieces being performed.

Aims. Our research is focused on the study and modeling of polyphonic vocal intonation practices through the intersection of computational and theoretical approaches. We present a methodology that allows for a detailed model of this aspect of polyphonic vocal performance practice to be built from analyses of numerous recordings of real-world performances, while working within a robust theoretical paradigm.

Main contribution. In the computational component of the research a number of a cappella polyphonic vocal recordings are analyzed with signal processing techniques to estimate the perceived fundamental frequencies for the sung notes. These observations can be related to the musical context of the score through machine learning techniques to determine likely intonation tendencies for regularly occurring musical patterns. A major issue in developing a theory of intonation practices is the potential conflict between the vertical and horizontal intonational impetuses. To assess this conflict in greater detail we have constructed a theoretical approach where theories of sensory consonance account for vertical tuning tendencies and theories of tonal attraction account for the horizontal tendencies.

Implications. In the field of music cognition, our research relates to work being done in the area of musical expression. If the intonation tendencies inferred from the end results of this research are taken as a norm, then deviations from this norm, when these deviations are musically appropriate, can be viewed as expressive phenomena. Computer software implementing such results will allow composers and musicologists to hear more intonationally accurate digital re-creations and may also function as a training guide for vocalists.
2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763)
One of the most important characteristics of music is the singing voice. Although the identity and characteristics of the singing voice are important cues for recognizing artists, groups and musical genres, these cues have not yet been fully utilized in computer audition algorithms. A first step in this direction is the identification of segments within a song where there is a singing voice. In this paper, we present some experiments in the automatic extraction of singing voice structure. The main characteristic of the proposed approach is that the segmentation is performed specifically for each individual song using a process we call bootstrapping. In bootstrapping, a small random sampling of the song is annotated by the user. This annotation is used to learn the song-specific voice characteristics, and the trained classifier is subsequently used to classify and segment the whole song. We present experimental results on a collection of pieces with jazz singers that show the potential of this approach and compare it with the traditional approach of using multiple songs for training. It is our belief that the idea of song-specific bootstrapping is applicable to other types of music and to computer-supported audio annotation.
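A minimal sketch of the bootstrapping loop described here, assuming frame-wise features (e.g. MFCCs) are already computed and using a scikit-learn random forest as a stand-in for whatever classifier the authors used; all names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bootstrap_segment(features, annotate, n_seed=40, random_state=0):
    """Song-specific bootstrapping: label a small random sample of frames,
    train on them, then classify every frame of the same song.

    features: (n_frames, n_dims) frame-wise feature matrix;
    annotate: callable returning 1 (singing) / 0 (no singing) for a frame index,
    standing in for the user's annotation step."""
    rng = np.random.default_rng(random_state)
    seed_idx = rng.choice(len(features), size=n_seed, replace=False)
    seed_labels = np.array([annotate(i) for i in seed_idx])
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    clf.fit(features[seed_idx], seed_labels)
    return clf.predict(features)   # frame-wise singing / non-singing labels
```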
This paper describes one type of analytics model for recorded music performances. It includes the derivation of an "average" performance or Performance Norm (PN), and calculation of distance matrices (DM) for a collection of recordings. It then describes some experiments using both point rates of acceleration and of change of dynamic level as key factors in determining expressivity and ultimately performance style. A software program was developed in R to automate calculations. This model has possible "fingerprinting" applications in music fraud detection, music information retrieval (MIR) and, perhaps most importantly, in pedagogical applications for music education.
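The authors implemented their calculations in R; purely to illustrate the two core quantities named here (a Performance Norm as the mean of aligned per-recording feature curves, and pairwise distance matrices), a small Python sketch follows. Alignment of the curves to a common grid is assumed to have been done already.

```python
import numpy as np

def performance_norm(feature_curves):
    """Average performance (Performance Norm) from per-recording feature curves.

    feature_curves: array of shape (n_recordings, n_points), e.g. beat-aligned
    tempo or dynamic-level curves, already aligned to a common grid."""
    return np.mean(feature_curves, axis=0)

def distance_matrix(feature_curves):
    """Pairwise Euclidean distances between recordings (rough stand-in for the
    paper's distance matrices)."""
    diffs = feature_curves[:, None, :] - feature_curves[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))
```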