2009
In this article we describe the approach we follow to analyze the performance of a singer when singing a reference song. The idea is to rate the performance of a singer in the same way a music tutor would, not only giving a score but also giving feedback on how the user has performed with regard to expression, tuning, and tempo/timing characteristics. We also discuss what visual feedback is most relevant for the user. Segmentation at an intra-note level is done using an algorithm based on untrained HMMs, with probabilistic models built out of a set of heuristic rules that determine regions and their probability of being expressive features. A real-time karaoke-like system is presented in which a user can sing and simultaneously visualize feedback and results of the performance. The technology can be applied to a wide range of applications, from pure entertainment to more serious, education-oriented uses.
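As a rough illustration of the segmentation idea, the sketch below decodes intra-note states with an untrained HMM whose observation probabilities come from heuristic rules on pitch and energy slopes. The state set, rule shapes, and all thresholds are illustrative assumptions, not the authors' published models.

```python
# Untrained-HMM intra-note segmentation: per-frame observation
# probabilities come from heuristic rules (no training), and Viterbi
# picks the most likely state sequence. All constants are assumptions.
import numpy as np

STATES = ["attack", "sustain", "transition", "release"]

def rule_probs(f0_cents, energy_db):
    """Heuristic per-frame state probabilities from pitch/energy slopes."""
    df0 = np.gradient(f0_cents)   # cents per frame
    de = np.gradient(energy_db)   # dB per frame
    p = np.empty((len(f0_cents), len(STATES)))
    p[:, 0] = np.clip(de / 3.0, 0.05, 1.0)                   # attack: rising energy
    p[:, 1] = np.clip(1.0 - np.abs(df0) / 50.0, 0.05, 1.0)   # sustain: stable pitch
    p[:, 2] = np.clip(np.abs(df0) / 100.0, 0.05, 1.0)        # transition: pitch glide
    p[:, 3] = np.clip(-de / 3.0, 0.05, 1.0)                  # release: falling energy
    return p / p.sum(axis=1, keepdims=True)

def viterbi(obs_probs, self_loop=0.9):
    """Most likely state path under a fixed (untrained) transition matrix."""
    n, k = obs_probs.shape
    trans = np.full((k, k), (1 - self_loop) / (k - 1))
    np.fill_diagonal(trans, self_loop)
    log_d = np.log(obs_probs[0] / k)
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        scores = log_d[:, None] + np.log(trans)
        back[t] = scores.argmax(axis=0)
        log_d = scores.max(axis=0) + np.log(obs_probs[t])
    path = [int(log_d.argmax())]
    for t in range(n - 1, 1 - 1, -1):
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]
```

Because the transition matrix and the rules are fixed by hand, no training data is needed, which matches the "untrained HMM" framing of the abstract.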
2010
Computer evaluation of singing has traditionally been based exclusively on tuning and tempo. This article presents a tool for the automatic evaluation of singing voice performances that considers not only tuning and tempo but also the expression of the voice. For this purpose, the system performs analysis at the note and intra-note levels. Note-level analysis outputs traditional note pitch, note onset, and note duration information, while intra-note-level analysis is in charge of locating and categorizing the expression of each note's attacks, sustains, transitions, releases, and vibratos. Segmentation is done using an algorithm based on untrained HMMs with probabilistic models built out of a set of heuristic rules. A graphical tool for the evaluation and fine-tuning of the system is presented. The interface gives feedback about analysis descriptors and rule probabilities.
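For the note-level side, a minimal sketch of the kind of descriptors such a tool might report, assuming sung and reference notes are already aligned one-to-one; the tuple format, alignment step, and summary statistics are hypothetical:

```python
# Illustrative note-level comparison against a reference score: tuning
# error in cents plus onset/duration error. These are assumptions for
# the sketch, not the paper's actual descriptors.
import numpy as np

def note_level_scores(sung, reference):
    """sung/reference: lists of (onset_s, duration_s, midi_pitch) tuples,
    assumed already aligned one-to-one."""
    sung, ref = np.asarray(sung, float), np.asarray(reference, float)
    cents = 100.0 * (sung[:, 2] - ref[:, 2])   # tuning error per note
    onset_err = sung[:, 0] - ref[:, 0]         # timing error (s)
    dur_ratio = sung[:, 1] / ref[:, 1]         # rhythm: duration ratio
    return {"tuning_rms_cents": float(np.sqrt(np.mean(cents ** 2))),
            "onset_mae_s": float(np.mean(np.abs(onset_err))),
            "median_duration_ratio": float(np.median(dur_ratio))}
```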
2017
The process of speech production changes between speaking and singing due to excitation, vocal tract articulatory positioning, and cognitive motor planning while singing. Singing not only deviates from typical spoken speech but also varies across styles of singing, owing to different genres of music, the singing quality of the individual, and different languages and cultures. Because of this variation, it is important to establish a baseline system for differentiating between certain aspects of singing. In this study, we establish a classification system that automatically estimates the singing quality of candidates from an American TV singing show based on their singing speech acoustics and lip and eye movements. We employ three classifiers, Logistic Regression, Naive Bayes, and k-nearest neighbor (k-NN), and compare the performance of each using unimodal and multimodal features. We also compare performance based on different modalities (speech, lip, eye struct...
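A compact sketch of the three-classifier comparison using scikit-learn; the feature matrices below are random placeholders standing in for the study's speech, lip, and eye descriptors, and the label and cross-validation setup are assumptions:

```python
# Unimodal vs. multimodal comparison across the three named classifiers.
# All data here is synthetic placeholder material.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
speech = rng.normal(size=(200, 13))   # e.g., acoustic statistics (placeholder)
lip = rng.normal(size=(200, 6))       # lip-movement features (placeholder)
eye = rng.normal(size=(200, 4))       # eye-movement features (placeholder)
y = rng.integers(0, 2, size=200)      # singing-quality label (placeholder)

for name, X in [("speech only", speech),
                ("multimodal", np.hstack([speech, lip, eye]))]:
    for clf in (LogisticRegression(max_iter=1000), GaussianNB(),
                KNeighborsClassifier(n_neighbors=5)):
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name:11s} {type(clf).__name__:22s} acc={acc:.2f}")
```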
Psychology of Music, 2000
In psychological and cross-cultural (e.g. ethnomusicological) research, the analysis of song-singing has always been an intricate and serious obstacle. Singing is a transient and mostly unstable patterning of vocal sounds that is organised by applying more or less linguistic and musical rules. Traditionally, a sung performance has been analysed by mere listening and by using Western musical notation to represent its structure. Since this method neglects any in-between categories with respect to pitch and time, it proves to be culturally biased. However, acoustic measures as used in speech analysis have had limited application and were primarily used to quantify isolated parameters of sung performances. For analysing and representing the organisation of pitch in relation to the syllables of the lyrics, and its temporal structure, we devised a computer-aided method in combination with a new symbolic representation. The computer program provides detailed acoustic measures on pit...
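One way to sidestep the Western-notation bias the authors describe is to keep pitch continuous, for instance in cents relative to a performance-specific reference tone. The sketch below is a minimal, assumption-laden version of that idea, not the authors' program; the reference-tone choice (median voiced f0) is ours:

```python
# Culture-neutral pitch representation: continuous cents relative to an
# estimated reference tone, so in-between pitch categories survive.
import numpy as np

def f0_to_relative_cents(f0_hz):
    f0 = np.asarray(f0_hz, float)
    voiced = f0 > 0
    ref_hz = np.median(f0[voiced])    # assumed reference tone
    cents = np.full_like(f0, np.nan)  # unvoiced frames stay NaN
    cents[voiced] = 1200.0 * np.log2(f0[voiced] / ref_hz)
    return cents                      # continuous, not quantized to notes
```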
2009
Detecting distinct features in modern pop music is an important problem with significant applications in areas such as multimedia entertainment. Such features can be used, for example, to give a visually coherent representation of the sound. We propose to integrate a singing voice detector with a multimedia, multi-touch game in which the user has to perform simple tasks at certain key points in the music. While the ultimate goal is to automatically create visual content in response to features extracted from the music, here we focus on the detection of voice segments in songs. The presented solution extracts the Mel-Frequency Cepstral Coefficients of the sound and uses a Hidden Markov Model to infer whether the sound contains voice. The classification rate obtained is high compared with other singing voice detectors that use Mel-Frequency Cepstral Coefficients.
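A minimal sketch of the MFCC-plus-HMM pipeline, assuming librosa and hmmlearn are available; the input file name and the heuristic used to label the "voice" state are placeholders, and a real detector would be trained on labelled segments rather than fit unsupervised:

```python
# MFCC + HMM vocal detection sketch: fit a 2-state Gaussian HMM over
# MFCC frames and read the decoded state sequence as voice / no voice.
import numpy as np
import librosa
from hmmlearn import hmm

y, sr = librosa.load("song.wav")                         # placeholder input
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T     # frames x coeffs

model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
model.fit(mfcc)
states = model.predict(mfcc)

# Heuristic assumption: the state with the higher mean first coefficient
# (roughly frame energy) is the "voice" state; labelled data would be
# the proper way to assign state meanings.
voice_state = int(np.argmax(model.means_[:, 0]))
voiced_frames = states == voice_state
print(f"voiced fraction: {voiced_frames.mean():.2f}")
```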
CMMR 2017 - 13th International Symposium on Computer Music Multidisciplinary Research - Music Technology with Swing. 25-28 September, 2017
In this paper we present a database of fundamental frequency series for singing performances to facilitate comparative analysis of algorithms developed for singing assessment. A large number of recordings were collected during conservatory entrance exams, in which candidates reproduce melodies (after listening to the target melody played on the piano), in addition to some other tasks related to rhythm and individual pitch perception. Leaving out the samples where the jury members' grades did not all agree, we obtained a collection of 1018 singing and 2599 piano performances as instances of 40 distinct melodies. A state-of-the-art fundamental frequency (f0) detection algorithm is used to extract an f0 time series from each of these recordings to form the dataset. The dataset is shared to support research in singing assessment. Together with the dataset, we provide a flexible singing assessment system that can serve as a baseline for the comparison of assessment algorithms.
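A baseline assessment in this spirit can be as simple as a DTW distance between the candidate's f0 series and the target melody's f0 series in cents. The sketch below is not the shared system itself; it assumes unvoiced frames have already been removed and uses a plain length-normalized DTW cost:

```python
# DTW-based singing assessment baseline on f0 series in cents.
import numpy as np

def dtw_cost(a, b):
    """Classic O(n*m) DTW on 1-D sequences; returns the alignment cost
    normalized by the combined sequence length."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = abs(a[i - 1] - b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def assess(candidate_f0_hz, target_f0_hz):
    """Lower is better; both inputs assumed voiced-only f0 samples."""
    to_cents = lambda f: 1200.0 * np.log2(np.asarray(f, float) / 440.0)
    return dtw_cost(to_cents(candidate_f0_hz), to_cents(target_f0_hz))
```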
Applied Sciences
This paper introduces a new method for detecting the onsets, offsets, and transitions of notes in real-time solo singing performances. It identifies onsets and offsets by finding the transitions from one note to another based on trajectory changes in the fundamental frequency. The accuracy of our approach is compared with eight well-known algorithms. It was tested on two datasets containing 130 singing files, with a total duration of more than seven hours and more than 41,000 onset annotations. The analysis metrics used include the average, the F-measure score, and ANOVA. The proposed algorithm was observed to determine onsets and offsets more accurately than the other algorithms and, unlike them, can also detect the transitions between notes.
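The trajectory-based idea can be sketched as follows: onsets and offsets fall at voicing boundaries, and a transition is flagged wherever the contour (in cents) jumps by more than a threshold between frames. The hop size and threshold below are illustrative assumptions, not the paper's tuned values:

```python
# Note segmentation from an f0 contour: voicing boundaries give
# onsets/offsets, large frame-to-frame pitch jumps give transitions.
import numpy as np

def segment_notes(f0_hz, hop_s=0.01, jump_cents=80.0):
    f0 = np.asarray(f0_hz, float)
    voiced = f0 > 0
    cents = np.where(voiced,
                     1200.0 * np.log2(np.maximum(f0, 1e-6) / 440.0),
                     np.nan)
    onsets = np.flatnonzero(voiced[1:] & ~voiced[:-1]) + 1
    offsets = np.flatnonzero(~voiced[1:] & voiced[:-1]) + 1
    dc = np.abs(np.diff(cents))                  # NaN across voicing gaps
    transitions = np.flatnonzero(np.nan_to_num(dc) > jump_cents) + 1
    return onsets * hop_s, offsets * hop_s, transitions * hop_s
```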
2021
Dynamics are one of the fundamental tools of expressivity in a performance. While the usage of this tool is highly subjective, a systematic methodology for deriving loudness markings from a performance can be highly beneficial. With this goal in mind, this paper is a first step towards a methodology for automatically transcribing dynamic markings from vocal rock and pop performances. To this end, we take commercial recordings of some popular songs, apply source separation, and compare them to karaoke versions of the same songs. The dynamic variations in the original commercial recordings are found to be structurally very similar to those in the aligned karaoke/multi-track versions of the same tracks. We compare and show the differences between tracks using statistical analysis, with the eventual goal of using the transcribed markings as guiding tools to help students adapt to a specific interpretation of a given piece of music. We perform a qualitative analysis of the...
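As a hedged sketch of what transcribing dynamic markings could look like, the snippet below bins a separated vocal's short-time loudness into quantiles and maps each bin to a coarse marking; the quantile binning and marking set are our assumptions, not the paper's method:

```python
# Quantile-based mapping from short-time loudness to dynamic markings.
import numpy as np

MARKINGS = ["pp", "p", "mp", "mf", "f", "ff"]

def transcribe_dynamics(rms):
    """rms: short-time RMS values of the (source-separated) vocal track."""
    db = 20.0 * np.log10(np.maximum(np.asarray(rms, float), 1e-8))
    edges = np.quantile(db, np.linspace(0, 1, len(MARKINGS) + 1)[1:-1])
    return [MARKINGS[int(np.searchsorted(edges, v))] for v in db]
```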
Applied Sciences
Few datasets of singing files are adequately annotated. One available dataset that includes a variety of vocal techniques (n = 17) and several singers (m = 20), with many WAV files (p = 3560), is the VocalSet dataset. However, although the dataset covers several categories, including technique, singer, tempo, and loudness, these are not annotated. This study therefore aims to annotate VocalSet to make it a more powerful dataset for researchers. The annotations generated for the VocalSet audio files include the fundamental frequency contour, note onsets, note offsets, transitions between notes, note F0, note duration, MIDI pitch, and lyrics. This paper describes the generated dataset and explains our approaches to creating and testing the annotations. Moreover, four different methods of defining the onset/offset are compared.
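A hypothetical per-note record mirroring the annotation fields the paper lists; the field names are ours, not the released dataset's schema:

```python
# Illustrative container for one annotated note.
from dataclasses import dataclass

@dataclass
class NoteAnnotation:
    onset_s: float        # note onset time (seconds)
    offset_s: float       # note offset time (seconds)
    transition_s: float   # time of transition into the next note
    f0_hz: float          # representative note F0
    midi_pitch: int       # quantized MIDI pitch
    lyric: str            # lyric syllable sung on the note

    @property
    def duration_s(self) -> float:
        return self.offset_s - self.onset_s
```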
Journal of the Acoustical Society of America, 1999
When a musician gives a recital or concert, the music performed generally includes accompaniment. To render a good performance, the soloist and the accompanist must know the musical score and must follow the other musician's performance. Both performing and rehearsing are limited by constraints on the time and money available for bringing musicians together. Computer systems that automatically provide musical accompaniment offer an inexpensive, readily available alternative. Effective computer accompaniment requires software that can listen to live performers and follow along in a musical score. This work presents an implemented system and method for automatically accompanying a singer given a musical score. Specifically, I offer a method for robust, real-time detection of a singer's score position and tempo. Robust score following requires combining information obtained both from analyzing a complex signal (the singer's performance) and from processing symbolic notation (the score). Unfortunately, the mapping from the available information to score position does not define a function. Consequently, this work investigated a statistical characterization of a singer's score position and a model that combines the available musical information to produce a probabilistic position estimate. By making careful assumptions and estimating statistics from a set of actual vocal performances, a useful approximation of this model can be implemented in software and executed in real time during a musical performance. As part of this project, a metric was defined for evaluating the system's ability to follow a singer, and this metric was used to assess the system's ability to track vocal performances. The presented evaluation includes a characterization of how tracking ability can be improved by using several different measurements from the sound signal rather than only one type of measurement. The examined measurements include fundamental pitch, spectral features dependent upon the score's phonetic content, and amplitude changes correlated with the start of a musical note. The evaluation results demonstrate how incorporating multiple measurements of the same signal can improve the accuracy of performance tracking, for individual performances as well as on average. Overall improvement of the performance tracking system through incremental specification, development, and evaluation is facilitated by the formal statistical approach to the problem.
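The statistical position estimate can be sketched as a forward-filtered HMM over score positions, combining a tempo-driven motion model with a pitch observation likelihood. All constants and model shapes below are illustrative assumptions, not the thesis's actual models, which also incorporate phonetic and amplitude measurements:

```python
# Probabilistic score following sketch: forward filtering over score
# positions, combining a motion model (tempo) with a pitch likelihood.
import numpy as np

def follow(score_midi, obs_midi_frames, sigma=1.0, move=(0.2, 0.6, 0.2)):
    """score_midi: expected pitch per score position;
    obs_midi_frames: detected pitch per audio frame.
    Returns the MAP score position after each frame."""
    score = np.asarray(score_midi, float)
    n = len(score)
    belief = np.full(n, 1.0 / n)   # P(position) before any audio
    path = []
    for obs in obs_midi_frames:
        # Motion model: stay, advance one, or advance two positions.
        moved = np.zeros(n)
        for step, w in enumerate(move):
            if step == 0:
                moved += w * belief
            else:
                moved[step:] += w * belief[:-step]
        # Observation model: Gaussian likelihood around the expected pitch.
        like = np.exp(-0.5 * ((obs - score) / sigma) ** 2)
        belief = moved * like
        belief /= belief.sum() + 1e-12
        path.append(int(belief.argmax()))   # MAP position after this frame
    return path
```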