Papers by Derry Fitzgerald
2011 17th International Conference on Digital Signal Processing (DSP), 2011
We present a system for upmixing mono recordings to stereo through the use of sound source separation techniques. The use of sound source separation has the advantage of allowing sources to be placed at distinct points in the stereo field, resulting in more natural sounding upmixes. The system separates an input signal into a number of sources, which can then be imported into a digital audio workstation for upmixing to stereo. Considerations to be taken into account when upmixing are discussed, and a brief overview of the various sound source separation techniques used in the system is given. The effectiveness of the proposed system is then demonstrated on real-world mono recordings.
While research has been carried out on automated polyphonic music transcription, to date the problem of automated polyphonic percussion transcription has not received the same degree of attention. A related problem is that of sound source separation, which attempts to separate a mixture signal into its constituent sources. This thesis focuses on the task of polyphonic percussion transcription and sound source separation of a limited set of drum instruments, namely the drums found in the standard rock/pop drum kit.

Nonnegative matrix factorization (NMF) is an effective and popular low-rank model for nonnegative data. It enjoys a rich background, both from an optimization and probabilistic signal processing viewpoint. In this study, we propose a new cost-function for NMF fitting, which is introduced as arising naturally when adopting a Cauchy process model for audio waveforms. As we recall, this Cauchy process model is the only probabilistic framework known to date that is compatible with having additive magnitude spectrograms for additive independent audio sources. Similarly to the Gaussian power-spectral density, this Cauchy model features time-frequency nonnegative scale parameters, on which an NMF structure may be imposed. The Cauchy cost function we propose is optimal under that model in a maximum likelihood sense. It thus appears as an interesting newcomer in the inventory of useful cost-functions for NMF in audio. We provide multiplicative updates for Cauchy-NMF and show that they give g...
Many of the Beach Boys records were mono only, as this was Brian Wilson's preferred format. However, starting in the mid-1990s, stereo mixes of many of these classics were created by synchronising the tracks from the instrumental multitrack with those of the vocal multitrack. Unfortunately, for a number of tracks, including Good Vibrations, elements of the multitracks were missing, making a true stereo mix impossible. This paper deals with how stereo extraction mixes were created for a number of Beach Boys songs using sound source separation techniques to separate sources from the original mono recordings, which were then panned to create stereo mixes. These mixes were used in reissues of Beach Boys albums in 2012.
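The final panning step described above can be sketched as follows. This is only an illustration under stated assumptions: the separated sources are taken to be equal-length mono NumPy arrays, and a constant-power pan law is assumed because the abstract does not say which pan law was used. The names `pan_constant_power` and `upmix` are hypothetical.

```python
import numpy as np

def pan_constant_power(source, pan):
    """Place a mono source in the stereo field.

    pan = -1 is hard left, 0 is centre, +1 is hard right.
    Constant-power law (an assumption, not taken from the paper).
    """
    theta = (pan + 1.0) * np.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    left = np.cos(theta) * source
    right = np.sin(theta) * source
    return np.stack([left, right])  # shape (2, n_samples)

def upmix(sources, pans):
    """Sum separately panned mono sources into one stereo signal."""
    return sum(pan_constant_power(s, p) for s, p in zip(sources, pans))
```

With this law the summed power of the left and right gains is the same at every pan position, which is why a source keeps roughly the same perceived level wherever it is placed.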
An efficient and effective stereo vocal extraction algorithm is presented, which combines two existing approaches. A Nearest Neighbours Median Filtering algorithm is used to separate the vocals and the instrumental backing track from the stereo mixture. The separated vocal track is then passed through a mask generated by the Adress algorithm and high-pass filtered to extract the vocals. The separated instrumental backing track is then improved by adding to it the residual backing track energy extracted by Adress. Also investigated is a variant on this algorithm which uses a difference spectrogram to calculate the nearest neighbours. The effectiveness of these algorithms is then demonstrated on a test dataset, and results show that the proposed algorithms give performance comparable to the state of the art, but at a low computational cost.
IET Irish Signals and Systems Conference (ISSC 2012), 2012
Recently, single channel vocal separation algorithms have been proposed which exploit the fact that most popular music can be regarded as a repeating musical background over which a locally non-repeating vocal signal is superimposed. In this paper we describe a novel vocal separator inspired by these approaches which finds the k nearest neighbours to each frame of a spectrogram of the mixture signal. The median value of these frames is then used as the estimate of the background music at the current frame. This is then used to generate a mask on the original complex-valued spectrogram before inversion to the time domain. The effectiveness of the approach is demonstrated on a number of real-world signals.
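A minimal sketch of the nearest-neighbour median idea described above, assuming a precomputed complex spectrogram of shape (bins, frames). The squared Euclidean frame distance, the ratio mask, and the names `knn_median_background` and `separate` are assumptions made for illustration; the abstract does not specify the distance measure or the mask type, and the transform to and from the time domain is left out.

```python
import numpy as np

def knn_median_background(mag, k=5):
    """Estimate the repeating background of a magnitude spectrogram.

    mag: nonnegative array of shape (n_bins, n_frames).
    Each frame is replaced by the per-bin median of its k nearest frames.
    """
    frames = mag.T  # (n_frames, n_bins)
    # Squared Euclidean distance between every pair of frames (assumed metric).
    sq = np.sum(frames ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (frames @ frames.T)
    # Indices of the k closest frames; a frame is normally its own neighbour.
    idx = np.argsort(dist, axis=1)[:, :k]
    # Median over the neighbours, taken separately for each frequency bin.
    background = np.median(frames[idx], axis=1).T
    # The background estimate should not exceed the mixture magnitude.
    return np.minimum(background, mag)

def separate(spec, k=5, eps=1e-12):
    """Mask a complex spectrogram into (background, vocal) estimates."""
    mag = np.abs(spec)
    bg_mask = knn_median_background(mag, k) / (mag + eps)  # assumed ratio mask
    return spec * bg_mask, spec * (1.0 - bg_mask)
```

Because the vocal is locally non-repeating, it tends to appear in only a minority of a frame's neighbours, so the median suppresses it while keeping the repeating accompaniment. The pairwise distance matrix grows with the square of the number of frames, so a long recording would need a blockwise or approximate neighbour search.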

Recently, tensor decompositions have found use in sound source separation. In particular, non-negative tensor decompositions have received a lot of attention due to their ability to decompose audio spectrograms into meaningful "parts" such as individual notes. Extensions to the basic non-negative tensor factorisation framework allow the incorporation of additional constraints, such as shift-invariance in both frequency and time. This enables the factorisations to capture more complex structures than individual notes, such as individual sources playing different pitches and time-evolving instrument timbres. Further music-specific constraints such as harmonicity and source-filter modelling have been shown to improve separation performance for musical signals. Other recent advances also allow the incorporation of Bayesian priors into these models, thereby further improving the separations obtained.
Recent research has demonstrated that user assisted techniques, where the user provides a "guide" version of the source to be separated, are capable of giving good sound source separation. Here the user sings or plays along with the target source, and the user input is used to guide the separation towards the source of interest. This is typically done in a factorisation framework, such as non-negative matrix factorisation. Here we extend such approaches to a tensor factorisation framework to deal with multichannel signals. Further, we demonstrate how this framework can be used to improve the output from other user assisted techniques, such as the Adress algorithm, where the user manually selects a region from the stereo space corresponding to a given source.
Much research has been carried out on the use of non-negative matrix factorisation for the purpose of musical sound source separation. However, a notable shortcoming of non-negative matrix factorisation is that the recovered basis functions have to be clustered to sound sources for separation to take place. This has proved to be a difficult problem to solve. As a means of overcoming this problem, we introduce an extension to non-negative matrix factorisation which allows a user to guide the separation by singing, or playing along with, the source they want to separate. This is done through the use of gamma-chain priors. Examples of user assisted separation are also presented.
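For context, the plain factorisation that this work extends can be sketched as below. This is the standard non-negative matrix factorisation with multiplicative updates for the Euclidean cost, not the gamma-chain prior extension described in the abstract; the function name `nmf`, the random initialisation, and the default parameters are assumptions for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorise a nonnegative matrix V (bins x frames) as W @ H.

    W holds `rank` spectral basis functions as columns.
    H holds the matching activations over time as rows.
    Uses standard multiplicative updates for the Euclidean cost.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Each update rescales elementwise, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Nothing in this factorisation says which column of `W` belongs to which instrument, which is the clustering problem the abstract refers to: each source has to be rebuilt from a hand-picked or automatically grouped subset of components.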
This paper demonstrates the use of Prior Subspace Analysis (PSA) as a method for transcribing drums in the presence of pitched instruments. PSA uses prior subspaces that represent the sources to be transcribed to overcome some of the problems associated with other subspace methods such as Independent Subspace Analysis (ISA) or sub-band ISA. The use of prior knowledge results in improved robustness for transcription purposes and enables the method to work more readily in the presence of pitched instruments than other subspace methods. The effectiveness and robustness of PSA as a tool for drum transcription in the presence of pitched instruments is demonstrated in a simple drum transcription algorithm.
This paper presents a possible approach for developing a violin teaching aid based on violin pedagogy, sound analysis and comparison of beginner and good player recordings. This teaching aid is targeted at students who have difficulty listening attentively to the sounds they produce. It aims to draw their attention to the sound of a fault, offer correction, and train the user's ear to listen actively.