2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We propose a supervised system identification method for recovering an acoustic impulse response in a reverberant room. Unlike most existing methods, our algorithm relies on prior information given in the form of a training set of known impulse responses acquired in a controlled environment. Using this prior information, we train local Principal Component Analysis (PCA) models of impulse responses corresponding to several different regions in the room. We propose to first coarsely localize the source and then recover the impulse response using the appropriate local model. To approximate the source location, we introduce a specially tailored distance measure based on an affinity between the trained local models. Experimental results in simulated noisy and reverberant environments demonstrate significant improvements over existing methods.
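As a rough illustration of the local-model idea, the sketch below fits one PCA subspace per room region from training RIRs, picks the region whose subspace best explains a noisy RIR estimate, and denoises that estimate by projection. It is a simplification under stated assumptions: the helper names are hypothetical, a noisy RIR estimate is assumed to be already available, and the plain projection residual stands in for the paper's specially tailored affinity-based distance.

```python
import numpy as np


def fit_local_pca(train_rirs, n_components):
    """Fit a PCA model (mean + orthonormal basis) to RIRs from one region.

    train_rirs: array of shape (n_examples, rir_length).
    """
    mean = train_rirs.mean(axis=0)
    # Rows of vt are principal directions, sorted by singular value
    _, _, vt = np.linalg.svd(train_rirs - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(model, h):
    """Project h onto the affine subspace spanned by a local PCA model."""
    mean, basis = model
    return mean + basis.T @ (basis @ (h - mean))


def localize_and_recover(models, h_noisy):
    """Pick the region whose subspace leaves the smallest residual
    (an assumed stand-in for the paper's affinity-based distance),
    then denoise the estimate by projecting onto that region's model."""
    residuals = [np.linalg.norm(h_noisy - project(m, h_noisy)) for m in models]
    region = int(np.argmin(residuals))
    return region, project(models[region], h_noisy)
```

Projecting onto a low-dimensional local subspace discards the noise components orthogonal to it, which is why the choice of region matters: the wrong subspace leaves a large residual.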
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Large-scale Room Impulse Response (RIR) measurements are required to accurately determine a room's acoustic response to different source-listener configurations. RIR reconstruction methods are often used to reduce these measurement costs. Prior knowledge of room acoustic parameters can ensure reliable and robust RIR reconstruction. This paper proposes a method to reconstruct RIRs based on reflection source locations and a time-frequency-direction-dependent reflection magnitude response estimated from a single spherical microphone array measurement. These input parameters are learned by applying the eigenbeam spatial correlation method and von Mises-Fisher (vMF)-based directivity modeling. According to the performance evaluation, composing the learned features in the RIR reconstruction formulation successfully preserves the objective characteristics of the real room.
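The vMF distribution mentioned above is the spherical analogue of a Gaussian: on the unit sphere its density is f(x; μ, κ) = κ / (4π sinh κ) · exp(κ μᵀx), with mean direction μ and concentration κ. The sketch below evaluates this density in a numerically stable form and sums weighted lobes into a directivity pattern. How the paper parameterizes and fits its lobes is not stated here, so the mixture helper `vmf_directivity` is a hypothetical illustration.

```python
import numpy as np


def vmf_density(x, mu, kappa):
    """von Mises-Fisher density on the unit sphere S^2.

    x: unit vectors of shape (..., 3); mu: unit mean direction (3,);
    kappa: concentration (> 0). Uses the identity
    kappa / (4 pi sinh kappa) * exp(kappa mu.x)
      = kappa / (2 pi (1 - exp(-2 kappa))) * exp(kappa (mu.x - 1)),
    which avoids overflow for large kappa.
    """
    mu = np.asarray(mu, dtype=float)
    norm = kappa / (2.0 * np.pi * (1.0 - np.exp(-2.0 * kappa)))
    return norm * np.exp(kappa * (x @ mu - 1.0))


def vmf_directivity(x, mus, kappas, weights):
    """Hypothetical directivity pattern: a weighted sum of vMF lobes."""
    return sum(w * vmf_density(x, m, k) for m, k, w in zip(mus, kappas, weights))
```

Larger κ gives a narrower lobe around μ; with weights summing to one, the mixture still integrates to one over the sphere.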
Acta Acustica united with Acustica, 2012
The localization of acoustic reflections, i.e., the image sources, is of interest when analyzing the acoustics of concert halls and auditoriums. The location is needed, for example, in room acoustic studies, auralization, inference of room geometry, or estimation of the acoustic properties of surfaces. This article studies the localization of acoustic reflections from spatial impulse responses. The contribution of this article is threefold. First, the article proposes a new localization method that takes advantage of time of arrival (TOA) estimation. Second, it proposes two novel ways of combining the TOA and time difference of arrival (TDOA) information present in spatial room impulse responses. Third, the performance of the proposed localization methods is compared with that of existing state-of-the-art methods in the acoustic reflection localization task. Theoretical performance is investigated and experiments using real and simulated data are c...
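For readers unfamiliar with TOA-based localization, the sketch below shows the textbook linearized least-squares multilateration step: each TOA gives a range d_i = c·t_i from microphone i, and subtracting the range equation of a reference microphone removes the quadratic term in the unknown position. This is a generic technique, not the article's proposed method or its TOA/TDOA combinations; the function name, the speed of sound of 343 m/s, and any array geometry used with it are illustrative assumptions.

```python
import numpy as np


def toa_multilateration(mic_pos, toas, c=343.0):
    """Least-squares source (or image-source) position from times of arrival.

    mic_pos: (M, 3) microphone positions in metres, M >= 4, not coplanar.
    toas: (M,) propagation times in seconds.
    From |s - m_i|^2 = d_i^2, subtracting the equation for microphone 0 gives
    2 (m_i - m_0) . s = |m_i|^2 - |m_0|^2 - d_i^2 + d_0^2, which is linear in s.
    """
    mic_pos = np.asarray(mic_pos, dtype=float)
    d = c * np.asarray(toas, dtype=float)
    sq = np.sum(mic_pos ** 2, axis=1)
    A = 2.0 * (mic_pos[1:] - mic_pos[0])
    b = sq[1:] - sq[0] - d[1:] ** 2 + d[0] ** 2
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s
```

With noisy TOAs the same system is simply overdetermined; more microphones, or a wider aperture, reduce the sensitivity of the solution to timing errors.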
The Journal of the Acoustical Society of America, 2016
This paper investigates reverberation time estimation methods that employ backward integration of adaptively identified room impulse responses (RIRs). Two conditions are considered. The first is the "ideal condition," in which the anechoic and reverberant signals are both known a priori, so the RIRs can be identified using system identification methods. In the second, only the reverberant speech signal is available, and the RIRs are identified blindly via dereverberation before the reverberation time is estimated. Results show that under the ideal condition the average relative errors in 7 octave bands are less than 2% for white noise and 15% for speech. In contrast, under the second condition, the average relative errors of reverberation times estimated from blindly identified RIRs are around 20%-30%, except in the 63 Hz octave band. The fluctuation of reverberat...
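Backward integration here refers to Schroeder's method: the energy decay curve EDC(t) = ∫_t^∞ h²(τ) dτ is computed from the RIR h, and the reverberation time is extrapolated from a straight-line fit to its level in dB. The sketch below assumes a single broadband RIR (no octave-band filtering) and a fit range of -5 to -25 dB (a T20-style estimate); the helper `synthetic_rir` generates exponentially decaying noise for testing and is not data from the paper.

```python
import numpy as np


def schroeder_edc_db(h):
    """Energy decay curve in dB via Schroeder backward integration."""
    energy = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # sum of h^2 from each sample to the end
    return 10.0 * np.log10(edc / edc[0])


def t60_from_edc(edc_db, fs, start_db=-5.0, stop_db=-25.0):
    """Fit a line to the EDC between start_db and stop_db, extrapolate to -60 dB."""
    idx = np.flatnonzero((edc_db <= start_db) & (edc_db >= stop_db))
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # slope in dB per second
    return -60.0 / slope


def synthetic_rir(t60, fs, duration, seed=0):
    """Gaussian noise with an amplitude envelope that decays 60 dB over t60."""
    t = np.arange(int(duration * fs)) / fs
    noise = np.random.default_rng(seed).standard_normal(t.size)
    return noise * np.exp(-3.0 * np.log(10.0) * t / t60)
```

For example, `t60_from_edc(schroeder_edc_db(synthetic_rir(0.5, 16000, 1.0)), 16000)` should return a value close to 0.5 s. With an identified (rather than measured) RIR, identification errors in the late tail bias the fitted slope, which is one plausible reason the blind condition above yields larger errors.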
arXiv (Cornell University), 2023
Accurate estimation of the Room Impulse Response (RIR), which captures an environment's acoustic properties, is important for speech processing and AR/VR applications. We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment. AV-RIR builds on a novel neural codec-based architecture that effectively captures environment geometry and material properties, and it solves speech dereverberation as an auxiliary task through multi-task learning. We also propose Geo-Mat features, which augment visual cues with material information, and CRIP, which improves the late reverberation components of the estimated RIR by 86% via image-to-RIR retrieval. Empirical results show that AV-RIR quantitatively outperforms previous audio-only and visual-only approaches, with 36%-63% improvements across various acoustic metrics in RIR estimation. It also achieves higher preference scores in human evaluation. As an auxiliary benefit, dereverberated speech from AV-RIR performs competitively with the state of the art in various spoken language processing tasks and outperforms it in reverberation time error score on the real-world AVSpeech dataset. Qualitative examples of both synthesized reverberant speech and enhanced speech are available online.
arXiv (Cornell University), 2022
The speech transmission index (STI) and room acoustic parameters (RAPs) derived from a room impulse response (RIR), such as reverberation time and early decay time, are essential for assessing speech transmission and predicting listening difficulty in a sound field. Since it is difficult to measure the RIR in regularly occupied spaces, simultaneous blind estimation of STI and RAPs is a pressing and challenging problem. This paper proposes a deterministic method for blindly estimating STI and five RAPs on the basis of a stochastic RIR model that approximates the unknown RIR. The proposed method formulates the temporal power envelope of a reverberant speech signal to obtain the optimal parameters of the RIR model. Simulations were conducted to estimate STI and RAPs from observed reverberant speech signals. The root-mean-square errors between the estimated and ground-truth values were used to compare the proposed method with the previous method. The results showed that the proposed method can estimate STI and RAPs effectively without any training.
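One common way to make the temporal power envelope idea concrete, assumed here rather than taken from the paper, is a single-exponential stochastic RIR model whose power envelope decays as exp(-13.8 t / T60), i.e., 60 dB over T60 (13.8 ≈ 6 ln 10). The power envelope of reverberant speech is then the clean power envelope convolved with that decay, and a sinusoidally modulated envelope keeps its modulation frequency F but has its depth multiplied by m(F) = [1 + (2π F T60 / 13.8)²]^(-1/2). The sketch checks that relation numerically; the envelope sampling rate, T60, and helper names are illustrative.

```python
import numpy as np

DECAY = 6.0 * np.log(10.0)  # ~13.8: a 60 dB power decay over T60


def reverberant_power_envelope(clean_env, t60, fs_env):
    """Convolve a clean power envelope with the model RIR's power envelope."""
    t = np.arange(int(3 * t60 * fs_env)) / fs_env  # 3*T60 covers a 180 dB decay
    h_env = np.exp(-DECAY * t / t60)
    h_env /= h_env.sum()  # unit DC gain: mean power is preserved
    return np.convolve(clean_env, h_env)[: len(clean_env)]


def modulation_depth(env):
    return (env.max() - env.min()) / (env.max() + env.min())


def mtf_exponential(f_mod, t60):
    """Modulation transfer function of the exponential power envelope."""
    return 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f_mod * t60 / DECAY) ** 2)


fs_env, t60 = 1000, 0.5
t = np.arange(6 * fs_env) / fs_env
for f_mod in (0.63, 2.0, 8.0, 12.5):
    clean = 1.0 + np.cos(2 * np.pi * f_mod * t)  # fully modulated, depth 1
    steady = reverberant_power_envelope(clean, t60, fs_env)[2 * fs_env:]  # skip transient
    print(f_mod, round(modulation_depth(steady), 3), round(mtf_exponential(f_mod, t60), 3))
```

The printed measured and analytic depths should agree closely, which is what lets an envelope-domain model map an RIR parameter such as T60 to the observed reverberant signal.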
Acoustics Research Letters Online, 2004
EURASIP Journal on Advances in Signal Processing, 2010
Sound source localization is an important feature in robot audition. This work proposes a method for estimating the number and directions of sound sources in a multisource reverberant environment. An eigenstructure-based generalized cross-correlation method is proposed to estimate the time delays among microphones. A source is considered a candidate if the corresponding combination of time delays among the microphones yields a reasonable estimate of the speed of sound. Under reverberation, some candidates may be spurious, but their direction estimates are not consistent across consecutive data frames. Therefore, an adaptive K-means++ algorithm is proposed to cluster the results accumulated from the sound speed selection mechanism. Experimental results demonstrate the performance of the proposed algorithm in a real room.
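The time-delay step can be illustrated with the standard GCC-PHAT estimator, in which the cross-power spectrum is whitened so that only phase (that is, delay) information drives the correlation peak. This is the conventional GCC-PHAT baseline, swapped in for illustration; it is not the eigenstructure-based variant proposed in the work, and the function name and parameters are assumptions.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate how much y lags x (in seconds) with GCC-PHAT.

    A positive result means y is a delayed copy of x.
    """
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs
```

Setting `max_tau` to the microphone spacing divided by the speed of sound restricts the search to physically possible delays, which is a simple first defence against reverberation-induced spurious peaks.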
IEEE Access
The speech transmission index (STI) and room acoustic parameters (RAPs) are essential metrics for assessing speech quality and predicting listening difficulty in a sound field. Although STI and important RAPs, such as reverberation time and clarity, can be derived from the room impulse response (RIR), measuring the RIR in regularly occupied spaces is difficult. Hence, simultaneous blind estimation of STI and RAPs is an imperative challenge that must be addressed. However, most existing methods estimate only a single parameter and require a massive dataset for model training. A deterministic method is presented for blindly estimating STI and five RAPs using a stochastic RIR model that approximates an unknown RIR. An algorithm is formulated that uses the temporal power envelope of a reverberant speech signal to determine the optimal parameters of the RIR model. A mathematical model of the reverberation and dereverberation processes, based on the temporal power envelopes of the signals, is proposed. This model maps the parameters of the RIR model to the observed reverberant signal. The estimated RIR can then be synthesized from the optimal parameters to estimate the STI and RAPs. A simulation was conducted to evaluate the simultaneous estimation of STI and five essential RAPs from observed reverberant speech signals, in comparison with the best existing method. The root-mean-square error (RMSE) and Pearson correlation coefficient between the estimated and measured values were used as evaluation metrics. In terms of STI, the proposed method achieves the best accuracy, with an RMSE of 0.037. For the reverberation time and other RAPs, the accuracy is consistent with previous work. The results show that the proposed method can effectively estimate STI and RAPs simultaneously without any training. Index Terms: Room impulse response, modulation transfer function, speech transmission index, room acoustic parameters
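Once a T60 is available from the RIR model, an STI can be computed from the model's analytic modulation transfer function. The sketch below is a deliberately simplified, single-band version of the procedure standardized in IEC 60268-16: it assumes the same T60 in every octave band, an optional stationary noise term, and equal weighting of the 14 modulation frequencies (0.63 Hz to 12.5 Hz), omitting the standard's octave-band weights and redundancy corrections. It is an assumption-laden illustration, not the paper's estimator.

```python
import numpy as np

# The 14 one-third-octave modulation frequencies used by the STI (Hz)
MOD_FREQS = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                      3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])


def mtf(f_mod, t60, snr_db=np.inf):
    """MTF of an exponentially decaying RIR, optionally with stationary noise."""
    m = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f_mod * t60 / 13.8) ** 2)
    return m / (1.0 + 10.0 ** (-snr_db / 10.0))


def simplified_sti(t60, snr_db=np.inf):
    """Single-band STI: apparent SNR -> transmission index -> mean."""
    m = np.clip(mtf(MOD_FREQS, t60, snr_db), 1e-12, 1.0 - 1e-12)
    snr_app = np.clip(10.0 * np.log10(m / (1.0 - m)), -15.0, 15.0)
    transmission_index = (snr_app + 15.0) / 30.0
    return float(transmission_index.mean())
```

Because every step is a closed-form function of T60 (and SNR), a model-based estimator of this kind needs no training data, consistent with the "without any training" claim above.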
IEEE Journal of Selected Topics in Signal Processing
Acoustic source localization (ASL) is a fundamental yet still challenging signal processing problem in sound acquisition, speech communication, and human-machine interfaces. Many ASL algorithms have been developed, such as the steered response power (SRP), the SRP-phase transform (SRP-PHAT), the minimum variance distortionless response (MVDR), the multiple signal classification (MUSIC), and Householder transform-based methods, to name but a few. Most of these algorithms require hundreds or even thousands of snapshots to produce one reliable estimate, which makes it difficult for them to track moving sources. Moreover, little work has been reported in the literature on the intrinsic relationships among these methods. This paper addresses the ASL problem with a focus on achieving ASL from a short frame of acoustic signal (corresponding to a single snapshot in the frequency domain). It reformulates the ASL problem from the perspective of geometric projection. Four types of power functions are proposed, leading to several different ASL algorithms. By analyzing these power functions, we show the equivalence between popular conventional algorithms and our methods, which provides new insights into the conventional algorithms. The relationships among the different algorithms are discussed, making it easy to understand the pros and cons of each method. Experiments in real acoustic environments corroborate the theoretical analysis and, in turn, justify the contribution of this paper.
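To make the single-snapshot setting concrete, the sketch below evaluates the conventional steered response power of one frequency-domain snapshot from a uniform linear array under far-field, narrowband assumptions: P(θ) = |a(θ)ᴴx|² / M², where a(θ) is the steering vector of an M-microphone array. This is the SRP baseline named above, not one of the paper's projection-based power functions; the array geometry, frequency, and function names are illustrative assumptions.

```python
import numpy as np


def steering_vector(theta_deg, n_mics, spacing, freq, c=343.0):
    """Far-field steering vector of a uniform linear array (angle from broadside)."""
    positions = np.arange(n_mics) * spacing
    k = 2.0 * np.pi * freq / c
    return np.exp(-1j * k * positions * np.sin(np.deg2rad(theta_deg)))


def srp_single_snapshot(x, spacing, freq, grid_deg):
    """Steered response power of one snapshot x over candidate angles."""
    n_mics = len(x)
    power = np.array([
        # np.vdot conjugates its first argument, giving a(theta)^H x
        np.abs(np.vdot(steering_vector(th, n_mics, spacing, freq), x)) ** 2 / n_mics ** 2
        for th in grid_deg
    ])
    return grid_deg[int(np.argmax(power))], power
```

With one snapshot there is no sample covariance to invert, which is why covariance-based methods such as MVDR and MUSIC need many snapshots while this power function can be evaluated from a single frame.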
Sensors, 2022
In this paper, we propose a data-driven approach for reconstructing unknown room impulse responses (RIRs) based on the deep prior paradigm. We formulate RIR reconstruction as an inverse problem. More specifically, a convolutional neural network (CNN) is employed as a prior to obtain a regularized solution to the RIR reconstruction problem for uniform linear arrays. This approach avoids the assumptions on sound wave propagation, acoustic environment, or measurement setting made by state-of-the-art RIR reconstruction algorithms. Moreover, unlike classical deep learning solutions in the literature, the deep prior approach employs per-element training. Therefore, the proposed method does not require training data sets and can be applied to RIRs independently of available data or environments. Results on simulated data demonstrate that the proposed technique provides accurate results in a wide range of scenarios, including varying source direction of arrival, room T60, and SNR at the sensors. The technique is also applied to real measurements, yielding accurate RIR reconstruction and greater robustness to noise than state-of-the-art solutions.
Security Informatics, 2014
An acoustic environment leaves a characteristic signature in audio recorded in it. This acoustic environment signature can be modeled using acoustic reverberation and background noise. Acoustic reverberation depends on the geometry and composition of the recording location. The proposed scheme uses the similarity between estimated acoustic signatures for acoustic environment identification (AEI). We describe a parametric model of acoustic reverberation, and a statistical framework based on maximum likelihood estimation is used to estimate the model parameters. Density-based clustering is then applied to the estimated acoustic parameters for automatic AEI. The performance of the proposed framework is evaluated on two data sets consisting of hand-clapping and speech recordings made in a diverse set of acoustic environments using three microphones. The impact of microphone type, frequency, and clustering accuracy and efficiency on the performance of the proposed method is investigated. The performance of the proposed method is also compared with the existing state of the art (SoA) for AEI.
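Density-based clustering groups points that lie in dense regions and labels isolated points as noise, so the number of environments need not be known in advance. The sketch below is a minimal DBSCAN written with NumPy only. The paper's exact clustering algorithm, features, and thresholds are not specified here, so any feature vectors fed to it (for example, an estimated decay parameter and a noise level) are hypothetical and assumed to be pre-scaled to comparable ranges.

```python
import numpy as np


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]  # includes self
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:
            continue  # not a core point; may still become a border point later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                    queue.extend(neighbors[j])
            if labels[j] == -1:
                labels[j] = cluster  # claim unassigned core or border point
        cluster += 1
    return labels
```

The pairwise distance matrix makes this O(n²) in memory, which is fine for a few hundred recordings; larger collections would call for a spatial index.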
2005
In several scenarios it is desirable to estimate only the first part of the room impulse response, e.g., due to computing power restrictions. Room impulse response estimation is also often required in continuous double-talk situations. In this paper we show that the PEM-AFROW algorithm, which was recently proposed for acoustic feedback cancellation, can be used in these situations to provide a low-variance estimate with only a small bias.
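The idea of estimating only the first part of an RIR can be illustrated with a short adaptive FIR filter that is deliberately shorter than the true response, so the unmodeled tail acts as extra noise. The sketch below uses plain NLMS, a simpler stand-in that is not PEM-AFROW: it omits the prediction-error prewhitening that PEM-AFROW uses to cope with colored near-end signals such as continuous double-talk, and it assumes a white excitation. The function name, step size, and filter length are illustrative.

```python
import numpy as np


def nlms_identify(x, d, n_taps, mu=0.1, eps=1e-8):
    """Estimate the first n_taps of the system that maps input x to output d.

    x: excitation signal; d: observed (e.g. microphone) signal.
    Returns the final coefficient vector of a normalized LMS filter.
    """
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)  # buf[k] holds x[n - k]
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e = d[n] - w @ buf  # a priori error
        w += mu * e * buf / (buf @ buf + eps)
    return w
```

With a white excitation, the unmodeled tail is uncorrelated with the filter's input window, so it mainly adds variance to the truncated estimate; with a colored input it also introduces bias, which is the situation prewhitening is meant to address.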
In this paper, we propose a distant-talking speaker recognition method that uses a reverberation model built from various artificial room impulse responses. These artificial room impulse responses, generated with different speaker and microphone positions, room sizes, and wall reflection coefficients, are convolved with clean speech and used to train an artificial reverberation speaker model. This artificial reverberation model is also combined with a reverberation speaker model trained with room impulse responses measured in real environments. Speaker identification using the combination of the two reverberation speaker models achieved relative error reduction rates of 50.0% and 78.4% compared with using a reverberation model trained with real-world room impulse responses and a clean speech model, respectively.
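Artificial RIRs of the kind described above are commonly generated with the image-source method for a shoebox room. The sketch below is a compact version of the Allen and Berkley image method with a single frequency-independent reflection coefficient for all walls and delays rounded to the nearest sample. The paper's actual RIR generator and settings are not stated, so the function names and parameters are assumptions. The resulting RIR is then convolved with clean speech to create reverberant training data.

```python
import itertools
import numpy as np


def image_method_rir(room, src, mic, beta, fs=16000, c=343.0, max_order=5, length=None):
    """Shoebox RIR via the image-source method (integer-sample delays).

    room: (Lx, Ly, Lz) in metres; src, mic: distinct positions inside the room;
    beta: reflection coefficient shared by all six walls.
    """
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    if length is None:
        length = int(0.5 * fs)
    h = np.zeros(length)
    orders = range(-max_order, max_order + 1)
    for m in itertools.product(orders, repeat=3):
        m = np.array(m)
        for q in itertools.product((0, 1), repeat=3):
            q = np.array(q)
            image = (1 - 2 * q) * src + 2 * m * room  # mirrored source position
            dist = np.linalg.norm(image - mic)
            n_refl = int(np.sum(np.abs(m - q) + np.abs(m)))  # wall hits on this path
            k = int(round(dist / c * fs))
            if k < length:
                h[k] += beta ** n_refl / (4.0 * np.pi * dist)  # spherical spreading
    return h


def reverberate(clean, rir):
    """Convolve clean speech with an RIR, keeping the original length."""
    return np.convolve(clean, rir)[: len(clean)]
```

Sweeping room dimensions, source and microphone positions, and `beta` over ranges produces a family of RIRs of the kind used above to train the artificial reverberation speaker model.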
arXiv (Cornell University), 2021
This paper presents dEchorate: a new database of measured multichannel Room Impulse Responses (RIRs) including annotations of early echo timings and 3D positions of microphones, real sources and image sources under different wall configurations in a cuboid room. These data provide a tool for benchmarking recent methods in echo-aware speech enhancement, room geometry estimation, RIR estimation, acoustic echo retrieval, microphone calibration, echo labeling and reflector estimation. The database is accompanied by software utilities to easily access, manipulate and visualize the data as well as baseline methods for echo-related tasks.
Speech Communication, 2015
This paper presents a practical technique for automatic speech recognition (ASR) in multiple reverberant environments based on multi-model selection. Multiple ASR models are trained with artificial room impulse responses (IRs), i.e., simulated room IRs, with different reverberation times (T60^Model), and tested on real room IRs with varying reverberation times (T60^Room)
2020 28th European Signal Processing Conference (EUSIPCO), 2021
We introduce a database of multichannel recordings made in an acoustic lab with adjustable reverberation time. The recordings provide detailed information about room acoustics for source positions within a confined area. In particular, the main positions correspond to the 4104 vertices of a cube-shaped dense grid within a 46 × 36 × 32 cm volume. The database can serve for simulations of real-world situations and as a tool for detailed analyses of the beampatterns of spatial processing methods. It could also be used for training and testing mathematical models of the acoustic field.
IEEE Transactions on Speech and Audio Processing, 2003
Room reverberation is typically the main obstacle to designing robust microphone-based source localization systems. The purpose of this paper is to analyze the achievable performance of acoustic source localization methods in the presence of room reverberation.
EURASIP Journal on Audio, Speech, and Music Processing, 2021
This paper presents a new dataset of measured multichannel room impulse responses (RIRs) named dEchorate. It includes annotations of early echo timings and 3D positions of microphones, real sources, and image sources under different wall configurations in a cuboid room. These data provide a tool for benchmarking recent methods in echo-aware speech enhancement, room geometry estimation, RIR estimation, acoustic echo retrieval, microphone calibration, echo labeling, and reflector position estimation. The dataset is provided with software utilities to easily access, manipulate, and visualize the data as well as baseline methods for echo-related tasks.