2014, Security Informatics
An acoustic environment leaves a characteristic signature in any audio recording captured in it. This signature can be modeled using acoustic reverberation and background noise. Acoustic reverberation depends on the geometry and composition of the recording location. The proposed scheme uses similarity in the estimated acoustic signature for acoustic environment identification (AEI). We describe a parametric model for acoustic reverberation, and a statistical framework based on maximum likelihood estimation is used to estimate the model parameters. Density-based clustering is applied to the estimated acoustic parameters for automatic AEI. Performance of the proposed framework is evaluated on two data sets, consisting of hand-clapping and speech recordings made in a diverse set of acoustic environments using three microphones. The impact of microphone type, frequency, and clustering accuracy and efficiency on the performance of the proposed method is investigated, and the method is also compared with the existing state of the art (SoA) for AEI.
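The abstract does not spell out its clustering routine; as a rough, generic sketch, a minimal DBSCAN-style density clustering over estimated acoustic parameter vectors (e.g. reverberation time and noise variance per recording) might look like the following. The `eps` and `min_pts` values are illustrative, not taken from the paper:

```python
import numpy as np

def dbscan(points, eps=0.5, min_pts=3):
    """Minimal DBSCAN: label each point with a cluster id, or -1 for noise.
    points: array of shape (n, d) holding one parameter vector per recording."""
    n = len(points)
    labels = [-1] * n
    visited = [False] * n
    cluster = 0

    def neighbors(i):
        # All points within eps of point i (includes i itself).
        return [j for j in range(n)
                if np.linalg.norm(points[i] - points[j]) <= eps]

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        nb = neighbors(i)
        if len(nb) < min_pts:
            continue                      # not a core point; may stay noise
        labels[i] = cluster
        seeds = list(nb)
        while seeds:                      # expand the cluster from core points
            j = seeds.pop()
            if not visited[j]:
                visited[j] = True
                nb_j = neighbors(j)
                if len(nb_j) >= min_pts:
                    seeds.extend(nb_j)
            if labels[j] == -1:
                labels[j] = cluster
        cluster += 1
    return labels
```

Recordings made in the same room would, under the paper's premise, yield nearby parameter vectors and land in the same density cluster.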
20th ACM international conference on Multimedia - MM '12, 2012
This paper presents a system for identifying the room in an audio or video recording through the analysis of acoustical properties. The room identification system was tested using a corpus of 13440 reverberant audio samples. With no common content between the training and testing data, an accuracy of 61% for musical signals and 85% for speech signals was achieved. This approach could be applied in a variety of scenarios where knowledge about the acoustical environment is desired, such as location estimation, music recommendation, or emergency response systems.
International Journal of Advance Research and Innovative Ideas in Education, 2015
An audio recording may contain many artifacts and distortions. Reverberation depends on the volume and geometry of the room and causes coloration of the recording, while background noise arises from secondary audio sources active during the evidentiary recording. For an audio recording to be admissible as evidence in court, its authenticity must be verified. A blind deconvolution method based on FIR filtering and the overlap-add method is used to estimate the reverberation time, and particle filtering is used to estimate the background noise. Features are extracted using the MFCC approach. The 128-dimensional feature vector concatenates features derived from acoustic reverberation, background noise, and higher-order statistics. An SVM classifier is used to classify the environments. The performance of the system is evaluated on a dataset of audio recordings; the SVM classifier gives the best results on the trained dataset and moderate results on untrained data.
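The FIR filtering with overlap-add mentioned above can be illustrated with a generic FFT-based block convolver (a sketch, not the paper's implementation; the block size is arbitrary):

```python
import numpy as np

def overlap_add_fir(x, h, block=256):
    """Filter signal x with FIR h using FFT-based overlap-add convolution."""
    # FFT size: next power of two that holds one block convolved with h.
    n_fft = 1
    while n_fft < block + len(h) - 1:
        n_fft *= 2
    H = np.fft.rfft(h, n_fft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        # Convolve one block in the frequency domain.
        y_seg = np.fft.irfft(np.fft.rfft(seg, n_fft) * H, n_fft)
        # Overlap-add the block's tail into the output.
        y[start:start + len(seg) + len(h) - 1] += y_seg[:len(seg) + len(h) - 1]
    return y
```

The result is identical (up to floating-point error) to direct linear convolution, but with far fewer operations for long signals.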
IEEE Transactions on Information Forensics and Security, 2000
An audio recording is subject to a number of possible distortions and artifacts. Consider, for example, artifacts due to acoustic reverberation and background noise. The acoustic reverberation depends on the shape and the composition of the room, and it causes temporal and spectral smearing of the recorded sound. The background noise, on the other hand, depends on the secondary audio source activities present in the evidentiary recording. Extraction of acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and ballistic settings. We describe a statistical technique based on spectral subtraction to estimate the amount of reverberation and nonlinear filtering based on particle filtering to estimate the background noise. The effectiveness of the proposed method is tested using a data set consisting of speech recordings of two human speakers (one male and one female) made in eight acoustic environments using four commercial-grade microphones. Performance of the proposed method is evaluated for various experimental settings such as microphone-independent, semi- and full-blind AEI, and robustness to MP3 compression. Performance of the proposed framework is also evaluated using Temporal Derivative-based Spectrum and Mel-Cepstrum (TDSM)-based features. Experimental results show that the proposed method improves AEI performance compared with the direct method (i.e., the feature vector is extracted from the audio recording directly). In addition, experimental results also show that the proposed scheme is robust to MP3 compression attacks.
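Spectral subtraction, the basis of the estimation technique described here, can be sketched in its basic noise-suppression form. The frame length, hop, spectral floor, and the assumption that the first few frames are speech-free are illustrative choices, not the paper's:

```python
import numpy as np

def spectral_subtraction(x, frame=256, hop=128, noise_frames=5):
    """Basic magnitude spectral subtraction. The noise spectrum is
    estimated from the first `noise_frames` frames, assumed speech-free."""
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    specs = np.array([np.fft.rfft(x[i * hop:i * hop + frame] * win)
                      for i in range(n)])
    mag, phase = np.abs(specs), np.angle(specs)
    noise = mag[:noise_frames].mean(axis=0)         # average noise magnitude
    clean = np.maximum(mag - noise, 0.05 * mag)     # subtract, keep a floor
    # Overlap-add resynthesis using the noisy phase.
    y = np.zeros(len(x))
    for i in range(n):
        y[i * hop:i * hop + frame] += np.fft.irfft(
            clean[i] * np.exp(1j * phase[i]), frame)
    return y
```

The floor (here 5% of the noisy magnitude) prevents negative magnitudes and limits the "musical noise" artifacts that plain subtraction produces.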
EURASIP Journal on Advances in Signal Processing, 2010
Sound source localization is an important feature in robot audition. This work proposes a method for estimating the number and directions of sound sources in a multisource reverberant environment. An eigenstructure-based generalized cross-correlation method is proposed to estimate time delays among microphones. A source is considered a candidate if the corresponding time-delay combination among microphones gives a reasonable sound-speed estimate. Under reverberation, some candidates may be spurious, but their direction estimates are not consistent across consecutive data frames. Therefore, an adaptive K-means++ algorithm is proposed to cluster the accumulated results from the sound-speed selection mechanism. Experimental results demonstrate the performance of the proposed algorithm in a real room.
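The paper's eigenstructure-based GCC variant is not reproduced here; as a simpler, standard stand-in, the time delay between two microphone signals is often estimated with GCC-PHAT:

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000):
    """Estimate the delay (in seconds) of `sig` relative to `ref`
    using the generalized cross-correlation with PHAT weighting."""
    n = len(sig) + len(ref)
    R = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    R /= np.abs(R) + 1e-12                  # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    # Reorder so index 0 is lag -max_shift, center is lag 0.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs
```

The PHAT weighting whitens the cross-spectrum, which sharpens the correlation peak and makes the estimator noticeably more robust to reverberation than plain cross-correlation.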
ACM Transactions on Speech and Language Processing, 2006
The acoustic environment provides a rich source of information on the types of activity, communication modes, and people involved in many situations. It can be accurately classified using recordings from microphones commonly found in PDAs and other consumer devices. We describe a prototype HMM-based acoustic environment classifier incorporating an adaptive learning mechanism and a hierarchical classification model. Experimental results show that we can accurately classify a wide variety of everyday environments. We also show good results classifying single sounds, although classification accuracy is influenced by the granularity of the classification.
IEEE Transactions on Information Forensics and Security, 2013
An audio recording is subject to a number of possible distortions and artifacts. Consider, for example, artifacts due to acoustic reverberation and background noise. The acoustic reverberation depends on the shape and the composition of a room, and it causes temporal and spectral smearing of the recorded sound. The background noise, on the other hand, depends on the secondary audio source activities present in the evidentiary recording. Extraction of acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and ballistic settings. We describe a statistical technique to model and estimate the amount of reverberation and background noise variance in an audio recording. An energy-based voice activity detection method is proposed for automatic decaying-tail-selection from an audio recording. Effectiveness of the proposed method is tested using a data set consisting of speech recordings. The performance of the proposed method is also evaluated for both speaker-dependent and speaker-independent scenarios.
2020
The goal of this project was to explore Computational Auditory Scene Analysis (CASA), specifically, blind source separation in reverberant environments. Additionally, speaker identification, vowel classification and speech generation were also explored.
The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology, 2004
This communication presents a new method for automatic speech recognition in reverberant environments. Our approach consists of selecting the best acoustic model from a library of models trained on artificially reverberated speech databases corresponding to various reverberant conditions. Given a speech utterance recorded in a reverberant room, a maximum likelihood estimate of the full-band room reverberation time is computed using a statistical model for short-term log-energy sequences of anechoic speech. The estimated reverberation time is then used to select the best acoustic model, i.e., the model trained on the speech database most closely matching the estimated reverberation time, which serves to recognize the reverberated speech utterance. The proposed model selection approach is shown to significantly improve recognition accuracy on a connected-digit task in both simulated and real reverberant environments, outperforming standard channel normalization techniques.
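The paper's ML estimator works directly on log-energy sequences of reverberated speech; as a related classical illustration (not the paper's method), reverberation time T60 can be estimated from a measured impulse response via Schroeder backward integration and a linear fit to the decay curve:

```python
import numpy as np

def estimate_t60(h, fs):
    """Estimate T60 from an impulse response h sampled at fs Hz, using
    the Schroeder backward-integrated energy decay curve (EDC) and a
    linear fit to its -5 dB .. -25 dB span, extrapolated to -60 dB."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]          # energy remaining after t
    edc_db = 10 * np.log10(edc / edc[0] + 1e-12)
    t = np.arange(len(h)) / fs
    i5 = np.argmax(edc_db <= -5.0)               # skip the direct sound
    i25 = np.argmax(edc_db <= -25.0)             # stop before the noise floor
    slope, _ = np.polyfit(t[i5:i25], edc_db[i5:i25], 1)  # dB per second
    return -60.0 / slope
```

Fitting only the -5 to -25 dB span is a common convention (a "T20"-style fit): it avoids both the direct-sound onset and the measurement noise floor.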
Journal of The Acoustical Society of America, 2008
This paper compares two methods for extracting room acoustic parameters from reverberated speech and music. An approach which uses statistical machine learning, previously developed for speech, is extended to work with music. For speech, reverberation time estimations are within a perceptual difference limen of the true value. For music, virtually all early decay time estimations are within a difference limen of the true value. The estimation accuracy is not good enough in other cases due to differences between the simulated data set used to develop the empirical model and real rooms. The second method carries out a maximum likelihood estimation on decay phases at the end of notes or speech utterances. This paper extends the method to estimate parameters relating to the balance of early and late energies in the impulse response. For reverberation time and speech, the method provides estimations which are within the perceptual difference limen of the true value. For other parameters such as clarity, the estimations are not sufficiently accurate due to the natural reverberance of the excitation signals. Speech is a better test signal than music because of the greater periods of silence in the signal, although music is needed for low frequency measurement.
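Of the early-to-late balance parameters mentioned, clarity has a simple closed form given an impulse response; for illustration, C50 (with the common 50 ms split for speech) is:

```python
import numpy as np

def clarity_c50(h, fs):
    """Clarity index C50: energy ratio (in dB) of the first 50 ms of the
    impulse response h to everything after 50 ms."""
    k = int(0.05 * fs)                  # sample index of the 50 ms boundary
    early = np.sum(h[:k] ** 2)
    late = np.sum(h[k:] ** 2)
    return 10 * np.log10(early / late)
```

Higher C50 means the early (useful) energy dominates the late reverberant tail, which correlates with speech intelligibility; C80 is the analogous 80 ms measure used for music.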
The evaluation of an acoustical situation in the city can in general be done in two ways. Quantitative assessment, also called "object-related description", uses known methods of statistical noise analysis and is rather straightforward. Assessing the quality of a soundscape is more difficult, since it does not always lead to the answer "good" or "bad". It rather deals with the question of how different various soundscapes are in their quality, and how they can be classified into categories. An acoustical situation in a city is determined by many different factors, and the perception of the urban soundscape by different people also reflects the difference between their subjective judgment criteria. Some places in the city may sound similar and some very different. Some places might be acoustically judged as pleasant and some as annoying. However, the same objective acoustical situation is sometimes judged differently if it occurs in two different places. This observation raises a question: on what does human perception and evaluation of sound depend?
2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014
Artificial Intelligence and Soft Computing, 2018
6th International Conference on Spoken Language Processing (ICSLP 2000)
arXiv (Cornell University), 2018
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016
EURASIP Journal on Advances in Signal Processing, 2007
EURASIP Journal on Advances in Signal Processing, 2013
IPSN-14 Proceedings of the 13th International Symposium on Information Processing in Sensor Networks, 2014
2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013