2020
Audio textures are a superset of standard musical instrument timbres that include more complex sounds such as rain, wind, rolling, or scraping. With appropriate modeling strategies, textures can be synthesized under parametric control analogous to the way musical instruments are, and can then become powerful creativity tools for music making. However, audio textures, with complex structure spanning multiple time scales, are a challenge to model and generate synthetically. They are even challenging to define. Deep learning approaches offer new ways to develop generative audio texture models, and they create different demands on training data than traditional modeling approaches. In this paper we briefly review previous modeling approaches and attempt to rationalize and converge on a definition of textures using modeling concepts. We introduce a new and growing data set, along with a system for managing metadata specifically designed for audio textures. Finally, we report on some re...
ArXiv, 2019
Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps. While long-range dependencies are difficult to model directly in the time domain, we show that they can be more tractably modelled in two-dimensional time-frequency representations such as spectrograms. By leveraging this representational advantage, in conjunction with a highly expressive probabilistic model and a multiscale generation procedure, we design a model capable of generating high-fidelity audio samples which capture structure at timescales that time-domain models have yet to achieve. We apply our model to a variety of audio generation tasks, including unconditional speech generation, music generation, and text-to-speech synthesis---showing improvements over previous approaches in both density estimates and human judgments.
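As a rough illustration of the representational point made above, the sketch below (not the authors' code) converts a waveform into a log-scaled mel spectrogram with librosa; the sample rate, mel-band count, and hop length are placeholder assumptions.

```python
# Minimal sketch: mapping a waveform to a two-dimensional time-frequency
# representation, the kind of input spectrogram-domain generative models use.
# Assumes librosa is installed; parameter values are illustrative only.
import numpy as np
import librosa

def to_log_mel(path, sr=16000, n_mels=80, hop_length=256):
    """Load audio and return a log-scaled mel spectrogram (n_mels x frames)."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop_length)
    return librosa.power_to_db(mel, ref=np.max)

# One second of 16 kHz audio (16,000 samples) becomes only ~63 spectrogram
# frames here, which is why long-range structure is easier to model in this domain.
```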
ArXiv, 2018
We describe a system based on deep learning that generates drum patterns in the electronic dance music domain. Experimental results reveal that generated patterns can be employed to produce musically sound and creative transitions between different genres, and that the process of generation is of interest to practitioners in the field.
arXiv (Cornell University), 2023
Research on Deep Learning applications in sound and music computing has gathered interest in recent years; however, there is still a missing link between these new technologies and how they can be incorporated into real-world artistic practices. In this work, we explore a well-known Deep Learning architecture called Variational Autoencoders (VAEs). These architectures have been used in many areas for generating latent spaces in which data points are organized so that similar data points are located closer to each other. Previously, VAEs have been used for generating latent timbre spaces or latent spaces of symbolic music excerpts. Applying VAEs to audio features of timbre requires a vocoder to transform the timbre generated by the network into an audio signal, which is computationally expensive. In this work, we apply VAEs to raw audio data directly, bypassing audio feature extraction. This approach allows practitioners to use any audio recording while giving flexibility and control over the aesthetics through dataset curation. The lower computation time in audio signal generation allows the raw-audio approach to be incorporated into real-time applications. In this work, we propose three strategies to explore latent spaces of audio and timbre for sound design applications. By doing so, our aim is to initiate a conversation on artistic approaches and strategies to utilize latent audio spaces in sound and music practices.
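As a loose illustration of the raw-audio approach described above, here is a minimal VAE sketch in PyTorch operating on fixed-length audio frames; the frame length, layer sizes, and loss weighting are assumptions, not the paper's architecture.

```python
# Illustrative sketch only: a small VAE over fixed-length raw-audio frames,
# bypassing spectral feature extraction. Sizes are placeholder assumptions.
import torch
import torch.nn as nn

class RawAudioVAE(nn.Module):
    def __init__(self, frame_len=1024, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_len, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, frame_len), nn.Tanh())

    def forward(self, x):                      # x: (batch, frame_len), values in [-1, 1]
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = nn.functional.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```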
Neural Computing and Applications, 2018
In addition to traditional tasks such as prediction, classification and translation, deep learning is receiving growing attention as an approach for music generation, as witnessed by recent research groups such as Magenta at Google and CTRL (Creator Technology Research Lab) at Spotify. The motivation is in using the capacity of deep learning architectures and training techniques to automatically learn musical styles from arbitrary musical corpora and then to generate samples from the estimated distribution. However, a direct application of deep learning to generate content rapidly reaches limits, as the generated content tends to mimic the training set without exhibiting true creativity. Moreover, deep learning architectures do not offer direct ways of controlling generation (e.g., imposing some tonality or other arbitrary constraints). Furthermore, deep learning architectures alone are autistic automata which generate music autonomously without human user interaction, far from the objective of interactively assisting musicians to compose and refine music. Issues such as control, structure, creativity and interactivity are the focus of our analysis. In this paper, we select some limitations of a direct application of deep learning to music generation, analyze why these issues are not addressed, and discuss possible approaches to address them. Various recent systems are cited as examples of promising directions.
2011
The synthesis of sound textures, such as rain, wind, or crowds, is an important application for cinema, multimedia creation, games and installations. However, despite the clearly defined requirements of naturalness and flexibility, no automatic method has yet found widespread use. After clarifying the definition, terminology, and usages of sound texture synthesis, we give an overview of the many existing methods and approaches, and the few available software implementations, and classify them by the synthesis model they are based on, such as subtractive or additive synthesis, granular synthesis, corpus-based concatenative synthesis, wavelets, or physical modeling. Additionally, an overview is given of analysis methods used for sound texture synthesis, such as segmentation, statistical modeling, timbral analysis, and modeling of transitions.
2019
We are interested in using deep learning models to generate new music. Using the Maestro Dataset, we will use an LSTM architecture that inputs tokenized MIDI files and outputs predictions for notes. Our accuracy will be measured by taking predicted notes and comparing them to ground truths. Using A.I. for music is a relatively new area of study, and this project provides an investigation into creating an effective model for the music industry.
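A minimal sketch of the kind of next-note LSTM described above is given below; the vocabulary size, dimensions, and tensor shapes are placeholder assumptions, not values from the project.

```python
# Hedged sketch of the general idea: an LSTM that takes tokenized MIDI events
# and predicts the next token at each step. All sizes are placeholders.
import torch
import torch.nn as nn

class NextNoteLSTM(nn.Module):
    def __init__(self, vocab_size=388, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len) int64
        out, _ = self.lstm(self.embed(tokens))
        return self.head(out)                  # next-token logits at every position

# Accuracy as described: compare the argmax of the final-step logits against
# the ground-truth next note.
model = NextNoteLSTM()
tokens = torch.randint(0, 388, (8, 64))        # a batch of dummy token sequences
pred = model(tokens)[:, -1].argmax(dim=-1)
```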
International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2021
Advancements in deep neural networks have made it possible to compose music that mimics music composition by humans. The capacity of deep learning architectures to learn musical style from arbitrary musical corpora is explored in this paper. The paper proposes a method for generating music from the estimated distribution. Musical chords have been extracted for various instruments to train a sequential model to generate polyphonic music on selected instruments. We demonstrate a simple method comprising sequential LSTM models to generate polyphonic music. The results of the model evaluation show that the generated music is pleasant to hear and is similar to music played by humans. This has great application in the entertainment industry, enabling music composers to generate a variety of creative music.
arXiv (Cornell University), 2020
We present the Latent Timbre Synthesis (LTS), a new audio synthesis method using Deep Learning. The synthesis method allows composers and sound designers to interpolate and extrapolate between the timbre of multiple sounds using the latent space of audio frames. We provide the details of two Variational Autoencoder architectures for LTS, and compare their advantages and drawbacks. The implementation includes a fully working application with graphical user interface, called interpolate two, which enables practitioners to explore the timbre between two audio excerpts of their selection using interpolation and extrapolation in the latent space of audio frames. Our implementation is open-source, and we aim to improve the accessibility of this technology by providing a guide for users with any technical background.
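The core latent-space operation behind interpolation and extrapolation can be sketched in a few lines; the function below is an illustrative blend, not the released interpolate two implementation.

```python
# Minimal sketch of the core latent operation: blend two latent vectors.
# alpha in [0, 1] interpolates between the two timbres; alpha < 0 or > 1
# extrapolates beyond them. The vectors here are dummy examples.
import numpy as np

def blend_latents(z_a, z_b, alpha):
    """Linear interpolation/extrapolation between two latent frames."""
    z_a, z_b = np.asarray(z_a), np.asarray(z_b)
    return (1.0 - alpha) * z_a + alpha * z_b

z_mid = blend_latents(z_a=[0.1, -0.3], z_b=[0.5, 0.2], alpha=0.5)  # halfway
z_ext = blend_latents(z_a=[0.1, -0.3], z_b=[0.5, 0.2], alpha=1.2)  # 20% beyond
```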
Journal of The Audio Engineering Society, 2018
In this work, we propose a deep learning based method, namely variational convolutional recurrent autoencoders (VCRAE), for musical instrument synthesis. This method utilizes the higher-level time-frequency representations extracted by the convolutional and recurrent layers to learn a Gaussian distribution in the training stage, which is later used to infer unique samples through interpolation of multiple instruments in the usage stage. The reconstruction performance of VCRAE is evaluated by proxy through an instrument classifier, and provides significantly better accuracy than two other baseline autoencoder methods. The synthesized samples for the combinations of 15 different instruments are available on the companion website.
Zenodo (CERN European Organization for Nuclear Research), 2023
This paper introduces a new course on deep learning with audio, designed specifically for graduate students in arts studies. The course introduces students to the principles of deep learning models in the audio and symbolic domains, as well as their possible applications in music composition and production. The course covers topics such as data preparation and processing, neural network architectures, and the training and application of deep learning models in music-related tasks. The course also incorporates hands-on exercises and projects, allowing students to apply the concepts learned in class to real-world audio data. In addition, the course introduces a novel approach to integrating audio generation using deep learning models into the Pure Data real-time audio synthesis environment, which enables students to create original and expressive audio content in a programming environment with which they are more familiar. The variety of the audio content produced by the students demonstrates the effectiveness of the course in fostering a creative approach to their own music productions. Overall, this new course on deep learning with audio represents a significant contribution to the field of artificial intelligence (AI) music and creativity, providing arts graduate students with the necessary skills and knowledge to tackle the challenges of rapidly evolving AI music technologies.
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Chinese. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition.
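The causal, dilated convolution at the heart of such autoregressive waveform models can be sketched as follows; this is a simplified illustration, not DeepMind's implementation.

```python
# Sketch of the key building block: a causal, dilated 1-D convolution, so each
# output sample depends only on current and past inputs. Stacking layers with
# growing dilation widens the receptive field exponentially. Sizes are illustrative.
import torch
import torch.nn as nn

class CausalDilatedConv1d(nn.Module):
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # left padding only
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))          # no future context leaks in
        return self.conv(x)

stack = nn.Sequential(*[CausalDilatedConv1d(32, dilation=2 ** i) for i in range(8)])
y = stack(torch.randn(1, 32, 16000))                     # one second at 16 kHz
```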
Cornell University - arXiv, 2017
We propose a recurrent variational auto-encoder for texture synthesis. A novel loss function, FLTBNK, is used for training the texture synthesizer. It is a rotation- and partially color-invariant loss function. Unlike L2 loss, FLTBNK explicitly models the correlation of color intensity between pixels. Our texture synthesizer generates neighboring tiles to expand a sample texture and is evaluated using various texture patterns from the Describable Textures Dataset (DTD). We perform both quantitative and qualitative experiments with various loss functions to evaluate the performance of our proposed loss function (FLTBNK); a mini human-subject study is used for the qualitative evaluation.
International Journal of Computer Applications, 2017
Convolutional neural networks have recently become extremely popular in various deep learning applications. One such application is style transfer for images. Following this trend, this paper explores how this technique can be applied to audio data. The technique discussed involves combining the content features of one audio sample with the style features of another audio sample. The results produced show how a Convolutional Neural Network can be used to extract features from audio signals. The paper also discusses the various modifications made in the algorithm used for image style transfer in order to apply it to audio signals.
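A hedged sketch of the content and style (Gram-matrix) losses adapted from image style transfer is shown below; the feature extractor is a placeholder convolution over spectrogram frames, not a specific pre-trained network from the paper.

```python
# Hedged sketch of the loss formulation adapted from image style transfer:
# content loss matches feature maps directly, style loss matches their Gram
# matrices. The feature extractor and the spectrogram size are placeholders.
import torch
import torch.nn as nn

features = nn.Conv1d(in_channels=257, out_channels=64, kernel_size=5)  # placeholder extractor

def gram(f):                             # f: (batch, channels, frames)
    return f @ f.transpose(1, 2) / f.shape[-1]

def style_transfer_loss(x, content, style, style_weight=1e3):
    fx, fc, fs = features(x), features(content), features(style)
    content_loss = torch.mean((fx - fc) ** 2)
    style_loss = torch.mean((gram(fx) - gram(fs)) ** 2)
    return content_loss + style_weight * style_loss

# x is the spectrogram being optimized (like the image in the original algorithm);
# gradients flow back into x rather than into the network weights.
```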
2021 24th International Conference on Digital Audio Effects (DAFx)
This paper proposes a novel way of doing audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to wavenet [1]. This is fully probabilistic, auto-regressive, and causal, i.e. each sample generated depends on only the previously observed samples. Our approach outperforms a widely used wavenet architecture by up to 9% on a similar dataset for predicting the next step. Using the attention mechanism, we enable the architecture to learn which audio samples are important for the prediction of the future sample. We show how causal transformer generative models can be used for raw waveform synthesis. We also show that this performance can be improved by another 2% by conditioning samples over a wider context. The flexibility of the current model to synthesize audio from latent representations suggests a large number of potential applications. The novel approach of using generative transformer architectures for raw audio synthesis is, however, still far away from generating any meaningful music similar to wavenet, without using latent codes/meta-data to aid the generation process.
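The causality constraint described above can be illustrated with a standard attention mask; the sketch below is a generic example using PyTorch's built-in attention layer, not the paper's architecture.

```python
# Simplified illustration: a causal attention mask so that position t can only
# attend to positions <= t, matching the auto-regressive constraint described
# above. Sequence length and model width are placeholder assumptions.
import torch
import torch.nn as nn

seq_len, d_model = 256, 64
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)
x = torch.randn(1, seq_len, d_model)            # embedded waveform frames (assumed)

# True entries mark positions that must NOT be attended to (the future).
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
out, _ = attn(x, x, x, attn_mask=causal_mask)
```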
2019
Image style transfer networks are used to blend images, producing images that are a mix of source images. The process is based on controlled extraction of style and content aspects of images, using pre-trained Convolutional Neural Networks (CNNs). Our interest lies in adopting these image style transfer networks for the purpose of transforming sounds. Audio signals can be presented as grey-scale images of audio spectrograms. The purpose of our work is to investigate whether audio spectrogram inputs can be used with image neural transfer networks to produce new sounds. Using musical instrument sounds as source sounds, we apply and compare three existing image neural style transfer networks for the task of sound mixing. Our evaluation shows that all three networks are successful in producing consistent, new sounds based on the two source sounds. We use classification models to demonstrate that the new audio signals are consistent and distinguishable from the source instrument sounds. ...
2020
Music albums nowadays cannot be conceived without their artwork. Since it was first used, the importance of album artwork has changed. In our digital era, audiovisual content is everywhere, and album covers play an important role for music albums. Computer Vision has unleashed powerful technologies for image generation in the last decade, which have been used for many different applications. In particular, the main developments are Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). The latest research on these technologies has contributed to understanding and improving them, achieving high-quality and complex image generation. In this thesis, we experiment with the latest image generation tools to achieve album artwork generation based on audio samples. We first analyse image generation without audio conditioning for VAEs and three GAN approaches: vanilla GAN, Least Squares GAN (LSGAN) and Wasserstein GAN with gradient penalty (WGAN-GP). Finally, we try th...
Cornell University - arXiv, 2018
Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24 kHz 16-bit audio 4× faster than real time on a GPU. Second, we apply a weight pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of parameters, large sparse networks perform better than small dense networks and this relationship holds for sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time. Finally, we propose a new generation scheme based on subscaling that folds a long sequence into a batch of shorter sequences and allows one to generate multiple samples at once. The Subscale WaveRNN produces 16 samples per step without loss of quality and offers an orthogonal method for increasing sampling efficiency.
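Magnitude-based weight pruning, one of the techniques mentioned above, can be sketched generically as follows; the target sparsity and the one-shot (schedule-free) approach here are illustrative assumptions, not the paper's pruning schedule.

```python
# Rough sketch of magnitude-based pruning: zero out the smallest-magnitude
# weights until a target sparsity is reached. A generic illustration only.
import torch

def prune_to_sparsity(weight, sparsity=0.96):
    """Return a copy of `weight` with the smallest-magnitude entries set to zero."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

w = torch.randn(1024, 1024)
w_sparse = prune_to_sparsity(w, 0.96)
print((w_sparse == 0).float().mean())   # approximately 0.96
```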
arXiv (Cornell University), 2017
We introduce a two-stream model for dynamic texture synthesis. Our model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow prediction. Given an input dynamic texture, statistics of filter responses from the object recognition ConvNet encapsulate the per-frame appearance of the input texture, while statistics of filter responses from the optical flow ConvNet model its dynamics. To generate a novel texture, a randomly initialized input sequence is optimized to match the feature statistics from each stream of an example texture. Inspired by recent work on image style transfer and enabled by the two-stream model, we also apply the synthesis approach to combine the texture appearance from one texture with the dynamics of another to generate entirely novel dynamic textures. We show that our approach generates novel, high-quality samples that match both the framewise appearance and temporal evolution of the input texture. Finally, we quantitatively evaluate our texture synthesis approach with a thorough user study.
IRJET, 2022
This paper describes automatic music generation using deep learning. The music is generated as a sequence of notes in ABC notation. Music technology can now realistically make use of large-scale data. For music generation with deep learning, LSTMs or GRUs are most often used for modelling. Music generation is essentially a sequence generation task, and since LSTMs generate sequences efficiently, they are a suitable choice.