WASPAA 2025 in Lake Tahoe
ISMIR 2025 in Daejeon, Korea
Darius Defended
Fraunhofer IIS and AudioLabs in Erlangen
University of Hamburg
Ajou Honorary Alumni
Anastasia Kuznetsova Defended
ICASSP 2025 in Hyderabad, India
Jackie Lin at ICASSP 2025 (GenDA 2025 Workshop)
Jaesung Bae at ICASSP 2025 (GenDA 2025 Workshop)
Introduction
My research revolves around making audio and speech AI more practical and useful. I aim to build efficiency, personalization, scalability, and collaboration into the AI systems I develop. With those goals in mind, I combine signal processing, generative modeling, and machine learning to develop adaptive systems for learning efficient data representations (e.g., neural audio coding), intelligent signal processing (e.g., speech enhancement and source separation), and generative modeling of audio.
Featured Projects
-
TGIF: A Family-Owned Voice AI
Overview: In everyday life, our devices run many speech/audio applications that can benefit from the target speaker…
-
Audio Coding for Machines
Machine-Learned Latent Features Are Codes for That Machine! When we think about compressing sound, we usually…
-
Personalized Neural Speech Codec
Have you ever wondered about a speech codec that’s dedicated to your own speech traits? Why? Of…
-
Scalable and Efficient Speech Enhancement Using Modified Cold Diffusion
As we proposed in the BLOOM-Net project, scalability matters. Just to reiterate the argument here once…
-
LaDiffCodec: Generative De-Quantization for Neural Speech Codec via Latent Diffusion
Motivation: We bring the cool generative power of a diffusion model to speech coding. We call…
-
Don’t Separate, Learn to Remix: End-to-End Neural Remixing
TLDR: In this project, we developed an end-to-end neural network system that takes a music mixture…
-
SpaIn-Net: Spatially Informed Music Source Separation
The spatial image of a music source is an essential feature in the stereophonic music listening…
-
BLOOM-Net: Scalability Matters
Scalability is a big deal when it comes to video coding. When you watch a movie…
-
Personalized Speech Enhancement
(Download Interspeech 2022 Tutorial Slides) The outstanding development in modern AI has relied greatly on the…
-
Psychoacoustic Loss Functions for Neural Audio Coding
Neural audio coding is an area where we want to compress an audio signal down to…
-
Neural Pitch Correction of Singing Voice
Have you ever wished you were a good singer? Some people believe that it’s a…
-
Cross-Module Residual Learning for Neural Audio Coding
Speech/audio coding has traditionally involved substantial domain-specific knowledge such as speech generation models. If you haven’t…