Audio Source Separation using deep learning

Objective: Separate a mixed audio track into individual stems corresponding to drums, bass, vocals, and other instruments. Two approaches are used: a classical technique, non-negative matrix factorization (NMF), and a state-of-the-art deep learning architecture, Open-Unmix.

All code is written in Python using PyTorch.

Watch video here: https://drive.google.com/file/d/1mQYxrSR_2vEwZfQ6rOPChYq28Bkrh5fx/view

Full Report: Audio Source Separation.pdf

Dataset - MUSDB18

MUSDB18 is the largest freely available dataset for source separation to date. It consists of 150 full-length music tracks (about 10 hours in total) from different genres, along with their isolated drums, bass, vocals, and other (accompaniment) stems [5]. The training set comprises 100 tracks, and the remaining 50 tracks constitute the test set. Each instance in the dataset is a 44.1 kHz stereo track composed of the full mix plus four stems.

Non deep learning method - Non Negative Matrix Factorization

Non-negative matrix factorization (NMF) is one of the classical methods used to decompose the magnitude of time-frequency distributions in audio processing.

NMF factorizes the matrix V, which holds the spectrogram of the audio, into two matrices called the bases (W) and the activations (H). All three matrices V, W, and H are non-negative.

$$V_{m \times n} \approx W_{m \times r} H_{r \times n}$$

Here r ≤ min(m, n) is the number of source components (the rank of the factorization).

Exact NMF is NP-hard in general and has no closed-form solution, so W and H are estimated iteratively.
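Since the factorization has no closed-form solution, it is usually approximated with an iterative scheme. The following is a minimal sketch, assuming the Lee-Seung multiplicative updates for the Frobenius-norm objective; the notebook in this repository may use a different algorithm or cost function. The function name `nmf`, the toy matrix, and parameters such as `n_iter` and `eps` are illustrative choices, not taken from the project.

```python
import numpy as np

def nmf(V, r, n_iter=500, eps=1e-10, seed=0):
    """Approximate a non-negative V (m x n) as W (m x r) @ H (r x n).

    Hypothetical helper: multiplicative updates for the Frobenius objective.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random non-negative initialisation; eps keeps entries strictly positive.
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Element-wise multiplicative updates preserve non-negativity.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy stand-in for a magnitude spectrogram: a non-negative rank-2 matrix.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))

W, H = nmf(V, r=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a separation setting, V would be the magnitude spectrogram of the mixture, the columns of W would act as spectral templates, and the rows of H would describe when each template is active.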

See the full implementation in NMF_implementation.ipynb.

Deep Learning method - Open Unmix

Open-Unmix is a deep neural network reference implementation for music source separation.

Open-Unmix is based on a three-layer bidirectional deep LSTM. The model learns to predict the magnitude spectrogram of a target, such as vocals, from the magnitude spectrogram of a mixture input. Internally, the prediction is obtained by applying a mask to the input. The model is optimized in the magnitude domain using mean squared error, and the actual separation is done in a post-processing step involving a multichannel Wiener filter implemented using norbert. To separate multiple sources, a separate model is trained for each target. While this makes training less convenient, it allows great flexibility to customize the training data for each target source.
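The masking idea described above can be illustrated with a small PyTorch sketch. This is a toy model under stated assumptions, not the Open-Unmix architecture itself: the class name `MaskingLSTM`, the layer sizes, the sigmoid mask, and the random input are all hypothetical, and the Wiener-filter post-processing step is omitted.

```python
import torch
from torch import nn

class MaskingLSTM(nn.Module):
    """Hypothetical toy model: a 3-layer BiLSTM that predicts a mask."""

    def __init__(self, n_bins=64, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_bins, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        # Bidirectional output has 2 * hidden features per frame.
        self.proj = nn.Linear(2 * hidden, n_bins)

    def forward(self, mix_mag):
        # mix_mag: (batch, frames, n_bins) non-negative magnitudes.
        h, _ = self.lstm(mix_mag)
        mask = torch.sigmoid(self.proj(h))  # assumed mask in (0, 1)
        # The target estimate is the mixture scaled by the mask.
        return mask * mix_mag

torch.manual_seed(0)
model = MaskingLSTM()
mix = torch.rand(2, 10, 64)   # random stand-in for a mixture spectrogram
target = 0.5 * mix            # random stand-in for a target stem

est = model(mix)
loss = nn.functional.mse_loss(est, target)  # magnitude-domain MSE
loss.backward()
print(est.shape, float(loss))
```

Training one such model per target (vocals, drums, bass, other) mirrors the per-target setup described above.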

See the full implementation in Open_Unmix_Implementation.ipynb.

LaughBuddha/Audio-Source-Separation
