Closed
Labels: question (Further information is requested)
Description
I'm mostly interested in using Spleeter's pretrained models directly from my C++ application, skipping the provided Python scripts as much as possible.
My understanding of how to use those pretrained models is as follows (based on the 4stems base_config.json):
1. Convert the audio data to 44100 Hz stereo float32.
2. Transform it into two complex spectrograms (one per channel; FFT size 4096 samples, Hann window, FFT step 1024 samples).
3. Convert the two complex spectrograms into two magnitude spectrograms.
4. Take the first 512 frames / lowest 1024 bins of those two magnitude spectrograms, giving a 512x1024x2 float32 data block.
5. Feed that 512x1024x2 float32 data block to the pretrained model (using TensorFlow) and get four 512x1024x2 float32 prediction data blocks back.
6. Use the resulting four predictions, the original magnitude spectrogram and the separation_exponent to compute four instrument masks: instrument_masks = (predictions^separation_exponent) / (original_magnitude_spectrogram^separation_exponent).
7. Apply each instrument mask to the original complex spectrograms and compute the inverse transforms to get four audio stems.
8. Advance by 512 frames and repeat from step 4 until the end of the file.
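The analysis half of the steps above (2–4) can be sketched in NumPy as follows — a minimal sketch, not Spleeter's actual implementation, assuming the parameters quoted from base_config.json (FFT size 4096, step 1024, 512 frames and 1024 bins per model input); the network call itself is not shown:

```python
import numpy as np

FFT_SIZE = 4096   # FFT window length, per base_config.json
HOP = 1024        # FFT step
T = 512           # frames per model input block
F = 1024          # lowest frequency bins kept

def stft(channel: np.ndarray) -> np.ndarray:
    """Hann-windowed STFT of one float32 channel (step 2).
    Returns a complex array of shape (n_frames, FFT_SIZE // 2 + 1)."""
    win = np.hanning(FFT_SIZE)
    n_frames = 1 + (len(channel) - FFT_SIZE) // HOP
    frames = np.stack([channel[i * HOP : i * HOP + FFT_SIZE] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def model_input(stft_l: np.ndarray, stft_r: np.ndarray, start: int) -> np.ndarray:
    """Steps 3-4: magnitudes of the lowest F bins for T frames of both
    channels, shaped (T, F, 2) float32 -- the block fed to the network."""
    block = np.stack([stft_l[start:start + T, :F],
                      stft_r[start:start + T, :F]], axis=-1)
    return np.abs(block).astype(np.float32)
```

For a ~12 s stereo file you would call `stft` once per channel, then slide `start` in steps of 512 frames to produce successive model inputs.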
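The masking/resynthesis half (steps 6–8) might then look like the sketch below. It implements the mask formula exactly as written above (predictions^e over the original magnitude^e, with a small epsilon to avoid division by zero), and pads the mask with ones over the discarded high bins before applying it to the full complex spectrogram — the padding choice is my assumption, not something stated in the config:

```python
import numpy as np

FFT_SIZE = 4096   # must match the analysis parameters
HOP = 1024

def masks(predictions, orig_mag, separation_exponent=2.0, eps=1e-10):
    """Step 6 as written above: mask_i = predictions_i**e / orig_mag**e.
    predictions: (n_instruments, T, F, 2), orig_mag: (T, F, 2)."""
    return (predictions ** separation_exponent) / (orig_mag ** separation_exponent + eps)

def apply_mask(stft_block, mask):
    """Extend a (T, F) mask with ones over the discarded high bins (an
    assumption), then apply it to the (T, n_bins) complex STFT of one channel."""
    full = np.ones(stft_block.shape, dtype=mask.dtype)
    full[:, :mask.shape[1]] = mask
    return stft_block * full

def istft(spec, length):
    """Inverse of the Hann-windowed STFT, by windowed overlap-add (step 7)."""
    win = np.hanning(FFT_SIZE)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(spec, n=FFT_SIZE, axis=1)):
        s = i * HOP
        out[s:s + FFT_SIZE] += frame * win
        norm[s:s + FFT_SIZE] += win ** 2
    return out / np.maximum(norm, 1e-10)
```

Each stem would be resynthesized by masking both channels' complex spectrograms with that instrument's mask and running `istft` on each.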
Am I correct, or have I misunderstood or missed something? Does the TensorFlow model take float32 inputs and produce float32 outputs?
Thanks!