[Discussion] Using Spleeter pretrained tensorflow models directly #155

@divideconcept

I'm mostly interested in using Spleeter's pretrained models directly within my C++ application, bypassing the provided Python scripts as much as possible.

My understanding of how to use those pretrained models is as follows (based on the 4stems base_config.json):

  1. Convert the audio data to 44100 Hz, stereo, float32.
  2. Transform it into 2 complex spectrograms (one per channel; FFT size 4096 samples, Hann windowing, FFT step 1024 samples).
  3. Convert the 2 complex spectrograms to 2 magnitude spectrograms.
  4. Take the first 512 FFT frames and the lowest 1024 bins of those 2 magnitude spectrograms, resulting in a 512x1024x2 float32 data block.
  5. Feed that 512x1024x2 float32 data block to the pretrained model (using TensorFlow) and get 4 512x1024x2 float32 prediction data blocks back.
  6. Use the resulting 4 predictions, the original magnitude spectrogram and the separation_exponent to compute 4 instrument masks: instrument_mask = (prediction^separation_exponent) / (original_magnitude_spectrogram^separation_exponent)
  7. Apply each instrument mask to the original complex spectrograms and compute the inverse transforms to get 4 audio stems.
  8. Advance by 512 FFT frames and repeat from step 4 until the end of the file.
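
Steps 6 to 8 above can be sketched in C++ roughly as follows. This is only a minimal sketch that follows the formula exactly as written in this post, not Spleeter's own code, which may normalise the masks differently. The function names `instrument_mask`, `apply_mask` and `block_starts`, the flattened buffer layout, and the `epsilon` guard against division by zero are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Step 6: ratio mask for one instrument, following the formula in the post:
//   mask = prediction^exponent / mixture_magnitude^exponent
// Both inputs are flattened blocks of identical size (e.g. 512*1024*2 floats).
// epsilon is an assumption (not in the post) that avoids 0/0 in silent bins.
std::vector<float> instrument_mask(const std::vector<float>& prediction,
                                   const std::vector<float>& mixture_magnitude,
                                   float separation_exponent,
                                   float epsilon = 1e-10f) {
    assert(prediction.size() == mixture_magnitude.size());
    std::vector<float> mask(prediction.size());
    for (std::size_t i = 0; i < prediction.size(); ++i) {
        const float num = std::pow(prediction[i], separation_exponent);
        const float den = std::pow(mixture_magnitude[i], separation_exponent) + epsilon;
        mask[i] = num / den;
    }
    return mask;
}

// Step 7 (first half): scale each complex bin of the original spectrogram by
// the real-valued mask. The inverse STFT of the result is not shown here.
std::vector<std::complex<float>> apply_mask(
        const std::vector<float>& mask,
        const std::vector<std::complex<float>>& spectrum) {
    assert(mask.size() == spectrum.size());
    std::vector<std::complex<float>> masked(spectrum.size());
    for (std::size_t i = 0; i < spectrum.size(); ++i) {
        masked[i] = mask[i] * spectrum[i];
    }
    return masked;
}

// Step 8: index of the first FFT frame of every block of block_frames frames.
// The last block may be shorter than block_frames and would need zero padding
// before being fed to the model (an assumption, the post does not cover it).
std::vector<std::size_t> block_starts(std::size_t total_frames,
                                      std::size_t block_frames = 512) {
    assert(block_frames > 0);
    std::vector<std::size_t> starts;
    for (std::size_t s = 0; s < total_frames; s += block_frames) {
        starts.push_back(s);
    }
    return starts;
}
```

In this sketch, `instrument_mask` would be called once per stem with that stem's prediction block, and `apply_mask` once per stem and channel. The forward and inverse transforms and the TensorFlow inference call are deliberately left out.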

Am I correct, or did I misunderstand or miss something? Does the TensorFlow model take float32 numbers as input and produce float32 numbers as output?

Thanks!
