-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
Model Link: https://drive.google.com/open?id=10ymOlWHutqTtfDYhIbHULn2IKDKP0O9m
Colab example: https://colab.research.google.com/drive/1cpofjnfKSpFhiREgExENIsum4MrqxyPR
This model is trained with Forward Attention enabled until ~400K iters and then finetuned with Batch Norm prenet until the end. It is the best model so far trained.
I observe once again that using BN based prenet improves the spectrogram quality considerablly but if you train it from scratch, model does not learn the attention.
You can also use this TTS model with PWGAN or WaveRNN vocoders. PWGAn provides real-time voice synthesis and WaveRNN is slower but provides better quality.
https://github.com/erogol/ParallelWaveGAN
https://github.com/erogol/WaveRNN
You can see the TB figures below:

