
Model Release: Tacotron2 with Forward Attention - LJSpeech #345

@erogol

Model Link: https://drive.google.com/open?id=10ymOlWHutqTtfDYhIbHULn2IKDKP0O9m
Colab example: https://colab.research.google.com/drive/1cpofjnfKSpFhiREgExENIsum4MrqxyPR

This model was trained with Forward Attention enabled until ~400K iterations and then finetuned with a Batch Norm prenet until the end. It is the best model trained so far.
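The two-phase recipe above can be sketched as a simple iteration-based schedule. This is a hypothetical helper, not the repo's actual API; the real training is driven by config files, and the names `FINETUNE_START` and `select_prenet` are illustrative assumptions:

```python
# Sketch of the two-phase training schedule described above.
# FINETUNE_START and select_prenet are hypothetical names, not the repo's API.

FINETUNE_START = 400_000  # ~400K iterations with Forward Attention first


def select_prenet(iteration: int) -> str:
    """Return the prenet type to use at the given training iteration.

    Phase 1: the original dropout prenet while the model learns attention
             with Forward Attention enabled.
    Phase 2: switch to the Batch Norm ("bn") prenet for finetuning, which
             improves spectrogram quality once attention is already learned.
    """
    return "original" if iteration < FINETUNE_START else "bn"
```

The point of the ordering, as noted below, is that the BN prenet sharpens spectrograms but tends to prevent attention from being learned when used from scratch.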

I observe once again that using a BN-based prenet improves spectrogram quality considerably, but if you train with it from scratch, the model does not learn attention.

You can also use this TTS model with the PWGAN or WaveRNN vocoders. PWGAN provides real-time synthesis; WaveRNN is slower but gives better quality.

https://github.com/erogol/ParallelWaveGAN
https://github.com/erogol/WaveRNN
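A common way to quantify the speed trade-off between the two vocoders is the real-time factor (RTF): synthesis time divided by the duration of the generated audio. A minimal sketch (the function name and example timings are illustrative, not measured numbers from these repos):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Compute the real-time factor of a vocoder run.

    RTF < 1.0 means faster than real time (as claimed for PWGAN);
    RTF > 1.0 means slower than real time (as WaveRNN can be,
    especially on CPU).
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds
```

For example, taking 0.5 s to synthesize 2 s of audio gives an RTF of 0.25, i.e. 4x faster than real time.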

You can see the TB figures below:

[TensorBoard training figures]
