
Model Release: Tacotron2 with Forward Attention - LJSpeech #345

@erogol

Model Link: https://drive.google.com/open?id=10ymOlWHutqTtfDYhIbHULn2IKDKP0O9m
Colab example: https://colab.research.google.com/drive/1cpofjnfKSpFhiREgExENIsum4MrqxyPR

This model was trained with Forward Attention enabled until ~400K iterations and then finetuned with a Batch Norm prenet until the end. It is the best model trained so far.
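The two-phase recipe above can be sketched as a simple iteration-based schedule. This is a hypothetical helper, not the repo's actual API; the real training is driven by config files, and the names `FINETUNE_START` and `select_prenet` are illustrative assumptions:

```python
# Sketch of the two-phase training schedule described above.
# FINETUNE_START and select_prenet are hypothetical names, not the repo's API.

FINETUNE_START = 400_000  # ~400K iterations with Forward Attention first


def select_prenet(iteration: int) -> str:
    """Return the prenet type to use at the given training iteration.

    Phase 1: the original dropout prenet while the model learns attention
             with Forward Attention enabled.
    Phase 2: switch to the Batch Norm ("bn") prenet for finetuning, which
             improves spectrogram quality once attention is already learned.
    """
    return "original" if iteration < FINETUNE_START else "bn"
```

The point of the ordering, as noted below, is that the BN prenet sharpens spectrograms but tends to prevent attention from being learned when used from scratch.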

I observe once again that using a BN-based prenet improves spectrogram quality considerably, but if you train with it from scratch, the model does not learn attention.

You can also use this TTS model with the PWGAN or WaveRNN vocoders. PWGAN provides real-time synthesis; WaveRNN is slower but gives better quality.

https://github.com/erogol/ParallelWaveGAN
https://github.com/erogol/WaveRNN
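A common way to quantify the speed trade-off between the two vocoders is the real-time factor (RTF): synthesis time divided by the duration of the generated audio. A minimal sketch (the function name and example timings are illustrative, not measured numbers from these repos):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Compute the real-time factor of a vocoder run.

    RTF < 1.0 means faster than real time (as claimed for PWGAN);
    RTF > 1.0 means slower than real time (as WaveRNN can be,
    especially on CPU).
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds
```

For example, taking 0.5 s to synthesize 2 s of audio gives an RTF of 0.25, i.e. 4x faster than real time.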

You can see the TB figures below:

[TensorBoard training figures]
