Development plan for ESPnet2 singing voice synthesis #4437

@A-Quarter-Mile

We are migrating Muskit, an open-source music processing toolkit, into ESPnet2.
Muskit focuses on benchmarking end-to-end singing voice synthesis and is expected to extend to more tasks in the future. Its main structure and base code are adapted from ESPnet.

We also plan to explore combinations with existing ESPnet2 tasks (e.g., TTS). We welcome your suggestions and contributions!

Code Merging

  • Merge modules from Muskit (mainly under the following two folders)
    • tools/
    • muskit/
  • Add authorship notes

Networks

  • RNN-based non-autoregressive model
  • Xiaoice
  • Sequence-to-sequence Transformer (with GLU-based encoder)
  • MLP singer
  • Tacotron-singing
  • DiffSinger
  • VISinger

Recipes

  • CSD
  • Itako
  • Kiritan
  • KiSing
  • Multilingual_four
  • NIT_song070
  • No7singing
  • Ofuton_p_utagoe_db
  • Oniku_kurumi_utagoe_db
  • Opencpop
  • PJS
  • JSUT
  • Ameboshi_ciphyer_utagoe_db

Documentation

  • Installation
  • Running instructions
  • Recipe explanation
  • Pretrained models

New Functions

  • Add MusicXML support in the front-end
  • Add CI test
  • Upload to Hugging Face
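
To make the MusicXML item more concrete, here is a minimal sketch of what a front-end reader might extract from a score: pitch, duration, and lyric per note. This is only an illustration using the Python standard library, not the Muskit or ESPnet2 implementation. The function name `parse_musicxml_notes` and the output tuple layout are hypothetical, and the sketch assumes an uncompressed, non-namespaced `score-partwise` file with at most one lyric per note.

```python
import xml.etree.ElementTree as ET

# Semitone offsets of the natural pitch classes within an octave.
_STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def parse_musicxml_notes(xml_text):
    """Return a list of (midi_pitch, duration, lyric) tuples (hypothetical layout).

    duration is in MusicXML "divisions" units, not seconds.
    Rests are reported with midi_pitch set to None.
    """
    root = ET.fromstring(xml_text)
    notes = []
    for note in root.iter("note"):
        # Grace notes carry no <duration>; treat them as zero-length here.
        duration = int(note.findtext("duration", default="0"))
        lyric = note.findtext("lyric/text")  # None when the note has no lyric
        pitch = note.find("pitch")
        if pitch is None:  # a rest (or unpitched note)
            notes.append((None, duration, lyric))
            continue
        step = pitch.findtext("step")
        octave = int(pitch.findtext("octave"))
        # <alter> is optional and may be fractional; round to whole semitones.
        alter = round(float(pitch.findtext("alter", default="0")))
        midi = 12 * (octave + 1) + _STEP_TO_SEMITONE[step] + alter
        notes.append((midi, duration, lyric))
    return notes


EXAMPLE = """<score-partwise version="3.1">
  <part id="P1">
    <measure number="1">
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration>
        <lyric><syllabic>single</syllabic><text>la</text></lyric>
      </note>
      <note>
        <rest/>
        <duration>2</duration>
      </note>
      <note>
        <pitch><step>F</step><alter>1</alter><octave>4</octave></pitch>
        <duration>2</duration>
        <lyric><text>la</text></lyric>
      </note>
    </measure>
  </part>
</score-partwise>"""

if __name__ == "__main__":
    for midi, duration, lyric in parse_musicxml_notes(EXAMPLE):
        print(midi, duration, lyric)
```

A real front-end would also need to handle ties, tempo and `divisions` changes, multiple parts, and compressed `.mxl` files, all of which this sketch ignores.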
