It would be useful to integrate the UDify model directly into AllenNLP as a PR, since the code merely extends the library with a few extra features. Since the release of the UDify code, AllenNLP has also added a multilingual UD dataset reader and a multilingual dependency parser with a corresponding model, which should make things easier.
Here is a list of things that need to be done:
- Add scripts to download and concatenate the UD data for training/evaluation. Also, add the CoNLL 2018 evaluation script.
- Create a UDify conllu -> conllu predictor that can handle unseen tokens and multiword ids.
- Add the sqrt learning rate decay LR scheduler.
- Add optional dropout to ScalarMix.
- Modify the multilingual UD dataset reader to handle multiword ids.
- Add lemmatizer edit script code.
- Modify the BERT token embedder to be able to return multiple scalar mixes, one per task (or alternatively all the embeddings). Add optional args for internal BERT dropout.
- Add generic dynamic masking functions.
- Add the custom sequence tagger and biaffine dependency parser that handle a multi-task setup.
- Add the UDify main model, wrapping the BERT, dynamic masking, scalar mix, sequence tagger, and dependency parser code. Provide custom metrics for TensorBoard.
- Add utility code to optionally cache the vocab and grab UD treebank names from files.
- Add helper script to evaluate conllu predictions and output them to json.
- Add tests to verify the new UDify model and modules.
- Add UDify config jsonnet file.
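The sqrt learning rate decay scheduler mentioned above can be sketched roughly as follows. This is a hypothetical minimal version (linear warmup followed by inverse-square-root decay, a common transformer-style schedule); the exact schedule and hyperparameter names in the UDify code may differ.

```python
def sqrt_decay_lr(step: int, base_lr: float = 1e-3, warmup_steps: int = 1000) -> float:
    """Linear warmup, then inverse-square-root decay of the learning rate.

    Hypothetical sketch; `base_lr` and `warmup_steps` are illustrative names,
    not necessarily those used in the UDify configuration.
    """
    step = max(step, 1)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # After warmup, decay proportionally to 1/sqrt(step).
    return base_lr * (warmup_steps / step) ** 0.5
```

In an AllenNLP-style trainer this would typically be wrapped in a `LearningRateScheduler` subclass that multiplies the optimizer's learning rate each step.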
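For the "optional dropout to ScalarMix" item, the idea is layer dropout: before the softmax over per-layer weights, some layers are masked out so the remaining ones renormalize. A minimal NumPy sketch of that idea (not the actual AllenNLP `ScalarMix` implementation, which operates on torch tensors):

```python
import numpy as np

def scalar_mix(layers, weights, gamma=1.0, layer_dropout=0.0, rng=None):
    """Softmax-weighted sum of layer representations.

    With layer_dropout > 0, each layer's logit is independently set to -inf
    with that probability, so the softmax redistributes mass over the
    surviving layers. Hypothetical sketch of the technique.
    """
    w = np.array(weights, dtype=float)
    if layer_dropout > 0.0:
        rng = rng or np.random.default_rng()
        drop = rng.random(len(w)) < layer_dropout
        if drop.all():
            # Never drop every layer; keep one at random.
            drop[rng.integers(len(w))] = False
        w = np.where(drop, -np.inf, w)
    # Numerically stable softmax over the (possibly masked) logits.
    w = np.exp(w - w.max())
    w = w / w.sum()
    layers = np.asarray(layers, dtype=float)
    return gamma * sum(wi * layer for wi, layer in zip(w, layers))
```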
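The lemmatizer edit script item refers to predicting, per token, a transformation that rewrites the word form into its lemma, so the lemmatizer becomes a classification task over scripts. A minimal sketch of the idea using `difflib` (the actual UDify code uses its own edit-script scheme, so treat this purely as an illustration):

```python
import difflib

def edit_script(form: str, lemma: str):
    """Derive a list of (op, start, end, replacement) edits turning form into lemma."""
    sm = difflib.SequenceMatcher(a=form, b=lemma)
    return [(op, i1, i2, lemma[j1:j2])
            for op, i1, i2, j1, j2 in sm.get_opcodes()
            if op != "equal"]

def apply_script(form: str, script) -> str:
    """Apply an edit script to a (possibly unseen) word form."""
    out, prev = [], 0
    for _op, i1, i2, text in script:
        out.append(form[prev:i1])  # copy the untouched span
        out.append(text)           # insert the replacement text
        prev = i2
    out.append(form[prev:])
    return "".join(out)
```

The payoff is generalization: a script learned from one word can lemmatize similar unseen words, e.g. the `ran -> run` script also maps other forms with the same vowel change.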
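For the generic dynamic masking functions, the core operation is re-sampling a fresh mask over the input tokens every time a batch is drawn, rather than fixing it at preprocessing time. A hypothetical sketch (the function name and its role in UDify's training loop are assumptions):

```python
import random

def dynamic_mask(token_ids, mask_id, mask_prob=0.15, rng=None):
    """Return a copy of token_ids where each token is independently replaced
    by mask_id with probability mask_prob.

    Because the mask is re-sampled on every call, the model sees a different
    masking pattern for the same sentence across epochs ("dynamic" masking).
    """
    rng = rng or random
    return [mask_id if rng.random() < mask_prob else t for t in token_ids]
```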