
Integrate UDify into AllenNLP #5

@Hyperparticle

Description

It would be useful to integrate the UDify model directly into AllenNLP as a PR, as the code merely extends the library to handle a few extra features. Since the release of the UDify code, AllenNLP has also added a multilingual UD dataset reader and a multilingual dependency parser with a corresponding model, which should make things easier.

Here is a list of things that need to be done:

  • Add scripts to download and concatenate the UD data for training/evaluation. Also, add the CoNLL 2018 evaluation script.
  • Create a UDify conllu -> conllu predictor that can handle unseen tokens and multiword ids.
  • Add the square-root learning rate decay scheduler.
  • Add optional dropout to ScalarMix.
  • Modify the multilingual UD dataset reader to handle multiword ids.
  • Add lemmatizer edit script code.
  • Modify the BERT token embedder to be able to return multiple scalar mixes, one per task (or alternatively all the embeddings). Add optional args for internal BERT dropout.
  • Add generic dynamic masking functions.
  • Add the custom sequence tagger and biaffine dependency parser that handles a multi-task setup.
  • Add the UDify main model, wrapping the BERT, dynamic masking, scalar mix, sequence tagger, and dependency parser code. Provide custom metrics for TensorBoard.
  • Add utility code to optionally cache the vocab and grab UD treebank names from files.
  • Add helper script to evaluate conllu predictions and output them to json.
  • Add tests to verify the new UDify model and modules.
  • Add UDify config jsonnet file.
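For the square-root decay scheduler, a minimal sketch of the intended schedule (linear warmup followed by inverse-square-root decay; the function and parameter names here are illustrative, not the final AllenNLP registrable API):

```python
import math

def sqrt_decay_lr(step, base_lr=1e-3, warmup_steps=8000):
    """Hypothetical sketch: linear warmup to `base_lr` over `warmup_steps`,
    then decay proportional to 1/sqrt(step).  At step == warmup_steps the
    two branches meet and the learning rate equals base_lr."""
    step = max(step, 1)
    scale = min(step / warmup_steps, math.sqrt(warmup_steps / step))
    return base_lr * scale
```

In AllenNLP this would likely be wrapped as a `LearningRateScheduler` subclass so it can be referenced from the config jsonnet.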
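The ScalarMix dropout item amounts to layer dropout on the mixing weights: with some probability a layer's weight is masked to negative infinity before the softmax, so that layer is excluded from the mix for that forward pass. A simplified, tensor-free sketch (the real module operates on torch tensors; all names here are illustrative):

```python
import math
import random

def scalar_mix(layers, weights, gamma=1.0, layer_dropout=0.0, training=True):
    """Weighted average of per-layer representations (plain lists of floats
    for illustration).  With layer dropout enabled during training, each
    mixing weight is independently masked to -inf before the softmax."""
    logits = list(weights)
    if training and layer_dropout > 0:
        logits = [w if random.random() >= layer_dropout else float("-inf")
                  for w in logits]
        # Ensure at least one layer survives the dropout mask.
        if all(w == float("-inf") for w in logits):
            logits[random.randrange(len(logits))] = 0.0
    exps = [math.exp(w) if w != float("-inf") else 0.0 for w in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    dim = len(layers[0])
    return [gamma * sum(p * layer[i] for p, layer in zip(probs, layers))
            for i in range(dim)]
```

Dropping whole layers (rather than individual units) encourages the model not to rely on any single BERT layer, which matters when one mix is learned per task.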
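For the lemmatizer edit scripts, the core idea is to predict a transformation from form to lemma as a class label rather than generating the lemma character by character. A deliberately simplified suffix-rule version (UDify's actual edit scripts are richer, handling prefixes and casing; these helper names are hypothetical):

```python
def encode_edit(form, lemma):
    """Encode the lemma as a rule relative to the form: the number of
    trailing characters to cut, plus a suffix to append.  E.g.
    ("running", "run") -> (4, "") and ("was", "be") -> (3, "be")."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return (len(form) - i, lemma[i:])

def apply_edit(form, rule):
    """Apply a (cut, suffix) rule to a form to recover the lemma."""
    cut, suffix = rule
    return form[:len(form) - cut] + suffix
```

Because the rules generalize across words, the tagger can lemmatize unseen tokens by predicting a rule it has seen on other forms.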
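The dynamic masking item presumably means re-sampling BERT-style token masks on every batch/epoch rather than fixing them at preprocessing time. A minimal sketch under that assumption (token-level, list-based; the real function would operate on tensors and may also do random replacement):

```python
import random

def dynamic_mask(tokens, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Randomly replace each token with `mask_token` with probability
    `mask_prob`.  Called fresh on every batch, so each epoch sees a
    different masking of the same sentence."""
    rng = rng or random
    return [mask_token if rng.random() < mask_prob else t for t in tokens]
```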
