WIP Conversion from/to huggingface's downstream models by tholor · Pull Request #196 · deepset-ai/FARM

tholor · 2020-01-16T17:35:07Z

Let's improve the compatibility with huggingface's transformers. So far we could only load the LanguageModel and Tokenizer from their format.

I really like the idea of their new model hub (https://github.com/huggingface/transformers).
Let's add support for using those downstream models in FARM (huggingface -> FARM) and, vice versa, for uploading FARM models there (FARM -> huggingface).

With this, there's no "lock-in" for users and they can choose whatever framework works best for them in their phase of development :)

examples/conversion_huggingface_models.py

tholor · 2020-01-21T18:28:43Z

Ok let's merge this basic version & increase functionality in subsequent PRs (especially for NER etc.)

tnhaider · 2020-10-05T11:17:11Z

Hi everyone,

I tried converting some models. For a model with a multi-label text classification head, it tells me that only 1 head is allowed. I can work around that. However, for a model with a TokenClassificationHead it tells me the following:

  File "conversion_huggingface_models.py", line 43, in convert_to_transformers
    transformer_model = model.convert_to_transformers()
  File "/mnt/beegfs/users/thomas.haider/Documents/workspace/poetry/farm/FARM/farm/modeling/adaptive_model.py", line 543, in convert_to_transformers
    self.language_model.model.config.id2label = {id: label for id, label in enumerate(self.prediction_heads[0].label_list)}
  File "/hpc/users/thomas.haider/Documents/workspace/python37-venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'TokenClassificationHead' object has no attribute 'label_list'

Any idea what's going on there?

Thanks.
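For readers following the traceback: the failing line builds an `id2label` mapping from the `label_list` attribute of the first prediction head. The following is only a minimal sketch of that step with an explicit guard, so a head that lacks the attribute yields a readable error. `DummyHead` and `id2label_from_head` are hypothetical names used for illustration and are not part of FARM.

```python
def id2label_from_head(head):
    """Build {index: label} from a head's label_list, failing with a clear message."""
    # getattr with a default also covers torch modules, whose __getattr__
    # raises AttributeError for unknown attribute names.
    label_list = getattr(head, "label_list", None)
    if label_list is None:
        raise ValueError(
            f"{type(head).__name__} has no 'label_list'; cannot build id2label"
        )
    return {i: label for i, label in enumerate(label_list)}


class DummyHead:
    """Hypothetical stand-in for a prediction head (not a FARM class)."""

    def __init__(self, label_list=None):
        if label_list is not None:
            self.label_list = label_list


ner_head = DummyHead(["O", "B-PER", "I-PER"])
print(id2label_from_head(ner_head))
```

Calling `id2label_from_head(DummyHead())` raises the `ValueError` instead of the less informative `AttributeError` shown above.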

tnhaider · 2020-10-05T13:21:37Z

With a single prediction head I get the following error. It doesn't matter which script (w/ or w/o classification).

  File "conversion_huggingface_models_classification.py", line 64, in <module>
    convert_to_transformers(sys.argv[1], sys.argv[2])
  File "conversion_huggingface_models_classification.py", line 42, in convert_to_transformers
    model = AdaptiveModel.load(farm_input_dir, device="cpu")
  File "/mnt/beegfs/users/thomas.haider/Documents/workspace/poetry/farm/FARM/farm/modeling/adaptive_model.py", line 339, in load
    head = PredictionHead.load(config_file, strict=strict)
  File "/mnt/beegfs/users/thomas.haider/Documents/workspace/poetry/farm/FARM/farm/modeling/prediction_head.py", line 117, in load
    prediction_head.load_state_dict(torch.load(model_file, map_location=torch.device("cpu")), strict=strict)
  File "/home/thomas.haider/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TextClassificationHead:
	Unexpected key(s) in state_dict: "loss_fct.weight".

Timoeller · 2020-10-05T17:19:29Z

About `Unexpected key(s) in state_dict: "loss_fct.weight"`

We currently do not support class weights when converting models. So you would need to train a FARM model without class weights or somehow exclude the weights from conversion.
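One way to "exclude the weights from conversion" might be to drop the loss-function entries from the saved prediction head's state dict before loading it. The sketch below shows only the filtering step on a plain mapping. The `loss_fct.` prefix is taken from the error message; the file name in the commented usage is an assumption about how the head was saved and should be checked against your model directory.

```python
from collections import OrderedDict


def drop_loss_weights(state_dict, prefix="loss_fct."):
    """Return a copy of state_dict without keys that start with prefix."""
    return OrderedDict(
        (key, value) for key, value in state_dict.items() if not key.startswith(prefix)
    )


# Demo with placeholder values instead of tensors.
saved = OrderedDict([("feed_forward.weight", "W"), ("loss_fct.weight", "class weights")])
cleaned = drop_loss_weights(saved)
print(list(cleaned))

# Hypothetical usage on a real checkpoint (path is an assumption, back up the file first):
# import torch
# path = "saved_models/my_model/prediction_head_0.bin"
# torch.save(drop_loss_weights(torch.load(path, map_location="cpu")), path)
```

The traceback above also suggests that `AdaptiveModel.load` forwards a `strict` flag to `load_state_dict`, so loading with `strict=False` may be an alternative; this has not been verified here.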

About `AttributeError: 'TokenClassificationHead' object has no attribute 'label_list'`

Could you please create a separate issue with a minimal example script, so we can reproduce your error? Similar to how we managed #553

tholor added 4 commits January 16, 2020 09:32

WIP conversion

6d822c2

update input format for qa inference

9fe67c3

QA working. WIP for other tasks

2cd39f4

qa and pure embeddings working in basic version.

d821cac

tholor changed the title ~~Conversion from/to huggingface's downstream models~~ WIP Conversion from/to huggingface's downstream models Jan 16, 2020

tholor added 14 commits January 17, 2020 09:43

qa and emb working for hf -> farm. other tasks sketched

1f04946

adding farm-> transformers conversion. text classification WIP

2a62c94

Add working conversion for text_classification

b82d2c6

add docstring

62aab7b

merge updates from current master

5709243

quickfixes for pathlib

85aa31c

quickfixes for pathlib

bf3a198

Merge branch 'master' into conversion_huggingface

52c2c27

more pathlib fixes and example for converting doc classif model

5e40f9c

fix dependency url

b52e369

Simplify embedding extraction. Minor fixes.

58a838b

minor changes related to pathlib

9f16223

fix embedding extraction args

095532b

add very basic tests for conversion. make qa test use a single process

d2b8dd2

tanaysoni approved these changes Jan 21, 2020

tanaysoni reviewed Jan 21, 2020

examples/conversion_huggingface_models.py

make example windows compatible

ff7c183

merging latest master

4647bb2

tholor merged commit 2b9873e into master Jan 21, 2020

tholor deleted the conversion_huggingface branch April 28, 2020 07:31

Timoeller mentioned this pull request Oct 6, 2020

conversion to huggingface transformers fails for NER models #570

Closed

tholor merged 20 commits into master from conversion_huggingface
