This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Error in conversion of BertForMaskedLM with BertLMHead to HuggingFace transformers #533

@himanshurawlani

Description

Describe the bug
I started with the lm_finetuning.py script to pre-train BERT on a domain-specific text corpus. After completing training, the saved artifacts included the following:

  • language_model.bin
  • language_model_config.json (BertForMaskedLM)
  • prediction_head_0.bin
  • prediction_head_0_config.json (BertLMHead)
  • prediction_head_1.bin
  • prediction_head_1_config.json (NextSentenceHead)
  • processor_config.json
  • special_tokens_map.json
  • tokenizer_config.json
  • vocab.txt

I know FARM currently doesn't support converting an AdaptiveModel with two prediction heads to a HuggingFace transformers model. However, after reading through the AdaptiveModel.load() classmethod used by the model conversion script, I figured that removing prediction_head_1_config.json (NextSentenceHead) and prediction_head_1.bin (the PyTorch weights) should allow the conversion to succeed, since the script would then treat the model as an AdaptiveModel with a single PredictionHead.
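For reference, this is roughly how I set the NextSentenceHead files aside (a minimal sketch; the temporary directory here just simulates the FARM save directory from above, and renaming to *.bak instead of deleting keeps the files recoverable):

```python
import tempfile
from pathlib import Path

# Simulate the FARM save directory with the artifact names listed above.
model_dir = Path(tempfile.mkdtemp())
for name in ("language_model.bin", "prediction_head_0.bin",
             "prediction_head_1.bin", "prediction_head_1_config.json"):
    (model_dir / name).touch()

# Move the NextSentenceHead artifacts aside so AdaptiveModel.load()
# only finds the single BertLMHead prediction head.
for name in ("prediction_head_1.bin", "prediction_head_1_config.json"):
    f = model_dir / name
    if f.exists():
        f.rename(f.with_suffix(f.suffix + ".bak"))
```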

Error message
After removing the files and running the conversion script, I get an error saying:

Traceback (most recent call last):
  File "conversion_huggingface_models.py", line 88, in <module>
    convert_to_transformers("./farm_saved_models/bert-english-lm", 
  File "conversion_huggingface_models.py", line 46, in convert_to_transformers
    transformer_model = model.convert_to_transformers()
  File "/home/himanshu/.conda/envs/tf2/lib/python3.8/site-packages/farm/modeling/adaptive_model.py", line 509, in convert_to_transformers
    elif len(self.prediction_heads[0].layer_dims) != 2:
  File "/home/himanshu/.conda/envs/tf2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 771, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'BertLMHead' object has no attribute 'layer_dims'

I made some changes to the BertLMHead class by referring to HuggingFace's BertLMPredictionHead, adding the following lines to the __init__() method:

self.layer_dims = [hidden_size, vocab_size]
self.decoder.bias = self.bias

I also updated the forward() function by changing the line:

lm_logits = self.decoder(hidden_states) + self.bias

to

lm_logits = self.decoder(hidden_states)
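Putting both changes together, a minimal self-contained sketch of the patched head looks like this (the class name and explicit hidden_size/vocab_size arguments are hypothetical; the real FARM BertLMHead derives these from its config):

```python
import torch
from torch import nn

class PatchedBertLMHead(nn.Module):
    """Sketch of the modified head, mirroring transformers' BertLMPredictionHead."""

    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        # Decoder created without its own bias, as in the original head.
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        # Change 1: record layer_dims so convert_to_transformers() can
        # inspect them without raising ModuleAttributeError.
        self.layer_dims = [hidden_size, vocab_size]
        # Change 2: tie the standalone bias to the decoder.
        self.decoder.bias = self.bias

    def forward(self, hidden_states):
        # The bias is now applied inside the decoder, so no "+ self.bias".
        return self.decoder(hidden_states)
```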

Expected behavior
After making the above changes and re-training the model, the conversion script ran successfully and I was able to import the model into HuggingFace pipelines with the following code:

from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="./bert-english-lm",
    tokenizer="./bert-english-lm"
)

Correct me if I'm wrong, but I expected the conversion script to run successfully after removing the NextSentenceHead files. I'm not sure whether this is a bug or a feature request, as I'm new to FARM. However, I was able to solve the issue with the few changes described above.

System:

  • OS: Ubuntu 18.04.4 LTS (Bionic Beaver)
  • CPU: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
  • FARM version: 0.4.7

Labels

bug (Something isn't working)