DeBERTa can't load some parameters #18659
Closed
Description
System Info
- transformers version: 4.21.1
- Platform: Linux-5.4.0-81-generic-x86_64-with-glibc2.31
- Python version: 3.9.12
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.11.0+cu113 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
from transformers import pipeline
text = "The capital of France is [MASK]"
mlm_pipeline = pipeline('fill-mask', model='microsoft/deberta-base', tokenizer='microsoft/deberta-base')
print(mlm_pipeline(text))
- Warning Message
Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: ['lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.dense.weight', 'deberta.embeddings.position_embeddings.weight', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initializing DebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
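The warning above already points at the root cause: the checkpoint stores its MLM head under `lm_predictions.lm_head.*`, while `DebertaForMaskedLM` looks for `cls.predictions.*`. A quick set comparison of the key names quoted in the warning (copied here by hand, so purely illustrative) makes the mismatch explicit:

```python
# MLM-head weight names copied from the warning message above (illustrative only).
ckpt_head_keys = {
    "lm_predictions.lm_head.dense.weight",
    "lm_predictions.lm_head.dense.bias",
    "lm_predictions.lm_head.LayerNorm.weight",
    "lm_predictions.lm_head.LayerNorm.bias",
    "lm_predictions.lm_head.bias",
}
model_head_keys = {
    "cls.predictions.transform.dense.weight",
    "cls.predictions.transform.dense.bias",
    "cls.predictions.transform.LayerNorm.weight",
    "cls.predictions.transform.LayerNorm.bias",
    "cls.predictions.decoder.weight",
    "cls.predictions.bias",
}

# No key appears in both sets, so every MLM-head parameter is left
# randomly initialized, which explains the garbage fill-mask output below.
print(ckpt_head_keys & model_head_keys)  # set()
```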
- Output
The capital of France isumption
The capital of France is�
The capital of France iszag
The capital of France isreply
The capital of France isnerg
Expected behavior
When the DeBERTa model is loaded with transformers, the weights needed for the MLM head (plus the positional embedding weights) do not seem to be loaded.
There are some issues similar to mine.
- DebertaForMaskedLM cannot load the parameters in the MLM head from microsoft/deberta-base #15216
- DebertaForMaskedLM cannot load the parameters in the MLM head #15673
- Pretrained models for masked LM do not work as expected microsoft/DeBERTa#74
But the problem does not seem to be resolved yet.
Could you take a look?
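For anyone who needs fill-mask output in the meantime, one possible (untested) workaround is to rename the checkpoint's head keys to the names `DebertaForMaskedLM` expects before loading them with `load_state_dict`. The mapping below is guessed from the two key lists in the warning, so treat it as a sketch rather than a verified fix; it only renames dictionary keys and does not check tensor shapes, and the `cls.predictions.decoder.weight` (usually tied to the input embeddings) is left untouched:

```python
# Hypothetical prefix remapping from the checkpoint's MLM-head key names to
# the names DebertaForMaskedLM expects (guessed from the warning message).
def remap_mlm_head(state_dict):
    prefix_map = [
        ("lm_predictions.lm_head.dense.", "cls.predictions.transform.dense."),
        ("lm_predictions.lm_head.LayerNorm.", "cls.predictions.transform.LayerNorm."),
        ("lm_predictions.lm_head.bias", "cls.predictions.bias"),
    ]
    remapped = {}
    for key, tensor in state_dict.items():
        for old, new in prefix_map:
            if key.startswith(old):
                key = new + key[len(old):]
                break  # apply at most one rename per key
        remapped[key] = tensor
    return remapped

# Example with dummy values standing in for real tensors:
print(remap_mlm_head({
    "lm_predictions.lm_head.dense.weight": 0,
    "deberta.encoder.layer.0.attention.self.q_bias": 1,
}))
```

Keys outside the head (like the encoder weights in the example) pass through unchanged, so the remapped dict could in principle be fed to `model.load_state_dict(..., strict=False)`.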