Outputs are not consistent when using DeBERTa for inference #27586

@polarispw

Description

System Info

  • transformers version: 4.34.1
  • Platform: Windows-10-10.0.22621-SP0
  • Python version: 3.8.18
  • Huggingface_hub version: 0.17.3
  • Safetensors version: 0.4.0
  • Accelerate version: 0.24.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0+cu121 (True)

Who can help?

@ArthurZucker @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am running a simple mask-filling task with DeBERTa and find that the output logits for the same input sentence vary on every run (i.e., whenever the model is reloaded from disk).
I have called eval() and torch.manual_seed(), but neither helps, and the outputs differ so much that the variation does not look like it comes from random seeds alone.
Even the official example script shows the same problem. On the other hand, everything works fine when the model is kept in memory and the same input is fed to it twice.

from transformers import AutoTokenizer, DebertaForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-base', cache_dir='model_cache')
model = DebertaForMaskedLM.from_pretrained('microsoft/deberta-base', cache_dir='model_cache').eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# retrieve index of [MASK]
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
print(tokenizer.decode(predicted_token_id))

labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
# mask labels of non-[MASK] tokens
labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)

outputs = model(**inputs, labels=labels)
print(round(outputs.loss.item(), 2))
# Running the same forward pass again without reloading the model works fine:
# with torch.no_grad():
#     logits = model(**inputs).logits
# predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
# print(tokenizer.decode(predicted_token_id))
#
# outputs = model(**inputs, labels=labels)
# print(round(outputs.loss.item(), 2))
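Editor's note, not part of the original report: one plausible cause is that the checkpoint does not contain weights for the masked-LM head, in which case `from_pretrained` initializes those parameters randomly on every load (transformers typically logs a "newly initialized" warning when this happens). A torch-only toy sketch of that failure mode, using a hypothetical two-layer model whose checkpoint covers only the first layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def load_model(checkpoint):
    # Fresh random init on every "load"; strict=False silently leaves the
    # head (layer 1) at its random initialization, as the checkpoint lacks it.
    model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 10))
    model.load_state_dict(checkpoint, strict=False)
    return model.eval()

full = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 10))
# Checkpoint only contains the first layer's parameters ("0.weight", "0.bias").
checkpoint = {k: v for k, v in full.state_dict().items() if k.startswith("0.")}

x = torch.randn(1, 4)
with torch.no_grad():
    out1 = load_model(checkpoint)(x)
    out2 = load_model(checkpoint)(x)

# Outputs differ across "reloads" even though the input is identical,
# because the head weights are re-randomized each time.
print(torch.allclose(out1, out2))
```

Comparing `model.state_dict()` across two loads the same way (any parameter that differs cannot have come from the checkpoint) would confirm whether this is what happens with `microsoft/deberta-base`.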

Expected behavior

Feeding the same input should produce the same outputs, even after reloading the model from disk.
