Outputs are not consistent when using DeBERTa for inference #27586

@polarispw

Description

System Info

  • transformers version: 4.34.1
  • Platform: Windows-10-10.0.22621-SP0
  • Python version: 3.8.18
  • Huggingface_hub version: 0.17.3
  • Safetensors version: 0.4.0
  • Accelerate version: 0.24.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0+cu121 (True)

Who can help?

@ArthurZucker @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am running a simple mask-filling task with DeBERTa and find that the output logits for the same input sentence vary on every run (i.e., whenever the model is reloaded from disk).
I have called eval() and torch.manual_seed(), but neither helps, and the outputs differ so much that the variation does not look like it comes from random seeds alone.
Even the official example script shows the same problem. On the other hand, everything works fine when the model is kept in memory and the same input is fed to it twice.

from transformers import AutoTokenizer, DebertaForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-base', cache_dir='model_cache')
model = DebertaForMaskedLM.from_pretrained('microsoft/deberta-base', cache_dir='model_cache').eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# retrieve index of [MASK]
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
print(tokenizer.decode(predicted_token_id))

labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
# mask labels of non-[MASK] tokens
labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)

outputs = model(**inputs, labels=labels)
print(round(outputs.loss.item(), 2))
# Running the same forward pass again without reloading the model works fine:
# with torch.no_grad():
#     logits = model(**inputs).logits
# predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
# print(tokenizer.decode(predicted_token_id))
#
# outputs = model(**inputs, labels=labels)
# print(round(outputs.loss.item(), 2))
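Editor's note, not part of the original report: one plausible cause is that the checkpoint does not contain weights for the masked-LM head, in which case `from_pretrained` initializes those parameters randomly on every load (transformers typically logs a "newly initialized" warning when this happens). A torch-only toy sketch of that failure mode, using a hypothetical two-layer model whose checkpoint covers only the first layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def load_model(checkpoint):
    # Fresh random init on every "load"; strict=False silently leaves the
    # head (layer 1) at its random initialization, as the checkpoint lacks it.
    model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 10))
    model.load_state_dict(checkpoint, strict=False)
    return model.eval()

full = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 10))
# Checkpoint only contains the first layer's parameters ("0.weight", "0.bias").
checkpoint = {k: v for k, v in full.state_dict().items() if k.startswith("0.")}

x = torch.randn(1, 4)
with torch.no_grad():
    out1 = load_model(checkpoint)(x)
    out2 = load_model(checkpoint)(x)

# Outputs differ across "reloads" even though the input is identical,
# because the head weights are re-randomized each time.
print(torch.allclose(out1, out2))
```

Comparing `model.state_dict()` across two loads the same way (any parameter that differs cannot have come from the checkpoint) would confirm whether this is what happens with `microsoft/deberta-base`.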

Expected behavior

Feeding the same input should produce the same outputs, even after reloading the model from disk.
