ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds Transformers Translation Tutorial Repro #24254

@SoyGema

Description

System Info

Context

Hello there!
First and foremost, congrats on the Transformers Translation tutorial. 👍
It serves as a spark for building English-to-many translation language models!
I'm following it along with TF, mostly reproducing it in a Jupyter notebook with TensorFlow for Mac with GPU enabled, using the following dependency versions:

tensorflow-macos==2.9.0
tensorflow-metal==0.5.0
transformers==4.29.2

* NOTE: the tensorflow-macos dependencies are pinned to ensure GPU training works.

Who can help?

@ArthurZucker @younesbelkada
@gante maybe?

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Issue Description

I'm hitting the following error when fitting a model loaded from the TFAutoModelForSeq2SeqLM autoclass for fine-tuning:

with tf.device('/device:GPU:0'):
    model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=1, callbacks=callbacks)

It returns:

ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds
        
        
        Call arguments received by layer "decoder" (type TFT5MainLayer):
          • self=None
          • input_ids=None
          • attention_mask=None
          • encoder_hidden_states=tf.Tensor(shape=(32, 96, 512), dtype=float32)
          • encoder_attention_mask=tf.Tensor(shape=(32, 96), dtype=int32)
          • inputs_embeds=None
          • head_mask=None
          • encoder_head_mask=None
          • past_key_values=None
          • use_cache=True
          • output_attentions=False
          • output_hidden_states=False
          • return_dict=True
          • training=False
    
    
    Call arguments received by layer "tft5_for_conditional_generation" (type TFT5ForConditionalGeneration):
      • self={'input_ids': 'tf.Tensor(shape=(32, 96), dtype=int64)', 'attention_mask': 'tf.Tensor(shape=(32, 96), dtype=int64)'}
      • input_ids=None
      • attention_mask=None
      • decoder_input_ids=None
      • decoder_attention_mask=None
      • head_mask=None
      • decoder_head_mask=None
      • encoder_outputs=None
      • past_key_values=None
      • inputs_embeds=None
      • decoder_inputs_embeds=None
      • labels=None
      • use_cache=None
      • output_attentions=None
      • output_hidden_states=None
      • return_dict=None
      • training=False
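
For context on why this error appears: when a batch passed to `model.fit` contains a `labels` tensor, `TFT5ForConditionalGeneration` builds `decoder_input_ids` internally by shifting the labels one position to the right; in the trace above both `labels` and `decoder_input_ids` are `None`, which suggests the labels never reach the model. A minimal numpy sketch of that shift logic (the token ids, and the pad/decoder-start ids matching t5-small's config, are illustrative assumptions):

```python
import numpy as np

PAD_TOKEN_ID = 0            # T5 uses its pad token as the decoder start token
DECODER_START_TOKEN_ID = 0  # (assumed ids, as in t5-small's config)

def shift_tokens_right(labels: np.ndarray) -> np.ndarray:
    """Sketch of how decoder_input_ids are derived from labels:
    prepend the decoder start token, drop the last label, and
    replace the -100 loss-masking value with the pad token."""
    batch_size = labels.shape[0]
    start = np.full((batch_size, 1), DECODER_START_TOKEN_ID, dtype=labels.dtype)
    shifted = np.concatenate([start, labels[:, :-1]], axis=-1)
    return np.where(shifted == -100, PAD_TOKEN_ID, shifted)

labels = np.array([[42, 7, 99, 1]])   # a toy target sequence ending in EOS
print(shift_tokens_right(labels))     # [[ 0 42  7 99]]
```

If the batches yielded by tf_train_set contain no "labels" key, none of this can happen and the decoder has neither input ids nor embeddings, which is exactly the ValueError above.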

Backtrace

Tried:

model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)

This seems to work correctly, so I assume the pre-trained model is loaded properly.

Expected behavior

The trained model should be uploaded to the Hub.
Instead, the folder appears empty and there is an error.

Hypothesis

At this point, my guess is that once I load the model, I need to redefine something that the verbose error trace points to?
Any help on how to do this, please? :) Or how can I fix it? Do I have to define a specific Trainer? Any idea where I can find this in the docs?
