
Recommended way of exporting encoder-decoder model to ONNX with transformers[onnx] #16006

@gomerudo

Description


I am looking for a way to export an encoder-decoder model to ONNX to run inference. I followed the guide at Exporting Transformers Models, but it only shows an example for an encoder-only model. Trying to do this for the specific case of the Helsinki-NLP/opus-mt-es-en model (Spanish to English), I did the following:

  1. I exported the model with the following command: python -m transformers.onnx --model=Helsinki-NLP/opus-mt-es-en --feature=seq2seq-lm --atol=2e-05 workspace/onnx/opus-mt-es-en

The export completed successfully.

  2. Then, as in the docs, I tried running inference on the model with code similar to the following:

from transformers import AutoTokenizer
from onnxruntime import InferenceSession

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
session = InferenceSession("onnx/model.onnx")

inputs = tokenizer("Probando el uso de Marian después de haberlo exportado a ONNX", return_tensors="np", padding=True)
outputs = session.run(output_names=["logits"], input_feed=dict(inputs))

This yields the following exception:

ValueError: Model requires 4 inputs. Input Feed contains 2.


I tried the same thing with T5, and the same exception was raised. After some debugging, I realized that any encoder-decoder architecture expects the following four inputs: input_ids, attention_mask, decoder_input_ids, decoder_attention_mask.
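For reference, one way to at least satisfy the session is to seed the decoder side yourself for a single step. The sketch below uses only NumPy; `build_input_feed` is a hypothetical helper, the start-token id is a placeholder, and seeding the decoder with the model's pad token is an assumption based on the usual Marian convention.

```python
import numpy as np

# Hypothetical helper: assemble the 4-entry input feed for one decoder step.
# The start-token choice is an assumption; Marian models conventionally seed
# the decoder with their pad_token_id.
def build_input_feed(encoder_inputs, decoder_start_token_id):
    batch_size = encoder_inputs["input_ids"].shape[0]
    # one start token per sequence in the batch
    decoder_input_ids = np.full(
        (batch_size, 1), decoder_start_token_id, dtype=np.int64
    )
    return {
        "input_ids": encoder_inputs["input_ids"],
        "attention_mask": encoder_inputs["attention_mask"],
        "decoder_input_ids": decoder_input_ids,
        "decoder_attention_mask": np.ones_like(decoder_input_ids),
    }

# Dummy encoder-side tensors standing in for tokenizer(...) output
enc = {
    "input_ids": np.array([[12, 7, 2]], dtype=np.int64),
    "attention_mask": np.array([[1, 1, 1]], dtype=np.int64),
}
feed = build_input_feed(enc, decoder_start_token_id=0)  # 0 is a placeholder id
# feed now contains all four names the session reported as missing;
# a real call would then be: session.run(["logits"], feed)
```

This only produces logits for the first generated token, which is exactly why it is not a substitute for generate().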

After a thorough reading of the transformers code, my understanding is that every model in transformers inherits from PreTrainedModel, which defines models intended for training. This implies that the associated config expects the inputs used during training, which explains the need for four arguments instead of two, unlike the trivial encoder-only case documented on the website.

However, when working with a transformers model directly (without exporting to ONNX), one can use the generate() function that GenerationMixin adds on top of seq2seq models. This function is the helper that performs seq2seq generation at inference time.


The question is the following:

Is there a way (or a recommended workaround) to export an encoder-decoder model to ONNX such that it behaves like the generate() function from GenerationMixin rather than the forward() method of PreTrainedModel?


I know that a possible workaround would be to export the encoder and the decoder separately and programmatically connect the inputs/outputs of each individual InferenceSession. Other than that, I cannot see an obvious solution to this problem using the out-of-the-box methods in transformers.
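That workaround amounts to a hand-written greedy loop: run the graph, take the argmax of the last-step logits, append it to decoder_input_ids, and repeat until EOS. The self-contained sketch below shows the wiring; FakeSession is a made-up stand-in for the real onnxruntime.InferenceSession (with scripted token ids), so only the loop itself reflects how the real sessions would be driven.

```python
import numpy as np

EOS_ID = 3  # made-up id for this sketch

class FakeSession:
    """Stand-in for an InferenceSession over the exported seq2seq graph.

    Emits logits favouring a scripted token sequence, so the decoding
    loop below has something deterministic to consume.
    """
    VOCAB = 8
    TARGET = [5, 6, 7, EOS_ID]  # tokens the fake model emits in order

    def run(self, output_names, input_feed):
        dec = input_feed["decoder_input_ids"]
        batch, steps = dec.shape
        logits = np.zeros((batch, steps, self.VOCAB), dtype=np.float32)
        # favour the next scripted token at the last decoder position
        idx = min(steps - 1, len(self.TARGET) - 1)
        logits[:, -1, self.TARGET[idx]] = 1.0
        return [logits]

def greedy_decode(session, encoder_feed, start_id, eos_id, max_len=10):
    """Greedy loop re-feeding generated tokens into decoder_input_ids."""
    decoder_input_ids = np.full((1, 1), start_id, dtype=np.int64)
    for _ in range(max_len):
        feed = {
            **encoder_feed,
            "decoder_input_ids": decoder_input_ids,
            "decoder_attention_mask": np.ones_like(decoder_input_ids),
        }
        logits = session.run(["logits"], feed)[0]
        next_id = int(logits[0, -1].argmax())
        decoder_input_ids = np.concatenate(
            [decoder_input_ids, np.array([[next_id]], dtype=np.int64)], axis=1
        )
        if next_id == eos_id:
            break
    return decoder_input_ids[0].tolist()

enc_feed = {
    "input_ids": np.array([[12, 7, 2]], dtype=np.int64),
    "attention_mask": np.array([[1, 1, 1]], dtype=np.int64),
}
tokens = greedy_decode(FakeSession(), enc_feed, start_id=0, eos_id=EOS_ID)
# tokens -> [0, 5, 6, 7, 3]: start token, scripted tokens, EOS
```

With a real export this re-runs the full graph (including the encoder) on every step; splitting encoder and decoder into separate sessions, as suggested above, avoids recomputing the encoder each iteration.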

Any help will be highly appreciated :)
