
Recommended way of exporting encoder-decoder model to ONNX with transformers[onnx] #16006

@gomerudo

Description


I am looking for a way to export an encoder-decoder model to ONNX to run inference. I followed the guide at Exporting Transformers Models, but it only shows an example for an encoder-only model. Trying to do this for the specific case of the Helsinki-NLP/opus-mt-es-en model (Spanish to English), I did the following:

  1. I exported the model with the following command: python -m transformers.onnx --model=Helsinki-NLP/opus-mt-es-en --feature=seq2seq-lm --atol=2e-05 workspace/onnx/opus-mt-es-en

The export completed successfully.

  2. Then, as in the docs, I tried running inference on the model with code similar to the following:

from transformers import AutoTokenizer
from onnxruntime import InferenceSession

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
session = InferenceSession("onnx/model.onnx")

inputs = tokenizer("Probando el uso de Marian después de haberlo exportado a ONNX", return_tensors="np", padding=True)
outputs = session.run(output_names=["logits"], input_feed=dict(inputs))

This yields the following exception:

ValueError: Model requires 4 inputs. Input Feed contains 2.


I tried the same thing with T5, and the same exception was raised. After some debugging, I realized that any encoder-decoder architecture expects the following four inputs: input_ids, attention_mask, decoder_input_ids, decoder_attention_mask.
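For reference, one way to at least satisfy the session is to seed the decoder side yourself for a single step. The sketch below uses only NumPy; `build_input_feed` is a hypothetical helper, the start-token id is a placeholder, and seeding the decoder with the model's pad token is an assumption based on the usual Marian convention.

```python
import numpy as np

# Hypothetical helper: assemble the 4-entry input feed for one decoder step.
# The start-token choice is an assumption; Marian models conventionally seed
# the decoder with their pad_token_id.
def build_input_feed(encoder_inputs, decoder_start_token_id):
    batch_size = encoder_inputs["input_ids"].shape[0]
    # one start token per sequence in the batch
    decoder_input_ids = np.full(
        (batch_size, 1), decoder_start_token_id, dtype=np.int64
    )
    return {
        "input_ids": encoder_inputs["input_ids"],
        "attention_mask": encoder_inputs["attention_mask"],
        "decoder_input_ids": decoder_input_ids,
        "decoder_attention_mask": np.ones_like(decoder_input_ids),
    }

# Dummy encoder-side tensors standing in for tokenizer(...) output
enc = {
    "input_ids": np.array([[12, 7, 2]], dtype=np.int64),
    "attention_mask": np.array([[1, 1, 1]], dtype=np.int64),
}
feed = build_input_feed(enc, decoder_start_token_id=0)  # 0 is a placeholder id
# feed now contains all four names the session reported as missing;
# a real call would then be: session.run(["logits"], feed)
```

This only produces logits for the first generated token, which is exactly why it is not a substitute for generate().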

After a thorough reading of the transformers code, my understanding is that every model in transformers inherits from PreTrainedModel, which defines models intended for training. This implies that the associated config expects the inputs used during training, which explains the need for four arguments instead of two, unlike the trivial encoder-only case documented on the website.

However, when working with a transformers model directly (without exporting to ONNX), one can use the generate() function that GenerationMixin adds on top of seq2seq models. This function is the helper that performs seq2seq generation at inference time.


The question is the following:

Is there a way (or a recommended workaround) to export an encoder-decoder model to ONNX such that it behaves like the generate() function from GenerationMixin rather than the forward() method of PreTrainedModel?


I know that a possible workaround would be to export the encoder and the decoder separately and programmatically connect the inputs/outputs of each individual InferenceSession. Other than that, I cannot see an obvious solution to this problem using the out-of-the-box methods in transformers.
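That workaround amounts to a hand-written greedy loop: run the graph, take the argmax of the last-step logits, append it to decoder_input_ids, and repeat until EOS. The self-contained sketch below shows the wiring; FakeSession is a made-up stand-in for the real onnxruntime.InferenceSession (with scripted token ids), so only the loop itself reflects how the real sessions would be driven.

```python
import numpy as np

EOS_ID = 3  # made-up id for this sketch

class FakeSession:
    """Stand-in for an InferenceSession over the exported seq2seq graph.

    Emits logits favouring a scripted token sequence, so the decoding
    loop below has something deterministic to consume.
    """
    VOCAB = 8
    TARGET = [5, 6, 7, EOS_ID]  # tokens the fake model emits in order

    def run(self, output_names, input_feed):
        dec = input_feed["decoder_input_ids"]
        batch, steps = dec.shape
        logits = np.zeros((batch, steps, self.VOCAB), dtype=np.float32)
        # favour the next scripted token at the last decoder position
        idx = min(steps - 1, len(self.TARGET) - 1)
        logits[:, -1, self.TARGET[idx]] = 1.0
        return [logits]

def greedy_decode(session, encoder_feed, start_id, eos_id, max_len=10):
    """Greedy loop re-feeding generated tokens into decoder_input_ids."""
    decoder_input_ids = np.full((1, 1), start_id, dtype=np.int64)
    for _ in range(max_len):
        feed = {
            **encoder_feed,
            "decoder_input_ids": decoder_input_ids,
            "decoder_attention_mask": np.ones_like(decoder_input_ids),
        }
        logits = session.run(["logits"], feed)[0]
        next_id = int(logits[0, -1].argmax())
        decoder_input_ids = np.concatenate(
            [decoder_input_ids, np.array([[next_id]], dtype=np.int64)], axis=1
        )
        if next_id == eos_id:
            break
    return decoder_input_ids[0].tolist()

enc_feed = {
    "input_ids": np.array([[12, 7, 2]], dtype=np.int64),
    "attention_mask": np.array([[1, 1, 1]], dtype=np.int64),
}
tokens = greedy_decode(FakeSession(), enc_feed, start_id=0, eos_id=EOS_ID)
# tokens -> [0, 5, 6, 7, 3]: start token, scripted tokens, EOS
```

With a real export this re-runs the full graph (including the encoder) on every step; splitting encoder and decoder into separate sessions, as suggested above, avoids recomputing the encoder each iteration.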

Any help will be highly appreciated :)
