Implement and benchmark ONNX Runtime for Inference #39
Description
Went with onnx-ecosystem, which is a recent release (a couple of weeks old). Found that nvidia-cuda-docker was not initializing, so I ditched Docker for now and ran this notebook from an environment with PyTorch v1.4.0, Transformers v2.5.1, and ONNX Runtime v1.2.1 (CPU & GPU).
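For reference, the non-Docker environment above could be pinned roughly like this (a sketch assuming pip; the CPU and GPU ONNX Runtime wheels normally should not be installed side by side in one environment, and CUDA-enabled PyTorch wheels may need a platform-specific index):

```shell
# Pins matching the versions used in the run above
pip install torch==1.4.0 transformers==2.5.1

# Pick ONE of these, depending on the target device:
pip install onnxruntime==1.2.1        # CPU build
pip install onnxruntime-gpu==1.2.1    # GPU build
```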
With the variables (max_seq_length=128, etc.) as originally specified, here is the result on GPU:
```
ONNX Runtime inference time: 0.00811
PyTorch Inference time = 0.02096
***** Verifying correctness *****
PyTorch and ORT matching numbers: True
PyTorch and ORT matching numbers: True
```
With max_seq_length=384 and everything else the same, here is the result:
```
ONNX Runtime inference time: 0.0193
PyTorch Inference time = 0.0273
***** Verifying correctness *****
PyTorch and ORT matching numbers: True
PyTorch and ORT matching numbers: True
```
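The "matching numbers" check above amounts to an elementwise tolerance comparison between the PyTorch and ONNX Runtime outputs. A minimal sketch of that kind of check (the tolerance values here are assumptions, not necessarily what the notebook uses):

```python
import numpy as np

def outputs_match(torch_out, ort_out, rtol=1e-3, atol=1e-4):
    """True if the two output arrays agree elementwise within tolerance."""
    return np.allclose(torch_out, ort_out, rtol=rtol, atol=atol)

# Toy arrays standing in for model logits from the two backends
pt_logits = np.array([1.0, 2.0, 3.0])
ort_logits = pt_logits + 1e-6  # tiny numerical drift between runtimes

print(outputs_match(pt_logits, ort_logits))  # True
```

Exact equality is too strict here, since the two runtimes can legitimately differ in the last few decimal places due to operator fusion and reduced-precision kernels.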
Should have more time tomorrow to examine these preliminary results and to iterate further and characterize the differences, including the notebook's variables per_gpu_eval_batch_size and eval_batch_size, both originally set to 1.
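For sweeping variables like batch size, the same timing harness can be reused for both backends. A minimal sketch (the warmup and run counts are arbitrary choices, and the notebook's own timing code may differ):

```python
import time

def avg_inference_time(run_fn, warmup=3, runs=100):
    """Average wall-clock latency in seconds of run_fn over `runs` calls.

    A few warmup calls are done first so one-time costs (CUDA context
    creation, lazy initialization, caches) don't skew the average.
    """
    for _ in range(warmup):
        run_fn()
    start = time.perf_counter()
    for _ in range(runs):
        run_fn()
    return (time.perf_counter() - start) / runs

# Usage with a stand-in workload; in practice run_fn would wrap a call to
# the PyTorch model or to an ONNX Runtime InferenceSession.run(...)
latency = avg_inference_time(lambda: sum(range(10_000)), warmup=1, runs=20)
print(f"avg latency: {latency:.6f} s")
```

Note that for GPU measurements the wrapped call must synchronize the device before returning, otherwise the timer only measures kernel launch overhead.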
At this point I am more familiar with ALBERT_xxlarge inference performance, so eventually I may try to implement it in ONNX for an inference comparison on a larger model.
Here's another max_seq_length=384 run:
Inference-PyTorch-Bert-Model-for-High-Performance-in-ONNX-Runtime_WIP - Jupyter Notebook.pdf
Originally posted by @ahotrod in #23 (comment)