Closed
Labels
performance (issues related to performance regressions)
Description
Describe the issue
Starting with commit 2cdc05f, ONNX Runtime (ORT) no longer performs Gelu fusion on the attached model, resulting in a roughly 4.7X slowdown in model_run (see the table below).
Bisect range: de7a02b .. 2cdc05f.
Optimized model of de7a02b
Optimized model of 2cdc05f
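For readers who want to check the fusion themselves, the following is a minimal sketch of one way to dump the graph that ORT executes and compare operator counts between the two builds. It relies on `SessionOptions.optimized_model_filepath` and `onnx.load`; the helper names (`dump_optimized_model`, `op_type_counts`, `missing_fusions`), the output file names, and the list of fused op types to watch are illustrative assumptions, not part of the original report.

```python
from collections import Counter


def dump_optimized_model(model_path, optimized_path):
    """Ask ORT to write the graph it executes after optimization to disk."""
    import onnxruntime as ort  # lazy import: the pure helpers below do not need ORT

    so = ort.SessionOptions()
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    so.optimized_model_filepath = optimized_path
    # Creating the session triggers optimization and writes the file.
    ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])


def op_type_counts(optimized_path):
    """Count how many nodes of each op type the saved graph contains."""
    import onnx  # lazy import, see above

    model = onnx.load(optimized_path)
    return Counter(node.op_type for node in model.graph.node)


def missing_fusions(before, after, fused_ops=("BiasGelu", "Gelu", "FastGelu")):
    """Return {op: (count_before, count_after)} for fused ops whose count dropped.

    `fused_ops` is an assumed watch list; adjust it for the model at hand.
    """
    return {
        op: (before.get(op, 0), after.get(op, 0))
        for op in fused_ops
        if after.get(op, 0) < before.get(op, 0)
    }
```

In use, `dump_optimized_model("model.onnx", "opt_de7a02b.onnx")` would be run in an environment built from de7a02b, and again with a different output name in an environment built from 2cdc05f. Passing the two `op_type_counts(...)` results to `missing_fusions` then lists any fused op that disappeared.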
Performance Comparison
| Key | de7a02b | 2cdc05f | Ratio |
|---|---|---|---|
| model_loading_uri | 611 | 603 | 0.9869 |
| session_initialization | 4256 | 4236 | 0.9953 |
| /m4/MatMul_kernel_time | 616211 | 531171 | 0.8623 |
| /m4/Add_kernel_time | | 4973509 | |
| BiasGelu_kernel_time | 513038 | | |
| Gelu_kernel_time | | 171279 | |
| SequentialExecutor::Execute | 1193568 | 5778856 | 4.8418 |
| model_run | 1223691 | 5796766 | 4.7372 |
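The keys in the table look like event names from ORT's built-in profiler. As a hedged sketch, numbers of this kind could be gathered by enabling profiling and summing the `dur` field per event name. Whether the table above was produced this way is an assumption, and `total_duration_by_name` and `profile_session` are hypothetical helper names.

```python
import json
from collections import defaultdict


def total_duration_by_name(events):
    """Sum the `dur` field of profiler events, grouped by event name."""
    totals = defaultdict(int)
    for ev in events:
        # Metadata events carry no duration; skip them.
        if "dur" in ev and "name" in ev:
            totals[ev["name"]] += ev["dur"]
    return dict(totals)


def profile_session(model_path, feeds, runs=10):
    """Run the model with profiling enabled and return summed durations."""
    import onnxruntime as ort  # lazy import: the aggregation above is pure Python

    so = ort.SessionOptions()
    so.enable_profiling = True
    sess = ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
    for _ in range(runs):
        sess.run(None, feeds)
    # end_profiling() returns the path of the JSON trace that ORT wrote.
    profile_path = sess.end_profiling()
    with open(profile_path) as f:
        return total_duration_by_name(json.load(f))
```

Running `profile_session("model.onnx", input_data)` once per build and dividing matching entries would give ratios comparable to the last column, provided both builds are profiled with the same number of runs.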
To reproduce
- Download and unzip "model.zip".
- Run the following script.
```python
import time

import numpy as np
import onnxruntime

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

nth = 100000

# Load the saved input feed (a pickled dict of input name -> array)
input_data = np.load("input.npy", allow_pickle=True).item()

# Warm-up inference to cache optimizations
ort_session.run(None, input_data)

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6
print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
```
Urgency
No response
Platform
Linux
OS Version
6.8.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
Is this a quantized model?
No