[Performance] Slowdown Caused by Gelu Fusion Removal #23491

### Describe the issue

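Starting with commit 2cdc05f189bb34259deb3e3daef3289f1565558c, ONNX Runtime (ORT) no longer performs Gelu fusion on this model: the optimized graph runs separate `Add` and `Gelu` kernels instead of a single fused `BiasGelu` kernel. As a result, `model_run` time increases by roughly 4.7x (see Performance Comparison).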

Bisect range: de7a02beefa3ea887753b6e9cad63c7ac84ddb74 (fusion present) .. 2cdc05f189bb34259deb3e3daef3289f1565558c (fusion missing).



### Optimized model of de7a02beefa3ea887753b6e9cad63c7ac84ddb74

<img width="182" alt="Image" src="https://github.com/user-attachments/assets/198504c7-109a-4792-b507-14abb6481bbd" />


### Optimized model of 2cdc05f189bb34259deb3e3daef3289f1565558c

<img width="245" alt="Image" src="https://github.com/user-attachments/assets/a9db8115-3cbc-4c65-b562-9597e88ee42e" />
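
The two screenshots can also be checked programmatically. The following is a minimal sketch, assuming the `onnx` package is installed alongside `onnxruntime`; the helper names (`count_op_types`, `has_fused_bias_gelu`, `op_counts_after_optimization`) and the output file name `optimized.onnx` are hypothetical and chosen only for illustration. It asks ORT to write out the graph it actually executes and counts the operator types in it.

```python
from collections import Counter


def count_op_types(nodes):
    """Count how often each operator type appears in a sequence of graph nodes."""
    return Counter(node.op_type for node in nodes)


def has_fused_bias_gelu(op_counts):
    """Return True if the graph contains at least one fused BiasGelu node."""
    return op_counts.get("BiasGelu", 0) > 0


def op_counts_after_optimization(model_path, optimized_path="optimized.onnx"):
    """Let ORT optimize the model, save the result, and count its operator types."""
    # Imported lazily so the two helpers above can be used without these packages.
    import onnx
    import onnxruntime

    options = onnxruntime.SessionOptions()
    options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
    # ORT serializes the optimized graph to this path during session creation.
    options.optimized_model_filepath = optimized_path
    onnxruntime.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )
    return count_op_types(onnx.load(optimized_path).graph.node)
```

Running `op_counts_after_optimization("model.onnx")` against builds of both commits and passing each result to `has_fused_bias_gelu` should show whether the fused node is still produced.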


### Performance Comparison

Key | de7a02beefa3ea887753b6e9cad63c7ac84ddb74 | 2cdc05f189bb34259deb3e3daef3289f1565558c | Ratio
-- | -- | -- | --
model_loading_uri | 611 | 603 | 0.9869
session_initialization | 4256 | 4236 | 0.9953
/m4/MatMul_kernel_time | 616211 | 531171 | 0.8623
/m4/Add_kernel_time |   | 4973509 |  
BiasGelu_kernel_time | 513038 |   |  
Gelu_kernel_time |   | 171279 |  
SequentialExecutor::Execute | 1193568 | 5778856 | 4.8418
model_run | 1223691 | 5796766 | 4.7372
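
The keys in this table match the event names that ORT's built-in profiler emits. A comparable per-kernel breakdown can be collected with a sketch like the one below. It assumes the profile is a JSON list of events that carry `name` and `dur` fields (the Chrome trace layout ORT writes, as far as I know); the function names and the default `runs` value are hypothetical.

```python
import json
from collections import defaultdict


def summarize_profile(events):
    """Sum the `dur` field of profile events, grouped by event name."""
    totals = defaultdict(int)
    for event in events:
        # Skip entries that carry no duration.
        if "dur" in event:
            totals[event["name"]] += event["dur"]
    return dict(totals)


def compare_profiles(baseline, candidate):
    """Return {name: (baseline_total, candidate_total, ratio or None)}."""
    rows = {}
    for name in sorted(set(baseline) | set(candidate)):
        old, new = baseline.get(name), candidate.get(name)
        # A ratio is only meaningful when the event exists in both builds.
        ratio = new / old if old and new else None
        rows[name] = (old, new, ratio)
    return rows


def profile_model(model_path, feed, runs=100):
    """Run the model with ORT profiling enabled and return the parsed events."""
    import onnxruntime  # imported lazily so the helpers above work without ORT

    options = onnxruntime.SessionOptions()
    options.enable_profiling = True
    session = onnxruntime.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )
    for _ in range(runs):
        session.run(None, feed)
    profile_path = session.end_profiling()  # path of the JSON trace ORT wrote
    with open(profile_path) as f:
        return json.load(f)
```

Calling `summarize_profile(profile_model("model.onnx", input_data))` once per build and feeding both summaries to `compare_profiles` gives rows in the same shape as the table, with `None` where a kernel exists in only one build.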




### To reproduce

1. Download and unzip "model.zip" (linked under Model File below).
2. Run the following script from the directory that contains `model.onnx` and `input.npy`.

```python
import time
import onnxruntime
import numpy as np

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

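nth = 100000  # number of timed inference runs

# Load the input feed (a pickled dict stored in input.npy)
input_data = np.load("input.npy", allow_pickle=True).item()

# Warm-up inference so one-time setup cost is excluded from the timing
ort_session.run(None, input_data)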

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):

    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()

    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6

print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
```
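
If the mean alone is too noisy to compare builds, a variant of the timing loop that also reports the median and a tail percentile may help. This is only a sketch: `time_calls` and its parameters are hypothetical names, and the nearest-rank p99 is a simple approximation rather than an interpolated percentile.

```python
import statistics
import time


def time_calls(fn, runs, warmup=1):
    """Call `fn` `runs` times and return per-call latency statistics in ms."""
    # Untimed warm-up calls so one-time setup cost is excluded.
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(runs):
        start_ns = time.perf_counter_ns()
        fn()
        samples_ms.append((time.perf_counter_ns() - start_ns) / 1e6)
    samples_ms.sort()
    # Nearest-rank index for the 99th percentile, clamped to the last sample.
    p99_index = min(len(samples_ms) - 1, int(0.99 * len(samples_ms)))
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p99_ms": samples_ms[p99_index],
        "min_ms": samples_ms[0],
    }
```

With the session from the script above, this could be used as `time_calls(lambda: ort_session.run(None, input_data), runs=1000)`.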

### Urgency

_No response_

### Platform

Linux

### OS Version

6.8.0

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

1.20.1

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

Default CPU

### Execution Provider Library Version

_No response_

### Model File

[model.zip](https://github.com/user-attachments/files/18546584/model.zip)

### Is this a quantized model?

No
