This repository was archived by the owner on Feb 25, 2025. It is now read-only.

@jonahwilliams jonahwilliams commented Nov 28, 2022

Improve rrect blur performance by splitting the sigma > 0 and sigma <= 0 cases into distinct shaders. This change accounts for the vast majority of the performance improvement: in the malioc diff below, total arithmetic cycles drop by roughly 23-34% on every listed GPU, and stack spilling is eliminated on the older Mali-T series GPUs.
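
As a rough illustration of the split described above, the choice between the two shaders can be made once per draw on the CPU instead of per fragment on the GPU. The sketch below is a minimal example under that assumption; `RRectBlurVariant` and `SelectRRectBlurVariant` are hypothetical names for illustration, not identifiers from the engine.

```cpp
// Hypothetical variant identifiers; these names are illustrative, not engine API.
enum class RRectBlurVariant {
  kNonPositiveSigma,  // shader compiled for the sigma <= 0 case
  kPositiveSigma,     // shader compiled for the sigma > 0 case
};

// Pick the variant once per draw on the CPU so that neither shader
// needs to branch on sigma per fragment. A NaN sigma compares false
// and therefore falls into the sigma <= 0 variant.
constexpr RRectBlurVariant SelectRRectBlurVariant(float sigma) {
  return sigma > 0.0f ? RRectBlurVariant::kPositiveSigma
                      : RRectBlurVariant::kNonPositiveSigma;
}
```

A caller would then bind whichever pipeline corresponds to the returned variant before issuing the draw; how that binding is done is outside the scope of this sketch.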

Details
[Mali-T880]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
- Stack spilling: true
+ Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   12.00    4.00    0.00        A
+ Total instruction cycles:    8.00    1.00    0.00        A
- Shortest path cycles:        2.64    3.00    0.00       LS
+ Shortest path cycles:        7.59    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T860]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
- Stack spilling: true
+ Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   18.00    4.00    0.00        A
+ Total instruction cycles:   12.00    1.00    0.00        A
- Shortest path cycles:        4.00    3.00    0.00        A
+ Shortest path cycles:       11.50    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T830]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
- Stack spilling: true
+ Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   18.00    4.00    0.00        A
+ Total instruction cycles:   12.00    1.00    0.00        A
- Shortest path cycles:        2.12    3.00    0.00       LS
+ Shortest path cycles:        5.00    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T820]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
- Stack spilling: true
+ Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   36.00    4.00    0.00        A
+ Total instruction cycles:   24.00    1.00    0.00        A
- Shortest path cycles:        4.25    3.00    0.00        A
+ Shortest path cycles:       10.00    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T760]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
- Stack spilling: true
+ Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   18.00    4.00    0.00        A
+ Total instruction cycles:   12.00    1.00    0.00        A
- Shortest path cycles:        4.00    3.00    0.00        A
+ Shortest path cycles:       11.50    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T720]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
- Stack spilling: true
+ Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   36.00    4.00    0.00        A
+ Total instruction cycles:   24.00    1.00    0.00        A
- Shortest path cycles:        4.25    3.00    0.00        A
+ Shortest path cycles:       10.25    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture


[Mali-G78AE]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    3.24    0.00    0.12    0.00        A
+ Total instruction cycles:    2.47    0.00    0.12    0.00        A
- Shortest path cycles:        0.20    0.00    0.12    0.00        A
+ Shortest path cycles:        2.47    0.00    0.12    0.00        A
- Longest path cycles:         3.11    0.00    0.12    0.00        A
+ Longest path cycles:         2.47    0.00    0.12    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G78]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    3.24    0.00    0.12    0.00        A
+ Total instruction cycles:    2.47    0.00    0.12    0.00        A
- Shortest path cycles:        0.20    0.00    0.12    0.00        A
+ Shortest path cycles:        2.47    0.00    0.12    0.00        A
- Longest path cycles:         3.11    0.00    0.12    0.00        A
+ Longest path cycles:         2.47    0.00    0.12    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G77]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    3.24    0.00    0.12    0.00        A
+ Total instruction cycles:    2.47    0.00    0.12    0.00        A
- Shortest path cycles:        0.20    0.00    0.12    0.00        A
+ Shortest path cycles:        2.47    0.00    0.12    0.00        A
- Longest path cycles:         3.11    0.00    0.12    0.00        A
+ Longest path cycles:         2.47    0.00    0.12    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G76]
Main shader
===========

- Work registers: 25 (78% used at 100% occupancy)
+ Work registers: 22 (68% used at 100% occupancy)
- Uniform registers: 22 (34% used)
+ Uniform registers: 16 (25% used)
Stack spilling: false
- 16-bit arithmetic: 35%
+ 16-bit arithmetic: 42%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.53    0.00    0.12    0.00        A
+ Total instruction cycles:    1.67    0.00    0.12    0.00        A
- Shortest path cycles:        0.71    0.00    0.12    0.00        A
+ Shortest path cycles:        1.67    0.00    0.12    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G72]
Main shader
===========

- Work registers: 26 (81% used at 100% occupancy)
+ Work registers: 23 (71% used at 100% occupancy)
- Uniform registers: 22 (34% used)
+ Uniform registers: 16 (25% used)
Stack spilling: false
- 16-bit arithmetic: 35%
+ 16-bit arithmetic: 42%

                                A      LS       V       T    Bound
- Total instruction cycles:    5.07    0.00    0.25    0.00        A
+ Total instruction cycles:    3.40    0.00    0.25    0.00        A
- Shortest path cycles:        1.42    0.00    0.25    0.00        A
+ Shortest path cycles:        3.40    0.00    0.25    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G715]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    1.10    0.00    0.03    0.00        A
+ Total instruction cycles:    0.85    0.00    0.03    0.00        A
- Shortest path cycles:        0.07    0.00    0.03    0.00        A
+ Shortest path cycles:        0.84    0.00    0.03    0.00        A
- Longest path cycles:         1.06    0.00    0.03    0.00        A
+ Longest path cycles:         0.85    0.00    0.03    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G710]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.20    0.00    0.06    0.00        A
+ Total instruction cycles:    1.70    0.00    0.06    0.00        A
- Shortest path cycles:        0.13    0.00    0.06    0.00        A
+ Shortest path cycles:        1.69    0.00    0.06    0.00        A
- Longest path cycles:         2.12    0.00    0.06    0.00        A
+ Longest path cycles:         1.70    0.00    0.06    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G71]
Main shader
===========

- Work registers: 29 (90% used at 100% occupancy)
+ Work registers: 24 (75% used at 100% occupancy)
- Uniform registers: 22 (34% used)
+ Uniform registers: 16 (25% used)
Stack spilling: false
- 16-bit arithmetic: 29%
+ 16-bit arithmetic: 37%

                                A      LS       V       T    Bound
- Total instruction cycles:    5.33    0.00    0.25    0.00        A
+ Total instruction cycles:    3.67    0.00    0.25    0.00        A
- Shortest path cycles:        1.25    0.00    0.25    0.00        A
+ Shortest path cycles:        3.67    0.00    0.25    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G68]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    3.24    0.00    0.12    0.00        A
+ Total instruction cycles:    2.47    0.00    0.12    0.00        A
- Shortest path cycles:        0.20    0.00    0.12    0.00        A
+ Shortest path cycles:        2.47    0.00    0.12    0.00        A
- Longest path cycles:         3.11    0.00    0.12    0.00        A
+ Longest path cycles:         2.47    0.00    0.12    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G615]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    1.10    0.00    0.03    0.00        A
+ Total instruction cycles:    0.85    0.00    0.03    0.00        A
- Shortest path cycles:        0.07    0.00    0.03    0.00        A
+ Shortest path cycles:        0.84    0.00    0.03    0.00        A
- Longest path cycles:         1.06    0.00    0.03    0.00        A
+ Longest path cycles:         0.85    0.00    0.03    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G610]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.20    0.00    0.06    0.00        A
+ Total instruction cycles:    1.70    0.00    0.06    0.00        A
- Shortest path cycles:        0.13    0.00    0.06    0.00        A
+ Shortest path cycles:        1.69    0.00    0.06    0.00        A
- Longest path cycles:         2.12    0.00    0.06    0.00        A
+ Longest path cycles:         1.70    0.00    0.06    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G57]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    3.24    0.00    0.12    0.00        A
+ Total instruction cycles:    2.47    0.00    0.12    0.00        A
- Shortest path cycles:        0.20    0.00    0.12    0.00        A
+ Shortest path cycles:        2.47    0.00    0.12    0.00        A
- Longest path cycles:         3.11    0.00    0.12    0.00        A
+ Longest path cycles:         2.47    0.00    0.12    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G52]
Main shader
===========

- Work registers: 25 (78% used at 100% occupancy)
+ Work registers: 22 (68% used at 100% occupancy)
- Uniform registers: 22 (34% used)
+ Uniform registers: 16 (25% used)
Stack spilling: false
- 16-bit arithmetic: 35%
+ 16-bit arithmetic: 42%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.53    0.00    0.12    0.00        A
+ Total instruction cycles:    1.67    0.00    0.12    0.00        A
- Shortest path cycles:        0.71    0.00    0.12    0.00        A
+ Shortest path cycles:        1.67    0.00    0.12    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G510]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.94    0.00    0.06    0.00        A
+ Total instruction cycles:    2.27    0.00    0.06    0.00        A
- Shortest path cycles:        0.18    0.00    0.06    0.00        A
+ Shortest path cycles:        2.25    0.00    0.06    0.00        A
- Longest path cycles:         2.83    0.00    0.06    0.00        A
+ Longest path cycles:         2.27    0.00    0.06    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G51]
Main shader
===========

- Work registers: 26 (81% used at 100% occupancy)
+ Work registers: 23 (71% used at 100% occupancy)
- Uniform registers: 22 (34% used)
+ Uniform registers: 16 (25% used)
Stack spilling: false
- 16-bit arithmetic: 35%
+ 16-bit arithmetic: 42%

                                A      LS       V       T    Bound
- Total instruction cycles:    5.07    0.00    0.12    0.00        A
+ Total instruction cycles:    3.40    0.00    0.12    0.00        A
- Shortest path cycles:        1.42    0.00    0.12    0.00        A
+ Shortest path cycles:        3.40    0.00    0.12    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G310]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    4.41    0.00    0.12    0.00        A
+ Total instruction cycles:    3.41    0.00    0.12    0.00        A
- Shortest path cycles:        0.27    0.00    0.12    0.00        A
+ Shortest path cycles:        3.38    0.00    0.12    0.00        A
- Longest path cycles:         4.24    0.00    0.12    0.00        A
+ Longest path cycles:         3.41    0.00    0.12    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G31]
Main shader
===========

- Work registers: 26 (81% used at 100% occupancy)
+ Work registers: 23 (71% used at 100% occupancy)
- Uniform registers: 22 (34% used)
+ Uniform registers: 16 (25% used)
Stack spilling: false
- 16-bit arithmetic: 35%
+ 16-bit arithmetic: 42%

                                A      LS       V       T    Bound
- Total instruction cycles:    7.60    0.00    0.12    0.00        A
+ Total instruction cycles:    5.10    0.00    0.12    0.00        A
- Shortest path cycles:        2.12    0.00    0.12    0.00        A
+ Shortest path cycles:        5.10    0.00    0.12    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Immortalis-G715]
Main shader
===========

Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 26 (40% used)
+ Uniform registers: 24 (37% used)
Stack spilling: false
- 16-bit arithmetic: 26%
+ 16-bit arithmetic: 29%

                                A      LS       V       T    Bound
- Total instruction cycles:    1.10    0.00    0.03    0.00        A
+ Total instruction cycles:    0.85    0.00    0.03    0.00        A
- Shortest path cycles:        0.07    0.00    0.03    0.00        A
+ Shortest path cycles:        0.84    0.00    0.03    0.00        A
- Longest path cycles:         1.06    0.00    0.03    0.00        A
+ Longest path cycles:         0.85    0.00    0.03    0.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false

float IPGaussian(float x, float sigma) {
float variance = sigma * sigma;
return exp(-0.5 * x * x / variance) / (kSqrtTwoPi * sigma);
float variance = pow(sigma, 2.0);

Contributor Author

These changes seem to give a minor improvement to register usage but also seemingly reduce the % of 16-bit computation, since the shaders produced by impellerc seem to always convert mediump to highp.

Contributor Author

Overall, marginal win - could be bigger if we didn't lose mediump tho

@jonahwilliams
Contributor Author

We're going to hold off on this until we have better support for shader variants in the engine code.
