This repository was archived by the owner on Feb 25, 2025. It is now read-only.

@jonahwilliams
Contributor

Improve performance of border mask blur:

  • Move computation of texture flip to vertex shader (Reduces Arithmetic usage, marginally increases varying usage)
  • Move uniforms used in fragment shader to fragment UBO (Reduces varying usage)
  • Vectorize usage of IPVec2GaussianIntegral (Reduces Arithmetic usage slightly on low end devices)
  • Compute kHalfSqrtTwo / sigma in contents (Removes uniform computation)
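
The bullets above can be illustrated with a minimal Python sketch. This is not the shader source: it assumes the Gaussian integral has the usual form `0.5 * (1 + erf(x * kHalfSqrtTwo / sigma))` and that the texture flip has the form `v -> 1 - v`. The names `erf_scale`, `vec2_gaussian_integral`, `flip_v`, and `lerp` are hypothetical stand-ins chosen for this example.

```python
import math

K_HALF_SQRT_TWO = math.sqrt(2.0) / 2.0


def erf_scale(sigma):
    """Host-side step (assumed): compute kHalfSqrtTwo / sigma once per draw,
    per axis, so the fragment stage only multiplies."""
    return tuple(K_HALF_SQRT_TWO / s for s in sigma)


def vec2_gaussian_integral(x, scale):
    """Stand-in for a two-component Gaussian integral: both axes are
    evaluated together from a precomputed scale."""
    return tuple(0.5 + 0.5 * math.erf(xi * si) for xi, si in zip(x, scale))


def flip_v(v):
    """Assumed form of the texture flip on the v coordinate."""
    return 1.0 - v


def lerp(a, b, t):
    """Linear interpolation, as applied to varyings between vertices."""
    return a + (b - a) * t


# Flipping at the vertices and then interpolating gives the same value as
# interpolating and then flipping per fragment, because the flip is affine.
per_fragment = flip_v(lerp(0.2, 0.8, 0.25))
per_vertex = lerp(flip_v(0.2), flip_v(0.8), 0.25)
print(per_fragment, per_vertex)

# The integral evaluated one standard deviation out on each axis.
print(vec2_gaussian_integral((1.0, 2.0), erf_scale((1.0, 2.0))))
```

Because the flip commutes with interpolation, doing it once per vertex removes arithmetic from the fragment stage without changing the sampled coordinate.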
malioc diff:
[Mali-T880]
Main shader
===========

- Work registers: 3 (75% used at 100% occupancy)
+ Work registers: 4 (100% used at 100% occupancy)
- Uniform registers: 1 (4% used)
+ Uniform registers: 1 (5% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:    9.00    3.00    1.00        A
+ Total instruction cycles:    8.00    1.00    1.00        A
- Shortest path cycles:        8.58    3.00    1.00        A
+ Shortest path cycles:        7.59    1.00    1.00        A
- Longest path cycles:         8.58    3.00    1.00        A
+ Longest path cycles:         7.59    1.00    1.00        A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T860]
Main shader
===========

- Work registers: 3 (75% used at 100% occupancy)
+ Work registers: 4 (100% used at 100% occupancy)
- Uniform registers: 1 (4% used)
+ Uniform registers: 1 (5% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   13.50    3.00    1.00        A
+ Total instruction cycles:   12.00    1.00    1.00        A
- Shortest path cycles:       13.00    3.00    1.00        A
+ Shortest path cycles:       11.50    1.00    1.00        A
- Longest path cycles:        13.00    3.00    1.00        A
+ Longest path cycles:        11.50    1.00    1.00        A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T830]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   13.50    3.00    1.00        A
+ Total instruction cycles:   12.00    1.00    1.00        A
- Shortest path cycles:        5.75    3.00    1.00        A
+ Shortest path cycles:        5.88    1.00    1.00        A
- Longest path cycles:         5.75    3.00    1.00        A
+ Longest path cycles:         5.88    1.00    1.00        A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T820]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   27.00    3.00    1.00        A
+ Total instruction cycles:   24.00    1.00    1.00        A
- Shortest path cycles:       11.50    3.00    1.00        A
+ Shortest path cycles:       11.75    1.00    1.00        A
- Longest path cycles:        11.50    3.00    1.00        A
+ Longest path cycles:        11.75    1.00    1.00        A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T760]
Main shader
===========

- Work registers: 3 (75% used at 100% occupancy)
+ Work registers: 4 (100% used at 100% occupancy)
- Uniform registers: 1 (4% used)
+ Uniform registers: 1 (5% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   13.50    3.00    1.00        A
+ Total instruction cycles:   12.00    1.00    1.00        A
- Shortest path cycles:       13.00    3.00    1.00        A
+ Shortest path cycles:       11.50    1.00    1.00        A
- Longest path cycles:        13.00    3.00    1.00        A
+ Longest path cycles:        11.50    1.00    1.00        A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T720]
Main shader
===========

Work registers: 4 (100% used at 100% occupancy)
Uniform registers: 1 (5% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   27.00    3.00    1.00        A
+ Total instruction cycles:   24.00    1.00    1.00        A
- Shortest path cycles:       11.75    3.00    1.00        A
+ Shortest path cycles:       11.75    1.00    1.00        A
- Longest path cycles:        11.75    3.00    1.00        A
+ Longest path cycles:        11.75    1.00    1.00        A

A = Arithmetic, LS = Load/Store, T = Texture


[Mali-G78AE]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.91    0.00    0.62    0.25        A
+ Total instruction cycles:    0.86    0.00    0.25    0.25        A
- Shortest path cycles:        0.91    0.00    0.62    0.25        A
+ Shortest path cycles:        0.86    0.00    0.25    0.25        A
- Longest path cycles:         0.91    0.00    0.62    0.25        A
+ Longest path cycles:         0.86    0.00    0.25    0.25        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G78]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.91    0.00    0.62    0.25        A
+ Total instruction cycles:    0.86    0.00    0.25    0.25        A
- Shortest path cycles:        0.91    0.00    0.62    0.25        A
+ Shortest path cycles:        0.86    0.00    0.25    0.25        A
- Longest path cycles:         0.91    0.00    0.62    0.25        A
+ Longest path cycles:         0.86    0.00    0.25    0.25        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G77]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.91    0.00    0.62    0.25        A
+ Total instruction cycles:    0.86    0.00    0.25    0.25        A
- Shortest path cycles:        0.91    0.00    0.62    0.25        A
+ Shortest path cycles:        0.86    0.00    0.25    0.25        A
- Longest path cycles:         0.91    0.00    0.62    0.25        A
+ Longest path cycles:         0.86    0.00    0.25    0.25        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G76]
Main shader
===========

- Work registers: 30 (93% used at 100% occupancy)
+ Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 2 (3% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 18%
+ 16-bit arithmetic: 16%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.47    0.00    0.75    0.50        A
+ Total instruction cycles:    2.37    0.00    0.25    0.50        A
- Shortest path cycles:        2.47    0.00    0.75    0.50        A
+ Shortest path cycles:        2.37    0.00    0.25    0.50        A
- Longest path cycles:         2.47    0.00    0.75    0.50        A
+ Longest path cycles:         2.37    0.00    0.25    0.50        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G72]
Main shader
===========

- Work registers: 34 (53% used at 50% occupancy)
+ Work registers: 33 (51% used at 50% occupancy)
- Uniform registers: 2 (3% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 18%
+ 16-bit arithmetic: 16%

                                A      LS       V       T    Bound
- Total instruction cycles:    5.17    0.00    1.50    1.00        A
+ Total instruction cycles:    4.73    0.00    0.50    1.00        A
- Shortest path cycles:        5.17    0.00    1.50    1.00        A
+ Shortest path cycles:        4.73    0.00    0.50    1.00        A
- Longest path cycles:         5.17    0.00    1.50    1.00        A
+ Longest path cycles:         4.73    0.00    0.50    1.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G715]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 16%
+ 16-bit arithmetic: 10%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.34    0.00    0.16    0.12        A
+ Total instruction cycles:    0.30    0.00    0.06    0.12        A
- Shortest path cycles:        0.33    0.00    0.16    0.12        A
+ Shortest path cycles:        0.29    0.00    0.06    0.12        A
- Longest path cycles:         0.34    0.00    0.16    0.12        A
+ Longest path cycles:         0.30    0.00    0.06    0.12        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G710]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.66    0.00    0.31    0.12        A
+ Total instruction cycles:    0.59    0.00    0.12    0.12        A
- Shortest path cycles:        0.65    0.00    0.31    0.12        A
+ Shortest path cycles:        0.57    0.00    0.12    0.12        A
- Longest path cycles:         0.66    0.00    0.31    0.12        A
+ Longest path cycles:         0.59    0.00    0.12    0.12        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G71]
Main shader
===========

- Work registers: 41 (64% used at 50% occupancy)
+ Work registers: 36 (56% used at 50% occupancy)
- Uniform registers: 2 (3% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
16-bit arithmetic: 13%

                                A      LS       V       T    Bound
- Total instruction cycles:    6.17    0.00    1.50    1.00        A
+ Total instruction cycles:    5.17    0.00    0.50    1.00        A
- Shortest path cycles:        6.17    0.00    1.50    1.00        A
+ Shortest path cycles:        5.17    0.00    0.50    1.00        A
- Longest path cycles:         6.17    0.00    1.50    1.00        A
+ Longest path cycles:         5.17    0.00    0.50    1.00        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G68]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.91    0.00    0.62    0.25        A
+ Total instruction cycles:    0.86    0.00    0.25    0.25        A
- Shortest path cycles:        0.91    0.00    0.62    0.25        A
+ Shortest path cycles:        0.86    0.00    0.25    0.25        A
- Longest path cycles:         0.91    0.00    0.62    0.25        A
+ Longest path cycles:         0.86    0.00    0.25    0.25        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G615]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 16%
+ 16-bit arithmetic: 10%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.34    0.00    0.16    0.12        A
+ Total instruction cycles:    0.30    0.00    0.06    0.12        A
- Shortest path cycles:        0.33    0.00    0.16    0.12        A
+ Shortest path cycles:        0.29    0.00    0.06    0.12        A
- Longest path cycles:         0.34    0.00    0.16    0.12        A
+ Longest path cycles:         0.30    0.00    0.06    0.12        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G610]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.66    0.00    0.31    0.12        A
+ Total instruction cycles:    0.59    0.00    0.12    0.12        A
- Shortest path cycles:        0.65    0.00    0.31    0.12        A
+ Shortest path cycles:        0.57    0.00    0.12    0.12        A
- Longest path cycles:         0.66    0.00    0.31    0.12        A
+ Longest path cycles:         0.59    0.00    0.12    0.12        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G57]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.91    0.00    0.62    0.25        A
+ Total instruction cycles:    0.86    0.00    0.25    0.25        A
- Shortest path cycles:        0.91    0.00    0.62    0.25        A
+ Shortest path cycles:        0.86    0.00    0.25    0.25        A
- Longest path cycles:         0.91    0.00    0.62    0.25        A
+ Longest path cycles:         0.86    0.00    0.25    0.25        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G52]
Main shader
===========

- Work registers: 30 (93% used at 100% occupancy)
+ Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 2 (3% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 18%
+ 16-bit arithmetic: 16%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.47    0.00    0.75    0.50        A
+ Total instruction cycles:    2.37    0.00    0.25    0.50        A
- Shortest path cycles:        2.47    0.00    0.75    0.50        A
+ Shortest path cycles:        2.37    0.00    0.25    0.50        A
- Longest path cycles:         2.47    0.00    0.75    0.50        A
+ Longest path cycles:         2.37    0.00    0.25    0.50        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G510]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.89    0.00    0.31    0.12        A
+ Total instruction cycles:    0.78    0.00    0.12    0.12        A
- Shortest path cycles:        0.86    0.00    0.31    0.12        A
+ Shortest path cycles:        0.76    0.00    0.12    0.12        A
- Longest path cycles:         0.89    0.00    0.31    0.12        A
+ Longest path cycles:         0.78    0.00    0.12    0.12        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G51]
Main shader
===========

- Work registers: 30 (93% used at 100% occupancy)
+ Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 2 (3% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 18%
+ 16-bit arithmetic: 16%

                                A      LS       V       T    Bound
- Total instruction cycles:    5.17    0.00    0.75    0.50        A
+ Total instruction cycles:    4.73    0.00    0.25    0.50        A
- Shortest path cycles:        5.17    0.00    0.75    0.50        A
+ Shortest path cycles:        4.73    0.00    0.25    0.50        A
- Longest path cycles:         5.17    0.00    0.75    0.50        A
+ Longest path cycles:         4.73    0.00    0.25    0.50        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G310]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 17%
+ 16-bit arithmetic: 11%

                                A      LS       V       T    Bound
- Total instruction cycles:    1.33    0.00    0.62    0.25        A
+ Total instruction cycles:    1.17    0.00    0.25    0.25        A
- Shortest path cycles:        1.30    0.00    0.62    0.25        A
+ Shortest path cycles:        1.14    0.00    0.25    0.25        A
- Longest path cycles:         1.33    0.00    0.62    0.25        A
+ Longest path cycles:         1.17    0.00    0.25    0.25        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G31]
Main shader
===========

- Work registers: 30 (93% used at 100% occupancy)
+ Work registers: 32 (100% used at 100% occupancy)
- Uniform registers: 2 (3% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 18%
+ 16-bit arithmetic: 16%

                                A      LS       V       T    Bound
- Total instruction cycles:    7.75    0.00    0.75    0.50        A
+ Total instruction cycles:    7.10    0.00    0.25    0.50        A
- Shortest path cycles:        7.75    0.00    0.75    0.50        A
+ Shortest path cycles:        7.10    0.00    0.25    0.50        A
- Longest path cycles:         7.75    0.00    0.75    0.50        A
+ Longest path cycles:         7.10    0.00    0.25    0.50        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Immortalis-G715]
Main shader
===========

Work registers: 31 (96% used at 100% occupancy)
- Uniform registers: 10 (15% used)
+ Uniform registers: 12 (18% used)
Stack spilling: false
- 16-bit arithmetic: 16%
+ 16-bit arithmetic: 10%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.34    0.00    0.16    0.12        A
+ Total instruction cycles:    0.30    0.00    0.06    0.12        A
- Shortest path cycles:        0.33    0.00    0.16    0.12        A
+ Shortest path cycles:        0.29    0.00    0.06    0.12        A
- Longest path cycles:         0.34    0.00    0.16    0.12        A
+ Longest path cycles:         0.30    0.00    0.06    0.12        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false
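
As a rough reading aid for the diff above, the following Python sketch computes the relative change in arithmetic cycles for a few of the listed GPUs. The numbers are copied from the "Total instruction cycles" rows; the helper name `percent_reduction` and the choice of devices are only for illustration, and malioc figures are static estimates rather than measured frame times.

```python
# (before, after) arithmetic cycles from the "Total instruction cycles" rows.
ARITHMETIC_CYCLES = {
    "Mali-T880": (9.00, 8.00),
    "Mali-G71": (6.17, 5.17),
    "Mali-G31": (7.75, 7.10),
    "Mali-G78": (0.91, 0.86),
}


def percent_reduction(before, after):
    """Relative reduction in percent; positive means fewer cycles after."""
    return 100.0 * (before - after) / before


for gpu, (before, after) in ARITHMETIC_CYCLES.items():
    print(f"{gpu}: {percent_reduction(before, after):.1f}% fewer arithmetic cycles")
```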

@jonahwilliams changed the title from "[Impeller] Improve border_mask_blur performance" to "[Impeller] Improve border_mask_blur performance on Android" on Nov 28, 2022
@jonahwilliams
Contributor Author

We're going to hold off on this until we have better support for shader variants in the engine code.
