This repository was archived by the owner on Feb 25, 2025. It is now read-only.

@jonahwilliams
Contributor

Improvements to morphology (dilate/erode) performance based on Arm Mali guidelines. This doesn't noticeably improve performance on iOS, but on a Pixel 6 it improves full-screen filter performance from 40-50 FPS to ~70 FPS.
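
For reference, the FPS figures above can be converted to per-frame times. This is only a small arithmetic sketch; the helper name `frame_time_ms` is hypothetical and not part of the PR.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# FPS values quoted in the PR description for a Pixel 6.
for label, fps in [("before (low)", 40.0), ("before (high)", 50.0), ("after", 70.0)]:
    print(f"{label}: {frame_time_ms(fps):.1f} ms per frame")
```

In other words, the reported change takes the filter from roughly 20-25 ms per frame to about 14.3 ms, which is under the ~16.7 ms budget of a 60 Hz refresh.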

Summary of Changes

  • Remove tile mode checks and use unconditional decal logic. shaderc does not appear to eliminate the branching for the other tile modes, even though the tile mode is a constant.
  • Flip the y coordinate in the vertex shader instead of in the fragment shader.
  • Remove branching on a filter-type uniform; instead, specialize the shader for dilate vs. erode.
  • Compute uv_offset outside the fragment shader instead of in it.
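
To illustrate the idea behind the first and third bullets, here is a minimal CPU-side sketch in Python. It is not the shader from this PR: it is a hypothetical 1-D reference that assumes alpha values in [0, 1], and the names `sample_decal` and `make_morphology` are made up for illustration. It shows decal-only sampling (no tile mode switch) and one specialized variant per operation, so the inner loop never tests the filter type.

```python
from typing import Callable, List, Sequence


def sample_decal(row: Sequence[float], index: int) -> float:
    """Decal sampling: reads outside the row return transparent (0.0)."""
    if 0 <= index < len(row):
        return row[index]
    return 0.0


def make_morphology(
    reduce_fn: Callable[[float, float], float], identity: float
) -> Callable[[Sequence[float], int], List[float]]:
    """Build one specialized variant (hypothetical stand-in for a shader variant)."""

    def apply(row: Sequence[float], radius: int) -> List[float]:
        out = []
        for i in range(len(row)):
            acc = identity
            # Reduce over the window [i - radius, i + radius].
            for offset in range(-radius, radius + 1):
                acc = reduce_fn(acc, sample_decal(row, i + offset))
            out.append(acc)
        return out

    return apply


dilate = make_morphology(max, 0.0)  # start from the lowest value, keep the max
erode = make_morphology(min, 1.0)   # start from the highest value, keep the min

print(dilate([0.0, 0.0, 1.0, 0.0, 0.0], 1))  # [0.0, 1.0, 1.0, 1.0, 0.0]
print(erode([1.0, 1.0, 1.0], 1))             # [0.0, 1.0, 0.0]
```

Note that with decal sampling the erode variant shrinks content at the edges, because out-of-bounds reads count as transparent.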
malioc results:

**[/Users/jonahwilliams/engine/src/out/android_debug_arm64/gen/flutter/impeller/entity/gles/morphology_filter.frag.gles]**

[Mali-T880]
Main shader
===========

Work registers: 3 (75% used at 100% occupancy)
Uniform registers: 1 (4% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:    6.33    1.00    1.00        A
+ Total instruction cycles:    3.33    1.00    1.00        A
- Shortest path cycles:        1.65    1.00    0.00        A
+ Shortest path cycles:        1.00    1.00    0.00    A, LS
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T860]
Main shader
===========

Work registers: 3 (75% used at 100% occupancy)
Uniform registers: 1 (4% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:    9.50    1.00    1.00        A
+ Total instruction cycles:    5.00    1.00    1.00        A
- Shortest path cycles:        2.50    1.00    0.00        A
+ Shortest path cycles:        1.50    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T830]
Main shader
===========

- Work registers: 4 (100% used at 100% occupancy)
+ Work registers: 3 (75% used at 100% occupancy)
- Uniform registers: 1 (5% used)
+ Uniform registers: 1 (4% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:    9.00    1.00    1.00        A
+ Total instruction cycles:    5.00    1.00    1.00        A
- Shortest path cycles:        1.62    1.00    0.00        A
+ Shortest path cycles:        1.25    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T820]
Main shader
===========

- Work registers: 4 (100% used at 100% occupancy)
+ Work registers: 3 (75% used at 100% occupancy)
- Uniform registers: 1 (5% used)
+ Uniform registers: 1 (4% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   18.00    1.00    1.00        A
+ Total instruction cycles:   10.00    1.00    1.00        A
- Shortest path cycles:        3.25    1.00    0.00        A
+ Shortest path cycles:        2.50    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T760]
Main shader
===========

Work registers: 3 (75% used at 100% occupancy)
Uniform registers: 1 (4% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:    9.50    1.00    1.00        A
+ Total instruction cycles:    5.00    1.00    1.00        A
- Shortest path cycles:        2.50    1.00    0.00        A
+ Shortest path cycles:        1.50    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false



[Mali-T720]
Main shader
===========

- Work registers: 4 (100% used at 100% occupancy)
+ Work registers: 3 (75% used at 100% occupancy)
- Uniform registers: 1 (5% used)
+ Uniform registers: 1 (4% used)
Stack spilling: false

                                A      LS       T    Bound
- Total instruction cycles:   18.00    1.00    1.00        A
+ Total instruction cycles:   10.00    1.00    1.00        A
- Shortest path cycles:        3.25    1.00    0.00        A
+ Shortest path cycles:        2.50    1.00    0.00        A
Longest path cycles:          N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, T = Texture


[Mali-G78AE]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.45    0.00    0.12    0.25        A
+ Total instruction cycles:    0.27    0.00    0.25    0.25        A
Shortest path cycles:        0.06    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G78]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.45    0.00    0.12    0.25        A
+ Total instruction cycles:    0.27    0.00    0.25    0.25        A
Shortest path cycles:        0.06    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G77]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.45    0.00    0.12    0.25        A
+ Total instruction cycles:    0.27    0.00    0.25    0.25        A
Shortest path cycles:        0.06    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G76]
Main shader
===========

- Work registers: 21 (65% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 16 (25% used)
+ Uniform registers: 2 (3% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 81%

                                A      LS       V       T    Bound
- Total instruction cycles:    1.08    0.00    0.12    0.50        A
+ Total instruction cycles:    0.83    0.00    0.25    0.50        A
Shortest path cycles:        0.29    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G72]
Main shader
===========

- Work registers: 21 (65% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 16 (25% used)
+ Uniform registers: 2 (3% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 81%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.17    0.00    0.25    1.00        A
+ Total instruction cycles:    1.67    0.00    0.50    1.00        A
Shortest path cycles:        0.58    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G715]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 66%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.23    0.00    0.03    0.12        A
+ Total instruction cycles:    0.15    0.00    0.06    0.12        A
Shortest path cycles:        0.03    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G710]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.26    0.00    0.06    0.12        A
+ Total instruction cycles:    0.20    0.00    0.12    0.12        A
Shortest path cycles:        0.03    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G71]
Main shader
===========

- Work registers: 19 (59% used at 100% occupancy)
+ Work registers: 20 (62% used at 100% occupancy)
- Uniform registers: 20 (31% used)
+ Uniform registers: 2 (3% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 81%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.00    0.00    0.25    1.00        A
+ Total instruction cycles:    1.58    0.00    0.50    1.00        A
Shortest path cycles:        0.58    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G68]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.45    0.00    0.12    0.25        A
+ Total instruction cycles:    0.27    0.00    0.25    0.25        A
Shortest path cycles:        0.06    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G615]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 66%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.23    0.00    0.03    0.12        A
+ Total instruction cycles:    0.15    0.00    0.06    0.12        A
Shortest path cycles:        0.03    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G610]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.26    0.00    0.06    0.12        A
+ Total instruction cycles:    0.20    0.00    0.12    0.12        A
Shortest path cycles:        0.03    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G57]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.45    0.00    0.12    0.25        A
+ Total instruction cycles:    0.27    0.00    0.25    0.25        A
Shortest path cycles:        0.06    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G52]
Main shader
===========

- Work registers: 21 (65% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 16 (25% used)
+ Uniform registers: 2 (3% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 81%

                                A      LS       V       T    Bound
- Total instruction cycles:    1.08    0.00    0.12    0.50        A
+ Total instruction cycles:    0.83    0.00    0.25    0.50        A
Shortest path cycles:        0.29    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G510]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.34    0.00    0.06    0.12        A
+ Total instruction cycles:    0.26    0.00    0.12    0.12        A
Shortest path cycles:        0.04    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G51]
Main shader
===========

- Work registers: 21 (65% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 16 (25% used)
+ Uniform registers: 2 (3% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 81%

                                A      LS       V       T    Bound
- Total instruction cycles:    2.17    0.00    0.12    0.50        A
+ Total instruction cycles:    1.67    0.00    0.25    0.50        A
Shortest path cycles:        0.58    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G310]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 80%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.52    0.00    0.12    0.25        A
+ Total instruction cycles:    0.39    0.00    0.25    0.25        A
Shortest path cycles:        0.06    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Mali-G31]
Main shader
===========

- Work registers: 21 (65% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 16 (25% used)
+ Uniform registers: 2 (3% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 81%

                                A      LS       V       T    Bound
- Total instruction cycles:    3.25    0.00    0.12    0.50        A
+ Total instruction cycles:    2.50    0.00    0.25    0.50        A
Shortest path cycles:        0.88    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

- Has uniform computation: true
+ Has uniform computation: false
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false



[Immortalis-G715]
Main shader
===========

- Work registers: 20 (62% used at 100% occupancy)
+ Work registers: 19 (59% used at 100% occupancy)
- Uniform registers: 8 (12% used)
+ Uniform registers: 4 (6% used)
Stack spilling: false
- 16-bit arithmetic: 100%
+ 16-bit arithmetic: 66%

                                A      LS       V       T    Bound
- Total instruction cycles:    0.23    0.00    0.03    0.12        A
+ Total instruction cycles:    0.15    0.00    0.06    0.12        A
Shortest path cycles:        0.03    0.00    0.00    0.00        A
Longest path cycles:          N/A     N/A     N/A     N/A      N/A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Has uniform computation: true
Has side-effects: false
Modifies coverage: false
Uses late ZS test: false
Uses late ZS update: false
Reads color buffer: false
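
To put the malioc numbers above in perspective, the following sketch computes the relative drop in total arithmetic cycles for a few of the reported GPUs. The before/after values are copied from the report; the helper `reduction_percent` is a hypothetical name, and only a subset of GPUs is included.

```python
# (before, after) total arithmetic cycles, copied from the malioc report above.
arithmetic_cycles = {
    "Mali-T880": (6.33, 3.33),
    "Mali-T820": (18.00, 10.00),
    "Mali-G78": (0.45, 0.27),
    "Mali-G71": (2.00, 1.58),
}


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction, in percent, going from `before` to `after`."""
    return (before - after) / before * 100.0


for gpu, (before, after) in arithmetic_cycles.items():
    print(f"{gpu}: {reduction_percent(before, after):.1f}% fewer arithmetic cycles")
```

For these four entries the reduction ranges from about 21% to about 47%. On the newer GPUs the varying column roughly doubles at the same time (for example 0.12 to 0.25 on Mali-G78), so part of the arithmetic cost moves to another unit rather than disappearing.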

}

/// Flip coordinates if `y_coord_scale` < 0.0.
vec2 IPRemapCoords(vec2 coords, float y_coord_scale) {
Contributor Author

Not sure if this is actually that useful. It decreases arithmetic unit usage but increases varying or load/store unit usage.

Contributor Author

This ends up having more effect on shaders like Gaussian blur that sample N times per fragment.

@chinmaygarde changed the title from "[impeller] improve morphology performance" to "[Impeller] improve morphology performance" on Nov 29, 2022
@jonahwilliams
Contributor Author

We're going to hold off on this until we have better support for shader variants in the engine code.
