[Feature][Sort] Adjust sort resources according to data scale #7056

@featzhang

Description

Currently, the total amount of resources for a Flink Sort job comes from the configuration file flink-sort-plugin.properties, so every submitted sort job uses the same amount of resources. When the data scale is large, the resources are insufficient; when the data scale is small, resources are wasted.

```properties
# Flink parallelism
flink.parallelism=1
```

Therefore, dynamically adjusting resources according to the data scale is an urgently needed feature.
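As a sketch of the configuration side of this idea, the single global setting could be complemented by per-job bounds for adaptive sizing. The `min`/`max` property names below are hypothetical, not existing InLong options:

```properties
# Today: one global setting applied to every sort job
flink.parallelism=1

# Sketch (hypothetical): bounds within which a job's parallelism
# could be chosen automatically from its data scale
#flink.parallelism.min=1
#flink.parallelism.max=16
```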

Resource Adaptive Adjustment

Theoretically, Flink's processing performance can reach about 1,000 records/second/core, though this depends on factors such as the state backend.

Influencing factors:

  • Data scale:
    The primary factor; larger input volumes require proportionally more resources.
  • Storage IO bottleneck:
    When a single client connection to external storage becomes the bottleneck, increasing the parallelism or the number of threads is a good idea.
  • Transformation computational complexity:
    For a fixed LoadNode, this is a deterministic factor.
  • Advanced factors:
    Cores per task manager, parallelism per core, and so on.
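Putting the factors above together, the core of the adjustment can be sketched as a simple estimator: derive the number of cores from the expected input rate and the per-core throughput, then clamp the result to configured bounds. This is a minimal illustration, assuming the ~1,000 records/second/core figure from above; the class and constant names are illustrative, not from the InLong codebase:

```java
// Sketch of data-scale-based parallelism estimation (hypothetical names).
public final class ParallelismEstimator {

    // Assumed per-core throughput, per the estimate in this issue.
    private static final double RECORDS_PER_SECOND_PER_CORE = 1000.0;

    private final int minParallelism;
    private final int maxParallelism;

    public ParallelismEstimator(int minParallelism, int maxParallelism) {
        this.minParallelism = minParallelism;
        this.maxParallelism = maxParallelism;
    }

    /**
     * Estimate parallelism from the expected input rate, clamped to the
     * configured bounds so small jobs stay small and large jobs are capped.
     */
    public int estimate(double recordsPerSecond) {
        int cores = (int) Math.ceil(recordsPerSecond / RECORDS_PER_SECOND_PER_CORE);
        return Math.max(minParallelism, Math.min(maxParallelism, cores));
    }
}
```

In practice the input rate would come from job metadata or historical metrics, and the per-core throughput would itself need tuning for the state backend and transformation complexity discussed above.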

Use case

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
