Skip to content

Conversation

@tusaryan
Copy link
Contributor

@tusaryan tusaryan commented May 1, 2025

Purpose of the pull request

This Pull Request implements a new configuration option max.concurrent.workflow.instances in the master server, allowing administrators to set a limit on the number of concurrently running workflow instances on a single master. This addresses the issue of potential overload when multiple workflows failover to a single master, as described in #17157.

close #17157

Brief change log

  • Feature: Implemented a new configuration property, master.server-load-protection.max-concurrent-workflow-instances, to limit concurrent workflows. The master is marked as busy if the number of running workflows is greater than or equal to this value.

  • Enhancement: Refactored MasterServerLoadProtection to incorporate system resource thresholds based on community feedback. The following configuration properties are now utilized to provide a more robust server protection mechanism:

    • maxSystemCpuUsagePercentageThresholds
    • maxJvmCpuUsagePercentageThresholds
    • maxSystemMemoryUsagePercentageThresholds
    • maxDiskUsagePercentageThresholds
  • Bug Fix: Resolved a NullPointerException that occurred during MasterServer startup by correctly using @Autowired to inject the MasterServerLoadProtection bean into MasterConfig.

  • Refactor:

    • Created MasterServerLoadProtectionConfig to manage the bean's lifecycle and properties via Spring.
    • Replaced the use of WorkflowCacheRepository with the IWorkflowRepository interface for better abstraction and testability.
  • Documentation: Updated the application.yml file and the configuration.md contributor documentation to include all new load protection options with clear descriptions. Example:

    master:
      server-load-protection:
        # Master max concurrent workflow instances. When the count reaches or exceeds this value, the master is marked busy.
        max-concurrent-workflow-instances: 100  #Set your desired limit 

Replace 100 with any integer value you want as the limit.

Verify this pull request

This change is covered by updated unit tests in MasterServerLoadProtectionTest.java to include comprehensive test cases verifying the new functionality for both the workflow instance limit and the system resource thresholds:

  • Workflow Limit Test: A dedicated test (isOverloadWithMaxConcurrentWorkflowInstances) verifies the new limit:
    • When workflow count (e.g., 5) is less than the limit (e.g., 10), the master is not overloaded.
    • When workflow count (e.g., 5) is equal to or greater than the limit (e.g., 5 or 3), the master is overloaded.
  • System Metrics Test: Tests have been updated to confirm that the master is correctly marked as overloaded when any of the system resource thresholds (CPU, memory, disk) are exceeded.

fix issue: [Improvement][Master] Support set max.concurrent.workflow.instances in master #17157

Pull Request Notice

Pull Request Notice

If your pull request contains incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md

@tusaryan tusaryan marked this pull request as draft May 1, 2025 14:50
@tusaryan tusaryan marked this pull request as ready for review May 1, 2025 14:50
@SbloodyS SbloodyS changed the title [Fix-17157][Master] Support setting max.concurrent.workflow.instances [Improvement-17157][Master] Support setting max.concurrent.workflow.instances May 5, 2025
@SbloodyS SbloodyS added the improvement make more easy to user or prompt friendly label May 5, 2025
@SbloodyS SbloodyS added this to the 3.3.1 milestone May 5, 2025
@sonarqubecloud
Copy link

sonarqubecloud bot commented May 5, 2025

Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and it's better to expose the activeWorkflowInstanceCount in MasterHeartBeat.

@tusaryan
Copy link
Contributor Author

tusaryan commented May 7, 2025

Can anyone review the changes again

@tusaryan tusaryan requested a review from ruanwenjun May 7, 2025 21:29
@PostConstruct
public void init() {
MasterServerLoadProtection serverLoadProtection = masterConfig.getServerLoadProtection();
serverLoadProtection.setWorkflowRepository(workflowRepository);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to inject at the constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this resolved? Why mark this as resolved?

@tusaryan tusaryan requested a review from ruanwenjun May 8, 2025 08:53
@nielifeng nielifeng requested a review from Copilot May 9, 2025 09:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new configuration option "max.concurrent.workflow.instances" to limit the number of concurrently running workflow instances on a single master. Key changes include adding a new property in the YAML configuration, updating MasterServerLoadProtection to use IWorkflowRepository for counting workflow instances, and incorporating tests to verify overload behavior.

  • Added test cases in MasterServerLoadProtectionTest.java to validate overload conditions.
  • Updated MasterServerLoadProtection.java to enforce the new concurrent workflow limit.
  • Modified MasterServerLoadProtectionConfig.java to inject the IWorkflowRepository dependency.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
dolphinscheduler-master/src/test/java/org/apache/dolphinscheduler/server/master/config/MasterServerLoadProtectionTest.java Added test cases for the overload behavior with the new configuration option.
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/config/MasterServerLoadProtectionConfig.java Updated to inject IWorkflowRepository into the load protection component.
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/config/MasterServerLoadProtection.java Integrated the new "max.concurrent.workflow.instances" property checking and logging for overload conditions.
Comments suppressed due to low confidence (1)

dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/config/MasterServerLoadProtection.java:52

  • [nitpick] The log message contains a minor typo with 'over then' which could be corrected to 'exceeds' for better clarity.
log.info("OverLoad: the workflow instance count: {} is over then the maxConcurrentWorkflowInstances {}", currentWorkflowInstanceCount, maxConcurrentWorkflowInstances);

@ruanwenjun
Copy link
Member

Please solve the code check style by mvn spotless:apply

@tusaryan
Copy link
Contributor Author

I wanted to let you know that I'm currently traveling. Because of this, I might be a bit delayed in addressing the feedback on this pull request. I will definitely get to it as soon as things settle down for me.

@tusaryan
Copy link
Contributor Author

Please solve the code check style by mvn spotless:apply

Code style issues have been fixed by running mvn spotless:apply. Please let me know if there's anything else I can do!

@ruanwenjun ruanwenjun force-pushed the improvement/17157-master-concurrent-limit branch from 0e05e2b to 8b5361f Compare May 16, 2025 10:24
ruanwenjun
ruanwenjun previously approved these changes May 22, 2025
Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ruanwenjun ruanwenjun force-pushed the improvement/17157-master-concurrent-limit branch from bd4d946 to 4fdac71 Compare May 27, 2025 01:27
@Bean
public MasterServerLoadProtection masterServerLoadProtection(
IWorkflowRepository workflowRepository,
@Value("${master.server-load-protection.max-concurrent-workflow-instances:2147483647}") int maxConcurrentWorkflowInstances) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add this config in doc, and you need to set other fields in MasterServerLoadProtection, e.g. maxSystemCpuUsagePercentageThresholds, maxJvmCpuUsagePercentageThresholds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add this config in doc, and you need to set other fields in MasterServerLoadProtection, e.g. maxSystemCpuUsagePercentageThresholds, maxJvmCpuUsagePercentageThresholds.

Thanks for the feedback! I’ll get to it as soon as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I will update it within 1-2 days.

@ruanwenjun ruanwenjun force-pushed the improvement/17157-master-concurrent-limit branch from 4fdac71 to 3f0ec17 Compare June 9, 2025 01:34
@tusaryan tusaryan requested a review from EricGao888 as a code owner June 17, 2025 15:53
@tusaryan
Copy link
Contributor Author

Please check the failed CI @tusaryan

Okay, I'll check it out.

@tusaryan tusaryan dismissed stale reviews from Gallardot, SbloodyS, and ruanwenjun via 5fa60cd June 18, 2025 15:35
@tusaryan
Copy link
Contributor Author

I've reviewed the logs, and the "Backend-Build" error appears unrelated to my changes. Could someone please help me with this?

tusaryan and others added 15 commits July 2, 2025 09:44
- Refactored MasterServerLoadProtection to require IWorkflowRepository and maxConcurrentWorkflowInstances via constructor injection.
- Updated MasterServerLoadProtectionConfig to provide these dependencies using Spring @bean and @value with a default of 2147483647.
- Removed direct instantiation of MasterServerLoadProtection in MasterConfig to avoid missing constructor arguments.
…onTest

Stub IWorkflowRepository mock in tests to prevent NullPointerException when checking server load.
…terServerLoadProtection

- Updated Spring test configuration in MasterConfigTest to properly autowire MasterServerLoadProtection and its dependencies.
- Added mock bean for IWorkflowRepository in tests.
- Provided test property overrides for load protection thresholds.
@ruanwenjun ruanwenjun force-pushed the improvement/17157-master-concurrent-limit branch from 2ebd911 to 2aedb46 Compare July 2, 2025 01:44
@sonarqubecloud
Copy link

sonarqubecloud bot commented Jul 2, 2025

Please retry analysis of this Pull-Request directly on SonarQube Cloud

Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@SbloodyS SbloodyS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@SbloodyS SbloodyS merged commit 3b1d631 into apache:dev Jul 2, 2025
69 of 70 checks passed
eco8848 pushed a commit to eco8848/dolphinscheduler that referenced this pull request Aug 8, 2025
davidzollo pushed a commit to davidzollo/dolphinscheduler that referenced this pull request Oct 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend document improvement make more easy to user or prompt friendly test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement][Master] Support set max.concurrent.workflow.instances in master

5 participants