[Fix][Spark] Fix source parallelism not working with Spark engine #9319
Conversation
envOption.put(EnvCommonOptions.PARALLELISM.key(), String.valueOf(parallelism));
Dataset<Row> dataset =
        sparkRuntimeEnvironment
                .getSparkSession()
                .read()
                .format(SeaTunnelSource.class.getSimpleName())
                .option(EnvCommonOptions.PARALLELISM.key(), parallelism)
                .option(
Thanks @joexjx. But in SeaTunnel, we recommend configuring parallelism in env rather than in the source configuration. This helps keep configuration consistent across different engines. Maybe you can help update the documentation to make the description clearer.
I understand. However, this code does contain a simple and obvious error: envOption is a HashMap that already includes EnvCommonOptions.PARALLELISM.key(), so the .option(EnvCommonOptions.PARALLELISM.key(), parallelism) call is overwritten by the later .options(envOption) call. That is why the properly resolved parallelism (determined earlier based on whether it comes from env or source) doesn't take effect.
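To illustrate the mechanism outside SeaTunnel: Spark's DataFrameReader collects options in a key/value map, so a later options(map) call replaces an earlier option(key, value) entry that uses the same key. The following standalone sketch (plain Java, with a hypothetical key and values) shows the same precedence:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone illustration (not SeaTunnel code): reader options live in a map,
// so a later putAll/options(map) call wins over an earlier put/option(key, value)
// entry for the same key.
public class OptionPrecedenceSketch {
    public static void main(String[] args) {
        Map<String, String> readerOptions = new HashMap<>();

        // Equivalent of .option(EnvCommonOptions.PARALLELISM.key(), parallelism)
        readerOptions.put("parallelism", "4");

        // envOption already carries the same key with a different value
        Map<String, String> envOption = new HashMap<>();
        envOption.put("parallelism", "1");

        // Equivalent of the later .options(envOption) call: same-key entries win
        readerOptions.putAll(envOption);

        // Prints 1 -- the value set via .option(...) has been overwritten
        System.out.println(readerOptions.get("parallelism"));
    }
}
```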
I removed the .option(EnvCommonOptions.PARALLELISM.key(), ...) call and instead put EnvCommonOptions.PARALLELISM into envOption in advance. This ensures that the parallelism value (already resolved earlier based on whether it should be taken from env or source) takes effect correctly.
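A minimal sketch of that change, following the diff shown above; the surrounding SeaTunnel context and the final .load() call are assumptions, not the verbatim fix:

```java
// Resolve parallelism first (source config if present, otherwise env),
// then carry it inside envOption so the single .options(envOption) call applies it.
envOption.put(EnvCommonOptions.PARALLELISM.key(), String.valueOf(parallelism));

Dataset<Row> dataset =
        sparkRuntimeEnvironment
                .getSparkSession()
                .read()
                .format(SeaTunnelSource.class.getSimpleName())
                // no separate .option(EnvCommonOptions.PARALLELISM.key(), ...) here;
                // the resolved value travels with envOption and is not overwritten
                .options(envOption)
                .load();
```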
Seems like the commit history is a mess. Please rebase on dev.
Hisoka-X
left a comment
Please add a unit test case.
Force-pushed from 6aaaba8 to a27d20c.
@Order(1)
public class SingleSplitSourceParallelismTest {
    @Test
    public void testSourceParallelismIsOneButEnvParallelismIsNotOne()
The current test case is a bit overly complex. We just need to ensure that the Dataset returned by the execute method of SourceExecuteProcessor has the expected parallelism.
SourceExecuteProcessor processor = new SourceExecuteProcessor();
// ... init
Assertions.assertEquals(10, processor.execute(xxx).get(0).getDataset().rdd().getNumPartitions());
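A generic, runnable sketch of that assertion pattern, using a local SparkSession and repartition(10) as a stand-in for SourceExecuteProcessor#execute (the class and method names below are illustrative, not the actual SeaTunnel test):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

// Illustration only: build a Dataset with a known partition count and verify it
// through rdd().getNumPartitions(), the same check suggested for the real test.
public class DatasetParallelismAssertionSketch {

    @Test
    public void datasetExposesExpectedPartitionCount() {
        SparkSession spark =
                SparkSession.builder()
                        .master("local[2]")
                        .appName("parallelism-assertion-sketch")
                        .getOrCreate();
        try {
            // Stand-in for the Dataset produced by SourceExecuteProcessor#execute
            Dataset<Row> dataset = spark.range(100).toDF().repartition(10);
            Assertions.assertEquals(10, dataset.rdd().getNumPartitions());
        } finally {
            spark.stop();
        }
    }
}
```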
@Hisoka-X Thank you for the guidance; I have added a simpler unit test.
Waiting for the test case to pass.
…ng with Spark Engine (apache#9302) This bug was caused by EnvCommonOptions overriding SourceCommonOptions when setting the parallelism in the sparkRuntimeEnvironment.
…sm Overriding Environment Parallelism in Spark Engine.
…lelism Config Works And Overrides Env Config


close: #9302
This bug was caused by EnvCommonOptions overriding SourceCommonOptions when setting the parallelism in the sparkRuntimeEnvironment.
Purpose of this pull request
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide