
Conversation

@karuppayya
Contributor

@karuppayya karuppayya commented Sep 3, 2025

What changes were proposed in this pull request?

We have the ability to clean up shuffle files via spark.sql.classic.shuffleDependency.fileCleanup.enabled.
This change honors that config in the Thrift server and cleans up shuffle files there as well.
Related PR comment here
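
For context, a minimal sketch (assumptions: a local classic session; only the config name comes from the description above) of enabling the existing classic-mode cleanup flag that this PR extends to queries submitted through the Thrift server:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Sketch only: enable the existing classic-mode shuffle cleanup flag.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("shuffle-cleanup-sketch")
  .config("spark.sql.classic.shuffleDependency.fileCleanup.enabled", "true")
  .getOrCreate()

// Run a query that produces a shuffle; with the flag on, its shuffle files
// become eligible for cleanup once the query completes.
spark.range(0, 1000).groupBy(col("id") % 10).count().collect()
```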

Why are the changes needed?

This brings the behavior on par with other modes of SQL execution (classic, connect).

Does this PR introduce any user-facing change?

No

How was this patch tested?

NA

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 3, 2025
@@ -496,7 +496,8 @@ class SparkSession private(
parsedPlan
}
}
Dataset.ofRows(self, plan, tracker)
Dataset.ofRows(self, plan, tracker,
Contributor Author

SparkExecuteStatementOperation's code eventually leads here.

@karuppayya
Contributor Author

cc: @cloud-fan

@HyukjinKwon HyukjinKwon changed the title [SPARK-53469] Ability to cleanup shuffle in Thrift server [SPARK-53469][SQL] Ability to cleanup shuffle in Thrift server Sep 3, 2025
@karuppayya
Contributor Author

@cloud-fan Can you help review this change

@karuppayya
Contributor Author

@cloud-fan Can you please help review this change

@@ -205,6 +206,7 @@ class InjectRuntimeFilterSuite extends QueryTest with SQLTestUtils with SharedSp
sql("analyze table bf5part compute statistics for columns a5, b5, c5, d5, e5, f5")
sql("analyze table bf5filtered compute statistics for columns a5, b5, c5, d5, e5, f5")

conf.setConf(SQLConf.CLASSIC_SHUFFLE_DEPENDENCY_FILE_CLEANUP_ENABLED, false)
Contributor

can we add some comments to explain it?

Contributor Author

Done

@@ -205,6 +206,9 @@ class InjectRuntimeFilterSuite extends QueryTest with SQLTestUtils with SharedSp
sql("analyze table bf5part compute statistics for columns a5, b5, c5, d5, e5, f5")
sql("analyze table bf5filtered compute statistics for columns a5, b5, c5, d5, e5, f5")

// Tests depend on intermediate results that would otherwise be cleaned up when
Contributor

This seems a red light to me. Runtime filter is a very powerful optimization and we should make sure the shuffle cleanup won't break it.

Contributor Author

@karuppayya karuppayya Sep 23, 2025

Excellent catch! Thanks @cloud-fan

I don't believe the root cause is with shuffle file cleanup itself, but rather with how Adaptive Query Execution handles subquery execution/synchronization.

  1. During the codegen phase, FilterExec looks for the subquery result (the bloom filter) but at times doesn't find it, so it skips the Bloom filter optimization.
    The lazy val below gets populated from the subquery execution result, i.e. null if the subquery had not completed, a bloom filter otherwise. This is then used later in codegen:
  // The bloom filter created from `bloomFilterExpression`.
  @transient private lazy val bloomFilter = {
    val bytes = bloomFilterExpression.eval().asInstanceOf[Array[Byte]]
    if (bytes == null) null else deserialize(bytes)
  }

  2. The main query finishes execution while the subquery is still running in the background (in a separate execution context).
  3. As part of query completion, shuffle cleanup removes all shuffle files, including those needed by the still-running subquery. (The subquery result is also no longer needed since the main query has completed; the bug is that the main query never used the bloom filter.)
  4. The subquery execution (which started earlier) fails with a FetchFailedException when it tries to access the cleaned-up shuffle data.

This suite only verifies the logical plan for the presence of BloomFilterAggregate and does not verify whether the executed code actually used Bloom-filter-based filtering.

This can be easily reproduced by running this suite. (It's not consistent and depends on when the subquery completes, but at least one test should fail and cause a ripple that fails subsequent tests, since the SparkContext gets stopped.)

Contributor Author

@karuppayya karuppayya Sep 24, 2025

I added loggers in this commit to prove and verify that it's a bug.

Output

karuppayyar: suite run 1 start
karuppayyar: subquery started 24
karuppayyar: query ended 24
karuppayyar: removing shuffle 6
karuppayyar: suite run 1 end

karuppayyar: suite run 2 start
karuppayyar: subquery started 25
karuppayyar: subquery ended 24
karuppayyar: query ended 25
karuppayyar: removing shuffle 8,9
karuppayyar: suite run 2 end

karuppayyar: suite run 3 start
17:32:07.521 ERROR org.apache.spark.storage.ShuffleBlockFetcherIterator: Failed to create input stream from local block
java.io.IOException: Error in reading FileSegmentManagedBuffer[file=/private/var/folders/tn/62m7jt2j2b7116x0q6wtzg0c0000gn/T/blockmgr-72dd6798-f43d-48a7-8d4c-0a9c44ba09a9/35/shuffle_8_38_0.data,offset=0,length=5195]
	at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:110)

Ideally it should look like this (i.e. with adaptive disabled): main query starts -> subqueries execute and complete -> main query executes and completes.

karuppayyar: suite run 1 start
karuppayyar: subquery started 24
karuppayyar: subquery ended 24
karuppayyar: query ended 24
karuppayyar: removing shuffle 7,8
karuppayyar: suite run 1 end

Every subquery should end before the main query ends.
You can see that the subquery execution does not complete before the main query ends, so the subquery result is never used.

The side effect of removing shuffle files is that when the main query completes, it removes the shuffle of the subquery (which has not completed, and whose result is no longer useful), and the subquery execution fails with a FetchFailure like the one above when it tries to run to completion. This is what surfaced the issue.

I am not sure if this is the case with all subqueries (it looks like it), but this could result in correctness issues. cc: @dongjoon-hyun too.

@cloud-fan @dongjoon-hyun Do you think it's a bug (in which case I can attempt a fix), or am I missing something here?

Contributor

If the subquery result is no longer needed, we can swallow any error from it?

Contributor Author

@karuppayya karuppayya Sep 24, 2025

I think there is an issue with Inject Runtime Filters and Adaptive.
The subquery should populate the bloom filter before the actual query runs.
But when adaptive is enabled, the query doesn't wait for the subquery results, which is the actual issue.
(This is not related to this PR itself but a completely different issue IMO. However, this PR cannot be merged before the subquery issue is fixed.)

Member

+1.

Contributor

Sorry, I didn't completely follow the conclusion. Spark local mode is not a testing-only mode, as users can run Spark locally as a single-node engine to do real work. Can we fix this issue?

Member

Makes sense. I'll open a fix later.

Member

Opened the fix #52606. PTAL. @cloud-fan @karuppayya

Contributor Author

@karuppayya karuppayya Oct 14, 2025

Thanks @Ngone51. I did a pass over the PR.
I also verified the change with InjectRuntimeFilterSuite and reverted my test data changes. (I will retrigger the tests once the changes are merged.)

@karuppayya karuppayya force-pushed the SPARK-53469 branch 5 times, most recently from 687e70a to 23176ed on September 30, 2025 04:19
@karuppayya
Contributor Author

@cloud-fan can you please take a look.

@karuppayya
Contributor Author

cc: @somani (long time!) since this touches the runtime filter tests.
tl;dr: fixing the test data to return at least one row so that all the tests run. Even without the fix the queries run, but the subquery keeps running behind the scenes, even after the main query completes.

@cloud-fan
Contributor

Just back from the holiday. @karuppayya can you take a look at #52213 (comment)?

cloud-fan added a commit that referenced this pull request Nov 11, 2025
…d task completion triggered by eager shuffle cleanup

### What changes were proposed in this pull request?

This PR proposes to explicitly handle the `SparkException` thrown by shuffle status operations on a non-existent shuffle ID, to avoid crashing the `SparkContext`.

### Why are the changes needed?

When the main query completes, we clean up its shuffle statuses and data files. If a subquery is still ongoing before it gets completely cancelled, the subquery can throw a `SparkException` from the `DAGScheduler` due to operations (e.g., `MapOutputTrackerMaster.registerMapOutput()`) on the non-existent shuffle ID, and this unexpected exception can crash the `SparkContext`. See the detailed discussion at #52213 (comment).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52606 from Ngone51/fix-local-shuffle-cleanup.

Lead-authored-by: Yi Wu <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan added a commit that referenced this pull request Nov 11, 2025
…d task completion triggered by eager shuffle cleanup
(cherry picked from commit 9a37c3d)
Signed-off-by: Wenchen Fan <[email protected]>
@karuppayya
Contributor Author

@Ngone51 @cloud-fan I rebased this PR with the fix. Please take a look.

@cloud-fan
Contributor

@karuppayya can you fix the merge conflicts?

@karuppayya
Contributor Author

@cloud-fan I have rebased my code and fixed the conflicts. Please take a look when you get a chance.


setupTestData()

protected override def beforeAll(): Unit = {
Contributor

nit: we can override sparkConf with super.sparkConf.set(CLASSIC_SHUFFLE_DEPENDENCY_FILE_CLEANUP_ENABLED, false) to have a special config for this test suite
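
For illustration, a minimal sketch of that suggestion, assuming a suite that mixes in SharedSparkSession; the class name below is hypothetical and this is not the exact suite code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession

// Hypothetical suite name; shows only the config-override pattern.
class InjectRuntimeFilterNoCleanupSuite extends QueryTest with SharedSparkSession {
  // Set the flag before the shared session is created so SQLConf picks it up.
  override protected def sparkConf: SparkConf =
    super.sparkConf
      .set(SQLConf.CLASSIC_SHUFFLE_DEPENDENCY_FILE_CLEANUP_ENABLED.key, "false")
}
```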

Contributor

BTW, do we have a concrete example of why this test suite can't clean up shuffle files?

Contributor Author

This emanates from the following method, org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite#checkNumLocalShuffleReads:

  private def checkNumLocalShuffleReads(
      plan: SparkPlan, numShufflesWithoutLocalRead: Int = 0): Unit = {
    val numShuffles = collect(plan) {
      case s: ShuffleQueryStageExec => s
    }.length

    val numLocalReads = collect(plan) {
      case read: AQEShuffleReadExec if read.isLocalRead => read
    }
    numLocalReads.foreach { r =>
      val rdd = r.execute()
      val parts = rdd.partitions
      assert(parts.forall(rdd.preferredLocations(_).nonEmpty))
    }
    assert(numShuffles === (numLocalReads.length + numShufflesWithoutLocalRead))
  }

Specifically, rdd.preferredLocations(_) will be empty after the cleanup (i.e. after collect() executes), so the nonEmpty assertion fails.
When shuffle cleanup is enabled, this will always be the case.

As for a concrete example, almost all tests in this suite use this method and fail at that assertion.
(It's actually a race between when the shuffle cleanup happens and when the assertion executes.)

scala.Predef.refArrayOps[org.apache.spark.Partition](parts).forall(((x$1: org.apache.spark.Partition) => rdd.preferredLocations(x$1).nonEmpty)) was false
ScalaTestFailureLocation: org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite at (AdaptiveQueryExecSuite.scala:221)
org.scalatest.exceptions.TestFailedException: scala.Predef.refArrayOps[org.apache.spark.Partition](parts).forall(((x$1: org.apache.spark.Partition) => rdd.preferredLocations(x$1).nonEmpty)) was false
	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
	at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
	at org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite.$anonfun$checkNumLocalShuffleReads$1(AdaptiveQueryExecSuite.scala:221)
	at scala.collection.immutable.List.foreach(List.scala:323)
	at org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite.checkNumLocalShuffleReads(AdaptiveQueryExecSuite.scala:218)

Contributor Author

we can override sparkConf

I tried setting it on SparkConf, but it doesn't seem to take effect.

I guess SparkConf is read when creating the SparkSession (and SQLConf), and setting it on the SparkConf later is ineffective for SQL execution (since it looks at SQLConf). Let me know if I am missing something.
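
A rough illustration of that point (a sketch under assumptions, not the suite code): mutating the original SparkConf after the session exists is not seen by SQL execution, while the session's runtime conf is.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setMaster("local[2]").setAppName("conf-demo")
val spark = SparkSession.builder().config(conf).getOrCreate()

// Too late: the session copied the SparkConf contents at creation time,
// so this mutation is not visible to SQL execution.
conf.set("spark.sql.classic.shuffleDependency.fileCleanup.enabled", "false")

// Effective: goes through the session's runtime conf, which backs SQLConf.
spark.conf.set("spark.sql.classic.shuffleDependency.fileCleanup.enabled", "false")
```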

Contributor

Contributor Author

Thanks for the pointer. Fixed
(I was setting it inside beforeAll earlier)

@karuppayya karuppayya requested a review from cloud-fan November 19, 2025 07:06
@cloud-fan
Contributor

@karuppayya On second thought, I think we should add a new config for the thriftserver to enable shuffle cleanup. Classic is special in that DataFrame reuse is likely to happen, but the thriftserver is like Spark Connect and can safely enable shuffle cleanup.

@karuppayya
Contributor Author

but thriftserver is like Spark Connect and can safely enable shuffle cleanup

I will add the new configuration.
With this, we'll have three configs covering native, Connect, and Thrift modes (and another which falls back to Connect's).
Since both Spark Connect and Thriftserver share the same rationale for shuffle cleanup, and because an application cannot utilize both modes simultaneously, it seems logical to consolidate them into a single configuration?
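
For reference, a purely hypothetical sketch of what a Thrift-server-specific flag could look like, modeled on Spark's internal ConfigBuilder pattern (which is private[spark], so this only compiles inside Spark's own codebase); the config key and object name below are assumptions, not what this PR actually adds:

```scala
import org.apache.spark.internal.config.ConfigBuilder

object HiveThriftServerConfSketch {
  // Hypothetical key; the real config added by this PR may be named differently.
  val THRIFT_SERVER_SHUFFLE_CLEANUP_ENABLED =
    ConfigBuilder("spark.sql.thriftServer.shuffleDependency.fileCleanup.enabled")
      .doc("When true, eagerly clean up shuffle files of queries submitted " +
        "through the Thrift server once each query completes.")
      .booleanConf
      .createWithDefault(false)
}
```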

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
…d task completion triggered by eager shuffle cleanup
@cloud-fan
Contributor

Given Spark Connect JDBC is being added, I think the thriftserver will be deprecated eventually. I'd prefer to add a new config for the thriftserver, to keep the Spark Connect config name simpler.

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…d task completion triggered by eager shuffle cleanup
@karuppayya karuppayya force-pushed the SPARK-53469 branch 3 times, most recently from 962eb0f to d92cb57 on November 26, 2025 01:42
Remove comment
@karuppayya
Contributor Author

@cloud-fan Added a new config for thrift-server. Ready for review.

@karuppayya
Contributor Author

Thanks @cloud-fan for the review.
Thanks @Ngone51 for fixing the issues.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 8940dad Nov 27, 2025