Commit ecb48cc

[SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs
### What changes were proposed in this pull request?

This PR fixes the same issue as #32424

```r
df <- sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
collect(select(
  df,
  array_transform("numbers", function(number) {
    array_transform("letters", function(latter) {
      struct(alias(number, "n"), alias(latter, "l"))
    })
  })
))
```

**Before:**

```
... a, a, b, b, c, c, a, a, b, b, c, c, a, a, b, b, c, c
```

**After:**

```
... 1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b, 3, c
```

### Why are the changes needed?

To produce the correct results.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the results to be correct as mentioned above.

### How was this patch tested?

Manually tested as above, and unit test was added.

Closes #32517 from HyukjinKwon/SPARK-35381.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent 7e3446a commit ecb48cc

2 files changed: +20 -1 lines changed

R/pkg/R/functions.R

Lines changed: 6 additions & 1 deletion
@@ -3670,7 +3670,12 @@ unresolved_named_lambda_var <- function(...) {
     "org.apache.spark.sql.Column",
     newJObject(
       "org.apache.spark.sql.catalyst.expressions.UnresolvedNamedLambdaVariable",
-      list(...)
+      lapply(list(...), function(x) {
+        handledCallJStatic(
+          "org.apache.spark.sql.catalyst.expressions.UnresolvedNamedLambdaVariable",
+          "freshVarName",
+          x)
+      })
     )
   )
   column(jc)
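
The change above routes every lambda variable name through `freshVarName` before the variable is constructed. The following base-R sketch (no Spark required) illustrates why that matters for nested lambdas. It assumes that, before the fix, the outer and inner lambda variables ended up with the same name, so the inner binding shadowed the outer one, and that `freshVarName` returns a unique name derived from its input. The helpers `make_fresh_namer`, `resolve`, and `nested_pairs` are hypothetical and exist only for this illustration; they are not part of SparkR.

```r
# Hypothetical stand-in for a fresh-name generator: appends a running counter
# to the requested name so that no two generated names are equal.
make_fresh_namer <- function() {
  counter <- 0L
  function(name) {
    id <- counter
    counter <<- counter + 1L
    paste0(name, "_", id)
  }
}

# Resolve a variable name against a stack of lambda scopes, innermost first.
resolve <- function(name, scopes) {
  for (scope in rev(scopes)) {
    if (!is.null(scope[[name]])) return(scope[[name]])
  }
  stop("unresolved lambda variable: ", name)
}

# Pair every number with every letter, the way the nested array_transform
# calls do. `namer` decides how the two lambda variables are named.
nested_pairs <- function(numbers, letters_, namer) {
  outer_var <- namer("x")
  inner_var <- namer("x")
  out <- character(0)
  for (n in numbers) {
    outer_scope <- setNames(list(n), outer_var)
    for (l in letters_) {
      inner_scope <- setNames(list(l), inner_var)
      scopes <- list(outer_scope, inner_scope)
      out <- c(out, paste0(resolve(outer_var, scopes), resolve(inner_var, scopes)))
    }
  }
  out
}

# Same name for both variables: the inner binding shadows the outer one.
clashing <- nested_pairs(1:3, c("a", "b", "c"), identity)
print(clashing)  # "aa" "bb" "cc" "aa" "bb" "cc" "aa" "bb" "cc"

# Unique names: each variable resolves to its own binding.
fresh <- nested_pairs(1:3, c("a", "b", "c"), make_fresh_namer())
print(fresh)     # "1a" "1b" "1c" "2a" "2b" "2c" "3a" "3b" "3c"
```

The clashing run mirrors the "Before" output in the commit message, and the fresh-name run mirrors the "After" output.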

R/pkg/tests/fulltests/test_sparkSQL.R

Lines changed: 14 additions & 0 deletions
@@ -2161,6 +2161,20 @@ test_that("higher order functions", {
   expect_error(array_transform("xs", function(...) 42))
 })
 
+test_that("SPARK-34794: lambda vars must be resolved properly in nested higher order functions", {
+  df <- sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
+  ret <- first(select(
+    df,
+    array_transform("numbers", function(number) {
+      array_transform("letters", function(latter) {
+        struct(alias(number, "n"), alias(latter, "l"))
+      })
+    })
+  ))
+
+  expect_equal(1, ret[[1]][[1]][[1]][[1]]$n)
+})
+
 test_that("group by, agg functions", {
   df <- read.json(jsonPath)
   df1 <- agg(df, name = "max", age = "sum")
