
Commit 385f1e8

[SPARK-34768][SQL] Respect the default input buffer size in Univocity
### What changes were proposed in this pull request?

This PR proposes to follow Univocity's input buffer.

### Why are the changes needed?

- Firstly, it's best to trust their judgement on the default values. Also 128 is too low.
- Default values arguably have more test coverage in Univocity.
- It will also fix uniVocity/univocity-parsers#449
- ^ is a regression compared to Spark 2.4

### Does this PR introduce _any_ user-facing change?

No. In addition, It fixes a regression.

### How was this patch tested?

Manually tested, and added a unit test.

Closes apache#31858 from HyukjinKwon/SPARK-34768.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
1 parent 1a4971d commit 385f1e8

File tree

2 files changed: +11 -3 lines changed
  • sql
    • catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv
    • core/src/test/scala/org/apache/spark/sql/execution/datasources/csv


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala

Lines changed: 0 additions & 3 deletions
@@ -166,8 +166,6 @@ class CSVOptions(

   val quoteAll = getBool("quoteAll", false)

-  val inputBufferSize = 128
-
   /**
    * The max error content length in CSV parser/writer exception message.
    */
@@ -259,7 +257,6 @@ class CSVOptions(
     settings.setIgnoreLeadingWhitespaces(ignoreLeadingWhiteSpaceInRead)
     settings.setIgnoreTrailingWhitespaces(ignoreTrailingWhiteSpaceInRead)
     settings.setReadInputOnSeparateThread(false)
-    settings.setInputBufferSize(inputBufferSize)
     settings.setMaxColumns(maxColumns)
     settings.setNullValue(nullValue)
     settings.setEmptyValue(emptyValueInRead)

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala

Lines changed: 11 additions & 0 deletions
@@ -2452,6 +2452,17 @@ abstract class CSVSuite
       assert(result.sameElements(exceptResults))
     }
   }
+
+  test("SPARK-34768: counting a long record with ignoreTrailingWhiteSpace set to true") {
+    val bufSize = 128
+    val line = "X" * (bufSize - 1) + "| |"
+    withTempPath { path =>
+      Seq(line).toDF.write.text(path.getAbsolutePath)
+      assert(spark.read.format("csv")
+        .option("delimiter", "|")
+        .option("ignoreTrailingWhiteSpace", "true").load(path.getAbsolutePath).count() == 1)
+    }
+  }
 }

 class CSVv1Suite extends CSVSuite {
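
The new test builds its record from the old hard-coded buffer size of 128, presumably so that the first delimiter sits at the last position of a 128-character buffer and the whitespace-only field falls just past it. The sketch below only inspects the shape of that record using the Scala standard library. It does not call Spark or Univocity, and the object name `LongRecordSketch` is hypothetical, not part of the patch.

```scala
object LongRecordSketch {
  // Same values as in the SPARK-34768 test above.
  val bufSize = 128
  val line: String = "X" * (bufSize - 1) + "| |"

  // Split on the '|' delimiter; a limit of -1 keeps trailing empty fields.
  val fields: Array[String] = line.split("\\|", -1)

  def main(args: Array[String]): Unit = {
    println(s"record length         = ${line.length}")
    println(s"first delimiter index = ${line.indexOf('|')}")
    println(s"number of fields      = ${fields.length}")
    println(s"second field is blank = ${fields(1).trim.isEmpty}")
  }
}
```

The record is 130 characters long, with the first `|` at index 127 (the 128th character), followed by a field holding a single space and a final empty field. The actual test asserts only that Spark counts this as one row when `ignoreTrailingWhiteSpace` is enabled.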
