Skip to content

chore: Create simple fuzz test as part of test suite#1610

Merged
andygrove merged 17 commits intoapache:mainfrom
andygrove:fuzz-complex-types
Apr 8, 2025
Merged

chore: Create simple fuzz test as part of test suite#1610
andygrove merged 17 commits intoapache:mainfrom
andygrove:fuzz-complex-types

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Apr 4, 2025

Which issue does this PR close?

Part of #1373
Closes #1289
Closes #1613
Closes #1611 (actually, I can no longer reproduce this issue)

Rationale for this change

This is a starting point for fuzz testing as part of our standard test suite.

What changes are included in this PR?

How are these changes tested?

_: BigIntVector | _: Float4Vector | _: Float8Vector | _: VarCharVector |
_: DecimalVector | _: DateDayVector | _: TimeStampMicroTZVector | _: VarBinaryVector |
_: FixedSizeBinaryVector | _: TimeStampMicroVector | _: StructVector) =>
_: FixedSizeBinaryVector | _: TimeStampMicroVector | _: StructVector | _: ListVector) =>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix for #1289

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, need to include MapType as part of #1603

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add MapVector here as well without any ill effects.

@andygrove andygrove marked this pull request as ready for review April 4, 2025 21:54
@codecov-commenter
Copy link

codecov-commenter commented Apr 4, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.97%. Comparing base (f09f8af) to head (bf25dcb).
⚠️ Report is 834 commits behind head on main.

Files with missing lines Patch % Lines
...la/org/apache/comet/testing/ParquetGenerator.scala 85.71% 0 Missing and 2 partials ⚠️
.../scala/org/apache/spark/sql/comet/util/Utils.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1610      +/-   ##
============================================
+ Coverage     56.12%   58.97%   +2.84%     
- Complexity      976     1074      +98     
============================================
  Files           119      125       +6     
  Lines         11743    12522     +779     
  Branches       2251     2339      +88     
============================================
+ Hits           6591     7385     +794     
+ Misses         4012     3971      -41     
- Partials       1140     1166      +26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @andygrove

Make sure `SPARK_HOME` points to the same Spark version as Comet was built for.

```console
```shell
Copy link
Member Author

@andygrove andygrove Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes an issue where previously it was not possible to copy and paste the $ in $SPARK_HOME when copying from the published documentation.

Comment on lines +2462 to +2465
if (groupingExprs.exists(_.isEmpty)) {
withInfo(op, "Not all grouping expressions are supported")
return None
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes #1613

@andygrove
Copy link
Member Author

@parthchandra @mbutrovich could you review?

Range(0, numRows).map(_ => new java.sql.Date(1716645600011L + r.nextInt()))
case DataTypes.TimestampType =>
Range(0, numRows).map(_ => new Timestamp(1716645600011L + r.nextInt()))
case DataTypes.TimestampNTZType =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what this magic number is? If so could we define it as a variable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. It was approximately the current date when I worked on this code. I added a variable with comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you could just use SimpleDateFormat("YYYY-MM-DD hh:mm:ss").parse("1980-12-08 23:15:00") ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

val random = new Random(42)
withSQLConf(
CometConf.COMET_ENABLED.key -> "false",
SQLConf.SESSION_LOCAL_TIMEZONE.key -> "Asia/Kathmandu" /* UTC+5:45 */ ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document that his is a non-arbitrary timezone choice. I know @parthchandra said it's a tricky timezone due to the unique offset from UTC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also makes a great trivia question :)

Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Non binding comments.

Range(0, numRows).map(_ => new java.sql.Date(1716645600011L + r.nextInt()))
case DataTypes.TimestampType =>
Range(0, numRows).map(_ => new Timestamp(1716645600011L + r.nextInt()))
case DataTypes.TimestampNTZType =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you could just use SimpleDateFormat("YYYY-MM-DD hh:mm:ss").parse("1980-12-08 23:15:00") ?

_: BigIntVector | _: Float4Vector | _: Float8Vector | _: VarCharVector |
_: DecimalVector | _: DateDayVector | _: TimeStampMicroTZVector | _: VarBinaryVector |
_: FixedSizeBinaryVector | _: TimeStampMicroVector | _: StructVector) =>
_: FixedSizeBinaryVector | _: TimeStampMicroVector | _: StructVector | _: ListVector) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add MapVector here as well without any ill effects.

val random = new Random(42)
withSQLConf(
CometConf.COMET_ENABLED.key -> "false",
SQLConf.SESSION_LOCAL_TIMEZONE.key -> "Asia/Kathmandu" /* UTC+5:45 */ ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also makes a great trivia question :)

@andygrove andygrove merged commit 4b577f8 into apache:main Apr 8, 2025
78 checks passed
@andygrove andygrove deleted the fuzz-complex-types branch April 8, 2025 12:43
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

5 participants