Skip to content

fix: hash join tests with forced collisions#11806

Merged
alamb merged 4 commits intoapache:mainfrom
korowa:hash-collisions-tests
Aug 6, 2024
Merged

fix: hash join tests with forced collisions#11806
alamb merged 4 commits intoapache:mainfrom
korowa:hash-collisions-tests

Conversation

@korowa
Copy link
Contributor

@korowa korowa commented Aug 4, 2024

Which issue does this PR close?

Closes #11658.

Rationale for this change

Some tests in hash_join.rs check the number of output batches, and since output batch for HashJoinExec is created while matching join keys against hash table built on build-side data (which includes only matching by hashes, comparing actual values is a next step in join execution), these tests are failing with forced collisions (due to equal hashes, all records are considered as matching).

This PR recovers hash join tests by adjusting the numbers of expected batches in case test is running with force_hash_collisions feature.

What changes are included in this PR?

Different numbers of expected output batches for force_hash_collisions feature in hash join tests.

Are these changes tested?

Yes

Are there any user-facing changes?

No

// With hash collisions enabled, all records will match each other
// and filtered later.
// Expected number of matches = 9
div_ceil(9, batch_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering will it be 9 for all batch sizes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since create_hashes always sets 0 as hash values, with force-collisions feature enabled, the total number of hash table matches will be equal to left.row_count() * right.row_count() for each batch size. The result of the output batch count calculation depends on the batch size though, and it's based on the number of matches.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @korowa

@alamb alamb merged commit 4e278ca into apache:main Aug 6, 2024
@alamb
Copy link
Contributor

alamb commented Aug 6, 2024

Thanks again @korowa and @comphead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

join_inner_two, join_inner_one_two_parts_left and join_inner_one_two_parts_right fail with force_hash_collisions

3 participants