Skip to content

Conversation

@jpountz
Copy link
Contributor

@jpountz jpountz commented May 26, 2025

This test generates random boolean queries and ensures that setting a minimum number of matching SHOULD clauses returns a subset of the hits with the same scores.

It already tries to work around accuracy loss due to arithmetic operations by allowing a delta of up to one ulp between these two queries. However, sometimes the delta can be higher.

For instance consider the following query that triggered the most recent test failure: (data:5 data:5 data:5 data:6 +data:6 data:Z data:X -data:1)~2. Without a minimum number of matching SHOULD clauses, it gets rewritten to (data:5^3 +data:6^2 data:Z data:X -data:1). So the score contribution of data:5 is computed as (double) score(data:5) + (double) score(data:5) + (double) score(data:5) in one case, and (double) (score(data:5: * 3f) (multiply first, then cast to a double) in the other case. The use of ReqOptSumScorer also contributes accuracy losses as per existing comment, for instance data:6 is part of both the required and the optional clauses in the first case, while it's only a required clauses (with a 2x boost) in the other case. So accuracy loss accrues differently.

I don't think we should try too hard to avoid these accuracy losses, so I'm instead increasing the leniency of the test.

This test generates random boolean queries and ensures that setting a minimum
number of matching SHOULD clauses returns a subset of the hits with the same
scores.

It already tries to work around accuracy loss due to arithmetic operations by
allowing a delta of up to one ulp between these two queries. However, sometimes
the delta can be higher.

For instance consider the following query that triggered the most recent test
failure: `(data:5 data:5 data:5 data:6 +data:6 data:Z data:X -data:1)~2`.
Without a minimum number of matching SHOULD clauses, it gets rewritten to
`(data:5^3 +data:6^2 data:Z data:X -data:1)`. So the score contribution of
`data:5` is computed as `(double) score(data:5) + (double) score(data:5) +
(double) score(data:5)` in one case, and `(double) (score(data:5: * 3f)`
(multiply first, then cast to a double) in the other case. The use of
`ReqOptSumScorer` also contributes accuracy losses as per existing comment, for
instance `data:6` is part of both the required and the optional clauses in the
first case, while it's only a required clauses (with a 2x boost) in the other
case. So accuracy loss accrues differently.

I don't think we should try too hard to avoid these accuracy losses, so I'm
instead increasing the leniency of the test.
@github-actions
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@jpountz jpountz merged commit 50b4363 into apache:main May 31, 2025
7 checks passed
@jpountz jpountz deleted the fix_TestBooleanMinShouldMatch_failure branch May 31, 2025 06:35
jpountz added a commit that referenced this pull request May 31, 2025
This test generates random boolean queries and ensures that setting a minimum
number of matching SHOULD clauses returns a subset of the hits with the same
scores.

It already tries to work around accuracy loss due to arithmetic operations by
allowing a delta of up to one ulp between these two queries. However, sometimes
the delta can be higher.

For instance consider the following query that triggered the most recent test
failure: `(data:5 data:5 data:5 data:6 +data:6 data:Z data:X -data:1)~2`.
Without a minimum number of matching SHOULD clauses, it gets rewritten to
`(data:5^3 +data:6^2 data:Z data:X -data:1)`. So the score contribution of
`data:5` is computed as `(double) score(data:5) + (double) score(data:5) +
(double) score(data:5)` in one case, and `(double) (score(data:5: * 3f)`
(multiply first, then cast to a double) in the other case. The use of
`ReqOptSumScorer` also contributes accuracy losses as per existing comment, for
instance `data:6` is part of both the required and the optional clauses in the
first case, while it's only a required clauses (with a 2x boost) in the other
case. So accuracy loss accrues differently.

I don't think we should try too hard to avoid these accuracy losses, so I'm
instead increasing the leniency of the test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants