Bug-fix in Filter and Limit statistics #8094
alamb merged 5 commits into apache:main from synnada-ai:bugfix/statistics
alamb
left a comment
Given the subtlety that may be involved here, I think we should have tests for these changes. Given that no existing test breaks, that suggests to me that this code isn't sufficiently covered 🤔
I have added tests covering the 1st and 3rd cases, but I can't find a test suite for the 2nd case in which I can demonstrate the need for the fix. Do you have any advice on how to do that easily?
alamb
left a comment
Thank you @berkaysynnada -- I think this is better for sure. I hope to keep improving test coverage as part of #8078
// currently ignores tables that have no statistics regarding the
// number of rows.
if num_rows.get_value().unwrap_or(&usize::MIN) <= &limit.unwrap_or(usize::MAX) {
let conservative_num_rows = match num_rows {
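The quoted condition relates to the row-count check behind the 2nd fix. What follows is only a minimal sketch of that idea, not the merged code: `Precision` here is a local stand-in with `Exact`, `Inexact` and `Absent` variants (an assumption about the shape of the real type), and `should_keep_processing` is a hypothetical helper name. An inexact or absent row count is treated as the smallest possible value, so only an exact count can end processing early.

```rust
/// Local stand-in for a precision-tagged statistic; NOT DataFusion's own type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Hypothetical helper: returns true while more input should still be processed.
#[allow(dead_code)]
fn should_keep_processing(num_rows: &Precision<usize>, limit: Option<usize>) -> bool {
    // Only an exact count is trusted. Anything else falls back to the
    // smallest possible value, which can never exceed the limit.
    let conservative_num_rows = match num_rows {
        Precision::Exact(n) => *n,
        _ => usize::MIN,
    };
    // With no limit at all, every count is within bounds.
    conservative_num_rows <= limit.unwrap_or(usize::MAX)
}
```

Under these assumptions, `should_keep_processing(&Precision::Inexact(1_000), Some(10))` is true, because an estimate is not allowed to stop processing, while `should_keep_processing(&Precision::Exact(1_000), Some(10))` is false.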
Which issue does this PR close?
Related to #8078.
Rationale for this change
After enum Precision was introduced in DF, some bugs were discovered. This PR resolves 3 bugs, including one involving ColumnStatistics.
What changes are included in this PR?
1st fix: A column is labeled as a singleton only if its min and max values are both exact and equal to each other.
2nd fix: Only an exact row count is used to decide whether processing can stop. With an inexact count we keep processing, at least until range estimation is implemented for Precision.
3rd fix: During the analysis in filter statistics, if a column is filtered by equality with a constant value (e.g. c=1), we set its min and max values to that constant and mark them as exact.
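The 1st and 3rd fixes can be illustrated with a small sketch. It is not the PR's code: `Precision` is again a local stand-in with assumed `Exact`, `Inexact` and `Absent` variants, and `ColumnBounds`, `is_singleton` and `after_eq_filter` are hypothetical names chosen for illustration.

```rust
/// Local stand-in for a precision-tagged statistic; NOT DataFusion's own type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Hypothetical min/max statistics for a single integer column.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ColumnBounds {
    min: Precision<i64>,
    max: Precision<i64>,
}

#[allow(dead_code)]
impl ColumnBounds {
    /// 1st fix: a column is a singleton only when both bounds are exact and equal.
    fn is_singleton(&self) -> bool {
        match (&self.min, &self.max) {
            (Precision::Exact(min), Precision::Exact(max)) => min == max,
            // Equal but inexact (or missing) bounds prove nothing.
            _ => false,
        }
    }

    /// 3rd fix: after an equality filter such as `c = value`, every
    /// surviving row holds `value`, so both bounds are known exactly.
    fn after_eq_filter(value: i64) -> Self {
        ColumnBounds {
            min: Precision::Exact(value),
            max: Precision::Exact(value),
        }
    }
}
```

In this sketch, bounds of `Inexact(1)` and `Inexact(1)` do not count as a singleton, whereas `ColumnBounds::after_eq_filter(1)` does, which shows how the two fixes interact.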
Are these changes tested?
Are there any user-facing changes?