Resolve errors with time to convert bins#5283
Conversation
| step_runs AS ( | ||
| {steps_per_person_query} | ||
| SELECT * FROM ( | ||
| {steps_per_person_query} |
There was a problem hiding this comment.
Q: Why not push the predicates in here?
There was a problem hiding this comment.
We aren't doing this at a lower level (in, say, the original funnel class) because the other things work fine, and the following safeguards in those places (increasing ordering of event times) ensure we don't run into this issue.
Good question! I see good arguments for either, but opted for "where the problem shows up", as it also keeps things in one place. Doing it in steps_per_person_query would also mean doing it in all types of funnel orderings.
| steps_average_conversion_time_expression_sum = " + ".join(steps_average_conversion_time_identifiers) | ||
|
|
||
| steps_average_conditional_for_invalid_values = [ | ||
| f"{identifier} >= 0" for identifier in steps_average_conversion_time_identifiers |
There was a problem hiding this comment.
Are NULLs an issue, should we add a NOT NULL check before >= 0?
There was a problem hiding this comment.
NULL values ought to be removed, and this check implies NULL values won't make it through, either!
This is kinda tested by existing tests: test_auto_bin_count_total - step_1 time is > 0, while step_2 is NULL, and as expected, it's removed from consideration.
| steps_average_conditional_for_invalid_values = [ | ||
| f"{identifier} >= 0" for identifier in steps_average_conversion_time_identifiers | ||
| ] | ||
| # this is protection against the CH bug: https://github.com/ClickHouse/ClickHouse/issues/26580 |
There was a problem hiding this comment.
Nit: Suggestion on comment style:
# :HACK: Protect against CH bug https://github.com/ClickHouse/ClickHouse/issues/26580
# once the issue is resolved, stop skipping the test: test_auto_bin_count_single_step_duplicate_events
# and remove this comment
I use the following meta-comments: :TRICKY: :TODO:. They are easier to grep for and convey reader should look out clearer than free-form text.
Changes
Fixes #5116 (patch) and no. 6: #5249 (comment)
For now, we discard all negative and null values in time to convert analysis.
We aren't doing this at a lower level (in, say, the original funnel class) because the other things work fine, and the original safeguards in those places (increasing ordering of event times) ensure we don't run into this issue.
It's hard to write a deterministic test for this, since the issue is intermittent. And if I un-skip the new test now, it will be terribly flakey. I don't want to delete this test, because coming back to it after ages involves re-loading all the context and trying to write a proper test for it.
Checklist