Enable person breakdowns querying for all ordering funnels#5043
Enable person breakdowns querying for all ordering funnels#5043
Conversation
| event="buy", | ||
| distinct_id=f"person_{num}_{i}", | ||
| properties={"key": "val", "some_breakdown_val": f"{num}"}, | ||
| properties={"key": "val", "some_breakdown_val": num}, |
There was a problem hiding this comment.
This enables testing for both, numeric and string values. Did a quick search on Metabase, and not everyone sends strings always
EDsCODE
left a comment
There was a problem hiding this comment.
Are you expecting the breakdown to contain double quotes now or did you handle it somewhere that I'm missing?
|
|
||
| if with_breakdown: | ||
| serialized_result.update({"breakdown": result[-1][1:-1]}) # strip quotes | ||
| serialized_result.update({"breakdown": result[-1]}) |
There was a problem hiding this comment.
hmm, do you not handle the double quotes now when the breakdown val is a string?
There was a problem hiding this comment.
I don't handle them (so the tests have double quotes now). The reason being: with JSONExtractRaw used everywhere, the keys on which we filter props have double quotes when they're strings, and they are strings when they're input as numbers. I attempted playing with this extraction such that we get no quotes in our keys, but that turned out to be very annoying when you don't know your data type apriori. (I have a test for this now, where the values of a property are both strings and integers)
Us removing double quotes ourselves has implications: the front-end has to guess whether the key had double quotes or not when they try to get the persons for this breakdown, which makes things hard.
Hence, I propose: we don't modify the keys at all. If we want to change the presentation, we let the front-end strip the quotes, if they exist. But, when asking for persons for a breakdown, we expect the front-end to send the specific key which we sent ourselves, sans modification.
I think this approach makes things cleaner.
Thoughts? Also cc: @liyiy @samwinslow (or whoever is going to implement the frontend bit 😅 )
There was a problem hiding this comment.
To give a concrete example of JSONExtractRaw:
SELECT JSONExtractRaw('{"a": "hello", "b": [-100, 200.0, 300], "c": "chrome", "d": 12}', 'c');
returns the string "Chrome"
SELECT JSONExtractRaw('{"a": "hello", "b": [-100, 200.0, 300], "c": "chrome", "d": 12}', 'd');
returns the string 12.
There was a problem hiding this comment.
we expect the front-end to send the specific key which we sent ourselves, sans modification.
Do you mean with or without quotes here then? I think sending with quotes would start getting confusing especially since it's inconsistent with the rest of our app
Otherwise, I would agree as long as it's handled somewhere and stays consistent with how we're displaying the breakdown values elsewhere
There was a problem hiding this comment.
So, I'm not talking about how we display the values at all, just about how we should pass them to the front-end.
To display the problem with stripping quotes more concretely:
SELECT JSONExtractRaw('{"a": "1", "b": 1}', 'a'); -- returns the string "1"
SELECT JSONExtractRaw('{"a": "1", "b": 1}', 'b'); -- returns the string 1
In effect, we can't arbitrarily strip quotes, because it changes the key.
I'm all for the front-end removing the quotes, if they exist, to display breakdown values without quotes, but I don't think we can do this in the backend, since it changes our filter keys for props :$ '"1"' != '1'
There was a problem hiding this comment.
AH! I see, there's the property filtering, which removes this from JSONExtractRaw: https://github.com/PostHog/posthog/blob/master/ee/clickhouse/models/property.py#L89
I guess we just need to follow this everywhere for breakdowns as well, and we should be good to go (we don't yet). I'll fix this like so instead then!
There was a problem hiding this comment.
SELECT trim(BOTH '\"' FROM JSONExtractRaw('{"a": "1", "b": 1}', 'a'));
is same as
SELECT trim(BOTH '\"' FROM JSONExtractRaw('{"a": "1", "b": 1}', 'b'));
So we (will be) good.
There was a problem hiding this comment.
This makes sense to me. I mean, we could actually convert to the appropriate type in the backend really, but in a smarter way that woud result in '"1"' → '1' (string), '1' → 1 (number).
There was a problem hiding this comment.
I actually tried that with JSONExtract('{"a": "1", "b": 1}', 'a', <type>) - but it didn't let me infer the types using type inference as well :$ - so annoying.
I.e. JSONExtract('{"a": "1", "b": 1}', 'a', JSONType('{"a": "1", "b": 1}', 'a')) doesn't work 😢
There was a problem hiding this comment.
Yeah, this'd have to happen in Django at earliest
|
I resolved this for trends + funnels both. Some existing tests were reliant on internal sorting, so I've added an explicit sort on top of this to ensure there's no test failures here. I don't think the order in the list means anything with breakdowns, we don't sort it based on anything, but let me know if the tests were testing for something I'm not aware of. You can see these specific changes here: b74ba00 |
Changes
Resolves #5030
Checklist