Skip to content

Add support for funnel analytics #10866

@dario-liberman

Description

@dario-liberman

A common use-case in real-time interactive analytic applications is funnel analysis.

Example funnel analytics products to name a few:

At the core of these solutions is the ability to correlate user events across steps, such as going thru a checkout funnel.
A typical metric is the count of users reaching each step in a sequential funnel, or the respective conversion rates between steps (simply the ratio of such counts from one step to the next).

The proposal is to build an aggregation function able to correlate events by a given column, such as the user id, the session id, etc.

In order to simplify the initial implementation, the proposal is to start with a causality based funnel rather than requiring a strict sequence in time. Most user funnels in real-life can only be exercised forward, so this can solve for the vast majority of use-cases.
What is meant by causality is that we will count towards step N+1 only if we counted towards step N for the same correlation id, but we would accept for them to happen out of order (in fact, that is often the case in real-life distributed user event logging systems anyways).

In order to simplify the solution further, we can expect that the table is partitioned by the correlation column (eg. user id).

Once these assumptions are accepted, then the solution looks very similar to what is done in SEGMENT_PARTITIONED_DISTINCT_COUNT, which uses roaring bitmaps under the hood. The first step in the funnel would exactly match that, the second step in the funnel is simply the intersection between the first roaring bitmap and the second, and so on.

The following aggregation function API is proposed:

SELECT
   dateTrunc('day', timestamp) AS ts,
   FUNNEL_COUNT(
      STEPS(url = '/addToCart', url = '/checkout', url = '/orderConfirmation'),
      CORRELATED_BY(user)
   ) as step_counts
FROM user_log
GROUP BY 1

This query would return for each day the number of users going thru the funnel defined by the steps above as an array of longs, one count per step. An outer query or the client application (eg. a grafana plugin) can compute conversion rates if necessary.

A user may optimise the query by filtering relevant steps as follows:

SELECT
   dateTrunc('day', timestamp) AS ts,
   FUNNEL_COUNT(
      STEPS(url = '/addToCart', url = '/checkout', url = '/orderConfirmation'),
      CORRELATED_BY(user_id)
   ) as step_counts
FROM user_log
WHERE url in ('/addToCart',  '/checkout', '/orderConfirmation')
GROUP BY 1

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions