Skip to content

Add AggregateFunction windowFunnel #2352

Merged
alexey-milovidov merged 5 commits intoClickHouse:masterfrom
clickhousecn:feature/funnelWindow
May 13, 2018
Merged

Add AggregateFunction windowFunnel #2352
alexey-milovidov merged 5 commits intoClickHouse:masterfrom
clickhousecn:feature/funnelWindow

Conversation

@sundy-li
Copy link
Copy Markdown
Contributor

@sundy-li
Copy link
Copy Markdown
Contributor Author

sundy-li commented May 13, 2018

In 2017, there was a wonderful share by Mariya Mansurova who taught us how to calculate complex funnel metrics in ClickHouse by sequenceMatch.

But it was less efficient (pattern matching) and it lacks the sliding timestamp window support, which is very important on web analysis, especially on the E-commerce website analysis. So I do think it is very necessary to add this windowFunnel AggregateFunction.

Related issue #2096 #1836 #1241

alexey-milovidov added a commit that referenced this pull request May 13, 2018
alexey-milovidov added a commit that referenced this pull request May 13, 2018
@alexey-milovidov alexey-milovidov merged commit b096f95 into ClickHouse:master May 13, 2018
@alexey-milovidov
Copy link
Copy Markdown
Member

Good feature. Thank you!

@Jingyi244
Copy link
Copy Markdown

cheer 4 this.

@hongyuzhou
Copy link
Copy Markdown

In 2017, there was a wonderful share by Mariya Mansurova who taught us how to calculate complex funnel metrics in ClickHouse by sequenceMatch.

But it was less efficient (pattern matching) and it lacks the sliding timestamp window support, which is very important on web analysis, especially on the E-commerce website analysis. So I do think it is very necessary to add this windowFunnel AggregateFunction.

Related issue #2096 #1836 #1241

HI, Sunday... Here is a question about windowFunnel, the ComparePairFirst only compare event_time, what if two different things happened on the same event_time, and it did happen in the real world, it may cause random sort.

My suggestion is do a little change

struct ComparePairFirst final
{
    template <typename T1, typename T2>
    bool operator()(const std::pair<T1, T2> & lhs, const std::pair<T1, T2> & rhs) const
    {
        if (lhs.first < rhs.first) return true;
        else if (lhs.first = rhs.first) return lhs.second < rhs.second;
        else return false;
    }
};

Hope you can respond to me, thanks a lot.

@sundy-li
Copy link
Copy Markdown
Contributor Author

sundy-li commented May 22, 2020

My suggestion is do a little change

In this case, your advice may make the versatility lost. I don't think we should modify the ComparePairFirst function, yet we can make a new column, such as a1 * c1 + a2 * c2 + a3 * c3.. to be another kind of timestamp in source data.

@hongyuzhou
Copy link
Copy Markdown

column

Thanks for your answer... Make a new timestamp column sound a good idea, but could you please explain a1 * c1 + a2 * c2 + a3 * c3.. specifically more? What do these a1, c1, a2, c2... parameters represent?

For example, here is a demo:

pid event event_time
123 A 10s
123 B 10s

@sundy-li
Copy link
Copy Markdown
Contributor Author

What do these a1, c1, a2, c2... parameters represent

It's just a simple demo equation. In your case, if you want to sort the events by sequential comparison by (event_time, pid), you can create an equation based on your data.

@hongyuzhou
Copy link
Copy Markdown

hongyuzhou commented May 23, 2020

What do these a1, c1, a2, c2... parameters represent

It's just a simple demo equation. In your case, if you want to sort the events by sequential comparison by (event_time, pid), you can create an equation based on your data.

Thanks a lot... I try to make the description of my confusion clearer again. Actually, in my case, if I try to use windowFunnel, it will get two different results usually. For example:

select t.uid,
       windowFunnel(20) (event_time, event='A', event='B') as funnel
from (
select 123 as uid,
	   'A' as event,
	   10 as event_time
union all
select 123 as uid,
	   'B' as event,
	   10 as event_time
) t group by t.uid

clickhouse

How could I fix the problem use by SQL without modify source code? Or does it works if I modify ComparePairFirst function like before?

@ZhangzheBJUT
Copy link
Copy Markdown

ZhangzheBJUT commented Sep 28, 2020

What do these a1, c1, a2, c2... parameters represent

It's just a simple demo equation. In your case, if you want to sort the events by sequential comparison by (event_time, pid), you can create an equation based on your data.

it can not use equation. the equation will make the window size not work!

i think you shoud change the ComparePairFirst function as :

struct ComparePairFirst final
{
    template <typename T1, typename T2>
    bool operator()(const std::pair<T1, T2> & lhs, const std::pair<T1, T2> & rhs) const
    {
        if (lhs.first < rhs.first) return true;
        else if (lhs.first = rhs.first) return lhs.second < rhs.second;
        else return false;
    }
};

Copy link
Copy Markdown
Contributor

@HeenaBansal2009 HeenaBansal2009 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check the max_events limit while deserializing the Buffer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants