OLAP Functions
Part 2
Patrice Bérubé
Solution Architect
Teradata Canada
OLAP Analytics - Agenda
• History & Recap
• RESET WHEN
• New OLAP
• OLAP in transforms
• Summary
2 pg.
History - V2R3 and V2R6 OLAP
3 pg.
History – V2R12
(V2R6)
(V2R6)
(V2R6) (V2R6)
(V2R6)
(V2R6)
(V2R6)
(V2R6)
4 pg.
History - Clauses
V2R3 V2R6 V2R13
•None •PARTITION BY •PARTITION BY
•ORDER BY •ORDER BY
•ROWS •RESET WHEN
•ROWS
5 pg.
Traditional SQL requests vs
Ordered Analytical Functions
Calculation
Aggregation
Ordered Analytical Functions
6 pg.
Ordered Analytical –
Functions Permutations
Four Categories Aggregates
SUM ( ) OVER
Group Window COUNT ( ) OVER
Cumulative Window AVG ( ) OVER
Moving Window x MIN ( ) OVER
Remaining Window MAX ( ) OVER
Group Window Function
• Use of keywords: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
• Absence of keywords: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Moving Window Function
• Use of keywords: ROWS BETWEEN # PRECEDING AND # FOLLOWING
• Absence of keywords: UNBOUNDED
Cumulative Window Function
• Use of keywords: ROWS BETWEEN UNBOUNDED PRECEDING
• Absence of keywords: UNBOUNDED FOLLOWING
Remaining Window Function
• Use of keywords: ROWS BETWEEN UNBOUNDED FOLLOWING
• Absence of keywords: UNBOUNDED PRECEDING
7 pg.
OLAP Analytics - Agenda
• History & Recap
• RESET WHEN
• New OLAP
• OLAP in transforms
• Summary
8 pg.
RESET WHEN - Rules
A RESET WHEN condition can contain the following:
• Ordered analytical functions that do not include the RESET WHEN clause
• Scalar subqueries
• Aggregate operators
• DEFAULT functions
A RESET WHEN condition cannot contain the following:
• Ordered analytical functions that include the RESET WHEN clause
• SELECT statement
• LOB columns
• UDT expressions, including UDFs that return a UDT value.
However, a RESET WHEN condition can include an expression that
contains UDTs as long as that expression returns a result that has a
predefined data type.
9 pg.
RESET WHEN (1 of 7)
finds cumulative sales for all periods of increasing sales for each region
SUM(sales)
OVER (PARTITION BY region
ORDER BY day_of_calendar
RESET WHEN sales < SUM(sales)
OVER (PARTITION BY region
ORDER BY day_of_calendar
ROWS BETWEEN 1 PRECEDING
AND 1 PRECEDING)
ROWS UNBOUNDED PRECEDING)
Current row sales
Preceding row sales
10 pg.
RESET WHEN (2 of 7)
finds sequences of increasing balances
Reset whenever the current balance is less than or equal to the preceding balance
11 pg.
RESET WHEN (3 of 7)
finds sequences of increasing balances
reset whenever the current balance is less than or equal to the preceding balance
SELECT
account_key, month, balance,
ROW_NUMBER()
over (PARTITION BY account_key
ORDER BY month
RESET WHEN balance <= SUM(balance)
over (PARTITION BY account_key
ORDER BY month
ROWS BETWEEN 1 PRECEDING
AND 1 PRECEDING)
) - 1 /* to get the count started at 0 */
as balance_increase
FROM accounts;
Current row balance
Preceding row balance
12 pg.
RESET WHEN (4 of 7)
finds sequences of increasing balances by quarter
Must roll up months to quarter first, then verify condition
13 pg.
RESET WHEN (5 of 7)
finds sequences of increasing balances by quarter
reset whenever the current balance is less than or equal to the preceding balance
SELECT
account_key, quarter,
sum(balance),
ROW_NUMBER()
over (PARTITION BY account_key
ORDER BY quarter
RESET WHEN sum(balance) <= SUM(sum(balance))
over (PARTITION BY account_key
ORDER BY quarter
ROWS BETWEEN 1 PRECEDING
AND 1 PRECEDING)
) - 1 /* to get the count started at 0 */
as balance_increase
FROM accounts GROUP BY account_key, quarter;
Current row balance
Preceding row balance
14 pg.
RESET WHEN (6 of 7)
finds sequences of consecutive balances below credit limit
Accounts Data Results Data
Account_Limit Data
15 pg.
RESET WHEN (7 of 7)
finds sequences of consecutive balances below credit limit
select a.account_No,
[Link],
(select Credit_limit from Account_Limit L where l.account_No=a.account_No) Credit_Limit,
ROW_NUMBER() OVER(partition by a.account_No
order by statement_date
RESET WHEN balance > Credit_Limit)
from accounts A;
Scalar select on Credit limit
16 pg.
OLAP Analytics - Agenda
• History & Recap
• RESET WHEN
• New OLAP
• OLAP in transforms
• Summary
17 pg.
List includes New OLAP
18 pg.
Statistical OLAP #1 (1 of 2)
What is the average Plan Price without discount
compare to normal average ?
Sales Data
Discount Data
Plan Data
19 pg.
Statistical OLAP #1
Regressions: REGR_AVGX, REGR_AVGY REGR_COUNT
Select
avg(Discount_Amount) OVER (partition by extract(year from Trans_Date)),
avg(Plan_Price) OVER (partition by extract(year from Trans_Date)),
regr_avgx(Plan_Price,Discount_Amount) OVER (partition by extract(year from Trans_Date)),
regr_avgy(Plan_Price,Discount_Amount) OVER (partition by extract(year from Trans_Date)),
regr_count(Discount_Amount,Plan_Price) OVER (partition by extract(year from Trans_Date))
from Sales_Trans t
inner join Plan_Price p
on t.Plan_No = p.Plan_No
inner join Discount d
on t.Discount_No = d.Discount_No
20 pg.
Statistical OLAP #2 (1 of 2)
• How to identify the impact of a change?
• In other words: Is there a relation between number of sales rep and revenues?
• About Correlation
• Correlation is a measure of association between two variables.
• The value of a correlation coefficient can vary from minus one to plus one
• a negative correlation between two variables
• As the value of one variable increases, the value of the other variable decreases, and vice
versa.
• In other words, for a negative correlation, the variables work opposite each other.
• Zero means there is no relationship between the two variables.
• Positive correlation between two variables.
• As the value of one variable increases, the value of the other variable also increases.
• The variables move together.
21 pg.
Statistical OLAP #2 (1 of 2)
In which market shall new sales rep be added?
Sales Data Correlation
select Region_no,
CORR(Sales_Rep_No,Nb_Sales)
OVER(partition by Region_no)
from Sales_rep
22 pg.
Statistical OLAP #3 (1 of 3)
• How to derive basic predictions based on existing data?
• In other words: Can I leverage past data to predict future results?
• Regression
• Simple regression is used to examine the relationship between one dependent and one
independent variable.
• The regression statistics can be used to predict the dependent variable when the
independent variable is known.
• Regression goes beyond correlation by adding prediction capabilities
• Regression analysis usually required 3 OLAPs
• Y = intercept + (slope * X) Y = REGR_INTERCEPT(Y,X) + (REGR_SLOPE(Y,X))
• REGR_R2 provides the % of values explained by the formula.
Adv_Actual Data Adv_Budget Data
23 pg.
Statistical OLAP #3 (2 of 3)
Can I leverage past data to predict future results?
Regression results
Sales Forecast
Actual Predicted
50000
40000
30000
Results Data 20000
10000
0
Sep- Oct-09 Nov-09 Dec- Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-10 Jul-10 Aug-10
09 09
Adv_Budget Sales
24 pg.
Statistical OLAP #3 (1 of 2)
Can I leverage past data to predict future results?
select Media,Month_Start_Date,Adv_Budget,Sales, 'Real' from Adv_Actual
union all
select [Link], f.Month_Start_Date, f.Adv_Budget,
Intercept + (slope * f.Adv_Budget) predicted, 'Esti'
from Adv_Budget f inner join
(select Media,Month_Start_Date,
Regr_Intercept(sales, Adv_Budget)
OVER(partition by Media order by Month_Start_Date) Intercept,
Regr_Slope(sales, Adv_Budget)
OVER(partition by Media order by Month_Start_Date) Slope
from Adv_Actual)
as h(Media,Month_Start_Date,Intercept,Slope)
on [Link] = [Link] and f.Month_Start_Date = add_months(h.Month_Start_Date,6)
order by 1,2;
25 pg.
Statistical OLAP #4 (1 of 3)
• How to identify a strong change?
• In other words: Who spent with significant variance
over their prior 12mo history?
• Statistical Process Control can identify outliers
• Standard Deviation is a measure of variance
• Std Dev (sigma σ) captures # of observations
• within +/- 1 sigma = 68.2%
• within +/- 2 sigma = 95.45%
• within +/- 3 sigma = 99.7%
• within +/- 4 sigma = 99.99%
• within +/- 5 sigma = 99.9999%
• within +/- 6 sigma = 99.9999998%
• +/- 5 sigma are outliers by definition (very high)
• Upper Control Limit & Lower Control Limit can be
manually-set or use sigma
26 pg.
Statistical OLAP #4 (2 of 3)
• Who spent an alarming high amount that should be proactively investigated?
$140.00
$120.00
$100.00
$80.00
Client A - GREEN $60.00
$40.00
• Very consistent spending,
$20.00
no significant variance $-
observed Apr-08 May-08 Jun-08 Jul-08 Aug-08 Sep-08 Oct-08 Nov-08 Dec-08 Jan-09 Feb-09 Mar-09 Apr-09 May-09
10/ 20/200 12/ 20/200
4/ 20/ 2008 5/ 20/ 2008 6/ 20/ 2008 7/ 20/ 2008 8/ 20/ 2008 9/ 20/ 2008 11/ 20/ 2008 1/ 20/ 2009 2/20/ 2009 3/ 20/ 2009 4/20/ 2009 5/ 20/ 2009
8 8
tot _chrg_amt $71.03 $68.58 $89.12 $65.81 $65.11 $64.75 $62.48 $65.98 $65.89 $64.45 $64.81 $65.58 $64.50 $63.76
UCL $- $71.03 $75.93 $122.04 $119.28 $116.17 $113.28 $111.51 $108.79 $106.51 $104.72 $103.07 $101.54 $100.90
LCL $- $71.03 $63.68 $30.44 $27.99 $27.69 $28.19 $27.60 $29.42 $30.99 $31.92 $32.93 $34.06 $33.61
tot_chrg_amt UCL LCL
Client B - RED $60,000.00
• This client had a history of $40,000.00
$61 monthly spending $20,000.00
• Nov 2008 spent $27,849 , $-
Apr-08 May-08 Jun-08 Jul-08 Aug-08 Sep-08 Oct-08 Nov-08 Dec-08 Jan-09 Feb-09 Mar-09 Apr-09 May-09
clearly exceeding the UCL $(20,000.00)
• SPC identifies the outlier $(40,000.00)
$(60,000.00)
4/ 11/2008 5/11/ 2008 6/ 11/ 2008 7/ 11/ 2008 8/ 11/ 2008 9/ 11/ 2008 10/ 11/ 2008 11/ 11/ 2008 12/ 11/ 2008 1/ 11/ 2009 2/ 11/ 2009 3/ 11/ 2009 4/ 11/ 2009 5/ 11/2009
t ot _chrg_amt $61.15 $61.26 $61.06 $61.06 $61.06 $61.06 $61.06 $27,849.99 $61.06 $61.06 $61.06 $61.06 $61.06 $61.40
UCL $- $61.15 $61.48 $61.57 $61.54 $61.51 $61.49 $61.46 $49,486.27 $46,814.78 $44,523.33 $42,531.09 $40,779.06 $40,779.07
LCL $- $61.15 $60.93 $60.75 $60.72 $60.72 $60.73 $60.74 $(42,416.84 $(40,517.28) $(38,843.3 $(37,356.38 $(36,025.41 $(36,025.43
27 pg.
tot_chrg_amt UCL LCL
Statistical OLAP #4 (3 of 3)
• Who spent an alarming high amount?
select
bss.accs_id
,bss.bl_dt
,bss.tot_chrg_amt
,AVG(tot_chrg_amt) OVER
(PARTITION BY bss.accs_id
order by bss.bl_dt
• Standard Deviation OLAP
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
) as AVG_12MO_TOT_CHRG_AMT easily calcs a rolling 12mo
,count(*) OVER
(PARTITION BY bss.accs_id statistic
order by bss.bl_dt
ROWS unbounded PRECEDING • Ignores the first 4
) as BILL_CNT
,STDDEV_POP(tot_chrg_amt) OVER observations to allow system
(PARTITION BY bss.accs_id
order by bss.bl_dt
to “calibrate”
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
) as STDDEV_12MO_TOT_CHRG_AMT
,(STDDEV_12MO_TOT_CHRG_AMT * 5) as SIGMA
,(AVG_12MO_TOT_CHRG_AMT + SIGMA) AS UCL
,(AVG_12MO_TOT_CHRG_AMT – SIGMA) AS LCL
,case
when BILL_CNT > 4 and ( bss.tot_chrg_amt > UCL or bss.tot_chrg_amt < LCL )
then 'Y'
else ''
end as INV_PROB_IND
from bl_stmnt_sys bss
where INV_PROB_IND = 'Y' and bss.bl_confirm_ind = 'Y'
order by accs_id, bl_dt
28 pg.
OLAP Analytics - Agenda
• History & Recap
• RESET WHEN
• New OLAP
• OLAP in transforms
• Summary
29 pg.
Transform #1 (1 of 2)
Prorate Weekly units sold over working days.
Allocate uneven quantity on last working day
Weekly Data Prorated Daily Data
30 pg.
Transform #1 (2 of 2)
OLAP function provide the number and last working day
case when bd.calendar_dt = bd.last_day
then o.unit_wk -
(cast((o.unit_wk/nb_days) as integer) Last working day
*(nb_days-1))
else cast((o.unit_wk/nb_days) as integer) Working day
end
as NRC_QTY_,
……………………………………………………………………………………………………
(select TR.SRC_CO_CD, TR.SO_REGION_CD, extract (year from B.calendar_dt)as yr, W.SMB_RPTG_WK,
B.calendar_dt,
cast(COUNT(B.calendar_dt) OVER(partition by TR.SRC_CO_CD,TR.SO_REGION_CD, YR,W.SMB_RPTG_WK
rows between unbounded preceding and unbounded following) AS INTEGER),
MAX(B.calendar_dt) OVER(partition by TR.SRC_CO_CD,TR.SO_REGION_CD, YR,W.SMB_RPTG_WK
rows between unbounded preceding and unbounded following)
from TSO_RGN TR, TSO_RGN_PROV TRP, TBUSDAY B, TSMB_RPTG_WK W
where B.bus_day_flg = 'Y'
AND TR.SRC_CO_CD = TRP.SRC_CO_CD AND TR.SO_REGION_CD = TRP.SO_REGION_CD
AND TRP.PROV_STATE_CD = B.PROV_STATE_CD AND B.CALENDAR_DT = W.CALENDAR_DT
group by 1,2,3,4,5)
as BD(SRC_CO_CD,SO_REGION_CD,THE_YR,SMB_RPTG_WK, calendar_dt,nb_days,last_day)
31 pg.
Transform #2 (1 of 3)
Use OLAP to identify when a client became FIRST TIME PAYING.
Client # 1 subscriptions Client # 2 subscriptions
•Obtain a Guess Pass •Purchase a Paid contract
•Guess Pass expired •Paid contract expired
T ----- client inactive ------ ----- client inactive ------
i •Obtain a Free contract •Obtain a Guess Pass
m •Free contract expired
e ----- client inactive ------
•Obtain a Guess Pass
L •Guess Pass expired
i ----- client inactive ------
n •Purchase a Paid contract
e •Paid contract expired
----- client inactive ------
•Obtain a Guess Pass
32 pg.
Transform #2 (2 of 3)
Use OLAP to derive a Paid Contract & Guess Passes count to date.
FROM (Select Sub_Duration , User_Account_Id, SUBS_STATUS_CD, Revenue_Type_CD
, count (case when Revenue_Type_CD = 'paid'
and SUBS_STATUS_CD='active'
then 1
else null
end) OVER (PARTITION BY User_Account_Id ORDER BY From_Day ROWS UNBOUNDED PRECEDING)
as Nb_Paying_Contract
, count (case when Revenue_Type_CD in ('free','free w/packaged good','free promotion')
and SUB_STATUS_D='active'
then 1
else null
end) OVER (PARTITION BY User_Account_Id ORDER BY From_Day ROWS UNBOUNDED PRECEDING)
as Nb_Guess_Pass
FROM Client_subs_all_JA
WHERE THRU_Day >= FROM_DAY
group by 1,2,3,412
) subs
33 pg.
Transform #2 (3 of 3)
Count subscribers when became FIRST TIME PAYING.
--Monthly First Time Paying With Guess Pass - A subscriber who on this day is starting a paying monthly
--subscription, had never had a paying subscription in the past, but had a guess pass
, count(CASE WHEN subs.SUBS_STATUS_CD = 'active'
and subs.Revenue_Type_CD = 'paid'
and Nb_Paying_Contract = 1
and Nb_Guess_Pass >0
and subs.sub_duration = 'Monthly'
THEN 1
ELSE null
END
) as Monthly_First_Time_Paying_WGP
--Monthly First Time Paying Without Guess Pass - A subscriber who on this day is starting a paying monthly
--subscription, had never had a paying subscription in the past, nor had a guess pass
, count(CASE WHEN subs.SUBS_STATUS_CD = 'active'
and subs.Revenue_Type_CD = 'paid'
and Nb_Paying_Contract =1
and Nb_Guess_Pass =0
and subs.sub_duration = 'Monthly'
THEN 1
ELSE null
END
) as Monthly_First_Time_Paying
34 pg.
OLAP Analytics - Agenda
• History & Recap
• RESET WHEN
• New OLAP
• OLAP in transforms
• Summary
35 pg.
Summary
• New RESET WHEN clause extend OLAP
usability
• Increased compatibility with other DB
• Simplify coding
• New statistical OLAP extend SQL possibilities
• Open new possibilities
• Simplify coding
• Transformation
• Enable single data pass
• Coding closer to transformation rules
36 pg.
Thanks and Questions
• Questions?
• [Link]@[Link]
Thanks all of you !
37 pg.