Avazu
Click‐Through Rate Prediction
Xiaocong Zhou and Peng Yan
March 9, 2015
Problem description
• Predict the click‐through rate of ad impressions on mobile devices
• Dataset
– Raw features:
• hour, click, C1, banner_pos, site_id, ..., app_id, ..., device_id, ..., C14, ..., C21
• hour is in the format "20141021"
• The other known features are hashed, e.g. "1fbe01fe"
– click (the label) is given for days 21 to 30; day 31 is to be predicted
– #Train: 40M records
– #Test: 4.6M records
Evaluation
• 2‐class logarithmic loss:

$$\mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log p_i + (1-y_i)\log(1-p_i)\Big]$$

where $N$ is the number of instances, $y_i$ is the true label and $p_i$ is the predicted probability.
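A minimal direct computation of this metric (a sketch; the clipping epsilon is our choice, added to avoid log(0)):

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """2-class logarithmic loss, averaged over N instances.
    Predictions are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```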
Outline
• Feature Engineering
• Models: FTRL and FFM
• Ensemble
• Calibration
Preliminary analysis
• We infer that C14 is the ad id, C17 is the ad group id and C21 is the ad sponsor id by analyzing the hierarchy of these unknown features.
• We identify the user by device id when it is not null, and by device ip + device model otherwise.
Feature Engineering
• Rare features: features that appear fewer than 10 times are mapped to a single "rare" value
• 8 additional numerical features:
– 4 features: number of impressions to the user for the ad id / ad group id in the hour / day
– 1 feature: number of impressions to the user in the day
– 1 feature: number of impressions to the user for the app id / site id in the day
– 1 feature: time interval since the last visit
– 1 feature: number of days on which the user appeared
Most of these counts are capped at 10.
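Two of the count features above can be sketched with plain dictionaries; the record keys (`user`, `ad_id`, `day`, `hour`) and the function name are illustrative, not the authors' code:

```python
from collections import defaultdict

def add_count_features(records):
    """Sketch of per-user impression-count features, capped at 10
    as in the slides. Assumes records arrive in time order."""
    CAP = 10
    per_user_ad_hour = defaultdict(int)
    per_user_day = defaultdict(int)
    out = []
    for r in records:
        k1 = (r['user'], r['ad_id'], r['day'], r['hour'])
        k2 = (r['user'], r['day'])
        per_user_ad_hour[k1] += 1
        per_user_day[k2] += 1
        out.append({**r,
                    'user_ad_hour_count': min(per_user_ad_hour[k1], CAP),
                    'user_day_count': min(per_user_day[k2], CAP)})
    return out
```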
Feature Engineering
• LSA feature
– For each device ip that is not rare, we treat its site categories and app categories as the "words" of a document and compute the tf‐idf vector. We then apply truncated SVD (LSA) to reduce the dimensionality to 16.
– The index of the component with the max value is added to the features in FTRL.
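A minimal sketch of this pipeline in plain NumPy (the function name and vocabulary handling are ours; a library tf-idf/SVD implementation would do the same):

```python
import numpy as np

def lsa_max_index(docs, k=16):
    """docs[i] is the list of site/app categories seen for one device ip.
    Returns, per doc, the index of the largest component of its
    truncated-SVD (LSA) embedding of the tf-idf matrix."""
    vocab = sorted({w for d in docs for w in d})
    col = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for w in d:
            tf[row, col[w]] += 1.0
    idf = np.log(len(docs) / (tf > 0).sum(axis=0))
    X = tf * idf
    # truncated SVD: top-k left singular vectors scaled by singular values
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * S[:k]
    return Z.argmax(axis=1)
```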
Feature Engineering
• GBDT features
– The gradient boosted tree model takes the 9 numerical features (the 8 additional features plus the number of impressions of the device ip) as input, and the leaf indices of the trees are the output. 19 trees of depth 5 are used, and the 19 generated features are included in both FTRL and FFM.
This approach was proposed by Xinran He et al. at Facebook [1] and used by 3 idiots in Criteo's competition [3].
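The leaf-index transformation can be sketched with scikit-learn's GradientBoostingClassifier (an assumption; the slides do not say which GBDT implementation was used):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def gbdt_leaf_features(X_train, y_train, X, n_trees=19, depth=5):
    """Map each sample to the index of the leaf it falls in, per tree
    (He et al. [1]); these indices become n_trees categorical features."""
    gbdt = GradientBoostingClassifier(
        n_estimators=n_trees, max_depth=depth, random_state=0)
    gbdt.fit(X_train, y_train)
    # apply() returns shape (n_samples, n_estimators, 1) for binary tasks
    leaves = gbdt.apply(X)
    return leaves.reshape(len(X), -1).astype(int)
```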
Models‐FTRL
• Logistic regression

$$\min_{w}\ \frac{\lambda}{2}\|w\|_2^2 + \sum_{i}\log\left(1+\exp\left(-y_i\,\phi(w,x_i)\right)\right) \qquad (1)$$

$$\phi(w,x) = w^{\mathsf T} x$$

• Follow‐the‐Regularized‐Leader

FTRL uses the weight update

$$w_{t+1} = \arg\min_{w}\ g_{1:t}\cdot w + \frac{1}{2}\sum_{s=1}^{t}\sigma_s\|w - w_s\|_2^2 + \lambda_1\|w\|_1$$

where $\sigma_s$ is defined in terms of the learning‐rate schedule such that $\sigma_{1:t} = 1/\eta_t$.

This approach was proposed by H. Brendan McMahan et al. at Google [2].
Models‐FTRL
• 21 original features + 8 additional features + 1 LSA feature + 19 gbdt features are included in FTRL.
• 82 selected interactions are included, mostly with site id or app id.
• All features are one‐hot encoded into a space of size 2^26.
• 3 epochs are used with a learning rate of 0.05.
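The per-coordinate FTRL-Proximal update from [2] can be sketched as follows; beta, l1 and l2 are illustrative defaults, only alpha = 0.05 and the 2^26 one-hot space come from the slides:

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal sketch (McMahan et al. [2]).
    Features are hashed indices with implicit value 1."""

    def __init__(self, alpha=0.05, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-coordinate z_i (gradients minus shift terms)
        self.n = {}  # per-coordinate sum of squared gradients

    def _weight(self, i):
        z, n = self.z.get(i, 0.0), self.n.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps the model sparse
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, indices):
        wx = sum(self._weight(i) for i in indices)
        return 1.0 / (1.0 + math.exp(-max(min(wx, 35.0), -35.0)))

    def update(self, indices, y):
        p = self.predict(indices)
        g = p - y  # gradient of the log loss when x_i = 1
        for i in indices:
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n + g * g
```

In the slides' setup each record would be hashed into indices below 2^26 and passed through 3 epochs.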
Models‐FFM
• Field‐aware Factorization Machine
– The objective function is similar to (1) but with

$$\phi(w,x) = \sum_{j_1, j_2 \in C_2} \langle w_{j_1, f_2},\, w_{j_2, f_1} \rangle\, x_{j_1} x_{j_2}$$

where $f_1$ and $f_2$ are the corresponding fields of $j_1$ and $j_2$ respectively.

• Besides feature interactions, first‐order features are also used in our FFM model:

$$\phi(w,x) = \sum_{j_1, j_2 \in C_2} \langle w_{j_1, f_2},\, w_{j_2, f_1} \rangle\, x_{j_1} x_{j_2} + \sum_{j \in C_1} w_j\, x_j$$
Models‐FFM
• 21 original features + 8 additional features +
19 gbdt features are included in FFM.
• The number of latent factors is 8 and 5 epochs
are used.
• This approach was proposed by Michael
Jahrer et al. in KDD Cup 2012 Track 2 and used
by 3 idiots in Criteo's competition.
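A direct, unoptimized evaluation of the FFM $\phi$ above (a sketch; the data layout of index/value pairs plus a field map is our choice):

```python
import numpy as np

def ffm_phi(x, fields, W, first_order=None):
    """FFM interaction term with optional first-order part.
    x: list of (feature_index, value); fields[j]: field of feature j;
    W[j][f]: latent vector of feature j against field f (k = 8 in
    the slides); first_order[j]: optional linear weight of feature j."""
    total = 0.0
    for a in range(len(x)):
        j1, v1 = x[a]
        for b in range(a + 1, len(x)):
            j2, v2 = x[b]
            f1, f2 = fields[j1], fields[j2]
            total += np.dot(W[j1][f2], W[j2][f1]) * v1 * v2
    if first_order is not None:
        total += sum(first_order[j] * v for j, v in x)
    return total
```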
Ensemble
• Our final model is an ensemble of 4 models: for each of FTRL and FFM with gbdt features, we train one model on the whole data and one on the data separated into site and app subsets.
• We ensemble by a weighted average of the inverse logits of the CTRs, and then apply the calibration.
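The weighted averaging in log-odds space can be sketched as follows (the slides call the log-odds transform inverse_logit; the weight values are not given, so they are parameters here):

```python
import math

def log_odds(p):
    """The slides' 'inverse_logit': log(p / (1 - p))."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """The slides' 'logit': 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def ensemble(ctrs, weights):
    """Weighted average of per-model CTRs in log-odds space,
    mapped back to a probability."""
    s = sum(w * log_odds(p) for p, w in zip(ctrs, weights))
    return sigmoid(s / sum(weights))
```

Averaging in log-odds rather than probability space keeps the combined prediction well-behaved near 0 and 1.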
Calibration
• The observed average CTR on the test set and our predicted average CTR are slightly different.
• We define:

inverse_logit(x) = log(x / (1 − x))
logit(x) = 1 / (1 + e^(−x))

The calibration is as follows:

intercept = inverse_logit(predicted average CTR) − inverse_logit(observed average CTR)
p = logit(inverse_logit(p) − intercept)

where p is the predicted CTR for each record.
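A sketch of this calibration step; reading the two inverse_logit arguments as the predicted and observed average CTRs is our interpretation of the slide:

```python
import math

def inverse_logit(x):
    """Log-odds, following the slides' naming."""
    return math.log(x / (1 - x))

def logit(x):
    """Sigmoid, following the slides' naming."""
    return 1.0 / (1.0 + math.exp(-x))

def calibrate(preds, observed_avg):
    """Shift every prediction in log-odds space by a constant intercept
    so the predicted average CTR moves toward the observed average."""
    predicted_avg = sum(preds) / len(preds)
    intercept = inverse_logit(predicted_avg) - inverse_logit(observed_avg)
    return [logit(inverse_logit(p) - intercept) for p in preds]
```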
Result
Reference
[1] Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, and Joaquin Quiñonero Candela. 2014. Practical Lessons from Predicting Clicks on Ads at Facebook. (ADKDD'14)
[2] H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, and Jeremy Kubica. 2013. Ad click prediction: a view from the trenches. (KDD'13)
[3] 3 idiots' Approach for Display Advertising Challenge
[4] Tinrtgu's and 3 idiots' code