Avazu
Click‐Through Rate Prediction
Xiaocong Zhou and Peng Yan
March 9, 2015
Problem description
• Predict the click‐through rate of ad impressions on mobile devices
• Dataset
– Raw features:
• hour, click, C1, banner_pos, site_id, ..., app_id, ..., device_id, ..., C14, ..., C21
• hour is in the format "20141021"
• The other known features are hashed, e.g. "1fbe01fe"
– click (the label) is given for days 21 to 30; day 31 is to be predicted
– #Train: 40M records
– #Test: 4.6M records
Evaluation
• 2‐class logarithmic loss:

$$\mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log p_i + (1-y_i)\log(1-p_i)\Big]$$

where $N$ is the number of instances, $y_i$ is the true label and $p_i$ is the predicted probability.
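A minimal direct computation of this metric (a sketch; the clipping epsilon is our choice, added to avoid log(0)):

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """2-class logarithmic loss, averaged over N instances.
    Predictions are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```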
Outline
• Feature Engineering
• Models: FTRL and FFM
• Ensemble
• Calibration
Preliminary analysis
• We infer that C14 is the ad id, C17 is the ad group id and C21 is the ad sponsor id by analyzing the hierarchy of these unknown features.
• We identify the user by device id when it is not null, and by device ip + device model otherwise.
Feature Engineering
• Rare features: features that appear fewer than 10 times are mapped to a single "rare" value
• 8 additional numerical features:
– 4 features: number of impressions to the user for the ad id / ad group id in the hour / day
– 1 feature: number of impressions to the user in the day
– 1 feature: number of impressions to the user for the app id / site id in the day
– 1 feature: time interval since the last visit
– 1 feature: number of days on which the user appeared
Most of these counts are capped at 10.
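Two of the count features above can be sketched with plain dictionaries; the record keys (`user`, `ad_id`, `day`, `hour`) and the function name are illustrative, not the authors' code:

```python
from collections import defaultdict

def add_count_features(records):
    """Sketch of per-user impression-count features, capped at 10
    as in the slides. Assumes records arrive in time order."""
    CAP = 10
    per_user_ad_hour = defaultdict(int)
    per_user_day = defaultdict(int)
    out = []
    for r in records:
        k1 = (r['user'], r['ad_id'], r['day'], r['hour'])
        k2 = (r['user'], r['day'])
        per_user_ad_hour[k1] += 1
        per_user_day[k2] += 1
        out.append({**r,
                    'user_ad_hour_count': min(per_user_ad_hour[k1], CAP),
                    'user_day_count': min(per_user_day[k2], CAP)})
    return out
```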
Feature Engineering
• LSA feature
– For each device ip that is not rare, we treat its site categories and app categories as the "words" of a document and compute the tf‐idf vector. We then apply truncated SVD (LSA) to reduce the dimensionality to 16.
– The index of the component with the max value is added to the features in FTRL.
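A minimal sketch of this pipeline in plain NumPy (the function name and vocabulary handling are ours; a library tf-idf/SVD implementation would do the same):

```python
import numpy as np

def lsa_max_index(docs, k=16):
    """docs[i] is the list of site/app categories seen for one device ip.
    Returns, per doc, the index of the largest component of its
    truncated-SVD (LSA) embedding of the tf-idf matrix."""
    vocab = sorted({w for d in docs for w in d})
    col = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for w in d:
            tf[row, col[w]] += 1.0
    idf = np.log(len(docs) / (tf > 0).sum(axis=0))
    X = tf * idf
    # truncated SVD: top-k left singular vectors scaled by singular values
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * S[:k]
    return Z.argmax(axis=1)
```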
Feature Engineering
• GBDT features
– The gradient boosted tree model takes the 9 numerical features (the 8 additional features plus the number of impressions of the device ip) as input, and the leaf indices of the trees are the output. 19 trees of depth 5 are used, and the 19 generated features are included in both FTRL and FFM.
This approach was proposed by Xinran He et al. at Facebook [1] and used by 3 idiots in Criteo's competition [3].
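The leaf-index transformation can be sketched with scikit-learn's GradientBoostingClassifier (an assumption; the slides do not say which GBDT implementation was used):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def gbdt_leaf_features(X_train, y_train, X, n_trees=19, depth=5):
    """Map each sample to the index of the leaf it falls in, per tree
    (He et al. [1]); these indices become n_trees categorical features."""
    gbdt = GradientBoostingClassifier(
        n_estimators=n_trees, max_depth=depth, random_state=0)
    gbdt.fit(X_train, y_train)
    # apply() returns shape (n_samples, n_estimators, 1) for binary tasks
    leaves = gbdt.apply(X)
    return leaves.reshape(len(X), -1).astype(int)
```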
Models‐FTRL
• Logistic regression

$$\min_{w}\ \frac{\lambda}{2}\|w\|_2^2 + \sum_{i}\log\left(1+\exp\left(-y_i\,\phi(w,x_i)\right)\right) \qquad (1)$$

$$\phi(w,x) = w^{\mathsf T} x$$

• Follow‐the‐Regularized‐Leader

FTRL uses the weight update

$$w_{t+1} = \arg\min_{w}\ g_{1:t}\cdot w + \frac{1}{2}\sum_{s=1}^{t}\sigma_s\|w - w_s\|_2^2 + \lambda_1\|w\|_1$$

where $\sigma_s$ is defined in terms of the learning‐rate schedule such that $\sigma_{1:t} = 1/\eta_t$.

This approach was proposed by H. Brendan McMahan et al. at Google [2].
Models‐FTRL
• 21 original features + 8 additional features + 1 LSA feature + 19 gbdt features are included in FTRL.
• 82 selected interactions are included, mostly with site id or app id.
• All features are one‐hot encoded into a space of size 2^26.
• 3 epochs are used with a learning rate of 0.05.
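The per-coordinate FTRL-Proximal update from [2] can be sketched as follows; beta, l1 and l2 are illustrative defaults, only alpha = 0.05 and the 2^26 one-hot space come from the slides:

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal sketch (McMahan et al. [2]).
    Features are hashed indices with implicit value 1."""

    def __init__(self, alpha=0.05, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-coordinate z_i (gradients minus shift terms)
        self.n = {}  # per-coordinate sum of squared gradients

    def _weight(self, i):
        z, n = self.z.get(i, 0.0), self.n.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps the model sparse
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, indices):
        wx = sum(self._weight(i) for i in indices)
        return 1.0 / (1.0 + math.exp(-max(min(wx, 35.0), -35.0)))

    def update(self, indices, y):
        p = self.predict(indices)
        g = p - y  # gradient of the log loss when x_i = 1
        for i in indices:
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n + g * g
```

In the slides' setup each record would be hashed into indices below 2^26 and passed through 3 epochs.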
Models‐FFM
• Field‐aware Factorization Machine
– The objective function is similar to (1) but with

$$\phi(w,x) = \sum_{j_1, j_2 \in C_2} \langle w_{j_1, f_2},\, w_{j_2, f_1} \rangle\, x_{j_1} x_{j_2}$$

where $f_1$ and $f_2$ are the corresponding fields of $j_1$ and $j_2$ respectively.

• Besides feature interactions, first‐order features are also used in our FFM model:

$$\phi(w,x) = \sum_{j_1, j_2 \in C_2} \langle w_{j_1, f_2},\, w_{j_2, f_1} \rangle\, x_{j_1} x_{j_2} + \sum_{j \in C_1} w_j\, x_j$$
Models‐FFM
• 21 original features + 8 additional features +
19 gbdt features are included in FFM.
• The number of latent factors is 8 and 5 epochs
are used.
• This approach was proposed by Michael
Jahrer et al. in KDD Cup 2012 Track 2 and used
by 3 idiots in Criteo's competition.
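A direct, unoptimized evaluation of the FFM $\phi$ above (a sketch; the data layout of index/value pairs plus a field map is our choice):

```python
import numpy as np

def ffm_phi(x, fields, W, first_order=None):
    """FFM interaction term with optional first-order part.
    x: list of (feature_index, value); fields[j]: field of feature j;
    W[j][f]: latent vector of feature j against field f (k = 8 in
    the slides); first_order[j]: optional linear weight of feature j."""
    total = 0.0
    for a in range(len(x)):
        j1, v1 = x[a]
        for b in range(a + 1, len(x)):
            j2, v2 = x[b]
            f1, f2 = fields[j1], fields[j2]
            total += np.dot(W[j1][f2], W[j2][f1]) * v1 * v2
    if first_order is not None:
        total += sum(first_order[j] * v for j, v in x)
    return total
```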
Ensemble
• Our final model is an ensemble of 4 models: for each of FTRL and FFM with gbdt features, we train one model on the whole data and one on the data separated into site and app subsets.
• We ensemble by a weighted average of the inverse logits of the CTRs, and then apply the calibration.
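The weighted averaging in log-odds space can be sketched as follows (the slides call the log-odds transform inverse_logit; the weight values are not given, so they are parameters here):

```python
import math

def log_odds(p):
    """The slides' 'inverse_logit': log(p / (1 - p))."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """The slides' 'logit': 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def ensemble(ctrs, weights):
    """Weighted average of per-model CTRs in log-odds space,
    mapped back to a probability."""
    s = sum(w * log_odds(p) for p, w in zip(ctrs, weights))
    return sigmoid(s / sum(weights))
```

Averaging in log-odds rather than probability space keeps the combined prediction well-behaved near 0 and 1.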
Calibration
• The observed average CTR on the test set and our predicted average CTR are slightly different.
• We define:

inverse_logit(x) = log(x / (1 − x))
logit(x) = 1 / (1 + e^(−x))

The calibration is as follows:

intercept = inverse_logit(predicted average CTR) − inverse_logit(observed average CTR)
p = logit(inverse_logit(p) − intercept)

where p is the predicted CTR for each record.
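A sketch of this calibration step; reading the two inverse_logit arguments as the predicted and observed average CTRs is our interpretation of the slide:

```python
import math

def inverse_logit(x):
    """Log-odds, following the slides' naming."""
    return math.log(x / (1 - x))

def logit(x):
    """Sigmoid, following the slides' naming."""
    return 1.0 / (1.0 + math.exp(-x))

def calibrate(preds, observed_avg):
    """Shift every prediction in log-odds space by a constant intercept
    so the predicted average CTR moves toward the observed average."""
    predicted_avg = sum(preds) / len(preds)
    intercept = inverse_logit(predicted_avg) - inverse_logit(observed_avg)
    return [logit(inverse_logit(p) - intercept) for p in preds]
```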
Result
Reference
[1] Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, and Joaquin Quiñonero Candela. 2014. Practical Lessons from Predicting Clicks on Ads at Facebook. (ADKDD'14)
[2] H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, and Jeremy Kubica. 2013. Ad click prediction: a view from the trenches. (KDD'13)
[3] 3 idiots' Approach for Display Advertising Challenge
[4] Tinrtgu's and 3 idiots' code