Predictive Modeling
Predictive analytics applies techniques to current and past data to identify trends, recognize patterns, and predict future outcomes.
Predictive Modeling
Independent and dependent variables
1. Gather data
2. Answer the key questions
3. Design the data structure
4. Variable generation
5. Exploratory data analysis
6. Variable transformation
7. Partition the data set for model building (see the sketch below)
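A minimal sketch of step 7, assuming a pandas DataFrame with a column named "target" and a simple 70/30 random split (the file name, column names, and split ratio are illustrative, not part of the original slides):

# Illustrative only: assumes a pandas DataFrame with a "target" column.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("model_data.csv")      # hypothetical input file
X = df.drop(columns=["target"])         # independent variables
y = df["target"]                        # dependent variable

# Hold out 30% of rows for validation; fix random_state for repeatability.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=42
)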
Algorithms
1. Time Series
2. Regression
3. Association
4. Clustering
5. Decision Trees
6. Outlier Detection
7. Neural Network
8. Ensemble Models
9. Factor Analysis
10. Naive Bayes
11. Support Vector Machines
12. Uplift
13. Survival Analysis
Forecasting Methods
Forecasting methods are either qualitative or quantitative; smoothing methods are a common quantitative approach.
What is a Time Series?
A time series is a set of historical data recorded over time.
Components of Time Series - Patterns
Trend, Cyclical, Seasonal, Irregular
Smoothing Methods
1. Moving Averages
2. Weighted Moving Averages
3. Exponential Smoothing
Smoothing Methods – Moving Averages
Moving Average = (sum of the most recent n data values) / n
Time   Response   Moving Total (n=3)   Moving Average (n=3)
2011   4          NA                   NA
2012   6          NA                   NA
2013   5          NA                   NA
2014   3          15                   5.00
2015   7          14                   4.67
2016   NA         15                   5.00
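One way to reproduce the table above in Python, using pandas' rolling window (the library choice is illustrative; the values are the ones in the table):

import pandas as pd

sales = pd.Series([4, 6, 5, 3, 7], index=[2011, 2012, 2013, 2014, 2015])

# Each 3-period total/average is the forecast for the *following* year,
# e.g. mean(4, 6, 5) = 5.00 is the forecast for 2014.
moving_total = sales.rolling(window=3).sum()
moving_avg = sales.rolling(window=3).mean()
print(moving_total)            # 15, 14, 15 for windows ending 2013-2015
print(moving_avg.round(2))     # 5.00, 4.67, 5.00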
Smoothing Methods – Weighted Moving Averages
WMA = Σ (weight for period) × (value in period) / Σ (weights)

Example (three periods, equal weights):
Period   Value
5        10
6        13
7        11
WMA = (10 + 13 + 11) / 3 = 11.33
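A small sketch of the weighted-moving-average formula; with equal weights of 1 it reduces to (10 + 13 + 11) / 3 ≈ 11.33 as in the example above, and the second call shows an arbitrary illustrative weighting:

def weighted_moving_average(values, weights):
    # WMA = sum(weight_i * value_i) / sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

print(weighted_moving_average([10, 13, 11], [1, 1, 1]))   # 11.33 (equal weights)
print(weighted_moving_average([10, 13, 11], [1, 2, 3]))   # more weight on recent periods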
Smoothing Methods – Exponential Smoothing
Single Exponential Smoothing
– Similar to a single moving average (MA)
F(t+1) = α·y(t) + (1 − α)·F(t)
Suppose α = 0.2
Qtr   Actual Sales   Forecast from Prior Period   Forecast for Next Period
1     23             NA                           23   (F1 = y1, since no prior information exists)
2     40             23                           (0.2)(40) + (0.8)(23) = 26.4
3     25             26.4                         (0.2)(25) + (0.8)(26.4) = 26.12
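A short sketch reproducing the α = 0.2 example, seeding the first forecast with the first actual value as noted in the table:

def exp_smooth(actuals, alpha):
    # F(t+1) = alpha * y(t) + (1 - alpha) * F(t); seed F(1) with y(1)
    forecasts = [actuals[0]]
    for y in actuals:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

print(exp_smooth([23, 40, 25], alpha=0.2))
# [23, 23.0, 26.4, 26.12]  -> forecast for quarter 4 is 26.12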
Regression Algorithms
1. Linear Regression
2. Exponential Regression
3. Geometric Regression
4. Logarithmic Regression
Example summary statistics:
M_X = 3,  M_Y = 2.06,  s_X = 1.581,  s_Y = 1.072,  r = 0.627
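Assuming these summary statistics describe a simple linear regression example (means, standard deviations, and correlation of X and Y), the least-squares slope and intercept follow directly from them (b1 = r·sY/sX, b0 = MY − b1·MX):

MX, MY = 3, 2.06        # means of X and Y
sX, sY = 1.581, 1.072   # standard deviations of X and Y
r = 0.627               # correlation coefficient

b1 = r * sY / sX        # slope     ≈ 0.425
b0 = MY - b1 * MX       # intercept ≈ 0.785
print(f"Y = {b0:.3f} + {b1:.3f} * X")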
Regression Algorithms - Exponential
An exponential regression produces an exponential curve that best fits a single set of data points.
Formula:
Forecast = (smoothing constant) × (previous actual demand) + (1 − smoothing constant) × (previous forecast)
1. Suppose you have been asked to generate a demand forecast for a product for year 2012 using an exponential smoothing method. The forecast demand in 2011 was 910. The actual demand in 2011 was 850. Using this data and a smoothing constant of 0.3, what is the demand forecast for year 2012?
2. Use exponential smoothing to forecast this period's demand if α = 0.2, the previous actual demand was 30, and the previous forecast was 35.
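Applying the formula above to both exercises (a quick sketch; the numeric answers follow directly from the stated inputs):

def smooth_forecast(alpha, prev_actual, prev_forecast):
    # forecast = alpha * previous actual + (1 - alpha) * previous forecast
    return alpha * prev_actual + (1 - alpha) * prev_forecast

print(smooth_forecast(0.3, 850, 910))   # exercise 1: 0.3*850 + 0.7*910 = 892.0
print(smooth_forecast(0.2, 30, 35))     # exercise 2: 0.2*30  + 0.8*35  = 34.0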
A geometric sequence has the form a, a·r, a·r², ...,
where:
a is the first term, and r is the common ratio.
Example: with r = 2, each term (except the first term) is found by multiplying the previous term by 2.
Regression Algorithms - Logistic
In statistics, logistic regression (logit regression, or the logit model) is a regression model in which the dependent variable (DV) is categorical.
Example:
Grain size (mm)   Spiders
0.245 absent
0.247 absent
0.285 present
0.299 present
0.327 present
0.347 present
0.356 absent
0.36 present
0.363 absent
0.364 present
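A minimal sketch fitting a logistic regression to the grain-size data above with scikit-learn (the library choice and the 0.33 mm query point are illustrative):

from sklearn.linear_model import LogisticRegression
import numpy as np

grain_size = np.array([[0.245], [0.247], [0.285], [0.299], [0.327],
                       [0.347], [0.356], [0.360], [0.363], [0.364]])
present = np.array([0, 0, 1, 1, 1, 1, 0, 1, 0, 1])   # 1 = spiders present

model = LogisticRegression().fit(grain_size, present)
# Predicted probability that spiders are present on a 0.33 mm beach
print(model.predict_proba([[0.33]])[0, 1])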
Regression Algorithms – Multiple Linear
A regression with two or more explanatory variables is called a multiple regression.
Formula: Y = b0 + b1·X1 + b2·X2 + ... + bk·Xk + e
where e is the random error term.
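A small sketch of fitting a multiple linear regression; the data here is synthetic, generated only to illustrate recovering b0, b1, and b2:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data for illustration only: Y = 2 + 3*X1 - 1*X2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # close to b0 = 2, b1 = 3, b2 = -1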
Apriori Example
Transaction   Items bought
T1            item1, item2, item3
T2            item1, item2
T3            item1, item5
T4            item1, item2, item5
Association Algorithms - Example

Transaction ID   Items Bought
T1               Mango, Onion, Nintendo, Keychains, Eggs, Yo-Yo
T2               Doll, Onion, Nintendo, Keychains, Eggs, Yo-Yo
T3               Mango, Apple, Keychains, Eggs
T4               Mango, Umbrella, Corn, Keychains, Yo-Yo
T5               Corn, Onion, Keychains, Ice cream, Eggs
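A minimal sketch of the first Apriori pass over the item1/item2 transactions listed earlier: count support for 1- and 2-item candidate sets and keep those meeting a minimum support count (the threshold of 2 is illustrative):

from itertools import combinations
from collections import Counter

transactions = [
    {"item1", "item2", "item3"},   # T1
    {"item1", "item2"},            # T2
    {"item1", "item5"},            # T3
    {"item1", "item2", "item5"},   # T4
]
min_support = 2   # illustrative threshold (absolute count)

# Count support for all 1-item and 2-item candidate sets
counts = Counter()
for t in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1

frequent = {s: c for s, c in counts.items() if c >= min_support}
print(frequent)   # e.g. ('item1', 'item2') appears in 3 of 4 transactions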
Clustering Algorithms – Types
• Overlapping Clustering
• Hierarchical Clustering
• Probabilistic Clustering
Clustering Algorithms – Most Used
• K-means
• Fuzzy C-means
• Hierarchical clustering
• Mixture of Gaussians
Clustering Algorithms – K Means Example
• The distance between two points is defined as
D (P1, P2) = | x1 – x2 | + | y1 – y2|
Table 1: Initial centroids C1 = (2,2), C2 = (1,14), C3 = (4,3)

Points   Coordinates   D(P,C1)   D(P,C2)   D(P,C3)   Cluster
P1       (2,2)         0         13        3         C1
P2       (1,14)        13        0         14        C2
P3       (10,7)        13        16        10        C3
P4       (1,11)        10        3         11        C2
P5       (3,4)         3         12        2         C3
P6       (11,8)        15        16        12        C3
P7       (4,3)         3         14        0         C3
P8       (12,9)        17        16        14        C3
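A short sketch of the assignment step above, using the same Manhattan distance and the same initial centroids:

points = {"P1": (2, 2), "P2": (1, 14), "P3": (10, 7), "P4": (1, 11),
          "P5": (3, 4), "P6": (11, 8), "P7": (4, 3), "P8": (12, 9)}
centroids = {"C1": (2, 2), "C2": (1, 14), "C3": (4, 3)}

def manhattan(p, q):
    # D(P1, P2) = |x1 - x2| + |y1 - y2|
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Assign each point to its nearest centroid (first K-means iteration)
for name, p in points.items():
    distances = {c: manhattan(p, q) for c, q in centroids.items()}
    print(name, distances, "->", min(distances, key=distances.get))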
Clustering Algorithms – Hierarchical Example
Starting from the matrix of pairwise distances between nine US cities (BOS, NY, DC, MIA, CHI, SEA, SF, LA, DEN), the two closest clusters are merged at each step and the distance matrix is recomputed. Merge sequence: BOS merges with NY (distance 206), DC then joins BOS/NY; SF merges with LA, and SEA then joins SF/LA; CHI and later DEN join BOS/NY/DC; the SF/LA/SEA group then merges into BOS/NY/DC/CHI/DEN, leaving MIA as the last city to join. [Updated distance matrices after each merge omitted.]
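A compact sketch of single-linkage agglomerative clustering with SciPy, using only the BOS/NY/DC/CHI subset of the distances from the table (the subset and library choice are for illustration):

from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
import numpy as np

cities = ["BOS", "NY", "DC", "CHI"]
# Pairwise distances taken from the table (subset of the nine cities)
dist = np.array([[  0, 206, 429, 963],
                 [206,   0, 233, 802],
                 [429, 233,   0, 671],
                 [963, 802, 671,   0]])

# Single-linkage agglomerative clustering on the condensed distance matrix
Z = linkage(squareform(dist), method="single")
print(Z)   # first merge is BOS-NY at 206, then DC joins at 233, then CHI at 671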
30, 171, 184, 201, 212, 250, 265, 270, 272, 289, 305, 306, 322, 322, 336, 346, 351, 370, 390, 404, 409, 411, 436, 437,
439, 441, 444, 448, 451, 453, 470, 480, 482, 487, 494, 495, 499, 503, 514, 521, 522, 527, 548, 550, 559, 560, 570, 572,
574, 578, 585, 592, 592, 607, 616, 618, 621, 629, 637, 638, 640, 656, 668, 707, 709, 719, 737, 739, 752, 758, 766, 792,
792, 794, 802, 818, 830, 832, 843, 858, 860, 869, 918, 925, 953, 991, 1000, 1005, 1068, 1441
Neural Network Algorithms
• Hopfield Nets
• BumpTree
Ensemble Models
• Monte Carlo Analysis
Task    Time Estimate (months)   Min (months)   Most Likely (months)   Max (months)
1       5                        4              5                      7
2       4                        3              4                      6
3       5                        4              5                      6
Total   14                       11             14                     19
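A minimal Monte Carlo sketch for the three tasks above, assuming each task duration follows a triangular distribution over its min / most likely / max values (the distribution choice and trial count are assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(42)
tasks = [(4, 5, 7), (3, 4, 6), (4, 5, 6)]   # (min, most likely, max) in months
n_trials = 10_000

# Sample each task from a triangular distribution and add the durations
totals = sum(rng.triangular(lo, mode, hi, size=n_trials) for lo, mode, hi in tasks)

print(f"mean total: {totals.mean():.2f} months")
print(f"90th percentile: {np.percentile(totals, 90):.2f} months")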
Naive Bayes Theorem Example
b. It is later learned that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke cigars (based on data from the Substance Abuse and Mental Health Services Administration). Use this additional information to find the probability that the selected subject is a male.
Naive Bayes Theorem Solution
M = Male
C = Cigar Smoker
F = Female
N = Non Smoker
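A worked sketch of the Bayes computation using the notation above; since part (a) of the problem is not shown, the prior P(M) = P(F) = 0.5 is an assumption (equal numbers of males and females surveyed):

# Assumption: P(M) = P(F) = 0.5 (equal numbers of males and females)
p_m, p_f = 0.5, 0.5
p_c_given_m = 0.095   # 9.5% of males smoke cigars
p_c_given_f = 0.017   # 1.7% of females smoke cigars

# Bayes' theorem: P(M | C) = P(M) P(C|M) / [ P(M) P(C|M) + P(F) P(C|F) ]
p_m_given_c = (p_m * p_c_given_m) / (p_m * p_c_given_m + p_f * p_c_given_f)
print(round(p_m_given_c, 3))   # approximately 0.848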