Why and how to use random forest variable importance measures
(and how you shouldn't)

Carolin Strobl (LMU München) and Achim Zeileis (WU Wien)
[email protected]
useR! 2008, Dortmund
Introduction

Random forests
- have become increasingly popular in, e.g., genetics and the neurosciences
  [imagine a long list of references here]
- can deal with "small n, large p" problems, high-order interactions, and
  correlated predictor variables
- are used not only for prediction, but also to assess variable importance
(Small) random forest

[Figure: plots of the individual classification trees in a small random forest;
each tree splits on the predictors Start, Age, and Number (all splits with p < 0.001).]
Construction of a random forest

- draw ntree bootstrap samples from the original sample
- fit a classification tree to each bootstrap sample => ntree trees

This creates a diverse set of trees because
- trees are unstable w.r.t. changes in the learning data
  => ntree different-looking trees (bagging)
- mtry splitting variables are randomly preselected in each split
  => ntree even more different-looking trees (random forest)
Random forests in R

randomForest (pkg: randomForest)
- reference implementation based on CART trees
  (Breiman, 2001; Liaw and Wiener, 2008)
- (-) for variables of different types: biased in favor of continuous variables
  and variables with many categories (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)

cforest (pkg: party)
- based on unbiased conditional inference trees (Hothorn, Hornik, and Zeileis, 2006)
- (+) for variables of different types: unbiased when subsampling, instead of
  bootstrap sampling, is used (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)
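As a quick illustration (not part of the original slides), the following sketch fits both forest flavours on the kyphosis data from the rpart package; that data set, the seed, and the ntree/mtry settings are assumptions chosen only because its predictors Age, Number, and Start match the tree plots shown earlier. Both objects are reused in later sketches.

library("randomForest")
library("party")
data("kyphosis", package = "rpart")

set.seed(42)  # arbitrary seed for reproducibility

## CART-based reference implementation
rf <- randomForest(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   ntree = 500, importance = TRUE)

## conditional inference forest; cforest_unbiased() uses subsampling
## (without replacement), the setting recommended on this slide
cf <- cforest(Kyphosis ~ Age + Number + Start, data = kyphosis,
              controls = cforest_unbiased(ntree = 500, mtry = 2))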
Measuring variable importance

Gini importance
- mean Gini gain produced by Xj over all trees
- obj <- randomForest(..., importance=TRUE)
  obj$importance, column MeanDecreaseGini
  importance(obj, type=2)
- (-) for variables of different types: biased in favor of continuous variables
  and variables with many categories
Measuring variable importance

permutation importance
- mean decrease in classification accuracy after permuting Xj, over all trees
- obj <- randomForest(..., importance=TRUE)
  obj$importance, column MeanDecreaseAccuracy
  importance(obj, type=1)
- obj <- cforest(...)
  varimp(obj)
- (+) for variables of different types: unbiased only when subsampling is used,
  as in cforest(..., controls = cforest_unbiased())
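A consolidated usage sketch of these calls, assuming the objects rf and cf fitted above:

## Gini importance (randomForest only)
importance(rf, type = 2)                 # column MeanDecreaseGini

## permutation importance
importance(rf, type = 1)                 # MeanDecreaseAccuracy, scaled by default
importance(rf, type = 1, scale = FALSE)  # raw, unscaled version

## permutation importance of the conditional inference forest
varimp(cf)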
The permutation importance
within each tree t

VI^{(t)}(x_j) = \frac{\sum_{i \in \bar{B}^{(t)}} I(y_i = \hat{y}_i^{(t)})}{|\bar{B}^{(t)}|}
              - \frac{\sum_{i \in \bar{B}^{(t)}} I(y_i = \hat{y}_{i,\pi_j}^{(t)})}{|\bar{B}^{(t)}|}

where \bar{B}^{(t)} is the out-of-bag sample of tree t,
\hat{y}_i^{(t)} = f^{(t)}(x_i) = predicted class before permuting,
\hat{y}_{i,\pi_j}^{(t)} = f^{(t)}(x_{i,\pi_j}) = predicted class after permuting X_j, with
x_{i,\pi_j} = (x_{i,1}, \ldots, x_{i,j-1}, x_{\pi_j(i),j}, x_{i,j+1}, \ldots, x_{i,p}).

Note: VI^{(t)}(x_j) = 0 by definition if X_j is not in tree t.
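To make the formula concrete, here is a didactic, hand-rolled sketch of VI^{(t)} for a single tree. It is not the package's own implementation; the refit with keep.inbag, the helper name vi_tree, and the choice of variable and tree are all assumptions for illustration.

set.seed(42)
rf_ib <- randomForest(Kyphosis ~ Age + Number + Start, data = kyphosis,
                      ntree = 500, keep.inbag = TRUE)

vi_tree <- function(forest, data, y, j, t) {
  oob <- forest$inbag[, t] == 0              # out-of-bag sample of tree t
  d_perm <- data
  d_perm[oob, j] <- sample(d_perm[oob, j])   # permute X_j within the OOB sample
  ## per-tree predictions before and after permuting X_j
  pred      <- predict(forest, data,   predict.all = TRUE)$individual[, t]
  pred_perm <- predict(forest, d_perm, predict.all = TRUE)$individual[, t]
  ## OOB accuracy before minus OOB accuracy after permuting
  mean(pred[oob] == y[oob]) - mean(pred_perm[oob] == y[oob])
}

vi_tree(rf_ib, kyphosis, kyphosis$Kyphosis, j = "Age", t = 1)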
The permutation importance
over all trees

1. raw importance

VI(x_j) = \frac{\sum_{t=1}^{ntree} VI^{(t)}(x_j)}{ntree}

obj <- randomForest(..., importance=TRUE)
importance(obj, type=1, scale=FALSE)
The permutation importance
over all trees

2. scaled importance (z-score)

z_j = \frac{VI(x_j)}{\hat{\sigma} / \sqrt{ntree}}

obj <- randomForest(..., importance=TRUE)
importance(obj, type=1, scale=TRUE) (the default)
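A small, hedged check of this relation in randomForest, assuming the object rf from above and relying on the documented importanceSD slot, which (if I read the package correctly) stores the per-variable standard error sigma_hat / sqrt(ntree):

imp_raw    <- importance(rf, type = 1, scale = FALSE)[, 1]
imp_scaled <- importance(rf, type = 1, scale = TRUE)[, 1]

## the scaled z-score should equal the raw importance divided by its standard error
all.equal(imp_scaled, imp_raw / rf$importanceSD[, "MeanDecreaseAccuracy"])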
Tests for variable importance
for variable selection purposes

- Breiman and Cutler (2008): simple significance test based on normality of the z-score
  => randomForest, scale=TRUE + (1 - α)-quantile of N(0,1)
- Diaz-Uriarte and Alvarez de Andres (2006): backward elimination (throw out the
  least important variables until the out-of-bag prediction accuracy drops)
  => varSelRF (pkg: varSelRF), depends on randomForest
- Diaz-Uriarte (2007) and Rodenburg et al. (2008): plots and significance test
  (randomly permute the response values to mimic the overall null hypothesis that
  none of the predictor variables is relevant = baseline)
Tests for variable importance

problems of these approaches:
- (at least) Breiman and Cutler (2008): strange statistical properties
  (Strobl and Zeileis, 2008)
- all: preference for correlated predictor variables (see also Nicodemus and
  Shugart, 2007; Archer and Kimes, 2008)
Breiman and Cutler's test

under the null hypothesis of zero importance:

z_j \sim N(0, 1) (asymptotically)

=> if z_j exceeds the (1 - α)-quantile of N(0,1), reject the null hypothesis of
zero importance for variable X_j
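A minimal sketch of this naive test, assuming rf from above and a significance level alpha = 0.05 (shown only to make the procedure concrete; the following slides argue against using it):

alpha <- 0.05
z <- importance(rf, type = 1, scale = TRUE)[, 1]   # z-scores
## reject H0 of zero importance where z exceeds the (1 - alpha) quantile of N(0, 1)
z > qnorm(1 - alpha)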
Raw importance

[Figure: mean raw permutation importance plotted against the relevance of the
variable (0 to 0.4), in panels for ntree = 100, 200, 500 and sample sizes
100, 200, 500.]
z-score and power

[Figure: z-score and power of the test plotted against the relevance of the
variable (0 to 0.4), in panels for ntree = 100, 200, 500 and sample sizes
100, 200, 500.]
Findings

The z-score and the power of the test are affected by
- an increase in ntree
- a decrease in sample size

=> rather use the raw, unscaled permutation importance!
importance(obj, type=1, scale=FALSE)
varimp(obj)
What null hypothesis were we testing in the first place?

obs    Y      X_j (permuted)       Z
1      y_1    x_{\pi_j(1),j}       z_1
...    ...    ...                  ...
i      y_i    x_{\pi_j(i),j}       z_i
...    ...    ...                  ...
n      y_n    x_{\pi_j(n),j}       z_n

H_0: X_j \perp Y, Z   (i.e., X_j \perp Y and X_j \perp Z)

P(Y, X_j, Z) \overset{H_0}{=} P(Y, Z) \cdot P(X_j)
What null hypothesis were we testing in the first place?

- the current null hypothesis reflects independence of X_j from both Y and the
  remaining predictor variables Z
- a high variable importance can result from a violation of either one!
Suggestion: Conditional permutation scheme

obs    Y       X_j (permuted within Z)    Z
1      y_1     x_{\pi_{j|Z=a}(1),j}       z_1 = a
3      y_3     x_{\pi_{j|Z=a}(3),j}       z_3 = a
27     y_27    x_{\pi_{j|Z=a}(27),j}      z_27 = a
6      y_6     x_{\pi_{j|Z=b}(6),j}       z_6 = b
14     y_14    x_{\pi_{j|Z=b}(14),j}      z_14 = b
33     y_33    x_{\pi_{j|Z=b}(33),j}      z_33 = b
...    ...     ...                        ...

H_0: X_j \perp Y \mid Z

P(Y, X_j \mid Z) \overset{H_0}{=} P(Y \mid Z) \cdot P(X_j \mid Z)
or equivalently  P(Y \mid X_j, Z) \overset{H_0}{=} P(Y \mid Z)
Technically

- use any partition of the feature space for conditioning
- here: use the binary partition already learned by the tree
  (use the cutpoints as bisectors of the feature space)
- condition on correlated variables or select some

=> Strobl et al. (2008)
available in cforest from version 0.9-994: varimp(obj, conditional = TRUE)
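Usage sketch, assuming the cforest object cf fitted above; conditional = TRUE applies the conditional permutation scheme described on the previous slides:

set.seed(42)
varimp(cf)                       # marginal (unconditional) permutation importance
varimp(cf, conditional = TRUE)   # conditional permutation importance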
Simulation study

dgp: y_i = \beta_1 x_{i,1} + \ldots + \beta_{12} x_{i,12} + \varepsilon_i,
     \varepsilon_i \overset{i.i.d.}{\sim} N(0, 0.5)

X_1, \ldots, X_{12} \sim N(0, \Sigma) with \Sigma_{jj} = 1,
\Sigma_{jk} = 0.9 for j \neq k \in \{1, 2, 3, 4\}, and \Sigma_{jk} = 0 otherwise
(i.e., only the first four predictors are block-correlated)

X_j:    X_1   X_2   X_3   X_4   X_5   X_6   X_7   X_8  ...  X_12
beta_j:   5     5     2     0    -5    -5    -2     0  ...     0
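A sketch of this data-generating process in R, using the coefficient vector as given above; the sample size, the seed, and reading N(0, 0.5) as a standard deviation of 0.5 are assumptions, and the short cforest fit at the end is only meant to mirror the comparison shown on the next slide:

library("mvtnorm")   # for rmvnorm()
set.seed(42)
p <- 12
n <- 500                                      # assumed sample size
Sigma <- diag(p)
Sigma[1:4, 1:4] <- 0.9                        # block correlation among X1-X4
diag(Sigma) <- 1
beta <- c(5, 5, 2, 0, -5, -5, -2, rep(0, 5))
X <- rmvnorm(n, sigma = Sigma)
y <- drop(X %*% beta) + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, X)

cf_sim <- cforest(y ~ ., data = dat,
                  controls = cforest_unbiased(ntree = 100, mtry = 3))
varimp(cf_sim)                      # unconditional importance
varimp(cf_sim, conditional = TRUE)  # conditional importance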
Results

[Figure: simulated variable importance for the twelve predictor variables,
in panels for mtry = 1, 3, 8.]
Peptide-binding data

[Figure: conditional vs. unconditional permutation importance for the
peptide-binding data, with predictors including h2y8, flex8, and pol3.]
Summary

- if your predictor variables are of different types:
  use cforest (pkg: party) with the default option controls = cforest_unbiased()
  and the permutation importance varimp(obj)

- otherwise: feel free to use cforest (pkg: party)
  with the permutation importance varimp(obj),
  or randomForest (pkg: randomForest)
  with the permutation importance importance(obj, type=1)
  or the Gini importance importance(obj, type=2),
  but don't fall for the z-score! (i.e., set scale=FALSE)

- if your predictor variables are highly correlated:
  use the conditional importance in cforest (pkg: party)
References

Archer, K. J. and R. V. Kimes (2008). Empirical characterization of random forest
variable importance measures. Computational Statistics & Data Analysis 52(4), 2249-2260.

Breiman, L. (2001). Random forests. Machine Learning 45(1), 5-32.

Breiman, L. and A. Cutler (2008). Random forests classification manual. Website
accessed in 1/2008; http://www.math.usu.edu/~adele/forests.

Breiman, L., A. Cutler, A. Liaw, and M. Wiener (2006). Breiman and Cutler's Random
Forests for Classification and Regression. R package version 4.5-16.

Diaz-Uriarte, R. (2007). GeneSrF and varSelRF: A web-based tool and R package for
gene selection and classification using random forest. BMC Bioinformatics 8:328.

Hothorn, T., K. Hornik, and A. Zeileis (2006). Unbiased recursive partitioning:
A conditional inference framework. Journal of Computational and Graphical
Statistics 15(3), 651-674.

Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn (2007). Bias in random
forest variable importance measures: Illustrations, sources and a solution.
BMC Bioinformatics 8:25.

Strobl, C. and A. Zeileis (2008). Danger: High power! Exploring the statistical
properties of a test for random forest variable importance. In Proceedings of the
18th International Conference on Computational Statistics, Porto, Portugal.

Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis (2008).
Conditional variable importance for random forests. BMC Bioinformatics 9:307.