When BIC is Best
The data set for this example is data for a linear model with one response variable y and 25 predictor variables x1 through x25. In real life, we don't know which of the predictor variables have anything to do with the response. If we think of all possible regression models in which the means are linear functions of some subset of the predictors, there are 2^25 = 33,554,432 different subsets, so about 33 million possible models.
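As a quick check of that count, the arithmetic can be reproduced in R. Each of the 25 predictors is either in or out of a model, and adding up the number of subsets of each size gives the same total. The variable names are only illustrative.

```r
# Each of the 25 predictors is either included in a model or excluded.
n_predictors <- 25
n_models <- 2^n_predictors
print(n_models)

# Same count, built up by subset size (0 through 25 predictors).
models_by_size <- choose(n_predictors, 0:n_predictors)
stopifnot(sum(models_by_size) == n_models)
```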
The following R statements fit the largest model, which includes all the predictors.
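The original data set is not reproduced here, so what follows is only a sketch on stand-in data. It assumes a data frame named foo (a hypothetical name) with columns y and x1 through x25. The sample size of 100, the standard normal predictors and errors, and the seed are all assumptions made for illustration. The stand-in data use coefficient 1 for x1 through x5 and zero for the rest, which is the simulation truth this section goes on to describe.

```r
set.seed(42)                      # assumed seed, for reproducibility only
n <- 100                          # assumed sample size
k <- 25                           # number of predictors, as in the text

# Stand-in predictors: independent standard normal (an assumption).
x <- matrix(rnorm(n * k), nrow = n, ncol = k)
colnames(x) <- paste0("x", 1:k)

# Coefficients 1 for x1..x5 and 0 for x6..x25.
beta <- c(rep(1, 5), rep(0, 20))
y <- drop(x %*% beta) + rnorm(n)

foo <- data.frame(y = y, x)       # hypothetical data frame name

# Largest model: the dot in the formula means "all other columns of foo".
out_big <- lm(y ~ ., data = foo)
summary(out_big)
```

The fit has 26 regression coefficients: the intercept plus one for each predictor.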
This is simulated data, and the simulation truth regression coefficients are equal to 1 for x1 through x5 and equal to zero for x6 through x25.
The following R statements fit the simulation truth model.
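A minimal sketch of such a fit, again on stand-in data, since the original data set is not reproduced here. The data frame name foo, the sample size, the distributions, and the seed are assumptions. Only the coefficient pattern (1 for x1 through x5, zero otherwise) comes from the text.

```r
set.seed(42)                      # assumed seed
n <- 100                          # assumed sample size

# Stand-in data: y depends only on x1..x5, each with coefficient 1.
x <- matrix(rnorm(n * 25), nrow = n)
colnames(x) <- paste0("x", 1:25)
y <- rowSums(x[, 1:5]) + rnorm(n)
foo <- data.frame(y = y, x)       # hypothetical data frame name

# Simulation truth model: only the five predictors with nonzero coefficients.
out_true <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = foo)
summary(out_true)

# F test of the simulation truth model against the largest model.
out_big <- lm(y ~ ., data = foo)
anova(out_true, out_big)
```

Because the small model is nested in the large one, the large model always has the smaller residual sum of squares. The F test asks whether the reduction is more than chance would explain.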
We want a procedure that does not use the true (pretend unknown) parameter values and still gives reasonable results. We demonstrate model selection with BIC and AIC.
First BIC
The R function regsubsets in the leaps package rapidly finds the best-fitting model with p parameters for each p in the range specified (here 2 to 16). The best subset according to BIC has p = 7.
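The search described above might look like the following sketch. It runs on stand-in data, so the selected p need not match the value reported in the text. The data frame name foo, the sample size, the distributions, and the seed are assumptions. BIC is computed from the residual sum of squares up to an additive constant, which does not affect which model is the minimizer. This is one common way to do it and not necessarily the original computation.

```r
library(leaps)

set.seed(42)                      # assumed seed
n <- 100                          # assumed sample size
x <- matrix(rnorm(n * 25), nrow = n)
colnames(x) <- paste0("x", 1:25)
y <- rowSums(x[, 1:5]) + rnorm(n) # coefficients 1 for x1..x5, 0 otherwise
foo <- data.frame(y = y, x)       # hypothetical data frame name

# Best subset of each size from 1 to 15 predictors.
out <- regsubsets(y ~ ., data = foo, nvmax = 15, method = "exhaustive")
sout <- summary(out)

# One model per size (nbest = 1 is the default), so row j has j predictors.
# p counts regression coefficients including the intercept: 2 to 16.
p <- seq_along(sout$rss) + 1

# BIC up to an additive constant that is the same for every model.
bic <- n * log(sout$rss / n) + log(n) * p
p_bic <- p[which.min(bic)]
p_bic

# Predictors in the BIC-best subset.
sel <- sout$which[which.min(bic), ]
chosen <- setdiff(names(sel)[sel], "(Intercept)")
chosen
```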
Second AIC
The best subset according to AIC has p = 9.
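A sketch of the AIC calculation, with BIC alongside for comparison. As before, this uses stand-in data with assumed sample size, distributions, seed, and data frame name foo, so the selected sizes need not equal those reported in the text. Both criteria are computed up to the same additive constant.

```r
library(leaps)

set.seed(42)                      # assumed seed
n <- 100                          # assumed sample size
x <- matrix(rnorm(n * 25), nrow = n)
colnames(x) <- paste0("x", 1:25)
y <- rowSums(x[, 1:5]) + rnorm(n) # coefficients 1 for x1..x5, 0 otherwise
foo <- data.frame(y = y, x)       # hypothetical data frame name

# Best subset of each size from 1 to 15 predictors (exhaustive by default).
sout <- summary(regsubsets(y ~ ., data = foo, nvmax = 15))
p <- seq_along(sout$rss) + 1      # coefficients including the intercept

# Goodness-of-fit term shared by both criteria.
fit <- n * log(sout$rss / n)
aic <- fit + 2 * p                # AIC penalty: 2 per parameter
bic <- fit + log(n) * p           # BIC penalty: log(n) per parameter

data.frame(p = p, aic = round(aic, 2), bic = round(bic, 2))

p_aic <- p[which.min(aic)]
p_bic <- p[which.min(bic)]
c(AIC = p_aic, BIC = p_bic)
```

Since log(100) is about 4.6, BIC charges more per parameter than AIC here, so the BIC choice cannot be larger than the AIC choice.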
When AIC is Best
The data set for this example is data for a linear model with one response variable y and 25 predictor variables x1 through x25. In real life, we don't know which of the predictor variables have anything to do with the response. If we think of all possible regression models in which the means are linear functions of some subset of the predictors, there are 2^25 = 33,554,432 different subsets, so about 33 million possible models.
The following R statements fit the largest model, which includes all the predictors.
This is simulated data, and the simulation truth regression coefficient for predictor xi (i = 1, ..., 25) is 1 / (1 + i). All simulation truth regression coefficients are nonzero. Hence the largest model is itself the simulation truth model.
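A sketch of this setup on stand-in data. The coefficients 1 / (1 + i) come from the text. The data frame name foo, the sample size, the standard normal predictors and errors, and the seed are assumptions.

```r
set.seed(42)                      # assumed seed
n <- 100                          # assumed sample size
k <- 25

x <- matrix(rnorm(n * k), nrow = n)
colnames(x) <- paste0("x", 1:k)

# Simulation truth: coefficient 1 / (1 + i) for xi, that is 1/2, 1/3, ..., 1/26.
beta <- 1 / (1 + 1:k)
y <- drop(x %*% beta) + rnorm(n)
foo <- data.frame(y = y, x)       # hypothetical data frame name

# The largest model is the simulation truth model here.
out_big <- lm(y ~ ., data = foo)

# Simulation truth next to the estimates (intercept dropped).
cbind(truth = beta, estimate = coef(out_big)[-1])
```

The coefficients shrink steadily but never reach zero, so there is no sharp line between predictors that matter and predictors that do not.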
We want a procedure that does not use the true (pretend unknown) parameter values and still gives reasonable results. We demonstrate model selection with BIC and AIC.
First BIC
The best subset according to BIC has p = 6.
Second AIC
The best subset according to AIC has p = 10.
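The same comparison can be sketched for this second setting, using regsubsets from the leaps package on stand-in data. The sample size, distributions, seed, and data frame name foo are assumptions, so the sizes selected need not equal those reported in the text. Both criteria are computed up to the same additive constant.

```r
library(leaps)

set.seed(42)                      # assumed seed
n <- 100                          # assumed sample size
x <- matrix(rnorm(n * 25), nrow = n)
colnames(x) <- paste0("x", 1:25)
beta <- 1 / (1 + 1:25)            # all coefficients nonzero, as in the text
y <- drop(x %*% beta) + rnorm(n)
foo <- data.frame(y = y, x)       # hypothetical data frame name

# Best subset of each size from 1 to 15 predictors.
sout <- summary(regsubsets(y ~ ., data = foo, nvmax = 15))
p <- seq_along(sout$rss) + 1      # coefficients including the intercept
fit <- n * log(sout$rss / n)

p_bic <- p[which.min(fit + log(n) * p)]
p_aic <- p[which.min(fit + 2 * p)]
c(BIC = p_bic, AIC = p_aic)

# The true model has 26 coefficients, so every omitted predictor
# is one that truly has a nonzero coefficient.
c(omitted_by_BIC = 26 - p_bic, omitted_by_AIC = 26 - p_aic)
```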
Summary
When the actual true model is one of the models under consideration and has a small number of nonzero parameters, BIC is best. It provides consistent model selection as the sample size goes to infinity, and AIC does not.
When the actual true model is not one of the models under consideration or has a large number of nonzero parameters, AIC is best. The consistent estimation property of BIC is meaningless in this context. Moreover, BIC tends to pick models that are too small.
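The difference in behavior comes from the penalty terms. The standard definitions, which are not given in the text above, are as follows, for a model with p parameters, maximized likelihood \hat{L}, and sample size n:

```latex
\begin{align*}
\mathrm{AIC} &= -2 \log \hat{L} + 2 p, \\
\mathrm{BIC} &= -2 \log \hat{L} + p \log n .
\end{align*}
```

Since log n > 2 whenever n is at least 8, BIC charges more per parameter than AIC in all but tiny samples. Apart from ties, BIC therefore never selects a larger model than AIC from the same sequence of best subsets, and its penalty grows with the sample size while that of AIC does not.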