Chapter 17 Example 3.6: Predictive Model Selection
One thing you might want to do with a model is predict future values. Information criteria such as AIC (or AICc, which includes a small sample size correction) and BIC attempt to estimate a form of the error we might experience when predicting future data. AIC and BIC (and other information criteria) generally take the form of a measure of how far the data are from the fitted model plus a penalty for using additional parameters in the model. You compute an information criterion for each model and select the model with the lowest value.
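As a rough illustration of the "fit plus penalty" form, the sketch below rebuilds AIC and BIC by hand for a linear model and compares the results with R's built-in AIC() and BIC(). It uses the cars data that ships with R purely as a stand-in (not the resin data), and the names fit, ll, k, n, aic.manual, bic.manual, and aicc.manual are illustrative. The AICc line uses the commonly quoted small sample correction, which base R does not provide directly.

```r
# stand-in example: any lm fit would do; cars ships with base R
fit <- lm(dist ~ speed, data = cars)

ll <- logLik(fit)      # maximized log likelihood
k  <- attr(ll, "df")   # estimated parameters (coefficients plus error variance)
n  <- nobs(fit)        # number of observations

# lack of fit (-2 log likelihood) plus a penalty for each parameter
aic.manual <- -2 * as.numeric(ll) + 2 * k
bic.manual <- -2 * as.numeric(ll) + log(n) * k

# AICc: AIC plus a small sample size correction
aicc.manual <- aic.manual + 2 * k * (k + 1) / (n - k - 1)

# the hand computations should agree with the built-in functions
all.equal(aic.manual, AIC(fit))
all.equal(bic.manual, BIC(fit))
```

Because log(n) exceeds 2 once n is 8 or more, BIC penalizes each additional parameter more heavily than AIC for all but very small data sets.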
BIC is generally preferred in the situation where it is believed that the true model is somewhere in the set of models being compared. AIC is generally preferred when all of the models are considered to be approximate.
For the resin lifetime data, we will look at these models: single mean, separate means, and polynomials of order 0 through 4. Note that the single mean model and the order 0 polynomial both fit just a constant; they are the same model described differently and will have the same AIC and BIC. Note further that, because there are only five distinct temperatures, the separate means model and the order 4 polynomial produce the same fitted values at those five temperatures and differ only in how they describe the fit; they too will have the same AIC and BIC.
Because the separate means model is as complete a model as we can fit, that is, the five treatment means can be fit exactly, we can confidently use BIC in this situation. However, if we had more levels of temperature but only considered polynomial models up to order 4, then AIC would be appropriate but BIC would not.
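The equivalence between the separate means model and the order 4 polynomial when there are only five distinct levels can be checked directly. The sketch below uses simulated data rather than the resin data; x, y, sep, and quartic are hypothetical names chosen for illustration.

```r
set.seed(1)                                    # simulated data, for illustration only
x <- rep(c(1, 2, 3, 4, 5), each = 4)           # five distinct levels of a made-up predictor
y <- 2 - 0.5 * x + rnorm(length(x), sd = 0.3)  # made-up response

sep     <- lm(y ~ factor(x))   # separate means: one mean per level
quartic <- lm(y ~ poly(x, 4))  # order 4 polynomial in x

# identical fitted values, and therefore identical AIC and BIC
all.equal(unname(fitted(sep)), unname(fitted(quartic)))
all.equal(AIC(sep), AIC(quartic))
all.equal(BIC(sep), BIC(quartic))
```

Both models use five mean parameters plus an error variance, so the penalty terms match as well as the fits.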
> # get data (ResinLifetimes is assumed here to come from the cfcdae package)
> library(cfcdae)
> data(ResinLifetimes)
> # fit models; temp is a factor, so logTime ~ temp fits one mean per temperature
> separate.means <- lm(logTime ~ temp,data=ResinLifetimes)
> single.mean <- lm(logTime~1,data=ResinLifetimes)
> # add powers of temperature to our data frame
> myRL <- within(ResinLifetimes,
+ {temp.z2 <- temp.z^2;temp.z3<-temp.z^3;temp.z4<-temp.z^4})
> # fit polynomial models
> p4 <- lm(logTime~temp.z+temp.z2+temp.z3+temp.z4,data=myRL)
> p3 <- lm(logTime~temp.z+temp.z2+temp.z3,data=myRL)
> p2 <- lm(logTime~temp.z+temp.z2,data=myRL)
> p1 <- lm(logTime~temp.z,data=myRL)
> p0 <- lm(logTime~1,data=myRL) # same as single.mean
> # collect AIC values in a vector with labels
> RL.AIC <- c(single=AIC(single.mean),p0=AIC(p0),
+ p1=AIC(p1),p2=AIC(p2),p3=AIC(p3),
+ p4=AIC(p4),separate=AIC(separate.means))
> RL.BIC <- c(single=BIC(single.mean),p0=BIC(p0),
+ p1=BIC(p1),p2=BIC(p2),p3=BIC(p3),
+ p4=BIC(p4),separate=BIC(separate.means))
> cbind(RL.AIC,RL.BIC)
RL.AIC RL.BIC
single 25.09628 28.31811
p0 25.09628 28.31811
p1 -59.18418 -54.35142
p2 -65.93237 -59.48870
p3 -63.93471 -55.88012
p4 -61.93575 -52.27025
separate -61.93575 -52.27025
> # or you can make the same information look fancier
> knitr::kable(cbind(AIC=RL.AIC,BIC=RL.BIC))
| | AIC | BIC |
|---|---|---|
| single | 25.09628 | 28.31811 |
| p0 | 25.09628 | 28.31811 |
| p1 | -59.18418 | -54.35142 |
| p2 | -65.93237 | -59.48870 |
| p3 | -63.93471 | -55.88012 |
| p4 | -61.93575 | -52.27025 |
| separate | -61.93575 | -52.27025 |
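The quadratic model has the lowest AIC and the lowest BIC, so it is the selected model under either criterion. By AIC, the cubic and quartic models are also better than the linear and single mean models, just not as good as the quadratic. By BIC, the cubic is still better than the linear model, but the quartic (-52.27) is worse than the linear (-54.35) because of BIC's heavier penalty on its extra parameters. Every polynomial of order 1 or higher remains far better than the single mean.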
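One way to make such comparisons easier to read is to express each criterion as a difference from its smallest value. The sketch below re-enters the AIC and BIC values reported in the table so that it runs without the fitted models; delta, dAIC, and dBIC are illustrative names.

```r
# AIC and BIC values copied from the table above
RL.AIC <- c(single = 25.09628, p0 = 25.09628, p1 = -59.18418, p2 = -65.93237,
            p3 = -63.93471, p4 = -61.93575, separate = -61.93575)
RL.BIC <- c(single = 28.31811, p0 = 28.31811, p1 = -54.35142, p2 = -59.48870,
            p3 = -55.88012, p4 = -52.27025, separate = -52.27025)

# which model has the smallest value of each criterion?
names(which.min(RL.AIC))
names(which.min(RL.BIC))

# differences from the best model; 0 marks the selected model
delta <- cbind(dAIC = RL.AIC - min(RL.AIC), dBIC = RL.BIC - min(RL.BIC))
round(delta[order(delta[, "dAIC"]), ], 2)
```

Only the gaps between models matter for selection, not the overall level of the criterion, so the difference columns show at a glance how each model compares with the linear fit under AIC and under BIC.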