Chapter 33 Factorials
33.1 Barley data
Back to the barley sprouting data.
> library(cfcdae)
> data(SproutingBarley)
> head(SproutingBarley)
weeks water sprouting weeks.z water.z
1 1 4 11 1 4
2 1 8 8 1 8
3 3 4 7 3 4
4 3 8 1 3 8
5 6 4 9 6 4
6 6 8 5 6 8
This is a two factor model. We want to fit both main effects
and the two factor interaction. In R, interactions are indicated
by using a colon, as in factor1:factor2
.
> barleyfit1 <- lm(sprouting~water+weeks+water:weeks,data=SproutingBarley)
> anova(barleyfit1)
Analysis of Variance Table
Response: sprouting
Df Sum Sq Mean Sq F value Pr(>F)
water 1 1178.13 1178.13 19.7232 0.000251 ***
weeks 4 1321.13 330.28 5.5293 0.003645 **
water:weeks 4 208.87 52.22 0.8742 0.496726
Residuals 20 1194.67 59.73
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sometimes you want to have a single factor that encompasses all
of the combined leves. There is a join()
function in conf.design
that
you can use to do that.
Note that in the anova the SS and df for ww
are the sum
of those for water
, weeks
, and water:weeks
in
the preceding anova.
> ww <- with(SproutingBarley,conf.design::join(water,weeks))
> ww
[1] 4:1 8:1 4:3 8:3 4:6 8:6 4:9 8:9 4:12 8:12 4:1 8:1 4:3 8:3 4:6 8:6 4:9 8:9 4:12
[20] 8:12 4:1 8:1 4:3 8:3 4:6 8:6 4:9 8:9 4:12 8:12
Levels: 4:1 4:12 4:3 4:6 4:9 8:1 8:12 8:3 8:6 8:9
> barleyfit2 <- lm(sprouting~ww,data=SproutingBarley)
> anova(barleyfit2)
Analysis of Variance Table
Response: sprouting
Df Sum Sq Mean Sq F value Pr(>F)
ww 9 2708.1 300.904 5.0375 0.001269 **
Residuals 20 1194.7 59.733
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There are multiple short cuts for factorial models that can save you a lot of typing.
A*B*C
expands out into all main effects and interactions:A + B + C + A:B + A:C + B:C + A:B:C
.(A+B+C)^k
expands out into all of the main effects and interactions up to order k. Thuse(A+B+C)^2
is a short cut forA + B + C + A:B + A:C + B:C
.- You can subtract out terms. Thus
A*B*C - A:B:C - A:C
is the same asA + B + C + A:B + B:C
.
Note: I have put in spaces in these models for clarity; the spaces are not actually needed.
Before we do any inference, we need to check our assumptions. We have non-constant variance.
> plot(barleyfit1,which=1)

Figure 33.1: Residual plot for barley data
Box-Cox suggests a square root, which should surprise no one as these are binomial data with smallish success probability.
> car::boxCox(barleyfit1)

Figure 33.2: Box-Cox plot for barley data
After transforming, we see that the main effects are both pretty significant, but there is no evidence for interaction.
> barleyfit3 <- lm(sqrt(sprouting)~water*weeks,data=SproutingBarley)
> anova(barleyfit3)
Analysis of Variance Table
Response: sqrt(sprouting)
Df Sum Sq Mean Sq F value Pr(>F)
water 1 21.8930 21.8930 23.7606 9.177e-05 ***
weeks 4 21.8949 5.4737 5.9406 0.002555 **
water:weeks 4 2.2485 0.5621 0.6101 0.660139
Residuals 20 18.4280 0.9214
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A side by side plot emphasizes that the main effects are large relative to residuals, but not so much the interaction.
> sidebyside(barleyfit3)

Figure 33.3: Side by side plot for transformed barley data
We can see the fitted effects using summary()
. Note that there
are many other effects that can be determined by the
zero sum constraints we use.
> summary(barleyfit3)
Call:
lm(formula = sqrt(sprouting) ~ water * weeks, data = SproutingBarley)
Residuals:
Min 1Q Median 3Q Max
-1.4250 -0.5001 0.1663 0.5928 1.4911
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.47148 0.17525 19.808 1.30e-14 ***
water1 0.85426 0.17525 4.874 9.18e-05 ***
weeks1 -0.96171 0.35050 -2.744 0.0125 *
weeks2 -0.78037 0.35050 -2.226 0.0376 *
weeks3 0.11369 0.35050 0.324 0.7490
weeks4 0.19109 0.35050 0.545 0.5917
water1:weeks1 -0.44200 0.35050 -1.261 0.2218
water1:weeks2 0.04425 0.35050 0.126 0.9008
water1:weeks3 -0.01444 0.35050 -0.041 0.9675
water1:weeks4 0.42088 0.35050 1.201 0.2439
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9599 on 20 degrees of freedom
Multiple R-squared: 0.7141, Adjusted R-squared: 0.5855
F-statistic: 5.551 on 9 and 20 DF, p-value: 0.0006986
If you want to see the full set of coefficients, it’s easier
to use model.effects()
.
You need to enter the
term exactly as it appears in the model, ie, you have to have the variables
in the correct order, because model.effects()
is too
dumb to find the term otherwise.
> model.effects(barleyfit3,"weeks")
1 3 6 9 12
-0.9617085 -0.7803725 0.1136921 0.1910862 1.4373027
> model.effects(barleyfit3,"water")
4 8
0.8542634 -0.8542634
> model.effects(barleyfit3,"water:weeks")
1 3 6 9 12
4 -0.4419991 0.04424575 -0.01444493 0.4208793 -0.008681009
8 0.4419991 -0.04424575 0.01444493 -0.4208793 0.008681009
> model.effects(barleyfit3,"weeks:water")
Error in model.effects.default(barleyfit3, "weeks:water"): weeks:water is not an explicit term in the model
We can do contrasts, pairwise comparisons and so on. Because we have more than one factor in the model, the second argument identifying the factor to use makes more sense than it did when there was only one term in the model.
> linear.contrast(barleyfit3,weeks,c(-2,-1,0,1,2))
estimates se t-value p-value lower-ci upper-ci
1 5.769481 1.23922 4.655735 0.0001522892 3.184513 8.354449
attr(,"class")
[1] "linear.contrast"
> pairwise(barleyfit3,weeks,type="hsd")
Pairwise comparisons ( hsd ) of weeks
estimate signif diff lower upper
1 - 3 -0.18133596 1.658362 -1.839698 1.4770265
1 - 6 -1.07540057 1.658362 -2.733763 0.5829618
1 - 9 -1.15279468 1.658362 -2.811157 0.5055677
* 1 - 12 -2.39901121 1.658362 -4.057374 -0.7406488
3 - 6 -0.89406461 1.658362 -2.552427 0.7642978
3 - 9 -0.97145871 1.658362 -2.629821 0.6869037
* 3 - 12 -2.21767525 1.658362 -3.876038 -0.5593128
6 - 9 -0.07739411 1.658362 -1.735757 1.5809683
6 - 12 -1.32361064 1.658362 -2.981973 0.3347518
9 - 12 -1.24621654 1.658362 -2.904579 0.4121459
You can also get the same contrast in one variable
computed separately for every level of a second variable
by using the byvar
argument. This is another
way to look at interaction beyond interaction plots.
> with(SproutingBarley,linear.contrast(barleyfit3,weeks,c(-2,-1,0,1,2),
+ byvar=water))
estimates se t-value p-value lower-ci upper-ci
1 7.012751 1.752522 4.001519 0.0007010286 3.3570539 10.668448
2 4.526211 1.752522 2.582684 0.0177798490 0.8705145 8.181908
attr(,"class")
[1] "linear.contrast"
We can use emmeans()
in several useful ways.
- We can get the fitted values.
- We can get marginal fitted values. Note that these can be misleading if you look at marginal means in the presence of an interaction.
- We can get a table of means even if the model includes just main effects.
Note that the first and third (full and additive) results are different.
> library(emmeans)
> emmeans(barleyfit3,~weeks:water)
weeks water emmean SE df lower.CL upper.CL
1 4 2.92 0.554 20 1.766 4.08
3 4 3.59 0.554 20 2.434 4.75
6 4 4.42 0.554 20 3.269 5.58
9 4 4.94 0.554 20 3.782 6.09
12 4 5.75 0.554 20 4.598 6.91
1 8 2.10 0.554 20 0.941 3.25
3 8 1.79 0.554 20 0.637 2.95
6 8 2.75 0.554 20 1.589 3.90
9 8 2.39 0.554 20 1.231 3.54
12 8 4.06 0.554 20 2.907 5.22
Results are given on the sqrt (not the response) scale.
Confidence level used: 0.95
> emmeans(barleyfit3,~weeks)
NOTE: Results may be misleading due to involvement in interactions
weeks emmean SE df lower.CL upper.CL
1 2.51 0.392 20 1.69 3.33
3 2.69 0.392 20 1.87 3.51
6 3.59 0.392 20 2.77 4.40
9 3.66 0.392 20 2.85 4.48
12 4.91 0.392 20 4.09 5.73
Results are averaged over the levels of: water
Results are given on the sqrt (not the response) scale.
Confidence level used: 0.95
> additive.fit <- lm(sqrt(sprouting)~weeks+water,data=SproutingBarley)
> emmeans(additive.fit,~weeks:water)
weeks water emmean SE df lower.CL upper.CL
1 4 3.36 0.415 24 2.507 4.22
3 4 3.55 0.415 24 2.689 4.40
6 4 4.44 0.415 24 3.583 5.30
9 4 4.52 0.415 24 3.660 5.37
12 4 5.76 0.415 24 4.906 6.62
1 8 1.66 0.415 24 0.799 2.51
3 8 1.84 0.415 24 0.980 2.69
6 8 2.73 0.415 24 1.874 3.59
9 8 2.81 0.415 24 1.952 3.67
12 8 4.05 0.415 24 3.198 4.91
Results are given on the sqrt (not the response) scale.
Confidence level used: 0.95
The effects
package also has some utility. allEffects
returns the table of means for any term in the model
that is not contained in another term. In barleyfit3
, that
is the full interaction water:weeks
. In additive.fit
, that
is the weeks
margin and the water
margin.
Plotting effects contained in another term can be misleading.
> library(effects)
Loading required package: carData
lattice theme set by effectsTheme()
See ?effectsTheme for details.
> allEffects(barleyfit3)
model: sqrt(sprouting) ~ water * weeks
water*weeks effect
weeks
water 1 3 6 9 12
4 2.922038 3.589619 4.424993 4.937711 5.754367
8 2.097510 1.792601 2.745356 2.387426 4.063203
> allEffects(additive.fit)
model: sqrt(sprouting) ~ weeks + water
weeks effect
weeks
1 3 6 9 12
2.509774 2.691110 3.585174 3.662569 4.908785
water effect
water
4 8
4.325746 2.617219
You can plot the results of predictorEffects()
. In the
interactive model, we get two sets of plots: how water
affects the response separately for every level of weeks,
and how weeks affects the response separately for every
level of water.
> plot(predictorEffects(barleyfit3,~water:weeks))

Figure 33.4: Plot of predictor effects for barley data with interaction
If you do the same thing with the additive model, it notices there is no interaction to handle and just gives you plots of each margin.
> plot(predictorEffects(additive.fit,~water:weeks))

Figure 33.5: Plot of predictor effects for barley data with additive fit
33.2 Carbonwire data
These data are not in the book, so I read them in from
a file using read.table()
and then make the
predictors into factors. The header=TRUE
says
that the column names are on the first row of the file.
> wire <- read.table("carbonwire.txt",header=TRUE)
> names(wire)
[1] "resist" "degas" "temp" "diff"
> wire <- within(wire, {degas <- factor(degas); temp <- factor(temp);
+ diff <- factor(diff)})
Let’s begin by fitting the full model. We’ll
need to check residuals, but it looks like anything
involving degas
is not significant, and the other
three terms look highly significant.
> wire.fit <- lm(resist~degas*temp*diff,data=wire)
> anova(wire.fit)
Analysis of Variance Table
Response: resist
Df Sum Sq Mean Sq F value Pr(>F)
degas 1 0.47 0.467 1.5918 0.2125
temp 2 410.69 205.346 699.6024 < 2.2e-16 ***
diff 2 80.54 40.270 137.1989 < 2.2e-16 ***
degas:temp 2 0.31 0.157 0.5342 0.5892
degas:diff 2 0.70 0.351 1.1957 0.3104
temp:diff 4 14.81 3.704 12.6177 2.566e-07 ***
degas:temp:diff 4 0.87 0.219 0.7450 0.5656
Residuals 54 15.85 0.294
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
But we need to check those residuals. Constant variance looks fine.
> plot(wire.fit,which=1)

Figure 33.6: Residual plot of carbon wire data
And normality is great. So no transformations!
> plot(wire.fit,which=2)

Figure 33.7: NPP of carbon wire data
Side by side reinforces what we see in the anova.
> sidebyside(wire.fit)

Figure 33.8: Side by side plot of carbon wire data
Let’s look at that interaction. This plot makes temp-3/diff-2 look a little high.
> interactplot(temp,diff,resist,data=wire)

Figure 33.9: Carbonwire interaction plot; temp by diff
But with this view it looks more like temp-3/diff-1 is too low. It’s likely some mix of the two.
> interactplot(diff,temp,resist,data=wire)

Figure 33.10: Carbonwire interaction plot; diff by temp
33.3 Order of terms
Note that R will, by default, reorder your terms so that main effects come first, then 2fi, then 3fi, etc.
> anova(lm(resist~temp+diff+temp:diff+degas,data=wire))
Analysis of Variance Table
Response: resist
Df Sum Sq Mean Sq F value Pr(>F)
temp 2 410.69 205.346 717.6589 < 2.2e-16 ***
diff 2 80.54 40.270 140.7400 < 2.2e-16 ***
degas 1 0.47 0.467 1.6329 0.2061
temp:diff 4 14.81 3.704 12.9434 1.014e-07 ***
Residuals 62 17.74 0.286
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can force R to accept your order of terms, but it’s
a little tedious. You use terms(,keep.order=TRUE)
to
set your order, then pass that to lm()
instead of
the model.
> anova(lm(terms(resist~temp+diff+temp:diff+degas,keep.order=TRUE),data=wire))
Analysis of Variance Table
Response: resist
Df Sum Sq Mean Sq F value Pr(>F)
temp 2 410.69 205.346 717.6589 < 2.2e-16 ***
diff 2 80.54 40.270 140.7400 < 2.2e-16 ***
temp:diff 4 14.81 3.704 12.9434 1.014e-07 ***
degas 1 0.47 0.467 1.6329 0.2061
Residuals 62 17.74 0.286
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: you can’t just choose any order you like. If you put an interaction in before terms that it includes, then the interaction will such up the SS and df of the lower order terms.
> anova(lm(terms(resist~temp+temp:diff+diff+degas,keep.order=TRUE),data=wire))
Analysis of Variance Table
Response: resist
Df Sum Sq Mean Sq F value Pr(>F)
temp 2 410.69 205.346 717.6589 <2e-16 ***
temp:diff 6 95.35 15.892 55.5423 <2e-16 ***
degas 1 0.47 0.467 1.6329 0.2061
Residuals 62 17.74 0.286
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1