Chapter 40 Randomized Complete Blocks
40.1 Immer data
One early and famous example of a Randomized Complete Block analyzed by Fisher involves five varieties of barley grown at six locations (which included Crookston, Waseca, …, data from Minnesota!). The original experiment had two years of data; we will only look at the second year. This data set is included in the MASS package.
> library(cfcdae)
Registered S3 method overwritten by 'DoE.base':
method from
factorize.factor conf.design
> library(MASS)
> library(lme4)
Loading required package: Matrix
> library(car)
Loading required package: carData
> immer
Loc Var Y1 Y2
1 UF M 81.0 80.7
2 UF S 105.4 82.3
3 UF V 119.7 80.4
4 UF T 109.7 87.2
5 UF P 98.3 84.2
6 W M 146.6 100.4
7 W S 142.0 115.5
8 W V 150.7 112.2
9 W T 191.5 147.7
10 W P 145.7 108.1
11 M M 82.3 103.1
12 M S 77.3 105.1
13 M V 78.4 116.5
14 M T 131.3 139.9
15 M P 89.6 129.6
16 C M 119.8 98.9
17 C S 121.4 61.9
18 C V 124.0 96.2
19 C T 140.8 125.5
20 C P 124.8 75.7
21 GR M 98.9 66.4
22 GR S 89.0 49.9
23 GR V 69.1 96.7
24 GR T 89.3 61.9
25 GR P 104.1 80.3
26 D M 86.9 67.7
27 D S 77.1 66.7
28 D V 78.9 67.4
29 D T 101.8 91.8
30 D P 96.0 94.1
Here are the results separately by block. There are large block to block differences (and all of the treatment to treatment differences are within each boxplot), so blocking should be a real help.
> boxplot(Y2~Loc,data=immer,main="Barley by location")

Figure 40.1: Boxplot of barley by location
Here are the results separately by treatment. We can maybe see some treatment differences, but we also see a lot of variation within each treatment. Much of this is block to block variability that would show up in our error if we hadn’t blocked.
> boxplot(Y2~Var,data=immer,main="Barley by variety")

Figure 40.2: Boxplot of barley by variety
40.2 As Fixed Effects
Fit as fixed blocks first. Get into the habit of treatments after blocks (which is the order you’ll need if you have missing data).
> barley.lm <- lm(Y2~Loc+Var,data=immer)
Residuals aren’t too bad, although there is just a hint of a dip in the middle.
> plot(barley.lm,which=1)

Figure 40.3: Residual plot of barley data
Normality looks good.
> plot(barley.lm,which=2)

Figure 40.4: NPP of barley data residuals
Here is the ANOVA, we just enter treatments after blocks. Variety effects are modestly significant. We do not test the block effects (even though R prints an F, it doesn’t know about blocking factors).
> anova(barley.lm)
Analysis of Variance Table
Response: Y2
Df Sum Sq Mean Sq F value Pr(>F)
Loc 5 10285.0 2056.99 10.3901 5.049e-05 ***
Var 4 2845.2 711.29 3.5928 0.02306 *
Residuals 20 3959.5 197.98
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can do all the usual things: summaries, pairwise comparisons, marginal means, etc.
Varieties M and S have low yields, variety T has a high yield, and the others are in the middle. Again, ignore ``testing” related to block effects.
> summary(barley.lm)
Call:
lm(formula = Y2 ~ Loc + Var, data = immer)
Residuals:
Min 1Q Median 3Q Max
-25.007 -8.665 -0.900 8.185 23.893
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.133 2.569 36.254 < 2e-16 ***
Loc1 -1.493 5.744 -0.260 0.797543
Loc2 -15.593 5.744 -2.715 0.013344 *
Loc3 -22.093 5.744 -3.846 0.001008 **
Loc4 25.707 5.744 4.475 0.000232 ***
Loc5 -10.173 5.744 -1.771 0.091790 .
Var1 -6.933 5.138 -1.349 0.192261
Var2 2.200 5.138 0.428 0.673081
Var3 -12.900 5.138 -2.511 0.020748 *
Var4 15.867 5.138 3.088 0.005797 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.07 on 20 degrees of freedom
Multiple R-squared: 0.7683, Adjusted R-squared: 0.664
F-statistic: 7.369 on 9 and 20 DF, p-value: 0.0001069
> model.effects(barley.lm,"Var")
M P S T V
-6.933333 2.200000 -12.900000 15.866667 1.766667
Using HSD, only S and T seem to differ, the others cannot be distinguished.
> pairwise(barley.lm,Var)
Pairwise comparisons ( hsd ) of Var
estimate signif diff lower upper
M - P -9.1333333 24.30866 -33.441989 15.175322
M - S 5.9666667 24.30866 -18.341989 30.275322
M - T -22.8000000 24.30866 -47.108656 1.508656
M - V -8.7000000 24.30866 -33.008656 15.608656
P - S 15.1000000 24.30866 -9.208656 39.408656
P - T -13.6666667 24.30866 -37.975322 10.641989
P - V 0.4333333 24.30866 -23.875322 24.741989
* S - T -28.7666667 24.30866 -53.075322 -4.458011
S - V -14.6666667 24.30866 -38.975322 9.641989
T - V 14.1000000 24.30866 -10.208656 38.408656
“Old school” estimate of
block variation based on barley.lm
(will
agree with REML).
> (2056.99-197.98)/5
[1] 371.802
40.3 As Random Effects
Maybe you think the blocks are supposed to be representative of all of Minnesota; then we would treat them as random.
> barley.lmer <- lmer(Y2~(1|Loc)+Var,data=immer)
For this simple model and balanced data, we get all the same results:
> summary(barley.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: Y2 ~ (1 | Loc) + Var
Data: immer
REML criterion at convergence: 227
Scaled residuals:
Min 1Q Median 3Q Max
-1.92838 -0.49474 -0.05208 0.72439 1.54701
Random effects:
Groups Name Variance Std.Dev.
Loc (Intercept) 371.8 19.28
Residual 198.0 14.07
Number of obs: 30, groups: Loc, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 93.133 8.280 11.247
Var1 -6.933 5.138 -1.349
Var2 2.200 5.138 0.428
Var3 -12.900 5.138 -2.511
Var4 15.867 5.138 3.088
Correlation of Fixed Effects:
(Intr) Var1 Var2 Var3
Var1 0.000
Var2 0.000 -0.250
Var3 0.000 -0.250 -0.250
Var4 0.000 -0.250 -0.250 -0.250
> Anova(barley.lmer,test="F")
Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
Response: Y2
F Df Df.res Pr(>F)
Var 3.5928 4 20 0.02306 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> pairwise(barley.lmer,Var)
Pairwise comparisons ( hsd ) of Var
estimate signif diff lower upper
M - P -9.1333333 24.30866 -33.441989 15.175322
M - S 5.9666667 24.30866 -18.341989 30.275322
M - T -22.8000000 24.30866 -47.108656 1.508656
M - V -8.7000000 24.30866 -33.008656 15.608656
P - S 15.1000000 24.30866 -9.208656 39.408656
P - T -13.6666667 24.30866 -37.975322 10.641989
P - V 0.4333333 24.30866 -23.875322 24.741989
* S - T -28.7666667 24.30866 -53.075322 -4.458011
S - V -14.6666667 24.30866 -38.975322 9.641989
T - V 14.1000000 24.30866 -10.208656 38.408656
40.4 Relative Efficiency
Recall the fixed effects ANOVA:
> anova(barley.lm)
Analysis of Variance Table
Response: Y2
Df Sum Sq Mean Sq F value Pr(>F)
Loc 5 10285.0 2056.99 10.3901 5.049e-05 ***
Var 4 2845.2 711.29 3.5928 0.02306 *
Residuals 20 3959.5 197.98
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This is our estimate of what the error variance would be in a CRD from the same kind of data we used in our RCB.
Noting that 518 is a lot more than 198, our blocking has been successful.
> (5*2056.99+24*197.98)/29
[1] 518.499
Here is the estimate of relative efficiency. Mostly it is a ratio of the MSE in the CRD (estimated) to the MSE in the RCB. In addition, there is a degrees of freedom correction term that adjusts for the loss of degrees of freedom in the RCB. This term is usually pretty close to 1. Here it is .98.
> (21*28/23/26) * 518.499/197.98
[1] 2.575151
The RE says that we would have needed a sample size 2.6 times as large to have the same power in a CRD as we achieved with blocking. That is very worthwhile.