Chapter 40 Randomized Complete Blocks

40.1 Immer data

One early and famous example of a Randomized Complete Block analyzed by Fisher involves five varieties of barley grown at six locations (which included Crookston, Waseca, …, data from Minnesota!). The original experiment had two years of data; we will only look at the second year. This data set is included in the MASS package.

> library(cfcdae)
Registered S3 method overwritten by 'DoE.base':
  method           from       
  factorize.factor conf.design
> library(MASS)
> library(lme4)
Loading required package: Matrix
> library(car)
Loading required package: carData
> immer
   Loc Var    Y1    Y2
1   UF   M  81.0  80.7
2   UF   S 105.4  82.3
3   UF   V 119.7  80.4
4   UF   T 109.7  87.2
5   UF   P  98.3  84.2
6    W   M 146.6 100.4
7    W   S 142.0 115.5
8    W   V 150.7 112.2
9    W   T 191.5 147.7
10   W   P 145.7 108.1
11   M   M  82.3 103.1
12   M   S  77.3 105.1
13   M   V  78.4 116.5
14   M   T 131.3 139.9
15   M   P  89.6 129.6
16   C   M 119.8  98.9
17   C   S 121.4  61.9
18   C   V 124.0  96.2
19   C   T 140.8 125.5
20   C   P 124.8  75.7
21  GR   M  98.9  66.4
22  GR   S  89.0  49.9
23  GR   V  69.1  96.7
24  GR   T  89.3  61.9
25  GR   P 104.1  80.3
26   D   M  86.9  67.7
27   D   S  77.1  66.7
28   D   V  78.9  67.4
29   D   T 101.8  91.8
30   D   P  96.0  94.1

Here are the results separately by block. There are large block to block differences (and all of the treatment to treatment differences are within each boxplot), so blocking should be a real help.

> boxplot(Y2~Loc,data=immer,main="Barley by location")
Boxplot of barley by location

Figure 40.1: Boxplot of barley by location

Here are the results separately by treatment. We can maybe see some treatment differences, but we also see a lot of variation within each treatment. Much of this is block to block variability that would show up in our error if we hadn’t blocked.

> boxplot(Y2~Var,data=immer,main="Barley by variety")
Boxplot of barley by variety

Figure 40.2: Boxplot of barley by variety

40.2 As Fixed Effects

Fit as fixed blocks first. Get into the habit of treatments after blocks (which is the order you’ll need if you have missing data).

> barley.lm <- lm(Y2~Loc+Var,data=immer)

Residuals aren’t too bad, although there is just a hint of a dip in the middle.

> plot(barley.lm,which=1)
Residual plot of barley data

Figure 40.3: Residual plot of barley data

Normality looks good.

> plot(barley.lm,which=2)
NPP of barley data residuals

Figure 40.4: NPP of barley data residuals

Here is the ANOVA, we just enter treatments after blocks. Variety effects are modestly significant. We do not test the block effects (even though R prints an F, it doesn’t know about blocking factors).

> anova(barley.lm)
Analysis of Variance Table

Response: Y2
          Df  Sum Sq Mean Sq F value    Pr(>F)    
Loc        5 10285.0 2056.99 10.3901 5.049e-05 ***
Var        4  2845.2  711.29  3.5928   0.02306 *  
Residuals 20  3959.5  197.98                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You can do all the usual things: summaries, pairwise comparisons, marginal means, etc.

Varieties M and S have low yields, variety T has a high yield, and the others are in the middle. Again, ignore ``testing” related to block effects.

> summary(barley.lm)

Call:
lm(formula = Y2 ~ Loc + Var, data = immer)

Residuals:
    Min      1Q  Median      3Q     Max 
-25.007  -8.665  -0.900   8.185  23.893 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   93.133      2.569  36.254  < 2e-16 ***
Loc1          -1.493      5.744  -0.260 0.797543    
Loc2         -15.593      5.744  -2.715 0.013344 *  
Loc3         -22.093      5.744  -3.846 0.001008 ** 
Loc4          25.707      5.744   4.475 0.000232 ***
Loc5         -10.173      5.744  -1.771 0.091790 .  
Var1          -6.933      5.138  -1.349 0.192261    
Var2           2.200      5.138   0.428 0.673081    
Var3         -12.900      5.138  -2.511 0.020748 *  
Var4          15.867      5.138   3.088 0.005797 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.07 on 20 degrees of freedom
Multiple R-squared:  0.7683,    Adjusted R-squared:  0.664 
F-statistic: 7.369 on 9 and 20 DF,  p-value: 0.0001069
> model.effects(barley.lm,"Var")
         M          P          S          T          V 
 -6.933333   2.200000 -12.900000  15.866667   1.766667 

Using HSD, only S and T seem to differ, the others cannot be distinguished.

> pairwise(barley.lm,Var)
                                    
Pairwise comparisons ( hsd ) of Var 
           estimate signif diff      lower     upper
  M - P  -9.1333333    24.30866 -33.441989 15.175322
  M - S   5.9666667    24.30866 -18.341989 30.275322
  M - T -22.8000000    24.30866 -47.108656  1.508656
  M - V  -8.7000000    24.30866 -33.008656 15.608656
  P - S  15.1000000    24.30866  -9.208656 39.408656
  P - T -13.6666667    24.30866 -37.975322 10.641989
  P - V   0.4333333    24.30866 -23.875322 24.741989
* S - T -28.7666667    24.30866 -53.075322 -4.458011
  S - V -14.6666667    24.30866 -38.975322  9.641989
  T - V  14.1000000    24.30866 -10.208656 38.408656

“Old school” estimate of block variation based on barley.lm (will agree with REML).

> (2056.99-197.98)/5
[1] 371.802

40.3 As Random Effects

Maybe you think the blocks are supposed to be representative of all of Minnesota; then we would treat them as random.

> barley.lmer <- lmer(Y2~(1|Loc)+Var,data=immer)

For this simple model and balanced data, we get all the same results:

> summary(barley.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: Y2 ~ (1 | Loc) + Var
   Data: immer

REML criterion at convergence: 227

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.92838 -0.49474 -0.05208  0.72439  1.54701 

Random effects:
 Groups   Name        Variance Std.Dev.
 Loc      (Intercept) 371.8    19.28   
 Residual             198.0    14.07   
Number of obs: 30, groups:  Loc, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   93.133      8.280  11.247
Var1          -6.933      5.138  -1.349
Var2           2.200      5.138   0.428
Var3         -12.900      5.138  -2.511
Var4          15.867      5.138   3.088

Correlation of Fixed Effects:
     (Intr) Var1   Var2   Var3  
Var1  0.000                     
Var2  0.000 -0.250              
Var3  0.000 -0.250 -0.250       
Var4  0.000 -0.250 -0.250 -0.250
> Anova(barley.lmer,test="F")
Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)

Response: Y2
         F Df Df.res  Pr(>F)  
Var 3.5928  4     20 0.02306 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> pairwise(barley.lmer,Var)
                                    
Pairwise comparisons ( hsd ) of Var 
           estimate signif diff      lower     upper
  M - P  -9.1333333    24.30866 -33.441989 15.175322
  M - S   5.9666667    24.30866 -18.341989 30.275322
  M - T -22.8000000    24.30866 -47.108656  1.508656
  M - V  -8.7000000    24.30866 -33.008656 15.608656
  P - S  15.1000000    24.30866  -9.208656 39.408656
  P - T -13.6666667    24.30866 -37.975322 10.641989
  P - V   0.4333333    24.30866 -23.875322 24.741989
* S - T -28.7666667    24.30866 -53.075322 -4.458011
  S - V -14.6666667    24.30866 -38.975322  9.641989
  T - V  14.1000000    24.30866 -10.208656 38.408656

40.4 Relative Efficiency

Recall the fixed effects ANOVA:

> anova(barley.lm)
Analysis of Variance Table

Response: Y2
          Df  Sum Sq Mean Sq F value    Pr(>F)    
Loc        5 10285.0 2056.99 10.3901 5.049e-05 ***
Var        4  2845.2  711.29  3.5928   0.02306 *  
Residuals 20  3959.5  197.98                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is our estimate of what the error variance would be in a CRD from the same kind of data we used in our RCB.

Noting that 518 is a lot more than 198, our blocking has been successful.

> (5*2056.99+24*197.98)/29
[1] 518.499

Here is the estimate of relative efficiency. Mostly it is a ratio of the MSE in the CRD (estimated) to the MSE in the RCB. In addition, there is a degrees of freedom correction term that adjusts for the loss of degrees of freedom in the RCB. This term is usually pretty close to 1. Here it is .98.

> (21*28/23/26) * 518.499/197.98
[1] 2.575151

The RE says that we would have needed a sample size 2.6 times as large to have the same power in a CRD as we achieved with blocking. That is very worthwhile.