Chapter 22 Standard Analysis of Linear Contrasts

Linear contrasts are extremely important in the analysis of experimental data, sufficiently important that they have been implemented in several packages. Here we will investigate:

  • linear.contrast() from the cfcdae package
  • contrast() from the emmeans package
  • glht() from the multcomp package

Note that several other R packages implement contrasts in various ways.

22.1 Preliminaries

All of these contrast functions operate on a fitted linear model with one or more factor variables as predictors, so let’s set up a couple of model fits using the rat liver weight data and the resin lifetime data.
> library(cfcdae)
> data(RatLiverWeight)
> fit.RLW <- lm(weight~diet,data=RatLiverWeight)
> coef(fit.RLW)
(Intercept)       diet1       diet2       diet3 
 3.71163690  0.03407738 -0.13163690 -0.11330357 
> data(ResinLifetimes)
> fit.RL <- lm(logTime ~ temp, data=ResinLifetimes)
> coef(fit.RL)
(Intercept)       temp1       temp2       temp3       temp4 
 1.43794048  0.49455952  0.19080952 -0.06044048 -0.24365476 

We will investigate the following contrasts for the rat liver weights:

  • (1/3,1/3,1/3,-1) This compares the average response of the first three treatments (manufacturer 1) to the average response of the fourth treatment (manufacturer 2).
  • (1,-.5,-.5,0) Within manufacturer 1, this compares the response of the premium ration (ration 1) to the average response of the two standard rations (rations 2 and 3).
  • (1,0,0,-1), (0,1,0,-1), (0,0,1,-1) These individually compare each ration from manufacturer 1 to the ration from manufacturer 2.

We will investigate the following contrasts for the resin lifetime data:

  • (-1,1,0,0,0) Compare the lifetimes for the first two levels of temperature. This is an example of a pairwise comparison, which simply compares two treatments.
  • (2,-1,-2,-1,2) If there is a quadratic effect of temperature, this contrast should be non-zero.
  • (1,-4,6,-4,1) If there is a quartic effect of temperature, this contrast should be non-zero.

Note that these quadratic and quartic coefficients would yield the same sums of squares as the corresponding orthogonal polynomial contrasts if the treatments were equally spaced and all the \(n_i\) were equal. Those conditions are not met here, so these contrasts are only approximations of the true quadratic and quartic contrasts.
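
As a quick check in base R (this snippet is my own aside, not part of any of the packages discussed here), the integer coefficients above are proportional to the equally spaced orthogonal polynomial contrasts produced by contr.poly():

```r
# contr.poly(5) gives the (normalized) orthogonal polynomial contrasts
# for 5 equally spaced levels; its columns are named .L, .Q, .C, ^4.
op <- contr.poly(5)
quad  <- c(2, -1, -2, -1, 2)
quart <- c(1, -4, 6, -4, 1)
# Proportional vectors with mean zero have correlation +/-1:
abs(cor(quad,  op[, ".Q"]))  # essentially 1
abs(cor(quart, op[, "^4"]))  # essentially 1
```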

22.2 Using linear.contrast()

linear.contrast() (from package cfcdae) computes linear contrasts of treatment effects. Its first argument is the linear model object, the second is the variable (factor) for which you want to make a contrast (you must specify it even if there is only one factor in the model), and the third is the contrast coefficients. There must be one coefficient for every level of the factor and the coefficients must add to 0.

The value returned by linear.contrast() includes the estimate of the contrast, its standard error, the t-value and p-value for testing the null hypothesis that the contrast is zero (two-sided alternative), and a confidence interval.
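
The arithmetic behind these quantities is simple: for a one-factor model, the estimate is \(\sum_i w_i \bar{y}_i\) and its standard error is \(\hat\sigma \sqrt{\sum_i w_i^2/n_i}\). Here is a minimal base-R sketch of that computation on an invented, unbalanced toy data set (the data and group means are made up purely for illustration):

```r
set.seed(1)
# Invented unbalanced one-factor data (not the rat liver data)
g <- factor(rep(1:4, c(7, 8, 6, 8)))
y <- rnorm(length(g), mean = c(3.7, 3.6, 3.6, 3.9)[g], sd = 0.2)
fit <- lm(y ~ g)

w    <- c(1/3, 1/3, 1/3, -1)        # contrast coefficients, sum to 0
ybar <- tapply(y, g, mean)          # treatment means
n    <- tapply(y, g, length)        # treatment sample sizes

est <- sum(w * ybar)                          # contrast estimate
se  <- summary(fit)$sigma * sqrt(sum(w^2/n))  # its standard error
c(estimate = est, se = se, t.value = est/se)
```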

When we compare manufacturers, we see an average difference of .281 percent between manufacturers 1 and 2 (manufacturer 1 lower); this is reasonably significant, with a two-sided p-value under .003:
> linear.contrast(fit.RLW,diet,c(1/3,1/3,1/3,-1))
   estimates         se   t-value    p-value   lower-ci   upper-ci
1 -0.2811508 0.08467399 -3.320392 0.00276231 -0.4555401 -0.1067615
attr(,"class")
[1] "linear.contrast"
On the other hand, when we compare premium to standard rations, there is no evidence of a difference (p-value .11 and confidence interval spanning 0):
> linear.contrast(fit.RLW,diet,c(1,-.5,-.5,0))
  estimates         se  t-value   p-value    lower-ci  upper-ci
1 0.1565476 0.09448756 1.656807 0.1100575 -0.03805315 0.3511484
attr(,"class")
[1] "linear.contrast"
When we compare each manufacturer 1 ration to the manufacturer 2 ration, we see no evidence that ration 1 (the premium ration) is different from manufacturer 2 (p-value .11), but there is reasonable evidence that the standard rations (2 and 3) are about .32–.34 percent lower than manufacturer 2.
> linear.contrast(fit.RLW,diet,c(1,0,0,-1))
   estimates        se   t-value   p-value   lower-ci   upper-ci
1 -0.1767857 0.1052754 -1.679269 0.1055566 -0.3936044 0.04003301
attr(,"class")
[1] "linear.contrast"
> linear.contrast(fit.RLW,diet,c(0,1,0,-1))
  estimates        se  t-value     p-value   lower-ci   upper-ci
1   -0.3425 0.1017057 -3.36756 0.002457339 -0.5519668 -0.1330332
attr(,"class")
[1] "linear.contrast"
> linear.contrast(fit.RLW,diet,c(0,0,1,-1))
   estimates        se   t-value    p-value   lower-ci    upper-ci
1 -0.3241667 0.1098547 -2.950867 0.00679258 -0.5504167 -0.09791667
attr(,"class")
[1] "linear.contrast"
You can combine multiple contrasts into one call to linear.contrast() if you put the coefficients in a matrix, with each column of the matrix corresponding to a contrast. Here cbind() joins vectors (as columns) into a matrix.
> mat.cfs <- cbind(c(1,0,0,-1), c(0,1,0,-1), c(0,0,1,-1))
> mat.cfs
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
[4,]   -1   -1   -1
> linear.contrast(fit.RLW,diet,mat.cfs)
   estimates        se   t-value     p-value   lower-ci    upper-ci
1 -0.1767857 0.1052754 -1.679269 0.105556638 -0.3936044  0.04003301
2 -0.3425000 0.1017057 -3.367560 0.002457339 -0.5519668 -0.13303321
3 -0.3241667 0.1098547 -2.950867 0.006792580 -0.5504167 -0.09791667
attr(,"class")
[1] "linear.contrast"
In fact, you can ask linear.contrast() to test the joint null hypothesis that all of the contrasts are simultaneously zero against the alternative that one or more of them is non-zero by using the optional joint= argument. For the set of contrasts in mat.cfs, all of the contrasts being zero implies all the means are the same (single mean model); allowing them to be non-zero implies that the means can all be different (separate means model). Thus testing these three contrasts jointly is equivalent to comparing the single mean and separate means models.
> linear.contrast(fit.RLW,diet,mat.cfs,joint=TRUE)
$estimates
   estimates        se   t-value     p-value   lower-ci    upper-ci
1 -0.1767857 0.1052754 -1.679269 0.105556638 -0.3936044  0.04003301
2 -0.3425000 0.1017057 -3.367560 0.002457339 -0.5519668 -0.13303321
3 -0.3241667 0.1098547 -2.950867 0.006792580 -0.5504167 -0.09791667

$Ftest
        F df1 df2    p-value
 4.658146   3  25 0.01015794

attr(,"class")
[1] "linear.contrast"
> anova(fit.RLW)
Analysis of Variance Table

Response: weight
          Df  Sum Sq  Mean Sq F value  Pr(>F)  
diet       3 0.57821 0.192736  4.6581 0.01016 *
Residuals 25 1.03440 0.041376                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
More generally, the numerator degrees of freedom for the F-test will be at least one and no more than the number of contrasts; it could be in between these values if some of the contrasts are linearly dependent. Thus, if you put in redundant contrasts, linear.contrast() will figure that out for any joint test. Here the last contrast is the average of the first three.
> mat2.cfs <- cbind(mat.cfs,c(1/3,1/3,1/3,-1))
> mat2.cfs
     [,1] [,2] [,3]       [,4]
[1,]    1    0    0  0.3333333
[2,]    0    1    0  0.3333333
[3,]    0    0    1  0.3333333
[4,]   -1   -1   -1 -1.0000000
> linear.contrast(fit.RLW,diet,mat2.cfs,joint=TRUE)$Ftest
        F df1 df2    p-value
 4.658146   3  25 0.01015794
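
You can check the rank computation directly in base R (an aside of my own, not a cfcdae feature): the numerator degrees of freedom of the joint F-test is the rank of the matrix of contrast coefficients.

```r
mat.cfs  <- cbind(c(1, 0, 0, -1), c(0, 1, 0, -1), c(0, 0, 1, -1))
mat2.cfs <- cbind(mat.cfs, c(1/3, 1/3, 1/3, -1))  # 4th column is the
                                                  # average of the first 3
qr(mat.cfs)$rank   # 3
qr(mat2.cfs)$rank  # still 3, so the joint test still has df1 = 3
```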
For the resin lifetime data, we have three contrasts of interest. Let’s put them together into a matrix and check them out. We see that there is very strong evidence that the first two treatments do not have the same mean, and pretty strong evidence that the quadratic shape is present (p-value .005). However, there is no evidence for the quartic shape.
> rl.cfs <- cbind(c(-1,1,0,0,0), c(2,-1,-2,-1,2), c(1,-4,6,-4,1))
> colnames(rl.cfs) <- c("1 vs 2", "quad", "quartic")
> rl.cfs
     1 vs 2 quad quartic
[1,]     -1    2       1
[2,]      1   -1      -4
[3,]      0   -2       6
[4,]      0   -1      -4
[5,]      0    2       1
> linear.contrast(fit.RL,temp,rl.cfs)
    estimates         se    t-value      p-value   lower-ci   upper-ci
1 -0.30375000 0.04790063 -6.3412521 4.056402e-07 -0.4013204 -0.2061796
2  0.40029762 0.13324726  3.0041714 5.139664e-03  0.1288818  0.6717134
3 -0.03797619 0.28863670 -0.1315709 8.961475e-01 -0.6259099  0.5499575
attr(,"class")
[1] "linear.contrast"

For comparison, recall that R can use the parameterization where the first treatment effect is zero. That means that the second treatment effect would be the level 2 mean minus the level 1 mean, which should be the same as the contrast we just did. Here we force the “first effect 0” parameterization via the contrasts= argument, and then see that the level 2 treatment effect in the summary matches what we saw in our contrast above.

> fit.RL.trt1 <- lm(logTime~temp,data=ResinLifetimes,
+            contrasts=list(temp=contr.treatment))
> summary(fit.RL.trt1)

Call:
lm(formula = logTime ~ temp, data = ResinLifetimes, contrasts = list(temp = contr.treatment))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22667 -0.03667  0.00250  0.03125  0.20333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.93250    0.03387  57.055  < 2e-16 ***
temp2       -0.30375    0.04790  -6.341 4.06e-07 ***
temp3       -0.55500    0.04790 -11.586 5.49e-13 ***
temp4       -0.73821    0.04958 -14.889 6.13e-16 ***
temp5       -0.87583    0.05174 -16.928  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0958 on 32 degrees of freedom
Multiple R-squared:  0.9233,    Adjusted R-squared:  0.9138 
F-statistic: 96.36 on 4 and 32 DF,  p-value: < 2.2e-16

linear.contrast() has a number of other arguments that you might find useful. For example, if you give the optional argument allpairs=TRUE, then you don’t need to give your own coefficients, and the function will take differences of all pairs. If you want to compare all treatments to a particular treatment, say treatment k, then you can use the optional argument controlpairs=k, and again you don’t need to specify your own contrasts. You can also change the confidence level on the intervals with the confidence=x optional argument.

> linear.contrast(fit.RLW,diet,allpairs=TRUE,confidence=.99)
        estimates        se    t-value     p-value   lower-ci    upper-ci
1 - 2  0.16571429 0.1052754  1.5741028 0.128035240 -0.1277341  0.45916268
1 - 3  0.14738095 0.1131676  1.3023241 0.204676468 -0.1680666  0.46282850
1 - 4 -0.17678571 0.1052754 -1.6792691 0.105556638 -0.4702341  0.11666268
2 - 3 -0.01833333 0.1098547 -0.1668871 0.868801395 -0.3245463  0.28787960
2 - 4 -0.34250000 0.1017057 -3.3675598 0.002457339 -0.6259981 -0.05900191
3 - 4 -0.32416667 0.1098547 -2.9508675 0.006792580 -0.6303796 -0.01795374
attr(,"class")
[1] "linear.contrast"
> linear.contrast(fit.RLW,diet,controlpairs=4,confidence=.99)
       estimates        se   t-value     p-value   lower-ci    upper-ci
1 - 4 -0.1767857 0.1052754 -1.679269 0.105556638 -0.4702341  0.11666268
2 - 4 -0.3425000 0.1017057 -3.367560 0.002457339 -0.6259981 -0.05900191
3 - 4 -0.3241667 0.1098547 -2.950867 0.006792580 -0.6303796 -0.01795374
attr(,"class")
[1] "linear.contrast"

22.3 Using emmeans::contrast()

emmeans is short for estimated marginal means, sometimes called least squares means. You first fit the linear model, then create the estimated marginal means (emm) object, then do the contrast(s). Needing an extra step is a bit of a bother, but the compensation is that the emm object is useful in its own right, and contrast() applied to the emm object is more capable than linear.contrast().

You use emmeans() to create the emm object. The first argument is the fitted linear model, and the second argument is the right-hand side of a formula giving the factor (margin) over which to compute means. This will make much more sense when we have more than one factor. Simply printing the emm object gives you the estimated treatment means along with their SEs and confidence intervals.

> library(emmeans) # you may also need to install emmeans
> RLW.emm <- emmeans(fit.RLW,~diet)
> RLW.emm
 diet emmean     SE df lower.CL upper.CL
 1      3.75 0.0769 25     3.59     3.90
 2      3.58 0.0719 25     3.43     3.73
 3      3.60 0.0830 25     3.43     3.77
 4      3.92 0.0719 25     3.77     4.07

Confidence level used: 0.95 
> RL.emm <- emmeans(fit.RL,~temp)
> RL.emm
 temp emmean     SE df lower.CL upper.CL
 175    1.93 0.0339 32    1.864     2.00
 194    1.63 0.0339 32    1.560     1.70
 213    1.38 0.0339 32    1.309     1.45
 231    1.19 0.0362 32    1.121     1.27
 250    1.06 0.0391 32    0.977     1.14

Confidence level used: 0.95 
Contrast coefficients should be in a named list. You can have a single contrast, or you can put several contrasts in a matrix (with each column being a separate contrast). Note that contrast() will include column labels when labeling the contrasts.
> contrast(RLW.emm,list(diet=mat.cfs))
 contrast estimate    SE df t.ratio p.value
 diet.1     -0.177 0.105 25  -1.679  0.1056
 diet.2     -0.343 0.102 25  -3.368  0.0025
 diet.3     -0.324 0.110 25  -2.951  0.0068
> contrast(RL.emm,list(temp=rl.cfs))
 contrast     estimate     SE df t.ratio p.value
 temp.1 vs 2    -0.304 0.0479 32  -6.341  <.0001
 temp.quad       0.400 0.1332 32   3.004  0.0051
 temp.quartic   -0.038 0.2886 32  -0.132  0.8961
You can also do all pairwise comparisons or compare each treatment to a control.
> contrast(RLW.emm,method="pairwise",adjust="none")
 contrast      estimate    SE df t.ratio p.value
 diet1 - diet2   0.1657 0.105 25   1.574  0.1280
 diet1 - diet3   0.1474 0.113 25   1.302  0.2047
 diet1 - diet4  -0.1768 0.105 25  -1.679  0.1056
 diet2 - diet3  -0.0183 0.110 25  -0.167  0.8688
 diet2 - diet4  -0.3425 0.102 25  -3.368  0.0025
 diet3 - diet4  -0.3242 0.110 25  -2.951  0.0068
> contrast(RLW.emm,method="trt.vs.ctrl1",ref=4,adjust="none")
 contrast      estimate    SE df t.ratio p.value
 diet1 - diet4   -0.177 0.105 25  -1.679  0.1056
 diet2 - diet4   -0.343 0.102 25  -3.368  0.0025
 diet3 - diet4   -0.324 0.110 25  -2.951  0.0068

The trt.vs.ctrl1 with ref=4 says to compare each treatment to the fourth treatment. adjust="none" says to make no adjustment to the p-values for multiple comparisons.

22.4 Using multcomp::glht()

glht() stands for general linear hypothesis test. Basically it does inference for linear combinations of parameters. Contrasts are one form of linear combination, so we can use glht().

To use glht() you must specify a linear combination of parameters, a null value, and an alternative. The defaults for the null and alternative are 0 and two-sided, and we’ll just use those.

The linear combinations to use with glht() depend on the parameterization. In our rat liver weight example, the coefficients are the overall mean and the effects for treatments 1, 2, and 3. We are using the “sum to 0” parameterization, so the effect for treatment 4 is minus the sum of the effects for treatments 1, 2, and 3. As a linear combination of coefficients, \(\alpha_1\) uses (0,1,0,0), and \(\alpha_4\) uses (0,-1,-1,-1). To get the difference of treatment 1 and treatment 4 (1 minus 4) we would need to use (0,1,0,0)-(0,-1,-1,-1)=(0,2,1,1). Similarly, 2 versus 4 and 3 versus 4 are (0,1,2,1) and (0,1,1,2). If you use a different parameterization in your model, you need to use different linear combinations.
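
A small base-R sketch (with invented toy data, not the rat liver data) confirms this arithmetic: under the sum-to-zero parameterization, applying (0,2,1,1) to the coefficients reproduces the group 1 mean minus the group 4 mean.

```r
set.seed(2)
# Invented balanced one-factor toy data
g <- factor(rep(1:4, each = 5))
y <- rnorm(20) + as.numeric(g)
# Fit with the sum-to-zero parameterization
fit <- lm(y ~ g, contrasts = list(g = contr.sum))

ybar <- tapply(y, g, mean)
d1 <- sum(c(0, 2, 1, 1) * coef(fit))  # contrast on the coefficient scale
d2 <- unname(ybar["1"] - ybar["4"])   # group 1 mean minus group 4 mean
all.equal(d1, d2)                     # TRUE
```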

In glht() the linear coefficients must be the rows of a matrix. This output ought to look pretty familiar by now.
> library(multcomp)
Loading required package: mvtnorm
Loading required package: survival
Loading required package: TH.data
Loading required package: MASS

Attaching package: 'TH.data'
The following object is masked from 'package:MASS':

    geyser
> glht.out1 <- glht(fit.RLW,t(c(0,2,1,1)))
> summary(glht.out1)

     Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = weight ~ diet, data = RatLiverWeight)

Linear Hypotheses:
       Estimate Std. Error t value Pr(>|t|)
1 == 0  -0.1768     0.1053  -1.679    0.106
(Adjusted p values reported -- single-step method)
> glht.out2 <- glht(fit.RLW,t(c(0,1,2,1)))
> summary(glht.out2)

     Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = weight ~ diet, data = RatLiverWeight)

Linear Hypotheses:
       Estimate Std. Error t value Pr(>|t|)   
1 == 0  -0.3425     0.1017  -3.368  0.00246 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

What linear.contrast() does is essentially figure out the modified coefficients for you, and then do something very much like what glht() does.
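
If you would rather not work out those modified coefficients by hand, the conversion described above is easy to automate. For mean-scale weights \(w\) that sum to zero, the coefficient row is 0 for the intercept followed by each \(w_i\) minus the last weight. Here is a tiny helper of my own (hypothetical, not a cfcdae or multcomp function) that does this for the sum-to-zero parameterization:

```r
# Hypothetical helper: convert mean-scale contrast weights (summing to 0)
# to a glht() coefficient row under the sum-to-zero parameterization.
to.glht.row <- function(w) c(0, w[-length(w)] - w[length(w)])

to.glht.row(c(1, 0, 0, -1))  # 0 2 1 1, the row used for 1 versus 4 above
to.glht.row(c(0, 1, 0, -1))  # 0 1 2 1
to.glht.row(c(0, 0, 1, -1))  # 0 1 1 2
```

Wrapping a row in rbind(), e.g. rbind(to.glht.row(c(1,0,0,-1))), gives the one-row matrix that glht() expects as its second argument.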

You can use a matrix with more than one row, but glht() adjusts p-values in a way that we have not studied. You can also have glht() do all pairwise comparisons, but it also wants to make p-value adjustments we have not yet discussed.