Chapter 26 Pairwise Comparisons

Perhaps the most commonly seen use of multiple comparisons is to control the error rate when doing all pairwise comparisons in an experiment. You could use the Bonferroni-type adjustments discussed earlier, but several other techniques have been developed specifically for the pairwise situation, and they will generally work a little better when they can be used.

The main requirement is a function that facilitates doing all the pairwise comparison along with options that allow you to control different error rate. This chapter discusses three approaches.

We will illustrate with the cheese amino acid data set, which has four treatments (bacteria strain A added to the cheese, strain B added, both strains added, neither strain added).
> library(cfcdae)
> library(emmeans)
> data("CheeseAminoAcid")
> CheeseAminoAcid
  strain freeAminoAcid
1   none         4.195
2   none         4.175
3      A         4.125
4      A         4.735
5      B         4.865
6      B         5.745
7     AB         6.155
8     AB         6.488
> cheese.separate <- lm(freeAminoAcid~strain,data=CheeseAminoAcid)
> anova(cheese.separate)
Analysis of Variance Table

Response: freeAminoAcid
          Df Sum Sq Mean Sq F value Pr(>F)  
strain     3 5.6279 1.87595  11.932 0.0183 *
Residuals  4 0.6289 0.15722                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(cheese.separate,"strain")
        A        AB         B      none 
-0.630375  1.261125  0.244625 -0.875375 

26.1 pairwise() in cfcdae

There is a pairwise() function in cfcdae that does all pairwise comparisons. You can use several different methods that control a variety of error rates. There are also a couple of options for visualizing the results. The basic setup is

pairwise(linear.model.fit,factor.name,type=control.method)

The linear.model.fit is the output of lm(); the factor.name is the factor across the levels of which we wish to do pairwise comparisons; the control.method is a character string selecting the type of adjustments to make. The choices are

  • “hsd” (the default) Use the Tukey Honest Significant Difference. This provides simultaneous confidence intervals and controls the strong familywise error rate.
  • “bsd” Use the Bonferroni adjustment. This provides simultaneous confidence intervals and controls the strong familywise error rate, but it is usually not as powerful as “hsd”.
  • “regwr” Use the Ryan-Einot-Garbriel-Welsch adjustment. This controls the strong familywise error rate.
  • “snk” Use Student Neuman Keuls. This is thought to control the false discovery rate.
  • “lsd” Use Least Significant Difference. This does not adjust the nominal p-values.
A typical use might be:
> pairwise.tukey <- pairwise(cheese.separate,strain)
> # or pairwise(cheese.separate,strain,type="hsd",confidence=.95)

This produces simultaneous confidence intervals for all pairwise differences.

You may view the results by simply printing them output. For each pair, you get the estimated difference, the “significant difference” (that is, the amount by which the pairs would need to differ to reject the null hypothesis that they are the same), and a confidence interval. The default coverage is 95%, but that can be changed via the optional confidence argument (e.g., confidence=.99).

As you move to controlling weaker error rates, the number of significant differences increases and the confidence intervals shorten.
> pairwise.regwr <- pairwise(cheese.separate,strain,type="regwr")
> pairwise.snk <- pairwise(cheese.separate,strain,type="snk")
> pairwise.lsd <- pairwise(cheese.separate,strain,type="lsd")
> pairwise.tukey
                                       
Pairwise comparisons ( hsd ) of strain 
            estimate signif diff      lower      upper
* A - AB     -1.8915    1.614152 -3.5056524 -0.2773476
  A - B      -0.8750    1.614152 -2.4891524  0.7391524
  A - none    0.2450    1.614152 -1.3691524  1.8591524
  AB - B      1.0165    1.614152 -0.5976524  2.6306524
* AB - none   2.1365    1.614152  0.5223476  3.7506524
  B - none    1.1200    1.614152 -0.4941524  2.7341524
> pairwise.regwr
                                         
Pairwise comparisons ( regwr ) of strain 
            estimate signif diff      lower      upper
* A - AB     -1.8915    1.413173 -3.3046727 -0.4783273
  A - B      -0.8750    1.413173 -2.2881727  0.5381727
  A - none    0.2450    1.385992 -1.1409922  1.6309922
  AB - B      1.0165    1.385992 -0.3694922  2.4024922
* AB - none   2.1365    1.614152  0.5223476  3.7506524
  B - none    1.1200    1.413173 -0.2931727  2.5331727
> pairwise.snk
                                       
Pairwise comparisons ( snk ) of strain 
            estimate signif diff       lower      upper
* A - AB     -1.8915    1.413173 -3.30467267 -0.4783273
  A - B      -0.8750    1.413173 -2.28817267  0.5381727
  A - none    0.2450    1.100905 -0.85590508  1.3459051
  AB - B      1.0165    1.100905 -0.08440508  2.1174051
* AB - none   2.1365    1.614152  0.52234758  3.7506524
  B - none    1.1200    1.413173 -0.29317267  2.5331727
> pairwise.lsd
                                       
Pairwise comparisons ( lsd ) of strain 
            estimate signif diff       lower      upper
* A - AB     -1.8915      1.1009 -2.99240031 -0.7905997
  A - B      -0.8750      1.1009 -1.97590031  0.2259003
  A - none    0.2450      1.1009 -0.85590031  1.3459003
  AB - B      1.0165      1.1009 -0.08440031  2.1174003
* AB - none   2.1365      1.1009  1.03559969  3.2374003
* B - none    1.1200      1.1009  0.01909969  2.2209003

Notice that the signficant differences and lengths of confidence intervals are the same for “hse” and “lsd” but differ for “regwr” and “snk”. (If we had different numbers of data in each group, these would also differ for “hsd” and “lsd”.)

In addition to printing all the details, the results can be summarized down to simply significant/non-significant and presented graphically using vertical lines (rather than the more common horizontal underlines). This summarization can be particularly helpful when there are many comparisons.
> sidelines(pairwise.tukey)
               
none -0.875 |  
A    -0.630 |  
B     0.245 | |
AB    1.261   |
> sidelines(pairwise.lsd)
                 
none -0.875 |    
A    -0.630 | |  
B     0.245   | |
AB    1.261     |

Two treatments that do not share a vertical bar are significantly different.

You can also view the graphically:
> plot(pairwise.regwr)

Each diagonal line represents a comparison. Non-significant comparisons are printed in black and boxed by a gray square showing how far apart the pair would need to be to be significant. Significant comparisons are printed in red, with little gray circles added to show the “significant difference” for that comparison.

26.2 pairs() in package emmeans

The pairs() function operates on an emm object.
> cheese.emm <- emmeans(cheese.separate, ~strain)
> pairs.tukey <- pairs(cheese.emm,adjust="tukey")
> # or contrast(cheese.emm,method="pairwise",adjust="tukey")
> pairs.tukey
 contrast  estimate    SE df t.ratio p.value
 A - AB      -1.891 0.397  4  -4.770  0.0296
 A - B       -0.875 0.397  4  -2.207  0.2637
 A - none     0.245 0.397  4   0.618  0.9212
 AB - B       1.016 0.397  4   2.564  0.1871
 AB - none    2.136 0.397  4   5.388  0.0194
 B - none     1.120 0.397  4   2.825  0.1463

P value adjustment: tukey method for comparing a family of 4 estimates 
Instead of a t-statistic and a p-value, you can get a confidence interval (change the coverage with the level= argument).
> confint(pairs.tukey) # default uses level=.95
 contrast  estimate    SE df lower.CL upper.CL
 A - AB      -1.891 0.397  4   -3.506   -0.277
 A - B       -0.875 0.397  4   -2.489    0.739
 A - none     0.245 0.397  4   -1.369    1.859
 AB - B       1.016 0.397  4   -0.598    2.631
 AB - none    2.136 0.397  4    0.522    3.751
 B - none     1.120 0.397  4   -0.494    2.734

Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 4 estimates 

The options for p-value adjustment include: “tukey”, “holm”, “hochberg”, “BH”, “BY”, and “none”. The middle four are the same p-value adjustments we considered when we talked about Bonferroni-style adjustments. For example,

> pairs.BH <- pairs(cheese.emm,adjust="BH")
> pairs.BH
 contrast  estimate    SE df t.ratio p.value
 A - AB      -1.891 0.397  4  -4.770  0.0265
 A - B       -0.875 0.397  4  -2.207  0.1104
 A - none     0.245 0.397  4   0.618  0.5701
 AB - B       1.016 0.397  4   2.564  0.0936
 AB - none    2.136 0.397  4   5.388  0.0265
 B - none     1.120 0.397  4   2.825  0.0936

P value adjustment: BH method for 6 tests 
> pairs.holm <- pairs(cheese.emm,adjust="holm")
> pairs.holm
 contrast  estimate    SE df t.ratio p.value
 A - AB      -1.891 0.397  4  -4.770  0.0442
 A - B       -0.875 0.397  4  -2.207  0.1904
 A - none     0.245 0.397  4   0.618  0.5701
 AB - B       1.016 0.397  4   2.564  0.1904
 AB - none    2.136 0.397  4   5.388  0.0344
 B - none     1.120 0.397  4   2.825  0.1904

P value adjustment: holm method for 6 tests 
The plot for the output of pairs() shows each comparison as a horizontal confidence interval.
> plot(pairs.tukey)

26.3 R Builtins

There are also two functions builtin to R that are useful. The pairwise.t.test() function takes three arguments: a data vector, a grouping factor, and a p-value adjustment method (one of “holm”, “bdr”, or “none”, among others). The output is a matrix of (adjusted) p-values.
> with(CheeseAminoAcid, pairwise.t.test(freeAminoAcid,strain,p.adjust="fdr"))

    Pairwise comparisons using t tests with pooled SD 

data:  freeAminoAcid and strain 

     A     AB    B    
AB   0.027 -     -    
B    0.110 0.094 -    
none 0.570 0.027 0.094

P value adjustment method: fdr 
You may have noted the curious lack of a Tukey HSD method for pairwise.t.test(). There is a separate TukeyHSD() function, but it works on the output of aov(), not lm().
> aov.out <- aov(freeAminoAcid~strain,data=CheeseAminoAcid)
> TukeyHSD(aov.out)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = freeAminoAcid ~ strain, data = CheeseAminoAcid)

$strain
           diff        lwr        upr     p adj
AB-A     1.8915  0.2773476  3.5056524 0.0296048
B-A      0.8750 -0.7391524  2.4891524 0.2636576
none-A  -0.2450 -1.8591524  1.3691524 0.9212486
B-AB    -1.0165 -2.6306524  0.5976524 0.1870664
none-AB -2.1365 -3.7506524 -0.5223476 0.0194380
none-B  -1.1200 -2.7341524  0.4941524 0.1462787