Chapter 26 Pairwise Comparisons
Perhaps the most commonly seen use of multiple comparisons is to control the error rate when doing all pairwise comparisons in an experiment. You could use the Bonferroni-type adjustments discussed earlier, but several other techniques have been developed specifically for the pairwise situation, and they will generally work a little better when they can be used.
The main requirement is a function that facilitates doing all the pairwise comparison along with options that allow you to control different error rate. This chapter discusses three approaches.
We will illustrate with the cheese amino acid data set, which has four treatments (bacteria strain A added to the cheese, strain B added, both strains added, neither strain added).> library(cfcdae)
> library(emmeans)
> data("CheeseAminoAcid")
> CheeseAminoAcid
strain freeAminoAcid
1 none 4.195
2 none 4.175
3 A 4.125
4 A 4.735
5 B 4.865
6 B 5.745
7 AB 6.155
8 AB 6.488
> cheese.separate <- lm(freeAminoAcid~strain,data=CheeseAminoAcid)
> anova(cheese.separate)
Analysis of Variance Table
Response: freeAminoAcid
Df Sum Sq Mean Sq F value Pr(>F)
strain 3 5.6279 1.87595 11.932 0.0183 *
Residuals 4 0.6289 0.15722
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(cheese.separate,"strain")
A AB B none
-0.630375 1.261125 0.244625 -0.875375
26.1 pairwise()
in cfcdae
There is a pairwise()
function in cfcdae
that does
all pairwise comparisons. You can use several different
methods that control a variety of error rates. There are
also a couple of options for visualizing the results.
The basic setup is
pairwise(linear.model.fit,factor.name,type=control.method)
The linear.model.fit
is the output of lm()
; the factor.name
is the factor across the levels of
which we wish to do pairwise comparisons;
the control.method
is a character string selecting
the type of adjustments to make. The choices are
- “hsd” (the default) Use the Tukey Honest Significant Difference. This provides simultaneous confidence intervals and controls the strong familywise error rate.
- “bsd” Use the Bonferroni adjustment. This provides simultaneous confidence intervals and controls the strong familywise error rate, but it is usually not as powerful as “hsd”.
- “regwr” Use the Ryan-Einot-Garbriel-Welsch adjustment. This controls the strong familywise error rate.
- “snk” Use Student Neuman Keuls. This is thought to control the false discovery rate.
- “lsd” Use Least Significant Difference. This does not adjust the nominal p-values.
> pairwise.tukey <- pairwise(cheese.separate,strain)
> # or pairwise(cheese.separate,strain,type="hsd",confidence=.95)
This produces simultaneous confidence intervals for all pairwise differences.
You may view the results by simply printing them
output. For each pair, you get the estimated difference,
the “significant difference” (that is, the amount by
which the pairs would need to differ to reject the
null hypothesis that they are the same), and a
confidence interval. The default coverage is 95%,
but that can be changed via the optional
confidence
argument (e.g., confidence=.99
).
> pairwise.regwr <- pairwise(cheese.separate,strain,type="regwr")
> pairwise.snk <- pairwise(cheese.separate,strain,type="snk")
> pairwise.lsd <- pairwise(cheese.separate,strain,type="lsd")
> pairwise.tukey
Pairwise comparisons ( hsd ) of strain
estimate signif diff lower upper
* A - AB -1.8915 1.614152 -3.5056524 -0.2773476
A - B -0.8750 1.614152 -2.4891524 0.7391524
A - none 0.2450 1.614152 -1.3691524 1.8591524
AB - B 1.0165 1.614152 -0.5976524 2.6306524
* AB - none 2.1365 1.614152 0.5223476 3.7506524
B - none 1.1200 1.614152 -0.4941524 2.7341524
> pairwise.regwr
Pairwise comparisons ( regwr ) of strain
estimate signif diff lower upper
* A - AB -1.8915 1.413173 -3.3046727 -0.4783273
A - B -0.8750 1.413173 -2.2881727 0.5381727
A - none 0.2450 1.385992 -1.1409922 1.6309922
AB - B 1.0165 1.385992 -0.3694922 2.4024922
* AB - none 2.1365 1.614152 0.5223476 3.7506524
B - none 1.1200 1.413173 -0.2931727 2.5331727
> pairwise.snk
Pairwise comparisons ( snk ) of strain
estimate signif diff lower upper
* A - AB -1.8915 1.413173 -3.30467267 -0.4783273
A - B -0.8750 1.413173 -2.28817267 0.5381727
A - none 0.2450 1.100905 -0.85590508 1.3459051
AB - B 1.0165 1.100905 -0.08440508 2.1174051
* AB - none 2.1365 1.614152 0.52234758 3.7506524
B - none 1.1200 1.413173 -0.29317267 2.5331727
> pairwise.lsd
Pairwise comparisons ( lsd ) of strain
estimate signif diff lower upper
* A - AB -1.8915 1.1009 -2.99240031 -0.7905997
A - B -0.8750 1.1009 -1.97590031 0.2259003
A - none 0.2450 1.1009 -0.85590031 1.3459003
AB - B 1.0165 1.1009 -0.08440031 2.1174003
* AB - none 2.1365 1.1009 1.03559969 3.2374003
* B - none 1.1200 1.1009 0.01909969 2.2209003
Notice that the signficant differences and lengths of confidence intervals are the same for “hse” and “lsd” but differ for “regwr” and “snk”. (If we had different numbers of data in each group, these would also differ for “hsd” and “lsd”.)
In addition to printing all the details, the results can be summarized down to simply significant/non-significant and presented graphically using vertical lines (rather than the more common horizontal underlines). This summarization can be particularly helpful when there are many comparisons.> sidelines(pairwise.tukey)
none -0.875 |
A -0.630 |
B 0.245 | |
AB 1.261 |
> sidelines(pairwise.lsd)
none -0.875 |
A -0.630 | |
B 0.245 | |
AB 1.261 |
Two treatments that do not share a vertical bar are significantly different.
You can also view the graphically:> plot(pairwise.regwr)

Each diagonal line represents a comparison. Non-significant comparisons are printed in black and boxed by a gray square showing how far apart the pair would need to be to be significant. Significant comparisons are printed in red, with little gray circles added to show the “significant difference” for that comparison.
26.2 pairs()
in package emmeans
The pairs()
function operates on an emm
object.
> cheese.emm <- emmeans(cheese.separate, ~strain)
> pairs.tukey <- pairs(cheese.emm,adjust="tukey")
> # or contrast(cheese.emm,method="pairwise",adjust="tukey")
> pairs.tukey
contrast estimate SE df t.ratio p.value
A - AB -1.891 0.397 4 -4.770 0.0296
A - B -0.875 0.397 4 -2.207 0.2637
A - none 0.245 0.397 4 0.618 0.9212
AB - B 1.016 0.397 4 2.564 0.1871
AB - none 2.136 0.397 4 5.388 0.0194
B - none 1.120 0.397 4 2.825 0.1463
P value adjustment: tukey method for comparing a family of 4 estimates
level=
argument).
> confint(pairs.tukey) # default uses level=.95
contrast estimate SE df lower.CL upper.CL
A - AB -1.891 0.397 4 -3.506 -0.277
A - B -0.875 0.397 4 -2.489 0.739
A - none 0.245 0.397 4 -1.369 1.859
AB - B 1.016 0.397 4 -0.598 2.631
AB - none 2.136 0.397 4 0.522 3.751
B - none 1.120 0.397 4 -0.494 2.734
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
The options for p-value adjustment include: “tukey”, “holm”, “hochberg”, “BH”, “BY”, and “none”. The middle four are the same p-value adjustments we considered when we talked about Bonferroni-style adjustments. For example,
> pairs.BH <- pairs(cheese.emm,adjust="BH")
> pairs.BH
contrast estimate SE df t.ratio p.value
A - AB -1.891 0.397 4 -4.770 0.0265
A - B -0.875 0.397 4 -2.207 0.1104
A - none 0.245 0.397 4 0.618 0.5701
AB - B 1.016 0.397 4 2.564 0.0936
AB - none 2.136 0.397 4 5.388 0.0265
B - none 1.120 0.397 4 2.825 0.0936
P value adjustment: BH method for 6 tests
> pairs.holm <- pairs(cheese.emm,adjust="holm")
> pairs.holm
contrast estimate SE df t.ratio p.value
A - AB -1.891 0.397 4 -4.770 0.0442
A - B -0.875 0.397 4 -2.207 0.1904
A - none 0.245 0.397 4 0.618 0.5701
AB - B 1.016 0.397 4 2.564 0.1904
AB - none 2.136 0.397 4 5.388 0.0344
B - none 1.120 0.397 4 2.825 0.1904
P value adjustment: holm method for 6 tests
pairs()
shows each comparison as a horizontal confidence interval.
> plot(pairs.tukey)

26.3 R Builtins
There are also two functions builtin to R that are useful. Thepairwise.t.test()
function takes three arguments: a data vector,
a grouping factor, and a p-value adjustment method (one of “holm”, “bdr”, or “none”,
among others). The output is a matrix of (adjusted) p-values.
> with(CheeseAminoAcid, pairwise.t.test(freeAminoAcid,strain,p.adjust="fdr"))
Pairwise comparisons using t tests with pooled SD
data: freeAminoAcid and strain
A AB B
AB 0.027 - -
B 0.110 0.094 -
none 0.570 0.027 0.094
P value adjustment method: fdr
pairwise.t.test()
.
There is a separate TukeyHSD()
function, but it works on the output of aov()
, not
lm()
.
> aov.out <- aov(freeAminoAcid~strain,data=CheeseAminoAcid)
> TukeyHSD(aov.out)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = freeAminoAcid ~ strain, data = CheeseAminoAcid)
$strain
diff lwr upr p adj
AB-A 1.8915 0.2773476 3.5056524 0.0296048
B-A 0.8750 -0.7391524 2.4891524 0.2636576
none-A -0.2450 -1.8591524 1.3691524 0.9212486
B-AB -1.0165 -2.6306524 0.5976524 0.1870664
none-AB -2.1365 -3.7506524 -0.5223476 0.0194380
none-B -1.1200 -2.7341524 0.4941524 0.1462787