Chapter 24 Bonferroni-Style Methods
We want methods that adjust the p-values for a set of K tests so that we control various error rates at some level \(\alpha\). The methods discussed here work directly on the p-values; usually, you don’t need to know anything beyond K and the p-values. Some methods, however, require that the tests that generated the p-values be statistically independent; when the tests are independent, you can usually get a little more power.
The text presents Bonferroni-style methods as comparing p-values to the nominal rate \(\alpha\) divided by an adjustment factor. In R, p.adjust() instead multiplies each p-value by the adjustment factor, and you then compare the adjusted p-value to the nominal error rate.
The progenitor of all these methods is Bonferroni itself. Multiply each p-value by K. If the adjusted p-value is less than \(\alpha\), reject that null hypothesis. Bonferroni is widely applicable and dead easy. It controls the strong familywise error rate, or you can use it to get simultaneous confidence intervals. But unless you need simultaneous confidence intervals, the Holm method is always more powerful (more likely to find differences) while still controlling the strong familywise error rate.
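For example, with some made-up p-values, multiplying by K by hand and calling p.adjust() give the same answer; note that p.adjust() caps the products at 1 so they remain legitimate p-values:
> p <- c(.004, .03, .2)            # made-up p-values, K = 3
> pmin(1, p*length(p))             # multiply each by K, cap at 1
[1] 0.012 0.090 0.600
> p.adjust(p, method="bonferroni")
[1] 0.012 0.090 0.600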
Let \(p_{(1)}\) denote the smallest p-value, \(p_{(2)}\) the second smallest p-value and so on. All of the methods will compare ordered p-values to rescaled critical values. Alternatively, you can adjust (rescale) the ordered p-values and check to see if they are less than the original rate \(\alpha\). The Bonferroni, Hochberg, Benjamini-Hochberg, and Benjamini-Yekutieli methods say to reject hypothesis (j) if the comparison inequality holds for any (i) with \(i \geq j\). The Holm method says to reject hypothesis (j) if the comparison inequality holds for all \(i \leq j\).
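For reference, the rescaled critical values usually quoted for these methods (not spelled out in the text here) are the following; reject \(H_{(i)}\) when \(p_{(i)}\) falls below the cutoff, subject to the stepping rules just described:
\[
\begin{aligned}
\text{Bonferroni:}\quad & \alpha/K \\
\text{Holm, Hochberg:}\quad & \alpha/(K-i+1) \\
\text{Benjamini-Hochberg:}\quad & i\,\alpha/K \\
\text{Benjamini-Yekutieli:}\quad & \frac{i\,\alpha}{K \sum_{j=1}^{K} 1/j}
\end{aligned}
\]
Equivalently, the adjusted p-value multiplies \(p_{(i)}\) by the reciprocal of the factor on \(\alpha\), before the monotonicity fix-ups described below.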
Put the adjusted p-values in order so that the original p-values are in increasing order. For Holm, you can only reject (j) if the jth adjusted p-value and all those that come before it are less than \(\alpha\). So start from the left and keep rejecting as long as the adjusted p-values are less than \(\alpha\); any adjusted p-value bigger than \(\alpha\) prevents rejection from there on in the list. For the other methods, you can reject (j) if it or any adjusted p-value that comes after it is less than \(\alpha\). So start from the right: once you find an adjusted p-value less than \(\alpha\), reject that hypothesis and all those that come before it in the list.
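Those two stopping rules are easy to express in R. Here is a minimal sketch, with made-up function names (they are not part of p.adjust()), assuming the adjusted p-values are already in that order:
> # made-up helpers: decide which ordered hypotheses to reject at level alpha
> reject.left.to.right <- function(p.adj, alpha) cumprod(p.adj < alpha) == 1
> reject.right.to.left <- function(p.adj, alpha) rev(cummax(rev(p.adj < alpha))) == 1
> reject.left.to.right(c(.001, .2, .02), alpha=.05)   # Holm style: stop at the first failure
[1]  TRUE FALSE FALSE
> reject.right.to.left(c(.001, .2, .02), alpha=.05)   # the .02 at the end carries left
[1] TRUE TRUE TRUE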
In this example, we have p-values for testing the equality of ratings of different sensory aspects of three cottage cheeses (Example 5.3). It’s probably reasonable to assume that these tests are all independent of each other, but you might be able to come up with a scenario where using the same raters for all attributes induces some kind of correlation between the results.
Enter the p-values and sort them into increasing order:
> Texture <- c(BD.rate=.001,Firm=.0001,Sticky=.41,Slippery=.07,Heavy=.15,
+ Part.size=.42,Runny=.002,Rubbery=.006)
> Texture <- sort(Texture) # smallest to largest
> Texture
Firm BD.rate Runny Rubbery Slippery Heavy Sticky Part.size
0.0001 0.0010 0.0020 0.0060 0.0700 0.1500 0.4100 0.4200
Note: if you need to sort the rows of a matrix so that the first column is in increasing order, or do a parallel rearrangement of the elements of several vectors so that one of them is in increasing order, then you should be using the order() function. order() gives us the indices so that when you subscript with those indices the result is in order. The following two do the same thing:
sort(foo)
foo[order(foo)]
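For instance, with made-up values echoing three of the texture attributes:
> p.raw <- c(.41, .001, .07)                  # made-up unsorted p-values
> attrib <- c("Sticky", "BD.rate", "Slippery")
> ord <- order(p.raw)          # indices that put p.raw into increasing order
> ord
[1] 2 3 1
> p.raw[ord]                   # same as sort(p.raw)
[1] 0.001 0.070 0.410
> attrib[ord]                  # parallel rearrangement of the labels
[1] "BD.rate"  "Slippery" "Sticky"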
Consider the Texture tests using the different approaches. Assuming that the tests are independent, we can use Benjamini-Hochberg (BH) to control the false discovery rate and Hochberg to control the strong familywise error rate.
> Texture # unadjusted
Firm BD.rate Runny Rubbery Slippery Heavy Sticky Part.size
0.0001 0.0010 0.0020 0.0060 0.0700 0.1500 0.4100 0.4200
> p.adjust(Texture,method="BH") # controls FDR if independent
Firm BD.rate Runny Rubbery Slippery Heavy Sticky Part.size
0.000800000 0.004000000 0.005333333 0.012000000 0.112000000 0.200000000 0.420000000 0.420000000
> p.adjust(Texture,method="hochberg") # controls SFER if independent
Firm BD.rate Runny Rubbery Slippery Heavy Sticky Part.size
0.0008 0.0070 0.0120 0.0300 0.2800 0.4200 0.4200 0.4200
Testing at the .01 level, Firm, BD.rate, Runny, and Rubbery are all significant when we make no adjustments. Firm, BD.rate, and Runny are significant when we control the false discovery rate, and only Firm and BD.rate are significant when we control the strong familywise error rate.
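If you want R to report which tests clear the .01 cutoff rather than reading it off the output, one way is:
> names(Texture)[p.adjust(Texture,method="BH") < .01]
[1] "Firm"    "BD.rate" "Runny"
> names(Texture)[p.adjust(Texture,method="hochberg") < .01]
[1] "Firm"    "BD.rate"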
Notice that the three largest Hochberg p-values are all .42. The adjusted p-values begin by multiplying the smallest p-value by K (8 in this case), the next smallest by K-1, and so on down to the largest, which is multiplied by 1. If we do that by hand, we get
> Texture*(8:1)
Firm BD.rate Runny Rubbery Slippery Heavy Sticky Part.size
0.0008 0.0070 0.0120 0.0300 0.2800 0.4500 0.8200 0.4200
The last three products are .45, .82, and .42 (not .42 three times). However, recall that Hochberg is a “right to left” approach; the sixth and seventh values will reject for any \(\alpha\) at or above .42, because the eighth value is .42 and it carries left. p.adjust() changes the .45 and .82 to .42, which is the value that is carried left.
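You can reproduce that carrying by hand with a right-to-left running minimum of the products (cummin() working in from the large end); this is just a check for this example, not how you would normally compute it:
> round(unname(rev(cummin(rev(Texture*(8:1))))), 4)
[1] 0.0008 0.0070 0.0120 0.0300 0.2800 0.4200 0.4200 0.4200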
> p.adjust(Texture,method="BY")
Firm BD.rate Runny Rubbery Slippery Heavy Sticky Part.size
0.002174286 0.010871429 0.014495238 0.032614286 0.304400000 0.543571429 1.000000000 1.000000000
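The BY adjustment involves no new stepping logic: for this example it is just the BH adjustment inflated by the constant \(\sum_{j=1}^{K} 1/j\) and then capped at 1, which is why the last two entries are 1.000000. With K = 8 that constant is about 2.72:
> sum(1/(1:8))                     # BY inflation factor for K = 8
[1] 2.717857
> sum(1/(1:8)) * 0.0008            # times the BH-adjusted value for Firm
[1] 0.002174286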
> p.adjust(Texture,method="holm")
Firm BD.rate Runny Rubbery Slippery Heavy Sticky Part.size
0.0008 0.0070 0.0120 0.0300 0.2800 0.4500 0.8200 0.8200
Holm is a “left to right” approach. It begins with the same basic products as Hochberg, but it modifies them to ensure that they do not decrease as you move to the right. Because the seventh product is .82, the eighth value will not reject for anything below .82, so p.adjust() reports the eighth value as .82.
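As with Hochberg, you can reproduce the Holm adjustment by hand for this example, this time with a left-to-right running maximum of the products:
> round(unname(cummax(Texture*(8:1))), 4)
[1] 0.0008 0.0070 0.0120 0.0300 0.2800 0.4500 0.8200 0.8200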