Chapter 3 Two-Sample Procedures

Two-sample procedures make inferences about two populations using a sample from each. Typically, we are making inference about the group means: Is there evidence that the means are not equal? Is there evidence that one mean is greater? What range of values for the difference of means is consistent with the data?

Consider the data on breaking strength for notched and unnotched boards in the data set NotchedBoards. We would like to investigate the null hypothesis that unnotched boards of thickness .625 inch have the same strength as notched boards of thickness .75 inch with a 1 inch wide notch cut in the center down to thickness .625 inch.

First, get the NotchedBoards data from the cfcdae package into your current workspace.

data(NotchedBoards) # creates variable NotchedBoards

Now you have a choice. You can create two vectors for the two different groups, or you can work with the data frame directly.

unnotched <- NotchedBoards$strength[NotchedBoards$shape=="uniform"]
notched <- NotchedBoards$strength[NotchedBoards$shape=="notched"]
unnotched
 [1] 243 229 305 395 210 311 289 269 282 399 222 331 369
notched
 [1] 215 202 273 292 253 247 350 246 352 398 267 331 342

Before we do any inference, let’s just look at the data. Here we do boxplots of the two different groups. They are nearly the same with lots of overlap. This suggests that we will find no significant differences.

# boxplot(notched,unnotched) # most obvious version
# boxplot(list(Notched=notched,Unnotched=unnotched)) # provides better labels
boxplot(strength ~ shape,data=NotchedBoards) # formula version

You can give several data vectors as arguments to boxplot(), you can give them better labels by naming the elements of a list (the capitalized names in the second version become the labels), or you can use a formula of the form response ~ groupings. The formula version is easiest if you have many groups or your data come in a data frame.

3.1 Standard t-test

The two-sample t-test is the typical method used to do tests regarding the means of two groups. In R, this is the t.test(x,y) function. This command does a two-sample t-test between the sets of data in x and y. The confidence interval it generates is for the mean of x minus the mean of y.

There is also a “formula” version of t.test(). The formula takes the form of response ~ predictor, where in our case the predictor is a factor (grouping variable) with two levels. You get the same results, and it’s a little less fuss if your data come from a data frame.

t.test(unnotched,notched)

    Welch Two Sample t-test

data:  unnotched and notched
t = 0.27353, df = 23.911, p-value = 0.7868
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -43.31049  56.54126
sample estimates:
mean of x mean of y 
 296.4615  289.8462 
t.test(strength ~ shape,data=NotchedBoards)

    Welch Two Sample t-test

data:  strength by shape
t = -0.27353, df = 23.911, p-value = 0.7868
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -56.54126  43.31049
sample estimates:
mean in group notched mean in group uniform 
             289.8462              296.4615 
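As an aside, the result of t.test() is an object (of class "htest") that you can save and query, rather than just reading the printout. A minimal sketch, using the data values listed above:

```r
# Data values copied from the listing above
unnotched <- c(243, 229, 305, 395, 210, 311, 289, 269, 282, 399, 222, 331, 369)
notched   <- c(215, 202, 273, 292, 253, 247, 350, 246, 352, 398, 267, 331, 342)

# Save the result and extract individual pieces
tt <- t.test(unnotched, notched)
tt$statistic  # the t value
tt$p.value    # the p-value
tt$conf.int   # the 95% confidence interval
tt$estimate   # the two sample means
```

This is handy when a later computation needs the p-value or interval endpoints as numbers.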

Note that by default R uses an unpooled estimate of variance (the Welch version with fractional degrees of freedom), a two-sided alternative, and a confidence interval with 95% coverage. You can also get a pooled estimate of variance, upper or lower one-sided alternatives (i.e., that x has the greater or the lesser mean), and/or a different confidence level by using the appropriate optional arguments.

The unpooled (Welch) version is generally the better option in typical two-sample use, because it is almost as good as the pooled version when the population variances are equal and is much better when the population variances differ. However, Analysis of Variance, which extends the t-test to multiple groups and to more complicated settings, is a generalization of the pooled version.
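For the curious, the Welch statistic and its fractional (Satterthwaite) degrees of freedom can be computed by hand. A sketch, using the data values listed earlier:

```r
# Data values copied from the listing above
unnotched <- c(243, 229, 305, 395, 210, 311, 289, 269, 282, 399, 222, 331, 369)
notched   <- c(215, 202, 273, 292, 253, 247, 350, 246, 352, 398, 267, 331, 342)

n1 <- length(unnotched); n2 <- length(notched)
v1 <- var(unnotched)/n1  # squared standard error, group 1
v2 <- var(notched)/n2    # squared standard error, group 2

# Welch t statistic and Satterthwaite degrees of freedom
tstat <- (mean(unnotched) - mean(notched)) / sqrt(v1 + v2)
df.welch <- (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1))

c(t = tstat, df = df.welch)  # should match the t.test() output above
```

The degrees of freedom fall between min(n1, n2) - 1 and n1 + n2 - 2, landing near the top end here because the two sample variances are similar.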

Here we do the test with the option that uses a pooled estimate of variance (that is, assuming equal group variances) and a 99.5% confidence level. Then we jump ahead to an Analysis of Variance approach just to show that for two groups its p-value agrees with the equal-variances t-test (the F value is the square of the t). The p-values in both cases are large, providing no evidence against the null hypothesis of equal means.

t.test(unnotched,notched,var.equal=TRUE,conf.level=.995) # nearly identical for these data

    Two Sample t-test

data:  unnotched and notched
t = 0.27353, df = 24, p-value = 0.7868
alternative hypothesis: true difference in means is not equal to 0
99.5 percent confidence interval:
 -68.12963  81.36040
sample estimates:
mean of x mean of y 
 296.4615  289.8462 
anova(lm(strength~shape,data=NotchedBoards)) # preview
Analysis of Variance Table

Response: strength
          Df Sum Sq Mean Sq F value Pr(>F)
shape      1    284   284.5  0.0748 0.7868
Residuals 24  91249  3802.0               
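You can verify the F = t² relationship directly. A sketch that rebuilds the data frame from the values listed earlier (the reconstructed `boards` data frame is for illustration only; with the cfcdae package you would use NotchedBoards itself):

```r
# Reconstruct the data frame from the values listed earlier
unnotched <- c(243, 229, 305, 395, 210, 311, 289, 269, 282, 399, 222, 331, 369)
notched   <- c(215, 202, 273, 292, 253, 247, 350, 246, 352, 398, 267, 331, 342)
boards <- data.frame(
  strength = c(notched, unnotched),
  shape = factor(rep(c("notched", "uniform"), each = 13)))

# Pooled t-test and one-way ANOVA on the same data
tt  <- t.test(strength ~ shape, data = boards, var.equal = TRUE)
fit <- anova(lm(strength ~ shape, data = boards))

# For two groups, the F statistic equals the squared pooled t statistic
c(t.squared = unname(tt$statistic)^2, F.value = fit[["F value"]][1])
```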

One reasonable belief might be that the notch weakens the boards: although the notched boards have the same minimum thickness as the unnotched boards and a greater average thickness, the notch could concentrate stress where the board is thinnest. We can examine this using a one-sided test with the alternative that the unnotched mean is greater than the notched mean. The p-value is smaller than for the two-sided test, but it is still quite large.

t.test(unnotched,notched,alternative="greater")

    Welch Two Sample t-test

data:  unnotched and notched
t = 0.27353, df = 23.911, p-value = 0.3934
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -34.76902       Inf
sample estimates:
mean of x mean of y 
 296.4615  289.8462 

3.2 Digression on computing percent points and quantiles

pt() gives you the cumulative probability (area to the left) for Student’s t distribution; the lower.tail=FALSE option gives you the upper tail probability. The first argument is the t value, the second is the degrees of freedom. The lower tail and upper tail values are below, and of course they add to 1. The second line with twice the smaller tail gives the two-sided p-value.

pt(-.27353,23.911);pt(-.27353,23.911,lower.tail=FALSE)
[1] 0.3933973
[1] 0.6066027
2*pt(-abs(.27353),23.911)
[1] 0.7867947

In general in R, pFOO(q,params) gives you the cumulative probability up to q for distribution FOO, qFOO(p,params) gives you the quantile whose cumulative probability is p, and rFOO(n,params) gives you a random sample of size n from distribution FOO. Thus, we have pt, pnorm, pf, pchisq, pbinom, and many others, as well as their q and r forms.
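For example, pt() and qt() are inverses of one another, and the same pattern works for the other distributions:

```r
# p* and q* are inverse functions
pt(2, df = 24)               # cumulative probability P(T <= 2) for t with 24 df
qt(pt(2, df = 24), df = 24)  # recovers the quantile 2

qt(0.975, df = 24)           # t critical value for a 95% confidence interval
rnorm(3, mean = 0, sd = 1)   # a random sample of three standard normal draws
```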

3.3 Randomization (permutation) two-sample test

The randomization version of the two-sample t-test can be done with the function permTS() from the perm package. To use it, you need to install the perm package onto your computer once (although you may need to redo this every time you update R) and then load it into every R session in which you want to use it. You can use the functions as shown here (the first command does the install; the second is the one you need every time), but it is usually easier to use the package menu commands in RStudio to do the install. Feel free to use a different CRAN repository.

install.packages("perm",repos="https://cloud.r-project.org")
Installing package into '/Users/gary/Library/R/4.0/library'
(as 'lib' is unspecified)

The downloaded binary packages are in
    /var/folders/_6/3018nw2s6x1_vm4fszmrz7t80000gp/T//RtmpxikzNa/downloaded_packages
library(perm)

We are going to do randomization tests, which rely on randomization. The “random” numbers in R are produced by an algorithm that starts with a “seed” value. If you want to be able to reproduce exact values, you need to seed (start) the random number generator in the same place. I do that here so that you can reproduce the results I get in the demo. In general, R will seed its own random numbers so that they’re different every time.

set.seed(654321)

The permTS() function does the two-sample randomization (permutation) t-test. By default it does a two-sided alternative. The main advantage of this procedure over the t test is that the permutation test does not assume or depend on normality.

We see that the unnotched (x) mean is greater than the notched (y) mean by 6.62, and that the probability that a randomization leads to a difference of means at least that large in absolute value is 78%. Note that this is very close to the t-test p-value.

permTS(unnotched,notched)

    Permutation Test using Asymptotic Approximation

data:  unnotched and notched
Z = 0.27874, p-value = 0.7804
alternative hypothesis: true mean unnotched - mean notched is not equal to 0
sample estimates:
mean unnotched - mean notched 
                     6.615385 
set.seed(654321) # try again
permTS(strength~shape,data=NotchedBoards) # same results with formula

    Permutation Test using Asymptotic Approximation

data:  strength by shape
Z = -0.27874, p-value = 0.7804
alternative hypothesis: true mean shape=notched - mean shape=uniform is not equal to 0
sample estimates:
mean shape=notched - mean shape=uniform 
                              -6.615385 

We may also specify different alternatives, for example,

permTS(unnotched,notched,alternative="greater")

    Permutation Test using Asymptotic Approximation

data:  unnotched and notched
Z = 0.27874, p-value = 0.3902
alternative hypothesis: true mean unnotched - mean notched is greater than 0
sample estimates:
mean unnotched - mean notched 
                     6.615385
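To see what permTS() is approximating, here is a minimal Monte Carlo sketch of the permutation test (using the data values listed earlier): shuffle which observations get which label, recompute the difference of means, and count how often a shuffled difference is as extreme as the observed one.

```r
# Data values copied from the listing earlier in the chapter
unnotched <- c(243, 229, 305, 395, 210, 311, 289, 269, 282, 399, 222, 331, 369)
notched   <- c(215, 202, 273, 292, 253, 247, 350, 246, 352, 398, 267, 331, 342)

all.strength <- c(unnotched, notched)
n1 <- length(unnotched)
observed <- mean(unnotched) - mean(notched)

set.seed(654321)  # seed for reproducibility
perm.diffs <- replicate(10000, {
  idx <- sample(length(all.strength), n1)            # random relabeling
  mean(all.strength[idx]) - mean(all.strength[-idx]) # shuffled difference
})

# Two-sided Monte Carlo p-value; should land near the 0.78 reported above
pval <- mean(abs(perm.diffs) >= abs(observed))
pval
```

With only 26 observations a full enumeration of all relabelings is also feasible, but the Monte Carlo version scales to any sample size.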