Copyright January 04, 2021 by Nathaniel E. Helwig

1 Introduction

1.1 Motivation and Goals

Nonparametric randomization and permutation tests offer robust alternatives to classic (parametric) hypothesis tests. Unlike classic hypothesis tests, which depend on parametric assumptions and/or large sample approximations for valid inference, nonparametric tests use computationally intensive methods to provide valid inferential results under a wide collection of data generating conditions. As will be elaborated in upcoming sections, nonparametric tests are a form of conditional inference, where inference is conducted after conditioning on properties of the data.

The stats package in R implements some basic nonparametric tests, but lacks general functionality for robust nonparametric inference. As a result, this document focuses on the nptest R package, which implements robust nonparametric tests for location, correlation, and regression problems. As I demonstrate, the nptest package provides a user-friendly unification of many parametric and nonparametric tests, as well as novel implementations of several robust nonparametric tests.

1.2 Permutations in R

The word permutation refers to the arrangement of a set of objects into some specified order.

Given a vector \(\mathbf{x} = (x_1, \ldots, x_n)^\top\) of length \(n\), there are \[ n! = n \times (n - 1) \times \cdots \times 2 \times 1 \] possible permutations of the \(n\) values.

To generate all possible permutations of the integers \(1,\ldots,n\), use the permn function in the nptest R package:

# load 'nptest' package
library(nptest)
## Loading required package: parallel
# all permutations of c(1, 2, 3)
permn(3)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    2    1    3    3
## [2,]    2    1    3    3    1    2
## [3,]    3    3    1    2    2    1

Warning: For large n this function will consume a lot of memory and may even crash R.

To obtain a random permutation of the integers \(1,\ldots,n\), use the sample.int function in the base R package:

# random permutation of 1:3
sample.int(3)
## [1] 2 1 3
# random permutation of 1:4
sample.int(4)
## [1] 4 2 3 1

Note that the above ideas can be used to permute the elements of any vector, given that the integers \(1,\ldots,n\) can be used to index the vector’s elements. Or, for a random permutation of a generic vector, the sample function in R’s base package can be used:

x <- round(rnorm(7), 2)
x
## [1] 0.88 1.61 0.05 1.13 1.80 0.21 0.85
sample(x)
## [1] 0.21 0.88 0.85 1.80 1.13 1.61 0.05

1.3 Hypothesis Testing Basics

Null hypothesis significance testing involves the following steps:

  1. Form a null hypothesis \(H_0\) and an alternative hypothesis \(H_1\) about some parameter \(\theta = t(F)\) of a distribution \(F\).

  2. Calculate some test statistic \(T = s(\mathbf{x})\) from the observed data \(\mathbf{x} = (x_1, \ldots, x_n)^\top\) where \(x_i \stackrel{\mathrm{iid}}{\sim}F\) for \(i = 1, \ldots, n\).

  3. Calculate the p-value, which is the probability of observing a test statistic as or more extreme than \(T\) under the assumption \(H_0\) is true.

  4. Reject \(H_0\) if the p-value is below some user-determined threshold, e.g., \(\alpha = 0.05\) or some other small value.

The notation \(\theta = t(F)\) denotes that the parameter is a function of the distribution. Similarly, the notation \(T = s(\mathbf{x})\) denotes that the statistic is a function of the data.

The sampling distribution of the test statistic \(T\) refers to the distribution of \(T\) that would be obtained by applying the test statistic function \(s(\cdot)\) to many different random samples of size \(n\) drawn from the population \(F\). Note that the sampling distribution of \(T\) (under \(H_0\)) is needed for the p-value calculation in step 3.

In parametric applications of hypothesis tests, the sampling distribution of \(T\) (under \(H_0\)) is derived from either (i) parametric assumptions about the data generating process, e.g., \(F\) is a normal distribution, or (ii) large sample assumptions about the test statistic, e.g., \(T\) is asymptotically normal.

1.4 Exact Nonparametric Tests

Permutation Distribution

Nonparametric tests derive the sampling distribution of \(T\) (under \(H_0\)) by (i) enumerating all data arrangements (or permutations) that are equally likely under \(H_0\), and then (ii) calculating the test statistic \(T\) for each possible data permutation.

The exact null distribution of the test statistic \(T\) refers to the (discrete) distribution formed by calculating \(T\) for all possible data permutations that are equally likely under \(H_0\). This distribution will also be referred to as the exact permutation distribution of the test statistic.

Let \(M\) denote the number of data permutations that are equally likely under \(H_0\):

  1. \(M = 2^n\) for one-sample problems

  2. \(M = {m + n \choose m}\) for two-sample problems

  3. \(M = n!\) for correlation and regression problems

The exact null distribution will be denoted by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(\mathbf{x}_j)\) with \(\mathbf{x}_j\) denoting the \(j\)-th permutation of the data (for \(j = 1,\ldots, M\)).

Permutation Inference

In nonparametric tests, exact p-values are defined as \[ \mbox{p-value} = \frac{1}{M} \sum_{j = 1}^M I(|T_j| \geq |T|) \] where \(I(\cdot)\) is an indicator function and \(T\) is the observed test statistic. Note that the p-value can range from \(1/M\) to 1 given that \(T\) is one of the \(M\) values in the exact null distribution.

The above p-value calculation assumes a two-sided alternative hypothesis. For directional alternatives, (i) the absolute value is removed from the test statistics in the p-value calculation, and (ii) the \(\geq\) sign is changed to \(\leq\) for “less than” alternatives.
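
In R, these p-value calculations reduce to the mean of an indicator vector. The following minimal sketch uses a toy null distribution; Tperm and Tobs are hypothetical placeholders for the vector of \(M\) permuted statistics and the observed statistic.

# p-value calculations given a null distribution (toy example)
set.seed(1)
Tobs <- 2.1                            # hypothetical observed statistic
Tperm <- c(Tobs, rnorm(499))           # toy null distribution with M = 500
mean(abs(Tperm) >= abs(Tobs))          # two-sided p-value
mean(Tperm >= Tobs)                    # one-sided, "greater than" alternative
mean(Tperm <= Tobs)                    # one-sided, "less than" alternative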

1.5 Approximate (Monte Carlo) Tests

Approximate Permutation Distributions

The total number of permutations \(M\) is often too large to feasibly compute all elements of the exact null distribution.

In such cases, we can conduct an approximate (or Monte Carlo) nonparametric test:

  1. Randomly select \(R < M\) elements from \(\mathcal{T}\)

  2. Define the approximate null distribution \(\mathcal{T}_{R+1} = \{T_j\}_{j = 1}^{R+1}\)

  3. Compute p-values using \(\mathcal{T}_{R+1}\) in place of \(\mathcal{T}\)

Note that \(\mathcal{T}_{R+1}\) has \(R + 1\) elements (\(R\) permutations plus 1 observed), so the minimum possible p-value for an approximate nonparametric test is \(1/(R+1)\). Thus, the accuracy of the Monte Carlo approximation is affected by the number of random permutations \(R\) that are sampled from \(M\).
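
As a concrete illustration of this recipe, the following sketch (with hypothetical simulated data) builds an approximate null distribution for a correlation statistic from \(R\) random permutations plus the observed statistic.

# approximate (Monte Carlo) null distribution for a correlation statistic
set.seed(1)
x <- rnorm(30)
y <- 0.4 * x + rnorm(30)                   # hypothetical paired data
R <- 9999
Tperm <- replicate(R, cor(x, sample(y)))   # statistics for R random permutations
Tperm <- c(cor(x, y), Tperm)               # add observed statistic (R + 1 elements)
mean(abs(Tperm) >= abs(Tperm[1]))          # approximate two-sided p-value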

Monte Carlo Standard Errors

Consider a nonparametric test where the total number of possible outcomes \(M\) is very large (e.g., \(M \approx \infty\)), and define the following notation

  • \(F(t)\) is the distribution function for the exact null distribution \(\mathcal{T}\)

  • \(G(t)\) is the distribution function for the approximate null distribution \(\mathcal{T}_{R+1}\)

Reminder: the distribution function is a probability calculation \(F(t) = P(T < t)\).

The Monte Carlo standard error (MCSE) of a nonparametric test is defined as \[ \sigma(t) = \sqrt{ \frac{F(t) [1 - F(t)]}{R + 1} } \] which is the classic formula for the standard error of a sample proportion. Note that the MCSE is the standard deviation of \(G(t)\), which is our (sample) estimate of the (population) probability \(F(t)\).

Given the MCSE, a symmetric \(100(1 - \tilde{\alpha})\%\) confidence interval for \(F(t)\) can be approximated as \[ G(t) \pm Z_{1 - \tilde{\alpha}/2} \sigma(t) \] where \(Z_{1 - \tilde{\alpha}/2}\) is the quantile of a standard normal distribution that cuts-off \(\tilde{\alpha} / 2\) in the upper tail (e.g., \(Z_{0.975} \approx 1.96\) for a 95% confidence interval). Note that \(\alpha\) and \(\tilde{\alpha}\) can be different: the value of \(\alpha\) relates to the significance level of the nonparametric test, whereas the value of \(\tilde{\alpha}\) relates to the confidence of conclusions drawn from the approximate test.

Accuracy of Approximate Tests

Let \(\alpha\) denote the significance level for a nonparametric test with a one-sided alternative hypothesis (\(\alpha\) is one-half the significance level if \(H_1\) is two-sided). Furthermore, let \(t_\alpha\) denote the value of the test statistic such that \(F(t_\alpha) = \alpha\). To define the accuracy of the approximate test, consider the probability statement \[ P(| G(t_\alpha) - \alpha | < \alpha \delta) \approx 1 - \tilde{\alpha} \] where \(0 < \delta < 1\) quantifies the accuracy of the approximation (e.g., \(\delta = 0.1\) corresponds to 10% accuracy), and \(\tilde{\alpha}\) controls the confidence of the approximation (e.g., \(\tilde{\alpha} = 0.05\) corresponds to 95% confidence). Note that \(G(t_\alpha) \sim N(\alpha, \sigma^2(t_\alpha))\), so the probability statement will be true when \[ \sigma(t_\alpha) = \sqrt{ \frac{\alpha (1 - \alpha)}{(R + 1)} } = \frac{\alpha \delta}{ Z_{1 - \tilde{\alpha}/2} } \] which can be guaranteed by adjusting the number of permutations \(R\) and/or the accuracy parameter \(\delta\).

Fixing \(R\) and solving for \(\delta\) gives the accuracy of an approximate test \[ \delta = Z_{1 - \tilde{\alpha}/2} \sqrt{ \frac{(1 - \alpha)}{\alpha (R + 1)} } \] which is a function of the number of permutations \(R\), the significance level \(\alpha\), and confidence level \(1-\tilde{\alpha}\). Fixing \(\delta\) and solving for \(R\) gives the number of permutations needed to achieve a test with a given accuracy \[ R + 1 = \left\lceil \frac{Z_{1 - \tilde{\alpha}/2}^2 (1 - \alpha)}{\alpha \delta^2} \right\rceil \] which is a function of the accuracy parameter \(\delta\), the significance level \(\alpha\), and confidence level \(1-\tilde{\alpha}\). Note that the notation \(\lceil \cdot \rceil\) denotes the ceiling function.
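
As a sanity check, both formulas can be evaluated directly in base R; the values below match the mcse output shown in the next subsection.

# direct evaluation of the accuracy formulas (two-sided test, 95% confidence)
zval <- qnorm(0.975)                     # standard normal quantile (about 1.96)
alpha <- 0.05 / 2                        # one-tail level for a two-sided test

# accuracy delta when R + 1 = 10000 resamples are used
zval * sqrt((1 - alpha) / (alpha * 10000))         # approximately 0.1224

# number of resamples R + 1 needed for accuracy delta = 0.1
ceiling(zval^2 * (1 - alpha) / (alpha * 0.1^2))    # 14982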

Examples in R

The mcse function (in the nptest package) can be used to find (A) the accuracy of a given test, or (B) the number of permutations needed for a given accuracy.

Find \(\delta\) for a given \(R = 9999\):

mcse(R = 10000)
## 
## Monte Carlo Standard Errors for Nonparametric Tests
## alternative hypothesis: two.sided 
## Monte Carlo Std Err: 0.0016 
## number of resamples: 10000 
## accuracy of approx: 0.1224 
##   confidence level: 0.95 
## significance level: 0.05

Interpretation: Using an \(\alpha = 0.05\) significance level and a two-sided alternative hypothesis, the approximate nonparametric test with \(R = 9999\) permutations has a MCSE of \(\sigma(t_{0.025}) = 0.0016\) and an accuracy of \(\delta = 0.1224\) (assuming 95% confidence).

Find \(R+1\) for a given \(\delta = 0.1\) (two-sided):

mcse(delta = 0.1)
## 
## Monte Carlo Standard Errors for Nonparametric Tests
## alternative hypothesis: two.sided 
## Monte Carlo Std Err: 0.0013 
## number of resamples: 14982 
## accuracy of approx: 0.1 
##   confidence level: 0.95 
## significance level: 0.05

Interpretation: Using an \(\alpha = 0.05\) significance level and a two-sided alternative hypothesis, the approximate nonparametric test with accuracy \(\delta = 0.1\) has a MCSE of \(\sigma(t_{0.025}) = 0.0013\) and a minimum number of resamples \(R + 1 = 14982\) (assuming 95% confidence).

Find \(R+1\) for a given \(\delta = 0.1\) (one-sided):

mcse(delta = 0.1, alternative = "one.sided")
## 
## Monte Carlo Standard Errors for Nonparametric Tests
## alternative hypothesis: one.sided 
## Monte Carlo Std Err: 0.0026 
## number of resamples: 7299 
## accuracy of approx: 0.1 
##   confidence level: 0.95 
## significance level: 0.05

Interpretation: Using an \(\alpha = 0.05\) significance level and a one-sided alternative hypothesis, the approximate nonparametric test with accuracy \(\delta = 0.1\) has a MCSE of \(\sigma(t_{0.05}) = 0.0026\) and a minimum number of resamples \(R + 1 = 7299\) (assuming 95% confidence).

2 One-Sample Location Tests

2.1 Overview of Problem

For the one-sample location problem, we have \(n\) observations

  • \(Z_1,\ldots,Z_n \stackrel{\mathrm{iid}}{\sim}F\) in the one-sample situation

  • \(Z_1,\ldots,Z_n \stackrel{\mathrm{iid}}{\sim}F\) with \(Z_i = X_i - Y_i\) in the paired-sample situation

The observations are assumed to be sampled from a continuous distribution \(F\) that depends on the location parameter \(\mu\).

  • Null hypothesis is \(H_{0}: \mu = \mu_0\) where \(\mu_0\) is known

  • Three possible alternatives: \(H_{1}: \mu < \mu_0\), \(H_{1}: \mu > \mu_0\), \(H_{1}: \mu \neq \mu_0\)

Throughout our discussion, \(\mu\) will denote either the mean or median of \(F\).

The np.loc.test function (in the nptest package) can be used to implement one-sample and paired sample location tests.

Usage:

np.loc.test(x, y = NULL,
                       alternative = c("two.sided", "less", "greater"),
                       mu = 0, paired = FALSE, var.equal = FALSE,
                       median.test = FALSE, symmetric = TRUE,
                       R = 9999, parallel = FALSE, cl = NULL,
                       perm.dist = TRUE)

2.2 Choice of Test Statistic

Specifying the Four Options

Four different test statistics are available:

                     median.test = FALSE                median.test = TRUE
symmetric = FALSE    Johnson skew-adjusted \(t\) test   Fisher sign test
symmetric = TRUE     Student \(t\) test                 Wilcoxon signed rank test

Note that the chosen test statistic relates to the type of test being conducted (e.g., mean or median test), as well as the assumptions made about the data (e.g., symmetric or skewed).

Student’s t test statistic (1908)

Student’s \(t\) test statistic has the form \[ T = \frac{\bar{Z} - \mu_0}{S / \sqrt{n}} \] where \(\bar{Z} = \frac{1}{n} \sum_{i = 1}^n Z_i\) is the sample mean, and \(S^2 = \frac{1}{n-1}\sum_{i = 1}^n (Z_i - \bar{Z})^2\) is the sample variance.

Johnson’s (skew-adjusted) t test statistic (1978)

Johnson proposed a correction to Student’s \(t\) test statistic for skewed data: \[ T = \frac{(\bar{Z} - \mu_0) + \frac{\hat{\mu}_3}{6 S^2 n} + \frac{\hat{\mu}_3}{3 S^4} (\bar{Z} - \mu_0)^2 }{S / \sqrt{n}} \] where \(\hat{\mu}_3 = \frac{1}{n-1}\sum_{i=1}^n (Z_i - \bar{Z})^3\) is the (estimated) third central moment of \(F\).
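
The following minimal sketch computes Johnson's statistic directly from the formula for a small hypothetical sample (with \(\mu_0 = 0\)).

# Johnson's skew-adjusted t statistic computed from the formula
z <- c(2.1, 0.4, 3.7, 1.2, 0.8, 5.9, 1.5)      # hypothetical (skewed) data
mu0 <- 0
n <- length(z)
zbar <- mean(z)
s2 <- var(z)                                   # sample variance S^2
mu3 <- sum((z - zbar)^3) / (n - 1)             # estimated third central moment
((zbar - mu0) + mu3 / (6 * s2 * n) + mu3 / (3 * s2^2) * (zbar - mu0)^2) / sqrt(s2 / n)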

Wilcoxon’s signed rank test statistic (1945)

Wilcoxon’s signed rank test statistic has the form \[ T = \frac{V - E(V)}{\sqrt{\mathrm{var}(V)}} \] where \(V = \sum_{i = 1}^n R_i \psi_i\), \(R_i = \mathrm{rank}(|Z_i - \mu_0|)\), and \(\psi_i = I(Z_i > \mu_0)\). The expectation and variance (under \(H_0\)) have the form \(E(V) = n(n+1)/4\) and \(\mathrm{var}(V) = n(n+1)(2n+1)/24\).

Fisher’s sign test statistic (1925)

Fisher’s sign test statistic has the form \[ T = \frac{S - E(S)}{\sqrt{\mathrm{var}(S)}} \] where \(S = \sum_{i = 1}^n \psi_i\) is the number of \(Z_i\) that are greater than \(\mu_0\). The expectation and variance (under \(H_0\)) have the form \(E(S) = n/2\) and \(\mathrm{var}(S) = n/4\).
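
To make the rank-based statistics concrete, the following sketch computes the standardized Wilcoxon and Fisher statistics by hand for a small hypothetical sample (assuming no ties and no \(Z_i = \mu_0\)).

# Wilcoxon signed rank and Fisher sign statistics computed by hand
z <- c(0.9, -0.4, 1.6, 0.3, -0.1, 0.7)         # hypothetical paired differences
mu0 <- 0
n <- length(z)
psi <- as.numeric(z > mu0)                     # indicators I(Z_i > mu0)
r <- rank(abs(z - mu0))                        # ranks of |Z_i - mu0|

V <- sum(r * psi)                              # signed rank statistic
(V - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)   # Wilcoxon T

S <- sum(psi)                                  # sign statistic
(S - n / 2) / sqrt(n / 4)                      # Fisher T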

2.3 Randomization Inference

Number of Possible Outcomes

For the one-sample (or paired sample) problem, there are \(M = 2^n\) possible outcomes:

  • Two possibilities for each \(Z_i\): (1) \(Z_i > \mu_0\) or (2) \(Z_i < \mu_0\)

  • The \(Z_i\) are iid so each combination of \(\pm Z_i\) is possible

Define \(\tilde{Z}_i = Z_i − \mu_0\) to be the centered data for the \(i\)-th observation, which has mean (or median) zero under \(H_0\). Note that we can write \[ \tilde{Z}_i = s_i | \tilde{Z}_i | \] where \(s_i = 2\psi_i - 1\) and \(\psi_i = I(Z_i > \mu_0)\).

The \(M = 2^n\) possible outcomes correspond to all vectors of the form \[ \mathbf{s} = (s_1, s_2, \ldots, s_n)^\top \] where \(s_i \in \{-1, 1\}\) for \(i = 1,\ldots, n\). Such vectors are referred to as “sign-flipping” vectors, given that \(s_i\) controls the sign of the centered score \(\tilde{Z}_i\). Note that this is a form of conditional inference where we condition on the magnitudes of the centered scores, i.e., \(| \tilde{Z}_i | \ \forall i\).

The flipn function (in the nptest package) can be used to obtain all \(M = 2^n\) possible sign-flipping vectors:

# all sign-flipping vectors (n = 3)
flipn(3)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]   -1   -1   -1   -1    1    1    1    1
## [2,]   -1   -1    1    1   -1   -1    1    1
## [3,]   -1    1   -1    1   -1    1   -1    1

Warning: For large n this function will consume a lot of memory and may even crash R.

Exact and Approximate Null Distributions

Define \(\{\mathbf{s}_j\}_{j = 1}^M\) to be the collection of \(M = 2^n\) sign-flipping vectors for a given sample size \(n\), where \(\mathbf{s}_j = (s_{1j}, \ldots, s_{nj})^\top\). For example, if \(n = 3\), then the \(j\)-th sign-flipping vector is the \(j\)-th column of the matrix returned by the flipn function (see above).

  • Let \(\tilde{Z}_{ij} = s_{ij} | \tilde{Z}_i |\) denote the \(j\)-th “sign-flipped” version of the \(i\)-th observation’s data, where \(s_{ij}\) is the \(i\)-th observation’s sign for the \(j\)-th sign-flipping vector.

  • Let \(\tilde{\mathbf{Z}}_j = (\tilde{Z}_{1j}, \ldots, \tilde{Z}_{nj})^\top\) denote the sample of the \(n\) observations’ centered and sign-flipped data for the \(j\)-th sign-flipping vector.

  • Let \(\mathbf{Z}_j = (Z_{1j}, \ldots, Z_{nj})^\top\) where \(Z_{ij} = \mu_0 + \tilde{Z}_{ij}\) denotes the “uncentered” and sign-flipped data for the \(j\)-th sign-flipping vector.

The exact null distribution is given by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(\mathbf{Z}_j)\) denotes the test statistic corresponding to the \(j\)-th sign-flipping vector. Note that \(T_j\) is obtained by applying the given test statistic function \(s(\cdot)\) to the \(j\)-th sign-flipped version of the data \(\mathbf{Z}_j\).

An approximate null distribution is formed by calculating the test statistic for \(R\) randomly sampled sign-flipping vectors (plus the observed data vector), i.e., \(\mathcal{T}_{R+1} = \{T_j\}_{j = 1}^{R+1}\) where \(T_j = s(\mathbf{Z}_j)\) denotes the test statistic corresponding to the \(j\)-th sign-flipping vector.
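
The following sketch enumerates the exact sign-flipping distribution of Student’s \(t\) statistic for a small hypothetical sample; it is meant only to illustrate the construction, not to reproduce the internal implementation of np.loc.test.

# exact sign-flipping null distribution of Student's t statistic
library(nptest)                                # for flipn()
set.seed(1)
z <- rnorm(8, mean = 0.5)                      # hypothetical paired differences
mu0 <- 0
zc <- z - mu0                                  # centered data
tstat <- function(z, mu0) (mean(z) - mu0) / (sd(z) / sqrt(length(z)))
S <- flipn(length(z))                          # n x 2^n sign-flipping vectors
Tdist <- apply(S, 2, function(s) tstat(mu0 + s * abs(zc), mu0))
mean(Tdist >= tstat(z, mu0))                   # exact p-value for H1: mu > mu0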

Necessary Assumptions

If \(F\) is symmetric around the median \(\mu\), the test will be exact regardless of the chosen test statistic.

If \(F\) is a skewed distribution (so mean \(\neq\) median), the test will be…

  • inexact but asymptotically valid when \(\mu\) is the mean (i.e., median.test = FALSE) for Johnson’s and Student’s test statistics

  • inexact and asymptotically invalid when \(\mu\) is the median (i.e., median.test = TRUE) and Wilcoxon’s test statistic is used (i.e., symmetric = TRUE)

  • exact when \(\mu\) is the median (i.e., median.test = TRUE) and Fisher’s test statistic is used (i.e., symmetric = FALSE)

2.4 Simulation Study (Helwig, 2019a)

Helwig, N. E. (2019). Statistical nonparametric mapping: Multivariate permutation tests for location, correlation, and regression problems in neuroimaging. WIREs Computational Statistics, 11(2), e1457. https://doi.org/10.1002/wics.1457

Design

The simulation study manipulated two data generating factors:

  1. the distribution for \(Z_i\) (see figure below)

  2. the sample size \(n \in \{10, 25, 50, 100, 200\}\)

Figure 1: Six univariate distributions with mean \(\mu = 0\) and variance \(\sigma^2 = 1\).

Analyses

10,000 independent datasets (replications) were generated for each of the 30 (6 distribution × 5 n) cells of the simulation design. For each replication, the null hypothesis \(H_0: \mu = 0\) was tested using the alternative hypothesis \(H_1: \mu > 0\) and an \(\alpha = 0.05\) significance level.

The significance testing results were compared using two methods:

  1. Student’s t test using the \(t_{n−1}\) distribution for inference

  2. Permutation test using Student’s t test statistic

For the permutation tests, the number of resamples was set at \(R = \min(2^n, 9999)\).

Results

The type I error rate for each combination of test statistic and data generating condition is given in the figure below.

Figure 2: Type I error rate calculated across 10,000 replications for the \(t\) test (red squares) and the permutation test (black circles). The vertical bars denote approximate 99% confidence intervals, and the dotted horizontal lines denote the nominal rate of \(\alpha = 0.05\).

The simulation results reveal that…

  • the \(t\) test and the permutation test perform similarly

  • for symmetric distributions, the results are exact for all \(n\)

  • for skewed distributions, the results are asymptotically valid

2.5 Example 3.1: Depression Treatment

Data and Hypotheses

Consider the dataset in Example 3.1 from Nonparametric Statistical Methods, 3rd Ed. (Hollander et al., 2014).

Nine psychiatric patients were treated with a tranquilizer drug. Before and after the treatment (i.e., the tranquilizer drug), the patients’ suicidal tendencies were measured using the Hamilton Depression Scale (Factor IV).

  • Let \(X\) denote the pre-treatment score

  • Let \(Y\) denote the post-treatment score

Higher scores on the Hamilton Depression Scale (Factor IV) correspond to more suicidal tendencies. We want to test if the tranquilizer significantly reduced suicidal tendencies:

  • \(H_0: \mu = 0\) versus \(H_1: \mu > 0\)

  • \(\mu\) is the mean (or median) of \(Z = X - Y\)

Enter the data into R:

x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.050, 1.060, 1.290, 1.060, 3.140, 1.290)

Results with Student’s Test Statistic

Student’s \(t\) test:

## student t-test
t.test(x, y, alternative = "greater", paired = TRUE)
## 
##  Paired t-test
## 
## data:  x and y
## t = 3.0354, df = 8, p-value = 0.008088
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.1673028       Inf
## sample estimates:
## mean of the differences 
##               0.4318889

Permutation test using Student’s \(t\) statistic:

## permutation test (T = student)
nlt <- np.loc.test(x, y, alternative = "greater", paired = TRUE)
nlt
## 
## Nonparametric Location Test (Paired t-test)
## alternative hypothesis: true difference of means is greater than 0 
## t = 3.0354,  p-value = 0.0137
## sample estimate:
## mean of the differences 
##               0.4318889

Note that both approaches use the same observed test statistic, \(T = 3.0354\), but the two approaches produce slightly different p-values:

  • \(t\) test: \(p = 0.0081\) is obtained by comparing \(T = 3.0354\) to a \(t_8\) distribution (note that \(t_\nu\) is Student’s \(t\) distribution with \(\nu\) d.f.)

  • NP test: \(p = 0.0137\) is obtained by enumerating the exact null distribution with \(2^n = 512\) possible outcomes

Plot the exact null distribution used for the nonparametric test:

## plot perm.dist (T = student)
hist(nlt$perm.dist, breaks = "FD", xlab = "T", 
     main = "Permutation Distribution (T = Student)")
box()
abline(v = nlt$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

Results with Wilcoxon’s Test Statistic

Wilcoxon’s signed rank test:

## wilcoxon signed rank test
wilcox.test(x, y, alternative = "greater", paired = TRUE)
## 
##  Wilcoxon signed rank exact test
## 
## data:  x and y
## V = 40, p-value = 0.01953
## alternative hypothesis: true location shift is greater than 0

Permutation test using Wilcoxon’s signed rank statistic:

## permutation test (T = wilcoxon)
nlt <- np.loc.test(x, y, alternative = "greater", paired = TRUE,
                   median.test = TRUE)
nlt
## 
## Nonparametric Location Test (Paired Wilcoxon Signed Rank Test)
## alternative hypothesis: true median of differences is greater than 0 
## t = 2.0732,  p-value = 0.0195
## sample estimate:
## (pseudo)median of the differences 
##                              0.46

Note that both approaches use the same observed test statistic, but return the result on a different scale: \(T = (V - E(V)) / \sqrt{\mathrm{var}(V)}\). In this case, the number of elements of the exact null distribution is manageable (\(M = 512\)), so both functions are implementing an exact test. Consequently, we obtain the same (exact) p-value from both functions.

Plot the exact null distribution used for the nonparametric test:

## plot perm.dist (T = wilcoxon)
hist(nlt$perm.dist, breaks = "FD", xlab = "T", 
     main = "Permutation Distribution (T = Wilcoxon)")
box()
abline(v = nlt$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

Results with Fisher’s Test Statistic

Fisher’s sign test:

## fisher sign test
BSDA::SIGN.test(x, y, alternative = "greater")
## 
##  Dependent-samples Sign-Test
## 
## data:  x and y
## S = 7, p-value = 0.08984
## alternative hypothesis: true median difference is greater than 0
## 95 percent confidence interval:
##  -0.041    Inf
## sample estimates:
## median of x-y 
##          0.49 
## 
## Achieved and Interpolated Confidence Intervals: 
## 
##                   Conf.Level L.E.pt U.E.pt
## Lower Achieved CI     0.9102  0.010    Inf
## Interpolated CI       0.9500 -0.041    Inf
## Upper Achieved CI     0.9805 -0.080    Inf

Permutation test using Fisher’s sign statistic:

## permutation test (T = fisher)
nlt <- np.loc.test(x, y, alternative = "greater", paired = TRUE,
                   median.test = TRUE, symmetric = FALSE)
nlt
## 
## Nonparametric Location Test (Paired Fisher Sign Test)
## alternative hypothesis: true median of differences is greater than 0 
## t = 1.6667,  p-value = 0.0898
## sample estimate:
## median of the differences 
##                      0.49

Note that both approaches use the same observed test statistic, but return the result on a different scale: \(T = (S - E(S)) / \sqrt{\mathrm{var}(S)}\). In this case, the number of elements of the exact null distribution is manageable (\(M = 512\)), so both functions are implementing an exact test. Consequently, we obtain the same (exact) p-value from both functions.

Plot the exact null distribution used for the nonparametric test:

## plot perm.dist (T = fisher)
hist(nlt$perm.dist, breaks = 22, xlab = "T", 
     main = "Permutation Distribution (T = Fisher)")
box()
abline(v = nlt$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

3 Two-Sample Location Tests

3.1 Overview of Problem

For the two-sample location problem, we have \(N = m + n\) observations

  • \(X_1, \ldots , X_m\) are an iid random sample from population \(F_X\)

  • \(Y_1, \ldots , Y_n\) are an iid random sample from population \(F_Y\)

  • \(Z_1, \ldots , Z_N = (X_1, \ldots , X_m, Y_1, \ldots , Y_n)\) denotes the combined sample

  • \(\psi_1, \ldots, \psi_N = (1, \ldots, 1, 0, \ldots, 0)\) denotes the group label vector (\(\psi_k = 1\) if \(Z_k\) is an \(X\))

The \(X_i\) observations are assumed to be independent samples from a continuous distribution \(F_X\) that depends on the location parameter \(\mu_X\). Similarly, the \(Y_j\) observations are assumed to be independent samples from a continuous distribution \(F_Y\) that depends on the location parameter \(\mu_Y\). The \(X_i\) and \(Y_j\) observations are assumed to be independent of one another for all combinations of \(i,j\).

  • Null hypothesis is same location \(\Leftrightarrow H_0 : \mu_X = \mu_Y\)

  • Three possible alternatives: \(H_1 : \mu_X < \mu_Y\), \(H_1 : \mu_X > \mu_Y\), \(H_1 : \mu_X \neq \mu_Y\)

Note that \(\mu_X\) and \(\mu_Y\) will denote either the means or medians of \(F_X\) and \(F_Y\), respectively.

The np.loc.test function (in the nptest package) can be used to implement two-sample location tests.

Usage:

np.loc.test(x, y = NULL,
                       alternative = c("two.sided", "less", "greater"),
                       mu = 0, paired = FALSE, var.equal = FALSE,
                       median.test = FALSE, symmetric = TRUE,
                       R = 9999, parallel = FALSE, cl = NULL,
                       perm.dist = TRUE)

3.2 Choice of Test Statistic

Specifying the Four Options

Four different test statistics are available:

                    median.test = FALSE    median.test = TRUE
var.equal = FALSE   Welch \(t\) test       Studentized Wilcoxon test
var.equal = TRUE    Student \(t\) test     Wilcoxon rank sum test

Note that the chosen test statistic relates to the type of test being conducted (e.g., mean or median test), as well as the assumptions made about the data (e.g., equal variance or not).

Student’s t test statistic (1908)

Student’s two-sample \(t\) test statistic has the form \[ T = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \] where \(\bar{X} = \frac{1}{m} \sum_{i = 1}^m X_i\) and \(\bar{Y} = \frac{1}{n} \sum_{j = 1}^n Y_j\) are the sample means, and \[ S_p^2 = \frac{1}{N - 2} \left(\sum_{i=1}^m (X_i - \bar{X})^2 + \sum_{j=1}^n (Y_j - \bar{Y})^2 \right) \] is the pooled estimate of the (common) variance for the two populations.

Welch’s t test statistic (1938)

Welch proposed a modification of Student’s two-sample \(t\) test statistic for unequal variances: \[ T = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_X^2}{m} + \frac{S_Y^2}{n}}} \] where \(S_X^2 = \frac{1}{m - 1}\sum_{i=1}^m (X_i - \bar{X})^2\) and \(S_Y^2 = \frac{1}{n - 1}\sum_{j=1}^n (Y_j - \bar{Y})^2\) are the estimates of the variances for \(F_X\) and \(F_Y\), respectively.

Wilcoxon Rank Sum test statistic (1945)

Wilcoxon’s rank sum test statistic has the form \[ T = \frac{W - E(W)}{\sqrt{\mathrm{var}(W)}} \] where \(W = \sum_{k = 1}^N R_k \psi_k\), \(R_k = \mathrm{rank}(Z_k)\), and \(\psi_k = I(Z_k \sim F_X)\). The expectation and variance of \(W\) (under \(H_0\)) have the form \(E(W) = m(N+1)/2\) and \(\mathrm{var}(W) = mn (N + 1) / 12\).

Studentized Wilcoxon test statistic (2016)

Chung and Romano (2016) proposed a modification of Wilcoxon’s rank sum test statistic for unequal variances: \[ T = \frac{W - E(W)}{\sqrt{\widetilde{\mathrm{var}}(W)}} \] where \(W\) and \(E(W)\) are the same as before, and the modified variance term is \[ \widetilde{\mathrm{var}}(W) = (mn)^2 \left(\frac{\xi_x^2}{m} + \frac{\xi_y^2}{n} \right) \] with the two components of the variance defined as \[ \begin{split} \xi_x^2 &= \frac{1}{m-1} \sum_{i=1}^m \left(\bar{U}_i - \frac{1}{m} \sum_{i=1}^m \bar{U}_i \right)^2 \\ \xi_y^2 &= \frac{1}{n-1} \sum_{j=1}^n \left(\bar{V}_j - \frac{1}{n} \sum_{j=1}^n \bar{V}_j \right)^2 \\ \end{split} \] where \(\bar{U}_i = \frac{1}{n}\sum_{j=1}^n I(Y_j \leq X_i)\) and \(\bar{V}_j = \frac{1}{m}\sum_{i=1}^m I(X_i < Y_j)\).
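
The following sketch computes the studentized Wilcoxon statistic by hand for a small hypothetical dataset (assuming no ties); note that \(\xi_x^2\) and \(\xi_y^2\) are simply the sample variances of the \(\bar{U}_i\) and \(\bar{V}_j\) values.

# studentized Wilcoxon statistic computed by hand
x <- c(1.2, 0.8, 2.5, 1.9, 0.3)                # hypothetical sample (m = 5)
y <- c(0.5, 1.1, 0.9, 1.4)                     # hypothetical sample (n = 4)
m <- length(x); n <- length(y); N <- m + n
W <- sum(rank(c(x, y))[1:m])                   # rank sum for the x sample
Ubar <- sapply(x, function(xi) mean(y <= xi))  # U-bar_i values
Vbar <- sapply(y, function(yj) mean(x < yj))   # V-bar_j values
varW <- (m * n)^2 * (var(Ubar) / m + var(Vbar) / n)   # modified variance
(W - m * (N + 1) / 2) / sqrt(varW)             # studentized Wilcoxon T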

3.3 Randomization Inference

Number of Possible Outcomes

For the two-sample problem, there are \(M = {m+n \choose m}\) possible outcomes given a sample of \(N = m + n\) observations.

  • Test statistic depends on which \(Z_k\) are labeled \(X\) versus \(Y\)

  • Need to choose \(m\) of the \(N\) values to receive \(X\) labels

Let \(Z_k\) denote the \(k\)-th observation from the combined sample, which has common mean (or median) \(\mu\) under \(H_0\).

  • \((Z_k, \psi_k)\) is observed where \(\psi_k = 1\) if \(Z_k\) is an \(X\) and \(\psi_k = 0\) if \(Z_k\) is a \(Y\)

  • The \(\psi_k\) labels are arbitrary if the two populations are equivalent, i.e., if \(F_X = F_Y\)

The \(M\) possible outcomes correspond to the \({m + n \choose m}\) vectors of the form \[ \boldsymbol\psi = (\psi_1, \psi_2, \ldots, \psi_N) \] with \(\psi_k \in \{0,1\}\) for all \(k = 1,\ldots,N\) and \(\sum_{k = 1}^N \psi_k = m\). Such vectors are referred to as “combination” vectors given that the \(\psi_k\) describe how the observations combine to form the two groups. Note that this is a form of conditional inference where we condition on the \(Z_k \ \forall k\).

The combn function (in the utils package) can be used to obtain all \(M = {m + n \choose m}\) possible combination vectors:

# all combinations (m = 3, n = 2)
combn(5, 3)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    1    1    1    1    1    2    2    2     3
## [2,]    2    2    2    3    3    4    3    3    4     4
## [3,]    3    4    5    4    5    5    4    5    5     5

Note: each column of the matrix returned by combn denotes which \(\psi_k\) are equal to 1 for the combination vector.

Exact and Approximate Null Distributions

Define \(\{ \boldsymbol\psi_j \}_{j = 1}^M\) to be the collection of \(M = {m + n \choose m}\) combination vectors for the given sample sizes \((m, n)\), where \(\boldsymbol\psi_j = (\psi_{1j}, \ldots, \psi_{Nj})\). For example, if \(m = 3\) and \(n = 2\), then the \(j\)-th combination vector contains 1’s in the locations denoted by the \(j\)-th column of the matrix returned by the combn function (see above) and zeros elsewhere.

  • Let \(\mathbf{Z} = (Z_1, \ldots, Z_N)^\top\) denote the combined vector of data

  • Let \(\boldsymbol\psi = (\psi_1, \ldots, \psi_N)^\top\) denote the observed combination vector

  • The observed test statistic can be written as a function \(T = s(\mathbf{Z}, \boldsymbol\psi)\)

The exact null distribution is given by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(\mathbf{Z}, \boldsymbol\psi_j)\) denotes the test statistic corresponding to the \(j\)-th combination vector. Note that \(T_j\) is obtained by applying the given test statistic function \(s(\cdot, \cdot)\) to the data using the \(j\)-th combination vector.

An approximate null distribution is formed by calculating the test statistic for \(R\) random combination vectors (plus the observed combination vector), i.e., \(\mathcal{T}_{R+1} = \{T_j\}_{j = 1}^{R+1}\) where \(T_j = s(\mathbf{Z}, \boldsymbol\psi_j)\) denotes the test statistic corresponding to the \(j\)-th combination vector.
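
The following sketch enumerates the exact two-sample permutation distribution of Welch’s statistic for a small hypothetical dataset (\(m = 4\), \(n = 3\), so \(M = 35\)); again, this illustrates the construction rather than the internal implementation of np.loc.test.

# exact two-sample permutation distribution of Welch's statistic
set.seed(1)
x <- rnorm(4, mean = 1)                        # hypothetical sample (m = 4)
y <- rnorm(3)                                  # hypothetical sample (n = 3)
z <- c(x, y)
m <- length(x); N <- length(z)
welch <- function(x, y) (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
idx <- combn(N, m)                             # columns give the x-labeled positions
Tdist <- apply(idx, 2, function(id) welch(z[id], z[-id]))
mean(Tdist >= welch(x, y))                     # exact p-value for H1: mu_X > mu_Y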

Necessary Assumptions

If the location-shift model is true, i.e., if \(F_X(z - \delta) = F_Y(z)\) where \(\delta = \mu_X - \mu_Y\) is the location shift, then the permutation test is exact. This is because, assuming the location-shift model is true, the group labels are arbitrary under \(H_0\): \(\delta = 0 \ \leftrightarrow \ F_X(z) = F_Y(z) \ \forall z\).

If the location-scale model is true, i.e., if \(F_X(z) = G([z - \mu_X] / \sigma_X)\) and \(F_Y(z) = G([z - \mu_Y] / \sigma_Y)\), where \((\mu_X, \mu_Y)\) are the population-specific location parameters and \((\sigma_X^2, \sigma_Y^2)\) are the population-specific variance parameters, then the test will be

  • inexact but asymptotically valid when using the Welch or the studentized Wilcoxon test statistic (i.e., var.equal = FALSE)

  • inexact and asymptotically invalid when using the Student or Wilcoxon test statistic (i.e., var.equal = TRUE) … unless you get lucky

If the two groups differ more generally, the test will be

  • inexact but asymptotically valid using the Welch test statistic (for testing \(H_0: \mu_1 = \mu_2\))

  • inexact but asymptotically valid using the Studentized Wilcoxon test statistic (for testing \(H_0: P(X < Y) = 1/2\))

  • inexact and asymptotically invalid using the Student and Wilcoxon test statistics (i.e., var.equal = TRUE) … unless you get lucky

Note: in the more general scenario, the Wilcoxon test statistics (median.test = TRUE) are not testing \(H_0: \mu_1 = \mu_2\), which is a common misconception. Instead, they are testing the null hypothesis \(H_0: P(X < Y) = 1/2\), i.e., that the median of \(F_{X-Y}\) is equal to zero, where \(F_{X-Y}\) is the distribution of the difference score \(X - Y\).

3.4 Simulation Study (Helwig, 2019a)

Helwig, N. E. (2019). Statistical nonparametric mapping: Multivariate permutation tests for location, correlation, and regression problems in neuroimaging. WIREs Computational Statistics, 11(2), e1457. https://doi.org/10.1002/wics.1457

Design

The simulation study manipulated three data generating factors:

  1. the distribution for data (see top row of Figure 1)

  2. the standard deviation of the second group: \(\sigma_2 \in \{0.5, 1, 2\}\)

  3. the sample size of the first group: \(n_1 \in \{10, 25, 50, 100, 200\}\)

Throughout the simulation study…

  1. the standard deviation of the first group was fixed at \(\sigma_1 = 1\)

  2. the sample size of the second group was defined as \(n_2 = 2 n_1\)

Analyses

10,000 independent datasets (replications) were generated for each of the 45 (3 distribution \(\times\) 3 \(\sigma_2\) \(\times\) 5 \(n_1\)) cells of the simulation design. For each replication, the null hypothesis \(H_{0}: \mu_1 = \mu_2\) was tested using the alternative hypothesis \(H_{1}: \mu_1 > \mu_2\) and an \(\alpha = 0.05\) significance level.

The significance testing results were compared using four methods:

  1. Student’s \(t\) test which assumes normality and equal variances

  2. Welch’s \(t\) test which assumes normality and unequal variances

  3. Permutation test using Student’s (pooled variance) \(T\) test statistic

  4. Permutation test using Welch’s (unequal variance) \(T^*\) test statistic

For the permutation tests, the number of resamples was set at \(R = 9999\).

Results

The type I error rate for each combination of test statistic and data generating condition is given in the figure below.

Figure 3: Type I error rate calculated across 10,000 replications for the \(t\) test (red squares) and the permutation test (black circles). The unfilled points denote results using the (Student) \(T\) test statistic, whereas the filled points denote results using the (Welch) \(T^*\) test statistic. The vertical bars denote approximate 99% confidence intervals, and the dotted horizontal lines denote the nominal rate of \(\alpha = 0.05\).

The simulation results reveal that…

When \(\sigma_1 = \sigma_2\) (equal variance)

  • Permutation test produces valid results for all \(n\) for both Student and Welch test statistic

  • \(t\) test produces asymptotically valid results for skewed data (and valid results for normal data)

When \(\sigma_1 \neq \sigma_2\) (unequal variance)

  • For \(t\) and permutation tests, using Student’s test statistic produces invalid results

  • For \(t\) and permutation tests, using Welch’s test statistic produces asymptotically valid results

3.5 Example 4.2: Alcohol Treatment

Data and Hypotheses

Consider the dataset in Example 4.2 from Nonparametric Statistical Methods, 3rd Ed. (Hollander et al., 2014).

\(N = 23\) patients are in an alcohol treatment program

  • \(m = 12\) assigned to Control condition (\(X\))

  • \(n = 11\) assigned to Social Skills Training (\(Y\))

The response variable is the post-treatment alcohol intake for 1 year following the program. The response is measured in centiliters of pure alcohol consumed during the year.

The goal is to test if the SST program reduced alcohol intake:

  • \(H_0: \mu_X = \mu_Y\) versus \(H_1: \mu_X > \mu_Y\) (median.test = FALSE)

  • \(H_0: P(X < Y) = 1/2\) versus \(H_1: P(X < Y) < 1/2 \Longleftrightarrow H_1: \mbox{median}(X-Y) > 0\) (median.test = TRUE)

R code to input the data:

x <- c(1042, 1617, 1180, 973, 1552, 1251, 1151, 1511, 728, 1079, 951, 1319)
y <- c(874, 389, 612, 798, 1152, 893, 541, 741, 1064, 862, 213)

Results with Student’s Test Statistic

Student’s \(t\) test:

## student t-test
t.test(x, y, alternative = "greater", var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  x and y
## t = 3.9835, df = 21, p-value = 0.0003379
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  259.1703      Inf
## sample estimates:
## mean of x mean of y 
## 1196.1667  739.9091

Permutation test using Student’s \(t\) statistic:

## permutation test (T = student)
set.seed(1)
nlt <- np.loc.test(x, y, alternative = "greater", var.equal = TRUE)
nlt
## 
## Nonparametric Location Test (Two Sample t-test)
## alternative hypothesis: true difference of means is greater than 0 
## t = 3.9835,  p-value = 6e-04
## sample estimate:
## difference of the means 
##                456.2576

Note that both approaches use the same observed test statistic, \(T = 3.9835\), but the two approaches produce slightly different p-values:

  • \(t\) test: \(p = 0.0003\) is obtained by comparing \(T = 3.9835\) to a \(t_{21}\) distribution (note that \(t_\nu\) is Student’s \(t\) distribution with \(\nu\) d.f.)

  • NP test: \(p = 0.0006\) is obtained by comparing \(T = 3.9835\) to the approximate null distribution with 10,000 elements

Plot the approximate null distribution used for the nonparametric test:

## plot perm.dist (T = student)
hist(nlt$perm.dist, breaks = "FD", xlab = "T", 
     main = "Permutation Distribution (T = Student)")
box()
abline(v = nlt$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

Results with Welch’s Test Statistic

Welch’s \(t\) test:

## welch t-test
t.test(x, y, alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = 3.9747, df = 20.599, p-value = 0.0003559
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  258.5566      Inf
## sample estimates:
## mean of x mean of y 
## 1196.1667  739.9091

Permutation test using Welch’s \(t\) statistic:

## permutation test (T = welch)
set.seed(1)
nlt <- np.loc.test(x, y, alternative = "greater")
nlt
## 
## Nonparametric Location Test (Welch Two Sample t-test)
## alternative hypothesis: true difference of means is greater than 0 
## t = 3.9747,  p-value = 6e-04
## sample estimate:
## difference of the means 
##                456.2576

Note that both approaches use the same observed test statistic, \(T = 3.9747\), but the two approaches produce slightly different p-values:

  • \(t\) test: \(p = 0.0004\) is obtained by comparing \(T = 3.9747\) to a \(t_{20.599}\) distribution (note that \(t_\nu\) is Student’s \(t\) distribution with \(\nu\) d.f.)

  • NP test: \(p = 0.0006\) is obtained by comparing \(T = 3.9747\) to the approximate null distribution with 10,000 elements

Plot the approximate null distribution used for the nonparametric test:

## plot perm.dist (T = welch)
hist(nlt$perm.dist, breaks = "FD", xlab = "T", 
     main = "Permutation Distribution (T = Welch)")
box()
abline(v = nlt$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

Results with Wilcoxon’s Test Statistic

Wilcoxon’s rank sum test:

## wilcoxon rank sum t-test
wilcox.test(x, y, alternative = "greater")
## 
##  Wilcoxon rank sum exact test
## 
## data:  x and y
## W = 117, p-value = 0.0004904
## alternative hypothesis: true location shift is greater than 0

Permutation test using Wilcoxon’s rank sum statistic:

## permutation test (T = wilcoxon)
set.seed(1)
nlt <- np.loc.test(x, y, alternative = "greater",
                   median.test = TRUE, var.equal = TRUE)
nlt
## 
## Nonparametric Location Test (Wilcoxon Rank Sum Test)
## alternative hypothesis: true median of differences is greater than 0 
## t = 3.1388,  p-value = 9e-04
## sample estimate:
## median of the differences 
##                     435.5

Note that both approaches use the same observed test statistic, but return the result on a different scale: \(T = (W - E(W)) / \sqrt{\mathrm{var}(W)}\). In this case, the number of elements of the exact null distribution is large (\(M\) = 1,352,078). The wilcox.test function implements an exact test (because there are fewer than 50 observations and no ties). The np.loc.test function implements an approximate test using \(R = 9999\) random samples of the \(M\) possible combination vectors. The p-values for the two approaches are quite similar:

  • wilcox.test: \(p = 0.0005\) is obtained by comparing \(W = 117\) to the exact null distribution

  • NP test: \(p = 0.0009\) is obtained by comparing \(T = 3.1388\) to the approximate null distribution with 10,000 elements

Plot the approximate null distribution used for the nonparametric test:

## plot perm.dist (T = wilcoxon)
hist(nlt$perm.dist, breaks = "FD", xlab = "T", 
     main = "Permutation Distribution (T = Wilcoxon)")
box()
abline(v = nlt$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

Results with Studentized Wilcoxon Test Statistic

Permutation test using studentized Wilcoxon’s rank sum statistic:

## permutation test (T = studentized wilcoxon)
set.seed(1)
nlt <- np.loc.test(x, y, alternative = "greater",
                   median.test = TRUE)
nlt
## 
## Nonparametric Location Test (Studentized Wilcoxon Test)
## alternative hypothesis: true median of differences is greater than 0 
## t = 5.4221,  p-value = 6e-04
## sample estimate:
## median of the differences 
##                     435.5

Plot the approximate null distribution used for the nonparametric test:

## plot perm.dist (T = studentized wilcoxon)
hist(nlt$perm.dist, breaks = "FD", xlab = "T", 
     main = "Permutation Distribution (T = Studentized Wilcoxon)")
box()
abline(v = nlt$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

4 Association / Correlation Tests

4.1 Overview of Problem

Suppose we have paired data \((X_i,Y_i) \stackrel{\mathrm{iid}}{\sim}F_{XY}\) for \(i = 1, \ldots, n\), where \(F_{XY}\) is some bivariate distribution.

The goal is to test whether or not \(X\) and \(Y\) are correlated (or associated) with one another.

  • \(X\) and \(Y\) are independent if and only if \(F_{XY}(x,y) = F_X(x)F_Y(y)\)

  • If \(X\) and \(Y\) are correlated/associated, they are dependent

  • Null hypothesis is \(H_0: \rho = 0\) where \(\rho = \mbox{cor}(X,Y)\)

  • Different definitions of \(\rho\) measure different types of association

We will focus on testing the significance of the Pearson product moment correlation coefficient between \(X\) and \(Y\), which has the form \[ \rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \] where \(\sigma_{XY} = E[(X - \mu_X) (Y - \mu_Y)]\) is the covariance between \(X\) and \(Y\), \(\sigma_X^2 = E[(X - \mu_X)^2]\) is the variance of \(X\), and \(\sigma_Y^2 = E[(Y - \mu_Y)^2]\) is the variance of \(Y\). Note that the Pearson correlation coefficient measures the degree of linear association between \(X\) and \(Y\).

The np.cor.test function (in the nptest package) can be used to implement Pearson correlation tests.

Usage:

np.cor.test(x, y, z = NULL,
                       alternative = c("two.sided", "less", "greater"),
                       rho = 0, independent = FALSE, partial = TRUE,
                       R = 9999, parallel = FALSE, cl = NULL,
                       perm.dist = TRUE)

Note: the function can also be used to test partial and semi-partial (part) correlations between \(X\) and \(Y\) controlling for \(Z = (Z_1,\ldots,Z_q)\).

4.2 Choice of Test Statistic

Specifying the Two Options

Two different test statistics are available:

independent = FALSE      independent = TRUE
Studentized \(t\) test   Classic \(t\) test

Note that the chosen test statistic relates to the hypothesis being tested:

  • \(H_0: \rho = 0\) for independent = FALSE

  • \(H_0: F_{XY}(x,y) = F_X(x)F_Y(y)\) for independent = TRUE

Classic \(t\) test statistic (1908)

Student’s \(t\) test statistic for testing \(H_0: \rho = 0\) has the form \[ T = r \sqrt{\frac{n - 2}{1 - r^2}} \] where \[ r = \frac{S_{XY}}{S_X S_Y} = \frac{\sum_{i=1}^n (X_i - \bar{X}) (Y_i - \bar{Y})}{\sqrt{ \sum_{i=1}^n (X_i - \bar{X})^2 } \sqrt{ \sum_{i=1}^n (Y_i - \bar{Y})^2 }} \] is the sample correlation coefficient, \(S_{XY} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}) (Y_i - \bar{Y})\) is the sample covariance, \(S_X^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2\) is the sample variance of \(X\), and \(S_Y^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2\) is the sample variance of \(Y\).

Pitman correlation statistic (1937)

Pitman (1937) proposed using the sample correlation coefficient \(r\) as the test statistic, which is equivalent to using \[ T = r \sqrt{n} \] as the test statistic. If \(H_0\) is true, this test statistic is asymptotically equivalent to Student’s \(t\) test statistic, given that \(r \rightarrow 0\) as \(n \rightarrow \infty\) when \(H_0\) is true.

Studentized \(t\) test statistic (2017)

DiCiccio and Romano (2017) proposed a modification of the test statistic that is appropriate for testing \(H_0: \rho = 0\) when the bivariate distribution \(F_{XY}\) is any continuous distribution. The modified (or “studentized”) test statistic has the form \[ T = \frac{r \sqrt{n}}{\hat{\tau}} \] where \[ \hat{\tau}^2 = \frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 (Y_i - \bar{Y})^2 }{ \left[ \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 \right] \left[ \frac{1}{n} \sum_{i=1}^n(Y_i - \bar{Y})^2 \right] } \] is an estimate of the asymptotic variance of \(r \sqrt{n}\) under the assumption \(H_0: \rho = 0\) is true.

Under \(H_0\), Pitman’s test statistic \(r \sqrt{n}\) asymptotically follows a normal distribution with mean \(0\) and variance \[ \tau^2 = \frac{E[(X - \mu_X)^2 (Y - \mu_Y)^2]}{\sigma_X^2 \sigma_Y^2} \] where \(\sigma_X^2 = E[(X - \mu_X)^2]\) is the variance of \(X\) and \(\sigma_Y^2 = E[(Y - \mu_Y)^2]\) is the variance of \(Y\). Note that \(\tau^2 = 1\) if \(X\) and \(Y\) are independent, but \(\tau^2\) can take other values if \(X\) and \(Y\) are dependent and uncorrelated (which can occur for non-Gaussian data).
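
The following sketch computes the studentized statistic by hand for hypothetical simulated data; the denominator \(\hat{\tau}\) is simply the square root of the ratio of moments given above.

# studentized correlation test statistic computed by hand
set.seed(1)
n <- 25
x <- rnorm(n)
y <- rnorm(n)                                  # hypothetical data (independent)
xc <- x - mean(x)
yc <- y - mean(y)
tau2 <- mean(xc^2 * yc^2) / (mean(xc^2) * mean(yc^2))   # estimate of tau^2
cor(x, y) * sqrt(n) / sqrt(tau2)               # studentized T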

4.3 Randomization Inference

Number of Possible Outcomes

For the association / correlation problem, there are \(M = n!\) possible outcomes given a sample of \(n\) observations.

  • Test statistic depends on which \(X_i\) are paired with which \(Y_i\)

  • There are \(n!\) possible ways to permute the \(Y_i\) observations

There should be no linear association between \(X\) and \(Y\) if \(H_0: \rho = 0\) is true.

  • \((X_i, Y_i)\) are the observed data, which is one possible pairing of \(X_i\)’s with \(Y_i\)’s

  • All possible pairings are equally likely if \(X\) and \(Y\) are independent of one another

The \(M\) possible outcomes correspond to the \(n!\) vectors of the form \[ \boldsymbol\pi = (\pi_1, \ldots, \pi_n) \] where the \(\pi_i\) are some permutation of the integers \(1,\ldots,n\). Such vectors are referred to as “permutation vectors” given that the \(\pi_i\) control the ordering of the \(Y\) observations, i.e., \(Y_{\pi_i}\) is used to permute the \(Y\) values. Note that this is a form of conditional inference where we condition on the \(X_i\) and \(Y_i\) values.

Reminder: the permn function (in the nptest package) can be used to obtain all \(n!\) possible permutation vectors:

# all permutations of c(1, 2, 3)
permn(3)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    2    1    3    3
## [2,]    2    1    3    3    1    2
## [3,]    3    3    1    2    2    1

Exact and Approximate Null Distributions

Define \(\{\boldsymbol\pi_j \}_{j = 1}^M\) to be the collection of \(M = n!\) permutation vectors for the given sample size \(n\), where \(\boldsymbol\pi_j = (\pi_{1j}, \ldots, \pi_{nj})\). For example, if \(n = 3\), then the \(j\)-th permutation vector is the \(j\)-th column of the matrix output by the permn function (see above).

  • \(\mathbf{X} = (X_1, \ldots, X_n)\) and \(\mathbf{Y} = (Y_1, \ldots, Y_n)\) are the observed data vectors

  • \(\boldsymbol\pi = (\pi_1, \ldots, \pi_n)\) is the observed permutation vector (with \(\pi_i = i \ \forall i\))

  • The observed test statistic can be written as a function \(T = s(\mathbf{X}, \mathbf{Y})\)

The exact null distribution is given by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(\mathbf{X}, \mathbf{Y}_j)\) denotes the test statistic corresponding to the j-th permutation of the \(\mathbf{Y}\) vector. Note that \(T_j\) is obtained by applying the given test statistic function \(s(\cdot, \cdot)\) to the data using the \(j\)-th permutation \(\mathbf{Y}_j = (Y_{\pi_{1j}}, \ldots, Y_{\pi_{nj}})\).

An approximate null distribution is formed by calculating the test statistic for \(R\) random permutation vectors (plus the observed permutation vector), i.e., \(\mathcal{T}_{R+1} = \{T_j\}_{j = 1}^{R+1}\) where \(T_j = s(\mathbf{X}, \mathbf{Y}_j)\) denotes the test statistic corresponding to the \(j\)-th permutation vector.
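
The following sketch enumerates the exact permutation distribution of the classic correlation statistic for a small hypothetical dataset (\(n = 5\), so \(M = 120\)); it illustrates the construction rather than the internal implementation of np.cor.test.

# exact permutation distribution of the classic correlation statistic
library(nptest)                                # for permn()
set.seed(1)
n <- 5
x <- rnorm(n)
y <- rnorm(n)                                  # hypothetical paired data
tstat <- function(x, y) { r <- cor(x, y); r * sqrt((n - 2) / (1 - r^2)) }
P <- permn(n)                                  # n x n! permutation vectors
Tdist <- apply(P, 2, function(p) tstat(x, y[p]))
mean(Tdist >= tstat(x, y))                     # exact p-value for H1: rho > 0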

Necessary Assumptions

Using the studentized \(t\) test statistic (independent = FALSE), the test of \(H_0: \rho = 0\) will be (i) exact if \(X\) and \(Y\) are independent, and (ii) asymptotically valid otherwise.

  • Point (i) is because each possible pairing of \(X_i\) with \(Y_i\) is equally likely when \(X\) and \(Y\) are independent

  • Point (ii) is because \(r \sqrt{n}\) is asymptotically normal with mean 0 and variance \(\tau^2\) under \(H_0: \rho = 0\)

Using the classic \(t\) test statistic (independent = TRUE), the test of \(H_0: F_{XY}(x,y) = F_X(x) F_Y(y)\) will be (i) exact if \(X\) and \(Y\) are independent, and (ii) invalid if \(X\) and \(Y\) are dependent but uncorrelated.

  • Point (i) is the same reason as before (i.e., each pairing is equally likely)

  • Point (ii) is because the test uses a measure of linear association to quantify dependence

Note: rejecting \(H_0: F_{XY}(x,y) = F_X(x) F_Y(y)\) does not allow you to draw conclusions about the direction of the correlation \(\rho\), even though the sample correlation is used for the test statistic.

These results require some basic regularity conditions for the distribution \(F_{XY}\), i.e.,

  • \(E([X - \mu_X] [Y - \mu_Y]) < \infty\), \(E([X - \mu_X]^2) < \infty\), and \(E([Y - \mu_Y]^2) < \infty\)

  • \(E([X - \mu_X]^2 [Y - \mu_Y]^2) < \infty\), \(E([X - \mu_X]^4) < \infty\), and \(E([Y - \mu_Y]^4) < \infty\)

4.4 Simulation Study (Helwig, 2019a)

Helwig, N. E. (2019). Statistical nonparametric mapping: Multivariate permutation tests for location, correlation, and regression problems in neuroimaging. WIREs Computational Statistics, 11(2), e1457. https://doi.org/10.1002/wics.1457

Design

The simulation study manipulated two data generating factors:

  1. the bivariate distribution \(F_{XY}\) (see figure below)

  2. the sample size \(n \in \{10, 25, 50, 100, 200\}\)

Figure 4: Three bivariate distributions with \(\mu_X = \mu_Y = 0\), \(\sigma_X^2 = \sigma_Y^2 = 1\), and \(\rho_{XY} = 0\).

Analyses

10,000 independent datasets (replications) were generated for each of the 15 (3 distribution \(\times\) 5 \(n\)) cells of the simulation design. For each replication, the null hypothesis \(H_0: \rho(X, Y) = 0\) was tested using the alternative hypothesis \(H_1: \rho(X, Y) > 0\) and an \(\alpha = 0.05\) significance level.

The significance testing results were compared using four methods:

  1. classic \(t\) test assuming bivariate normality

  2. normal approximation using studentized \(t\) test statistic

  3. permutation test using classic \(t\) test statistic

  4. permutation test using studentized \(t\) test statistic

For the permutation tests, the number of resamples was set at \(R = 9999\).

Results

The type I error rate for each combination of test statistic and data generating condition is given in the figure below.

Figure 5: Type I error rate calculated across 10,000 replications for the parametric tests (red squares) and the permutation tests (black circles). The unfilled points denote the results using the (classic) \(T\) test statistic, whereas the filled points denote the results using the (studentized) \(T^*\) test statistic. The vertical bars denote approximate 99% confidence intervals, and the dotted horizontal lines denote the nominal rate of \(\alpha = 0.05\).

The simulation results reveal that…

  • Circular Uniform (\(\tau < 1\)): using classic \(t\) test or Pitman’s permutation test produces invalid results (error rates are too small), whereas using the studentized \(t\) test statistic produces asymptotically valid results.

  • MVN (\(\tau = 1\)): all methods perform equally well (b/c \(X\) and \(Y\) are independent).

  • MVT (\(\tau > 1\)): using classic \(t\) test or Pitman’s permutation test produces invalid results (error rates are too large), whereas using the studentized \(t\) test statistic produces asymptotically valid results.

4.5 Example 8.5: Twins’ Test Scores

Data and Hypotheses

Consider the dataset in Table 8.5 from Nonparametric Statistical Methods, 3rd Ed. (Hollander et al., 2014).

The dataset contains psychological test scores from \(n = 13\) pairs of twins.

  • \(X_i\) is first twin from \(i\)-th pair, and \(Y_i\) is second twin from \(i\)-th pair

Goal: test if the test scores are positively associated with one another:

  • \(H_0: \rho(X,Y) = 0\) versus \(H_1: \rho(X,Y) > 0\) (independent = FALSE)

  • \(H_0: F_{XY}(x,y) = F_X(x) F_Y(y)\) versus \(H_1: \rho(X,Y) > 0\) (independent = TRUE)

R code to input the data:

x <- c(277, 169, 157, 139, 108, 213, 232, 229, 114, 232, 161, 149, 128)
y <- c(256, 118, 137, 144, 146, 221, 184, 188, 97, 231, 114, 187, 230)

Results using Classic Test Statistic

Classic \(t\) test statistic:

## pearson correlation test
cor.test(x, y, alternative = "greater")
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = 2.8284, df = 11, p-value = 0.008209
## alternative hypothesis: true correlation is greater than 0
## 95 percent confidence interval:
##  0.2479471 1.0000000
## sample estimates:
##       cor 
## 0.6488863

Permutation test using classic \(t\) statistic:

## permutation test (T = classic)
set.seed(1)
nct <- np.cor.test(x, y, alternative = "greater", independent = TRUE)
nct
## 
## Nonparametric Independence Test
## alternative hypothesis: true correlation is greater than 0 
## t = 2.8284,  p-value = 0.0091
## sample estimate:
##       cor 
## 0.6488863

Note that both approaches use the same observed test statistic, \(T = 2.8284\), but the two approaches produce slightly different p-values:

  • \(t\) test: \(p = 0.0082\) is obtained by comparing \(T = 2.8284\) to a \(t_{11}\) distribution (note that \(t_\nu\) is Student’s \(t\) distribution with \(\nu\) d.f.)

  • NP test: \(p = 0.0091\) is obtained by comparing \(T = 2.8284\) to the approximate null distribution with 10,000 elements (see the sketch below)
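
To connect the two numbers, the permutation p-value is essentially the proportion of the null distribution that is at least as extreme as the observed statistic. The exact counting convention used by nptest (e.g., whether the observed statistic is included in the tally) may differ slightly, so the following one-liner is only an approximation of the reported value:

# approximate the reported p-value from the stored permutation distribution
mean(nct$perm.dist >= nct$statistic)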

Plot the approximate null distribution used for the nonparametric test:

## plot perm.dist (T = classic)
hist(nct$perm.dist, breaks = "FD", xlab = "T", 
     main = "Permutation Distribution (T = Classic)")
box()
abline(v = nct$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

Results using Studentized Test Statistic

Permutation test using studentized \(t\) statistic:

## permutation test (T = studentized)
set.seed(1)
nct <- np.cor.test(x, y, alternative = "greater")
nct
## 
## Nonparametric Correlation Test
## alternative hypothesis: true correlation is greater than 0 
## t = 1.9463,  p-value = 0.025
## sample estimate:
##       cor 
## 0.6488863

Plot the approximate null distribution used for the nonparametric test:

## plot perm.dist (T = studentized)
hist(nct$perm.dist, breaks = "FD", xlab = "T", 
     main = "Permutation Distribution (T = Studentized)")
box()
abline(v = nct$statistic, lty = 2, col = "red")
legend("topleft", "Observed T", lty = 2, col = "red", bty = "n")

5 Regression Coefficient Tests

5.1 Overview of Problem

Consider a linear model of the form \[ Y_i = \alpha + \sum_{j = 1}^p \beta_j X_{ij} + \epsilon_i \quad \leftrightarrow \quad Y = \alpha + X \beta + \epsilon \] or a linear model of the form \[ Y_i = \alpha + \sum_{j = 1}^p \beta_j X_{ij} + \sum_{k = 1}^q \gamma_k Z_{ik} + \epsilon_i \quad \leftrightarrow \quad Y = \alpha + X \beta + Z \gamma + \epsilon \] where the goal is to test the null hypothesis \(H_0: \beta = \beta_0\) versus the alternative hypothesis \(H_1: \beta \neq \beta_0\). Note that the covariates in \(Z\) are being “controlled for” (or “conditioned on”) while testing the effects of the variables in \(X\).

The np.reg.test function (in the nptest package) can be used to implement nonparametric tests of regression coefficients.

Usage:

np.reg.test(x, y, z = NULL, method = NULL,
            beta = NULL, homosced = FALSE,
            R = 9999, parallel = FALSE, cl = NULL,
            perm.dist = TRUE)

5.2 Choice of Test Statistic

Specifying the Two Options

  • homosced = FALSE: \(W\) statistic

  • homosced = TRUE: \(F\) statistic

Note that the chosen test statistic relates to assumptions about the variance of the error terms:

  • \(F\) statistic assumes \(\mathrm{var}(\epsilon_i) = \sigma^2\)  (homoscedastic)

  • \(W\) statistic assumes \(\mathrm{var}(\epsilon_i) = \sigma_i^2\)  (heteroscedastic)

Least Squares Estimates

The linear model can be rewritten as \[ Y = \alpha + M \psi + \epsilon \] where \(M = (X, Z)\) denotes the combined design matrix, and \(\psi = \left[ \begin{smallmatrix} \beta \\ \gamma \end{smallmatrix} \right]\) denotes the combined coefficient vector. Note that we can write \[ \beta = S^\top \psi \] where the selection matrix \(S = \left( \begin{smallmatrix} I \\ 0 \end{smallmatrix} \right)\) returns the \(\beta\) portion of \(\psi\).

The least squares estimate of the coefficient vector has the form \[ \hat{\psi} = \left[ \begin{smallmatrix} \hat{\beta} \\ \hat{\gamma} \end{smallmatrix} \right] = \left( M_{\mathrm{c}}^\top M_{\mathrm{c}} \right)^{-1} M_{\mathrm{c}}^\top Y \] where \(M_{\mathrm{c}} = C M\) is the columnwise mean-centered version of \(M\), and the matrix \(C = I - n^{-1} 1 1^\top\) is a centering matrix.

Note: the least squares estimate of the intercept has the form \[ \hat{\alpha} = \bar{Y} - \bar{M}^\top \hat{\psi} \] where \(\bar{Y}\) is the mean of the response, and \(\bar{M}\) is the mean predictor vector.
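
As a concrete check, here is a minimal sketch (with hypothetical simulated data, \(p = 2\) predictors, and no covariates \(Z\)) that computes \(\hat{\psi}\) and \(\hat{\alpha}\) from the centered design matrix and compares them with the coefficients returned by lm():

# least squares via the centered design matrix (simulated data, no covariates Z)
set.seed(1)
n <- 50
X <- cbind(rnorm(n), rnorm(n))                  # p = 2 predictors
Y <- drop(1 + X %*% c(0.5, -0.25) + rnorm(n))
M <- X                                          # combined design M = (X, Z); Z is empty here
Mc <- scale(M, center = TRUE, scale = FALSE)    # columnwise mean centering: Mc = C M
psi.hat <- solve(crossprod(Mc), crossprod(Mc, Y))
alpha.hat <- mean(Y) - drop(colMeans(M) %*% psi.hat)
c(alpha.hat, psi.hat)                           # (intercept, slopes) from the formulas
coef(lm(Y ~ X))                                 # should match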

Classic \(F\) test statistic

The classic \(F\) test statistic has the form \[ F = \frac{1}{p} \left( \hat{\beta} - \beta_0 \right)^\top \hat{\Sigma}_{\hat{\beta}}^{-1} \left( \hat{\beta} - \beta_0 \right) \] where

  • \(\hat{\Sigma}_{\hat{\beta}} = S^\top \hat{\Sigma}_{\hat{\psi}} S\) is the (estimated) covariance matrix of \(\hat{\beta}\) assuming \(\mathrm{var}(\epsilon_i) = \sigma^2\)

  • \(\hat{\Sigma}_{\hat{\psi}} = \hat{\sigma}^2 \left( M_{\mathrm{c}}^\top M_{\mathrm{c}} \right)^{-1}\) is the (estimated) covariance matrix of \(\hat{\psi}\) assuming \(\mathrm{var}(\epsilon_i) = \sigma^2\)

  • \(\hat{\sigma}^2 = \frac{1}{n-r-1} \| \hat{\epsilon} \|^2\) is the error variance estimate assuming \(\mathrm{var}(\epsilon_i) = \sigma^2\), where \(r = p + q\) is the total number of predictors in \(M\)

  • \(\hat{\epsilon} = Y - \hat{Y}\) is the residual vector and \(\hat{Y} = \hat{\alpha} + M \hat{\psi}\) is the fitted value vector.

If the errors are iid \(N(0, \sigma^2)\), then the \(F\) statistic follows an \(F_{p, n - r - 1}\) distribution under \(H_0\).
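
Continuing the sketch above (where \(S = I\) because there are no covariates, so \(\hat{\Sigma}_{\hat{\beta}} = \hat{\Sigma}_{\hat{\psi}}\)), the classic \(F\) statistic for \(H_0: \beta = 0\) can be computed directly from these formulas and compared with the overall \(F\) statistic reported by lm():

# classic F statistic for H0: beta = 0 (continuing the simulated data above)
res <- Y - (alpha.hat + drop(M %*% psi.hat))    # residual vector
p <- ncol(X)                                    # number of tested coefficients
r <- ncol(M)                                    # total number of predictors in M
sigma2 <- sum(res^2) / (n - r - 1)              # error variance estimate
Sigma.psi <- sigma2 * solve(crossprod(Mc))      # covariance of psi.hat (homoscedastic)
Fstat <- drop(t(psi.hat) %*% solve(Sigma.psi) %*% psi.hat) / p
Fstat
summary(lm(Y ~ X))$fstatistic                   # overall F from lm() should match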

Robust \(W\) test statistic

The robust Wald test statistic has the form \[ W = \left( \hat{\beta} - \beta_0 \right)^\top \hat{\Omega}_{\hat{\beta}}^{-1} \left( \hat{\beta} - \beta_0 \right) \] where

  • \(\hat{\Omega}_{\hat{\beta}} = S^\top \hat{\Omega}_{\hat{\psi}} S\) is the (estimated) asymptotic covariance matrix of \(\hat{\beta}\) assuming \(\mathrm{var}(\epsilon_i) = \sigma_i^2\)

  • \(\hat{\Omega}_{\hat{\psi}} = \left( M_{\mathrm{c}}^\top M_{\mathrm{c}} \right)^{-1} M_{\mathrm{c}}^\top D_{\hat{\epsilon}} M_{\mathrm{c}} \left( M_{\mathrm{c}}^\top M_{\mathrm{c}} \right)^{-1}\) is the (estimated) asymptotic covariance matrix of \(\hat{\psi}\) assuming \(\mathrm{var}(\epsilon_i) = \sigma_i^2\)

  • \(D_{\hat{\epsilon}} = \mathrm{diag}(\hat{\epsilon}_1^2, \ldots, \hat{\epsilon}_n^2 )\) is a diagonal matrix containing the squared residuals

  • \(D_{\hat{\epsilon}} = \mathrm{diag}((Y_1 - \bar{Y})^2, \ldots, (Y_n - \bar{Y})^2 )\) if there are no covariates in the model

Under a variety of conditions, the \(W\) statistic asymptotically follows a \(\chi_p^2\) distribution under \(H_0\).
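
A matching sketch of the robust \(W\) statistic, using the same simulated data and the sandwich form above (again with \(S = I\) since there are no covariates):

# robust W statistic for H0: beta = 0 (continuing the simulated data above)
D <- diag(drop(res)^2)                          # squared residuals on the diagonal
McMc.inv <- solve(crossprod(Mc))
Omega.psi <- McMc.inv %*% t(Mc) %*% D %*% Mc %*% McMc.inv
Wstat <- drop(t(psi.hat) %*% solve(Omega.psi) %*% psi.hat)
Wstat
pchisq(Wstat, df = p, lower.tail = FALSE)       # asymptotic chi-squared p-value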

5.3 Randomization Inference

Number of Possible Outcomes

For the regression problem, there are \(M = n!\) possible outcomes given a sample of \(n\) observations.

  • Test statistic depends on which \(X_i = (X_{i1}, \ldots, X_{ip})^\top\) are paired with which \(Y_i\)

  • There are \(n!\) possible ways to permute the rows of \(X\).

  • If the errors are symmetric, one can both permute and flip signs (so \(M = n! \, 2^n\)).

There should be no linear association between \(X\) and \(Y\) if \(H_0: \beta = 0\) is true.

  • \((X_i, Y_i)\) are the observed data, which is one possible pairing of \(X_i\)’s with \(Y_i\)’s

  • All possible pairings are equally likely if \(X\) and \(Y\) are independent of one another

The \(M\) possible outcomes correspond to the \(n!\) vectors of the form \[ \boldsymbol\pi = (\pi_1, \ldots, \pi_n) \] where the \(\pi_i\) are some permutation of the integers \(1,\ldots,n\). Such vectors are referred to as “permutation vectors” given that the \(\pi_i\) control the ordering of the rows of \(X\).

Reminder: the permn function (in the nptest package) can be used to obtain all \(n!\) possible permutation vectors:

# all permutations of c(1, 2, 3)
permn(3)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    2    1    3    3
## [2,]    2    1    3    3    1    2
## [3,]    3    3    1    2    2    1

Exact and Approximate Null Distributions

With no covariates (\(Y = \alpha + X \beta + \epsilon\)), the exact null distribution is formed by calculating the test statistic for all possible permutations of \(Y\).

  • The observed test statistic can be written as a function \(T = s(X, Y)\)

  • The exact null distribution is given by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(X, Y_j)\) denotes the test statistic corresponding to the \(j\)-th permutation of the \(Y\) vector (see the sketch below).
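
Here is a minimal sketch with hypothetical toy data (\(n = 6\), so there are \(6! = 720\) permutations) that builds the exact null distribution by hand; for illustration, the test statistic function \(s(\cdot)\) is taken to be the overall \(F\) statistic from lm():

# exact permutation distribution for simple regression (toy data, no covariates)
set.seed(1)
n <- 6
x <- rnorm(n)
y <- rnorm(n)
fstat <- function(x, y) summary(lm(y ~ x))$fstatistic[1]   # test statistic s(x, y)
perms <- permn(n)                        # all n! permutation vectors (columns)
Tdist <- apply(perms, 2, function(idx) fstat(x, y[idx]))   # exact null distribution
Tobs <- fstat(x, y)                      # observed test statistic
mean(Tdist >= Tobs)                      # exact p-value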

With covariates (\(Y = \alpha + X \beta + Z \gamma + \epsilon\)), there are many possible permutation strategies (see Table 1).

Reproduction of Table 1 from Helwig (2019b).

Necessary Assumptions

With no covariates (\(Y = \alpha + X \beta + \epsilon\)), we need the following assumptions:

  • \(\{ (X_i, Y_i) \}_{i = 1}^n\) are an iid sample from some distribution \(F_{XY}\) that satisfies the assumed linear model with \(E(\epsilon_i X_i) = 0\).

  • \(E(\epsilon_i) = 0\) but \(E(\epsilon_i | X_i)\) may be non-zero because \(\epsilon_i\) may depend on \(X_i\)

With covariates (\(Y = \alpha + X \beta + Z \gamma + \epsilon\)), we need the following assumptions:

  • \(\{ (X_i, Z_i, Y_i) \}_{i = 1}^n\) are an iid sample from some distribution \(F_{XZY}\) that satisfies the assumed linear model.

  • \(E(\epsilon_i) = 0\) and \(E(\epsilon_i | X_i, Z_i) = 0\) (i.e., errors have expectation zero)

In both cases, we make some basic regularity assumptions about the data:

  • The matrices \(M_{\mathrm{c}}^\top M_{\mathrm{c}}\) and \(M_{\mathrm{c}}^\top D_{\hat{\epsilon}} M_{\mathrm{c}}\) are almost surely invertible, and their expectations are nonsingular

  • The response and predictor variables have finite fourth moments

5.4 Simulation Study (Helwig, 2019b)

Helwig, N. E. (2019). Robust nonparametric tests of general linear model coefficients: A comparison of permutation methods and test statistics. NeuroImage, 201, 116030. https://doi.org/10.1016/j.neuroimage.2019.116030

Design

The simulation study manipulated four data generating factors:

  1. the distribution of the data (2 levels: MVN and MVT with \(\nu = 5\))

  2. the correlation between \(X\) and \(Z\) (3 levels: \(\rho \in \{0, 1/3, 2/3\}\))

  3. the number of observations: \(n \in \{10, 25, 50, 100, 200 \}\)

  4. the true \(\beta\) coefficient (2 levels: \(\beta \in \{0, 1/4\}\))

The multivariate normal (MVN) condition is used to explore the methods when the error terms are homoscedastic, whereas the MVT (with \(\nu = 5\)) is included to compare the methods with heteroscedastic error terms (see Appendix A of Helwig, 2019b).

The \(\beta = 0\) condition is used to explore each method’s type I error rate, whereas the \(\beta = 1/4\) condition is used to explore each method’s power.

Analyses

10,000 independent datasets (replications) were generated for each of the 60 (2 distributions \(\times\) 3 \(\rho\) \(\times\) 5 \(n\) \(\times\) 2 \(\beta\)) cells of the simulation design. For each replication, the null hypothesis \(H_0: \beta = 0\) was tested against the alternative hypothesis \(H_1: \beta \neq 0\) at an \(\alpha = 0.05\) significance level.

The significance testing results were compared using 18 different approaches:

  • all combinations of 2 test statistics with 8 permutation methods (16 possibilities)

  • two corresponding parametric tests (2 possibilities)

For the permutation tests, the number of resamples was set at \(R = 9999\).
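
For intuition, the following is a minimal sketch of one simplified cell (MVN errors, \(\rho = 0\) between \(X\) and \(Z\), \(n = 25\), \(\beta = 0\)), with a reduced replication and resample count. The data-generating details here are simplified stand-ins for those in Helwig (2019b), and it is assumed that np.reg.test returns a p.value component (as its printed output suggests).

# sketch of one simplified cell: MVN, rho = 0, n = 25, beta = 0
set.seed(1)
nrep <- 100
n <- 25
pval.pa <- pval.perm <- numeric(nrep)
for (r in 1:nrep) {
  x <- rnorm(n)
  z <- rnorm(n)
  y <- 1 + 0.5 * z + rnorm(n)            # beta = 0: y does not depend on x
  pval.pa[r] <- anova(lm(y ~ z), lm(y ~ z + x))[2, "Pr(>F)"]   # parametric F test
  pval.perm[r] <- np.reg.test(x, y, z = z, homosced = TRUE, R = 499)$p.value
}
mean(pval.pa < 0.05)     # empirical type I error: parametric F test
mean(pval.perm < 0.05)   # empirical type I error: permutation F test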

Results

The type I error rate (\(\beta = 0\)) for each combination of test statistic and data generating condition is given in the figure below.

Figure 6: Type I error rates for various simulation conditions. PA = parametric.

The plot of the type I error rates reveals that…

  • The \(F\) statistic is exact for MVN data (\(\epsilon_i\) homoscedastic), but invalid for MVT data (\(\epsilon_i\) heteroscedastic)

  • The \(W\) statistic is exact for MVN data (\(\epsilon_i\) homoscedastic), and is asymptotically valid for MVT data (\(\epsilon_i\) heteroscedastic)

  • The Still-White (SW) permutation method is invalid when \(X\) and \(Z\) are correlated

  • Otherwise the different permutation methods perform similarly to one another

The power (\(\beta = 1/4\)) for each combination of test statistic and data generating condition is given in the figure below.

Figure 7: Statistical power for various simulation conditions. PA = parametric.

The plot of the power for each approach reveals that…

  • Power increased as \(n\) increased for both \(F\) and \(W\) statistics

  • Correlation between \(X\) and \(Z\) reduced the power for both \(F\) and \(W\) statistics

  • For MVN data, using \(W\) statistic led to slightly reduced power (compared to \(F\) statistic), but the difference disappeared as \(n\) increased

  • For MVT data, the power of the \(W\) statistic was slightly reduced compared to the MVN data

5.5 Example: SAT and College GPA

Data and Hypotheses

We will use the SAT and College GPA example from http://onlinestatbook.com/2/case_studies/sat.html

The dataset contains information from \(n = 105\) students who graduated from a state university with a B.S. degree in computer science.

  • high_GPA = High school grade point average

  • math_SAT = Math SAT score

  • verb_SAT = Verbal SAT score

  • comp_GPA = Computer science grade point average

  • univ_GPA = Overall university grade point average

R code to read in and look at the data:

# read-in data
sat <- read.table("http://onlinestatbook.com/2/case_studies/data/sat.txt", 
                  header = TRUE)
head(sat)
##   high_GPA math_SAT verb_SAT comp_GPA univ_GPA
## 1     3.45      643      589     3.76     3.52
## 2     2.78      558      512     2.87     2.91
## 3     2.52      583      503     2.54     2.40
## 4     3.67      685      602     3.83     3.47
## 5     3.24      592      538     3.29     3.47
## 6     2.10      562      486     2.64     2.37
# plot data
plot(sat$high_GPA, sat$univ_GPA, 
     xlab = "High school GPA", ylab = "University GPA")
abline(lm(univ_GPA ~ high_GPA, data = sat))

Is University GPA Linearly Related to High School GPA?

Consider the simple linear regression model \[ Y = \alpha + X \beta + \epsilon \] where \(Y\) is the University GPA and \(X\) is the high school GPA.

Testing \(H_0: \beta = 0\) versus \(H_1: \beta \neq 0\)

Classic \(F\) test statistic:

# classic F test (assuming iid normal errors)
anova(lm(univ_GPA ~ high_GPA, data = sat))
## Analysis of Variance Table
## 
## Response: univ_GPA
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## high_GPA    1 12.6394 12.6394  159.57 < 2.2e-16 ***
## Residuals 103  8.1587  0.0792                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Permutation test with \(F\) test statistic:

# permutation test (assuming homoscedastic errors)
set.seed(1)
np.reg.test(x = sat$high_GPA, y = sat$univ_GPA, homosced = TRUE)
## 
## Nonparametric Regression Test (Homoscedastic)
## randomization method: Permute 
## F = 159.5667,  p-value = 1e-04
## sample estimates:
## (Intercept)          x1 
##      1.0968      0.6748

Permutation test with \(W\) test statistic:

# permutation test (assuming heteroscedastic errors)
set.seed(1)
np.reg.test(sat$high_GPA, sat$univ_GPA)
## 
## Nonparametric Regression Test (Heteroscedastic)
## randomization method: Permute 
## W = 40.6939,  p-value = 1e-04
## sample estimates:
## (Intercept)          x1 
##      1.0968      0.6748

Note that the classic approach uses the same \(F\) test statistic as the permutation test that assumes homoscedasticity (\(F = 159.5667\)). In this case, all three tests produce similar results: we reject the null hypothesis of no linear relationship.

Do SAT Scores Help Predict University GPA?

Consider the multiple linear regression model \[ Y = \alpha + X_1 \beta_1 + X_2 \beta_2 + Z \gamma + \epsilon \] where \(Y\) is the University GPA, \(X_1\) is the Math SAT score, \(X_2\) is the Verbal SAT score, and \(Z\) is the high school GPA.

Testing \(H_0: \beta_1 = \beta_2 = 0\) versus \(H_1: (\beta_1 \neq 0) \mbox{ and/or } (\beta_2 \neq 0)\)

Classic \(F\) test statistic:

# classic F test (assuming iid normal errors)
mod0 <- lm(univ_GPA ~ high_GPA, data = sat)
mod1 <- lm(univ_GPA ~ high_GPA + math_SAT + verb_SAT, data = sat)
anova(mod0, mod1)
## Analysis of Variance Table
## 
## Model 1: univ_GPA ~ high_GPA
## Model 2: univ_GPA ~ high_GPA + math_SAT + verb_SAT
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    103 8.1587                           
## 2    101 7.8288  2   0.32988 2.1279 0.1244

Permutation test with \(F\) test statistic:

# permutation test (assuming homoscedastic errors)
set.seed(1)
np.reg.test(x = cbind(sat$math_SAT, sat$verb_SAT), 
            y = sat$univ_GPA, z = sat$high_GPA, homosced = TRUE)
## 
## Nonparametric Regression Test (Homoscedastic)
## randomization method: Huh-Jhun 
## F = 2.1071,  p-value = 0.1239
## sample estimates:
## (Intercept)          x1          x2          z1 
##      0.5793      0.0005      0.0010      0.5454

Permutation test with \(W\) test statistic:

# permutation test (assuming heteroscedastic errors)
set.seed(1)
np.reg.test(x = cbind(sat$math_SAT, sat$verb_SAT), 
            y = sat$univ_GPA, z = sat$high_GPA)
## 
## Nonparametric Regression Test (Heteroscedastic)
## randomization method: Huh-Jhun 
## W = 6.8236,  p-value = 0.0283
## sample estimates:
## (Intercept)          x1          x2          z1 
##      0.5793      0.0005      0.0010      0.5454

Interestingly, the methods using the \(F\) test statistic produce p-values that would fail to reject the null hypothesis at a standard significance level. In contrast, the permutation test using the robust \(W\) test statistic produces a p-value of \(p = 0.0283\), which would reject the null hypothesis at the standard \(\alpha = 0.05\) significance level. The difference appears to be due to heteroscedasticity of the error variances.

Visualize the heteroscedasticity in the data

# visualize heteroscedasticity
plot(fitted.values(mod1), residuals(mod1))
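
As a rough numerical complement to the plot (not part of the original analysis), one could run a simple Breusch–Pagan-style auxiliary regression of the squared residuals on the fitted values; a significant slope suggests that the residual variance changes with the level of the fitted values.

# auxiliary regression of squared residuals on fitted values (rough check)
r2 <- residuals(mod1)^2
aux <- lm(r2 ~ fitted.values(mod1))
summary(aux)$coefficients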