Nonparametric Bootstrap
The data set consists of (Xi, Yi) pairs, from which we wish to estimate the correlation coefficient and get some idea of its sampling distribution.
The following R statements do a nonparametric bootstrap estimate of the sampling distribution of the correlation coefficient.
The R functions sample and sample.int (see the on-line help) sample with or without replacement from a finite population. Here we use sample.int, which samples from the integers from one to n (the data sample size).
Applying the result to the original data, we get bootstrap data x.star and y.star, which we use to calculate one random realization of the estimator, theta.star[i], each time through the loop.
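The R statements referred to above are not reproduced in this text, so here is a hedged sketch of what they might look like. The data set itself is not shown either, so x and y below are made-up stand-ins; the names n, nboot, theta.hat, and theta.star follow the text, and the seed is ours.

```r
# sketch of the nonparametric bootstrap of the correlation coefficient;
# x and y are stand-in data because the real data set is not shown here
set.seed(42)
n <- 50
x <- rnorm(n)
y <- x + rnorm(n)
theta.hat <- cor(x, y)            # the estimator on the original data

nboot <- 999
theta.star <- double(nboot)
for (i in 1:nboot) {
    k <- sample.int(n, replace = TRUE)   # indices sampled with replacement
    x.star <- x[k]                       # bootstrap data
    y.star <- y[k]
    theta.star[i] <- cor(x.star, y.star) # one realization of the estimator
}
hist(theta.star)
```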
The histogram shows the sampling distribution of theta.star, which is assumed to be close to the sampling distribution of the actual estimator. More precisely, the distribution of theta.hat − theta is assumed to be close to the distribution of theta.star − theta.hat.

Bootstrap Percentile Intervals

The simplest method of making confidence intervals for the unknown parameter is to take the α ⁄ 2 and 1 − α ⁄ 2 quantiles of the bootstrap distribution of theta.star.
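In R the percentile interval is just a call to quantile applied to the bootstrap replicates. A self-contained sketch, again with made-up stand-in data (x, y, the seed, and the name ci are ours):

```r
# sketch of a bootstrap percentile interval for the correlation coefficient
set.seed(42)
n <- 50
x <- rnorm(n)                     # stand-in data
y <- x + rnorm(n)
theta.star <- replicate(999, {
    k <- sample.int(n, replace = TRUE)
    cor(x[k], y[k])
})
alpha <- 0.05
# the alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution
ci <- quantile(theta.star, probs = c(alpha / 2, 1 - alpha / 2))
```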
Other Bootstrap Confidence Intervals

Many different methods of making bootstrap confidence intervals have been proposed, far too many to cover in this course. The course on nonparametric inference (Stat 5601) usually covers them. Here are some web pages from the last time your instructor taught that course. These cover some but by no means all the methods.
Bootstrap Hypothesis Tests

The bootstrap doesn't do hypothesis tests in general, the reason being that the bootstrap has no general way to sample from (an analog of) the null hypothesis when the null hypothesis is not true. The bootstrap simulates from (an analog of) the true unknown distribution. Hence when the alternative hypothesis is true, the bootstrap samples from (an analog of) the alternative hypothesis, which is not what is wanted.
In special situations, one can cook up a bootstrap-like procedure that
can be claimed to simulate from (an analog of) the null hypothesis.
But there is no general procedure for that.
One can always invert bootstrap confidence intervals to perform a hypothesis
test about the parameter the confidence interval is for. This is a simple
application of the duality of tests and confidence intervals
(slide 206, deck 2).
Here is a web page from the last time your instructor taught Stat 5601
covering that.
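Inverting an interval into a test is mechanical: reject H0: theta = theta0 at level α exactly when theta0 falls outside the 1 − α confidence interval. A sketch with made-up data (the names theta0 and reject are ours):

```r
# sketch of a test obtained by inverting a bootstrap percentile interval
set.seed(42)
n <- 50
x <- rnorm(n)                     # stand-in data
y <- x + rnorm(n)
theta.star <- replicate(999, {
    k <- sample.int(n, replace = TRUE)
    cor(x[k], y[k])
})
alpha <- 0.05
ci <- quantile(theta.star, c(alpha / 2, 1 - alpha / 2))
theta0 <- 0                       # hypothesized value of the correlation
# duality: reject at level alpha iff theta0 is outside the interval
reject <- theta0 < ci[1] | theta0 > ci[2]
```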
Parametric Bootstrap

Here is our example of confidence intervals for mean values for a generalized linear model redone using the parametric bootstrap. The data set contains two variables: the response y, which is Bernoulli, and the predictor x, which is quantitative and whose distribution doesn't matter, since we condition on it. The following R statements fit the model and do a parametric bootstrap of the mean value for an individual whose x value is 25.
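The R statements themselves are not reproduced here, so the following is a hedged sketch of the idea. The real data set is not shown, so the data below are simulated stand-ins; the names mu.hat, mu.star, and nboot are ours.

```r
# sketch of a parametric bootstrap for a logistic GLM mean value at x = 25;
# the data are made up because the real data set is not shown here
set.seed(42)
n <- 100
x <- seq(10, 40, length = n)
y <- rbinom(n, 1, 1 / (1 + exp(-(-5 + 0.2 * x))))   # stand-in Bernoulli data

gout <- glm(y ~ x, family = binomial)
mu.hat <- predict(gout, newdata = data.frame(x = 25), type = "response")

nboot <- 999
mu.star <- double(nboot)
for (i in 1:nboot) {
    # parametric bootstrap: simulate new responses from the fitted model
    y.star <- rbinom(n, 1, fitted(gout))
    gout.star <- glm(y.star ~ x, family = binomial)
    mu.star[i] <- predict(gout.star, newdata = data.frame(x = 25),
        type = "response")
}
hist(mu.star)
```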
From the histogram of the parametric bootstrap distribution of the
estimator, we see we are a long way from asymptopia.
Bootstrap T Intervals

The generally accepted way to make parametric bootstrap confidence intervals is via bootstrap t procedures, which are analogous to t confidence intervals when the data are assumed normal. Here is a web page from the last time your instructor taught Stat 5601 covering that.
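For the GLM mean value example, a bootstrap t procedure studentizes each bootstrap replicate by its own standard error and then uses the quantiles of the studentized replicates (note the quantiles swap ends). A hedged sketch with made-up data (the names t.star, se.hat, and ci are ours):

```r
# sketch of a bootstrap t interval for the GLM mean value at x = 25;
# the data are stand-ins because the real data set is not shown here
set.seed(42)
n <- 100
x <- seq(10, 40, length = n)
y <- rbinom(n, 1, 1 / (1 + exp(-(-5 + 0.2 * x))))

gout <- glm(y ~ x, family = binomial)
pout <- predict(gout, newdata = data.frame(x = 25),
    type = "response", se.fit = TRUE)
mu.hat <- pout$fit
se.hat <- pout$se.fit

nboot <- 999
t.star <- double(nboot)
for (i in 1:nboot) {
    y.star <- rbinom(n, 1, fitted(gout))      # parametric bootstrap data
    gout.star <- glm(y.star ~ x, family = binomial)
    pout.star <- predict(gout.star, newdata = data.frame(x = 25),
        type = "response", se.fit = TRUE)
    # studentize each replicate by its own standard error
    t.star[i] <- (pout.star$fit - mu.hat) / pout.star$se.fit
}
alpha <- 0.05
# the upper quantile of t.star gives the lower end of the interval
ci <- mu.hat - se.hat * unname(quantile(t.star, c(1 - alpha / 2, alpha / 2)))
```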