Statistics 5102 (Geyer, Spring 2013) Examples: Bayesian Inference

Point Estimates

Posterior medians require the computer, except when the posterior distribution is symmetric (in which case the median is the center of symmetry).

Here is the example from the slides. The data are Binomial(n, p) and the prior distribution for p is Beta(α₁, α₂)

Posterior PDF

When you have a computer, there is no point in not plotting the whole posterior PDF. For the same example above

Interval Estimates

Equal Tails

The most obvious Bayesian competitor of frequentist confidence intervals is the interval between the α ⁄ 2 and 1 − α ⁄ 2 quantiles of the posterior distribution of the parameter of interest. This makes a 100 (1 − α) % credible interval for the parameter of interest.

Again, the data are Binomial(n, p) and the prior distribution for p is Beta(α₁, α₂)

The shaded area under the curve has posterior probability conf.level. The unshaded areas on either side have posterior probability α ⁄ 2.

With the data, hyperparameter, and confidence level given in the form before editing, the unshaded area to the left is too small to be seen

Rweb:> qbeta(0.025, 0 + 1 / 2, 10 - 0 + 1 / 2) 
[1] 4.789043e-05

But it is there.

Highest Posterior Density

The next most obvious Bayesian competitor of frequentist confidence intervals is the level set of the posterior PDF of the parameter of interest that has probability 1 − α. This makes a 100 (1 − α) % credible interval for the parameter of interest.

Again, the data are Binomial(n, p) and the prior distribution for p is Beta(α₁, α₂)

R statements x <- 0 n <- 10 alpha1 <- 1 / 2 alpha2 <- 1 / 2 conf.level <- 0.95 alpha <- 1 - conf.level eps <- 1e-4 theta <- seq(0, 1, eps) h1 <- pbeta(theta, x + alpha1, n - x + alpha2) h2 <- pbeta(theta, x + alpha1, n - x + alpha2, lower.tail = FALSE) d1 <- diff(h1) d2 <- (- diff(h2)) d <- pmax(d1, d2) i <- seq(along = d) isort <- rev(i[order(d)]) dsort <- rev(d[order(d)]) csort <- cumsum(dsort) inies <- isort[csort <= conf.level] tlow <- theta[min(inies)] thig <- theta[max(inies) + 1] round(c(tlow, thig), 4) y <- dbeta(theta, x + alpha1, n - x + alpha2) ymax <- max(y) if (! is.finite(ymax)) ymax <- max( dbeta(0.02, x + alpha1, n - x + alpha2), dbeta(0.98, x + alpha1, n - x + alpha2)) plot(theta, y, type = "l", ylim = c(0, ymax), xlab = "p", ylab = "h(p | x)") xpoly <- c(theta[seq(min(inies), max(inies) + 1)], thig, tlow) ypoly <- c(y[seq(min(inies), max(inies) + 1)], 0, 0) ypoly <- pmin(ypoly, par("usr")[4]) polygon(xpoly, ypoly, border = NA, col = "seagreen1") lines(theta, y)

The shaded area under the curve has posterior probability conf.level.

Unlike the equal tailed interval, the HPD region automatically switches from two-sided to one-sided as appropriate.

With the data, hyperparameter, and confidence level given in the form before editing, the HPD is one-sided, going all the way to zero.

Two Intervals Compared

Same as in the two preceding sections except we put both intervals on one plot.

R statements x <- 0 n <- 10 alpha1 <- 1 / 2 alpha2 <- 1 / 2 conf.level <- 0.95 alpha <- 1 - conf.level qlow <- qbeta(alpha / 2, x + alpha1, n - x + alpha2) qhig <- qbeta(alpha / 2, x + alpha1, n - x + alpha2, lower.tail = FALSE) par(mar = c(5, 4, 0, 0) + 0.1, mfrow = c(2, 1)) eps <- 1e-4 theta <- seq(0, 1, eps) y <- dbeta(theta, x + alpha1, n - x + alpha2) ymax <- max(y) if (! is.finite(ymax)) ymax <- max( dbeta(0.02, x + alpha1, n - x + alpha2), dbeta(0.98, x + alpha1, n - x + alpha2)) qlow <- round(qlow / eps) * eps qhig <- round(qhig / eps) * eps plot(theta, y, type = "l", ylim = c(0, ymax), xlab = "p", ylab = "h(p | x)") tpoly <- seq(qlow, qhig, eps) xpoly <- c(tpoly, qhig, qlow) ypoly <- c(dbeta(tpoly, x + alpha1, n - x + alpha2), 0, 0) ypoly <- pmin(ypoly, par("usr")[4]) polygon(xpoly, ypoly, border = NA, col = "hotpink1") lines(theta, y) text(0.9, 0.9 * ymax, "Equal", adj = c(0.5, 0.5), col = "hotpink2", cex = 2) h1 <- pbeta(theta, x + alpha1, n - x + alpha2) h2 <- pbeta(theta, x + alpha1, n - x + alpha2, lower.tail = FALSE) d1 <- diff(h1) d2 <- (- diff(h2)) d <- pmax(d1, d2) i <- seq(along = d) isort <- rev(i[order(d)]) dsort <- rev(d[order(d)]) csort <- cumsum(dsort) inies <- isort[csort <= conf.level] tlow <- theta[min(inies)] thig <- theta[max(inies) + 1] y <- dbeta(theta, x + alpha1, n - x + alpha2) ymax <- max(y) if (! is.finite(ymax)) ymax <- max( dbeta(0.02, x + alpha1, n - x + alpha2), dbeta(0.98, x + alpha1, n - x + alpha2)) plot(theta, y, type = "l", ylim = c(0, ymax), xlab = "p", ylab = "h(p | x)") xpoly <- c(theta[seq(min(inies), max(inies) + 1)], thig, tlow) ypoly <- c(y[seq(min(inies), max(inies) + 1)], 0, 0) ypoly <- pmin(ypoly, par("usr")[4]) polygon(xpoly, ypoly, border = NA, col = "seagreen1") lines(theta, y) text(0.9, 0.9 * ymax, "HPD", adj = c(0.5, 0.5), col = "seagreen2", cex = 2) # equal tailed interval round(c(qlow, qhig), 4) # highest posterior density region round(c(tlow, thig), 4)

Hypothesis Tests

One Sample, One Tailed

The most obvious Bayesian competitor of frequentist P-values is the Bayes factor comparing the hypotheses.

The hypotheses (models) are

H₀ = m₁ : p ≥ p₀
H₁ = m₂ : p < p₀

Again, the data are Binomial(n, p). The prior distribution for p is Beta(α₁, α₂) conditioned on whichever hypothesis we are doing.

One Sample, Two Tailed

The hypotheses (models) are

H₀ = m₁ : p = p₀
H₁ = m₂ : p ≠ p₀

Again, the data are Binomial(n, p). The prior distribution for m₁ is concentrated at the point p₀. The prior distribution for m₂ is p is Beta(α₁, α₂).

Two Sample, Two Tailed

Now the data are x_i, i = 1, 2, where the x_i are independent and x_i is Binomial(n_i, p_i).

The hypotheses (models) are

H₀ = m₁ : p₁ = p₂
H₁ = m₂ : p₁ ≠ p₂

The prior distribution for m₁ forces p₁ = p₂ = p, in which case the distribution of x₁ + x₂ is Binomial(n₁ + n₂, p). For model m₂ we consider p₁ and p₂ a priori independent, and we use the same Beta(α₁, α₂) for both parameters. In model m₁ we use the prior Beta(α₃, α₄) for the only parameter.

Two Sample, One Tailed

Now the data are x_i, i = 1, 2, where the x_i are independent and x_i is Binomial(n_i, p_i).

The hypotheses (models) are

H₀ = m₁ : p₁ ≥ p₂
H₁ = m₂ : p₁ < p₂

We use the same prior distribution Beta(α₁, α₂) for both parameters.

Statistics 5102 (Geyer, Spring 2013) Examples: Bayesian Inference

Point Estimates

Posterior PDF

Interval Estimates

Equal Tails

Highest Posterior Density

Two Intervals Compared

Hypothesis Tests

One Sample, One Tailed

One Sample, Two Tailed

Two Sample, Two Tailed

Two Sample, One Tailed

Navigation

Contents

Navigation

Contents