Chapter 6 One-Sample Bayesian Procedures

The key philosophical difference with Bayesian procedures is that we express our prior knowledge about parameters and other unknowns using prior distributions and update those distributions to posterior distributions after observing the data. The key practical difference with Bayesian procedures is that there are generally no simple computations. Instead, we get approximate samples from the posterior distributions via Monte Carlo simulation.

There are several tools for doing the Monte Carlo computations in R, including rjags (which does Gibbs sampling), rstan (which does Hamiltonian Monte Carlo sampling), brms, and others. We will illustrate using the bglmm function in cfcdae, which is a front end for rstan.

We will demonstrate using the data from the RunStitch example in cfcdae, which we have used before. Each worker run-stitches collars using two different setups: the conventional setup and an ergonomic setup. The two runs are made in random order for each worker, and the interest is in any difference in average speed between the two setups.

Load the RunStitch data from the package and compute the differences between Standard and Ergonomic.

data(RunStitch)
differences <- RunStitch[,"Standard"]-RunStitch[,"Ergonomic"]
head(differences)
[1]  1.03 -0.04  0.26  0.30 -0.97  0.04

6.1 bglmm Prior Distributions

bglmm places a normal prior with mean zero and standard deviation \(\sigma_k\) on all coefficients in term k of the model. In our example here, the formula has just one term: the mean (intercept). bglmm then puts a second-level prior on \(\sigma_k\), assuming that it follows a gamma distribution with expected value \(\tau_k\) and shape parameter \(\nu_k\). A large value of \(\nu_k\) means that you are very certain that \(\sigma_k\) is near \(\tau_k\), whereas a small value indicates uncertainty.
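To see what this prior implies, note that a gamma distribution with mean \(\tau_k\) and shape \(\nu_k\) has rate \(\nu_k/\tau_k\) in R's shape/rate parameterization, and hence standard deviation \(\tau_k/\sqrt{\nu_k}\). A minimal sketch (this translation is inferred from the description above, not read from bglmm's internal code):

```r
# Sketch (assumption): translate the (expected value, shape) description of
# the gamma prior into R's shape/rate parameterization.
tau <- 1       # prior expected value for sigma_k
nu  <- 5       # shape parameter: larger values mean more certainty
rate <- nu / tau               # gamma mean is shape/rate, so rate = nu/tau
c(mean = nu / rate,            # recovers tau
  sd   = sqrt(nu) / rate)      # equals tau/sqrt(nu); shrinks as nu grows
```

The prior standard deviation of \(\sigma_k\) shrinking like \(1/\sqrt{\nu_k}\) is why large shapes pin \(\sigma_k\) down so tightly.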

To illustrate the degree of uncertainty, consider the ratio of the .995 quantile to the .005 quantile of the distribution. This is one measure of the spread of the middle 99% of the distribution.

shape <- c(1,1.5,2:10,15,20,30,50,100,200,400,1000)
ratio <- qgamma(.995,shape)/qgamma(.005,shape)
data.frame(shape,ratio)
    shape       ratio
1     1.0 1057.012101
2     1.5  178.999426
3     2.0   71.792473
4     3.0   27.448349
5     4.0   16.330513
6     5.0   11.683607
7     6.0    9.206618
8     7.0    7.686343
9     8.0    6.663908
10    9.0    5.930983
11   10.0    5.380372
12   15.0    3.893019
13   20.0    3.224391
14   30.0    2.587675
15   50.0    2.081903
16  100.0    1.676711
17  200.0    1.440322
18  400.0    1.294069
19 1000.0    1.176992

With a shape of just 1, the plausible range of values extends over three orders of magnitude. By the time you get to shape 5, the range is down to just one order of magnitude. You need a huge shape parameter to really nail down a value for \(\sigma_k\).

We also need a prior for the error standard deviation \(\sigma\). We use the same approach of giving a prior expected value for \(\sigma\) and a shape parameter indicating degree of certainty.

If you do not specify the shape parameter, bglmm uses 1.5, which is rather dispersed. If you do not specify the prior expected value for a standard deviation, bglmm defaults to a value that is probably bigger than is reasonable.

6.2 RunStitch data

We want to illustrate fitting the Bayesian model, but we also want to illustrate how the results might depend on the prior distribution you choose. We want to consider prior means of either 0 or .5 (representing two different persons’ prior beliefs). We will also vary the prior standard deviation for the mean from nearly 0 up to .8. Generally, we will assume quite vague prior knowledge about the standard deviation of the data. However, we will also give one example where we assume much stronger certainty about the standard deviation to see what does, and does not, change.

We want to fit Bayesian models where we have a prior mean \(\mu\) for the differences and some known prior standard deviation for the differences. Things are fairly straightforward if the prior mean is 0, because that is what bglmm assumes. Just set the expected standard deviation to the desired value and use a very large shape to make it effectively constant. If the prior mean we want to fit is not zero, then we need to subtract the prior mean from the data and use a zero prior mean for the adjusted data.

Before seeing the data we might have a vague idea of how big the standard deviation of the data might be. Perhaps we think its prior expected value is 1, but we are quite uncertain. For that, we might use a shape of 1.5 (ratio around 180). On the other hand, we might be more certain and use a shape of 6 (ratio about 10).

6.2.1 Model Fitting

Our first model is for someone who has a very strong prior belief that the mean should be near zero, evidenced by the small expected prior standard deviation for the intercept (sigma.scale.int) and large shape (gamma.shape.int). We use expected prior error standard deviation of 1 (sigma.scale0), along with a small shape indicating vague information (gamma.shape0). The quiet argument suppresses a lot of the progress information that is printed.

set.seed(20210330)
fit.0.0.vague <- bglmm(differences~1,
  sigma.scale.int=.001,gamma.shape.int=1000,
  sigma.scale0=1,gamma.shape0=1.5,quiet=TRUE)
Need to compile the model. Compiling now.
Trying to compile a simple C file

We now fit several similar models, where we gradually increase the prior standard deviation of the mean.

fit.0.1.vague <- bglmm(differences~1,
  sigma.scale.int=.1,gamma.shape.int=1000,
  sigma.scale0=1,gamma.shape0=1.5,quiet=TRUE)
fit.0.5.vague <- bglmm(differences~1,
  sigma.scale.int=.5,gamma.shape.int=1000,
  sigma.scale0=1,gamma.shape0=1.5,quiet=TRUE)
fit.0.8.vague <- bglmm(differences~1,
  sigma.scale.int=.8,gamma.shape.int=1000,
  sigma.scale0=1,gamma.shape0=1.5,quiet=TRUE)

To illustrate the effect of being more certain about the standard deviation of the data, we also fit a model with a larger shape.

fit.0.5.notsovague <- bglmm(differences~1,
  sigma.scale.int=.5,gamma.shape.int=1000,
  sigma.scale0=1,gamma.shape0=6,quiet=TRUE)

Finally, we fit three models that use a prior mean on the differences of .5, and use prior standard deviations of .001, .1, and .5.

fit.5.0.vague <- bglmm(differences-.5~1,
  sigma.scale.int=.001,gamma.shape.int=1000,
  sigma.scale0=1,gamma.shape0=1.5,quiet=TRUE)
fit.5.1.vague <- bglmm(differences-.5~1,
  sigma.scale.int=.1,gamma.shape.int=1000,
  sigma.scale0=1,gamma.shape0=1.5,quiet=TRUE)
fit.5.5.vague <- bglmm(differences-.5~1,
  sigma.scale.int=.5,gamma.shape.int=1000,
  sigma.scale0=1,gamma.shape0=1.5,quiet=TRUE)

6.2.2 Model Assessment

One risk with Bayesian analysis via MCMC is that the Markov chains might not be behaving properly. Here are three good, but not foolproof, ways to check your fit. If these checks say that the chain is behaving badly, then it is behaving badly. However, a chain can be behaving badly without these methods catching the problem.

First, bglmm runs multiple chains (four by default). If you plot these traces, the four chains should all be on top of each other, and the plot should look like a broad, blurry band without any patterns. Uninteresting is good for trace plots. Here we just look at the intercept, but you should look at all the parameters. This one looks fine.

plot(fit.0.5.vague,plottype="trace",pars="(Intercept)")

Second, successive values of the Markov chains are correlated. Typically, close values are more highly correlated and the correlation eventually dies off to zero. We want it to go to zero quickly and stay near zero for every parameter. Again, this looks fine.

plot(fit.0.5.vague,plottype="autocor",pars="(Intercept)")
Warning: Ignoring unknown parameters: fun.y
No summary function supplied, defaulting to `mean_se()`

Third, the summary() method for bglmm objects includes a column labeled Rhat. You want the Rhat values to be small, which means down near 1. If they’re bigger than 1.1 or 1.2, you may have problems. Here, things are fine.

summary(fit.0.5.vague)[,"Rhat"]
    (Intercept)          sigma0 sigma.Intercept 
              1               1               1 
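The idea behind Rhat can be sketched by comparing between-chain and within-chain variability. The sketch below is a simplified version of the potential scale reduction factor; the split-Rhat that rstan actually reports is more refined:

```r
# Simplified sketch of Rhat: compare the between-chain variance B to the
# average within-chain variance W; values near 1 mean the chains agree.
rhat.simple <- function(chains) {     # chains: n x m matrix, one column per chain
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # average within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}
set.seed(20210330)
rhat.simple(matrix(rnorm(4000), ncol = 4))   # well-mixed "chains": near 1
```

If the chains had settled in different places, B would dominate W and this ratio would sit well above 1.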

6.2.3 Model Results

We’re eager to see the results, and now that we have assessed the chains as working well (well, at least for fit.0.5.vague and the intercept), we can look at the results. The simplest way to see the results of the fit is to use the summary() function. For each parameter in the model, it produces

  • The posterior mean for the parameter.
  • The posterior standard error for the posterior mean.
  • The posterior standard deviation of the parameter.
  • The 2.5, 25, 50, 75, and 97.5 percentiles of the posterior distribution (these can be reset via optional arguments).
  • The effective sample size of the chain (more is better, and closer to 4,000 is good for default chain sizes).
  • The Rhat value.

Note: the posterior standard error is essentially a numerical issue. If you run the chains longer and longer, this standard error will decrease to zero. However, the posterior standard deviation, which measures the true statistical variability in the posterior, does not decrease as you increase the size of the chains.
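A small sketch of that distinction, using simulated stand-in draws (the mean and sd values here are invented for illustration, roughly mimicking an intercept posterior; they are not bglmm output):

```r
# The posterior sd stays put as the chain grows, while the standard error
# of the posterior mean shrinks like sd / sqrt(number of draws).
set.seed(20210330)
short.chain <- rnorm(1000,  mean = 0.16, sd = 0.115)   # stand-in posterior draws
long.chain  <- rnorm(16000, mean = 0.16, sd = 0.115)
c(sd = sd(short.chain), se.mean = sd(short.chain) / sqrt(length(short.chain)))
c(sd = sd(long.chain),  se.mean = sd(long.chain)  / sqrt(length(long.chain)))
```

Both sds land near .115, but the standard error of the mean drops by about a factor of 4 with the 16-fold longer chain.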

Here we look at a few of the columns for the different models. Watch how the posterior intervals for the intercept and error standard deviation (sigma0) change, or don’t change, across the models. Recall that the intervals for the intercept in the models where we subtracted .5 from the data need to have that .5 added back.

summary(fit.0.0.vague)[,c("mean","sd","2.5%","97.5%","Rhat")]
                    mean       sd      2.5%   97.5% Rhat
(Intercept)     6.09e-06 0.000990 -0.001950 0.00194    1
sigma0          6.78e-01 0.091500  0.529000 0.89400    1
sigma.Intercept 1.00e-03 0.000031  0.000943 0.00106    1
summary(fit.0.1.vague)[,c("mean","sd","2.5%","97.5%","Rhat")]
                  mean      sd    2.5% 97.5%  Rhat
(Intercept)     0.0701 0.07670 -0.0793 0.221 1.000
sigma0          0.6690 0.08840  0.5260 0.882 1.000
sigma.Intercept 0.1000 0.00313  0.0940 0.106 0.999
summary(fit.0.5.vague)[,c("mean","sd","2.5%","97.5%","Rhat")]
                 mean     sd    2.5% 97.5% Rhat
(Intercept)     0.164 0.1150 -0.0616 0.393    1
sigma0          0.664 0.0896  0.5140 0.865    1
sigma.Intercept 0.500 0.0156  0.4680 0.531    1
summary(fit.0.5.notsovague)[,c("mean","sd","2.5%","97.5%","Rhat")]
                 mean     sd    2.5% 97.5%  Rhat
(Intercept)     0.165 0.1210 -0.0763 0.397 1.000
sigma0          0.687 0.0918  0.5380 0.890 0.999
sigma.Intercept 0.500 0.0157  0.4690 0.531 1.000
summary(fit.0.8.vague)[,c("mean","sd","2.5%","97.5%","Rhat")]
                 mean     sd    2.5% 97.5% Rhat
(Intercept)     0.173 0.1250 -0.0675 0.413    1
sigma0          0.666 0.0891  0.5180 0.864    1
sigma.Intercept 0.799 0.0249  0.7520 0.848    1
summary(fit.5.5.vague)[,c("mean","sd","2.5%","97.5%","Rhat")] #ignore intercept
                  mean     sd   2.5%   97.5% Rhat
(Intercept)     -0.307 0.1200 -0.548 -0.0732    1
sigma0           0.667 0.0909  0.516  0.8690    1
sigma.Intercept  0.500 0.0155  0.470  0.5310    1
summary(fit.5.1.vague)[,c("mean","sd","2.5%","97.5%","Rhat")] #ignore intercept
                  mean     sd    2.5%  97.5%  Rhat
(Intercept)     -0.126 0.0827 -0.2860 0.0396 1.000
sigma0           0.695 0.0965  0.5360 0.9200 1.000
sigma.Intercept  0.100 0.0032  0.0941 0.1060 0.999
summary(fit.5.0.vague)[,c("mean","sd","2.5%","97.5%","Rhat")] #ignore intercept
                     mean       sd      2.5%   97.5% Rhat
(Intercept)     -0.000018 1.02e-03 -0.002010 0.00197    1
sigma0           0.735000 9.77e-02  0.578000 0.95700    1
sigma.Intercept  0.000999 3.14e-05  0.000937 0.00106    1
summary(fit.5.5.vague)["(Intercept)",c("mean","2.5%","97.5%")]+.5
   mean    2.5%   97.5% 
 0.1930 -0.0480  0.4268 
summary(fit.5.1.vague)["(Intercept)",c("mean","2.5%","97.5%")]+.5
  mean   2.5%  97.5% 
0.3740 0.2140 0.5396 
summary(fit.5.0.vague)["(Intercept)",c("mean","2.5%","97.5%")]+.5
    mean     2.5%    97.5% 
0.499982 0.497990 0.501970 

Let’s tabulate those intervals so that they can be more easily compared.

Model           Lower  Upper
0.0.vague        0.00   0.00
0.1.vague       -0.08   0.22
0.5.vague       -0.06   0.39
0.5.notsovague  -0.08   0.40
0.8.vague       -0.07   0.41
5.5.vague       -0.05   0.43
5.1.vague        0.21   0.54
5.0.vague        0.50   0.50

Looking at the interval estimates above, we see that the precise priors (at 0 or .5) give us precise posterior estimates of the mean at those same places; this was expected. The models with prior standard deviation .1 for the intercept pull the values toward the prior mean (the upper end is pulled down for the 0 prior mean, and the lower end is pulled up for the .5 prior mean). Once you get to prior standard deviations for the mean of .5 or .8, it really doesn’t matter what the prior mean is: the posterior intervals are (roughly) the same. Changing the certainty about the data standard deviation also made essentially no change in the posterior interval.

6.2.4 Model Selection

We have looked at a lot of models. Ordinarily we wouldn’t be comparing models with lots of different prior parameters, because we would know our prior distributions and use them. Again, we are doing it here to illustrate what happens as those prior parameters change.

That said, we can still assess which models fit the data better. One way to do that is via the LOOIC, which we can get via the function loo(). Smaller values of LOOIC are better. All except fit.5.1.vague and fit.5.0.vague are roughly equivalent, but those last two are larger.

loo(fit.0.0.vague)
[1] 62.12145
loo(fit.0.1.vague)
[1] 61.33113
loo(fit.0.5.vague)
[1] 61.78026
loo(fit.0.5.notsovague)
[1] 61.66798
loo(fit.0.8.vague)
[1] 61.89743
loo(fit.5.5.vague)
[1] 61.97736
loo(fit.5.1.vague)
[1] 63.32825
loo(fit.5.0.vague)
[1] 66.68975

Bayes factors are a second way to assess models, but be aware that Bayes factors can be sensitive to priors. Let’s compare all these models to the one with a point prior at 0. According to the usual guidelines for judging Bayes factors, there is no differentiation between the point prior at 0 model and any of the others except for the last two, where the point prior is positively preferred.

bayes_factor(fit.0.0.vague,fit.0.1.vague,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 0.82632
bayes_factor(fit.0.0.vague,fit.0.5.vague,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 1.54156
bayes_factor(fit.0.0.vague,fit.0.5.notsovague,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 1.04228
bayes_factor(fit.0.0.vague,fit.0.8.vague,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 2.31803
bayes_factor(fit.0.0.vague,fit.5.5.vague,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 1.76297
bayes_factor(fit.0.0.vague,fit.5.1.vague,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 3.68983
bayes_factor(fit.0.0.vague,fit.5.0.vague,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 10.47117