Chapter 11 Example 2.6

Let’s examine a variety of possible Bayesian analyses of the differences in the RunStitch data. The variety of possible prior distributions produces the variety of possible analyses. This is both the strength, and to many the weakness, of the Bayesian approach.

11.1 Prior distributions

We begin with a prior for \(\sigma_0\). We use sigma0.mean=.75 and sigma0.shape=3; this puts 99% of the prior probability between .08 and 2.32. Given that the sample standard deviation of the differences is .64, this is an adequately broad prior. We will also show an example using sigma0.shape=20; this puts 99% of the prior probability between .39 and 1.25.
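Where do those endpoints come from? Assuming the prior on \(\sigma_0\) is a gamma distribution parameterized by its mean and shape (so its rate is shape/mean, matching the sigma0.mean and sigma0.shape arguments), we can check the quoted 99% ranges with qgamma():

> # a gamma with mean m and shape a has rate a/m
> qgamma(c(.005, .995), shape=3, rate=3/.75)    # roughly .08 and 2.32
> qgamma(c(.005, .995), shape=20, rate=20/.75)  # roughly .39 and 1.25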

A model of the form differences ~ 0 has a mean of precisely 0; there are no parameters beyond \(\sigma_0\), and no further priors are needed. Similarly, a model of the form differences - mu ~ 0 assumes that the shifted data have a mean of precisely 0.

Generally, we will use a model of the form differences ~ 1; this fits a mean value to the differences. The prior distribution for the mean is normal with mean 0 and standard deviation \(\sigma\), and the second level prior for \(\sigma\) is a gamma distribution. We specify the expected value for this gamma via sigma.mean and its shape via sigma.shape.

We can specify \(\sigma\) (almost) exactly by using a very large value of sigma.shape.int. (Note, bglmm() lets you set prior parameters for the intercept separately from those for other effects.) For example, with a mean of 1 and a shape of 20,000, 99% of the probability will be between .98 and 1.02. More typically, we use a smaller value of sigma.shape and let the data help us decide the appropriate value for \(\sigma\).
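The same sketch verifies the shape 20,000 claim, again assuming the mean/shape gamma parameterization:

> qgamma(c(.005, .995), shape=20000, rate=20000/1)  # roughly .98 and 1.02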

Mr. Skeptical thought that the mean difference should be zero, but he might have various degrees of certainty about that claim. That is, he might express his claim as a prior for the mean difference that is normal with mean 0, with a smaller standard deviation if he is more certain and a larger standard deviation if he is less certain. Similarly, Mr. Enthusiastic has a prior distribution for the mean of the differences that has mean .5, with different standard deviations depending on his degree of certainty.

11.2 Fitting the models

Load the RunStitch data from the package and create differences between Standard and Ergonomic.

> library(cfcdae)
> data(RunStitch)
> differences <- RunStitch[,"Standard"] - RunStitch[,"Ergonomic"]
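It is worth computing the sample mean and standard deviation of the differences now, as the discussion below refers to them (about .175 and .64, respectively):

> mean(differences)  # about .175
> sd(differences)    # about .64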

We first fit ten models, five each for prior mean 0 and prior mean .5, with the models differing in their levels of certainty. We use quiet=TRUE to suppress the voluminous progress information that would otherwise be printed. The order below begins at prior mean precisely 0, then gradually increases the prior uncertainty, then switches to prior mean .5 (actually, prior mean 0 for data shifted by .5) still with considerable uncertainty, then gradually decreases the prior uncertainty until the prior is precisely .5.

> library(bcfcdae)
> library(rstan)
> set.seed(20220425)
> #  mean precisely 0
> fit.0.0 <- bglmm(differences ~ 0, sigma0.mean=.75, sigma0.shape=3, 
+                  quiet=TRUE,seed=10004)
Need to compile the model.
Trying to compile a simple C file
> # prior mean 0 with sd .1 (ish)
> fit.0.1 <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=3, 
+                  sigma.mean.int=.1, sigma.shape.int=20000,
+                  quiet=TRUE,seed=10005)
> # prior mean 0 with sd .3 (ish)
> fit.0.3 <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=3, 
+                  sigma.mean.int=.3, sigma.shape.int=20000, 
+                  quiet=TRUE,seed=10006)
> # prior mean 0 with sd .5 (ish)
> fit.0.5 <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=3, 
+                  sigma.mean.int=.5, sigma.shape.int=20000,
+                  quiet=TRUE,seed=10007)
> # prior mean 0 with sd .8 (ish)
> fit.0.8 <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=3, 
+                  sigma.mean.int=.8, sigma.shape.int=20000,
+                  quiet=TRUE,seed=10008)
> # prior mean .5 with sd .8 (ish)
> fit.5.8 <- bglmm(differences-.5 ~ 1, sigma0.mean=.75, sigma0.shape=3, 
+                  sigma.mean.int=.8, sigma.shape.int=20000,
+                  quiet=TRUE,seed=10009)
> # prior mean .5 with sd .5 (ish)
> fit.5.5 <- bglmm(differences-.5 ~ 1, sigma0.mean=.75, sigma0.shape=3, 
+                  sigma.mean.int=.5, sigma.shape.int=20000,
+                  quiet=TRUE,seed=10010)
> # prior mean .5 with sd .3 (ish)
> fit.5.3 <- bglmm(differences-.5 ~ 1, sigma0.mean=.75, sigma0.shape=3, 
+                  sigma.mean.int=.3, sigma.shape.int=20000,
+                  quiet=TRUE,seed=10011)
> # prior mean .5 with sd .1 (ish)
> fit.5.1 <- bglmm(differences-.5 ~ 1, sigma0.mean=.75, sigma0.shape=3, 
+                  sigma.mean.int=.1, sigma.shape.int=20000,
+                  quiet=TRUE,seed=10012)
> # prior mean precisely .5 
> fit.5.0 <- bglmm(differences-.5 ~ 0, sigma0.mean=.75, sigma0.shape=3, 
+                  quiet=TRUE,seed=10013)

Note that we had to compile twice. That’s because fitting a mean-zero model and fitting a model with mean predictors use different underlying code, each of which needs its own compilation. However, once each has been compiled, we do not need to recompile it in this run.

11.2.1 MCMC Assessment

One risk with Bayesian analysis via MCMC is that the Markov chains might not be behaving properly. Begin by looking at n_eff and Rhat:
> summary(fit.0.5)[,c("n_eff","Rhat")]
                n_eff Rhat
(Intercept)      3110    1
sigma0           2530    1
sigma.Intercept  4030    1
The effective sample sizes for the mean and \(\sigma_0\) are about 80% of what is possible, which is OK. Rhat of 1 indicates no problems detected. Most of the other fits are similar, but fit.5.0 (certainty on a mean of .5) shows a low effective sample size.
> summary(fit.5.0)[,c("n_eff","Rhat")]
n_eff  Rhat 
 1130     1 
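Rather than inspecting the ten fits one at a time, we can scan them in a loop. This is just a sketch; it assumes, as in the displays above, that summary() returns a matrix with n_eff and Rhat columns (a one-row matrix drops to a named vector, which min() and max() still handle):

> fits <- list(fit.0.0, fit.0.1, fit.0.3, fit.0.5, fit.0.8,
+              fit.5.8, fit.5.5, fit.5.3, fit.5.1, fit.5.0)
> # smallest n_eff and largest Rhat in each fit
> sapply(fits, function(f) {
+   s <- summary(f)
+   c(min.n_eff = min(s[,"n_eff"]), max.Rhat = max(s[,"Rhat"]))
+ })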
Next plot the traces for each variable. These should look very blah with minimal patterns and no separation between the multiple chains.
> plot(fit.0.5,"trace")
Figure 11.1: Trace plots for the N(0,.5) prior

These look fine. We should do this for any bglmm() model we fit, but I’ll spare you all of the others except for fit.5.0.
> plot(fit.5.0,"trace")
Figure 11.2: Trace plots for the N(.5,0) prior

The traces all overlap, which is good, but the traces do not mix around the sample space as quickly as what we see for the other models. This is the visual analogue of a lower effective sample size.

Finally, we should look at the autocorrelation of the Markov chain iterates. Ideally, they should decay to 0 quickly and stay at 0.
> plot(fit.0.5,"autocor")
Figure 11.3: Autocorrelation plots for the N(0,.5) prior

Again, this looks fine. Compare this with the autocorrelation for the prior with certainty at .5; the autocorrelations drop to zero adequately quickly, but not as quickly as for the other priors.
> plot(fit.5.0,"autocor")
Figure 11.4: Autocorrelation plots for the N(.5,0) prior

11.2.2 Model Results

We’re eager to see the results, and now that we have assessed the chains as working well, we can do so. The simplest way to see the results of a fit is the summary() function. For each parameter in the model, it produces

  • The posterior mean for the parameter.
  • The posterior standard error for the posterior mean.
  • The posterior standard deviation of the parameter.
  • The 2.5, 25, 50, 75, and 97.5 percentiles of the posterior distribution (these can be reset via optional arguments).
  • The effective sample size of the chain (more is better, and closer to 4,000 is good for default chain sizes).
  • The Rhat value.

Note: the posterior standard error is essentially a measure of numerical (Monte Carlo) error. If you run the chains longer and longer, this standard error will decrease toward zero. However, the posterior standard deviation, which measures the true statistical variability in the posterior, does not decrease as you increase the length of the chains.
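For rstan-based summaries, the relationship between the two is simple: se_mean is the posterior standard deviation divided by the square root of the effective sample size. For example, plugging in the sd and n_eff shown for fit.0.5.s in the next section recovers its se_mean:

> 0.123 / sqrt(3390)  # about 0.0021, the se_mean reported for fit.0.5.s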

Here we look at a few of the columns for the different models, and we combine it all into a table with nice row labels. Note that we need to add .5 back into the results for the .5 prior mean, because we were actually fitting shifted data and we need to shift the results back.
> all.out <- rbind(
+   "mean 0, sd 0"=c(0,0,0),
+   "mean 0, sd .1"=summary(fit.0.1,pars="(Intercept)")[,c("mean","2.5%","97.5%")],
+   "mean 0, sd .3"=summary(fit.0.3,pars="(Intercept)")[,c("mean","2.5%","97.5%")],
+   "mean 0, sd .5"=summary(fit.0.5,pars="(Intercept)")[,c("mean","2.5%","97.5%")],
+   "mean 0, sd .8"=summary(fit.0.8,pars="(Intercept)")[,c("mean","2.5%","97.5%")],
+   "mean .5, sd .8"=summary(fit.5.8,pars="(Intercept)")[,c("mean","2.5%","97.5%")]+.5,
+   "mean .5, sd .5"=summary(fit.5.5,pars="(Intercept)")[,c("mean","2.5%","97.5%")]+.5,
+   "mean .5, sd .3"=summary(fit.5.3,pars="(Intercept)")[,c("mean","2.5%","97.5%")]+.5,
+   "mean .5, sd .1"=summary(fit.5.1,pars="(Intercept)")[,c("mean","2.5%","97.5%")]+.5,
+   "mean .5, sd 0"=c(.5,.5,.5)
+ )
> knitr::kable(all.out)
                  mean    2.5%  97.5%
mean 0, sd 0    0.0000  0.0000 0.0000
mean 0, sd .1   0.0712 -0.0829 0.2220
mean 0, sd .3   0.1520 -0.0763 0.3680
mean 0, sd .5   0.1670 -0.0696 0.4080
mean 0, sd .8   0.1720 -0.0674 0.4050
mean .5, sd .8  0.1810 -0.0540 0.4121
mean .5, sd .5  0.1940 -0.0440 0.4331
mean .5, sd .3  0.2210  0.0030 0.4510
mean .5, sd .1  0.3690  0.2150 0.5261
mean .5, sd 0   0.5000  0.5000 0.5000

See how the posterior intervals for the intercept gradually creep higher as we use priors that put less and less probability near 0. Looking at the interval estimates above, we see that the precise priors (at 0 or .5) give us precise posterior estimates at exactly the prior value; this was expected. The models with prior standard deviation .1 for the intercept pull the values toward the prior mean (upper end pulled down for the 0 prior mean, lower end pulled up for the .5 prior mean). Once the prior standard deviation for the mean reaches .5 or .8, it really doesn’t matter whether we use a prior mean of 0 or .5; the posterior intervals are roughly the same. Notice, however, that the sample mean of .175 is not even included in the posterior intervals for prior mean .5 with standard deviation .1 or 0. This is the effect of a narrow prior pulling the posterior toward it.
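The pulling is easy to see in the conjugate normal calculation. If, as a simplification, we treat \(\sigma_0\) as known, then combining a normal prior with mean \(\mu_p\) and standard deviation \(\sigma\) with \(n\) observations having sample mean \(\bar{y}\) gives a posterior mean that is a precision-weighted average:
\[
E(\mu \mid y) = \frac{(n/\sigma_0^2)\,\bar{y} + (1/\sigma^2)\,\mu_p}{n/\sigma_0^2 + 1/\sigma^2} .
\]
A small \(\sigma\) puts nearly all the weight on \(\mu_p\), while a large \(\sigma\) lets \(\bar{y}\) dominate.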

11.2.3 Model Selection

We have looked at a lot of models. Ordinarily we wouldn’t be comparing models with lots of different prior parameters, because we would know our prior distributions and use them. We are doing it here to illustrate the sensitivity of the results to the prior parameters.

That said, we can still assess which models fit the data better. One way to do that is via the LOOIC, which we can get via the function loo::loo(). Smaller values of LOOIC are better. Bayes factors are a second way to assess models, but be aware that Bayes factors can be sensitive to priors. Let’s compare all these models to the one with a N(0,.5) prior for the mean.

Looking first at just two models, the mean 0, standard deviation .3 model is slightly preferred by both LOO and the Bayes factor (which is in the “barely worth a mention” range).
> loo::loo(fit.0.3)
[1] 61.6155
> loo::loo(fit.0.5)
Warning: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
[1] 61.99377
> bayes_factor(fit.0.5,fit.0.3,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 0.68824
Now let’s look at all of them, again computing the Bayes factor relative to the model with prior mean zero and standard deviation .5. Again, we combine all the results into a matrix with nicely labeled rows and columns and then make it pretty with kable().
> all.compare <- rbind(
+   "mean 0, sd 0"=c("LOO"=loo::loo(fit.0.0),"BF"=bayes_factor(fit.0.5,fit.0.0,silent=TRUE)$bf),
+   "mean 0, sd .1"=c(loo::loo(fit.0.1),bayes_factor(fit.0.5,fit.0.1,silent=TRUE)$bf),
+   "mean 0, sd .3"=c(loo::loo(fit.0.3),bayes_factor(fit.0.5,fit.0.3,silent=TRUE)$bf),
+   "mean 0, sd .5"=c(loo::loo(fit.0.5),bayes_factor(fit.0.5,fit.0.5,silent=TRUE)$bf),
+   "mean 0, sd .8"=c(loo::loo(fit.0.8),bayes_factor(fit.0.5,fit.0.8,silent=TRUE)$bf),
+   "mean .5, sd .8"=c(loo::loo(fit.5.8),bayes_factor(fit.0.5,fit.5.8,silent=TRUE)$bf),
+   "mean .5, sd .5"=c(loo::loo(fit.5.5),bayes_factor(fit.0.5,fit.5.5,silent=TRUE)$bf),
+   "mean .5, sd .3"=c(loo::loo(fit.5.3),bayes_factor(fit.0.5,fit.5.3,silent=TRUE)$bf),
+   "mean .5, sd .1"=c(loo::loo(fit.5.1),bayes_factor(fit.0.5,fit.5.1,silent=TRUE)$bf),
+   "mean .5, sd 0"=c(loo::loo(fit.5.0),bayes_factor(fit.0.5,fit.5.0,silent=TRUE)$bf)
+ )
Warning: effective sample size cannot be calculated, has been replaced by number
of samples.
Warning: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
Warning: effective sample size cannot be calculated, has been replaced by number
of samples.
> knitr::kable(all.compare)
                     LOO        BF
mean 0, sd 0    61.95804 0.6559599
mean 0, sd .1   61.53054 0.5441041
mean 0, sd .3   61.61550 0.6878520
mean 0, sd .5   61.99377 0.9985064
mean 0, sd .8   61.75867 1.5200559
mean .5, sd .8  61.87140 1.6084977
mean .5, sd .5  61.89258 1.1485162
mean .5, sd .3  61.90773 0.9836125
mean .5, sd .1  63.20871 2.4199692
mean .5, sd 0   66.59450 7.0062285

The models with mean of .5 and sd of .1 or 0 are noticeably worse (less preferred) than the other models, which are basically indistinguishable. This is not good news for Mr. Enthusiastic’s assertion.

To give some warning about Bayes factors, consider again the prior with mean 0 and standard deviation .5 for the mean, but now use a tighter prior for \(\sigma_0\) (same prior expected value but a larger shape). Specifically, use sigma0.shape equal to 20.
> fit.0.5.s <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=20, 
+                  sigma.mean.int=.5, sigma.shape.int=20000,
+                  quiet=TRUE,seed=10014)
> summary(fit.0.5.s,pars=c("(Intercept)"))
             mean se_mean    sd    2.5%    25%   50%   75% 97.5% n_eff Rhat
(Intercept) 0.159  0.0021 0.123 -0.0818 0.0748 0.159 0.241 0.396  3390    1
> loo::loo(fit.0.5.s)
[1] 61.38973
> bayes_factor(fit.0.5,fit.0.5.s,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 0.47990

This model has essentially the same inference for the mean of the differences and essentially the same LOO value, but its Bayes factor is off by a factor of 2. Bayes factors are much more influenced by priors than are LOO values or the actual posterior distribution.