UNIVERSITY OF MINNESOTA   School of Statistics


Statistics 8933 (Geyer) Spring 1998


Calling C from S-PLUS

MCMC works best when the code is as fast as possible. Then you know that what you can do is not limited simply by an inefficient implementation.

One way to achieve this is to write the algorithm in C or some other language that compiles to machine language. It is not necessary to write the whole thing in C, only the part of the code that takes all the computing time (the "inner loop"). Thus MCMC works fine when the inner loop is in C and the rest is S-PLUS. That's what this section is about.

Ignoring MCMC for a second, how does one call C from S-PLUS? Here is a very simple C function

void foo(long *nin, double *x)
{
	int n = nin[0];

	int i;

	for (i=0; i<n; i++)
		x[i] = x[i] * x[i];
}
which does nothing but square the elements of the n-vector x.

Notice two properties that are required for a function callable from S-PLUS: all the arguments are pointers (S-PLUS passes everything by reference, so even the length arrives as long *nin), and the return type is void (results are passed back by modifying the arguments in place, here x).

Put the code in a file foo.c and compile it to a .o file. On the HP workstations the command is (for dynamic loading on the SGI look here)
cc -Aa -c foo.c
Now the code can be called from S-PLUS by (for example)
print(dyn.load2("foo.o"))

.C("foo", n=as.integer(5), x=as.double(rnorm(5)))
The function dyn.load2 may need to be replaced by dyn.load or dyn.load.shared on your system. Check the S-PLUS help for dynamic loading.

The function .C takes nargs + 1 arguments if foo takes nargs arguments. The first argument is the name of the C function to be called, and the rest are the arguments to the function call. The arguments generally have to be coerced to the proper type using as.integer, as.double, and similar functions. The function .C returns a list whose elements are the arguments to the function call (in this case n and x). The values are the values after the call (as changed by the function foo). For more on this subject see the S-PLUS help for .C or the S-PLUS Programmer's Manual.

As we have just seen, it is not necessary, but it is considered user-friendly, to wrap the call to .C in an S-PLUS function, like

foo <- function(x) {
	if (!is.numeric(x))
		stop("argument x must be numeric")
	.C("foo",
		n=as.integer(length(x)),
		x=as.double(x))
}
This has two benefits: the argument checking and type coercion are done once, inside the wrapper, and users can call foo(x) like any other S-PLUS function without needing to know about .C at all.

The Default MCMC Algorithm

First download the code and compile the C functions. The code is in the files
fun.batch
fun.logit-setup
fun.metrop
h-dumb.c
h-logit.c
metrop.c
The first two are S-PLUS, the other three are C. This C is a bit more complicated since it calls some S-PLUS functions (for random numbers and error exits). Thus it must be compiled using the Splus COMPILE command rather than plain old cc. On the HP workstations use (for dynamic loading using S-PLUS on the SGI look here)

Splus COMPILE "CFLAGS = -Aa -O" h-logit.c metrop.c
to compile (this is outside S-PLUS in UNIX). You may need different CFLAGS on another system.

The philosophy of this code is that the function in metrop.c, which does the housekeeping and the looping for the Metropolis algorithm, never needs to be rewritten. It is the part of the code that is the same for all problems. It needs a C function named h that calculates the unnormalized density for the problem. The prototype for this function is

extern void h(long *d, double *x, double *prob);
where d is the dimension of the state, x is the current state (a vector of length *d), and prob is where h writes the value of the unnormalized density at x. Since h takes no arguments other than the state, everything else the unnormalized density depends on must be passed in by another function, in this code called hsetup (but you can call it anything you want), which must be called once before the first call to h.

The Kyphosis Example

First load everything into S-PLUS
print(dyn.load2(c("h-logit.o","metrop.o")))
source("fun.metrop")
source("fun.logit-setup")
Get the response and covariate matrix out of the kyphosis data frame, and do the setup
y <- as.numeric(kyphosis["Kyphosis"] == "present")
x <- cbind(1, kyphosis[c("Age", "Start", "Number")])
logit.setup(x, y)
Note that we had to add a column of ones to the covariate matrix to get an intercept term. This version of the code doesn't grok "intercept". It's a bit crude.

Now we are ready to do some MCMC. The S-PLUS function metrop has the following call

metrop(nsamp, x, nspac = 1, sigma = 1)
where nsamp is the number of sample points to output, x is the initial state of the chain, nspac is the spacing between output points, and sigma is the scale of the Metropolis proposal. The sampler runs for nsamp times nspac iterations and outputs nsamp sample points.

A good starting point is the MLE (posterior mode, since we are using a flat prior). We have no idea what to set sigma to at the start, so we might as well use sigma = 1 (the default).

gout <- glm(Kyphosis ~ Age + Start + Number, family=binomial, data=kyphosis)
out <- metrop(10000, gout$coef, sigma=1)
The result of the MCMC run is a list (here assigned to out) with components xout (a matrix with nsamp rows, one row per sample point), x (the state of the chain at the end of the run), arat (the acceptance rate of the Metropolis proposals), and nsamp. Looking at out$arat we see that we need to decrease sigma (when the acceptance rate is too low, sigma is too big and vice versa). After several experiments we settle on sigma = 0.02.
out <- metrop(10000, out$x, sigma=0.02, nspac=100)
Note that this starts this run where the last run stopped. We can now make plots or whatever, for example
plot(out$xout[,1], xlab="iteration", ylab="beta0", pch=".")
plot(out$xout[,1], out$xout[,4], xlab="beta0", ylab="beta3", pch=".")
which make a time-series plot and a scatterplot, respectively.

Batch Means

A function to do batch means is loaded by

source("fun.batch")
Its call is
batch(x, nbatch = 30, warn = TRUE)
where x is a vector, the time series of values of some functional of the Markov chain (for example, one column of out$xout), nbatch is the number of batches, and warn controls whether the function issues warnings. The result is a list with components mean (the grand mean, the MCMC estimate of the expectation), batch.means (the vector of nbatch batch means), and var.clt (an estimate of the variance in the Markov chain central limit theorem, so that sqrt(var.clt / nsamp) is the Monte Carlo standard error of mean). For example
bout <- batch(out$xout[,1])
does the calculation, then
bout$mean + c(-1,1) * qt(.975, length(bout$batch.means) - 1) *
   sqrt(bout$var.clt / out$nsamp)
is an approximate 95% confidence interval for the posterior mean of the first regression coefficient.
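The batch means computation itself is simple enough to sketch in C (a hypothetical re-implementation for illustration, not the code in fun.batch; the name batch_means is invented): split the time series into nbatch consecutive batches, average each batch, and scale the sample variance of the batch means by the batch length to estimate the variance in the central limit theorem.

```c
#include <stdlib.h>

/* batch means sketch: x is a time series of length n, split into
   nbatch consecutive batches of length n / nbatch (n is assumed
   divisible by nbatch for simplicity).  Writes the grand mean to
   *mean and the CLT variance estimate to *var_clt, so that
   sqrt(*var_clt / n) is the Monte Carlo standard error. */
void batch_means(const double *x, long n, long nbatch,
                 double *mean, double *var_clt)
{
	long blen = n / nbatch;
	double *bmean = malloc(nbatch * sizeof(double));
	double grand = 0.0, ss = 0.0;
	long i, k;

	for (k = 0; k < nbatch; k++) {
		double sum = 0.0;
		for (i = 0; i < blen; i++)
			sum += x[k * blen + i];
		bmean[k] = sum / blen;
		grand += bmean[k];
	}
	grand /= nbatch;

	for (k = 0; k < nbatch; k++)
		ss += (bmean[k] - grand) * (bmean[k] - grand);

	*mean = grand;
	/* the batch length times the sample variance of the batch
	   means estimates the variance in the CLT */
	*var_clt = blen * ss / (nbatch - 1);
	free(bmean);
}
```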

Note well! The MCSE

sqrt(bout$var.clt / out$nsamp)
has nothing whatsoever to do with the posterior standard deviation, which is estimated by
sqrt(var(out$xout[,1]))
The MCSE goes to zero as out$nsamp goes to infinity; the posterior SD just converges to its true value.


Questions or comments to: Charles Geyer charlie@stat.umn.edu