R is rotten at iterative algorithms that require loops iterated many times (especially Markov chain Monte Carlo), not as bad as S-PLUS, but still bad. A way to get all the speed advantages of C or Fortran with most of the convenience of R is to write the inner loop in C and call it from R.
This is a very brief introduction. For much more, read the chapters System and foreign language interfaces and The R API: entry points for C code in the book Writing R Extensions available (free) from CRAN.
Here is a very simple C function (downloadable as foo.c).
void foo(int *nin, double *x) { int n = nin[0]; int i; for (i=0; i<n; i++) x[i] = x[i] * x[i]; }
which does nothing but square the elements of the n-vector x.
Notice two properties that are required for a function callable from R
Here is a Fortran function that does the same thing (downloadable as bar.f).
subroutine bar(n, x) integer n double precision x(n) integer i do 100 i = 1, n x(i) = x(i) ** 2 100 continue end
Notice it has the same two properties as the C function.
Put the C example code in a file foo.c and compile it to a shared library. The command
R CMD SHLIB foo.c
in UNIX (outside of R) does this. Now the code can be dynamically loaded into R by doing (in R)
dyn.load("foo.so")
(note: the filename may be foo.sl
on some computer architectures).
The actual call to C from R is made by the R function .C
like this
.C("foo", n=as.integer(5), x=as.double(rnorm(5)))
The function .C
takes nargs + 1 arguments if
the C function being called takes nargs arguments.
The first argument is the name of the C function to be called,
and the rest are the arguments to the function call. The arguments
generally have to be coerced to the proper type using
as.integer
, as.double
, and similar functions.
The function .C
returns a list whose elements are the arguments
to the function call (in this case n and x). The values
are the values after the call (as changed by the function foo
).
For more on this subject see the R help for the .C function.
The Fortran is similar. In UNIX do
R CMD SHLIB bar.f
and in R do
dyn.load("bar.so") .Fortran("bar", n=as.integer(5), x=as.double(rnorm(5)))
We just change the call from .C
to .Fortran
.
It is considered user-friendly
to wrap the call to .C
in an R function, like
foo <- function(x) { if (!is.numeric(x)) stop("argument x must be numeric") out <- .C("foo", n=as.integer(length(x)), x=as.double(x)) return(out$x) }
This has three benefits
Now the call is much simpler, just
foo(x)
The R random number generators and also all the other functions
for probability distributions (not only rnorm
but also
dnorm
, pnorm
, and qnorm
and
so forth for other distributions) are callable from C (but not Fortran,
see however the section of Writing R Extensions about Calling C from FORTRAN and vice versa on how to write
Fortran calls to C functions that call the R random number generators).
Here is an example (downloadable as baz.c).
#include <R.h> #include <Rmath.h> void baz(int *nin, double *sin, double *tin, double *x) { int n = nin[0]; double s = sin[0]; double t = tin[0]; int i; GetRNGstate(); for (i = 0; i < n; i++) x[i] = rbeta(s, t); PutRNGstate(); }
Also the statements
GetRNGstate(); PutRNGstate();
must be done before and after (respectively)
all calls to random number functions (in this example rbeta
).
With that done, the random numbers work in C just like in R.
set.seed(42) rbeta(5, 1.5, 2.5) dyn.load("baz.so") baz <- function(n, s, t) { .C("baz", n = as.integer(n), s = as.double(s), t = as.double(t), x = double(n))$x } set.seed(42) baz(5, 1.5, 2.5)
(note, exactly the same random numbers when using the same seeds).
And how does one know how to call the C function rbeta
(which is not the R function of the same name, rather
the C function that the R function calls)? Unfortunately there
is no API documentation, so one must
RTFS.
On our computers, the source code happens to be (when this was written)
in the directory /APPS/src/R-2.6.2
.
For example, the source code for rbeta
is in
/APPS/src/R-2.6.2/src/nmath/rbeta.c
, but we may move it
(the directory name changes with each version), and if you
aren't here, you will have to figure out where it is locally
or get it from
CRAN.
Use the R error
or warning
function.
They work just like printf
but produce an R error
or warning as the case may be.
Here is an example (downloadable as qux.c).
#include <R.h> void qux(int *nin, double *x) { int n = nin[0]; int i; if (n < 1) error("arg n was %d, must be positive\n", n); for (i = 0; i < n; i++) x[i] = x[i] * x[i]; }
And try it out.
dyn.load("qux.so") .C("qux", n = as.integer(0), x = as.double(1:4))
Questions or comments to: Charles Geyer charlie@stat.umn.edu