Introduction to Rcpp and C

November 7, 2014

Preperation: packages

C++: library(Rcpp) and library(RcppArmadillo)
C: use .C() to run C functions. .C is a built-in R function

Preperation: compilers

UNIX alike: Mac OS and Linux
Mac OS default compiler: Apple LLVM/clang (Install Xcode and command line tool first!)
Install GCC on Mac: use Homebrew
Linux default compiler: gcc

Type gcc --version in terminal

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix

Preperation: configure R and compiler

R stores configuration in ~/.R/Makevars
mkdir ~/.R/
touch ~/.R/Makevars
vi ~/.R/Makevars to edit the configuration file
```
CC=clang
CXX=clang++
```
or
```
CC=gcc
CXX=g++
```

R and C

Write C functions
Compile .c file into .so file
Load .so file in R
Call C function by .C()

R and C: C code

Sum of 1 to \(x\)
Main function must be void type

Input must be pointer type

void sum_c(int *x)
{
  int tmp=0;
  for(int i = 0; i<x[0];++i)
  {
    tmp += i;
  }
  x[0]=tmp;
}

R and C: compile

Build shared library for dynamic loading
R CMD SHLIB sum_c.c
Add flags to the command as necessary
e.g., add search path for header files: R CMD SHLIB -I/usr/local/include -lgsl C code contains GSL header

R and C: Gibbs sampler

Sample from: \[\text{exp}(-(x^2+x^2y^2+y^2))\]
Then: \(f(x|y)\propto N(0, \frac{1}{2(1+y^2)})\), \(f(y|x)\propto N(0, \frac{1}{2(1+x^2)})\)
Gibbs sampling

R and C: Gibbs sampler

#include <math.h>
#include <time.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
void gibbs(double * n_sim, double * mat){
    double can[2] = {0,0};
    int i;
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937); /*Define random number t*/
    long seed = time(NULL); /*Define a seed for r*/
    gsl_rng_set(r,seed); /*Initiate the random number generator with seed*/   
    for (i = 0; i <= *n_sim; i++) {            
        can[0] = gsl_ran_gaussian (r, 1.0/(2*(can[1]*can[1]+1)));
        can[1] = gsl_ran_gaussian (r, 1.0/(2*(can[0]*can[0]+1)));     
        mat[2*i] =  can[0];
        mat[2*i+1] = can[1];          
    }
}

R and C: Call Gibbs sampler in R

Generate .so file: R CMD SHLIB -I/usr/local/include -lgsl gibbs.c
dyn.load("gibbs.so")
wrapper function:

gibbs.c <- function(n_sim){
  mat <- vector(length=2*n_sim)
  GIBBScout <- .C("gibbs", n_sim=as.double(n_sim), mat=as.double(mat))
  return(matrix(GIBBScout$mat,ncol=2,byrow=T))
}

run function: sim.gibbs<- gibbs.c(1e5)

Rcpp: C++ code

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]] in front of the main function
Input does not need to be pointers

Rcpp example: sum of a vector

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum_cpp(NumericVector x) {
  int n = x.size();
  double total = 0;
  for(int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total;
}

In R:

sourceCpp("sum.cpp")
sum_cpp(c(1,3,4,5,7,8,1,2,3))

[1] 34

Rcpp: Gibbs sampler

#include <Rcpp.h>
#include <math.h>

using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::NumericMatrix gibbs(int n_sim) {
    double can[2] = {0,0};
    Rcpp::NumericMatrix mat(n_sim, 2);

    for (int i = 0; i < n_sim; i++) {      
        can[0] = ::Rf_rnorm(0, 1.0/(2*(can[1]*can[1]+1)));
        can[1] = ::Rf_rnorm(0, 1.0/(2*(can[0]*can[0]+1)));
        mat(i,0) =  can[0];
        mat(i,1) = can[1];
    }
    return(mat);
}

Rcpp benefits

Input can be matrix/vectors while C only supports vectors
Output can be anything, including lists, e.g.,

return(List::create(Named("x") = x,Named("y") = y))

Call R function using ::Rf_functionname, while in C, one needs to do #include<Rmath.h>
No need to run R CMD SHLIB, sourceCpp() does everything for you
Call the same function name in R directly. No need to run .C

C++/C function vs R function

Speed, speed, speed…

Let's compare the Cpp Gibbs sampler and R Gibbs sampler:

gibbs_R = function(nsim){
  mat = matrix(0,ncol=2,nrow=nsim)
    can=c(0,0)
    for (i in 1:nsim){
        can[1]=rnorm(1,mean=0,sd=sqrt(1.0/(2*(can[2]^2+1))))
        can[2]=rnorm(1,mean=0,sd=sqrt(1.0/(2*(can[1]^2+1))))
        mat[i,1]=can[1]
        mat[i,2]=can[2]
    }
    return(mat)
}

Compare Speed

sourceCpp('gibbs.cpp')

library(rbenchmark)
benchmark(replications=rep(100, 1),
               gibbs_R(1e4),
               gibbs(1e4),
               columns=c('test', 'elapsed','replications'))

            test elapsed replications
1 gibbs_R(10000)  14.255          100
2   gibbs(10000)   0.230          100

Questions?