November 7, 2014

Preperation: packages

  • C++: library(Rcpp) and library(RcppArmadillo)
  • C: use .C() to run C functions. .C is a built-in R function

Preperation: compilers

  • UNIX alike: Mac OS and Linux
  • Mac OS default compiler: Apple LLVM/clang (Install Xcode and command line tool first!)
  • Install GCC on Mac: use Homebrew
  • Linux default compiler: gcc
  • Type gcc --version in terminal

    Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
    --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
    Target: x86_64-apple-darwin14.0.0
    Thread model: posix

Preperation: configure R and compiler

  • R stores configuration in ~/.R/Makevars
  • mkdir ~/.R/
  • touch ~/.R/Makevars
  • vi ~/.R/Makevars to edit the configuration file

    CC=clang
    CXX=clang++

    or

    CC=gcc
    CXX=g++

R and C

  • Write C functions
  • Compile .c file into .so file
  • Load .so file in R
  • Call C function by .C()

R and C: C code

  • Sum of 1 to \(x\)
  • Main function must be void type
  • Input must be pointer type

    void sum_c(int *x)
    {
      int tmp=0;
      for(int i = 0; i<x[0];++i)
      {
        tmp += i;
      }
      x[0]=tmp;
    }

R and C: compile

  • Build shared library for dynamic loading
  • R CMD SHLIB sum_c.c
  • Add flags to the command as necessary
  • e.g., add search path for header files: R CMD SHLIB -I/usr/local/include -lgsl C code contains GSL header

R and C: Gibbs sampler

  • Sample from: \[\text{exp}(-(x^2+x^2y^2+y^2))\]
  • Then: \(f(x|y)\propto N(0, \frac{1}{2(1+y^2)})\), \(f(y|x)\propto N(0, \frac{1}{2(1+x^2)})\)
  • Gibbs sampling

R and C: Gibbs sampler

#include <math.h>
#include <time.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
void gibbs(double * n_sim, double * mat){
    double can[2] = {0,0};
    int i;
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937); /*Define random number t*/
    long seed = time(NULL); /*Define a seed for r*/
    gsl_rng_set(r,seed); /*Initiate the random number generator with seed*/   
    for (i = 0; i <= *n_sim; i++) {            
        can[0] = gsl_ran_gaussian (r, 1.0/(2*(can[1]*can[1]+1)));
        can[1] = gsl_ran_gaussian (r, 1.0/(2*(can[0]*can[0]+1)));     
        mat[2*i] =  can[0];
        mat[2*i+1] = can[1];          
    }
}

R and C: Call Gibbs sampler in R

  • Generate .so file: R CMD SHLIB -I/usr/local/include -lgsl gibbs.c
  • dyn.load("gibbs.so")
  • wrapper function:
gibbs.c <- function(n_sim){
  mat <- vector(length=2*n_sim)
  GIBBScout <- .C("gibbs", n_sim=as.double(n_sim), mat=as.double(mat))
  return(matrix(GIBBScout$mat,ncol=2,byrow=T))
}
  • run function: sim.gibbs<- gibbs.c(1e5)

Rcpp: C++ code

  • #include <Rcpp.h>
  • using namespace Rcpp;
  • // [[Rcpp::export]] in front of the main function
  • Input does not need to be pointers

Rcpp example: sum of a vector

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum_cpp(NumericVector x) {
  int n = x.size();
  double total = 0;
  for(int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total;
}

In R:

sourceCpp("sum.cpp")
sum_cpp(c(1,3,4,5,7,8,1,2,3))
[1] 34

Rcpp: Gibbs sampler

#include <Rcpp.h>
#include <math.h>

using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::NumericMatrix gibbs(int n_sim) {
    double can[2] = {0,0};
    Rcpp::NumericMatrix mat(n_sim, 2);

    for (int i = 0; i < n_sim; i++) {      
        can[0] = ::Rf_rnorm(0, 1.0/(2*(can[1]*can[1]+1)));
        can[1] = ::Rf_rnorm(0, 1.0/(2*(can[0]*can[0]+1)));
        mat(i,0) =  can[0];
        mat(i,1) = can[1];
    }
    return(mat);
}

Rcpp benefits

  • Input can be matrix/vectors while C only supports vectors
  • Output can be anything, including lists, e.g.,
  • return(List::create(Named("x") = x,Named("y") = y))
  • Call R function using ::Rf_functionname, while in C, one needs to do #include<Rmath.h>
  • No need to run R CMD SHLIB, sourceCpp() does everything for you
  • Call the same function name in R directly. No need to run .C

C++/C function vs R function

Speed, speed, speed…

Let's compare the Cpp Gibbs sampler and R Gibbs sampler:

gibbs_R = function(nsim){
  mat = matrix(0,ncol=2,nrow=nsim)
    can=c(0,0)
    for (i in 1:nsim){
        can[1]=rnorm(1,mean=0,sd=sqrt(1.0/(2*(can[2]^2+1))))
        can[2]=rnorm(1,mean=0,sd=sqrt(1.0/(2*(can[1]^2+1))))
        mat[i,1]=can[1]
        mat[i,2]=can[2]
    }
    return(mat)
}

Compare Speed

sourceCpp('gibbs.cpp')
library(rbenchmark)
benchmark(replications=rep(100, 1),
               gibbs_R(1e4),
               gibbs(1e4),
               columns=c('test', 'elapsed','replications'))
            test elapsed replications
1 gibbs_R(10000)  14.255          100
2   gibbs(10000)   0.230          100

Questions?