Copyright © January 04, 2021 by Nathaniel E. Helwig

1 Introduction

1.1 Motivation and Goals

Multiple and generalized nonparametric regression models are powerful extensions of generalized linear models that can be used to estimate unknown functional relationships between a collection of predictors \(X_1,\ldots,X_p\) and an exponential family response variable \(Y\). In R, several options exist for fitting such models. This document focuses on the gam function (in the mgcv package) and the gsm function (in the npreg package), both of which use a regression spline approach to function estimation. The sections below provide theoretical background on multiple and generalized nonparametric regression models, along with examples illustrating the use of the gam and gsm functions.

1.2 gam Function (mgcv Package)

Syntax:

gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
        na.action,offset=NULL,method="GCV.Cp",
        optimizer=c("outer","newton"),control=list(),scale=0,
        select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
        fit=TRUE,paraPen=NULL,G=NULL,in.out,drop.unused.levels=TRUE,...)

1.3 gsm Function (npreg Package)

Syntax:

gsm(formula, family = gaussian, data, weights, types = NULL, tprk = TRUE,
        knots = NULL, update = TRUE, spar = NULL, lambda = NULL, control = list(),
        method = c("GCV", "OCV", "GACV", "ACV", "PQL", "AIC", "BIC"))

Alternatively, for Gaussian data you could use the sm function:

sm(formula, data, weights, types = NULL, tprk = TRUE, knots = NULL,
      update = TRUE, df, spar = NULL, lambda = NULL, control = list(),
      method = c("GCV", "OCV", "GACV", "ACV", "REML", "ML", "AIC", "BIC"))
The sm function offers additional tuning methods (e.g., REML and ML) and more computationally efficient algorithms for Gaussian responses.
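
For example, here is a minimal sketch of REML-based tuning, which (per the method arguments above) is available in sm but not gsm, using toy simulated data:

# fit a cubic smoothing spline with REML smoothing parameter selection
library(npreg)
set.seed(3)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.5)
mod.reml <- sm(y ~ x, method = "REML")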

1.4 Comparison of Approaches

Overview

The gam and gsm functions differ in a variety of ways. Some key differences include:

  1. The formula specification. The gam function uses specialized syntax, whereas the gsm function uses the standard formula syntax used by the lm and glm functions (see the sketch following this list).

  2. The types specification. The gam function requires the user to specify the smoother type within the formula via the arguments to the s function. The gsm function allows the user to input the smoother types separately via the types argument. If types is not supplied, the gsm function infers the smoother type from each variable’s class, e.g., numeric class \(\leftrightarrow\) cubic spline, factor class \(\leftrightarrow\) nominal spline, and ordered class \(\leftrightarrow\) ordinal spline.

  3. The knots specification. The gam function requires the user to specify the knots within the formula via the arguments to the s function. The gsm function allows the user to input the knots separately via the knots argument. If knots are not supplied, the gsm function uses a random sample of knots.

  4. The gam function has more options available for response variables, whereas the gsm function has more options available for smoothing parameter selection.

  5. Both functions offer some similar smoothers (e.g., cubic smoothing splines), as well as some unique smoothers, e.g., soap film smoothers in the gam function or ordinal smoothers in the gsm function.
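
To make differences 1-3 concrete, here is a hedged sketch of the two specification styles (assuming a data frame dat with a numeric predictor x, a factor z, and a response y):

# gam: smoother type and basis dimension are set inside the formula via s()
mod.gam <- gam(y ~ z + s(x, bs = "cr", k = 10), data = dat)

# gsm: standard formula; smoother types and knots are separate arguments
mod.gsm <- gsm(y ~ x * z, data = dat,
               types = list(x = "cub", z = "nom"),
               knots = list(x = quantile(dat$x, probs = seq(0, 1, length.out = 9)),
                            z = levels(dat$z)))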

Example Function

To see how these functions perform in practice, consider a simulated example with a functional relationship involving two predictors: \(X \in [0,1]\) is continuous and \(Z \in \{a, b, c\}\) is nominal with 3 levels.

# generate data
set.seed(0)
n <- 1000
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE), levels = letters[1:3])

# define function
fun <- function(x, z){
  mu <- c(-2, 0, 2)     # group offsets for z = a, b, c
  zi <- as.integer(z)   # integer codes of the factor levels
  mu[zi] + 3 * x + sin(2 * pi * x + mu[zi] * pi / 4)
}
fx <- fun(x, z)

# plot function
cols <- c("black", "red", "blue")
plot(x, fun(x, z = 1), t = "l", lwd = 2, 
     ylim = c(-4, 7), ylab = "f(x, z)")
lines(x, fun(x, z = 2), lty = 2, col = cols[2], lwd = 2)
lines(x, fun(x, z = 3), lty = 3, col = cols[3], lwd = 2)
legend("topleft", legend = paste0("z = ", letters[1:3]), 
       lty = 1:3, col = cols, lwd = 2, bty = "n")

Gaussian Example

Generate some Gaussian data:

set.seed(1)
y <- fx + rnorm(n)

Fit model with gam function:

library(mgcv)
## Loading required package: nlme
## This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
mod.gam <- gam(y ~ z + s(x, by = z))
mod.gam
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ z + s(x, by = z)
## 
## Estimated degrees of freedom:
## 3.89 4.66 5.02  total = 16.57 
## 
## GCV score: 1.089394

Fit model with gsm function:

library(npreg)
probs <- seq(0, 1, length.out = 9)
knots <- list(x = quantile(x, probs = probs), z = letters[1:3])
mod.gsm <- gsm(y ~ x * z, knots = knots)
mod.gsm
## 
## Call:
## gsm(formula = y ~ x * z, knots = knots)
## 
## Smoothing Parameter:  spar = 0.7239238   lambda = 0.01012573
## Equivalent Degrees of Freedom (Df):  15.11121
## Penalized Criterion (deviance):  1056.335
## Generalized Cross-Validation (GCV):  1.088998

Compare results:

# rmse between truth and estimate
sqrt(mean((fx - mod.gam$linear.predictors)^2))
## [1] 0.1211713
sqrt(mean((fx - mod.gsm$linear.predictors)^2))
## [1] 0.1067063

Note: in this case, the gsm function does a bit better at estimating the unknown true function.

Plot results:

# define new data (for predictions)
xseq <- seq(0, 1, length.out = 100)
newdata <- expand.grid(x = xseq, z = letters[1:3])

# get predictions
yhat.gam <- predict(mod.gam, newdata)
yhat.gsm <- predict(mod.gsm, newdata)

# set-up 1 x 2 subplots
par(mfrow = c(1,2))

# plot predictions (gam)
plot(xseq, yhat.gam[1:100], t = "l", lwd = 2, ylim = c(-4, 7),
     xlab = "x", ylab = "Fitted Values", main = "gam")
lines(xseq, yhat.gam[1:100 + 100], lty = 2, lwd = 2, col = cols[2])
lines(xseq, yhat.gam[1:100 + 200], lty = 3, lwd = 2, col = cols[3])
legend("topleft", legend = paste0("z = ", letters[1:3]), 
       lty = 1:3, col = cols, lwd = 2, bty = "n")

# plot predictions (gsm)
plot(xseq, yhat.gsm[1:100], t = "l", lwd = 2, ylim = c(-4, 7),
     xlab = "x", ylab = "Fitted Values", main = "gsm")
lines(xseq, yhat.gsm[1:100 + 100], lty = 2, lwd = 2, col = cols[2])
lines(xseq, yhat.gsm[1:100 + 200], lty = 3, lwd = 2, col = cols[3])
legend("topleft", legend = paste0("z = ", letters[1:3]), 
       lty = 1:3, col = cols, lwd = 2, bty = "n")

Binomial Example

Generate some binomial data:

set.seed(1)
y <- rbinom(n = n, size = 1, p = 1 / (1 + exp(-fx)))

Fit model with gam function:

library(mgcv)
mod.gam <- gam(y ~ z + s(x, by = z), family = binomial)
mod.gam
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## y ~ z + s(x, by = z)
## 
## Estimated degrees of freedom:
## 3.06 2.81 2.42  total = 11.29 
## 
## UBRE score: -0.1454208

Fit model with gsm function:

library(npreg)
probs <- seq(0, 1, length.out = 9)
knots <- list(x = quantile(x, probs = probs), z = letters[1:3])
mod.gsm <- gsm(y ~ x * z, family = binomial, knots = knots)
mod.gsm
## 
## Call:
## gsm(formula = y ~ x * z, family = binomial, knots = knots)
## 
## Smoothing Parameter:  spar = 0.7053665   lambda = 0.007436275
## Equivalent Degrees of Freedom (Df):  7.440589
## Penalized Criterion (deviance):  844.2546
## Generalized Cross-Validation (GCV):  0.4335775

Compare results:

# rmse between truth and estimate
sqrt(mean((fx - mod.gam$linear.predictors)^2))
## [1] 0.6004284
sqrt(mean((fx - mod.gsm$linear.predictors)^2))
## [1] 0.3966958

Note: in this case, the gsm function does noticeably better at estimating the unknown true function.

Plot results:

# define new data (for predictions)
xseq <- seq(0, 1, length.out = 100)
newdata <- expand.grid(x = xseq, z = letters[1:3])

# get predictions
yhat.gam <- predict(mod.gam, newdata)
yhat.gsm <- predict(mod.gsm, newdata)

# set-up 1 x 2 subplots
par(mfrow = c(1,2))

# plot predictions (gam)
plot(xseq, yhat.gam[1:100], t = "l", lwd = 2, ylim = c(-4, 7),
     xlab = "x", ylab = "Linear Predictors", main = "gam")
lines(xseq, yhat.gam[1:100 + 100], lty = 2, lwd = 2, col = cols[2])
lines(xseq, yhat.gam[1:100 + 200], lty = 3, lwd = 2, col = cols[3])
legend("topleft", legend = paste0("z = ", letters[1:3]), 
       lty = 1:3, col = cols, lwd = 2, bty = "n")

# plot predictions (gsm)
plot(xseq, yhat.gsm[1:100], t = "l", lwd = 2, ylim = c(-4, 7),
     xlab = "x", ylab = "Linear Predictors", main = "gsm")
lines(xseq, yhat.gsm[1:100 + 100], lty = 2, lwd = 2, col = cols[2])
lines(xseq, yhat.gsm[1:100 + 200], lty = 3, lwd = 2, col = cols[3])
legend("topleft", legend = paste0("z = ", letters[1:3]), 
       lty = 1:3, col = cols, lwd = 2, bty = "n")

2 Definition

2.1 Model Form

Suppose we have an independent sample of \(n\) observations \((x_{i1}, \ldots, x_{ip}, y_i) \sim F_{X,Y}\), and assume that \(y_i\) follows an exponential family distribution. A multiple and generalized nonparametric regression model assumes that \[ g(\mu_i) = \eta_i = f(x_{i1}, \ldots, x_{ip}) \] where \(\mu_i\) is the conditional mean of \(y_i\) given \(\mathbf{x}_i\), the function \(g(\cdot)\) is the known link function, the term \(\eta_i\) is referred to as the linear predictor, and \(f(\cdot)\) is the unknown smooth function. The goal is to estimate the unknown function \(f(\cdot)\) from the sample of data.
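
For example, for a binomial response the canonical logit link gives \[ \log\left( \frac{\mu_i}{1 - \mu_i} \right) = f(x_{i1}, \ldots, x_{ip}) \] where \(\mu_i = P(y_i = 1 \mid \mathbf{x}_i)\), whereas for a Gaussian response the identity link simply gives \(\mu_i = f(x_{i1}, \ldots, x_{ip})\).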

2.2 Penalized Likelihood Estimation

To estimate \(f(\cdot)\), a generalized smooth model (GSM) defines the estimate as the minimizer of a penalized likelihood functional, i.e., \[ f_\lambda = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; -\frac{1}{n} L(f) + \lambda J(f) \] where \(L(f)\) denotes the log-likelihood function of \(y_i\) given \(\mathbf{x}_i\), the penalty functional \(J(f) \geq 0\) quantifies the “roughness” of the function, and \(\lambda > 0\) is the smoothing parameter that controls the influence of the penalty.

The function space \(\mathcal{H} = \{f: J(f) < \infty \}\) is the space of functions with finite penalty. The nature of the penalty functional \(J(\cdot)\) will depend on the form of the model (i.e., additive or interactive effects) and the types of smoothers used for the predictors (e.g., continuous or nominal predictors).
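
For example, for a single continuous predictor with a cubic smoothing spline, the classic penalty is \[ J(f) = \int \left( f''(x) \right)^2 dx \] which quantifies the total curvature of \(f\). Any linear function has \(J(f) = 0\), so linear trends are unpenalized; the smoothing parameter \(\lambda\) thus controls the departure from linearity.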

2.3 Additive vs. Interactive Effects

The unknown function \(f \in \mathcal{H}\) could be assumed to contain nonparametric additive or interactive effects. For example, with \(p = 2\) predictors, we could assume either \[ \begin{split} \mbox{additive:}& \quad \eta_i = f_0 + f_1(x_{i1}) + f_2(x_{i2}) \\ \mbox{interaction:}& \quad \eta_i = f_0 + f_1(x_{i1}) + f_2(x_{i2}) + f_{12}(x_{i1}, x_{i2})\\ \end{split} \] where \(f_0 \in \mathcal{H}_0\) is a constant (intercept) term, \(f_1 \in \mathcal{H}_1\) is the main effect of the first predictor, \(f_2 \in \mathcal{H}_2\) is the main effect of the second predictor, and \(f_{12} \in \mathcal{H}_{12}\) is the interaction effect.

Using this notation, we can decompose the function space as \[ \mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1 \oplus \mathcal{H}_2 \oplus \mathcal{H}_{12} \] where \(\oplus\) denotes the tensor summation. Note that the additive model corresponds to removing the subspace \(\mathcal{H}_{12}\) from the function space, i.e., \(\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1 \oplus \mathcal{H}_2\) for the additive model.
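
In R formula syntax, these two decompositions correspond to + versus * in the gsm function. As a minimal sketch (assuming a data frame dat with response y and predictors x1 and x2):

# additive model: intercept plus main effects only
add.fit <- gsm(y ~ x1 + x2, data = dat)

# interaction model: main effects plus the two-way interaction f12
int.fit <- gsm(y ~ x1 * x2, data = dat)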

2.4 Different Types of Smoothers

The way smoothness is defined for each predictor will depend on the type of predictor (or class of the variable in R lingo).

Smoother                 gam          gsm
Nominal                  NA           type = “nom”
Ordinal                  NA           type = “ord”
Polynomial (linear)      NA           type = “lin”
Polynomial (cubic)       bs = “cr”    type = “cub”
Polynomial (quintic)     NA           type = “qui”
Periodic (linear)        NA           type = “per.lin”
Periodic (cubic)         bs = “cc”    type = “per”
Periodic (quintic)       NA           type = “per.qui”
Thin plate               bs = “tp”    type = “tps”
Spherical                bs = “sos”   type = “sph”

The gam function also offers several smoother types that are not listed here (see ?smooth.terms).

Note that the gam function requires the user to specify the smoother types when specifying the model formula (or the default of bs = “tp” is used). In contrast, the gsm function will select a reasonable type for each predictor (based on the predictor’s class).

The defaults in the sm and gsm functions are:

  • factor or character \(\leftrightarrow\) type = “nom”

  • ordered \(\leftrightarrow\) type = “ord”

  • integer or numeric \(\leftrightarrow\) type = “cub”

  • matrix \(\leftrightarrow\) type = “tps”
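
For example, here is a small sketch (with hypothetical simulated data) showing how the inferred smoother follows each predictor’s class, and how the types argument can override the default:

# gsm infers the smoother type from each predictor's class
set.seed(2)
x1 <- runif(50)                               # numeric -> type = "cub"
x2 <- factor(sample(letters[1:3], 50, TRUE))  # factor  -> type = "nom"
x3 <- ordered(sample(1:4, 50, TRUE))          # ordered -> type = "ord"
y <- rnorm(50)
mod <- gsm(y ~ x1 + x2 + x3)                  # default types inferred
lin <- gsm(y ~ x1, types = list(x1 = "lin"))  # override: linear smoother for x1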

3 Example 1: Prestige by Income and Education (Gaussian)

3.1 Overview of Data

The Prestige dataset (from the car package) contains the prestige of \(n = 102\) Canadian occupations from 1971, as well as the average income and education level of each occupation. We will use a multiple nonparametric regression model to model prestige as a function of income and education.

First, let’s load the data and visualize the relationship between income (\(X_1\)), education (\(X_2\)), and prestige (\(Y\)).

# load data
library(car)
## Loading required package: carData
data(Prestige)
head(Prestige)
##                     education income women prestige census type
## gov.administrators      13.11  12351 11.16     68.8   1113 prof
## general.managers        12.26  25879  4.02     69.1   1130 prof
## accountants             12.77   9271 15.70     63.4   1171 prof
## purchasing.officers     11.42   8865  9.11     56.8   1175 prof
## chemists                14.62   8403 11.68     73.5   2111 prof
## physicists              15.64  11030  5.13     77.6   2113 prof
# 3d scatterplot
library(rgl)
plot3d(prestige ~ income * education, data = Prestige,
       col = ifelse(type == "prof", "red", "black"))
rglwidget()

Note: the marginal relationship between prestige and income looks non-linear, whereas the marginal relationship between prestige and education looks mostly linear.

3.2 Analyses and Results: gam Function

Model Fitting

R code to fit additive and interaction models with gam function:

# additive model (using default number of knots for each predictor)
add.gam <- gam(prestige ~ s(income, bs = "cr") + s(education, bs = "cr"), 
               data = Prestige)
add.gam
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## prestige ~ s(income, bs = "cr") + s(education, bs = "cr")
## 
## Estimated degrees of freedom:
## 3.00 3.15  total = 7.15 
## 
## GCV score: 51.98404
# interaction model (using default number of knots for each predictor)
int.gam <- gam(prestige ~ s(income, bs = "cr") + s(education, bs = "cr") 
               + ti(income, education, bs = c("cr", "cr")), data = Prestige)
int.gam
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## prestige ~ s(income, bs = "cr") + s(education, bs = "cr") + ti(income, 
##     education, bs = c("cr", "cr"))
## 
## Estimated degrees of freedom:
## 2.07 3.08 2.00  total = 8.16 
## 
## GCV score: 50.43405

Model Summary

R code to summarize the fit models:

# summarize additive model
summary(add.gam)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## prestige ~ s(income, bs = "cr") + s(education, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  46.8333     0.6884   68.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                edf Ref.df     F p-value    
## s(income)    2.996  3.604 16.68  <2e-16 ***
## s(education) 3.154  3.913 39.22  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.837   Deviance explained = 84.7%
## GCV = 51.984  Scale est. = 48.34     n = 102
# summarize interaction model
summary(int.gam)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## prestige ~ s(income, bs = "cr") + s(education, bs = "cr") + ti(income, 
##     education, bs = c("cr", "cr"))
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   48.996      1.055   46.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df      F  p-value    
## s(income)            2.074  2.543 20.252 2.11e-07 ***
## s(education)         3.083  3.875 41.372  < 2e-16 ***
## ti(income,education) 1.998  2.308  3.426   0.0299 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.843   Deviance explained = 85.4%
## GCV = 50.434  Scale est. = 46.402    n = 102

Model Comparison

Compare additive and interaction models:

# pseudo F test of interaction effect
sse.dif <- add.gam$deviance - int.gam$deviance
df.dif <- sum(int.gam$edf1) - sum(add.gam$edf1)
Fstat <- (sse.dif / df.dif) / int.gam$sig2
pvalue <- 1 - pf(Fstat, df1 = df.dif, df2 = nrow(Prestige) - sum(int.gam$edf))
c(Fstat, pvalue)
## [1] 4.10952931 0.03805945
# analysis of deviance table (gam)
anova(add.gam, int.gam, test = "F")
## Analysis of Deviance Table
## 
## Model 1: prestige ~ s(income, bs = "cr") + s(education, bs = "cr")
## Model 2: prestige ~ s(income, bs = "cr") + s(education, bs = "cr") + ti(income, 
##     education, bs = c("cr", "cr"))
##   Resid. Df Resid. Dev     Df Deviance      F  Pr(>F)  
## 1    93.483     4585.1                                 
## 2    92.274     4354.5 1.2089   230.53 4.1095 0.03811 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Plot Results

Plot results for additive model:

# setup 1 x 2 subplots
par(mfrow = c(1,2))

# get predictor sequences
xrng <- list(income = range(Prestige$income),
             education = range(Prestige$education))
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)

# main effect of income
newdata <- data.frame(income = inc.seq, education = mean(Prestige$education))
yhat.inc <- predict(add.gam, newdata = newdata, se.fit = TRUE)
plotci(inc.seq, yhat.inc$fit, yhat.inc$se.fit,
       xlab = "Income", ylab = "Prestige",
       main = "Income Effect")
rug(Prestige$income)

# main effect of education
newdata <- data.frame(income = mean(Prestige$income), education = edu.seq)
yhat.edu <- predict(add.gam, newdata = newdata, se.fit = TRUE)
plotci(edu.seq, yhat.edu$fit, yhat.edu$se.fit,
       xlab = "Education", ylab = "Prestige",
       main = "Education Effect")
rug(Prestige$education)

Plot results for interaction model:

# setup 2 x 2 subplots
par(mfrow = c(2,2))

# quantiles of education
labs <- c("Low", "Low-Med", "Med-High", "High")
probs <- seq(0.2, 0.8, by = 0.2)
qvals <- unname(quantile(Prestige$education, probs = probs))
qvals
## [1]  8.128  9.708 11.172 13.620
# plot prestige ~ income at different education levels
for(j in 1:4){
  newdata <- data.frame(income = inc.seq, education = qvals[j])
  yhat <- predict(int.gam, newdata, se.fit = TRUE)
  plotci(inc.seq, yhat$fit, yhat$se.fit, ylim = c(10, 110),
         xlab = "Income", ylab = "Prestige",
         main = paste0("Education = ", labs[j]))
}

Surface plots for additive and interaction models:

# get predictions (for visualization)
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
newdata <- expand.grid(income = inc.seq, education = edu.seq)
yhat.add <- predict(add.gam, newdata)
yhat.int <- predict(int.gam, newdata)

# plot results
par(mfrow = c(1,2))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.add, 25, 25),
      theta = -45, phi = 30, xlab = "Income", ylab = "Education",
      zlab = "Prestige", main = "Additive", zlim = c(10, 90))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.int, 25, 25),
      theta = -45, phi = 30, xlab = "Income", ylab = "Education",
      zlab = "Prestige", main = "Interaction", zlim = c(10, 90))

3.3 Analyses and Results: gsm Function

Model Fitting

R code to fit additive and interaction models with gsm function (using all \(n = 102\) data points as knots):

# additive model (using all knots)
add.gsm <- gsm(prestige ~ income + education, data = Prestige, knots = 102)
add.gsm
## 
## Call:
## gsm(formula = prestige ~ income + education, data = Prestige, 
##     knots = 102)
## 
## Smoothing Parameter:  spar = 1   lambda = 0.9999995
## Equivalent Degrees of Freedom (Df):  7.531583
## Penalized Criterion (deviance):  4565.108
## Generalized Cross-Validation (GCV):  52.17687
# interaction model (using all knots)
int.gsm <- gsm(prestige ~ income * education, data = Prestige, knots = 102)
int.gsm
## 
## Call:
## gsm(formula = prestige ~ income * education, data = Prestige, 
##     knots = 102)
## 
## Smoothing Parameter:  spar = 0.9631003   lambda = 0.5412649
## Equivalent Degrees of Freedom (Df):  8.221877
## Penalized Criterion (deviance):  4377.092
## Generalized Cross-Validation (GCV):  50.76715

Model Summary

R code to summarize the fit models:

# summarize additive model
summary(add.gsm)
## 
## Call:
## gsm(formula = prestige ~ income + education, data = Prestige, 
##     knots = 102)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.2811  -4.6473   0.8965   3.7966  18.2977 
## 
## Approx. Signif. of Parametric Effects:
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)    55.25      1.731  31.928 0.000e+00 ***
## income         33.86      6.003   5.640 1.761e-07 ***
## education      33.82      4.386   7.711 1.251e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Approx. Signif. of Nonparametric Effects:
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## s(income)     2.299  894.7  389.16   8.053 0.0003152 ***
## s(education)  2.452  361.5  147.46   3.052 0.0418193   *
## Residuals    94.468 4565.1   48.32                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Dispersion Parameter: 48.32,   Deviance Explained: 0.8473
##   Multiple R-squared: 0.8474,   Adjusted R-squared: 0.8367
# summarize interaction model
summary(int.gsm)
## 
## Call:
## gsm(formula = prestige ~ income * education, data = Prestige, 
##     knots = 102)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.8121  -4.5071   0.9162   4.2439  19.3760 
## 
## Approx. Signif. of Parametric Effects:
##                  Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)         61.19      2.575  23.766 0.000e+00 ***
## income              39.41      8.200   4.807 5.825e-06 ***
## education           24.12      8.553   2.820 5.854e-03  **
## income:education   -62.59     28.195  -2.220 2.883e-02   *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Approx. Signif. of Nonparametric Effects:
##                          Df  Sum Sq Mean Sq F value   Pr(>F)   
## s(income)            0.8165   45.74   56.02   1.200 0.276093   
## s(education)         2.3423  621.39  265.29   5.684 0.002963 **
## s(income:education)  3.9008  352.39   90.34   1.935 0.112620   
## Residuals           93.7781 4377.09   46.67                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Dispersion Parameter: 46.67,   Deviance Explained: 0.8536
##   Multiple R-squared: 0.8537,   Adjusted R-squared: 0.8423

Model Comparison

Compare additive and interaction models:

# pseudo F test of interaction effect
sse.dif <- add.gsm$deviance - int.gsm$deviance
df.dif <- int.gsm$df - add.gsm$df
Fstat <- (sse.dif / df.dif) / int.gsm$dispersion
pvalue <- 1 - pf(Fstat, df1 = df.dif, df2 = nrow(Prestige) - int.gsm$df)
c(Fstat, pvalue)
## [1] 5.83548907 0.02811473

Plot Results

Plot results for additive model:

# setup 1 x 2 subplots
par(mfrow = c(1,2))

# get predictor sequences
xrng <- add.gsm$specs$xrng
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)

# main effect of income
yhat.inc <- predict(add.gsm, newdata = data.frame(income = inc.seq),
                    se.fit = TRUE, terms = "income")
plotci(inc.seq, yhat.inc$fit, yhat.inc$se.fit,
       xlab = "Income", ylab = "Prestige",
       main = "Income Effect")
rug(Prestige$income)

# main effect of education
yhat.edu <- predict(add.gsm, newdata = data.frame(education = edu.seq),
                    se.fit = TRUE, terms = "education")
plotci(edu.seq, yhat.edu$fit, yhat.edu$se.fit,
       xlab = "Education", ylab = "Prestige",
       main = "Education Effect")
rug(Prestige$education)

Plot results for interaction model:

# setup 2 x 2 subplots
par(mfrow = c(2,2))

# quantiles of education
labs <- c("Low", "Low-Med", "Med-High", "High")
probs <- seq(0.2, 0.8, by = 0.2)
qvals <- unname(quantile(Prestige$education, probs = probs))
qvals
## [1]  8.128  9.708 11.172 13.620
# plot prestige ~ income at different education levels
for(j in 1:4){
  newdata <- data.frame(income = inc.seq, education = qvals[j])
  yhat <- predict(int.gsm, newdata, se.fit = TRUE)
  plotci(inc.seq, yhat$fit, yhat$se.fit, ylim = c(10, 110),
         xlab = "Income", ylab = "Prestige",
         main = paste0("Education = ", labs[j]))
}

Surface plots for additive and interaction models:

# get predictions (for visualization)
xrng <- add.gsm$specs$xrng
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
newdata <- expand.grid(income = inc.seq, education = edu.seq)
yhat.add <- predict(add.gsm, newdata)
yhat.int <- predict(int.gsm, newdata)

# plot results
par(mfrow = c(1,2))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.add, 25, 25),
      theta = -45, phi = 30, xlab = "Income", ylab = "Education",
      zlab = "Prestige", main = "Additive", zlim = c(10, 90))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.int, 25, 25),
      theta = -45, phi = 30, xlab = "Income", ylab = "Education",
      zlab = "Prestige", main = "Interaction", zlim = c(10, 90))

3.4 Analyses and Results: sm Function

Model Fitting

R code to fit additive and interaction models with sm function (using all \(n = 102\) data points as knots):

# additive model (using all knots)
add.sm <- sm(prestige ~ income + education, data = Prestige, knots = 102)
add.sm
## 
## Call:
## sm(formula = prestige ~ income + education, data = Prestige, 
##     knots = 102)
## 
## Smoothing Parameter:  spar = 1.007553   lambda = 1.133891
## Equivalent Degrees of Freedom (Df):  7.329865
## Penalized Criterion (RSS):  4583.735
## Generalized Cross-Validation (GCV):  52.16674
# interaction model (using all knots)
int.sm <- sm(prestige ~ income * education, data = Prestige, knots = 102)
int.sm
## 
## Call:
## sm(formula = prestige ~ income * education, data = Prestige, 
##     knots = 102)
## 
## Smoothing Parameter:  spar = 0.9631003   lambda = 0.541265
## Equivalent Degrees of Freedom (Df):  8.221877
## Penalized Criterion (RSS):  4377.092
## Generalized Cross-Validation (GCV):  50.76715

Model Summary

R code to summarize the fit models:

# summarize additive model
summary(add.sm)
## 
## Call:
## sm(formula = prestige ~ income + education, data = Prestige, 
##     knots = 102)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.2685  -4.5381   0.9389   3.8369  18.2640 
## 
## Approx. Signif. of Parametric Effects:
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)    55.30      1.714  32.262 0.000e+00 ***
## income         33.67      5.965   5.645 1.719e-07 ***
## education      33.95      4.322   7.856 6.172e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Approx. Signif. of Nonparametric Effects:
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## s(income)     2.191  880.0  401.62   8.295 0.0003157 ***
## s(education)  2.338  348.3  148.96   3.076 0.0430021   *
## Residuals    94.670 4583.7   48.42                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 6.958 on 94.7 degrees of freedom
## Multiple R-squared:  0.8467,    Adjusted R-squared:  0.8364
## F-statistic: 81.85 on 6.33 and 94.67 DF,  p-value: <2e-16
# summarize interaction model
summary(int.sm)
## 
## Call:
## sm(formula = prestige ~ income * education, data = Prestige, 
##     knots = 102)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.8121  -4.5071   0.9162   4.2439  19.3760 
## 
## Approx. Signif. of Parametric Effects:
##                  Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)         61.19      2.575  23.766 0.000e+00 ***
## income              39.41      8.200   4.807 5.825e-06 ***
## education           24.12      8.553   2.820 5.854e-03  **
## income:education   -62.59     28.195  -2.220 2.883e-02   *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Approx. Signif. of Nonparametric Effects:
##                          Df  Sum Sq Mean Sq F value   Pr(>F)   
## s(income)            0.8165   45.74   56.02   1.200 0.276093   
## s(education)         2.3423  621.39  265.29   5.684 0.002963 **
## s(income:education)  3.9008  352.39   90.34   1.935 0.112620   
## Residuals           93.7781 4377.09   46.67                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 6.832 on 93.8 degrees of freedom
## Multiple R-squared:  0.8537,    Adjusted R-squared:  0.8423
## F-statistic: 74.98 on 7.222 and 93.78 DF,  p-value: <2e-16

Model Comparison

Compare additive and interaction models:

# pseudo F test of interaction effect
sse.dif <- add.sm$sse - int.sm$sse
df.dif <- int.sm$df - add.sm$df
Fstat <- (sse.dif / df.dif) / int.sm$sigma^2
pvalue <- 1 - pf(Fstat, df1 = df.dif, df2 = nrow(Prestige) - int.sm$df)
c(Fstat, pvalue)
## [1] 4.96324831 0.03209404

Plot Results

Plot results for additive model:

# setup 1 x 2 subplots
par(mfrow = c(1,2))

# get predictor sequences
xrng <- add.sm$specs$xrng
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)

# main effect of income
yhat.inc <- predict(add.sm, newdata = data.frame(income = inc.seq),
                    se.fit = TRUE, terms = "income")
plotci(inc.seq, yhat.inc$fit, yhat.inc$se.fit,
       xlab = "Income", ylab = "Prestige",
       main = "Income Effect")
rug(Prestige$income)

# main effect of education
yhat.edu <- predict(add.sm, newdata = data.frame(education = edu.seq),
                    se.fit = TRUE, terms = "education")
plotci(edu.seq, yhat.edu$fit, yhat.edu$se.fit,
       xlab = "Education", ylab = "Prestige",
       main = "Education Effect")
rug(Prestige$education)

Plot results for interaction model:

# setup 2 x 2 subplots
par(mfrow = c(2,2))

# quantiles of education
labs <- c("Low", "Low-Med", "Med-High", "High")
probs <- seq(0.2, 0.8, by = 0.2)
qvals <- unname(quantile(Prestige$education, probs = probs))
qvals
## [1]  8.128  9.708 11.172 13.620
# plot prestige ~ income at different education levels
for(j in 1:4){
  newdata <- data.frame(income = inc.seq, education = qvals[j])
  yhat <- predict(int.sm, newdata, se.fit = TRUE)
  plotci(inc.seq, yhat$fit, yhat$se.fit, ylim = c(10, 110),
         xlab = "Income", ylab = "Prestige",
         main = paste0("Education = ", labs[j]))
}

Surface plots for additive and interaction models:

# get predictions (for visualization)
xrng <- add.sm$specs$xrng
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
newdata <- expand.grid(income = inc.seq, education = edu.seq)
yhat.add <- predict(add.sm, newdata)
yhat.int <- predict(int.sm, newdata)

# plot results
par(mfrow = c(1,2))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.add, 25, 25),
      theta = -45, phi = 30, xlab = "Income", ylab = "Education",
      zlab = "Prestige", main = "Additive", zlim = c(10, 90))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.int, 25, 25),
      theta = -45, phi = 30, xlab = "Income", ylab = "Education",
      zlab = "Prestige", main = "Interaction", zlim = c(10, 90))

4 Example 2: Titanic Survival by Age and Sex (Binomial)

4.1 Overview of Data

The TitanicSurvival dataset (from the car package) contains survival and demographic information for \(n = 1309\) passengers on the Titanic. We will use a nonparametric logistic regression model to model the probability of survival as a function of age and sex.

First, let’s load the data and visualize the relationship between age (\(X_1\)), sex (\(X_2\)), and survival (\(Y\)).

# load data
library(car)
data(TitanicSurvival)
head(TitanicSurvival)
##                                 survived    sex     age passengerClass
## Allen, Miss. Elisabeth Walton        yes female 29.0000            1st
## Allison, Master. Hudson Trevor       yes   male  0.9167            1st
## Allison, Miss. Helen Loraine          no female  2.0000            1st
## Allison, Mr. Hudson Joshua Crei       no   male 30.0000            1st
## Allison, Mrs. Hudson J C (Bessi       no female 25.0000            1st
## Anderson, Mr. Harry                  yes   male 48.0000            1st
# plot survived ~ age * sex
y <- ifelse(TitanicSurvival$survived == "yes", 1, 0)
plot(TitanicSurvival$age, y,
     xlab = "Age (Years)", ylab = "Survived (1 = Yes)",
     pch = ifelse(TitanicSurvival$sex == "male", 16, 17),
     col = ifelse(TitanicSurvival$sex == "male", "blue", "red"))
legend("right", c("female", "male"), pch = c(17, 16),
       col = c("red", "blue"), bty = "n")

4.2 Analyses and Results: gam Function

Model Fitting

R code to fit additive and interaction models with gam function:

# additive model
add.gam <- gam(y ~ sex + s(age, bs = "cr"), 
               family = binomial, data = TitanicSurvival)
add.gam
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## y ~ sex + s(age, bs = "cr")
## 
## Estimated degrees of freedom:
## 6.91  total = 8.91 
## 
## UBRE score: 0.04974017
# interaction model
int.gam <- gam(y ~ sex + s(age, bs = "cr", by = sex), 
               family = binomial, data = TitanicSurvival)
int.gam
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## y ~ sex + s(age, bs = "cr", by = sex)
## 
## Estimated degrees of freedom:
## 1.00 6.98  total = 9.98 
## 
## UBRE score: 0.02615761

Model Summary

R code to summarize the fit models:

# summarize additive model
summary(add.gam)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## y ~ sex + s(age, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.1228     0.1191   9.425   <2e-16 ***
## sexmale      -2.4907     0.1549 -16.075   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df Chi.sq p-value  
## s(age) 6.911  7.976  18.38   0.016 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.302   Deviance explained = 23.6%
## UBRE = 0.04974  Scale est. = 1         n = 1046
# summarize interaction model
summary(int.gam)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## y ~ sex + s(age, bs = "cr", by = sex)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.1662     0.1226   9.515   <2e-16 ***
## sexmale      -2.5622     0.1590 -16.112   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                    edf Ref.df Chi.sq  p-value    
## s(age):sexfemale 1.000  1.001  6.958  0.00835 ** 
## s(age):sexmale   6.979  8.030 36.329 8.79e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.32   Deviance explained = 25.5%
## UBRE = 0.026158  Scale est. = 1         n = 1046

Model Comparison

Compare additive and interaction models:

# pseudo Chisq test of interaction effect
sse.dif <- add.gam$deviance - int.gam$deviance
df.dif <- sum(int.gam$edf1) - sum(add.gam$edf1)
pvalue <- 1 - pchisq(sse.dif, df = df.dif)
c(sse.dif, pvalue)
## [1] 2.680526e+01 2.547716e-07
# analysis of deviance table (gam)
anova(add.gam, int.gam, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: y ~ sex + s(age, bs = "cr")
## Model 2: y ~ sex + s(age, bs = "cr", by = sex)
##   Resid. Df Resid. Dev     Df Deviance  Pr(>Chi)    
## 1      1036     1080.2                              
## 2      1035     1053.4 1.0544   26.805 2.548e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Plot Results

Plot results for additive and interaction models:

# get predictions (for visualization)
xrng <- list(age = range(TitanicSurvival$age, na.rm = TRUE))
age.seq <- seq(xrng$age[1], xrng$age[2], length.out = 25)
sex.seq <- c("female", "male")
newdata <- expand.grid(age = age.seq, sex = sex.seq)
yhat.add <- predict(add.gam, newdata, type = "response")
yhat.int <- predict(int.gam, newdata, type = "response")

# additive model
par(mfrow = c(1,2))
plot(age.seq, yhat.add[1:25], ylim = c(0, 1), t = "l", col = "red",
     xlab = "Age (Years)", ylab = "Probability of Survival",
     main = "Additive Model")
lines(age.seq, yhat.add[1:25 + 25], lty = 2, col = "blue")
legend("bottomleft", legend = c("female", "male"), lty = 1:2,
       col = c("red", "blue"), horiz = TRUE, bty = "n")

# interaction model
plot(age.seq, yhat.int[1:25], ylim = c(0, 1), t = "l", col = "red",
     xlab = "Age (Years)", ylab = "Probability of Survival",
     main = "Interaction Model")
lines(age.seq, yhat.int[1:25 + 25], lty = 2, col = "blue")
legend("bottomleft", legend = c("female", "male"), lty = 1:2,
       col = c("red", "blue"), horiz = TRUE, bty = "n")

4.3 Analyses and Results: gsm Function

Model Fitting

R code to fit additive and interaction models with gsm function:

# select knots (age quantiles by sex)
probs <- seq(0, 1, length.out = 9)
knots <- with(TitanicSurvival, tapply(age, sex, quantile, probs = probs, na.rm = TRUE))
knots <- list(age = unlist(knots), sex = rep(c("female", "male"), each = 9))
#knots <- list(age = quantile(TitanicSurvival$age, probs = probs, na.rm = TRUE),
#              sex = c("female", "male"))

# additive model
add.gsm <- gsm(y ~ age + sex, family = binomial,
               data = TitanicSurvival, knots = knots)
add.gsm
## 
## Call:
## gsm(formula = y ~ age + sex, family = binomial, data = TitanicSurvival, 
##     knots = knots)
## 
## Smoothing Parameter:  spar = 0.6554119   lambda = 0.003239273
## Equivalent Degrees of Freedom (Df):  9.723163
## Penalized Criterion (deviance):  1078.102
## Generalized Cross-Validation (GCV):  0.5245551
# interaction model
int.gsm <- gsm(y ~ age * sex, family = binomial,
               data = TitanicSurvival, knots = knots)
int.gsm
## 
## Call:
## gsm(formula = y ~ age * sex, family = binomial, data = TitanicSurvival, 
##     knots = knots)
## 
## Smoothing Parameter:  spar = 0.6088316   lambda = 0.001492511
## Equivalent Degrees of Freedom (Df):  9.023688
## Penalized Criterion (deviance):  1062.67
## Generalized Cross-Validation (GCV):  0.5165595

Model Summary

R code to summarize the fit models:

# summarize additive model
summary(add.gsm)
## 
## Call:
## gsm(formula = y ~ age + sex, family = binomial, data = TitanicSurvival, 
##     knots = knots)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1924 -0.6839 -0.6077  0.7891  1.9287 
## 
## Approx. Signif. of Parametric Effects:
##             Estimate Std. Error  z value Pr(>|z|)  
## (Intercept) -0.07714     0.1608 -0.47966   0.6315  
## age         -0.14833     1.4976 -0.09904   0.9211  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Approx. Signif. of Nonparametric Effects:
##                 Df  Chi Sq p.value    
## s(age)       6.755   15.93 0.02239   *
## s(sex)       1.006  250.36 0.00000 ***
## Residuals 1036.277 1078.10            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Dispersion Parameter: 1,   Deviance Explained: 0.2379
##   Multiple R-squared: 0.309,   Adjusted R-squared: 0.3031
# summarize interaction model
summary(int.gsm)
## 
## Call:
## gsm(formula = y ~ age * sex, family = binomial, data = TitanicSurvival, 
##     knots = knots)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9217 -0.6483 -0.6253  0.7849  1.9419 
## 
## Approx. Signif. of Parametric Effects:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -0.03578     0.1626 -0.2201   0.8258  
## age         -0.66317     1.0636 -0.6235   0.5329  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Approx. Signif. of Nonparametric Effects:
##                  Df   Chi Sq  p.value    
## s(age)        2.908    6.434 0.086450   .
## s(sex)        3.426  249.238 0.000000 ***
## s(age:sex)    5.907   19.488 0.003188  **
## Residuals  1036.976 1062.670             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Dispersion Parameter: 1,   Deviance Explained: 0.2488
##   Multiple R-squared: 0.3203,   Adjusted R-squared: 0.3149

Model Comparison

Compare additive and interaction models:

# pseudo Chisq test of interaction effect
sse.dif <- add.gsm$deviance - int.gsm$deviance
df.dif <- int.gsm$df - add.gsm$df
pvalue <- 1 - pchisq(sse.dif, df = df.dif)
## Warning in pchisq(sse.dif, df = df.dif): NaNs produced
c(sse.dif, pvalue)
## [1] 15.43197      NaN
# df.dif is negative, so use df = 1 for the test instead
pvalue <- 1 - pchisq(sse.dif, df = 1)
c(sse.dif, pvalue)
## [1] 1.543197e+01 8.552917e-05

Plot Results

Plot results for additive and interaction models:

# get predictions (for visualization)
xrng <- add.gsm$specs$xrng
age.seq <- seq(xrng$age[1], xrng$age[2], length.out = 25)
sex.seq <- c("female", "male")
newdata <- expand.grid(age = age.seq, sex = sex.seq)
yhat.add <- predict(add.gsm, newdata, type = "response")
yhat.int <- predict(int.gsm, newdata, type = "response")

# additive model
par(mfrow = c(1,2))
plot(age.seq, yhat.add[1:25], ylim = c(0, 1), t = "l", col = "red",
     xlab = "Age (Years)", ylab = "Probability of Survival",
     main = "Additive Model")
lines(age.seq, yhat.add[1:25 + 25], lty = 2, col = "blue")
legend("bottomleft", legend = c("female", "male"), lty = 1:2,
       col = c("red", "blue"), horiz = TRUE, bty = "n")

# interaction model
plot(age.seq, yhat.int[1:25], ylim = c(0, 1), t = "l", col = "red",
     xlab = "Age (Years)", ylab = "Probability of Survival",
     main = "Interaction Model")
lines(age.seq, yhat.int[1:25 + 25], lty = 2, col = "blue")
legend("bottomleft", legend = c("female", "male"), lty = 1:2,
       col = c("red", "blue"), horiz = TRUE, bty = "n")