Copyright © January 04, 2021 by Nathaniel E. Helwig
Multiple and generalized nonparametric regression models are powerful extensions of generalized linear models that can be used to estimate unknown functional relationships between a collection of predictors \(X_1,\ldots,X_p\) and an exponential family response variable \(Y\). In R, several options exist for fitting such models. This document focuses on the gam
function (in the mgcv package) and the gsm
function (in the npreg package), both of which use a regression spline approach for the function estimation. This document provides theoretical background on multiple and generalized nonparametric regression models, as well as examples that illustrate how to use the gam
and gsm
functions.
Syntax:
gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
na.action,offset=NULL,method="GCV.Cp",
optimizer=c("outer","newton"),control=list(),scale=0,
select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
fit=TRUE,paraPen=NULL,G=NULL,in.out,drop.unused.
Syntax:
gsm(formula, family = gaussian, data, weights, types = NULL, tprk = TRUE,
knots = NULL, update = TRUE, spar = NULL, lambda = NULL, control = list(),
method = c("GCV", "OCV", "GACV", "ACV", "PQL", "AIC", "BIC"))
Or you could use the sm
function for Gaussian data:
sm(formula, data, weights, types = NULL, tprk = TRUE, knots = NULL,
update = TRUE, df, spar = NULL, lambda = NULL, control = list(),
method = c("GCV", "OCV", "GACV", "ACV", "REML", "ML", "AIC", "BIC"))
The sm
function has the benefit of more tuning methods and more efficient computational algorithms.
Overview
The gam
and gsm
functions differ in a variety of ways. Some key differences include:
The formula
specification. The gam
function uses specialized syntax, whereas the gsm
function uses the standard formula
syntax, e.g., as used in the lm
and glm
functions.
The types
specification. The gam
function requires the user to specify the smoother type within the formula
specification via the arguments to the s
function. The gsm
function allows the user the input the smoother types separately via the types
argument. Otherwise, the gsm
function will infer the smoother type from the variable’s class
, e.g., numeric class \(\leftrightarrow\) cubic spline, factor class \(\leftrightarrow\) nominal spline, and ordered class \(\leftrightarrow\) ordinal spline.
The knots
specification. The gam
function requires the user to specify the knots within the formula
specification via the arguments to the s
function. The gsm
function allows the user the input the knots separately via the knots
argument. Otherwise, the gsm
function will use a random sample of knots.
The gam
function has more options avaiable for response variables, whereas the gsm
function has more options avaiable for smoothing parameter selection.
Both functions offer some similar smoothers (e.g., cubic smoothing splines), as well as some unique smoothers, e.g., soap film smoothers in the gam
function or ordinal smoothers in the gsm
function.
Example Function
To see how these functions perform in practice, let’s look at a simulated example. Specifically, let’s simulate some data with a functional relationship where there are two predictors: \(X \in [0,1]\) is continuous and \(Z \in \{a, b, c\}\) is nominal with 3 levels.
# generate data
set.seed(0)
n <- 1000
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE), levels = letters[1:3])
# define function
fun <- function(x, z){
mu <- c(-2, 0, 2)
zi <- as.integer(z)
fx <- mu[zi] + 3 * x + sin(2 * pi * x + mu[zi]*pi/4)
}
fx <- fun(x, z)
# plot function
cols <- c("black", "red", "blue")
plot(x, fun(x, z = 1), t = "l", lwd = 2,
ylim = c(-4, 7), ylab = "f(x, z)")
lines(x, fun(x, z = 2), lty = 2, col = cols[2], lwd = 2)
lines(x, fun(x, z = 3), lty = 3, col = cols[3], lwd = 2)
legend("topleft", legend = paste0("z = ", letters[1:3]),
lty = 1:3, col = cols, lwd = 2, bty = "n")
Gaussian Example
Generate some Gaussian data:
set.seed(1)
y <- fx + rnorm(n)
Fit model with gam
function:
library(mgcv)
## Loading required package: nlme
## This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
mod.gam <- gam(y ~ z + s(x, by = z))
mod.gam
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ z + s(x, by = z)
##
## Estimated degrees of freedom:
## 3.89 4.66 5.02 total = 16.57
##
## GCV score: 1.089394
Fit model with gsm
function:
library(npreg)
probs <- seq(0, 1, length.out = 9)
knots <- list(x = quantile(x, probs = probs), z = letters[1:3])
mod.gsm <- gsm(y ~ x * z, knots = knots)
mod.gsm
##
## Call:
## gsm(formula = y ~ x * z, knots = knots)
##
## Smoothing Parameter: spar = 0.7239238 lambda = 0.01012573
## Equivalent Degrees of Freedom (Df): 15.11121
## Penalized Criterion (deviance): 1056.335
## Generalized Cross-Validation (GCV): 1.088998
Compare results:
# rmse between truth and estimate
sqrt(mean((fx - mod.gam$linear.predictors)^2))
## [1] 0.1211713
sqrt(mean((fx - mod.gsm$linear.predictors)^2))
## [1] 0.1067063
Note: in this case, the gsm
function does a bit better at estimating the unknown true function.
Plot results:
# define new data (for predictions)
xseq <- seq(0, 1, length.out = 100)
newdata <- expand.grid(x = xseq, z = letters[1:3])
# get predictions
yhat.gam <- predict(mod.gam, newdata)
yhat.gsm <- predict(mod.gsm, newdata)
# set-up 1 x 2 subplots
par(mfrow = c(1,2))
# plot predictions (gam)
plot(xseq, yhat.gam[1:100], t = "l", lwd = 2, ylim = c(-4, 7),
xlab = "x", ylab = "Fitted Values", main = "gam")
lines(xseq, yhat.gam[1:100 + 100], lty = 2, lwd = 2, col = cols[2])
lines(xseq, yhat.gam[1:100 + 200], lty = 3, lwd = 2, col = cols[3])
legend("topleft", legend = paste0("z = ", letters[1:3]),
lty = 1:3, col = cols, lwd = 2, bty = "n")
# plot predictions (gsm)
plot(xseq, yhat.gsm[1:100], t = "l", lwd = 2, ylim = c(-4, 7),
xlab = "x", ylab = "Fitted Values", main = "gsm")
lines(xseq, yhat.gsm[1:100 + 100], lty = 2, lwd = 2, col = cols[2])
lines(xseq, yhat.gsm[1:100 + 200], lty = 3, lwd = 2, col = cols[3])
legend("topleft", legend = paste0("z = ", letters[1:3]),
lty = 1:3, col = cols, lwd = 2, bty = "n")
Binomial Example
Generate some binomial data:
set.seed(1)
y <- rbinom(n = n, size = 1, p = 1 / (1 + exp(-fx)))
Fit model with gam
function:
library(mgcv)
mod.gam <- gam(y ~ z + s(x, by = z), family = binomial)
mod.gam
##
## Family: binomial
## Link function: logit
##
## Formula:
## y ~ z + s(x, by = z)
##
## Estimated degrees of freedom:
## 3.06 2.81 2.42 total = 11.29
##
## UBRE score: -0.1454208
Fit model with gsm
function:
library(npreg)
probs <- seq(0, 1, length.out = 9)
knots <- list(x = quantile(x, probs = probs), z = letters[1:3])
mod.gsm <- gsm(y ~ x * z, family = binomial, knots = knots)
mod.gsm
##
## Call:
## gsm(formula = y ~ x * z, family = binomial, knots = knots)
##
## Smoothing Parameter: spar = 0.7053665 lambda = 0.007436275
## Equivalent Degrees of Freedom (Df): 7.440589
## Penalized Criterion (deviance): 844.2546
## Generalized Cross-Validation (GCV): 0.4335775
Compare results:
# rmse between truth and estimate
sqrt(mean((fx - mod.gam$linear.predictors)^2))
## [1] 0.6004284
sqrt(mean((fx - mod.gsm$linear.predictors)^2))
## [1] 0.3966958
Note: in this case, the gsm
function does noticeably better at estimating the unknown true function.
Plot results:
# define new data (for predictions)
xseq <- seq(0, 1, length.out = 100)
newdata <- expand.grid(x = xseq, z = letters[1:3])
# get predictions
yhat.gam <- predict(mod.gam, newdata)
yhat.gsm <- predict(mod.gsm, newdata)
# set-up 1 x 2 subplots
par(mfrow = c(1,2))
# plot predictions (gam)
plot(xseq, yhat.gam[1:100], t = "l", lwd = 2, ylim = c(-4, 7),
xlab = "x", ylab = "Linear Predictors", main = "gam")
lines(xseq, yhat.gam[1:100 + 100], lty = 2, lwd = 2, col = cols[2])
lines(xseq, yhat.gam[1:100 + 200], lty = 3, lwd = 2, col = cols[3])
legend("topleft", legend = paste0("z = ", letters[1:3]),
lty = 1:3, col = cols, lwd = 2, bty = "n")
# plot predictions (gsm)
plot(xseq, yhat.gsm[1:100], t = "l", lwd = 2, ylim = c(-4, 7),
xlab = "x", ylab = "Linear Predictors", main = "gsm")
lines(xseq, yhat.gsm[1:100 + 100], lty = 2, lwd = 2, col = cols[2])
lines(xseq, yhat.gsm[1:100 + 200], lty = 3, lwd = 2, col = cols[3])
legend("topleft", legend = paste0("z = ", letters[1:3]),
lty = 1:3, col = cols, lwd = 2, bty = "n")
Suppose we an independent sample of \(n\) observations \((x_{i1}, \ldots, x_{ip}, y_i) \sim F_{X,Y}\), and assume that \(y_i\) follows an exponential family distribution. A multiple and generalized nonparametric regression model assumes that \[ g(\mu_i) = \eta_i = f(x_{i1}, \ldots, x_{ip}) \] where \(\mu_i\) is the conditional mean of \(y_i\) given \(\mathbf{x}_i\), the function \(g(\cdot)\) is the known link function, the term \(\eta_i\) is referred to as the linear predictor, and \(f(\cdot)\) is the unknown smooth function. The goal is to estimate the unknown function \(f(\cdot)\) from the sample of data.
To estimate \(f(\cdot)\), a generalized smooth model (GSM) minimizes the penalized likelihood functional \[ f_\lambda = \min_{f \in \mathcal{H}} -\frac{1}{n} L(f) + \lambda J(f) \] where \(L(f)\) denotes the log-likelihood function of \(y_i\) given \(\mathbf{x}_i\), the penalty functional \(J(f) \geq 0\) quantifies the “roughness” of the function, and \(\lambda > 0\) is the smoothing parameter that controls the influence of the penalty.
The function space \(\mathcal{H} = \{f: J(f) < \infty \}\) is the space of functions with finite penalty. The nature of the penalty functional \(J(\cdot)\) will depend on the form of the model (i.e., additive or interactive effects) and the types of smoothers used for the predictors (e.g., continuous or nominal predictors).
The unknown function \(f \in \mathcal{H}\) could be assumed to contain nonparametric additive or interactive effects. For example, with \(p = 2\) predictors, we could assume either \[ \begin{split} \mbox{additive:}& \quad \eta_i = f_0 + f_1(x_{i1}) + f_2(x_{i2}) \\ \mbox{interaction:}& \quad \eta_i = f_0 + f_1(x_{i1}) + f_2(x_{i2}) + f_{12}(x_{i1}, x_{i2})\\ \end{split} \] where \(f_0 \in \mathcal{H}_0\) is a constant (intercept) term, \(f_1 \in \mathcal{H}_1\) is the main effect of the first predictor, \(f_2 \in \mathcal{H}_2\) is the main effect of the second predictor, and \(f_{12} \in \mathcal{H}_{12}\) is the interaction effect.
Using this notation, we can decompose the function space such as \[ \mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1 \oplus \mathcal{H}_2 \oplus \mathcal{H}_{12} \] where \(\oplus\) denotes the tensor summation. Note that the additive model corresponds to removing the subspace \(\mathcal{H}_{12}\) from the function space, i.e., \(\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1 \oplus \mathcal{H}_2\) for the additive model.
The way smoothness is defined for each predictor will depend on the type of predictor (or class
of the variable in R lingo).
gam | gsm | |
---|---|---|
Nominal | NA | type = “nom” |
Ordinal | NA | type = “ord” |
Polynomial (linear) | NA | type = “lin” |
Polynomial (cubic) | bs = “cr” | type = “cub” |
Polynomial (quintic) | NA | type = “qui” |
Periodic (linear) | NA | type = “per.lin” |
Periodic (cubic) | bs = “cc” | type = “per” |
Periodic (quintic) | NA | type = “per.qui” |
Thin plate | bs = “tp” | type = “tps” |
Spherical | bs = “sos” | type = “sph” |
The gam
function also offers several smoother types that are not listed here (see ?smooth.terms
).
Note that the gam
function requires the user to specify the smoother types when specifying the model formula (or the default type of bs = “tp” is used). In contrast, the gsm
function will select a type that is reasonable for each predictor (based on the predictor’s class
.)
The defaults in the sm
and gsm
functions are:
factor or character \(\leftrightarrow\) type = “nom”
ordered \(\leftrightarrow\) type = “ord”
integer or numeric \(\leftrightarrow\) type = “cub”
matrix \(\leftrightarrow\) type = “tps”
The Prestige
dataset (from the car package) contains the prestige of \(n = 102\) Canadian occupations from 1971, as well as the average income and education level of the occupation. We will use a multiple nonparametric regression model to model presitge as a function of income and education.
First, let’s load the data and visualize the relationship between income (\(X_1\)), education (\(X_2\)), and prestige (\(Y\)).
# load data
library(car)
## Loading required package: carData
data(Prestige)
head(Prestige)
## education income women prestige census type
## gov.administrators 13.11 12351 11.16 68.8 1113 prof
## general.managers 12.26 25879 4.02 69.1 1130 prof
## accountants 12.77 9271 15.70 63.4 1171 prof
## purchasing.officers 11.42 8865 9.11 56.8 1175 prof
## chemists 14.62 8403 11.68 73.5 2111 prof
## physicists 15.64 11030 5.13 77.6 2113 prof
# 3d scatterplot
library(rgl)
plot3d(prestige ~ income * education, data = Prestige,
col = ifelse(type == "prof", "red", "black"))
rglwidget()
Note: the marginal relationship between prestige and income looks non-linear, whereas the marginal relationship between prestige and education looks mostly linear.
Model Fitting
R code to fit additive and interaction models with gam
function:
# additive model (using default number of knots for each predictor)
add.gam <- gam(prestige ~ s(income, bs = "cr") + s(education, bs = "cr"),
data = Prestige)
add.gam
##
## Family: gaussian
## Link function: identity
##
## Formula:
## prestige ~ s(income, bs = "cr") + s(education, bs = "cr")
##
## Estimated degrees of freedom:
## 3.00 3.15 total = 7.15
##
## GCV score: 51.98404
# interaction model (using default number of knots for each predictor)
int.gam <- gam(prestige ~ s(income, bs = "cr") + s(education, bs = "cr")
+ ti(income, education, bs = c("cr", "cr")), data = Prestige)
int.gam
##
## Family: gaussian
## Link function: identity
##
## Formula:
## prestige ~ s(income, bs = "cr") + s(education, bs = "cr") + ti(income,
## education, bs = c("cr", "cr"))
##
## Estimated degrees of freedom:
## 2.07 3.08 2.00 total = 8.16
##
## GCV score: 50.43405
Model Summary
R code to summarize the fit models:
# summarize additive model
summary(add.gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## prestige ~ s(income, bs = "cr") + s(education, bs = "cr")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.8333 0.6884 68.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(income) 2.996 3.604 16.68 <2e-16 ***
## s(education) 3.154 3.913 39.22 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.837 Deviance explained = 84.7%
## GCV = 51.984 Scale est. = 48.34 n = 102
# summarize interaction model
summary(int.gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## prestige ~ s(income, bs = "cr") + s(education, bs = "cr") + ti(income,
## education, bs = c("cr", "cr"))
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.996 1.055 46.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(income) 2.074 2.543 20.252 2.11e-07 ***
## s(education) 3.083 3.875 41.372 < 2e-16 ***
## ti(income,education) 1.998 2.308 3.426 0.0299 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.843 Deviance explained = 85.4%
## GCV = 50.434 Scale est. = 46.402 n = 102
Model Comparison
Compare additive and interaction models:
# pseudo F test of interaction effect
sse.dif <- add.gam$deviance - int.gam$deviance
df.dif <- sum(int.gam$edf1) - sum(add.gam$edf1)
Fstat <- (sse.dif / df.dif) / int.gam$sig2
pvalue <- 1 - pf(Fstat, df1 = df.dif, df2 = nrow(Prestige) - sum(int.gam$edf))
c(Fstat, pvalue)
## [1] 4.10952931 0.03805945
# analysis of deviance table (gam)
anova(add.gam, int.gam, test = "F")
## Analysis of Deviance Table
##
## Model 1: prestige ~ s(income, bs = "cr") + s(education, bs = "cr")
## Model 2: prestige ~ s(income, bs = "cr") + s(education, bs = "cr") + ti(income,
## education, bs = c("cr", "cr"))
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 93.483 4585.1
## 2 92.274 4354.5 1.2089 230.53 4.1095 0.03811 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Plot Results
Plot results for additive model:
# setup 1 x 2 subplots
par(mfrow = c(1,2))
# get predictor sequences
xrng <- list(income = range(Prestige$income),
education = range(Prestige$education))
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
# main effect of income
newdata <- data.frame(income = inc.seq, education = mean(Prestige$education))
yhat.inc <- predict(add.gam, newdata = newdata, se.fit = TRUE)
plotci(inc.seq, yhat.inc$fit, yhat.inc$se.fit,
xlab = "Income", ylab = "Prestige",
main = "Income Effect")
rug(Prestige$income)
# main effect of education
newdata <- data.frame(income = mean(Prestige$income), education = edu.seq)
yhat.edu <- predict(add.gam, newdata = newdata, se.fit = TRUE)
plotci(edu.seq, yhat.edu$fit, yhat.edu$se.fit,
xlab = "Education", ylab = "Prestige",
main = "Education Effect")
rug(Prestige$education)
Plot results for interaction model:
# setup 2 x 2 subplots
par(mfrow = c(2,2))
# quantiles of education
labs <- c("Low", "Low-Med", "Med-High", "High")
probs <- seq(0.2, 0.8, by = 0.2)
qvals <- unname(quantile(Prestige$education, probs = probs))
qvals
## [1] 8.128 9.708 11.172 13.620
# plot prestige ~ income at different education levels
for(j in 1:4){
newdata <- data.frame(income = inc.seq, education = qvals[j])
yhat <- predict(int.gam, newdata, se.fit = TRUE)
plotci(inc.seq, yhat$fit, yhat$se.fit, ylim = c(10, 110),
xlab = "Income", ylab = "Prestige",
main = paste0("Education = ", labs[j]))
}
Surface plots for additive and interaction models:
# get predictions (for visualization)
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
newdata <- expand.grid(income = inc.seq, education = edu.seq)
yhat.add <- predict(add.gam, newdata)
yhat.int <- predict(int.gam, newdata)
# plot results
library(lattice)
par(mfrow = c(1,2))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.add, 25, 25),
theta = -45, phi = 30, xlab = "Income", ylab = "Education",
zlab = "Prestige", main = "Additive", zlim = c(10, 90))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.int, 25, 25),
theta = -45, phi = 30, xlab = "Income", ylab = "Education",
zlab = "Prestige", main = "Interaction", zlim = c(10, 90))
Model Fitting
R code to fit additive and interaction models with gsm
function (using all \(n = 102\) data points as knots):
# additive model (using all knots)
add.gsm <- gsm(prestige ~ income + education, data = Prestige, knots = 102)
add.gsm
##
## Call:
## gsm(formula = prestige ~ income + education, data = Prestige,
## knots = 102)
##
## Smoothing Parameter: spar = 1 lambda = 0.9999995
## Equivalent Degrees of Freedom (Df): 7.531583
## Penalized Criterion (deviance): 4565.108
## Generalized Cross-Validation (GCV): 52.17687
# interaction model (using all knots)
int.gsm <- gsm(prestige ~ income * education, data = Prestige, knots = 102)
int.gsm
##
## Call:
## gsm(formula = prestige ~ income * education, data = Prestige,
## knots = 102)
##
## Smoothing Parameter: spar = 0.9631003 lambda = 0.5412649
## Equivalent Degrees of Freedom (Df): 8.221877
## Penalized Criterion (deviance): 4377.092
## Generalized Cross-Validation (GCV): 50.76715
Model Summary
R code to summarize the fit models:
# summarize additive model
summary(add.gsm)
##
## Call:
## gsm(formula = prestige ~ income + education, data = Prestige,
## knots = 102)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.2811 -4.6473 0.8965 3.7966 18.2977
##
## Approx. Signif. of Parametric Effects:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.25 1.731 31.928 0.000e+00 ***
## income 33.86 6.003 5.640 1.761e-07 ***
## education 33.82 4.386 7.711 1.251e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approx. Signif. of Nonparametric Effects:
## Df Sum Sq Mean Sq F value Pr(>F)
## s(income) 2.299 894.7 389.16 8.053 0.0003152 ***
## s(education) 2.452 361.5 147.46 3.052 0.0418193 *
## Residuals 94.468 4565.1 48.32
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Dispersion Parameter: 48.32, Deviance Explained: 0.8473
## Multiple R-squared: 0.8474, Adjusted R-squared: 0.8367
# summarize interaction model
summary(int.gsm)
##
## Call:
## gsm(formula = prestige ~ income * education, data = Prestige,
## knots = 102)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.8121 -4.5071 0.9162 4.2439 19.3760
##
## Approx. Signif. of Parametric Effects:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.19 2.575 23.766 0.000e+00 ***
## income 39.41 8.200 4.807 5.825e-06 ***
## education 24.12 8.553 2.820 5.854e-03 **
## income:education -62.59 28.195 -2.220 2.883e-02 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approx. Signif. of Nonparametric Effects:
## Df Sum Sq Mean Sq F value Pr(>F)
## s(income) 0.8165 45.74 56.02 1.200 0.276093
## s(education) 2.3423 621.39 265.29 5.684 0.002963 **
## s(income:education) 3.9008 352.39 90.34 1.935 0.112620
## Residuals 93.7781 4377.09 46.67
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Dispersion Parameter: 46.67, Deviance Explained: 0.8536
## Multiple R-squared: 0.8537, Adjusted R-squared: 0.8423
Model Comparison
Compare additive and interaction models:
# pseudo F test of interaction effect
sse.dif <- add.gsm$deviance - int.gsm$deviance
df.dif <- int.gsm$df - add.gsm$df
Fstat <- (sse.dif / df.dif) / int.gsm$dispersion
pvalue <- 1 - pf(Fstat, df1 = df.dif, df2 = nrow(Prestige) - int.gsm$df)
c(Fstat, pvalue)
## [1] 5.83548907 0.02811473
Plot Results
Plot results for additive model:
# setup 1 x 2 subplots
par(mfrow = c(1,2))
# get predictor sequences
xrng <- add.gsm$specs$xrng
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
# main effect of income
yhat.inc <- predict(add.gsm, newdata = data.frame(income = inc.seq),
se.fit = TRUE, terms = "income")
plotci(inc.seq, yhat.inc$fit, yhat.inc$se.fit,
xlab = "Income", ylab = "Prestige",
main = "Income Effect")
rug(Prestige$income)
# main effect of education
yhat.edu <- predict(add.gsm, newdata = data.frame(education = edu.seq),
se.fit = TRUE, terms = "education")
plotci(edu.seq, yhat.edu$fit, yhat.edu$se.fit,
xlab = "Education", ylab = "Prestige",
main = "Education Effect")
rug(Prestige$education)
Plot results for interaction model:
# setup 2 x 2 subplots
par(mfrow = c(2,2))
# quantiles of education
labs <- c("Low", "Low-Med", "Med-High", "High")
probs <- seq(0.2, 0.8, by = 0.2)
qvals <- unname(quantile(Prestige$education, probs = probs))
qvals
## [1] 8.128 9.708 11.172 13.620
# plot prestige ~ income at different education levels
for(j in 1:4){
newdata <- data.frame(income = inc.seq, education = qvals[j])
yhat <- predict(int.gsm, newdata, se.fit = TRUE)
plotci(inc.seq, yhat$fit, yhat$se.fit, ylim = c(10, 110),
xlab = "Income", ylab = "Prestige",
main = paste0("Education = ", labs[j]))
}
Surface plots for additive and interaction models:
# get predictions (for visualization)
xrng <- add.gsm$specs$xrng
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
newdata <- expand.grid(income = inc.seq, education = edu.seq)
yhat.add <- predict(add.gsm, newdata)
yhat.int <- predict(int.gsm, newdata)
# plot results
library(lattice)
par(mfrow = c(1,2))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.add, 25, 25),
theta = -45, phi = 30, xlab = "Income", ylab = "Education",
zlab = "Prestige", main = "Additive", zlim = c(10, 90))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.int, 25, 25),
theta = -45, phi = 30, xlab = "Income", ylab = "Education",
zlab = "Prestige", main = "Interaction", zlim = c(10, 90))
Model Fitting
R code to fit additive and interaction models with sm
function (using all \(n = 102\) data points as knots):
# additive model (using all knots)
add.sm <- sm(prestige ~ income + education, data = Prestige, knots = 102)
add.sm
##
## Call:
## sm(formula = prestige ~ income + education, data = Prestige,
## knots = 102)
##
## Smoothing Parameter: spar = 1.007553 lambda = 1.133891
## Equivalent Degrees of Freedom (Df): 7.329865
## Penalized Criterion (RSS): 4583.735
## Generalized Cross-Validation (GCV): 52.16674
# interaction model (using all knots)
int.sm <- sm(prestige ~ income * education, data = Prestige, knots = 102)
int.sm
##
## Call:
## sm(formula = prestige ~ income * education, data = Prestige,
## knots = 102)
##
## Smoothing Parameter: spar = 0.9631003 lambda = 0.541265
## Equivalent Degrees of Freedom (Df): 8.221877
## Penalized Criterion (RSS): 4377.092
## Generalized Cross-Validation (GCV): 50.76715
Model Summary
R code to summarize the fit models:
# summarize additive model
summary(add.sm)
##
## Call:
## sm(formula = prestige ~ income + education, data = Prestige,
## knots = 102)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.2685 -4.5381 0.9389 3.8369 18.2640
##
## Approx. Signif. of Parametric Effects:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.30 1.714 32.262 0.000e+00 ***
## income 33.67 5.965 5.645 1.719e-07 ***
## education 33.95 4.322 7.856 6.172e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approx. Signif. of Nonparametric Effects:
## Df Sum Sq Mean Sq F value Pr(>F)
## s(income) 2.191 880.0 401.62 8.295 0.0003157 ***
## s(education) 2.338 348.3 148.96 3.076 0.0430021 *
## Residuals 94.670 4583.7 48.42
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.958 on 94.7 degrees of freedom
## Multiple R-squared: 0.8467, Adjusted R-squared: 0.8364
## F-statistic: 81.85 on 6.33 and 94.67 DF, p-value: <2e-16
# summarize interaction model
summary(int.sm)
##
## Call:
## sm(formula = prestige ~ income * education, data = Prestige,
## knots = 102)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.8121 -4.5071 0.9162 4.2439 19.3760
##
## Approx. Signif. of Parametric Effects:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.19 2.575 23.766 0.000e+00 ***
## income 39.41 8.200 4.807 5.825e-06 ***
## education 24.12 8.553 2.820 5.854e-03 **
## income:education -62.59 28.195 -2.220 2.883e-02 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approx. Signif. of Nonparametric Effects:
## Df Sum Sq Mean Sq F value Pr(>F)
## s(income) 0.8165 45.74 56.02 1.200 0.276093
## s(education) 2.3423 621.39 265.29 5.684 0.002963 **
## s(income:education) 3.9008 352.39 90.34 1.935 0.112620
## Residuals 93.7781 4377.09 46.67
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.832 on 93.8 degrees of freedom
## Multiple R-squared: 0.8537, Adjusted R-squared: 0.8423
## F-statistic: 74.98 on 7.222 and 93.78 DF, p-value: <2e-16
Model Comparison
Compare additive and interaction models:
# pseudo F test of interaction effect
sse.dif <- add.sm$sse - int.sm$sse
df.dif <- int.sm$df - add.sm$df
Fstat <- (sse.dif / df.dif) / int.sm$sigma^2
pvalue <- 1 - pf(Fstat, df1 = df.dif, df2 = nrow(Prestige) - int.sm$df)
c(Fstat, pvalue)
## [1] 4.96324831 0.03209404
Plot Results
Plot results for additive model:
# setup 1 x 2 subplots
par(mfrow = c(1,2))
# get predictor sequences
xrng <- add.sm$specs$xrng
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
# main effect of income
yhat.inc <- predict(add.sm, newdata = data.frame(income = inc.seq),
se.fit = TRUE, terms = "income")
plotci(inc.seq, yhat.inc$fit, yhat.inc$se.fit,
xlab = "Income", ylab = "Prestige",
main = "Income Effect")
rug(Prestige$income)
# main effect of education
yhat.edu <- predict(add.sm, newdata = data.frame(education = edu.seq),
se.fit = TRUE, terms = "education")
plotci(edu.seq, yhat.edu$fit, yhat.edu$se.fit,
xlab = "Education", ylab = "Prestige",
main = "Education Effect")
rug(Prestige$education)
Plot results for interaction model:
# setup 2 x 2 subplots
par(mfrow = c(2,2))
# quantiles of education
labs <- c("Low", "Low-Med", "Med-High", "High")
probs <- seq(0.2, 0.8, by = 0.2)
qvals <- unname(quantile(Prestige$education, probs = probs))
qvals
## [1] 8.128 9.708 11.172 13.620
# plot prestige ~ income at different education levels
for(j in 1:4){
newdata <- data.frame(income = inc.seq, education = qvals[j])
yhat <- predict(int.sm, newdata, se.fit = TRUE)
plotci(inc.seq, yhat$fit, yhat$se.fit, ylim = c(10, 110),
xlab = "Income", ylab = "Prestige",
main = paste0("Education = ", labs[j]))
}
Surface plots for additive and interaction models:
# get predictions (for visualization)
xrng <- add.sm$specs$xrng
inc.seq <- seq(xrng$income[1], xrng$income[2], length.out = 25)
edu.seq <- seq(xrng$education[1], xrng$education[2], length.out = 25)
newdata <- expand.grid(income = inc.seq, education = edu.seq)
yhat.add <- predict(add.sm, newdata)
yhat.int <- predict(int.sm, newdata)
# plot results
library(lattice)
par(mfrow = c(1,2))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.add, 25, 25),
theta = -45, phi = 30, xlab = "Income", ylab = "Education",
zlab = "Prestige", main = "Additive", zlim = c(10, 90))
persp(x = inc.seq, y = edu.seq, z = matrix(yhat.int, 25, 25),
theta = -45, phi = 30, xlab = "Income", ylab = "Education",
zlab = "Prestige", main = "Interaction", zlim = c(10, 90))
The TitanicSurvival
dataset (from the car package) contains survival and demographic information for \(n = 1309\) passengers on the Titanic. We will use a nonparametric logistic regression model to model the probability of survial as a function of age and sex.
First, let’s load the data and visualize the relationship between age (\(X_1\)), sex (\(X_2\)), and survival (\(Y\)).
# load data
library(car)
data(TitanicSurvival)
head(TitanicSurvival)
## survived sex age passengerClass
## Allen, Miss. Elisabeth Walton yes female 29.0000 1st
## Allison, Master. Hudson Trevor yes male 0.9167 1st
## Allison, Miss. Helen Loraine no female 2.0000 1st
## Allison, Mr. Hudson Joshua Crei no male 30.0000 1st
## Allison, Mrs. Hudson J C (Bessi no female 25.0000 1st
## Anderson, Mr. Harry yes male 48.0000 1st
# plot survived ~ age * sex
y <- ifelse(TitanicSurvival$survived == "yes", 1, 0)
plot(TitanicSurvival$age, y,
xlab = "Age (Years)", ylab = "Survived (1 = Yes)",
pch = ifelse(TitanicSurvival$sex == "male", 16, 17),
col = ifelse(TitanicSurvival$sex == "male", "blue", "red"))
legend("right", c("female", "male"), pch = c(17, 16),
col = c("red", "blue"), bty = "n")
Model Fitting
R code to fit additive and interaction models with gam
function:
# additive model
add.gam <- gam(y ~ sex + s(age, bs = "cr"),
family = binomial, data = TitanicSurvival)
add.gam
##
## Family: binomial
## Link function: logit
##
## Formula:
## y ~ sex + s(age, bs = "cr")
##
## Estimated degrees of freedom:
## 6.91 total = 8.91
##
## UBRE score: 0.04974017
# interaction model
int.gam <- gam(y ~ sex + s(age, bs = "cr", by = sex),
family = binomial, data = TitanicSurvival)
int.gam
##
## Family: binomial
## Link function: logit
##
## Formula:
## y ~ sex + s(age, bs = "cr", by = sex)
##
## Estimated degrees of freedom:
## 1.00 6.98 total = 9.98
##
## UBRE score: 0.02615761
Model Summary
R code to summarize the fit models:
# summarize additive model
summary(add.gam)
##
## Family: binomial
## Link function: logit
##
## Formula:
## y ~ sex + s(age, bs = "cr")
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1228 0.1191 9.425 <2e-16 ***
## sexmale -2.4907 0.1549 -16.075 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(age) 6.911 7.976 18.38 0.016 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.302 Deviance explained = 23.6%
## UBRE = 0.04974 Scale est. = 1 n = 1046
# summarize interaction model
summary(int.gam)
##
## Family: binomial
## Link function: logit
##
## Formula:
## y ~ sex + s(age, bs = "cr", by = sex)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1662 0.1226 9.515 <2e-16 ***
## sexmale -2.5622 0.1590 -16.112 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(age):sexfemale 1.000 1.001 6.958 0.00835 **
## s(age):sexmale 6.979 8.030 36.329 8.79e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.32 Deviance explained = 25.5%
## UBRE = 0.026158 Scale est. = 1 n = 1046
Model Comparison
Compare additive and interaction models:
# pseudo Chisq test of interaction effect
sse.dif <- add.gam$deviance - int.gam$deviance
df.dif <- sum(int.gam$edf1) - sum(add.gam$edf1)
pvalue <- 1 - pchisq(sse.dif, df = df.dif)
c(sse.dif, pvalue)
## [1] 2.680526e+01 2.547716e-07
# analysis of deviance table (gam)
anova(add.gam, int.gam, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: y ~ sex + s(age, bs = "cr")
## Model 2: y ~ sex + s(age, bs = "cr", by = sex)
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 1036 1080.2
## 2 1035 1053.4 1.0544 26.805 2.548e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Plot Results
Plot results for additive and interaction models:
# get predictions (for visualization)
xrng <- list(age = range(TitanicSurvival$age, na.rm = TRUE))
age.seq <- seq(xrng$age[1], xrng$age[2], length.out = 25)
sex.seq <- c("female", "male")
newdata <- expand.grid(age = age.seq, sex = sex.seq)
yhat.add <- predict(add.gam, newdata, type = "response")
yhat.int <- predict(int.gam, newdata, type = "response")
# additive model
par(mfrow = c(1,2))
plot(age.seq, yhat.add[1:25], ylim = c(0, 1), t = "l", col = "red",
xlab = "Age (Years)", ylab = "Probability of Survival",
main = "Additive Model")
lines(age.seq, yhat.add[1:25 + 25], lty = 2, col = "blue")
legend("bottomleft", legend = c("female", "male"), lty = 1:2,
col = c("red", "blue"), horiz = TRUE, bty = "n")
# interaction model
plot(age.seq, yhat.int[1:25], ylim = c(0, 1), t = "l", col = "red",
xlab = "Age (Years)", ylab = "Probability of Survival",
main = "Interaction Model")
lines(age.seq, yhat.int[1:25 + 25], lty = 2, col = "blue")
legend("bottomleft", legend = c("female", "male"), lty = 1:2,
col = c("red", "blue"), horiz = TRUE, bty = "n")
Model Fitting
R code to fit additive and interaction models with gsm
function:
# select knots (age quantiles by sex)
probs <- seq(0, 1, length.out = 9)
knots <- with(TitanicSurvival, tapply(age, sex, quantile, probs = probs, na.rm = T))
knots <- list(age = unlist(knots), sex = rep(c("female", "male"), each = 9))
#knots <- list(age = quantile(TitanicSurvival$age, probs = probs, na.rm = TRUE),
# sex = c("female", "male"))
# additive model
add.gsm <- gsm(y ~ age + sex, family = binomial,
data = TitanicSurvival, knots = knots)
add.gsm
##
## Call:
## gsm(formula = y ~ age + sex, family = binomial, data = TitanicSurvival,
## knots = knots)
##
## Smoothing Parameter: spar = 0.6554119 lambda = 0.003239273
## Equivalent Degrees of Freedom (Df): 9.723163
## Penalized Criterion (deviance): 1078.102
## Generalized Cross-Validation (GCV): 0.5245551
# interaction model
int.gsm <- gsm(y ~ age * sex, family = binomial,
data = TitanicSurvival, knots = knots)
int.gsm
##
## Call:
## gsm(formula = y ~ age * sex, family = binomial, data = TitanicSurvival,
## knots = knots)
##
## Smoothing Parameter: spar = 0.6088316 lambda = 0.001492511
## Equivalent Degrees of Freedom (Df): 9.023688
## Penalized Criterion (deviance): 1062.67
## Generalized Cross-Validation (GCV): 0.5165595
Model Summary
R code to summarize the fit models:
# summarize additive model
summary(add.gsm)
##
## Call:
## gsm(formula = y ~ age + sex, family = binomial, data = TitanicSurvival,
## knots = knots)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1924 -0.6839 -0.6077 0.7891 1.9287
##
## Approx. Signif. of Parametric Effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.07714 0.1608 -0.47966 0.6315
## age -0.14833 1.4976 -0.09904 0.9211
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approx. Signif. of Nonparametric Effects:
## Df Chi Sq p.value
## s(age) 6.755 15.93 0.02239 *
## s(sex) 1.006 250.36 0.00000 ***
## Residuals 1036.277 1078.10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Dispersion Parameter: 1, Deviance Explained: 0.2379
## Multiple R-squared: 0.309, Adjusted R-squared: 0.3031
# summarize interaction model
summary(int.gsm)
##
## Call:
## gsm(formula = y ~ age * sex, family = binomial, data = TitanicSurvival,
## knots = knots)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9217 -0.6483 -0.6253 0.7849 1.9419
##
## Approx. Signif. of Parametric Effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.03578 0.1626 -0.2201 0.8258
## age -0.66317 1.0636 -0.6235 0.5329
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approx. Signif. of Nonparametric Effects:
## Df Chi Sq p.value
## s(age) 2.908 6.434 0.086450 .
## s(sex) 3.426 249.238 0.000000 ***
## s(age:sex) 5.907 19.488 0.003188 **
## Residuals 1036.976 1062.670
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Dispersion Parameter: 1, Deviance Explained: 0.2488
## Multiple R-squared: 0.3203, Adjusted R-squared: 0.3149
Model Comparison
Compare additive and interaction models:
# pseudo Chisq test of interaction effect
sse.dif <- add.gsm$deviance - int.gsm$deviance
df.dif <- int.gsm$df - add.gsm$df
pvalue <- 1 - pchisq(sse.dif, df = df.dif)
## Warning in pchisq(sse.dif, df = df.dif): NaNs produced
c(sse.dif, pvalue)
## [1] 15.43197 NaN
# df.dif is negative, so set df.dif = 1
pvalue <- 1 - pchisq(sse.dif, df = 1)
c(sse.dif, pvalue)
## [1] 1.543197e+01 8.552917e-05
Plot Results
Plot results for additive and interaction models:
# get predictions (for visualization)
xrng <- add.gsm$specs$xrng
age.seq <- seq(xrng$age[1], xrng$age[2], length.out = 25)
sex.seq <- c("female", "male")
newdata <- expand.grid(age = age.seq, sex = sex.seq)
yhat.add <- predict(add.gsm, newdata, type = "response")
yhat.int <- predict(int.gsm, newdata, type = "response")
# additive model
par(mfrow = c(1,2))
plot(age.seq, yhat.add[1:25], ylim = c(0, 1), t = "l", col = "red",
xlab = "Age (Years)", ylab = "Probability of Survival",
main = "Additive Model")
lines(age.seq, yhat.add[1:25 + 25], lty = 2, col = "blue")
legend("bottomleft", legend = c("female", "male"), lty = 1:2,
col = c("red", "blue"), horiz = TRUE, bty = "n")
# interaction model
plot(age.seq, yhat.int[1:25], ylim = c(0, 1), t = "l", col = "red",
xlab = "Age (Years)", ylab = "Probability of Survival",
main = "Interaction Model")
lines(age.seq, yhat.int[1:25 + 25], lty = 2, col = "blue")
legend("bottomleft", legend = c("female", "male"), lty = 1:2,
col = c("red", "blue"), horiz = TRUE, bty = "n")