Chapter 46 Repeated Measures
46.1 Emulsions
More emulsion data. Three emulsifiers: milk protein concentrate (MPC), modified milk protein concentrate (mMPC), and sodium caseinate (NC). An emulsion is made from each of the emulsifiers and the turbidity is measured at 0, 1, 2, 3, 4, 5, 6, 12, 24, 36, and 48 hours. A good emulsifier should keep the turbidity high. The experiment is then repeated two more times.
This is a repeated measures design, because we cannot randomize time (although we do randomize emulsifiers to emulsions). Note
- We create a quantitative version of hours (hrs) in addition to the factor version (time).
- The “subject” in this repeated measures design (analogous to a whole plot) is formed by the block-by-protein combinations; a quick check of this structure appears right after the data listing.
> library(cfcdae)
Registered S3 method overwritten by 'DoE.base':
method from
factorize.factor conf.design
> library(lme4)
Loading required package: Matrix
> library(nlme)
Attaching package: 'nlme'
The following object is masked from 'package:lme4':
lmList
> library(car)
Loading required package: carData
> library(conf.design)
Attaching package: 'conf.design'
The following object is masked from 'package:lme4':
factorize
> library(pbkrtest)
> emulsions <- read.table("emul2.dat.txt",header=TRUE)
> emulsions <- within(emulsions,hrs <- c(0,1,2,3,4,5,6,12,24,36,48)[time])
> emulsions <- within(emulsions,{protein<-factor(protein);
+ time<-factor(time);block<-factor(block)})
> emulsions <- within(emulsions,subject <- join(protein,block))
> emulsions
y protein time block hrs subject
1 1.035 1 1 1 0 1:1
2 1.051 1 1 2 0 1:2
3 1.054 1 1 3 0 1:3
4 1.047 2 1 1 0 2:1
5 1.067 2 1 2 0 2:2
6 1.087 2 1 3 0 2:3
7 1.036 3 1 1 0 3:1
8 1.078 3 1 2 0 3:2
9 1.083 3 1 3 0 3:3
10 1.032 1 2 1 1 1:1
11 1.044 1 2 2 1 1:2
12 1.016 1 2 3 1 1:3
13 1.091 2 2 1 1 2:1
14 1.067 2 2 2 1 2:2
15 1.058 2 2 3 1 2:3
16 1.087 3 2 1 1 3:1
17 1.071 3 2 2 1 3:2
18 1.079 3 2 3 1 3:3
19 0.978 1 3 1 2 1:1
20 1.016 1 3 2 2 1:2
21 0.978 1 3 3 2 1:3
22 1.085 2 3 1 2 2:1
23 1.072 2 3 2 2 2:2
24 1.060 2 3 3 2 2:3
25 1.080 3 3 1 2 3:1
26 1.066 3 3 2 2 3:2
27 1.065 3 3 3 2 3:3
28 0.927 1 4 1 3 1:1
29 0.998 1 4 2 3 1:2
30 0.974 1 4 3 3 1:3
31 1.074 2 4 1 3 2:1
32 1.071 2 4 2 3 2:2
33 1.060 2 4 3 3 2:3
34 1.028 3 4 1 3 3:1
35 1.066 3 4 2 3 3:2
36 1.062 3 4 3 3 3:3
37 0.925 1 5 1 4 1:1
38 0.960 1 5 2 4 1:2
39 0.938 1 5 3 4 1:3
40 1.067 2 5 1 4 2:1
41 1.058 2 5 2 4 2:2
42 1.048 2 5 3 4 2:3
43 1.075 3 5 1 4 3:1
44 1.062 3 5 2 4 3:2
45 1.062 3 5 3 4 3:3
46 0.886 1 6 1 5 1:1
47 0.923 1 6 2 5 1:2
48 0.904 1 6 3 5 1:3
49 1.067 2 6 1 5 2:1
50 1.048 2 6 2 5 2:2
51 1.049 2 6 3 5 2:3
52 1.072 3 6 1 5 3:1
53 1.072 3 6 2 5 3:2
54 1.063 3 6 3 5 3:3
55 0.868 1 7 1 6 1:1
56 0.899 1 7 2 6 1:2
57 0.890 1 7 3 6 1:3
58 1.067 2 7 1 6 2:1
59 1.046 2 7 2 6 2:2
60 1.043 2 7 3 6 2:3
61 1.076 3 7 1 6 3:1
62 1.054 3 7 2 6 3:2
63 1.066 3 7 3 6 3:3
64 0.794 1 8 1 12 1:1
65 0.768 1 8 2 12 1:2
66 0.792 1 8 3 12 1:3
67 1.064 2 8 1 12 2:1
68 1.038 2 8 2 12 2:2
69 1.038 2 8 3 12 2:3
70 1.062 3 8 1 12 3:1
71 1.050 3 8 2 12 3:2
72 1.040 3 8 3 12 3:3
73 0.750 1 9 1 24 1:1
74 0.726 1 9 2 24 1:2
75 0.768 1 9 3 24 1:3
76 1.079 2 9 1 24 2:1
77 1.007 2 9 2 24 2:2
78 0.996 2 9 3 24 2:3
79 1.057 3 9 1 24 3:1
80 1.026 3 9 2 24 3:2
81 0.989 3 9 3 24 3:3
82 0.740 1 10 1 36 1:1
83 0.731 1 10 2 36 1:2
84 0.768 1 10 3 36 1:3
85 1.036 2 10 1 36 2:1
86 0.935 2 10 2 36 2:2
87 0.917 2 10 3 36 2:3
88 1.023 3 10 1 36 3:1
89 0.938 3 10 2 36 3:2
90 0.956 3 10 3 36 3:3
91 0.720 1 11 1 48 1:1
92 0.698 1 11 2 48 1:2
93 0.738 1 11 3 48 1:3
94 0.900 2 11 1 48 2:1
95 0.924 2 11 2 48 2:2
96 0.891 2 11 3 48 2:3
97 0.925 3 11 1 48 3:1
98 0.924 3 11 2 48 3:2
99 0.934 3 11 3 48 3:3
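As a quick sanity check (this step is not in the original transcript), we can confirm that each block-by-protein subject contributes exactly one measurement at each of the 11 times; the output, not shown, should be a 9 by 11 table of ones.
> with(emulsions, table(subject, time))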
46.2 Univariate (classical) analysis
The basic univariate analysis looks just like a split plot. It works fine if nature is being kind. We could fit this model using either lme() or lmer(), but we’ll fit both because we’ll need both later.
> emulsions.lme <- lme(y~protein*time,random=~1|block/subject,data=emulsions)
> emulsions.lmer <- lmer(y~protein*time+(1|block)+(1|subject),data=emulsions)
boundary (singular) fit: see help('isSingular')
Constant variance doesn’t look too bad:
> plot(emulsions.lme)

Figure 46.1: Residual plot of emulsions data
Normality is a bit more suspect:
> qqnorm(residuals(emulsions.lme))

Figure 46.2: NPP of residuals in emulsions data
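If we want a reference line on the normal plot, a minimal addition (not used for the figure above) is:
> qqnorm(residuals(emulsions.lme))
> qqline(residuals(emulsions.lme)) # add the straight reference line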
Both main effects and the interaction are highly significant. (Note that we can often use plain old anova() after lme().)
> anova(emulsions.lme)
numDF denDF F-value p-value
(Intercept) 1 60 74591.40 <.0001
protein 2 4 191.59 1e-04
time 10 60 99.09 <.0001
protein:time 20 60 16.97 <.0001
Let’s take a look at that interaction. Protein 1 is lousy, and there is little difference between 2 and 3, which fail much more slowly than 1. But fail they all surely do.
> with(emulsions,interactplot(time,protein,y))

Figure 46.3: Interaction plot for emulsions data
time is a factor, but the actual quantities its levels represent are not equally spaced. Thus the first interaction plot does not reproduce the functional effect of time. To get that, we need the horizontal locations to reflect the actual values of time. Those values are found in the hrs variable, and we feed them into interactplot() via the at= argument.
In this new interaction plot, we see that turbidity is dropping more or less linearly in time for proteins 2 and 3. For protein 1, it drops very quickly, and then very slowly after 8 hours. (This is kind of the opposite of how Hemingway said he went bankrupt: “Gradually, and then suddenly.”)
> with(emulsions,interactplot(time,protein,y,at=sort(unique(hrs))))

Figure 46.4: Interaction plot for emulsions data with quantitative levels of time
46.2.1 Functional modeling
Time is quantitative, and we might wish to create a functional representation of how the turbidity changes. Typically, we use polynomials. However, the behavior we see for protein 1 (dropping quickly, then changing very slowly) is not a shape that polynomials can fit very well, at least not without using many powers.
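For comparison, a polynomial version of the model might look like the sketch below; this fit is not part of the analysis in this chapter, and the cubic is an arbitrary choice that would still have trouble with the sharp bend for protein 1.
> # hypothetical alternative: cubic polynomial in hours, separate curve for each protein
> emulsions.lmer.poly <- lmer(y~protein*poly(hrs,3)+(1|block)+(1|subject),data=emulsions)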
Let’s instead fit linear splines with a knot at 8 hours. This means our function will be linear up to 8 and linear after 8, but the slope can differ before and after 8. We represent this with two variables. The first is the standard “linear in time” term, represented by hrs. The second subtracts 8 from hrs and then sets negative values to 0. This will be 0 before 8, and linearly increasing after 8.
> hrs8 <- emulsions$hrs-8
> hrs8[hrs8 < 0] <- 0
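Equivalently (an aside, not in the original transcript), hrs8 could be built in one step with pmax(), and we can check that the two variables line up as described:
> hrs8.alt <- pmax(emulsions$hrs-8,0) # same as the two-step construction above
> all(hrs8.alt == hrs8) # should be TRUE
> unique(cbind(hrs=emulsions$hrs,hrs8=hrs8)) # the 11 (hrs, hrs8) pairs (output not shown)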
Now we just use hrs and hrs8 together in the model. Again, for illustrative purposes, we fit with both lme() and lmer().
> emulsions.lme.hrs <- lme(y~protein*(hrs+hrs8),random=~1|block/subject,data=emulsions)
> emulsions.lmer.hrs <- lmer(y~protein*(hrs+hrs8)+(1|block)+(1|subject),data=emulsions)
boundary (singular) fit: see help('isSingular')
If we try to compare emulsions.lme and emulsions.lme.hrs using anova(), it produces output based on AIC and the REML likelihood. We know these don’t work, because the two models differ in their fixed effects; anova() has the courtesy to warn us.
> anova(emulsions.lme.hrs,emulsions.lme)
Warning in anova.lme(emulsions.lme.hrs, emulsions.lme): fitted objects with
different fixed effects. REML comparisons are not meaningful.
Model df AIC BIC logLik Test L.Ratio p-value
emulsions.lme.hrs 1 12 -356.6913 -326.6936 190.3456
emulsions.lme 2 36 -169.3336 -90.5060 120.6668 1 vs 2 139.3577 <.0001
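If we really wanted a likelihood-based comparison from lme(), the standard workaround (not pursued here) is to refit both models by maximum likelihood rather than REML; the likelihood ratio test for the fixed effects is then meaningful.
> # sketch: ML refits make the fixed-effects comparison legitimate (output not shown)
> anova(update(emulsions.lme.hrs,method="ML"),update(emulsions.lme,method="ML"))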
If we use the lmer() model fits, we can use KRmodcomp() to do the comparison. We see that the linear spline model fits well, and the additional 24 fixed-effects degrees of freedom in the full model are not needed.
> KRmodcomp(emulsions.lmer,emulsions.lmer.hrs)
large : y ~ protein * time + (1 | block) + (1 | subject)
small : y ~ protein * (hrs + hrs8) + (1 | block) + (1 | subject)
stat ndf ddf F.scaling p.value
Ftest 0.7796 24.0000 60.0000 1 0.7459
Look at the summary, and we see something interesting: the protein effects are not significant. (Recall that protein was highly significant in the non-functional model.) In this model, protein effects represent different intercepts (values at time 0) for the different proteins. Evidently, all the emulsions begin with the same time 0 turbidity, so we don’t need separate intercepts.
> summary(emulsions.lmer.hrs)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ protein * (hrs + hrs8) + (1 | block) + (1 | subject)
Data: emulsions
REML criterion at convergence: -380.7
Scaled residuals:
Min 1Q Median 3Q Max
-2.23531 -0.52813 0.00651 0.40146 3.15729
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 8.006e-05 8.948e-03
block (Intercept) 3.501e-12 1.871e-06
Residual 4.135e-04 2.034e-02
Number of obs: 99, groups: subject, 9; block, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.0647594 0.0051582 206.422
protein1 -0.0060711 0.0072947 -0.832
protein2 0.0030366 0.0072947 0.416
hrs -0.0108661 0.0009448 -11.501
hrs8 0.0077663 0.0010912 7.117
protein1:hrs -0.0208494 0.0013362 -15.604
protein2:hrs 0.0101060 0.0013362 7.564
protein1:hrs8 0.0216577 0.0015431 14.035
protein2:hrs8 -0.0106403 0.0015431 -6.895
Correlation of Fixed Effects:
(Intr) protn1 protn2 hrs hrs8 prtn1: prtn2: prt1:8
protein1 0.000
protein2 0.000 -0.500
hrs -0.672 0.000 0.000
hrs8 0.632 0.000 0.000 -0.990
proten1:hrs 0.000 -0.672 0.336 0.000 0.000
proten2:hrs 0.000 0.336 -0.672 0.000 0.000 -0.500
protn1:hrs8 0.000 0.632 -0.316 0.000 0.000 -0.990 0.495
protn2:hrs8 0.000 -0.316 0.632 0.000 0.000 0.495 -0.990 -0.500
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
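To see what these coefficients imply for each protein, we can assemble the per-protein slopes before and after 8 hours. The sketch below assumes sum-to-zero coding for protein (which the protein1/protein2 labels in the summary suggest), so the protein 3 adjustments are minus the sum of the protein 1 and 2 adjustments; output not shown.
> fe <- fixef(emulsions.lmer.hrs)
> s1 <- c(fe["protein1:hrs"],fe["protein2:hrs"]) # per-protein adjustments to the hrs slope
> s2 <- c(fe["protein1:hrs8"],fe["protein2:hrs8"]) # per-protein adjustments to the hrs8 slope
> data.frame(protein=1:3,
+ before8=fe["hrs"]+c(s1,-sum(s1)),
+ after8=fe["hrs"]+c(s1,-sum(s1))+fe["hrs8"]+c(s2,-sum(s2)))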
Just to push this to its conclusion, let’s fit two more models. The first uses hrs and hrs8 separately for each protein but removes the individual intercepts for each protein. Now the final model: it includes a common overall intercept, hrs and hrs8 terms for protein 1, and a common coefficient of hrs for proteins 2 and 3. This is the model that encapsulates what the interaction plot looks like. Is it good enough?
> emulsions.lmer.hrs.int <- lmer(y~protein:(hrs+hrs8)+(1|block)+(1|subject),data=emulsions)
boundary (singular) fit: see help('isSingular')
> prot1 <- with(emulsions,(protein==1)*1) # indicates protein 1
> prot23 <- 1-prot1 # indicates proteins 2 and 3
> emulsions.lmer.hrs.min <- lmer(y~prot1:hrs+prot1:hrs8+prot23:hrs+(1|block)+(1|subject),data=emulsions)
boundary (singular) fit: see help('isSingular')
Compare the model with the common intercept to the full model and to the spline model with separate intercepts. There is no evidence that we need a more complicated model.
> KRmodcomp(emulsions.lmer,emulsions.lmer.hrs.int)
large : y ~ protein * time + (1 | block) + (1 | subject)
small : y ~ protein:(hrs + hrs8) + (1 | block) + (1 | subject)
stat ndf ddf F.scaling p.value
Ftest 0.7428 26.0000 54.3461 0.99651 0.7938
> KRmodcomp(emulsions.lmer.hrs,emulsions.lmer.hrs.int)
large : y ~ protein * (hrs + hrs8) + (1 | block) + (1 | subject)
small : y ~ protein:(hrs + hrs8) + (1 | block) + (1 | subject)
stat ndf ddf F.scaling p.value
Ftest 0.3463 2.0000 15.7643 1 0.7125
Now look at the model with a common intercept, a single slope for proteins 2 and 3, and a linear spline for protein 1. Compared to the two larger models there is little evidence that anything more complex than this minimal model is needed, although the comparison with the separate-slopes spline model is borderline (p ≈ .044).
> KRmodcomp(emulsions.lmer,emulsions.lmer.hrs.min)
large : y ~ protein * time + (1 | block) + (1 | subject)
small : y ~ prot1:hrs + prot1:hrs8 + prot23:hrs + (1 | block) + (1 |
subject)
stat ndf ddf F.scaling p.value
Ftest 0.9526 29.0000 48.2109 0.98721 0.5466
> KRmodcomp(emulsions.lmer.hrs,emulsions.lmer.hrs.min)
large : y ~ protein * (hrs + hrs8) + (1 | block) + (1 | subject)
small : y ~ prot1:hrs + prot1:hrs8 + prot23:hrs + (1 | block) + (1 |
subject)
stat ndf ddf F.scaling p.value
Ftest 1.8361 5.0000 21.2762 0.94116 0.1487
> KRmodcomp(emulsions.lmer.hrs.int,emulsions.lmer.hrs.min)
large : y ~ protein:(hrs + hrs8) + (1 | block) + (1 | subject)
small : y ~ prot1:hrs + prot1:hrs8 + prot23:hrs + (1 | block) + (1 |
subject)
stat ndf ddf F.scaling p.value
Ftest 2.9197 3.0000 45.4460 0.98442 0.04407 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now look at the slopes. For proteins 2 and 3, the slope is -.003 with an SE of .00016. For protein 1, the initial slope is -.0334, and after 8 hours the slope is -.033411 + .031265 = -.00215 (the SE for this is about .00037; a small computation after the summary shows where that comes from).
> summary(emulsions.lmer.hrs.min)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ prot1:hrs + prot1:hrs8 + prot23:hrs + (1 | block) + (1 |
subject)
Data: emulsions
REML criterion at convergence: -428.2
Scaled residuals:
Min 1Q Median 3Q Max
-2.2032 -0.3199 0.0004 0.4237 3.1639
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 8.826e-05 0.009394
block (Intercept) 0.000e+00 0.000000
Residual 4.330e-04 0.020808
Number of obs: 99, groups: subject, 9; block, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.072695 0.004451 241.01
prot1:hrs -0.033411 0.001360 -24.57
prot1:hrs8 0.031265 0.001617 19.34
hrs:prot23 -0.003008 0.000163 -18.45
Correlation of Fixed Effects:
(Intr) prt1:h prt1:8
prot1:hrs -0.396
prot1:hrs8 0.362 -0.987
hrs:prot23 -0.376 0.149 -0.136
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
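The post-8-hour slope for protein 1 is the sum of the prot1:hrs and prot1:hrs8 coefficients, and its standard error comes from the variance of that sum. A small sketch of the computation (output not shown):
> fe <- fixef(emulsions.lmer.hrs.min)
> V <- as.matrix(vcov(emulsions.lmer.hrs.min))
> cc <- as.numeric(names(fe) %in% c("prot1:hrs","prot1:hrs8")) # picks out the two coefficients
> sum(cc*fe) # -.033411 + .031265, about -.0021
> sqrt(drop(t(cc) %*% V %*% cc)) # its SE (quoted above as about .00037)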
46.3 Modeling correlation
Given that we did not randomize, we could find that there is correlation among the responses for a given subject. In particular, there could be a correlation that does not match the simple one required for the univariate analysis to work (everything has the same variance with constant correlation). We may be able to model the correlation to come up with something that fits the data better.
One concept for this correlation is that it is largest for points near in time and then decays as time gets farther apart. The simplest model for this is the autoregressive model of order one (AR1).
The command below keeps each subject independent, but allows AR1 correlations between responses within a given subject, with time taken as hrs.
> emulsions.lme.ar1 <- update(emulsions.lme,correlation=corAR1(form=~hrs|block/subject))
AIC and BIC prefer the model with the AR1 adjustment.
> anova(emulsions.lme,emulsions.lme.ar1)
Model df AIC BIC logLik Test L.Ratio p-value
emulsions.lme 1 36 -169.3336 -90.50598 120.6668
emulsions.lme.ar1 2 37 -175.9648 -94.94762 124.9824 1 vs 2 8.631294 0.0033
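The estimated AR1 parameter itself is shown by summary(emulsions.lme.ar1), and it can also be pulled out directly. Under this structure, the correlation between two responses on the same subject that are d hours apart is the parameter raised to the power d. A sketch (output not shown):
> phi <- coef(emulsions.lme.ar1$modelStruct$corStruct,unconstrained=FALSE)
> phi # estimated per-hour correlation parameter
> phi^c(1,6,12) # implied correlations for responses 1, 6, and 12 hours apart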
Just to illustrate, let’s look at the first 10 estimated fixed effects with and without the AR1 correlation. Because we have balanced data with the orthogonal main effects and interactions model, the estimates are the same in the two models.
> fixed.effects(emulsions.lme)[1:10]
(Intercept) protein1 protein2 time1 time2 time3
0.98760606 -0.10006061 0.04745455 0.07217172 0.07294949 0.05683838
time4 time5 time6 time7
0.04128283 0.03406061 0.02172727 0.01339394
> fixed.effects(emulsions.lme.ar1)[1:10]
(Intercept) protein1 protein2 time1 time2 time3
0.98760606 -0.10006061 0.04745455 0.07217172 0.07294949 0.05683838
time4 time5 time6 time7
0.04128283 0.03406061 0.02172727 0.01339394
However, their standard errors differ.
> sqrt(diag(vcov(emulsions.lme)))[1:10]
(Intercept) protein1 protein2 time1 time2 time3
0.003616091 0.005113925 0.005113925 0.006676523 0.006676523 0.006676523
time4 time5 time6 time7
0.006676523 0.006676523 0.006676523 0.006676523
> sqrt(diag(vcov(emulsions.lme.ar1)))[1:10]
(Intercept) protein1 protein2 time1 time2 time3
0.005927617 0.007475466 0.007475466 0.006616551 0.006087686 0.005770966
time4 time5 time6 time7
0.005657149 0.005742990 0.006030932 0.006529343
The estimates will not always be the same. Here we fit the minimal spline regression models with and without the AR1 adjustment. The coefficients are slightly different (and could be more different in other settings).
> emulsions.lme.hrs.min <- lme(y~prot1:hrs+prot1:hrs8+prot23:hrs,random=~1|block/subject,data=emulsions)
> emulsions.lme.hrs.min.ar1 <- lme(y~prot1:hrs+prot1:hrs8+prot23:hrs,random=~1|block/subject,data=emulsions,correlation=corAR1(form=~hrs|block/subject))
> fixed.effects(emulsions.lme.hrs.min)
(Intercept) prot1:hrs prot1:hrs8 hrs:prot23
1.072694767 -0.033410454 0.031264678 -0.003008316
> fixed.effects(emulsions.lme.hrs.min.ar1)
(Intercept) prot1:hrs prot1:hrs8 hrs:prot23
1.074217594 -0.033820109 0.031696788 -0.003039819
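These two fits have the same fixed effects and differ only in the correlation structure, so the REML-based comparison is legitimate; a natural last check (not shown here) is whether AIC and BIC still prefer the AR1 version for the minimal model.
> anova(emulsions.lme.hrs.min,emulsions.lme.hrs.min.ar1)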