Chapter 38 Random/Mixed Effects
38.1 Basic Tools
Beyond cfcdae, we will also need:
- A package to fit random and fixed effects models. We will mostly use tools in lme4, but we will also work a bit with nlme. Typically you only use one of them at a time. nlme ships with R as a recommended package; lme4 must be installed from CRAN.
- A package to get accurate tests for random effects (fixing the distribution of the likelihood ratio test of a random effect). We will use RLRsim.
- A package to do Kenward-Roger ANOVA for fixed effects. Several can work, including car and pbkrtest.
> library(cfcdae)
> library(lme4)
Attaching package: 'lme4'
The following object is masked from 'package:nlme':
lmList
The following object is masked from 'package:cfcdae':
fixef
> library(RLRsim)
> library(car)
> library(pbkrtest)
38.2 Random effects models
38.2.1 Bulls data
These are the data from exercise 5-1 of Kuehl (1994, Duxbury). There are 5 bulls selected at random, and we observe the birth weights of male calves. Sire is considered random, and we make it a factor.
> bulls <- read.table("kuehl1.dat.txt",header=TRUE)
> bulls <- within(bulls,sire <- as.factor(sire))
> bulls
sire wts
1 1 61
2 1 100
3 1 56
4 1 113
5 1 99
6 1 103
7 1 75
8 1 62
9 2 75
10 2 102
11 2 95
12 2 103
13 2 98
14 2 115
15 2 98
16 2 94
17 3 58
18 3 60
19 3 60
20 3 57
21 3 57
22 3 59
23 3 54
24 3 100
25 4 57
26 4 56
27 4 67
28 4 59
29 4 58
30 4 121
31 4 101
32 4 101
33 5 59
34 5 46
35 5 120
36 5 115
37 5 115
38 5 93
39 5 105
40 5 75
38.2.2 A bit of old school
OK, before jumping into REML, we will take just a little taste of the “old school” method for random effects. This example is one of the situations where old school is just dead easy.
The old school basically takes the fixed effects approach, ANOVA, and tries to fix it up for random effects. It works reasonably well in simple situations, but it doesn’t extend well.
This is the ordinary ANOVA. It doesn’t know anything about fixed or random effects. The DF, SS, and MS are correct. Under the null hypothesis of no sire effect, the random effects model and the fixed effects model are the same. Thus this F-test and p-value are valid.
There is modest evidence of sire-to-sire variability.
> calves.lm <- lm(wts~sire,data=bulls)
> anova(calves.lm)
Analysis of Variance Table
Response: wts
Df Sum Sq Mean Sq F value Pr(>F)
sire 4 5591.2 1397.79 3.0138 0.03087 *
Residuals 35 16232.7 463.79
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Normality is pretty good.
> plot(calves.lm,which=2)

Figure 38.1: NPP for Kuehl’s bull data
Constant variance is a little bit doubtful. If you do BoxCox, the optimum is near the log, but leaving the data alone (power 1) is well within the confidence interval. (It takes a pretty strong power family transformation to do much since the ratio of largest to smallest response is only about 2.) For simplicity, we’ll leave the data alone.
> plot(calves.lm,which=1)

Figure 38.2: Residual plot for Kuehl’s bull data
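If you want to reproduce the Box-Cox check described above, here is a minimal sketch; it uses the MASS package (shipped with R), and the plotted profile should peak near the log while keeping power 1 inside the confidence interval.
> library(MASS)
> boxcox(calves.lm) # profile log-likelihood over the power family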
Old school uses “expected mean squares” extensively. These are the expected values of the mean squares in the basic ANOVA table.
Term | EMS
---|---
bull | \(\sigma^2 + 8 \sigma_{bull}^2\)
Error | \(\sigma^2\)
The function cfcdae::mixed.ems() will compute those expected mean squares for balanced situations. Here we say we have a factor named bull. The next argument says bull has five levels and there are eight observations for each bull. Finally, we tell it bull is random. It gives us (note: it prints V[foo] instead of \(\sigma_{foo}^2\)):
> mixed.ems(~bull,c(5,8),random="bull")
Intercept bull
"V[Error]+8V[bull]+40Q[Intercept]" "V[Error]+8V[bull]"
Error
"V[Error]"
A little algebra shows us that \((MS_{bull}-MS_E)/8\) should be an unbiased estimate of \(\sigma_{bull}^2\).
> (1397.79-463.79)/8
[1] 116.75
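The same method-of-moments estimate can be computed without retyping numbers; a sketch that pulls the mean squares out of the anova table (the column is named "Mean Sq", as printed above):
> ms <- anova(calves.lm)[["Mean Sq"]]
> (ms[1] - ms[2])/8 # about 116.75, matching the hand calculation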
38.2.3 And now the modern way
There are two main functions for doing REML. The first is lme()
from
the nlme
package, and the second is lmer()
from the lme4
package. We will mostly use lmer()
, but
we will dabble with lme()
from time to time.
In lmer(), the fixed effects terms are entered as usual. In our case, the only fixed effect term is the overall mean. The random effects terms are entered inside parentheses. In this case, the 1 in (1|sire) means give us an additive effect (which R will call an intercept), and the |sire part means give us a separate, independent effect for each level of sire. These are the \(\alpha_i\) random effects.
> calves.lmer <- lmer(wts ~ 1 + (1|sire),data=bulls)
The anova() function applied to an lmer() model gives test statistics for the fixed effects (and, as we'll see, not quite what we were hoping for). However, it doesn't give anything for the intercept, which is the only fixed effect in this model. Thus, the anova() output for this lmer model is a little unfulfilling.
> anova(calves.lmer)
Analysis of Variance Table
npar Sum Sq Mean Sq F value
The summary() output, on the other hand, has lots of good stuff.
The summary has several parts, beginning with a statement of the model. Next is the "REML criterion," which is the REML deviance: -2 times the REML log likelihood. Differences in REML deviance are treated, kind of, sort of, like chi-square statistics. If you add twice the number of parameters to the deviance you get AIC; if you instead add the number of parameters times the log of the number of observations, you get BIC.
Following that is a brief summary of the residuals.
The next section is estimates of the random effects variances. We have two random effects here, one for sire and one for error. The output gives the estimated variance, along with the standard deviation (just the square root of the variance, not the variability of our estimate of variance). Here the values of 116.75 for the sire variance and 463.79 for error variance are exactly the same as what we obtained from the old school method. This is generally the case in simple, balanced problems.
The final section is the fixed effects, their estimates and their standard errors. In this model the only fixed effect is the intercept (grand mean).
NOTE: this example is fairly straightforward. In some more complicated situations, the estimated fixed effects could depend on the estimated random effects. In those situations, standard errors for fixed effects are usually a little too small. The reason is that they are computed assuming that we know the random effects, but we don’t really know the random effects. Not knowing the random effects introduces more variability that we are not accounting for.
> summary(calves.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: wts ~ 1 + (1 | sire)
Data: bulls
REML criterion at convergence: 358.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.9593 -0.7459 -0.1581 0.8143 1.9421
Random effects:
Groups Name Variance Std.Dev.
sire (Intercept) 116.7 10.81
Residual 463.8 21.54
Number of obs: 40, groups: sire, 5
Fixed effects:
Estimate Std. Error t value
(Intercept) 82.550 5.911 13.96
> AIC(calves.lmer)
[1] 364.217
> BIC(calves.lmer)
[1] 369.2836
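As a quick check of the deviance-to-AIC/BIC arithmetic described above (three parameters here: the intercept and the two variances):
> 358.217 + 2*3 # REML deviance + 2 * (number of parameters)
[1] 364.217
> 358.217 + 3*log(40) # REML deviance + (number of parameters) * log(n)
[1] 369.2836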
All the pieces fit together as you might expect. For example, the estimate of the grand mean is the mean of the treatment means. The treatment means are independent, so the variance of the grand mean is the variance of a treatment mean divided by 5. The variance of a treatment mean is the variance of a treatment effect plus the variance of the average of 8 \(\epsilon_{ij}\)s, or \(\sigma_\alpha^2+\sigma^2/8\). Put together, the variance of the grand mean should be \(\sigma_\alpha^2/5+\sigma^2/40\). This should match what we saw in the summary().
> sqrt(116.7/5+463.8/40)
[1] 5.910584
Note: when we do this simple case by hand, we would be using t with 4 df when forming confidence intervals for the grand mean, because this is really coming from the MS for sire with 4 df. Thus we should not think of this as estimate plus or minus two times standard error, because we need a t multiplier. Sadly, REML does not let us know about the “degrees of freedom.”
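For the record, here is a sketch of that hand calculation. In this balanced case the standard error of the grand mean is \(\sqrt{MS_{sire}/40}\) (which reproduces the 5.911 in the summary), and we use t with 4 df:
> 82.55 + c(-1,1)*qt(.975,4)*sqrt(1397.79/40) # roughly 66.1 to 99.0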
38.2.4 Residuals and effects
Here are the usual residuals from the fixed effects model and the REML random effects model. They are not the same!
> residuals(calves.lm)
1 2 3 4 5 6 7 8 9 10 11 12
-22.625 16.375 -27.625 29.375 15.375 19.375 -8.625 -21.625 -22.500 4.500 -2.500 5.500
13 14 15 16 17 18 19 20 21 22 23 24
0.500 17.500 0.500 -3.500 -5.125 -3.125 -3.125 -6.125 -6.125 -4.125 -9.125 36.875
25 26 27 28 29 30 31 32 33 34 35 36
-20.500 -21.500 -10.500 -18.500 -19.500 43.500 23.500 23.500 -32.000 -45.000 29.000 24.000
37 38 39 40
24.000 2.000 14.000 -16.000
> residuals(calves.lmer)
1 2 3 4 5 6 7 8 9
-22.268310 16.731690 -27.268310 29.731690 15.731690 19.731690 -8.268310 -21.268310 -17.539516
10 11 12 13 14 15 16 17 18
9.460484 2.460484 10.460484 5.460484 22.460484 5.460484 1.460484 -11.570312 -9.570312
19 20 21 22 23 24 25 26 27
-9.570312 -12.570312 -12.570312 -10.570312 -15.570312 30.429688 -22.175615 -23.175615 -12.175615
28 29 30 31 32 33 34 35 36
-20.175615 -21.175615 41.824385 21.824385 21.824385 -29.196248 -42.196248 31.803752 26.803752
37 38 39 40
26.803752 4.803752 16.803752 -13.196248
Similarly, the estimated random level for the first sire is not the same as the estimated fixed effect for the first sire. In general, the random effects are shrunk in a little bit toward zero.
We have this complicated form to get random effects estimates because ranef() generates a list with a named component for every random effect in the model; here we pick the effects for sire. The components are themselves lists with a named component for every term in the random effect. In our case, the random effect just has an intercept (the 1 on the left-hand side of the vertical bar), so the [[1]] just selects the first (and only) component.
> model.effects(calves.lm,"sire")
1 2 3 4 5
1.075 14.950 -19.425 -5.050 8.450
> ranef(calves.lmer)$sire[[1]]
[1] 0.7183096 9.9895155 -12.9796882 -3.3743848 5.6462479
Here we just plot them against each other along with the y=x line. We can see that the random effects (on the vertical axis) are shrunk back toward zero.
The amount by which they are shrunk depends on the variance components. The larger the error variance is relative to the random effect variance, the greater the shrinkage. This can lead to some puzzling patterns in residual plots.
> plot(model.effects(calves.lm,"sire"),ranef(calves.lmer)$sire[[1]])
> abline(0,1)

Figure 38.3: Random effects against fixed effects for bulls
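In this balanced case the shrinkage has a simple closed form: each fixed effect is multiplied by \(\hat\sigma_\alpha^2/(\hat\sigma_\alpha^2 + \hat\sigma^2/n)\), with n = 8 observations per sire. A sketch using the variance estimates from the summary:
> shrink <- 116.75/(116.75 + 463.79/8)
> shrink # about 0.668
> shrink*model.effects(calves.lm,"sire") # approximately reproduces ranef()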
We can also do diagnostic plots including residuals versus fitted and NPP plots for residuals and effects.
Constant variance might be iffy.
> plot(calves.lmer)

Figure 38.4: Residual plot for bulls data
Normality of residuals is not too bad.
> qqnorm(resid(calves.lmer),main="Residuals")

Figure 38.5: NPP of residuals for bulls data
With so few levels it is difficult to say much about normality of the random effects.
> qqnorm(ranef(calves.lmer)$sire[[1]],main="Sire effects")

Figure 38.6: NPP of sire random effects for bulls data
38.2.5 A look at lme()
We will briefly peek at the lme() function in the nlme package. In lme(), you specify the random effects in a separate argument from the fixed effects. For this example, there's not much to choose between them.
In general, lmer() can do crossed random effects, while that is very difficult/impossible in lme(). However, lme() can do some very complicated special covariance structures for random effects that cannot be done in lmer(), and lme() can handle some big problems that might overtax lmer(). In addition, the non-crossed nature of the random effects in lme() makes estimating "denominator" degrees of freedom simpler, so lme() is happier about doing an anova for fixed effects.
> library(nlme)
> calves.lme <- lme(wts ~ 1, random= ~1|sire,data=bulls)
The presentation of information in the summary for lme() is different than for lmer(), but the information agrees.
> summary(calves.lme)
Linear mixed-effects model fit by REML
Data: bulls
AIC BIC logLik
364.217 369.2077 -179.1085
Random effects:
Formula: ~1 | sire
(Intercept) Residual
StdDev: 10.8053 21.53582
Fixed effects: wts ~ 1
Value Std.Error DF t-value p-value
(Intercept) 82.55 5.911487 35 13.96434 0
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.9593563 -0.7458505 -0.1580621 0.8142560 1.9420875
Number of Observations: 40
Number of Groups: 5
38.3 More than one factor
38.3.1 Resistors data
These are data from problem 6.18 of Hicks and Turner (1999, Oxford). Ten resistors are chosen at random, and three operators are chosen at random. Each operator measures the resistance of each resistor twice, with each operator's 20 measurements made in random order. Response is in milliohms.
> resistors <- read.table("hicksturner.dat.txt",header=TRUE)
> resistors
part oper mohms
1 1 1 417
2 1 2 394
3 1 3 404
4 1 1 419
5 1 2 398
6 1 3 410
7 2 1 417
8 2 2 387
9 2 3 398
10 2 1 417
11 2 2 399
12 2 3 402
13 3 1 423
14 3 2 389
15 3 3 407
16 3 1 418
17 3 2 407
18 3 3 402
19 4 1 412
20 4 2 389
21 4 3 407
22 4 1 410
23 4 2 405
24 4 3 411
25 5 1 407
26 5 2 386
27 5 3 400
28 5 1 409
29 5 2 405
30 5 3 410
31 6 1 408
32 6 2 382
33 6 3 405
34 6 1 413
35 6 2 400
36 6 3 410
37 7 1 409
38 7 2 385
39 7 3 407
40 7 1 408
41 7 2 400
42 7 3 400
43 8 1 408
44 8 2 384
45 8 3 402
46 8 1 411
47 8 2 401
48 8 3 405
49 9 1 412
50 9 2 387
51 9 3 412
52 9 1 408
53 9 2 401
54 9 3 405
55 10 1 410
56 10 2 386
57 10 3 418
58 10 1 404
59 10 2 407
60 10 3 404
Make factors and fit the model. This model has the constant as the only fixed effect. We have an independent random level for each part, for each operator, and for each part by operator combination.
> resistors <- within(resistors, {oper <- as.factor(oper);part <- as.factor(part)})
> mohms.lmer <- lmer(mohms ~ 1 + (1|part) + (1|oper) + (1|part:oper),data=resistors)
boundary (singular) fit: see help('isSingular')
An interesting feature here is that two of the random effects (part and part by operator) are estimated to have zero variance. That is what the “boundary (singular) fit” warning was about.
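You can query this condition directly: lme4 provides isSingular() for exactly this purpose.
> isSingular(mohms.lmer) # TRUE: some variance sits on the zero boundary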
> summary(mohms.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: mohms ~ 1 + (1 | part) + (1 | oper) + (1 | part:oper)
Data: resistors
REML criterion at convergence: 396.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.0235 -0.6149 -0.1380 0.8115 1.9140
Random effects:
Groups Name Variance Std.Dev.
part:oper (Intercept) 0.00 0.000
part (Intercept) 0.00 0.000
oper (Intercept) 76.02 8.719
Residual 40.31 6.349
Number of obs: 60, groups: part:oper, 30; part, 10; oper, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) 404.2 5.1 79.25
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
Look at assumptions. Variance might be decreasing a little, but it is probably OK.
> plot(mohms.lmer)

Figure 38.7: Residual plot for resistor data
Normality of residuals looks good.
> qqnorm(residuals(mohms.lmer),main="Residuals")

Figure 38.8: NPP of residuals for resistor data
Not much to see with part effects (which are numerically 0 anyway).
> qqnorm(ranef(mohms.lmer)$part[[1]],main="Part effects")

Figure 38.9: NPP of part effects for resistor data
Only three operators, so you can’t really say much about normality.
> qqnorm(ranef(mohms.lmer)$oper[[1]],main="Oper effects")

Figure 38.10: NPP of operator effects for resistor data
Interaction effects also numerically zero.
> qqnorm(ranef(mohms.lmer)$"part:oper"[[1]],main="Part:oper interaction effects")

Figure 38.11: NPP of interaction effects for resistor data
38.3.2 Testing random effects
The standard measure of overall model fit is the log likelihood. We saw the log likelihood in the model summary above.
> logLik(calves.lmer,REML=TRUE)
'log Lik.' -179.1085 (df=3)
When we want to compare two models that differ by a random effect, we take twice the difference of the log likelihoods (this is the difference of the deviances) as a test statistic and compare it to a chi-square distribution with degrees of freedom equal to the difference in parameters. For a single random effect, this is just one df.
When we test random effects, we can use either regular or REML likelihoods. When we test fixed effects, we must use regular likelihoods (i.e., REML=FALSE).
> logLik(lm(wts~1,data=bulls),REML=TRUE)
'log Lik.' -180.5634 (df=2)
> 2*(-179.1085- -180.5634)
[1] 2.9098
> pchisq(2.91,1,lower.tail=FALSE)
[1] 0.08803187
The apparent p-value is about .09.
Everyone should be a little queasy about the "apparent" in the previous statement. Well, there's good news and bad news here. The good news is that we can compute a chi-square test statistic; the bad news is that, when testing variance components, its distribution isn't really chi-square. Ouch.
The problem with the likelihood ratio tests of variance components is that the null value 0 is on the boundary of the possible parameters (unlike for a fixed effect, where 0 is in the middle of positive and negative values that are all possible). The consequence is that LR tests for variance components tend to be conservative. That is, the p-values that you compute using the chi-square distribution tend to be bigger than they should be. I've seen recommendations to divide the nominal p-value by 2, but that's just a rough guide. Dividing by 2, our corrected p-value is about .044.
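That rough correction is a one-liner:
> pchisq(2.9098,1,lower.tail=FALSE)/2 # about 0.044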
What we will do is use the exactRLRT() function. It uses the (restricted) likelihood ratio as a test statistic, but instead of comparing it to a chi-square distribution, it simulates the real distribution of the statistic to get a more accurate p-value. If you have a model with a single variance component (other than error), then you can test that component simply with exactRLRT(model). As we can see, the p-value is about .03, which is what the old school anova gave us.
> exactRLRT(calves.lmer)
simulated finite sample distribution of RLRT.
(p-value based on 10000 simulated values)
data:
RLRT = 2.9099, p-value = 0.0302
Note: exactRLRT() uses a simulation. It will give you slightly different results each time.
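If you need reproducible numbers from these simulation-based tests, set the random number seed first (the seed value here is arbitrary):
> set.seed(42) # any fixed seed makes the simulated p-value repeatable
> exactRLRT(calves.lmer)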
38.3.3 Testing random effects when there is more than one
The exactRLRT() function is a little more complex when there is more than one random effect. To test one random effect, call it A, we are going to need three fitted lmer() models. The first is a model with A as the only random effect; the second is the full alternative model (with all random effects including A); the third is the null model, with all the random effects except A. All models should have all of the fixed effects, but in this case it's only the constant.
So let's set this up. Note that some of these are non-hierarchical in the random terms, but that's OK with random effects.
> mohms.partonly.lmer <- lmer(mohms ~ 1 + (1|part),data=resistors)
boundary (singular) fit: see help('isSingular')
> mohms.operonly.lmer <- lmer(mohms ~ 1 + (1|oper),data=resistors)
> mohms.intronly.lmer <- lmer(mohms ~ 1 + (1|part:oper),data=resistors)
> mohms.nopart.lmer <- lmer(mohms ~ 1 + (1 | oper) + (1 | part:oper),data=resistors)
boundary (singular) fit: see help('isSingular')
> mohms.nooper.lmer <- lmer(mohms ~ 1 + (1 | part) + (1 | part:oper),data=resistors)
boundary (singular) fit: see help('isSingular')
> mohms.nointr.lmer <- lmer(mohms ~ 1 + (1 | part) + (1 | oper),data=resistors)
boundary (singular) fit: see help('isSingular')
To test part we need three models: the one with only part, the full model, and the one without part. We're not surprised to find a big p-value for a random effect estimated to have zero variance.
Note that these p-values are simulated, and we should expect them to change a little if we run the command again.
> exactRLRT(mohms.partonly.lmer,mohms.lmer,mohms.nopart.lmer)
simulated finite sample distribution of RLRT.
(p-value based on 10000 simulated values)
data:
RLRT = 1.4438e-11, p-value = 0.445
For operator, we have a highly significant p-value.
> exactRLRT(mohms.operonly.lmer,mohms.lmer,mohms.nooper.lmer)
simulated finite sample distribution of RLRT.
(p-value based on 10000 simulated values)
data:
RLRT = 35.532, p-value < 2.2e-16
The interaction is not significant (it was also estimated at zero).
> exactRLRT(mohms.intronly.lmer,mohms.lmer,mohms.nointr.lmer)
Observed RLRT statistic is 0, no simulation performed.
simulated finite sample distribution of RLRT.
(p-value based on 0 simulated values)
data:
RLRT = 0, p-value = 1
38.3.4 Predictive comparison
AIC and BIC agree that the model with just operator is the best model.
> AIC(mohms.lmer,mohms.partonly.lmer,mohms.operonly.lmer,mohms.intronly.lmer,mohms.nopart.lmer,mohms.nooper.lmer,mohms.nointr.lmer)
df AIC
mohms.lmer 5 406.9429
mohms.partonly.lmer 3 444.2165
mohms.operonly.lmer 3 402.9429
mohms.intronly.lmer 3 438.4752
mohms.nopart.lmer 4 404.9429
mohms.nooper.lmer 4 440.4752
mohms.nointr.lmer 4 404.9429
> BIC(mohms.lmer,mohms.partonly.lmer,mohms.operonly.lmer,mohms.intronly.lmer,mohms.nopart.lmer,mohms.nooper.lmer,mohms.nointr.lmer)
df BIC
mohms.lmer 5 417.4146
mohms.partonly.lmer 3 450.4996
mohms.operonly.lmer 3 409.2260
mohms.intronly.lmer 3 444.7582
mohms.nopart.lmer 4 413.3203
mohms.nooper.lmer 4 448.8526
mohms.nointr.lmer 4 413.3203
So what do we conclude from all of this? There is very little variability between parts or between parts separately by operator. On the other hand, there seems to be substantial variability between operators, possibly much larger in size than error variability. However, we do not have enough levels of operator to get a tight estimate of the operator variance.
38.3.5 Confidence intervals (and now the bad news …)
If we want confidence intervals, we can use the
confint()
function that comes with lme4
.
This function has a couple of options for computing the
intervals. The default is profile
. This is basically
finding all of the values for a parameter for which the
corresponding likelihood ratio test (with that
value as null) would not reject.
We have already stated that the LR test without
modification has trouble down at zero.
In general, treat confidence intervals for variance components with considerable skepticism.
(The oldNames=FALSE gives us something that is more readable.)
> confint(calves.lmer,oldNames=FALSE)
Computing profile confidence intervals ...
2.5 % 97.5 %
sd_(Intercept)|sire 0.00000 24.61580
sigma 17.32943 27.76544
(Intercept) 69.83802 95.26197
> confint(calves.lmer,oldNames=FALSE,level=.8)
Computing profile confidence intervals ...
10 % 90 %
sd_(Intercept)|sire 2.579135 17.29149
sigma 18.615255 25.30825
(Intercept) 75.177634 89.92236
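If the profile method's behavior near zero worries you, confint() also offers a parametric bootstrap via method="boot" (nsim defaults to 500). It is slow, its results vary from run to run, and it is not a cure-all near the boundary, but it makes a useful cross-check. A sketch:
> confint(calves.lmer,method="boot",oldNames=FALSE)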
Sometimes you might want to do ordinary likelihood instead of REML. You get that by setting REML=FALSE in the lmer() command. You can see that the estimated variances tend to be smaller. In fact, the likelihood estimates of variance tend to be biased downwards, which is why people like REML.
> summary(lmer(wts ~ 1 +(1|sire),data=bulls,REML=FALSE))
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: wts ~ 1 + (1 | sire)
Data: bulls
AIC BIC logLik deviance df.resid
369.5 374.6 -181.7 363.5 37
Scaled residuals:
Min 1Q Median 3Q Max
-1.9268 -0.7671 -0.1272 0.8397 1.9226
Random effects:
Groups Name Variance Std.Dev.
sire (Intercept) 81.8 9.045
Residual 463.8 21.536
Number of obs: 40, groups: sire, 5
Fixed effects:
Estimate Std. Error t value
(Intercept) 82.550 5.287 15.61
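For this balanced one-way design, the downward bias is visible in a closed form: the ML estimate replaces \(MS_{sire}\) with \((1-1/a)MS_{sire}\) before subtracting \(MS_E\), while the error variance estimate stays at \(MS_E\). Checking against the summary above:
> ((1 - 1/5)*1397.79 - 463.79)/8 # about 81.8, matching the ML sire variance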
38.3.6 CIs for the resistors data
Here are confidence intervals, but, well, you know they aren't that great. confint() also has a lot of numerical/algorithmic complaints that arise at least in part from the boundary fits.
> confint(mohms.lmer,oldNames=FALSE)
Computing profile confidence intervals ...
2.5 % 97.5 %
sd_(Intercept)|part:oper 0.00000 2.713635
sd_(Intercept)|part 0.00000 2.797155
sd_(Intercept)|oper 3.55482 21.252377
sigma 5.34056 7.720188
(Intercept) 392.55756 415.809160