Chapter 18 Example 3.7: Model parameters

summary() gives more internal detail about the model fit, including estimates and standard errors of the parameters in the model for the mean.

18.1 Factor-based Models

> summary(separate.means)

Call:
lm(formula = logTime ~ temp, data = myRL)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22667 -0.03667  0.00250  0.03125  0.20333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.43794    0.01585  90.708  < 2e-16 ***
temp1        0.49456    0.03065  16.134  < 2e-16 ***
temp2        0.19081    0.03065   6.225 5.67e-07 ***
temp3       -0.06044    0.03065  -1.972   0.0573 .  
temp4       -0.24365    0.03222  -7.563 1.30e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0958 on 32 degrees of freedom
Multiple R-squared:  0.9233,    Adjusted R-squared:  0.9138 
F-statistic: 96.36 on 4 and 32 DF,  p-value: < 2.2e-16

Note that some of the individual coefficients can be “non-significant” even within a highly significant term such as temp.

What we call \(\widehat{\mu}\) is what R calls (Intercept) (sometimes called the Constant in other software), so \(\widehat{\mu} = 1.43794\). The temp1, temp2, etc. values are \(\widehat{\alpha}_1, \widehat{\alpha}_2, \ldots\). Thus the estimate (or prediction) for the second treatment level is \(\widehat{\mu} + \widehat{\alpha}_2 = 1.43794 + 0.19081 = 1.62875\) as we saw above.
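
We can do that little calculation directly from the fitted model; this is just a sketch, and the coefficient names assume the sum-to-zero parameterization used above:

> cf <- coef(separate.means)
> cf["(Intercept)"] + cf["temp2"]  # mu-hat + alpha-hat_2, about 1.62875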

The individual means model just fits means as \(\widehat{\mu}_i\) instead of a central value plus treatment effects \(\widehat{\mu}+\widehat{\alpha}_i\) as in the separate means model. This summary shows the individual means and no overall central value.
> summary(individual.means)

Call:
lm(formula = logTime ~ temp - 1, data = myRL)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22667 -0.03667  0.00250  0.03125  0.20333 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
temp175  1.93250    0.03387   57.05   <2e-16 ***
temp194  1.62875    0.03387   48.09   <2e-16 ***
temp213  1.37750    0.03387   40.67   <2e-16 ***
temp231  1.19429    0.03621   32.98   <2e-16 ***
temp250  1.05667    0.03911   27.02   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0958 on 32 degrees of freedom
Multiple R-squared:  0.9965,    Adjusted R-squared:  0.9959 
F-statistic:  1808 on 5 and 32 DF,  p-value: < 2.2e-16

As seen above, emmeans() does something similar (the foo::goo form says to use the function or object goo from the package/library foo without having to attach the whole package).

> emmeans::emmeans(separate.means,"temp")
 temp emmean     SE df lower.CL upper.CL
 175    1.93 0.0339 32    1.864     2.00
 194    1.63 0.0339 32    1.560     1.70
 213    1.38 0.0339 32    1.309     1.45
 231    1.19 0.0362 32    1.121     1.27
 250    1.06 0.0391 32    0.977     1.14

Confidence level used: 0.95 

R’s default parameterization for factors is that the first treatment effect is 0. However, when you load the cfcdae package, it changes the default parameterization for factors so that the effects sum to zero. We can thus figure out the coefficient for treatment 5 as minus the sum of the other four coefficients.
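
For example, here is a quick sketch of that calculation using the coefficients reported by summary() above:

> -sum(coef(separate.means)[c("temp1", "temp2", "temp3", "temp4")])  # effect for the fifth level, about -0.3813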

The model.effects() function in cfcdae makes finding all the effects a little easier. It’s not a big deal here, but it’s more compelling in more complex models.

> model.effects(separate.means,"temp")
        175         194         213         231         250 
 0.49455952  0.19080952 -0.06044048 -0.24365476 -0.38127381 

A side-by-side plot (function sidebyside() in cfcdae) is a way to see the size of effects relative to the size of the residuals or other effects. In this simple example, you can’t see much that you couldn’t already see in the boxplots, but this approach becomes more insightful in factorial models.

> sidebyside(separate.means)

Figure 18.1: Side by side plot for resin lifetime data

We have mentioned a couple of times that R’s default is to use the “\(\widehat{\alpha}_1=0\)” parameterization, and that the cfcdae package resets that default so that the treatment effects sum to zero. We can force R to fit with the “\(\widehat{\alpha}_1=0\)” parameterization by using the somewhat awkward optional argument shown below (this is also what you would get if you fit the model before loading cfcdae). If you wanted to force the sum-to-zero constraint explicitly, you would use contrasts=list(temp=contr.sum) instead of contr.treatment.

> out2 <- lm(logTime~temp,data=myRL,contrasts=list(temp=contr.treatment))
> summary(out2)

Call:
lm(formula = logTime ~ temp, data = myRL, contrasts = list(temp = contr.treatment))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22667 -0.03667  0.00250  0.03125  0.20333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.93250    0.03387  57.055  < 2e-16 ***
temp2       -0.30375    0.04790  -6.341 4.06e-07 ***
temp3       -0.55500    0.04790 -11.586 5.49e-13 ***
temp4       -0.73821    0.04958 -14.889 6.13e-16 ***
temp5       -0.87583    0.05174 -16.928  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0958 on 32 degrees of freedom
Multiple R-squared:  0.9233,    Adjusted R-squared:  0.9138 
F-statistic: 96.36 on 4 and 32 DF,  p-value: < 2.2e-16

Note that we now get coefficients for the last four levels of temp. There is no coefficient shown for temp1 because we know that it is zero when using this set of constraints on the coefficients. We also see that the (Intercept) value is the mean in the first treatment group (which it must be since \(\widehat{\alpha}_1 = 0\)). A little calculation shows that \(\widehat{\alpha}_i-\widehat{\alpha}_j\) is the same for both approaches, regardless of \(i\) or \(j\).
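
Here is a quick sketch of that check using the two fits above (with the first-coefficient-zero constraint, \(\widehat{\alpha}_1 = 0\), so \(\widehat{\alpha}_1-\widehat{\alpha}_2\) is just minus the temp2 coefficient from out2):

> coef(separate.means)["temp1"] - coef(separate.means)["temp2"]  # sum-to-zero fit, about 0.30375
> 0 - coef(out2)["temp2"]                                        # alpha_1 = 0 fit, also about 0.30375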

18.2 Polynomial Models

summary() works for functional models as well. Here we use coef() to see just the model coefficients of the standard polynomial fit of order two (p2) and the corresponding orthogonal polynomial fit (op2).
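
As a reminder, the two fits could have been created along these lines (a sketch only; the actual fits were made earlier in this example, with temp.z the numeric temperature and temp.z2 its square):

> p2 <- lm(logTime ~ temp.z + temp.z2, data = myRL)
> op2 <- lm(logTime ~ poly(temp.z, degree = 2), data = myRL)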

> coef(p2)
  (Intercept)        temp.z       temp.z2 
 7.417999e+00 -4.509806e-02  7.860391e-05 
> coef(op2)
              (Intercept) poly(temp.z, degree = 2)1 poly(temp.z, degree = 2)2 
                1.4651351                -1.8599091                 0.2798988 

Note that the coefficients are different between p2 and op2. This is because the two models use different predictors (monomials versus orthogonal polynomials), even though the overall models are equivalent quadratic (second order) models.
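
If you want to convince yourself of that equivalence, a quick check (again just a sketch) is to compare the fitted values from the two models, which should agree up to rounding:

> all.equal(fitted(p2), fitted(op2))  # should report TRUE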

Here we see that the first order (linear term) coefficient is different in p1 and p2, but it is the same in op1 and op2. This is one of the claims to fame of orthogonal polynomials.

> coef(p1)
(Intercept)      temp.z 
 3.95600752 -0.01185672 
> coef(op1)
             (Intercept) poly(temp.z, degree = 1) 
                1.465135                -1.859909 

18.3 Over-Parameterized Models: Too Much of a Good Thing

If you have \(g\) groups, you can fit a model with up to \(g\) parameters in the mean structure, either via a factor or some functional form such as a polynomial. You can fit fewer than \(g\) (say quadratic instead of quartic), but you cannot fit more than \(g\).

If you try to fit more than \(g\), R won’t let you. The lm() call won’t crash, but it will toss out some of the potential predictors so that you are left with only \(g\). Depending on which predictors it tosses, the parameters it retains might be easy to understand, or they could be completely incomprehensible. Parameters are always arbitrary, because the model you choose and how you set it up are arbitrary (factor with sum-to-zero effects versus factor with first coefficient 0 versus ordinary polynomial versus orthogonal polynomial versus …). Combine that with how your particular software package decides to toss unneeded predictors, and, well, let’s just say that the fun really begins.

When you use summary() or coef() after fitting an over-parameterized model, the predictors that lm() has tossed show up as NA (missing). So if you try to fit a fifth-degree polynomial to the resin lifetime data (six mean parameters, but only five temperature groups), lm() tosses the quintic term out.
> p5 <- lm(logTime ~ temp.z + temp.z2 + temp.z3 + temp.z4 +
+            I(temp.z^5),data=myRL)
> coef(p5)
  (Intercept)        temp.z       temp.z2       temp.z3       temp.z4 
 9.699492e-01  7.573323e-02 -7.648846e-04  2.600330e-06 -2.987894e-09 
  I(temp.z^5) 
           NA 
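A quick way to list exactly which predictors lm() tossed is to look for the NA coefficients:
> names(coef(p5))[is.na(coef(p5))]  # should report "I(temp.z^5)"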
But if you had typed the model with the quintic term first, some other term would have been eliminated; which one gets dropped depends on the order of the terms. It is usually the last term that is eliminated, but you cannot count on that.
> p5b <- lm(logTime ~ I(temp.z^5) + temp.z + temp.z2 + temp.z3 + temp.z4,
+            data=myRL)
> coef(p5b)
  (Intercept)   I(temp.z^5)        temp.z       temp.z2       temp.z3 
 2.143774e+00 -2.810813e-12  4.768732e-02 -4.979222e-04  1.334793e-06 
      temp.z4 
           NA 
The extra term could be a factor added after the polynomial has exhausted all of the degrees of freedom, or it could be polynomial terms added after you have fit the separate means model.
> p4.plus <- lm(logTime ~ temp.z + temp.z2 + temp.z3 + temp.z4 +
+            temp,data=myRL)
> coef(p4.plus)
  (Intercept)        temp.z       temp.z2       temp.z3       temp.z4 
 9.699492e-01  7.573323e-02 -7.648846e-04  2.600330e-06 -2.987894e-09 
        temp1         temp2         temp3         temp4 
           NA            NA            NA            NA 
> separate.means.plus <- lm(logTime ~ temp + temp.z + temp.z2 + temp.z3 + 
+                             temp.z4,data=myRL)
> coef(separate.means.plus)
(Intercept)       temp1       temp2       temp3       temp4      temp.z 
 1.43794048  0.49455952  0.19080952 -0.06044048 -0.24365476          NA 
    temp.z2     temp.z3     temp.z4 
         NA          NA          NA 
If you have an over-parameterized model, anova() just pretends that the eliminated terms are not there at all.
> anova(p5)
Analysis of Variance Table

Response: logTime
          Df Sum Sq Mean Sq  F value    Pr(>F)    
temp.z     1 3.4593  3.4593 376.9128 < 2.2e-16 ***
temp.z2    1 0.0783  0.0783   8.5361  0.006338 ** 
temp.z3    1 0.0000  0.0000   0.0020  0.964399    
temp.z4    1 0.0000  0.0000   0.0009  0.976258    
Residuals 32 0.2937  0.0092                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(p5b)
Analysis of Variance Table

Response: logTime
            Df  Sum Sq Mean Sq F value    Pr(>F)    
I(temp.z^5)  1 3.10714 3.10714 338.547 < 2.2e-16 ***
temp.z       1 0.43006 0.43006  46.858 9.627e-08 ***
temp.z2      1 0.00042 0.00042   0.046    0.8316    
temp.z3      1 0.00001 0.00001   0.001    0.9751    
Residuals   32 0.29369 0.00918                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(p4.plus)
Analysis of Variance Table

Response: logTime
          Df Sum Sq Mean Sq  F value    Pr(>F)    
temp.z     1 3.4593  3.4593 376.9128 < 2.2e-16 ***
temp.z2    1 0.0783  0.0783   8.5361  0.006338 ** 
temp.z3    1 0.0000  0.0000   0.0020  0.964399    
temp.z4    1 0.0000  0.0000   0.0009  0.976258    
Residuals 32 0.2937  0.0092                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(separate.means.plus)
Analysis of Variance Table

Response: logTime
          Df Sum Sq Mean Sq F value    Pr(>F)    
temp       4 3.5376 0.88441  96.363 < 2.2e-16 ***
Residuals 32 0.2937 0.00918                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In these examples, the model that was ultimately fit was reasonably understandable, yielding the quartic model (p5 and p4.plus) or the separate means model (separate.means.plus). The terms retained in p5b are not hierarchical (quartic is missing), so while it is understandable, it is not a model we like to use.

Now look at a model that uses linear and quadratic polynomial terms plus the factor term. It will fit the polynomial terms, but it can then fit only 2 more degrees of freedom from the factor term. The factor term soaks up whatever sums of squares and degrees of freedom were not already fit by the earlier terms.
> p2.plus <- lm(logTime ~ temp.z + temp.z2 + temp,data=myRL)
> coef(p2.plus)
  (Intercept)        temp.z       temp.z2         temp1         temp2 
 7.469597e+00 -4.558867e-02  7.975247e-05 -1.498976e-03  1.791155e-03 
        temp3         temp4 
           NA            NA 
> anova(p2.plus)
Analysis of Variance Table

Response: logTime
          Df Sum Sq Mean Sq  F value    Pr(>F)    
temp.z     1 3.4593  3.4593 376.9128 < 2.2e-16 ***
temp.z2    1 0.0783  0.0783   8.5361  0.006338 ** 
temp       2 0.0000  0.0000   0.0015  0.998540    
Residuals 32 0.2937  0.0092                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(p2,separate.means)
Analysis of Variance Table

Model 1: logTime ~ temp.z + temp.z2
Model 2: logTime ~ temp
  Res.Df     RSS Df  Sum of Sq      F Pr(>F)
1     34 0.29372                            
2     32 0.29369  2 2.6829e-05 0.0015 0.9985

Note that the degrees of freedom and sums of squares for temp in p2.plus are the same as those obtained going from the quadratic model to the separate means model. That can sometimes be useful.
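
Because p2.plus and the separate means model both use all five mean degrees of freedom, they describe the same set of five group means; a quick sketch of a sanity check is to compare their fitted values:

> all.equal(fitted(p2.plus), fitted(separate.means))  # should report TRUE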

But for the love of heaven, don’t try to make sense of the coefficients when you have over-parameterized by mixing polynomial and factor predictors for the same factor. It can be done, but that way lies madness. In particular, when you have over-parameterized and used too many predictors, there are infinitely many different sets of coefficients that lead to the same set of fitted values; the vast majority of those sets of coefficients are not understandable, and which set you get depends delicately on your software.