data checking

We fixed a couple data errors after discussing with the investigators

Also found one 37-year-old, decided to keep in.

d <- read.csv("halfmile-2016-03-21.csv")
d$Diff <- d$After - d$Before

make some exploratory plots

Want to plot all the variables and as many relationships as possible, to check for anything that looks odd and to build intuition about the data set and the relationships of interest.

Big scatterplot matrix of continuous variables, with color for sex, check for anything that looks odd; all looks basically okay. (so this has all but smoking)

pairs(d[,c("Age","YearsRunning", "Before","After", "Diff")], col=d$Sex)

Here’s boxplots of before and after by treatment; treatment is faster, all points look reasonable.

dm <- melt(d, id.vars=c("ID", "Treatment", "Sex"), measure.vars=c("Before", "After"), variable.name="Time")
ggplot(dm) + aes(Treatment, value) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))+ facet_wrap(~Time)

Before to after lines for each individual, with averages in thicker lines, looking to make sure all changes feel reasonable too.

ggplot(dm) + aes(Time, value, group=ID, color=Treatment) + geom_point() + geom_line(alpha=0.5) + facet_wrap(~Sex) + geom_smooth(aes(group=Treatment), method="lm", lwd=2, se=FALSE)

Simple boxplots of difference by treatment; looks like treatment might have some effect, shortening the run times. Notice that both control and treatment got faster, on average.

ggplot(d) + aes(Treatment, Diff) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))

Make that plot separate by covariates; see similar patterns. Notice not too many smokers!

ggplot(d) + aes(Treatment, Diff) + facet_grid(Sex~Smoking) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))

xtabs(~Smoking + Sex, d)
##        Sex
## Smoking  F  M
##     No  19 17
##     Yes  3  1

check out years running, looks like more running, less room for improvement

ggplot(d) + aes(YearsRunning, Diff, color=Treatment) + geom_point() + facet_wrap(~Sex) + geom_smooth(method="lm")

check out age

ggplot(d) + aes(Age, Diff, color=Treatment) + geom_point() +  geom_smooth(method="lm")

check out before

ggplot(d) + aes(Before, Diff, color=Treatment) + geom_point() +
  geom_smooth(method="lm")

make a model and check some assumptions

first just a t-test

m1 <- t.test(Diff ~ Treatment, d)
m1
## 
##  Welch Two Sample t-test
## 
## data:  Diff by Treatment
## t = 2.2031, df = 37.553, p-value = 0.0338
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2664884 6.3335116
## sample estimates:
## mean in group Control     mean in group Trt 
##                  -1.1                  -4.4

now with a linear model

m2 <- lm(Diff ~ Treatment, d)
summary(m2)
## 
## Call:
## lm(formula = Diff ~ Treatment, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.600  -3.675   0.750   3.100  11.400 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    -1.100      1.059  -1.039   0.3056  
## TreatmentTrt   -3.300      1.498  -2.203   0.0337 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.737 on 38 degrees of freedom
## Multiple R-squared:  0.1133, Adjusted R-squared:  0.08993 
## F-statistic: 4.854 on 1 and 38 DF,  p-value: 0.03372
par(mfrow=c(2,2)); plot(m2)

m3a <- lm(Diff ~ Treatment + Sex + Age + Smoking + YearsRunning, d)
summary(m3a)
## 
## Call:
## lm(formula = Diff ~ Treatment + Sex + Age + Smoking + YearsRunning, 
##     data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.1760 -2.7348 -0.1381  2.6104  9.1833 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    8.6915     4.3422   2.002   0.0534 . 
## TreatmentTrt  -2.9278     1.3794  -2.123   0.0412 * 
## SexM          -2.0315     1.4416  -1.409   0.1679   
## Age           -0.4007     0.1379  -2.906   0.0064 **
## SmokingYes    -0.8979     2.3727  -0.378   0.7075   
## YearsRunning   0.2174     0.2312   0.940   0.3538   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.316 on 34 degrees of freedom
## Multiple R-squared:  0.3412, Adjusted R-squared:  0.2443 
## F-statistic: 3.521 on 5 and 34 DF,  p-value: 0.01134
m3 <- lm(Diff ~ Treatment + Sex + Age + YearsRunning, d)
summary(m3)
## 
## Call:
## lm(formula = Diff ~ Treatment + Sex + Age + YearsRunning, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.1357 -2.6997 -0.0888  2.6587  9.2495 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    8.7568     4.2853   2.043  0.04859 * 
## TreatmentTrt  -2.9214     1.3623  -2.144  0.03902 * 
## SexM          -1.9411     1.4042  -1.382  0.17561   
## Age           -0.4030     0.1361  -2.962  0.00546 **
## YearsRunning   0.1953     0.2210   0.884  0.38281   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.263 on 35 degrees of freedom
## Multiple R-squared:  0.3384, Adjusted R-squared:  0.2628 
## F-statistic: 4.475 on 4 and 35 DF,  p-value: 0.00502

Try out two-way interactions

m4 <- lm(Diff ~ (Treatment + Sex + Age + YearsRunning)^2, d)
summary(m4)
## 
## Call:
## lm(formula = Diff ~ (Treatment + Sex + Age + YearsRunning)^2, 
##     data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.4535  -2.3466   0.2803   2.8236   7.5678 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                16.20581   10.81148   1.499   0.1447  
## TreatmentTrt               -1.14307   10.03606  -0.114   0.9101  
## SexM                      -11.81130    9.24726  -1.277   0.2116  
## Age                        -0.64857    0.38045  -1.705   0.0989 .
## YearsRunning               -0.47114    1.32721  -0.355   0.7252  
## TreatmentTrt:SexM           0.78669    3.03663   0.259   0.7974  
## TreatmentTrt:Age            0.01095    0.32721   0.033   0.9735  
## TreatmentTrt:YearsRunning  -0.37076    0.47263  -0.784   0.4391  
## SexM:Age                    0.20838    0.31040   0.671   0.5073  
## SexM:YearsRunning           0.65524    0.49104   1.334   0.1925  
## Age:YearsRunning            0.02005    0.04575   0.438   0.6644  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.47 on 29 degrees of freedom
## Multiple R-squared:  0.3972, Adjusted R-squared:  0.1894 
## F-statistic: 1.911 on 10 and 29 DF,  p-value: 0.08472
anova(m3,m4)
## Analysis of Variance Table
## 
## Model 1: Diff ~ Treatment + Sex + Age + YearsRunning
## Model 2: Diff ~ (Treatment + Sex + Age + YearsRunning)^2
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     35 636.13                           
## 2     29 579.56  6    56.567 0.4717 0.8236

Try out two-way interactions just with Treatment

m5 <- lm(Diff ~ Treatment * (Sex + Age + YearsRunning), d)
summary(m5)
## 
## Call:
## lm(formula = Diff ~ Treatment * (Sex + Age + YearsRunning), data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.694 -2.573  0.175  2.569  9.120 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                7.97526    5.51692   1.446   0.1580  
## TreatmentTrt              -1.36073    9.18359  -0.148   0.8831  
## SexM                      -2.37947    1.97720  -1.203   0.2376  
## Age                       -0.41742    0.18083  -2.308   0.0276 *
## YearsRunning               0.40514    0.32876   1.232   0.2268  
## TreatmentTrt:SexM          1.13336    2.93781   0.386   0.7022  
## TreatmentTrt:Age           0.01985    0.29106   0.068   0.9461  
## TreatmentTrt:YearsRunning -0.40201    0.46181  -0.871   0.3905  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.396 on 32 degrees of freedom
## Multiple R-squared:  0.3567, Adjusted R-squared:  0.216 
## F-statistic: 2.535 on 7 and 32 DF,  p-value: 0.03406
anova(m3,m5)
## Analysis of Variance Table
## 
## Model 1: Diff ~ Treatment + Sex + Age + YearsRunning
## Model 2: Diff ~ Treatment * (Sex + Age + YearsRunning)
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     35 636.13                           
## 2     32 618.53  3    17.606 0.3036 0.8225

Check out before effect…

m6 <- lm(Diff ~ Treatment * Before + Sex + Age + YearsRunning, d)
summary(m6)
## 
## Call:
## lm(formula = Diff ~ Treatment * Before + Sex + Age + YearsRunning, 
##     data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5903 -2.2837  0.0454  2.2633  5.9872 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          2.74528    6.46998   0.424  0.67409   
## TreatmentTrt        17.88046    6.17775   2.894  0.00669 **
## Before               0.02157    0.01848   1.168  0.25134   
## SexM                -2.34018    1.23983  -1.888  0.06791 . 
## Age                 -0.37811    0.12299  -3.074  0.00422 **
## YearsRunning         0.27026    0.19245   1.404  0.16958   
## TreatmentTrt:Before -0.08811    0.02575  -3.422  0.00167 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.668 on 33 degrees of freedom
## Multiple R-squared:  0.5381, Adjusted R-squared:  0.4542 
## F-statistic: 6.408 on 6 and 33 DF,  p-value: 0.0001506
anova(m3,m6)
## Analysis of Variance Table
## 
## Model 1: Diff ~ Treatment + Sex + Age + YearsRunning
## Model 2: Diff ~ Treatment * Before + Sex + Age + YearsRunning
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1     35 636.13                                
## 2     33 444.08  2    192.06 7.1359 0.002658 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Try on log scale just for fun…

m7 <- lm(I(log(After) - log(Before))~ Treatment * Before + Sex + Age + YearsRunning, d)
summary(m7)
## 
## Call:
## lm(formula = I(log(After) - log(Before)) ~ Treatment * Before + 
##     Sex + Age + YearsRunning, data = d)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.031491 -0.008833 -0.000629  0.010658  0.033355 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          8.060e-03  2.899e-02   0.278  0.78268   
## TreatmentTrt         7.000e-02  2.768e-02   2.529  0.01639 * 
## Before               9.745e-05  8.277e-05   1.177  0.24749   
## SexM                -1.211e-02  5.554e-03  -2.180  0.03650 * 
## Age                 -1.643e-03  5.510e-04  -2.981  0.00536 **
## YearsRunning         1.469e-03  8.622e-04   1.704  0.09772 . 
## TreatmentTrt:Before -3.351e-04  1.153e-04  -2.906  0.00650 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01643 on 33 degrees of freedom
## Multiple R-squared:  0.491,  Adjusted R-squared:  0.3984 
## F-statistic: 5.305 on 6 and 33 DF,  p-value: 0.000636

OK with not including any, except Before score

summary(m6)
## 
## Call:
## lm(formula = Diff ~ Treatment * Before + Sex + Age + YearsRunning, 
##     data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5903 -2.2837  0.0454  2.2633  5.9872 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          2.74528    6.46998   0.424  0.67409   
## TreatmentTrt        17.88046    6.17775   2.894  0.00669 **
## Before               0.02157    0.01848   1.168  0.25134   
## SexM                -2.34018    1.23983  -1.888  0.06791 . 
## Age                 -0.37811    0.12299  -3.074  0.00422 **
## YearsRunning         0.27026    0.19245   1.404  0.16958   
## TreatmentTrt:Before -0.08811    0.02575  -3.422  0.00167 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.668 on 33 degrees of freedom
## Multiple R-squared:  0.5381, Adjusted R-squared:  0.4542 
## F-statistic: 6.408 on 6 and 33 DF,  p-value: 0.0001506
Anova(m6)
## Anova Table (Type II tests)
## 
## Response: Diff
##                  Sum Sq Df F value   Pr(>F)   
## Treatment         81.05  1  6.0228 0.019568 * 
## Before            34.44  1  2.5596 0.119158   
## Sex               47.94  1  3.5627 0.067914 . 
## Age              127.18  1  9.4508 0.004216 **
## YearsRunning      26.54  1  1.9720 0.169579   
## Treatment:Before 157.61  1 11.7123 0.001674 **
## Residuals        444.08 33                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2)); plot(m6)

m6b <- lm(Diff ~ Treatment + Before + Sex + Age + YearsRunning, d)
par(mfrow=c(3,2)); crPlots(m6b)

Try out a smaller model with only significant results

m8 <- lm(Diff ~ Treatment*Before + Age, d)
summary(m8)
## 
## Call:
## lm(formula = Diff ~ Treatment * Before + Age, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0497 -1.9712  0.3687  2.9460  6.8119 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          1.15010    6.17025   0.186  0.85321   
## TreatmentTrt        17.80564    6.42093   2.773  0.00884 **
## Before               0.02870    0.01892   1.517  0.13830   
## Age                 -0.35832    0.12075  -2.968  0.00538 **
## TreatmentTrt:Before -0.08722    0.02677  -3.259  0.00249 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.818 on 35 degrees of freedom
## Multiple R-squared:  0.4693, Adjusted R-squared:  0.4087 
## F-statistic: 7.738 on 4 and 35 DF,  p-value: 0.000141
Anova(m8)
## Anova Table (Type II tests)
## 
## Response: Diff
##                  Sum Sq Df F value   Pr(>F)   
## Treatment         73.69  1  5.0546 0.030961 * 
## Before            15.87  1  1.0884 0.303986   
## Age              128.39  1  8.8064 0.005384 **
## Treatment:Before 154.80  1 10.6179 0.002494 **
## Residuals        510.26 35                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m8, m6)
## Analysis of Variance Table
## 
## Model 1: Diff ~ Treatment * Before + Age
## Model 2: Diff ~ Treatment * Before + Sex + Age + YearsRunning
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     35 510.26                           
## 2     33 444.08  2    66.184 2.4591  0.101

Do some diagnostic plots

par(mfrow=c(2,2)); plot(m8)

m8b <- lm(Diff ~ Treatment + Before + Age, d)
par(mfrow=c(2,2)); crPlots(m8b)

OK, I believe this smaller model

Anova(m8)
## Anova Table (Type II tests)
## 
## Response: Diff
##                  Sum Sq Df F value   Pr(>F)   
## Treatment         73.69  1  5.0546 0.030961 * 
## Before            15.87  1  1.0884 0.303986   
## Age              128.39  1  8.8064 0.005384 **
## Treatment:Before 154.80  1 10.6179 0.002494 **
## Residuals        510.26 35                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m8)
## 
## Call:
## lm(formula = Diff ~ Treatment * Before + Age, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0497 -1.9712  0.3687  2.9460  6.8119 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          1.15010    6.17025   0.186  0.85321   
## TreatmentTrt        17.80564    6.42093   2.773  0.00884 **
## Before               0.02870    0.01892   1.517  0.13830   
## Age                 -0.35832    0.12075  -2.968  0.00538 **
## TreatmentTrt:Before -0.08722    0.02677  -3.259  0.00249 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.818 on 35 degrees of freedom
## Multiple R-squared:  0.4693, Adjusted R-squared:  0.4087 
## F-statistic: 7.738 on 4 and 35 DF,  p-value: 0.000141