We fixed a couple data errors after discussing with the investigators
Also found one 37-year-old, decided to keep in.
d <- read.csv("halfmile-2016-03-21.csv")
d$Diff <- d$After - d$Before
Want to plot all the variables and as many relationships as possible, to check for anything that looks odd and to build intuition about the data set and the relationships of interest.
Big scatterplot matrix of continuous variables, with color for sex, check for anything that looks odd; all looks basically okay. (so this has all but smoking)
pairs(d[,c("Age","YearsRunning", "Before","After", "Diff")], col=d$Sex)
Here’s boxplots of before and after by treatment; treatment is faster, all points look reasonable.
dm <- melt(d, id.vars=c("ID", "Treatment", "Sex"), measure.vars=c("Before", "After"), variable.name="Time")
ggplot(dm) + aes(Treatment, value) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))+ facet_wrap(~Time)
Before to after lines for each individual, with averages in thicker lines, looking to make sure all changes feel reasonable too.
ggplot(dm) + aes(Time, value, group=ID, color=Treatment) + geom_point() + geom_line(alpha=0.5) + facet_wrap(~Sex) + geom_smooth(aes(group=Treatment), method="lm", lwd=2, se=FALSE)
Simple boxplots of difference by treatment; looks like treatment might have some effect, shortening the run times. Notice that both control and treatment got faster, on average.
ggplot(d) + aes(Treatment, Diff) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))
Make that plot separate by covariates; see similar patterns. Notice not too many smokers!
ggplot(d) + aes(Treatment, Diff) + facet_grid(Sex~Smoking) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))
xtabs(~Smoking + Sex, d)
## Sex
## Smoking F M
## No 19 17
## Yes 3 1
check out years running, looks like more running, less room for improvement
ggplot(d) + aes(YearsRunning, Diff, color=Treatment) + geom_point() + facet_wrap(~Sex) + geom_smooth(method="lm")
check out age
ggplot(d) + aes(Age, Diff, color=Treatment) + geom_point() + geom_smooth(method="lm")
check out before
ggplot(d) + aes(Before, Diff, color=Treatment) + geom_point() +
geom_smooth(method="lm")
first just a t-test
m1 <- t.test(Diff ~ Treatment, d)
m1
##
## Welch Two Sample t-test
##
## data: Diff by Treatment
## t = 2.2031, df = 37.553, p-value = 0.0338
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2664884 6.3335116
## sample estimates:
## mean in group Control mean in group Trt
## -1.1 -4.4
now with a linear model
m2 <- lm(Diff ~ Treatment, d)
summary(m2)
##
## Call:
## lm(formula = Diff ~ Treatment, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.600 -3.675 0.750 3.100 11.400
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.100 1.059 -1.039 0.3056
## TreatmentTrt -3.300 1.498 -2.203 0.0337 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.737 on 38 degrees of freedom
## Multiple R-squared: 0.1133, Adjusted R-squared: 0.08993
## F-statistic: 4.854 on 1 and 38 DF, p-value: 0.03372
par(mfrow=c(2,2)); plot(m2)
m3a <- lm(Diff ~ Treatment + Sex + Age + Smoking + YearsRunning, d)
summary(m3a)
##
## Call:
## lm(formula = Diff ~ Treatment + Sex + Age + Smoking + YearsRunning,
## data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.1760 -2.7348 -0.1381 2.6104 9.1833
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.6915 4.3422 2.002 0.0534 .
## TreatmentTrt -2.9278 1.3794 -2.123 0.0412 *
## SexM -2.0315 1.4416 -1.409 0.1679
## Age -0.4007 0.1379 -2.906 0.0064 **
## SmokingYes -0.8979 2.3727 -0.378 0.7075
## YearsRunning 0.2174 0.2312 0.940 0.3538
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.316 on 34 degrees of freedom
## Multiple R-squared: 0.3412, Adjusted R-squared: 0.2443
## F-statistic: 3.521 on 5 and 34 DF, p-value: 0.01134
m3 <- lm(Diff ~ Treatment + Sex + Age + YearsRunning, d)
summary(m3)
##
## Call:
## lm(formula = Diff ~ Treatment + Sex + Age + YearsRunning, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.1357 -2.6997 -0.0888 2.6587 9.2495
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.7568 4.2853 2.043 0.04859 *
## TreatmentTrt -2.9214 1.3623 -2.144 0.03902 *
## SexM -1.9411 1.4042 -1.382 0.17561
## Age -0.4030 0.1361 -2.962 0.00546 **
## YearsRunning 0.1953 0.2210 0.884 0.38281
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.263 on 35 degrees of freedom
## Multiple R-squared: 0.3384, Adjusted R-squared: 0.2628
## F-statistic: 4.475 on 4 and 35 DF, p-value: 0.00502
Try out two-way interactions
m4 <- lm(Diff ~ (Treatment + Sex + Age + YearsRunning)^2, d)
summary(m4)
##
## Call:
## lm(formula = Diff ~ (Treatment + Sex + Age + YearsRunning)^2,
## data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.4535 -2.3466 0.2803 2.8236 7.5678
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.20581 10.81148 1.499 0.1447
## TreatmentTrt -1.14307 10.03606 -0.114 0.9101
## SexM -11.81130 9.24726 -1.277 0.2116
## Age -0.64857 0.38045 -1.705 0.0989 .
## YearsRunning -0.47114 1.32721 -0.355 0.7252
## TreatmentTrt:SexM 0.78669 3.03663 0.259 0.7974
## TreatmentTrt:Age 0.01095 0.32721 0.033 0.9735
## TreatmentTrt:YearsRunning -0.37076 0.47263 -0.784 0.4391
## SexM:Age 0.20838 0.31040 0.671 0.5073
## SexM:YearsRunning 0.65524 0.49104 1.334 0.1925
## Age:YearsRunning 0.02005 0.04575 0.438 0.6644
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.47 on 29 degrees of freedom
## Multiple R-squared: 0.3972, Adjusted R-squared: 0.1894
## F-statistic: 1.911 on 10 and 29 DF, p-value: 0.08472
anova(m3,m4)
## Analysis of Variance Table
##
## Model 1: Diff ~ Treatment + Sex + Age + YearsRunning
## Model 2: Diff ~ (Treatment + Sex + Age + YearsRunning)^2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 35 636.13
## 2 29 579.56 6 56.567 0.4717 0.8236
Try out two-way interactions just with Treatment
m5 <- lm(Diff ~ Treatment * (Sex + Age + YearsRunning), d)
summary(m5)
##
## Call:
## lm(formula = Diff ~ Treatment * (Sex + Age + YearsRunning), data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.694 -2.573 0.175 2.569 9.120
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.97526 5.51692 1.446 0.1580
## TreatmentTrt -1.36073 9.18359 -0.148 0.8831
## SexM -2.37947 1.97720 -1.203 0.2376
## Age -0.41742 0.18083 -2.308 0.0276 *
## YearsRunning 0.40514 0.32876 1.232 0.2268
## TreatmentTrt:SexM 1.13336 2.93781 0.386 0.7022
## TreatmentTrt:Age 0.01985 0.29106 0.068 0.9461
## TreatmentTrt:YearsRunning -0.40201 0.46181 -0.871 0.3905
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.396 on 32 degrees of freedom
## Multiple R-squared: 0.3567, Adjusted R-squared: 0.216
## F-statistic: 2.535 on 7 and 32 DF, p-value: 0.03406
anova(m3,m5)
## Analysis of Variance Table
##
## Model 1: Diff ~ Treatment + Sex + Age + YearsRunning
## Model 2: Diff ~ Treatment * (Sex + Age + YearsRunning)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 35 636.13
## 2 32 618.53 3 17.606 0.3036 0.8225
Check out before effect…
m6 <- lm(Diff ~ Treatment * Before + Sex + Age + YearsRunning, d)
summary(m6)
##
## Call:
## lm(formula = Diff ~ Treatment * Before + Sex + Age + YearsRunning,
## data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5903 -2.2837 0.0454 2.2633 5.9872
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.74528 6.46998 0.424 0.67409
## TreatmentTrt 17.88046 6.17775 2.894 0.00669 **
## Before 0.02157 0.01848 1.168 0.25134
## SexM -2.34018 1.23983 -1.888 0.06791 .
## Age -0.37811 0.12299 -3.074 0.00422 **
## YearsRunning 0.27026 0.19245 1.404 0.16958
## TreatmentTrt:Before -0.08811 0.02575 -3.422 0.00167 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.668 on 33 degrees of freedom
## Multiple R-squared: 0.5381, Adjusted R-squared: 0.4542
## F-statistic: 6.408 on 6 and 33 DF, p-value: 0.0001506
anova(m3,m6)
## Analysis of Variance Table
##
## Model 1: Diff ~ Treatment + Sex + Age + YearsRunning
## Model 2: Diff ~ Treatment * Before + Sex + Age + YearsRunning
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 35 636.13
## 2 33 444.08 2 192.06 7.1359 0.002658 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Try on log scale just for fun…
m7 <- lm(I(log(After) - log(Before))~ Treatment * Before + Sex + Age + YearsRunning, d)
summary(m7)
##
## Call:
## lm(formula = I(log(After) - log(Before)) ~ Treatment * Before +
## Sex + Age + YearsRunning, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.031491 -0.008833 -0.000629 0.010658 0.033355
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.060e-03 2.899e-02 0.278 0.78268
## TreatmentTrt 7.000e-02 2.768e-02 2.529 0.01639 *
## Before 9.745e-05 8.277e-05 1.177 0.24749
## SexM -1.211e-02 5.554e-03 -2.180 0.03650 *
## Age -1.643e-03 5.510e-04 -2.981 0.00536 **
## YearsRunning 1.469e-03 8.622e-04 1.704 0.09772 .
## TreatmentTrt:Before -3.351e-04 1.153e-04 -2.906 0.00650 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01643 on 33 degrees of freedom
## Multiple R-squared: 0.491, Adjusted R-squared: 0.3984
## F-statistic: 5.305 on 6 and 33 DF, p-value: 0.000636
OK with not including any, except Before score
summary(m6)
##
## Call:
## lm(formula = Diff ~ Treatment * Before + Sex + Age + YearsRunning,
## data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5903 -2.2837 0.0454 2.2633 5.9872
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.74528 6.46998 0.424 0.67409
## TreatmentTrt 17.88046 6.17775 2.894 0.00669 **
## Before 0.02157 0.01848 1.168 0.25134
## SexM -2.34018 1.23983 -1.888 0.06791 .
## Age -0.37811 0.12299 -3.074 0.00422 **
## YearsRunning 0.27026 0.19245 1.404 0.16958
## TreatmentTrt:Before -0.08811 0.02575 -3.422 0.00167 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.668 on 33 degrees of freedom
## Multiple R-squared: 0.5381, Adjusted R-squared: 0.4542
## F-statistic: 6.408 on 6 and 33 DF, p-value: 0.0001506
Anova(m6)
## Anova Table (Type II tests)
##
## Response: Diff
## Sum Sq Df F value Pr(>F)
## Treatment 81.05 1 6.0228 0.019568 *
## Before 34.44 1 2.5596 0.119158
## Sex 47.94 1 3.5627 0.067914 .
## Age 127.18 1 9.4508 0.004216 **
## YearsRunning 26.54 1 1.9720 0.169579
## Treatment:Before 157.61 1 11.7123 0.001674 **
## Residuals 444.08 33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2)); plot(m6)
m6b <- lm(Diff ~ Treatment + Before + Sex + Age + YearsRunning, d)
par(mfrow=c(3,2)); crPlots(m6b)
Try out a smaller model with only significant results
m8 <- lm(Diff ~ Treatment*Before + Age, d)
summary(m8)
##
## Call:
## lm(formula = Diff ~ Treatment * Before + Age, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0497 -1.9712 0.3687 2.9460 6.8119
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.15010 6.17025 0.186 0.85321
## TreatmentTrt 17.80564 6.42093 2.773 0.00884 **
## Before 0.02870 0.01892 1.517 0.13830
## Age -0.35832 0.12075 -2.968 0.00538 **
## TreatmentTrt:Before -0.08722 0.02677 -3.259 0.00249 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.818 on 35 degrees of freedom
## Multiple R-squared: 0.4693, Adjusted R-squared: 0.4087
## F-statistic: 7.738 on 4 and 35 DF, p-value: 0.000141
Anova(m8)
## Anova Table (Type II tests)
##
## Response: Diff
## Sum Sq Df F value Pr(>F)
## Treatment 73.69 1 5.0546 0.030961 *
## Before 15.87 1 1.0884 0.303986
## Age 128.39 1 8.8064 0.005384 **
## Treatment:Before 154.80 1 10.6179 0.002494 **
## Residuals 510.26 35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m8, m6)
## Analysis of Variance Table
##
## Model 1: Diff ~ Treatment * Before + Age
## Model 2: Diff ~ Treatment * Before + Sex + Age + YearsRunning
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 35 510.26
## 2 33 444.08 2 66.184 2.4591 0.101
Do some diagnostic plots
par(mfrow=c(2,2)); plot(m8)
m8b <- lm(Diff ~ Treatment + Before + Age, d)
par(mfrow=c(2,2)); crPlots(m8b)
OK, I believe this smaller model
Anova(m8)
## Anova Table (Type II tests)
##
## Response: Diff
## Sum Sq Df F value Pr(>F)
## Treatment 73.69 1 5.0546 0.030961 *
## Before 15.87 1 1.0884 0.303986
## Age 128.39 1 8.8064 0.005384 **
## Treatment:Before 154.80 1 10.6179 0.002494 **
## Residuals 510.26 35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m8)
##
## Call:
## lm(formula = Diff ~ Treatment * Before + Age, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0497 -1.9712 0.3687 2.9460 6.8119
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.15010 6.17025 0.186 0.85321
## TreatmentTrt 17.80564 6.42093 2.773 0.00884 **
## Before 0.02870 0.01892 1.517 0.13830
## Age -0.35832 0.12075 -2.968 0.00538 **
## TreatmentTrt:Before -0.08722 0.02677 -3.259 0.00249 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.818 on 35 degrees of freedom
## Multiple R-squared: 0.4693, Adjusted R-squared: 0.4087
## F-statistic: 7.738 on 4 and 35 DF, p-value: 0.000141