data checking

We fixed a couple data errors after discussing with the investigators

Also found one 37-year-old, decided to keep in.

d <- read.csv("halfmile-2016-03-21.csv")
d$Diff <- d$After - d$Before

make some exploratory plots

Scatterplot of before vs after, looks okay as all between 150 and 350, with changes relatively small between before and after

ggplot(d) + aes(Before, After) + geom_point()

Big scatterplot matrix, check for anything that looks odd; all looks basically okay.

pairs(d[,c("Age","Smoking", "Sex", "YearsRunning", "Before","After")])

Here’s boxplots of before and after by treatment; treatment is faster, all points look reasonable.

dm <- melt(d, id.vars=c("ID", "Treatment", "Sex"), measure.vars=c("Before", "After"), variable.name="Time")
ggplot(dm) + aes(Treatment, value) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))+ facet_wrap(~Time)

Before to after lines for each individual, with averages in thicker lines, looking to make sure all changes feel reasonable too.

ggplot(dm) + aes(Time, value, group=ID, color=Treatment) + geom_point() + geom_line(alpha=0.5) + facet_wrap(~Sex) + geom_smooth(aes(group=Treatment), method="lm", lwd=2, se=FALSE)

Simple boxplots of difference by treatment; looks like treatment might have some effect, shortening the run times. Notice that both control and treatment got faster, on average.

ggplot(d) + aes(Treatment, Diff) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))

Make that plot separate by covariates; see similar patterns

ggplot(d) + aes(Treatment, Diff) + facet_wrap(~Sex) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))

check out years running, looks like more running, less room for improvement

ggplot(d) + aes(YearsRunning, Diff, color=Treatment) + geom_point() + facet_wrap(~Sex) + geom_smooth(method="lm")

check out smoking, not a lot of smokers!!

ggplot(d) + aes(YearsRunning, Diff, color=Smoking) + geom_point() + facet_wrap(~Sex) + geom_smooth(method="lm")

xtabs(~Smoking + Sex, d)
##        Sex
## Smoking  F  M
##     No  19 17
##     Yes  3  1

make a model and check some assumptions

first just a t-test

m1 <- t.test(Diff ~ Treatment, d)
m1
## 
##  Welch Two Sample t-test
## 
## data:  Diff by Treatment
## t = 2.2031, df = 37.553, p-value = 0.0338
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2664884 6.3335116
## sample estimates:
## mean in group Control     mean in group Trt 
##                  -1.1                  -4.4

now with a linear model

m2 <- lm(Diff ~ Treatment, d)
summary(m2)
## 
## Call:
## lm(formula = Diff ~ Treatment, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.600  -3.675   0.750   3.100  11.400 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    -1.100      1.059  -1.039   0.3056  
## TreatmentTrt   -3.300      1.498  -2.203   0.0337 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.737 on 38 degrees of freedom
## Multiple R-squared:  0.1133, Adjusted R-squared:  0.08993 
## F-statistic: 4.854 on 1 and 38 DF,  p-value: 0.03372
par(mfrow=c(2,2)); plot(m2)