We fixed a couple data errors after discussing with the investigators
Also found one 37-year-old, decided to keep in.
d <- read.csv("halfmile-2016-03-21.csv")
d$Diff <- d$After - d$Before
Scatterplot of before vs after, looks okay as all between 150 and 350, with changes relatively small between before and after
ggplot(d) + aes(Before, After) + geom_point()
Big scatterplot matrix, check for anything that looks odd; all looks basically okay.
pairs(d[,c("Age","Smoking", "Sex", "YearsRunning", "Before","After")])
Here’s boxplots of before and after by treatment; treatment is faster, all points look reasonable.
dm <- melt(d, id.vars=c("ID", "Treatment", "Sex"), measure.vars=c("Before", "After"), variable.name="Time")
ggplot(dm) + aes(Treatment, value) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))+ facet_wrap(~Time)
Before to after lines for each individual, with averages in thicker lines, looking to make sure all changes feel reasonable too.
ggplot(dm) + aes(Time, value, group=ID, color=Treatment) + geom_point() + geom_line(alpha=0.5) + facet_wrap(~Sex) + geom_smooth(aes(group=Treatment), method="lm", lwd=2, se=FALSE)
Simple boxplots of difference by treatment; looks like treatment might have some effect, shortening the run times. Notice that both control and treatment got faster, on average.
ggplot(d) + aes(Treatment, Diff) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))
Make that plot separate by covariates; see similar patterns
ggplot(d) + aes(Treatment, Diff) + facet_wrap(~Sex) + geom_boxplot(outlier.size=0) + geom_point(position=position_jitter(width=0.2))
check out years running, looks like more running, less room for improvement
ggplot(d) + aes(YearsRunning, Diff, color=Treatment) + geom_point() + facet_wrap(~Sex) + geom_smooth(method="lm")
check out smoking, not a lot of smokers!!
ggplot(d) + aes(YearsRunning, Diff, color=Smoking) + geom_point() + facet_wrap(~Sex) + geom_smooth(method="lm")
xtabs(~Smoking + Sex, d)
## Sex
## Smoking F M
## No 19 17
## Yes 3 1
first just a t-test
m1 <- t.test(Diff ~ Treatment, d)
m1
##
## Welch Two Sample t-test
##
## data: Diff by Treatment
## t = 2.2031, df = 37.553, p-value = 0.0338
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2664884 6.3335116
## sample estimates:
## mean in group Control mean in group Trt
## -1.1 -4.4
now with a linear model
m2 <- lm(Diff ~ Treatment, d)
summary(m2)
##
## Call:
## lm(formula = Diff ~ Treatment, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.600 -3.675 0.750 3.100 11.400
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.100 1.059 -1.039 0.3056
## TreatmentTrt -3.300 1.498 -2.203 0.0337 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.737 on 38 degrees of freedom
## Multiple R-squared: 0.1133, Adjusted R-squared: 0.08993
## F-statistic: 4.854 on 1 and 38 DF, p-value: 0.03372
par(mfrow=c(2,2)); plot(m2)