\documentclass[11pt]{article}

\usepackage{indentfirst}
\usepackage{amsmath}
\usepackage{natbib}
\usepackage{url}

\begin{document}

\title{Cauchy Example for Le Cam Made Simple}
\author{Charles J. Geyer}
\maketitle

\section{Introduction}

We do a very simple one-parameter model for which everything is trivial,
the Cauchy location model.  This model is mildly notorious among
theoreticians because the likelihood is multimodal and the multimodality
does not go away as $n$ goes to infinity \citep{reeds}.  It's not that any
theory leads one to expect that the multimodality should go away as $n$
goes to infinity, just that the Cauchy model is simple enough to analyze,
and in it one can see that what theory does not lead one to expect indeed
does not happen.

So let's make up some data.  Since the model is translation invariant, it
doesn't matter what we choose as the ``simulation truth'' parameter value.
<<data>>=
set.seed(42)
n <- 30
theta0 <- 0
x <- theta0 + rcauchy(n)
@

To do maximum likelihood using local optimization we need a good starting
point for Newton's method.  The obvious starting point for the Cauchy
location model is the sample median: easily calculated, highly robust,
and guaranteed $\sqrt{n}$-consistent.

Minus the log likelihood (minus because \verb@nlm@ does minimization
rather than maximization) is
<<mlogl>>=
mlogl <- function(theta, x) sum(- dcauchy(x, theta, log = TRUE))
@
and the MLE is
<<mle>>=
out <- nlm(mlogl, median(x), typsize = n, hessian = TRUE, x = x)
print(out)
@

So our MLE, observed Fisher information (we will pretend we don't know how
to calculate expected Fisher information), and the asymptotic confidence
interval calculated from them are
<<confint>>=
theta.hat <- out$estimate
info <- out$hessian
conf.level <- 0.95
crit <- qnorm((1 + conf.level) / 2)
theta.hat + crit * c(-1, 1) / sqrt(info)
@
Couldn't be simpler.

\section{Single Bootstrap}

By ``bootstrap'' we mean \emph{parametric bootstrap}, since we are working
with maximum likelihood.
<<boot>>=
nboot <- 200
save.seed <- .Random.seed
theta.star <- double(nboot)
theta.star.code <- double(nboot)
info.star <- double(nboot)
for (iboot in 1:nboot) {
    x.star <- theta.hat + rcauchy(n)
    out <- nlm(mlogl, median(x.star), typsize = n, hessian = TRUE,
        x = x.star)
    theta.star[iboot] <- out$estimate
    theta.star.code[iboot] <- out$code
    info.star[iboot] <- out$hessian
}
all(theta.star.code <= 3)
@

The pivotal quantity used to construct the confidence interval (or, more
precisely, its bootstrap approximation) is
<<pivot>>=
z.star <- (theta.star - theta.hat) * sqrt(info.star)
@
Figure~\ref{fig:one} shows its Q-Q plot
<<figone>>=
qqnorm(z.star)
abline(0, 1)
@
and appears on p.~\pageref{fig:one}.
\begin{figure}
\begin{center}
<<fig1,fig=TRUE,echo=FALSE>>=
<<figone>>
@
\end{center}
\caption{Q-Q plot of the bootstrap approximation of the pivotal quantity.
The line has intercept zero and slope one.}
\label{fig:one}
\end{figure}

It looks like we are in asymptopia.  Thus there is no point in using the
bootstrap distribution of the pivot rather than the normal approximation
to calculate the confidence interval.
<<bootconfint>>=
crit.star <- as.numeric(quantile(z.star, probs = c(0.025, 0.975)))
theta.hat - rev(crit.star) / sqrt(info)
theta.hat + crit * c(-1, 1) / sqrt(info)
@
Pretty much the same thing either way.

We should note that with this model there is no point to a double
bootstrap.  Since the model is translation invariant, the single bootstrap
sampling distribution is pivotal, and there is no work for the double
bootstrap to do.
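Readers who want to check the claim of exact pivotality can do so by
simulation.  The following is only a sketch (not evaluated when this
document is built); the helper function \verb@pivot.sim@ is ours, not part
of any package.  It simulates the pivotal quantity at two different true
parameter values and compares the two samples; because the model is a
location family, the two distributions should agree up to Monte Carlo
error.
<<pivot-check, eval = FALSE>>=
pivot.sim <- function(theta.true, nsim = 200) {
    z <- double(nsim)
    for (isim in 1:nsim) {
        ## simulate data from the model at the given true parameter value
        x.sim <- theta.true + rcauchy(n)
        ## maximize the likelihood starting from the sample median
        out.sim <- nlm(mlogl, median(x.sim), typsize = n, hessian = TRUE,
            x = x.sim)
        ## the same pivotal quantity as in the bootstrap above
        z[isim] <- (out.sim$estimate - theta.true) * sqrt(out.sim$hessian)
    }
    return(z)
}
## the distribution of the pivot should not depend on the true theta
ks.test(pivot.sim(0), pivot.sim(7))
@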
\section{One-Step Newton}

For this section, we'll assume we do know the formulas for the Cauchy
log likelihood and its derivatives
<<derivatives>>=
logl <- expression(- log(1 + (x - theta)^2))
scor <- D(logl, "theta")
hess <- D(scor, "theta")
print(scor)
print(hess)
mloglfun <- function(theta, x) sum(- eval(logl))
gradientfun <- function(theta, x) sum(- eval(scor))
hessianfun <- function(theta, x) sum(- eval(hess))
@

So we redo the bootstrap and this time also do one-step Newton
<<boot-onestep>>=
objfun <- function(theta, x) {
    result <- mloglfun(theta, x)
    attr(result, "gradient") <- gradientfun(theta, x)
    return(result)
}
.Random.seed <- save.seed
theta.star2 <- double(nboot)
theta.star2.code <- double(nboot)
theta.start <- double(nboot)
theta.onestep <- double(nboot)
for (iboot in 1:nboot) {
    x.star <- theta.hat + rcauchy(n)
    theta.start[iboot] <- median(x.star)
    out <- nlm(objfun, theta.start[iboot], typsize = n, x = x.star)
    theta.star2[iboot] <- out$estimate
    theta.star2.code[iboot] <- out$code
    theta.onestep[iboot] <- theta.start[iboot] -
        gradientfun(theta.start[iboot], x.star) /
        hessianfun(theta.start[iboot], x.star)
}
max(abs(theta.star2 - theta.star))
all(theta.star2.code <= 3)
@

Figure~\ref{fig:two} compares the two estimators
<<figtwo>>=
err.start <- theta.start - theta.star
err.onestep <- theta.onestep - theta.star
plot(err.start, err.onestep,
    xlab = "error of starting point (sample median)",
    ylab = "error of one-step Newton estimator")
@
and appears on p.~\pageref{fig:two}.
\begin{figure}
\begin{center}
<<fig2,fig=TRUE,echo=FALSE>>=
<<figtwo>>
@
\end{center}
\caption{Comparison of the error of the starting point for Newton's method
(the sample median) and the error of the one-step Newton estimator, where
``error'' means difference from the MLE.}
\label{fig:two}
\end{figure}

It looks like we are not yet in asymptopia, though perhaps pretty close.
For most of the bootstrap samples Newton's method nearly converges in one
step (those in the black clump along the horizontal axis).  A sizeable
fraction are not in the clump.  Even for points not in the clump, however,
the one-step error is less than a tenth of the starting error (except for
\Sexpr{sum(abs(err.onestep) > 0.1 * abs(err.start))} points out of
\Sexpr{length(err.onestep)}).

\section{Quadraticity}

In this section we directly investigate quadraticity of the log likelihood
over a region containing all but a small fraction of the bootstrap MLEs
and of the starting points for Newton's method
<<hessians>>=
alpha <- 0.05
foo <- quantile(theta.start, probs = c(alpha / 4, 1 - alpha / 4))
bar <- quantile(theta.star2, probs = c(alpha / 4, 1 - alpha / 4))
baz <- pretty(c(foo, bar))
print(baz)
.Random.seed <- save.seed
hessians <- matrix(NaN, nboot, length(baz))
for (iboot in 1:nboot) {
    x.star <- theta.hat + rcauchy(n)
    for (j in seq(along = baz))
        hessians[iboot, j] <- hessianfun(baz[j], x.star)
}
hess.max <- apply(hessians, 1, max)
hess.min <- apply(hessians, 1, min)
hess.med <- apply(hessians, 1, median)
@

Figure~\ref{fig:four} shows how much the Hessian varies over this region
<<figfour>>=
plot(c(hess.med, hess.med), c(hess.max, hess.min), type = "n",
    xlab = "median value of Hessian",
    ylab = "min and max value of Hessian")
segments(hess.med, hess.min, hess.med, hess.max)
@
and appears on p.~\pageref{fig:four}.
\begin{figure}
\begin{center}
<<fig4,fig=TRUE,echo=FALSE>>=
<<figfour>>
@
\end{center}
\caption{Constancy of the Hessian as a function of the parameter.  Lines
connect the minimum and maximum values of the Hessian for a particular
bootstrap data set at the set of points
$\Sexpr{paste(baz, collapse = "$, $")}$.
The abscissa of each line is the median value of the Hessian at these
points.}
\label{fig:four}
\end{figure}

This plot looks like we are really not yet in asymptopia.  There is a lot
of variability of the Hessian over the range.
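For readers who prefer a number to the picture, the spread seen in
Figure~\ref{fig:four} can also be summarized directly.  The following is
only a sketch (not evaluated when this document is built); the variable
name \verb@hess.spread@ and the 50\% cutoff are our own choices, not part
of the analysis above.
<<hess-spread, eval = FALSE>>=
## relative spread of the Hessian over the grid of points baz,
## one number per bootstrap data set; values near zero would mean
## the log likelihood is nearly quadratic over the region
hess.spread <- (hess.max - hess.min) / abs(hess.med)
summary(hess.spread)
## fraction of bootstrap data sets in which the Hessian varies by more
## than half of its median absolute value over the region
mean(hess.spread > 0.5)
@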
\section{Discussion}

One can experiment with this simulation in various ways.
\begin{itemize}
\item One can increase the sample size \verb@n@ set in the first code
chunk and rerun.  We have done this, and Figures~\ref{fig:two}
and~\ref{fig:four} do indeed look more ``in asymptopia'' as \verb@n@
increases.
\item One can increase the dimension of the parameter space, going to a
Cauchy location-scale model.  (Exercise: What is a good starting point for
Newton's method in this case?  The median is still a good estimate of
location.  What scale estimate does one use?)
\item One can move outside the class of location-scale families to a
family for which the single bootstrap is not exact.  Then it is possible
to investigate
\begin{itemize}
\item How close the distribution of the Hessian is to \emph{invariant in
law}.
\item How necessary (or unnecessary) double bootstrap correction of the
single bootstrap is.
\end{itemize}
\end{itemize}

Thus we see that this example is too simple to illustrate all aspects of
the ``Le Cam Made Simple'' view of asymptotics, but the aspects it does
illustrate are instructive.

\bibliographystyle{annals}
\bibliography{simple}

\end{document}