tl;dr

Over the course of a long career, I have done a lot of things with exponential families. They are not obviously related, but all share the fundamental toolkit of exponential family theory (which is a lot more than what is taught in 8111–8112). Together they make an interesting course that touches many areas of statistics:

- existence and uniqueness of maximum likelihood estimates (MLE),
- hypothesis tests and confidence intervals when MLE do not exist,
- fuzzy tests and confidence intervals,
- spatial statistics (Markov lattice processes and Markov point processes),
- network statistics (Markov graph models),
- life history analysis (aster models), and
- inequality-constrained statistical inference.

There are many interesting open research questions in these areas.

Announcement

Charlie, some day you will learn that not everything is an exponential family.

— Elizabeth A. Thompson (when she was my thesis advisor, or perhaps earlier when I was her RA, or perhaps even earlier when she was my teacher in the 580’s (equivalent of 8111–8112 here))

I still hadn’t really learned that lesson by the time I finished my PhD thesis (Geyer, 1990), although I did learn it later (Geyer, 1994a, 1994b, 2013).

Nevertheless, exponential families have remained important in my work because they have many properties that do not generalize to other statistical models and that allow procedures that also do not generalize.

In regular full exponential families (or even closed convex exponential families) the question of when maximum likelihood estimates (MLE) exist is completely solved. And when MLE do not exist in the exponential family, they may exist in the Barndorff-Nielsen completion of the family. Completion is also well understood (Eck and Geyer, 2021; Geyer, 1990, 2009, 2023). Hypothesis tests and confidence intervals when the MLE is in the completion are also well understood (Geyer, 2009).
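What nonexistence looks like can be seen in a minimal sketch, assuming a logistic regression (a regular full exponential family) and made-up, completely separated data. Along the hypothetical direction `delta`, the log likelihood keeps increasing toward its supremum 0 but never reaches it, so no MLE exists in the family. For these data the limit along that direction is the distribution concentrated at the observed response vector, which is what the MLE in the completion looks like here.

```python
import math

# Hypothetical, completely separated data: y = 0 for x < 2.5 and y = 1 for x > 2.5.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 1, 1, 1]

def log1p_exp(eta):
    """Numerically stable log(1 + exp(eta)), the Bernoulli cumulant function."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))

def log_lik(beta0, beta1):
    """Logistic regression log likelihood: sum of y_i eta_i - log(1 + exp(eta_i))."""
    return sum(yi * (beta0 + beta1 * xi) - log1p_exp(beta0 + beta1 * xi)
               for xi, yi in zip(x, y))

# Hypothetical direction of recession: eta_i = s * (x_i - 2.5) is negative
# exactly where y_i = 0 and positive exactly where y_i = 1.
delta = (-2.5, 1.0)
steps = [1, 2, 4, 8, 16]
values = [log_lik(s * delta[0], s * delta[1]) for s in steps]
for s, v in zip(steps, values):
    print(f"s = {s:2d}   log likelihood = {v:.6g}")

# Fitted success probabilities at s = 16 are already close to the observed y.
eta = [16 * (delta[0] + delta[1] * xi) for xi in x]
print([round(1 / (1 + math.exp(-e)), 4) for e in eta])
```

The log likelihood values increase with every step yet stay strictly negative, which is the numerical signature of an MLE that "escapes to infinity" along a direction of recession.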

Explaining all of this to naive users is not well understood.

There is no analogue of this theory for general statistical models.

As you all learned in 8111, UMP (uniformly most powerful) one-tailed tests exist for statistical models having the monotone likelihood ratio property (a generalization of one-parameter exponential families). But UMPU (UMP unbiased) two-tailed tests exist only for one-parameter exponential families. UMP and UMPU tests can also be derived by conditioning in some multiparameter exponential families. For example, when comparing two independent Poisson random variables \(X\) and \(Y\), the conditional distribution of \(X\) given \(X + Y\) is binomial and gives UMP and UMPU tests. When comparing two independent binomial random variables \(X\) and \(Y\), the conditional distribution of \(X\) given \(X + Y\) is the noncentral hypergeometric distribution (the exponential family generated by the hypergeometric distribution), not something studied elsewhere in statistics. Comparing two independent negative binomial random variables leads, in the same way, to the noncentral negative hypergeometric distribution.
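The two conditioning facts can be checked numerically by brute-force normalization of the joint distribution. This is a minimal sketch with arbitrary illustrative parameter values. For independent Poisson \(X\) and \(Y\) with means \(\lambda_X\) and \(\lambda_Y\), the conditional distribution of \(X\) given \(X + Y = n\) should be binomial with success probability \(\lambda_X / (\lambda_X + \lambda_Y)\). For independent binomial \(X\) and \(Y\), it should be proportional to \(\binom{m_X}{x}\binom{m_Y}{n-x}\psi^x\), where \(\psi\) is the odds ratio; that is the noncentral hypergeometric distribution.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def binom_pmf(k, size, p):
    # math.comb returns 0 when k > size, so the pmf is 0 outside the support.
    return comb(size, k) * p**k * (1 - p)**(size - k)

def conditional_given_sum(pmf_x, pmf_y, n):
    """P(X = x | X + Y = n) for independent X and Y, by brute-force normalization."""
    joint = [pmf_x(x) * pmf_y(n - x) for x in range(n + 1)]
    total = sum(joint)
    return [j / total for j in joint]

# Poisson case (illustrative rates): compare with Binomial(n, lam_x / (lam_x + lam_y)).
lam_x, lam_y, n = 2.0, 3.5, 7
cond = conditional_given_sum(lambda k: poisson_pmf(k, lam_x),
                             lambda k: poisson_pmf(k, lam_y), n)
target = [binom_pmf(x, n, lam_x / (lam_x + lam_y)) for x in range(n + 1)]
poisson_max_err = max(abs(a - b) for a, b in zip(cond, target))

# Binomial case (illustrative sizes and probabilities): compare with the
# noncentral hypergeometric pmf, proportional to comb(m_x, x) comb(m_y, n - x) psi**x.
m_x, m_y, p_x, p_y, n = 6, 4, 0.3, 0.6, 5
cond = conditional_given_sum(lambda k: binom_pmf(k, m_x, p_x),
                             lambda k: binom_pmf(k, m_y, p_y), n)
psi = (p_x / (1 - p_x)) / (p_y / (1 - p_y))  # odds ratio
weights = [comb(m_x, x) * comb(m_y, n - x) * psi**x for x in range(n + 1)]
target = [w / sum(weights) for w in weights]
hyper_max_err = max(abs(a - b) for a, b in zip(cond, target))

print(poisson_max_err, hyper_max_err)
```

Both maximum errors are at the level of floating-point rounding. The binomial case also shows why the conditional test depends only on \(\psi\): the nuisance parameter drops out of the normalized weights.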

Geyer and Meeden (2005) reinterpret classical UMP and UMPU theory, replacing realized randomized procedures (simulate a random variable) with abstract randomized procedures (describe the distribution of a random variable) and fuzzy procedures (interpret as a fuzzy set).

Fuzzy hypothesis tests and confidence intervals can be derived from any randomized test procedure (Thompson and Geyer, 2007), but UMP and UMPU procedures are, of course, optimal.
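To make the realized versus abstract distinction concrete, here is a minimal sketch, assuming the one-tailed UMP test of \(H_0: p \le p_0\) versus \(H_1: p > p_0\) for binomial data (an illustrative choice). The realized randomized P-value is \(P(X > x) + U \, P(X = x)\) with \(U\) uniform on (0, 1), all probabilities computed at \(p = p_0\); the abstract randomized P-value is the distribution of that quantity, uniform on the interval from \(P(X > x)\) to \(P(X \ge x)\). The function names are hypothetical.

```python
import random
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def abstract_p_value_interval(x, n, p0):
    """Endpoints P(X > x) and P(X >= x), computed at p = p0, for the upper-tailed
    UMP test of H0: p <= p0. The abstract randomized P-value is uniform on this interval."""
    upper = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    return upper, upper + binom_pmf(x, n, p0)

def realized_p_value(x, n, p0, rng):
    """One realized randomized P-value: P(X > x) + U * P(X = x), U ~ Uniform(0, 1)."""
    lo, hi = abstract_p_value_interval(x, n, p0)
    return lo + rng.random() * (hi - lo)

# Illustrative data: 8 successes in 10 trials, testing p0 = 0.5.
lo, hi = abstract_p_value_interval(8, 10, 0.5)
print(f"abstract randomized P-value: Uniform({lo:.4f}, {hi:.4f})")
# prints: abstract randomized P-value: Uniform(0.0107, 0.0547)
draw = realized_p_value(8, 10, 0.5, random.Random(42))
print(f"one realized randomized P-value: {draw:.4f}")
```

The realized draw depends on the random number generator; the interval does not, which is why reporting the whole distribution carries more information than one simulated value.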

In spatial statistics, network statistics, and statistical genetics, nothing requires exponential families, but, as everywhere else in statistics, exponential family models are so much better behaved that they are the most used in these areas. When models are complicated in some ways, simply because of the complicated dependence they must describe, it helps that they are simple in other ways (exponential family). Most of the models fitted by the R packages spatstat (Baddeley et al., 2023) and ergm (Handcock et al., 2023) are exponential family models. Geyer (1990), Geyer and Thompson (1992), Geyer (1994b), Geyer and Møller (1994), Geyer (1999), Okabayashi and Geyer (2012), and Okabayashi (2011) are about doing maximum likelihood estimation in such models using Markov chain Monte Carlo (MCMC) and also doing likelihood-based hypothesis tests and confidence intervals.
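The basic trick behind MCMC maximum likelihood in such models (in the spirit of Geyer and Thompson, 1992) can be sketched as follows. For an exponential family with canonical statistic \(t(x)\) and cumulant function \(c(\theta)\), the log likelihood ratio is \(l(\theta) - l(\psi) = \langle t(x), \theta - \psi \rangle - [c(\theta) - c(\psi)]\), and \(c(\theta) - c(\psi) = \log E_\psi\{e^{\langle t(X), \theta - \psi \rangle}\}\), so an MCMC sample from the distribution with parameter \(\psi\) gives a Monte Carlo approximation. This is a minimal sketch, assuming a hypothetical toy model: an autologistic (Ising-type) Markov lattice model on a 3 by 3 grid of 0/1 variables with \(t(x)\) = (number of ones, number of neighboring pairs that are both one), small enough that the exact value can be found by enumerating all 512 configurations. The Gibbs sampler, parameter values, and run length are illustrative choices, not taken from the cited papers.

```python
import itertools
import math
import random

SIDE = 3
SITES = [(r, c) for r in range(SIDE) for c in range(SIDE)]
# Horizontal and vertical nearest-neighbor pairs on the grid.
EDGES = ([((r, c), (r, c + 1)) for r in range(SIDE) for c in range(SIDE - 1)]
         + [((r, c), (r + 1, c)) for r in range(SIDE - 1) for c in range(SIDE)])
NEIGHBORS = {s: [] for s in SITES}
for a, b in EDGES:
    NEIGHBORS[a].append(b)
    NEIGHBORS[b].append(a)

def canonical_statistic(x):
    """t(x) = (number of ones, number of neighboring pairs that are both one)."""
    return (sum(x.values()), sum(x[a] * x[b] for a, b in EDGES))

def dot(t, theta):
    return t[0] * theta[0] + t[1] * theta[1]

def exact_cumulant(theta):
    """c(theta) = log sum_x exp<t(x), theta>, by enumerating all 2^9 configurations."""
    terms = [dot(canonical_statistic(dict(zip(SITES, bits))), theta)
             for bits in itertools.product((0, 1), repeat=len(SITES))]
    m = max(terms)
    return m + math.log(sum(math.exp(v - m) for v in terms))

def gibbs_sample(psi, n_sweeps, rng):
    """Systematic-scan Gibbs sampler at parameter psi; returns t(x) after each sweep."""
    x = {s: 0 for s in SITES}
    stats = []
    for _ in range(n_sweeps):
        for s in SITES:
            # Conditional log odds of x_s = 1 given its neighbors.
            eta = psi[0] + psi[1] * sum(x[nb] for nb in NEIGHBORS[s])
            x[s] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        stats.append(canonical_statistic(x))
    return stats

def mc_cumulant_difference(theta, psi, stats):
    """Monte Carlo estimate of c(theta) - c(psi) from samples simulated at psi."""
    delta = (theta[0] - psi[0], theta[1] - psi[1])
    vals = [dot(t, delta) for t in stats]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals) / len(vals))

psi = (-0.5, 0.3)    # reference parameter used for simulation (illustrative)
theta = (-0.4, 0.4)  # parameter at which the likelihood is wanted (illustrative)
stats = gibbs_sample(psi, n_sweeps=20000, rng=random.Random(2024))[1000:]  # drop burn-in
approx = mc_cumulant_difference(theta, psi, stats)
exact = exact_cumulant(theta) - exact_cumulant(psi)
print(f"Monte Carlo: {approx:.4f}   exact: {exact:.4f}")
```

In realistic models the enumeration is impossible and only the Monte Carlo side is available; the approximation is good only for \(\theta\) reasonably close to \(\psi\), which is why practical procedures update the reference parameter.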

Aster models (Geyer, 2021; Geyer et al., 2013, 2007; Shaw et al., 2008) are models for life history analysis (like survival analysis, but also accounting for things happening conditional on survival). In biology, life history analysis is important for estimating Darwinian fitness (which involves both survival and reproduction) in wild populations. Aster models are like generalized linear models except that
- components of the response vector may come from different families (some Bernoulli, some Poisson, some zero-truncated Poisson, some negative binomial, some zero-truncated negative binomial, some multinomial, some normal),
- components of the response vector can be dependent, following a simple graphical model (having the predecessor-is-sample-size property), and
- arrows in the graph correspond to conditional distributions of components of the response vector given other components; only canonical link functions are allowed, so these conditional distributions are exponential families.
Hence aster models are generalized generalized linear models (G\(^2\)LM). They also generalize
- survival analysis,
- life table analysis,
- multinomial response regression,
- zero-inflated Poisson regression, and
- zero-inflated negative binomial regression
(parts of an aster model can look like any of these).

Aster models are also regular full exponential family models, but the unconditional canonical parameterization (the canonical parameterization of the joint distribution of the aster model) is very different from the conditional canonical parameterization (the canonical parameterizations of the conditional distributions of components of the response vector given other components). The mapping between these two parameterizations, called the aster transform, is too complicated for users to understand, but there is a recursive formula the computer can carry out.
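A minimal sketch of that recursion, assuming the standard aster setup (Geyer et al., 2007): each node \(j\) has a predecessor \(p(j)\) (the predecessor of an initial node is the constant 1), and given \(y_{p(j)}\), \(y_j\) is the sum of \(y_{p(j)}\) independent and identically distributed terms from a one-parameter exponential family with conditional canonical parameter \(\theta_j\) and cumulant function \(c_j\). Regrouping the joint log density gives unconditional canonical parameters \(\varphi_j = \theta_j - \sum_{k : p(k) = j} c_k(\theta_k)\), and the inverse map is computed recursively from the leaves of the graph toward the root. The three-node graph (survival, then flowering, then seed count), parameter values, and function names are hypothetical; this is not the code in the R package aster.

```python
import math

# Hypothetical graph 1 -> 2 -> 3 (0-based indices 0, 1, 2):
# node 0 survival (Bernoulli), node 1 flowering (Bernoulli), node 2 seed count (Poisson).
# pred[j] is the predecessor of node j; -1 means the constant 1 at the root.
# Nodes are numbered so that pred[j] < j.
pred = [-1, 0, 1]

def bernoulli_cumulant(theta):
    return math.log1p(math.exp(theta))

def poisson_cumulant(theta):
    return math.exp(theta)

cumulant = [bernoulli_cumulant, bernoulli_cumulant, poisson_cumulant]

def theta_to_phi(theta):
    """Conditional to unconditional canonical: phi_j = theta_j - sum_{k: pred[k] = j} c_k(theta_k)."""
    phi = list(theta)
    for k, j in enumerate(pred):
        if j >= 0:
            phi[j] -= cumulant[k](theta[k])
    return phi

def phi_to_theta(phi):
    """Inverse map, computed from the leaves toward the root."""
    theta = list(phi)
    for k in reversed(range(len(pred))):
        # theta[k] is final here: all successors of k have larger indices and were processed.
        j = pred[k]
        if j >= 0:
            theta[j] += cumulant[k](theta[k])
    return theta

def log_density_conditional(y, theta):
    """sum_j [y_j theta_j - y_{p(j)} c_j(theta_j)], dropping terms free of parameters."""
    total = 0.0
    for j, p in enumerate(pred):
        y_pred = 1 if p < 0 else y[p]
        total += y[j] * theta[j] - y_pred * cumulant[j](theta[j])
    return total

def log_density_unconditional(y, phi):
    """<y, phi> minus the unconditional cumulant function, evaluated via theta(phi)."""
    theta = phi_to_theta(phi)
    root_terms = sum(cumulant[j](theta[j]) for j, p in enumerate(pred) if p < 0)
    return sum(yj * pj for yj, pj in zip(y, phi)) - root_terms

theta = [0.5, -0.2, 1.3]
phi = theta_to_phi(theta)
back = phi_to_theta(phi)
print("phi:", phi)
print("theta recovered:", back)
```

The two log densities agree for every response vector, which is exactly the statement that the joint distribution is an exponential family with canonical parameter \(\varphi\).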

Explaining all of this to naive users is not well understood.

So aster models are one way that exponential family models can be very complicated. Spatial statistics is another: there the map from canonical to mean-value parameters cannot be computed exactly, so MCMC must be used (literature cited above).

It follows that we need ways to understand exponential family statistical models that do not rely on explicit formulas (link functions, for example), because such formulas may not exist. Thus we emphasize
- the observed-equals-expected property of maximum likelihood in regular full exponential families,
- the sufficient-dimension-reduction property of canonical affine submodels of exponential families,
- the maximum entropy property of exponential families, and
- the multivariate monotonicity of the mapping from canonical to mean-value parameters in a regular full exponential family.
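Two of these properties can be checked numerically in a few lines. This is a minimal sketch, assuming a logistic regression (a canonical affine submodel of the saturated Bernoulli family, with model matrix \(M\) having an intercept column and one covariate) fitted to made-up data by Newton's method. At the MLE the observed value \(M^T y\) of the submodel canonical statistic equals its expected value \(M^T \hat{\mu}\), and for any two parameter values the change in the mean-value parameter has nonnegative inner product with the change in the canonical parameter. Data and function names are illustrative.

```python
import math
import random

# Hypothetical data (not separated, so the MLE exists).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]

def mean_value(beta):
    """mu_i = E(y_i) = logistic(beta0 + beta1 * x_i)."""
    return [1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * xi))) for xi in x]

def submodel_mean(beta):
    """tau(beta) = M^T mu(M beta), the mean of the submodel canonical statistic."""
    mu = mean_value(beta)
    return (sum(mu), sum(xi * m for xi, m in zip(x, mu)))

def newton_mle(b0=0.0, b1=0.0, iterations=25):
    for _ in range(iterations):
        mu = mean_value((b0, b1))
        # Score M^T (y - mu) and Fisher information M^T W M with W = diag(mu (1 - mu)).
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum(xi * (yi - m) for xi, yi, m in zip(x, y, mu))
        w = [m * (1 - m) for m in mu]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 by 2 system (information) * step = score.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

beta_hat = newton_mle()
observed = (sum(y), sum(xi * yi for xi, yi in zip(x, y)))
expected = submodel_mean(beta_hat)
print("observed M^T y:     ", observed)
print("expected M^T mu_hat:", expected)

# Monotonicity: <beta_a - beta_b, tau(beta_a) - tau(beta_b)> >= 0 for random pairs.
rng = random.Random(1)
inner_products = []
for _ in range(1000):
    ba = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    bb = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    ta, tb = submodel_mean(ba), submodel_mean(bb)
    inner_products.append((ba[0] - bb[0]) * (ta[0] - tb[0])
                          + (ba[1] - bb[1]) * (ta[1] - tb[1]))
print("smallest inner product:", min(inner_products))
```

Neither check uses the form of the link function beyond computing \(\mu\), which is the point: the same statements hold in models where no closed-form link exists.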
Non-regular exponential families (Geyer and Møller, 1994) and closed convex exponential families (Geyer, 1991) require the use of inequality-constrained maximum likelihood and the asymptotics thereof (Geyer, 1994 a).
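As a small illustration of inequality-constrained maximum likelihood (a far simpler problem than those in the papers cited), here is a sketch assuming independent binomial counts whose success probabilities are constrained to be nondecreasing. For this particular problem the constrained MLE of the success probabilities is the weighted isotonic regression of the observed proportions (weights equal to the numbers of trials), computed by the pool-adjacent-violators algorithm. Data and function names are hypothetical.

```python
def pava_binomial(successes, trials):
    """Nondecreasing MLE of binomial success probabilities by pool adjacent violators.
    Assumes every trials[i] > 0."""
    # Each block holds [total successes, total trials, number of groups pooled].
    blocks = []
    for s, n in zip(successes, trials):
        blocks.append([s, n, 1])
        # Pool while the last two blocks violate the order constraint
        # (previous proportion greater than last proportion, compared without division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s2, n2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    fitted = []
    for s, n, k in blocks:
        fitted.extend([s / n] * k)
    return fitted

# Illustrative data: observed proportions 0.1, 0.4, 0.2, 0.6, 0.5 violate the ordering.
successes = [1, 4, 2, 6, 5]
trials = [10, 10, 10, 10, 10]
fitted = pava_binomial(successes, trials)
print(fitted)  # → [0.1, 0.3, 0.3, 0.55, 0.55]
```

The pooled groups are exactly where the constraint is active, so the constrained estimate sits on the boundary of the parameter space there; that boundary behavior is what makes the asymptotics of constrained estimation differ from the unconstrained case.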

Thus there is a lot more to exponential family theory than you learned in 8111 and 8112. Enough for an interesting special topics course with lots of open research questions.

Bibliography

Baddeley, A., Turner, R. and Rubak, E. (2023) R Package spatstat: Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests, Version 3.0-6. Available at: https://cran.r-project.org/package=spatstat.

Eck, D. J. and Geyer, C. J. (2021) Computationally efficient likelihood inference in exponential families when the maximum likelihood estimator does not exist. Electronic Journal of Statistics, 15, 2105–2156. DOI: 10.1214/21-EJS1815.

Geyer, C. J. (1990) Likelihood and exponential families. PhD thesis. University of Washington. Available at: https://purl.umn.edu/56330.

Geyer, C. J. (1991) Constrained maximum likelihood exemplified by isotonic convex logistic regression. Journal of the American Statistical Association, 86, 717–724. DOI: 10.1080/01621459.1991.10475100.

Geyer, C. J. (1994a) On the asymptotics of constrained \(M\)-estimation. Annals of Statistics, 22, 1993–2010. DOI: 10.1214/aos/1176325768.

Geyer, C. J. (1994b) On the convergence of Monte Carlo maximum likelihood calculations. Journal of the Royal Statistical Society, Series B, 56, 261–274. DOI: 10.1111/j.2517-6161.1994.tb01976.x.

Geyer, C. J. (1999) Likelihood inference for spatial point processes. In Stochastic Geometry: Likelihood and Computation (eds O. E. Barndorff-Nielsen, W. S. Kendall, and M. N. M. van Lieshout), pp. 79–140. Boca Raton, FL: Chapman & Hall/CRC.

Geyer, C. J. (2009) Likelihood inference in exponential families and directions of recession. Electronic Journal of Statistics, 3, 259–289. DOI: 10.1214/08-EJS349.

Geyer, C. J. (2013) Asymptotics of maximum likelihood without the LLN or CLT or sample size going to infinity. In Advances in Modern Statistical Theory and Applications: A Festschrift in Honor of Morris L. Eaton (eds G. L. Jones and X. Shen), pp. 1–24. Hayward, CA: Institute of Mathematical Statistics. DOI: 10.1214/12-IMSCOLL1001.

Geyer, C. J. (2021) R Package aster: Aster Models, Version 1.1-2. Available at: https://cran.r-project.org/package=aster.

Geyer, C. J. (2023) R Package llmdr: Log-Linear Models Done Right, Version 0.1. Available at: https://github.com/cjgeyer/llmdr/.

Geyer, C. J. and Meeden, G. D. (2005) Fuzzy and randomized confidence intervals and \(P\)-values (with discussion). Statistical Science, 20, 358–387. DOI: 10.1214/088342305000000340.

Geyer, C. J. and Møller, J. (1994) Simulation procedures and likelihood inference for spatial point processes. Scandinavian Journal of Statistics, 21, 359–373. Available at: https://www.jstor.org/stable/4616323.

Geyer, C. J. and Thompson, E. A. (1992) Constrained Monte Carlo maximum likelihood for dependent data (with discussion). Journal of the Royal Statistical Society, Series B, 54, 657–699. DOI: 10.1111/j.2517-6161.1992.tb01443.x.

Geyer, C. J., Wagenius, S. and Shaw, R. G. (2007) Aster models for life history analysis. Biometrika, 94, 415–426. DOI: 10.1093/biomet/asm030.

Geyer, C. J., Ridley, C. E., Latta, R. G., et al. (2013) Local adaptation and genetic effects on fitness: Calculations for exponential family models with random effects. Annals of Applied Statistics, 7, 1778–1795. DOI: 10.1214/13-AOAS653.

Handcock, M. S., Hunter, D. R., Butts, C. T., et al. (2023) R Package ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks, Version 4.5.0. Available at: https://cran.r-project.org/package=ergm.

Okabayashi, S. (2011) Parameter estimation in social network models. PhD thesis. University of Minnesota. Available at: https://hdl.handle.net/11299/108208.

Okabayashi, S. and Geyer, C. J. (2012) Long range search for maximum likelihood in exponential families. Electronic Journal of Statistics, 6, 123–147. DOI: 10.1214/11-EJS664.

Shaw, R. G., Geyer, C. J., Wagenius, S., et al. (2008) Unifying life history analysis for inference of fitness and population growth. American Naturalist, 172, E35–E47. DOI: 10.1086/588063.

Thompson, E. A. and Geyer, C. J. (2007) Fuzzy \(p\)-values in latent variable problems. Biometrika, 94, 49–60. DOI: 10.1093/biomet/asm001.

Stat 8931, Spring 2024, Course Announcement

Charles J. Geyer

December 13, 2023
