A Social Network



The marriage network among sixteen prominent families in Florence around the 1430s.
At the time, two factions were struggling for political control of the city: one centered on the Medici and the other on the Strozzi.

A social network perspective views the decision of whom to marry not just as a function of the potential spouse's family attributes (such as wealth) but also as dependent on the other marriages already present in the network.

The data are originally from Padgett (1994). They are distributed with the ergm package for R (Handcock et al., 2010), which was also used to plot them.

Thesis

"Parameter Estimation in Social Network Models"
Advisor: Charles J. Geyer

Motivated by my interest in social networks, I am researching model fitting methods for exponential families with complex dependence.

Complex dependence in the context of a social network means that the response variables--whether or not a relation forms between pairs of individuals--can in fact depend on one another.  Capturing this dependency is crucial in a friendship network because people do not choose their friends in isolation.  For example, a friend of a friend is much more likely to become a friend.  Although seemingly obvious, this dependency is difficult to model statistically.  Traditional methods like regression analysis would assume that friendship choices are independent of one another.
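To make the "friend of a friend" effect concrete, here is a minimal sketch assuming a model whose probability is proportional to exp(theta_edge * edges + theta_triangle * triangles). The function names, the four people, and the parameter values are all hypothetical and chosen only for illustration. Under that assumed model, the log-odds of a single tie depend on how many triangles the tie would complete, so the ties cannot be treated as independent.

```python
from itertools import combinations


def triangle_count(nodes, edges):
    """Count triangles in an undirected graph whose ties are frozenset pairs."""
    return sum(
        1
        for a, b, c in combinations(nodes, 3)
        if frozenset((a, b)) in edges
        and frozenset((b, c)) in edges
        and frozenset((a, c)) in edges
    )


def tie_log_odds(nodes, edges, i, j, theta_edge, theta_triangle):
    """Conditional log-odds of the tie i-j given every other tie.

    Assumed model: probability proportional to
    exp(theta_edge * edges + theta_triangle * triangles).
    The log-odds are theta_edge plus theta_triangle times the number of
    triangles that adding the tie i-j would complete.
    """
    pair = frozenset((i, j))
    without_tie = edges - {pair}
    with_tie = without_tie | {pair}
    completed = triangle_count(nodes, with_tie) - triangle_count(nodes, without_tie)
    return theta_edge + theta_triangle * completed


# Hypothetical friendship network: Ann-Bob and Bob-Cat are already friends.
nodes = ["Ann", "Bob", "Cat", "Dan"]
edges = {frozenset(("Ann", "Bob")), frozenset(("Bob", "Cat"))}

# Ann and Cat share a friend (Bob), so their tie would complete one triangle.
print(tie_log_odds(nodes, edges, "Ann", "Cat", theta_edge=-2.0, theta_triangle=1.5))  # -0.5
# Ann and Dan share no friend, so only the baseline edge term applies.
print(tie_log_odds(nodes, edges, "Ann", "Dan", theta_edge=-2.0, theta_triangle=1.5))  # -2.0
```

With a positive triangle parameter, the tie between two people who share a friend gets higher odds than a tie between two people who do not. An ordinary regression on pair-level attributes alone cannot express this.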

Friendship formation is only one example of a phenomenon with complex dependence. Other examples include transmission of diseases through a group of individuals, interaction of proteins, trade between countries, interaction of neighboring atoms in a lattice (Ising model), voting behavior of senators, DNA fingerprinting, and plant growth in neighboring plots.

Despite the complexity of the underlying phenomena, the mathematical expression for modeling them is quite simple. The challenge is actually in fitting these models to data. More specifically (and technically), this means finding the parameter values called maximum likelihood estimates (MLE) that "best" fit the model to the observed dataset. In my research, we devised a new algorithm for finding these MLEs that addresses shortcomings of existing approaches including cases where:
  • The initial guess for the parameters is far from the best-fit value. The dependency issue necessitates an iterative algorithm, with repeated (and improving) guesses for the parameters. For the most commonly used algorithms--Newton-Raphson, stochastic approximation, Markov chain Monte Carlo maximum likelihood (MCMCML)--a poor starting point may mean that the algorithm does not converge to the MLE in practice. Our algorithm is specifically designed to work from a long range. We are in the process of revising and resubmitting a paper about this for the Electronic Journal of Statistics (details below).

  • Non-existent MLEs. It is possible in certain scenarios that the "best" fit does not actually exist. Any conventional method that searches for the MLE under the usual assumption that it exists will fail. We expand on work by Geyer (2009) to find a limiting model in such a case that still makes statistical inference--confidence intervals, hypothesis testing--possible. Writing this into a paper for publication is one of my summer projects.
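The idea of repeated, improving guesses can be sketched on a toy problem. The example below is plain fixed-step gradient ascent, not the long range search algorithm described above. It assumes a hypothetical four-person network with two statistics (edges and triangles), which is small enough that all 64 possible graphs can be enumerated. The function names, step size, and iteration count are illustrative choices. It relies on a standard fact about exponential families: the gradient of the log-likelihood is the observed statistics minus their expected values under the current parameters.

```python
import math
from itertools import combinations, product

NODES = range(4)
DYADS = list(combinations(NODES, 2))  # the 6 possible ties among 4 people


def stats(tie_flags):
    """Return (edges, triangles) for a graph given as 0/1 flags over DYADS."""
    edges = {dyad for dyad, on in zip(DYADS, tie_flags) if on}
    triangles = sum(
        1
        for a, b, c in combinations(NODES, 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )
    return (len(edges), triangles)


# Statistics of every possible graph (2**6 = 64); feasible only for toy sizes.
ALL_STATS = [stats(flags) for flags in product((0, 1), repeat=len(DYADS))]


def expected_stats(theta):
    """Expected (edges, triangles) under the model with parameter theta."""
    weights = [math.exp(theta[0] * e + theta[1] * t) for e, t in ALL_STATS]
    total = sum(weights)
    mean_edges = sum(w * e for w, (e, _) in zip(weights, ALL_STATS)) / total
    mean_triangles = sum(w * t for w, (_, t) in zip(weights, ALL_STATS)) / total
    return (mean_edges, mean_triangles)


def log_likelihood(theta, observed):
    """Log-likelihood: <theta, observed> minus the log normalizing constant."""
    log_norm = math.log(sum(math.exp(theta[0] * e + theta[1] * t) for e, t in ALL_STATS))
    return theta[0] * observed[0] + theta[1] * observed[1] - log_norm


def fit(observed, theta=(0.0, 0.0), step=0.05, iterations=5000):
    """Fixed-step gradient ascent; the gradient is observed minus expected."""
    theta = list(theta)
    for _ in range(iterations):
        expected = expected_stats(theta)
        theta[0] += step * (observed[0] - expected[0])
        theta[1] += step * (observed[1] - expected[1])
    return tuple(theta)


# Hypothetical observed network: a triangle 0-1-2 plus the tie 2-3.
observed = stats((1, 1, 0, 1, 0, 1))
theta_hat = fit(observed)
print("observed statistics:", observed)
print("fitted parameters:  ", theta_hat)
print("expected statistics:", expected_stats(theta_hat))
```

At the fitted parameters the expected statistics should match the observed ones, which is the defining property of the MLE in an exponential family. For realistic networks the enumeration is impossible, so the expectation has to be approximated, for example by Markov chain Monte Carlo. If the observed network were the empty graph, the observed statistics would lie on the boundary of what the model can produce, and no finite parameter value would maximize the likelihood.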

In 2011, my thesis was nominated by the Department of Statistics for the Graduate School's Best Dissertation Award. Here are the slides (warning: 12.7 MB) from the public portion of my PhD defense. The intended audience is someone with a background in probability theory, although there are few formulas. Also, here is a poster I prepared for a conference last summer that focuses on the main component of our algorithm (the first part of my dissertation research).

Publications

Okabayashi, S. and Geyer, C. J.  Long Range Search for Maximum Likelihood in Exponential Families.  Revised and resubmitted to Electronic Journal of Statistics.
Okabayashi, S., Johnson, L., and Geyer, C. J. (2011). Extending Pseudo-likelihood for Potts Models. Statistica Sinica 21 331-347.

Academic Presentations

Mostly Markov Chains Working Group
Non-existent MLEs in Exponential Random Graph Models. 
December 2010.
Minneapolis, MN
2010 Bayesian Modeling & Computation for Social Networks
Long Range Search for Maximum Likelihood in Exponential Families. 
June 2010.
Whistler, BC
Mostly Markov Chains Working Group
A Composite Likelihood Extending Pseudo-likelihood for Potts Models. 
With Leif Johnson. October 2009.
Minneapolis, MN
2009 Joint Statistical Meetings
A Simple Algorithm for Maximum Likelihood in Exponential Families that Uses Only Gradients.
August 2009.
Washington, DC
2009 Midwest Statistics Research Colloquium
A Simple Algorithm for Maximum Likelihood in Exponential Families that Uses Only Gradients.
March 2009.
Chicago, IL
Mostly Markov Chains Working Group
Social Networks and Markov Chain Monte Carlo. 
May 2008.
Minneapolis, MN

References

Padgett, J. F. (1994). Marriage and Elite Structure in Renaissance Florence,
1282-1500. Paper delivered to the Social Science History Association.
Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M.,
Krivitsky, P. N., and Morris, M. (2010). ergm: A package to fit, simulate, and
diagnose exponential-family models for networks. Version 2.2-7. Project home page
at http://statnetproject.org, URL http://CRAN.R-project.org/package=ergm.