Statistics 5421 (Geyer, Fall 2020)

Announcements

New! Homework 5 solutions posted. User name and password given out in class Mon Sep 28. Subject to revision during grading.

New! Class recordings up to date through Wednesday, December 16.

New! New lecture note on maximum likelihood estimates at infinity.

New! Changed one word in problem 5-3. "Calculate their mean values" becomes "Calculate their expected values".

New! New lecture note on how to draw a graph in R.

New! Homework assignment 5 posted. Assignment upload link and discussion group have been made in Canvas.

New! Version 0.4 of R package glmbb has a bug that causes it to report models twice (so they get only half the weight they should have) depending on the order of terms in the formulas you provide R function glmbb. The bug has been fixed in version 0.5-1, which is now on the main CRAN site and soon will be on all mirrors. So it is advisable to reinstall this package (or do update.packages()) before finishing homework 4.
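A quick way to tell whether an installed copy of glmbb predates the fix is to compare version numbers in R. The following is only a sketch. The helper name needs_update is made up for illustration and is not part of glmbb or of the course notes.

```r
# Hypothetical helper (not part of glmbb): TRUE if the given version
# string is older than the version containing the bug fix.
needs_update <- function(installed, fixed = "0.5-1") {
    package_version(installed) < package_version(fixed)
}

needs_update("0.4")      # the version with the bug
needs_update("0.5-1")    # the version with the fix

# With glmbb installed, the real check would be something like
# needs_update(as.character(packageVersion("glmbb")))
```

If the check says an update is needed, install.packages("glmbb") or update.packages() gets the fixed version, as described above.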

New! Homework 4 due date changed to Wednesday, November 25. Canvas submission date changed accordingly, and the end of the Canvas discussion group for this assignment also changed accordingly.

New! New erratum posted. Simplified notes on Chapter 8.

New! New erratum posted. An error noticed in the notes on model selection and model averaging has been fixed.

New! New reading assignment in Chapters 8 and 9 in Agresti.

Homework assignment 4 posted. Assignment upload link and discussion group have been made in Canvas.

Some formulas from office hours Monday and Wednesday.

η = M β

is the relationship between η and β and the log unnormalized posterior density is given by

∑_i [ (y_i + ½) η_i − e^{η_i} ]

In the course notes for this stuff the R expression gout$x extracts the model matrix from the result returned by R function glm (provided the optional argument x = TRUE was supplied) and the R expression gout$y extracts the response vector. They are used in R function logl defined in the notes. The line

eta <- drop(modmat %*% beta)

in that function defines η = M β. The rest of the function in the notes is for the binomial rather than the Poisson distribution, so you have to figure out how to implement the formula above that calculates the log unnormalized density of the posterior distribution (what you want R function metrop to sample).
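To make the pieces concrete, here is a small sketch of extracting the model matrix and response vector from a Poisson fit and forming η = M β. The data are made up for illustration and have nothing to do with the homework, and the sketch deliberately stops before the log unnormalized posterior, which is still yours to write.

```r
# Toy data, made up for illustration (not from the notes or the homework)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 2, 1, 4, 6, 9)

# x = TRUE tells glm to keep the model matrix in the fitted object
gout <- glm(y ~ x, family = poisson, x = TRUE)

modmat <- gout$x    # model matrix M
resp <- gout$y      # response vector y

# eta = M beta for some beta; here beta is the MLE just for the check
beta <- coef(gout)
eta <- drop(modmat %*% beta)

# sanity check: agrees with the linear predictor glm computed itself
all.equal(unname(eta), unname(gout$linear.predictors))
```

Inside a function handed to metrop, beta would be the argument (the current state of the Markov chain) rather than the MLE.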

As mentioned in class, for more on Markov chain Monte Carlo, including an explanation of why the Metropolis algorithm works, one can turn to the introductory chapter of the Handbook of Markov Chain Monte Carlo written by your humble instructor. But everything necessary to do the homework is in the lecture notes for this course. The chapter is only for those who want more info on MCMC.

Reading assignment for Friday, Oct 16 cut. Just read Sections 4.1, 4.2, and 4.3 in Agresti.

Link for submitting homework assignment 1 late added to Canvas.

Broken link in hw 2 assignment now fixed. The assignment was wrong; the file is at

http://www.stat.umn.edu/geyer/5421/mydata/hw2-4.txt

that is, hw2-4.txt rather than h2-4.txt.

New course notes on the Poisson distribution.

New erratum about R function logl in the notes on the binomial distribution.

On Friday, Sep 25, 2020 we depart from categorical data analysis to discuss the reproducibility crisis in science and what statistics done badly has to do with it. Our texts are

a talk I gave to the Minnesota Center for Philosophy of Science and the University of Minnesota Program in History of Science, Technology and Medicine in January 2012,
a paper published in the journal Science titled Estimating the Reproducibility of Psychological Science by the Open Science Collaboration (270 authors) in August 2015, and
the Many Faces of Reproducibility Interdisciplinary Collaborative Workshop that has been running here at the U of M for the past 2 years and is continuing.

This course will use plain R rather than RStudio.

You can use RStudio if you want but I don't need anything it does.

There are two R packages designed to be used in this course.

R package CatDataAnalysis is found at GitHub.
https://github.com/cjgeyer/CatDataAnalysis
Instructions for installing the package are found on that web page and work on all computers AFAIK.
R package glmbb will not be used until the middle of the course. It is found on CRAN
https://CRAN.R-project.org/package=glmbb
but one does not normally install the package by going to that web site.
Install the package in R by executing the command
```
install.packages("glmbb")
```
at the R command line or, if you prefer, by mousing around in menus of the R app or RStudio.

New! This web site has no index, so in order to find stuff one needs to use a search engine. Here is how to do that. For example, if you want to find information on the beta distribution, then the search

"beta distribution" site:www.stat.umn.edu/geyer/5101

does that. This works either with Google or with DuckDuckGo. The quotation marks mean find the exact phrase. If they are left off, then the search engine will return results that have the word beta and the word distribution, not necessarily in the same page much less in the same sentence. The magic is the site: part, which tells the search engine only to look in that site. The site can be made more restrictive, for example,

"beta distribution" site:www.stat.umn.edu/geyer/5101/slides

says to look only in the slides (this seems to work only in Google, but not in DuckDuckGo even though it is supposed to work in DuckDuckGo).

New! This course is entirely on-line, regardless of what the U does with other courses. It is synchronous in the sense that, if you want to ask questions during lecture, then you must be in the Zoom session at the scheduled time. It is asynchronous in the sense that all Zoom sessions will be recorded and linked on the Canvas site for the course.
