Calculating estimates for cluster random samples


Here is some R code that calculates two estimators when sampling from clusters.

Here we are assuming that the cluster sizes, the Mi's, vary and that the sample sizes within clusters, the mi's, vary as well. For estimating the population total, the code calculates the unbiased estimator in (5.21) of the text and its unbiased estimate of variance in (5.25). For the population mean, it finds the ratio estimate in (5.28) and its estimate of variance in (5.29). This is done in the context of a simple example which assumes we have a random sample of 3 clusters from a population consisting of N=15 clusters. The total number of elements in the population is assumed to be unknown.

Note that the equation numbers above refer to the first edition of the text. The corresponding numbers for the second edition are (5.20), (5.24), (5.27) and (5.28).

These commands should help you do HW assignment 5. Note that for some of the bigger problems you will not want to print all of X.
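The calculation described above can be sketched as follows. This is only an illustration: the cluster sizes and sampled values are made up, the variable names are hypothetical, and the formulas are the standard two-stage cluster sampling estimators, which are assumed (not confirmed) to be the ones the cited equations refer to. Because the total number of elements is unknown, the average cluster size is estimated by the mean of the sampled Mi's.

```r
# Hypothetical two-stage sample: n = 3 clusters drawn from N = 15 clusters.
N <- 15
M <- c(10, 20, 15)                 # cluster sizes M_i (made-up values)
samples <- list(c(4, 6, 5),        # sampled y values in cluster 1, m_1 = 3
                c(8, 7, 9, 8),     # cluster 2, m_2 = 4
                c(3, 5))           # cluster 3, m_3 = 2

n    <- length(samples)            # number of sampled clusters
m    <- sapply(samples, length)    # within-cluster sample sizes m_i
ybar <- sapply(samples, mean)      # within-cluster sample means
s2   <- sapply(samples, var)       # within-cluster sample variances

t_hat <- M * ybar                  # estimated cluster totals

# Unbiased estimator of the population total and its variance estimate.
t_unb  <- N / n * sum(t_hat)
within <- sum((1 - m / M) * M^2 * s2 / m)
v_unb  <- N^2 * (1 - n / N) * var(t_hat) / n + N / n * within

# Ratio estimator of the population mean and its variance estimate.
ybar_r <- sum(t_hat) / sum(M)
s2_r   <- sum((t_hat - M * ybar_r)^2) / (n - 1)
Mbar   <- mean(M)                  # sample estimate of the average cluster size
v_r    <- ((1 - n / N) * s2_r / n + within / (n * N)) / Mbar^2

c(total = t_unb, se_total = sqrt(v_unb), mean = ybar_r, se_mean = sqrt(v_r))
```

With real data, M and samples would be replaced by the sampled cluster sizes and the observations within each sampled cluster.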

Remember that the populations from the text are in a password-protected directory.

Here is a way to simplify some of the calculations for problem 12 a. To get an estimate of the variance of our estimator we need (using the terminology from anova) the (between SS)/(size of sample within a cluster) and the within SS. To see how this can be done, consider a simple example where we have 2 clusters, each with a sample of size 3. In the following, y is the set of values for the characteristic of interest and x tells us which cluster each value belongs to. In gl(n,m,k), n is the number of clusters, m is the size of the sample within a cluster and k=n*m is the total number of elements in the sample.
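A minimal sketch of this example is given here. The y values are hypothetical, chosen only so that the two sums of squares come out to 24 and 4; the names tab, between_ss and within_ss are introduced for illustration.

```r
# Two clusters, three sampled values in each (hypothetical data).
y <- c(1, 2, 3, 5, 6, 7)
x <- gl(2, 3, 6)                 # factor with levels 1,1,1,2,2,2

# One-way anova of y on the cluster factor.
tab <- anova(lm(y ~ x))
tab

# Pull the two sums of squares out of the table.
between_ss <- tab["x", "Sum Sq"]
within_ss  <- tab["Residuals", "Sum Sq"]

between_ss / 3                   # (between SS)/(sample size within a cluster)
within_ss                        # within SS
```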

Note that the Sum Sq for x in the anova table is just the between SS, and the Sum Sq for the residuals is the within SS. So in this example the numbers we need are 24/3 and 4.