Statistics 3701 (Geyer, Spring 2017) Homework 2

Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed into Moodle should be a plain text file with R commands and comments that can be run to produce what you did. We do not take your word for what the output is. We run it ourselves.

Note: Plain text specifically excludes Microsoft Word native format (extension .docx). If you have to use Word as your text editor, then use Save As and choose Text (.txt) or something like that as the format. Then upload the saved plain text file.

If you have questions about the quiz, ask them in the Moodle forum for this quiz. Here is the link for that https://ay16.moodle.umn.edu/mod/forum/view.php?id=1241814.

On future assignments you can use knitr or rmarkdown after we have talked about it. But avoid that on this assignment.

Quizzes must be uploaded by the end of class (1:10). Moodle actually allows a few minutes after that. Here is the link for uploading the quiz: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1241821.

Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading the homework: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1241828.

Quiz 2

Problem 1

Write an R function that, given a numeric matrix A and a numeric vector x, calculates x^T A^{-1} x. Note that this only makes sense when A is a square matrix and the length of x is the same as the row and column dimensions of A.

Follow GIEMO (garbage in, error messages out). Make sure your function gives an error when the dimensions are wrong or when either argument is not numeric. When either argument contains NA, NaN, Inf, -Inf, your function can give an error, give a warning, or produce one of these results (your choice).

We deduct points for actually inverting the matrix A. Think of this in terms of solving linear equations, instead.
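As a generic illustration of solving linear equations instead of inverting, here is a minimal sketch. The matrix A and vector b are made-up toy values, not the quiz data, and this is not a solution to the problem. It relies only on the base R function solve, which with two arguments solves A y = b directly.

```r
# Hypothetical 2 x 2 example (toy values, not the quiz data)
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

# Solve the linear system A y = b; no inverse matrix is formed
y <- solve(A, b)

# The explicit-inverse route gives the same numbers,
# but it is the approach this problem asks you to avoid
y.inv <- drop(solve(A) %*% b)
all.equal(y, y.inv)
```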

We deduct points for using a loop or loops.

Not only write a function, but also show it working on the data obtained by the R command


load(url("http://www.stat.umn.edu/geyer/s17/3701/data/q2p1.rda"))
ls()

(This loads two R objects: a matrix a and a vector x.)

Problem 2

Rewrite your function for the preceding problem so it is a binary operator rather than an apparent function call, that is, if your function from the preceding problem was invoked

alice(a, x)

rewrite it so it is invoked

a %alice% x

(Of course, you can change alice to any other valid R name.)
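For reference, a user-defined binary operator in R is just a function of two arguments whose name begins and ends with a percent sign. The sketch below uses a hypothetical operator %bob% that computes a sum of elementwise products; it only illustrates the syntax and is unrelated to the quantity asked for in problem 1.

```r
# Hypothetical operator, for illustrating the syntax only.
# The name must be quoted (or backquoted) when it is assigned.
"%bob%" <- function(x, y) sum(x * y)

# Invoked as a binary operator rather than as bob(x, y)
result <- c(1, 2, 3) %bob% c(4, 5, 6)
result
```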

Problem 3

Write a function that takes a numeric matrix and standardizes its columns (we will explain what this means). For any vector x its standardization is computed as follows.


(x - mean(x)) / sd(x)

Note that in the special case that the matrix has only one row, it cannot be standardized (because in that case the standard deviation of the columns is zero and the algorithm wants to divide by zero). You can either produce an error, a warning, or all components of the result NA, NaN, Inf, or -Inf, your choice.
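To make the formula concrete, here is a small sketch on a single made-up vector (not the quiz data). After standardization the result has mean zero and standard deviation one. Note also that in R the function sd returns NA for a vector of length one, which is what a one-row matrix gives for each column.

```r
# Hypothetical vector, for illustration only
x <- c(2, 4, 6, 8)
z <- (x - mean(x)) / sd(x)

# A standardized vector has mean 0 and standard deviation 1
mean(z)
sd(z)

# sd of a single value is NA in R, so a one-row matrix cannot be standardized
one <- sd(5)
one
```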

There is no deduction of points for using a loop or loops in this problem.

Not only write a function, but also show it working on the data obtained by the R command


load(url("http://www.stat.umn.edu/geyer/s17/3701/data/q2p3.rda"))
ls()

(This loads one R object: a matrix a.)

But it might be better to add the error check

    stopifnot(nrow(a) > 1)

to the definition of standardize.

Homework 2

Homework problems start with problem number 4 because, if you don't like your solutions to the problems on the quiz, you are allowed to redo them (better) for homework. If you don't submit anything for problems 1–3, then we assume you liked the answers you already submitted.

Problem 4

This is a modification of problem 3. Do it without loops. (If you already did it without loops, then there is nothing left to do; your solution to problem 3 also counts as a solution to this problem.)

Hint: The R function apply, when given a function that maps vectors to vectors, returns a matrix. Section 6.8 of the course notes about Matrices, Arrays, and Data Frames illustrates this.
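A quick sketch of the behavior the hint describes, on a made-up matrix and with rev standing in for any function that maps vectors to vectors. One detail worth noticing: apply always stores each result as a column, so applying over rows gives the transpose of what you might expect.

```r
# Hypothetical 2 x 3 matrix, for illustration only
m <- matrix(1:6, nrow = 2)

# rev maps a vector to a vector of the same length.
# Over columns (MARGIN = 2) the result has the same shape as m.
by.col <- apply(m, 2, rev)
dim(by.col)

# Over rows (MARGIN = 1) each result becomes a column, so the result is 3 x 2
by.row <- apply(m, 1, rev)
dim(by.row)
```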

Problem 5

This problem is to write a function just like the function apply in the R base package, which is described in Section 6.8 of the course notes about Matrices, Arrays, and Data Frames, except that the function to be written for this problem (for concreteness call it myapply) is a lot simpler.

Like apply its signature is

function(X, MARGIN, FUN, ...)

but unlike apply its arguments are a lot simpler.

Argument X is a matrix of any type a matrix can have (numeric, character, logical, complex). This is unlike the corresponding argument of apply, which can be an array of any dimension.
Argument MARGIN is either (the number) 1 or (the number) 2. This is unlike the corresponding argument of apply, which can be either a numeric vector or a character vector (the latter a possibility I was unaware of before writing this question).
Argument FUN is an R function that maps vectors to vectors (possibly of length 1, possibly of longer length, as explained in the section of the course notes cited above). We will consider it an error if the function FUN returns results of different length in any invocation of myapply. The requirements for the corresponding argument of apply are much looser, as help("apply") explains.
For this problem, you can assume (rather than check) that FUN always returns a vector of the same length in any invocation of myapply.
Argument ... is passed to FUN, that is, any arguments to myapply that do not match X, MARGIN, or FUN are passed to FUN whenever it is called by myapply. (This is just like how apply works).
That is, if foo is the thingummy we are trying to apply FUN to, either a row or column of X depending on whether MARGIN is 1 or 2, we always invoke it as
```
     FUN(foo, ...)
     
```
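How ... is forwarded can be seen in a tiny sketch. The wrapper callit below is hypothetical (it is not myapply and not a solution); it only shows that extra arguments given to the outer function reach FUN unchanged.

```r
# Hypothetical wrapper: everything after foo and FUN is handed on to FUN
callit <- function(foo, FUN, ...) FUN(foo, ...)

# na.rm = TRUE is not an argument of callit, so it is passed on to mean
r1 <- callit(c(1, NA, 3), mean, na.rm = TRUE)

# Without the extra argument, mean sees the NA and returns NA
r2 <- callit(c(1, NA, 3), mean)
```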

In order to make this problem non-trivial, you are not allowed to use the R function apply or any other R function with apply as part of its name (lapply, for example).

According to the rules (in the rules section above), it is perfectly legal to look at the source code for apply, what

apply

shows. But that code is very confusing because what apply does is much more complicated than what a solution to this problem has to do. You can even copy code from apply so long as you say you are doing that (put comments in your code to say which lines you have copied). So that you don't just copy all the lines of apply, we make another rule for this problem: your function should have no more than 30 R commands, not counting any code that catches errors or the function signature (the part with the R function named function).

Hint: Looking at the source for apply, its first line is


FUN <- match.fun(FUN)

You should copy that. Then FUN can either be a function or the name of a function, and match.fun takes care of that. If you do this, you do not have to do any error checks for the argument FUN. The function match.fun will catch all errors. You also do not need a comment about using this.
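A short sketch of what match.fun does, using the base R function sum as an arbitrary example: it accepts either a function or a character string naming one and returns the function itself, and it signals an error for anything that is neither.

```r
# Both forms yield the same function object
f <- match.fun("sum")
g <- match.fun(sum)
identical(f, g)

# Something that is not a function (or the name of one) is an error
bad <- tryCatch(match.fun(42), error = function(e) "error")
bad
```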

Another Hint: Since you cannot use apply or any other function with apply as part of its name (lapply, for example), you will have to use a loop.

Yet Another Hint: Since you do not know what length vector FUN returns until after the first time you call it, you have to wait until you have called it to find out what the dimensions of the result of myapply are.
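The idea in this last hint can be sketched in isolation, away from matrices and margins. The names inputs and f below are hypothetical stand-ins (a list of made-up vectors and the base R function range); this shows only the "call once, then allocate, then fill" pattern and is not a solution to the problem.

```r
# Hypothetical inputs and function, for illustration only
inputs <- list(1:3, 4:6, 7:9)
f <- range

# Call once to learn how long each result is
first <- f(inputs[[1]])

# Now the shape of the result is known: one column per input
result <- matrix(first, nrow = length(first), ncol = length(inputs))

# Fill in the remaining columns (the first is already correct)
for (j in seq_along(inputs)[-1])
    result[, j] <- f(inputs[[j]])

result
```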

Try your function myapply on all of the examples using apply in Section 6.8 of the course notes about Matrices, Arrays, and Data Frames, making sure that you get the same results.

Your function does not have to produce the same row and column names on its result as apply does. It is also OK if your function sometimes produces a matrix when apply produces a vector or vice versa. It is enough that the numbers are the same.

Problem 6

This problem is about arrays and the R function apply applied to arrays. This problem does not require you to write a function.

The R command


load(url("http://www.stat.umn.edu/geyer/s17/3701/data/q2p6.rda"))

loads one R object: a three-dimensional array called pat.

Apply the R function median to the three two-dimensional margins (indexed by pairs of indices) and the three one-dimensional margins (indexed by single indices) of this array.
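To show what one-dimensional and two-dimensional margins look like, here is a sketch on a made-up array (not pat) with sum in place of median. The MARGIN argument names the indices that are kept; the function is applied over the remaining ones.

```r
# Hypothetical 2 x 3 x 4 array, for illustration only
arr <- array(1:24, dim = c(2, 3, 4))

# One-dimensional margin: keep index 3, sum over indices 1 and 2
m3 <- apply(arr, 3, sum)
m3

# Two-dimensional margin: keep indices 1 and 2, sum over index 3
m12 <- apply(arr, c(1, 2), sum)
dim(m12)
```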