University of Minnesota
School of Statistics

Statistics 5021

Assignment Sheet 2
January 29, 2003

Reading for Week 2, January 27-31
An Introduction to MacAnova, Sec. 5 - 6, 7.1, 7.2; IPS Section 1.2, Section 1.3 through p. 69
Reading for Week 3, February 3-7
An Introduction to MacAnova, Sec. 5 - 6, 7.1, 7.2; IPS Chapter 1: pp. 72-85; Sec. 2.1
Written Assignment #2, due in class Friday, February 7.
IPS pp. 55-63: Ex. 1.42*, 1.44, 1.46, 1.50, 1.54, 1.58, 1.62 (use the "white box" MacAnova approach), 1.68; pp. 84-92: Ex. 1.78, 1.80

*I have created a data file indiv1000.txt containing a random sample of size 1000 from the data referred to in Ex. 1.42. It can be downloaded from the web page http://www.stat.umn.edu/~kb/classes/5021/mooredatasets.html. As part of 1.42, make five number box plots and regular MacAnova box plots of income (variable earn) split by educational level (see below).

Optional Problems, not to be turned in or graded:
IPS pp. 55-63: Ex. 1.41, 1.43, 1.47, 1.49, 1.53, 1.55, 1.63, 1.65; pp. 84-92: Ex. 1.77, 1.79, 1.81

I have added to the class Web page a link to densities.mac, a file of macros useful in drawing density functions. http://www.stat.umn.edu/~kb/classes/5021/densities.mac.html includes some examples of how to use the macros in densities.mac.

To use macros in the file you need to do the following:

  1. Download the file.
    Click on the link. Use the right button in Windows, or hold the button down on a Macintosh. Then choose either Save Link As or Save Target As.

    I have posted links to identical copies of the file, densities.mac and densities.mac.txt. Some browsers have a problem with names that end in .mac and do not recognize them as text files. If you run into difficulty, click on the link to densities.mac.txt.

    If you are downloading to your own computer or one you use regularly, use the file navigation dialog box to save the file in the same folder as MacAnova itself. You may be able to do this on a lab computer, too, but the file will probably be deleted before the next time you use the computer.

    If you are unable to download to the MacAnova folder, it's probably best to download to a floppy disk.

  2. Make the file available to MacAnova
    When you start up MacAnova, do one of the following:
    1. If the macro file is in the same folder as MacAnova, give the following command
      Cmd> addmacrofile("densities.mac")
      
    2. Otherwise, give the following command
      Cmd> addmacrofile(getfilename())
      

      This brings up a file navigation dialog box, just as readdata("") does. Use it to find and select densities.mac.

  3. Use any macro in the macro file. It should be automatically read in when you use it.

MacAnova Notes on Box Plots

The traditional form of a box plot drawn by MacAnova includes the following:

  1. A box between the lower and upper quartiles
  2. "Whiskers" running from the ends of the box to the most extreme points inside the "inner fences" (lower quartile - 1.5*IQR and upper quartile + 1.5*IQR)
  3. Separate plotting of any suspected outliers between the inner fences and the "outer fences" (lower quartile - 3*IQR and upper quartile + 3*IQR), and of any extreme outliers beyond the outer fences.

This is close to what Moore and McCabe call modified box plots, except that different plotting symbols are used for outliers inside and outside the outer fences. Here are such box plots for the data in Table 1.2 on p. 14.

Cmd> readdata("")
# File ta01_002.txt from publisher's web site
# Data from Table 1.2, p. 14 of IPS4
# Use readdata("",factors:F)
# Col. 1: state (CHARACTER vector of state names)
# Col. 2: hispanic = percent adult hispanics by state (2000)
Read from file "TP1:Stat5021:Data:Ch01:ta01_002.txt"
Column 1 saved as factor state 
Column 2 saved as REAL vector hispanic 

Cmd> vboxplot(hispanic,ylab:"Percent Hispanic",\
          title:"Percent adult hispanics by state (2000)")

Cmd> vboxplot(hispanic,ylab:"Percent Hispanic",\
          title:"Percent adult hispanics by state (2000)",boxsize:1.5)
Two differently sized box plots

Keyword phrase boxsize:1.5 was used to make a wider box. Compare these with IPS Fig. 1.16 on p. 47. (Keyword boxsize is new with the most recent MacAnova release and is not available in earlier versions.)
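
To see how the inner and outer fences sort data values into whisker ends, suspected outliers and extreme outliers, here is a minimal sketch in Python (not MacAnova). The function name classify_by_fences, the data and the quartiles are all made up for illustration, and counting a value that lies exactly on a fence as inside it is an assumption, not something taken from MacAnova.

```python
def classify_by_fences(values, q1, q3):
    """Sort values into whisker ends, suspected outliers and extreme outliers.

    q1 and q3 are the lower and upper quartiles, supplied by the caller.
    Assumption: a value exactly on a fence counts as inside that fence.
    """
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr

    # Whiskers run to the most extreme values inside the inner fences.
    inside = [v for v in values if inner_lo <= v <= inner_hi]
    # Suspected outliers lie between the inner and outer fences.
    suspected = [v for v in values
                 if outer_lo <= v < inner_lo or inner_hi < v <= outer_hi]
    # Extreme outliers lie beyond the outer fences.
    extreme = [v for v in values if v < outer_lo or v > outer_hi]

    return {
        "whiskers": (min(inside), max(inside)),
        "suspected": suspected,
        "extreme": extreme,
    }


# Made-up data and hand-supplied quartiles, for illustration only.
sample = [3, 5, 6, 7, 8, 9, 10, 17, 25]
result = classify_by_fences(sample, q1=6, q3=10)
print(result)
```

For these made-up values the IQR is 4, so the inner fences are 0 and 16 and the outer fences are -6 and 22. The whiskers end at 3 and 10, the value 17 is a suspected outlier and the value 25 is an extreme outlier.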

A new feature of MacAnova is the ability to make box plots based on the five number summary of a data set, that is, like the standard Moore and McCabe box plots. You use keyword phrase boxtype:2 to get such plots.

With an odd sample size, you also need to make another change to match Moore and McCabe box plots. This is because, when n is odd, MacAnova functions such as describe() and boxplot() use a different definition of quartile from that used in IPS.

When the sample size is n = 2m + 1, IPS quartiles are the medians of the smallest m values and the largest m values, excluding the median (the (m+1)st smallest value) from both halves.

The default definition of quartiles in MacAnova is as the medians of the smallest m+1 values and largest m+1 values, including the median in both halves. This results in slightly different appearing box plots.

However, if you use keyword phrase excludeM:T on describe() and boxplot() or vboxplot(), Moore and McCabe's definition is used. excludeM:T has no effect when n is even.
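
The two quartile definitions can be compared on a small example. The following is a minimal sketch in Python (not MacAnova) that follows the definitions as stated above; it is not MacAnova's actual code. The names median, quartiles and exclude_median are hypothetical, and the data are made up.

```python
def median(sorted_vals):
    """Median of a list that is already sorted."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles(values, exclude_median=False):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For odd n = 2m + 1 each half has m + 1 values (median included in both)
    unless exclude_median is true, when each half has m values.
    For even n the halves are the same either way.
    """
    s = sorted(values)
    n = len(s)
    m = n // 2
    if n % 2 == 1 and not exclude_median:
        lower, upper = s[:m + 1], s[m:]      # median belongs to both halves
    else:
        lower, upper = s[:m], s[n - m:]      # median (if any) left out
    return median(lower), median(upper)


odd_sample = [1, 3, 4, 6, 8, 9, 12]          # made-up data, n = 7, m = 3
included = quartiles(odd_sample)                        # median in both halves
excluded = quartiles(odd_sample, exclude_median=True)   # IPS-style
print(included, excluded)
```

For this sample the median is 6. Including it in both halves gives quartiles 3.5 and 8.5, while excluding it gives 3 and 9, which is the kind of small difference that changes the ends of the box.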

Here I use boxtype:2 but not excludeM:T, which would have no effect since n is even.

Cmd> vboxplot(hispanic,ylab:"Percent Hispanic",boxtype:2,\
		title:"Percent adult hispanics by state (2000)")

Cmd> vboxplot(hispanic,ylab:"Percent Hispanic",boxtype:2,\
		title:"Percent adult hispanics by state (2000)",boxsize:1.5)
Two differently sized box plots with boxtype:2

There is considerably less information in these plots than in the previous ones.

Here is how you can plot side-by-side box plots of several data sets. I looked at the car mileage data in Table 1.8.

Cmd> readdata("") # read file ta01_008.txt 
# File ta01_008.txt from publisher's web site
# Data from Table 1.8, p. 38 of IPS4
# Col. 1: cartype (factor, 1 = Mini, 2 = Two)
# Col. 2: city = in-city gasoline mileage (mpg)
# Col. 3: highway = highway gasoline mileage (mpg)
Read from file "TP1:Stat5021:Data:Ch01:ta01_008.txt"
Column 1 saved as factor cartype 
Column 2 saved as REAL vector city 
Column 3 saved as REAL vector highway 

Cmd> minis <- cartype == 1 # select Minicompact cars

Cmd> data <- structure(MiniHwy:highway[minis],\
          TwoHwy:highway[!minis],MiniCity:city[minis],\
          TwoCity:city[!minis]) # create a structure

Cmd> list(data)
data            STRUC  4    

Cmd> compnames(data) # structure component names
(1) "MiniHwy"
(2) "TwoHwy"
(3) "MiniCity"
(4) "TwoCity"

highway[minis] and city[minis] select the highway and city mileage for minicompact cars.

"!" is the not operator so highway[!minis] and city[!minis] select the highway and city mileage for cars that are not minicompact, that is, two-seaters.

Cmd> print(data,format:"3.0f")
data:
component: MiniHwy
(1)  31  27  28  23  24  22  28  24  30  25  22
component: TwoHwy
 (1)  24  30  28  27  21  26  21  16  13  68  26  13  28  23  19
(16)  27  23  27  30
component: MiniCity
(1)  22  19  20  16  17  16  20  18  22  17  15
component: TwoCity
 (1)  17  22  21  20

When you use a structure as argument to boxplot() or vboxplot(), side-by-side box plots of the components are drawn.

Cmd> vboxplot(data,boxtype:2,excludeM:T,ymin:0,xlab:"Type of Car",\
		title:"Box plots of data from Table 1.8",\
		xticklabs:compnames(data))
Side by side box plots of car mileage data

Note that I used ymin:0 to force inclusion of y = 0 in the plot, and excludeM:T to match the IPS definition of quartiles.

This is similar to the plot on p. 45 except that it includes the entire data set. The plot on p. 45 excludes the data for the Honda Insight.
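
As a check on what the five number box plots display, here is a minimal Python sketch (not MacAnova) that computes the five number summary, using the IPS definition of quartiles, for three of the components printed above. The function names are hypothetical. TwoCity is left out because the listing above appears to show only part of that component.

```python
def median(sorted_vals):
    """Median of a list that is already sorted."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_ips(values):
    """Return (min, Q1, median, Q3, max) with IPS-style quartiles.

    The quartiles are the medians of the lower and upper halves, with the
    overall median left out of both halves when n is odd.
    """
    s = sorted(values)
    n = len(s)
    m = n // 2
    lower, upper = s[:m], s[n - m:]
    return s[0], median(lower), median(s), median(upper), s[-1]


# Values copied from the print(data, format:"3.0f") output above.
components = {
    "MiniHwy": [31, 27, 28, 23, 24, 22, 28, 24, 30, 25, 22],
    "TwoHwy": [24, 30, 28, 27, 21, 26, 21, 16, 13, 68, 26, 13, 28, 23, 19,
               27, 23, 27, 30],
    "MiniCity": [22, 19, 20, 16, 17, 16, 20, 18, 22, 17, 15],
}

summaries = {name: five_number_ips(vals) for name, vals in components.items()}
for name, summary in summaries.items():
    print(name, summary)
```

The maximum of 68 for TwoHwy stands far above the rest of that component; it is presumably the Honda Insight value that the plot on p. 45 leaves out.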


C. Bingham kb@umn.edu

Updated Wed Jan 29 12:01:35 CST 2003