University of Minnesota
School of Statistics

MacAnova FAQ for Stat 5021

This file contains edited e-mail questions that have been asked about MacAnova and their answers.

Caution This file has not been updated from Spring 2001 and hence contains references to homework problems that currently don't apply. In addition, there may be new MacAnova features that could be used that are not mentioned. I will update this as I have time and add to it as questions come in.

Contents

Making a bar graph for categorical data

Q. I am trying to create a bar graph to go along with question 1.14b in the first homework assignment using MacAnova. Could you give me some help in getting started?

A. Here is one way, using macro bargraph.

Cmd> # First enter the data

Cmd> deaths <- vector(41893, 13141, 3807, 3900, 7382) # from text

Cmd> alldeaths <- 90523 # from text

Cmd> # Calculate number from other causes by subtraction

Cmd> otherdeaths <- alldeaths - sum(deaths);otherdeaths
(1)       20400

Cmd> counts <- vector(deaths,otherdeaths) # combine in vector

Cmd> barnumber <- run(6) ; barnumber # numbers for the bars
(1)           1           2           3           4           5
(6)           6

Cmd> percents <- 100*counts/alldeaths # calculate percents

Cmd> graphicshelp(bargraph,usage:T) # see more on graphicshelp below
bargraph(edges,y [,save:T] [,draw:T] [,graphics keyword phrases]), REAL
  vectors edges and y, length(edges) = length(y) + 1
bargraph(vector(leftedge[,width]), y, [,save:T] [,draw:T] ...), width >
  0, leftedge REAL scalars, default for width = 1

Cmd> edges <- run(.5,6.5) # .5, 1.5, 2.5, ..., 6.5

Cmd> bargraph(edges,percents,title:"Accidental deaths percentages by cause",\
          xlab:"Cause number", ylab:"Percent")
WARNING: searching for unrecognized macro bargraph near bargraph(
Accidental deaths percentages by cause


Cmd> # 1 = MV, 2 = Falls, 3 = drowning,4 = fires, 5 = poison, 6 = other

This, although not optimal, provides the necessary information. It is possible to actually replace 1, 2, ..., 6 with bar labels. Here's how.

Cmd> barlabels <- vector("M Veh","Fall","Drowning","Fire","Poison","Other")

Cmd> bargraph(edges,percents,title:"Accidental deaths percentages by cause",\
                xticks:run(6),xticklab:barlabels,ylabs:"Percent")
Accidental deaths percentages by cause

This works because xticks:run(1,6) specifies there should be tick marks at 1, 2, ..., 6 and xticklabs:barlabels specifies that the elements of barlabels should be used to label the ticks instead of "1","2",... .

There are other fancier ways to do it as well.

I know that asking you to do things like this on a computer this early in the course is a real challenge, but that is sometimes the fastest way to learn about a program.


Learning about available macros

Q. How might I have found out about macro bargraph without asking someone?

A. You probably would not have because there are some missing cross references in help.

Here are two ways for finding about where to look for information for making graphs (besides An Introduction to MacAnova).

First you can use help() keyword key
Cmd> help(key:"?")
Type 'help(key:"heading")', where heading is in following list:
ANOVA                   General                 Plotting
Categorical Data        Input                   Probabilities
CHARACTER Variables     LOGICAL Variables       Random Numbers
Combining Variables     NULL Variables          Regression
Comparisons             Macros                  Residuals
Complex Arithmetic      Matrix Algebra          Structures
Confidence Intervals    Missing Values          Syntax
Control                 Multivariate Analysis   Time Series
Descriptive Statistics  Operations              Transformations
Files                   Ordering                Variables
GLM                     Output

Plotting looks promising. Let's see what help topics relate to plotting.

Cmd> help(key:"plotting") # find help topics related to plotting
The following help topics concern Plotting
addchars     colplot      graph_keys   resvsindex   stringplot
addlines     graphs       graph_ticks  resvsrankits tek
addpoints    GRAPHWINDOWS hist         resvsyhat    tekx
addstrings   graph_assign lineplot     rowplot      vboxplot
boxplot      graph_border Mouse        showplot     vt
chplot       graph_files  plot         stemleaf     vtx
For help on topic foo, enter help(foo) or help("foo")

You can get help on any of these topics using help(), say help(chplot).

Unfortunately, none of these seems closely related to making bar graphs, except hist which you already know about.

There is a whole file of macros, graphics.mac related to making graphs. Just typing graphicshelp() summarizes what is there.

Cmd> graphicshelp() # information about macros for graphing
Help entries in this file
  bargraph     Macro do draw a bar graph
  boxplot5num  Macro to draw simplified boxplot based only on extremes,
               and quartiles
  colplot      Macro to plot matrix colums vs row number
  contour      Macro to trace a level curve of a surface defined
               on a rectangular grid
  contourplot  Macro to make a contour plot of a surface defined
               on a rectangular grid
  ellipse      Macro to compute the outline of an ellipse and
               optionally draw it
  findcontour  Macro to interactively find a contour line near
               a point selected with the mouse
  graphicshelp Macro to read help on macros in Graphics.mac
  hist         Macro to draw histogram
  panelhist    Macro to draw histogram in pane of panel graph
  panelplot    Macro to draw point, line or impulse plot in pane of
               panel graph
  panel_graphs Brief description of a panel graph.
  piechart     Macro to draw a pie chart of positive data
  plotmatrix   Macro to draw a scatter plot matrix, that is a panel
               graph with plots of each column of a matrix against all
               other columns or against all columns of another matrix
  plotpanes    Macro to draw point, line or impulse plots in several
               panes of a panel graph
  rowplot      Macro to plot matrix rows vs column number
  sampcdf      Macro to plot a sample CDF
  vboxplot     Macro to make a vertical boxplot

Great! There is a macro bargraph which sounds like what you are looking for. You can use graphicshelp() to get more information about bargraph or any topic in graphics.mac.

Cmd> graphicshelp(bargraph)
bargraph(edges, y [,graphics keyword phrases]), where edges and y are
REAL vectors with edges[j+1] - edges[j] > 0 and length(edges) =
length(y) + 1 draws a bar graph.  Bar j has height y[j] and boundaries
edges[j] and edges[j+1].

The graphics keyword may include 'xlab', 'ylab', 'title', keywords
having to do with tick marks and so on.

bargraph(vector(leftedge, width), y [,graphics keyword phrases]), where
width > 0 and left edges are real scalars does the same except the bar
edges are leftedge, leftedge + width, ...,leftedge + length(y)*width.

bargraph(leftedge, y, ...) is the same a bargraph(vector(leftedge,1),
y, ...).

bargraph(x,y, keep:T [,draw:T] [,graphics keyword phrases]), where x is
edges, vector(leftedge,width) or leftedge, returns
	structure(x:xvals, y:yvals [,graphics keyword phrases], lines:T)
With draw:T, the graph also is drawn.

This structure can be assigned to GRAPHWINDOWS[j] to draw the bar graph
in window j or used with panelplot() to draw the bar graph in a panel
graph (see topics panelplot() and 'panel_graphs') usage:
bargraph(edges,y [,save:T] [,draw:T] [,graphics keyword phrases])

See also hist, 'graph_keys'.

There are several other files of macros, including regress.mac which contains macros related to regression analysis.

Cmd> regresshelp()

gives you a list of a all the macros in regress.mac with thumb nail descriptions.


Drawing frequency and relative frequency histograms in MacAnova

Q. Is there any command or function in MacAnova can be used to make all the three types of histogram? I know how to make the density scale histogram but I cannot find a command for making frequency histograms or relative frequency histograms.

A. hist now allows you to make all three types of histograms. The default is to make a density scale histogram. However, if you include freq:T as an argument, the heights of the histogram bars will be frequencies (counts). If, instead,you use relfreq:T as an argument, the bar heights will be relative frequencies (counts/sample size). Here's an example, using the data in Example 1.5 of Moore and McCabe, Introduction to the Practice of Statistics.

Cmd> readdata("",dollars) # read data from file eq01_005.txt
# Data from Example 1.5, p. 12 in IPS     Lines in file echoed by
# NOTE: Version from web site             readdata
dollars saved as REAL vector

Cmd> hist(dollars,vector(0,10),freq:T,\
          xlab:"Dollars spent",ylab:"Frequency",\
          title:"Frequency histogram of data in IPS example 1.5")
Frequency histogram of data in IPS example 1.5

Cmd> hist(dollars,vector(0,10),relfreq:T,\
            xlab:"Dollars spent",ylab:"Relative frequency",\
            title:"Relative frequency histogram of data in IPS example 1.5")
Relative frequency histogram of data in IPS example 1.5

This won't work if your data consists of frequencies to start with. For that you need to use macro bargraph. Here is an example using data from Table 1.2 on p. 15 of Moore and McCabe

Cmd> frequencies <-vector(9,28,59,165,244,206,146,60,24,5,1)
  
Cmd> edges <- run(2,13)
  
Cmd> n <- sum(frequencies); n # sample size
(1)         947
  
Cmd> bargraph(edges,frequencies,ylab:"Frequency",\
            title:"Frequencies of 7th grade vocabulary scores",xlab:"Vocabulary scores")
WARNING: searching for unrecognized macro bargraph near bargraph(
Frequencies of 7th grade vocabulary scores

Cmd> bargraph(edges,frequencies/n,ylab:"Frequency",xlab:"Vocabulary scores",\
        title:"Relative requencies of 7th grade vocabulary scores")

Relative requencies of 7th grade vocabulary scores


Computing frequencies using MacAnova function bin()

Q. Is there a single (or several) command(s) in MacAnova that we can use to create a frequency distribution for given intervals (equal or unequal)? I have been using stemleaf() to make a stemplot and then have counted the occurrences of observations within each interval. This is time-consuming and error-prone.

A. Yes, bin() does just that. Here is an example using bin() to make a histogram. Type help(bin) for more information.

Cmd> readdata("",dollars) # file eq01_005.txt
# Data from Example 1.5, p. 12 in IPS
# NOTE: Version from web site
dollars saved as REAL vector

Cmd> dollars
 (1)        3.11        8.88        9.26       10.81       12.69
 (6)       13.78       15.23       15.62          17       17.39
(11)       18.36       18.43       19.27        19.5       19.54
(16)       20.16       20.59       22.22       23.04       24.47
(21)       24.58       25.13       26.24       26.26       27.65
(26)       28.06       28.08       28.38       32.03       34.98
(31)       36.37       38.64       39.16       41.02       42.97
(36)       44.08       44.67        45.4       46.69       48.65
(41)       50.39       52.75        54.8       59.07       61.22
(46)       70.32        82.7       85.76       86.37       93.34

Cmd> usage(bin)  # get skeleton of its usage
bin(x,Bnds [,silent:T,leftendin:T]), x a REAL matrix, Bnds REAL vector,
   Bnds[k] < Bnds[k+1]
bin(x,vector(anchor,width) [,leftendin:T]), anchor and width > 0 REAL
   scalars
bin(x,nbins, [leftendin:T]), nbins positive integer.
bin(x [,leftendin:T])

Cmd> stuff <- bin(dollars,vector(0,10))

Cmd> stuff
component: boundaries
 (1)           0          10          20          30          40
 (6)          50          60          70          80          90
(11)         100
component: counts
(1)           3          12          13           5           7
(6)           4           1           1           3           1

Cmd> bargraph(stuff$boundaries,stuff$counts,title:"Frequency histogram",\
                xlab:"Dollars spent",ylab:"Frequency")
Frequency histogram

The form of the second argument (vector(0,10)) to bin() is vector(anchor,width). All classes had width = 10 and the boundaries were integer multiples of width away from anchor = 0. You can also explicitly provide the boundaries. The following will give exactly the same results.

Cmd> stuff <- bin(dollars,run(0,100,10))

By default, bin() puts values on the class boundaries in the class to its left, that is the right end of a class is included in a class. Another convention that is used is to put a value on a class boundary in the class to its right, that is to include the left boundary of a class in that class. You specify that with leftendin:T as an argument to bin(). With these data there are no such values so it isn't a problem.


Error message using macro bargraph

Q. When I run the bargraph macro, it produces an error saying that I didn't use the right number of bar boundaries. I cannot figure out how to remedy this problem.

A. When you use bargraph you neet to provide one more boundary than there are bars. Here is an example using data from Table 1.2 on p. 15 of Moore and McCabe

Cmd> frequencies <-vector(9,28,59,165,244,206,146,60,24,5,1)

Cmd> edges <- run(2,13)

Cmd> n <- sum(frequencies); n # sample size
(1)         947

Cmd> bargraph(edges,frequencies,xlab:"Vocabulary scores",ylab:"Frequency",\
        title:"Frequencies of 7th grade vocabulary scores")
WARNING: searching for unrecognized macro bargraph near bargraph(
Frequencies of 7th grade vocabulary
scores

That works fine because there are 11 counts and edges is the same as run(2,13) which has 12 = 11 + 1 elements. When you add an extra element to the boundaries or have too few elements, you run into trouble.

Cmd> bargraph(vector(edges,14),frequencies,xlab:"Vocabulary scores",\
        ylab:"Frequency",\
        title:"Frequencies of 7th grade vocabulary scores")
ERROR: number of bar edges != length(y) + 1 in macro bargraph

Cmd> bargraph(edges[-12],frequencies,xlab:"Vocabulary scores",\
        ylab:"Frequency",\
        title:"Frequencies of 7th grade vocabulary scores")
ERROR: number of bar edges != length(y) + 1 in macro bargraph

So the answer is to make sure the number of bar boundaries is one more than the number of bars.


Capturing graphs

Q. When I use plot(), the plot shows up in a separate window. but I do not know how to make the plot appear in the input/output window continue to work. Can you help?

A. MacAnova makes two types of plots - high resolution and low resolution. You always get a high resolution plot unless dumb:T is an argument to the plotting command (for example, plot(height,weight,dumb:T) or hist(x,12,dumb:T)).

Low Resolution Plots

In any version of MacAnova (Macintosh, Windows, DOS, Unix, Motif), low resolution plots are just like any output. They consist of ordinary printable characters. In Windows and Macintosh the plot is in the regular output window and this type of graph is the only type that ever does show up in the input/output window. If you save that window, the low resolution plots are saved as part of it.

In any version, if you use spool() to save a record of commands and output, low resolution plots will be written to the spool file just like other output.

High Resolution Plots

On a Macintosh or Windows computer, these go in separate windows and do not go in the input/output window. The easiest way to save them is to copy them to a word processor document.

When a graph window is in front, select Copy on the Edit menu to copy the high resolution plot to the Clipboard. Then switch over to (or start up and switch over to) a word processor of your choice (Microsoft Word, WordPerfect, ...) and use Paste on the Edit menu to put the graph in a word processor document.

You can, of course, use Copy and Paste to move commands and output from the input/output window to a word processor document. In this way you can keep a copy of everything you want.


Importing data from Excel and exporting data to Excel

Q. I'd like to use MacAnova for my statistical analysis but all my data are in an Excel format. Is there a way I can convert the Excel file to something MacAnova will read, or do I need to enter the data by hand?

A. Using the Windows or Macintosh versions, there is a very easy way to export data from Excel to MacAnova - use the Clipboard and MacAnova macros fromclip or clipreaddata.

1. In Excel, select the data, in the form of a set of consecutive spread sheet rows and columns and copy it to the Clipboard using Copy on the Edit menu. Suppose there are k = 10 variables, each in one of 10 columns.

2. Start up MacAnova and do the following:

Cmd>  k <- 10 # number of columns

Cmd> exceldata <- fromclip(k)

Variable exceldata should now contain the Excel data in a rectangular table (matrix) with 10 columns. If you want to work with the variables separately, then

Cmd>  makecols(exceldata, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)

will create variables x1, x2, ... (or whatever legal MacAnova names you use).

If the data are not in consecutive columns of the spread sheet, you can move individual variables or sets of consecutive columns one at a time using the same mechanism. If you are copying only one column to the clipboard,

Cmd> x1 <-  fromclip()

will take it from the clipboard.

There is a new macro clipreaddata which makes things easier. It combines fromclip() and makecols. Suppose part of the spread sheet looks something like this:
specimen tiptype depth
1 A 9.3
2 A 9.4
3 A 9.6
4 A 10.0
1 B 9.4
2 B 9.3
3 B 9.8
4 B 9.9
1 C 9.2
2 C 9.4
3 C 9.5
4 C 9.7
1 D 9.7
2 D 9.6
3 D 10.0
4 D 10.2

Use the mouse to select all but the first row and then select Copy on the Edit menu. Then switch to MacAnova.

Cmd> clipreaddata(specimen,tiptype,depth) # rows 2 through 17 selected
specimen saved as REAL vector
tiptype saved as factor
depth saved as REAL vector

Cmd> list(specimen,tiptype,depth)
depth           REAL   16   
specimen        REAL   16   
tiptype         REAL   16    FACTOR with 4 levels

clipreaddata has automatically translated the character codes for the second column into a factor.

If you use the mouse to select all the rows, including the first one, clipreaddata will take the variable names from the first line.

Cmd> clipreaddata() # rows 1 through 18 selected
specimen saved as REAL vector
tiptype saved as factor
depth saved as REAL vector

Alternatively, it is possible to have Excel write data to a plain text (ASCII) file with no headers or other information, using some item on the File menu. Suppose you wrote the same 10 variables out into a file myexcel.dat with 10 columns, 1 for each variable. Then you can read it in as an array of 10 columns by

Cmd> exceldata <-  matrix(vecread("myexcel.dat"),10)'

(note the trailing ' to indicate transpose).

If you want to read each column directly into a named MacAnova variable, use

Cmd> readdata("myexcel.dat", x1, x2, x3, x4, x5, x6, x7, x8, x9, x10) # or readcols

The number of names must match the number of columns.

Since all versions of MacAnova can read data files, you can do this with DOS and Unix versions, too.


Back-to-back stemplots

Q. I have a question concerning MacAnova. Can one do back-to-back stemplots? I guess, I am having troubles with question 1.20 of the homework. I am also confused about 3 digit numbers since the examples so far have been with two digit numbers.

A. No, MacAnova doesn't do back-to-back stemplots at this time. What I have sometimes done is to make two stemplots, making sure the stems were compatible, and then copy the leaves from one stemplot to the other side of the stems of the other. It's messy and you have to add spaces to make things line up, but it works.

Cmd> x120w <- enter(154 109 137 115 152 140 154 178 101 103 126 126\
     137 165 165 129 200 148)

Cmd> x120m <- enter(108 140 114 91 180 115 126 92 169 146 109 132 75\
     88 113 151 70 115 187 104)
	   
Cmd> stemleaf(x120w,depth:F,outlier:F) # suppress depth and outliers
   10|139
   11|5
   12|669
   13|77
   14|08
   15|244
   16|55
   17|8
   18|
   19|
   20|0

    1|1 represents 11  Leaf digit unit = 1

Cmd> stemleaf(x120m,depth:F,outlier:F) # the other stemplot
    7|05
    8|8
    9|12
   10|489
   11|3455
   12|6
   13|2
   14|06
   15|1
   16|9
   17|
   18|07

    1|1 represents 11  Leaf digit unit = 1

Now using Copy and Paste on the Edit menu, followed by careful editing I combined the two. I copied leaves from the first stemplot, and pasted them to the left of the stems of the second plot, inserting |, and then tidying up. Here is what I ended up with.

    | 7|05
    | 8|8
    | 9|12
 139|10|489
   5|11|3455
 669|12|6
  77|13|2
  08|14|06
 244|15|1
  55|16|9
   8|17|
    |18|07
    |19|
   0|20|

    1|1 represents 11  Leaf digit unit = 1

This is an interesting example in that the best stems are actually two digit numbers (most of them). This happens when most of the data have the same first digit, here "1". When most of the first two digits are the same, as would be the case if you added 200 to each of the numbers, then you might end up with three digit stems, say 207, 208, 209, ..., 220. If you added 2000, you would need four digit stems, 2007, 2008, ..., 2020.

There are so many possibilities it's hard to cover them all. Moreover, in some cases, there is no single right answer.


Computing a trimmed mean

Q. For Problem 1.68, is there a slick way to do the trimmed mean. I know how to do it mathematically, but is there a command that is "neater?"

A There is no command at present. Here's how it can be done fairly slickly.

Cmd> y # here's some data
 (1)      110.46      110.36      110.06         110      109.66
 (6)      108.89      112.23      109.17      107.92      110.97
(11)      111.02      110.96       109.8      111.08      108.42
(16)      109.08      110.37      110.74      109.47      113.59

Cmd> stemleaf(y) # it has one apparent outlier
    1  107.|9
    2  108*|4
    3  108.|8
    6  109*|014
    9  109.|679
  ( 4) 110*|0334
    7  110.|799
    4  111*|00
    2  111.|
    2  112*|2
 High  1135
         1*|1 represents 1.1  Leaf digit unit = 0.1

Cmd> y <- sort(y); y# order the data
 (1)      107.92      108.42      108.89      109.08      109.17
 (6)      109.47      109.66       109.8         110      110.06
(11)      110.36      110.37      110.46      110.74      110.96
(16)      110.97      111.02      111.08      112.23      113.59

Cmd> n <- nrows(y) # sample size (20)

Cmd> p <- .1 # proportion to trim off each end

Cmd> k <- ceiling(p*n); k # number to trim from each end
(1)           2

Cmd> y[run(k+1,n-k)] # trimmed data (central 16 (80%) of data)
 (1)      108.89      109.08      109.17      109.47      109.66
 (6)       109.8         110      110.06      110.36      110.37
(11)      110.46      110.74      110.96      110.97      111.02
(16)      111.08

Cmd> describe(y[run(k+1,n-k)],mean:T) #mean of trimmed data
(1)      110.13

This whole thing could be put in a macro. For anyone who is interested, here's one way to do it.

Cmd> trimmean <- macro("if ($v != 2){error(\"usage: $S(y,p)\")}
@y <- sort($1)
@p <- $2
if (@p >= .5){@p <- @p/100} # convert to a proportion
if (@p >= .5){error(\"Trimming percent or proportion too big\")}
@n <- nrows(@y)
@k <- ceiling(@p*@n)
if (2*@k >= @n){@k <- @k - 1}
describe(@y[run(@k+1,@n-@k),],mean:T) # value will be returned ",dollars:T)

Cmd> trimmean(y,.10) # it works with proportion
(1)      110.13

Cmd> trimmean(y,10) # it also works with a percent
(1)      110.13

If you are interested, you can learn about the syntax of commands in a macro from help topic macro_syntax. The \ before " is required because the whole text of the macro is included in "...".

Here what the macro would look like in a macro file:

trimmean     MACRO DOLLARS
if ($v != 2){error("usage: $S(y,p)")}
@y <- sort($1)
@p <- $2
if (@p >= .5){@p <- @p/100} # convert to a proportion
if (@p >= .5){error("Trimming percent or proportion too big")}
@n <- nrows(@y)
@k <- ceiling(@p*@n)
if (2*@k >= @n){@k <- @k - 1}
describe(@y[run(@k+1,@n-@k),],mean:T) # value will be returned 
%trimmean%

If you made a file named, say, trimmean.mac containing these lines, you could read the macro by

Cmd> trimmean <- read("","trimmean")
trimmean     MACRO DOLLARS

After typing the command, you would find file trimmean.mac in the file navigation dialog box. The line

trimmean    MACRO DOLLARS

is printed once the file has been found.

If such a macro were to be distibuted with MacAnova, I would add some checking of the arguments, making sure that 0 < p < 100 and giving a warning if there are MISSING values.


Drawing a Normal Density

Q. In problem 1.74, we are asked to make a density curve with a normal distribution of N(63.1,4.8). I tried without success the following:

Cmd> x <- 63.1+4.8*rnorm(100)

Cmd> y <- run(1,100)

Cmd> lineplot(y,x)

A. You can plot a normal density curve in at least two ways. I apologize for not yet having posted a handout on plotting densities. Here is a demonstration for Ex 1.73.

First way (white box):

You need the equation for a normal density curve on p. 71. This includes e-(x-mu)2/(2*sigma2). In MacAnova you can compute ex by exp(x).

Cmd> z <- run(-3.5,3.5,.1) # equally spaced by .1 from -3.5 to 3.5

Cmd> mu <- 1.45; sigma <- 0.4

Cmd> x <- mu + sigma*z # values at which density computed

Cmd> curve <- exp(-(x - mu)^2/(2*sigma^2))/(sqrt(2*PI)*sigma)

Cmd> # the preceding is just a direct translation of the formula on

Cmd> # on page 71 of Moore and McCabe

Cmd> lineplot(x,curve,title:"N(1.45,0.4) density")
lineplot(x,densnor(
N(1.45,0.4) density

A second way requires downloading the macro file densities.mac posted on the Web page.

The following requires that densities.mac is in the same folder as MacAnova.

Cmd> addmacrofile("densities.mac")

Cmd> lineplot(x,densnor(x,mu,sigma),title:"N(1.45,0.4) density")
WARNING: searching for unrecognized macro densnor near 
lineplot(x,densnor(
N(1.45,0.4) density

What you were doing was incorrect in two ways:

  1. Your first argument to lineplot() was not well chosen, since it runs from 1 to 100 but the mean is 63.1 and SD = 4.8. Since virtually all the probability is within 4 standard deviations of the mean, at most, you need plot only from mu - 4*SD ~= 45 to mu + 4*SD ~= 82. In fact, you really don't need to go beyond 3.5*SD, as illustrated in the example.
  2. The value you computed for x (63.1+4.8*rnorm(100)) is a vector of random numbers, not a density.

Using a categorical variable as plotting symbols

Q Please explain how you would use MacAnova to make a plot that represents both categorical and numerical data.

A. Here's how you might produce the graph I showed at the end of the hour on Friday, February 2, 2001. The data are actually taken from Table 18.2.1 on p. 368 of the most widely cited scientific book of all time, Statistical Methods by Snedecor and Cochran, 7th Edition (for the first 5 editions it was just by Snedecor).

Cmd> # first enter the data

Cmd> before <- vector(11,8,5,14,19,6,10,6,11,3,6,6,7,8,18,8,19,8,5,15,\
	16,13,11,9,21,16,12,12,7,12)

Cmd> after <- vector(6,0,2,8,11,4,13,1,8,0,0,2,3,1,18,4,14,9,1,9,13,10,\
  18,5,23,12,5,16,1,20)

Cmd> treatletter <- vector(rep("A",10),rep("B",10),rep("C",10))

Cmd> paste(treatletter) # one way to look at the vector
(1) "A A A A A A A A A A B B B B B B B B B B C C C C C C C C C C"

Cmd> treatcode <- makefactor(treatletter) # create numerical code

Cmd> print(treatcode,format:"1.0f")
treatcode:
(1) 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3

Cmd> plot(before,after,symbols:treatletter,\
  title:"After treatment scores vs before treatment scores",\
  xlab:"Before leprosy scores",ylab:"After leprosy scores")
After treatment scores vs before
treatment scores

Here's another way to make up the symbols, using treatcode as a vector to select elements from the short vector vector("A","B","C"):

Cmd> plot(before,after,symbols:vector("A","B","C")[treatcode],\
  title:"After treatment scores vs before treatment scores",\
  xlab:"Before leprosy scores",ylab:"After leprosy scores")
After treatment scores vs before
treatment scores

Or, if you want numbers instead of letters, you can use treatcode directly.

Cmd>  plot(before,after,symbols:treatcode,\
   title:"After treatment scores vs before treatment scores",\
   xlab:"Before leprosy scores",ylab:"After leprosy scores")
After treatment scores vs before
treatment scores


Generating data with given least squares estimates

Q. In Ex 2.38, we are not given any data. So are we supposed to generate artificial bivariate data? What sample size and correlation should be specified?

From the wording of this question, there is no need to generate any artificial bivariate data. However, if you really want to do so, the macro corsample in file corsample.mac can do it with a little work. You have to decide on plausible values for xbar, sx, and sy. From these you can work out ybar and r and then use corsample specifying all 5 values (r, xbar, ybar, sx and sy).

Here's how I did it.

Cmd> addmacrofile("corsample.mac")

Cmd> getmacros(corsample) # read in macro corsample
corsample   MACRO DOLLARS
) Macro to generate bivariate sample with given sample correlation
) coefficient
) Usage:
)   corsample(n,r[,xbar:xbar,ybar:ybar, sx:sx, sy:sy])
)   r                 REAL scalar between -1 and +1
)   xbar, ybar        nonMISSING REAL scalars
)   sx, sy            positive REAL scalars

Cmd> xbar <- 3500 # value in the middle of the range

Cmd> a <- 1389; b <- .96 # given slope and intercept

Cmd> ybar <- 1389 + b*xbar; ybar # value for ybar
(1)        4749

Cmd> sx <- 250 # about 1/4 the range of x

Cmd> sy <- 300 # plausible value

Cmd> r <- b*sx/sy; r # correlation determined from b, sx and sy
(1)          0.8

Cmd> # generate data

Cmd> data <- corsample(40,r,sx:sx,sy:sy,xbar:xbar,ybar:ybar)

Cmd> x <- data$x; y <- data$y # extract x and y from result

Cmd> regress("y=x")
Model used is y=x
                 Coef       StdErr            t
CONSTANT         1389       409.81       3.3893
x                0.96       0.1168       8.2192

N: 40, MSE: 33253, DF: 38, R^2: 0.64000
Regression F(1,38): 67.556, Durbin-Watson: 2.4009
To see the ANOVA table type 'anova()'

Cmd> # the intercept and slope are as prescribed

Cmd> describe(hconcat(x,y),mean:T,stddev:T)
component: mean
(1)        3500        4749
component: stddev
(1)         250         300

Cmd> # the means and SDs are as prescribed

Cmd> plot(x,y,symbols:"\1",\
   title:"Artificial data with given least squares line")
Artificial data with given least squares line

Note that this is not the same thing as generating data with given population correlation and population means and standard deviations. You can do that without a macro. Suppose you want bivariate data with n = 50, population correlation rhoxy = -.5 and means mux = 10 and muy = 30 with standard deviations sigmax = 7.5 and sigmay = 10.

Cmd> n <- 50; rho <- -.5

Cmd> mu_x <- 10; mu_y <- 30

Cmd> sigma_x <- 7.5; sigma_y <- 10

Cmd> x <- rnorm(n); y <- rho*x + sqrt(1 - rho^2)*rnorm(n)

Cmd> x <- mu_x + sigma_x*x

Cmd> y <- mu_y + sigma_y*y

Adding "up and over" lines to a graph

Q How can we use MacAnova to draw the "up-and-over" line asked for in Problem 2.38?

A Assuming you have just made the graph as in (a), the following will draw the up and over lines on the same graph.

Cmd> a <- 1389; b <- .96

Cmd> x <- 3300

Cmd> y <- a + b*x # (x,y) is on the line

Cmd> lowx <- 3000; lowy <- a + b*lowx

Cmd> addlines(vector(x,x,lowx),vector(lowy,y,y), linetype:2)

The values for lowx and lowy should be close to the left edge and the bottom edge of the plot, so the addlines() command draws 2 line segments, running vertically up from the bottom edge at x = 3300 to the line and then horizontally to the left edge. linetype:2 directs that the lines be dashed.


Drawing two categorical distributions in one graph

Q How can I make a graph describing the differences between the majors for women and men as asked in problem 2.88? Would some sort of a bar graph with side by side bars be right. How can I do that in MacAnova or should I use Excel?

A Yes, a bar graph with side by side bars for Male and Female would be a good way to display the differences. Unfortunately MacAnova doesn't currently have any function or macro that can do this directly (but see below). It would be OK to use Excel if that is easier for you.

One easy way to graph the percentages is just as line graphs of the percentages for females and for males against the numbers of the Major categories, 1, 2, 3 and 4. It's not hard to label these informatively, although that would not be essential.

Here's how you could do that, starting with entering the data

Cmd> table <- matrix(vector(68,91,5,61,56,40,6,59),4,labels:structure(\
     vector("Acctng","Admin","Econ","Finance"),\
		vector("Female","Male")))

Cmd> table # here's the table of data
              Female         Male
Acctng            68           56
Admin             91           40
Econ               5            6
Finance           61           59

Cmd> condOnSex <- 100*table /sum(table)

Cmd> condOnSex # Conditional distributions of major given sex
              Female         Male
Acctng        30.222       34.783
Admin         40.444       24.845
Econ          2.2222       3.7267
Finance       27.111       36.646

Now for the plotting.

Cmd> lineplot(run(1,4),condOnSex,symbols:vector("F","M"),\
   xticks:run(4),xticklabs:getlabels(condOnSex, 1),\
   xmin:.5,xmax:4.5,xlab:"Major",ylab:"Percent",ymin:0,\
   title:"Conditional distributions of major given sex")
Conditional distributions of major given sex

The second argument, condOnSex is a matrix with two columns and lineplot() plots each line separately. symbols:vector("F","M") sets plotting symbol "F" for column 1 and "M" for column 2. I made sure y = 0 was in the graph with ymin:0.

This is is adequate but a bar graph would be better. Here is one way to do it using macro bargraph. I put pairs of bars of width 1/3, centered at 1, 2, 3 and 4 with 3 bars of 0 height in between.

Cmd> edges <-run(2/3,4+1/3,1/3) #2/3,1,4/3,5/3,2,7/3,...,13/3

Cmd> heights <- rep(0,11) # there will be 11 bars with 3 of 0 height

Cmd> heights[vector(1,4,7,10)] <- condOnSex[,1] # bars for females

Cmd> heights[vector(2,5,8,11)] <- condOnSex[,2] # bars for males

Cmd> bargraph(edges,heights,xmin:0,xmax:5,ymax:1.1*max(heights),\
xticks:run(4),xticklabs:getlabels(condOnSex,1),show:F) #don't display yet

Cmd> addstrings(vector(5,11,17,23,7,13,19,25)/6,\
	vector(condOnSex[,1],condOnSex[,2])+1,\
	strings:vector(rep("F",4),rep("M",4)),show:F)#don't display

addstrings() adds symbols "F" and "M" one percentage point above the bars. In any given situation when you want to do something like this, it may take a little experimentation to find out how far above the bars to put the symbols. Now it's time to display the graph, adding axis labels and a title.

Cmd> showplot(title:"Percents of majors by gender",\
	ylab:"Percents",xlab:"Major") # now display with labels
Bargraph of Percents of majors by gender


kb@umn.edu

Updated Mon Jan 27 08:16:46 CST 2003