This file contains edited e-mail questions that have been asked about MacAnova and their answers.
Caution This file has not been updated from Spring 2001 and hence contains references to homework problems that currently don't apply. In addition, there may be new MacAnova features that could be used that are not mentioned. I will update this as I have time and add to it as questions come in.
Q. I am trying to create a bar graph to go along with question 1.14b in the first homework assignment using MacAnova. Could you give me some help in getting started?
A. Here is one way, using macro bargraph.
Cmd> # First enter the data
Cmd> deaths <- vector(41893, 13141, 3807, 3900, 7382) # from text
Cmd> alldeaths <- 90523 # from text
Cmd> # Calculate number from other causes by subtraction
Cmd> otherdeaths <- alldeaths - sum(deaths);otherdeaths
(1) 20400
Cmd> counts <- vector(deaths,otherdeaths) # combine in vector
Cmd> barnumber <- run(6) ; barnumber # numbers for the bars
(1) 1 2 3 4 5
(6) 6
Cmd> percents <- 100*counts/alldeaths # calculate percents
Cmd> graphicshelp(bargraph,usage:T) # see more on graphicshelp below
bargraph(edges,y [,save:T] [,draw:T] [,graphics keyword phrases]), REAL
vectors edges and y, length(edges) = length(y) + 1
bargraph(vector(leftedge[,width]), y, [,save:T] [,draw:T] ...), width >
0, leftedge REAL scalars, default for width = 1
Cmd> edges <- run(.5,6.5) # .5, 1.5, 2.5, ..., 6.5
Cmd> bargraph(edges,percents,title:"Accidental deaths percentages by cause",\
xlab:"Cause number", ylab:"Percent")
WARNING: searching for unrecognized macro bargraph near bargraph(
Cmd> # 1 = MV, 2 = Falls, 3 = drowning,4 = fires, 5 = poison, 6 = other
This, although not optimal, provides the necessary information. It is possible to actually replace 1, 2, ..., 6 with bar labels. Here's how.
Cmd> barlabels <- vector("M Veh","Fall","Drowning","Fire","Poison","Other")
Cmd> bargraph(edges,percents,title:"Accidental deaths percentages by cause",\
xticks:run(6),xticklab:barlabels,ylabs:"Percent")
This works because xticks:run(1,6) specifies there should be tick marks at 1, 2, ..., 6 and xticklabs:barlabels specifies that the elements of barlabels should be used to label the ticks instead of "1","2",... .
There are other fancier ways to do it as well.
I know that asking you to do things like this on a computer this early in the course is a real challenge, but that is sometimes the fastest way to learn about a program.
Q. How might I have found out about macro bargraph without asking someone?
A. You probably would not have because there are some missing cross references in help.
Here are two ways for finding about where to look for information for making graphs (besides An Introduction to MacAnova).
First you can use help() keyword keyCmd> help(key:"?") Type 'help(key:"heading")', where heading is in following list: ANOVA General Plotting Categorical Data Input Probabilities CHARACTER Variables LOGICAL Variables Random Numbers Combining Variables NULL Variables Regression Comparisons Macros Residuals Complex Arithmetic Matrix Algebra Structures Confidence Intervals Missing Values Syntax Control Multivariate Analysis Time Series Descriptive Statistics Operations Transformations Files Ordering Variables GLM Output
Plotting looks promising. Let's see what help topics relate to plotting.
Cmd> help(key:"plotting") # find help topics related to plotting
The following help topics concern Plotting
addchars colplot graph_keys resvsindex stringplot
addlines graphs graph_ticks resvsrankits tek
addpoints GRAPHWINDOWS hist resvsyhat tekx
addstrings graph_assign lineplot rowplot vboxplot
boxplot graph_border Mouse showplot vt
chplot graph_files plot stemleaf vtx
For help on topic foo, enter help(foo) or help("foo")
You can get help on any of these topics using help(), say help(chplot).
Unfortunately, none of these seems closely related to making bar graphs, except hist which you already know about.
There is a whole file of macros, graphics.mac related to making graphs. Just typing graphicshelp() summarizes what is there.
Cmd> graphicshelp() # information about macros for graphing
Help entries in this file
bargraph Macro do draw a bar graph
boxplot5num Macro to draw simplified boxplot based only on extremes,
and quartiles
colplot Macro to plot matrix colums vs row number
contour Macro to trace a level curve of a surface defined
on a rectangular grid
contourplot Macro to make a contour plot of a surface defined
on a rectangular grid
ellipse Macro to compute the outline of an ellipse and
optionally draw it
findcontour Macro to interactively find a contour line near
a point selected with the mouse
graphicshelp Macro to read help on macros in Graphics.mac
hist Macro to draw histogram
panelhist Macro to draw histogram in pane of panel graph
panelplot Macro to draw point, line or impulse plot in pane of
panel graph
panel_graphs Brief description of a panel graph.
piechart Macro to draw a pie chart of positive data
plotmatrix Macro to draw a scatter plot matrix, that is a panel
graph with plots of each column of a matrix against all
other columns or against all columns of another matrix
plotpanes Macro to draw point, line or impulse plots in several
panes of a panel graph
rowplot Macro to plot matrix rows vs column number
sampcdf Macro to plot a sample CDF
vboxplot Macro to make a vertical boxplot
Great! There is a macro bargraph which sounds like what you are looking for. You can use graphicshelp() to get more information about bargraph or any topic in graphics.mac.
Cmd> graphicshelp(bargraph) bargraph(edges, y [,graphics keyword phrases]), where edges and y are REAL vectors with edges[j+1] - edges[j] > 0 and length(edges) = length(y) + 1 draws a bar graph. Bar j has height y[j] and boundaries edges[j] and edges[j+1]. The graphics keyword may include 'xlab', 'ylab', 'title', keywords having to do with tick marks and so on. bargraph(vector(leftedge, width), y [,graphics keyword phrases]), where width > 0 and left edges are real scalars does the same except the bar edges are leftedge, leftedge + width, ...,leftedge + length(y)*width. bargraph(leftedge, y, ...) is the same a bargraph(vector(leftedge,1), y, ...). bargraph(x,y, keep:T [,draw:T] [,graphics keyword phrases]), where x is edges, vector(leftedge,width) or leftedge, returns structure(x:xvals, y:yvals [,graphics keyword phrases], lines:T) With draw:T, the graph also is drawn. This structure can be assigned to GRAPHWINDOWS[j] to draw the bar graph in window j or used with panelplot() to draw the bar graph in a panel graph (see topics panelplot() and 'panel_graphs') usage: bargraph(edges,y [,save:T] [,draw:T] [,graphics keyword phrases]) See also hist, 'graph_keys'.
There are several other files of macros, including regress.mac which contains macros related to regression analysis.
Cmd> regresshelp()
gives you a list of a all the macros in regress.mac with thumb nail descriptions.
Q. Is there any command or function in MacAnova can be used to make all the three types of histogram? I know how to make the density scale histogram but I cannot find a command for making frequency histograms or relative frequency histograms.
A. hist now allows you to make all three types of histograms. The default is to make a density scale histogram. However, if you include freq:T as an argument, the heights of the histogram bars will be frequencies (counts). If, instead,you use relfreq:T as an argument, the bar heights will be relative frequencies (counts/sample size). Here's an example, using the data in Example 1.5 of Moore and McCabe, Introduction to the Practice of Statistics.
Cmd> readdata("",dollars) # read data from file eq01_005.txt
# Data from Example 1.5, p. 12 in IPS Lines in file echoed by
# NOTE: Version from web site readdata
dollars saved as REAL vector
Cmd> hist(dollars,vector(0,10),freq:T,\
xlab:"Dollars spent",ylab:"Frequency",\
title:"Frequency histogram of data in IPS example 1.5")
Cmd> hist(dollars,vector(0,10),relfreq:T,\
xlab:"Dollars spent",ylab:"Relative frequency",\
title:"Relative frequency histogram of data in IPS example 1.5")
This won't work if your data consists of frequencies to start with. For that you need to use macro bargraph. Here is an example using data from Table 1.2 on p. 15 of Moore and McCabe
Cmd> frequencies <-vector(9,28,59,165,244,206,146,60,24,5,1)
Cmd> edges <- run(2,13)
Cmd> n <- sum(frequencies); n # sample size
(1) 947
Cmd> bargraph(edges,frequencies,ylab:"Frequency",\
title:"Frequencies of 7th grade vocabulary scores",xlab:"Vocabulary scores")
WARNING: searching for unrecognized macro bargraph near bargraph(
Cmd> bargraph(edges,frequencies/n,ylab:"Frequency",xlab:"Vocabulary scores",\
title:"Relative requencies of 7th grade vocabulary scores")
Q. Is there a single (or several) command(s) in MacAnova that we can use to create a frequency distribution for given intervals (equal or unequal)? I have been using stemleaf() to make a stemplot and then have counted the occurrences of observations within each interval. This is time-consuming and error-prone.
A. Yes, bin() does just that. Here is an example using bin() to make a histogram. Type help(bin) for more information.
Cmd> readdata("",dollars) # file eq01_005.txt
# Data from Example 1.5, p. 12 in IPS
# NOTE: Version from web site
dollars saved as REAL vector
Cmd> dollars
(1) 3.11 8.88 9.26 10.81 12.69
(6) 13.78 15.23 15.62 17 17.39
(11) 18.36 18.43 19.27 19.5 19.54
(16) 20.16 20.59 22.22 23.04 24.47
(21) 24.58 25.13 26.24 26.26 27.65
(26) 28.06 28.08 28.38 32.03 34.98
(31) 36.37 38.64 39.16 41.02 42.97
(36) 44.08 44.67 45.4 46.69 48.65
(41) 50.39 52.75 54.8 59.07 61.22
(46) 70.32 82.7 85.76 86.37 93.34
Cmd> usage(bin) # get skeleton of its usage
bin(x,Bnds [,silent:T,leftendin:T]), x a REAL matrix, Bnds REAL vector,
Bnds[k] < Bnds[k+1]
bin(x,vector(anchor,width) [,leftendin:T]), anchor and width > 0 REAL
scalars
bin(x,nbins, [leftendin:T]), nbins positive integer.
bin(x [,leftendin:T])
Cmd> stuff <- bin(dollars,vector(0,10))
Cmd> stuff
component: boundaries
(1) 0 10 20 30 40
(6) 50 60 70 80 90
(11) 100
component: counts
(1) 3 12 13 5 7
(6) 4 1 1 3 1
Cmd> bargraph(stuff$boundaries,stuff$counts,title:"Frequency histogram",\
xlab:"Dollars spent",ylab:"Frequency")
The form of the second argument (vector(0,10)) to bin() is vector(anchor,width). All classes had width = 10 and the boundaries were integer multiples of width away from anchor = 0. You can also explicitly provide the boundaries. The following will give exactly the same results.
Cmd> stuff <- bin(dollars,run(0,100,10))
By default, bin() puts values on the class boundaries in the class to its left, that is the right end of a class is included in a class. Another convention that is used is to put a value on a class boundary in the class to its right, that is to include the left boundary of a class in that class. You specify that with leftendin:T as an argument to bin(). With these data there are no such values so it isn't a problem.
Q. When I run the bargraph macro, it produces an error saying that I didn't use the right number of bar boundaries. I cannot figure out how to remedy this problem.
A. When you use bargraph you neet to provide one more boundary than there are bars. Here is an example using data from Table 1.2 on p. 15 of Moore and McCabe
Cmd> frequencies <-vector(9,28,59,165,244,206,146,60,24,5,1)
Cmd> edges <- run(2,13)
Cmd> n <- sum(frequencies); n # sample size
(1) 947
Cmd> bargraph(edges,frequencies,xlab:"Vocabulary scores",ylab:"Frequency",\
title:"Frequencies of 7th grade vocabulary scores")
WARNING: searching for unrecognized macro bargraph near bargraph(
That works fine because there are 11 counts and edges is the same as run(2,13) which has 12 = 11 + 1 elements. When you add an extra element to the boundaries or have too few elements, you run into trouble.
Cmd> bargraph(vector(edges,14),frequencies,xlab:"Vocabulary scores",\
ylab:"Frequency",\
title:"Frequencies of 7th grade vocabulary scores")
ERROR: number of bar edges != length(y) + 1 in macro bargraph
Cmd> bargraph(edges[-12],frequencies,xlab:"Vocabulary scores",\
ylab:"Frequency",\
title:"Frequencies of 7th grade vocabulary scores")
ERROR: number of bar edges != length(y) + 1 in macro bargraph
So the answer is to make sure the number of bar boundaries is one more than the number of bars.
Q. When I use plot(), the plot shows up in a separate window. but I do not know how to make the plot appear in the input/output window continue to work. Can you help?
A. MacAnova makes two types of plots - high resolution and low resolution. You always get a high resolution plot unless dumb:T is an argument to the plotting command (for example, plot(height,weight,dumb:T) or hist(x,12,dumb:T)).
In any version of MacAnova (Macintosh, Windows, DOS, Unix, Motif), low resolution plots are just like any output. They consist of ordinary printable characters. In Windows and Macintosh the plot is in the regular output window and this type of graph is the only type that ever does show up in the input/output window. If you save that window, the low resolution plots are saved as part of it.
In any version, if you use spool() to save a record of commands and output, low resolution plots will be written to the spool file just like other output.
On a Macintosh or Windows computer, these go in separate windows and do not go in the input/output window. The easiest way to save them is to copy them to a word processor document.
When a graph window is in front, select Copy on the Edit menu to copy the high resolution plot to the Clipboard. Then switch over to (or start up and switch over to) a word processor of your choice (Microsoft Word, WordPerfect, ...) and use Paste on the Edit menu to put the graph in a word processor document.
You can, of course, use Copy and Paste to move commands and output from the input/output window to a word processor document. In this way you can keep a copy of everything you want.
Q. I'd like to use MacAnova for my statistical analysis but all my data are in an Excel format. Is there a way I can convert the Excel file to something MacAnova will read, or do I need to enter the data by hand?
A. Using the Windows or Macintosh versions, there is a very easy way to export data from Excel to MacAnova - use the Clipboard and MacAnova macros fromclip or clipreaddata.
1. In Excel, select the data, in the form of a set of consecutive spread sheet rows and columns and copy it to the Clipboard using Copy on the Edit menu. Suppose there are k = 10 variables, each in one of 10 columns.
2. Start up MacAnova and do the following:
Cmd> k <- 10 # number of columns Cmd> exceldata <- fromclip(k)
Variable exceldata should now contain the Excel data in a rectangular table (matrix) with 10 columns. If you want to work with the variables separately, then
Cmd> makecols(exceldata, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)
will create variables x1, x2, ... (or whatever legal MacAnova names you use).
If the data are not in consecutive columns of the spread sheet, you can move individual variables or sets of consecutive columns one at a time using the same mechanism. If you are copying only one column to the clipboard,
Cmd> x1 <- fromclip()
will take it from the clipboard.
There is a new macro clipreaddata which makes things easier. It combines fromclip() and makecols. Suppose part of the spread sheet looks something like this:| specimen | tiptype | depth |
| 1 | A | 9.3 |
| 2 | A | 9.4 |
| 3 | A | 9.6 |
| 4 | A | 10.0 |
| 1 | B | 9.4 |
| 2 | B | 9.3 |
| 3 | B | 9.8 |
| 4 | B | 9.9 |
| 1 | C | 9.2 |
| 2 | C | 9.4 |
| 3 | C | 9.5 |
| 4 | C | 9.7 |
| 1 | D | 9.7 |
| 2 | D | 9.6 |
| 3 | D | 10.0 |
| 4 | D | 10.2 |
Use the mouse to select all but the first row and then select Copy on the Edit menu. Then switch to MacAnova.
Cmd> clipreaddata(specimen,tiptype,depth) # rows 2 through 17 selected specimen saved as REAL vector tiptype saved as factor depth saved as REAL vector Cmd> list(specimen,tiptype,depth) depth REAL 16 specimen REAL 16 tiptype REAL 16 FACTOR with 4 levels
clipreaddata has automatically translated the character codes for the second column into a factor.
If you use the mouse to select all the rows, including the first one, clipreaddata will take the variable names from the first line.
Cmd> clipreaddata() # rows 1 through 18 selected specimen saved as REAL vector tiptype saved as factor depth saved as REAL vector
Alternatively, it is possible to have Excel write data to a plain text (ASCII) file with no headers or other information, using some item on the File menu. Suppose you wrote the same 10 variables out into a file myexcel.dat with 10 columns, 1 for each variable. Then you can read it in as an array of 10 columns by
Cmd> exceldata <- matrix(vecread("myexcel.dat"),10)'
(note the trailing ' to indicate transpose).
If you want to read each column directly into a named MacAnova variable, use
Cmd> readdata("myexcel.dat", x1, x2, x3, x4, x5, x6, x7, x8, x9, x10) # or readcols
The number of names must match the number of columns.
Since all versions of MacAnova can read data files, you can do this with DOS and Unix versions, too.
Q. I have a question concerning MacAnova. Can one do back-to-back stemplots? I guess, I am having troubles with question 1.20 of the homework. I am also confused about 3 digit numbers since the examples so far have been with two digit numbers.
A. No, MacAnova doesn't do back-to-back stemplots at this time. What I have sometimes done is to make two stemplots, making sure the stems were compatible, and then copy the leaves from one stemplot to the other side of the stems of the other. It's messy and you have to add spaces to make things line up, but it works.
Cmd> x120w <- enter(154 109 137 115 152 140 154 178 101 103 126 126\
137 165 165 129 200 148)
Cmd> x120m <- enter(108 140 114 91 180 115 126 92 169 146 109 132 75\
88 113 151 70 115 187 104)
Cmd> stemleaf(x120w,depth:F,outlier:F) # suppress depth and outliers
10|139
11|5
12|669
13|77
14|08
15|244
16|55
17|8
18|
19|
20|0
1|1 represents 11 Leaf digit unit = 1
Cmd> stemleaf(x120m,depth:F,outlier:F) # the other stemplot
7|05
8|8
9|12
10|489
11|3455
12|6
13|2
14|06
15|1
16|9
17|
18|07
1|1 represents 11 Leaf digit unit = 1
Now using Copy and Paste on the Edit menu, followed by careful editing I combined the two. I copied leaves from the first stemplot, and pasted them to the left of the stems of the second plot, inserting |, and then tidying up. Here is what I ended up with.
| 7|05
| 8|8
| 9|12
139|10|489
5|11|3455
669|12|6
77|13|2
08|14|06
244|15|1
55|16|9
8|17|
|18|07
|19|
0|20|
1|1 represents 11 Leaf digit unit = 1
This is an interesting example in that the best stems are actually two digit numbers (most of them). This happens when most of the data have the same first digit, here "1". When most of the first two digits are the same, as would be the case if you added 200 to each of the numbers, then you might end up with three digit stems, say 207, 208, 209, ..., 220. If you added 2000, you would need four digit stems, 2007, 2008, ..., 2020.
There are so many possibilities it's hard to cover them all. Moreover, in some cases, there is no single right answer.
Q. For Problem 1.68, is there a slick way to do the trimmed mean. I know how to do it mathematically, but is there a command that is "neater?"
A There is no command at present. Here's how it can be done fairly slickly.
Cmd> y # here's some data
(1) 110.46 110.36 110.06 110 109.66
(6) 108.89 112.23 109.17 107.92 110.97
(11) 111.02 110.96 109.8 111.08 108.42
(16) 109.08 110.37 110.74 109.47 113.59
Cmd> stemleaf(y) # it has one apparent outlier
1 107.|9
2 108*|4
3 108.|8
6 109*|014
9 109.|679
( 4) 110*|0334
7 110.|799
4 111*|00
2 111.|
2 112*|2
High 1135
1*|1 represents 1.1 Leaf digit unit = 0.1
Cmd> y <- sort(y); y# order the data
(1) 107.92 108.42 108.89 109.08 109.17
(6) 109.47 109.66 109.8 110 110.06
(11) 110.36 110.37 110.46 110.74 110.96
(16) 110.97 111.02 111.08 112.23 113.59
Cmd> n <- nrows(y) # sample size (20)
Cmd> p <- .1 # proportion to trim off each end
Cmd> k <- ceiling(p*n); k # number to trim from each end
(1) 2
Cmd> y[run(k+1,n-k)] # trimmed data (central 16 (80%) of data)
(1) 108.89 109.08 109.17 109.47 109.66
(6) 109.8 110 110.06 110.36 110.37
(11) 110.46 110.74 110.96 110.97 111.02
(16) 111.08
Cmd> describe(y[run(k+1,n-k)],mean:T) #mean of trimmed data
(1) 110.13
This whole thing could be put in a macro. For anyone who is interested, here's one way to do it.
Cmd> trimmean <- macro("if ($v != 2){error(\"usage: $S(y,p)\")}
@y <- sort($1)
@p <- $2
if (@p >= .5){@p <- @p/100} # convert to a proportion
if (@p >= .5){error(\"Trimming percent or proportion too big\")}
@n <- nrows(@y)
@k <- ceiling(@p*@n)
if (2*@k >= @n){@k <- @k - 1}
describe(@y[run(@k+1,@n-@k),],mean:T) # value will be returned ",dollars:T)
Cmd> trimmean(y,.10) # it works with proportion
(1) 110.13
Cmd> trimmean(y,10) # it also works with a percent
(1) 110.13
If you are interested, you can learn about the syntax of commands in a macro from help topic macro_syntax. The \ before " is required because the whole text of the macro is included in "...".
Here what the macro would look like in a macro file:
trimmean MACRO DOLLARS
if ($v != 2){error("usage: $S(y,p)")}
@y <- sort($1)
@p <- $2
if (@p >= .5){@p <- @p/100} # convert to a proportion
if (@p >= .5){error("Trimming percent or proportion too big")}
@n <- nrows(@y)
@k <- ceiling(@p*@n)
if (2*@k >= @n){@k <- @k - 1}
describe(@y[run(@k+1,@n-@k),],mean:T) # value will be returned
%trimmean%
If you made a file named, say, trimmean.mac containing these lines, you could read the macro by
Cmd> trimmean <- read("","trimmean")
trimmean MACRO DOLLARS
After typing the command, you would find file trimmean.mac in the file navigation dialog box. The line
trimmean MACRO DOLLARS
is printed once the file has been found.
If such a macro were to be distibuted with MacAnova, I would add some checking of the arguments, making sure that 0 < p < 100 and giving a warning if there are MISSING values.
Q. In problem 1.74, we are asked to make a density curve with a normal distribution of N(63.1,4.8). I tried without success the following:
Cmd> x <- 63.1+4.8*rnorm(100) Cmd> y <- run(1,100) Cmd> lineplot(y,x)
A. You can plot a normal density curve in at least two ways. I apologize for not yet having posted a handout on plotting densities. Here is a demonstration for Ex 1.73.
First way (white box):
You need the equation for a normal density curve on p. 71. This includes e-(x-mu)2/(2*sigma2). In MacAnova you can compute ex by exp(x).
Cmd> z <- run(-3.5,3.5,.1) # equally spaced by .1 from -3.5 to 3.5 Cmd> mu <- 1.45; sigma <- 0.4 Cmd> x <- mu + sigma*z # values at which density computed Cmd> curve <- exp(-(x - mu)^2/(2*sigma^2))/(sqrt(2*PI)*sigma) Cmd> # the preceding is just a direct translation of the formula on Cmd> # on page 71 of Moore and McCabe Cmd> lineplot(x,curve,title:"N(1.45,0.4) density") lineplot(x,densnor(![]()
A second way requires downloading the macro file densities.mac posted on the Web page.
The following requires that densities.mac is in the same folder as MacAnova.
Cmd> addmacrofile("densities.mac")
Cmd> lineplot(x,densnor(x,mu,sigma),title:"N(1.45,0.4) density")
WARNING: searching for unrecognized macro densnor near
lineplot(x,densnor(
What you were doing was incorrect in two ways:
Q Please explain how you would use MacAnova to make a plot that represents both categorical and numerical data.
A. Here's how you might produce the graph I showed at the end of the hour on Friday, February 2, 2001. The data are actually taken from Table 18.2.1 on p. 368 of the most widely cited scientific book of all time, Statistical Methods by Snedecor and Cochran, 7th Edition (for the first 5 editions it was just by Snedecor).
Cmd> # first enter the data
Cmd> before <- vector(11,8,5,14,19,6,10,6,11,3,6,6,7,8,18,8,19,8,5,15,\
16,13,11,9,21,16,12,12,7,12)
Cmd> after <- vector(6,0,2,8,11,4,13,1,8,0,0,2,3,1,18,4,14,9,1,9,13,10,\
18,5,23,12,5,16,1,20)
Cmd> treatletter <- vector(rep("A",10),rep("B",10),rep("C",10))
Cmd> paste(treatletter) # one way to look at the vector
(1) "A A A A A A A A A A B B B B B B B B B B C C C C C C C C C C"
Cmd> treatcode <- makefactor(treatletter) # create numerical code
Cmd> print(treatcode,format:"1.0f")
treatcode:
(1) 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Cmd> plot(before,after,symbols:treatletter,\
title:"After treatment scores vs before treatment scores",\
xlab:"Before leprosy scores",ylab:"After leprosy scores")
Here's another way to make up the symbols, using treatcode as a vector to select elements from the short vector vector("A","B","C"):
Cmd> plot(before,after,symbols:vector("A","B","C")[treatcode],\
title:"After treatment scores vs before treatment scores",\
xlab:"Before leprosy scores",ylab:"After leprosy scores")
Or, if you want numbers instead of letters, you can use treatcode directly.
Cmd> plot(before,after,symbols:treatcode,\ title:"After treatment scores vs before treatment scores",\ xlab:"Before leprosy scores",ylab:"After leprosy scores")![]()
Q. In Ex 2.38, we are not given any data. So are we supposed to generate artificial bivariate data? What sample size and correlation should be specified?
From the wording of this question, there is no need to generate any artificial bivariate data. However, if you really want to do so, the macro corsample in file corsample.mac can do it with a little work. You have to decide on plausible values for xbar, sx, and sy. From these you can work out ybar and r and then use corsample specifying all 5 values (r, xbar, ybar, sx and sy).
Here's how I did it.
Cmd> addmacrofile("corsample.mac")
Cmd> getmacros(corsample) # read in macro corsample
corsample MACRO DOLLARS
) Macro to generate bivariate sample with given sample correlation
) coefficient
) Usage:
) corsample(n,r[,xbar:xbar,ybar:ybar, sx:sx, sy:sy])
) r REAL scalar between -1 and +1
) xbar, ybar nonMISSING REAL scalars
) sx, sy positive REAL scalars
Cmd> xbar <- 3500 # value in the middle of the range
Cmd> a <- 1389; b <- .96 # given slope and intercept
Cmd> ybar <- 1389 + b*xbar; ybar # value for ybar
(1) 4749
Cmd> sx <- 250 # about 1/4 the range of x
Cmd> sy <- 300 # plausible value
Cmd> r <- b*sx/sy; r # correlation determined from b, sx and sy
(1) 0.8
Cmd> # generate data
Cmd> data <- corsample(40,r,sx:sx,sy:sy,xbar:xbar,ybar:ybar)
Cmd> x <- data$x; y <- data$y # extract x and y from result
Cmd> regress("y=x")
Model used is y=x
Coef StdErr t
CONSTANT 1389 409.81 3.3893
x 0.96 0.1168 8.2192
N: 40, MSE: 33253, DF: 38, R^2: 0.64000
Regression F(1,38): 67.556, Durbin-Watson: 2.4009
To see the ANOVA table type 'anova()'
Cmd> # the intercept and slope are as prescribed
Cmd> describe(hconcat(x,y),mean:T,stddev:T)
component: mean
(1) 3500 4749
component: stddev
(1) 250 300
Cmd> # the means and SDs are as prescribed
Cmd> plot(x,y,symbols:"\1",\
title:"Artificial data with given least squares line")
Note that this is not the same thing as generating data with given population correlation and population means and standard deviations. You can do that without a macro. Suppose you want bivariate data with n = 50, population correlation rhoxy = -.5 and means mux = 10 and muy = 30 with standard deviations sigmax = 7.5 and sigmay = 10.
Cmd> n <- 50; rho <- -.5 Cmd> mu_x <- 10; mu_y <- 30 Cmd> sigma_x <- 7.5; sigma_y <- 10 Cmd> x <- rnorm(n); y <- rho*x + sqrt(1 - rho^2)*rnorm(n) Cmd> x <- mu_x + sigma_x*x Cmd> y <- mu_y + sigma_y*y
Q How can we use MacAnova to draw the "up-and-over" line asked for in Problem 2.38?
A Assuming you have just made the graph as in (a), the following will draw the up and over lines on the same graph.
Cmd> a <- 1389; b <- .96 Cmd> x <- 3300 Cmd> y <- a + b*x # (x,y) is on the line Cmd> lowx <- 3000; lowy <- a + b*lowx Cmd> addlines(vector(x,x,lowx),vector(lowy,y,y), linetype:2)
The values for lowx and lowy should be close to the left edge and the bottom edge of the plot, so the addlines() command draws 2 line segments, running vertically up from the bottom edge at x = 3300 to the line and then horizontally to the left edge. linetype:2 directs that the lines be dashed.
Q How can I make a graph describing the differences between the majors for women and men as asked in problem 2.88? Would some sort of a bar graph with side by side bars be right. How can I do that in MacAnova or should I use Excel?
A Yes, a bar graph with side by side bars for Male and Female would be a good way to display the differences. Unfortunately MacAnova doesn't currently have any function or macro that can do this directly (but see below). It would be OK to use Excel if that is easier for you.
One easy way to graph the percentages is just as line graphs of the percentages for females and for males against the numbers of the Major categories, 1, 2, 3 and 4. It's not hard to label these informatively, although that would not be essential.
Here's how you could do that, starting with entering the data
Cmd> table <- matrix(vector(68,91,5,61,56,40,6,59),4,labels:structure(\
vector("Acctng","Admin","Econ","Finance"),\
vector("Female","Male")))
Cmd> table # here's the table of data
Female Male
Acctng 68 56
Admin 91 40
Econ 5 6
Finance 61 59
Cmd> condOnSex <- 100*table /sum(table)
Cmd> condOnSex # Conditional distributions of major given sex
Female Male
Acctng 30.222 34.783
Admin 40.444 24.845
Econ 2.2222 3.7267
Finance 27.111 36.646
Now for the plotting.
Cmd> lineplot(run(1,4),condOnSex,symbols:vector("F","M"),\
xticks:run(4),xticklabs:getlabels(condOnSex, 1),\
xmin:.5,xmax:4.5,xlab:"Major",ylab:"Percent",ymin:0,\
title:"Conditional distributions of major given sex")
The second argument, condOnSex is a matrix with two columns and lineplot() plots each line separately. symbols:vector("F","M") sets plotting symbol "F" for column 1 and "M" for column 2. I made sure y = 0 was in the graph with ymin:0.
This is is adequate but a bar graph would be better. Here is one way to do it using macro bargraph. I put pairs of bars of width 1/3, centered at 1, 2, 3 and 4 with 3 bars of 0 height in between.
Cmd> edges <-run(2/3,4+1/3,1/3) #2/3,1,4/3,5/3,2,7/3,...,13/3
Cmd> heights <- rep(0,11) # there will be 11 bars with 3 of 0 height
Cmd> heights[vector(1,4,7,10)] <- condOnSex[,1] # bars for females
Cmd> heights[vector(2,5,8,11)] <- condOnSex[,2] # bars for males
Cmd> bargraph(edges,heights,xmin:0,xmax:5,ymax:1.1*max(heights),\
xticks:run(4),xticklabs:getlabels(condOnSex,1),show:F) #don't display yet
Cmd> addstrings(vector(5,11,17,23,7,13,19,25)/6,\
vector(condOnSex[,1],condOnSex[,2])+1,\
strings:vector(rep("F",4),rep("M",4)),show:F)#don't display
addstrings() adds symbols "F" and "M" one percentage point above the bars. In any given situation when you want to do something like this, it may take a little experimentation to find out how far above the bars to put the symbols. Now it's time to display the graph, adding axis labels and a title.
Cmd> showplot(title:"Percents of majors by gender",\ ylab:"Percents",xlab:"Major") # now display with labels![]()
Updated Mon Jan 27 08:16:46 CST 2003