A Few Statistical Packages for Regression Analysis
Virtually all of applied statistics requires the use of a computer program. Usually, special purpose statistical packages are to be preferred, because programs designed for another purpose, such as database management systems or spreadsheets, will be inadequate for comprehensive data analysis. We describe below five of the most widely used statistical packages, and one package written especially for regression. Four of the six are commercial products. Three or four of them are used primarily through a graphical user interface, while with the others typing commands or writing programs is more common. All of them can be used for many of the computations described in Applied Linear Regression, 3rd Edition.
If you have never used a statistical package before, check the web sites for the products described below before deciding on the one you want to learn. If you are impatient, you can skip to our advice.
- R
- R is a "command line" statistical package. The user
types a statement requesting a computation or a graph, and it is
executed immediately. R is also a programming language that
allows automating repetitive tasks. Nearly everything in Applied Linear Regression, third edition
can be done using R, and indeed almost all the computations
in the book were done using R. R is a favorite program among
academic statisticians because it is free, works on Windows,
Linux/Unix and Macintosh, and can be used in a great variety of
problems. There is also a large literature developing on using R
for statistical problems. The main web site for R is
www.r-project.org. From this web site you can get to the
page for downloading R by clicking on the link for CRAN, which is an acronym for the Comprehensive R Archive Network, or go
directly to cran.r-project.org; or in the US
cran.us.r-project.org.
Documentation in PDF files for R is included with the program. These files
can also be downloaded
from the R web site. Several books are also available.
Strongly recommended books include Fox (2002), which provides a fairly
gentle introduction to R. Some of the functions discussed by Fox
that are not in the base R program are used in Applied
Linear Regression. A more comprehensive introduction to R is
Venables and Ripley (2002). Venables and Ripley uses more
computerese than does Fox's book, but its coverage is greater and
you will be able to use this book for more than linear regression.
Other books on R include Maindonald and Braun (2003), Venables and
Smith (2002), and Dalgaard (2002). We used R Version 2.0.0 on
Windows and Linux when we wrote the primers available on this
web site. A new version of R is released twice a year, so the
version you use will probably be newer. If you have a fast
internet connection, downloading and upgrading R is easy, and you
should do it regularly.
- S-Plus
- S-Plus is very similar to R, and most commands that work in R also
work in S-Plus. Both are variants of a statistical language
called "S" that was written at Bell Laboratories before the
breakup of AT&T. Unlike R, S-Plus is a commercial product, which
means that it is not free, although favorable licensing is
available for students. For more information, see the web site of
the publisher, www.insightful.com/products/splus.
In addition to the command line interface, S-Plus also has a "GUI," or graphical
user interface that allows many modelling tasks using menu items and dialogs. We don't have much experience with the GUI.
As with R,
we have prepared a library of functions that extends the capability of
S-Plus so
that it can do virtually everything that is discussed in Applied Linear
Regression, 3rd ed. The key difference between R and S-Plus is the
level of support. R is free, so all support is through written materials.
S-Plus is a commercial product, with a well-respected company providing user
support.
If you are using S-Plus on a Windows machine, you probably have
the manuals that came with the program. If you are using
Linux/Unix, you may not have the manuals, but they are available
on-line as PDF files from
www.insightful.com/support/doc_splus_win.asp. Chambers
and Hastie (1993) provides the basics of fitting models with S
languages like S-Plus and R. For a more general reference, we
again recommend Fox (2002) and Venables and Ripley (2002), as we
did for R.
- SAS
- SAS is the largest and most widely distributed statistical package.
Most users would view SAS as a batch language, meaning that
the user writes a few lines of SAS commands, submits them to an
interpreter to be executed, and then a large amount of output is
returned. For the experienced SAS user, interaction between the user and the
program is very similar to R or S-Plus. SAS also has a GUI that allows access to some SAS commands without writing programs.
The main web site for SAS is www.sas.com, although
support.sas.com
is probably a more reasonable starting
point for students.
SAS is very widely documented, including hundreds of books
available through amazon.com and extensive on-line documentation and help files included with the program.
Muller and Fetterman (2003) is dedicated particularly to
regression.
- JMP
- JMP is another product of SAS Institute, and was designed around a
graphical user interface, so most interaction between
the user and the program is through selecting items from menus and
from dialogs. Students may purchase a yearly license for JMP for a modest price.
The JMP website is
www.jmp.com. The JMP primer
available for Applied Linear Regression, third edition is based on
version 5 of JMP; version 7 was released in 2007.
Because of its well-designed GUI, JMP is probably the easiest program to get started with, particularly if you are learning on your own. However, some of the useful features of this program are fairly well
hidden, and as you get more experienced you may notice the limitations of
working with a GUI. Several topics covered in the book are simply not
possible with the GUI in JMP, although some of them could be programmed in JMP's
scripting language.
The book by Sall, Creighton and
Lehman (2005) can be viewed as both an elementary
statistics textbook and a manual for JMP. JMP
includes very extensive manuals.
The book by Freund, Littell and Creighton (2003) discusses
JMP specifically for regression.
- SPSS
- SPSS has evolved from a batch program to have a graphical user
interface. Like SAS, SPSS has many sophisticated tools for data
base management. A student version is available. The web site for
SPSS is www.spss.com. SPSS offers hundreds of pages of
documentation, including SPSS (2003), with Chapter 26 dedicated to
regression models. In mid-2004, amazon.com listed more than two
thousand books for which "SPSS" was a keyword. The SPSS primer we have prepared uses only the GUI, not the batch facilities of the program.
- Other programs
- This is hardly an exhaustive list of programs that could be used
for regression analysis. Many other programs can be used for linear regression analysis, some of them as good as or better than the programs mentioned above. One program missing from the list of programs for regression
analysis is Microsoft's spreadsheet program Excel. While a
few of the methods described in the book can be computed or
graphed in Excel, most would require great endurance, patience and programming
on the part of the user. There are many add-on statistics
programs for Excel, and one of these may be useful for
comprehensive regression analysis; we don't know. If something
works for you, please let us know!
- Arc
- A final package for regression that we should mention is called
Arc. Arc is free software, available from
www.stat.umn.edu/arc. Like JMP and SPSS, it has a graphical user interface, so all
computations in Applied Linear Regression can be done either
via point-and-click or by entries in dialogs. Unlike JMP and SPSS,
Arc is not a general-purpose statistics package, but is
specifically designed for regression analysis. Arc also includes
access to a computer language, although the language, Lisp, is
considerably harder to learn than the S, SAS and SPSS languages. The use of Arc
is described in Cook and Weisberg (1999); see
www.stat.umn.edu/arc.
Advice to the bleary-eyed
In a perfect world, you could separate learning the ideas and methods of regression from learning to use a statistical package. Any of R, S-Plus, JMP, SPSS or SAS can get in the way and force you to focus on computing issues rather than on statistics. If you can tolerate a command line interface, we recommend using R or S-Plus because, with the use of the functions we have written, these two programs let you do almost everything in the book. However, many people just can't stand the command line interface. It requires lots of memorization of commands and sometimes obscure syntax, and can bring anyone to the point of tears with unexpected behavior and inconsistencies. Our experience is that people quickly forget how to use a command line program, so if you are learning regression now, but don't expect to use it until later, command line programs might not work for you. The scripts for R and S-Plus that we have prepared, however, may allow you to relearn a program quickly.
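To give a sense of what a command line session looks like, here is a minimal sketch in R. It uses only functions from base R and the cars data set that is distributed with R. The data set and the particular sequence of steps are chosen here purely for illustration and do not come from the book or from the function library mentioned above.

```r
# 'cars' ships with base R: speed (mph) and stopping distance (ft) for 50 cars.
# It is used here only as a convenient illustration.
fit <- lm(dist ~ speed, data = cars)   # fit a simple linear regression

summary(fit)                           # coefficient table, R-squared, and so on

# Look at the data with the fitted line before deciding what to do next.
plot(dist ~ speed, data = cars)
abline(fit)

# Residuals against fitted values, a common next diagnostic plot.
plot(fitted(fit), residuals(fit))
```

Each statement is typed at the prompt and executed immediately, so you can inspect one result before choosing the next step. Saving the same lines in a file gives you a script to return to later.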
JMP and SPSS (and, we think, S-Plus and SAS with the GUI) are considerably easier to use, but limited in that they easily do only what the writers of the program thought was important, giving only a subset of the methodology discussed in the book. If the thought of computer programming gives you a headache, you might be happier using one of these programs.
Most SAS users (and many SPSS users) write, submit, and then save programs, so when you do similar computations later, you have a script that you can follow. One of the themes of Applied Linear Regression is that regression is fundamentally unscripted, and requires many interactive steps in which the analyst looks at graphs and statistics before deciding on the next step. SAS and SPSS
programs do not encourage this sort of interaction, and so we think
SAS and SPSS are the most difficult of these programs for the nonexpert
to use. SAS and SPSS experts may not share this point of view.
Return to Applied Linear Regression
Home
- Chambers, J. and Hastie, T. (eds.) (1993). Statistical
Models in S. Boca Raton, FL: CRC Press.
- Cook, R. D. and Weisberg, S. (1999). Applied Regression
Including Computing and Graphics. New York: Wiley.
- Dalgaard, Peter (2002). Introductory Statistics with R. New
York: Springer.
- Fox, John (2002). An R and S-Plus Companion to Applied
Regression. Thousand Oaks, CA: Sage.
- Muller, K. and Fetterman, B. (2003). Regression and ANOVA:
An Integrated Approach using SAS Software. Cary, NC: SAS
Institute, Inc., and New York: Wiley.
- Freund, R., Littell, R. and Creighton, L. (2003). Regression
Using JMP. Cary, NC: SAS Institute, Inc., and New York: Wiley.
- Maindonald, J. and Braun, J. (2003). Data Analysis and
Graphics Using R. Cambridge: Cambridge University Press.
- Sall, J., Creighton, L. and Lehman, A. (2005). JMP Start
Statistics, Third Edition. Cary, NC: SAS Institute, and Pacific
Grove, CA: Duxbury.
- SPSS (2003). SPSS Base 12.0 User's Guide. Chicago, IL:
SPSS, Inc.
- Venables, W. and Ripley, B. (2002). Modern Applied
Statistics with S, 4th edition. New York: Springer.
- Venables, W. and Smith, D. (2002). An Introduction to R.
Network Theory, Ltd. This is included with the distribution of R.
With the Windows or Macintosh versions, select Help->Manuals->An
Introduction to R. On Linux/Unix, type R RHOME to find out the
location of R on your system; on our system it is /APPS/8.2/lib/R.
This manual is then located in
/APPS/8.2/lib/R/doc/manual/R-intro.pdf.
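The location of the manual can also be found from inside R itself. The following is a minimal sketch using the base R function R.home(). It assumes the PDF manuals were installed along with R, which is not the case for every installation, so the last line simply reports whether the file is present.

```r
# R.home() reports the installation directory, like 'R RHOME' at the shell.
R.home()

# Build the usual path to the PDF copy of "An Introduction to R".
# Some installations do not include the PDF manuals, so check before opening.
intro <- file.path(R.home("doc"), "manual", "R-intro.pdf")
file.exists(intro)
```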
S Weisberg
2007-08-07