A Few Statistical Packages for Regression Analysis

Virtually all of applied statistics requires the use of a computer program. Special-purpose statistical packages are usually to be preferred, because programs designed for another purpose, such as database managers or spreadsheets, are inadequate for comprehensive data analysis. We describe below five of the most widely used statistical packages, and one package written especially for regression. Four of the six are commercial products. Three or four of them are used primarily through a graphical user interface, while with the others typing commands or writing programs is more common. All of them can be used for many of the computations described in Applied Linear Regression, 3rd Edition.

If you have never used a statistical package before, check the web sites for the products described below before deciding on the one you want to learn. If you are impatient, you can skip to our advice.

R
R is a "command line" statistical package. The user types a statement requesting a computation or a graph, and it is executed immediately. R is also a programming language that allows automating repetitive tasks. Nearly everything in Applied Linear Regression, third edition can be done using R, and indeed almost all the computations in the book were done using R. R is a favorite program among academic statisticians because it is free, works on Windows, Linux/Unix and Macintosh, and can be used in a great variety of problems. There is also a large and growing literature on using R for statistical problems. The main web site for R is www.r-project.org. From this web site you can get to the page for downloading R by clicking on the link for CRAN, which is an acronym for the Comprehensive R Archive Network, or go directly to cran.r-project.org; or in the US cran.us.r-project.org.

Documentation in PDF files for R is included with the program. These files can also be downloaded from the R web site. Several books are also available. Strongly recommended books include Fox (2002), which provides a fairly gentle introduction to R. Some of the functions discussed by Fox that are not in the base R program are used in Applied Linear Regression. A more comprehensive introduction to R is Venables and Ripley (2002). Venables and Ripley uses more computerese than does Fox's book, but its coverage is greater and you will be able to use this book for more than linear regression. Other books on R include Maindonald and Braun (2002), Venables and Smith (2002), and Dalgaard (2002). We used R Version 2.0.0 on Windows and Linux when we wrote the primers available on this web site. A new version of R is released twice a year, so the version you use will probably be newer. If you have a fast internet connection, downloading and upgrading R is easy, and you should do it regularly.

S-Plus
S-Plus is very similar to R, and most commands that work in R also work in S-Plus. Both are variants of a statistical language called "S" that was written at Bell Laboratories before the breakup of AT&T. Unlike R, S-Plus is a commercial product, which means that it is not free, although favorable licensing is available for students. For more information, see the web site of the publisher, www.insightful.com/products/splus. In addition to the command line interface, S-Plus also has a "GUI," or graphical user interface that allows many modelling tasks using menu items and dialogs. We don't have much experience with the GUI.

As with R, we have prepared a library of functions that extends the capability of S-Plus so that it can do virtually everything that is discussed in Applied Linear Regression, 3rd ed. The key difference between R and S-Plus is the level of support. R is free, so all support is through written materials. S-Plus is a commercial product, with a well-respected company providing user support.

If you are using S-Plus on a Windows machine, you probably have the manuals that came with the program. If you are using Linux/Unix, you may not have the manuals, but they are available on-line as PDF files from www.insightful.com/support/doc_splus_win.asp. Chambers and Hastie (1993) provides the basics of fitting models with S languages like S-Plus and R. For a more general reference, we again recommend Fox (2002) and Venables and Ripley (2002), as we did for R.

SAS
SAS is the largest and most widely distributed statistical package. Most users would view SAS as a batch language, meaning that the user writes a few lines of SAS commands, submits them to an interpreter to be executed, and then a large amount of output is returned. For the experienced SAS user, interaction between the user and the program is very similar to R or S-Plus. SAS also has a GUI that allows access to some SAS commands without writing programs. The main web site for SAS is www.sas.com, although support.sas.com is probably a more reasonable starting point for students.

SAS is very widely documented, including hundreds of books available through amazon.com and extensive on-line documentation and help files included with the program. Muller and Fetterman (2003) is dedicated particularly to regression.

JMP
JMP is another product of SAS Institute, and was designed around a graphical user interface, so most interaction between the user and the program is through selecting items from menus and from dialogs. Students may purchase a yearly license for JMP for a modest price. The JMP website is www.jmp.com. The JMP primer available for Applied Linear Regression, third edition is based on version 5 of JMP; version 7 was released in 2007.

Because of its well designed GUI, JMP is probably the easiest program to get started using, particularly if you are learning on your own. However, some of the useful features of this program are fairly well hidden, and as you get more experienced you may notice the limitations of working with a GUI. Several topics covered in the book are simply not possible with the GUI in JMP, although some of them could be programmed in JMP's scripting language.

The book by Sall, Creighton and Lehman (2005) can be viewed as both an elementary statistics textbook and a manual for JMP. JMP also includes very extensive manuals. The book by Freund, Littell and Creighton (2003) discusses JMP specifically for regression.

SPSS
SPSS has evolved from a batch program to have a graphical user interface. Like SAS, SPSS has many sophisticated tools for data base management. A student version is available. The web site for SPSS is www.spss.com. SPSS offers hundreds of pages of documentation, including SPSS (2003), with Chapter 26 dedicated to regression models. In mid-2004, amazon.com listed more than two thousand books for which "SPSS" was a keyword. The SPSS primer we have prepared uses only the GUI, not the batch facilities of the program.

Other programs
This is hardly an exhaustive list of programs that could be used for regression analysis. Many other programs can be used for linear regression, some of them as good as or better than the programs mentioned above. One program missing from the list is Microsoft's spreadsheet program Excel. While a few of the methods described in the book can be computed or graphed in Excel, most would require great endurance, patience and programming on the part of the user. There are many add-on statistics programs for Excel, and one of these may be useful for comprehensive regression analysis; we don't know. If something works for you, please let us know!

Arc
A final package for regression that we should mention is called Arc. Arc is free software, available from www.stat.umn.edu/arc. Like JMP and SPSS, it has a graphical user interface, so all computations in Applied Linear Regression can be done either via point-and-click or by entries in dialogs. Unlike JMP and SPSS, Arc is not a general-purpose statistics package, but is specifically designed for regression analysis. Arc also includes access to a computer language, although the language, Lisp, is considerably harder to learn than the S, SAS and SPSS languages. The use of Arc is described in Cook and Weisberg (1999); see www.stat.umn.edu/arc.

Advice to the bleary-eyed

In a perfect world, you could separate learning the ideas and methods of regression from learning to use a statistical package. Any of R, S-Plus, JMP, SPSS or SAS can get in the way and force you to focus on computing issues rather than on statistics. If you can tolerate a command line interface, we recommend using R or S-Plus because, with the use of the functions we have written, these two programs let you do almost everything in the book. However, many people just can't stand the command line interface. It requires lots of memorization of commands and sometimes obscure syntax, and can bring anyone to the point of tears with unexpected behavior and inconsistencies. Our experience is that people quickly forget how to use a command line program, so if you are learning regression now, but don't expect to use it until later, command line programs might not work for you. The scripts for R and S-Plus that we have prepared, however, may allow you to relearn a program quickly.

JMP and SPSS (and, we think, S-Plus and SAS with the GUI) are considerably easier to use, but limited in that they easily do only what the writers of the program thought was important, giving only a subset of the methodology discussed in the book. If the thought of computer programming gives you a headache, you might be happier using one of these programs.

Most SAS users (and many SPSS users) write, submit, and then save programs, so when you do similar computations later, you have a script that you can follow. One of the themes of Applied Linear Regression is that regression is fundamentally unscripted, and requires many interactive steps in which the analyst looks at graphs and statistics before deciding on the next step. SAS and SPSS programs do not encourage this sort of interaction, and so we think SAS and SPSS are the most difficult of these programs for the nonexpert to use. SAS and SPSS experts may not share this point of view.
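
To make the idea of an interactive step concrete, here is a minimal sketch of one pass through the fit, look, decide cycle. It is written in Python only because the code is short and easy to read; Python is not one of the packages discussed on this page. The data are invented for illustration, and fit_line is a hypothetical helper written for this sketch, not a function from the book or from any package.

```python
import math


def fit_line(x, y):
    """Least squares fit of y = b0 + b1*x.

    Returns (b0, b1, residuals, r_squared).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    syy = sum((yi - ybar) ** 2 for yi in y)
    rss = sum(e ** 2 for e in resid)
    return b0, b1, resid, 1 - rss / syy


# Invented data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.7, 2.6, 4.6, 7.3, 12.4, 19.8, 33.5, 54.1]

# Step 1: fit a straight line, then look at the residuals
b0, b1, resid, r2 = fit_line(x, y)
print("raw fit R^2:", round(r2, 3))
print("residual signs:", "".join("+" if e > 0 else "-" for e in resid))

# Step 2: the residuals are positive at both ends and negative in the
# middle, a curved pattern, so the analyst decides to try log(y)
log_y = [math.log(v) for v in y]
c0, c1, log_resid, log_r2 = fit_line(x, log_y)
print("log fit R^2:", round(log_r2, 3))
```

The choice made in step 2 depends on what the analyst saw in step 1, which is why a script written in advance could not have anticipated it. In practice the look at the residuals would usually be a plot rather than a string of signs.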

Return to Applied Linear Regression Home

References




S Weisberg
2007-08-07