University of Minnesota Twin Cities main page
School of Statistics

Statistics 5021

Data Sets For Use in Stat 5021


Announcements

Data set ex07_131.txt erroneously contained data for Table 7.1 as well as Ex. 7.131. It has been corrected and file ta07_001.txt has been added.

I have added data file eg12_010.txt to the Chapter 12 data sets (not to the compressed archives). This is made-up data designed to match as closely as possible the data used in Examples 12.10, 12.11, 12.14, 12.17 and 12.19. The means and standard deviations are identical to those in Figure 12.11.


Data Sets from Introduction to the Practice of Statistics

Computer-readable data files come on the CD ROM that accompanies Moore and McCabe. A somewhat more complete set of files is available from the text's web site in the form of zip archive files for Windows and StuffIt archive files for Macintosh. Unfortunately, none of the available files is in the format that is most convenient to read into MacAnova.

I retrieved the complete set of files from the text's web site and have edited them for use in the class. Each file can be read by readdata(), which automatically creates named MacAnova variables from information in the file.

The edited files are self-documenting in that I have added the page number where the example, exercise or table appears and given a brief description of the variables. This information is printed when a data set is read by readdata() in MacAnova. You should always read this information carefully when using a data set. See below for examples of how the files can be read in MacAnova.

You can download all the data sets in compressed archive files: IPS4Data.sit for Macintosh and IPS4Data.zip for Windows.

Or, if you prefer, you can download individual files from this page.

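The data files follow a simple naming scheme: a two-letter prefix (eg for an example, ex for an exercise, ta for a table), the two-digit chapter number, an underscore and the three-digit number of the item within the chapter. For instance, ex02_121.txt holds the data for Exercise 2.121 and ta07_001.txt holds the data for Table 7.1. The Appendix files have descriptive names instead.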
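As a rough illustration, written in Python rather than MacAnova, a name such as ex02_121.txt can be split into its parts as follows. The meanings assigned to eg, ex and ta are inferred from the announcements above, the optional trailing letter is inferred from names such as ta02_011a.txt, and parse_name and KINDS are names made up for this sketch.

```python
import re

# Hypothetical mapping, inferred from the announcements
# (ex07_131.txt -> Ex. 7.131, ta07_001.txt -> Table 7.1, eg12_010.txt -> Example 12.10).
KINDS = {"eg": "example", "ex": "exercise", "ta": "table"}

# prefix, two-digit chapter, underscore, three-digit number, optional letter suffix
NAME_RE = re.compile(r"^(eg|ex|ta)(\d{2})_(\d{3})([a-z]?)\.txt$")


def parse_name(filename):
    """Return (kind, chapter, number, suffix), or None if the name does not fit."""
    m = NAME_RE.match(filename)
    if m is None:
        return None  # e.g. Appendix files such as cheese.txt
    prefix, chapter, number, suffix = m.groups()
    return KINDS[prefix], int(chapter), int(number), suffix


if __name__ == "__main__":
    for name in ["ex02_121.txt", "ta07_001.txt", "ta02_011a.txt", "cheese.txt"]:
        print(name, "->", parse_name(name))
```

Names that do not fit the pattern return None rather than raising an error, since the Appendix files are named differently.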

Some data sets appear in more than one file. This can occur when an exercise uses data in a numbered table or different exercises use the same data.

Many but not all of the original data sets include a column of case numbers which serve little if any purpose. In most cases these have been retained in the edited versions, but in a few cases I deleted them. Often they are read as variable id.

In other cases, the original files had data for different groups in different columns. I generally reformatted these so that all the values for a variable occur in one column, with an additional column indicating the group number. See an example below of how to extract the data for just one group for separate analysis.

Examples of using these data files in MacAnova

Here is what file ex02_121.txt looks like.

# File ex02_121.txt from publisher's web site
# Data for Exercise 2.121, p. 216 of IPS4
# Col. 1: len = length of beam made from wood flakes (inches)
# Col. 2: strength = strength of beam (lbs/in^2)
len strength
  5   446   
  6   371   
  7   334   
  8   296   
  9   249   
 10   254   
 11   244   
 12   246   
 13   239   
 14   234   

Here we use readdata() (on a Macintosh) to read the columns in the data file into MacAnova vectors len and strength, compute some simple descriptive statistics and make a stemplot of strength.

Cmd> readdata("")
# File ex02_121.txt from publisher's web site
# Data for Exercise 2.121, p. 216 of IPS4
# Col. 1: len = length of beam made from wood flakes (inches)
# Col. 2: strength = strength of beam (lbs/in^2)
Read from file "TP1:Stat5021:Data:Ch02:ex02_121.txt"
Column 1 saved as REAL vector len 
Column 2 saved as REAL vector strength 

Cmd> list(len,strength) # see what we've got
len             REAL   10   
strength        REAL   10   

Cmd> describe(hconcat(len,strength),mean:T,median:T,stddev:T,iqr:T)
component: median
(1)         9.5       251.5
component: mean
(1)         9.5       291.3
component: stddev
(1)      3.0277      71.195
component: iqr
(1)           5          90

Cmd> stemleaf(strength)
    5    2*|33444
    5    2.|59
    3    3*|3
    2    3.|7
    1    4*|4

         1*|1 represents 110  Leaf digit unit = 10
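To check the numbers above outside MacAnova, the following sketch, in Python rather than MacAnova, parses the same file layout ('#' comment lines, one line of variable names, then rows of numbers) and recomputes the mean, median and standard deviation. The function read_columns and the constant FILE_TEXT are names invented for this illustration; the parser mimics only the layout visible in the listing and is not a reimplementation of readdata(). The interquartile range is left out because quartile conventions differ between packages.

```python
import statistics

# Contents of ex02_121.txt as listed on this page, embedded so the sketch is self-contained.
FILE_TEXT = """\
# File ex02_121.txt from publisher's web site
# Data for Exercise 2.121, p. 216 of IPS4
# Col. 1: len = length of beam made from wood flakes (inches)
# Col. 2: strength = strength of beam (lbs/in^2)
len strength
  5   446
  6   371
  7   334
  8   296
  9   249
 10   254
 11   244
 12   246
 13   239
 14   234
"""


def read_columns(text):
    """Parse '#' comment lines, one line of names, then numeric rows."""
    comments, names, columns = [], None, {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            comments.append(line)  # documentation lines
        elif names is None:
            names = line.split()  # first non-comment line holds the names
            columns = {name: [] for name in names}
        else:
            for name, field in zip(names, line.split()):
                columns[name].append(float(field))
    return comments, columns


comments, data = read_columns(FILE_TEXT)
for name, values in data.items():
    # statistics.stdev is the sample (n - 1) standard deviation
    print(name, statistics.mean(values), statistics.median(values),
          round(statistics.stdev(values), 3))
```

The printed values should agree with the describe() output shown above.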

Here is an example of a data file where the data for two groups have been combined into single columns. In the book, data for women and men are in two tables. In the original data file from the CD ROM, there was a single column of speeds (which are the same for both men and women) and separate columns for men's and women's stride rates. In the edited file there is a column for gender, which codes F and M as 1 and 2 respectively, a column for speed and a column for stride rate.

Cmd> readdata("")
# File ex02_120.txt from publisher's web site
# Data for Exercise 2.120, p. 215 of IPS4
# Col. 1: gender (factor, 1 = F, 2 = M)
# Col. 2: speed = running speed (ft/sec)
# Col. 3: stride = stride rate (steps/sec)
Read from file "TP1:Stat5021:Data:Ch02:ex02_120.txt"
Column 1 saved as factor gender 
Column 2 saved as REAL vector speed 
Column 3 saved as REAL vector stride 

Cmd> gender
           F           F           F           F           F
           F           F           M           M           M
           M           M           M           M
           1           1           1           1           1
           1           1           2           2           2
           2           2           2           2

Cmd> # F's and M's are case labels; actual values are 1 or 2

Cmd> speed_male <- speed[gender==2] #select men's speed

Cmd> speed_female <- speed[gender==1] #select women's speed

The usage speed[gender==2] selects all values of variable speed for which variable gender is 2, that is, just the men's data. This is an example of the use of subscripts in MacAnova.

The left pointing arrow <- is the "assignment operator". It creates a variable with the name on its left side and fills it with whatever is on its right side.

Cmd> speed_male # here are the speeds for men
(1)       15.86       16.88        17.5       18.62       19.97
(6)       21.06       22.11

Cmd> speed_female # same as speed_male
(1)       15.86       16.88        17.5       18.62       19.97
(6)       21.06       22.11

Cmd> stride_male <- stride[gender==2] #select men's stride

Cmd> stride_fem <- stride[gender==1] #select women's stride

Cmd> stride_male # men's stride rates
(1)        2.92        2.98        3.03        3.11        3.22
(6)        3.31        3.41

Cmd> describe(stride_male, mean:T,median:T) #mean and median for men
component: median
(1)        3.11
component: mean
(1)        3.14

Cmd> stride_fem # women's stride rates
(1)        3.05        3.12        3.17        3.25        3.36
(6)        3.46        3.55

Cmd> describe(stride_fem, mean:T,median:T) #mean and median for women
component: median
(1)        3.25
component: mean
(1)        3.28
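The same select-by-group idea can be sketched outside MacAnova. The Python example below is only an illustration: select is a helper invented here, and the values are transcribed from the session above, with the first seven cases being women (gender 1) and the last seven being men (gender 2).

```python
import statistics

# Values transcribed from the MacAnova session (ex02_120.txt):
# the first 7 cases are women (gender 1), the last 7 are men (gender 2).
gender = [1] * 7 + [2] * 7
speeds = [15.86, 16.88, 17.5, 18.62, 19.97, 21.06, 22.11]
speed = speeds + speeds  # the same speeds were used for both groups
stride = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46, 3.55,
          2.92, 2.98, 3.03, 3.11, 3.22, 3.31, 3.41]


def select(values, groups, wanted):
    """Keep the values whose group code equals `wanted`, like values[groups==wanted]."""
    return [v for v, g in zip(values, groups) if g == wanted]


stride_male = select(stride, gender, 2)
stride_fem = select(stride, gender, 1)
print(statistics.median(stride_male), round(statistics.mean(stride_male), 2))
print(statistics.median(stride_fem), round(statistics.mean(stride_fem), 2))
```

Keeping every variable in one column with a separate group column means that a single helper like this works for any variable and any group code.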

Another way to use a data set is to copy it to the clipboard and then use clipreaddata() to read it. Click on ex02_121.txt, choose Select All on the Edit menu, and then Copy on the Edit menu. Then switch over to the MacAnova window and type clipreaddata().

Cmd> clipreaddata() # data set was copied to clipboard
# File ex02_121.txt from publisher's web site
# Data for Exercise 2.121, p. 216 of IPS4
# Col. 1: len = length of beam made from wood flakes (inches)
# Col. 2: strength = strength of beam (lbs/in^2)
Column 1 saved as REAL vector len 
Column 2 saved as REAL vector strength 

Cmd> plot(len,strength,title:"Strength vs length for Exercise 2.121 data",\
xlab:"Beam length",ylab:"Beam Strength")
Plot of strength vs length

Individual Files

These are grouped by chapter: Chapter 1, Chapter 2, Chapter 4, Chapter 6, Chapter 7, Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12, Chapter 13 and Appendix.

Chapter 1 Files: eg01_004.txt (196 bytes), eg01_005.txt (498 bytes),
eg01_011.txt (1158 bytes), eg01_018.txt (185 bytes),
ex01_021.txt (630 bytes), ex01_023.txt (516 bytes),
ex01_024.txt (428 bytes), ex01_025.txt (743 bytes),
ex01_026.txt (331 bytes), ex01_027.txt (336 bytes),
ex01_037.txt (609 bytes), ex01_038.txt (443 bytes),
ex01_040.txt (1717 bytes), ex01_041.txt (582 bytes),
ex01_045.txt (297 bytes), ex01_049.txt (689 bytes),
ex01_054.txt (388 bytes), ex01_122a.txt (395 bytes),
ex01_122b.txt (505 bytes), ex01_123.txt (1643 bytes),
ex01_124.txt (365 bytes), ex01_134.txt (783 bytes),
ta01_001.txt (472 bytes), ta01_002.txt (839 bytes),
ta01_003.txt (857 bytes), ta01_004.txt (763 bytes),
ta01_005.txt (545 bytes), ta01_006.txt (2354 bytes),
ta01_007.txt (852 bytes), ta01_008.txt (819 bytes),
ta01_009.txt (1451 bytes), ta01_010.txt (2665 bytes),
ta01_011.txt (455 bytes), ta01_012.txt (1579 bytes)
Chapter 2 Files: eg02_017.txt (377 bytes), eg02_030.txt (5329 bytes),
ex02_005.txt (288 bytes), ex02_010.txt (356 bytes),
ex02_011.txt (628 bytes), ex02_016.txt (452 bytes),
ex02_017.txt (687 bytes), ex02_018.txt (590 bytes),
ex02_021.txt (214 bytes), ex02_027.txt (686 bytes),
ex02_040.txt (357 bytes), ex02_054.txt (295 bytes),
ex02_055.txt (4733 bytes), ex02_058.txt (262 bytes),
ex02_059.txt (311 bytes), ex02_061.txt (341 bytes),
ex02_062.txt (462 bytes), ex02_065.txt (294 bytes),
ex02_092.txt (696 bytes), ex02_093.txt (341 bytes),
ex02_095.txt (457 bytes), ex02_101.txt (489 bytes),
ex02_102.txt (711 bytes), ex02_109.txt (272 bytes),
ex02_112.txt (383 bytes), ex02_119.txt (405 bytes),
ex02_120.txt (499 bytes), ex02_121.txt (343 bytes),
ex02_127.txt (503 bytes), ex02_129.txt (360 bytes),
ex02_130.txt (386 bytes), ex02_131.txt (1424 bytes),
ta02_001.txt (600 bytes), ta02_002.txt (818 bytes),
ta02_003.txt (486 bytes), ta02_004.txt (1352 bytes),
ta02_005.txt (931 bytes), ta02_006.txt (1169 bytes),
ta02_007.txt (289 bytes), ta02_008.txt (799 bytes),
ta02_009.txt (534 bytes), ta02_010.txt (1177 bytes),
ta02_011.txt (857 bytes), ta02_011a.txt (282 bytes),
ta02_011b.txt (282 bytes), ta02_011c.txt (282 bytes),
ta02_011d.txt (282 bytes), ta02_012.txt (527 bytes),
ta02_013.txt (1223 bytes), ta02_014.txt (685 bytes),
ta02_015.txt (623 bytes), ta02_016.txt (804 bytes),
ta02_017.txt (534 bytes)
Chapter 4 Files: eg04_020.txt (289 bytes), ex04_043.txt (454 bytes),
ex04_044.txt (422 bytes), ex04_047.txt (231 bytes)
Chapter 6 Files: eg06_002.txt (179 bytes), eg06_006.txt (179 bytes),
eg06_013.txt (175 bytes), eg06_016.txt (193 bytes),
ex06_006.txt (489 bytes), ex06_007.txt (332 bytes),
ex06_013.txt (669 bytes), ex06_049.txt (670 bytes),
ex06_063.txt (342 bytes), ex06_064.txt (490 bytes),
ex06_093.txt (283 bytes), ex06_095.txt (191 bytes)
Chapter 7 Files: eg07_001.txt (282 bytes), eg07_017.txt (470 bytes),
ex07_002.txt (286 bytes), ex07_004.txt (286 bytes),
ex07_008.txt (373 bytes), ex07_009.txt (790 bytes),
ex07_015.txt (222 bytes), ex07_016.txt (639 bytes),
ex07_020.txt (684 bytes), ex07_029.txt (519 bytes),
ex07_031.txt (329 bytes), ex07_034.txt (262 bytes),
ex07_037.txt (331 bytes), ex07_039.txt (683 bytes),
ex07_040.txt (708 bytes), ex07_042.txt (638 bytes),
ex07_051.txt (253 bytes), ex07_058.txt (524 bytes),
ex07_060.txt (284 bytes), ex07_062.txt (284 bytes),
ex07_065.txt (1035 bytes), ex07_069.txt (796 bytes),
ex07_076.txt (1027 bytes), ex07_103.txt (497 bytes),
ex07_111.txt (285 bytes), ex07_131.txt (901 bytes),
ta07_001.txt (877 bytes), ta07_002.txt (1317 bytes),
ta07_003.txt (1005 bytes), ta07_004.txt (993 bytes),
ta07_005.txt (1945 bytes)
Chapter 8 Files: ex08_066.txt (600 bytes)
Chapter 9 Files: ex09_001.txt (521 bytes), ex09_004.txt (538 bytes),
ex09_011.txt (413 bytes), ex09_012.txt (389 bytes),
ex09_013.txt (382 bytes), ex09_014.txt (460 bytes),
ex09_015.txt (554 bytes), ex09_016.txt (415 bytes),
ex09_017.txt (507 bytes), ex09_019.txt (878 bytes),
ex09_020.txt (415 bytes), ex09_022.txt (362 bytes),
ex09_023.txt (414 bytes), ex09_024.txt (323 bytes),
ex09_025.txt (410 bytes), ex09_028.txt (436 bytes),
ex09_029.txt (439 bytes), ex09_031.txt (409 bytes),
ex09_032.txt (366 bytes), ex09_033.txt (375 bytes),
ex09_035.txt (375 bytes), ex09_038.txt (379 bytes),
ex09_041.txt (450 bytes), ex09_042.txt (585 bytes),
ex09_046.txt (535 bytes), ex09_047.txt (571 bytes),
ex09_049.txt (366 bytes), ex09_050.txt (446 bytes),
ex09_052.txt (1051 bytes)
Chapter 10 Files: eg10_001.txt (1820 bytes), ex10_003.txt (603 bytes),
ex10_004.txt (470 bytes), ex10_005.txt (346 bytes),
ex10_007.txt (664 bytes), ex10_009.txt (1738 bytes),
ex10_011.txt (788 bytes), ex10_012.txt (391 bytes),
ex10_013.txt (281 bytes), ex10_020.txt (491 bytes),
ex10_023.txt (755 bytes), ex10_025.txt (1044 bytes),
ex10_035.txt (523 bytes), ex10_036.txt (405 bytes),
ex10_037.txt (286 bytes), ex10_040.txt (1744 bytes),
ex10_041.txt (1094 bytes), ex10_044.txt (520 bytes),
ex10_047.txt (347 bytes), ex10_048.txt (603 bytes),
ta10_001.txt (1732 bytes), ta10_002.txt (990 bytes),
ta10_003.txt (743 bytes), ta10_004.txt (1027 bytes)
Chapter 11 Files: ex11_008.txt (675 bytes), ex11_016.txt (2405 bytes),
ex11_051.txt (2026 bytes)
Chapter 12 Files: eg12_010.txt (1518 bytes), ex12_011.txt (381 bytes),
ex12_013.txt (530 bytes), ex12_016.txt (3340 bytes),
ex12_018.txt (1751 bytes), ex12_021.txt (911 bytes),
ex12_023.txt (974 bytes), ex12_025.txt (594 bytes),
ex12_027.txt (718 bytes), ex12_030.txt (665 bytes),
ta12_001.txt (1286 bytes), ta12_002.txt (2281 bytes),
ta12_003.txt (3328 bytes), ta12_004.txt (1738 bytes)
Chapter 13 Files: eg13_008.txt (23517 bytes), ex13_007.txt (554 bytes),
ex13_008.txt (265 bytes), ex13_009.txt (344 bytes),
ex13_010.txt (382 bytes), ex13_011.txt (394 bytes),
ex13_012.txt (418 bytes), ex13_013.txt (1311 bytes),
ex13_014.txt (1764 bytes), ex13_016.txt (2145 bytes),
ta13_001.txt (1239 bytes), ta13_002.txt (5103 bytes)
Appendix Files: cheese.txt (1421 bytes), concept.txt (4786 bytes),
csdata.txt (8500 bytes), dandruff.txt (6667 bytes),
individuals.txt (1453717 bytes), majors.txt (10332 bytes),
plants1.txt (6920 bytes), plants2.txt (4324 bytes),
reading.txt (2759 bytes)

Other Data Sets

Sample of size 1000 from Appendix data set individuals.txt (1453717 bytes): indiv1000.txt (29378 bytes)
Blood flow data from Lecture 8: bloodflow.txt
Leprosy data from Lecture 8: leprosy.txt
Nickel/iron ratio data from Lecture 8: ni_fe_ratio.txt
Seal data from Lecture 8: sealdata.txt

C. Bingham kb@umn.edu
Updated Fri May 2 08:23:27 CDT 2003