Variables and Data Entry in Rweb (Stat 3011)

Variables

Numbers and other data can be stored in variables for later use. Variable names in R are strings of letters, digits, and dots beginning with a letter, such as x, x2, and a.very.long.variable.name. Case matters: foo, Foo, and FOO are names of different variables.
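As a small illustration of the naming rules, the following sketch creates three variables whose names differ only in case, plus one with a long dotted name. The values 1, 2, and 3 are arbitrary and chosen only for the example.

```r
# Three distinct variables whose names differ only in case
foo <- 1
Foo <- 2
FOO <- 3

# A name may contain letters, digits, and dots
a.very.long.variable.name <- foo + Foo + FOO

# Printing each one shows that they hold different values
foo
Foo
FOO
a.very.long.variable.name
```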

Assignment

To assign a value to a variable, use the assignment operator <- like this

     x <- 23
(Note that this "arrow" is composed of two characters: "less than" and "hyphen".)

If you submit this to Rweb, nothing appears to happen. Rweb does assign the value to that variable, but doesn't do anything with it. Moreover, the Rweb server doesn't remember anything between submissions, so it doesn't even remember this assignment.

But if you follow the assignment with some use of the variable, you can see the effect of the assignment. For example, if you submit the commands

     x <- 23
     2 * x + 3
Rweb will print the result 49.

If the expression that follows the assignment is just a variable name by itself, Rweb prints the value of that variable. For example, if you submit the commands

     x <- 23
     x
Rweb will print 23.

Vectors

The basic R data type is not a single number, but a "vector", which is what R calls a sequence of numbers. R uses vectors to represent whole data sets. The R function c collects numbers into a single vector object, for example,

     x <- c(2,4,11,17)
creates a vector of length 4.
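
A short sketch of what can be done with such a vector, using the same four numbers. The functions length and mean and the indexing with square brackets are standard R; which of them are useful for a given problem depends on the problem.

```r
x <- c(2, 4, 11, 17)

length(x)    # number of elements: 4
x[3]         # the third element: 11
2 * x + 3    # arithmetic applies to every element: 7 11 25 37
mean(x)      # the average of the elements: 8.5
```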

Data Frames

Often a data set consists of several vectors of the same length, each holding measurements of a different variable on the same individuals. R has a data structure for this situation, called a data frame. If x, y, and z are vectors of the same length, then

     fred <- data.frame(x, y, z)
produces a data frame containing these vectors.
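
A minimal sketch of building and inspecting such a data frame. The numbers below are made up purely for illustration and do not come from the textbook.

```r
# Made-up measurements on four individuals (illustrative values only)
x <- c(2, 4, 11, 17)
y <- c(1.5, 3.2, 8.9, 14.1)
z <- c(10, 20, 30, 40)

# Combine the three vectors into one data frame, one column per vector
fred <- data.frame(x, y, z)

dim(fred)    # number of rows (individuals) and columns (variables)
fred$y       # extract one variable from the frame as a vector
fred         # print the whole data frame
```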

Using a Dataset URL

For our purposes the most important use of data frames is reading in a dataset from a file on a web server. This is done using the "Dataset URL" area just below the text area where you submit R commands to the Rweb server.

Rweb always treats this URL as pointing to a plain text file containing a data frame. It reads the file in and makes all the variables in the frame available for calculations in the submission.
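
For readers curious about what this looks like in ordinary R, here is a rough sketch. It is not a description of how Rweb works internally. It assumes a file layout with the variable names on the first line and columns separated by spaces, and it uses a made-up stand-in for the file contents (the name file.text and all of the values are hypothetical).

```r
# Stand-in for the contents of a data file.
# Made-up values, NOT the textbook data.
# Assumption: the first line holds the variable names.
file.text <- "x y
1 2.1
2 3.9
3 6.2
4 7.8"

# Read the text into a data frame; with a real file, a file name or URL
# would be given as the first argument instead of text =
dat <- read.table(text = file.text, header = TRUE)

# with() makes the variables in the frame available by name
# for the duration of one expression
with(dat, mean(y))
with(dat, x + y)
```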

For an example dataset we use the data from Example 4.2 on page 103 of the textbook. To use this data, type

     http://superior.stat.umn.edu/~charlie/3011/te0402.dat
in the "Dataset URL" area. This dataset contains two variables, x and y. If you use this dataset URL and submit the R command

     plot(x, y)
you get the scatter plot of the data. It should look just like the left-hand panel of Figure 4.3 in the textbook, except for the labels.

Finding the Names in a Dataset

In order to use a data set read in over the web, you need to know what the variable names are. Typically, we will just use x for a single variable, and x and y if there are two. In a more complicated data set, you will have to look and see. For example, the data for Exercise 4.7 in the textbook is in the file

     http://superior.stat.umn.edu/~charlie/3011/ex0407.dat
This data set has four variables, two x,y pairs for two different scatter plots. We can't call both of the "x" variables x, although that's what the textbook does. That would confuse the computer. So we called them x.a, y.a, x.n, and y.n, the "a" and "n" suffixes relating to the two different parts of the data, which are "Angustifolia" and "Nacional". There is no way you can just guess what names we used. How do you find them? There are two ways.