Comparing Means for Two Independent Samples in R

We will use data comparing the lifetimes of generic and brand-name batteries. Independent samples of the two types of batteries were tested by running each battery in a portable CD player until the player stopped, and the elapsed time was recorded. The data file (battery.txt) is a tab-delimited text file, meaning that tab characters separate the columns of variables. (The only reason you need to know that is that the commands for reading different kinds of files are different.) This is a common platform-independent file format for tabular data, and you can read such files in any text editor, such as Notepad, EditPad Lite, or Emacs. You can also read them in R with the read.delim command. You have to tell R what file to read, so type file= followed by the file name in quotes (with the path if the file is not in R's working directory). Alternatively, you can type file=file.choose() and a fairly standard file-choosing dialog box will pop up and allow you to select the file. That is what we did, but the selection process is invisible below.

> read.delim(file=file.choose())
   Times Battery.Type
1  194.0   Brand Name
2  205.5   Brand Name
3  199.2   Brand Name
4  172.4   Brand Name
5  184.0   Brand Name
6  169.5   Brand Name
7  190.7      Generic
8  203.5      Generic
9  203.5      Generic
10 206.5      Generic
11 222.5      Generic
12 209.4      Generic

Note that the lifetimes are in one variable and the type of battery in another. This is standard database format. If you had looked at the original data file in a text editor you might have noted that Battery Type appeared as the label for Column 2. This was changed to Battery.Type. R does not like names with spaces in them, so by default read.delim converts column labels into valid R names, replacing the space with a period. The spaces in "Brand Name" could also cause problems (but didn't here). Generally speaking, if you are setting up your own data, do not use names for files, values, variables, etc., that include spaces.
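To see the renaming in action without the original battery.txt, here is a small sketch that writes a few of the rows listed above to a temporary tab-delimited file and reads them back. The temporary file and the object names (tmp, fixed, raw) are ours, made up for illustration; check.names is a standard argument that read.delim passes through to read.table.

```r
# Write a few rows of the battery data to a temporary tab-delimited file.
# (The temporary file stands in for battery.txt; the values are copied
# from the listing above.)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Times\tBattery Type",
             "194.0\tBrand Name",
             "205.5\tBrand Name",
             "190.7\tGeneric",
             "203.5\tGeneric"),
           tmp)

# Default behaviour: column labels are turned into valid R names.
fixed <- read.delim(file = tmp, header = TRUE)
names(fixed)

# check.names = FALSE keeps the label exactly as it appears in the file,
# but the column then has to be referred to with quotes or backticks.
raw <- read.delim(file = tmp, header = TRUE, check.names = FALSE)
names(raw)
raw[["Battery Type"]]

unlink(tmp)  # remove the temporary file
```

Keeping the default is usually the more convenient choice, because a name such as Battery.Type can be typed directly in formulas and commands.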

The first read.delim command, typed on its own, just shows you what is in the file (and whether R can make any sense out of it). To do anything with the data, you have to store it in a data frame; attaching the data frame then lets you refer to its variables by name. (You do not need to know exactly what that means in order to do it.) We named the data frame bat. Here's how:

> bat <- read.delim(file=file.choose(),header=TRUE)
> attach(bat)
> Times
 [1] 194.0 205.5 199.2 172.4 184.0 169.5 190.7 203.5 203.5 206.5 222.5 209.4
> ?t.test

Typing Times causes R to list the times, verifying that we can now access them. Typing ?t.test causes a help window (not shown here) to pop up with cryptic information on the t.test command. Here is the form we need:

> t.test(formula = Times ~ Battery.Type)

        Welch Two Sample t-test

data:  Times by Battery.Type 
t = -2.5462, df = 8.986, p-value = 0.03143
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -35.097420  -2.069246 
sample estimates:
mean in group Brand Name    mean in group Generic 
                187.4333                 206.0167 

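We are thinking that battery lifetimes may depend on the type of battery. After formula = we type the dependent variable, a "~", and the independent variable. Because we did not ask R to assume equal variances, it ran the Welch version of the test, which is why the degrees of freedom are not a whole number. We check the assumptions and conditions with a display, in this case side-by-side boxplots.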

> boxplot(formula = Times ~ Battery.Type)

At least R syntax is consistent.

[Figure: side-by-side boxplots of Times for the Brand Name and Generic batteries]
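The numbers behind each box can also be listed directly. This sketch retypes the data from the listing above into a data frame (so it runs without battery.txt) and uses tapply with fivenum and sd, all standard R functions, to summarize each group. The names summ and sds are ours.

```r
# Data retyped from the listing above; bat mirrors the data frame
# that read.delim produced from battery.txt.
bat <- data.frame(
  Times = c(194.0, 205.5, 199.2, 172.4, 184.0, 169.5,
            190.7, 203.5, 203.5, 206.5, 222.5, 209.4),
  Battery.Type = rep(c("Brand Name", "Generic"), each = 6)
)

# Five-number summary (minimum, lower hinge, median, upper hinge, maximum)
# for each type of battery.
summ <- tapply(bat$Times, bat$Battery.Type, fivenum)
summ

# Standard deviation for each type, as a rough check on the spreads.
sds <- tapply(bat$Times, bat$Battery.Type, sd)
sds
```

With only six batteries in each group, these summaries and the boxplots can give no more than a rough impression of shape and spread.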

We reject the hypothesis that there is no difference between the two types of batteries (p = 0.031). The boxplots show roughly similar shapes and variabilities. The confidence interval shows we are not too confident about how much longer the generic batteries last: somewhere between 2 and 35 minutes on average. Comparing the means for the two groups and noting the negative numbers in the confidence interval, we conclude that R subtracted Brand Name - Generic and got negative results because the generic batteries actually lasted longer. Since they are likely to be cheaper as well, we might as well buy them, even if they may not be much better.
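For readers who want to see where the numbers in the t.test output come from, here is a sketch that reproduces the Welch statistic, degrees of freedom, p-value, and confidence interval by hand. The data are retyped from the listing above so the code runs without battery.txt, the columns are reached with $ rather than attach, and the intermediate names (brand, generic, se, welch_df, and so on) are ours.

```r
# Data retyped from the listing above.
bat <- data.frame(
  Times = c(194.0, 205.5, 199.2, 172.4, 184.0, 169.5,
            190.7, 203.5, 203.5, 206.5, 222.5, 209.4),
  Battery.Type = rep(c("Brand Name", "Generic"), each = 6)
)
brand   <- bat$Times[bat$Battery.Type == "Brand Name"]
generic <- bat$Times[bat$Battery.Type == "Generic"]

# Difference in sample means, in the order R uses: Brand Name - Generic.
diff_means <- mean(brand) - mean(generic)

# Squared standard error contributed by each sample: s^2 / n.
v1 <- var(brand) / length(brand)
v2 <- var(generic) / length(generic)
se <- sqrt(v1 + v2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat   <- diff_means / se
welch_df <- (v1 + v2)^2 /
  (v1^2 / (length(brand) - 1) + v2^2 / (length(generic) - 1))

# Two-sided p-value and 95 percent confidence interval.
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci      <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se

round(c(t = t_stat, df = welch_df, p = p_value), 4)
round(ci, 3)

# Cross-check against t.test itself, this time passing the data frame
# through the data argument instead of attaching it.
res <- t.test(Times ~ Battery.Type, data = bat)
res
```

If the retyped data match the file, the hand-computed values should agree with the t, df, p-value, and confidence interval printed in the output above.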


© 2006 Robert W. Hayden