Paired Differences with R

We will use data on miles driven by the same 11 people while working 4-day weeks and 5-day weeks. We looked at the files in the R directory to find the name of the file (mileage.txt), then read it into a data frame named mileage and attached it so that its columns could be used by name. We read the file a second time, without assigning the result, so that R would print the data and show the variable names. Then we created a variable diff for the differences (5-day mileage minus 4-day mileage), typed diff to see them, and used t.test on the differences.

> mileage <- read.delim(file="mileage.txt",header=TRUE)
> attach(mileage)
> read.delim(file="mileage.txt",header=TRUE)
      Name X5.Day_mileage X4.Day_mileage
1     Jeff           2798           2914
2    Betty           7724           6112
3    Roger           7505           6177
4      Tom            838           1102
5    Aimee           4592           3281
6     Greg           8107           4997
7  Larry G           1228           1695
8      Tad           8718           6606
9  Larry M           1097           1063
10  Leslie           8089           6392
11     Lee           3807           3362
> diff = X5.Day_mileage - X4.Day_mileage
> diff
 [1] -116 1612 1328 -264 1311 3110 -467 2112   34 1697  445
> t.test(diff)

        One Sample t-test

data:  diff 
t = 2.858, df = 10, p-value = 0.01701
alternative hypothesis: true mean is not equal to 0 
95 percent confidence interval:
  216.4276 1747.5724 
sample estimates:
mean of x 
      982 

We reject the hypothesis of no difference (p = 0.017). Note the free confidence interval: with 95% confidence, a 5-day week means roughly 216 to 1748 extra miles driven per week, on average. We should also plot the data!
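
To make clear what t.test is doing with the differences, here is a minimal sketch that recomputes the t statistic, the p-value, and the confidence interval by hand. It retypes the mileage values from the listing above rather than reading mileage.txt, and the names five_day, four_day, d, se, t_stat, p_value, ci, and paired are our own, chosen only for this illustration. It also checks that a paired call to t.test on the two columns gives the same test as the one-sample call on the differences.

```r
# Mileage values retyped from the listing above (assumed to match mileage.txt)
five_day <- c(2798, 7724, 7505, 838, 4592, 8107, 1228, 8718, 1097, 8089, 3807)
four_day <- c(2914, 6112, 6177, 1102, 3281, 4997, 1695, 6606, 1063, 6392, 3362)

# Differences, 5-day minus 4-day, one per person
d <- five_day - four_day

n  <- length(d)
se <- sd(d) / sqrt(n)                       # standard error of the mean difference

t_stat  <- mean(d) / se                     # test statistic for H0: mean difference = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 1) # two-sided p-value
ci      <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se  # 95% confidence interval

# The same test, requested as a paired comparison of the two columns
paired <- t.test(five_day, four_day, paired = TRUE)

print(c(t = t_stat, p = p_value))
print(ci)
print(paired)
```

The hand computation should agree with the t.test output shown above, up to rounding.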

> stem(diff)

  The decimal point is 3 digit(s) to the right of the |

  -0 | 531
   0 | 04
   1 | 3367
   2 | 1
   3 | 1

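The plot shows no extreme outliers and no strong skewness, which is not too bad for such a small dataset, so the t procedures look reasonable here. Note that we need to look at the differences, not the original data, because the t-test was carried out on the differences.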
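
As a rough illustration of why the differences are the right thing to examine, the sketch below contrasts the test on the differences with a two-sample test that ignores the pairing. The values are retyped from the listing above, and the names five_day, four_day, d, r, paired, and unpaired are our own. The unpaired test is shown only for contrast; it is not an appropriate analysis for these data, since the same people appear in both columns.

```r
# Mileage values retyped from the listing above (assumed to match mileage.txt)
five_day <- c(2798, 7724, 7505, 838, 4592, 8107, 1228, 8718, 1097, 8089, 3807)
four_day <- c(2914, 6112, 6177, 1102, 3281, 4997, 1695, 6606, 1063, 6392, 3362)
d <- five_day - four_day

# People who drive a lot in one schedule also drive a lot in the other,
# so the two columns are strongly related
r <- cor(five_day, four_day)

paired   <- t.test(d)                   # one-sample test on the differences
unpaired <- t.test(five_day, four_day)  # Welch two-sample test, ignores the pairing

cat("correlation between columns:", round(r, 3), "\n")
cat("p-value using differences:  ", signif(paired$p.value, 4), "\n")
cat("p-value ignoring pairing:   ", signif(unpaired$p.value, 4), "\n")
```

Most of the spread in the original columns comes from person-to-person variation in driving. Taking differences removes that variation, which is why the test on the differences is so much more sensitive.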


© 2006 Robert W. Hayden