Math 263, Section 001 and 003 - Excel/R Assignment 6

Last updated on December 9, 1:35 AM.

A note on a typo eliminated by the last update

Note that you should always call 'oneway.test' with a '~' not ','. Thus
	oneway.test(Sodium ~ Type, data=hot.dog.data)
      
not
	oneway.test(Sodium, Type, data=hot.dog.data)
      

Excel/R Assignment 6

In this assignment you will:

Software used

The statistical package R. Although RExcel may be used, the best way is to use R directly, without RExcel. You may also use the ANOVA tool of Excel's Analysis ToolPak (the "Data Analysis" add-in), but no instructions for it are provided here.

The data file

Loading data

The data file is Dataset6.txt. The file is prepared to be read with the R command:
> hot.dog.data <- read.table("Dataset6.txt", header=TRUE)
> attach(hot.dog.data)
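
After loading, it is worth checking that the columns were read as expected. The sketch below uses a tiny made-up stand-in for the file (the values are invented for illustration and are not those in Dataset6.txt; the column names "Type", "Calories" and "Sodium" are assumed from the rest of this page). In recent versions of R, 'read.table' does not turn text columns into factors unless asked, hence the 'stringsAsFactors' argument.

```r
# Toy stand-in for Dataset6.txt: made-up values, column layout assumed
toy.text <- "Type Calories Sodium
Beef 180 490
Beef 150 320
Meat 170 460
Meat 190 500
Poultry 130 430
Poultry 100 380"

# text= reads from a string instead of a file; otherwise same as reading the file
toy.data <- read.table(text = toy.text, header = TRUE, stringsAsFactors = TRUE)

str(toy.data)          # column names and types; Type should be a factor
levels(toy.data$Type)  # the group labels
table(toy.data$Type)   # number of observations in each group
```

With the real file, replace the 'text=' argument with the file name "Dataset6.txt".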
      

About the dataset

The dataset is a well-known one from the Data and Story Library (DASL) hosted at CMU: http://lib.stat.cmu.edu/DASL/Stories/Hotdogs.html . Please read the story to understand the data.

Variables in the dataset

Experimental design

Look at the data (with a boxplot)

For each of the quantitative variables (e.g. "Sodium") split the variable into groups according to the levels of the factor (= categorical variable) "Type":

Splitting variables according to a factor (= categorical variable)

Rationale

When your observations are recorded in a spreadsheet or a datafile, the group is identified by a level of a factor. We need to split the observations into groups.

How to split and create a boxplot with R?

It is simple:
> sodium.split <-  split(Sodium, Type)
> boxplot(sodium.split, ylab="Sodium")
      
The result for sodium is presented below:

Under the hood

The variable "sodium.split" is a list of variables "Beef", "Meat" and "Poultry". It is not (and cannot be) a dataframe, because the number of observations differs in each variable. However, you can still do this:
> attach(sodium.split)
      
to make the variables "Beef", "Meat" and "Poultry" directly available. You will need them to conduct the t-tests in the last part of the assignment.
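
To see what 'split' actually returns, here is a minimal sketch on made-up numbers (not the values in Dataset6.txt). It also shows that a group can be reached with '$', without attaching the list.

```r
# Made-up data for illustration only
Type   <- factor(c("Beef", "Beef", "Beef", "Meat", "Meat", "Poultry", "Poultry"))
Sodium <- c(490, 320, 410, 460, 500, 430, 380)

sodium.split <- split(Sodium, Type)

names(sodium.split)           # "Beef" "Meat" "Poultry"
sapply(sodium.split, length)  # group sizes differ: 3, 2, 2
sodium.split$Beef             # one group, accessed without attach()
sapply(sodium.split, mean)    # group means
```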

Detach unused dataframes or lists

Please make sure to detach the list after use, and before analysing "Calories"; otherwise you will have a name conflict between variables.
> detach(sodium.split)
      

An even faster way to get a side-by-side boxplot

Do this:
> boxplot(Sodium ~ Type)  
      
This uses the '~', which indicates "the formula syntax". Formulas are a powerful mechanism in R, but using formulas correctly requires some experience.

What is a name conflict?

If you already have a variable "Beef" from splitting "Sodium", you should not create another "Beef" variable from splitting "Calories". If you attach the second list while the first one is still attached, R will complain that one variable is "masking" the other, and it becomes easy to run a test on the wrong data. Thus, you need to detach the "Sodium" list before you attach the "Calories" list.
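
One way to sidestep the conflict altogether, sketched here on made-up numbers (not the values in Dataset6.txt), is to leave both lists unattached and refer to the groups through the list name, or through 'with', which makes the names visible only inside a single expression.

```r
# Made-up data for illustration only
Type     <- factor(c("Beef", "Beef", "Meat", "Meat", "Poultry", "Poultry"))
Sodium   <- c(490, 320, 460, 500, 430, 380)
Calories <- c(180, 150, 170, 190, 130, 100)

sodium.split   <- split(Sodium, Type)
calories.split <- split(Calories, Type)

# Both lists contain an element called "Beef"; the list name keeps them apart
mean(sodium.split$Beef)     # 405
mean(calories.split$Beef)   # 165

# with() exposes the group names for this one expression only
with(calories.split, mean(Beef) - mean(Poultry))   # 165 - 115 = 50
```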

Perform ANOVA, using a built-in command 'oneway.test'

Below are instructions on how to perform the test with R, using built-in commands for maximum time savings. Repeat the test for each of the two quantitative variables, "Calories" and "Sodium". There are several ways to perform ANOVA in R. For example, you may use the following command:
> oneway.test(Q ~ F, data=x)
      
which will perform a one-way ANOVA for a quantitative variable Q and a categorical variable F (also called a "factor"). The values of Q will be split into groups according to the levels of F, and a significance test for the equality of the group means will be conducted for you. All that remains is to draw the conclusions.

An example

Thus, the following will perform the ANOVA on Sodium split according to (meat) "Type":
> oneway.test(Sodium ~ Type, data=hot.dog.data)
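
One detail worth knowing: by default 'oneway.test' does not assume equal variances in the groups (it uses Welch's correction), so its F statistic can differ from the textbook calculation. Passing 'var.equal=TRUE' gives the classical one-way ANOVA. The sketch below shows both on made-up numbers (not the values in Dataset6.txt); which variant your report should use is for your instructor to say.

```r
# Made-up data for illustration only
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), each = 4)),
  Sodium = c(490, 320, 410, 450, 460, 500, 390, 420, 430, 380, 520, 470)
)

welch   <- oneway.test(Sodium ~ Type, data = toy)                    # default: var.equal = FALSE
classic <- oneway.test(Sodium ~ Type, data = toy, var.equal = TRUE)  # textbook one-way ANOVA

classic$statistic   # the F statistic
classic$parameter   # numerator and denominator degrees of freedom (2 and 9 here)
classic$p.value     # the P-value
welch$parameter     # Welch's denominator df is usually not a whole number

# The classical version agrees with the usual ANOVA table
anova(lm(Sodium ~ Type, data = toy))
```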
      

Perform ANOVA, using R as a super-calculator

The step-by-step script example

Please use the script script.R, which illustrates the approach. In principle, a similar calculation can also be done in Excel, by hand, or with the aid of a plain calculator without statistical functions.
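
For orientation, the calculation can be sketched in a few lines. This is not the course's script.R, only a minimal illustration of the standard one-way ANOVA formulas on made-up numbers (not the values in Dataset6.txt); the variable names are chosen here for readability.

```r
# Made-up data for illustration only
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), each = 4))
Sodium <- c(490, 320, 410, 450, 460, 500, 390, 420, 430, 380, 520, 470)

groups <- split(Sodium, Type)
n.i    <- sapply(groups, length)   # group sizes
mean.i <- sapply(groups, mean)     # group means
var.i  <- sapply(groups, var)      # group sample variances

N <- sum(n.i)                      # total number of observations
k <- length(groups)                # number of groups
grand.mean <- mean(Sodium)

SSG <- sum(n.i * (mean.i - grand.mean)^2)  # between-group sum of squares
SSE <- sum((n.i - 1) * var.i)              # within-group sum of squares
MSG <- SSG / (k - 1)                       # mean square between groups
MSE <- SSE / (N - k)                       # mean square within groups

F.stat  <- MSG / MSE
p.value <- pf(F.stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
c(F = F.stat, p = p.value)
```

Every intermediate quantity (group sizes, means, variances, sums of squares) can be printed and checked against a hand calculation.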

A note vis-à-vis the Final Exam

The best way to master one-way ANOVA for the Final Exam is to follow every step and confirm the calculation results with a simple calculator (the one you will bring to the Final Exam).

What to include in your report

Please report the following:

Perform two-sample t-tests for differences in each pair

Rationale

Often, the t-test is used to reveal the differences between individual groups. This procedure is suspect (see the comments below on the Bonferroni procedure). However, it is often used.

What t-tests should you conduct?

You should conduct a t-test for the difference between each pair of the three groups (levels of the factor "Type"): "Beef", "Meat" and "Poultry". Please repeat for all quantitative variables (e.g. "Sodium"). Thus, you will have three pairs of variables, and three t-tests for each quantitative variable.

What to report?

How to use R to answer the question?

Please follow these steps for both quantitative variables ("Calories" and "Sodium"):
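
The three tests for one variable can be sketched as follows. The numbers are made up for illustration (not the values in Dataset6.txt), and the sketch uses R's default two-sample t-test (Welch, unequal variances), which may differ from the variant your section expects.

```r
# Made-up data for illustration only
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), each = 4))
Sodium <- c(490, 320, 410, 450, 460, 500, 390, 420, 430, 380, 520, 470)

g <- split(Sodium, Type)

# One test per pair of groups
t.test(g$Beef, g$Meat)
t.test(g$Beef, g$Poultry)
t.test(g$Meat, g$Poultry)

# The same three (uncorrected) P-values collected in one vector
pairs <- combn(names(g), 2)   # each column is one pair of group names
p.raw <- apply(pairs, 2, function(p) t.test(g[[p[1]]], g[[p[2]]])$p.value)
names(p.raw) <- apply(pairs, 2, paste, collapse = " vs ")
p.raw
```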

A word of caution about multiple t-tests

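Performing multiple uncorrected t-tests on the same data is not a valid statistical procedure. Each additional test is another chance of a false positive, so the overall confidence level is lower than the level of each individual test would suggest.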

The Bonferroni procedure

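To use multiple t-tests correctly, see the Bonferroni procedure; there is also a Wikipedia article: http://en.wikipedia.org/wiki/Bonferroni_correction. In fact, R offers a fast way to perform the pairwise t-tests with a correction for multiple comparisons. By default it applies Holm's method, a less conservative refinement of Bonferroni, as the last line of the output shows; pass p.adjust.method="bonferroni" to get the plain Bonferroni correction. For Sodium: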
> pairwise.t.test(Sodium, Type)

        Pairwise comparisons using t tests with pooled SD 

data:  Sodium and Type 

        Beef Meat
Meat    0.58 -   
Poultry 0.21 0.43

P value adjustment method: holm 
      
Note that the P-values (corrected for multiple t-tests) are 0.58, 0.21 and 0.43. None of them is significant. Note that 'pairwise.t.test' also splits the variable "Sodium" into groups according to the levels of "Type" by itself.

NOTE: Your results done without correction will be different!
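
To see how much the correction matters, the adjustment method can be switched with the 'p.adjust.method' argument. The sketch below uses made-up numbers (not the values in Dataset6.txt) and compares Holm (the default), Bonferroni, and no correction.

```r
# Made-up data for illustration only
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), each = 4))
Sodium <- c(490, 320, 410, 450, 460, 500, 390, 420, 430, 380, 520, 470)

res.holm <- pairwise.t.test(Sodium, Type)                                  # default: "holm"
res.bonf <- pairwise.t.test(Sodium, Type, p.adjust.method = "bonferroni")
res.none <- pairwise.t.test(Sodium, Type, p.adjust.method = "none")        # uncorrected

res.none$p.value   # matrix of uncorrected P-values (NA where there is no pair)
res.bonf$p.value   # Bonferroni-corrected P-values

# Bonferroni multiplies each uncorrected P-value by the number of tests (3 here),
# capped at 1, so this reproduces res.bonf$p.value
pmin(3 * res.none$p.value, 1)
```

Holm's P-values are never larger than Bonferroni's, which is why it is the default.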

Troubleshooting

Transferring graphics to Word or another processor, when using R console only

If you use RExcel or RCommander, this is a simple matter of cut-and-paste. However, if you are using R console, there is an extra step: you need to store your graph in a file. This is how this is accomplished:
> sodium.split <-  split(Sodium, Type)
> jpeg("myboxplot.jpg")
> boxplot(sodium.split, ylab="Sodium")
> dev.off()
null device 
          1 
    
After you do this, there is a graphics file "myboxplot.jpg" in your working directory, which you can open and include in your documents (you can simply drop this file onto your Word document).

An explanation

The command 'jpeg("myboxplot.jpg")' tells R to put the graphics in a file with the designated name "myboxplot.jpg", as a JPEG file.

The command 'dev.off()' turns off the current graphics 'device' (in this case, the JPEG file). You must do this, because this is what causes the file to be actually written; as we say, the graphics are "flushed" at this moment. If you do not do this, the file will exist, but it may be empty or incomplete. After you do 'dev.off()', the graphics will go to your screen again.

If you are interested in having graphics in another format, or controlling things like size of the image, please read the manual page:

> ?jpeg
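
As an illustration, the same pattern works with the 'png' device, where 'width' and 'height' set the image size in pixels. This is a minimal sketch on made-up numbers (not the values in Dataset6.txt); the file name is hypothetical, and the file is written to R's temporary folder so that nothing in your working directory is overwritten. It assumes your R installation supports PNG output, which is normally the case.

```r
# Made-up data for illustration only
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), each = 4))
Sodium <- c(490, 320, 410, 450, 460, 500, 390, 420, 430, 380, 520, 470)

# Hypothetical file name, placed in R's temporary folder
out.file <- file.path(tempdir(), "myboxplot.png")

png(out.file, width = 800, height = 600)  # open the file device; size in pixels
boxplot(Sodium ~ Type, ylab = "Sodium")   # draw into the file instead of the screen
dev.off()                                 # close the device so the file is written

file.exists(out.file)                     # check that the file was created
```

To write into your working directory instead, pass a plain file name such as "myboxplot.png" to 'png'.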