Math 263, Section 001 and 003 - Excel/R Assignment 6
Last updated on December 9, 1:35 AM.
A note on a typo eliminated by the last update
Note that you should always call 'oneway.test' with a '~', not a ','. Thus, write
> oneway.test(Sodium ~ Type, data=hot.dog.data)
not
> oneway.test(Sodium, Type, data=hot.dog.data)
Excel/R Assignment 6
In this assignment you will:
- Read data from a text file using R.
- Perform one-way ANOVA for data with a stratified design.
- Perform two-sample t-tests to test for differences between groups.
- Draw conclusions about the respective means.
Software used
The statistical package R. Although RExcel may be used, the best way is to use R without RExcel. You may also use the ANOVA tool in the Excel Data Analysis Pack, but no instructions for it are provided here.
The data file
Loading data
The data file is Dataset6.txt. The file is prepared to be read with the R commands:
> hot.dog.data <- read.table("Dataset6.txt", header=T)
> attach(hot.dog.data)
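To check what 'read.table' produced, it helps to inspect the column types. The sketch below is only an illustration: it writes a few made-up rows (not the real Dataset6.txt) to a temporary file and reads them back. The column names Type, Calories and Sodium are the ones used in this assignment; the exact layout of the real file is an assumption.

```r
# Made-up rows standing in for Dataset6.txt (NOT the real data).
toy.lines <- c("Type Calories Sodium",
               "Beef 150 400",
               "Beef 160 450",
               "Meat 155 420",
               "Poultry 100 380")
path <- tempfile(fileext = ".txt")
writeLines(toy.lines, path)

# header = TRUE: the first line holds the column names.
# stringsAsFactors = TRUE: read text columns as factors
# (since R 4.0.0 the default is FALSE, which gives character columns).
toy <- read.table(path, header = TRUE, stringsAsFactors = TRUE)

str(toy)           # shows which columns are factors and which are numeric
levels(toy$Type)   # the levels of the categorical variable
```

With the real file, the same 'str' call on hot.dog.data shows the type of every column at a glance.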
About the dataset
The dataset is a famous dataset from CMU, described at http://lib.stat.cmu.edu/DASL/Stories/Hotdogs.html . Please read the story to understand the data.
Variables in the dataset
- Please identify the quantitative and categorical variables present in the dataset.
Experimental design
- Please identify the experimental design used (SRS? Block?).
Look at the data (with boxplots)
For each of the quantitative variables (e.g. "Sodium"), split the variable into groups according to the levels of the factor (= categorical variable) "Type":
- Draw side-by-side boxplots of the three groups: Beef, Meat and Poultry.
Splitting variables according to a factor (= categorical variable)
Rationale
When your observations are recorded in a spreadsheet or a data file, the group each observation belongs to is identified by the level of a factor. We need to split the observations into groups.
How to split and create a boxplot with R?
It is simple:
> sodium.split <- split(Sodium, Type)
> boxplot(sodium.split, ylab="Sodium")
The result for sodium is a side-by-side boxplot of the three groups (figure not reproduced here).
Under the hood
The variable "sodium.split" is a list of variables "Beef", "Meat" and "Poultry". It is not (and cannot be) a dataframe, because the number of observations differs in each variable. However, you can still do this:
> attach(sodium.split)
to make the variables "Beef", "Meat" and "Poultry" available; you will need them to conduct the t-tests in the last part of the assignment.
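The structure of the list returned by 'split' can be examined directly. The following is a minimal sketch on invented numbers (not the hot dog data), with group sizes chosen to differ on purpose:

```r
# Invented numbers for illustration only (NOT the hot dog data).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 5))),
  Sodium = c(400, 450, 500, 350,  420, 480, 510,  380, 460, 520, 440, 400)
)

# split() returns a list with one numeric vector per level of the factor.
sodium.split <- split(toy$Sodium, toy$Type)

class(sodium.split)    # "list"
names(sodium.split)    # one element per level of Type

# The groups have different lengths, which is why this is a list
# rather than a data frame.
group.sizes <- sapply(sodium.split, length)
group.sizes
```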
Detach unused dataframes or lists
Please make sure to detach the list after use, and before analysing "Calories", because otherwise you would have a name conflict between variables.
> detach(sodium.split)
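An alternative that avoids attaching and detaching altogether is to refer to list elements with '$', or to evaluate an expression inside the list with 'with'. This is a sketch on invented numbers, offered as an option rather than as the approach required in this assignment:

```r
# Invented numbers for illustration only (NOT the hot dog data).
toy <- data.frame(
  Type     = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 5))),
  Calories = c(150, 160, 170, 180,  160, 170, 180,  100, 110, 120, 130, 115),
  Sodium   = c(400, 450, 500, 350,  420, 480, 510,  380, 460, 520, 440, 400)
)

sodium.split   <- split(toy$Sodium, toy$Type)
calories.split <- split(toy$Calories, toy$Type)

# Both lists contain an element called "Beef", but nothing is attached,
# so the two never clash.
mean(sodium.split$Beef)
mean(calories.split$Beef)

# with() makes the list elements visible only while the expression runs.
calorie.gap <- with(calories.split, mean(Beef) - mean(Meat))
calorie.gap
```

Because nothing is added to the search path, there is nothing to detach afterwards.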
An even faster way to get a side-by-side boxplot
Do this:
> boxplot(Sodium ~ Type)
This uses the '~', which indicates "the formula syntax". Formulas are a powerful mechanism in R, but using formulas correctly requires some experience.
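The formula form of 'boxplot' also accepts a 'data' argument, so the data frame does not have to be attached. A small sketch on invented numbers follows; 'boxplot' also returns (invisibly) the summary it plotted, which is handy for checking the groups:

```r
# Invented numbers for illustration only (NOT the hot dog data).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 5))),
  Sodium = c(400, 450, 500, 350,  420, 480, 510,  380, 460, 520, 440, 400)
)

# Read the formula as "Sodium, grouped by Type"; columns are looked up in 'toy'.
bp <- boxplot(Sodium ~ Type, data = toy, ylab = "Sodium")

bp$names   # group labels, in plotting order
bp$n       # number of observations in each group
```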
What is a name conflict?
If you already have a variable "Beef" from splitting "Sodium", you cannot also have a "Beef" variable from splitting "Calories". R will complain when you try to attach the second list while the first one is still attached. Thus, you need to detach the "Sodium" list before you can attach the "Calories" list. The complaint refers to one variable "masking" the other.
Perform ANOVA, using a built-in command 'oneway.test'
Below there are instructions on how to perform the test with R, using built-in commands for maximum time savings. For each of the two quantitative variables, "Calories" and "Sodium":
- Perform one-way ANOVA.
- Include the software output in your paper.
- Carefully state what conclusions can be made based on the sample, at the 90% confidence level (significance level 0.10).
The command has the general form
> oneway.test(Q ~ F, data=x)
which will perform a one-way ANOVA for a quantitative variable Q and a categorical variable F (also called a "factor"). The values of Q will be split into groups according to the factor F and a significance test for the means will be conducted for you. All that remains is to draw the conclusions.
An example
Thus, the following will perform the ANOVA on Sodium split according to (meat) "Type":
> oneway.test(Sodium ~ Type, data=hot.dog.data)
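One detail worth knowing: by default 'oneway.test' does not assume equal group variances (it applies a Welch-type correction), so its F-statistic and denominator degrees of freedom can differ from the classical ANOVA table. Passing var.equal=TRUE gives the classical F-test. The sketch below shows both calls on invented numbers and how to pull individual values out of the result; which variant your report should use is not specified here, so please check the course instructions.

```r
# Invented numbers for illustration only (NOT the hot dog data).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 5))),
  Sodium = c(400, 450, 500, 350,  420, 480, 510,  380, 460, 520, 440, 400)
)

# Classical one-way ANOVA F-test (equal variances assumed).
classical <- oneway.test(Sodium ~ Type, data = toy, var.equal = TRUE)

# Default call: equal variances NOT assumed (Welch-type correction).
welch <- oneway.test(Sodium ~ Type, data = toy)

classical$statistic   # F-statistic
classical$parameter   # numerator and denominator degrees of freedom
classical$p.value     # P-value

# Decision at significance level 0.10
reject <- classical$p.value < 0.10
reject
```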
Perform ANOVA, using R as a super-calculator
The step-by-step script example
Please use the script script.R, which illustrates the approach. In principle, you can also use Excel for a similar calculation, or perform the calculations by hand or with a plain calculator without statistical functions.
A note vis-a-vis the Final Exam
The best way to master one-way ANOVA for the Final Exam is to follow every step and confirm the calculation results with a simple calculator (the one you will bring to the Final Exam).
What to include in your report
Please report the following:
- The number of degrees of freedom for the numerator and the denominator.
- The value of Fisher's F-statistic.
- The P-value.
- The null and alternative hypotheses.
- The test conclusion.
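The quantities listed above can be computed step by step, using R as a calculator. The following sketch is not the course's script.R; it is an independent illustration on invented numbers, using the standard sums of squares between and within groups, and it cross-checks the result against 'oneway.test' with var.equal=TRUE.

```r
# Invented numbers for illustration only (NOT the hot dog data).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 5))),
  Sodium = c(400, 450, 500, 350,  420, 480, 510,  380, 460, 520, 440, 400)
)
y <- toy$Sodium
g <- toy$Type

n <- length(y)        # total number of observations
k <- nlevels(g)       # number of groups

group.n    <- tapply(y, g, length)   # group sizes
group.mean <- tapply(y, g, mean)     # group means
grand.mean <- mean(y)

# Sum of squares between groups and within groups.
SSG <- sum(group.n * (group.mean - grand.mean)^2)
SSE <- sum((y - ave(y, g))^2)        # ave(y, g) repeats each group's mean

df1 <- k - 1          # numerator degrees of freedom
df2 <- n - k          # denominator degrees of freedom

F.stat  <- (SSG / df1) / (SSE / df2)
p.value <- pf(F.stat, df1, df2, lower.tail = FALSE)

# Cross-check against the built-in classical test.
check <- oneway.test(Sodium ~ Type, data = toy, var.equal = TRUE)
c(by.hand = F.stat, built.in = unname(check$statistic))
```

Every intermediate value (group means, SSG, SSE) can be confirmed on a plain calculator.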
Perform two-sample t-tests for differences in each pair
Rationale
Often, the t-test is used to reveal differences between individual groups. This procedure is suspect (see the comments below on the Bonferroni procedure). However, it is often used.
What t-tests should you conduct?
You should conduct a t-test for the difference between each pair of the three groups (levels of the factor "Type"): "Beef", "Meat" and "Poultry". Please repeat for all quantitative variables (e.g. "Sodium"). Thus, you will have three pairs of variables, and three t-tests for each quantitative variable.
What to report?
- Please confirm the statement in the story http://lib.stat.cmu.edu/DASL/Stories/Hotdogs.html regarding t-tests.
How to use R to answer the question?
Please follow these steps for both quantitative variables ("Calories" and "Sodium"):
- Let X be one of the variables ("Calories" or "Sodium"). Split the variable according to the factor "Type". The following commands will do this (X="Calories"):
> hot.dog.data = read.table("Dataset6.txt", header=T)
> attach(hot.dog.data)
> calories.by.type = split(Calories, Type)
> attach(calories.by.type)
Now you will have three variables: "Beef", "Meat" and "Poultry". They will hold the calories for each kind of hot dog. (For convenience, we repeated the commands that read the data and attach the data frame.)
- Conduct the two-sample t-test on each pair of variables (thus, you will have three t-tests to perform for each quantitative variable). The t-test may be conducted in the following manner (using the pair Beef-Meat as an example):
> t.test(Beef, Meat)
- Report the values of:
  - the t-statistic
  - the corresponding P-value
- Draw conclusions about each of the 6 pairs of variables. Minimally, reject the null hypothesis, or say there is no reason to reject it, at the 90% confidence level.
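The three tests for one variable can also be collected in a single table. This is a sketch on invented numbers; the helper names 'groups', 'pairs' and 'results' are made up for the illustration. Note that 't.test' with two samples performs the Welch test (unequal variances) unless var.equal=TRUE is given.

```r
# Invented numbers for illustration only (NOT the hot dog data).
toy <- data.frame(
  Type     = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 5))),
  Calories = c(150, 160, 170, 180,  160, 170, 180,  100, 110, 120, 130, 115)
)

groups <- split(toy$Calories, toy$Type)

# All pairs of group names: Beef-Meat, Beef-Poultry, Meat-Poultry.
pairs <- combn(names(groups), 2)

results <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(j) {
  a  <- pairs[1, j]
  b  <- pairs[2, j]
  tt <- t.test(groups[[a]], groups[[b]])   # Welch two-sample t-test
  data.frame(pair    = paste(a, b, sep = "-"),
             t       = unname(tt$statistic),
             p.value = tt$p.value,
             reject.at.0.10 = tt$p.value < 0.10)
}))

print(results)
```

These P-values are uncorrected; each test is judged on its own at significance level 0.10.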
A word of caution about multiple t-tests
Performing multiple t-tests on the same data without a correction is not a valid statistical procedure. Basically, you have a lower confidence level than you would think.
The Bonferroni procedure
See the Bonferroni procedure to correctly use multiple t-tests. Also, there is a Wikipedia article: http://en.wikipedia.org/wiki/Bonferroni_correction . Actually, there is a fast way to perform the pairwise t-test procedure in R with a correction for multiple comparisons (for Sodium). Note that by default 'pairwise.t.test' applies Holm's adjustment, a refinement of the Bonferroni correction, as the last line of the output shows; to request the Bonferroni correction itself, pass p.adjust.method="bonferroni".
> pairwise.t.test(Sodium, Type)

        Pairwise comparisons using t tests with pooled SD

data:  Sodium and Type

        Beef Meat
Meat    0.58 -
Poultry 0.21 0.43

P value adjustment method: holm

Note that the P-values (corrected due to multiple t-tests) are
- 0.58 between Beef and Meat
- 0.21 between Beef and Poultry
- 0.43 between Meat and Poultry
NOTE: Your results done without correction will be different!
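To see why a correction matters, here is a short calculation. It assumes, for simplicity, that the three tests are independent (pairwise tests on the same groups are not exactly independent, so the figure is only indicative). The raw P-values below are hypothetical, chosen just to show how the Bonferroni and Holm adjustments differ; the last call runs 'pairwise.t.test' on invented numbers with the Bonferroni method requested explicitly.

```r
alpha <- 0.10   # significance level of each individual test
m     <- 3      # number of pairwise tests

# Chance of at least one false rejection if the tests were independent.
familywise.error <- 1 - (1 - alpha)^m
familywise.error

# Bonferroni: test each pair at level alpha / m instead.
bonferroni.level <- alpha / m

# Hypothetical raw P-values, for illustration only.
raw.p  <- c(0.02, 0.04, 0.30)
p.bonf <- p.adjust(raw.p, method = "bonferroni")   # multiply by m, cap at 1
p.holm <- p.adjust(raw.p, method = "holm")         # step-down refinement

# Invented numbers for illustration only (NOT the hot dog data).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 5))),
  Sodium = c(400, 450, 500, 350,  420, 480, 510,  380, 460, 520, 440, 400)
)
pw <- pairwise.t.test(toy$Sodium, toy$Type, p.adjust.method = "bonferroni")
pw$p.value   # matrix of adjusted P-values
```

Holm's adjusted P-values are never larger than Bonferroni's, which is why R uses Holm as the default.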
Troubleshooting
Transferring graphics to Word or another processor, when using R console only
If you use RExcel or RCommander, this is a simple matter of cut-and-paste. However, if you are using the R console, there is an extra step: you need to store your graph in a file. This is how it is accomplished:
> sodium.split <- split(Sodium, Type)
> jpeg("myboxplot.jpg")
> boxplot(sodium.split, ylab="Sodium")
> dev.off()
null device
          1
After you do this, there is a graphics file "myboxplot.jpg" in your working directory, which you can open and include in your documents (you can simply drop this file onto your Word document).
An explanation
The command 'jpeg("myboxplot.jpg")' tells R to put the graphics in a file with the designated name "myboxplot.jpg", as a JPEG file.
The command 'dev.off()' turns off the current graphics 'device' (in this case, the JPEG file). You must do this, because this causes the file to be actually written. If you do not do this, the file will exist, but it will be empty. As we say, the graphics will be "flushed" at this moment. After you do 'dev.off()', the graphics will go again to your screen.
If you are interested in having graphics in another format, or controlling things like size of the image, please read the manual page:
> ?jpeg
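As an illustration of the options described on that manual page, the sketch below sets the image size and JPEG quality and then confirms that the file was written. It uses invented numbers and writes to a temporary directory; the particular width, height and quality values are arbitrary choices, and it assumes your R installation supports the JPEG device.

```r
# Invented numbers for illustration only (NOT the hot dog data).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 5))),
  Sodium = c(400, 450, 500, 350,  420, 480, 510,  380, 460, 520, 440, 400)
)

out.file <- file.path(tempdir(), "myboxplot.jpg")

# width and height are in pixels by default; quality is a percentage.
jpeg(out.file, width = 800, height = 600, quality = 90)
boxplot(Sodium ~ Type, data = toy, ylab = "Sodium")
dev.off()   # closes the device so the file is written to disk

file.exists(out.file)     # TRUE once the device has been closed
file.size(out.file) > 0   # the file should not be empty
```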