Math 263, Section 001 and 003 - authored by Marek Rychlik
Last updated on September 17, 6:56PM.Excel Assignment 2
In this assignment you will:
- Make scatter plots.
- Fit regression lines and find correlation coefficients.
- Calculate coefficient of determination.
- Compute ESS, TSS and RSS.
- Draw conclusions about data.
Software used
Excel with Data Analysis Pack. Open Office is sufficient.
The data file
The topic of the data file is
"Development of Lung Capacity with Age and the Effect of
Smoking". The file FEV.xlsx shows the
Forced Expiratory Volume for a number of children and
teenagers, along with information about their age, height, and
whether they smoke. In this assignment we will look at the
relationship between lung capacity and age (repeated for lung
capaticity and height) using scatter plots and regression.
Important: Smoking is ignored
In this assignment, you
ignore the information about smoking.
Steps for regression of FEV on age
- Make a scatter plot of FEV against age. This means FEV is
the response variable, that is the dependent variable and age is
the explanatory variable, that is the independent variable.
Make sure to properly label your data.
- Compute the correlation coefficient between FEV and age.
- Fit a regression line to the data.
Repeat the above steps for regression of FEV on height
Use the results to:
- Find the prediction of FEV for a 5 foot (60 inch) child
Compute the partition of squares for the regression you performed
Do this part for regression of FEV on age only.
Find the values of ESS, TSS and RSS. The definitions are given in Midterm 1 study guide Verify the equation \( TSS = ESS + RSS \) for the specific situation you have. This is a good check that you performed your calculations correctly. Here are some hints:- Create cells containing sample means \(\bar{x}\) and \(\bar{y}\), sample standard deviations \(s_x\) and \(s_y\), and correlation coefficient \(r\)
- Create a cell containing the slope \(b_1=r\,s_y/s_x\)and intercept \(b_0=\bar{y}-b_0\,\bar{x}\) of the regression line
- Compute a column of deviations of FEV from the formula \( = y_i - \bar{y} \)
- Compute a column of predictions \(\hat{y}_i = b_0 + b_1\,x_i\)
- Compute a column of residuals from the formula \( = y_i - \hat{y} \)
- Use the formulas for ESS, TSS and RSS.
Calculate coefficient of determination
Do this part for regression of FEV on age only.
Calculate \(R^2\) using several methods:- Using the fact that \(R^2 = r^2\)
- Using the fact that \(R^2 = 1-ESS/TSS\)
- Using the fact that \(R^2 = RSS/TSS\)
- Making Excel compute the value of \(R^2\)
Plot the residuals
Repeat this part for both regression of FEV on age and FEV on height. That is, in both cases y is FEV and x is age in the first version and height in the second.
Using Excel options for Regression command in the Data Analysis pack, plot the residuals against the predicted values.Plot the normal qqplot for the residuals
Repeat this part for both regression of FEV on age and FEV on height. That is, in both cases y is FEV and x is age in the first version and height in the second.
Using Excel options for Regression command in the Data Analysis pack, plot the normal quantile-quantile plot for the residuals.Draw conclusions about the data
Please answer the following questions for both the regression of FEV on age and FEV on height:- Is there an association between FEV and the explanatory variable (age or height)?
- If the association exists, is it strong or weak?
- Is the relationship linear or non-linear
- What is the direction of the relationship?
- Do the residuals look normally distributed?
- What percent of the variation of FEV is explained by the variation of the explanatory variable (age or height)?