Midterm 2 Information

Information about Midterm 3 (partially updated for Fall 2011)

NOTE: Please make sure that the mathematical formulas show in the standard mathematical notation. I have a report that some of the computers on campus do not display formulas but the underlying coding. This means that JavaScript is disabled in those browsers. Please enable JavaScript or use a different computer/browser.

General Information

Excluded topics for Midterm 3:
- No Welch-Satterthwaite formula.
- No power of a test or type I/II classification of errors.
- No relative risk (\(RR\)).
There is a big practice test (over 50 problems) for Midterm 3 and the Final Exam. The problems in this practice test will be used as the basis for the real test. The answers to the practice test WILL NOT be posted.
Midterm 2 is an in-class test; rules are similar to Midterm 1.
Miterm 3 will cover topics in Chapters 5-8.
The test will consist of exactly 17 multiple choice questions. Grading criteria are similar to other midterms.
Please review old tests (2010 Midterm 3 and a portion of 2010 Midterm 3 in this folder) for a sample of problems that may be similar to some test questions. Note that the material of the old Midterm 3 will not exactly correspond to this exam. There may be additional practice problems posted for the current test.

A note on notation

I will use the notation \[ \expect{X} = \mu_X \] for the expected value (mean) of a random variable \(X\). Note that in some browsers the bar in \(\bar{X}\) will not show if \(\bar{X}\) is in the subscript. If you don't see a bar below, you should watch out for this problem: \[ \expect{\bar{x}} = \mu_{\bar{x}} \] Similarly, we will use two notations as synonymous \[ \var{X} = \sigma^2_{X} \] for the variance of the random variable \(X\).

List of Chapter 5 topics covered

The binomial distribution (Section 5.1)

Be able to calculate probability from the formula for \( P(X=k)\) assuming \(B(n,p)\). \[ P(X=k) = {n \choose k} p^k (1-p)^{n-k} \]
Know mean and variance formulas. If \(X\) assumed values \(x_1,x_2,\ldots,x_n\) with probabilities \(p_1,p_2,\ldots,p_n\) (\( p_k = P(X=x_k) \)) then the mean and variance are defined by: \[ \mu_X = \sum_{i=1}^n x_i\, p_i \] \[ \sigma_X^2 = \sum_{i=1}^n (x_i-\mu_X)^2\,p_i \]
Know when to apply (drawing with vs. without replacement).
Ability to use normal approximation: if \(X\) has binomial distribution \(B(n, p)\) then \[ \mu_X = n\cdot p \] \[ \sigma_X = \sqrt{n\cdot p \cdot (1-p)} \] \[ P\left(a \le \frac{X-\mu_X}{\sigma_X} < b \right) \approx F(b) - F(a) \] where \(F(x)\) is the c.d.f. of N(0,1).

Proportions \(\hat{p}\)

Knowing the meaning of (population with fraction \(p\) having a binary characteristic (is a person a male or not, is the person a Republican or not, etc.).
The proportion \(\hat{p}\) represents the fraction of a sample that has the characteristic.
Understand the usefuleness of the special random variable \[ X(u) = \begin{cases} 1 &\text{if \(u\) has the characteristic,} \\ 0 &\text{otherwise.} \end{cases} \]
Understand perfectly that \(X\) above satisfies these two equations: \[ \mu_{X} = p \] \[ \sigma_{X}^2 = p(1-p) \]
Know the meaning of the formula \[\hat{p} = \frac{X}{n} = \bar{X} = \frac{1}{n}\left(\sum_{i=1}^n X_i\right)\] where \(\bar{X}\) is \(B(n,p)\) and \(n\) is the sample size and \(X_i\) is a Bernoulli trial: a random variable that assumes values \(\{0,1\}\) and \(P(X_i=1) = p \). Know that \(X_i\) are independent.
Understand the following equations perfectly: \[ \mu_{\hat{p}}=\mu_{\bar{X}} = \mu_{X_1} = p\] \[ \sigma_{\hat{p}}^2=\sigma_{\bar{X}}^2 = \frac{1}{n^2}\sum_{i=1}^n\sigma_{X_i}^2 = \frac{1}{n^2}\,n\,\sigma_{X_1}^2=\frac{p(1-p)}{n} \] \[ \sigma_{X_i}^2=\sigma_{X_1}^2 = p(1-p) \]

The list of Chapter 6 and Chapter 7 topics covered

Sampling distribution

The expected value and variance of the sample mean. Thus, a sample consists of \(n\) individuals. Moreover:
- A sample is either an SRS, or otherwise guaranteed to be random.
- This gives rise to \(n\) random variables \(x_1,x_2,\ldots,x_n\). I would prefer capitals, to be consistent with Chapter 4, in which random variables are denoted by capital letters. However, in most of the book, the notation uses lower case. Thus, \(x_i\) represents a measurement of a variable (such as height, weight, etc.) on the \(i\)-th individual of the sample. Hence,
  - \(x_i\) has the same distribution as the entire population.
  - Random variables \(x_1,x_2,\ldots, x_n\) are assumed to be independent.
- For the curious: the precise meaning of independence of random variables \(X_1,X_2,\ldots,X_n\) is this. For every event \(A\) expressed as a collection of inequalities \[ a_1 < X_1 \leq b_1 \;\mathrm{and}\; a_2 < X_2 \leq b_2 \;\mathrm{and}\;\ldots\;\mathrm{and}\; a_n < X_n \leq b_n \] where \(a_i, b_i\) are arbitrary numbers, the probability \(P(A)\) is the product of probabilities \[ P(a_1 < X_1\leq b_1) \cdot P(a_2 < X_2\leq b_2) \cdot \ldots \cdot P(a_n < X_n\leq b_n)\] This kind of independence is called joint independence of random variables \(X_i\). There is also pairwise independence, which requires that pairs \(X_i,X_j\) be (jointly) independent for each pair \((i,j)\) where \(1\leq i\neq j \leq n\).
- Since \(x_i\) is a random variable, it is a function \(x_i: S\to \reals\) on a sample space. The sample space is hardly ever specified, but be clear that it can be precisely defined. Let the population sample space be \(S=S_{population}\). That is, this is the sample space consisting of all outcomes (individuals). Since a sample consists of \(n\) individuals, it is an \(n\)-tuple. Thus \[ S_{sample} = \{ (s_1,s_2,\ldots,s_n) \; \mathrm{where}\;s_i\; \mathrm{belongs}\;\mathrm{to}\;S_{population}.\} \]
  Example 1:: If we draw an SRS of 3 people (with replacement) out of 100, the sample space \(S_{sample}\) consists of \(100^3\) (a million) tripples (3-tuples) \[ (person_1,person_2,person_3). \] Since we draw with replacement, these persons do not have to be distinct, that is, we may draw the same person twice.
  
  Example 2: If the situation is as in Example 1 but if we draw without replacement, the persons would have to be distinct. In this case, the number of elements in \(S_{sample}\) is \[ 100\cdot 99 \cdot 98 = \frac{100!}{(100-3)!}\] which is slightly smaller than a million, but still a very large number: \[ 970200\]
  Thus, the sample space is large even in relatively simple situations, such as picking 3 candidates for a job, when we have 100 applicants.
Clarity about the sample mean vs. the mean (expected value). Why sample mean is a random variable while mean is not, it is a parameter of the population. Why sample variance is a random variable while variance is not. Be clear why it makes sense to ask for the expected value of the sample variance. The formula for the sample mean: \[ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \] Thus, \(\bar{x}\) is the arithmetic mean (average) of \(n\) random variables \(x_i\). Similarly, sample variance is \[ s^2 = \frac{1}{n-1}\sum_{i=1}^n \left(x_i - \bar{x}\right)^2 \] and it depends on random variables \(x_1,x_2\ldots,x_n\). Note that it also involves a random variable (statistic) \( \bar{x} \), but this random variable is expressed in terms of \(x_i\), so it introduces no new dependency.
Understand what it means that sample mean is an unbiased estimator of the expected value. This means that: \[ \expect{\bar{x}}=\mu_{\bar{x}} = \mu_{x_1} = \mu \] where \(\mu\) always means the expected value (synonymous with mean but not sample mean) of the underlying variable such as height, weight, etc. Note that \[ \mu = \sum_{u} u \cdot P(x_1 = u) \] where the summation extends over all \(u\) which are possible results of measuring the underlying variable (height, weight, etc.). Note that this applies to discrete random variables. For continuous random variables the analogous formula is: \[ \mu = \int_{-\infty}^\infty x\cdot f(x)\,dx\] where \(f(x)\) is the probability density function of the underlying variable (height, weight, etc.) Note that \(f(x)\,\Delta x\) represents the probability of the variable assuming a value in a small interval \((x,x+\Delta x)\). Thus, the integral is the limit value, as \(\Delta x\to 0\) of the Rieman sums: \[ \sum_{i} u_i \cdot P(u_i < x_i < u_i + \Delta x) = \sum_{i} u_i f(u_i)\Delta x \] where the sum is over evenly and densely spaced numbers \(u_i\) spread across all possible values of the random variable, for instance \(u_i = i \cdot \Delta x\) where \(i\) assumes all integer values (both positive and negative). Thus, the Riemann sum may involve an infinite number of terms, and thus is an infinite series. Note that this cannot be avoided for the normally distributed populations, because all real values are assumed. Thus, computing \(\mu\) often involves infinite summation or improper integrals.
Note the main point of the above discussion: \(\bar{x}\) and \(\mu\) are computed in a completely different manner. The former requires knowing the values \(x_i\) computed from a random sample, and thus is a statistic. The latter requires the knowledge of exact probabilities or the exact probability density function for the entire population. The calculation of \(\mu\) is often through a theoretical reasoning involving higher math, such as finding a sum of an (infinite) series or (improper) integral.
Similarly, note the difference between sample variance \(s^2\) and variance of the population \(\sigma^2\). Here are the contrasting formulas: \[ s^2 = \frac{1}{n-1}\sum_{i=1}^n \left(x_i - \bar{x}\right)^2 \] \[ \sigma^2 = \sum_{u} (u-\mu)^2 P(x_1 = u) \] where the summation extends over all values of the random variable \(x_1\). As you may have seen, or guessed, for continuous variables: \[ \sigma^2 = \int_{-\infty}^\infty (x-\mu)^2 f(x)\,dx \] where \(f(x)\) is the probability density function of \(x_1\).
The fact that we chose \(x_1\) in the above formulas, i.e. the measurement of the underlying variable (height, weight etc.) on the first individual of the sample drawn, is not important. Each random variables \(x_i\) with \(1\leq i \leq n\) can be used. In particular, these variables are identically distributed that is, for each real \(t\) and \(1\leq i,j \leq n\) : \[ P(x_i \le t) = P(x_j \le t).\] In addition, \(x_i\) are jointly independent. Such a collection of random variables is called IID (independent, identically distributed, or "eye-eye-dee").
The formula for the variance for the normal distribution \(N(\mu,\sigma)\) relies upon the fact that \[ \frac{1}{\sqrt{2\,\pi}\sigma}\,\int_{-\infty}^\infty (x-\mu)^2\,e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}\,dx = \sigma^2\] for all \(\mu\) and \(\sigma>0\). This is a calculus fact as much as it is a probability fact, and is typically taught in an advanced vector calculus class, such as Math 223 at the University of Arizona.
Understand what it means that the sample variance is a an unbiased estimator of the variance. By definition, this means: \[ \expect{\bar{x}} = \mu_{\bar{x}} = \mu \] That is, the expected value of the sample mean is the population mean. \[ \expect{s^2} = \mu_{s^2} = \frac{\sigma^2}{n} \] That is, the expected value of the sample variance is the population variance. Note that this would not be true if we hat the coefficient \(1/n\) in the expression for \(s^2\). This is perhaps the most important reason to have a coefficient of \(1/(n-1)\) in the formula for \(s^2\).
Note that the sample standard deviation \[ s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2}\] is not an unbiased estimator of \(\sigma\), the population standard deviation. For a particular distribution, the normal distribution, the unbiased estimator of \(\sigma\) is denoted by \(\hat{\sigma}\) and it may be found in this Wikipedia article along with full discussion of this topic. Note that \(s\) is asymptotically unbiased, that is, the bias disappears for large samples. The bias simply means that \(s\) will tend to systematically underestimate \(\sigma\). One can use the following estimator: \[ \hat{\sigma}=\sqrt{\frac{1}{n-0.5}\sum_{i=1}^n (x_i-\bar{x})^2}\] which typically has smaller bias than \(s\) (rule of thumb).
Ability to calculate the expected value of the sample mean and variance. That is \[ \expect{\bar{x}} = \mu_{\bar{x}} = \mu \] However, please note the distinction between the two quantities: the expected value of the sample variance \[ \expect{s^2} = \sigma^2 \] and the the variance of the sample mean \(\sigma^2_{\bar{x}}\). We recall this derivation: \[ \sigma^2_{\bar{x}} = \var \left(\frac{1}{n}\sum_{i=1} x_i \right) = \left( \frac{1}{n}\right)^2 \sum_{i=1}^n\var(x_i) = \frac{1}{n^2} n\,\var(x_1) = \frac{1}{n^2} n\,\sigma^2 = \frac{\sigma^2}{n}\] which is one of the most important derivations in statistics, and yet simple enough to understand. It is based on known properties of the variance. Thus, the standard deviation of the sample mean \(\bar{x}\) is: \[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \] Note that this is a parameter of the population. In contrast, the sample mean \(\bar{x}\) is a statistic, and also a random variable.
Note that WebAssign sometimes uses the notation \(s_{\bar{x}}\). This notation is not used in the book or on PowerPoint slides, and is confused. There is no such thing as "sample standard deviation of sample mean" in any context discussed in this course. Typically, the confusion can be resolved by assuming that this notation means SEM, or equivalently, \(s/\sqrt{n}\).
Remember: Every statistic is a random variable. It is a special random variable which is defined on the sample space \(S_{sample}\), discussed elsewhere in these notes.
Note that we need to approximate \(\sigma_{\bar{x}}\), the standard error of the means (SEM), in applications such as the single-sample t-test. The idea of the t-statistic comes from the z-score for the sample mean (the random variable \(\bar{x}\)): \[ z = \frac{\bar{x}-\mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\] The main idea of approximation in statistics is this: we replace a population parameter with its estimator, preferrably unbiased, or asymptotically unbiased. This immediately leads to t-statistic derived by approximating z-statistic by replacing known population parameter \(\sigma\) with a statistic \(s\). This leads to the formula \[ t = \frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}} \]
Note that SEM (standard error of the means is defined as follows: \[ SEM = \frac{s}{\sqrt{n}} \] Thus, SEM is a statistic. Using SEM, we may write the t-statistic as: \[ t = \frac{\bar{x}-\mu_{\bar{x}}}{SEM} \]
Be clear on how sample size affects the sampling distribution, in particular, its expected value and variance. Thus, the mean of the sampling distribution is the mean of the population independently of the sample size. The spread of the sampling distribution decreases with sample size. For a sample of size \(n\) it is exactly \(\sqrt{n}\) times smaller than the spread of the population distribution.

Confidence intervals

Be clear on what a confidence interval is and is NOT. Be clear about the implications.
Thus, the confidence interval is \( (\bar{x}-m, \bar{x}+m) \) for a single sample \((x_1,x_2,\ldots,x_m)\), where \(m\) is the margin of error. The true population mean \(\mu\) belongs to the confidence interval with probability equal to the confidence level \(C\). Thus, if we draw new samples over and over again (with replacement) and \(C=.95\) (95% confidence interval) then 95% of the time the true population mean \(\mu\) will be in the confidence interval.
It is not true that \(\bar{x}\), the sample mean of a sample belongs to the confidence interval with probability \(C\) - the confidence level. Note that a particular confidence interval is determined from a single sample, and thus may not be quite typical: its center may be quite far from the population mean. Thus, a particular confidence interval does not provide any information about where sample means of repeatedly drawn samples will fall.
Be able to calculate when the variance of the population is known (Chapter 6) and unknown (Chapter 7).
Understand the confidence level \(C\).
Know margin of error \(m\). The three formulas for the margin of error in different contexts.
Margin of error for single sample when \(\sigma\) is known, is \[ m = z^* \frac{\sigma}{\sqrt{n}} \] where \(z^*\) is the z-score for which the two-tailed probability is \(1-C\), where \(C\) is the confidence level. Alternatively, if we use Table A, we need to look up the z-value (inverse lookup) for \[ p = \frac{1-C}{2} \] becase Table A lists the probability of the left tail. After that we need to flip the sign: \[ z^* = -z \]
Margin of error for single sample when \(\sigma\) is unknown, is \[ m = t^* \frac{s}{\sqrt{n}} \] where \(t^*\) is the \(t\) for which the two-tailed probability of the \(t\)-distribution with degrees of freedom \(df=n-1\), is \(1-C\), where \(C\) is the confidence level. Alternatively, if we use Table D, \(t^*\) is the value of \(t\) for which the confidence level is \(C\) (the confidence level is in the bottom row). Note that this is inverse lookup (the value of \(t\) for given value of \(P\)). Because Table D lists the RIGH tail, we do not need to flip the sign of the looked up value of \(t\): \[t^* = t \]
Understand the difference between single-sample and two-sample confidence interval and hypothesis testing. Thus, if we have two samples \[ x_1 = (x_{11},x_{12},\ldots, x_{1n_1}) \] of size \(n_1\) and a second sample \[ x_2 = (x_{21},x_{22},\ldots, x_{2n_2}) \] of size \(n_2\) then we have two sample means: \[ \bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{1i} \] and \[ \bar{x}_2 = \frac{1}{n_2} \sum_{i=1}^{n_1} x_{2i}. \] Therefore, we have the difference of the means \[ D = \bar{x}_2 - \bar{x}_1. \] \(D\) is a statistic and a random variable. The variance of the difference: \[ \sigma^2_D = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \] where \(\sigma_1^2\) and \(\sigma_2^2\) are the population standard deviations of the two populations from which samples \(x_1\) and \(x_2\) come from. Note that if \(x_1\) and \(x_2\) are two SRSs drawn from the same populations then \(\sigma_1 = \sigma_2\). This is due to the independence of \(\bar{x_1}\) and \(\bar{x_2}\). Note that this independence is a consequence of the independence between the two groups of random variables \(x_{1i}\) and \(x_{2j}\). Since only the first group contributes to \(\bar{x}_1\) and only the second group to \(\bar{x}_2\), this yields the independence of \(\bar{x_1}\) and \(\bar{x_2}\). While intuitively clear, the full derivation of this property is not given here. Thus we have the formula \[ \sigma_D = \sigma_{\bar{x}_2-\bar{x}_1} = \sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}} \]
The two-sample z-statistic is the z-score for the random variable \(D\). Thus \[ z = \frac{D - \mu_D}{\sigma_D} = \frac{(\bar{x}_2-\bar{x}_1) - (\mu_2-\mu_1)}{\sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}}. \]
As usual, the t-statistic for a two-sample situation is obtained by replacing unknown \(\sigma\)'s with \(s\)'s. If the two populations are different, this leads to the formula \[ \sigma_D \approx SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \] and the resulting t-statistic is \[ t = \frac{(\bar{x}_2-\bar{x}_1) - (\mu_2-\mu_1)}{\sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}}. \] However, when the two samples \(x_1\) and \(x_2\) come from the same population, \(\sigma_1 = \sigma_2 = \sigma\) is approximated by the sample standard deviation of the pooled sample. The pooled sample is obtained by combining samples \(x_1\) and \(x_2\). Thus, the pooled sample is: \[ x = (x_{11},x_{12},\ldots,x_{1n_1},x_{21},x_{22},\ldots,x_{2n_2}) \] and it has size \(n_1 + n_2\). If \(s\) is the sample standard deviation of the pooled sample then we form the t-statistic as follows: \[ t = \frac{(\bar{x}_2-\bar{x}_1) - (\mu_2-\mu_1)}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} =\frac{(\bar{x}_2-\bar{x}_1) - (\mu_2-\mu_1)}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}. \] We note that for the pooled sample \(df = n_1 + n_2 -1\) and is exact if the population is normal.
When calculating the confidence interval for two-sample situation, we modify the formulas for the margin of error accordingly. When the variances of the two populations are not the same: \[ m = t^* \sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}} \] and when they are the same \[ m = t^* s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \] and \(s\) is the sample mean of the pooled sample.
To the curious: As every random variable, it is a function on a sample space. The sample space in this case consists of pairs of samples \((x_1,x_2)\) as above, of fixed sizes \(n_1\) and \(n_2\). These samples may come from the same or different populations. The samples are assumed to be independently chosen, which implies that random variables \(x_{1i}\) and \(x_{2j}\) (measurements on individuals of the two samples) are independent. The consequence of independence is that we may apply a Product Rule for Probability of Independent Events: \(P(A\cap B) = P(A)\cdot P(B)\) if \(A\) depends on sample \(x_1\) and \(B\) depends on sample \(x_2\).
Remember: when doing an inverse lookup, we never multiply \(t\)-value or \(z\)-value by 2. We only perform such calculations on probabilities (P-values) as required by one-sided or two-sided nature of the calculation. Always draw the picture of the tails if unsure whether probability should be divided or multiplied by 2.

Hypothesis testing

Know the mechanics of hypothesis testing. Null hypothesis, alternative hypothesis, one-sided vs. two-sided tests.
Be able to formulate null and alternative hypothesis.
Understand the meaning of P-value, be able to compute it when the variance of the population is known (Chapter 6) and unknown (Chapter 7).
Be fluent in various lookups (straight and inverse) based on Table D (and review Table A), e.g.
- Given \(t\), degrees of freedom and confidence level, look-up P-value range (straight lookup).
- Given one of the standard confidence levels (i.e. once present in bottom row), look up \(t\) (inverse lookup).
- Be able to translate between significance level \(C\) and confidence level \(\alpha\). It is always true that \(C + \alpha = 1 \). However, in two-sided test, \(\alpha\) is split into two-tails and in one-sided tests \(\alpha\) should be the probability of one tail.
Be clear what the significance level \(\alpha\) is. Not to be confused with confidence level \(C\).
Understand the difference between rejecting null hypothesis vs. accepting alternative hypothesis.
Understand the definition type I and II errors.
Type I error is the situation when we reject a null hypothesis when it is actually true. This happens with probability \(\alpha\), the significance level.
Type II error ocurrs when we do not reject the null hypothesis when it is false. The probability of rejecting the false null hypothesis is called the power of the test. If the probability of committing type II error is \(\beta\) then the power of the test is \(1-\beta\).

The Student t-distribution

Know when to apply (when the variance of the population is unknown).
Be able to calculate t-statistic.
Be able to compute the number of degrees of freedom
Be able to use table D of the book to lookup the critical values of the t-statistic and figure out the P-value.
Know how the t-distribution changes with degrees of freedom. Know that it becomes normal for large df.
Be able to conduct a t-test in the scope of 7.1. (single-sample).
Be able to conduct a two-sample t-test in the scope of 7.2. (two-sample). Know the conservative estimate for the degrees of freedom for two samples drawn from two distinct populations for which there is no reason to believe that they have the same standard deviation: \[ df = \min(n_1-1,n_2-2) \] However, as discussed before, when the standard deviations are the same (for instance, because the two samples are SRSs drawn from the same population) then we used the pooled t-statistic discussed before, and the degrees of freedom are: \[ df = n_1 + n_2 - 1. \]
The more accurate formula for the degrees of freedom when variances of the populations are not equal is the Welch-Satterthwaite formula. \[df = {{\left( {s_1^2 \over n_1} + {s_2^2 \over n_2}\right)^2 } \over {{s_1^4 \over n_1^2 \cdot \nu_1}+{s_2^4 \over n_2^2 \cdot \nu_2}}}={{\left( {s_1^2 \over n_1} + {s_2^2 \over n_2}\right)^2 } \over {{s_1^4 \over n_1^2 \cdot \left({n_1-1}\right)}+{s_2^4 \over n_2^2 \cdot \left({n_2-1}\right)}}} .\,\] (Based on Wikipedia article on the Welch's test with the Welch-Satterthwaite formula for the number of degrees of freedom.) We used the notation \(\nu_1=n_1-1\) and \(\nu_2=n_2-1\). In calculations by hand, this formula is not very practical, but fortunately easy to do in software. A two-sample test with R is presented here. The formula is implemented there, in a reusable fashion.

The list of Chapter 8 topics covered

Inference for single proportion

Remember: In Chapter 8 we never need the t-statistic. It is all based on normal distribution (Table A).
Know the single proportion z-statistic: \[ z = \frac{\hat{p} - p_0 }{\sqrt{\frac{p_0(1-p_0)}{n}}} \] Note that in the denominator we use \(\hat{p}\) not \(p_0\). This is according to the general approximation principle stated above: we replaced the unknown population parameter \(p_0\) with its unbiased estimator \(\hat{p}\) (remember that \(\hat{p}\) is unbiased because \(\mu_{\hat{p}} = p_0\)).
Know the meaning of the denominator which depends on the knowledge of proportions as random variables from earlier Chapters. Thus, \[ \mu_{\hat{p}} = p \] and \[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]
Know the Central Limit Theorem (CLT) and its implication for proportions: the z-statistic is approximately normally distributed according to N(0,1).
Know the large sample approximation. To apply the CLT we need to have the following:
- The population must be at least 10 times larger than the sample
- Both the number of failures and successes must be at least 10.
Know the margin of error for calculating confidence intervals for proportions: \[ m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \] The confidence interval is \[ (\hat{p}-m, \hat{p}+m) \]
Know how to conduct a test using the above statistic. The general methodology of tests is applicable, but in the proportion context we may text the null hypothesis \(H_0:\;p=p_0\), i.e. whether the proportion of the population having a certain characteristic is \(p_0\). For instance, we may test whether a coin is fair, by using the proportion \(\hat{p}\) from \(n=100\) coin tosses, with \(p_0 = 0.5\).
Know the plus 4 correction. That is, in order to obtain a better estimate of the confidence interval, we may compensate for the error due to the finite sample size by using a new statistic \(\tilde{p}\) instead of \(\hat{p}\) computed just like \(\hat{p}\) but assuming artificially increased number for the sample size: \(n+4\). Thus \[\tilde{p} = \frac{X + 2}{n + 4}\] where \(X\) is the count of successes. The numerator \(X+2\) is explained by the fact that 4 is evenly split between the successes and failures.
We also have a "plus 4" rule for the margin of error: \[ m = z^*\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} \] Note that we used the corrected estimator \(\tilde{p}\), not \(\hat{p}\).
Finally, please remember that \[ \hat{p} = \frac{X}{n} \] where \(X\) has binomial distribution. When the samples are too small (say, less than 15 failures or successes, see the precise rules of thumb in the book covering single-sample and two-sample situations, means and proportions) to use the large sample approximation, or the "plus 4" rule, you may need to resort to using the binomial distribution (Table C). Thus, if you are testing a coin for fairness with \(n = 4\) tosses, the probability that you will get \(X=0,1,2,3,4\) successes (heads) is determined by B(4,p), i.e. \[ P(X=k) = {4\choose k}\cdot p^k\cdot(1-p)^{4-k} \] The null hypothesis would be that \(p=0.5\). Suppose you got jus 1 success. The P-value for a two-sided test of significance would be the probability of getting 1 success or worse, i.e. 0, 1, 4 or 5 successes. Hence, the P-value would be \[ P(|X-2|\ge 1) = 1- P(X=2) = 1 - {4 \choose 2}\cdot 0.5^2\cdot 0.5^2 = 1-6\cdot 0.5^4 = 0.625 \] Hence, your confidence level is \(1-0.625=.375\) or 37.5%, which by all standards does not lead to rejecting the null hypothesis. As you can see, in practice small samples are too small for testing, and the accuracy of the binomial distribution does not help here.
However, the binomial distribution should be used when extreme bias happens, such as \(\hat{p} = 0.6\) and we are testing the alternative hypothesis \(p < 0.2\) with \(n=20\). This is equivalent to stating that there were 12 successes and 8 failures because \(\hat{p}\cdot n = 0.6\cdot 20 = 12\). Below is an example of a test of significance based on the binomial distribution. The P-value can be only reliably be found by the binomial distribution, and the large sample approximation cannot be used by the rules of thum stated in the book. Note that the expected value of count \(X\) is \(n\,p = 20\cdot 0.2 = 4\), assuming the null hypothesis \(H_0: p = 0.2\). The probability of \(\hat{p}\) of \(0.6\) or higher is \[P(X\ge 0.6\cdot 20) = P(X\ge 12) = \sum_{k=12}^{20} {20 \choose k}\cdot (0.2)^k\cdot (0.8)^{20-k} \] This P-value may be evaluated with software (or Table C) and is \[P = 1.0172876515704846\cdot 10^{-4} \approx 0.01\% \] and thus is highly significant.

Inference for 2 proportions

When we have two samples from two populations with true proportions \(p_1\) and \(p_2\) of individuals with a certain characteristic of interest, we may wish to compare the proportions. The z-statistic used is: \[ z = \frac{(\hat{p}_1-\hat{p}_2) - (p_1-p_2)} {\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \]
Know the resulting formula for margin of error for the difference \(p_1-p_2\) is: \[ m = z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
Know the "plus 4" rule for two proportions. This becomes "plus 2" rule for each individual proportion, i.e. \[\tilde{p}_1 = \frac{X_1+1}{n_1+2}\] \[\tilde{p}_2 = \frac{X_2+1}{n_2+2}\] where \(X_1\) and \(X_2\) are the success counts for both samples. Similarly, the margin of error formula is modified accordingly: \[ m = z^* \sqrt{ \frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2+2} } \] Again, please note that \(\tilde{p}_1\) and \(\tilde{p}_2\) are used, not \(\hat{p}_1\) and \(\hat{p}_2\). This correction should be used when the confidence level \(C\ge 90\%\) and both sample sizes are at least 5.
Know the notion of relative risk: \[RR = \frac{p_1}{p_2}\] For instance, we may consider the probabilities of getting breast cancer for women in the US and in Japan. If the relative risk is \(>1\) then women in the US get breast cancer more often. If the relative risk is \(=1\) then there is no difference between the US and Japan. The estimator of the relaive risk would be \[\widehat{RR} = \frac{\hat{p}_1}{\hat{p}_2} \] and is easy to find for a given sample.
However, the book does not discuss the sampling distribution of \(\widehat{RR}\), and thus we cannot conduct tests or compute confidence intervals for \(RR\) based on the information in the book. Such information is available elsewhere. Note that figuring out the distribution of \(\widehat{RR}\) requires the use of advanced mathematics, and emphasizes the role played by mathematics in statistics.