Stat 202: Lecture 17 (covers pp. 171-191)

Nathan VanHoudnos
11/3/2014

Agenda

  1. Office hours
  2. Checkpoint comments
  3. OLI textbook comments
  4. Lecture 16 (covers pp. 171-187) wrap up
  5. Lecture 17 (covers pp. 188-191)

New office hours

Based on the results of the WhenIsGood poll, Aaron and I will change our office hours.

  • Nathan: Mondays 11-noon in my office
  • Aaron: Mondays 3-4pm at the stat department

Checkpoint #18A and #18B

  • They are on BlackBoard, not OLI
  • Both are due Wednesday

Agenda

  1. Checkpoint comments
  2. OLI textbook comments
  3. Lecture 16 (covers pp. 171-187) wrap up
  4. Lecture 17 (covers pp. 188-191)

OLI textbook comments

On p. 179:

Interval estimation takes point estimation a step further and says something like:

“I am 95% confident that by using the point estimate \( \bar{x}=115 \) to estimate \( \mu \), I am off by no more than 3 IQ points. In other words, I am 95% confident that \( \mu \) is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am 95% confident that \( \mu \) is somewhere in (or covered by) the interval (112,118).

OLI textbook comments

Many of these sound like “there is a 95% chance that…”

Interval estimation takes point estimation a step further and says something like:

“I am 95% confident that by using the point estimate \( \bar{x}=115 \) to estimate \( \mu \), I am off by no more than 3 IQ points. In other words, I am 95% confident that \( \mu \) is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am 95% confident that \( \mu \) is somewhere in (or covered by) the interval (112,118).

OLI textbook comments

When someone says

I am 95% confident that \( \mu \) is within…

They are using confident as a technical shorthand that means

I am confident that, in the future, over repeated experiments, 95% of my confidence intervals will capture \( \mu \).

Note that confidence \( \ne \) probability.
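The "repeated experiments" reading can be checked by simulation. The sketch below is only an illustration: the true mean, standard deviation, sample size, and seed are made-up values (assumptions, not from the lecture), and the interval uses the rule-of-thumb multiplier 2.

```r
# Simulate many repetitions of the same experiment and count how often
# the interval xbar +/- 2 * sigma / sqrt(n) captures the true mean.
set.seed(202)    # arbitrary seed, for reproducibility only
mu    <- 100     # hypothetical true mean (known only because we simulate)
sigma <- 15      # hypothetical known standard deviation
n     <- 25      # sample size of each experiment
reps  <- 10000   # number of repeated experiments

# One sample mean per simulated experiment
xbar <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

moe <- 2 * sigma / sqrt(n)                    # margin of error
hit <- (xbar - moe <= mu) & (mu <= xbar + moe) # did the interval capture mu?

coverage <- mean(hit)   # fraction of intervals that captured mu
print(coverage)         # should land close to 0.95
```

Any single interval either contains \( \mu \) or it does not; the 95% describes the long-run fraction of intervals that do.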

OLI textbook comments

The book is not wrong; it is merely deeply confusing:

Interval estimation takes point estimation a step further and says something like:

“I am 95% confident that by using the point estimate \( \bar{x}=115 \) to estimate \( \mu \), I am off by no more than 3 IQ points. In other words, I am 95% confident that \( \mu \) is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am 95% confident that \( \mu \) is somewhere in (or covered by) the interval (112,118).

Beating a dead horse

Deborah Mayo, a philosopher of science at Virginia Tech:

Following Savage (1962), the probability that a parameter lies in a specific interval may be referred to as a measure of final precision. While a measure of final precision may seem desirable, and while confidence levels are often (wrongly) interpreted as providing such a measure, no such interpretation is warranted. Admittedly, such a misinterpretation is encouraged by the word “confidence”.

Agenda

  1. Checkpoint comments
  2. OLI textbook comments
  3. Lecture 16 (covers pp. 171-187) wrap up
  4. Lecture 17 (covers pp. 188-191)

Lecture 16: Inference

Interval estimation

  • Estimating a population quantity with a statistic that is an interval.
  • Two ways to estimate an interval:
    • Option F: Confidence Intervals: If we repeat the experiment over and over, 95% of intervals will contain the true value.
    • Option B: Credible Intervals: The probability that the true value is contained within the interval is 95%.

More with Option F

In Stat 202, we will focus on confidence intervals.

If the data are normally distributed with an unknown mean \( \mu \) and a known standard deviation \( \sigma \), then, a 95% confidence interval for \( \mu \) is

  • \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

If the data are distributed (not necessarily normally) with an unknown mean \( \mu \) and a known standard deviation \( \sigma \), then, if \( n \ge 30 \), a 95% confidence interval for \( \mu \) is

  • \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

An example

The newest variety of potato is the Normally-Distributed (ND) potato. Its weight is normally distributed with a standard deviation of 12 grams and an unknown mean.

If Dan selects a “1 lbs” bag with 9 potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

(on board)
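For reference, here is one way the board calculation could be reproduced in R. It is a sketch that simply plugs the numbers from this slide into \( \bar{x} \pm 2 \sigma /\sqrt{n} \); the variable names are arbitrary.

```r
xbar  <- 52   # sample mean weight, in grams
sigma <- 12   # known population standard deviation, in grams
n     <- 9    # number of potatoes in the bag

moe <- 2 * sigma / sqrt(n)   # margin of error: 2 * 12 / 3 = 8
ci  <- c(lower = xbar - moe, upper = xbar + moe)
print(ci)                    # lower 44, upper 60
```

The small sample is acceptable here because the weights themselves are normally distributed.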

An example

Recall that the weight of a Klamath Pearl has an unknown distribution, but we do know that the unknown distribution has a standard deviation of 10 grams.

If Dan selects a “1 lbs” bag with 9 (Klamath Pearl) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

We cannot: \( n=9 \) and the distribution is not known to be normal. The CLT does not apply until \( n \ge 30 \).

An example

Let the Quipper potato have an unknown weight distribution, but we do know that the unknown distribution has a standard deviation of 14 grams.

If Dan selects a “5 lbs” bag with 49 (Quipper) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

(On board)
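A sketch of the board calculation, plugging the slide's numbers into the 95% formula (the CLT condition holds because \( n = 49 \ge 30 \)):

\[ \bar{x} \pm 2 \sigma/\sqrt{n} = 52 \pm 2 \cdot 14/\sqrt{49} = 52 \pm 4 \]

so the interval runs from 48 to 56 grams.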

More with Option F

If you want a different confidence level, you change the 2.

A 68% confidence interval:

  • \( \bar{x} \pm 1 \sigma /\sqrt{n} \)

A 95% confidence interval:

  • \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

A 99.7% confidence interval:

  • \( \bar{x} \pm 3 \sigma /\sqrt{n} \)

Note: Higher levels of confidence give wider (less precise) intervals.

An example

Let the Quipper potato have an unknown weight distribution, but we do know that the unknown distribution has a standard deviation of 14 grams.

If Dan selects a “5 lbs” bag with 49 (Quipper) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate 68%, 95%, and 99.7% confidence intervals for the population mean.

(On board)
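The three intervals can be tabulated side by side. This is a sketch using the rule-of-thumb multipliers 1, 2, and 3 with the numbers from the slide; the variable names are arbitrary.

```r
xbar  <- 52   # sample mean weight, in grams
sigma <- 14   # known population standard deviation, in grams
n     <- 49   # number of potatoes in the bag

se <- sigma / sqrt(n)   # standard deviation of the sample mean: 14 / 7 = 2

level <- c("68%", "95%", "99.7%")
k     <- c(1, 2, 3)     # rule-of-thumb multipliers
lower <- xbar - k * se
upper <- xbar + k * se

intervals <- data.frame(level, k, lower, upper)
print(intervals)        # (50, 54), (48, 56), (46, 58)
```

The intervals widen as the confidence level rises, while the center stays at 52.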

More with Option F

If you want a different confidence level, you change the 2.

A 95% confidence interval:

  • \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

A \( (1-\alpha) \times 100\% \) confidence interval:

  • \( \bar{x} \pm z_{\alpha/2} \cdot \sigma /\sqrt{n} \)

Where \( z_{\alpha/2} \) is the \( z \) score that defines the appropriate tail of a standard normal.

Defining tails

\( z_{\alpha/2} \) for a 68% confidence interval

qnorm((1-.68)/2)
[1] -0.9944579

\( z_{\alpha/2} \) for a 95% confidence interval

qnorm((1-.95)/2)
[1] -1.959964

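Note that the 68/95/99.7 rule is an approximation. Also, qnorm returns the lower-tail cutoff, which is negative; \( z_{\alpha/2} \) is its absolute value.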

Defining tails

\( z_{\alpha/2} \) for a 99.7% confidence interval

qnorm((1-.997)/2)
[1] -2.967738

\( z_{\alpha/2} \) for a 99.99999% confidence interval

qnorm((1-.9999999)/2)
[1] -5.326724
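Putting the pieces together, a general interval at any confidence level might be computed as below. The helper name z_interval is hypothetical (it is not from the lecture or the textbook), and it asks qnorm for the upper tail so that the multiplier comes out positive.

```r
# z interval for a mean with known sigma, at an arbitrary confidence level.
# z_interval is a made-up helper name used only for this illustration.
z_interval <- function(xbar, sigma, n, conf = 0.95) {
  alpha <- 1 - conf
  z     <- qnorm(1 - alpha / 2)     # upper-tail cutoff, so z is positive
  moe   <- z * sigma / sqrt(n)      # margin of error
  c(lower = xbar - moe, upper = xbar + moe, z = z)
}

# Quipper potato numbers from the earlier example
res <- z_interval(xbar = 52, sigma = 14, n = 49, conf = 0.95)
print(res)   # roughly 48.08 to 55.92, with z about 1.96
```

Using the exact multiplier 1.96 instead of 2 gives a slightly narrower interval than the rule of thumb.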

Length and Margin of Error

Definitions:

  • \( 2 \cdot z_{\alpha/2} \cdot \sigma /\sqrt{n} \) is the length
  • \( z_{\alpha/2} \cdot \sigma /\sqrt{n} \) is the margin of error.

For example, a 95% confidence interval is

  • \( \bar{x} \pm 2 \sigma /\sqrt{n} \)
  • its length is \( 4 \sigma /\sqrt{n} \)
  • its margin of error is \( 2 \sigma /\sqrt{n} \)

Sample size calculations

If you want to plan a study, you solve for \( n \)

IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99.7% confidence with a margin of error equal to 5?

(on board)
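A sketch of the board calculation, using the multiplier 3 for 99.7% confidence and setting the margin of error equal to 5:

\[ 3 \cdot \frac{15}{\sqrt{n}} = 5 \quad \Rightarrow \quad \sqrt{n} = \frac{3 \cdot 15}{5} = 9 \quad \Rightarrow \quad n = 81 \]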

Sample size calculations

If you want to plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{z_{\alpha/2} \cdot \sigma}{m} \right)^2 \]

and then round up to the nearest integer.

Note: Higher values of \( n \) give more precise intervals.
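The formula and the rounding step could be wrapped in a small helper. The function name sample_size_mean and its default multiplier are assumptions made for this sketch.

```r
# Sample size needed for a margin of error m when sigma is known.
# sample_size_mean is a hypothetical helper, not a built-in R function.
sample_size_mean <- function(sigma, m, z = 2) {
  n_raw <- (z * sigma / m)^2
  # round() removes floating-point noise before rounding up
  ceiling(round(n_raw, 8))
}

n_997 <- sample_size_mean(sigma = 15, m = 5, z = 3)  # 99.7% confidence: 81
n_95  <- sample_size_mean(sigma = 15, m = 5, z = 2)  # 95% confidence: 36
print(c(n_997, n_95))
```

Halving the margin of error quadruples the required sample size, because \( m \) is squared in the denominator.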

Sample size calculations

IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99.7% confidence with a length equal to 8?

What is the margin of error for the formula?

The length is 8, the margin of error is 4.
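Carrying the calculation through with \( m = 4 \) and the multiplier 3 (a sketch of the remaining arithmetic):

\[ n = \left( \frac{3 \cdot 15}{4} \right)^2 = 11.25^2 = 126.5625 \]

which rounds up to \( n = 127 \).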

Lecture 16 Summary (p. 187)

If the data are normally distributed with an unknown mean \( \mu \) and a known standard deviation \( \sigma \), then, a 95% confidence interval for \( \mu \) is

  • \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

If the data are distributed (not necessarily normally) with an unknown mean \( \mu \) and a known standard deviation \( \sigma \), then, if \( n \ge 30 \), a 95% confidence interval for \( \mu \) is

  • \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

Note: The book covers cases with unknown standard deviation. We will not cover this yet.

Agenda

  1. Checkpoint comments
  2. OLI textbook comments
  3. Lecture 16 (covers pp. 171-187) wrap up
  4. Lecture 17 (covers pp. 188-191)

Lecture 17

Lecture 16, but with sample proportions:

Recall if \[ \begin{aligned} np & \ge 10 & & \text{and} & n(1-p) \ge 10 \end{aligned} \]

then \( \hat{p} \) is approximately normal:

\[ \hat{p} \sim N\left( p, \frac{p(1-p)}{n} \right) \]

Where the mean and standard deviation of \( \hat{p} \) are:

\[ \begin{aligned} E[\hat{p}] & = p & & & \text{sd}[\hat{p}] & = \sqrt{\frac{p(1-p)}{n}} \end{aligned} \]

A comparison

Sample mean

\[ \bar{x} \sim N(\mu, \sigma^2/n) \]

95% confidence interval

\[ \bar{x} \pm 2 \sigma/\sqrt{n} \]

Since \( \sigma \) is assumed to be known, we can calculate this.

Sample Proportion

\[ \hat{p} \sim N(p, p(1-p)/n) \]

95% confidence interval

\[ \hat{p} \pm 2 \sqrt{p(1-p)/n} \]

Oh no! \( p \) is not known. What to do?

Replace \( p \) with our best guess, \( \hat{p} \)

Summary

Sample mean

\[ \bar{x} \sim N(\mu, \sigma^2/n) \]

95% confidence interval

\[ \bar{x} \pm 2 \sigma/\sqrt{n} \]

Change the 2 for different levels of confidence.

Sample Proportion

\[ \hat{p} \sim N(p, p(1-p)/n) \]

95% confidence interval

\[ \hat{p} \pm 2 \sqrt{\hat{p}(1-\hat{p})/n} \]

Change the 2 for different levels of confidence.

An example

Tomorrow is election day. Go vote!

Consider a poll of 100 voters. If 60% of respondents say they will vote for Candidate A, what is the 95% confidence interval for the (population) proportion of people who will vote for Candidate A?

What are \( \hat{p} \) and \( n \)?

\( \hat{p} = .6 \) and \( n = 100 \)

\[ \begin{aligned} \hat{p} \pm 2 \sqrt{\hat{p}(1-\hat{p})/n} & = .6 \pm 2 \sqrt{.6(1-.6)/100} \\ & = .6 \pm 0.098 \end{aligned} \]

A further example

Tomorrow is election day. Go vote!

Consider a poll of 100 voters. If 60% of respondents say they will vote for Candidate A, what is the 99.7% confidence interval for the (population) proportion of people who will vote for Candidate A?

What are \( \hat{p} \) and \( n \)?

\( \hat{p} = .6 \) and \( n = 100 \)

\[ \begin{aligned} \hat{p} \pm 3 \sqrt{\hat{p}(1-\hat{p})/n} & = .6 \pm 3 \sqrt{.6(1-.6)/100} \\ & = .6 \pm 0.147 \end{aligned} \]
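Both poll intervals can be checked in R. The helper name prop_interval is hypothetical; it plugs \( \hat{p} \) into the standard deviation formula and takes the multiplier as an argument.

```r
# Confidence interval for a proportion, plugging p_hat in for p.
# prop_interval is a made-up helper name used only for this illustration.
prop_interval <- function(p_hat, n, k = 2) {
  se  <- sqrt(p_hat * (1 - p_hat) / n)  # estimated sd of p_hat
  moe <- k * se                         # margin of error
  c(lower = p_hat - moe, upper = p_hat + moe, moe = moe)
}

ci95  <- prop_interval(p_hat = 0.6, n = 100, k = 2)
ci997 <- prop_interval(p_hat = 0.6, n = 100, k = 3)
print(round(ci95, 3))    # margin of error about 0.098
print(round(ci997, 3))   # margin of error about 0.147
```

With 100 respondents and \( \hat{p} = .6 \), there are 60 "yes" and 40 "no" answers, so both counts are at least 10 and the normal approximation is reasonable.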

A hiccup for sample size planning

Sample mean 95% confidence interval

\[ \bar{x} \pm [2] \sigma/\sqrt{n} \]

To plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{[2] \cdot \sigma}{m} \right)^2 \]

Sample proportion 95% confidence interval

\[ \hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n} \]

To plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{[2] \cdot \sqrt{\hat{p}(1-\hat{p})}}{m} \right)^2 \]

We do not know \( \hat{p} \)!

Pick the worst possible p-hat

A 95% confidence interval

\[ \hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n} \]

  • The worst confidence interval is the widest confidence interval
  • Pick a value of \( \hat{p} \) to maximize \( \sqrt{\hat{p}(1-\hat{p})/n} \)
  • For \( 0 \le \hat{p} \le 1 \), the function \( \sqrt{\hat{p}(1-\hat{p})/n} \) is maximized by \[ \hat{p} = 1/2 \]
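The claim that \( \hat{p} = 1/2 \) is the worst case can be checked numerically. The sketch below evaluates \( \sqrt{\hat{p}(1-\hat{p})} \) on a grid; the grid spacing is an arbitrary choice, and \( n \) is left out because a fixed \( n \) does not change where the maximum occurs.

```r
p_grid <- seq(0, 1, by = 0.01)           # candidate values of p_hat
spread <- sqrt(p_grid * (1 - p_grid))    # the part of the sd that depends on p_hat

p_worst <- p_grid[which.max(spread)]     # value of p_hat giving the widest interval
print(p_worst)                           # 0.5
print(max(spread))                       # 0.5
```

The same answer follows from calculus: the derivative of \( \hat{p}(1-\hat{p}) \) is \( 1 - 2\hat{p} \), which is zero at \( \hat{p} = 1/2 \).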

Pick the worst possible p-hat

A 95% confidence interval

\[ \hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n} \]

A confidence interval that is at least as wide as the 95% confidence interval

\[ \hat{p} \pm [2] \sqrt{.5(1-.5)/n} \]

or, more simply,

\[ \hat{p} \pm [2] \cdot \left(0.5/\sqrt{n} \right) \]

Use this wider interval for sample size calculations.

Sample size planning summary

Sample mean 95% confidence interval

\[ \bar{x} \pm [2] \sigma/\sqrt{n} \]

To plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{[2] \cdot \sigma}{m} \right)^2 \]

Change the \( [2] \) for different confidence.

Sample proportion 95% confidence interval

\[ \hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n} \]

To plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{[2] \cdot 0.5}{m} \right)^2 \]

Change the \( [2] \) for different confidence.

Examples

What sample size of U.S. adults do you need, if you would like to estimate the proportion of U.S. adults who are “pro-choice” with a 2.5% margin of error (at the 95% level)?

(on board)

What sample size of U.S. adults do you need, if you would like to estimate the proportion of U.S. adults who are “pro-choice” with a 2.5% margin of error (at the 99.7% level)?

(on board)
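Both board calculations follow from the worst-case formula with \( \hat{p} = 0.5 \). In this sketch the helper name sample_size_prop is hypothetical, and the 2.5% margin of error is written as 0.025.

```r
# Worst-case sample size for estimating a proportion with margin of error m.
# sample_size_prop is a hypothetical helper, not a built-in R function.
sample_size_prop <- function(m, z = 2) {
  n_raw <- (z * 0.5 / m)^2
  # round() removes floating-point noise before rounding up
  ceiling(round(n_raw, 8))
}

n_95  <- sample_size_prop(m = 0.025, z = 2)   # 95% level: 1600
n_997 <- sample_size_prop(m = 0.025, z = 3)   # 99.7% level: 3600
print(c(n_95, n_997))
```

Moving from 95% to 99.7% confidence multiplies the required sample size by \( (3/2)^2 = 2.25 \).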

A cold read exercise

Can I have two volunteers to read from a script?

(Time permitting.)

A dialogue

Person: Can you give me an interval that will contain \( \mu \) with 95% probability?

Stat F: Sure, I can give you a confidence interval. Over 100 replications of your experiment, 95% of the confidence intervals will contain \( \mu \).

Person: Sorry, I misspoke. I want to perform one experiment, calculate an interval, and know that there is a 95% chance that \( \mu \) is in that interval.

Stat F: [as a smarty-pants] Ah! I see that you have a common misconception of what a confidence interval is. Let me try to explain it again…

A dialogue

Person: [interrupting, frustrated] No, no, no. I am a smart person who understood what you said. It is, however, clear to me that you do not understand what I am asking …

[Enter Statistician B]

Stat B: Hi! I overheard you telling my colleague that you want an interval that will contain \( \mu \) with 95% probability?

Person: Yes! Finally, a statistician who understands what I am asking!

A dialogue

Stat F: [freaked out] Wait! Stop! He is the devil!

Person: [confused] What, why?

Stat F: He will use calculus and computer programming!

Person: [confused stare] ?

Stat F: He will make you specify how you expect your experiment to turn out!

Person: [confused] Wouldn't you ask me the same questions for a sample size calculation?

A dialogue

Stat F: Yes, but he will actually use your opinions instead of ignoring them like I do! Your result might not be objective! It might be influenced by what you believe!

Person: [confused stare] ?

Stat B: Ah, my dear colleague, but I will answer the question that is asked instead of insisting on answering a different question.

Stat B: We do not want what you are selling. Be gone!

[Stat F is so frustrated as to die an exaggerated theatrical death.]