# Stat 202: Lecture 17 (covers pp. 171-191)

Nathan VanHoudnos
11/3/2014

### Agenda

1. Office hours
4. Lecture 16 (covers pp. 171-187) wrap up
5. Lecture 17 (covers pp. 188-191)

### New office hours

Based on the results of the WhenIsGood poll, Aaron and I will change our office hours.

• Nathan: Mondays 11-noon in my office
• Aaron: Mondays 3-4pm at the stat department

### Checkpoint #18A and #18B

• They are on BlackBoard, not OLI
• Both are due Wednesday

### Agenda

3. Lecture 16 (covers pp. 171-187) wrap up
4. Lecture 17 (covers pp. 188-191)

On p. 179:

Interval estimation takes point estimation a step further and says something like:

“I am 95% confident that by using the point estimate $$\bar{x}=115$$ to estimate $$\mu$$, I am off by no more than 3 IQ points. In other words, I am 95% confident that $$\mu$$ is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am 95% confident that $$\mu$$ is somewhere in (or covered by) the interval (112,118).

Many of these sound like “there is a 95% chance that…”

When someone says

I am 95% confident that $$\mu$$ is within…

They are using confident as a technical shorthand that means

I am confident that, in the future, over repeated experiments, 95% of my confidence intervals will capture $$\mu$$.

Note that confidence $$\ne$$ probability.
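To make the "repeated experiments" reading concrete, here is a small simulation sketch. The values of `mu`, `sigma`, `n`, and `reps` are assumptions chosen for illustration (they give a margin of 3 points, like the quote); they are not taken from the book.

```r
# Simulate many repeated experiments and count how often the
# interval xbar +/- 2 * sigma / sqrt(n) captures the true mean.
# mu, sigma, n, and reps are illustrative assumptions.
set.seed(202)
mu    <- 115    # true mean (unknown in a real study)
sigma <- 15     # known standard deviation
n     <- 100    # sample size in each experiment
reps  <- 10000  # number of repeated experiments

# One sample mean per repeated experiment
xbar <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

# Margin of error: 2 * 15 / sqrt(100) = 3
moe <- 2 * sigma / sqrt(n)

# TRUE when the interval from that experiment contains mu
captured <- (xbar - moe <= mu) & (mu <= xbar + moe)

# Proportion of intervals that captured mu; should be close to 0.95
coverage <- mean(captured)
print(coverage)
```

Any single interval either contains $$\mu$$ or it does not. The 95% describes the long-run behavior of the procedure.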

The book is not wrong; it is merely deeply confusing:

Interval estimation takes point estimation a step further and says something like:

“I am 95% confident that by using the point estimate $$\bar{x}=115$$ to estimate $$\mu$$, I am off by no more than 3 IQ points. In other words, I am 95% confident that $$\mu$$ is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am 95% confident that $$\mu$$ is somewhere in (or covered by) the interval (112,118).

Deborah Mayo

Philosopher of Science at Virginia Tech

Following Savage (1962), the probability that a parameter lies in a specific interval may be referred to as a measure of final precision. While a measure of final precision may seem desirable, and while confidence levels are often (wrongly) interpreted as providing such a measure, no such interpretation is warranted. Admittedly, such a misinterpretation is encouraged by the word “confidence”.

### Lecture 16: Inference

Interval estimation

• Estimating a population quantity with a statistic that is an interval.
• Two ways to estimate an interval:
  • Option F: Confidence Intervals: If we repeat the experiment over and over, 95% of intervals will contain the true value.
  • Option B: Credible Intervals: The probability that the true value is contained within the interval is 95%.

### More with Option F

In Stat 202, we will focus on confidence intervals.

If the data are normally distributed with an unknown mean $$\mu$$ and a known standard deviation $$\sigma$$, then, a 95% confidence interval for $$\mu$$ is

• $$\bar{x} \pm 2 \sigma /\sqrt{n}$$

If the data are distributed (not necessarily normally) with an unknown mean $$\mu$$ and a known standard deviation $$\sigma$$, then, if $$n \ge 30$$, a 95% confidence interval is

• $$\bar{x} \pm 2 \sigma /\sqrt{n}$$

### An example

The newest variety of potato is the Normally-Distributed (ND) potato. Its weight is normally distributed with a standard deviation of 12 grams and an unknown mean.

If Dan selects a “1 lbs” bag with 9 potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

(on board)
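For reference, the board calculation can be sketched in R. This is one way to do the arithmetic, using the approximate multiplier 2 from the slides; the variable names are illustrative.

```r
# 95% confidence interval for the ND potato: xbar +/- 2 * sigma / sqrt(n)
xbar  <- 52   # average weight in the bag (grams)
sigma <- 12   # known population standard deviation (grams)
n     <- 9    # number of potatoes in the bag

moe   <- 2 * sigma / sqrt(n)   # 2 * 12 / 3 = 8 grams
ci_nd <- c(lower = xbar - moe, upper = xbar + moe)
print(ci_nd)                   # lower 44, upper 60
```

The small sample is fine here because the weights themselves are normally distributed.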

### An example

Recall that the weight of a Klamath Pearl has an unknown distribution, but we do know that the unknown distribution has a standard deviation of 10 grams.

If Dan selects a “1 lbs” bag with 9 (Klamath Pearl) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

We cannot: $$n = 9$$, and the CLT rule of thumb does not apply until $$n \ge 30$$.

### An example

Let the Quipper potato have an unknown weight distribution, but we do know that the unknown distribution has a standard deviation of 14 grams.

If Dan selects a “5 lbs” bag with 49 (Quipper) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

(On board)

### More with Option F

If you want a different confidence level, you change the 2.

A 68% confidence interval:

• $$\bar{x} \pm 1 \sigma /\sqrt{n}$$

A 95% confidence interval:

• $$\bar{x} \pm 2 \sigma /\sqrt{n}$$

A 99.7% confidence interval:

• $$\bar{x} \pm 3 \sigma /\sqrt{n}$$

Note: Higher levels of confidence give wider, and therefore less precise, intervals.

### An example

Let the Quipper potato have an unknown weight distribution, but we do know that the unknown distribution has a standard deviation of 14 grams.

If Dan selects a “5 lbs” bag with 49 (Quipper) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate 68%, 95%, and 99.7% confidence intervals for the population mean.

(On board)
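A sketch of the three intervals in R, using the 1/2/3 multipliers from the previous slide. The sample size of 49 is large enough for the $$n \ge 30$$ rule, so the unknown shape of the weight distribution is not a problem. Variable names are illustrative.

```r
# 68%, 95%, and 99.7% confidence intervals for the Quipper potato
xbar  <- 52   # average weight in the bag (grams)
sigma <- 14   # known population standard deviation (grams)
n     <- 49   # number of potatoes in the bag

# Approximate multipliers from the 68/95/99.7 rule
multiplier <- c("68%" = 1, "95%" = 2, "99.7%" = 3)

moe <- multiplier * sigma / sqrt(n)   # 2, 4, and 6 grams
ci_quipper <- cbind(lower = xbar - moe, upper = xbar + moe)
print(ci_quipper)
# 68%:   50 to 54
# 95%:   48 to 56
# 99.7%: 46 to 58
```

Each step up in confidence widens the interval around the same center.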

### More with Option F

If you want a different confidence level, you change the 2.

A 95% confidence interval:

• $$\bar{x} \pm 2 \sigma /\sqrt{n}$$

A $$(1-\alpha) \times 100\%$$ confidence interval:

• $$\bar{x} \pm z_{\alpha/2} \cdot \sigma /\sqrt{n}$$

Where $$z_{\alpha/2}$$ is the $$z$$-score that cuts off an area of $$\alpha/2$$ in each tail of a standard normal distribution.

### Defining tails

$$z_{\alpha/2}$$ for a 68% confidence interval

qnorm((1-.68)/2)

[1] -0.9944579


$$z_{\alpha/2}$$ for a 95% confidence interval

qnorm((1-.95)/2)

[1] -1.959964


Note that the 68/95/99.7 rule is an approximation.

### Defining tails

$$z_{\alpha/2}$$ for a 99.7% confidence interval

qnorm((1-.997)/2)

[1] -2.967738


$$z_{\alpha/2}$$ for a 99.99999% confidence interval

qnorm((1-.9999999)/2)

[1] -5.326724
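The calls above return the lower-tail cutoff, which is negative. The multiplier in the interval formula is its magnitude. A short sketch that computes the positive multipliers directly (the vector names are illustrative):

```r
# Positive multipliers z_{alpha/2} for several confidence levels
level <- c(0.68, 0.95, 0.997)
alpha <- 1 - level

# Upper-tail cutoff: the same magnitude as qnorm(alpha / 2), with a plus sign
z_upper <- qnorm(1 - alpha / 2)
print(round(z_upper, 3))   # compare with the rule-of-thumb values 1, 2, 3
```

By symmetry of the standard normal, `qnorm(1 - alpha / 2)` and `-qnorm(alpha / 2)` agree.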


### Length and Margin of Error

Definitions:

• $$2 \cdot z_{\alpha/2} \cdot \sigma /\sqrt{n}$$ is the length
• $$z_{\alpha/2} \cdot \sigma /\sqrt{n}$$ is the margin of error.

For example, a 95% confidence interval is

• $$\bar{x} \pm 2 \sigma /\sqrt{n}$$
• its length is $$4 \sigma /\sqrt{n}$$
• its margin of error is $$2 \sigma /\sqrt{n}$$

### Sample size calculations

If you want to plan a study, you solve for $$n$$

IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99.7% confidence with a margin of error equal to 5?

(on board)

### Sample size calculations

If you want to plan a study with a margin of error $$m$$, you solve for $$n$$

$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{m} \right)^2$

and then round up to the nearest integer.

Note: Higher values of $$n$$ give more precise intervals.
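A minimal sketch of this formula in R, applied to the IQ question with a margin of error of 5. The helper name `sample_size_mean` is hypothetical, and the multiplier 3 is the rule-of-thumb value for 99.7% confidence.

```r
# Hypothetical helper: sample size for estimating a mean with known sigma
sample_size_mean <- function(z, sigma, m) {
  n_raw <- (z * sigma / m)^2
  # round() guards against floating-point noise before rounding up
  ceiling(round(n_raw, 8))
}

# IQ example: sigma = 15, 99.7% confidence (z = 3), margin of error 5
n_iq <- sample_size_mean(z = 3, sigma = 15, m = 5)
print(n_iq)   # (3 * 15 / 5)^2 = 9^2 = 81
```

Halving the margin of error to 2.5 would quadruple the required sample size, since $$m$$ is squared in the denominator.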

### Sample size calculations

IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99.7% confidence with a length equal to 8?

What is the margin of error for the formula?

The length is 8, the margin of error is 4.
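The remaining arithmetic can be sketched in R: convert the length to a margin of error, then apply the sample size formula. Variable names are illustrative, and 3 is the rule-of-thumb multiplier for 99.7% confidence.

```r
# Sample size when the requirement is stated as a length
len   <- 8
m     <- len / 2   # margin of error is half the length: 4
sigma <- 15        # known standard deviation of IQ scores
z     <- 3         # approximate multiplier for 99.7% confidence

n_raw <- (z * sigma / m)^2   # 11.25^2 = 126.5625
n_len <- ceiling(n_raw)      # round up to 127 students
print(n_len)
```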

### Lecture 16 Summary (p. 187)

If the data are normally distributed with an unknown mean $$\mu$$ and a known standard deviation $$\sigma$$, then, a 95% confidence interval for $$\mu$$ is

• $$\bar{x} \pm 2 \sigma /\sqrt{n}$$

If the data are distributed (not necessarily normally) with an unknown mean $$\mu$$ and a known standard deviation $$\sigma$$, then, if $$n \ge 30$$, a 95% confidence interval is

• $$\bar{x} \pm 2 \sigma /\sqrt{n}$$

Note: The book covers cases with unknown standard deviation. We will not cover this yet.

### Lecture 17

Lecture 16, but with sample proportions:

Recall if

$np \ge 10 \quad \text{and} \quad n(1-p) \ge 10$

then $$\hat{p}$$ is approximately normal:

$\hat{p} \sim N\left( p, \frac{p(1-p)}{n} \right)$

Where the mean and standard deviation of $$\hat{p}$$ are:

$E[\hat{p}] = p \qquad \text{sd}[\hat{p}] = \sqrt{\frac{p(1-p)}{n}}$
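These two facts can be checked by simulation. The sketch below assumes $$p = 0.6$$ and $$n = 100$$ purely for illustration, and the number of simulated samples is arbitrary.

```r
# Simulate many sample proportions and compare with the formulas.
# p, n, and reps are illustrative assumptions.
set.seed(17)
p    <- 0.6
n    <- 100
reps <- 100000

# The normal approximation conditions: np >= 10 and n(1 - p) >= 10
stopifnot(n * p >= 10, n * (1 - p) >= 10)

# Each draw is the number of successes in n trials; divide by n for p-hat
p_hat <- rbinom(reps, size = n, prob = p) / n

sim_mean  <- mean(p_hat)
sim_sd    <- sd(p_hat)
theory_sd <- sqrt(p * (1 - p) / n)   # about 0.049

print(c(simulated_mean = sim_mean, theoretical_mean = p))
print(c(simulated_sd = sim_sd, theoretical_sd = theory_sd))
```

The simulated mean and standard deviation should land close to $$p$$ and $$\sqrt{p(1-p)/n}$$.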

### A comparison

Sample mean

$\bar{x} \sim N(\mu, \sigma^2/n)$

95% confidence interval

$\bar{x} \pm 2 \sigma/\sqrt{n}$

Since $$\sigma$$ is assumed to be known, we can calculate this.

Sample Proportion

$\hat{p} \sim N(p, p(1-p)/n)$

95% confidence interval

$\hat{p} \pm 2 \sqrt{p(1-p)/n}$

Oh no! $$p$$ is not known. What to do?

Replace $$p$$ with our best guess, $$\hat{p}$$

### Summary

Sample mean

$\bar{x} \sim N(\mu, \sigma^2/n)$

95% confidence interval

$\bar{x} \pm 2 \sigma/\sqrt{n}$

Change the 2 for different levels of confidence.

Sample Proportion

$\hat{p} \sim N(p, p(1-p)/n)$

95% confidence interval

$\hat{p} \pm 2 \sqrt{\hat{p}(1-\hat{p})/n}$

Change the 2 for different levels of confidence.

### An example

Tomorrow is election day. Go vote!

Consider a poll of 100 voters. If 60% of respondents say they will vote for Candidate A, what is the 95% confidence interval for the (population) proportion of people who will vote for Candidate A?

What are $$\hat{p}$$ and $$n$$?

$$\hat{p} = .6$$ and $$n = 100$$

$\begin{aligned} \hat{p} \pm 2 \sqrt{\hat{p}(1-\hat{p})/n} & = .6 \pm 2 \sqrt{.6(1-.6)/100} \\ & = .6 \pm 0.098 \end{aligned}$

### A further example

Tomorrow is election day. Go vote!

Consider a poll of 100 voters. If 60% of respondents say they will vote for Candidate A, what is the 99.7% confidence interval for the (population) proportion of people who will vote for Candidate A?

What are $$\hat{p}$$ and $$n$$?

$$\hat{p} = .6$$ and $$n = 100$$

$\begin{aligned} \hat{p} \pm 3 \sqrt{\hat{p}(1-\hat{p})/n} & = .6 \pm 3 \sqrt{.6(1-.6)/100} \\ & = .6 \pm 0.147 \end{aligned}$
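Both poll intervals can be reproduced with a few lines of R. This is a sketch using the rule-of-thumb multipliers 2 and 3; the variable names are illustrative.

```r
# 95% and 99.7% confidence intervals for the poll
p_hat <- 0.6   # sample proportion
n     <- 100   # number of voters polled

# Estimated standard deviation of p-hat, with p replaced by p-hat
se <- sqrt(p_hat * (1 - p_hat) / n)   # about 0.049

multiplier <- c("95%" = 2, "99.7%" = 3)
moe <- multiplier * se                # about 0.098 and 0.147

ci_poll <- cbind(lower = p_hat - moe, upper = p_hat + moe)
print(round(ci_poll, 3))
# 95%:   0.502 to 0.698
# 99.7%: 0.453 to 0.747
```

The 95% interval sits just above 0.5, while the 99.7% interval includes values below 0.5.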

### A hiccup for sample size planning

Sample mean 95% confidence interval

$\bar{x} \pm [2] \sigma/\sqrt{n}$

To plan a study with a margin of error $$m$$, you solve for $$n$$

$n = \left( \frac{[2] \cdot \sigma}{m} \right)^2$

Sample proportion 95% confidence interval

$\hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n}$

To plan a study with a margin of error $$m$$, you solve for $$n$$

$n = \left( \frac{[2] \cdot \sqrt{\hat{p}(1-\hat{p})}}{m} \right)^2$

We do not know $$\hat{p}$$ before the data are collected!

### Pick the worst possible p-hat

A 95% confidence interval

$\hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n}$

• The worst confidence interval is the widest confidence interval
• Pick a value of $$\hat{p}$$ to maximize $$\sqrt{\hat{p}(1-\hat{p})/n}$$
• For $$0 \le \hat{p} \le 1$$, the function $$\sqrt{\hat{p}(1-\hat{p})/n}$$ is maximized by $$\hat{p} = 1/2$$
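A quick numerical check of the last bullet, as a sketch: evaluate $$\sqrt{\hat{p}(1-\hat{p})}$$ on a grid of values and find where it is largest. The grid spacing of 0.01 is an arbitrary choice, and the factor $$1/\sqrt{n}$$ is left out because it does not change where the maximum occurs.

```r
# Find the p-hat that maximizes sqrt(p-hat * (1 - p-hat)) on a grid
p_grid <- (0:100) / 100                  # 0, 0.01, ..., 1
spread <- sqrt(p_grid * (1 - p_grid))

p_worst <- p_grid[which.max(spread)]
print(p_worst)       # 0.5
print(max(spread))   # 0.5, so the widest margin is [2] * 0.5 / sqrt(n)
```

The same answer follows from calculus: the derivative of $$\hat{p}(1-\hat{p})$$ is $$1 - 2\hat{p}$$, which is zero at $$\hat{p} = 1/2$$.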

### Pick the worst possible p-hat

A 95% confidence interval

$\hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n}$

A confidence interval that is at least as wide as the 95% confidence interval

$\hat{p} \pm [2] \sqrt{.5(1-.5)/n}$

or, more simply,

$\hat{p} \pm [2] \cdot \left(0.5/\sqrt{n} \right)$

Use this wider interval for sample size calculations.

### Sample size planning summary

Sample mean 95% confidence interval

$\bar{x} \pm [2] \sigma/\sqrt{n}$

To plan a study with a margin of error $$m$$, you solve for $$n$$

$n = \left( \frac{[2] \cdot \sigma}{m} \right)^2$

Change the $$[2]$$ for different confidence levels.

Sample proportion 95% confidence interval

$\hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n}$

To plan a study with a margin of error $$m$$, you solve for $$n$$

$n = \left( \frac{[2] \cdot 0.5}{m} \right)^2$

Change the $$[2]$$ for different confidence levels.

### Examples

What sample size of U.S. adults do you need, if you would like to estimate the proportion of U.S. adults who are “pro-choice” with a 2.5% margin of error (at the 95% level)?

(on board)

What sample size of U.S. adults do you need, if you would like to estimate the proportion of U.S. adults who are “pro-choice” with a 2.5% margin of error (at the 99.7% level)?

(on board)
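Both board calculations can be sketched in R with the worst-case formula $$n = ([2] \cdot 0.5 / m)^2$$. The helper name `sample_size_prop` is hypothetical, and 2 and 3 are the rule-of-thumb multipliers for 95% and 99.7% confidence.

```r
# Hypothetical helper: worst-case sample size for estimating a proportion
sample_size_prop <- function(z, m) {
  n_raw <- (z * 0.5 / m)^2   # uses p-hat = 0.5, the widest possible interval
  # round() guards against floating-point noise before rounding up
  ceiling(round(n_raw, 8))
}

# 2.5% margin of error at the 95% level (multiplier 2)
n_95 <- sample_size_prop(z = 2, m = 0.025)
print(n_95)    # (1 / 0.025)^2 = 40^2 = 1600

# 2.5% margin of error at the 99.7% level (multiplier 3)
n_997 <- sample_size_prop(z = 3, m = 0.025)
print(n_997)   # (1.5 / 0.025)^2 = 60^2 = 3600
```

Raising the confidence level from 95% to 99.7% at the same margin of error more than doubles the required sample size.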

Can I have two volunteers to read from a script?

(Time permitting.)

### A dialogue

Person: Can you give me an interval that will contain $$\mu$$ with 95% probability?

Stat F: Sure, I can give you a confidence interval. Over 100 replications of your experiment, 95% of the confidence intervals will contain $$\mu$$.

Person: Sorry, I misspoke. I want to perform one experiment, calculate an interval, and know that there is a 95% chance that $$\mu$$ is in that interval.

Stat F: [as a smarty-pants] Ah! I see that you have a common misconception of what a confidence interval is. Let me try to explain it again…

### A dialogue

Person: [interrupting, frustrated] No, no, no. I am a smart person who understood what you said. It is, however, clear to me that you do not understand what I am asking …

[Enter Statistician B]

Stat B: Hi! I overheard you telling my colleague that you want an interval that will contain $$\mu$$ with 95% probability?

Person: Yes! Finally, a statistician who understands what I am asking!

### A dialogue

Stat F: [freaked out] Wait! Stop! He is the devil!

Person: [confused] What, why?

Stat F: He will use calculus and computer programming!

Person: [confused stare] ?

Stat F: He will make you specify how you expect your experiment to turn out!

Person: [confused] Wouldn't you ask me the same questions for a sample size calculation?

### A dialogue

Stat F: Yes, but he will actually use your opinions instead of ignoring them like I do! Your result might not be objective! It might be influenced by what you believe!

Person: [confused stare] ?

Stat B: Ah, my dear colleague, but I will answer the question that is asked instead of insisting on answering a different question.

Stat B: We do not want what you are selling. Be gone!

[Stat F is so frustrated as to die an exaggerated theatrical death.]