Nathan VanHoudnos

11/3/2014

- Office hours
- Checkpoint comments
- OLI textbook comments
- Lecture 16 (covers pp. 171-187) wrap up
- Lecture 17 (covers pp. 188-191)

Based on the results of the WhenIsGood poll, Aaron and I will change our office hours.

- Nathan: Mondays 11-noon in my office
- Aaron: Mondays 3-4pm at the stat department

- They are on BlackBoard, not OLI
- Both are due Wednesday


On p. 179:

Interval estimation takes point estimation a step further and says something like:

“I am 95% confident that by using the point estimate \( \bar{x}=115 \) to estimate \( \mu \), I am off by no more than 3 IQ points. In other words, I am 95% confident that \( \mu \) is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am 95% confident that \( \mu \) is somewhere in (or covered by) the interval (112,118).

Many of these sound like “there is a 95% chance that…”

Interval estimation takes point estimation a step further and says something like:

“I am **95% confident** that by using the point estimate \( \bar{x}=115 \) to estimate \( \mu \), I am off by no more than 3 IQ points. In other words, I am **95% confident** that \( \mu \) is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am **95% confident** that \( \mu \) is somewhere in (or covered by) the interval (112,118).

When someone says

I am 95% confident that \( \mu \) is within…

They are using **confident** as a technical shorthand that means

I am confident that, in the future, over repeated experiments, 95% of my confidence intervals will capture \( \mu \).

Note that **confident** \( \ne \) **probability**.
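The "repeated experiments" reading can be checked with a small simulation. The sketch below is only an illustration under assumed values: the true mean, standard deviation, sample size, seed, and number of replications are all made up, and it uses the interval \( \bar{x} \pm 2 \sigma /\sqrt{n} \) for normal data with a known \( \sigma \).

```r
# Simulate many repeated experiments and count how often the
# confidence interval captures the true mean.
# All numbers below are assumptions chosen for illustration.
set.seed(202)      # arbitrary seed, only for reproducibility
mu    <- 100       # "true" mean (known here only because we simulate)
sigma <- 15        # known standard deviation
n     <- 25        # sample size of each experiment
reps  <- 10000     # number of repeated experiments

captured <- replicate(reps, {
  x_bar  <- mean(rnorm(n, mean = mu, sd = sigma))
  margin <- 2 * sigma / sqrt(n)
  (x_bar - margin <= mu) && (mu <= x_bar + margin)
})

# Proportion of intervals that captured mu; it should be close to 0.95.
mean(captured)
```

Any single interval either contains \( \mu \) or it does not; the 95% describes the long-run proportion of intervals that do.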

The book is **not wrong**, it is merely deeply confusing:

Interval estimation takes point estimation a step further and says something like:

“I am **95% confident** that by using the point estimate \( \bar{x}=115 \) to estimate \( \mu \), I am off by no more than 3 IQ points. In other words, I am **95% confident** that \( \mu \) is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am **95% confident** that \( \mu \) is somewhere in (or covered by) the interval (112,118).

**Deborah Mayo**

Philosopher of Science at Virginia Tech

Following Savage (1962), the probability that a parameter lies in a specific interval may be referred to as a measure of final precision. While a measure of final precision may seem desirable, and while confidence levels are often (wrongly) interpreted as providing such a measure, no such interpretation is warranted. Admittedly, such a misinterpretation is encouraged by the word “confidence”.


**Interval estimation**

- Estimating a population quantity with a statistic that is an **interval**.
- Two ways to estimate an interval:
  - **Option F: Confidence Intervals**: If we repeat the experiment over and over, 95% of intervals will contain the true value.
  - **Option B: Credible Intervals**: The probability that the true value is contained within the interval is 95%.

In Stat 202, we will focus on confidence intervals.

If the data are normally distributed with an **unknown** mean \( \mu \) and a **known** standard deviation \( \sigma \), then, a 95% confidence interval for \( \mu \) is

- \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

If the data are distributed (not necessarily normally) with an **unknown** mean \( \mu \) and a **known** standard deviation \( \sigma \), then, if \( n \ge 30 \), an approximate 95% confidence interval for \( \mu \) is

- \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

The newest variety of potato is the Normally-Distributed (ND) potato. Its weight is normally distributed with a standard deviation of 12 grams and an unknown mean.

If Dan selects a “1 lbs” bag with 9 potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

(on board)
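For reference, a minimal sketch of the arithmetic for this question, using the lecture's multiplier of 2 for 95% confidence. The variable names are only illustrative.

```r
# 95% confidence interval for the mean weight of an ND potato.
x_bar <- 52   # sample mean weight (grams)
sigma <- 12   # known population standard deviation (grams)
n     <- 9    # potatoes in the bag

margin <- 2 * sigma / sqrt(n)   # 2 * 12 / 3 = 8 grams
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
ci   # 44 to 60 grams
```

The small sample is acceptable here only because the weights themselves are assumed to be normally distributed.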

Recall that the weight of a Klamath Pearl has an unknown distribution, but we do know that the unknown distribution has a standard deviation of 10 grams.

If Dan selects a “1 lbs” bag with 9 (Klamath Pearl) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

**We cannot: \( n=9 \).** The weight distribution is unknown, and the CLT does not apply until \( n \ge 30 \).

Let the Quipper potato have an unknown weight distribution, but we do know that the unknown distribution has a standard deviation of 14 grams.

If Dan selects a “5 lbs” bag with 49 (Quipper) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate a 95% confidence interval for the population mean.

(On board)
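A sketch of the same calculation for the Quipper bag. Here the distribution is unknown, so the code first checks the \( n \ge 30 \) rule of thumb before using the normal-based interval; the variable names are illustrative.

```r
# 95% confidence interval for the mean weight of a Quipper potato.
x_bar <- 52   # sample mean weight (grams)
sigma <- 14   # known population standard deviation (grams)
n     <- 49   # potatoes in the bag

stopifnot(n >= 30)              # CLT rule of thumb used in this course

margin <- 2 * sigma / sqrt(n)   # 2 * 14 / 7 = 4 grams
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
ci   # 48 to 56 grams
```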

If you want a different confidence level, you change the **2**.

A 68% confidence interval:

- \( \bar{x} \pm 1 \sigma /\sqrt{n} \)

A 95% confidence interval:

- \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

A 99.7% confidence interval:

- \( \bar{x} \pm 3 \sigma /\sqrt{n} \)

**Note:** Higher levels of confidence have less precise intervals.

Let the Quipper potato have an unknown weight distribution, but we do know that the unknown distribution has a standard deviation of 14 grams.

If Dan selects a “5 lbs” bag with 49 (Quipper) potatoes in it, and the weight of the average potato in the bag is 52 grams, calculate 68%, 95%, and 99.7% confidence intervals for the population mean.

(On board)
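The three intervals can be computed together. This is a sketch that uses the rounded 68/95/99.7 multipliers (1, 2, 3) from the lecture; the object names are illustrative.

```r
# 68%, 95%, and 99.7% confidence intervals for the Quipper bag.
x_bar <- 52   # sample mean weight (grams)
sigma <- 14   # known population standard deviation (grams)
n     <- 49   # potatoes in the bag

multiplier <- c("68%" = 1, "95%" = 2, "99.7%" = 3)
margin     <- multiplier * sigma / sqrt(n)   # 2, 4, and 6 grams

intervals <- cbind(lower = x_bar - margin, upper = x_bar + margin)
intervals   # (50, 54), (48, 56), (46, 58)
```

The interval widens as the confidence level rises, which is the precision trade-off noted above.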

If you want a different confidence level, you change the **2**.

A 95% confidence interval:

- \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

A \( (1-\alpha) \times 100\% \) confidence interval:

- \( \bar{x} \pm z_{\alpha/2} \cdot \sigma /\sqrt{n} \)

Where \( z_{\alpha/2} \) is the \( z \) score that **defines the appropriate tail of a standard normal**.

\( z_{\alpha/2} \) for a 68% confidence interval

```
qnorm((1-.68)/2)
```

```
[1] -0.9944579
```

\( z_{\alpha/2} \) for a 95% confidence interval

```
qnorm((1-.95)/2)
```

```
[1] -1.959964
```

**Note:** The 68/95/99.7 rule is an approximation.

\( z_{\alpha/2} \) for a 99.7% confidence interval

```
qnorm((1-.997)/2)
```

```
[1] -2.967738
```

\( z_{\alpha/2} \) for a 99.99999% confidence interval

```
qnorm((1-.9999999)/2)
```

```
[1] -5.326724
```
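The values above are negative because `qnorm((1 - level)/2)` returns the lower-tail cutoff; the multiplier used in the interval is its absolute value. A small sketch of a helper that returns the positive multiplier directly (`z_crit` is a hypothetical name, not something from the course materials):

```r
# Positive z multiplier for a given confidence level.
# z_crit is a hypothetical helper name used only for this illustration.
z_crit <- function(level) {
  alpha <- 1 - level
  qnorm(1 - alpha / 2)   # upper-tail cutoff, so the result is positive
}

z_crit(c(0.68, 0.95, 0.997))
```

By the symmetry of the standard normal, `z_crit(0.95)` is the same number as `-qnorm((1 - 0.95)/2)`.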

**Definitions:**

- \( 2 \cdot z_{\alpha/2} \cdot \sigma /\sqrt{n} \) is the **length**.
- \( z_{\alpha/2} \cdot \sigma /\sqrt{n} \) is the **margin of error**.

For example, a 95% confidence interval is

- \( \bar{x} \pm 2 \sigma /\sqrt{n} \)
- its **length** is \( 4 \sigma /\sqrt{n} \)
- its **margin of error** is \( 2 \sigma /\sqrt{n} \)

If you want to plan a study, you solve for \( n \)

IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99.7% confidence with a margin of error equal to 5?

(on board)

If you want to plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{z_{\alpha/2} \cdot \sigma}{m} \right)^2 \]

and then round up to the nearest integer.
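A sketch of this formula applied to the IQ question (standard deviation 15, margin of error 5, 99.7% confidence, so the multiplier is taken as 3). The function name `sample_size_mean` is hypothetical.

```r
# Sample size needed to estimate a mean with margin of error m.
# sample_size_mean is a hypothetical helper name.
sample_size_mean <- function(z, sigma, m) {
  ceiling((z * sigma / m)^2)   # round up to the nearest integer
}

n_needed <- sample_size_mean(z = 3, sigma = 15, m = 5)
n_needed   # (3 * 15 / 5)^2 = 81
```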

**Note:** Higher values of \( n \) give **more** precise intervals.

IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99.7% confidence with a **length** equal to 8?

What is the margin of error for the formula?

The length is 8, the margin of error is 4.
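With the margin of error in hand, the calculation is the same as before. A minimal sketch, again taking 3 as the multiplier for 99.7% confidence:

```r
# Sample size for a 99.7% interval of total length 8 (IQ scores, sigma = 15).
interval_length <- 8
m     <- interval_length / 2   # margin of error is half the length
sigma <- 15
z     <- 3                     # rounded multiplier for 99.7% confidence

n_needed <- ceiling((z * sigma / m)^2)   # (45 / 4)^2 = 126.5625, rounded up
n_needed   # 127
```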

If the data are normally distributed with an **unknown** mean \( \mu \) and a **known** standard deviation \( \sigma \), then, a 95% confidence interval for \( \mu \) is

- \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

If the data are distributed (not necessarily normally) with an **unknown** mean \( \mu \) and a **known** standard deviation \( \sigma \), then, if \( n \ge 30 \), an approximate 95% confidence interval for \( \mu \) is

- \( \bar{x} \pm 2 \sigma /\sqrt{n} \)

**Note:** The book covers cases with unknown standard deviation. We will not cover this yet.


Lecture 16, but with **sample proportions**:

Recall that if

\[ \begin{aligned} np & \ge 10 & & \text{and} & n(1-p) \ge 10 \end{aligned} \]

then \( \hat{p} \) is approximately normal:

\[ \hat{p} \sim N\left( p, \frac{p(1-p)}{n} \right) \]

Where the mean and standard deviation of \( \hat{p} \) are:

\[ \begin{aligned} E[\hat{p}] & = p & & & \text{sd}[\hat{p}] & = \sqrt{\frac{p(1-p)}{n}} \end{aligned} \]

**Sample mean**

\[ \bar{x} \sim N(\mu, \sigma^2/n) \]

95% confidence interval

\[ \bar{x} \pm 2 \sigma/\sqrt{n} \]

Since \( \sigma \) is assumed to be known, we can calculate this.

**Sample Proportion**

\[ \hat{p} \sim N(p, p(1-p)/n) \]

95% confidence interval

\[ \hat{p} \pm 2 \sqrt{p(1-p)/n} \]

Oh no! \( p \) is **not known**. What to do?

Replace \( p \) with our best guess, \( \hat{p} \)

**Sample mean**

\[ \bar{x} \sim N(\mu, \sigma^2/n) \]

95% confidence interval

\[ \bar{x} \pm 2 \sigma/\sqrt{n} \]

Change the 2 for different levels of confidence.

**Sample Proportion**

\[ \hat{p} \sim N(p, p(1-p)/n) \]

95% confidence interval

\[ \hat{p} \pm 2 \sqrt{\hat{p}(1-\hat{p})/n} \]

Change the 2 for different levels of confidence.

Tomorrow is election day. Go vote!

Consider a poll of 100 voters. If 60% of respondents say they will vote for Candidate A, what is the 95% confidence interval for the (population) proportion of people who will vote for Candidate A?

What are \( \hat{p} \) and \( n \)?

\( \hat{p} = .6 \) and \( n = 100 \)

\[ \begin{aligned} \hat{p} \pm 2 \sqrt{\hat{p}(1-\hat{p})/n} & = .6 \pm 2 \sqrt{.6(1-.6)/100} \\ & = .6 \pm 0.098 \end{aligned} \]

Tomorrow is election day. Go vote!

Consider a poll of 100 voters. If 60% of respondents say they will vote for Candidate A, what is the 99.7% confidence interval for the (population) proportion of people who will vote for Candidate A?

What are \( \hat{p} \) and \( n \)?

\( \hat{p} = .6 \) and \( n = 100 \)

\[ \begin{aligned} \hat{p} \pm 3 \sqrt{\hat{p}(1-\hat{p})/n} & = .6 \pm 3 \sqrt{.6(1-.6)/100} \\ & = .6 \pm 0.147 \end{aligned} \]
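Both poll calculations can be reproduced in a few lines. This is a sketch with illustrative variable names; the normality check uses \( \hat{p} \) in place of the unknown \( p \), which is an assumption of this example.

```r
# Confidence intervals for the proportion voting for Candidate A.
p_hat <- 0.6   # sample proportion
n     <- 100   # voters polled

# Rule-of-thumb check for approximate normality, with p_hat standing in for p.
stopifnot(n * p_hat >= 10, n * (1 - p_hat) >= 10)

se         <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard deviation of p_hat
margin_95  <- 2 * se                          # about 0.098
margin_997 <- 3 * se                          # about 0.147

round(c(lower = p_hat - margin_95,  upper = p_hat + margin_95),  3)
round(c(lower = p_hat - margin_997, upper = p_hat + margin_997), 3)
```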

**Sample mean**

95% confidence interval

\[ \bar{x} \pm [2] \sigma/\sqrt{n} \]

To plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{[2] \cdot \sigma}{m} \right)^2 \]

**Sample proportion**

95% confidence interval

\[ \hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n} \]

To plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{[2] \cdot \sqrt{\hat{p}(1-\hat{p})}}{m} \right)^2 \]

**We do not know** \( \hat{p} \) before we collect the data!

A 95% confidence interval

\[ \hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n} \]

- The **worst** confidence interval is the **widest** confidence interval
- Pick a value of \( \hat{p} \) to **maximize** \( \sqrt{\hat{p}(1-\hat{p})/n} \)
- For \( 0 \le \hat{p} \le 1 \), the function \( \sqrt{\hat{p}(1-\hat{p})/n} \) is maximized by \[ \hat{p} = 1/2 \]
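A quick numerical check of the claim that \( \hat{p} = 1/2 \) gives the widest interval. This sketch evaluates the function on a grid; the grid spacing and the value of \( n \) are arbitrary choices for illustration.

```r
# Evaluate sqrt(p_hat * (1 - p_hat) / n) on a grid and find where it peaks.
n      <- 100                      # arbitrary; the location of the peak does not depend on n
p_grid <- seq(0, 1, by = 0.01)     # candidate values of p_hat
se     <- sqrt(p_grid * (1 - p_grid) / n)

p_worst <- p_grid[which.max(se)]
p_worst   # 0.5
max(se)   # 0.5 / sqrt(n) = 0.05
```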

A 95% confidence interval

\[ \hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n} \]

A confidence interval that is **at least as wide** as the 95% confidence interval

\[ \hat{p} \pm [2] \sqrt{.5(1-.5)/n} \]

or, more simply,

\[ \hat{p} \pm [2] \cdot \left(0.5/\sqrt{n} \right) \]

Use **this wider interval** for sample size calculations.

**Sample mean**

95% confidence interval

\[ \bar{x} \pm [2] \sigma/\sqrt{n} \]

To plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{[2] \cdot \sigma}{m} \right)^2 \]

Change the \( [2] \) for different confidence.

**Sample proportion**

95% confidence interval

\[ \hat{p} \pm [2] \sqrt{\hat{p}(1-\hat{p})/n} \]

To plan a study with a margin of error \( m \), you solve for \( n \)

\[ n = \left( \frac{[2] \cdot 0.5}{m} \right)^2 \]

Change the \( [2] \) for different confidence.

What sample size of U.S. adults do you need, if you would like to estimate the proportion of U.S. adults who are “pro-choice” with a 2.5% margin of error (at the 95% level)?

(on board)

What sample size of U.S. adults do you need, if you would like to estimate the proportion of U.S. adults who are “pro-choice” with a 2.5% margin of error (at the 99.7% level)?

(on board)
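A sketch of both sample size calculations, using the conservative \( \hat{p} = 0.5 \) formula and the rounded multipliers 2 and 3. The function name `sample_size_prop` is hypothetical.

```r
# Conservative sample size for estimating a proportion with margin of error m.
# sample_size_prop is a hypothetical helper name.
sample_size_prop <- function(z, m) {
  ceiling((z * 0.5 / m)^2)   # worst case p_hat = 0.5, rounded up
}

n_95  <- sample_size_prop(z = 2, m = 0.025)   # (2 * 0.5 / 0.025)^2 = 1600
n_997 <- sample_size_prop(z = 3, m = 0.025)   # (3 * 0.5 / 0.025)^2 = 3600
c("95%" = n_95, "99.7%" = n_997)
```

Moving from 95% to 99.7% confidence at the same margin of error multiplies the required sample size by \( (3/2)^2 = 2.25 \).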

Can I have two volunteers to read from a script?

(Time permitting.)

Person: Can you give me an interval that will contain \( \mu \) with 95% probability?

Stat F: Sure, I can give you a confidence interval. Over 100 replications of your experiment, 95% of the confidence intervals will contain \( \mu \).

Person: Sorry, I misspoke. I want to perform one experiment, calculate an interval, and know that there is a 95% chance that \( \mu \) is in that interval.

Stat F: [as a smarty-pants] Ah! I see that you have a common misconception of what a confidence interval is. Let me try to explain it again…

Person: [interrupting, frustrated] No, no, no. I am a smart person who understood what you said. It is, however, clear to me that you do not understand what I am asking …

[Enter Statistician B]

**Stat B:** Hi! I overheard you telling my colleague that you want an interval that will contain \( \mu \) with 95% probability.

Person: Yes! Finally, a statistician who understands what I am asking!

Stat F: [freaked out] Wait! Stop! He is the devil!

Person: [confused] What, why?

Stat F: He will use **calculus** and **computer programming**!

Person: [confused stare] ?

Stat F: He will make you specify how you expect your experiment to turn out!

Person: [confused] Wouldn't you ask me the same questions for a sample size calculation?

Stat F: Yes, but he will actually use your opinions instead of ignoring them like I do! Your result might not be objective! It might be influenced by what you believe!

Person: [confused stare] ?

**Stat B:** Ah, my dear colleague, but I will answer the question that is asked instead of insisting on answering a different question.

**Stat B:** We do not want what you are selling. Be gone!

[Stat F is so frustrated as to die an exaggerated theatrical death.]