Nathan VanHoudnos

9/29/2014

- Homework comments
- Checkpoint #2 results
- Lecture 3 (covers pp. 34-47)

- 21 of you have already turned it in
- Rock on!

- Homework #2 will be released on Wednesday


**PSA:** Max time on a checkpoint is 2 hours

Average percent correct: **82%**

A little over ½ of you missed the same two questions

Here again are the boxplots showing annual incomes (in thousands of dollars) for households in two cities.

Which city has a greater percentage of households with annual incomes between $50,000 and $80,000?

Here again are the boxplots showing the real estate values of single family homes in 2 neighboring cities (in thousands of dollars).

Which city has a greater percentage of homes with real estate values between $55,000 and $85,000?
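Both questions rest on the same fact: each of the four segments of a boxplot (min to Q1, Q1 to median, median to Q3, Q3 to max) holds about 25% of the observations, no matter how wide the segment is drawn. A minimal sketch of that reasoning, assuming the interval endpoints fall exactly on five-number-summary values. The function name `percent_between` and both five-number summaries are hypothetical, not the checkpoint data.

```python
def percent_between(five_num, low, high):
    """Approximate percent of observations between two five-number-summary values.

    five_num is (min, Q1, median, Q3, max). Each gap between neighbouring
    values holds about 25% of the data, so we count the gaps spanned.
    Assumes low and high both appear in five_num.
    """
    i, j = five_num.index(low), five_num.index(high)
    return 25 * (j - i)

# Hypothetical incomes in thousands of dollars (not the checkpoint data)
city_a = (20, 50, 65, 80, 120)   # 50 is Q1, 80 is Q3
city_b = (10, 30, 50, 80, 150)   # 50 is the median, 80 is Q3

pct_a = percent_between(city_a, 50, 80)
pct_b = percent_between(city_b, 50, 80)
print(pct_a, pct_b)
```

In this made-up case the interval covers City A's whole box (two segments), but only one segment of City B's box, even though City B's segment looks wider on the plot.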


Roles

- **Explanatory** variables explain, predict, or affect the response (independent variable)
- **Response** variables are the outcome (dependent variable)

Types

- **Categorical** variables represent categories or labels
- **Quantitative** variables represent numerical measurements

Are the smoking habits of a person (yes, no) related to the person's gender?

- Gender: categorical explanatory
- Smoking habits: categorical response

Is there a relationship between gender and test scores on a particular standardized test?

- Gender: categorical explanatory
- Test score: quantitative response

How well can we predict a student's freshman year GPA from his/her SAT score?

- SAT score: quantitative explanatory
- GPA: quantitative response

Can you predict a person's favorite type of music based on his/her IQ?

- IQ: quantitative explanatory
- music: categorical response

| | | Response | |
|---|---|---|---|
| | | Categorical | Quantitative |
| **Explanatory** | Categorical | \( C \rightarrow C \) | \( C \rightarrow Q \) |
| | Quantitative | \( Q \rightarrow C \) | \( Q \rightarrow Q \) |
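The classification above is mechanical once the role and type of each variable are known. A small sketch that maps the four lecture examples onto the cells of the table; the function name `classify` and the tuple layout are illustrative assumptions, not part of the course materials.

```python
def classify(explanatory_type, response_type):
    """Return the case label, e.g. 'C -> Q', from the two variable types.

    Each argument is 'categorical' or 'quantitative'.
    """
    letter = {"categorical": "C", "quantitative": "Q"}
    return f"{letter[explanatory_type]} -> {letter[response_type]}"

# The four examples from the lecture: (explanatory, type, response, type)
examples = [
    ("gender", "categorical", "smoking habits", "categorical"),
    ("gender", "categorical", "test score", "quantitative"),
    ("SAT score", "quantitative", "GPA", "quantitative"),
    ("IQ", "quantitative", "favorite music", "categorical"),
]

cases = [classify(e_type, r_type) for _, e_type, _, r_type in examples]
for (e, _, r, _), case in zip(examples, cases):
    print(f"{e} -> {r}: {case}")
```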

- \( Q \rightarrow C \)
  - **not** covered by introductory statistics
  - **extremely** important to business applications (predict decision to buy)
  - **requires post high school mathematics**
  - much of machine learning and “data science”

**Age by General Health**

Compare the distribution of the response \( Q \) for each category of the explanatory \( C \)

**Data display**

- side-by-side boxplots

**Numeric summaries**

- descriptive statistics
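For the numeric-summary half of a \( C \rightarrow Q \) comparison, one option is a five-number summary computed separately for each category. A minimal sketch using only the Python standard library; the ages are made-up illustration data, the helper `five_number_summary` is hypothetical, and the "inclusive" quartile method is one of several conventions (software packages can differ slightly). Plotting the side-by-side boxplots is left out.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

# Hypothetical ages grouped by self-reported general health (made-up data)
ages_by_health = {
    "excellent": [22, 28, 35, 40, 44, 51, 63],
    "poor": [41, 52, 58, 60, 66, 71, 80],
}

# One summary per category of the explanatory variable
summaries = {group: five_number_summary(ages)
             for group, ages in ages_by_health.items()}
for group, summary in summaries.items():
    print(group, summary)
```

Comparing the tuples position by position (median against median, Q1 of one group against Q3 of the other) gives the kind of statement a side-by-side boxplot supports visually.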

**Age by General Health**

**Example**

- **Center:** The median respondent reporting poor health is 60 years old, while the median respondent reporting excellent health is 40 years old.
- **Relative 5 number:** In fact, approximately ¾ of respondents in poor health are as old as or older than the oldest ¼ of respondents in excellent health.

Are men and women just as likely to think their weight is about right?

**Two-way** table:

**Marginal distribution** of Body Image
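A rough sketch of how a two-way table and a marginal distribution relate, using only the standard library. The responses, category labels, and counts below are invented for illustration and are not the survey data from the lecture.

```python
from collections import Counter

# Hypothetical survey responses as (gender, body image) pairs; made-up counts
responses = (
    [("female", "about right")] * 6 + [("female", "overweight")] * 3
    + [("female", "underweight")] * 1
    + [("male", "about right")] * 5 + [("male", "overweight")] * 2
    + [("male", "underweight")] * 3
)

# Two-way table: count of each (explanatory, response) combination
two_way = Counter(responses)

# Marginal distribution of body image: total per response category,
# ignoring gender, expressed as a share of all respondents
body_image_totals = Counter(image for _, image in responses)
n = len(responses)
marginal = {image: count / n for image, count in body_image_totals.items()}

print(two_way)
print(marginal)
```

The marginal distribution uses only the totals for one variable, so on its own it cannot answer the question about men versus women; that requires comparing the body image percentages within each gender.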