Stat 202: Lecture 1 (covers pp. 1-21)

Nathan VanHoudnos
9/24/2014

Statistics 202 - Welcome

  • Nathan VanHoudnos (van-HOD-ness)
  • Order of business today:
    • Go over the syllabus (10 minutes)
    • Lecture proper (30 minutes)
    • Orientation to course software (10 minutes)

The Big Picture of Statistics 202


BRFSS: Behavioral Risk Factor Surveillance System

  • is an annual telephone survey of people living in the United States.
  • seeks to identify individual-level behavioral risk factors associated with premature mortality and morbidity.
    • e.g. cigarette smoking, alcohol use …
  • collected 491,773 records in 2013.
  • further information and the full data are available from the CDC.
  • a 20,000-record subset is available from Lab 1 of the OpenIntro Statistics course.

Definitions

  • Data: pieces of information organized into variables

    • Individuals: particular persons or objects
    • Variables: particular characteristics of individuals
  • Two Types of Variables

    • Quantitative variables represent numerical measurements
      • e.g. Age, Weight, and Height
    • Categorical variables represent categories or labels
      • e.g. Race, Gender, and Level of Education

Categorical or Quantitative?

  • Would you say that in general your health is Excellent, Very Good, Good, Fair, or Poor?
    • Categorical
  • Have you smoked at least 100 cigarettes in your entire life?
    • Categorical
  • On average, how many hours of sleep do you get in a 24-hour period?
    • Quantitative
  • What is the ZIP Code where you live?
    • Categorical

BRFSS Example Data

genhlth    age  gender  height  weight  wtdesire
good        45  f        64.00     120       110
good        77  m        70.00     175       175
very good   32  f        61.00     115       105
excellent   44  f        67.00     190       165
poor        52  f        63.00     142       120
very good   27  f        64.00     105       120
very good   18  f        63.00     140       130
very good   46  f        64.00     148       120
fair        27  f        60.00     115       115
very good   31  m        71.00     194       185
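In code, a data set like this is usually stored as one record per individual with one field per variable. A minimal Python sketch, assuming the ten example rows above; the names `rows`, `VARIABLE_TYPE`, and `column` are illustrative only and are not part of any course software.

```python
# The ten example BRFSS rows: one dict per individual, one key per variable.
FIELDS = ["genhlth", "age", "gender", "height", "weight", "wtdesire"]
RAW = [
    ("good", 45, "f", 64.0, 120, 110),
    ("good", 77, "m", 70.0, 175, 175),
    ("very good", 32, "f", 61.0, 115, 105),
    ("excellent", 44, "f", 67.0, 190, 165),
    ("poor", 52, "f", 63.0, 142, 120),
    ("very good", 27, "f", 64.0, 105, 120),
    ("very good", 18, "f", 63.0, 140, 130),
    ("very good", 46, "f", 64.0, 148, 120),
    ("fair", 27, "f", 60.0, 115, 115),
    ("very good", 31, "m", 71.0, 194, 185),
]
rows = [dict(zip(FIELDS, values)) for values in RAW]

# Kind of each variable, assigned by hand rather than inferred from the values.
VARIABLE_TYPE = {
    "genhlth": "categorical",
    "age": "quantitative",
    "gender": "categorical",
    "height": "quantitative",
    "weight": "quantitative",
    "wtdesire": "quantitative",
}

def column(rows, name):
    """Return the values of one variable, one value per individual."""
    return [row[name] for row in rows]

print(column(rows, "genhlth"))
print([name for name, kind in VARIABLE_TYPE.items() if kind == "quantitative"])
```

The variable types are written out by hand on purpose: as the ZIP Code question shows, a variable recorded as numbers can still be categorical.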

Summarizing Categorical Variables

  • genhlth: “Would you say that in general your health is excellent, very good, good, fair, or poor?”
    • A table of raw counts:
      excellent  very good   good   fair  poor     Sum
          4,657      6,972  5,675  2,019   677  20,000
    • A table of proportions:
      excellent  very good   good   fair  poor     Sum
           0.23       0.35   0.28   0.10  0.03    1.00
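Each proportion is the category's count divided by the total number of records. A short Python sketch of that arithmetic, using the counts from the genhlth table (the variable names are illustrative):

```python
# Raw counts of genhlth responses in the 20,000-record subset.
counts = {
    "excellent": 4657,
    "very good": 6972,
    "good": 5675,
    "fair": 2019,
    "poor": 677,
}

total = sum(counts.values())  # 20,000 records in all

# Proportion = count / total for each category.
proportions = {response: n / total for response, n in counts.items()}

for response, p in proportions.items():
    print(f"{response:>10}: {p:.2f}")
print(f"{'Sum':>10}: {sum(proportions.values()):.2f}")
```

The rounded proportions add to 0.99 rather than 1.00; the Sum of 1.00 comes from the unrounded values.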

Visualizing Categorical Data

[Figure: plot of the genhlth responses]

Mode: the value that occurs most often.

What is the modal response?

  • “Very Good”
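The mode can be read off a table of counts, or tallied directly from the raw responses. A minimal Python sketch using the standard library's `collections.Counter`; `counts` holds the genhlth counts from the 20,000-record table, and `sample` is the genhlth column of the ten example rows (both names are illustrative).

```python
from collections import Counter

# Mode from a table of counts: the category with the largest count.
counts = {"excellent": 4657, "very good": 6972, "good": 5675,
          "fair": 2019, "poor": 677}
mode_from_table = max(counts, key=counts.get)

# Mode from raw data: tally the genhlth values of the ten example rows.
sample = ["good", "good", "very good", "excellent", "poor",
          "very good", "very good", "very good", "fair", "very good"]
mode_from_sample, n = Counter(sample).most_common(1)[0]

print(mode_from_table)                 # the modal response in the full subset
print(mode_from_sample, n)             # the modal response in the ten rows, and its count
```

If two categories tie for the largest count, both approaches report only one of them, so it is worth checking for ties.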

That was harder than it needed to be…

Visualizing Categorical Data

[Figures: two plots of the genhlth responses]

Visualizing Quantitative Variables

Height (inches)  Frequency
45-50                    4
50-55                   18
55-60                  871
60-65                6,464
65-70                7,899
70-75                4,399
75-80                  337
80-85                    7
85-90                    0
90-95                    1
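A frequency table like this one is built by splitting the range of the variable into equal-width intervals and counting how many values fall in each. A minimal Python sketch, assuming left-closed intervals (the slide does not say which interval a boundary value such as 65 belongs to); `frequency_table` is a hypothetical helper, applied here to the heights of the ten example rows rather than the full 20,000 records.

```python
def frequency_table(values, start, width, n_bins):
    """Count values in [start, start+width), [start+width, start+2*width), ...

    Assumes left-closed intervals: a value exactly on a boundary (e.g. 65)
    is counted in the interval that starts there. Values outside the
    overall range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # index of the interval containing v
        if 0 <= i < n_bins:
            counts[i] += 1
    labels = [f"{start + i * width}-{start + (i + 1) * width}" for i in range(n_bins)]
    return list(zip(labels, counts))

# Heights (inches) of the ten example BRFSS rows.
heights = [64, 70, 61, 67, 63, 64, 63, 64, 60, 71]

for label, count in frequency_table(heights, start=45, width=5, n_bins=10):
    print(f"{label:>6}  {count}")
```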

Histogram: the analog of a bar plot for a quantitative variable

[Figure: histogram of height]
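A histogram draws one bar per interval of the frequency table, with bar length proportional to the frequency. A rough text-only sketch in Python, using the height frequencies from the table above; `text_histogram` and its `per_mark` scale are illustrative choices, not a plotting library's API.

```python
# Height frequency table (20,000 records): (interval, frequency).
HEIGHT_TABLE = [
    ("45-50", 4), ("50-55", 18), ("55-60", 871), ("60-65", 6464),
    ("65-70", 7899), ("70-75", 4399), ("75-80", 337), ("80-85", 7),
    ("85-90", 0), ("90-95", 1),
]

def text_histogram(table, per_mark=200):
    """Return one line per interval; each '#' stands for about per_mark records."""
    lines = []
    for label, count in table:
        bar = "#" * round(count / per_mark)
        lines.append(f"{label:>5} | {bar}")
    return lines

for line in text_histogram(HEIGHT_TABLE):
    print(line)
```

At this scale, intervals with only a handful of records round to an empty bar, so a few heights far from the center are invisible; a real plot would still show them as very short bars.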

Interpreting Quantitative Variables

[Figure: histogram]

  1. Describe the shape
  2. Describe the center
  3. Describe the spread
  4. Note deviations from the pattern

Shapes of Quantitative Variables

Symmetric: the left and right sides of the distribution are roughly mirror images of each other