Introduction to statistics notes

- interval/ratio data are often assumed to be approximately normally distributed (an assumption to check, not a guarantee)
- there are three methods to describe data

  1. central tendency (mean, median, mode)
  2. variability
  3. graphics

- the three levels of measurement cross with the three methods of describing data to form a matrix

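  1. nominal - numbers are only labels for categories (makes no sense to calculate a mean for gender)
  2. ordinal - numbers show rank order, but the gaps between ranks are not necessarily equal
  3. interval/ratio - numbers are equally spaced, so a mean and standard deviation are meaningful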

Statistics method/levels matrix:

| Level | Central tendency | Variability | Graphics |
|---|---|---|---|
| Nominal (categories: male/female) | mode; "the most frequently occurring value was..." | what are the groups? what is the frequency of each category? | bar chart |
| Ordinal (rank) | median; could also use mode, but not mean | range; "the rankings range from one to 200" | histogram when there are many distinct values; bar chart of frequencies when there are only a few |
| Interval/ratio | mean; could also use mode and median, but that is unlikely | standard deviation (typical distance of values from the mean); variance (standard deviation squared) | histogram (usually drawn as the background for a normal distribution curve) |
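
As a rough illustration of the matrix, the sketch below computes the suggested statistics for each level of measurement using Python's standard `statistics` module. The variable names and all data values are invented for illustration; they are not from the notes or from NELS-88. Note that `statistics.stdev` and `statistics.variance` use the sample (n - 1) formulas.

```python
import statistics

# Hypothetical sample values, invented purely for illustration.
gender = ["female", "male", "female", "female", "male"]   # nominal
class_rank = [1, 4, 2, 5, 3, 7, 6]                          # ordinal
test_score = [52.0, 61.5, 48.0, 70.5, 58.0]                 # interval/ratio

# Nominal: the mode is the only sensible measure of central tendency.
nominal_mode = statistics.mode(gender)

# Ordinal: median for central tendency, range for variability.
ordinal_median = statistics.median(class_rank)
ordinal_range = (min(class_rank), max(class_rank))

# Interval/ratio: mean, standard deviation, and variance (sample formulas).
score_mean = statistics.mean(test_score)
score_sd = statistics.stdev(test_score)
score_variance = statistics.variance(test_score)

print("nominal mode:", nominal_mode)
print("ordinal median and range:", ordinal_median, ordinal_range)
print("interval/ratio mean, sd, variance:", score_mean, score_sd, score_variance)
```

The variance printed here equals the standard deviation squared, matching the relationship noted in the matrix.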

What is correlation?
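How one variable changes as another changes. Example: Pearson's correlation coefficient, which measures the strength and direction of a linear relationship on a scale from -1 to +1.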
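
A minimal sketch of Pearson's correlation coefficient, written out by hand so the arithmetic is visible. The function name `pearson_r` and the two sample lists are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and exam score for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]
r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.3f}")
```

A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.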

Descriptive practice using NELS-88 data
describe the following (level of measurement, central tendency, variability, graphic):

  • mother education: ordinal, median, range, frequency table, bar chart
  • comprehensive race: nominal, mode, frequency table, bar chart
  • reading comprehension score: interval/ratio, mean, standard deviation, histogram
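
For readers without SPSS, the frequency table requested for the categorical variables can be sketched in a few lines of Python. The category labels and responses below are hypothetical stand-ins, not actual NELS-88 codes, and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

def frequency_table(values, levels):
    """Return (level, count, percent) rows in the order given by `levels`."""
    counts = Counter(values)
    total = len(values)
    return [(level, counts[level], 100.0 * counts[level] / total)
            for level in levels]

# Hypothetical ordinal categories, listed from lowest to highest.
education_levels = ["high school", "some college", "college"]
# Hypothetical responses, invented for illustration.
mother_education = ["high school", "college", "high school",
                    "some college", "high school", "college"]

table = frequency_table(mother_education, education_levels)
for level, count, percent in table:
    print(f"{level:<15}{count:>5}{percent:>8.1f}%")
```

Passing the levels in rank order keeps the rows in a meaningful sequence for an ordinal variable; for a nominal variable any order works.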

- In SPSS, click Analyze -> Descriptive Statistics -> Frequencies, then move mother education and comprehensive race to the right
- For mother ed and race, keep "Display frequency tables" checked. Under the Statistics button select median and mode, and under the Charts button select bar chart with frequencies.
- For reading score, uncheck the "Display frequency tables" checkbox; under the Statistics button select mean, standard deviation, skewness, and kurtosis; and under the Charts button select histogram

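* with continuous data like test scores, generating a frequency table or bar chart is a waste of time, because nearly every value gets its own row or bar
** with continuous data like test scores, always generate skewness and kurtosis to see how far the distribution departs from a normal curve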
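
To show what skewness and kurtosis measure, here is a sketch using the simple moment-based formulas. This is an assumption made for clarity: SPSS reports small-sample-adjusted versions, so its output will differ somewhat, especially for small samples. The function names and the scores are hypothetical.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical reading scores, invented for illustration.
symmetric_scores = [1, 2, 3, 4, 5]
right_skewed_scores = [1, 1, 1, 2, 10]

print("symmetric skewness:", skewness(symmetric_scores))
print("right-skewed skewness:", skewness(right_skewed_scores))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric_scores))
```

Skewness near 0 and excess kurtosis near 0 are consistent with a roughly normal shape; large values in either direction suggest the mean and standard deviation may describe the data poorly.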