# Statistics minor

## Sampling

Necessary if:

• The population is too large or a census is too expensive
• The sampling process is destructive

Features of a good sample:

• Unbiased
• Representative of the population
• Data should be relevant
• Data should not be affected by the act of sampling
• Enables valid inference, because the probability basis on which the sample was taken is known
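Simple random sampling is one scheme with a known probability basis: every possible sample of size n has the same chance of being chosen. Below is a minimal Python sketch. It assumes the population can be listed as a sampling frame. The name `simple_random_sample`, the `seed` parameter and the numbered frame are hypothetical and are only there for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Choose n members without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the example reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: members numbered 1 to 500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sample)
```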

## Discrete random variables

Expectation and variance:

• Var (X) = E (X^2) - [E (X)]^2
• E (a + bX) = a + bE (X)
• Var (a + bX) = b^2 Var (X)
• E (aX +/- bY) = aE (X) +/- bE (Y)
• Var (aX +/- bY) = a^2Var (X) + b^2Var (Y), if X and Y are independent
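The rules above can be checked exactly on a small example. The sketch below assumes a made-up distribution for X and uses `fractions.Fraction` so that the checks hold with no rounding. The helper names (`expectation`, `variance`, `linear`, `combine_independent`) are hypothetical. For aX - bY, the last helper builds the joint distribution under the assumption that X and Y are independent, which is the condition the variance rule requires.

```python
from fractions import Fraction

# Hypothetical distribution of X: P(X = 1) = 1/2, P(X = 2) = 1/3, P(X = 3) = 1/6
dist = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}


def expectation(d, f=lambda x: x):
    """E(f(X)): sum of f(x) * P(X = x) over all values x."""
    return sum(f(x) * p for x, p in d.items())


def variance(d):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return expectation(d, lambda x: x * x) - expectation(d) ** 2


def linear(d, a, b):
    """Distribution of a + bX (assumes b != 0 so the values stay distinct)."""
    return {a + b * x: p for x, p in d.items()}


def combine_independent(d1, d2, a, b):
    """Distribution of aX + bY, where X and Y are independent."""
    out = {}
    for x, px in d1.items():
        for y, py in d2.items():
            v = a * x + b * y
            # independence: P(X = x and Y = y) = P(X = x) * P(Y = y)
            out[v] = out.get(v, 0) + px * py
    return out


print(expectation(dist), variance(dist))  # 5/3 5/9

# Check the rules for a + bX with a = 2, b = 3
assert expectation(linear(dist, 2, 3)) == 2 + 3 * expectation(dist)
assert variance(linear(dist, 2, 3)) == 3 ** 2 * variance(dist)

# Check the rules for 2X - 3Y, where Y is an independent copy of X
diff = combine_independent(dist, dist, 2, -3)
assert expectation(diff) == 2 * expectation(dist) - 3 * expectation(dist)
assert variance(diff) == 2 ** 2 * variance(dist) + 3 ** 2 * variance(dist)
```

In the final check the variances are added even though the variables are subtracted. This is because the coefficient -3 is squared.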

Conditions for the binomial distribution (also the geometric):

• Each trial results in one of two outcomes (success or failure)
• The probability of success is constant
• The trials are independent of each other
• Binomial only: there is a fixed number of trials, n
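When these conditions hold, the probabilities follow from the standard formulas. The Python sketch below assumes the usual convention that a geometric variable counts the trial on which the first success occurs (r = 1, 2, ...). The function names are hypothetical, and `Fraction` is used so the results are exact.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def geometric_pmf(r, p):
    """P(X = r) for X ~ Geo(p), X = trial on which the first success occurs.

    The first r - 1 independent trials fail, then trial r succeeds.
    """
    return (1 - p) ** (r - 1) * p


print(binomial_pmf(2, 4, Fraction(1, 2)))  # 3/8
print(geometric_pmf(3, Fraction(1, 6)))    # 25/216
```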

Conditions for the Poisson distribution:

• Events occur randomly at a constant average rate, independently of each other

If X ~ Po (λ) and Y ~ Po (μ) are independent, then X + Y ~ Po (λ + μ)
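Here is a numerical check of the sum rule. P(X + Y = r) is obtained by summing P(X = k) * P(Y = r - k) over k. That product step is only valid because X and Y are independent. The total is then compared with the Po (λ + μ) probability. The rates 2 and 3.5 are hypothetical, and the helper names are illustrative.

```python
from math import exp, factorial, isclose


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam): e^(-lam) * lam^r / r!."""
    return exp(-lam) * lam ** r / factorial(r)


def sum_pmf(r, lam, mu):
    """P(X + Y = r) for independent X ~ Po(lam), Y ~ Po(mu).

    Sums P(X = k) * P(Y = r - k) over k = 0..r, which uses independence.
    """
    return sum(poisson_pmf(k, lam) * poisson_pmf(r - k, mu) for k in range(r + 1))


# Hypothetical rates: lam = 2, mu = 3.5, so X + Y should be Po(5.5)
for r in range(15):
    assert isclose(sum_pmf(r, 2.0, 3.5), poisson_pmf(r, 5.5))
```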

## Bivariate Data

Conditions for a hypothesis test using Pearson's product moment correlation coefficient:

• Both variables should be random (random on random)
• Data should come from a bivariate Normal distribution
• If the distribution of either variable is skewed, bimodal, etc., the data are unlikely to be bivariate Normal, so the test is unlikely to be appropriate

Pearson's null hypothesis: There is no correlation between ... and ... in the population (ρ = 0)
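To carry out the test, r is calculated from the sample using r = Sxy / sqrt(Sxx * Syy). It is then compared with the critical value from tables for the sample size and significance level. Below is a minimal Python sketch. The function name and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```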

Spearman's rank null hypothesis: There is no association between ... and ... in the population

Spearman's vs Pearson's:

• Spearman's is not appropriate if the scatter diagram does not indicate a monotonic relationship (as one variable increases, the other consistently increases or consistently decreases)
• Ranking data loses information
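Both points can be seen in a short calculation. Spearman's coefficient uses only the ranks, so any perfectly monotonic relationship gives rs = 1 even when the points do not lie on a straight line. The actual spacing of the values is discarded. The Python sketch below uses rs = 1 - 6Σd² / (n(n² - 1)). It assumes there are no tied values, because tied ranks would need averaging, which this sketch does not handle. The function names and the data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (largest) upwards; assumes there are no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """rs = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference in ranks."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data with a monotonic but non-linear relationship: y = x^3
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rs(xs, ys))  # 1.0
```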

Regression lines: residual = observed value - value from regression line
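Below is a minimal Python sketch of the least squares regression line of y on x, y = a + bx. It uses b = Sxy / Sxx and a = ȳ - b x̄, and then computes each residual as observed value minus value from the line. The function names and the data are hypothetical.

```python
def regression_line(xs, ys):
    """Least squares regression line of y on x, y = a + bx.

    b = Sxy / Sxx and a = mean(y) - b * mean(x).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(xs, ys):
    """residual = observed value - value from regression line, for each point."""
    a, b = regression_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
print(regression_line(xs, ys))  # approximately (0.5, 1.4)
print(residuals(xs, ys))        # approximately [0.1, -0.3, 0.3, -0.1]
```

For a least squares line the residuals always sum to zero, which is a useful check on hand calculations.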

## Chi-squared tests

Putting data into categories loses information

Contingency tables:

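• Null hypothesis: there is no association between ... and ... (the variables are independent)
• Calculate the degrees of freedom: the minimum number of cell values needed to work out the rest, (r - 1)(c - 1) for a table with r rows and c columns
• Table of expected frequencies: row total × column total ÷ grand total
• Table of contributions: (O - E)^2 / E for each cell
• Sum the contributions to find the chi-squared test statistic, then compare it with the critical value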
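The contingency-table steps can be sketched in Python as follows. The function name and the 2 × 2 table of observed frequencies are hypothetical, and the table is given as a list of rows.

```python
def chi_squared_contingency(observed):
    """Chi-squared test statistic and degrees of freedom for a contingency table.

    observed is a list of rows of observed frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # expected frequency = row total * column total / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            # contribution of this cell
            statistic += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table of observed frequencies
print(chi_squared_contingency([[20, 30], [30, 20]]))  # (4.0, 1)
```

The critical value at the 5% level with 1 degree of freedom is 3.841. The statistic of 4.0 for this made-up table exceeds it, so the null hypothesis would be rejected at the 5% level.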

Goodness-of-fit:

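• Null hypothesis: the given model fits the data
• Calculate the degrees of freedom: number of classes - 1 - number of parameters estimated from the data
• Table of expected frequencies, using the model
• Contributions (O - E)^2 / E for each class
• Sum the contributions to find the chi-squared test statistic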
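Here is a goodness-of-fit version of the same calculation. The parameter `params_estimated` is a hypothetical name for the number of model parameters estimated from the data, for example 1 if a Poisson mean is taken from the sample mean. The die data below are made up, and the model being tested is that the die is fair.

```python
def chi_squared_goodness_of_fit(observed, expected, params_estimated=0):
    """Test statistic and degrees of freedom for a goodness-of-fit test.

    params_estimated: number of model parameters estimated from the data.
    """
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - params_estimated
    return statistic, dof


# Hypothetical data: a die rolled 60 times; model: the die is fair
observed = [8, 12, 9, 11, 10, 10]
expected = [60 / 6] * 6  # expected frequency 10 for each face under the model
print(chi_squared_goodness_of_fit(observed, expected))  # approximately (1.0, 5)
```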