# Statistics S2 Revision

The main concepts of S2, to help with answering some of the fiddly 1-2 mark questions that test knowledge of the theory.

## The Normal Distribution: Features

• Has a trademark bell shape.
• Continuous distribution, so its graph is a smooth curve, unlike the Binomial/Poisson, whose probability functions look like bar charts.
• Symmetrical about the mean value. For the standard normal this means P(Z < -a) = P(Z > a) = 1 - P(Z < a).
• The variance parameter determines how spread out the curve is.

The Standard Normal distribution:

Z ~ N(0, 1)

To standardise if X ~ N(mean, variance):

Z = (X - mean)/sqrt(variance)
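
As a quick check of the standardising formula, here is a minimal Python sketch using only the standard library. The numbers (X ~ N(50, 16), finding P(X < 56)) and the helper name `standardise` are made up for illustration.

```python
from statistics import NormalDist

def standardise(x, mean, variance):
    """Return the z-value of x when X ~ N(mean, variance)."""
    return (x - mean) / variance ** 0.5

# NormalDist takes the standard deviation (not the variance) as its second argument.
Z = NormalDist(0, 1)

# Hypothetical example: X ~ N(50, 16), find P(X < 56).
z = standardise(56, mean=50, variance=16)  # (56 - 50) / 4 = 1.5
p_below = Z.cdf(z)                         # P(Z < 1.5)

# Symmetry check: the lower tail below -z matches the upper tail above z.
lower_tail = Z.cdf(-z)
upper_tail = 1 - Z.cdf(z)

print(z, round(p_below, 4), round(lower_tail, 4), round(upper_tail, 4))
```

In an exam the value of P(Z < 1.5) would be read from tables rather than computed.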

## Conditions for the Normal Distribution

Many real-life variables/probabilities can be modelled by a Normal Distribution.

The main condition to remember for exams is that values of X are concentrated about the mean value and that there shouldn't be any extreme values.

Examples of things that can be modelled closely by a Normal distribution are heights, marks on exams and errors in measurements.

## Normal as an Approximation to the Binomial

Some binomial distributions that are not tabulated can be modelled by a Normal Distribution provided certain conditions are satisfied.

If X ~ B(n, p)

Then X can be approximated by N(np, np(1-p)) if np > 5 and n(1-p) > 5 (and usually n > 20).

However, since the Binomial is a discrete distribution but the Normal is continuous, a continuity correction is needed: add or subtract 1/2 from the value of X.

This means treating the value "X" as "anything that rounds to X". E.g. P(X < 5) changes to P(X < 4.5), and P(X ≤ 5) changes to P(X < 5.5).
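
To see how close the approximation is, the sketch below compares an exact binomial probability with the normal approximation, with and without the continuity correction. The values n = 40, p = 0.4 and the function names are assumptions chosen for illustration (here np = 16 and n(1-p) = 24).

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def normal_approx_cdf(k, n, p, correction=0.5):
    """Approximate P(X <= k) using N(np, np(1-p)); correction=0.5 applies the continuity correction."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(k + correction)

n, p = 40, 0.4  # hypothetical values
exact = binomial_cdf(18, n, p)
with_correction = normal_approx_cdf(18, n, p)                     # P(Y < 18.5)
without_correction = normal_approx_cdf(18, n, p, correction=0.0)  # P(Y < 18)

print(round(exact, 4), round(with_correction, 4), round(without_correction, 4))
```

The corrected value should sit noticeably closer to the exact one, which is why the correction is worth the marks.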

## The Poisson Distribution

The Poisson distribution is used to model the number of times an event occurs in a fixed interval when you know the average rate at which it occurs, e.g. the number of car accidents in a certain town in a week.

Conditions:

The events occur at a constant average rate.

The events occur randomly.

The events occur independently of each other.

Sometimes a question will ask you why a condition is unlikely to hold within a given context.

Example: The number of breakdowns on a road during a day. The "constant average rate" condition is unlikely to hold here since there will probably be more breakdowns at the rush hour periods.

## Poisson as an Approximation to the Binomial

A binomial distribution X ~ B(n, p) can be approximated by the Poisson distribution Po(np) if both of these conditions hold:

n > 50 and np < 5
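
The sketch below compares the two sets of probabilities side by side for one case that meets the conditions. The values n = 100, p = 0.03 (so np = 3) are assumptions for illustration, and the Poisson formula used is the standard P(X = r) = e^(-lambda) lambda^r / r!.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

n, p = 100, 0.03  # hypothetical values: n > 50 and np = 3 < 5
lam = n * p

# Print r, the exact binomial probability and the Poisson approximation.
for r in range(7):
    print(r, round(binomial_pmf(r, n, p), 4), round(poisson_pmf(r, lam), 4))

# Largest disagreement between the two over all possible values of r.
largest_gap = max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam)) for r in range(n + 1))
print(round(largest_gap, 4))
```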

## Normal as an Approximation to the Poisson

If X ~ Po(lambda) with lambda greater than about 15, X can be approximated by a normal distribution:

X ~ N(lambda, lambda) approximately

A continuity correction is needed with this approximation too; see the section 'Normal as an Approximation to the Binomial'.
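
A minimal sketch of this approximation, assuming lambda = 25 and the probability P(X ≤ 30) purely as an example. Note that the variance is lambda, so the standard deviation passed to `NormalDist` is sqrt(lambda).

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

lam = 25  # hypothetical rate, comfortably above 15

exact = poisson_cdf(30, lam)

# N(lambda, lambda) with the continuity correction: P(X <= 30) becomes P(Y < 30.5).
approx = NormalDist(mu=lam, sigma=sqrt(lam)).cdf(30.5)

print(round(exact, 4), round(approx, 4))
```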

## Central Limit Theorem

The Central Limit Theorem (CLT) states that for a random sample of size n from a population with mean mu and variance sigma^2, if n is sufficiently large (the generally accepted figure is n > 30), the distribution of the sample mean is approximately normal: N(mu, sigma^2/n).

We need the CLT only if X does not follow a normal distribution.

If X does follow a normal distribution, then the sample mean is exactly normally distributed for any sample size, without needing the CLT.
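
The CLT can be seen in a small simulation. The sketch below assumes a deliberately skewed population (exponential with mean 1 and standard deviation 1), a sample size of n = 40 and 2000 repeated samples; all of these choices are illustrative, not part of the syllabus.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, standard deviation 1)."""
    return mean(random.expovariate(1.0) for _ in range(n))

n = 40
means = [sample_mean(n) for _ in range(2000)]

centre = mean(means)     # should be close to the population mean, 1
spread = pstdev(means)   # should be close to sigma / sqrt(n)
expected_spread = 1 / sqrt(n)

print(round(centre, 3), round(spread, 3), round(expected_spread, 3))
```

Although the population is far from symmetric, the sample means cluster around 1 with a spread near sigma/sqrt(n), as the theorem predicts.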

## Sampling

Sampling is an important process for carrying out tests where it is impossible/too time consuming to use the whole population. Therefore it is important that samples are random and unbiased.

Using Random Number Tables to select a sample:

1. Obtain a list of the population required. (eg. electoral roll, register of students, subscribers list for a newspaper etc)

2. Assign a number to each person/object. Use a number of digits appropriate to the size of population, eg if selecting from 100-999 students, use 3 digit numbers. Let item 1 = 001 etc.

3. Use random number tables to read off numbers and the items to which they correspond.

Waste: Sometimes to minimise waste (numbers to which no items correspond) it is possible to assign more than one number to each item, eg if choosing from 500 students you could assign both number 001 and 501 to student 1 etc.

## Unbiased Estimators

For a sample of size n, with sum of x equal to p and sum of x squared equal to q, an unbiased estimate of the population mean is:

E(X) = p/n

But the variance of the sample, q/n - (p/n)^2, is a biased estimate of the population variance. The unbiased estimator of the population variance is:

S^2 = n/(n-1) { q/n - (p/n)^2 }
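
As a worked check, the sketch below computes both estimates from the summary values n, p (sum of x) and q (sum of x squared), using the form n/(n-1) × (q/n − (p/n)^2) for the variance. The data set is made up, and the result is compared with `statistics.variance`, which also divides by n-1.

```python
from statistics import variance

def unbiased_estimates(n, p, q):
    """Return unbiased estimates of the population mean and variance.

    n is the sample size, p the sum of x, q the sum of x squared.
    """
    mean_estimate = p / n
    variance_estimate = n / (n - 1) * (q / n - mean_estimate**2)
    return mean_estimate, variance_estimate

data = [4, 7, 5, 9, 10]          # hypothetical sample
n = len(data)                    # 5
p = sum(data)                    # 35
q = sum(x * x for x in data)     # 271

mean_estimate, variance_estimate = unbiased_estimates(n, p, q)
print(mean_estimate, round(variance_estimate, 4), variance(data))
```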

## Hypotheses Testing

A hypothesis test is carried out on the null hypothesis (what was believed to be true before) and the alternative hypothesis (what is believed to be true according to a change).

A one-tailed test is used where there is an explicit mention of a direction, e.g. test if X has increased/decreased.

A two-tailed test is used when the question asks only whether there has been a change, i.e. it is neutral about the direction.

The Critical Region of a test is the values of the test statistic for which the null hypothesis is rejected.

When concluding a test, either use "Reject null hypothesis" or "Do not reject", rather than using "Accept". The conclusion should make some reference to the context, eg the sample provides significant evidence to suggest x has changed.

Remember to use a continuity correction if you approximate a Binomial/Poisson by a Normal.
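
To make the idea of a critical region concrete, here is a sketch for a one-tailed binomial test. The set-up (H0: p = 0.3, H1: p > 0.3, n = 20, 5% significance level, an observed value of 11) and the function names are assumptions for illustration.

```python
from math import comb

def upper_tail(c, n, p):
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(c, n + 1))

def upper_critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | H0) <= alpha, together with that tail probability."""
    for c in range(n + 1):
        tail = upper_tail(c, n, p0)
        if tail <= alpha:
            return c, tail
    return n + 1, 0.0  # no value of X is extreme enough: the critical region is empty

n, p0, alpha = 20, 0.3, 0.05          # hypothetical test
c, actual_level = upper_critical_value(n, p0, alpha)

observed = 11                          # hypothetical test statistic
reject = observed >= c                 # True means "Reject null hypothesis"

print(c, round(actual_level, 4), reject)
```

The critical region is X ≥ c. The tail probability returned is the actual significance level, which for a discrete distribution is usually a little below the nominal 5%.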

## Type I and II Errors

A Type I Error occurs when the null hypothesis is wrongly rejected.

A Type II Error occurs when the null hypothesis is not rejected even though it is false.

For a fixed sample size there is a trade-off between the two: reducing the probability of a Type I error (by using a smaller significance level) increases the probability of a Type II error, and vice versa.

For a Normal Distribution, the Probability of getting a Type I error is the same as the significance level.

Type II errors are more likely as the true value of the mean gets closer to the mean given in the null hypothesis.
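
These statements can be checked numerically for a one-tailed test on a normal mean. The sketch assumes H0: mu = 50 against H1: mu > 50, a known standard deviation of 6, a sample of size 36 and a 5% significance level; the function name and all the numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu0, mu_true, sigma, n, alpha):
    """Return (P(Type I), P(Type II)) for H0: mu = mu0 against H1: mu > mu0."""
    se = sigma / sqrt(n)                                   # standard deviation of the sample mean
    critical = NormalDist(mu0, se).inv_cdf(1 - alpha)      # reject H0 if the sample mean exceeds this
    type_i = 1 - NormalDist(mu0, se).cdf(critical)         # rejecting when H0 is true
    type_ii = NormalDist(mu_true, se).cdf(critical)        # not rejecting when the mean is mu_true
    return type_i, type_ii

mu0, sigma, n = 50, 6, 36  # hypothetical test

# Type II errors become more likely as the true mean approaches 50.
for mu_true in (53, 52, 51):
    type_i, type_ii = error_probabilities(mu0, mu_true, sigma, n, alpha=0.05)
    print(mu_true, round(type_i, 4), round(type_ii, 4))

# A smaller significance level lowers P(Type I) but raises P(Type II).
strict = error_probabilities(mu0, 52, sigma, n, alpha=0.01)
lenient = error_probabilities(mu0, 52, sigma, n, alpha=0.05)
print(round(strict[1], 4), round(lenient[1], 4))
```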
