# Statistics S2 Revision

The main concepts of S2, to help with some of the fiddly 1-2 mark S2 questions that test the theory.

- Created by: Naomi Arnold
- Created on: 12-04-14 22:41

## The Normal Distribution: Features

- Has a trademark **bell shape**.
- It is a **continuous** distribution, so it has a **curved function**, as opposed to the binomial/Poisson, whose functions resemble a bar chart.
- **Symmetrical** about the **mean value**. For the standard normal this means that **P(Z < -a) = P(Z > a) = 1 - P(Z < a)**.
- The **variance** parameter determines how stretched out it is.

The Standard Normal distribution:

Z ~ N(0, 1)

To standardise if X ~ N(mean, variance):

Z = (X - mean)/sqrt(variance)
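As a quick check of the standardising formula, here is a minimal Python sketch. The numbers (X ~ N(50, 16), finding P(X < 56)) are made up for illustration, and `standardise` is just a helper name chosen here; `NormalDist` from the standard library plays the role of the tables.

```python
from math import sqrt
from statistics import NormalDist

def standardise(x, mean, variance):
    """Convert a value of X ~ N(mean, variance) into a z-value."""
    return (x - mean) / sqrt(variance)

# Hypothetical example: X ~ N(50, 16), find P(X < 56)
z = standardise(56, mean=50, variance=16)  # (56 - 50) / 4 = 1.5
p = NormalDist().cdf(z)                    # P(Z < 1.5), as read from tables

print(z, round(p, 4))  # 1.5 0.9332
```

Note that the formula divides by the standard deviation, so the variance of 16 becomes 4 before dividing.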

## Conditions for the Normal Distribution

Many real-life variables/probabilities can be modelled by a Normal Distribution.

The main condition to remember for exams is that values of X are **concentrated about the mean** value and that there shouldn't be any extreme values.

Examples of things that can be modelled closely by a Normal distribution are heights, marks on exams, errors in measurements etc.

## Normal as an Approximation to the Binomial

Some binomial distributions that are not tabulated can be modelled by a Normal Distribution provided certain conditions are satisfied.

If X ~ B(n, p)

then X can be approximated by **N(np, np(1-p))** if np > 5 and n(1-p) > 5 (and usually n > 20)

However, since the Binomial is a discrete distribution but the normal is continuous, a **continuity correction** is needed, by adding or subtracting 1/2 from the value of X.

This means changing the definition of "X" to be "anything that rounds to give X", e.g. P(X < 5) changes to P(X < 4.5) and P(X ≤ 5) changes to P(X < 5.5).
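To see why the continuity correction matters, the sketch below compares the exact binomial probability with the normal approximation, with and without the correction. The values n = 40, p = 0.4 are an assumed example (so np = 16 and n(1-p) = 24), and `binomial_cdf` is a helper written for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

n, p = 40, 0.4  # assumed example: np = 16 > 5 and n(1-p) = 24 > 5
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact = binomial_cdf(15, n, p)   # exact P(X <= 15)
with_cc = approx.cdf(15.5)       # P(X <= 15) becomes P(Y < 15.5)
without_cc = approx.cdf(15)      # forgetting the correction

print(round(exact, 4), round(with_cc, 4), round(without_cc, 4))
```

The corrected value should sit much closer to the exact one than the uncorrected value does.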

## The Poisson Distribution

The Poisson distribution is used if you know the average number of times an event occurs within a certain time period, e.g. the number of car accidents in a certain town in a week.

**Conditions:**

The events occur at a **constant average rate.**

The events occur **randomly.**

The events occur **independently** of each other.

Sometimes a question will ask you why a condition is unlikely to hold within a given context.

**Example:** The number of breakdowns on a road during a day. The "constant average rate" condition is unlikely to hold here since there will probably be more breakdowns at the rush hour periods.

## Poisson as an Approximation to the Binomial

A binomial distribution X ~ B(n, p) can be approximated by a Poisson distribution X ~ Po(np) if these conditions hold:

**np < 5, n > 50**
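A small Python sketch of this approximation, using an assumed example of n = 100 and p = 0.03 (so np = 3). The helper names `binomial_cdf` and `poisson_cdf` are chosen here for illustration.

```python
from math import comb, exp, factorial

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

n, p = 100, 0.03  # assumed example: n > 50 and np = 3 < 5
exact = binomial_cdf(2, n, p)    # P(X <= 2) under B(100, 0.03)
approx = poisson_cdf(2, n * p)   # P(X <= 2) under Po(3)

print(round(exact, 4), round(approx, 4))
```

No continuity correction is needed here, because both distributions are discrete.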

## Normal Approximation to the Poisson

If X ~ Po(lambda) with lambda greater than about 15, X can be approximated by a normal distribution:

X ~ N(lambda, lambda)

A continuity correction is needed with this approximation; see the slide 'Normal as an Approximation to the Binomial'.
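As an illustration, the sketch below takes an assumed example of X ~ Po(25) and compares the exact value of P(X ≤ 20) with the normal approximation N(25, 25) using a continuity correction. `poisson_cdf` is a helper written for this example.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

lam = 25  # assumed example, comfortably above 15
approx_dist = NormalDist(mu=lam, sigma=sqrt(lam))  # mean = variance = lambda

exact = poisson_cdf(20, lam)     # exact P(X <= 20)
approx = approx_dist.cdf(20.5)   # continuity correction: P(Y < 20.5)

print(round(exact, 4), round(approx, 4))
```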

## Central Limit Theorem

The Central Limit Theorem states that if a random sample of size n is taken from any distribution with a given mean and variance, then provided n is **sufficiently large** (the generally accepted rule is n > 30), the sample mean is approximately normally distributed, with distribution N(mean, variance/n).

We need the CLT only if X **does not follow a normal distribution**.

If X **does follow** a normal distribution, then the sample mean is **always normally distributed**, whatever the sample size, without us needing to use the CLT.
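The CLT can be seen in action with a short simulation. This is only a sketch under assumed settings: samples of size 40 are drawn from an exponential distribution (clearly not normal, with mean 1 and variance 1), and the seed and number of trials are arbitrary choices.

```python
import random
from statistics import mean, variance

random.seed(1)   # fixed seed so the run is repeatable
n = 40           # sample size (above the usual n > 30 guideline)
trials = 2000    # number of samples drawn

# Each entry is the mean of one sample of size n from a skewed distribution
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

# CLT predicts: sample mean ~ N(1, 1/40) approximately
print(round(mean(sample_means), 3), round(variance(sample_means), 4))
```

The two printed values should be close to 1 and 1/40 = 0.025, matching N(mean, variance/n).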

## Sampling

Sampling is an important process for carrying out tests where it is impossible/too time consuming to use the whole population. Therefore it is important that samples are random and unbiased.

Using Random Number Tables to select a sample:

1. Obtain a list of the population required. (eg. electoral roll, register of students, subscribers list for a newspaper etc)

2. Assign a number to each person/object. Use a number of digits **appropriate to the size of population**, eg if selecting from 100-999 students, use 3 digit numbers. Let item 1 = 001 etc.

3. Use random number tables to read off numbers and the items to which they correspond.

Waste: Sometimes to minimise waste (**numbers to which no items correspond**) it is possible to assign more than one number to each item, eg if choosing from 500 students you could assign both number 001 and 501 to student 1 etc.
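The procedure above can be sketched in Python, with a random number generator standing in for the printed random number tables. The function name `select_sample` and the rule of reading 000 as 1000 are assumptions made for this illustration; repeated items are ignored so that nobody is chosen twice.

```python
import random

def select_sample(population_size, sample_size, digits, rng):
    """Pick `sample_size` distinct items numbered 1..population_size."""
    table_size = 10 ** digits               # e.g. 1000 three-digit numbers
    copies = table_size // population_size  # numbers assigned to each item
    if copies == 0 or sample_size > population_size:
        raise ValueError("not enough digits, or sample larger than population")
    usable = copies * population_size       # numbers above this are waste
    chosen, wasted = [], 0
    while len(chosen) < sample_size:
        number = rng.randrange(table_size)  # stands in for reading the table
        if number == 0:
            number = table_size             # assumption: read 000 as 1000
        if number > usable:
            wasted += 1                     # no item corresponds to this number
            continue
        item = (number - 1) % population_size + 1  # 001 and 501 both give item 1
        if item not in chosen:              # ignore repeats
            chosen.append(item)
    return chosen, wasted

sample, wasted = select_sample(500, 10, 3, random.Random(2014))
print(sample, wasted)
```

With 500 students and 3-digit numbers, every number corresponds to a student, so nothing is wasted.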

## Unbiased Estimators

When a sample of size 'n' has sum of x: 'p' and sum of x squared: 'q', the unbiased estimate of the mean of the whole population is:

E(x) = p/n

But the variance of the sample is a biased estimate of the variance of the whole population. The unbiased estimator for the whole population's variance is:

S^2 = n/(n-1){ q/n - (p/n)^2 }
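A short Python sketch of these estimators, using the equivalent standard form S^2 = (q - p^2/n)/(n - 1). The data set is made up for illustration, and the result is checked against `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

def unbiased_estimates(n, p, q):
    """Unbiased estimates of population mean and variance.

    n = sample size, p = sum of x, q = sum of x squared.
    """
    mean_estimate = p / n
    variance_estimate = (q - p**2 / n) / (n - 1)
    return mean_estimate, variance_estimate

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)                    # 8
p = sum(data)                    # 40
q = sum(x**2 for x in data)      # 232

mean_estimate, variance_estimate = unbiased_estimates(n, p, q)
print(mean_estimate, round(variance_estimate, 4))  # 5.0 4.5714
print(round(variance(data), 4))                    # 4.5714
```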

## Hypothesis Testing

A hypothesis test is carried out on the null hypothesis (what was believed to be true before) and the alternative hypothesis (what is believed to be true according to a change).

A **one-tailed test** occurs where there is an explicit mention of a direction, e.g. test if X has increased/decreased.

A **two-tailed test** occurs when the question asks if there has been a change, i.e. in either direction.

The **Critical Region** of a test is the values of the test statistic for which the null hypothesis is rejected.

When concluding a test, either use "Reject null hypothesis" or "Do not reject", rather than using "Accept". The conclusion should make some reference to the context, eg the sample provides significant evidence to suggest x has changed.

Remember to use continuity corrections if you approximate a Binomial/Poisson by a Normal.
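As an illustration of a critical region, the sketch below finds the critical region for an assumed one-tailed test: X ~ B(20, p) with null hypothesis p = 0.5, alternative p < 0.5, at the 5% level. The function names are chosen for this example.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def lower_critical_value(n, p0, alpha):
    """Largest c with P(X <= c) <= alpha under the null, or None if none exists."""
    c = None
    for k in range(n + 1):
        if binomial_cdf(k, n, p0) <= alpha:
            c = k      # still inside the significance level
        else:
            break      # the cdf only increases, so stop here
    return c

c = lower_critical_value(20, 0.5, 0.05)
print(c)                                # 5, so the critical region is X <= 5
print(round(binomial_cdf(c, 20, 0.5), 4))  # 0.0207
```

An observed value of 5 or fewer would lead to "Reject null hypothesis"; anything larger would lead to "Do not reject".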

## Type I and II Errors

A Type I Error occurs when the null hypothesis is **wrongly rejected.**

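A Type II Error occurs when the null hypothesis is **wrongly accepted**, i.e. it is false but the test does not reject it.

For a fixed sample size, there is a trade-off between the probability of a Type I error and the probability of a Type II error: **decreasing one increases the other**.

For a test using a **Normal Distribution**, the probability of getting a Type I error is the same as the significance level. For a discrete distribution (Binomial/Poisson), it is the probability of the critical region under the null hypothesis, which is at most the significance level.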

Type II errors are more likely as the true value of the mean gets **closer** to the mean given in the null hypothesis.
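The last point can be checked numerically. This is a sketch for an assumed test: the null hypothesis says the mean is 50, the standard deviation is known to be 10, the sample size is 25, and the test is one-tailed (mean has increased) at the 5% level. `type_ii_probability` is a helper name chosen here.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 50, 10, 25, 0.05  # assumed test set-up
se = sigma / sqrt(n)                     # standard deviation of the sample mean

# Reject the null hypothesis if the sample mean exceeds this value
critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# Type I: rejecting when the null hypothesis is true
type_i = 1 - NormalDist(mu0, se).cdf(critical_mean)

def type_ii_probability(true_mean):
    """P(do not reject) when the mean is really `true_mean`."""
    return NormalDist(true_mean, se).cdf(critical_mean)

print(round(critical_mean, 2), round(type_i, 4))
print(round(type_ii_probability(52), 4))  # true mean close to 50
print(round(type_ii_probability(56), 4))  # true mean further from 50
```

The Type II probability is much larger for a true mean of 52 than for 56, since a small shift is harder to detect.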
