Mean, Median and Mode
Mode = the most frequently occurring value (also called the modal value).
Median = the central value when arranged in ascending order
Mean = the average value (sum of all x's divided by n)
If the data is the whole population, the mean is written μ.
If the data is only a sample, the mean is written x̄.
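A minimal Python sketch of the three averages for ungrouped data. The data values are made up for illustration. `statistics.mode` and `statistics.median` come from Python's standard library, and the mean is computed directly as the sum of the x's divided by n. The same calculation gives μ for a population or x̄ for a sample.

```python
import statistics

# Hypothetical raw data, used only for illustration
data = [3, 7, 7, 2, 9, 4, 7, 5]

mode = statistics.mode(data)      # most frequently occurring value
median = statistics.median(data)  # central value once sorted (average of the middle two here)
mean = sum(data) / len(data)      # sum of all x's divided by n

print(mode, median, mean)  # 7 6.0 5.5
```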
In grouped data, the modal class is the class with the highest frequency.
To estimate the mean, assume every value in a class lies at the class mid-mark, e.g. for the class 100-150 the mid-mark is 125.
To estimate the median, find n/2, then count up the cumulative frequencies to see which class that value lies in. Interpolating within that class gives the median estimate (it may be a decimal).
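A sketch of the grouped-data estimates, assuming a hypothetical frequency table of continuous data with the class boundaries shown. The median estimate uses linear interpolation inside the class that contains the (n/2)th value, which is one common convention. The function name `estimate_median` is illustrative only.

```python
# Hypothetical grouped frequency table:
# ((lower boundary, upper boundary), frequency)
table = [((100, 150), 4), ((150, 200), 10), ((200, 250), 6)]

n = sum(f for _, f in table)

# Modal class: the class with the highest frequency
modal_class = max(table, key=lambda row: row[1])[0]

# Estimated mean: assume every value sits at its class mid-mark
est_mean = sum((lo + hi) / 2 * f for (lo, hi), f in table) / n


def estimate_median(table, n):
    """Find the class holding the (n/2)th value, then interpolate linearly inside it."""
    target = n / 2
    cum = 0  # cumulative frequency before the current class
    for (lo, hi), f in table:
        if cum + f >= target:
            return lo + (target - cum) * (hi - lo) / f
        cum += f
    raise ValueError("n/2 lies beyond the table")


est_median = estimate_median(table, n)
print(modal_class, est_mean, est_median)  # (150, 200) 180.0 180.0
```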
Interquartile Range and Standard Deviation
The median is the ((n+1)/2)th value.
The lower quartile (Q1) is the median of the lower half of the data; the upper quartile (Q3) is the median of the upper half.
The interquartile range is Q3 - Q1 and covers the middle 50% of the data.
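A sketch of the "median of each half" method for quartiles. It assumes that, when n is odd, the median itself is left out of both halves. Conventions differ between textbooks and calculators (Python's `statistics.quantiles`, for instance, uses a different rule), so other methods may give slightly different quartiles. The data are hypothetical.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median of the lower and upper halves.
    For odd n the middle value is excluded from both halves (one common convention)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


data = [1, 3, 4, 6, 8, 9, 11, 14, 15]  # hypothetical data, n = 9
q1, q2, q3 = quartiles(data)           # median is the (9+1)/2 = 5th value
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 3.5 8 12.5 9.0
```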
Where x̄ is the sample mean.
The variance is the standard deviation squared. It is a poor measure of spread on its own because it is in squared units (e.g. cm² for lengths in cm)!
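The notes do not show the standard deviation formula itself, so this sketch assumes the standard definitions: the sum of squared deviations from the mean divided by n (whole population) or by n - 1 (sample estimate), then square-rooted. The hypothetical data were chosen so the population figures come out as whole numbers.

```python
import math

# Hypothetical data
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
sxx = sum((x - mean) ** 2 for x in data)

pop_var = sxx / n             # whole population: divide by n
sample_var = sxx / (n - 1)    # sample estimate: divide by n - 1
pop_sd = math.sqrt(pop_var)   # standard deviation = square root of variance
sample_sd = math.sqrt(sample_var)

print(pop_var, pop_sd)        # 4.0 2.0
print(round(sample_sd, 4))    # 2.1381
```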
Mutually Exclusive Events (separate events which cannot happen together)
e.g. rolling a 2 and rolling a 5 on a single roll of a die.
P(A ∪ B) = P(A) + P(B)
A or B
Independent events: the probability of A happening, given that B has already happened, is just P(A):
P(A | B) = P(A)
Also, if the events are independent, the probability that both A and B happen is:
P(A ∩ B) = P(A) × P(B)
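To check the independence rules on a small example, this sketch enumerates the 36 equally likely outcomes of rolling two fair dice and computes probabilities exactly with `fractions.Fraction`, using probability = r/n. The events chosen (a 6 on the first die, an even number on the second) are hypothetical. Because the dice do not affect each other, P(A | B) comes out equal to P(A).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) = r/n, counting the outcomes where event(outcome) is True."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def a(o):
    return o[0] == 6       # A: first die shows a 6


def b(o):
    return o[1] % 2 == 0   # B: second die shows an even number


p_a = prob(a)
p_b = prob(b)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_given_b = p_a_and_b / p_b   # P(A | B) = P(A ∩ B) / P(B)

print(p_a, p_a_given_b, p_a_and_b == p_a * p_b)  # 1/6 1/6 True
```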
Laws of Probability
The Addition Law:
This law holds whether or not the two events are mutually exclusive (if they are, P(A ∩ B) = 0 and it reduces to P(A) + P(B)).
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
The Multiplication Law:
P(A ∩ B) = P(A)P(B | A)
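A sketch checking both laws on one roll of a fair die, with two hypothetical events that are not mutually exclusive: A = "even number" and B = "greater than 3". The outcomes are equally likely, so each probability is a count divided by 6, kept exact with `Fraction`.

```python
from fractions import Fraction

# One roll of a fair die: six equally likely outcomes
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even number
B = {4, 5, 6}   # greater than 3 (A and B overlap, so not mutually exclusive)


def p(event):
    """Probability = r/n for equally likely outcomes."""
    return Fraction(len(event), len(outcomes))


# Addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert p(A | B) == p(A) + p(B) - p(A & B)

# Multiplication law: P(A ∩ B) = P(A) P(B | A),
# where P(B | A) is the share of A's outcomes that are also in B
p_b_given_a = Fraction(len(A & B), len(A))
assert p(A & B) == p(A) * p_b_given_a

print(p(A | B), p(A & B), p_b_given_a)  # 2/3 1/3 2/3
```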
Probability is measured on a scale of 0 to 1
If a trial can result in one of n equally likely outcomes and an event consists of r of these, then the probability is r/n
For a binomial distribution there must be:
- A fixed number of trials, n
- Two possible outcomes for each trial (success or failure)
- The same probability of success, p, for each trial
- Independent trials
Y ~ B(n, p)
P(Y = r) = nCr × p^r × (1 - p)^(n - r)
Mean = np, Variance = np(1 - p)
Remember Pascal's triangle (its rows give the nCr coefficients)!
Practise using the binomial tables in the formulae book.
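A minimal sketch of the binomial formula using `math.comb` (Python 3.8+) for nCr. The values n = 10 and p = 0.3 are hypothetical. Summing the individual probabilities gives a cumulative value of the kind printed in binomial tables, and the mean and variance computed from the distribution agree with np and np(1 - p).

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(Y = r) for Y ~ B(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 10, 0.3
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]

print(round(binomial_pmf(3, n, p), 4))   # 0.2668

# Cumulative probability P(Y <= 3), as read from binomial tables
print(round(sum(probs[:4]), 4))          # 0.6496

# Mean and variance from the distribution itself, to compare with np and np(1 - p)
mean = sum(r * pr for r, pr in enumerate(probs))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
print(round(mean, 6), round(variance, 6))  # 3.0 2.1
```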
The normal distribution is a symmetrical, bell-shaped curve.
To use the normal tables, first standardise the variable so that it has mean 0 and standard deviation 1:
Z = (x - μ)/σ
where σ is the standard deviation.
About 68% of the data lies within ±1 standard deviation of the mean
About 95% lies within ±2 standard deviations
About 99.7% lies within ±3 standard deviations
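A sketch of standardising and looking up the normal distribution, using `statistics.NormalDist` (Python 3.8+) in place of printed tables. The distribution X ~ N(50, 4²) is hypothetical. Note that `NormalDist` takes the standard deviation σ, not the variance σ².

```python
from statistics import NormalDist

# Hypothetical example: X ~ N(50, 4²), so μ = 50 and σ = 4
mu, sigma = 50, 4
x = 56
z = (x - mu) / sigma          # standardise: z = 1.5

Z = NormalDist()              # standard normal: mean 0, standard deviation 1
p_less = Z.cdf(z)             # P(X < 56) = P(Z < 1.5)
print(z, round(p_less, 4))    # 1.5 0.9332

# Proportion of the distribution within k standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(Z.cdf(k) - Z.cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```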
X ~ N(μ, σ²), where μ is the population mean and σ² is the variance.
The distribution of sample means
σ (calculator key xσn) = the SD of the whole population
s (calculator key xσn-1) = the SD of a sample, used to estimate the population SD
The central limit theorem states that if the sample size n is large (over 30 as a rule of thumb), the sample mean x̄ is approximately normally distributed, with mean μ and standard deviation σ/√n, whatever the distribution of the original population.
Also, if the population the sample is taken from is normally distributed, the sample means are normally distributed for any sample size.
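A small simulation, assuming a uniform population on [0, 1) (mean 0.5, variance 1/12), which is not normal. It draws many samples of size n = 40 and looks at their means: by the central limit theorem they should cluster around 0.5 with a standard deviation close to σ/√n. The seed, sample size and number of samples are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so each run gives the same results

# Assumed population: uniform on [0, 1), which is not normal.
# Its mean is 0.5 and its standard deviation is sqrt(1/12).
pop_mean = 0.5
pop_sd = (1 / 12) ** 0.5

n = 40                # size of each sample (over 30)
num_samples = 5000    # how many samples to draw

sample_means = [
    statistics.fmean(rng.random() for _ in range(n)) for _ in range(num_samples)
]

# The means should centre on the population mean, with SD close to σ/√n
print(round(statistics.fmean(sample_means), 3), pop_mean)
print(round(statistics.pstdev(sample_means), 4), round(pop_sd / n ** 0.5, 4))
```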
Confidence interval = x̄ ± z × σ/√N
This applies when x̄ is the mean of a random sample of size N from a normal distribution with unknown mean μ and known standard deviation σ; z is the value from the normal tables for the chosen confidence level (e.g. 1.96 for 95%).
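A sketch of the interval x̄ ± z × σ/√N, with z taken from `statistics.NormalDist().inv_cdf` rather than the tables (about 1.96 for 95%). The function name `z_confidence_interval` and the numbers in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, level=0.95):
    """x̄ ± z σ/√n for a normal population with known standard deviation σ."""
    z = NormalDist().inv_cdf((1 + level) / 2)   # about 1.96 for a 95% interval
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical numbers: x̄ = 102.5 from a sample of N = 25, known σ = 5
lo, hi = z_confidence_interval(102.5, 5, 25)
print(round(lo, 2), round(hi, 2))  # 100.54 104.46
```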
If a large sample is available:
- It can be used to provide a good estimate of the population standard deviation
- It is safe to assume the sample mean is approximately normally distributed (central limit theorem).
Product moment correlation coefficient (PMCC)
It is always true that -1 ≤ r ≤ +1
r = +1 .. all points lie exactly on a straight line with positive gradient
r = -1 .. all points lie exactly on a straight line with negative gradient
r = 0 .. no linear correlation between the two sets of data
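To calculate r you need n, ∑x, ∑y, ∑x², ∑y² and ∑xy
These are then substituted into the formula (in the AQA formulae book):
r = Sxy/√(Sxx × Syy), where Sxx = ∑x² - (∑x)²/n, Syy = ∑y² - (∑y)²/n and Sxy = ∑xy - (∑x)(∑y)/n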
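A sketch of the PMCC calculated from the summary sums, using the standard forms Sxx = ∑x² - (∑x)²/n, Syy = ∑y² - (∑y)²/n and Sxy = ∑xy - (∑x)(∑y)/n. The function name `pmcc` and the data are hypothetical; the two perfect-line examples should give r = +1 and r = -1.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxy / sqrt(sxx * syy)


print(pmcc([1, 2, 3, 4], [3, 5, 7, 9]))                  # 1.0
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))                  # -1.0
print(round(pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```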
y = a + bx is the regression line of y on x.
Residuals are the vertical distances between the points on the graph and the regression line (residual = y - ŷ).
The formula for the regression line is
(y - ȳ) = (Sxy/Sxx)(x - x̄)
To plot this on a graph, note that the point (x̄, ȳ) always lies on the regression line.
To predict y for a given x, substitute x into the equation to get ŷ (a calculator can do this).
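A sketch of finding b = Sxy/Sxx and then a from the fact that (x̄, ȳ) lies on the line, followed by a prediction ŷ and a residual. The data and the function name `regression_y_on_x` are hypothetical; x = 3.5 is chosen inside the range of the data, so the prediction is an interpolation.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least-squares regression line y = a + bx."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar   # because (x̄, ȳ) lies on the line
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)
print(round(a, 4), round(b, 4))   # 2.2 0.6

y_hat = a + b * 3.5               # predicted y at x = 3.5
print(round(y_hat, 4))            # 4.3

# Residual for the point (3, 5): observed y minus predicted ŷ
print(round(5 - (a + b * 3), 4))  # 1.0
```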
To find a and b on the calculator:
- Enter the numbers from the table: mode, stat, a+bx; then reg, a (write the value down), then do the same for b
- For a prediction: AC, enter the x (or y) value, then shift, stat, reg, ŷ (or x̂), then =
Extrapolation (a prediction outside the range of the data) is unreliable; interpolation (a prediction within the range of the data) is the opposite and is generally more reliable.