Statistics 1 Key Points

1: Mathematical models in probability and statistics

Mathematical model = simplification of a real world problem
Advantages
- quick and easy to produce
- simplify a more complex situation
- improve our understanding of the real world as certain variables can readily be changed
- enable predictions to be made about the future
Disadvantages
- only give a partial description of the real world situation
- only work for a restricted range of values
Method of production
1. real world problem observed
2. mathematical model devised
3. model used to make predictions about the expected behaviour of the problem
4. experimental data collected from the real world situation
5. predicted and observed values compared
6. statistical tests used to assess the success of the model
7. mathematical model refined to improve the match between predicted and observed values --> repeat steps 2, 3, 5, 6, 7

2: Representation & summary of data - location

Continuous variable = variable w/ any value in a given range
Discrete variable = variable w/ only specific values in a given range
grouped frequency distribution --> classes and their related class frequencies
Mode = the value that occurs most often
Median = middle value when data is in order --> discrete data: work out n/2; if it is not a whole number, round up and take that term; if it is whole, take the midpoint of the n/2th and (n/2 + 1)th terms --> continuous (grouped) data = interpolation
Mean = ∑x/n or ∑fx/∑f
Mean of combined set of A and B = (n1x̄1 + n2x̄2)/(n1 + n2) --> x̄1, x̄2 = means of A and B, n1, n2 = number of values in each
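
A minimal Python sketch of the frequency-table mean and the combined mean. The function names `mean_from_table` and `combined_mean` are illustrative only and the data is made up.

```python
def mean_from_table(values, freqs):
    """Mean of a frequency table: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def combined_mean(n1, mean1, n2, mean2):
    """Mean of sets A and B combined: (n1 * mean1 + n2 * mean2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


print(mean_from_table([1, 2, 3], [2, 3, 5]))   # (2 + 6 + 15) / 10 = 2.3
print(combined_mean(10, 4.0, 30, 8.0))         # (40 + 240) / 40 = 7.0
```
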
Coding to make numbers easier --> e.g. y = (x - a)/b; decode the mean with mean of x = a + b × mean of y
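
A small sketch of coding and decoding a mean. It assumes the coding y = (x - a)/b, with a = 1000 and b = 10 chosen purely for illustration. The decoded mean should match the mean of the raw data.

```python
def code(xs, a, b):
    """Coded values y = (x - a) / b."""
    return [(x - a) / b for x in xs]


def decode_mean(mean_y, a, b):
    """Undo the coding: mean of x = a + b * (mean of y)."""
    return a + b * mean_y


xs = [1020, 1030, 1050, 1060]             # made-up data
ys = code(xs, a=1000, b=10)               # [2.0, 3.0, 5.0, 6.0]
mean_y = sum(ys) / len(ys)                # 4.0
print(decode_mean(mean_y, a=1000, b=10))  # 1040.0, same as sum(xs) / len(xs)
```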

3: Representation & summary of data - measures of dispersion

Range = highest - lowest
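Quartiles --> Q1 (lower quartile), Q2 (median), Q3 (upper quartile)
Discrete --> Q1: n/4, Q3: 3n/4 --> same rule as the median (not whole: round up and take that term; whole: midpoint of that term and the next)
Continuous --> Q1 and Q3 --> interpolation using n/4 and 3n/4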
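
The two quartile rules can be sketched as below. Setting p = 0.5 gives the median. The function names are hypothetical and the data is made up. Python's own `statistics.quantiles` uses different conventions, so don't expect it to reproduce the n/4 rule exactly.

```python
import math


def quartile_discrete(data, p):
    """Quartile of raw data by the n/4 rule (p = 0.25, 0.5 or 0.75).

    pos = p * n; if pos is whole, average that term and the next,
    otherwise round up and take that term.
    """
    xs = sorted(data)
    pos = p * len(xs)
    if pos.is_integer():
        k = int(pos)
        return (xs[k - 1] + xs[k]) / 2     # k-th and (k+1)-th terms (list is 0-based)
    return xs[math.ceil(pos) - 1]          # the ceil(pos)-th term


def quartile_grouped(boundaries, freqs, p):
    """Estimate a quartile of grouped continuous data by linear interpolation.

    boundaries: class boundaries [b0, b1, ..., bk]; freqs: the k class frequencies.
    Finds the class holding the (p * n)-th value and interpolates within it.
    """
    target = p * sum(freqs)
    cum = 0                                # cumulative frequency before this class
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    return boundaries[-1]


print(quartile_discrete([1, 2, 3, 4, 5, 6, 7], 0.25))        # pos 1.75 -> 2nd term -> 2
print(quartile_grouped([0, 10, 20, 30], [4, 10, 6], 0.5))    # median 16.0
```
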
Interquartile range = Q3-Q1
Percentiles --> split data into 100 parts
- xth percentile, Px --> xn/100th term
- n% to m% interpercentile range --> Pm - Pn
Deviation of observation x from the mean --> x-mean
- Variance = ∑(x-mean)^2/n = ∑x^2/n - (∑x/n)^2
- Standard deviation = √Variance
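
As a check that the two variance formulas agree, here is a short Python sketch with made-up data. Like the notes, it divides by n. In Python's standard library this matches `statistics.pvariance`, whereas `statistics.variance` divides by n - 1.

```python
import math


def variance_by_definition(xs):
    """Variance = sum((x - mean)^2) / n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def variance_shortcut(xs):
    """Variance = sum(x^2)/n - (sum(x)/n)^2."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]             # made-up data, mean 5
print(variance_by_definition(data))        # 4.0
print(variance_shortcut(data))             # 4.0
print(math.sqrt(variance_shortcut(data)))  # standard deviation 2.0
```
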
Variance & standard deviation for a frequency table or grouped frequency distribution --> x = value (or class midpoint for grouped data), f = frequency, n = ∑f
- Variance = ∑f(x-mean)^2/∑f or ∑fx^2/∑f - (∑fx/∑f)^2
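
A minimal sketch of the frequency-table version of the shortcut formula. The grouped data is made up and `table_variance` is an illustrative name. For grouped data the class midpoints stand in for x, so the answer is only an estimate.

```python
import math


def table_variance(values, freqs):
    """Variance from a frequency table: sum(f x^2)/sum(f) - (sum(f x)/sum(f))^2.

    For grouped data pass the class midpoints as `values`; the result is then
    an estimate, because every value is assumed to sit at its class midpoint.
    """
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return sum_fx2 / n - (sum_fx / n) ** 2


# Made-up grouped data: classes 0-10, 10-20, 20-30 with frequencies 4, 10, 6
midpoints = [5, 15, 25]
freqs = [4, 10, 6]
var = table_variance(midpoints, freqs)
print(var, math.sqrt(var))   # 49.0 7.0
```
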
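Coding --> if y = (x - a)/b (b > 0) then standard deviation of x = b × standard deviation of y --> adding or subtracting a has no effect on it; undo any scaling (multiply by b if you divided by it, divide if you multiplied)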
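
A sketch of decoding a standard deviation, assuming the coding y = (x - a)/b with a = 1000 and b = 10 picked only for illustration. The shift a drops out and the scale b has to be multiplied back in.

```python
import math


def sd(xs):
    """Standard deviation with divisor n, as in the notes."""
    n = len(xs)
    return math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)


xs = [1020, 1030, 1050, 1060]          # made-up data
a, b = 1000, 10
ys = [(x - a) / b for x in xs]         # coded values [2.0, 3.0, 5.0, 6.0]

# Subtracting a doesn't change the spread; dividing by b divides the sd by b,
# so multiply the coded sd by b to decode it.
sd_x_decoded = b * sd(ys)
print(sd_x_decoded, sd(xs))            # both about 15.81
```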

4: Representation of data

Stem and leaf
- order & present data
- shape of data & enables quartiles to be found
- 2 sets can be compared back-to-back
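Outlier = extreme value lying well away from the rest of the data --> a common rule (use whatever rule the question gives): below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR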
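
A sketch of the 1.5 × IQR outlier rule. It assumes the quartiles have already been found. The multiplier k = 1.5 is just a common default, so change it to match the question.

```python
def outliers(data, q1, q3, k=1.5):
    """Values below Q1 - k * IQR or above Q3 + k * IQR.

    k = 1.5 is only a common default; exam questions state the rule to use.
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


data = [3, 5, 6, 7, 8, 9, 30]            # made-up data with Q1 = 5, Q3 = 9
print(outliers(data, q1=5, q3=9))        # fences -1 and 15, so [30]
```
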
Box Plots
- quartiles, maximum & minimum values and any outliers
- compare 2 sets of data
Histogram
- used for continuous data summarised in a grouped frequency distribution
- area of each bar ∝ frequency --> frequency density = frequency/class width
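
Histogram bar heights for unequal classes could be computed like this. The class boundaries and frequencies are made up, and `frequency_densities` is an illustrative name.

```python
def frequency_densities(boundaries, freqs):
    """Bar heights for a histogram: frequency density = frequency / class width."""
    return [f / (upper - lower)
            for lower, upper, f in zip(boundaries, boundaries[1:], freqs)]


# Made-up unequal classes 0-10, 10-15, 15-30
print(frequency_densities([0, 10, 15, 30], [20, 15, 30]))   # [2.0, 3.0, 2.0]
```
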
Skewness
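- Quartiles --> Q2 - Q1 = Q3 - Q2 symmetrical; Q2 - Q1 < Q3 - Q2 positive skew; Q2 - Q1 > Q3 - Q2 negative skew
- shape from box plots (compare the two halves of the box)
- measures of location --> mode = median = mean symmetrical; mode < median < mean positive skew; mode > median > mean negative skew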
- formula --> 3(mean-median)/standard deviation
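Two ways of judging skewness are sketched here with made-up summary values. The first compares the distances Q2 - Q1 and Q3 - Q2. The second is the 3(mean - median)/standard deviation formula. The function names are hypothetical.

```python
def skew_from_quartiles(q1, q2, q3):
    """Compare Q2 - Q1 with Q3 - Q2: equal -> symmetrical,
    Q2 - Q1 < Q3 - Q2 -> positive skew, otherwise negative skew."""
    left, right = q2 - q1, q3 - q2
    if left == right:
        return "symmetrical"
    return "positive skew" if left < right else "negative skew"


def skewness_coefficient(mean, median, sd):
    """3(mean - median) / standard deviation: > 0 positive, < 0 negative, 0 symmetrical."""
    return 3 * (mean - median) / sd


print(skew_from_quartiles(10, 12, 20))   # positive skew
print(skewness_coefficient(12, 10, 4))   # 1.5, so positive skew
```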

5: Probability

P(event A OR event B OR both) = P(A ∪ B)
P(events A and B) = P(A ∩ B)
P(not event A) = P(A')
Complementary probability
- P(A') = 1 - P(A)
Addition Rule
- P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Conditional probability
- P(A given B) = P(A|B) = P(A ∩ B)/P(B)
Multiplication rule
- P(A ∩ B) = P(A|B) × P(B) or P(B|A) × P(A)
A and B are independent if
- P(A|B) = P(A) or P(B|A) = P(B) or P(A ∩ B) = P(A) × P(B)
A and B are mutually exclusive if
- P(A ∩ B) = 0
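The rules above can be checked by counting outcomes in a small example. This sketch assumes one roll of a fair six-sided die, so all outcomes are equally likely. The events and the helper names `P` and `given` are illustrative, and exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))


def P(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def given(a, b):
    """Conditional probability P(a|b) = P(a ∩ b) / P(b)."""
    return P(a & b) / P(b)


A = {2, 4, 6}         # even score
B = {1, 2, 3, 4}      # score of 4 or less
C = {5}               # score of 5

# Note: in Python, A | B is set union (∪) and A & B is intersection (∩).
print(P(OMEGA - A) == 1 - P(A))              # complement: True
print(P(A | B) == P(A) + P(B) - P(A & B))    # addition rule: True
print(given(A, B), P(A))                     # 1/2 1/2, so A and B are independent
print(P(A & B) == P(A) * P(B))               # independence check: True
print(P(A & C) == 0)                         # A and C are mutually exclusive: True
```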

6: Correlation

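+ve correlation --> points rise from left to right --> mostly in the 1st & 3rd quadrants (quadrants formed by lines through the mean of x and the mean of y)
-ve correlation --> points fall from left to right --> mostly in the 2nd & 4th quadrants
no correlation --> points spread across ALL quadrants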
Sxx = ∑(x - x̄)^2 = ∑x^2 - (∑x)^2/n
Syy = ∑(y - ȳ)^2 = ∑y^2 - (∑y)^2/n
Sxy = ∑(x - x̄)(y - ȳ) = ∑xy - (∑x)(∑y)/n  (x̄, ȳ = means of x and y)
r = Sxy/√(SxxSyy)
r = measure of linear correlation
- r = 1 --> perfect +ve linear correlation
- r = -1 --> perfect -ve linear correlation
- r = 0 --> no linear correlation
Coding
- doesn't affect r (for positive scale factors b and d)
- rewrite x --> p = (x - a)/b
- rewrite y --> q = (y - c)/d
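A sketch of computing r from the shortcut forms of Sxx, Syy and Sxy. It then checks that coding with positive scale factors leaves r unchanged. The data and the coding constants are made up. Python 3.10+ also provides `statistics.correlation`, which computes the same Pearson coefficient and could serve as a cross-check.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / math.sqrt(sxx * syy)


xs = [12, 15, 17, 20, 26]          # made-up data
ys = [130, 170, 160, 210, 240]
r = pmcc(xs, ys)

# Coding p = (x - 10)/2 and q = (y - 100)/10 uses positive scale factors,
# so r is unchanged.
ps = [(x - 10) / 2 for x in xs]
qs = [(y - 100) / 10 for y in ys]
r_coded = pmcc(ps, qs)
print(round(r, 4), round(r_coded, 4))   # the same value
```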

7: Regression

If y = a + bx --> a = y-intercept & b = gradient of the line
Independent (or explanatory) variable
- one that is set independently of the other variable
Dependent (or response) variable
- one whose values are decided by the values of the independent variable
Equation of the regression line
- y = a + bx --> b = Sxy/Sxx & a = ȳ - bx̄ (ȳ, x̄ = means of y and x)
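
A minimal sketch of the regression line calculation. It uses made-up points that lie exactly on y = 1 + 2x, so the answer is easy to check. `regression_line` is an illustrative name.

```python
def regression_line(xs, ys):
    """Least squares regression line of y on x: returns (a, b) for y = a + bx."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = sxy / sxx                              # gradient
    a = sum(ys) / n - b * sum(xs) / n          # a = ȳ - b x̄
    return a, b


xs = [1, 2, 3, 4, 5]            # made-up points lying exactly on y = 1 + 2x
ys = [3, 5, 7, 9, 11]
a, b = regression_line(xs, ys)
print(a, b)                     # 1.0 2.0
print(a + b * 3.5)              # interpolation (x = 3.5 is inside the data): 8.0
```
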
Coding
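- if the line was found from coded data (p and q), substitute the coding formulas for p and q into it and rearrange into the form y = a + bx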
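
This sketch does the substitution step algebraically. It assumes a coded line q = α + βp with p = (x - x_shift)/x_scale and q = (y - y_shift)/y_scale. The parameter names are hypothetical, chosen so they don't clash with the a and b of y = a + bx.

```python
def decode_line(alpha, beta, x_shift, x_scale, y_shift, y_scale):
    """Turn a coded line q = alpha + beta * p into y = a + b x.

    Assumes p = (x - x_shift) / x_scale and q = (y - y_shift) / y_scale.
    Substituting gives (y - y_shift) / y_scale = alpha + beta * (x - x_shift) / x_scale,
    which rearranges to the a and b returned here.
    """
    b = y_scale * beta / x_scale
    a = y_shift + y_scale * alpha - b * x_shift
    return a, b


# Coded line q = 0.5 + p with p = (x - 3)/2 and q = (y - 5)/4
print(decode_line(0.5, 1, 3, 2, 5, 4))   # (1.0, 2.0), i.e. y = 1 + 2x
```
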
Interpolation
- estimating the value of the dependent variable within the range of the data
Extrapolation
- estimating the value of the dependent variable outside the range of the data
- can be unreliable

8: Discrete random variables

random variable X
- x is a particular value of X
- P(X = x) or p(x) = probability that X equals a particular value x
discrete random variable
- ∑P(X = x) = 1
Cumulative distribution function
- F(x) = P(X ≤ x)
Expected value of X
- E(X) = ∑xP(X = x)
Variance of X
- Var(X) = E(X^2) -[E(X)]^2
E(aX + b) = aE(X) + b
Var(aX + b) = a^2Var(X)
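
A sketch of these formulas applied to a small made-up distribution, stored as a dict from each value x to P(X = x). Exact fractions are used so the equalities hold exactly. All function names are illustrative.

```python
from fractions import Fraction


def expectation(dist):
    """E(X) = sum of x * P(X = x); dist maps each value x to P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x * x * p for x, p in dist.items())
    return e_x2 - expectation(dist) ** 2


def cdf(dist, x):
    """F(x) = P(X <= x)."""
    return sum(p for value, p in dist.items() if value <= x)


def linear(dist, a, b):
    """Distribution of aX + b (assumes a != 0 so the values stay distinct)."""
    return {a * x + b: p for x, p in dist.items()}


dist = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}   # made-up
assert sum(dist.values()) == 1                      # probabilities sum to 1
print(expectation(dist), variance(dist))            # E(X) = 2, Var(X) = 1/2
print(cdf(dist, 2))                                 # 3/4
y = linear(dist, 3, 2)                              # Y = 3X + 2
print(expectation(y) == 3 * expectation(dist) + 2)  # True
print(variance(y) == 3 ** 2 * variance(dist))       # True
```
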
Conditions for a discrete uniform distribution
- a discrete random variable X is defined over a set of n distinct values
- Each value is equally likely
Discrete uniform distribution X over the values 1, 2, 3, ..., n
- E(X) = (n + 1)/2
- Var(X) = (n+1)(n-1)/12
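One way to convince yourself of the two uniform formulas is to compute E(X) and Var(X) directly for X uniform on 1, 2, ..., n and compare. This sketch does that for n = 1 to 20 using exact fractions. `uniform_moments` is an illustrative name.

```python
from fractions import Fraction


def uniform_moments(n):
    """E(X) and Var(X) computed directly for X uniform on 1, 2, ..., n."""
    p = Fraction(1, n)
    e_x = sum(x * p for x in range(1, n + 1))
    e_x2 = sum(x * x * p for x in range(1, n + 1))
    return e_x, e_x2 - e_x ** 2


# Check E(X) = (n + 1)/2 and Var(X) = (n + 1)(n - 1)/12 for n = 1..20
for n in range(1, 21):
    e_x, var_x = uniform_moments(n)
    assert e_x == Fraction(n + 1, 2)
    assert var_x == Fraction((n + 1) * (n - 1), 12)

print(uniform_moments(6))   # fair die: E(X) = 7/2, Var(X) = 35/12
```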

9: Normal distribution

The random variable X that has a normal distribution w/ mean μ & standard deviation σ is represented by
- X ∼ N(μ, σ^2)
If X ∼ N(μ, σ^2) and Z ∼ N(0, 1^2) then
- Z = (X - μ)/σ
- Tables of Z are given in formula book
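Python's standard library has `statistics.NormalDist`, which can stand in for the tables when checking answers. This sketch assumes a made-up X ∼ N(50, 4^2). Note that `NormalDist` takes the standard deviation σ, not the variance σ^2 that appears in the N(μ, σ^2) notation.

```python
from statistics import NormalDist

# Hypothetical example: X ~ N(50, 4^2). NormalDist takes the standard
# deviation (4), not the variance (16).
mu, sigma = 50, 4
X = NormalDist(mu, sigma)
Z = NormalDist(0, 1)                       # standard normal, N(0, 1^2)

x = 56
z = (x - mu) / sigma                       # standardise: z = (56 - 50) / 4 = 1.5
print(round(Z.cdf(z), 4))                  # Φ(1.5), as read from the tables: 0.9332
print(abs(X.cdf(x) - Z.cdf(z)) < 1e-12)    # P(X < 56) = P(Z < 1.5): True

# Working backwards: the z-value with 97.5% of the distribution below it
print(round(Z.inv_cdf(0.975), 2))          # 1.96
```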

Get Revising
