# Statistics 1 Key Points

- Created by: LShan261
- Created on: 25-04-15 10:45

## 1: Mathematical models in probability and statistics

- Mathematical model = simplification of a real-world problem
- Advantages
  - quick and easy to produce
  - simplify a more complex situation
  - improve our understanding of the real world, as certain variables can readily be changed
  - enable predictions to be made about the future

- Disadvantages
  - only give a partial description of the real-world situation
  - only work for a restricted range of values

- Method of production
  1. real-world problem observed
  2. mathematical model devised
  3. model used to make predictions about the expected behaviour of the problem
  4. experimental data collected from the real-world situation
  5. predicted and observed outcomes compared
  6. statistical tests used to assess the success of the model
  7. mathematical model refined to improve the match between predicted and observed outcomes --> repeat steps 2, 3, 5, 6, 7

## 2: Representation & summary of data - location

- Continuous variable = variable that can take any value in a given range
- Discrete variable = variable that can take only specific values in a given range
- Grouped frequency distribution --> classes and their related class frequencies
- Mode = the value that occurs most often
- Median = middle value when the data is in order
  - Discrete data --> find n/2 --> if whole, take the midpoint of that term and the next; if not whole, round up and take the corresponding term
  - Continuous (grouped) data --> find the n/2th value by interpolation
- Mean = ∑x/n or ∑fx/∑f
- Mean of the combined set of A and B = (n1x̄1 + n2x̄2)/(n1 + n2) --> x̄1, x̄2 = means of the two sets & n1, n2 = their sizes
- Coding --> transform the data to make the numbers easier to work with
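
The mean formulas and the interpolated median above can be sketched in Python. This is only an illustration: the frequency table, the class boundaries and the function names are made up for the example, and the interpolation assumes the data is spread evenly within the median class.

```python
from math import isclose

# Hypothetical frequency table: value x -> frequency f
freq = {1: 3, 2: 5, 3: 8, 4: 4}


def mean_from_frequencies(table):
    """Mean = sum(f*x) / sum(f)."""
    total_f = sum(table.values())
    total_fx = sum(x * f for x, f in table.items())
    return total_fx / total_f


def combined_mean(n1, mean1, n2, mean2):
    """Mean of two combined sets = (n1*mean1 + n2*mean2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def median_by_interpolation(lower_bound, class_width, class_freq, cum_freq_before, n):
    """Estimate the n/2th value of grouped continuous data.

    Assumes values are evenly spread inside the median class.
    """
    return lower_bound + (n / 2 - cum_freq_before) / class_freq * class_width


print(mean_from_frequencies(freq))               # sum(fx) = 53, sum(f) = 20
print(combined_mean(10, 4.0, 30, 6.0))           # (40 + 180) / 40
print(median_by_interpolation(20, 10, 8, 6, 20)) # median class 20-30
```

Note that the combined mean is weighted by the set sizes, so it is pulled towards the mean of the larger set rather than sitting halfway between the two.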

## 3: Representation & summary of data - measures of dispersion

- Range = highest value - lowest value
- Quartiles --> Q1, Q2, Q3 --> split the data into 4 parts
  - Discrete --> Q1 = n/4th term & Q3 = 3n/4th term --> corresponding term
  - Continuous --> Q1 and Q3 --> interpolation
- Interquartile range = Q3 - Q1
- Percentiles --> split data into 100 parts
  - xth percentile, Px --> xn/100th term
  - n% to m% interpercentile range --> Pm - Pn

- Deviation of observation x from the mean --> x - mean
- Variance = ∑(x - mean)^2/n = ∑x^2/n - (∑x/n)^2
- Standard deviation = √Variance

- Variance & standard deviation for a frequency table or grouped frequency distribution --> x = midpoint & f = frequency & n = ∑f
  - Variance = ∑f(x - mean)^2/∑f or ∑fx^2/∑f - (∑fx/∑f)^2

- Coding --> to get the original standard deviation from the coded one, multiply by what you divided by (or divide by what you multiplied by); adding or subtracting a constant has no effect
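
As a rough check of the two variance formulas and the coding rule, here is a short Python sketch. The data set and the coding constants `a` and `b` are hypothetical, chosen so the arithmetic comes out cleanly.

```python
from math import sqrt, isclose

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations


def variance(xs):
    """Variance = sum((x - mean)^2) / n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def variance_shortcut(xs):
    """Variance = sum(x^2)/n - (sum(x)/n)^2."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2


sd = sqrt(variance(data))

# Coding y = (x - a) / b with hypothetical constants a = 5, b = 2
a, b = 5, 2
coded = [(x - a) / b for x in data]
sd_coded = sqrt(variance(coded))

print(variance(data), variance_shortcut(data))  # both forms give the same value
print(sd, sd_coded * b)                         # undo the division by b; a has no effect
```

Both variance functions divide by n, matching the formulas in these notes rather than the n - 1 version used for sample estimates.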

## 4: Representation of data

- Stem and leaf
  - order & present data
  - shows the shape of the data & enables quartiles to be found
  - 2 sets can be compared back-to-back

- Outlier = extreme value
- Box plots
  - show quartiles, maximum & minimum values and any outliers
  - can be used to compare 2 sets of data

- Histogram
  - used when data is continuous and summarised in a grouped frequency distribution

- Skewness can be described using
  - quartiles
  - shape from box plots
  - measures of location
  - formula --> 3(mean - median)/standard deviation
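
The skewness formula can be tried out with a small Python sketch. The sample values are hypothetical, and the quartile comparison (Q3 - Q2 against Q2 - Q1) is the usual way of reading skew from quartiles, included here as an assumption about what the notes intend.

```python
from statistics import mean, median, pstdev


def skew_coefficient(xs):
    """3 * (mean - median) / standard deviation (dividing by n, as in these notes)."""
    return 3 * (mean(xs) - median(xs)) / pstdev(xs)


def skew_from_quartiles(q1, q2, q3):
    """Compare the gaps either side of the median."""
    if q3 - q2 > q2 - q1:
        return "positive"
    if q3 - q2 < q2 - q1:
        return "negative"
    return "symmetrical"


sample = [1, 2, 2, 3, 3, 3, 10]  # hypothetical data with one large value

print(skew_coefficient(sample))          # positive: the mean is dragged above the median
print(skew_coefficient([1, 2, 3, 4, 5])) # symmetrical data gives 0
print(skew_from_quartiles(2, 3, 7))
```

A positive result means a longer tail towards high values; a negative result means a longer tail towards low values.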

## 5: Probability

- P(event A **or** event B or both) = P(A ∪ B)
- P(events A **and** B) = P(A ∩ B)
- P(not event A) = P(A')
- Complementary probability
  - P(A') = 1 - P(A)

- Addition rule
  - P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

- Conditional probability
  - P(A given B) = P(A|B) = P(A ∩ B)/P(B)

- Multiplication rule
  - P(A ∩ B) = P(A|B) × P(B) or P(B|A) × P(A)

- A and B are independent if
  - P(A|B) = P(A) or P(B|A) = P(B) or P(A ∩ B) = P(A) × P(B)

- A and B are mutually exclusive if
  - P(A ∩ B) = 0
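
The rules above can be checked on a small sample space. This Python sketch uses one roll of a fair six-sided die as a hypothetical example; the events A, B, D and `odd` are made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die (hypothetical example)


def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}      # even number
B = {4, 5, 6}      # greater than 3
D = {1, 2, 3, 4}   # at most 4
odd = {1, 3, 5}

# Complement: P(A') = 1 - P(A)
print(P(sample_space - A) == 1 - P(A))

# Addition rule: P(A u B) = P(A) + P(B) - P(A n B)
print(P(A | B) == P(A) + P(B) - P(A & B))

# Conditional probability: P(A|B) = P(A n B) / P(B)
p_a_given_b = P(A & B) / P(B)
print(p_a_given_b)

# Independence: P(A n D) = P(A) * P(D)
print(P(A & D) == P(A) * P(D))

# Mutually exclusive: P(A n odd) = 0
print(P(A & odd) == 0)
```

Here A and B are not independent (knowing the roll is greater than 3 changes the chance of it being even), while A and D are.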

## 6: Correlation

- +ve correlation --> points rise from left to right --> mostly in the 1st & 3rd quadrants
- -ve correlation --> points fall from left to right --> mostly in the 2nd & 4th quadrants
- no correlation --> points spread across ALL quadrants
- Sxx = ∑(x - x̄)^2 = ∑x^2 - (∑x)^2/n
- Syy = ∑(y - ȳ)^2 = ∑y^2 - (∑y)^2/n
- Sxy = ∑(x - x̄)(y - ȳ) = ∑xy - (∑x∑y)/n
- r = Sxy/√(SxxSyy)
- r = measure of linear correlation
  - r = 1 --> perfect +ve linear correlation
  - r = -1 --> perfect -ve linear correlation
  - r = 0 --> no linear correlation

- Coding
  - doesn't affect r
  - rewrite x --> p = (x - a)/b
  - rewrite y --> q = (y - c)/d
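
A minimal Python sketch of the Sxx, Syy, Sxy and r formulas, using a small hypothetical data set. The coding constants are also made up; the check that coding leaves r unchanged assumes the divisors b and d are positive.

```python
from math import sqrt, isclose


def s_xy(xs, ys):
    """Sxy = sum(xy) - sum(x)*sum(y)/n. Calling s_xy(xs, xs) gives Sxx."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def pmcc(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy(xs, ys) / sqrt(s_xy(xs, xs) * s_xy(ys, ys))


x = [1, 2, 3, 4, 5]  # hypothetical paired data
y = [2, 4, 5, 4, 5]

print(s_xy(x, x), s_xy(y, y), s_xy(x, y))  # Sxx, Syy, Sxy
print(pmcc(x, y))

# Coding p = (x - a)/b and q = (y - c)/d with hypothetical positive b and d
p = [(xi - 3) / 2 for xi in x]
q = [(yi - 4) / 10 for yi in y]
print(isclose(pmcc(p, q), pmcc(x, y)))  # r is unchanged by the coding
```

Because r is a ratio, the scale factors from the coding cancel between the top and bottom, which is why the coded and uncoded values agree.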

## 7: Regression

- If y = a + bx --> a = y-intercept & b = gradient of the line
- Independent (or explanatory) variable
  - one that is set independently of the other variable

- Dependent (or response) variable
  - one whose values are decided by the values of the independent variable

- Equation of the regression line
  - y = a + bx --> b = Sxy/Sxx & a = ȳ - bx̄ (ȳ, x̄ = means of y and x)

- Coding
  - substitute the coding expressions into the regression equation found for the coded data, then rearrange

- Interpolation
  - estimating the value of a dependent variable within the range of the data

- Extrapolation
  - estimating the value outside the range of the data
  - can be unreliable
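
The regression formulas can be sketched in Python as follows. The data set and the helper names `regression_line` and `estimate` are hypothetical; the second helper simply labels an estimate as interpolation or extrapolation depending on whether x lies inside the observed range.

```python
from math import isclose


def regression_line(xs, ys):
    """Return (a, b) for y = a + bx, with b = Sxy/Sxx and a = y_mean - b*x_mean."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def estimate(a, b, x, xs):
    """Estimate y at x and say whether x is inside the range of the data."""
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return a + b * x, kind


x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable

a, b = regression_line(x, y)
print(a, b)
print(estimate(a, b, 2.5, x))  # inside the data range
print(estimate(a, b, 12, x))   # outside the data range, so treat with caution
```

The line always passes through the mean point (x̄, ȳ), which follows directly from a = ȳ - bx̄.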

## 8: Discrete random variables

- Random variable X
  - x is a particular value of X
  - P(X = x) or p(x) = probability that X is equal to a particular value x

- Discrete random variable
  - ∑P(X = x) = 1

- Cumulative distribution function
  - F(x) = P(X ≤ x)

- Expected value of X
  - E(X) = ∑xP(X = x)

- Variance of X
  - Var(X) = E(X^2) - [E(X)]^2

- E(aX + b) = aE(X) + b
- Var(aX + b) = a^2Var(X)
- Conditions for a discrete uniform distribution
  - a discrete random variable X is defined over a set of n distinct values
  - each value is equally likely

- Discrete uniform distribution X over the values 1, 2, 3, ..., n
  - E(X) = (n + 1)/2
  - Var(X) = (n + 1)(n - 1)/12
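
These results can be verified for a concrete case. The Python sketch below takes a fair six-sided die as a hypothetical discrete uniform distribution (n = 6) and stores it as a dictionary of value --> probability; the helper names are illustrative only.

```python
from fractions import Fraction


def expectation(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x * x * p for x, p in dist.items())
    return e_x2 - expectation(dist) ** 2


def cdf(dist, x):
    """F(x) = P(X <= x)."""
    return sum(p for value, p in dist.items() if value <= x)


def transform(dist, a, b):
    """Distribution of aX + b (assumes a != 0 so the values stay distinct)."""
    return {a * x + b: p for x, p in dist.items()}


n = 6
die = {x: Fraction(1, n) for x in range(1, n + 1)}

print(sum(die.values()))                   # probabilities sum to 1
print(expectation(die), Fraction(n + 1, 2))
print(variance(die), Fraction((n + 1) * (n - 1), 12))
print(cdf(die, 4))

coded = transform(die, 2, 3)               # Y = 2X + 3
print(expectation(coded), variance(coded)) # 2E(X) + 3 and 2^2 * Var(X)
```

Adding the constant b shifts every value by the same amount, so it moves the mean but leaves the spread, and hence the variance, unchanged.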

## 9: Normal distribution

- The random variable X that has a normal distribution w/ mean μ & standard deviation σ is represented by
  - X ∼ N(μ, σ^2)

- If X ∼ N(μ, σ^2) and Z ∼ N(0, 1^2) then
  - Z = (X - μ)/σ
  - Tables of Z are given in the formula book
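
Standardising with Z = (X - μ)/σ can be illustrated in Python, with `statistics.NormalDist` from the standard library standing in for the printed tables. The values of μ, σ and x are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 50, 4  # hypothetical mean and standard deviation
x = 56

# Standardise: Z = (X - mu) / sigma
z = (x - mu) / sigma

# P(Z < z) from the standard normal, in place of looking it up in tables
p_from_z = NormalDist().cdf(z)

# Same probability computed directly from X ~ N(mu, sigma^2)
p_direct = NormalDist(mu, sigma).cdf(x)

print(z)
print(round(p_from_z, 4), round(p_direct, 4))
```

Note that `NormalDist` takes the standard deviation σ as its second argument, whereas the N(μ, σ^2) notation quotes the variance.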
