# Statistics 1 Key Points

• Created by: LShan261
• Created on: 25-04-15 10:45

## 1: Mathematical models in probability and statistics

• Mathematical model = simplification of a real-world problem
• Advantages
• quick and easy to produce
• simplify a more complex situation
• improve our understanding of the real world as certain variables can readily be changed
• enable predictions to be made about the future
• Disadvantages
• only give a partial description of the real-world situation
• only work for a restricted range of values
• Method of production
• Step 1: real-world problem observed
• Step 2: mathematical model devised
• Step 3: model used to make predictions about the expected behaviour of the problem
• Step 4: experimental data collected from the real-world situation
• Step 5: predicted and observed values compared
• Step 6: statistical tests used to assess the success of the model
• Step 7: model refined to improve the match between predicted and observed values --> repeat steps 2, 3, 5, 6, 7

## 2: Representation & summary of data - location

• Continuous variable = variable that can take any value in a given range
• Discrete variable = variable that can take only specific values in a given range
• grouped frequency distribution --> classes and their related class frequencies
• Mode = the value that occurs most often
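• Median = middle value when data is in order --> discrete data: find n/2; if it is a whole number, take the midpoint of the (n/2)th and (n/2 + 1)th terms, otherwise round up and take that term --> continuous (grouped) data: interpolate to find the (n/2)th value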
• Mean = ∑x/n or ∑fx/∑f
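• Mean of combined sets A and B = (n1x̄1 + n2x̄2)/(n1 + n2) --> n1, n2 = number of values and x̄1, x̄2 = means of A and B
• Coding makes numbers easier to work with --> e.g. y = (x - a)/b gives ȳ = (x̄ - a)/b, so decode with x̄ = a + bȳ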
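
The location rules above can be checked numerically. Below is a minimal Python sketch, assuming raw data arrives as a plain list of numbers (or as parallel lists of values and frequencies); all function names are hypothetical helpers, not taken from any syllabus or library. `median_discrete` follows the n/2 rule for discrete data, and `decode_mean` assumes the coding y = (x - a)/b.

```python
import math


def mean(values):
    """Mean = ∑x / n."""
    return sum(values) / len(values)


def mean_from_table(xs, fs):
    """Mean of a frequency table = ∑fx / ∑f."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def median_discrete(values):
    """Median of discrete data using the n/2 rule (positions are 1-based)."""
    data = sorted(values)
    pos = len(data) / 2
    if pos.is_integer():
        k = int(pos)
        # n/2 is whole: midpoint of the (n/2)th and (n/2 + 1)th terms
        return (data[k - 1] + data[k]) / 2
    # n/2 is not whole: round up and take that term
    return data[math.ceil(pos) - 1]


def combined_mean(n1, mean1, n2, mean2):
    """Mean of two combined sets = (n1*mean1 + n2*mean2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def decode_mean(coded_mean, a, b):
    """If y = (x - a)/b was used for coding, the original mean is a + b*ȳ."""
    return a + b * coded_mean


data = [2, 4, 4, 5, 10]
print(mean(data))                       # 5.0
print(median_discrete(data))            # 4
print(combined_mean(10, 4.0, 30, 8.0))  # 7.0
```

With n = 5, n/2 = 2.5 is not whole, so the median is the 3rd ordered term.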

## 3: Representation & summary of data - measures of dispersion

• Range = highest - lowest
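• Quartiles --> Q1 (lower quartile), Q2 (median), Q3 (upper quartile) --> Q1 = n/4th value, Q3 = 3n/4th value
• Discrete --> same rule as the median: if n/4 (or 3n/4) is a whole number, take the midpoint of that term and the next, otherwise round up and take that term
• Continuous --> Q1 and Q3 found by interpolation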
• Interquartile range = Q3-Q1
• Percentiles --> split data into 100 parts
• xth percentile, Px --> xn/100th term
• n% to m% interpercentile range --> Pm - Pn
• Deviation of observation x from the mean --> x-mean
• Variance = ∑(x-mean)^2/n = ∑x^2/n - (∑x/n)^2
• Standard deviation = √Variance
• Variance & standard deviation for a frequency table or grouped frequency distribution --> x = value (or class midpoint for grouped data), f = frequency, n = ∑f
• Variance = ∑f(x-mean)^2/∑f or ∑fx^2/∑f - (∑fx/∑f)^2
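• Coding --> if y = (x - a)/b then standard deviation of x = b × standard deviation of y --> adding/subtracting a has no effect, so only undo the scaling (× by what you divided by, or ÷ by what you multiplied by)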
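
The dispersion formulas above can be sketched in Python as follows. This is a minimal illustration, assuming discrete data in a list and frequency tables as parallel lists; the helper names are hypothetical. `quartiles_discrete` applies the whole-number/round-up rule at positions n/4, n/2 and 3n/4, and `decode_sd` assumes the coding y = (x - a)/b.

```python
import math


def variance(values):
    """Variance = ∑x²/n - (∑x/n)²."""
    n = len(values)
    m = sum(values) / n
    return sum(x * x for x in values) / n - m * m


def variance_from_table(xs, fs):
    """Variance = ∑fx²/∑f - (∑fx/∑f)²; xs are values or class midpoints."""
    n = sum(fs)
    m = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * x * x for x, f in zip(xs, fs)) / n - m * m


def term_at(data, pos):
    """Whole pos: midpoint of the pos-th and next term; otherwise round up (1-based)."""
    if float(pos).is_integer():
        k = int(pos)
        return (data[k - 1] + data[k]) / 2
    return data[math.ceil(pos) - 1]


def quartiles_discrete(values):
    """Return (Q1, Q2, Q3) using the n/4, n/2 and 3n/4 positions."""
    data = sorted(values)
    n = len(data)
    return term_at(data, n / 4), term_at(data, n / 2), term_at(data, 3 * n / 4)


def decode_sd(coded_sd, b):
    """If y = (x - a)/b, then sd of x = |b| × sd of y (a has no effect)."""
    return abs(b) * coded_sd


data = [2, 4, 4, 4, 5, 5, 7, 9]
q1, q2, q3 = quartiles_discrete(data)
print(variance(data), math.sqrt(variance(data)))  # 4.0 2.0
print(q1, q2, q3, q3 - q1)                        # 4.0 4.5 6.0 2.0
```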

## 4: Representation of data

• Stem and leaf
• order & present data
• show the shape of the data & enable quartiles to be found
• 2 sets can be compared back-to-back
• Outlier = extreme value, commonly taken as more than 1.5 × IQR below Q1 or above Q3
• Box Plots
• quartiles, maximum & minimum values and any outliers
• compare 2 sets of data
• Histogram
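• used for continuous data summarised in a grouped frequency distribution --> frequency density = frequency/class width, so bar area is proportional to frequency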
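• Skewness can be judged from
• quartiles --> Q3 - Q2 > Q2 - Q1: positive skew, Q3 - Q2 < Q2 - Q1: negative skew, equal: symmetric
• shape of box plots
• measures of location --> mode < median < mean: positive skew, mean < median < mode: negative skew
• formula --> 3(mean - median)/standard deviation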
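
The skewness, outlier and histogram rules above can be sketched as small Python checks. The helper names are hypothetical, and the 1.5 × IQR outlier fence is an assumed convention (exam questions usually state the rule to use), so it is exposed as the parameter `k`.

```python
def quartile_skew(q1, q2, q3):
    """Compare Q3 - Q2 with Q2 - Q1 to describe the skew."""
    upper, lower = q3 - q2, q2 - q1
    if upper > lower:
        return "positive"
    if upper < lower:
        return "negative"
    return "symmetric"


def location_skew(mean, median, mode):
    """mode < median < mean suggests positive skew; the reverse suggests negative."""
    if mode < median < mean:
        return "positive"
    if mean < median < mode:
        return "negative"
    if mean == median == mode:
        return "symmetric"
    return "unclear"


def skewness_coefficient(mean, median, sd):
    """3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


def outliers(values, q1, q3, k=1.5):
    """Values more than k × IQR below Q1 or above Q3 (k = 1.5 is an assumed default)."""
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


def frequency_density(frequency, class_width):
    """Histogram bar height, so that bar area is proportional to frequency."""
    return frequency / class_width


print(quartile_skew(4.0, 4.5, 6.0))          # positive
print(skewness_coefficient(5.0, 4.5, 2.0))   # 0.75
print(outliers([1, 4, 5, 6, 30], 4.0, 6.0))  # [30]
```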

## 5: Probability

• P(event A OR event B OR both) = P(A ∪ B)
• P(events A and B) = P(A ∩ B)
• P(not event A) = P(A')
• Complementary probability
• P(A') = 1 - P(A)
• Addition rule --> P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
• Conditional probability
• P(A given B) = P(A | B) = P(A ∩ B)/P(B)
• Multiplication rule
• P(A ∩ B) = P(A | B) × P(B) or P(B | A) × P(A)
• A and B are independent if
• P(A | B) = P(A) or P(B | A) = P(B) or P(A ∩ B) = P(A) × P(B)
• A and B are mutually exclusive if
• P(A ∩ B) = 0
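
The probability rules above can be verified on a small sample space with exact fractions. This is a minimal sketch under an assumed example: one roll of a fair six-sided die with equally likely outcomes, so events are Python sets and P(event) is a count ratio. `OMEGA` and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """P(event) = favourable outcomes / total outcomes (equally likely)."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def p_union(a, b):
    """Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return prob(a) + prob(b) - prob(a & b)


def p_given(a, b):
    """Conditional probability: P(A | B) = P(A ∩ B) / P(B)."""
    return prob(a & b) / prob(b)


def independent(a, b):
    """Independent if P(A ∩ B) = P(A) × P(B)."""
    return prob(a & b) == prob(a) * prob(b)


def mutually_exclusive(a, b):
    """Mutually exclusive if P(A ∩ B) = 0."""
    return prob(a & b) == 0


A = frozenset({2, 4, 6})  # even score
B = frozenset({1, 2})     # score of 1 or 2
print(p_union(A, B), p_given(A, B), independent(A, B))  # 2/3 1/2 True
```

Using `Fraction` keeps the independence test exact instead of comparing rounded floats.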

## 6: Correlation

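• +ve correlation --> points rise from left to right --> mostly in the 1st & 3rd quadrants (quadrants formed by the lines x = x̄ and y = ȳ)
• -ve correlation --> points fall from left to right --> mostly in the 2nd & 4th quadrants
• no correlation --> points spread across ALL quadrants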
• Sxx = ∑(x - x̄)^2 = ∑x^2 - (∑x)^2/n
• Syy = ∑(y - ȳ)^2 = ∑y^2 - (∑y)^2/n
• Sxy = ∑(x - x̄)(y - ȳ) = ∑xy - (∑x)(∑y)/n
• r = Sxy/√(SxxSyy)
• r = product moment correlation coefficient = measure of linear correlation, -1 ≤ r ≤ 1
• r = 1 --> perfect +ve linear correlation
• r = -1 --> perfect -ve linear correlation
• r = 0 --> no linear correlation
• Coding
• doesn't affect r (provided b and d are positive)
• rewrite x --> p = (x - a)/b
• rewrite y --> q = (y - c)/d
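
The summary statistics and r above translate directly into Python. This is a minimal sketch with hypothetical helper names, using the ∑x² - (∑x)²/n forms; the example data and the coding p = (x - 1)/2, q = (y - 2)/3 are made up purely to show that coding leaves r unchanged.

```python
import math


def s_values(xs, ys):
    """Return (Sxx, Syy, Sxy) using the ∑x² - (∑x)²/n style formulas."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxx, syy, sxy


def pmcc(xs, ys):
    """r = Sxy / √(Sxx·Syy)."""
    sxx, syy, sxy = s_values(xs, ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 5]
ys = [2, 3, 5, 4]
# Hypothetical coding p = (x - 1)/2, q = (y - 2)/3; r should not change.
ps = [(x - 1) / 2 for x in xs]
qs = [(y - 2) / 3 for y in ys]
print(pmcc(xs, ys), pmcc(ps, qs))
```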

## 7: Regression

• If y = a + bx --> a = y-intercept & b = gradient of the line
• Independent (or explanatory) variable
• one that is set independently of the other variable
• Dependent (or response) variable
• one whose values are determined by the values of the independent variable
• Equation of the regression line of y on x
• y = a + bx --> b = Sxy/Sxx & a = ȳ - bx̄
• Coding
• substitute the coding formulas into the coded regression equation and rearrange to give y in terms of x
• Interpolation
• estimating the value of a dependent variable within the range of the data
• Extrapolation
• estimating the value outside the range of the data
• can be unreliable
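
The regression formulas and the coding step above can be sketched in Python. The helper names are hypothetical; `decode_line` assumes a coded line q = alpha + beta·p with p = (x - x_shift)/x_scale and q = (y - y_shift)/y_scale, where `x_shift`, `x_scale`, `y_shift`, `y_scale` play the roles of a, b, c, d in the correlation coding notes.

```python
def regression_line(xs, ys):
    """Least-squares line of y on x: b = Sxy/Sxx, a = ȳ - b·x̄. Returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def decode_line(alpha, beta, x_shift, x_scale, y_shift, y_scale):
    """Turn coded q = alpha + beta·p into y = intercept + slope·x.

    Substitutes p = (x - x_shift)/x_scale and q = (y - y_shift)/y_scale,
    then rearranges for y.
    """
    slope = y_scale * beta / x_scale
    intercept = y_shift + y_scale * alpha - slope * x_shift
    return intercept, slope


xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
a, b = regression_line(xs, ys)
print(a, b)         # 1.0 2.0
# Interpolation: x = 2.5 lies inside the data range.
print(a + b * 2.5)  # 6.0
```

An estimate at, say, x = 40 would be extrapolation, far outside the data range, so the same line should be trusted much less there.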

## 8: Discrete random variables

• random variable X
• x is a particular value of X
• P(X = x) or p(x) = probability that X takes a particular value x
• discrete random variable
• ∑P(X = x) = 1
• Cumulative distribution function
• F(x) = P(X ≤ x)
• Expected value of X
• E(X) = ∑xP(X = x)
• Variance of X
• Var(X) = E(X^2) - [E(X)]^2, where E(X^2) = ∑x^2P(X = x)
• E(aX + b) = aE(X) + b
• Var(aX + b) = a^2Var(X)
• Conditions for a discrete uniform distribution
• a discrete random variable X is defined over a set of n distinct values
• Each value is equally likely
• Discrete uniform distribution X over the values 1, 2, 3, ..., n
• E(X) = (n + 1)/2
• Var(X) = (n+1)(n-1)/12
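
The discrete random variable results above can be checked exactly with fractions. This is a minimal sketch, assuming a distribution is stored as a dict mapping each value x to P(X = x); the helper names are hypothetical. The fair-die example checks E(X) = (n + 1)/2 and Var(X) = (n + 1)(n - 1)/12 for n = 6.

```python
from fractions import Fraction


def is_valid(dist):
    """A probability distribution must satisfy ∑P(X = x) = 1."""
    return sum(dist.values()) == 1


def expectation(dist, g=lambda x: x):
    """E(g(X)) = ∑g(x)P(X = x); with the default g this is E(X)."""
    return sum(g(x) * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X²) - [E(X)]²."""
    return expectation(dist, lambda x: x * x) - expectation(dist) ** 2


def cdf(dist, x):
    """F(x) = P(X ≤ x)."""
    return sum(p for value, p in dist.items() if value <= x)


def discrete_uniform(n):
    """X takes the values 1, 2, ..., n, each with probability 1/n."""
    return {k: Fraction(1, n) for k in range(1, n + 1)}


die = discrete_uniform(6)
print(expectation(die), variance(die), cdf(die, 2))  # 7/2 35/12 1/3
```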

## 9: Normal distribution

• The random variable X that has a normal distribution with mean μ and standard deviation σ is represented by
• X ∼ N(μ, σ²)
• If X ∼ N(μ, σ²) and Z ∼ N(0, 1²) then Z = (X - μ)/σ
• Tables of Φ(z) = P(Z ≤ z) are given in the formula book
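
Standardising and the table lookup can be reproduced in Python. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; `standardise` and `phi` are hypothetical helper names, and `phi` uses the identity Φ(z) = ½(1 + erf(z/√2)) via `math.erf`. The example X ∼ N(50, 4²) is made up for illustration.

```python
import math
from statistics import NormalDist


def standardise(x, mu, sigma):
    """Z = (X - μ)/σ turns X ~ N(μ, σ²) into Z ~ N(0, 1²)."""
    return (x - mu) / sigma


def phi(z):
    """Φ(z) = P(Z ≤ z), the value looked up in standard normal tables."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical example: X ~ N(50, 4²); find P(X < 53).
z = standardise(53, 50, 4)
print(z, round(phi(z), 4))                 # 0.75 0.7734
# Cross-check with the standard library's normal distribution.
print(round(NormalDist(50, 4).cdf(53), 4))  # 0.7734
```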