Statistics 1 Key Points

  • Created by: LShan261
  • Created on: 25-04-15 10:45

1: Mathematical models in probability and statistics

  • Mathematical model = simplification of a real world problem
  • Advantages
    • quick and easy to produce
    • simplify a more complex situation
    • improve our understanding of the real world as certain variables can readily be changed
    • enable predictions to be made about the future
  • Disadvantages
    • only give a partial description of the real world situation
    • only work for a restricted range of values
  • Method of production
    • real world problem observed
    • mathematical model devised
    • model used to make predictions about the expected behaviour of the real world problem
    • experimental data collected from real world situation
    • predicted and observed outcomes compared
    • statistical tests used to assess success of the model
    • mathematical model is refined to improve the match between predicted and observed outcomes --> repeat steps 2, 3, 5, 6, 7

2: Representation & summary of data - location

  • Continuous variable = variable w/ any value in a given range
  • Discrete variable = variable w/ only specific values in a given range
  • grouped frequency distribution --> classes and their related class frequencies
  • Mode = the value that occurs most often
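  • Median = middle value when data is in order --> discrete data = n/2 --> IF whole, find the midpoint of the corresponding term and the term above; IF not whole, round up to the next term --> Continuous = n/2 then interpolation
  • Mean = ∑x/n or ∑fx/∑f
  • Mean of combined set of A and B is mean = (n1x1 + n2x2)/(n1 + n2) --> x1, x2 = means of A and B & n1, n2 = number of values in A and B
  • Coding to make numbers easier --> e.g. y = (x-a)/b --> reverse the code to get the mean of the original data (mean of x = b X mean of y + a)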
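
The mean and median rules above can be checked with a short Python sketch. The function names and the small data sets are made up for illustration; the median function assumes the usual S1 convention for discrete data (if n/2 is whole, take the midpoint of that term and the next; otherwise round up).

```python
def mean_from_table(values, freqs):
    """Mean of a frequency table: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def combined_mean(n1, mean1, n2, mean2):
    """Mean of two data sets combined: (n1*x1 + n2*x2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def discrete_median(data):
    """Median of discrete data using the n/2 rule."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 0:
        # n/2 is whole: midpoint of the (n/2)th term and the term above it
        return (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    # n/2 is not whole: round up to the next term
    return ordered[n // 2]


# Hypothetical frequency table: value x occurs f times
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]

table_mean = mean_from_table(values, freqs)    # (2 + 6 + 12 + 4) / 10
both_sets = combined_mean(10, table_mean, 5, 3.0)
median_even = discrete_median([3, 1, 4, 1, 5, 9])
median_odd = discrete_median([3, 1, 4, 1, 5])
```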

3: Representation & summary of data - measures of dispersion

  • Range = highest - lowest
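  • Quartiles --> Q1, Q2, Q3 --> split data into 4 parts (Q2 = median)
  • Discrete --> Q1 = n/4 & Q3 = 3n/4 --> corresponding term (IF whole, midpoint of that term and the term above; IF not whole, round up)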
  • Continuous --> Q1 and Q3 --> interpolation
  • Interquartile range = Q3-Q1
  • Percentiles --> split data into 100 parts
    • xth percentile, Px --> xn/100th term
    • n% to m% interpercentile range --> Pm - Pn
  • Deviation of observation x from the mean --> x-mean
    • Variance = ∑(x-mean)^2/n = ∑x^2/n - (∑x/n)^2
    • Standard deviation = √Variance
  • Variance & standard deviation for frequency table & grouped frequency distribution --> x = midpoint & f = frequency & n = ∑f
    • Variance = ∑f(x-mean)^2/∑f or ∑fx^2/∑f - (∑fx/∑f)^2
  • Coding --> adding or subtracting a number doesn't change the standard deviation --> if the data was divided by b to code it, multiply the standard deviation of the coded data by b (and vice versa)
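
As a rough check of the variance formulas and the coding rule, here is a minimal Python sketch. The data set and the code y = (x - 100)/10 are made-up examples, not taken from the notes.

```python
import math


def variance(data):
    """Variance from the definition: sum((x - mean)^2) / n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def variance_shortcut(data):
    """Variance from the shortcut: sum(x^2)/n - (sum(x)/n)^2."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2


# Hypothetical data, coded using y = (x - 100) / 10
data = [110, 120, 130, 140, 150]
coded = [(x - 100) / 10 for x in data]

sd_data = math.sqrt(variance(data))
sd_coded = math.sqrt(variance(coded))

# Subtracting 100 has no effect on the spread; dividing by 10 shrinks it
# by a factor of 10, so multiply by 10 to recover the original value.
sd_recovered = 10 * sd_coded
```

Both variance functions should agree on the same data, and sd_recovered should match sd_data.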

4: Representation of data

  • Stem and leaf
    • order & present data
    • shape of data & enables quartiles to be found
    • 2 sets can be compared back-to-back
  • Outlier = extreme value
  • Box Plots
    • quartiles, maximum & minimum values and any outliers
    • compare 2 sets of data
  • Histogram
    • used for continuous data summarised in a grouped frequency distribution
  • Skewness
    • Quartiles
    • shape from box plots
    • measures of location
    • formula --> 3(mean-median)/standard deviation
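
The skewness formula above can be tried out in Python. This is only a sketch: the two data sets are invented, and it assumes the standard deviation in the formula is the population one (dividing by n), which is what `statistics.pstdev` computes.

```python
import statistics


def skewness(data):
    """Skewness measure: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.pstdev(data)  # population standard deviation (divides by n)
    return 3 * (mean - median) / sd


# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5]               # mean = median, so no skew
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 10]  # mean 3.5 > median 3

skew_symmetric = skewness(symmetric)
skew_right = skewness(long_right_tail)
```

A positive result suggests positive skew (mean pulled above the median), a negative result suggests negative skew, and a value near 0 suggests a roughly symmetrical distribution.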

5: Probability

  • P(event A OR event B OR both) = P(A u B)
  • P(events A and B) = P(A n B)
  • P(not event A) = P(A')
  • Complementary probability
    • P(A') = 1 - P(A)
  • Addition Rule
    • P(A u B) = P(A) + P(B) - P(A n B)
  • Conditional probability
    • P(A given B) = P(A | B) = P(A n B)/P(B)
  • Multiplication rule
    • P(A n B) = P(A | B) × P(B) or P(B | A) × P(A)
  • A and B are independent if
    • P(A | B) = P(A) or P(B | A) = P(B) or P(A n B) = P(A) × P(B)
  • A and B are mutually exclusive if
    • P(A n B) = 0
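
The rules above can be verified on a small sample space. The sketch below uses one roll of a fair die as an assumed example; the events A, B and C and the helper `prob` are made up for illustration. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (assumed example)
outcomes = set(range(1, 7))


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))


A = {2, 4, 6}   # roll is even
B = {5, 6}      # roll is greater than 4
C = {1, 3}      # roll is 1 or 3

# Complement: P(A') = 1 - P(A)
p_not_a = 1 - prob(A)

# Addition rule: P(A u B) = P(A) + P(B) - P(A n B)
p_union = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A given B) = P(A n B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Independence: P(A n B) = P(A) x P(B)
independent = prob(A & B) == prob(A) * prob(B)

# Mutually exclusive: P(A n C) = 0
mutually_exclusive = prob(A & C) == 0
```

Here A and B happen to be independent (P(A given B) equals P(A)), while A and C are mutually exclusive because no roll is in both.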

6: Correlation

  • +ve correlation --> points rise from left to right --> most points in 1st & 3rd quadrants
  • -ve correlation --> points fall from left to right --> most points in 2nd & 4th quadrants
  • no correlation --> points spread across ALL quadrants
  • Sxx = ∑(x - mean)^2 = ∑x^2 - (∑x)^2/n
  • Syy = ∑(y - mean)^2 = ∑y^2 - (∑y)^2/n
  • Sxy = ∑(x-mean)(y-mean) = ∑xy - ∑x∑y/n
  • r = Sxy/√(Sxx × Syy)
  • r = measure of linear correlation
    • r = 1 --> perfect +ve linear correlation
    • r = -1 --> perfect -ve linear correlation
    • r = 0 --> no linear correlation
  • Coding
    • doesn't affect r
    • rewrite x --> p = (x-a)/b
    • rewrite y --> q = (y-c)/d
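
A short Python sketch of the Sxy and r formulas, including a check that coding leaves r unchanged. The data and the codes p = (x - 3)/2 and q = (y - 4)/10 are invented for illustration; the check assumes the divisors b and d are both positive.

```python
import math


def s_xy(xs, ys):
    """Sxy = sum(xy) - sum(x)*sum(y)/n. Passing the same list twice gives Sxx or Syy."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def pmcc(xs, ys):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy(xs, ys) / math.sqrt(s_xy(xs, xs) * s_xy(ys, ys))


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sxx = s_xy(xs, xs)
syy = s_xy(ys, ys)
sxy = s_xy(xs, ys)
r = pmcc(xs, ys)

# Coding: p = (x - 3) / 2 and q = (y - 4) / 10
ps = [(x - 3) / 2 for x in xs]
qs = [(y - 4) / 10 for y in ys]
r_coded = pmcc(ps, qs)
```

For this data r comes out between 0 and 1 (positive but not perfect linear correlation), and r_coded should equal r up to rounding.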

7: Regression

  • If y= a +bx --> a = y intercept & b = gradient of the line
  • Independent (or explanatory) variable
    • one that is set independently of the other variable
  • Dependent (or response) variable
    • one whose values are decided by the values of the independent variable
  • Equation of the regression line
    • y = a + bx --> b = Sxy/Sxx & a = ymean - b(xmean)
  • Coding
    • substitute the codes into the regression equation of the coded data and rearrange
  • Interpolation
    • estimating the value of a dependent variable within the range of the data
  • Extrapolation
    • estimating the value outside the range of the data
    • can be unreliable
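
The regression formulas can be sketched in Python as follows. The data set and the helper names `regression_line` and `predict` are hypothetical; `predict` also reports whether the x value lies outside the observed range, in which case the estimate is an extrapolation.

```python
def regression_line(xs, ys):
    """Return (a, b) for y = a + bx, with b = Sxy/Sxx and a = ymean - b*xmean."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def predict(a, b, x, xs):
    """Estimate y at x and flag whether x is outside the range of the data."""
    is_extrapolation = not (min(xs) <= x <= max(xs))
    return a + b * x, is_extrapolation


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_line(xs, ys)                     # gradient b and intercept a
y_inside, flag_inside = predict(a, b, 3.5, xs)     # interpolation
y_outside, flag_outside = predict(a, b, 10, xs)    # extrapolation, less reliable
```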

8: Discrete random variables

  • random variable X
    • x is a particular value of X
    • P(X = x) or p(x) = probability that X is equal to a particular value x
  • discrete random variable
    • ∑P(X = x) = 1
  • Cumulative distribution function
    • F(x) = P(X ≤ x)
  • Expected value of X
    • E(X) = ∑xP(X = x)
  • Variance of X
    • Var(X) = E(X^2) -[E(X)]^2
  • E(aX + b) = aE(X) + b
  • Var(aX + b) = a^2Var(X)
  • Conditions for a discrete uniform distribution
    • a discrete random variable X is defined over a set of n distinct values
    • Each value is equally likely
  • Discrete uniform distribution X over the values 1, 2, 3, ..., n
    • E(X) = (n+1)/2
    • Var(X) = (n+1)(n-1)/12
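
The expectation and variance results can be checked numerically. The sketch below uses a fair six-sided die (a discrete uniform distribution with n = 6) as an assumed example, with a = 2 and b = 3 chosen arbitrarily for the aX + b rules. Exact fractions avoid rounding error.

```python
from fractions import Fraction


def expectation(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x * x * p for x, p in dist.items())
    return e_x2 - expectation(dist) ** 2


# Discrete uniform distribution over 1, 2, ..., n (fair die, n = 6)
n = 6
die = {x: Fraction(1, n) for x in range(1, n + 1)}

total_probability = sum(die.values())   # should be 1
e_x = expectation(die)                  # compare with (n + 1) / 2
var_x = variance(die)                   # compare with (n + 1)(n - 1) / 12

# Distribution of Y = aX + b with a = 2, b = 3 (arbitrary choice)
a, b = 2, 3
transformed = {a * x + b: p for x, p in die.items()}
e_y = expectation(transformed)          # compare with a * E(X) + b
var_y = variance(transformed)           # compare with a^2 * Var(X)
```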

9: Normal distribution

  • The random variable X that has a normal distribution w/ mean μ & standard deviation σ is represented by
    • X ∼ N(μ, σ^2)
  • If X ∼ N(μ, σ^2) and Z ∼ N(0, 1^2) then
    • Z = (X - μ)/σ
    • Tables of Z are given in formula book
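
For checking table look-ups, here is a minimal Python sketch. It assumes the usual standardisation Z = (X - μ)/σ and uses `math.erf` in place of the printed tables; the distribution X ∼ N(50, 4^2) and the value 58 are made-up numbers.

```python
import math


def standardise(x, mu, sigma):
    """Convert a value of X ~ N(mu, sigma^2) to a Z value: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def phi(z):
    """P(Z <= z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical example: X ~ N(50, 4^2), find P(X < 58)
mu, sigma = 50, 4
z = standardise(58, mu, sigma)
p_below = phi(z)        # compare with the table entry for this z
p_above = 1 - p_below   # P(X > 58)
```

The value of p_below should agree with the table entry for the computed z to the number of decimal places the tables give.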
