One-Way Analysis of Variance

  • Created by: rosieevie
  • Created on: 09-01-18 11:59

ANOVA

ANOVA - looks for an overall difference between the mean scores of 2+ samples (levels) of a factor

  • Tests the effect of one factor by comparing the variation between sample means with the variation of observations around their own sample means

Analyses samples to test for evidence of a difference between population means by:

  • Measuring variation in a continuous response variable as sums of squared deviations from means
  • Partitioning variation into explained and unexplained (residual) components
  • Comparing the partitions: how many times more variation is explained by differences between samples than by differences within samples

Sums of squares - allow the variation in the components to add up to the whole variation, which is what makes partitioning possible

  • Explained component = sum of squared deviations of sample means from the global (grand) mean, counted once for each observation in the sample
  • Unexplained component = sum of squared deviations of variates (individual observations) from their sample means

Components together account for the total variation, obtained from the sum of squared deviations of variates from the global mean
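A minimal sketch of the partition, assuming three hypothetical samples of one factor (the data and names are illustrative only). It checks that the explained and unexplained sums of squares add up to the total.

```python
# Minimal sketch: partition the total variation of a single factor into
# explained and unexplained sums of squares. The data are hypothetical.
from statistics import mean

samples = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 8.0, 9.0, 8.0],
    "C": [5.0, 6.0, 7.0, 6.0],
}

all_obs = [y for ys in samples.values() for y in ys]
grand_mean = mean(all_obs)

# Explained: each sample mean's squared deviation from the grand mean,
# counted once for every observation in that sample
ss_explained = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in samples.values())

# Unexplained: squared deviations of each observation from its own sample mean
ss_unexplained = sum((y - mean(ys)) ** 2 for ys in samples.values() for y in ys)

# Total: squared deviations of each observation from the grand mean
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

print(ss_explained, ss_unexplained, ss_total)
print(abs(ss_explained + ss_unexplained - ss_total) < 1e-9)  # True
```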


ANOVA 2

ANOVA = parametric test - assumes residuals (deviations of observations from their sample mean) are normally distributed around each sample mean, i.e. the data are normally distributed within each sample

Advantages of parametric tests:

  • More powerful than non-parametric equivalents when their assumptions are met
  • Built on a general scheme (the general linear model), so flexible enough to cope with
    • Incomplete data
    • Interactions between factors
    • Mixes of continuous and categorical factors

Parameters for Estimating the Population Mean

Variable - property that varies in a measurable way between subjects in a sample

Sample - collection of individual observations selected by a specified procedure; sample size n = number of observations (e.g. no. of subjects)

Sample mean Ȳ - sum of all observations divided by n, the number of observations

  • Estimate of the population mean - one of the 2 parameters defining a normal distribution

Variance s² - average of the n squared deviations from the mean in a normally distributed population

  • Usually refers to a sample - calculated as sum of squares/(n - 1)

Sample standard deviation SD - describes the dispersion of data about the mean (= square root of variance)

  • With a large sample size, the sample mean approaches the population mean and the SD approaches the population SD
  • For a normal distribution, 95% of observations lie within 1.960 SD of the mean

Normal distribution - bell-shaped frequency distribution of a continuous variable, symmetrical about the mean
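The definitions above can be sketched for a hypothetical sample (the values are made up). The standard library's `statistics` module is used only to cross-check the hand calculation, since it also divides by n - 1.

```python
# Sketch: sample mean, variance and SD for a hypothetical sample.
import math
import statistics

obs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(obs)

sample_mean = sum(obs) / n                          # Ȳ = sum of observations / n
sum_sq = sum((y - sample_mean) ** 2 for y in obs)   # sum of squared deviations
variance = sum_sq / (n - 1)                         # sample variance s²
sd = math.sqrt(variance)                            # SD = square root of variance

# If the population is normal, ~95% of observations fall within mean ± 1.960 SD
interval_95 = (sample_mean - 1.960 * sd, sample_mean + 1.960 * sd)

# The statistics module uses the same n - 1 divisor
assert math.isclose(variance, statistics.variance(obs))
assert math.isclose(sd, statistics.stdev(obs))
print(sample_mean, variance, sd)  # 5.0, then 32/7 ≈ 4.571 and ≈ 2.138
```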


Parameters for Estimating the Population Mean 2

Standard error of the mean SE - describes the uncertainty in the sample mean due to sampling error

  • Calculated as SD/√n
  • Gets smaller as sample size increases, as the sample mean approaches the population mean

Confidence intervals for µ - sample means from repeated random samples of size n would have a distribution approaching normal for large n, so an approximate 95% confidence interval for µ is Ȳ ± 1.96 SE

SSerror - unexplained residual variation = sum of squared deviations of each score from its sample mean

µ - population mean
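A sketch of SE = SD/√n and the large-sample confidence interval, assuming n is big enough for the normal multiplier 1.96 (small samples would normally use a t critical value instead). The data and function names are hypothetical.

```python
# Sketch: standard error of the mean and an approximate 95% CI for µ.
# Assumes a large n, so the normal multiplier 1.96 is used.
import math
from statistics import mean, stdev

def standard_error(obs):
    """SE of the mean = sample SD / sqrt(n)."""
    return stdev(obs) / math.sqrt(len(obs))

def approx_ci95(obs):
    """Approximate 95% confidence interval for the population mean µ."""
    m, se = mean(obs), standard_error(obs)
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical data; the larger sample simply repeats the values to show
# how SE shrinks as n grows while the spread stays about the same.
small = [12.1, 11.4, 13.0, 12.6, 11.9]
large = small * 20

print(standard_error(small), standard_error(large))
print(approx_ci95(large))
```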


Calculating ANOVA

  • Start with raw data, e.g. a single factor with two levels
  • Calculate sample means for each 'treatment'
  • Calculate grand mean Ḡ - mean of all data, regardless of treatment
  • Calculate explained squared deviations = (sample mean - grand mean)², one for each observation
  • Calculate residual (error) squared deviations = (individual raw data - sample mean)²
  • Calculate total squared deviations = (individual raw data - grand mean)²
  • Calculate sums of squares for explained, error and total deviations = add together all values of each type of squared deviation
  • Check: total sum of squares = explained sum of squares + error sum of squares
  • Calculate mean squares = sums of squares/d.f.
  • Calculate F = explained mean square/error mean square, i.e. the ratio of explained to unexplained mean squares

Large F ratio = differences between sample means account for much of variation of scores

If the F value exceeds the critical value, conclude that there is a significant difference between means
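The calculation steps above can be sketched in plain Python for one factor, using two hypothetical 'treatments'. The function name `one_way_anova` is illustrative, not a library API. The critical value still has to come from F tables (or, if SciPy happens to be available, `scipy.stats.f.ppf(0.95, df1, df2)`).

```python
# Sketch of the hand calculation for a single factor with a levels.
# Data are hypothetical; the critical value is not computed here.
from statistics import mean

def one_way_anova(samples):
    """Return (F, df_explained, df_residual) for a list of samples."""
    all_obs = [y for s in samples for y in s]
    grand = mean(all_obs)
    # Explained SS: (sample mean - grand mean)² for every observation
    ss_explained = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    # Error SS: (observation - sample mean)²
    ss_error = sum((y - mean(s)) ** 2 for s in samples for y in s)
    a, n = len(samples), len(all_obs)
    df_explained, df_residual = a - 1, n - a
    ms_explained = ss_explained / df_explained   # mean square = SS / d.f.
    ms_error = ss_error / df_residual
    return ms_explained / ms_error, df_explained, df_residual

treatment_1 = [4.0, 5.0, 6.0, 5.0]
treatment_2 = [7.0, 8.0, 9.0, 8.0]
f, df1, df2 = one_way_anova([treatment_1, treatment_2])
print(f, df1, df2)  # F ≈ 27.0 with 1 and 6 d.f.
```

Here SS explained = 18 and SS error = 4, so F = 18/(4/6) ≈ 27, well above the tabulated 5% critical value of 5.99 for 1 and 6 d.f.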


Partitioning the Sums of Squares

  • Y = individual score
  • Ȳ = sample mean
  • Ḡ = global (grand) mean
  • Ȳ - Ḡ = deviation of the sample mean from the grand mean (explained component)
  • Y - Ȳ = deviation of the score from the mean for its sample (unexplained component)

Sum of squares (total variation), summed over all scores:

$$\sum (Y - \bar{G})^2 = \sum (\bar{Y} - \bar{G})^2 + \sum (Y - \bar{Y})^2$$

A vector is used to describe the deviation of each score in terms of explained and residual sources of variation

  • The deviation of a score from its sample mean is plotted on an axis perpendicular to the one describing the deviation of the sample mean from the global mean
  • X-axis = explained component
  • Y-axis = unexplained component

Total deviation = resultant vector, resulting from the combination of these 2 independent sources of information

If the squared deviation of Ȳ from Ḡ is large compared with the squared deviation of Y from Ȳ, then the total variation is largely explained by differences between sample means - this is the comparison ANOVA makes
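A sketch of why the two components are "perpendicular", using hypothetical samples of unequal size. Stacking Ȳ - Ḡ and Y - Ȳ for every score gives two vectors whose dot product is zero, so by Pythagoras their squared lengths (the two sums of squares) add up to the squared length of the total-deviation vector.

```python
# Sketch: explained and residual deviation vectors are orthogonal.
# Data are hypothetical.
from statistics import mean

samples = [[3.0, 5.0, 4.0], [6.0, 9.0, 7.5, 8.5]]  # unequal sample sizes
all_obs = [y for s in samples for y in s]
grand = mean(all_obs)

explained = [mean(s) - grand for s in samples for _ in s]  # Ȳ - Ḡ for every score
residual = [y - mean(s) for s in samples for y in s]       # Y - Ȳ for every score
total = [y - grand for y in all_obs]                       # Y - Ḡ for every score

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

print(dot(explained, residual))  # ~0 (up to rounding): the vectors are perpendicular
print(dot(explained, explained) + dot(residual, residual), dot(total, total))  # equal
```

The zero dot product holds because, within each sample, Ȳ - Ḡ is constant while the residuals sum to zero.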


Degrees of Freedom for ANOVA

Degrees of freedom report the amount of replication a dataset has:

  • The more independent and random replicate observations, the better
  • The d.f. tell the audience how much replication there is

Degrees of freedom = no. of independent pieces of information - no. of parameters estimated to calculate the variation

ANOVA reports 2 d.f. values with the F-ratio:

  • First number = explained component of variation: a - 1 (a sample means/levels, minus 1 for the grand mean)
  • Second number = residual component of variation: n - a (n observations, minus a sample means)

Report ANOVA results like this:

A depends on B (F[d.f. explained, d.f. residual] = F value, P < 0.05)

F-value must be above critical value to be statistically significant
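A small sketch of the two d.f. values and the F part of a report line. The helper names are hypothetical, and the F value passed in is an illustrative input rather than a real result.

```python
# Sketch: degrees of freedom for a one-way ANOVA and the F part of a report.
def anova_df(sample_sizes):
    """Return (explained d.f., residual d.f.) = (a - 1, n - a)."""
    a = len(sample_sizes)   # number of levels / sample means
    n = sum(sample_sizes)   # total number of observations
    return a - 1, n - a

def format_f(f_value, sample_sizes):
    """Format F with its two d.f., e.g. 'F1,6 = 27.00'."""
    df_explained, df_residual = anova_df(sample_sizes)
    return f"F{df_explained},{df_residual} = {f_value:.2f}"

print(anova_df([4, 4]))        # (1, 6)
print(anova_df([5, 6, 5]))     # (2, 13)
print(format_f(27.0, [4, 4]))  # F1,6 = 27.00
```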


ANOVA Assumptions

4 assumptions:

  • Random sampling - all analyses
    • All observations taken at random
    • Avoids experimenter bias
  • Independence - all analyses
    • Consecutive observations are independent
    • Residuals independently distributed around sample means - the deviation of one score should not reveal how others deviate
    • Lack of independence = pseudoreplication
    • ANOVA on repeated measures is possible but requires declaring the individual as a second factor - this adds complications and assumptions (avoid if possible)
    • If observations are not independent - either add the source of dependence as an extra factor or redesign data collection
  • Homogeneity of variances - specific to ANOVA and related parametric tests
    • All samples have the same variance about their means
    • Lets the analysis concern only differences between means - violation can obscure differences
    • Large differences in variance between samples = transform the data

ANOVA Assumptions 2

  • Normality - ANOVA and related parametric tests
    • Residuals normally distributed about sample means = normal distribution of errors
    • i.e. a symmetrical frequency distribution defined by its mean and variance (average squared deviation)

Residual - deviation of data point from sample mean

Should do visual diagnostic tests to check assumptions - in R this can be done with the command plot(aov(y~x))


One-Way ANOVA in R

Model variation - variation accounted for by the factor, measured as squared deviations of sample means from the grand mean

Residual variation - error variation not accounted for by factor, measured in squared deviations of observations from sample means (residuals)
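A sketch of model and residual variation in terms of fitted values, laid out like the `y ~ x` data that R's `aov` takes (one response value and one factor level per observation). In a one-way ANOVA the fitted value of each observation is its sample mean. The data are hypothetical. In R the same per-observation quantities come from `fitted()` and `resid()` on the aov object.

```python
# Sketch: fitted values, residuals, and model vs residual sums of squares.
from statistics import mean

y = [10.0, 12.0, 11.0, 15.0, 17.0, 16.0]
x = ["A", "A", "A", "B", "B", "B"]   # factor level of each observation

level_means = {
    level: mean([yi for yi, xi in zip(y, x) if xi == level]) for level in set(x)
}
fitted = [level_means[xi] for xi in x]              # fitted value = sample mean
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # residual = observation - fitted
grand = mean(y)

model_ss = sum((fi - grand) ** 2 for fi in fitted)  # variation explained by the factor
residual_ss = sum(r ** 2 for r in residuals)        # variation not explained by the factor
print(model_ss, residual_ss)  # 37.5 4.0
```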


Residuals Analysis

4 types of residual plots (to check whether conclusions are believable):

  • Residuals vs Fitted - visual impression of whether variances are similar across samples
    • Plots residuals against fitted values, i.e. the sample means (e.g. of two 'treatments')
    • Plots show deviations from sample means
    • Tests for homogeneous sample variances
      • Funnel shapes indicate heterogeneity
    • Numbered points = positions (nth data point) of the residuals of largest magnitude
  • Normal Q-Q - tests for normality of residuals around sample means
    • Look for data points lying close to the diagonal, scattered randomly either side of it
    • Data points bowed below = right skew (longer tail on right)
    • Data points bowed above = left skew (longer tail on left)
    • S-shape deviation (below the line at top and above at bottom) = flatter than normal distribution (light tails)
    • Z-shape deviation (above the line at top and below at bottom) = more peaked than normal (heavy tails)
  • Scale-location
    • Useful for regression analysis
    • Uses square root of (absolute) standardised residuals, plotted against fitted values
    • Tests homogeneity of variances against a measure of residual magnitude
    • Heterogeneity revealed by a slope in the line of best fit
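A sketch of how the points on a normal Q-Q plot can be built: each sorted residual is paired with the standard normal quantile expected at its rank. The plotting position (i - 0.5)/n is an assumption. R's `qqnorm` uses `ppoints()`, which gives these values for n > 10 and a slightly different formula for smaller n. The residuals are hypothetical.

```python
# Sketch: coordinates for a normal Q-Q plot of residuals.
from statistics import NormalDist

residuals = [-1.0, 1.0, 0.0, 0.5, -0.5, 1.5, -1.5, 0.0]  # hypothetical
n = len(residuals)

# Expected standard normal quantile for each rank i = 1..n
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(theoretical, sorted(residuals)))  # (x = theoretical, y = sample)

for x, y in qq_points:
    print(f"{x:6.2f} {y:6.2f}")
```

Points lying close to a straight line suggest normally distributed residuals. Systematic curves correspond to the skew and tail patterns listed above.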

Residuals Analysis 2 and Summary

  • Constant Leverage
    • Useful for regression analysis
    • Similar to residuals vs fitted
    • Tests homogeneity of variances using standardised residuals (residual/SD(residual)) plotted against factor levels instead of fitted values (sample means)
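A sketch of the standardised residuals behind the scale-location and constant-leverage plots, for hypothetical data. It follows R's `rstandard()`, which divides each residual by s·√(1 - h), where s = √(MS error) and h is the observation's leverage. In a one-way ANOVA the leverage of each observation is 1/nᵢ (nᵢ = size of its sample), so it is constant in a balanced design, which is where the plot's name comes from.

```python
# Sketch: standardised residuals for a one-way ANOVA (hypothetical data).
import math
from statistics import mean

samples = {"A": [10.0, 12.0, 11.0, 13.0], "B": [15.0, 17.0, 16.0, 20.0]}

levels, residuals, leverages = [], [], []
for level, ys in samples.items():
    m = mean(ys)
    for y in ys:
        levels.append(level)
        residuals.append(y - m)          # deviation from sample mean
        leverages.append(1 / len(ys))    # leverage h = 1/n_i in a one-way ANOVA

n, a = len(residuals), len(samples)
s = math.sqrt(sum(r * r for r in residuals) / (n - a))  # residual SD = sqrt(MS error)

# Standardised residual = residual / (s * sqrt(1 - h)), as in R's rstandard()
standardised = [r / (s * math.sqrt(1 - h)) for r, h in zip(residuals, leverages)]

# Scale-location plot uses sqrt(|standardised residual|) against fitted values
scale_location = [math.sqrt(abs(z)) for z in standardised]

# Constant-leverage plot shows standardised residuals grouped by factor level
for level, z in zip(levels, standardised):
    print(level, round(z, 2))
```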

t-tests work on 2 levels, while one-way ANOVA works on 2+ levels

  • Always plot the continuous response on the y-axis and the categorical factor on the x-axis

Always do one global analysis - never do several t-tests on the same data set in place of an ANOVA
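One reason for the single global analysis can be sketched with arithmetic. Assuming k independent tests, each at α = 0.05, the chance of at least one false positive is 1 - (1 - α)^k. Pairwise t-tests on shared data are not fully independent, so this is only an approximation. With 3 levels (3 pairwise tests) it already reaches about 0.14.

```python
# Sketch: approximate chance of at least one false positive when several
# pairwise t-tests replace one ANOVA. Assumes the tests are independent.
from math import comb

def familywise_error(alpha, k):
    """P(at least one Type I error) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

for levels in (2, 3, 4, 5):
    k = comb(levels, 2)  # number of pairwise t-tests between levels
    print(levels, k, round(familywise_error(0.05, k), 3))
```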
