One-Way Analysis of Variance
- Created by: rosieevie
- Created on: 09-01-18 11:59
ANOVA
ANOVA - looks for an overall difference between mean scores in 2+ samples of a factor
- Tests the effect of one factor by comparing sample means relative to the size of the variation around each sample mean
Analyses samples to test for evidence of a difference between means in the sampled population by:
- Measuring variation in a continuous response variable in terms of sums of squared deviations from sample means
- Partitioning variation into explained and unexplained (residual) components
- Comparing partitions - how many times more variation is explained by differences between samples than by differences within samples
Sums of squares - allow variation in components to add up to whole variation, which allows partitioning
- Explained component = sum of squared deviations of sample means from the global mean
- Unexplained component = sum of squared deviations of variates from their sample means
Components together account for total variation - obtained from sum of squared deviations of variates from global mean
ANOVA 2
ANOVA = parametric test - assumes residuals (deviations of observations from their sample mean) are normally distributed, i.e. data are normally distributed within each sample
Advantages of parametric tests:
- More powerful
- Constructed from a grand scheme = flexible enough to cope with:
  - Incomplete data
  - Interactions between factors
  - Mixes of continuous and categorical factors
Parameters for Estimating the Population Mean
Variable - property that varies in measurable way between subjects in sample
Sample - collection of individual observations selected by procedure e.g. no. subjects
Sample mean Ȳ - sum of all observations divided by the number of observations n
- Estimate of population mean - one of 2 parameters for defining normal distribution
Variance s² - average of the n squared deviations from the mean in a normally distributed population
- Usually refers to the sample variance - calculated as sum of squares/(n - 1)
Sample standard deviation SD - describes dispersion of data about mean (= square root of variance)
- In a large sample - sample mean approaches population mean and SD approaches population SD
- For a normal distribution, 95% of observations lie within 1.960 SD of the mean
Normal distribution - bell-shaped frequency distribution of a continuous variable (symmetrical about the mean)
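The definitions above can be checked by hand on a small sample. The sketch below is illustrative only: the five observations are invented, and the variable names are not from the original notes.

```python
import math
import statistics

# Hypothetical observations, invented for illustration only.
observations = [4.0, 6.0, 5.0, 7.0, 8.0]

n = len(observations)
sample_mean = sum(observations) / n                       # sum of observations / n
sum_of_squares = sum((y - sample_mean) ** 2 for y in observations)
sample_variance = sum_of_squares / (n - 1)                # sum of squares / (n - 1)
sample_sd = math.sqrt(sample_variance)                    # SD = square root of variance

# Cross-check the hand calculation against the standard library.
assert math.isclose(sample_variance, statistics.variance(observations))
assert math.isclose(sample_sd, statistics.stdev(observations))

print(sample_mean, sample_variance, round(sample_sd, 3))  # 6.0 2.5 1.581
```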
Parameters for Estimating the Population Mean 2
Standard error of mean SE - describes uncertainty due to sampling error in mean
- Calculated as SD/√n
- Gets smaller as sample size increases - sample mean approaches population mean
Confidence intervals for µ - sample means from repeated random samples of size n would have a distribution approaching normal for large n
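A minimal sketch of the standard error and a large-sample 95% confidence interval, assuming the interval is taken as mean ± 1.960 SE (the normal approximation). The data are invented, and for a sample this small a t-based multiplier would give a wider interval.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only.
sample = [4.0, 6.0, 5.0, 7.0, 8.0]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)        # sample SD, divisor n - 1
se = sd / math.sqrt(n)               # standard error of the mean = SD / sqrt(n)

# Normal-approximation 95% interval (an assumption; strictly valid for large n).
ci_lower = mean - 1.960 * se
ci_upper = mean + 1.960 * se

print(round(se, 3), round(ci_lower, 3), round(ci_upper, 3))
```

Quadrupling n while keeping the same SD halves the SE, which is why the interval narrows as sample size increases.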
SSerror - unexplained residual variation = sum of squared deviations of each score from its sample mean
µ - population mean
Calculating ANOVA
- Raw data - single factor with two levels
- Calculate sample means for each 'treatment'
- Calculate grand mean Ḡ - mean of all data, regardless of treatment
- Calculate explained squared deviations = (Sample mean - Grand mean)²
- Calculate residual (error) squared deviations = (Individual raw data - Sample mean)²
- Calculate total squared deviations = (Individual raw data - Grand mean)²
- Calculate sums of squares for explained, error and total deviations = add together all values of each type of squared deviation, summed over every observation
- Calculate total sums of squares = explained sums of squares + error sums of squares
- Calculate mean squares = sums of squares/d.f.
- Calculate F = explained mean square/error mean square
- This gives the F value = ratio of explained mean squares to unexplained mean squares
Large F ratio = differences between sample means account for much of variation of scores
If F value exceeds critical value, conclude that there is a significant difference between sample means
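The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for a statistics package: the two samples are invented, the function name `one_way_anova` is hypothetical, and the critical value still has to be looked up in an F table for the returned degrees of freedom.

```python
# Hypothetical scores for a single factor with two levels (invented data).
samples = {
    "control": [4.0, 5.0, 6.0],
    "treated": [7.0, 8.0, 9.0],
}

def one_way_anova(samples):
    """Return sums of squares, d.f., mean squares and F for a one-way design."""
    all_scores = [y for scores in samples.values() for y in scores]
    n = len(all_scores)                      # total number of observations
    a = len(samples)                         # number of samples (levels)
    grand_mean = sum(all_scores) / n

    ss_explained = 0.0
    ss_error = 0.0
    for scores in samples.values():
        sample_mean = sum(scores) / len(scores)
        # Explained deviation is counted once per observation in the sample.
        ss_explained += len(scores) * (sample_mean - grand_mean) ** 2
        ss_error += sum((y - sample_mean) ** 2 for y in scores)
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)

    df_explained, df_error = a - 1, n - a
    ms_explained = ss_explained / df_explained
    ms_error = ss_error / df_error
    return {
        "ss_explained": ss_explained, "ss_error": ss_error, "ss_total": ss_total,
        "df_explained": df_explained, "df_error": df_error,
        "F": ms_explained / ms_error,
    }

result = one_way_anova(samples)
print(result["ss_explained"], result["ss_error"], result["ss_total"])  # 13.5 4.0 17.5
print(result["df_explained"], result["df_error"], result["F"])         # 1 4 13.5
```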
Partitioning the Sums of Squares
- Ȳ = sample mean (Y = individual score)
- Ḡ = global (grand) mean
- Ȳ - Ḡ = deviation of the sample mean from the grand mean (explained component)
- Y - Ȳ = deviation of the score from the mean for its sample (unexplained component)
Sum of squares (total variation) = ∑(Y - Ḡ)² = ∑(Ȳ - Ḡ)² + ∑(Y - Ȳ)²
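Why the two components add up exactly can be shown in one line of algebra. The indices are an assumption added here for clarity: i labels the sample, j the score within it, and n_i is the size of sample i.

```latex
\begin{aligned}
\sum_i \sum_j (Y_{ij} - \bar{G})^2
  &= \sum_i \sum_j \bigl[(\bar{Y}_i - \bar{G}) + (Y_{ij} - \bar{Y}_i)\bigr]^2 \\
  &= \sum_i n_i (\bar{Y}_i - \bar{G})^2
   + \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2
   + 2 \sum_i (\bar{Y}_i - \bar{G}) \sum_j (Y_{ij} - \bar{Y}_i) \\
  &= \sum_i n_i (\bar{Y}_i - \bar{G})^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2
\end{aligned}
```

The cross term vanishes because deviations from a sample's own mean always sum to zero within that sample.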
Vector used to describe deviation of each score in terms of explained and residual sources of variation
- Deviation of score from sample mean plotted on axis perpendicular to one describing deviation of sample mean from global mean
- X-axis = explained component
- Y-axis = unexplained component
Total deviation = resultant vector, resulting from combination of these 2 independent sources of information
If squared deviation of Ȳ from Ḡ is big compared to that of Y from Ȳ, then most of the total variation is explained by differences between sample means - this is the comparison made by ANOVA
Degrees of Freedom for ANOVA
Degrees of freedom report amount of replication dataset has:
- The more replicate independent and random observations the better
- Audience told how much replication there is through degrees of freedom
Degrees of freedom = no. pieces of information - no. pieces of information required to calculate the variation
ANOVA has 2 sets of d.f., reported with the F-ratio:
- First number = explained component of variation = a - 1 (no. sample means/levels - 1 for the grand mean)
- Second number = residual component of variation = n - a (total no. observations - no. sample means)
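As a quick check of the two formulas, a small helper can return both d.f. from the sample sizes. The function name and the example design (3 levels with 10 observations each) are hypothetical.

```python
def anova_degrees_of_freedom(sample_sizes):
    """Return (explained d.f., residual d.f.) for a one-way ANOVA."""
    a = len(sample_sizes)    # number of samples (levels of the factor)
    n = sum(sample_sizes)    # total number of observations
    return a - 1, n - a

# Hypothetical design: 3 levels with 10 observations each.
print(anova_degrees_of_freedom([10, 10, 10]))  # (2, 27)
```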
Report ANOVA results like this:
A depends on B (F d.f. explained, d.f. residual = F-value, P < 0.05), with the two d.f. written as subscripts to F
F-value must be above critical value to be statistically significant
ANOVA Assumptions
4 assumptions:
- Random sampling - all analyses
  - All observations taken at random
  - Avoids experimenter bias
- Independence - all analyses
  - Consecutive observations are independent
  - Residuals independently distributed around sample means - 1 score that deviates should not reveal how others do
  - Lack of independence = pseudoreplication
  - ANOVA on repeated measures possible but requires declaring individual as second factor - adds complications and assumptions (avoid if poss)
  - If not independent - either add as an extra factor or redesign data collection
- Homogeneity of variances - specific to ANOVA and related parametric tests
  - All samples have same variation about their means
  - Analysis can then pertain to just finding differences between means - violation obscures differences
  - Large differences in variances between samples = transform the data
ANOVA Assumptions 2
- Normality - ANOVA and related parametric tests
  - Residuals normally distributed about sample means = normal distribution of errors
  - Symmetrical distribution of frequencies defined by mean and average squared deviations (variance)
Residual - deviation of data point from sample mean
Should do visual diagnostic tests to check assumptions - in R can do it with command plot(aov(y~x))
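The raw ingredients of those checks (residuals and per-sample variances) can also be computed directly. The sketch below is in Python rather than R, uses invented data, and only produces numbers to eyeball; it is not a formal test of either assumption, and the largest-to-smallest variance ratio is an informal summary chosen here for illustration.

```python
import statistics

# Hypothetical scores for two levels of one factor (invented data).
samples = {
    "control": [4.0, 5.0, 6.0],
    "treated": [7.0, 8.0, 9.5],
}

residuals = {}
variances = {}
for level, scores in samples.items():
    sample_mean = statistics.mean(scores)
    # Residual = deviation of each data point from its own sample mean.
    residuals[level] = [y - sample_mean for y in scores]
    variances[level] = statistics.variance(scores)

# Informal look at homogeneity: how different are the sample variances?
variance_ratio = max(variances.values()) / min(variances.values())

for level in samples:
    print(level, [round(r, 3) for r in residuals[level]], round(variances[level], 3))
print("largest/smallest variance:", round(variance_ratio, 3))
```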
One-Way ANOVA in R
Model variation - variation accounted for by factor, measured in squared deviations of sample means from grand mean
Residual variation - error variation not accounted for by factor, measured in squared deviations of observations from sample means (residuals)
Residuals Analysis
4 types of residuals plots (see if conclusions are believable):
- Residuals vs Fitted - visual impression of whether variances are similar for samples
  - Plots residuals against fitted values (the sample means of the 'treatments')
  - Plots show deviation from sample means
  - Checks for homogeneous sample variances
  - Funnel shapes indicate heterogeneity
  - Numbers label the data points with residuals of largest magnitude
- Normal Q-Q - tests for normality of residuals around sample mean
  - Look for data points lying randomly either side of diagonal
  - Data points bowed below = right skew (longer tail on right)
  - Data points bowed above = left skew (longer tail on left)
  - S-shape deviation (above at top and below at bottom) = flatter than normal distribution
  - Z-shape deviation = more peaked than normal
- Scale-Location
  - Useful for regression analysis
  - Uses square root of standardised residuals
  - Tests homogeneity of variances against a measure of residual magnitude
  - Heterogeneity revealed in slope of line of best fit
Residuals Analysis 2 and Summary
- Constant Leverage
  - Useful for regression analysis
  - Similar to residuals vs fitted
  - Tests homogeneity of variances with standardised residuals (residual/SD(residual)) plotted against factor levels instead of factor means
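The quantities behind the Normal Q-Q and Scale-Location plots can be sketched numerically. Everything here is an assumption made for illustration: the residuals are invented, standardisation is the simple residual/SD(residual) form, and the plotting positions (i - 0.5)/n are one common convention among several. R's own plots use a leverage-adjusted standardisation.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical residuals (invented); in practice these come from the fitted model.
residuals = [-1.2, -0.4, -0.1, 0.0, 0.3, 0.5, 0.9]
n = len(residuals)

# Simple standardisation: residual / SD(residual).
sd = statistics.stdev(residuals)
standardised = sorted(r / sd for r in residuals)

# Theoretical normal quantiles at plotting positions (i - 0.5) / n.
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Normal Q-Q: points near the diagonal suggest roughly normal residuals.
qq_points = list(zip(theoretical, standardised))

# Scale-Location: square root of the absolute standardised residuals.
scale_location = [math.sqrt(abs(z)) for z in standardised]

for t, z in qq_points:
    print(round(t, 3), round(z, 3))
```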
t-tests work on 2 levels while one-way ANOVA works on 2+ levels
- Always continuous response on y-axis and categorical factor on x-axis
Always do one global analysis - never do several t-tests on the same data set in place of an ANOVA
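One reason for this rule is that every extra test adds another chance of a false positive. The sketch below is a rough illustration only: it assumes each pairwise t-test is run at alpha = 0.05 and treats the tests as independent, which pairwise tests on the same data are not exactly. The function name is hypothetical.

```python
from math import comb

def familywise_error_rate(levels, alpha=0.05):
    """Return (number of pairwise comparisons, chance of at least one false positive)."""
    comparisons = comb(levels, 2)                  # all pairs of factor levels
    return comparisons, 1 - (1 - alpha) ** comparisons

for levels in (2, 3, 4, 5):
    comparisons, rate = familywise_error_rate(levels)
    print(levels, comparisons, round(rate, 3))
```

With 4 levels there are 6 pairwise comparisons and, under these assumptions, roughly a 26% chance of at least one spurious "significant" result, compared with 5% for a single global test.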