Data-Based Models, How to Analyse Data and Which Test to Use
- Created by: rosieevie
- Created on: 11-01-18 15:27
Data-Based Models
Statistical packages like R work by fitting models to data
- You must specify a model appropriate to the samples and variables under investigation; the package then estimates the parameter values that best fit the data
Standard convention for presenting statistical models - response variable(s) = explanatory variable(s)
- The = sign is a statement of the hypothesised relationship between variables
Chosen statistic quantifies the relationship of response variable to explanatory variables
3 main types of data:
- One variable, one sample - chi-squared, G-test, Kolmogorov-Smirnov
- Two variables, one sample
  - Categorical responses (contingency tables) - chi-squared, G-test for independence
  - Continuous response and predictor - linear regression/correlation
- One or more predictors, two or more samples - ANOVA or GLM
One Variable, One Sample
Test goodness of fit by comparing observed frequencies with expected frequencies
- Chi-squared or G-test of goodness of fit
- For continuous data, use Kolmogorov-Smirnov
Assumptions:
- Data are nominal (not continuous)
- Frequencies are independent from each other
- No cell has expected values <5
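A minimal sketch of the goodness-of-fit calculation, written in plain Python rather than R so that every step is visible. The counts and the 9:3:3:1 expected ratio are hypothetical, and the function name is illustrative only. It sums (observed - expected)^2 / expected over the classes and refuses to run if any expected frequency is below 5, in line with the assumptions above.

```python
def chi_squared_gof(observed, expected):
    """Return the chi-squared statistic: sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < 5 for e in expected):
        # Assumption from the notes: no class may have an expected value < 5
        raise ValueError("an expected frequency is below 5; pool classes first")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 individuals scored into 4 classes,
# with a 9:3:3:1 ratio expected under the null hypothesis.
observed = [60, 20, 15, 5]
total = sum(observed)
expected = [total * part / 16 for part in (9, 3, 3, 1)]

statistic = chi_squared_gof(observed, expected)
print(round(statistic, 3))
```

The statistic is then compared against the chi-squared distribution with the appropriate degrees of freedom; in R the equivalent test is available through `chisq.test`.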
Two Variables, One Sample - Categorical Responses
For data of this kind, look for a dependent relationship between variables
Contingency tables used to look for interaction between variables
- Chi-squared or G-test
- For cells with expected values <5, use Fisher's exact test
Model formula: colour:behaviour ~ response
Assumptions:
- Categorical data
- Frequencies independent
- No cell with expected values <5 (otherwise use Fisher's exact test)
- Apply a correction for continuity (e.g. Yates' correction for 2 x 2 tables)
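A minimal sketch of the contingency-table test in plain Python, assuming a hypothetical 2 x 2 table of counts (rows could be colour, columns behaviour). Expected counts come from row total x column total / grand total. The function also flags whether any expected value falls below 5, which would point towards Fisher's exact test. No continuity correction is applied in this sketch.

```python
def expected_counts(table):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_independence(table):
    """Return (statistic, degrees of freedom, any expected value < 5?)."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    has_small_expected = any(e < 5 for row in expected for e in row)
    return statistic, df, has_small_expected


# Hypothetical counts: 2 colours (rows) x 2 behaviours (columns)
table = [[30, 10], [20, 40]]
statistic, df, has_small_expected = chi_squared_independence(table)
print(round(statistic, 3), df, has_small_expected)
```

In R, `chisq.test` and `fisher.test` cover the same ground.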
Two Variables, One Sample - Continuous Response an
Plot response variable on y-axis and explanatory variable on x-axis
Linear regression should be used
If no clear functional relationship, use correlation to calculate r
Model formula: Response ~ Explanatory
Assumptions:
- Random sampling
- Independent errors
- Homogeneity of variances
- Normal distribution of errors
- Linearity
If variance increases with the response, the homogeneity-of-variances assumption is violated and the data must be transformed (e.g. by taking logs)
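A minimal sketch of what fitting Response ~ Explanatory involves, in plain Python with hypothetical data. It estimates the least-squares slope and intercept and the correlation coefficient r from sums of squares and cross-products. It does not check any of the assumptions listed above, which still need to be examined from plots of the data and residuals.

```python
from math import sqrt


def fit_line(x, y):
    """Return (slope, intercept, r) for a least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson product-moment correlation
    return slope, intercept, r


# Hypothetical data: explanatory variable x, response variable y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r = fit_line(x, y)
print(round(slope, 3), round(intercept, 3), round(r, 3))
```

In R the same model would normally be fitted with `lm(y ~ x)`, and the correlation tested with `cor.test`.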
One-Way Classification of Two+ Samples - 1 Categor
Look for a difference between sample means
With one categorical predictor:
- t-test for two groups
- ANOVA for more than two groups
- Repeated measures ANOVA for repeated measures on subjects
- Transform data that violate assumptions
- Kruskal-Wallis for non-parametric ANOVA
- Mann-Whitney for non-parametric t-test
Assumptions:
- Random sampling
- Independent errors
- Homogeneity of variances
- Normal distribution of errors
Model: Response ~ Explanatory
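A minimal sketch of the one-way ANOVA calculation in plain Python, using three hypothetical samples. It partitions variation into between-sample and within-sample sums of squares and forms F as the ratio of their mean squares. The function name is illustrative, and the assumptions above are not checked.

```python
def one_way_anova(groups):
    """Return (F, test d.f., error d.f.) for a one-way ANOVA."""
    a = len(groups)                   # number of sample means
    n = sum(len(g) for g in groups)   # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of sample means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own sample mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_test = a - 1
    df_error = n - a
    f_ratio = (ss_between / df_test) / (ss_within / df_error)
    return f_ratio, df_test, df_error


# Hypothetical data: three samples of three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_test, df_error = one_way_anova(groups)
print(round(f_ratio, 3), df_test, df_error)
```

F is then compared against the F distribution with the test and error degrees of freedom.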
Selecting and Fitting Models to Data
R offers alternative commands for ANOVA
- aov suits most straightforward analyses with normally distributed residuals
- glm = Generalised Linear Model - accommodates ANOVA on data with inherently non-normal distributions, e.g. proportions (binomial) or frequencies of rare events (Poisson)
One-Way Classification of Two+ Samples - 2 Continu
Look for differences between regression slopes
Use ANOVA combined with regression analysis to test whether the slopes differ
Model formula: Response ~ Explanatory 1 + Explanatory 2 + Explanatory 1:Explanatory 2
Assumptions:
- Random sampling
- Independent errors
- Homogeneity of variances
- Normal distribution of errors
- Linearity
If the regression plot shows the two lines crossing over, there is an interaction between the variables
Two-way Classification of Samples
Look for two-way differences between means
ANOVA, or GLM for non-normal error structures, should be used
Model formula: Response ~ Explanatory 1 + Explanatory 2 + Explanatory 1:Explanatory 2
Assumptions:
- Random sampling
- Independent errors
- Homogeneity of variances
- Normal distribution of errors
If data are unbalanced (samples contain different numbers of observations), use a GLM
Calculating Degrees of Freedom - Chi-squared
Method depends entirely on test statistic
d.f. = no. pieces of information available - no. pieces required to calculate the variation
Chi-squared test:
- Theoretical distributions = n - 2 (usually)
  - n = no. categories for explanatory variable
  - 2 OR no. pieces of information needed to calculate the expected distribution
- Contingency table = (c - 1) x (r - 1)
  - c = no. columns
  - r = no. rows
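The two chi-squared rules above can be sketched as small Python functions. The function and parameter names are illustrative. The default of 2 constraints follows the "n - 2 (usually)" rule in these notes and should be replaced by however many pieces of information were actually needed to calculate the expected distribution.

```python
def df_goodness_of_fit(n_categories, n_constraints=2):
    """d.f. = no. categories - no. pieces of information used for expectations."""
    return n_categories - n_constraints


def df_contingency(n_rows, n_columns):
    """d.f. for a contingency table = (c - 1) x (r - 1)."""
    return (n_columns - 1) * (n_rows - 1)


print(df_goodness_of_fit(6))   # 6 categories, default of 2 constraints
print(df_contingency(2, 3))    # table with 2 rows and 3 columns
```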
Calculating Degrees of Freedom - ANOVA/Linear Regr
ANOVA:
- Test = a - 1
  - a = no. sample means
- Error = n - a
  - n = no. observations
  - a = no. sample means
Linear regression:
- Test = 1
  - 2 (slope and intercept) - 1 (grand mean)
- Error = n - 2
  - n = sample size
  - 2 = slope and intercept
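The ANOVA and linear regression rules above can be sketched the same way in Python. The function names are illustrative. Each returns the test and error degrees of freedom as a pair.

```python
def df_anova(n_observations, n_sample_means):
    """Return (test d.f., error d.f.) for a one-way ANOVA."""
    test_df = n_sample_means - 1
    error_df = n_observations - n_sample_means
    return test_df, error_df


def df_regression(sample_size):
    """Return (test d.f., error d.f.) for a simple linear regression."""
    test_df = 2 - 1               # slope and intercept, minus the grand mean
    error_df = sample_size - 2    # two parameters estimated from the data
    return test_df, error_df


print(df_anova(9, 3))       # 9 observations in 3 samples
print(df_regression(20))    # 20 paired observations
```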
Experimental Theory
- Define test hypothesis
- Identify model components
  - Response
  - Explanatory factor and levels
  - Sampling unit
  - Population samples
- Define model
- Degrees of freedom
- Collect data
- Input to R
- Run model and check assumptions
Meeting Model Assumptions
Always plot data first to check it meets model assumptions
Significance tells nothing about size or precision of effect
For all analyses:
- Significance (p-value) - identifies evidence of pattern
- Effect size (difference between sample means/regression slope) - gives magnitude
- Error bars/coefficient of determination (r²) - gives precision
Shape of pattern depends on parameters
Theoretical mathematical models - used to work out how to transform data
- Use biology of species to help understanding
Once collected, the data suit only one model, even though R can run any model on them
- Each model produces a unique set of results pertinent to particular design
- Only one model will represent experiment design - must know what it is before collecting data
Which Test to Use?
Seek difference between averages of 2+ samples
- Parametric ANOVA
- Parametric t-test for two samples
- Non-parametric Kruskal-Wallis
- Non-parametric Mann-Whitney U for two samples
Identify trends between two continuous variables in 1+ samples
- Parametric regression
- Polynomial regression on non-linear data
- Parametric Pearson product-moment correlation when you are not looking for a regression relationship
- Non-parametric Spearman's rank for correlation
Identify a relation between frequencies in categorical classes of one sample
- Chi-square/G-test on frequencies
- If any expected frequencies are <5, pool classes or use Fisher's exact test