Choice of Tests
- Created by: rosieevie
- Created on: 08-01-18 21:52
The Process of Science
- Observe the world
- Develop theory, run exploratory simulations (develop hypotheses)
- Define statistical models to test theoretical predictions
- Run experiments
- Run statistical models
- Draw conclusions
Science needs breakthroughs at all these levels
Why do we need Statistics?
Need to infer existence of predictable patterns in space/time, based on samples from real world and given assumptions about nature of data
With stats you can interpret biological meaning, identify associations/patterns and infer causes of variation
With careless use of stats you will mess up data, confuse biological understanding, fail to see patterns and identify wrong causes
Data on behaviour, ecology and evolution need statistical treatment because variation occurs and individuals are inherently unpredictable
Simplest way to identify patterns and make predictions is to convincingly reject the null hypothesis that everything is random
Biological Knowledge and Contradictory Issues - TB
How best to reduce the prevalence of TB in cattle - disease hard to identify and causes farmers to kill entire herds to prevent spread
- Badgers sustain endemic infection of TB
- Transmit it to cattle
- 1975-1997 >20,000 badgers were culled on ad-hoc basis as part of British TB control policy = conflict between conservation and farmer groups
Nationwide experiment was completed to establish cause of TB outbreaks
TB and Badgers - 'Dry' Experiments
Simulation of alternatives to reactive culling conducted - determine if best method of prevention was extermination or vaccination of badgers
Vaccination of badgers - predicted to lower regional prevalence of TB in badgers and so reduce herd breakdowns of TB:
- Cheap
- Non-invasive
- No oral vaccine yet available
- Slow to take effect
Reactive culling is already operational:
- Immediate effect
- Popular w/ farmers
- Expensive
- Invasive
- Untested consequences
TB and Badgers - 'Wet' Krebs Experiments
Suggested by John Krebs - later chairman of the Food Standards Agency
Investigate if badgers are reservoir for TB
National-scale field manipulation of badger numbers - test H1 = badgers cause TB in cattle
Reactive and no culling trial areas superimposed over 1998 testing intervals for cattle in areas with different incidences of TB (annual testing throughout the trial)
Experimental manipulation - remove badgers in replicate regions and compare TB incidence to control sites
Model - TB = Region + Treatment + Region*Treatment
Threshold of probability was P<0.05
F-statistic of ANOVA used
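The model above can be written out term by term as a two-way ANOVA with interaction. This is a sketch only: the symbols are assumed notation for illustration, not taken from the trial itself, and the F-ratio shown assumes each effect is tested against the residual mean square.

```latex
\mathrm{TB}_{ijk} = \mu + R_i + T_j + (RT)_{ij} + \varepsilon_{ijk},
\qquad
F = \frac{MS_{\text{effect}}}{MS_{\text{residual}}}
```

Here R_i is the effect of region i, T_j the effect of treatment j (culled or control), (RT)_ij the Region*Treatment interaction and epsilon_ijk the residual variation. An effect is accepted as significant when its F-ratio gives P<0.05.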
TB and Badgers - Results
Significant effect of culling despite differences between regions in TB incidence
Culling increases incidence
- Destabilisation of territorial grounds surrounding culled sites
- Creates vacuum in social groups
- Prompts long-distance movement of surviving badgers
- Outbreaks caused often by individual badgers entering cattle housing - better husbandry techniques required
Choosing Statistical Tests
3 main kinds of test:
- 1 sample of frequencies divided into classes
  - Chi-squared or G-test
  - Test for goodness of fit to theoretical distribution
  - Test for independence between two categories in a contingency table
- 1 sample measuring two variables on continuous scales
  - Correlation or regression
  - Test for association between variables
  - Test for cause and effect between variables
- Two or more samples measuring a continuous response
  - ANOVA
  - Used when taking several samples which allows testing for causes of variation against different alternatives
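As an illustration of the first kind of test, here is a minimal sketch of a chi-squared goodness-of-fit calculation in Python. The counts and the 3:1 expected ratio are hypothetical, chosen only to show the arithmetic. The critical value 3.841 is the standard chi-squared threshold for 1 degree of freedom at P=0.05.

```python
def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical sample: 400 individuals scored into two classes.
observed = [290, 110]
total = sum(observed)

# Theoretical distribution to test against (assumed 3:1 ratio).
expected = [total * 3 / 4, total * 1 / 4]

chi2 = chi_squared_statistic(observed, expected)

# Degrees of freedom = number of classes - 1 = 1.
CRITICAL_DF1_P05 = 3.841
reject_null = chi2 > CRITICAL_DF1_P05

print(f"chi-squared = {chi2:.3f}, reject null at P<0.05: {reject_null}")
```

With these made-up counts the statistic falls below the threshold, so the sample does not differ significantly from the assumed 3:1 distribution.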
Non-Parametric or Parametric Statistics
Non-parametric tests robust - rough but reliable estimate, work on data with unknown underlying distribution
Parametric in preference - more powerful and versatile
Limitations of non-parametric statistics:
- Test hypotheses but don't always give estimates for parameters of interest
- Cannot test two-way interactions, or categorical combined with continuous effects
- Work in different ways, own quirks and no grand scheme
- In situations of moderate complexity - a non-parametric test is not always available
Advantages of parametric statistics:
- More powerful - use actual data values rather than ranks
- Flexible - cope with incomplete data and correlated effects
- Test two-way interactions and categorical combined with continuous effects
- Built around single theme - ANOVA = grand scheme and single framework
Wilcoxon and Fisher
Frank Wilcoxon 1892-1965
- Invented Wilcoxon test
- Contributed to pyrethrin-based insecticide development
- Wanted simple and easier ways to test insecticide effectiveness
- Great human - greatness from diversity
Sir Ronald Fisher 1890-1962
- Invented experimental design and ANOVA
- Prof of Eugenics at UCL
- Fascist tendencies but terrible eyesight prevented him acting on them
- Founded modern synthesis of Mendelian genetics with Darwinian evolution
- Great statistician - greatness from unity
Analysis of Variance
ANOVA uses statistical models, simplest of which is
Y = X + e
Read as: variation along y-axis (Y) is accounted for by (=) variation along x-axis (X) plus (+) residual variation (e)
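The partition in Y = X + e can be sketched numerically as a one-way ANOVA: total variation splits into variation explained by the groups (X) plus residual variation (e). The three groups and their values below are hypothetical, and the variable names are illustrative only.

```python
# Hypothetical response values for three treatment groups.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [10.0, 11.0, 12.0],
}

all_values = [y for values in groups.values() for y in values]
grand_mean = sum(all_values) / len(all_values)

# Total variation in Y: squared deviations from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# Variation accounted for by X: group means around the grand mean.
ss_model = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Residual variation e: observations around their own group mean.
ss_residual = sum(
    (y - sum(values) / len(values)) ** 2
    for values in groups.values()
    for y in values
)

df_model = len(groups) - 1
df_residual = len(all_values) - len(groups)

# F-ratio: explained mean square over residual mean square.
f_ratio = (ss_model / df_model) / (ss_residual / df_residual)

print(f"SS total = {ss_total}, SS model = {ss_model}, SS residual = {ss_residual}")
print(f"F({df_model}, {df_residual}) = {f_ratio}")
```

The two sums of squares add up to the total, which is the sense in which Y is "accounted for by" X plus e. A large F means the groups explain much more variation than is left over in the residual.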
Hypothesis
Use statistics to fit models to data
- Data are sacrosanct (sacred) - never fit data to models
- Do compare alternative models to test competing statistical hypotheses
Reject the null hypothesis (H0) in favour of a test model - need a refutable null model and interesting test model
- Null model = no true pattern
- Hypotheses concern truth, not significance
Accept statistical model only on basis of rejecting simpler alternative w/ some acceptably small probability e.g. P<0.05
"A good hypothesis is a falsifiable hypothesis" - Karl Popper
Science proceeds by falsification of simple hypotheses in favour of more informative alternatives
Data Plotting
Always plot data first!
- Need to see what data looks like - transformation potential
Can check observed data against predicted data
Transforming Data
Transform data if necessary to meet parametric assumptions
- Not cheating
- Only use if it makes sense biologically
Example - growth can be plotted against the inverse of body weight (1/weight) so there is a linear relation
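A small sketch of the inverse transformation, using made-up numbers in which growth is assumed to follow 2 + 10/weight exactly. The data, the relationship and the helper name pearson_r are all hypothetical. The sketch only shows that the correlation becomes linear after transforming weight to 1/weight.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical body weights and growth values (growth = 2 + 10 / weight).
weight = [1.0, 2.0, 4.0, 5.0, 10.0]
growth = [2.0 + 10.0 / w for w in weight]

# Transform the predictor: inverse of body weight.
inverse_weight = [1.0 / w for w in weight]

r_raw = pearson_r(weight, growth)
r_inverse = pearson_r(inverse_weight, growth)

print(f"r (growth vs weight)   = {r_raw:.3f}")
print(f"r (growth vs 1/weight) = {r_inverse:.3f}")
```

On the raw scale the relationship is curved, so the linear correlation is weaker. After the transformation the points fall on a straight line, which is what a parametric test such as regression assumes.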
Use correct tools for job:
- Excel = data management
- R = statistics
- R - language and environment for statistical computing and graphics
Why do we need stats?
- Apply statistics to samples to infer properties of the population distribution of a variable
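A minimal sketch of inferring a population value from a sample: the sample mean estimates the population mean, and the standard error indicates how precise that estimate is. The five measurements are hypothetical.

```python
import math
import statistics

# Hypothetical sample of five measurements drawn from a larger population.
sample = [10.0, 12.0, 14.0, 16.0, 18.0]

# The sample mean is the estimate of the population mean.
sample_mean = statistics.mean(sample)

# Sample standard deviation (uses the n - 1 denominator).
sample_sd = statistics.stdev(sample)

# Standard error of the mean: SD divided by the square root of sample size.
standard_error = sample_sd / math.sqrt(len(sample))

print(f"mean = {sample_mean}, SD = {sample_sd:.3f}, SE = {standard_error:.3f}")
```

A larger sample shrinks the standard error, so the estimate of the population mean becomes more precise.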