Correlation and regression
- Created by: Sam_dearnx
- Created on: 24-01-17 20:09
Dependence and scatter plots
- Two or more variables are often measured on the same individuals.
- What relationship may there be between the variables?
- Visualisation with scatter plots.
- Independent variable on the x-axis
- Dependent variable on the y-axis
- One point per observation (x, y pair)
Relationship between two variables
Do they co-vary or co-relate?
More generally - does y vary as a function of x?
Direction: Do both variables move in the same direction, or in opposite directions?
Strength: How strong is the relationship?
- Examples: Does memory decline with age?
Is birth weight related to IQ?
Co-variance
- To look at the strength of relationship between the two variables
- The degree to which two variables vary together: when x increases, does y tend to increase (positive covariance) or decrease (negative covariance)?
- Sum of the products of the deviations from the means, divided by the degrees of freedom (n - 1).
- Cov(x, y) = sum((x - mean(x)) * (y - mean(y))) / (n - 1)
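The covariance formula can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the `age` and `recall` values are made-up data, not results from any study.

```python
def covariance(x, y):
    """Sample covariance: sum of products of deviations from the means, over n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    products = ((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)

# Hypothetical data: age in years and a recall score for five people
age = [20, 35, 50, 65, 80]
recall = [30, 27, 25, 20, 18]

# Negative value: recall tends to go down as age goes up
print(covariance(age, recall))
```

The sign gives the direction of the relationship, but the size depends on the units of measurement, which is why the raw value is hard to interpret on its own.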
Scaling the co-variance
- Raw covariance can take any value, so it cannot be easily interpreted.
- Its value scales with the standard deviations of x and y, and so with their units.
Solution
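- Scale the covariance by the SDs: divide it by the product of the standard deviations of x and y
- r = Cov(x, y) / (SD(x) * SD(y))
- This yields the correlation coefficient, Pearson's r
- Also called the Pearson product-moment correlation coefficient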
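As a rough sketch of the scaling step, the function below divides the covariance by the product of the two sample standard deviations. The data are hypothetical, and the months conversion is only there to show that changing units changes the covariance but not r.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical data: age in years and a recall score for five people
age_years = [20, 35, 50, 65, 80]
recall = [30, 27, 25, 20, 18]
age_months = [12 * a for a in age_years]  # same ages, different unit

r_years = pearson_r(age_years, recall)
r_months = pearson_r(age_months, recall)
print(round(r_years, 3), round(r_months, 3))  # the two values agree
```

Because the units cancel in the division, r always lies between -1 and 1 and can be compared across different pairs of variables.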
Correlation coefficient
The correlation coefficient is a measure that determines the degree to which two variables' movements are associated.
The range of values for the correlation coefficient (r) is -1.0 to 1.0.
If a calculated correlation is greater than 1.0 or less than -1.0, a mistake has been made.
A correlation of -1.0 indicates a perfect negative correlation: an increase in x is associated with a linear decrease in y.
A correlation of 1.0 indicates a perfect positive correlation: an increase in x is associated with a linear increase in y.
r = 0 means there is no linear relationship between the variables.
Significance testing for correlations
- Distance that r must be from 0 for a significant correlation between X and Y depends on sample size N
- If X and Y are approximately normally distributed, the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t-distribution with (n - 2) degrees of freedom when the true correlation is zero.
- Hence, the probability of observing a correlation at least this far from zero by chance (the p-value) can be calculated as in a t-test.
- Null hypothesis: true correlation is zero
- H0: ρ = 0 (ρ is the true population correlation; r is the sample estimate)
- Calculate the probability of obtaining observed correlation if the true correlation is zero
- If p < alpha (0.05), we reject H0 and conclude that the observed correlation is unlikely to be due to chance alone. Hence we conclude that there is a significant correlation between x and y.
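The test can be sketched as follows, assuming the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2). To stay within the standard library, the two-sided p-value is approximated by numerically integrating the t-distribution density with Simpson's rule; in practice a statistics package would normally do this step. The function names are illustrative, and n = 148 is inferred from the 146 degrees of freedom in the reporting example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def correlation_test(r, n, steps=10_000):
    """Return (t, two-sided p) for H0: the true correlation is zero."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the density between 0 and |t|
    # (steps must be even).
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # By symmetry the central area is 2 * area; the two tails hold the rest.
    p = max(0.0, 1.0 - 2.0 * area)
    return t, p

t, p = correlation_test(r=-0.41, n=148)
print(f"t = {t:.2f}, p < 0.001: {p < 0.001}")
```

The same r gives a smaller p-value as n grows, which is why the distance r must be from 0 to reach significance depends on the sample size.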
Reporting significant correlation results
a) A significant correlation was found:
“There was a significant correlation between age and immediate recall scores on the Hopkins verbal learning test (r(146) = -0.41, p < 0.001).”
Include a scatterplot with the fitted regression line and the same statistical information if possible.
Reporting non-significant correlation results
a) No significant correlation was found (hypothetical values below):
“There was no significant correlation between age and immediate recall scores on the Hopkins verbal learning test (p = 0.26).”
Don’t include a scatterplot, or if you do, do not include a fitted regression line.
The regression line
- Straight line (i.e. ‘linear’ relationship)
- Defines the relationship between x and y
- Add to scatterplots of significant correlations to show the relationship clearly
- Enables us to predict y from x
The regression equation
- Y = A + BX
- B: Slope of the regression line. Change in Y for a one-unit change in X
- A: Intercept. Point where the regression line crosses the y-axis. Predicted value of Y when X = 0
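A minimal sketch of fitting the line, assuming it is estimated by ordinary least squares (the notes do not name the fitting method). Under that assumption the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The data and the function name `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept A and slope B for the line Y = A + B * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx                      # B: change in Y per one-unit change in X
    intercept = mean_y - slope * mean_x    # A: predicted Y when X = 0
    return intercept, slope

# Hypothetical data: age in years and a recall score for five people
age = [20, 35, 50, 65, 80]
recall = [30, 27, 25, 20, 18]

a, b = fit_line(age, recall)
predicted_at_40 = a + b * 40  # use the line to predict y from x
print(round(a, 2), round(b, 3), round(predicted_at_40, 2))
```

Predictions are most trustworthy inside the range of x values that was observed; the intercept at X = 0 may lie well outside it.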
No linear effects
- r = 0 means no evidence of a linear relationship between X and Y, i.e. the points show no upward or downward straight-line trend
- There may still be a non-linear relationship
- Non-linear regression needed, e.g. quadratic fit
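A small constructed example makes the point: below, y is completely determined by x (y = x squared), yet Pearson's r is zero because the relationship is symmetric rather than linear. The data are artificial and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov / (sd_x * sd_y)

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # perfect quadratic dependence of y on x

r_quadratic = pearson_r(x, y)
print(abs(r_quadratic) < 1e-12)  # no linear relationship despite full dependence
```

This is why a scatter plot should be inspected before concluding from r alone that two variables are unrelated.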
Correlation and causation
- A significant correlation (r far from 0) does not imply that a change in x causes a change in y!
- At least two other possibilities:
1) Change in y causes changes in x
2) Both x and y are affected by other variable(s) (spurious correlation)
- Only logic or intervention studies can establish such a causal link, not correlation alone.
- E.g. age and memory: memory can’t drive age
- Partial correlation can correct for underlying variables driving both x and y, if those variables have been measured.
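As an illustration of partial correlation, the sketch below uses the standard first-order formula for the correlation between x and y after removing the linear effect of a single third variable z. The three input correlations are invented numbers, picked so that the x-y correlation is fully explained by z.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with the linear effect of z removed."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical correlations: x and y both depend strongly on z
r_xy, r_xz, r_yz = 0.6, 0.8, 0.75

r_xy_given_z = partial_correlation(r_xy, r_xz, r_yz)
print(abs(r_xy_given_z) < 1e-9)  # the x-y correlation vanishes once z is accounted for
```

A partial correlation near zero suggests the original correlation was driven by the third variable, although it still cannot establish causation by itself.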