Correlation and regression


Dependence and scatter plots

- Two or more variables are often measured on the same individuals.

- What relationship may there be between the variables?

    - Visualisation with scatter plots.

    - Independent variable on the x-axis

    - Dependent variable on the y-axis

    - One point per observation (x, y pair)


Relationship between two variables

Do they co-vary or co-relate?

More generally - does y vary as a function of x?

Direction: Do both variables move in the same direction?

               Do they move in opposite directions?

Strength: How strong is the relationship?

   - Examples: Does memory decline with age?

                     Is birth weight related to IQ? 


Covariance

- Used to look at the direction and strength of the relationship between two variables

- The degree to which two variables vary together: when x increases, does y tend to increase (positive covariance) or decrease (negative covariance)?

- Sum of the products of each observation's deviations from the means of x and y, divided by the degrees of freedom (n - 1).

- Cov(x, y) = sum((x - mean(x)) * (y - mean(y))) / (n - 1)
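The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `covariance` and the age/recall numbers are invented for the example and are not taken from the cards.

```python
def covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # One product of deviations per observation (x, y pair).
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical data, made up purely for illustration.
age = [20, 30, 40, 50, 60]
recall = [28, 27, 24, 22, 19]

cov_age_recall = covariance(age, recall)
print(cov_age_recall)  # negative: recall tends to fall as age rises
```

The sign tells us the direction of the relationship, but the size of the number depends on the units of both variables.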


Scaling the covariance

- Raw covariance can take any value, so it cannot be easily interpreted.

- Its value scales with the standard deviations (and therefore the units) of the two variables.

Solution

- Scale the covariance by the standard deviations (SD) of both variables: r = Cov(x, y) / (SD(x) * SD(y))

- This yields the correlation coefficient, Pearson's r

 - Pearson product-moment correlation coefficient
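As a rough sketch of the scaling step, the snippet below divides the covariance by the product of the two sample standard deviations. The function name `pearson_r` and the data are hypothetical; the second call rescales one variable to show that r, unlike the raw covariance, does not depend on the units.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the two SDs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return cov / (sd_x * sd_y)


# Hypothetical data, made up purely for illustration.
age = [20, 30, 40, 50, 60]
recall = [28, 27, 24, 22, 19]
recall_rescaled = [10 * score for score in recall]  # same data, different units

r = pearson_r(age, recall)
r_rescaled = pearson_r(age, recall_rescaled)
print(r, r_rescaled)  # the two values agree up to rounding error
```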

Correlation coefficient

The correlation coefficient measures the degree to which two variables are linearly associated.

The range of values for the correlation coefficient (r) is -1.0 to 1.0.

If a calculated correlation is greater than 1.0 or less than -1.0, a mistake has been made.

A correlation of -1.0 indicates a perfect negative correlation: an increase in x is associated with a linear decrease in y.

A correlation of 1.0 indicates a perfect positive correlation: an increase in x is associated with a linear increase in y.

r = 0 means there is no linear relationship between the variables.


Significance testing for correlations

-          Distance that r must be from 0 for a significant correlation between X and Y depends on sample size N

-          If X and Y are approximately normally distributed and the true correlation is zero, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t-distribution with (n - 2) degrees of freedom.

-          Hence, a p-value for the observed correlation can be calculated in the same way as for a t-test.

-          Null hypothesis: true correlation is zero

-          H0: r = 0

-          Calculate the probability of obtaining a correlation at least as far from zero as the one observed if the true correlation is zero

-          If p < alpha (0.05), we reject H0 and conclude that the observed correlation is unlikely to be due to chance alone. Hence we conclude that there is a significant correlation between x and y.
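The test can be sketched as follows, assuming the standard statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The helper names are invented for this example, and the p-value is approximated by numerically integrating the t density with Simpson's rule; in practice a statistics package would normally be used instead. The example values r = -0.41 and df = 146 (so n = 148) are the ones quoted in the reporting card.

```python
import math


def t_from_r(r, n):
    """t statistic for H0: true correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))


def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value; steps must be even (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # probability between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


r, n = -0.41, 148          # values quoted in the reporting example
t_stat = t_from_r(r, n)
p_value = two_sided_p(t_stat, n - 2)
significant = p_value < 0.05
print(t_stat, p_value, significant)
```

Because the sample size enters through sqrt(n - 2), the same r gives a larger t (and a smaller p) in a bigger sample, which is why the distance r must be from 0 depends on N.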


Reporting significant correlation results

a)       A significant correlation was found:

“There was a significant correlation between age and immediate recall scores on the Hopkins verbal learning test (r(146) = -0.41, p < 0.001).”

Include a scatterplot with a fitted regression line and the same statistical information if possible.


Reporting non-significant correlation results

a)       No significant correlation was found (hypothetical values below):

“There was no significant correlation between age and immediate recall scores on the Hopkins verbal learning test (p = 0.26).”

Don’t include a scatterplot, or if you do, do not include a fitted regression line.


The regression line

-Straight line (i.e. ‘linear’ relationship)

-Defines the relationship between x and y 

-Add to scatterplots of significant correlations to show the relationship clearly

-Enables us to predict y from x 

The regression equation

-Y = A + BX

-B: Slope of the regression line. Change in Y with a one-unit change in X

-A: Intercept. Point where the regression line crosses the y-axis. Predicted value of Y when X = 0
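A minimal sketch of fitting and using the line Y = A + BX is given below. It assumes an ordinary least-squares fit, which the cards do not state explicitly; under that assumption the slope is B = Cov(x, y) / Var(x) and the intercept is A = mean(y) - B * mean(x). The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept A and slope B for the line Y = A + B * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a slope when all x values are equal")
    b = sxy / sxx             # change in Y per one-unit change in X
    a = mean_y - b * mean_x   # predicted Y when X = 0
    return a, b


def predict(a, b, x):
    """Predicted value of Y for a given X."""
    return a + b * x


# Hypothetical data, made up purely for illustration.
age = [20, 30, 40, 50, 60]
recall = [28, 27, 24, 22, 19]

a, b = fit_line(age, recall)
predicted_at_45 = predict(a, b, 45)
print(a, b, predicted_at_45)
```

Predictions are most trustworthy inside the range of x values that was actually observed; the intercept at X = 0 may lie well outside it.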


No linear effects

-          r = 0 means no evidence of a linear relationship between X and Y, i.e. the points do not follow a sloped straight line

-          There may still be a non-linear relationship

-          In that case non-linear regression is needed, e.g. a quadratic fit
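To illustrate the point, the sketch below builds a small invented data set in which y depends perfectly on x through y = x^2, yet Pearson's r is zero because the relationship is symmetric rather than linear. The helper `pearson_r` is written out here only so that the example is self-contained.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: y is completely determined by x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_quadratic = pearson_r(x, y)
print(r_quadratic)  # no linear association despite a perfect quadratic one
```

A scatter plot would reveal the U-shape immediately, which is one reason to plot the data before relying on r.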


Correlation and causation

-          A significant correlation (r far from 0) does not imply that a change in x causes a change in y!

-          At least two other possibilities:

1)      Change in y causes changes in x

2)      Both x and y are affected by other variable(s) (spurious correlation)

-          Only logic or intervention studies can establish such a causal link, not correlation alone.

    - E.g. age and memory: memory can’t drive age

-          Partial correlation can be used to correct for underlying variables that drive both x and y, provided those variables were measured.
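One common way to compute a partial correlation between x and y while controlling for a single third variable z is the first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The sketch below applies it to invented data in which x and y are both built from z plus unrelated noise; the function names and numbers are hypothetical and not taken from the cards.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented data: z drives both x and y; the added noise terms are unrelated.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 4, 3, 6, 5, 8, 7]   # z plus alternating +1 / -1
y = [2, 3, 2, 3, 6, 7, 6, 7]   # z plus a different +1 / -1 pattern

raw_r = pearson_r(x, y)
adjusted_r = partial_r(x, y, z)
print(raw_r, adjusted_r)  # strong raw correlation, weak once z is controlled
```

Here the raw correlation between x and y is large only because both follow z, which is the spurious-correlation case described above.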
