# PSYC214 - Weeks 11-17 - Correlations and Regressions

- Correlation and partial correlation
- Linear regression
- Multiple regression
- Hierarchical and Stepwise regression
- Interactions and Polynomial regression analyses
- Logistic Regression
- Rank Data

- Created by: Collette Richardson
- Created on: 26-04-12 14:02

## Correlation and Partial Correlation

Correlations: how two variables are related.

Scattergrams plot two values for each case in the analysis as a single point located from the two axes.

**Z scores:** standardising scores makes it easier to directly compare the values of one variable with those of another.

After standardising, mean = 0 and standard deviation = 1.

**Pearson Product Moment Correlation:** $r_{xy} = \frac{\sum (z_x z_y)}{n - 1}$

**r²**= proportion of variance in 1 variable that can be predicted if the other is known

**df** = n - 2 (n = number of pairs of data)
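As a rough illustration of the z-score formula for r, here is a minimal Python sketch. The data (hours revised and marks for five students) and the function names are made up for the example, and the z scores are assumed to use the sample standard deviation (dividing by n - 1), which is what makes the formula give the usual value of r.

```python
from math import sqrt

def z_scores(values):
    """Standardise values: subtract the mean, divide by the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """r = sum of (zx * zy) divided by (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical data: hours revised and exam mark for five students
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]

r = pearson_r(hours, marks)   # about 0.77
r_squared = r ** 2            # about 0.60: proportion of variance predicted
df = len(hours) - 2           # 3
```

With these numbers r² is about 0.60, so roughly 60% of the variance in marks could be predicted from hours revised.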

## Correlation and Partial Correlation: Inflation and Suppression

**Inflation:** a correlation is inflated when covarying variables contribute to the dependent variable in the same direction. Correlations are also inflated:

- by outliers with very high scores on both variables
- if only high and low scores are analysed

**Suppression:** two variables covary and both contribute to the dependent variable, but they do so in opposite directions

*Inflation and suppression are features of how variables interact in the world*

## Partial Correlation

*Used when experimental control is not possible: statistical techniques remove the effect of one or more variables from the correlation, clarifying the underlying relationship.*

**df** = n - 2, minus a further 1 for every variable removed (e.g. n - 3 when one variable is partialled out)
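The notes do not give a formula for the partial correlation, so the sketch below assumes the standard first-order formula for removing one variable z from the correlation between x and y. The three correlations passed in are invented values, and the function names are only for illustration.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with the effect of z removed (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_df(n, removed=1):
    """df = n - 2, minus 1 for every variable removed."""
    return n - 2 - removed

# Hypothetical correlations: x and y both correlate with a third variable z
r_partial = partial_r(r_xy=0.5, r_xz=0.6, r_yz=0.7)   # about 0.14
df = partial_df(n=30)                                  # 27
```

Here an apparent correlation of 0.5 shrinks to about 0.14 once z is partialled out, which is the kind of underlying relationship the technique is meant to clarify.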

## Linear Regression

*Predicts scores on one variable from the other.* It fits a line to the data points for predicting y from x.

*Calculates the relationship between the y and the x values.*

Formula for line of best fit: **y = mx + c**

- **m** is the slope: **m = change in y / change in x**
- **c** is the **constant** (the value of y where the line crosses the y axis)

Linear regression formula: **y' = mx + c**

**y'** = the predicted value of y

Method of least squares: fits the best straight line to the data by finding the line with the lowest sum of squared deviations of the y scores from the line.
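A minimal sketch of the least squares calculation for a single predictor, using the usual closed-form solution for the slope and constant. The data and the function name are hypothetical.

```python
def least_squares(x, y):
    """Return slope m and constant c of the least squares line y' = mx + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-deviation of x and y divided by the squared deviation of x
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    # The line always passes through the point (mean of x, mean of y)
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

m, c = least_squares(x, y)   # m is about 0.6, c is about 2.2
predicted = m * 6 + c        # y' for x = 6, about 5.8
```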

Straight line: linear regression

Curved line: polynomial regression

## Multiple Regression

Best prediction of a dependent variable from a set of independent variables.

- estimate the relative importance of your variables
- control for some variables
- predict scores combining variables

Predicts the dependent variable y using more than one independent variable.

**Multiple R:** correlation between the values predicted by the multiple regression equation and the actual values of the dependent variable

For the best estimate use as few variables as possible and as many cases as possible.

**R²:** the proportion of variance in the dependent variable predicted by the equation

**df =** k for the regression and n - k - 1 for the residual, where n = number of cases and k = number of independent variables

**Adjusted R²:** estimate of the R² that would be obtained if the same regression equation were used for a new set of participants; lower than R²
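The notes do not say how adjusted R² is calculated. The sketch below assumes the commonly used adjustment, 1 - (1 - R²)(n - 1)/(n - k - 1), with n cases and k independent variables; the input values are invented.

```python
def adjusted_r_squared(r_squared, n, k):
    """Shrink R² to allow for the number of predictors (k) relative to cases (n)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical result: R² = 0.50 from 3 predictors
adj_many_cases = adjusted_r_squared(0.50, n=50, k=3)   # about 0.467
adj_few_cases = adjusted_r_squared(0.50, n=10, k=3)    # about 0.25
```

The same R² is shrunk far more with 10 cases than with 50, which is one way to see why few variables and many cases give the best estimate.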

## Hierarchical Regression

- can control for variables distorting the results
- one or more variables added to the regression at a time

Once the control variables have been added, the effect of the remaining variables can be evaluated.

Stage 1: select the dependent variable, then the *independent variables to control*

Stage 2: bring in the variables you are studying

**R² change:** significant if the additional variables significantly improve the prediction
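Whether an R² change is significant is usually judged with an F test. The sketch below assumes the standard F-change formula for comparing the stage 1 and stage 2 models; the R² values, sample size and function name are hypothetical, and the resulting F would still need to be checked against a critical value or p-value.

```python
def f_change(r2_stage1, r2_stage2, k_stage1, k_stage2, n):
    """F statistic for the R² change when extra variables are added at stage 2."""
    df_change = k_stage2 - k_stage1      # number of variables added
    df_residual = n - k_stage2 - 1       # residual df of the stage 2 model
    f = ((r2_stage2 - r2_stage1) / df_change) / ((1 - r2_stage2) / df_residual)
    return f, df_change, df_residual

# Hypothetical: 2 control variables give R² = 0.20,
# adding 2 study variables raises it to 0.35, with 100 cases
f, df1, df2 = f_change(0.20, 0.35, k_stage1=2, k_stage2=4, n=100)
# f is about 10.96 on (2, 95) degrees of freedom
```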

## Stepwise Regression

Allows you to identify the minimum set of independent variables that together significantly predict the dependent variable.

- variables are entered into the regression one by one
- the program selects the ordering of the variables
- 1st variable: the one with the highest correlation with the dependent variable
- once a variable is selected, its effect is semi-partialled out
- if variables are not significantly contributing to the regression, their effects are removed

Drawbacks:

- gives a minimal set of predictors, but the underlying relationships may be more complicated
- do not assume causal relationships; you can get chance effects

## Interactions

**Interactions:** effect of one variable may be different at different levels of the others

Significant: if the R² change is significant when the product of the variables is entered into the regression

**Standard Error of Mean:** standard deviation of the distribution of sample means drawn from the population you sampled

- indicates the accuracy of your findings
- low SEMs are desirable
- standard errors are usually high; use z scores to lower them
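In practice the standard error of the mean is estimated from a single sample. The sketch below assumes the usual estimate, sample standard deviation divided by the square root of n; the scores are made up.

```python
from math import sqrt
from statistics import stdev

def standard_error_of_mean(sample):
    """Estimate the SEM as the sample SD divided by the square root of n."""
    return stdev(sample) / sqrt(len(sample))

# Hypothetical scores from 8 participants
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sem = standard_error_of_mean(scores)   # about 0.76
```

Because n is in the denominator, larger samples give lower SEMs, all else being equal.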

**SE(R²):** standard deviation of the distribution of R²

**SE(B):** standard deviation of the distribution of B

## Polynomial Regression Analyses

Find the best fit to curved data as well as straight line (linear) fits.

Quadratic equation: **y = m₂x² + m₁x + c**

Cubic equation: **y = m₃x³ + m₂x² + m₁x + c**

Each power of x has its own coefficient (m₁, m₂, m₃).

For a data set, first try to fit a linear equation.

If this is not significant, a quadratic equation may be a better fit, and so on.
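The sketch below fits a linear and then a quadratic equation to the same made-up data by least squares, solving the normal equations with a small Gaussian elimination routine. The helper names are hypothetical and the data were chosen to lie exactly on a curve. It only compares the fitted coefficients; it does not carry out the significance test.

```python
def solve_linear_system(a, b):
    """Solve a set of simultaneous equations by Gaussian elimination."""
    n = len(b)
    # Augmented matrix so row operations act on both sides at once
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    # Back-substitution from the last equation upwards
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def fit_polynomial(x, y, degree):
    """Least squares coefficients [c, m1, m2, ...] for y = c + m1*x + m2*x^2 + ..."""
    size = degree + 1
    # Normal equations: sums of powers of x on the left, sums of x^i * y on the right
    a = [[sum(v ** (i + j) for v in x) for j in range(size)] for i in range(size)]
    b = [sum((v ** i) * w for v, w in zip(x, y)) for i in range(size)]
    return solve_linear_system(a, b)

# Hypothetical U-shaped data (exactly y = 2x^2 + 1)
x = [-2, -1, 0, 1, 2]
y = [9, 3, 1, 3, 9]

linear = fit_polynomial(x, y, 1)      # [5.0, 0.0]: a flat line, slope 0
quadratic = fit_polynomial(x, y, 2)   # [1.0, 0.0, 2.0]: c = 1, m1 = 0, m2 = 2
```

The linear fit finds a slope of 0 because the upward and downward trends cancel out, while the quadratic fit recovers the curve.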

## Logistic Regression

Combines categorical data analysis with ANOVA. To calculate expected frequencies in categorical data, multiplication is used, whereas addition is used in ANOVA. Converting to log scores allows the additive process to be used.

- can be extended to predict dependent variables with 3+ categories
- used for modelling binary outcome measures
- the outcome is regressed onto explanatory measures, which can be continuous or categorical

**Proportions and Odds**

**Proportion:** divide each score by the column total

**Odds of passing in 1 category:** divide the pass proportion by the fail proportion

**Odds ratio** (a measure of effect size): the odds of passing in group 1 divided by the odds of passing in group 2
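A short sketch of these three calculations for a pass/fail table with two groups. The counts and function names are invented for illustration.

```python
def proportions(passed, failed):
    """Divide each count by the column (group) total."""
    total = passed + failed
    return passed / total, failed / total

def odds(passed, failed):
    """Odds of passing: pass proportion divided by fail proportion."""
    p_pass, p_fail = proportions(passed, failed)
    return p_pass / p_fail

def odds_ratio(group1, group2):
    """Odds of passing in group 1 divided by odds of passing in group 2."""
    return odds(*group1) / odds(*group2)

# Hypothetical (pass, fail) counts for two groups
group1 = (30, 10)   # proportions 0.75 and 0.25, odds of passing = 3
group2 = (20, 20)   # proportions 0.50 and 0.50, odds of passing = 1

ratio = odds_ratio(group1, group2)   # 3.0
```

An odds ratio of 3 means the odds of passing are three times higher in group 1 than in group 2; a ratio of 1 would mean no difference.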

## Logistic Regression

Problems with the linear regression of a binary variable:

- the ends of the regression line run to minus and plus infinity, but a binary outcome cannot
- breaches the homoscedasticity assumption

Logit model:

- converts the bounded binary outcome to an unbounded scale, so a linear model can be fitted
- smooths out asymmetry
- turns the straight line into a curvilinear function
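The logit is the log of the odds, and its inverse turns any value on the linear scale back into a probability between 0 and 1. The sketch below shows both directions; the function names and the example probabilities are only for illustration.

```python
from math import exp, log

def logit(p):
    """Log odds of a probability p (0 < p < 1): unbounded in both directions."""
    return log(p / (1 - p))

def inverse_logit(value):
    """Convert a value on the linear (log odds) scale back to a probability."""
    return 1 / (1 + exp(-value))

even_chance = logit(0.5)       # 0.0: odds of 1 give log odds of 0
likely = logit(0.75)           # about 1.10: odds of 3
very_low = inverse_logit(-5)   # about 0.007: close to 0 but never below it
very_high = inverse_logit(5)   # about 0.993: close to 1 but never above it
```

However extreme the value on the linear scale, the predicted probability stays between 0 and 1, giving the S-shaped curve rather than a straight line.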

## Statistics Using Ranks

Ranking orders the scores from lowest to highest so that they can be compared.

Why rank data?

- Data are ordinal
- Cannot assume normal distribution

Types of rank order tests:

- 1 sample
  - between: Kolmogorov-Smirnov
  - within: Wilcoxon signed-ranks
- 2 sample
  - between: Mann-Whitney, Wilcoxon rank-sum
  - within: Wilcoxon signed-ranks
- K sample
  - between: Kruskal-Wallis
  - within: Friedman's ANOVA
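All of these tests start by converting scores to ranks. A minimal sketch of ranking from lowest to highest is given below, assuming the common convention that tied scores share the mean of the ranks they occupy. The scores and function name are hypothetical.

```python
def rank_scores(scores):
    """Rank scores from lowest (1) to highest; tied scores get the mean of their ranks."""
    ordered = sorted(scores)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1    # first rank position this value occupies
        count = ordered.count(value)        # how many scores are tied on this value
        ranks[value] = first + (count - 1) / 2
    return [ranks[s] for s in scores]

# Hypothetical scores with one tie
scores = [12, 15, 15, 9, 20]
ranks = rank_scores(scores)   # [2.0, 3.5, 3.5, 1.0, 5.0]
```

The two scores of 15 occupy ranks 3 and 4, so each is given 3.5.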
