# PSYC214 - Weeks 11-17 - Correlations and Regressions

• Correlation and partial correlation
• Linear regression
• Multiple regression
• Hierarchical and Stepwise regression
• Interactions and Polynomial regression analyses
• Logistic Regression
• Rank Data

## Correlation and Partial Correlation

Correlations: how two variables are related.
Scattergrams plot two values for each case in the analysis as a single point located from the two axes.

Z scores: standardised scores that make it easier to directly compare the values of one variable with another
Mean = 0, Standard deviation = 1.

Pearson Product Moment Correlation: rxy = Σ(zx × zy) / (n - 1)
r² = proportion of variance in one variable that can be predicted if the other is known
df = n - 2 (n = number of pairs of data)
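
As a rough sketch of the z-score route to Pearson's r, the calculation can be written out in a few lines of Python. The data (`hours`, `marks`) are made up for illustration and the function names are hypothetical, not from the course materials.

```python
import math

def z_scores(values):
    """Standardise values to mean 0 and SD 1 (sample SD, dividing by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Sum of the z-score products divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Made-up illustrative data: hours revised and exam mark for 5 people.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]

r = pearson_r(hours, marks)
r_squared = r ** 2        # proportion of variance shared by the two variables
df = len(hours) - 2       # degrees of freedom for testing r

print(round(r, 3), round(r_squared, 3), df)
```

Squaring r is what turns the correlation into a proportion of variance explained.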


## Correlation and Partial Correlation: Inflation and Suppression

Inflation: correlations will be inflated if they correlate in the same direction

• inflated by outliers with very high scores of both variables
• if only high and low scores are analysed

Suppression: two variables covary and both contribute to the dependent variable, but they do so in opposite directions

Inflation and suppression are features of how variables interact in the world


## Partial Correlation

Used when experimental control is not possible. Statistical techniques remove the effect of one or more variables from the correlation, clarifying the underlying relationships.

df = n - 2, minus 1 for every variable removed
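
A minimal sketch of removing one variable from a correlation, assuming the standard first-order partial correlation formula (which these notes do not state). The three correlations and the sample size are hypothetical values chosen for illustration.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with the effect of z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations between three variables, from n = 30 cases.
r_xy, r_xz, r_yz = 0.5, 0.6, 0.7
n = 30

r_partial = partial_r(r_xy, r_xz, r_yz)
df = n - 2 - 1            # one further df lost for the variable removed

print(round(r_partial, 3), df)
```

Here the x-y correlation shrinks sharply once z is removed, which suggests z was carrying most of the apparent relationship.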


## Linear Regression

Predicts scores on one variable from the other.
It fits a line to the data points, for predicting y from x.

Calculates the relationship between the y and the x values.

Formula for line of best fit: y = mx + c
m is the slope (change in y divided by change in x); c is the constant (the value of y when x = 0)

Linear regression formula: y' = mx + c
y' = the predicted value of y

Method of least squares: fits the best straight line to the data by finding the line with the lowest sum of squared deviations of the y scores from the line.
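
The least squares idea can be sketched directly, assuming the usual closed-form solution for one predictor. The data are made up and `least_squares` is a hypothetical helper name.

```python
def least_squares(x, y):
    """Return slope m and constant c of the least squares line y' = mx + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of x around their means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

m, c = least_squares(x, y)
predicted = [m * a + c for a in x]                       # the y' values
sum_sq_dev = sum((b - p) ** 2 for b, p in zip(y, predicted))

print(round(m, 3), round(c, 3), round(sum_sq_dev, 3))
```

Any other straight line through these points would give a larger `sum_sq_dev`.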

Straight line: linear regression
Curved line: polynomial regression


## Multiple Regression

Best prediction of a dependent variable from a set of independent variables.

• estimate the relative importance of your variables
• control for some variables
• predict scores combining variables

Predicts the dependent variable y, using more than one independent variable

Multiple R: correlation between the values predicted by the multiple regression equation and the actual values of the dependent variable
For the best estimate use as few variables as possible & as many cases as possible.

R²: the proportion of variance in the dependent variable predicted by the equation

df = n - k - 1, where n = number of cases, k = number of independent variables

Adjusted R²: estimate of the R² that would be obtained if the same regression formula were used for a new set of participants; lower than R²
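
A sketch of how R², Multiple R and adjusted R² relate, under two assumptions not stated in the notes: the two-predictor shortcut for R² from correlations, and the commonly used adjustment formula 1 - (1 - R²)(n - 1)/(n - k - 1). All correlations and the sample size are hypothetical.

```python
import math

def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R² for two predictors, from their correlations with y and each other."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

def adjusted_r_squared(r2, n, k):
    """Shrink R² according to the number of cases n and predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values: each predictor's correlation with y, and with each other.
r_y1, r_y2, r_12 = 0.5, 0.4, 0.3
n, k = 50, 2

r2 = r_squared_two_predictors(r_y1, r_y2, r_12)
multiple_r = math.sqrt(r2)
adj_r2 = adjusted_r_squared(r2, n, k)

print(round(r2, 3), round(multiple_r, 3), round(adj_r2, 3))
```

Adding predictors or cutting cases widens the gap between R² and adjusted R², which is why few variables and many cases give the best estimate.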


## Hierarchical Regression

• can control for variables distorting the results
• one or more variables added to the regression at a time
once added and controlled for, the effect of the remaining variables can be evaluated

Stage 1: select the dependent variable, then the independent variable(s) to control for
Stage 2: bring in the variables you are studying

R² change: significant if the additional variables significantly improve the prediction
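
One common way to test an R² change is an F ratio comparing the stage 1 and stage 2 models. The sketch below assumes that standard formula (it is not given in these notes) and uses hypothetical R² values. The resulting F would then be checked against F tables or software for the two df values.

```python
def r2_change_f(r2_stage1, r2_stage2, n, k_stage2, k_added):
    """F ratio for the R² gained by adding k_added variables at stage 2."""
    r2_change = r2_stage2 - r2_stage1
    df_num = k_added                 # variables added at stage 2
    df_den = n - k_stage2 - 1        # residual df of the stage 2 model
    f = (r2_change / df_num) / ((1 - r2_stage2) / df_den)
    return r2_change, f, df_num, df_den

# Hypothetical example: one control variable at stage 1 (R² = .20),
# one variable of interest added at stage 2 (R² = .35), 63 cases.
r2_change, f, df1, df2 = r2_change_f(0.20, 0.35, n=63, k_stage2=2, k_added=1)

print(round(r2_change, 2), round(f, 2), df1, df2)
```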


## Stepwise Regression

Allows you to identify the minimum set of independent variables that together significantly predict the dependent variable.

• variables are entered into the regression one by one
• the program selects the order of the variables

1st variable: the one with the highest correlation with the dependent variable
once selected, its effect is semi-partialled out before the next variable is chosen

If variables are not significantly contributing to the regression, they are removed.

• -ve: gives a minimal set of predictors, but underlying relationships may be more complicated
• -ve: does not establish causal relationships, and you can get chance effects

## Interactions

Interactions: the effect of one variable may be different at different levels of the other
Significant: if the R² change is significant when the product of the two variables is entered

Standard Error of Mean: standard deviation of the distribution of sample means drawn from the population you sampled

• indicate accuracy of your findings
• low SEMs are desirable
• standard errors are usually high for interaction terms; use z scores to lower them

SE(R²): standard deviation of the distribution of R²
SE(B): standard deviation of the distribution of B
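
For the standard error of the mean, the usual estimate from a single sample is the sample standard deviation divided by the square root of n. A small sketch with made-up scores:

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Sample SD (n - 1 denominator) divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Made-up illustrative scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sem = standard_error_of_mean(scores)
# The same scores observed twice over: a larger sample gives a lower SEM.
sem_larger = standard_error_of_mean(scores * 2)

print(round(sem, 3), round(sem_larger, 3))
```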


## Polynomial Regression Analyses

Find the best fit to curved data as well as straight line (linear) fits.

Quadratic equation: y = m₂x² + m₁x + c
Cubic equation: y = m₃x³ + m₂x² + m₁x + c
(each power of x has its own coefficient)

First try to fit a linear equation to the data.
If this is not significant, a quadratic equation may be a better fit, and so on.
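
A sketch of trying a linear fit first and then a quadratic, using plain least squares on powers of x. The data are made up so that y follows x² exactly. The helpers `solve`, `poly_fit` and `r_squared` are hypothetical names, and comparing R² values stands in for a proper significance test.

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def poly_fit(x, y, degree):
    """Least squares coefficients [c, m1, m2, ...] for powers 0..degree of x."""
    powers = range(degree + 1)
    a = [[sum(xi ** (i + j) for xi in x) for j in powers] for i in powers]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in powers]
    return solve(a, b)

def r_squared(x, y, coefs):
    """Proportion of variance in y accounted for by the fitted polynomial."""
    mean_y = sum(y) / len(y)
    pred = [sum(c * xi ** p for p, c in enumerate(coefs)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up curved data (y = x² exactly).
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]

linear = poly_fit(x, y, 1)
quadratic = poly_fit(x, y, 2)
r2_linear = r_squared(x, y, linear)
r2_quadratic = r_squared(x, y, quadratic)

print(round(r2_linear, 3), round(r2_quadratic, 3))
```

The straight line explains none of the variance here, while the quadratic fits the curve.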


## Logistic Regression

Combines categorical data analysis with ANOVA. To calculate expected frequencies in categorical data multiplication is used, whereas addition is used in ANOVA. Conversion to log-scores allows the additive process.

• can be used to predict dependent variables with 3+ categories
• used for modelling binary outcome measures
• regressed onto explanatory measures: continuous and categorical

Proportions and Odds

Proportion: divide each score by the column total
Odds of passing in one category: divide the pass proportion by the fail proportion
Odds ratio (a measure of effect size): the odds of passing in group 1 divided by the odds of passing in group 2
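
The three steps (proportions, odds, odds ratio) can be sketched with a hypothetical pass/fail table. The counts and group names are made up for illustration.

```python
# Hypothetical pass/fail counts; each group is one column of the table.
counts = {
    "group_1": {"pass": 30, "fail": 10},
    "group_2": {"pass": 20, "fail": 20},
}

def odds_of_passing(group):
    """Pass proportion divided by fail proportion within one group."""
    total = group["pass"] + group["fail"]       # the column total
    p_pass = group["pass"] / total
    p_fail = group["fail"] / total
    return p_pass / p_fail

odds_1 = odds_of_passing(counts["group_1"])
odds_2 = odds_of_passing(counts["group_2"])
odds_ratio = odds_1 / odds_2

print(odds_1, odds_2, odds_ratio)
```

An odds ratio of 1 would mean passing is equally likely in both groups. Values further from 1 indicate a larger effect.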


## Logistic Regression

Problems with the linear regression of a binary variable:

• the ends of the regression line run to minus and plus infinity, outside the 0-1 range of a binary outcome
• breaches the homoscedasticity assumption

Logit model:

• converts the bounded 0-1 outcome into unbounded log-odds, so predicted probabilities stay between 0 and 1
• smooths out asymmetry
• forces a straight line into a function that is curvilinear (S-shaped)
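
A small sketch of the logit idea: a straight line is fitted on the log-odds scale and converted back to a probability, which gives an S-shaped curve that never leaves the 0-1 range. The coefficients `m` and `c` are hypothetical, not fitted to any data.

```python
import math

def logit(p):
    """Log-odds of a probability: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Convert a log-odds value back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical slope and constant on the log-odds scale.
m, c = 0.8, -4.0

# Predicted probability of the outcome for x = 0, 1, ..., 10.
predicted = [inverse_logit(m * x + c) for x in range(11)]

print([round(p, 2) for p in predicted])
```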

## Statistics Using Ranks

Rank scoring: scores are placed in order from lowest to highest, and the analysis uses the ranks rather than the raw values.

Why rank data?

• Data are ordinal
• Cannot assume normal distribution

Types of rank order tests:

• 1 sample
  • between: Kolmogorov-Smirnov
  • within: Wilcoxon signed ranks
• 2 sample
  • between: Mann-Whitney, Wilcoxon rank-sum
  • within: Wilcoxon signed ranks
• K sample
  • between: Kruskal-Wallis
  • within: Friedman's ANOVA
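
To show how a rank-based test works, here is a rough sketch of the Mann-Whitney U statistic for two independent groups, assuming the usual rank-sum definition of U. The scores are made up and the helper names are hypothetical. The significance of U would still need to be looked up in tables or software.

```python
def average_ranks(values):
    """Rank from lowest (1) to highest; tied scores share their mean rank."""
    ordered = sorted(values)
    # First 0-based position i and tie count k give a mean rank of (2i + k + 1) / 2.
    return [(2 * ordered.index(v) + ordered.count(v) + 1) / 2 for v in values]

def mann_whitney_u(group_a, group_b):
    """Smaller of the two U values, from ranks of the pooled scores."""
    ranks = average_ranks(group_a + group_b)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Made-up scores for two independent groups.
group_a = [3, 5, 6, 9]
group_b = [7, 8, 10, 12]

u = mann_whitney_u(group_a, group_b)
print(u)
```

A small U means the two groups' ranks barely overlap.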