Correlation and Partial Correlation
Correlations: how two variables are related.
Scattergrams plot the two values for each case in the analysis as a single point positioned using the two axes.
Z scores: make it easier to directly compare the values of one variable with another
Mean = 0, Standard deviation = 1.
Pearson Product Moment Correlation: rxy = Σ(zx × zy) / (n - 1)
r² = proportion of variance in one variable that can be predicted if the other is known
df = n - 2 (n = number of pairs of data)
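A minimal Python sketch (numpy/scipy, made-up scores) of the z-score route to r, together with r² and df:

```python
import numpy as np
from scipy import stats

# Illustrative (hypothetical) data: two variables measured on the same cases.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([1.0, 3.0, 6.0, 6.0, 8.0, 11.0])
n = len(x)

# z scores: mean 0, standard deviation 1 (sample SD, ddof=1).
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# Pearson r from the z-score definition: r = sum(zx * zy) / (n - 1).
r = np.sum(zx * zy) / (n - 1)

print("r   =", r)
print("r^2 =", r**2)      # proportion of shared variance
print("df  =", n - 2)     # degrees of freedom for testing r

# Cross-check against scipy's implementation.
print("scipy r =", stats.pearsonr(x, y)[0])
```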
Correlation and Partial Correlation: Inflation and Suppression
Inflation: the correlation between two variables is inflated when a third variable relates to both in the same direction
- inflated by outliers with very high scores on both variables
- inflated if only high and low scores are analysed
Suppression: two variables covary and both contribute to the dependent variable, but they do so in opposite directions
Inflation and suppression are features of how variables interact in the world
Partial correlation: used when experimental control is not possible; statistical techniques remove the effect of one or more variables from the correlation, clarifying the underlying relationships.
df = n - 2 (minus 1 for every variable removed)
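A sketch of a first-order partial correlation using the standard formula r_xy.z = (r_xy - r_xz·r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)); the data are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical scores: x and y, plus a third variable z whose effect we remove.
x = np.array([3.0, 5.0, 6.0, 8.0, 9.0, 12.0, 13.0, 15.0])
y = np.array([2.0, 6.0, 5.0, 9.0, 8.0, 11.0, 14.0, 13.0])
z = np.array([1.0, 2.0, 2.0, 4.0, 5.0, 6.0, 7.0, 9.0])
n = len(x)

r_xy = stats.pearsonr(x, y)[0]
r_xz = stats.pearsonr(x, z)[0]
r_yz = stats.pearsonr(y, z)[0]

# First-order partial correlation: correlation of x and y with z removed.
r_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print("zero-order r =", r_xy)
print("partial r    =", r_xy_z)
print("df           =", n - 2 - 1)   # n - 2, minus 1 for the variable removed
```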
Linear regression: predicts scores on one variable from the other.
It fits a line to the data points, for predicting y from x.
Calculates the relationship between the y and the x values.
Formula for line of best fit: y = mx + c
m is the slope (the change in y per unit change in x), c is the constant (intercept)
Linear regression formula: y' = mx + c
y' = the predicted value of y
Method of least squares: fits the best straight line to the data by finding the values of m and c that give the lowest sum of squared deviations of the y scores from the line.
Straight line: linear regression
Curved line: polynomial regression
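A least-squares sketch with hypothetical data, using numpy's polyfit to get m and c and showing the quantity the method minimises:

```python
import numpy as np

# Hypothetical data: predict y from x with a straight line y' = m*x + c.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# Method of least squares: choose m and c to minimise sum((y - y_pred)**2).
m, c = np.polyfit(x, y, deg=1)
y_pred = m * x + c

ss_residual = np.sum((y - y_pred) ** 2)   # the quantity least squares minimises
print("slope m     =", m)
print("constant c  =", c)
print("SS residual =", ss_residual)
```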
Multiple regression: the best prediction of a dependent variable from a set of independent variables.
- estimate the relative importance of your variables
- control for some variables
- predict scores combining variables
Predicts the dependent variable y, using more than one independent variable
Multiple R: the correlation between the values predicted by the multiple regression equation and the observed values of the dependent variable
For the best estimate use as few variables as possible & as many cases as possible.
R²: the amount of variance in the dependent variable predicted by the equation
df = k, n - k - 1 (n = number of cases, k = number of independent variables)
Adjusted R²: an estimate of R² if the same regression equation were used with a new sample of participants; it is lower than R²
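A multiple regression sketch (hypothetical scores, numpy only) computing Multiple R, R², and adjusted R²:

```python
import numpy as np

# Hypothetical data: predict y from two independent variables x1 and x2.
x1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0, 13.0])
x2 = np.array([1.0, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 7.0])
y  = np.array([3.0, 6.0, 8.0, 9.0, 13.0, 14.0, 17.0, 20.0])
n, k = len(y), 2

# Design matrix with a column of ones for the constant.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_pred = X @ b

# Multiple R: correlation between predicted and observed y.
R = np.corrcoef(y_pred, y)[0, 1]

# R^2: proportion of variance in y predicted by the equation.
ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum((y - y_pred) ** 2)
R2 = 1 - ss_resid / ss_total

# Adjusted R^2: estimate of R^2 in a new sample; always lower than R^2.
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)

print("Multiple R =", R, " R^2 =", R2, " adjusted R^2 =", R2_adj)
```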
Hierarchical regression: used to control for variables that could distort the results.
- one or more variables are added to the regression at a time
- once the control variables are added, the effect of the remaining variables can be evaluated
Stage 1: select the dependent variable, then the independent variable(s) to control for
Stage 2: bring in the variables you are studying
R² change: significant if the additional variables significantly improve the prediction
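A sketch of the two stages and the F test on the R² change, with hypothetical control and study variables:

```python
import numpy as np
from scipy import stats

def r_squared(X, y):
    """Fit OLS and return R^2 for a design matrix that includes the constant."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

# Hypothetical data: a control variable, the variable under study, and the outcome y.
control   = np.array([20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0])
predictor = np.array([3.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0, 11.0, 10.0, 12.0])
y         = np.array([10.0, 14.0, 13.0, 19.0, 18.0, 24.0, 22.0, 29.0, 27.0, 32.0])
n = len(y)

ones = np.ones(n)
# Stage 1: regression with the control variable only.
R2_step1 = r_squared(np.column_stack([ones, control]), y)
# Stage 2: bring in the variable under study.
R2_step2 = r_squared(np.column_stack([ones, control, predictor]), y)

# R^2 change and its F test (1 predictor added; n - k - 1 residual df at stage 2).
R2_change = R2_step2 - R2_step1
df1, df2 = 1, n - 2 - 1
F = (R2_change / df1) / ((1 - R2_step2) / df2)
p = stats.f.sf(F, df1, df2)

print("R^2 step 1 =", R2_step1)
print("R^2 step 2 =", R2_step2)
print("R^2 change =", R2_change, " F =", F, " p =", p)
```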
Stepwise regression: identifies the minimum set of independent variables that together significantly predict the dependent variable.
- variables are entered into the regression one by one
the program selects the ordering of the variables
1st variable: the one with the highest correlation with the dependent variable
once selected, its effect is semi-partialled out of the remaining variables
if variables are no longer contributing significantly to the regression, their effects are removed
- -ve: gives a minimal set of predictors, but the underlying relationships may be more complicated
- -ve: do not assume causal relationships; chance effects can occur
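A rough sketch of the entry step of stepwise (forward) selection on simulated predictors; a full stepwise procedure would also re-test and remove variables that stop contributing, which is omitted here:

```python
import numpy as np

def r_squared(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

# Simulated predictors (columns) and outcome: only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(40, 4))
y = 2.0 * X_all[:, 0] + 1.0 * X_all[:, 2] + rng.normal(size=40)

n = len(y)
ones = np.ones(n)
selected, remaining = [], list(range(X_all.shape[1]))
current_R2 = 0.0

# Forward selection: at each step add the predictor giving the largest R^2 gain,
# stopping when no candidate improves R^2 by more than a small threshold.
while remaining:
    gains = []
    for j in remaining:
        cols = [ones] + [X_all[:, i] for i in selected + [j]]
        gains.append((r_squared(np.column_stack(cols), y) - current_R2, j))
    best_gain, best_j = max(gains)
    if best_gain < 0.01:          # crude stopping rule for this sketch
        break
    selected.append(best_j)
    remaining.remove(best_j)
    current_R2 += best_gain

print("selected predictors:", selected, " R^2 =", current_R2)
```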
Interactions: the effect of one variable may be different at different levels of the others.
Significant if the R² change is significant when the product term is entered.
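A short sketch of testing an interaction by entering the product term after the main effects (simulated data):

```python
import numpy as np

def r_squared(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

# Simulated data with a genuine interaction between x1 and x2.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = x1 + x2 + 0.8 * x1 * x2 + rng.normal(size=50)

ones = np.ones(len(y))
X_main = np.column_stack([ones, x1, x2])            # main effects only
X_full = np.column_stack([ones, x1, x2, x1 * x2])   # product term added

R2_main = r_squared(X_main, y)
R2_full = r_squared(X_full, y)
print("R^2 change for the product term:", R2_full - R2_main)
```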
Standard Error of the Mean: the standard deviation of the distribution of means of samples drawn from the population you sampled from
- indicate accuracy of your findings
- low SEMs are desirable
- standard errors are usually high; converting to z scores gives lower, more comparable standard errors
SE(R²): the standard deviation of the sampling distribution of R²
SE(B): the standard deviation of the sampling distribution of B (the regression coefficient)
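A sketch of SE(B) from the usual OLS formula var(b) = MSE·(X'X)⁻¹, with the SEM of y shown for comparison (hypothetical data):

```python
import numpy as np

# Hypothetical regression data.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8])
n = len(y)

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1] - 1                       # number of predictors
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# SE(B): residual variance times the diagonal of inv(X'X), square-rooted.
mse = np.sum(resid**2) / (n - k - 1)
se_b = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))

# Standard error of the mean of y, for comparison.
sem_y = y.std(ddof=1) / np.sqrt(n)

print("B      =", b)
print("SE(B)  =", se_b)
print("SEM(y) =", sem_y)
```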
Polynomial Regression Analyses
Finds the best fit to curved data as well as straight-line (linear) fits.
Quadratic equation: y = m1x² + m2x + c
Cubic equation: y = m1x³ + m2x² + m3x + c
For the data, first try to fit a linear equation.
If this is not significant, a quadratic equation may be a better fit, and so on.
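A sketch comparing a linear and a quadratic fit to hypothetical curved data with numpy's polyfit:

```python
import numpy as np

# Hypothetical curved data: compare a linear fit with a quadratic fit.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.0, 3.1, 5.2, 8.1, 12.0, 17.2, 23.1, 30.2])   # roughly quadratic

def ss_residual(coeffs, x, y):
    # Sum of squared deviations of y from the fitted polynomial.
    return np.sum((y - np.polyval(coeffs, x)) ** 2)

linear    = np.polyfit(x, y, deg=1)   # y = m*x + c
quadratic = np.polyfit(x, y, deg=2)   # y = m1*x^2 + m2*x + c

print("linear SS residual    =", ss_residual(linear, x, y))
print("quadratic SS residual =", ss_residual(quadratic, x, y))
```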
Log-linear analysis: combines categorical data analysis with ANOVA. To calculate expected frequencies in categorical data, multiplication is used, whereas addition is used in ANOVA; converting to log scores allows the additive process.
- can be used to predict dependent variables with 3+ categories
Logistic regression:
- used for modelling binary outcome measures
- the outcome is regressed onto explanatory measures: continuous and categorical
Proportions and Odds
Proportion: divide each score by the column total
Odds of passing in one category: divide the pass proportion by the fail proportion
Odds ratio (a measure of effect size): the pass odds of group 1 divided by the pass odds of group 2
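A sketch of proportions, odds, and the odds ratio from a hypothetical 2 × 2 pass/fail table:

```python
import numpy as np

# Hypothetical 2 x 2 table of counts: rows = pass/fail, columns = group 1/group 2.
#                   group 1  group 2
counts = np.array([[30,      18],     # pass
                   [10,      22]])    # fail

# Proportions: divide each count by its column total.
proportions = counts / counts.sum(axis=0)

# Odds of passing in each group: pass proportion / fail proportion.
odds = proportions[0] / proportions[1]

# Odds ratio (effect size): odds in group 1 divided by odds in group 2.
odds_ratio = odds[0] / odds[1]

print("proportions:\n", proportions)
print("odds:", odds, " odds ratio:", odds_ratio)
```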
Problems with the linear regression of a binary variable:
- ends of the regression line run to minus and plus infinity
- breaches homoscedasticity assumptions
- forces the linear model to become unbounded
- smooths out asymmetry
- forces a straight line onto a function that is curvilinear
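A logistic regression sketch on hypothetical pass/fail data, fitted here with scikit-learn as one option among several (note it applies mild regularisation by default, so the coefficient is not the exact maximum-likelihood estimate):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary outcome (0 = fail, 1 = pass) modelled from one continuous
# explanatory variable. The logistic model keeps predictions between 0 and 1
# and follows the curvilinear (S-shaped) function a straight line cannot.
hours  = np.array([1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9], dtype=float).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)

# Coefficient on the log-odds (logit) scale; exponentiate for an odds ratio
# per unit increase in the predictor.
b = model.coef_[0][0]
print("b (log-odds) =", b, " odds ratio per unit =", np.exp(b))
print("P(pass | x = 5) =", model.predict_proba([[5.0]])[0, 1])
```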
Statistics Using Ranks
Ranking: data are scored by position, so that the lowest and highest scores can be compared.
Why rank data?
- Data are ordinal
- Cannot assume normal distribution
Types of rank order tests:
- 1 sample
  - between: Kolmogorov-Smirnov
  - within: Wilcoxon signed-ranks
- 2 sample
  - between: Mann-Whitney U (Wilcoxon rank-sum)
  - within: Wilcoxon signed-ranks
- K sample
  - between: Kruskal-Wallis
  - within: Friedman's ANOVA
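A sketch running the 2-sample and K-sample rank tests on hypothetical scores with scipy.stats:

```python
import numpy as np
from scipy import stats

# Hypothetical ordinal-style scores for independent groups and for repeated
# measurements on the same participants.
group_a = np.array([12, 15, 14, 18, 20, 16, 13])
group_b = np.array([22, 19, 24, 21, 25, 23, 20])
group_c = np.array([30, 28, 33, 29, 31, 27, 32])
time_1  = np.array([10, 12, 9, 14, 11, 13, 15, 12])
time_2  = np.array([13, 15, 10, 17, 12, 16, 18, 14])
time_3  = np.array([15, 18, 12, 19, 14, 18, 20, 16])

# 2 independent samples: Mann-Whitney U.
print(stats.mannwhitneyu(group_a, group_b))

# 2 related samples: Wilcoxon signed-ranks.
print(stats.wilcoxon(time_1, time_2))

# K independent samples: Kruskal-Wallis.
print(stats.kruskal(group_a, group_b, group_c))

# K related samples: Friedman's ANOVA.
print(stats.friedmanchisquare(time_1, time_2, time_3))
```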