# Research methods and data analysis

• Created by: megs543
• Created on: 20-02-19 14:41

## What is research?

Research process:

Identify research questions, design study, collect data from sample, use descriptive statistics, use inferential statistics, discuss results.

The object of research

This is what it looks like in practice. We have the world, from which we abstract a theory of how we think it works. From this we derive hypotheses that we can test. We then design studies to examine the hypotheses through systematic observation and experimentation. We use our hypotheses to interpret the data.

Example experiment

Effect of text size on reading ability in children. For example, say we were interested in helping children learn to read more easily. We might hypothesise that this would happen if we made the text bigger. But is text size important at all, or is it all relative to the text itself (barring basic acuity issues)? Or maybe there is a size that is too big.

## First year Basics

Null Hypothesis testing

All hypothesis tests have the following components: a statement of the null and alternative hypotheses, a significance level, a test statistic, a rejection region, calculations and a conclusion.

Variables

Independent variable- manipulated variable. Dependent variable- what is measured.

Populations and samples

Our sample needs to be representative of the population so that we can generalise our results back to the whole population.

Remember a sample approximates the population.

If one resamples, a different approximation is likely to be found. Neither sample gives the true population value exactly; each is an estimate. We would normally deal with this by sampling many times and building a sampling distribution of the estimates, e.g., we test multiple participants on multiple trials for each condition to get multiple estimates of the population.
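To see the idea of a sampling distribution in action, here is a minimal Python sketch. It assumes a made-up population of 1,000 normally distributed scores (mean 100, SD 15, purely illustrative) and draws 500 random samples of 25. Each sample mean is a different estimate, and the spread of those estimates is what the standard error describes.

```python
import random
import statistics


def sampling_distribution(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples (without replacement) and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]


# Hypothetical population of 1,000 scores (illustrative values only)
pop_rng = random.Random(42)
population = [pop_rng.gauss(100, 15) for _ in range(1000)]

means = sampling_distribution(population, sample_size=25, n_samples=500)
print(round(statistics.mean(population), 2))  # the "true" population mean
print(round(statistics.mean(means), 2))       # centre of the sampling distribution
print(round(statistics.stdev(means), 2))      # spread of the estimates (roughly SD / sqrt(25))
```

The centre of the sampling distribution should sit close to the population mean, while its SD should be roughly the population SD divided by √25.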

## First year Basics Continued

Randomization and ordering effects:

IV manipulation will produce systematic variation between the two experimental groups. All other sources of variation should, if properly controlled for, be unsystematic variation.

How do we do this? There are a number of ways, such as randomly assigning your participants to groups, and counterbalancing the order of conditions (or randomizing it). These ensure that things like IQ are not systematically related to the IV, so that they can only contribute to the unsystematic variation.

How do we deal with the data? We need some descriptive statistics: measures of central tendency and measures of dispersion.
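A minimal sketch of the two controls above, in Python. The participant IDs and the condition names (e.g. "large text", borrowed from the text-size example) are hypothetical, and dealing shuffled participants out in turn and cycling through every order of conditions is just one simple way to do random assignment and full counterbalancing.

```python
import itertools
import random


def randomly_assign(participants, groups, seed=None):
    """Shuffle participants, then deal them into groups in turn (equal-sized groups)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


def counterbalanced_orders(conditions, participants):
    """Give participants every possible order of conditions in rotation."""
    orders = list(itertools.permutations(conditions))
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}


people = [f"P{i}" for i in range(1, 9)]  # hypothetical participant IDs
print(randomly_assign(people, ["control", "large text"], seed=1))
print(counterbalanced_orders(["A", "B"], people))  # P1: A then B, P2: B then A, ...
```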

## First year Basics Continued

Descriptive statistics; measures of central tendency:

Mode: the most frequently occurring score.

Median: the score that lies in the middle of the ordered scores (an equal number of scores above and below this point).

Mean: aka the average. Add up all the scores and divide by the number of scores.

Variable types:

Categorical (nominal), e.g. country of birth, sex of participant. The categories have no mathematical meaning; use the mode as the measure of central tendency.

Ordinal (ordered): the differences between points may not be the same, e.g. the difference between 1st and 2nd need not equal the difference between 3rd and 4th. No decimal places and no arithmetic; use ranks, and use the median as the measure of central tendency.

Continuous (scale): the differences between points are the same and decimal values are possible. If 0 indicates an absence of the quantity, it is a ratio level measurement; if 0 does not indicate absence, it is an interval measurement. Arithmetic is possible; use the mean as the measure of central tendency, or the median if the distribution is asymmetric.
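A short illustration with Python's standard `statistics` module; the data are made up to show which measure of central tendency suits each variable type. Note how one extreme reaction time drags the mean well away from the median, which is why the median is preferred for asymmetric distributions.

```python
import statistics

# Categorical (nominal) data: only the mode makes sense
birth_country = ["UK", "France", "UK", "Spain", "UK"]
print(statistics.mode(birth_country))      # UK

# Ordinal data (e.g. finishing positions): use the median
positions = [1, 2, 2, 3, 5]
print(statistics.median(positions))        # 2

# Continuous (scale) data: the mean, or the median if the distribution is skewed
reaction_times = [310, 295, 330, 305, 900]  # ms; one very slow response skews the data
print(statistics.mean(reaction_times))     # 428
print(statistics.median(reaction_times))   # 310
```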

## First year Basics Continued

Descriptive statistics; measures of dispersion:

Range: the difference between the top and bottom scores.

Box plots: a way of summarising data based on the median and the interquartile range, which contains the middle 50% of the values.

Variance: subtract the mean from each score, square the differences, add them up and divide by the number of scores minus 1.

Standard deviation: find the variance, then take the square root to return to the original measurement units.

Standard error: take the standard deviation and divide by the square root of the sample size.

Confidence intervals: commonly 95%. For any normal distribution, 95% of the cases fall within ±1.96 SD of the mean, so it is 95% likely that a particular sample has a mean that falls within ±1.96 SE of the population mean.

Null hypothesis testing: all hypothesis tests have the following components: significance level, test statistic, rejection region, calculations and a conclusion.
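The formulas above can be sketched with Python's `statistics` module. `describe_spread` is a hypothetical helper, not a library function. The 95% interval uses the normal-approximation multiplier 1.96 from the notes (with small samples a t-based multiplier would be wider), and the quartiles come from `statistics.quantiles`, whose default method may differ slightly from SPSS.

```python
import math
import statistics


def describe_spread(scores, z=1.96):
    """Measures of dispersion for a list of scores (z = 1.96 gives a 95% interval)."""
    n = len(scores)
    mean = statistics.mean(scores)
    variance = statistics.variance(scores)   # squared deviations summed, divided by n - 1
    sd = math.sqrt(variance)                 # back in the original units
    se = sd / math.sqrt(n)                   # standard error of the mean
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return {
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,                      # spans the middle 50% of the values
        "variance": variance,
        "sd": sd,
        "se": se,
        "ci95": (mean - z * se, mean + z * se),
    }


print(describe_spread([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical scores
```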

## First year Basics Continued

Significance level (α)

What is the probability that the observed difference between two sample means (or a larger difference) would have occurred if both samples were randomly selected from the same population?

Calculate the probability of finding the observed data (e.g. the difference), or more extreme data, if the null hypothesis is true. This is the p value.

If p is very small, reject the null hypothesis; otherwise retain it. The criterion level of p is called α.

If the probability of the observed difference (or a larger one) under the null hypothesis is < 5% then the null hypothesis can be rejected.

You can say that the experimental hypothesis was supported or that the results of the experiment were statistically significant.

You must not say that the experiment showed the null hypothesis to be true or supported the null hypothesis. If the probability is > 0.05, this just means that you do not have enough evidence against the null hypothesis to reject it.
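The definition of the p value above (the probability of a difference this large, or larger, if both samples came from the same population) can be computed directly with an exact permutation test. This is a swapped-in technique for illustration, not the t-test SPSS would run, and the scores are hypothetical: it tries every way of splitting the pooled scores into two groups of the original sizes and counts how often the difference in means is at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-tailed exact permutation test on the difference between two means.

    Under the null hypothesis both groups come from the same population, so every
    split of the pooled scores into groups of these sizes is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total


alpha = 0.05
p = permutation_p_value([12, 14, 15, 16, 18], [6, 7, 8, 9, 11])  # hypothetical scores
print(p, "reject H0" if p < alpha else "retain H0")
```

Here only the observed split and its mirror image are as extreme, so p = 2/252 ≈ 0.008 and the null hypothesis is rejected at α = 0.05.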

## First year Basics Continued

Errors we can make

A Type 1 error is declaring something 'significant' when there is no genuine effect (rejecting a null hypothesis that is true). Its probability is set by the significance level, α.

A Type 2 error is when the null hypothesis is not rejected but the two samples actually were from different populations. In this case the experiment has "missed", or failed to detect, a real effect: you retain the null hypothesis when you should reject it. In other words, there is a real effect when you think there is not.

Replication is key to combating Type 1 errors.

One- and two- tailed tests

A one-tailed test is a statistical test in which the critical area of a distribution is one-sided, so that it tests whether a value is either greater than or less than a certain value, but not both.

A two-tailed test is a statistical test in which the critical area of a distribution is two-sided: it tests whether a sample is either greater than or less than a certain range of values. If the sample being tested falls into either of the critical areas, the null hypothesis is rejected in favour of the alternative hypothesis.
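A minimal sketch of how the rejection region differs between one- and two-tailed tests, assuming a z statistic and α = 0.05, using `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, SD 1

one_tailed_critical = std_normal.inv_cdf(1 - alpha)      # all 5% in one tail
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)  # 2.5% in each tail


def p_value(z_obs, tails=2):
    """p value for an observed z statistic.

    tails=2: probability of a result at least this extreme in either direction.
    tails=1: assumes the hypothesis predicted a positive z (upper tail only).
    """
    if tails == 2:
        return 2 * (1 - std_normal.cdf(abs(z_obs)))
    return 1 - std_normal.cdf(z_obs)


print(round(one_tailed_critical, 3))  # 1.645
print(round(two_tailed_critical, 2))  # 1.96
print(p_value(1.8, tails=1), p_value(1.8, tails=2))
```

With α = 0.05, a z of 1.8 falls in the rejection region of a one-tailed test (p ≈ 0.036) but not of a two-tailed test (p ≈ 0.072), because the two-tailed test splits the 5% between both tails.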

## First year Basics Continued

Organising statistical tests

By type of research question: relationships between variables (correlation, regression), or discriminating between groups or treatments, i.e. testing for differences between them (t-test).

By type of test: parametric or non-parametric. A parametric statistical test is one that makes assumptions about the parameters (defining properties) of the population distribution(s) from which one's data are drawn, while a non-parametric test is one that makes no such assumptions.

By type of research design used: experimental research is directly hypothesis driven; survey research is concerned with relationships between variables, or with whether independent variables predict dependent variables. These may or may not be driven by explicit hypotheses.

## First year Basics Continued

Experimental designs

No ‘right’ answer. You need to understand the science behind the question.

What will you measure? What numbers will you actually write down? Who are your subjects?

Is it a correlative (measurement) or a causal (interventional, ‘true experimental’) study? For interventional studies, will you use a between- or within-groups design? Within-subjects designs are often more powerful, but order effects may be a problem, so you need appropriate counterbalancing.

Consider confounds (confounding variables). What is the appropriate control condition? Remember blinding and placebo/sham techniques.

Keep it simple. Is your design the simplest way to answer the question? If you find an effect, will it be simple to interpret? If you don’t, what will that tell you?

How will you analyse your data? What will your null hypothesis be? But remember, this is not the main focus of the questions!

Will you need a series of experiments? Will you alter your plans based on the results of the first few experiments? Do you need to outline a plan?

Consider ethics and practicality.

## What is regression?

A way of predicting the value of one variable from another. It is a hypothetical model of the relationship between two variables. The model used is a linear one, so we describe the relationship using the equation of a straight line. The way to work out regression is outcome = model + error:

$$Y_i = (b_0 + b_1 X_i) + \varepsilon_i$$

b1 = regression coefficient for the predictor: the gradient (slope) of the regression line, which gives the direction/strength of the relationship. The gradient is what the model looks like.

b0 = intercept (the value of Y when X = 0): the point at which the regression line crosses the Y-axis (ordinate). The intercept is where the model is.

Example: with b0 = 50, b1 = 100 and a predictor value of X = 5, album sales = 50 + (100 × 5) + ε, giving a predicted value of 550.

## The model

How do we fit the model?

The method of least squares. The graph shows a scatterplot of some data with a line representing the general trend. The vertical lines represent the differences (residuals) between the line and the actual data.

How do we calculate the model?

Fit the model that best describes the data; method of least squares, minimizes the error in the model, line of best fit.
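For one predictor, the least-squares line has a closed-form solution: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄. A minimal Python sketch with made-up data; `least_squares` is a hypothetical helper name.

```python
from statistics import mean


def least_squares(x, y):
    """Return (b0, b1) for the line that minimises the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # gradient
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


x = [1, 2, 3, 4, 5]                  # hypothetical predictor scores
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical outcome scores
b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, sum(r * r for r in residuals))  # intercept, gradient, residual sum of squares
```

Any other line through these points (nudging b0 or b1) gives a larger sum of squared residuals, which is what "line of best fit" means here.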

How good is the model?

The regression line is only a model based on the data. This model might not reflect reality. We need some way of testing how well the model fits the observed data. How?

## Sum of squares?

Sum of squares is a statistical technique used in regression analysis to determine the dispersion of data points.

See powerpoint for different sections.

The total sum of squares (SST): deviations of the individual data points from the grand mean, i.e. the variability between the scores and the mean.

Model sum of squares (SSM): deviations of the values predicted by the regression model from the mean of Y, i.e. the improvement in prediction from using the model rather than the mean.

Residual sum of squares (SSR): deviations of the data from the regression model, which is also SSR = SST − SSM, i.e. the variability between the regression model and the actual data.

## Testing the Model: ANOVA?

SST (total variance in the data) is partitioned into SSM (improvement due to the model) and SSR (error in the model): SST = SSM + SSR.

If the model results in better prediction than using the mean, then we expect SSM to be much greater than SSR.

F-test: an F-test compares two variances. In regression it tests whether the variance explained by the model is large relative to the variance left in the residuals.

Mean squares: sums of squares are totals, so they depend on how many scores were summed. Dividing each by its degrees of freedom expresses it as an average, called a mean square (MS): MSM = SSM/k and MSR = SSR/(N − k − 1), where k is the number of predictors and N the number of cases. Then F = MSM/MSR.

R squared: the proportion of variance accounted for by the regression model; in simple regression it is the Pearson correlation coefficient squared. R² = SSM/SST.
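A minimal Python sketch of this partition for a one-predictor regression, using made-up data. `regression_anova` is a hypothetical helper: it fits the least-squares line, computes SST, SSM and SSR as defined above, and then the mean squares, F and R squared.

```python
from statistics import mean


def regression_anova(x, y):
    """Partition the sums of squares for a simple (one-predictor) regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    predicted = [b0 + b1 * a for a in x]

    ss_t = sum((b - y_bar) ** 2 for b in y)                 # data vs grand mean
    ss_m = sum((p - y_bar) ** 2 for p in predicted)         # model vs grand mean
    ss_r = sum((b - p) ** 2 for b, p in zip(y, predicted))  # data vs model

    k = 1                      # number of predictors
    ms_m = ss_m / k            # df for the model = k
    ms_r = ss_r / (n - k - 1)  # df for the residuals = n - k - 1
    return {"SST": ss_t, "SSM": ss_m, "SSR": ss_r,
            "F": ms_m / ms_r, "R2": ss_m / ss_t}


print(regression_anova([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))  # hypothetical data
```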

## Output tables on SPSS for linear regression.

Variables entered table:
This states that there was one predictor variable entered; a note under the table says what your dependent variable was.

Model summary table:

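R: in simple regression, this is the correlation coefficient between your two variables. SPSS reports it as a value from 0 to 1 (without a sign), so check the sign of the gradient for the direction of the relationship.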

R Square: the correlation coefficient squared, and therefore the amount of shared variance: the proportion of variance in the dependent variable that can be accounted for by the predictor. It ranges from 0 to 1 and is often reported as a percentage.

Adjusted R squared: R squared adjusted for the number of predictor variables in the model. It is normally a bit less than R squared and gives a more realistic estimate of the variance explained in the population. The most important figure in the table.

Std. Error of the estimate: an estimate of how accurate the model's predictions are likely to be; it is the typical size of a residual, in the units of the dependent variable.
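The adjusted R squared and the standard error of the estimate can be sketched as below, assuming the usual formulas adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) and SE = √(SSR/(n − k − 1)), with n cases and k predictors. The input numbers are hypothetical.

```python
import math


def adjusted_r2(r2, n, k):
    """R squared adjusted for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def std_error_of_estimate(ss_r, n, k):
    """Typical size of a residual, in the units of the DV: sqrt(SSR / (n - k - 1))."""
    return math.sqrt(ss_r / (n - k - 1))


# Hypothetical figures: R squared = 0.50 from 34 cases and 1 predictor
print(adjusted_r2(0.50, n=34, k=1))  # a little below 0.50
# Adding weak predictors nudges R squared up but is penalised by the adjustment:
print(adjusted_r2(0.52, n=34, k=5))
```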

## Output tables on SPSS for linear regression continued

ANOVA table:

This table tests the null hypothesis that the predictor does not explain any variance in the dependent variable. If the ANOVA result is statistically significant, the result is unlikely to have occurred by chance. If it is not significant, you cannot be confident your results did not occur by chance; you would state that the regression model isn't statistically significant and not interpret the remainder of the output.

Coefficients table:

Unstandardized Coefficients B: displays two figures, the values of a (the constant, or y-intercept) and b (the gradient) in the regression equation. Using these figures you can complete the equation y = a + bx, where y is the DV and x is the predictor variable. The value of b (the regression line gradient) also tells you the predicted increase in the dependent variable if the predictor variable increases by 1 unit.

## Coefficients table continued?

Unstandardized Coefficients Standard Error:

The standard errors for the values of a and b. These values enable SPSS to work out the t-value and statistical significance for the predictor variable. Because SPSS automatically calculates t-values and statistical significance you do not need to worry about the standard errors.

Standardized Coefficients Beta:

This figure tells you the predicted increase of the DV, in standard deviations, if the predictor variable increases by 1 standard deviation. This is very useful when interpreting multiple regression but not as necessary in simple regression.

t and Sig: the t-value and associated significance level. In simple regression, SPSS presents two sets of values: one for the constant (y-intercept) and one for the predictor variable. Focus on the predictor variable. These values test the null hypothesis that the regression gradient is 0, i.e. whether the predictor variable is a significant predictor of the DV. A significant result indicates that the predictor variable is a significant predictor and the relationship is unlikely to have occurred by chance.
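To see where the t for the predictor comes from, here is a minimal Python sketch for simple regression, assuming the standard formulas SE(b1) = √(MSR / Σ(x − x̄)²) and t = b1 / SE(b1) on n − 2 degrees of freedom. The data and the helper name `slope_t_test` are hypothetical.

```python
import math
from statistics import mean


def slope_t_test(x, y):
    """Gradient b1, its standard error and t = b1 / SE(b1) for one-predictor regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_r = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ms_r = ss_r / (n - 2)              # residual mean square (df = n - 2)
    se_b1 = math.sqrt(ms_r / sxx)      # standard error of the gradient
    return b1, se_b1, b1 / se_b1       # t has n - 2 degrees of freedom


hours = [1, 2, 3, 4, 5]               # hypothetical data
scores = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se, t = slope_t_test(hours, scores)
print(b1, se, t)
```

In simple regression t² equals the ANOVA F. The simple regression write-up in these notes fits this: 8.52² ≈ 72.6, against the reported F = 72.68.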

## How you would write up the results for simple linear regression

Includes the ANOVA result, the adjusted R squared value, the t-value and associated significance values for the predictor variable and the regression equation.

Simple linear regression was carried out to determine the influence of revision hours on exam scores. This was a statistically significant model (F(1, 32) = 72.68, p < 0.001) (ANOVA table). The adjusted R squared indicated that 68.5% (model summary table) of the variance in exam scores can be explained by the variance in revision hours. Revision hours was shown to be a statistically significant predictor of exam score (t = 8.52, p < 0.001) (coefficients table). The regression model suggested that each additional revision hour was related to a 1.8 (coefficients table, B) mark improvement in exam score. The regression equation for this model was: Exam score = 33.4 + (1.8 × revision hours) (coefficients table).

## Other useful statistics in regression?

Histogram:

The histogram of standardized residuals is useful for two reasons;

1. Checking for outliers: any points with a standardized residual greater than +3 or less than −3.

2. Checking whether the residuals are normally distributed, an assumption of regression.

What are outliers?

Exceptional or atypical values. Identify them by examining the residuals: large residuals mean that there is a serious mismatch between the observation and the prediction. 'Large' is any standardized residual larger than 3 in absolute value, because z-scores (standard deviations from the mean) greater than 3 are very unlikely: under a normal distribution only about 0.3% of cases fall beyond ±3.

Warning: regressions are not symmetric; regressing y on x is not the same as regressing x on y.
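A quick demonstration of the warning, with hypothetical revision-hours data: the gradient for predicting y from x is not the reciprocal of the gradient for predicting x from y, so the two regressions give different lines. Their product equals r², so they only agree when the correlation is perfect.

```python
from statistics import mean


def slope(x, y):
    """Least-squares gradient for predicting y from x."""
    x_bar, y_bar = mean(x), mean(y)
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))


revision_hours = [2, 4, 5, 7, 9, 12]   # hypothetical data
exam_scores = [40, 45, 52, 50, 60, 62]

b_y_on_x = slope(revision_hours, exam_scores)   # exam score from hours
b_x_on_y = slope(exam_scores, revision_hours)   # hours from exam score

# If the two regressions gave the same line, b_y_on_x would equal 1 / b_x_on_y.
print(round(b_y_on_x, 3), round(1 / b_x_on_y, 3))
# Their product is r squared, so they only coincide when |r| = 1.
print(round(b_y_on_x * b_x_on_y, 3))
```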

## What is multiple regression?

Linear regression is a model to predict the value of one variable from another. Multiple regression is a natural extension of this model: we use it to predict values of an outcome from several predictors. It is a hypothetical model of the relationship between several variables.

You have more than one predictor variable and one dependent variable. For example exam score can be predicted by revision hours, sex and positive attitude. You can have many predictor variables but you can only ever have one dependent variable.

The equation will slightly change as you will have more than one predictor so more than one x and more than one b (gradient of each regression line).

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni} + \varepsilon_i$$

where Y is the outcome variable.

b0 is the intercept: the value of the Y variable when all Xs = 0. This is the point at which the regression plane crosses the Y-axis.
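A minimal sketch of how b0, b1, ..., bn can be estimated by least squares, assuming the normal equations (XᵀX)b = Xᵀy solved by Gaussian elimination. This is for illustration only; statistical packages typically use numerically safer methods. The data are constructed so that y = 1 + 2X1 + 3X2 exactly, which lets you check the output, and `solve` and `multiple_regression` are hypothetical helpers.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(rows, y):
    """Least-squares [b0, b1, ..., bn] via the normal equations (X'X) b = X'y.

    rows: one tuple of predictor values (X1, X2, ..., Xn) per case.
    """
    design = [[1.0] + list(r) for r in rows]   # the leading 1 estimates the intercept b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Constructed (hypothetical) data where y = 1 + 2*X1 + 3*X2 exactly
rows = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 5)]
y = [4, 3, 14, 10, 24]
print(multiple_regression(rows, y))   # approximately [1.0, 2.0, 3.0]
```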

## Beta values?

b1 is the regression coefficient for variable 1 (x1).

b2 is the regression coefficient for variable 2 (x2).

bn is the regression coefficient for nth variable (xn).

b0= a in the other textbook.

The regression has the same format as simple regression, except that now you have more than one predictor variable (three in the exam score example) and each predictor variable has its own gradient (b).

It is entered into SPSS in the same way, just with more predictor variables in the Independent(s) box.

## Methods of regression?

Forced entry:

All predictors are entered simultaneously. The results obtained depend on the variables entered into the model; it is important, therefore, to have good theoretical reasons for including a particular variable. E.g. can we predict the number of views of a TV show from the number of trailers shown and the number of previews in the press.

Hierarchical:

The experimenter decides the order in which variables are entered into the model.

Stepwise:

Predictors are selected using their semi-partial correlation with the outcome.

Warnings for multiple regression: the same as for linear regression. The same assumptions hold, remember to look at your data, and remember that regressions are not symmetric.

## Output tables on SPSS for multiple regression?

Variables entered table:

Reminds you what predictor variables (IVs) were entered with DV underneath.

Model summary table:

R: Represents a multiple correlation between the actual DV and the predicted DV which is calculated using all predictor variables. Positive figure between 0 and 1 and reflects how well the data points would cluster around the regression line. Not normally reported in results write up.

R Square: the multiple correlation squared; it represents the amount of variance in the dependent variable that can be accounted for by all the predictors together (this model). It is usually reported as a percentage: 0.256 = 25.6%.

Adjusted R square: R square adjusted for the number of predictor variables in your model. E.g. indicates 71% of variance in exam scores can be explained by variances in predictor variables.

## Output tables on SPSS for multiple regression cont

Model summary table:

Std. Error of the estimate: the typical size of a residual, i.e. how accurate the model's predictions are likely to be, in the units of the DV. It can usually be ignored in the write-up.

Change statistics: Important if variables are entered into the regression in an ordered manner.

ANOVA table: tells you whether the whole model is significant. It looks at whether the variance explained by the model (SSM) is significantly greater than the error within the model (SSR), i.e. whether the regression model is significantly better at predicting values of the outcome than using the mean. It has the same sections as in simple linear regression.

Coefficients table:

Unstandardized Coefficients B: provides the values of a (b0) and b used in the regression equation. Each predictor variable has its own b value, or gradient.

Unstandardized Coefficients Std Error: not needed.

## Coefficients table continued for multiple regression

Standardized coefficients beta: these figures are simply the unstandardized B values in the first column after standardization. Each of the variables is measured in different units; the benefit of standardizing is that all the variables are converted into the same units (SDs), so you can make direct comparisons between the predictor variables. The greater the (absolute) value of the standardized beta weight, the more influence that predictor variable has on the DV. The figures tell you the predicted change in the DV, in standard deviations, if the predictor variable increases by one standard deviation (with the other predictors held constant).

t and Sig: the t-value and associated significance level for each predictor variable. These values test the null hypothesis that each regression gradient is zero, i.e. whether each of the predictor variables is a significant predictor of the DV. A significant result indicates that the particular variable is a significant predictor of the criterion.

How to interpret beta values;

Beta values (b): the change in the outcome associated with a unit change in the predictor. Standardised beta values: tell us the same but expressed in standard deviations.
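Standardized betas can be obtained from the unstandardized B values as beta = b × (SD of predictor / SD of outcome). A minimal sketch with hypothetical data and made-up helper names; in simple regression this beta equals Pearson's r, which is a handy check.

```python
from statistics import mean, stdev


def gradient(x, y):
    """Unstandardized least-squares gradient b for predicting y from x."""
    x_bar, y_bar = mean(x), mean(y)
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))


def standardized_beta(b, predictor_values, outcome_values):
    """Change in the outcome, in SDs, for a 1 SD increase in the predictor."""
    return b * stdev(predictor_values) / stdev(outcome_values)


hours = [2, 4, 5, 7, 9, 12]          # hypothetical data
scores = [40, 45, 52, 50, 60, 62]
b = gradient(hours, scores)                 # marks per extra hour
beta = standardized_beta(b, hours, scores)  # SDs of score per SD of hours
print(round(b, 2), round(beta, 2))
```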

## Writing up the results for multiple regression?

Includes the ANOVA result, the adjusted R squared value, the t-value and associated significance value for each predictor variable, and explain to the reader the relative contribution made by the variables.

Multiple linear regression was carried out to determine the effect of revision hours, positive attitudes and sex of the participant on exam scores. This was a statistically significant model (F(3, 30) = 28.20, p < 0.001), indicating that these results were unlikely to have arisen by chance (assuming the null hypothesis to be true). The adjusted R squared indicated that 71.2% of the variance in exam scores can be explained by the variance in the three predictor variables. The analysis suggested that revision hours (β = 0.58) was the most influential predictor and sex of the participant (β = −0.01) was the least influential predictor in the model (standardized coefficients in the coefficients table). Revision hours (t = 4.04, p < 0.001) and positive attitudes (t = 2.24, p = 0.03) were shown to be statistically significant predictors of exam score (coefficients table). Sex of the participant was shown not to be a statistically significant predictor of exam score (t = −0.12, p = 0.91).

## What is hierarchical regression and output?

It is a type of multiple regression:

Known predictors (based on past research) are entered into the regression model first. New predictors are then entered into a separate step/ block. Experimenter makes the decisions.

It is often considered the best method:

It is based on theory testing: you can see the unique predictive influence of a new variable on the outcome because the known predictors are held constant in the model. However, it relies on the experimenter knowing what they are doing.

The variables entered table shows the order in which the predictors were entered. The model summary table also contains the change statistics, including R Square change. Model 1 shows the percentage of variance explained by the first predictor(s), Model 2 the percentage explained by both blocks together, and so on if there are more blocks.

## Output for hierarchical regression continued?

Model summary output written:

Model 1 is the regression model with Press as the only predictor. It only accounts for 9.8% of the variance (R-square).

Model 2 is the regression model with both Press and Trailer as predictors. This model accounts for 25.6% (r-square) of the variance. The addition of Trailer to the model has additionally accounted for 15.9%  of the variance (R square change) and this change is significant (p = 0.021)

The ANOVA table shows that the first model (with Press alone) was not significant (F(1,29) = 3.14, p = 0.087), but that the second model (with both Press and Trailer) was significant (F(2, 28) = 4.824, p = 0.016). The second model will therefore be more successful in predicting the number of viewers.

T-tests tell you whether the independent variables make a significant contribution to the model. Press does not make a significant contribution to either model (Model 1: t = 1.819, p = 0.080; Model 2: t = 1.769, p = 0.088). Trailer makes a significant contribution to Model 2 (t = 2.413, p = 0.021).

## Output for hierarchical regression continued?

The Excluded Variables table tells you, for each model, which variables were not included. It gives an estimate of what their contribution would have been had they been included (given by the t statistic and the partial correlation).

Note: this only appears for Model 1, as all variables were included in Model 2

Remember that the order in which you enter predictors will change the model that is constructed. Assess the contribution of each predictor by looking at R square change.
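R square change can be tested with an F-change test. A minimal sketch, assuming the standard formula F = (ΔR²/Δk) / ((1 − R²_new)/(n − k_new − 1)), where Δk is the number of predictors added. The inputs are the Press/Trailer figures above; n = 31 is inferred from the reported degrees of freedom (F(2, 28)), not stated in the notes, and `r2_change_test` is a hypothetical helper.

```python
def r2_change_test(r2_old, r2_new, n, k_old, k_new):
    """F test for the increase in R squared when a new block of predictors is added."""
    df1 = k_new - k_old            # number of predictors added
    df2 = n - k_new - 1            # residual df of the larger model
    f = ((r2_new - r2_old) / df1) / ((1 - r2_new) / df2)
    return f, df1, df2


# Press alone (R square 0.098) vs Press + Trailer (R square 0.256), n = 31 (inferred)
f, df1, df2 = r2_change_test(0.098, 0.256, n=31, k_old=1, k_new=2)
print(f"F change({df1}, {df2}) = {f:.2f}")
```

This gives an F change of roughly 6 on (1, 28) degrees of freedom; SPSS reports the corresponding Sig. F Change in the change statistics columns.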

How well does the model fit the data?

The accuracy of the model can be assessed using residual statistics (standardized residuals) and influential cases (Cook's distance).

## Assessing the accuracy of the model?

Standardized residuals:

In an average sample, 95% of standardized residuals should lie between −2 and +2.

99% of standardized residuals should lie between −2.5 and +2.5.

Any case for which the absolute value of the standardized residual is 3 or more is likely to be an outlier.

Cook's distance:

Measures the influence of a single case on the model as a whole. Weisberg (1982): absolute values greater than 1 may be cause for concern.
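A minimal sketch of both checks for a one-predictor regression, with hypothetical data in which the last case is unusual. It assumes standardized residual = residual / √MSR (the standard error of the estimate) and the usual Cook's distance formula D = e²/(p·MSR) × h/(1 − h)², with p = 2 estimated parameters and leverage h; `residual_diagnostics` is a made-up helper.

```python
import math
from statistics import mean


def residual_diagnostics(x, y):
    """Standardized residuals and Cook's distances for a one-predictor regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    p = 2                                            # parameters estimated: b0 and b1
    ms_r = sum(e * e for e in resid) / (n - p)       # residual mean square
    z_resid = [e / math.sqrt(ms_r) for e in resid]   # residual / std. error of estimate
    leverage = [1 / n + (a - x_bar) ** 2 / sxx for a in x]
    cooks = [e * e / (p * ms_r) * h / (1 - h) ** 2
             for e, h in zip(resid, leverage)]
    return z_resid, cooks


x = list(range(1, 11))
y = [2, 4, 5, 4, 6, 7, 8, 9, 10, 25]          # hypothetical; the last case is unusual
z_resid, cooks = residual_diagnostics(x, y)
print([i for i, z in enumerate(z_resid) if abs(z) >= 3])  # residual rule: []
print([i for i, d in enumerate(cooks) if d > 1])          # Cook's distance rule: [9]
```

In this made-up example the last case has a Cook's distance above 1 even though its standardized residual (about 2.3) is below 3: a case can be influential without being an outlier on the residual rule, which is why both checks are worth running.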

## Assumptions of regression?

Generalization:

When we run regression, we hope to be able to generalize the sample model to the entire population. To do this, several assumptions must be met. Violating these assumptions stops us generalizing conclusions to our target population.

Straightforward assumptions:

Variable type- the outcome must be continuous; predictors can be continuous or dichotomous (having only two categories).

Non-zero variance- predictors must not have zero variance.

Linearity- the relationship we model is, in reality, linear. This can be checked with scatterplots.

Independence- each value of the outcome should come from a different person.

## Outliers by influence?

Cook's distance:
A Cook's distance over 1 indicates an outlier by influence. If you check the minimum and maximum values of Cook's distance for the data set, you can see they range from 0 to 1.15. This tells you that at least one data point is an outlier by influence, but not which one. Casewise diagnostics would tell you which cases are the outliers.

A centred leverage value greater than 3 times its mean may also indicate an outlier by influence. SPSS can save the Cook's distance and leverage value for each case (participant) as two new columns in the data file, so you can check which of these values exceed the cut-off values. You should check both Cook's distance and the centred leverage values. Here the minimum and maximum leverage values are 0.008 and 0.283.
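For a model with one predictor, the centred leverage of a case is (x − x̄)² / Σ(x − x̄)², so its mean is 1/n. A minimal sketch of the 3-times-the-mean rule with hypothetical predictor scores in which one case is far from the rest; with more predictors leverage needs matrix algebra, so this only covers the simple case.

```python
from statistics import mean


def centred_leverage(x):
    """Centred leverage for a one-predictor regression: (x_i - mean)^2 / sum of squares of x."""
    x_bar = mean(x)
    sxx = sum((a - x_bar) ** 2 for a in x)
    return [(a - x_bar) ** 2 / sxx for a in x]


x = [3, 4, 4, 5, 5, 6, 6, 7, 7, 20]       # hypothetical predictor scores
lev = centred_leverage(x)
cut_off = 3 * mean(lev)                     # 3 times the mean leverage (mean = 1/n here)
print([i for i, h in enumerate(lev) if h > cut_off])  # [9]
```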

## More tricky assumptions?

No Multicollinearity:

Predictors must not be highly correlated.

Homoscedasticity:

For each value of the predictors, the variance of the error term should be constant.

Independent errors:

For any pair of observations, the error terms should be uncorrelated.

Normally distributed errors: the residuals should be normally distributed.

## Multicollinearity?

In most regression models the predictor variables will be correlated to some degree. This is fine, and it makes sense that similar predictor variables should be correlated. However, predictor variables correlating very highly suggests the variables are measuring similar things, and you cannot confidently ascertain the unique influence of each predictor. If a predictor variable demonstrates multicollinearity (very high correlations with other predictors in your model), it is best to remove it from your analysis.

This assumption can be checked with collinearity diagnostics. Tolerance should be more than 0.2 and VIF (variance inflation factor) should be less than 10. It does not matter which one you check because they measure the same thing: they are the reciprocal of each other, 1/VIF = tolerance and 1/tolerance = VIF.

Using the tolerance values, a figure of 1 indicates that the predictor variable isn't correlated with the others and therefore shows no multicollinearity. Lower tolerance values indicate a higher degree of multicollinearity. Higher VIF values likewise indicate more multicollinearity; a stricter rule of thumb treats VIF values above 5 as problematic.
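A minimal sketch for the special case of exactly two predictors, where the R squared from regressing one predictor on the other is just r², so tolerance = 1 − r² and VIF = 1/tolerance. The data are hypothetical; with three or more predictors each tolerance comes from regressing that predictor on all the others.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Collinearity diagnostics for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r ** 2      # share of x1's variance NOT explained by x2
    return tolerance, 1 / tolerance  # r = +/-1 would make tolerance 0 (VIF undefined)


revision_hours = [1, 2, 3, 4, 5]    # hypothetical predictor scores
attitude = [2, 1, 4, 3, 5]
tol, vif = tolerance_and_vif(revision_hours, attitude)
print(tol, vif, "problem" if tol < 0.2 or vif > 10 else "OK")
```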

## Homoscedasticity?

This is the idea that the residuals should be equally distributed along the regression line. If the residuals are not equally distributed for example, the residuals increase as the predicted values increase, this suggests there is something strange about the underlying distribution of the data and perhaps a linear regression is not an appropriate analysis. You sometimes see the assumption called absence of heteroscedasticity. Heteroscedasticity simply means the residuals are unequally distributed and the assumption is violated.

To ensure the data meets the homoscedasticity assumption; a scatterplot is drawn with ZRESID (standardised residual values) plotted against ZPRED (standardised predicted values).

You expect the residuals to be equally distributed in relation to the predicted values, so you expect to see no pattern in the scatterplot. If you see a random cloud of data points, your data meet this assumption. If instead the scatterplot shows, for example, a wedge shape, this may indicate that the residuals increase as the predicted values increase. We realise that this is quite a subjective decision and one that can make students feel uncomfortable. If there is not a clear pattern in the scatterplot, it normally means your data are fine and have not violated this assumption.

## Type of data in assumptions?

Your criterion or dependent variable must be measured at the interval/ratio level. Your predictors can be measured at the interval/ratio level or they can be dichotomous. This means they only have two responses; e.g. male and female, smokers and non-smokers, yes or no and so on.

You can use categorical predictor variables in linear regression, but they have to be recoded into dichotomous dummy variables.
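A minimal sketch of dummy coding in Python, assuming a hypothetical "degree subject" variable with three categories. One category is chosen as the reference group and coded 0 on every dummy, so k categories need k − 1 dummy variables; `dummy_code` is a made-up helper, not an SPSS function.

```python
def dummy_code(values, reference):
    """Recode a categorical variable into 0/1 dummy variables (k categories -> k - 1 dummies)."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


degree = ["psychology", "medicine", "history", "psychology", "history"]  # hypothetical
print(dummy_code(degree, reference="psychology"))
# psychology students score 0 on both dummies; each dummy flags one other subject
```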

## What is ANOVA (Analysis of Variance)?

Statistics are used in the context of empirical work to test whether your manipulations [independent variable (IV): experimental groups] have produced no difference in your participants’ output [dependent variable (DV)], i.e., whether the means are the same. This is the null hypothesis.

ANOVA is like the t-test (which tests whether the means of two experimental groups are equal), but it tests whether all group means are equal when there are more than two levels of your independent variable or more than one independent variable, e.g., $\bar{X}_1 = \bar{X}_2 = \bar{X}_3$.

Used to examine the differences between groups, for example the difference between psychology students, medical students and history students in their knowledge of statistics.

Independent groups means that the groups are separate and distinct.

There is always one DV. With one IV you complete a one-way ANOVA, with two a two-way ANOVA, etc.

## Why complete ANOVAs if we have the t-test?

If we have an experiment with one factor but three levels, we could perform multiple t-tests to compare:

Cond A vs Cond B, Cond A vs Cond C, Cond B vs Cond C.

But this inflates our chances of falsely rejecting the null hypothesis (i.e. saying there is a difference when there isn't).

If we use 0.05 as our significance level, we are saying that we are happy to have a 5% chance that our data give us a significant finding when there isn't really an effect (a Type 1 error). Increasing the number of tests increases this chance. It is better to run an ANOVA, in which the chance of a Type 1 error is kept at 5%.

If we completed the three t-tests, our probability of making at least one Type 1 error would rise to 1 − 0.95³ ≈ 14.3%.
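The 14.3% figure comes from 1 − (1 − α)ᵏ for k tests. A one-function sketch, assuming the tests are independent (pairwise t-tests on the same groups are not fully independent, so this is an approximation); `familywise_error` is a hypothetical helper name.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type 1 error across independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


print(round(familywise_error(0.05, 3), 3))   # 0.143 for the three pairwise t-tests
print(round(familywise_error(0.05, 10), 3))  # grows quickly with more tests
```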

## What is an omnibus test and explain F?

ANOVA is an omnibus test.

ANOVA is known as an omnibus test because it tests for an effect across all the means but doesn't tell you where the difference lies.

For example, when an experiment is carried out with 3 groups, a significant ANOVA can arise from any situation in which the group means are not all equal, i.e. whenever $\bar{X}_1 = \bar{X}_2 = \bar{X}_3$ is not true.

So it tells us there was an effect of the IV but not where the effect arose (to tackle these problems the data needs to be explored further). This will be discussed in later cards.

F:

To do this, ANOVA takes a ratio (called the F-ratio) of the variance in your participants' performance that is due to "systematic differences" caused by your independent variable, compared with the variance caused by chance:

F-ratio = variance caused by systematic experimental manipulation / variance expected by chance.

## Different types of ANOVA design?

1. Independent-measures design (between subjects): a total of 6 subjects in this case.

| Cond A | Cond B |
|---|---|
| Sub1 score | Sub4 score |
| Sub2 score | Sub5 score |
| Sub3 score | Sub6 score |

2. Repeated-measures design (within subjects): a total of 3 subjects in this case.

| Cond A | Cond B |
|---|---|
| Sub1 score | Sub1 score |
| Sub2 score | Sub2 score |
| Sub3 score | Sub3 score |

## Datasets of different sources of variance?

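Dataset 1: between-group variance

| A | B | C |
|---|---|---|
| 3 | 5 | 6 |
| 3 | 5 | 6 |
| 3 | 5 |  |
| 3 |  |  |

Dataset 2: within-group variance

| A | B | C |
|---|---|---|
| 3 | 4 | 6 |
| 4 | 6 | 3 |
| 6 | 3 | 5 |
| 5 | 5 | 4 |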
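
The idea behind the two datasets can be checked by splitting each one into between-group and within-group sums of squares. This sketch reads the tables above as lists of scores per group and treats the blank cells in Dataset 1 as missing, so its groups have 4, 3 and 2 scores (an assumption about the original layout; the point holds for any group sizes).

```python
def ss_between_within(groups):
    """Split variation into between-group and within-group sums of squares."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    # Between: how far each group mean sits from the grand mean, weighted by n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within: how far each score sits from its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return ss_between, ss_within


dataset1 = [[3, 3, 3, 3], [5, 5, 5], [6, 6]]          # blank cells left out
dataset2 = [[3, 4, 6, 5], [4, 6, 3, 5], [6, 3, 5, 4]]

for name, data in [("Dataset 1", dataset1), ("Dataset 2", dataset2)]:
    between, within = ss_between_within(data)
    print(f"{name}: between = {between:.2f}, within = {within:.2f}")
```

In Dataset 1 every score equals its group mean, so all of the variation is between groups; in Dataset 2 all three group means are 4.5, so all of the variation is within groups.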

## Sources of variability?

Systematic differences (experimental effects): the different conditions may have caused the samples to be different.

Individual differences: subjects have different levels of ability, so differences between samples partly reflect individual differences.

Experimental error: unpredictable changes, the measurement device, etc.

Variance is additive: total variance = systematic differences + individual differences + experimental error. The aim is to tease apart these sources of variance in the data and attribute what we can to systematic differences.

## Partitioning sources of variability (independent-measures)?

Total variability (SST) leads to

between-participant variability (SSB), which is split into

between-group variance (effect of the experiment, SSM) and within-group variance (residual error, SSR).

Between-conditions (group) variance contains:

systematic differences, individual differences and experimental error; this is the effect of the experiment, SSM.

Within-conditions (group) variance contains:

individual differences and experimental error; this is the residual error, SSR.

F-ratio = effect of experiment / residual error, calculated from the mean squares: F = MSM / MSR, where MSM = SSM / dfM and MSR = SSR / dfR.

## ANOVA by hand?

Example:

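Testing the effects of Viagra on libido using three groups: placebo, low-dose Viagra and high-dose Viagra. The outcome/dependent variable was an objective measure of libido.

Calculations: total sum of squares (SST): take each individual score, subtract the grand mean, square the result and add these up, SST = Σ(score - grand mean)².

Model sum of squares (SSM) works the same way but uses the group means instead of the individual scores, multiplying each squared deviation by the number of scores in that group: SSM = Σ n(group mean - grand mean)².

Residual sum of squares (SSR) is the variation of each score around its own group mean (the value the model predicts): SSR = Σ(score - group mean)². It also equals SST - SSM.

Then calculate the mean squares (each sum of squares divided by its degrees of freedom) and finally the F-ratio, MSM / MSR. Construct a summary table.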
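
The steps above can be sketched in Python. The scores are hypothetical libido ratings made up for illustration (five per group), and `one_way_anova` is an illustrative name, not an SPSS or library routine.

```python
def one_way_anova(groups):
    """Return SST, SSM, SSR, degrees of freedom, mean squares and F."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand_mean = sum(scores) / n_total
    means = [sum(g) / len(g) for g in groups]

    sst = sum((x - grand_mean) ** 2 for x in scores)
    ssm = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssr = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_m, df_r = k - 1, n_total - k
    ms_m, ms_r = ssm / df_m, ssr / df_r
    return {"SST": sst, "SSM": ssm, "SSR": ssr, "dfM": df_m, "dfR": df_r,
            "MSM": ms_m, "MSR": ms_r, "F": ms_m / ms_r}


# Hypothetical libido scores (illustrative numbers only)
placebo = [3, 2, 1, 1, 4]
low_dose = [5, 2, 4, 2, 3]
high_dose = [7, 4, 5, 3, 6]

table = one_way_anova([placebo, low_dose, high_dose])
print(f"{'Source':<10}{'SS':>8}{'df':>5}{'MS':>8}{'F':>7}")
print(f"{'Model':<10}{table['SSM']:>8.2f}{table['dfM']:>5}{table['MSM']:>8.2f}{table['F']:>7.2f}")
print(f"{'Residual':<10}{table['SSR']:>8.2f}{table['dfR']:>5}{table['MSR']:>8.2f}")
print(f"{'Total':<10}{table['SST']:>8.2f}{table['dfM'] + table['dfR']:>5}")
```

For these numbers the table shows SSM = 20.13, SSR = 23.60 and SST = 43.73 (so SST = SSM + SSR), with F(2, 12) = 5.12.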

## How is ANOVA a special form of regression?

The simplest model to fit to the data is the grand mean (the overall mean). This is our null hypothesis, i.e., there is no relationship between the IV and the DV. We can also fit a second model that uses the mean of each group: can we predict individual scores better if we know group (IV) membership? Once we have our two models, we can see whether the second, experimental, model is an improvement on the simple, no-effect model. Does the second model give a better fit? Note the similarity to regression (fitting lines).
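
To see the regression view in numbers, this sketch fits both models to the same set of hypothetical scores and measures how much the error (sum of squared residuals) drops once group membership is known. That drop is the model sum of squares, and dividing it by the null model's error gives R², which for a one-way ANOVA equals eta squared.

```python
def model_comparison(groups):
    """Compare the grand-mean (null) model with the group-means model."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)

    # Null model: predict every score with the grand mean
    ss_null = sum((x - grand_mean) ** 2 for x in scores)
    # Experimental model: predict each score with its own group mean
    ss_groups = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    improvement = ss_null - ss_groups  # this is SSM
    r_squared = improvement / ss_null  # proportion of variance explained
    return ss_null, ss_groups, improvement, r_squared


groups = [[3, 2, 1, 1, 4], [5, 2, 4, 2, 3], [7, 4, 5, 3, 6]]  # hypothetical
ss_null, ss_groups, improvement, r2 = model_comparison(groups)
print(f"Null model SS = {ss_null:.2f}, group-means model SS = {ss_groups:.2f}")  # 43.73, 23.60
print(f"Improvement (SSM) = {improvement:.2f}, R squared = {r2:.2f}")           # 20.13, 0.46
```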

## What is a one-way ANOVA?

A one-way between-groups ANOVA is also known simply as a one-way ANOVA.

You use a one-way ANOVA to examine the differences between two or more independent groups. It is usually used for three or more groups, because you can use an independent t-test to examine the difference between two groups.

A one-way between-groups ANOVA tests the null hypothesis that the mean scores for all groups are equal. To test whether the means are equal you analyse the variance, which might seem a bit strange. The rationale is that the scores you obtain from individuals vary: not everyone gets the same score on a test, so the scores of individuals within each group in the study will vary. When you want to examine differences between groups, you need to examine whether the group means vary more than the variation that occurs between individuals. In other words, do the scores of people in different groups vary more than the scores of people within each group? If the group means vary more than the individual scores do, you can conclude that something about being in the different groups affects the scores on the test, i.e., you would not expect people in the different groups to get the same scores. Therefore, a measure of variation for the group means and a measure of variation for the individual scores are calculated and compared.

## How do you calculate a one-way between-groups ANOVA?

The first thing calculated is the variance for individual scores within each group. The first step in doing this is to calculate the sum of the squared deviations of each individual score from its group mean. To do this, you subtract the group mean from each score within that group and then square the result. Then you add up the values obtained. This is known as the sum of squares.

You can then follow the same principle to calculate the sum of squared deviations of each group mean from the grand mean. The grand mean is the mean of all scores in the analysis. For this calculation you replace each individual score with its group mean and then work out the deviation of this score from the grand mean. If you follow this procedure for everyone in the data set and then add up the squared deviations, you get the sum of squares between groups.

## Calculating mean square?

The sum of squares value provides a sense of the amount of variation between groups and within groups. However, it isn't the variance, because the variance also takes account of the number of values that contributed to the calculation of the sums of squares. In a normal calculation of the variance, you divide the sum of squares by n - 1, where n is the number of values used to calculate the sum of squares. In the context of an ANOVA this is a little more complicated.

In an ANOVA, you divide each sum of squares by its degrees of freedom (df). For the between-groups sum of squares, the df is the number of groups - 1. The mean square is the sum of squares divided by its df.

For the within-groups sum of squares, the df is the number of individuals in the analysis minus the number of groups.

## Calculating the F ratio?

The goal of the ANOVA is to compare the variability between groups to the variability within groups. You do this by dividing the between-groups mean square by the within-groups mean square.

If the F ratio is around 1, this indicates that being in the different groups has no effect on the dependent variable; in other words, there is no difference between the groups. If the F ratio is greater than 1, the variation between groups is larger than the variation within groups, suggesting a difference between the groups. The p value that accompanies the F ratio then tells you whether or not this difference is statistically significant.

This can also be done in SPSS.

Between-groups variance = variance due to systematic differences + variance due to individual differences + variance due to experimental error.

Within-groups variance = variance due to individual differences + variance due to experimental error.

As you can see, what distinguishes the two is that between-groups variance contains variance due to systematic differences. If we can show that between-groups variance is significantly greater than within-groups variance, this is equivalent to showing that variance due to systematic differences is significantly greater than zero, i.e., there are significant differences between the groups. Because within-groups variance contains no component due to systematic differences, it is often called error variance.

## Output from one-way ANOVA?

Descriptive statistics table:

Descriptive statistics for each group and for the overall sample. These are important because, if your ANOVA shows a significant difference between group means, you need the descriptive statistics to see what these differences are.

Levene's test of equality of error variances:

Tests whether the variances of the groups of the independent variable are approximately equal. If the significance value for this test is 0.05 or greater, the variances can be considered sufficiently similar to meet the assumption of the one-way ANOVA.

Also known as the test of homogeneity of variances: it tests the null hypothesis that the variances of each condition are equal. If it is significant, the variances are not the same; consider running a Kruskal-Wallis test.

## ANOVA table?

Contains the sum of squares, df, mean square and F ratio values.

Several rows of information appear in this table. The most important row is the one that corresponds to the name of the independent variable. The other important row is labelled "error"; this is the term used for the within-groups information. The error mean square is the value used as the denominator in the F ratio calculation. The final column in the ANOVA table is the probability (p or sig) value associated with the F value. As with all inferential statistics, if this value is less than 0.05, you can reject the null hypothesis.

Mean square = sum of squares / df

Between-groups df = k - 1, where k = no. of conditions.

Within-groups df = N - k, where N = total no. of subjects.

F= MS between/MS within

## Reporting an ANOVA?

You should report:

The F ratio value rounded to two decimal places, with the degrees of freedom in parentheses, followed by the significance level.

Two df: one for the between-groups mean square and one for the within-groups mean square, separated by a comma.

An appropriate effect size statistic.

It is also important to report the mean and SD for each group.

For example: a one-factor independent-measures ANOVA was conducted and proved significant, F(2, 27) = 33.25, p < .05, MSE = 13.33.
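
The effect size can be recovered from the reported F and its dfs: since F = (SSB/df1) / (SSW/df2), eta squared for a one-way between-groups ANOVA (SS between / SS total) equals F × df1 / (F × df1 + df2). This sketch applies that to the example above and formats the F statistic; the helper names are illustrative.

```python
def eta_squared_from_f(f: float, df_between: int, df_within: int) -> float:
    """Eta squared recovered from a reported F and its two df."""
    return (f * df_between) / (f * df_between + df_within)


def format_f(f: float, df_between: int, df_within: int, p_text: str) -> str:
    """Write-up string in the style used in these notes."""
    return f"F({df_between}, {df_within}) = {f:.2f}, {p_text}"


print(format_f(33.25, 2, 27, "p < .05"))           # F(2, 27) = 33.25, p < .05
print(round(eta_squared_from_f(33.25, 2, 27), 2))  # 0.71
```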

The ANOVA only indicates whether there is a statistically significant difference between groups. It does not indicate the nature of the difference and where it lies. To uncover this detail post hoc tests or planned comparisons between groups are needed.

## Considering assumptions of a one-way between-groups ANOVA?

There is always a risk that the computer gives you an answer even when the data you input are not appropriate for the analysis you have chosen. Therefore, you need to think about the conditions that must be met before you can apply the result produced by the computer.

Conditions that must be met for a one-way ANOVA: the independent variable should be categorical; the dependent variable should be measured at the interval/ratio level; the groups should have approximately equal variances (sometimes called homogeneity of variances); and the residual scores should follow an approximately normal distribution.

When the assumptions are not met then a Kruskal-Wallis test should be considered.

## Tests for assumptions?

Testing homogeneity of variances:

To examine the homogeneity of variance assumption, you can ask SPSS to conduct a Levene's test. Levene's test provides a significance value; if this value is 0.05 or greater, you can treat the variances of the groups as similar. When your groups have equal sample sizes and are reasonably large (around 20 cases in each group), this assumption matters much less.
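
SPSS computes Levene's test for you, but the statistic itself is simple: it is a one-way ANOVA carried out on each score's absolute deviation from its group mean. This sketch is the original mean-centred form (some software also offers a median-centred variant) and returns only the statistic, which would then be compared with an F distribution on (k - 1, N - k) df to get a p value. The data are hypothetical.

```python
def levene_statistic(groups):
    """Levene's W (mean-centred): one-way ANOVA on absolute deviations."""
    # Absolute deviation of each score from its own group mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    n_total, k = sum(len(g) for g in z), len(z)
    z_grand = sum(v for g in z for v in g) / n_total
    z_means = [sum(g) / len(g) for g in z]

    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return ((n_total - k) / (k - 1)) * between / within


same_spread = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical
different_spread = [[1, 2, 3], [0, 5, 10]]       # hypothetical
print(levene_statistic(same_spread))                 # 0.0: identical spreads
print(round(levene_statistic(different_spread), 2))  # 2.46
```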

Testing normality of residuals:

In technical terms, a residual score is the difference between a score and its expected value. In a one-way between-groups ANOVA, a residual is the difference between a score and its group mean. A one-way between-groups ANOVA assumes that these residual, or deviation, scores are normally distributed. You can check this by plotting the deviation scores on a graph, such as a histogram, and by conducting a test, such as the Kolmogorov-Smirnov test, to examine how far their distribution departs from normality.
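
The residuals themselves are easy to compute before plotting them. In this sketch the scores are hypothetical and the text histogram is only a crude stand-in for a proper plot; a formal test such as Kolmogorov-Smirnov would still come from SPSS or a statistics library.

```python
import math
from collections import Counter


def residuals(groups):
    """Residual = each score minus its own group mean."""
    out = []
    for g in groups:
        group_mean = sum(g) / len(g)
        out.extend(x - group_mean for x in g)
    return out


def text_histogram(values, bin_width=1.0):
    """Very rough histogram: one row of stars per bin, lowest bin first."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    for b in range(min(counts), max(counts) + 1):
        print(f"{b * bin_width:>5.1f} | {'*' * counts.get(b, 0)}")


groups = [[3, 2, 1, 1, 4], [5, 2, 4, 2, 3], [7, 4, 5, 3, 6]]  # hypothetical
res = residuals(groups)
text_histogram(res)
```

Residuals always sum to zero within each group, so the check is about the shape of the distribution, not its centre.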

## What is a one-way repeated-measures (within-groups) ANOVA?

Analyses data from a repeated measures study where the same people have participated in three or more testing sessions; the independent variable has three or more levels.

In this type of study you are interested in how one dependent variable changes over the testing sessions. It is an extension of the paired t-test.

This type of ANOVA has an advantage over the between-groups ANOVA because you are not comparing different groups of people. Repeated-measures designs tend to be more powerful than independent-groups designs and so require smaller sample sizes to detect a significant effect. Because the same people take part in every condition, variability due to individual differences is removed, making the test more sensitive and more likely to reject H0 when there is a real effect.

This type of ANOVA tests the null hypothesis that the mean scores for all conditions are equal.

Example design: a study examining the difference in pain relief between ibuprofen, aspirin and a placebo. 10 participants with chronic back pain try each of the drugs over three days and rate how effective each drug is at reducing pain.

## Calculating the ANOVA?

1. Calculating sum of squares.

First, calculate the within-groups sum of squares: the sum of the squared deviations of each individual score from that individual's mean across all the conditions. Subtract each individual's mean score from each of their scores, square the result and add up the values obtained.

Next, calculate the model sum of squares. This reflects the amount of variation across the testing sessions within participants. Subtract the grand mean from the session mean for each testing session (or level of the IV), square each result, add up the values obtained and multiply by the number of participants.

Next, calculate the error sum of squares. This reflects the variation not due to the experimental effect: error sum of squares = within-groups sum of squares - model sum of squares.

Any variation that is not due to the experimental effect must be variation due to error (or random effects).
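
The three sums of squares can be sketched directly from these steps. The scores are hypothetical (4 participants × 3 testing sessions): rows are participants and columns are sessions. The function name is illustrative.

```python
def rm_sums_of_squares(data):
    """data[i][j] = score of participant i in testing session j."""
    n, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    person_means = [sum(row) / k for row in data]
    session_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Within-participant SS: each score minus that participant's own mean
    ss_within = sum((x - pm) ** 2 for row, pm in zip(data, person_means) for x in row)
    # Model SS: (session mean - grand mean) squared, summed, times n participants
    ss_model = n * sum((sm - grand_mean) ** 2 for sm in session_means)
    # Error SS: within-participant variation the model does not explain
    ss_error = ss_within - ss_model
    return ss_within, ss_model, ss_error


scores = [[2, 2, 5], [3, 6, 6], [1, 4, 7], [2, 4, 6]]  # hypothetical
print(rm_sums_of_squares(scores))  # (38.0, 32.0, 6.0)
```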

## Step 2- calculating mean square?

The sum of squares values provide a sense of the amount of variation within groups. However, a sum of squares is not the variance, because the variance also takes account of the number of values that contributed to the calculation. Therefore, you need to divide each sum of squares by its degrees of freedom, giving the variances, or mean squares, required.

First, calculate the df for the model sum of squares. The df is the number of testing sessions - 1.

Next, calculate the model mean square, which converts the variation into a variance by taking into account the number of testing sessions: divide the model sum of squares by its df.

Next, calculate the df for the error sum of squares. Here the df is (number of individuals - 1) × (number of testing sessions - 1).

Finally, calculate the error mean square. To convert the variation due to error (or random effects) into a variance, divide the error sum of squares by its df. Each mean square is therefore a sum of squares divided by its df.
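
Given the sums of squares, the df and mean squares follow mechanically. This sketch plugs in SSM = 32 and SSE = 6 for 4 participants and 3 sessions (hypothetical values); the function name is illustrative.

```python
def rm_mean_squares(ss_model, ss_error, n_participants, n_sessions):
    """Degrees of freedom, mean squares and F for a one-way repeated-measures ANOVA."""
    df_model = n_sessions - 1
    df_error = (n_participants - 1) * (n_sessions - 1)
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return df_model, df_error, ms_model, ms_error, ms_model / ms_error


print(rm_mean_squares(32.0, 6.0, n_participants=4, n_sessions=3))
# (2, 6, 16.0, 1.0, 16.0)
```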

## Step 3- calculating the F-ratio?

The goal of the ANOVA is to compare the variability due to the experimental effect to the variability due to error (variations due to unexplained factors). You do this by dividing the model mean square by the error mean square. The result is known as the F ratio.

If the F ratio is around 1, this indicates that the different testing sessions have no effect on the dependent variable; in other words, there is no difference between testing sessions. If the F ratio is greater than 1, this suggests there is a difference between the testing sessions. The p value that accompanies the F ratio then tells you whether or not this difference is statistically significant. If you work by hand, you need to consult a book that contains tables of critical values for ANOVA models; you look up the F ratio and its dfs to see whether the value indicates a statistically significant effect. Alternatively, SPSS can be used.
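
Tables are the usual route, but when the model has exactly 2 df (three conditions) the F distribution has a closed form, P(F > f) = (1 + 2f/df2)^(-df2/2), so the p value and the 0.05 critical value can be worked out by hand. This sketch covers only that special case; for other df use the printed tables or software.

```python
def p_value_f2(f: float, df_error: int) -> float:
    """Upper-tail p for F(2, df_error). Valid ONLY when the model df is 2."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)


def critical_f2(alpha: float, df_error: int) -> float:
    """Critical F(2, df_error) at level alpha, inverting the formula above."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)


print(round(p_value_f2(16.0, 6), 4))   # 0.0039
print(round(critical_f2(0.05, 6), 2))  # 5.14
```

So an F of 16 on (2, 6) df exceeds the 0.05 critical value of 5.14 and is significant.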

## Outputs for repeated measures ANOVA?

The output appears as seven tables.

Within-subjects factors table: a reminder of how each dependent variable was coded.

Descriptive statistics table:

Shows descriptive statistics for the dependent variable at each level of the independent variable.

Multivariate tests:

A different way to calculate repeated-measures differences. The advantage of these tests is that they do not depend on the assumption of sphericity. The disadvantage is that they are often not as powerful as the ANOVA and are not suitable for analysing small samples. It is therefore recommended not to interpret this table.

## Outputs for repeated measures ANOVA continued?

Mauchly's test:

Examines the assumption of sphericity, which is that the differences between each pair of testing sessions have approximately equal variances. If Mauchly's test isn't significant, the assumption isn't violated and you can interpret the ANOVA results from the "Sphericity Assumed" values. If Mauchly's test is significant, the assumption is violated and you should interpret the ANOVA results from the Greenhouse-Geisser values. The three epsilon figures are estimates of the degree of sphericity. Because your interpretation of Mauchly's test relies on the significance value, you do not need to worry about the epsilon values.
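
Sphericity concerns the variances of the difference scores between each pair of testing sessions. Mauchly's test itself is best left to SPSS, but this sketch computes the quantities it is concerned with, on hypothetical scores; sphericity asks for these variances to be roughly equal.

```python
from itertools import combinations
from statistics import variance


def difference_variances(data, labels):
    """Sample variance of the difference scores for each pair of sessions.

    data[i][j] = score of participant i in session j.
    """
    result = {}
    for a, b in combinations(range(len(labels)), 2):
        diffs = [row[a] - row[b] for row in data]
        result[f"{labels[a]}-{labels[b]}"] = variance(diffs)
    return result


scores = [[2, 2, 5], [3, 6, 6], [1, 4, 7], [2, 4, 6]]  # hypothetical
print(difference_variances(scores, ["A", "B", "C"]))
```

Here all three variances equal 2, so sphericity would hold exactly; with real data they will only ever be approximately equal.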

ANOVA table:

Contains the sum of squares, df, mean square and F ratio values. The final column is the probability (p or sig) value associated with the F ratio. As with all inferential tests, if this value is less than 0.05, you can reject the null hypothesis. Which row of the ANOVA table you report depends on Mauchly's test.

## Outputs for repeated measures ANOVA continued?

A table also includes tests of trends within the data. A linear trend suggests a straight-line relationship across the levels of the IV. A quadratic trend suggests a U-shaped or inverted-U relationship. Means or plots are better for examining this.
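
Linear and quadratic trends are contrasts on the condition means. With three equally spaced levels the standard weights are (-1, 0, 1) for linear and (1, -2, 1) for quadratic. This sketch only computes the contrast values on hypothetical means; testing them for significance needs the error term, which SPSS supplies.

```python
LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)


def contrast(means, weights):
    """Weighted sum of condition means; the weights must sum to zero."""
    if sum(weights) != 0:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * m for m, w in zip(means, weights))


steady_rise = [2.0, 4.0, 6.0]  # hypothetical means
u_shape = [6.0, 2.0, 6.0]      # hypothetical means
print(contrast(steady_rise, LINEAR), contrast(steady_rise, QUADRATIC))  # 4.0 0.0
print(contrast(u_shape, LINEAR), contrast(u_shape, QUADRATIC))          # 0.0 8.0
```

A steady rise loads only on the linear contrast, while a U shape loads only on the quadratic one.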

There is a table that is a test for a between subject effect. Because this is a one-way repeated measures ANOVA you have no between-subjects variables so you do not need to interpret this table.

To write up the results you need the F ratio value rounded to 2 decimal places, with the degrees of freedom in parentheses, followed by the significance level. There are two df: one for the model mean square and one for the error mean square, separated by a comma.

A one-way repeated-measures ANOVA was conducted on the data with number of errors as the DV and keyboard type as the IV. There was a significant effect of keyboard type, F(2, 6) = 16.00, MSE = 1.00, p < .01, η² = 0.84.

The mean and SD also need to be reported for each condition.

The ANOVA only indicates whether a statistically significant difference exists between the testing sessions. It does not indicate the nature of the difference or where it lies. Therefore, post hoc tests or planned comparisons need to be completed.

## Assumptions of a one-way repeated measures ANOVA?

The ANOVA is a parametric statistic which means it is only appropriate for certain types of data.

Conditions that must be met;

The independent variable should be categorical; the dependent variable should be measured at the interval/ratio level; the residual scores should follow a normal distribution (check with a histogram and a Kolmogorov-Smirnov test); and the differences between each pair of testing sessions (levels of the IV) should have approximately equal variances (sphericity), so Mauchly's test is important. The Huynh-Feldt and lower-bound corrections are similar to the Greenhouse-Geisser correction in that they account for a violation of sphericity; they produce results that are less and more conservative, respectively. When Mauchly's test is significant, the Greenhouse-Geisser values are the ones usually reported; these other versions are less common.

When the assumptions for a one-way repeated-measures ANOVA aren't met, consider conducting a Friedman test.

## Non-Parametric tests instead of ANOVAs?

Kruskal-Wallis test for independent measures:

Used to examine the differences between two or more independent groups, usually three or more, because you can use a Mann-Whitney U test to examine the difference between two groups.

Used when the data use an ordinal measurement scale, or when parametric assumptions (normality, homogeneity of variance) are not met.

The SPSS output presents the sample size and the mean rank for each group. The next table is the Kruskal-Wallis test result. It contains a chi-square statistic, df and an Asymp. Sig. value. The Asymp. Sig. value is the significance value that determines whether the difference between the groups is statistically significant. As with all inferential tests, if this value is less than 0.05, you can reject the null hypothesis.
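
The H statistic behind the SPSS chi-square can be sketched by ranking all scores together and comparing the groups' rank sums: H = 12 / (N(N + 1)) × Σ(R²/n) - 3(N + 1). This version omits the correction for tied ranks that software usually applies, so with ties the value may differ slightly from SPSS. The data are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..N (smallest = 1); ties share the average of their ranks."""
    return [
        sum(1 for y in values if y < x) + (sum(1 for y in values if y == x) + 1) / 2
        for x in values
    ]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without the tie correction."""
    scores = [x for g in groups for x in g]
    ranks = average_ranks(scores)
    n_total = len(scores)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical scores
print(round(kruskal_wallis_h(groups), 2))  # 7.2
```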

## Writing up the results of a Kruskal-Wallis test?

When reporting Kruskal-Wallis test results, you report the Kruskal-Wallis H value rounded to 2 decimal places, with the degrees of freedom in parentheses, followed by the significance level (the H value is the chi-square value). For example:

There was a statistically significant difference between the three groups of children in terms of their speech development, H(2)= 0.07, p=0.04.

Also report some descriptive information about each group to help the reader understand the nature of the difference, often the median and interquartile range for each group, as well as an effect size measure. This is similar to the eta-squared value calculated for a one-way ANOVA and is related to the Cramér's V statistic calculated for chi-square analysis. It is obtained by dividing the chi-square (H) value from the Kruskal-Wallis test by the sample size minus 1.
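
The effect size described above is a one-line calculation. For three groups (df = 2) the chi-square distribution also has a simple upper tail, p = e^(-H/2), which this sketch uses; for other df you would need tables or software. The H value and sample size here are hypothetical.

```python
import math


def kw_effect_size(h: float, n_total: int) -> float:
    """Effect size: Kruskal-Wallis H divided by (sample size - 1)."""
    return h / (n_total - 1)


def chi2_p_df2(stat: float) -> float:
    """Upper-tail p value for a chi-square statistic with exactly 2 df."""
    return math.exp(-stat / 2)


h, n = 7.2, 9  # hypothetical result
print(round(kw_effect_size(h, n), 2), round(chi2_p_df2(h), 3))  # 0.9 0.027
```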

The Kruskal-Wallis test only indicates whether there is a statistically significant difference between groups. It does not indicate the nature of the difference or where the difference lies. To uncover this detail you need to compare the different pairs of groups in post hoc tests or planned comparisons using Mann-Whitney U tests.

## Considering assumptions of a Kruskal-Wallis test?

Non-parametric means that the test does not assume that the data follow a particular distribution, such as the normal distribution. The Kruskal-Wallis test therefore does not have many assumptions, which is why it is sometimes described as assumption-free.

However, the Kruskal-Wallis test is not completely free from assumptions, because there are some conditions your data need to meet before a Kruskal-Wallis test can be used validly. These conditions are:

The independent variable should be categorical, and the dependent variable should be measured at the ordinal level. It may also be the case that the dependent variable is measured at the interval/ratio level but the assumptions of normality and/or homogeneity of variance required by a one-way ANOVA are not met.

## Non-Parametric tests instead of ANOVAs continued?

Friedman test:

The non-parametric equivalent of a one-way repeated-measures ANOVA. It is used to test the difference in scores between two or more conditions in a repeated-measures design when the data use an ordinal measurement scale or do not meet parametric assumptions. You can test the difference between two conditions in a repeated-measures design using the Wilcoxon test, so you normally use the Friedman test when there are three or more levels of your independent variable.

The Friedman calculation ranks each participant's scores across the conditions they take part in. So, with three conditions, a participant's lowest score is ranked 1 and their highest is ranked 3. The mean rank for each condition is simply the mean of these ranks across your sample. It reflects how the test was calculated but isn't very informative, so you don't need to interpret this table.

The Friedman test result table contains a chi-square statistic, df and the Asymp. Sig. value. As with the previous test, if the Asymp. Sig. value is less than 0.05, a statistically significant difference exists between the scores of the three conditions. The output also provides Kendall's coefficient of concordance (Kendall's W), an estimate of effect size that ranges between 0 and 1, with higher values indicating a larger effect.
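
The ranking step and the resulting statistics can be sketched as follows, using the standard formulas χ²F = 12 / (n k (k + 1)) × ΣR² - 3n(k + 1) and W = χ²F / (n(k - 1)). This version omits the tie correction that software applies, and the ratings are hypothetical.

```python
def rank_row(row):
    """Rank one participant's scores; tied scores share the average rank."""
    return [
        sum(1 for y in row if y < x) + (sum(1 for y in row if y == x) + 1) / 2
        for x in row
    ]


def friedman(data):
    """Friedman chi-square and Kendall's W (no tie correction).

    data[i][j] = score of participant i in condition j.
    """
    n, k = len(data), len(data[0])
    ranked = [rank_row(row) for row in data]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    chi2 = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    kendalls_w = chi2 / (n * (k - 1))
    return chi2, kendalls_w


# Hypothetical mood ratings: 4 participants x 3 conditions
ratings = [[1, 2, 3], [2, 3, 4], [3, 5, 9], [1, 4, 6]]
print(friedman(ratings))  # (8.0, 1.0)
```

Every participant ranks the conditions in the same order here, so Kendall's W reaches its maximum of 1.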

## Writing up the results of a Friedman test?

Report the Friedman chi-square value rounded to two decimal places, with the degrees of freedom and sample size in parentheses, followed by the significance level:

There was a statistically significant difference between the three genres of movies in terms of mood ratings, χ²(2, n = 35) = 19.74, p < .001.

Report some descriptive information about each group to help the reader understand the nature of the difference: the median and interquartile range for each group, and Kendall's coefficient of concordance as an estimate of effect size.

The Friedman test doesn't tell you where the significant difference lies. For example, if you find a significant difference between three conditions, it might be that all three conditions differ significantly from each other or that only some pairs do. To find out, compare the different pairs of conditions using Wilcoxon tests; with three conditions you would need three Wilcoxon tests.

Assumptions of the Friedman test: the independent variable should be categorical, and the dependent variable should be measured at least at the ordinal level.

## What is an ANCOVA?

Analysis of covariance.

When and why:

To test for differences between group means when we know that an extraneous variable affects the outcome variable. Used to control known extraneous variables.

Advantages:

Reduces error variance: by explaining some of the unexplained variance (SSR), the error variance in the model can be reduced.

Greater experimental control: by controlling known extraneous variables, we gain greater insight into the effect of the predictor variable(s).

SST (total variance in the data) is split into SSM (improvement due to the model) and SSR (error in the model). With a covariate, SSM includes a part due to the covariate, and SSR is split into a part explained by the covariate and the remaining SSR.

## Example of ANCOVA?

Field's Viagra example:

Outcome (DV) = participant's libido

Predictor (IV) = dose of Viagra (placebo, low and high)

There are several possible confounding variables, e.g. partner's libido, medication.

We can conduct the same study but also measure partner's libido over the same time period following the dose of Viagra. Covariate = partner's libido.

Output: the group means are recalculated with the effect of partner's libido factored out (adjusted, or estimated marginal, means). When interpreting your effects you should use these estimated marginal means. Note that post hoc tests are less sensitive than a priori (planned) contrasts; Bonferroni corrections can be used for the post hoc comparisons.
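
The adjusted (estimated marginal) means can be sketched with the standard ANCOVA adjustment: each group mean is shifted by the pooled within-group slope times how far that group's covariate mean sits from the overall covariate mean. This assumes the slope is the same in every group (homogeneity of regression slopes), and the numbers are hypothetical, not Field's data.

```python
def mean(xs):
    return sum(xs) / len(xs)


def adjusted_means(groups):
    """ANCOVA-style adjusted group means.

    groups: list of (covariate_scores, outcome_scores) pairs, one per group.
    Uses a single pooled within-group slope (assumes homogeneity of slopes).
    """
    sxy = sxx = 0.0
    for x, y in groups:
        mx, my = mean(x), mean(y)
        sxy += sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx += sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx  # pooled within-group regression slope
    grand_x = mean([a for x, _ in groups for a in x])
    return slope, [mean(y) - slope * (mean(x) - grand_x) for x, y in groups]


# Hypothetical (partner's libido, participant's libido) for two groups
placebo = ([1, 2, 3], [2, 3, 4])
high_dose = ([3, 4, 5], [6, 7, 8])
print(adjusted_means([placebo, high_dose]))  # (1.0, [4.0, 6.0])
```

The raw means (3 and 7) differ by 4, but once the high-dose group's higher covariate scores are allowed for, the adjusted means differ by only 2.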

Main effect: F (2, 26)= 4.41, P<0.05

## Recognising factors?

Factors:

Chosen or manipulated, e.g. sex, age, etc.

Levels:

The levels of a factor, e.g. sex has two levels (male and female); age could have, for example, 3 levels (5, 7 and 10 years).

Factor may be between subjects (independent measures)- different subject for each level, e.g. three classes of 5, 7 or 10 year olds.

Factor may be within subjects (repeated measures)- same subjects on each level e.g. one class tested when they are 5 then 7 then 10 years old.

Dependent variables- what you measure e.g. reaction time, percent correct, ratings on a scale.

## Main effects and interactions?

One way:

One independent variable with more than two levels (if there are only two levels we can run a t-test). Straightforward interpretation, i.e., it's significant or it's not.

Two way:

Two independent variables with two or more levels. More complicated to interpret.

With two factors we ask three questions; effect of factor A, effect of factor B, interaction of factor A and B.

Example: children's reading ability;

two factors (independent variables): sex (2 levels, male and female), age (3 levels, 5, 7 and 10 year olds).

Dependent variable: score on a reading test.

## Example continued?

Questions asked:

Main effect of sex- are females better than males? The mean of the female score is higher than the mean of male score.

Main effect of age- do 10 year olds differ to 7 year olds and in turn 5 year olds? The mean of each age is different to the other.

Interaction of sex and age: does the performance of males and females change in a different way across age? No interaction: the effect of age is the same for males and females. Interaction: the effect of age is different for males and females, e.g., females improve at a greater rate than males (more complicated to interpret).

Definition: there is an interaction between two factors if the effect of one factor depends on the levels of the second factor.

Interaction effects are shown when the lines in a graph are not parallel. They are important: always check for an interaction effect before interpreting main effects, and note that an interaction can arise without any significant main effects.
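
Whether the lines on an interaction plot are parallel comes down to whether the effect of one factor (here, the step from one age group to the next) is the same at each level of the other. This sketch compares those steps on hypothetical cell means; non-parallel sample means only suggest an interaction, and the interaction F test in the two-way ANOVA decides whether it is significant.

```python
def age_steps(cell_means):
    """Change in mean score between consecutive age groups, for each sex."""
    return {
        group: [later - earlier for earlier, later in zip(means, means[1:])]
        for group, means in cell_means.items()
    }


# Hypothetical mean reading scores at ages 5, 7 and 10
cell_means = {"female": [10, 15, 22], "male": [8, 11, 14]}
steps = age_steps(cell_means)
parallel = len({tuple(s) for s in steps.values()}) == 1
print(steps)  # {'female': [5, 7], 'male': [3, 3]}
print("lines parallel" if parallel else "lines not parallel")
```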

## Two Way independent measures ANOVA?

Tests for: A main effect of factor A, a main effect of factor B, an interaction effect (A*B interaction).

These 3 tests are independent: the outcome of one hypothesis test is unrelated to either of the other two, so any combination of significant and non-significant effects is possible. A two-way ANOVA is used to examine the differences between two or more independent groups on two independent variables. A two-way ANOVA always has two independent variables and one dependent variable.

Two-way ANOVAs are often described by the number of levels of the two independent variables. For example, a two-way ANOVA where each independent variable has three levels is known as a 3 × 3 ANOVA; a two-way ANOVA where one independent variable has 3 levels and the other has 4 levels is known as a 3 × 4 ANOVA, and so on. You will have three hypotheses: the first two are main effects and the third is known as an interaction.

Main effects are the effects of each independent variable on the dependent variable. The interaction effect is the effect of the combination of the two independent variables on the dependent variable. It is useful when you think the effect of an IV on the DV isn't simple but is influenced by another variable. The interaction is usually the effect we are most interested in: if it is significant, it qualifies (overrides) the main effects and provides more information.

## Why conduct a two-way ANOVA?

A two-way ANOVA contains two independent variables and one dependent variable, and therefore produces two main-effect results. Even so, it is not the same as completing two one-way ANOVAs.

The F ratio and significance value obtained will be different. This is because of the error (residual or within-groups) sum of squares. The F ratio is the mean square for the effect divided by the error mean square, and the error mean square for the two-way ANOVA is different from that of the one-way ANOVA. Hence the F ratio and its significance value differ.

The error mean square is different because there is more information in a two-way ANOVA. A one-way ANOVA contains only one independent variable. A two-way ANOVA contains two independent variables, which results in three terms: two main effects and an interaction.

Adding this information to the ANOVA means that you can explain more of the variance in the dependent variable. In other words, there will be less error variance (variance that is not explained by the independent variables).
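This point about F ratios can be seen in a minimal sketch of a balanced two-way between-groups design (equal n per cell), using invented scores. The sketch computes the two-way sums of squares from cell and marginal means. It then runs a one-way ANOVA on factor A alone, for comparison. Only F ratios are computed. The p-values would come from an F distribution, which this standard-library sketch does not include.

```python
from statistics import mean

# Invented scores for a balanced 2 x 2 design, n = 3 per cell.
# Keys are (level of factor A, level of factor B).
data = {
    ("a1", "b1"): [3, 4, 5],
    ("a1", "b2"): [7, 8, 9],
    ("a2", "b1"): [5, 6, 7],
    ("a2", "b2"): [9, 10, 11],
}


def two_way_anova(data):
    """Return {effect: (SS, df, F)} plus the error term, for equal cell sizes."""
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))  # scores per cell
    scores = [x for cell in data.values() for x in cell]
    grand = mean(scores)
    cell_mean = {key: mean(cell) for key, cell in data.items()}
    a_mean = {a: mean([x for (ai, _), cell in data.items() if ai == a for x in cell])
              for a in a_levels}
    b_mean = {b: mean([x for (_, bj), cell in data.items() if bj == b for x in cell])
              for b in b_levels}

    ss_a = len(b_levels) * n * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = len(a_levels) * n * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b  # what the cells explain beyond A and B
    ss_error = sum((x - cell_mean[key]) ** 2 for key, cell in data.items() for x in cell)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_error = len(scores) - len(a_levels) * len(b_levels)
    ms_error = ss_error / df_error  # denominator for every F ratio
    return {
        "A": (ss_a, df_a, (ss_a / df_a) / ms_error),
        "B": (ss_b, df_b, (ss_b / df_b) / ms_error),
        "A*B": (ss_ab, df_ab, (ss_ab / df_ab) / ms_error),
        "error": (ss_error, df_error, ms_error),
    }


def one_way_f(groups):
    """Return the one-way between-groups F ratio and the within-groups mean square."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(scores) - len(groups)
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, ms_within


two_way = two_way_anova(data)
# One-way ANOVA on factor A only: pool the b1 and b2 cells for each level of A.
f_one_way, ms_within_one_way = one_way_f([data[("a1", "b1")] + data[("a1", "b2")],
                                          data[("a2", "b1")] + data[("a2", "b2")]])
print("two-way F for A:", round(two_way["A"][2], 2))
print("one-way F for A:", round(f_one_way, 2))
```

With these invented scores, the two-way F for factor A is 12.0 and the one-way F is about 2.14. In the two-way analysis, factor B explains variance that the one-way analysis leaves in its error term. The error mean square drops from 5.6 to 1.0. The scores were chosen so that the interaction sum of squares is zero.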

## Output for a two-way between-groups ANOVA

Between-subjects factors and descriptive statistics tables:

The first table presents the number of participants in each group, the descriptive statistics for each group and the overall sample.

Levene's test for homogeneity of variances:

Levene's test checks whether the variances across the groups of the independent variables are approximately equal. If the significance value for this test is 0.05 or greater, the variances can be considered sufficiently similar to meet this assumption of the two-way ANOVA.
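The statistic behind the test can be sketched as below. The sketch assumes the classic mean-centred form of Levene's test; other variants centre on the median. It is simply a one-way ANOVA F ratio, run on each score's absolute distance from its own group mean. The p-value in the output table comes from comparing this value with an F distribution on (k - 1, N - k) degrees of freedom. This standard-library sketch does not compute that p-value. The scores are invented.

```python
from statistics import mean


def levene_w(*groups):
    """Mean-centred Levene statistic: a one-way ANOVA F ratio computed on the
    absolute deviations of each score from its own group mean."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(zg) for zg in z)
    z_grand = mean([v for zg in z for v in zg])
    ss_between = sum(len(zg) * (mean(zg) - z_grand) ** 2 for zg in z)
    ss_within = sum((v - mean(zg)) ** 2 for zg in z for v in zg)
    return ((n_total - k) / (k - 1)) * ss_between / ss_within


print(levene_w([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # identical spreads
print(levene_w([4, 5, 6], [0, 5, 10]))            # second group far more spread out
```

Groups with identical spreads give a statistic of zero, however different their means are. This is because the test only looks at spread, not at location.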

ANOVA table:

It contains the sum of squares, df, mean square and F ratio values. The calculations for a two-way ANOVA are complicated, but the principle is the same as that of the one-way ANOVA.

The most important rows in this table are those that correspond with the names of your variables.

## ANOVA table continued

Most important rows:

The information for each main effect can be found along the row labelled with that main effect. The interaction effect can be found along the row labelled with both main effects joined by an asterisk (e.g. A * B).

The other important row in the table is the one labelled 'Error'. This is like the row in the one-way ANOVA table that referred to the within-groups information. The error mean square is the value used as the denominator in the F ratio calculations.

The final column in the ANOVA table is the probability (p or Sig.) value associated with the F ratio. As with all inferential statistics, if this value is less than 0.05, you can reject the null hypothesis. In a two-way ANOVA there are three important significance values: one for each of the two main effects and one for the interaction.

Examples: if the first main effect is non-significant, you fail to reject the null hypothesis that the mean scores for all levels of that factor are equal. If the second is non-significant, you fail to reject H0 that the means for both groups are equal. If the interaction is significant, the effect of the first independent variable on the DV is influenced by the second independent variable, and vice versa.

## Interaction plot output

The interaction result in the ANOVA table does not tell you the nature or direction of the interaction, only that it exists. To examine the nature of the interaction, you need to look at it in more detail. The best way to do this is to examine the interaction plot, which is a picture of how the two independent variables interact.

The levels of one variable are provided on the horizontal axis and the levels of the other variable are represented by separate lines. The vertical axis represents the scores on the DV.

Example (cowboy movie preference): to make sense of the plot, find the point on the horizontal axis that represents people who prefer John Wayne. Then draw a line in your mind upwards from this point until it meets the first line on the plot. This point represents the mean intelligence score for females who prefer John Wayne. If you now follow the line for females, you can see that it increases up to the point for Clint Eastwood. It then decreases to the point for people who do not watch cowboy movies. Therefore, for females the plot suggests that those who prefer Clint Eastwood have higher intelligence scores than other females in the sample.

The same procedure can be followed for the men. There are several ways of interpreting this plot, but you should focus your interpretation around your hypothesis.

## Writing up the results of a two-way between-groups ANOVA

This is similar to a one-way ANOVA, but when reporting the result you need to make clear whether you are reporting a main effect or an interaction. Often you want to report all three results. This can be a bit wordy, so it is sometimes better to report them in a table. In the table, the SS, df, MS, F and p are recorded. In words this is:

A 3*2 ANOVA with cowboy preference (3 levels) and gender (2 levels) as between-subjects factors revealed no main effect for cowboy preference, F(2, 24) = 0.11, p > 0.89, η² = 0.08. Report both main effects and the interaction in the same way, together with the MSE. You can then describe what the interaction plot shows.

It is also recommended that you report the effect size and the mean and SD for each group, particularly when results are significant. The mean scores allow the reader to understand the nature of the difference. If you report a statistically significant main effect where the IV has more than 2 groups, you might need to conduct post hoc tests or planned comparisons between the groups to unravel the nature of this main effect. This is because a significant result only tells you that there is a difference between the groups. It does not tell you which groups differ or in what direction. The assumptions are the same as for the one-way ANOVA. However, if the residual scores are not normally distributed, the ANOVA needs to be abandoned or the data need transforming.

## Two-way repeated measures ANOVA

This is used when you have two independent variables that are both measured with a repeated measures design. For example, a factory produces two products, one with complex instructions and one with simple instructions, and there are two shifts (night and day). The DV is the number of errors made.

If each independent variable had three levels, it would be a 3*3 ANOVA, which would mean 9 testing sessions. Main effects and interactions are interpreted in the same way as in an independent measures ANOVA.

Within-subjects factors table:

This is a reminder of how each level of the IVs was coded. You should check it to ensure everything has been coded correctly. There is also a descriptive statistics table for the DV at each level of the IVs, i.e. at each testing session.

Multivariate tests:

These are a different way to calculate repeated measures differences. Their advantage is that they do not depend on the assumption of sphericity. Their disadvantages are that they are often less powerful than the ANOVA and they are not suitable for analysing small samples. Do not interpret these.

## Output for two-way repeated measures ANOVA continued

Mauchly's test:

This examines the assumption of sphericity. If it is not significant, the assumption is not violated and you can interpret the ANOVA results from the Sphericity Assumed values. If it is significant, the assumption is violated and you should interpret the ANOVA results from the Greenhouse-Geisser values. This test is only needed when an independent variable has three or more levels. With only two levels, the test is not conducted and you interpret the ANOVA results from the Sphericity Assumed values. In that case, all the rows for each effect in the ANOVA table will show the same values.

ANOVA table:

The most important rows are those that correspond with the names of your variables and the interaction effect. The final column in the ANOVA table is the probability (p or sig.) value associated with the F ratio. Same as independent measures. Look at interaction plot for more detail.

Between-subjects table:

Not relevant because this is a within-groups ANOVA.

## Interaction plot for a two-way within-groups ANOVA

This is a picture of how the two independent variables interact. The levels of one variable are on the horizontal axis and the levels of the other variable are represented by separate lines. The vertical axis represents the scores on the dependent variable.

Example: To make sense of the plot, find the point on the horizontal axis that represents caffeine being consumed (denoted by 1 on the x axis). Then draw a line in your mind straight up from this point until it crosses the two lines. The point at which it crosses the first line represents the mean IQ score when participants consumed coffee and there was background music present. The point at which you cross the second line represents the mean IQ score when participants consumed coffee and there was no music. From this you can see that when caffeine had been consumed IQ scores were higher in the no music condition compared to when background music was presented. If you then look at the point on the horizontal axis that represents no caffeine consumed (2 on the x axis) you see that the mean IQ scores are very similar when music was and was not present. The no caffeine consumed mean scores are also similar to the point on the plot representing the mean score of the caffeine consumed and background music present condition. So, the nature of the interaction seems to be that caffeine has an important influence on IQ scores only when there is no background music.

## Writing up the results of a two-way within-groups ANOVA

This is reported in the same way as a one-way ANOVA. The difference is that when reporting the result of a two-way ANOVA, you need to make clear whether you are reporting a main effect or an interaction. Often you want to report all three results: the two main effects and the interaction.

Same report as for the between-groups ANOVA.

Assumptions: the independent variables should be categorical; the dependent variable should be measured at the interval/ratio level; the variances of the differences between each pair of sessions should be approximately equal (this is known as sphericity); and the residual scores should follow an approximately normal distribution.

## What is a mixed ANOVA?

This is when your study has elements of both designs: one of your independent variables may be a between-groups variable and another may be a within-groups variable. In these cases, a mixed ANOVA may be the most appropriate statistical analysis.

Three-way mixed ANOVA:

3 IVs of any type: all could use different participants, all could use the same participants, or there could be a mix. Mixed: one or more IVs use the same participants and one or more IVs use different participants, AKA a split-plot design.

Example: speed dating: is personality or looks more important? IV1 (looks): attractive, average, ugly. IV2 (personality): high charisma, some charisma, dullard. IV3 (gender): male or female. DV: participant's rating of the date (percentage).

Effects: we will get an F ratio for the main effect of each IV: looks, personality and gender. Two-way interactions: we will get F ratios for all possible interactions between pairs of variables: looks*personality, looks*gender and personality*gender. Three-way interaction: one F ratio for looks*personality*gender.
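The list of effects follows mechanically from the factors: every main effect, every pair and every triple. In the sketch below, the factor names come from the speed-dating example, and the function name is hypothetical. The sketch enumerates the terms a three-way ANOVA reports an F ratio for.

```python
from itertools import combinations


def anova_effects(factors):
    """All main effects and interactions in a full factorial ANOVA."""
    effects = []
    for order in range(1, len(factors) + 1):  # 1 = main effects, 2 = two-way, ...
        for combo in combinations(factors, order):
            effects.append("*".join(combo))
    return effects


for effect in anova_effects(["looks", "personality", "gender"]):
    print(effect)
```

With three factors this gives 7 effects: 3 main effects, 3 two-way interactions and 1 three-way interaction.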

## Interpreting a two-way mixed ANOVA

Within-subjects factors table: how each level of the within-groups variable was coded.

Between-subjects factors table: how each level of the between-groups variable was coded.

Descriptive statistics: each group at each level.

Box's test of equality of covariance matrices: examines the assumption that the DV has approximately equal variance-covariance matrices across groups. It only needs to be interpreted if you have unequal sample sizes. If so, look at the p-value. If it is greater than 0.001, the assumption is not violated and you can interpret the ANOVA results as normal. If it is less than 0.001, the assumption is violated and you should not use a mixed ANOVA.

Multivariate tests: do not interpret.

Mauchly's test: interpret as for the repeated measures ANOVA.

Levene's test: tests whether the variances across the groups of the between-groups variable are approximately equal. If the p-value is greater than 0.05, the variances can be considered similar enough to meet this assumption of the mixed ANOVA, and you can interpret the ANOVA result as normal. If it is less than 0.05, the assumption is violated.

## Interpreting a two-way mixed ANOVA continued

ANOVA table:

When you conduct a mixed ANOVA, the table of tests of within-subjects effects presents two important pieces of information. First, it gives you the main effect for your within-groups variable. This is the first row, denoted by the name of your within-subjects variable. Second, it gives your interaction effect, which is denoted by both variable names separated by an asterisk.

Just like the within-groups ANOVA, there are four separate versions of each result reported. Which one you should report depends on the result of Mauchly's test. If Mauchly's test is not significant, interpret the ANOVA results from the Sphericity Assumed values. If it is significant, use the Greenhouse-Geisser values. The p-value associated with the F ratio is interpreted as always.

In a mixed ANOVA, the table of tests of between-subjects effects presents the main effect for your between-groups variable, in a row denoted by the name of that variable. The main effect for your within-groups variable and the interaction effect are in the table of tests of within-subjects effects. The significance values are in these individual tables. A mixed ANOVA also produces an interaction plot.

## Writing up the results of a two-way mixed ANOVA

There are three important pieces of information: the between-subjects main effect, the within-subjects main effect and the interaction effect. You should report the F ratio rounded to 2 decimal places, with the degrees of freedom in parentheses, followed by the significance level. You should report two df, one for the effect mean square and one for the error mean square, separated by a comma.
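The reporting format just described can be captured in a small helper. The sketch assumes F is rounded to 2 decimal places, as stated above. It also assumes p is rounded to 3 decimal places, with anything below 0.001 reported as "p < 0.001"; that part is an assumption, so follow your own department's style guide. The helper name, effect name and numbers are invented.

```python
def format_f(effect, df_effect, df_error, f_ratio, p):
    """Report an F test as 'effect: F(df_effect, df_error) = F, p = ...'."""
    p_text = "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
    return f"{effect}: F({df_effect}, {df_error}) = {f_ratio:.2f}, {p_text}"


print(format_f("Gender", 1, 28, 4.2137, 0.0498))
```

An effect size and the descriptive statistics still need to be added alongside each line, as described below.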

You should also report an effect size and descriptive statistics, and the interaction plot should be included and described.

If an IV has more than two levels, you will need to consider the relevant post hoc tests or planned comparisons.

Output for a three-way ANOVA: Mauchly's test of sphericity, the repeated measures effects table (tests of within-subjects effects), Levene's test, between-groups effects, bar graphs for each effect and interaction plots for each interaction.

## Mixed ANOVA assumptions

The mixed ANOVA is a parametric statistic. There is no non-parametric equivalent, so if your data violate the assumptions outlined, a statistical advisor should be contacted.

Type of data: the IVs should be categorical and the DV measured at the interval/ratio level. The residuals should be approximately normally distributed, checked with a histogram or the Kolmogorov-Smirnov test.

Homogeneity of variances: Levene's test. Sphericity: Mauchly's test (only if an IV has more than 2 levels).

Homogeneity of variance-covariance matrices: mixed ANOVAs have an additional assumption you will not use elsewhere. As well as the variances being approximately equal, you now have the covariances of the relationships between the variables to consider. This is known as the homogeneity of variance-covariance matrices assumption, and it is assessed by Box's M test. If you have equal sample sizes, you can assume homogeneity of variance-covariance matrices, so this test does not have to be considered. If the sample sizes are unequal, look at the p-value for this test. If it is greater than 0.001, the assumption is not violated and you can interpret the rest of the results as usual. If it is less than 0.001, you may not have homogeneity of variance-covariance matrices and the assumption is violated; in that case the mixed ANOVA cannot be used. Box's M test is quite sensitive, especially with large sample sizes, which is why 0.001 is used rather than 0.05.

## What to do after completing an ANOVA

There are two types of problem: obtaining a significant main effect and wanting to investigate it further, or obtaining a significant interaction and wanting to investigate it further.

What to do after completing an ANOVA?

Multiple t-tests: Fisher's protected t-tests or the Bonferroni correction.

Planned comparisons or contrasts: do not require a significant ANOVA, and carry out a small number of specific comparisons. Hypothesis driven.

Post hoc tests: not planned (no hypothesis). These require a significant ANOVA and carry out several comparisons simultaneously.

Simple main effects: in an n-way ANOVA, test for the effect of one factor at each level of the other factor (useful for understanding a significant interaction).

## Post hoc tests for independent group designs

One way of examining significant results further is to run an independent t-test between each pair of groups (three t-tests for three groups). This sounds good, but in practice it creates a problem called multiplicity. To manage this issue, you need to use a post hoc test.

Multiplicity:

This occurs when you conduct several significance tests to test a single hypothesis. The problem relates to the concept of probability, so it is not obvious in your analysis. There is also a lack of consensus among statisticians about when multiplicity is a problem. The main problem relates to the concept of a Type 1 error. Usually when you conduct a significance test, there is a 5% chance that you will make a Type 1 error. This isn't an error made in the analysis; it reflects the fact that all significance tests are based on probability. You can never be certain about the conclusions of a significance test; you can only state your conclusions with a high degree of probability. Multiplicity means that when you conduct more than one significance test under the same hypothesis, you increase your overall chance of making a Type 1 error. The solution to this problem is to use a post hoc test that has been designed to manage the issue of multiplicity.
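A minimal sketch of how quickly the overall (familywise) Type 1 error rate grows is shown below. It assumes every null hypothesis is true and the tests are independent. Pairwise tests on shared data are not fully independent, so treat the figures as an approximation.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type 1 error across n_tests tests, each at level
    alpha, assuming all null hypotheses are true and the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 3, 6, 10):
    print(n_tests, "tests:", round(familywise_error_rate(0.05, n_tests), 3))
```

With three tests at 0.05, the chance of at least one Type 1 error is already about 0.14. With ten tests it is about 0.40.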

## Choosing a post hoc test

The most commonly used post hoc tests in psychology are the Scheffe test and the Tukey Honestly Significant Difference (HSD) test, AKA the Tukey test. These tests differ in the way they adjust the Type 1 error rate for the analyses. The Scheffe test tends to have less power than the Tukey test. You are therefore more likely to find statistically significant differences between pairs of groups using the Tukey test than using the Scheffe test. For this reason, the Tukey test is often the best option.

There might be situations where your ANOVA provides a statistically significant result but the post hoc tests suggest that there are no significant differences between groups. This happens because when you examine comparisons between pairs of groups, you leave the other groups out of the analysis. This reduces your sample size and therefore power. Where you have conducted a Kruskal-Wallis test and you want to examine differences between pairs of groups, the only option is to conduct Mann-Whitney tests on all pairs of groups. This is because no post hoc test exists for situations where the DV is measured at the ordinal level. To control for multiplicity in this case, you need to adjust the cut-off point for deciding whether each Mann-Whitney result is statistically significant. To do this, divide the normal cut-off point (0.05) by the number of Mann-Whitney tests conducted. This is known as a Bonferroni correction.
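The correction just described can be sketched as follows. The group labels and the helper name are hypothetical. The sketch only lists the pairs and works out the corrected cut-off; it does not run the Mann-Whitney tests themselves.

```python
from itertools import combinations


def bonferroni_cutoff(groups, alpha=0.05):
    """Return every pair of groups to compare and the corrected cut-off
    (alpha divided by the number of pairwise tests)."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)


pairs, cutoff = bonferroni_cutoff(["group 1", "group 2", "group 3"])
for g1, g2 in pairs:
    print(f"Mann-Whitney: {g1} vs {g2}, significant only if p < {cutoff:.4f}")
```

With three groups there are three pairs, so each test must reach p < 0.05 / 3 (about 0.0167). With four groups there are six pairs, and the cut-off falls to about 0.0083.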

## Tukey HSD post hoc test (independent measures)

The results from this test are produced below the ANOVA results. Each line of the Tukey results compares one group with another and gives the significance value for that comparison. Every pair is compared in both directions, so pairs of lines give the same results in reverse. With three groups there are six results, but you only need to take notice of three. There is also a table headed 'Homogeneous Subsets', which is provided by SPSS; this can be ignored for now. To see which groups scored higher, look at the descriptive statistics.

Tukey's HSD carries out pairwise comparisons and is equivalent to a t-test, except in the standard error used. The Tukey test uses a general error term derived from the ANOVA (MSerror), rather than the error derived from the difference between the two means. See slides for formula.
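A minimal sketch of the HSD idea is given below, assuming equal group sizes. The smallest mean difference that counts as significant is q × √(MSerror / n). Here q is the critical value of the studentized range statistic for k groups and the ANOVA error df. The sketch does not compute q. The 3.49 used below is the approximate tabled value for α = .05, k = 3 and df = 30; check it against your own table. The group means, MSerror and n are invented.

```python
import math
from itertools import combinations, permutations


def tukey_hsd(means, ms_error, n_per_group, q_crit):
    """Flag each unique pair of groups whose mean difference exceeds the HSD.

    q_crit must be looked up in a studentized range table (k groups,
    error df from the ANOVA); it is not computed here.
    """
    hsd = q_crit * math.sqrt(ms_error / n_per_group)
    results = []
    for g1, g2 in combinations(means, 2):  # each pair once, not twice
        diff = means[g1] - means[g2]
        results.append((g1, g2, diff, abs(diff) > hsd))
    return hsd, results


# Invented example: 3 groups of 11, so error df = 33 - 3 = 30.
means = {"group 1": 10.0, "group 2": 12.0, "group 3": 15.0}
hsd, results = tukey_hsd(means, ms_error=11.0, n_per_group=11, q_crit=3.49)
print(f"HSD = {hsd:.2f}")
for g1, g2, diff, significant in results:
    print(f"{g1} vs {g2}: difference = {diff:.1f}, significant = {significant}")

# Output that lists every ordered pair shows each comparison twice.
print(len(list(permutations(means, 2))), "rows vs", len(results), "unique comparisons")
```

Only group 1 vs group 3 exceeds the HSD here. The last line shows why the output has six rows when only three comparisons matter.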

Writing up test: You present the results of a post hoc test in conjunction with the results of the ANOVA and along with the mean and standard deviation scores for each group in the analysis.

## Planned comparisons for independent groups designs

Suppose that, after an ANOVA, you have found a difference between three groups on a dependent variable. You can then conduct planned comparisons to determine whether this difference is as you predicted (your hypothesis). A planned comparison is a comparison between two groups that you plan to do before you conduct the ANOVA.

One type of planned comparison is to examine the difference between one group in your analysis and all the other groups. For example, say you conduct a research study to examine differences on a measure of greediness between three groups: people who spend money foolishly (spenders), people who count their money every day (counters) and people who photocopy money (fraudsters). Your hypothesis is that there will be a difference between the three groups in their greediness scores. More specifically, you predict that fraudsters will be greedier than the other two groups. You collect greediness scores from 30 people in each group and conduct a one-way between-groups ANOVA. The ANOVA indicates that a statistically significant difference exists between these three groups on the measure of greediness. You also conduct a follow-up planned comparison to examine the difference in greediness between fraudsters and the other two groups.

## Choosing a planned comparison

To compare a group with all other groups in the analysis, you can use the Dunnett test. This test allows you to run a one-tailed or a two-tailed test. A one-tailed test is more powerful, so you should choose it if possible (i.e. if your hypothesis predicts a direction). The Dunnett test refers to the group with which all other groups are compared as the control group.

Where you have conducted a Kruskal-Wallis test and you wish to examine differences between pairs of groups in planned comparisons, the only option is to conduct Mann-Whitney tests on the pairs of groups for which you want planned comparisons. This is because there is no planned comparison test developed for situations where the dependent variable is measured at the ordinal level.

Formula for a planned comparison: $C = c_1M_1 + c_2M_2 + \dots + c_kM_k$, where $M_i$ is the mean of group $i$. The general method is to select particular values of $c_1, c_2$, etc. and to test whether $C$ is significantly different from zero. The values $c_i$ are known as weights, and the weights must sum to zero: $\sum_{i=1}^{k} c_i = 0$.

Coding rules: use positive and negative weights to compare across groups; the sum of the weights across all groups should be 0; if a group is to be excluded from the comparison, it should be assigned a weight of 0; the weight assigned to each group in one chunk should equal the number of groups in the opposite chunk of variation; and if a group is singled out in one comparison, it cannot be used again.
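A minimal sketch of testing a planned contrast is shown below. It assumes the common approach of dividing C by its standard error, √(MSerror × Σ c_i² / n_i), computed from the ANOVA's MSerror. The result is then compared with a t distribution on the error df. The weights follow the coding rules above for the greediness example: fraudsters against the other two groups. The group means and MSerror are invented.

```python
import math


def planned_contrast(means, ns, weights, ms_error):
    """Return C = sum(c_i * M_i) and t = C / sqrt(MSerror * sum(c_i**2 / n_i))."""
    if not math.isclose(sum(weights), 0, abs_tol=1e-12):
        raise ValueError("contrast weights must sum to zero")
    c = sum(w * m for w, m in zip(weights, means))
    se = math.sqrt(ms_error * sum(w ** 2 / n for w, n in zip(weights, ns)))
    return c, c / se


# Order: spenders, counters, fraudsters (30 per group, as in the example).
means = [40.0, 42.0, 50.0]   # invented group means
weights = [-1, -1, 2]        # fraudsters (weight 2) vs the two other groups (-1 each)
c, t = planned_contrast(means, [30, 30, 30], weights, ms_error=60.0)
print(f"C = {c:.1f}, t({90 - 3}) = {t:.2f}")
```

The fraudsters' weight of 2 equals the number of groups in the opposite chunk, and each other group gets 1. Their sum is zero, as the rules require.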

## Dunnett test for independent measures

This test compares experimental conditions with a control condition. Run a standard t-test using MSerror as the estimate of variance, and compare the result with the tables computed by Dunnett.
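A minimal sketch of the t statistics used in this procedure is given below, assuming equal group sizes. Each experimental mean is compared with the control mean using t = (M_i - M_control) / √(2 × MSerror / n). The critical value must still come from Dunnett's tables, which this sketch does not reproduce. The means, MSerror and n are invented.

```python
import math


def dunnett_t(control_mean, treatment_means, ms_error, n_per_group):
    """t statistic for each treatment vs the control, using the pooled MSerror."""
    se = math.sqrt(2 * ms_error / n_per_group)
    return {name: (m - control_mean) / se for name, m in treatment_means.items()}


t_values = dunnett_t(40.0, {"treatment 1": 45.0, "treatment 2": 43.0},
                     ms_error=50.0, n_per_group=10)
for name, t in t_values.items():
    print(f"{name} vs control: t = {t:.2f}")  # compare with Dunnett's tabled value
```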

The output is very similar to that of the previous test, except that one group (the control) is compared with each of the others.

You present the results of a planned comparisons in conjunction with the results of the ANOVA and along with the mean and standard deviations for each group.

If you have a specific hypothesis to test it may be more appropriate to conduct a planned contrast.

## Why do you need to use post hoc tests and planned comparisons?

Suppose you find a significant difference between two conditions or levels in a repeated measures design, using a paired t-test or Wilcoxon test. You can easily determine which condition is scoring significantly higher by looking at the mean or median scores. But when you find a significant difference between three or more conditions using a within-groups ANOVA or a Friedman test, working out where the significant difference lies can be more difficult. This is because the ANOVA or Friedman test only indicates that a statistically significant difference exists between conditions; it doesn't tell you where that difference lies. For example, if you find a significant difference between three conditions, all three conditions may differ significantly from each other, or only one condition may differ from the other two. Examining the mean or median scores of each condition gives you an indication of where the significant difference is likely to be. This might be sufficient in some circumstances, but usually you want firmer evidence about the nature of the significant difference. This is where post hoc tests and planned comparisons are useful: they tell you exactly where the significant differences exist.

Do not simply run multiple t-tests instead, because doing so increases the risk of a Type 1 error.

## What is the difference between post hoc tests and planned comparisons?

The difference between a post hoc test and planned comparison is that post hoc tests compare every possible pair of conditions, whereas planned comparisons only make specific comparisons between conditions you decide upon in advance of conducting the analysis.

Therefore, planned comparisons are usually driven by theory, whereas post hoc tests simply trawl the data looking for any significant findings. As a result, post hoc tests are less sensitive than planned comparisons, so they are less likely to find significant differences. Whether you use post hoc tests or planned comparisons depends on your hypothesis. If you need to explore your data by comparing every possible pair of conditions, you should use post hoc tests. If you have a very specific experimental hypothesis, for example comparing each condition with only the first condition, then planned comparisons are more appropriate.

Before conducting any analyses, decide whether there are planned comparisons you want to examine. If there are, build these into your plan of analysis. If not, you can conduct post hoc tests. But don't conduct planned comparisons and post hoc tests on the same data: choose one or the other.

## Post hoc tests for repeated measures designs

Post hoc tests compare every possible pair of conditions. SPSS offers three post hoc tests when you are conducting a within-groups ANOVA. Each method compares every possible pair of conditions, but in a slightly different way. You need to select the test that suits your data best. This is normally the Bonferroni test, unless you have lots of conditions, in which case the Sidak test is most appropriate. The three post hoc tests available in SPSS are:

Least Significant Difference (LSD) test: this test is available but does not correct for multiplicity, so you are advised not to use it.

Bonferroni test: effectively multiplies each p-value by the number of comparisons made. For example, if you have three conditions, then three comparisons are made, so to control for multiplicity the p-value for each test is multiplied by 3. This is known as a conservative correction because it reduces the power of the test: you are less likely to obtain a significant result, but you are also less likely to make a Type 1 error. This is the most commonly used post hoc test.

Sidak test: makes an adjustment to the Bonferroni correction so that it is less conservative, i.e. it has slightly more power. The Sidak test is most appropriate if you have lots of conditions in your study and therefore lots of comparisons to make.
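The two corrections can be compared on a few invented p-values. The sketch assumes the standard textbook formulas, not a reproduction of SPSS's internal computation. Bonferroni multiplies p by the number of comparisons m, capped at 1 because a probability cannot exceed 1. Sidak uses 1 - (1 - p)^m, which is never larger than the Bonferroni value and is therefore less conservative.

```python
def bonferroni_adjust(p, m):
    """Bonferroni-adjusted p-value: p times the number of comparisons, capped at 1."""
    return min(1.0, p * m)


def sidak_adjust(p, m):
    """Sidak-adjusted p-value: 1 - (1 - p) ** m."""
    return 1 - (1 - p) ** m


raw_p = [0.01, 0.02, 0.04]   # invented p-values from three pairwise comparisons
m = len(raw_p)
for p in raw_p:
    print(f"raw p = {p:.2f}  Bonferroni = {bonferroni_adjust(p, m):.4f}  "
          f"Sidak = {sidak_adjust(p, m):.4f}")
```

The adjusted p is then compared with 0.05 as usual. With these values, only the first comparison survives either correction.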

## Interpreting the output for a post hoc test

Pairwise comparisons table: presents the results of your post hoc tests. Each condition is compared with every other condition. The first row compares condition 1 and condition 2. The order in which the conditions are numbered is based on the order in which they were entered; the very first table in the SPSS output also gives you this information. The second row gives the mean difference between condition 1 and condition 3, and so on. As with the Tukey output, there are six rows for three conditions because each pair appears twice, so only three are needed.

Writing up the results:

These should be presented in conjunction with the results of the ANOVA, along with the mean and SD scores for each group in the analysis.

E.g. Post hoc Bonferroni tests indicated that condition 1 (M, SD) had significantly (p) ...

There might be situations when your ANOVA provides a significant result but the post hoc tests suggest that there were no significant differences between groups. See the earlier explanation for this.

## Planned comparisons for repeated measures designs

Post hoc tests examine every possible combination of conditions (or levels of the IV). The alternative is to conduct a planned comparison, which makes specific comparisons. The advantage of planned comparisons is that they tend to be more powerful than post hoc tests, which means they are more likely to detect a significant result if one exists. The disadvantage is that they don't compare every possible pair of conditions.

There are six types of planned comparisons or contrasts available in SPSS. The most commonly used in psychology are simple contrasts and repeated contrasts.

Simple contrasts: compare each condition with either the first or the last condition, whichever is set as the reference category. With three conditions, a simple contrast makes only two comparisons. Simple contrasts are most useful if you have a baseline or control condition you want to compare with subsequent conditions or interventions. The simple contrast refers to the group with which all other groups are compared as the reference category.

Repeated contrasts: compare each condition with the previous condition. If you have three conditions in your study, repeated contrasts compare condition 1 with condition 2 and then condition 2 with condition 3, making only two comparisons. They are most useful if you are interested in change over time and the conditions in your analysis represent different points in time.
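Which pairs each contrast tests can be listed mechanically. The sketch below uses hypothetical condition labels and function names. It shows the comparisons made for three conditions by a simple contrast (with the first or last condition as reference) and by a repeated contrast. It only lists the pairs and does not run any tests.

```python
def simple_contrast_pairs(conditions, reference="last"):
    """Each condition paired with the reference category (first or last)."""
    if reference not in ("first", "last"):
        raise ValueError("reference must be 'first' or 'last'")
    ref = conditions[0] if reference == "first" else conditions[-1]
    return [(c, ref) for c in conditions if c != ref]


def repeated_contrast_pairs(conditions):
    """Each condition paired with the condition that follows it."""
    return list(zip(conditions, conditions[1:]))


conditions = ["condition 1", "condition 2", "condition 3"]
print(simple_contrast_pairs(conditions, reference="first"))
print(simple_contrast_pairs(conditions, reference="last"))
print(repeated_contrast_pairs(conditions))
```

Both types make two comparisons with three conditions. The difference lies only in which pairs are chosen.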

## The other planned comparisons?

Deviation contrasts- Compares each condition (except one) with the grand mean. Deviation contrasts do not compare the first condition with the grand mean if the reference category is set to first; nor does the tecnique compare the last condition with the grand mean if the reference category is set to last. For example, if the reference category is set to last and you have three conditions, contrasts would compare condition 1 with the grand mean and condition 2 with the grand mean.

Helmert contrasts- Compares each condition (except the last) to the mean of the subsequent conditions. If you have four conditions in your study, this method compares condition 1 with the mean of conditions 2, 3 and 4, compares condition 2 with the mean of conditions 3 and 4, and compares condition 3 with condition 4.

Difference contrasts- Similar to Helmert contrasts, but in reverse. Each condition (except the first) is compared with the mean of the preceding conditions. With four conditions, this compares condition 4 with the mean of conditions 1, 2 and 3, condition 3 with the mean of conditions 1 and 2, and condition 2 with condition 1.

Polynomial contrasts- look for a significant trend across the conditions. These trends may be linear (straight line), quadratic (U-shaped or inverted U), cubic or more complex polynomial functions. Only useful when you have hypothesised a particular polynomial trend across your conditions.
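
The weights behind the Helmert, difference and deviation contrasts can be sketched in the same way. The Python below is a minimal, hypothetical illustration (the function names are assumptions, not an SPSS API). It uses exact fractions so that the "mean of the other conditions" weights stay readable, and it treats the grand mean as the unweighted mean of the condition means, which is what you get in a repeated measures design where every participant contributes to every condition. Every row sums to zero, as contrast weights must.

```python
from fractions import Fraction


def helmert_contrasts(k):
    """Each condition (except the last) vs the mean of all later conditions."""
    rows = []
    for i in range(k - 1):
        n_later = k - 1 - i  # number of subsequent conditions
        weights = [Fraction(0)] * k
        weights[i] = Fraction(1)
        for j in range(i + 1, k):
            weights[j] = Fraction(-1, n_later)
        rows.append(weights)
    return rows


def difference_contrasts(k):
    """Each condition (except the first) vs the mean of all earlier conditions."""
    rows = []
    for i in range(1, k):
        weights = [Fraction(0)] * k
        weights[i] = Fraction(1)
        for j in range(i):
            weights[j] = Fraction(-1, i)  # i earlier conditions share the weight
        rows.append(weights)
    return rows


def deviation_contrasts(k, reference="last"):
    """Each condition (except the reference) vs the grand mean of the k condition means."""
    skip = 0 if reference == "first" else k - 1
    rows = []
    for i in range(k):
        if i == skip:
            continue
        weights = [Fraction(-1, k)] * k       # minus the grand mean (1/k of each mean)
        weights[i] = Fraction(k - 1, k)       # plus the condition itself: 1 - 1/k
        rows.append(weights)
    return rows


if __name__ == "__main__":
    for name, rows in [("Helmert", helmert_contrasts(4)),
                       ("difference", difference_contrasts(4)),
                       ("deviation", deviation_contrasts(3))]:
        for row in rows:
            assert sum(row) == 0  # contrast weights always sum to zero
            print(name, [str(w) for w in row])
```

For four conditions the Helmert rows are (1, -1/3, -1/3, -1/3), (0, 1, -1/2, -1/2) and (0, 0, 1, -1), matching the description of comparing each condition with the mean of the later ones.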


## Interpreting the output for planned comparisons?

When you conduct planned comparison tests you obtain an extra table, the Tests of Within-Subjects Contrasts table, which displays the results of your planned contrasts. For example, with three conditions and a simple contrast using the last condition as the reference category, SPSS presents only two tests: the first compares level 1 with level 3 and the second compares level 2 with level 3.

Writing up the results: Report the planned comparisons in conjunction with the ANOVA, along with the mean and SD for each group in the analysis. It is also important to report an effect size for the ANOVA result.

Examining differences between conditions: The Bonferroni correction-

If you have conducted a Friedman test and want to examine the differences between conditions, the only option available is to conduct several Wilcoxon tests, because no post hoc test exists for situations where the dependent variable is measured at the ordinal level.

To control for the multiplicity problem (the increased chance of a Type I error when several tests are run), you need to adjust the cut-off point for deciding whether each Wilcoxon result is statistically significant. One way to do this is to perform a Bonferroni correction, where you divide the normal cut-off point (0.05) by the number of Wilcoxon tests conducted. This changes the cut-off, not the p values themselves: with three tests, for example, a result is significant only if p < 0.05/3 ≈ 0.0167.
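
A minimal sketch of the Bonferroni arithmetic in Python, assuming you already have the p values from the follow-up Wilcoxon tests. The p values below are made up for illustration (in practice they would come from SPSS or a statistics package), and the function names are hypothetical. With three conditions there are three possible pairs, hence three tests.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Divide the usual significance cut-off by the number of tests run."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return alpha / n_tests


def bonferroni_decisions(p_values, alpha=0.05):
    """Return the corrected cut-off and a True/False significance decision per test."""
    cutoff = bonferroni_cutoff(alpha, len(p_values))
    return cutoff, [p < cutoff for p in p_values]


if __name__ == "__main__":
    # Hypothetical p values from three Wilcoxon tests following a Friedman test
    p_values = [0.010, 0.030, 0.004]
    cutoff, decisions = bonferroni_decisions(p_values)
    print(f"cut-off = {cutoff:.4f}")  # cut-off = 0.0167
    print(decisions)                  # [True, False, True]
```

Note that p = 0.030 would count as significant against the uncorrected 0.05 but not against the corrected cut-off.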


## REGWQ?

Ryan, Einot, Gabriel and Welsch Q:

Is a multiple step-down range test. It tests all pairs of means, like Tukey HSD, but is more powerful. Field recommends this test when all group sizes are equal. Multiple step-down procedures first test whether all means are equal; if they are not, subsets of means are tested for equality. It does not give confidence intervals (unlike Tukey HSD).

Simple main effects (simple effects):

Used in two-way ANOVA to look at significant interactions. If any factor involved in a significant interaction has three or more levels, simple main effects can be used to examine the effect of one factor at each level of the other.

Linear trends:

Planned comparison in which differences between individual means are not tested; instead, the comparison looks at whether the overall pattern of means shows a roughly linear increase or decrease. Useful if your IV changes in evenly spaced steps (e.g. 1, 2, 3, 4 or 0, 5, 10, 15) and you expect your DV to increase or decrease systematically.
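
For evenly spaced levels, a linear trend contrast uses centred, equally spaced weights (for four levels: -3, -1, 1, 3), and the contrast value is the weighted sum of the condition means. The Python sketch below is a hypothetical helper, not an SPSS routine, and the means are invented. A positive value suggests means rising with the IV and a negative value suggests falling means; whether the trend is significant still depends on the test reported for the contrast, which also needs the error term.

```python
def linear_trend_weights(k):
    """Linear trend coefficients for k evenly spaced levels (centred, so they sum to zero)."""
    weights = [2 * i - (k - 1) for i in range(k)]
    if k % 2 == 1:
        # For odd k every value is even, so halve them: e.g. -1, 0, 1 for k = 3
        weights = [w // 2 for w in weights]
    return weights


def linear_contrast(means):
    """Weighted sum of the condition means using the linear trend weights."""
    weights = linear_trend_weights(len(means))
    return sum(w * m for w, m in zip(weights, means))


if __name__ == "__main__":
    means = [20.0, 24.0, 27.0, 31.0]  # hypothetical means for 4 evenly spaced levels
    print(linear_trend_weights(4))    # [-3, -1, 1, 3]
    print(linear_contrast(means))     # 36.0 (positive: means rise with the IV)
```

For three, four and five levels these weights match the usual tabled linear coefficients: (-1, 0, 1), (-3, -1, 1, 3) and (-2, -1, 0, 1, 2).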


## Choosing the test?

One-way independent (between)-

Contrasts- weights, linear trends, Tukey HSD, Dunnett, REGWQ, etc.

One-way repeated (within)-

Contrasts- standard, linear trends, compare main effects; use SPSS, use a small number of t-tests, apply Bonferroni correction.

Two-way independent (between)-

Contrasts- standard, weights using syntax, linear trends, Tukey HSD, Dunnett, REGWQ, etc.

Two-way repeated (within)-

Contrasts- standard, linear trends, compare main effects; use SPSS, use a small number of t-tests, apply Bonferroni correction.


## Quantitative research recap?

In quantitative research, numerical data are collected to explain a particular phenomenon. Data are analysed using mathematically based methods, in particular statistics. Data that do not naturally appear in quantitative form can be collected in a quantitative way and then analysed statistically, e.g. with an instrument measuring attitudes. The main differences between quantitative and qualitative research are philosophical, not methodological.

Philosophical approaches:

Orientation to the role of theory: Quantitative- deductive, qualitative- inductive.

Epistemological orientation: Quantitative- positivism, qualitative- interpretivism.

Ontological orientation: Quantitative- objectivism, qualitative- constructionism.


## Positivism vs interpretivism?

Positivism is a philosophy of science that values objective measures of phenomena. Interpretivism argues that people interpret their environments and themselves in ways that shape what they do. We cannot understand how people in different cultures live and what they do if we don't know how they interpret and make sense of the world. Interpretivism therefore focuses on perceptions, intentions and beliefs, has an exploratory orientation, and structures data as little as possible by the researcher's own prior assumptions.


## Objectivism vs constructivism?

Objectivism “is an ontological position that asserts that social phenomena and their meanings have an existence that is independent of social actors”. In contrast, constructivism holds that social phenomena are created from the perceptions and consequent actions of the social actors concerned with their existence. Formally, constructivism can be defined as an “ontological position which asserts that social phenomena and their meanings are continually being accomplished by social actors”.

See slides for the philosophical approaches tree.


## Core differences between qualitative and quantitative research?

Purpose: Qualitative- to describe a situation, gain insight into a particular practice. Quantitative- to measure magnitude, how widespread a practice is.

Format- Qualitative- no pre-determined response categories. Quantitative- pre-determined response categories- measures.

Data- Qualitative- in-depth explanatory data from a small sample. Quantitative- wide breadth of data from a large, representative sample.

Analysis- Qualitative- draws out patterns of concepts and insights. Quantitative- tests hypotheses, uses data to support conclusion.

Result- Qualitative- illustrative explanation and individual responses. Quantitative- numerical aggregation in summaries, responses are clustered.

Sampling- Qualitative- theoretical. Quantitative- statistical.


## Advantages of qualitative research?

Issues and subjects covered can be evaluated in depth and in detail. Interviews are not limited to particular questions and can be redirected or guided by researchers in real time. The direction and framework of research can be revised quickly as soon as fresh information and findings emerge. The data obtained, based on human experience, are powerful and sometimes more compelling than quantitative data.

Qualitative design methods: phenomenology, grounded theory, ethnography.


## Phenomenology and example?

Focuses on meaning in lived experiences. Small, purposive samples. In-depth conversations (without predetermined questions).

Example: Coping with breast cancer (Doumit et al., 2010)

Objective: an in-depth understanding of the coping strategies adopted by Lebanese women with breast cancer.

Methods: 10 women diagnosed with breast cancer were interviewed.

Results: Seven main themes... The negative stigma of cancer, the role of women in Lebanese families and the role of religion are the bases of the differences in coping strategies between Lebanese women with breast cancer and women from other countries.


## Grounded theory and example?

Theory development. Explains social processes grounded in data. Theoretical sampling (comparing participants to promote theory development). Depth in data collection: observation, interviews, narratives. Constant comparative analysis.

Example: Williams et al. (2015)

Objectives: explore the nature of the relationship between the self and the eating disorder in individuals with anorexia nervosa (AN).

Method: semi-structured interviews with 11 women with AN.

Results: a theoretical framework of the relationship with five related categories: AN taking over the self, AN protecting the self, being no one without AN, sharing the self with AN, and discovering the real me.


## Ethnography and example?

Aims to understand a culture or worldview (e.g. a group of nurses). Conducted in a natural setting. Data collection: observations, documents, interviews and diaries.

Example: Ryan (1993)

Objectives: gain insights into mothers' perspectives of their adult children with schizophrenia.

Methods: five mothers were interviewed.

Results: two main themes: lifetime of mothering with disruption and loss (of the child's independent life and of the mother's life).


## What are observations (ethnography)?

“…..  involves getting close to people and making them feel comfortable enough with your presence so that you can observe and record information about their lives.” Bernard (1994)

Observation is a tool for understanding more than just what people say about complex situations. It is the systematic observation of behaviour, actions, activities and interactions using audio- or video-recordings and field-notes.

Ethnographic dimensions:

The observer's role: complete outsider- fully integrated member.

The group's knowledge of the observation process: covert/concealed- overt

Explication of the study's purpose: full- limited/misleading

Duration: single session- years

Focus: narrow- holistic


## Conducting ethnographic research?

1. Formulate a research question that can be answered using an ethnographic approach.

2. Define what is to be addressed in the observation process.

3. Define the researcher's role.

4. Negotiate entry and maintain access.

5. Identify key informants.

6. Decide what/who to sample (purposive/theoretical sampling).

7. Collect data (field notes, audio/video recordings, documents).

8. Exit from the group.


## Example (Rosenhan, 1973)?

Aim: To examine whether mental health professionals are truly able to distinguish between the mentally ill and the mentally healthy.

Methods:

8 pseudo-patients (5 men; 3 women) presented for admission to 12 hospitals in 5 US states. Instructions: complain of hearing voices that say “empty”, “hollow” and “thud”, but otherwise act normally; be truthful to admission staff; once in hospital, display no symptoms and behave normally (except dispose of medication covertly); gain release by convincing hospital staff you are healthy enough for discharge.

Data: pseudo-patients’ observations; records; interviews


## Results?

All participants were admitted to the various hospitals

All but one were admitted with a diagnosis of schizophrenia.

Length of stay ranged from 7 to 52 days, with a mean of 19 days.

When they were released, “schizophrenia in remission” was recorded in their files.

Staff did not detect any of the pseudo-patients, but 35 out of 118 other patients voiced suspicions that the pseudo-patients were not mentally ill.

A total of nearly 2,100 pills were administered (but not taken).

Typical response from staff in reply to a common, reasonable question:

Pseudo-patient: “Pardon me Dr X. Could you tell me when I am eligible for ground privileges?”

Psychiatrist X: “Good morning Dave. How are you today?”


## What is qualitative interviewing?

“In a qualitative research interview the aim is to discover the interviewee’s own framework of meanings and the research task is to avoid imposing the researcher’s structures and assumptions as far as possible.  The researcher needs to remain open to the possibility that the concepts and variables that emerge may be very different from those that might have been predicted at the outset.” Britten (1995)


## Features of qualitative research?

An interview can be topic- or event-based; historical or cultural. An interview guide/schedule can be used, but there are no fixed questions, pre-coded responses or rigid structure. The interviewer encourages the participant to give “rich”, detailed responses. The interviewer is an active listener, concentrating on what the participant says while formulating questions that will encourage the participant to clarify and elaborate. The interview is steered by participants’ responses to questions. Interviews are usually audio-recorded and transcribed.

Types of interviews:

In-depth interviews: structured, semi-structured or unstructured. Focus groups.

Can be conducted face-to-face, by telephone or online (Skype, email, messenger services).


## Conducting an interview?

1. Formulate a research question that can be answered using qualitative interviews.

2. Develop the topic guide (a list of the topics the researcher would like to address). This guide should be used flexibly: there is no rigid order to the questions, some questions will not need to be asked directly (the participant raises the issue without being prompted), and the guide can be adapted for future interviews.

3. Pilot the interview.

4. Identify and obtain a sample (purposive/theoretical).

5. Select a location.


## Sampling approaches?

Sampling is of key importance in qualitative research using interviews. Random sampling is rarely, if ever, used.

Convenience - sampling particular participants for reasons of convenience or opportunity.

Purposive - deliberate non-random sampling that aims to recruit participants with particular characteristics.

Snowball - where no sampling frame exists, and one cannot be created, early participants are asked to identify other potential participants.

Theoretical - theoretical categories of interest are generated during the research development process.


## Example (Andrew & Harvey, 2011)?

Aim: To investigate mothers’ infant feeding decisions.

Methods:

Semi-structured in-depth interviews with 12 mothers of infants aged between 7 and 18 weeks. Primiparous and multiparous mothers. Purposive sampling (breast- and formula-feeding mothers). Conducted in mothers’ homes. Data were audio-recorded, transcribed and analysed using content analysis.


## Topic guide for the interview study?

Topic Guide:

What were your experiences of infant feeding before pregnancy? What was your view of infant feeding prior to pregnancy? What information was provided during pregnancy? What was the role of other people in your decision making process? Did you feel support or pressure to choose a particular method? What were your initial experiences of feeding? Was feeding as expected?


## Results for interview example?

Four themes emerged from the data: (1) information, knowledge, decision making and the role of health professionals; (2) physical capability; (3) family and social influences; (4) independence, self-identity and lifestyle.


## What are focus groups?

Focus groups use group dynamics to stimulate discussion, gain insights and generate ideas. The emphasis is on interaction within the group and the joint construction of meaning. A focus group is typically an interview with a small group (6-12 people) and (usually) a moderator, and the topic is typically fairly tightly defined. Focus groups are usually audio-recorded and observed.

The potential sampling advantages of focus groups are that they may encourage participation from those who are reluctant to be interviewed on their own, and can encourage contributions from those who feel they have nothing to say but are willing to engage in discussion generated by other group members.


## Conducting a focus group?

1. Formulate a research question that can be answered using a focus group study.

2. Formulate the question(s) to be asked.

3. Decide on the group structure.

4. Decide on the number of focus groups required (at least one group per category).

5. Identify and obtain a sample that optimises the group structure (purposive/theoretical).

6. Select a location.


## Example of a focus group?

Aim: To investigate consumer perceptions and behaviour towards local, national and imported foods.

Methods:

4 focus groups comprising 33 participants. Groups: 2 lower SES; 2 higher SES. Moderated. Approximately 90 minutes in duration.

Audio-recorded and transcribed


## Example continued (Chambers et al, 2007)?

Protocol: establish a shared understanding within the group of local, national and imported foods.

Discussion: attitudes towards local, national and imported foods; views on the attitudes of friends, family and society towards buying local foods; barriers to buying locally produced foods; perceived control over what they bought.

Advantages of focus groups:

they are useful for obtaining detailed information about personal and group feelings, perceptions and opinions; they can save time and money compared with individual interviews; they can provide a broader range of information; they offer the opportunity to seek clarification; they provide useful material, e.g. quotes for public relations publications and presentations.


## Summary of collection methods?

Observation: The researcher gets close enough to the people being studied to observe them (with or without participating), usually to understand whether people do what they say they do, and to access their tacit knowledge.

Interview: Involves asking questions, listening and recording answers from individuals or groups in a structured, semi-structured or unstructured format.

Focus group discussion: A focused and interactive session with a small group, small enough for everyone to have a chance to talk and large enough to provide a diversity of opinions.

Other methods: rapid assessment procedures (RAP), free listing, life history (biography), pile sorts, ranking.


## What are the four key stages of qualitative method

1. Preparation of data (transcription)- time consuming

2. Data reduction (coding, software)- coding is organising data into categories (sentences, phrases, words) by breaking the data down into smaller parts through line-by-line analysis. Categories are given valid headings and need a rule for inclusion; they are then reassembled into more meaningful parts that relate to each other. This is an inductive process: observation, pattern, tentative hypothesis, theory (the opposite direction to quantitative research). E.g. for factors affecting physical activity, parents, friends and media are categories; mum, dad etc. are sub-categories.

3. Displaying data (diagrams, tables)- show hierarchical relationships between different ideas, e.g. benefits of physical activity → physical → cardiovascular and muscular, or benefits of physical activity → psychological → self-esteem. A diagram can show these relationships.

4. Verifying data (triangulation, member checking)- use different data collection methods for the same study, or use different independent researchers. If multiple methods reach the same conclusions, those conclusions are more likely to be accurate.
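
The coding step in stage 2 can be pictured as a coding frame (categories with sub-categories) plus a tally of coded segments. The Python sketch below only illustrates that bookkeeping, assuming each transcript segment has already been given one code by the researcher; the segments, the partial coding frame and the helper names are all hypothetical, and in practice software such as NVivo or ATLAS.ti handles this. The interpretive work of deciding on codes is not automated here.

```python
from collections import Counter

# Hypothetical coding frame based on the physical activity example:
# top-level categories map to their sub-categories (partial, for illustration).
CODING_FRAME = {
    "parents": ["mum", "dad"],
    "friends": [],
    "media": [],
}

# Hypothetical coded segments: (transcript text, code assigned by the researcher).
SEGMENTS = [
    ("My mum takes me to swimming", "mum"),
    ("Dad plays football with me", "dad"),
    ("My mates all go to the gym", "friends"),
    ("Adverts make me want to get fit", "media"),
]


def category_of(code, frame):
    """Return the top-level category a code belongs to (the code may itself be a category)."""
    for category, subcategories in frame.items():
        if code == category or code in subcategories:
            return category
    raise KeyError(f"code {code!r} is not in the coding frame")


def tally_by_category(segments, frame):
    """Reassemble coded segments into their categories and count each category."""
    return Counter(category_of(code, frame) for _text, code in segments)


if __name__ == "__main__":
    print(tally_by_category(SEGMENTS, CODING_FRAME))
```

Raising an error for an unknown code mirrors the idea that each category needs a rule for inclusion: a segment that fits no rule signals that the frame needs revising.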


## What is content analysis and example?

The systematic analysis of data obtained from interviews, observations, records, documents and fieldnotes with the aim of identifying emergent themes.

Scambler and Hopkins (1988)

Aim:

Examine the social effects of epilepsy

Methods:

Semi-structured interviews (some questions are in place, but the interview is partly directed by what the participant says) with 94 people with epilepsy and their families. Audio-recorded and transcribed.

Analysed using content analysis


## Example extract from the study?

“I just didn’t know what the hell was happening: it was as simple as that!  I had never seen anybody have a – whatever it was.  I didn’t know what to do quite frankly.  And it was, if I remember rightly, about 2.30am or something like that, and it was – just frightening, that’s all I can say.  I didn’t know what to do.  I think that’s what frightened me more than anything: I just didn’t know what to do, how to cope.  I didn’t know what I should be doing – whether I should be trying to stop it, or do something: I just didn’t know.”

Interpretation:

Three typical features of family responses to first-onset epilepsy:

Concern (fear, anxiety); bewilderment (lack of understanding); helplessness (uncertainty about what to do).


## Where is concern in the text?

“I just didn’t know what the hell was happening: it was as simple as that!  I had never seen anybody have a – whatever it was.  I didn’t know what to do quite frankly.  And it was, if I remember rightly, about 2.30am or something like that, and it was – just frightening, that’s all I can say.  I didn’t know what to do.  I think that’s what frightened me more than anything: I just didn’t know what to do, how to cope.  I didn’t know what I should be doing – whether I should be trying to stop it, or do something: I just didn’t know.”


## Where is bewilderment in the text?

“I just didn’t know what the hell was happening: it was as simple as that!  I had never seen anybody have a – whatever it was.  I didn’t know what to do quite frankly.  And it was, if I remember rightly, about 2.30am or something like that, and it was – just frightening, that’s all I can say.  I didn’t know what to do.  I think that’s what frightened me more than anything: I just didn’t know what to do, how to cope.  I didn’t know what I should be doing – whether I should be trying to stop it, or do something: I just didn’t know.”


## Where is helplessness in the text?

“I just didn’t know what the hell was happening: it was as simple as that!  I had never seen anybody have a – whatever it was.  I didn’t know what to do quite frankly.  And it was, if I remember rightly, about 2.30am or something like that, and it was – just frightening, that’s all I can say.  I didn’t know what to do.  I think that’s what frightened me more than anything: I just didn’t know what to do, how to cope.  I didn’t know what I should be doing – whether I should be trying to stop it, or do something: I just didn’t know.”


## 10 steps of content analysis?

The process of content analysis is lengthy and may require the researcher to go over and over the data to ensure they have done a thorough job of analysis.

1) Copy and read through the transcript; make brief notes in the margin.

2) Go through the notes made in the margins and list the different types of information found.

3) Read through the list and categorise each item in a way that offers a description of what it is about.

4) Identify whether or not the categories can be linked in any way and list them as major themes and/or minor themes.

5) Compare and contrast the various major and minor themes.


## Steps continued?

6) If there is more than one transcript, repeat the first five stages for each.

7) When you have done this for all of the transcripts, collect all of the themes, examine each in detail and consider whether it fits and how relevant it is.

8) Once all the transcript data are categorised into minor and major themes, review them to ensure that the information is categorised as it should be.

9) Review all of the categories and decide whether some categories can be merged or whether some need to be sub-categorised (minor themes).

10) Return to the original transcripts

Tools for content analysis: computer-aided, e.g. NVivo and ATLAS.ti.


## What is conversation analysis?

A detailed analysis of talk as it occurs in interaction. It assumes that talk is structured, that talk is forged contextually, and that analysis is grounded in data. The analysis is therefore fine-grained and concerned with uncovering the underlying structures of talk in interaction, in particular non-verbal content such as pauses and pitch of voice.

Transcription symbols:

[ ]: at the beginning and end of overlapping speech, words enclosed

(( )): transcriber’s comments (e.g. smile, laughter, body movements)

(.): small but detectable pause

(—): unclear words

Underlining (e.g. speech): emphasis

. . .: omission of text

= (equals sign): no interval between the end of one speech unit and the start of the next

?: rising pitch


## Conversation analysis continued?

ºwordº: speech in low volume, words enclosed

we:ll (colon): a colon indicates that the sound immediately before the colon is prolonged; more than one colon means further prolongation (e.g. :::)

.hh: h’s preceded by a dot indicate an intake of breath; if no dot is present, they indicate breathing out

(0.8): length of a period of silence, usually measured in tenths of a second
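
To show how the timing and sound symbols are read, here is a small Python sketch that pulls a few of them out of a transcript line with regular expressions. It handles only a simplified subset (timed pauses, micro-pauses, in-breaths marked with a dot, and colons marking prolonged sounds) and assumes colons and ".h" appear only as transcription symbols; the function and pattern names are hypothetical. The example lines are taken from the HIV counselling extract (Silverman, 1994) in these notes.

```python
import re

# Simplified subset of Jefferson-style symbols (assumes colons only mark
# prolonged sounds and ".h" only marks an in-breath).
TIMED_PAUSE = re.compile(r"\((\d+(?:\.\d+)?)\)")  # (0.8): silence, to a tenth of a second
MICRO_PAUSE = re.compile(r"\(\.\)")                # (.): small but detectable pause
IN_BREATH = re.compile(r"\.h+")                    # .hh: intake of breath
PROLONGED = re.compile(r"\w+:+\w*")                # we:ll, uh:, da:ys: prolonged sound


def annotate(line):
    """Return the pauses, in-breaths and prolonged sounds marked in one transcript line."""
    without_breaths = IN_BREATH.sub(" ", line)  # so ".hhWe:ll" is read as "We:ll"
    return {
        "timed_pauses": [float(s) for s in TIMED_PAUSE.findall(line)],
        "micro_pauses": len(MICRO_PAUSE.findall(line)),
        "in_breaths": len(IN_BREATH.findall(line)),
        "prolonged": PROLONGED.findall(without_breaths),
    }


if __name__ == "__main__":
    transcript = [
        ".hhWe:ll I mean it's something that you have these",
        "I mean that you have to think about these da:ys, and",
        "I just uh: m felt (0.8) you- you have had sex with",
        "several people and you just don't want to go on (.)",
    ]
    for line in transcript:
        print(annotate(line))
```

Counting symbols like this does not replace interpretation; it only makes the marked features of talk easy to locate before the analyst considers what they are doing in the interaction.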


## Stages in conversation analysis?

Data collection; transcription; selection of the aspect of the transcript to be analysed; interpreting the conversational episode; explication of the interpretation (making it clearer); elaboration of the analysis; comparison with episodes from other conversations.

Example: Aim:

Examine the process of HIV/AIDS counselling

Methods:

Non-participant observation of interactions between HIV/AIDS counsellors and patients. Audio-recorded and transcribed in detail. Analysed using conversation analysis.


## Example of conversation analysis continued?

Silverman (1994):

Counsellor: “Can I just briefly ask why you thought about having an HIV test done”

Patient: “Well I mean it’s something that you have these… I mean that you have to think about these days, and I just felt you… you have had sex with several people and you just don’t want to go on not knowing”


## Example extract for conversation analysis?

Counsellor:

1  Can I just briefly ask why: you thought about having

2  an HIV test done:

Patient:

3  .hhWe:ll I mean it’s something that you have these

4  I mean that you have to think about these da:ys, and

5  I just uh: m felt (0.8) you- you have had sex with

6  several people and you just don’t want to go on (.)

7  not knowing

Interpretation: the patient is trying to deflect any suggestion that there is a special reason why she needs a test (disclosure is delayed); the patient is keen to depersonalise her behaviour (use of "you"); this suggests that the hesitancy and depersonalisation are a consequence of the patient's embarrassment about sex.


## What is discourse analysis?

Speech and actions are recorded, both of which are coded for later analysis. It emphasises the ways in which versions of reality are accomplished through language.

Stages of discourse analysis:

As a story is transcribed, each phrase is numbered consecutively. These phrases can then be grouped to reflect the progressive development of the narrative. A long narrative can then be condensed into a small number of categories.
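
The numbering and grouping steps can be sketched in Python. This is a minimal illustration with hypothetical function names: the phrases are the kidney-donation extract from Gerhardt (1996) used in these notes, numbered from 113 as in the original transcript, and the two groups (the facts, 113-116, and the reasons, 117-125) are an illustrative reading based on the quoted interpretation, not a grouping taken directly from the study.

```python
def number_phrases(phrases, start=1):
    """Number phrases consecutively, as each phrase of the story is transcribed."""
    return list(enumerate(phrases, start=start))


def group_phrases(numbered, groups):
    """Collect numbered phrases into named groups; groups maps name -> (first, last) number."""
    return {
        name: [text for number, text in numbered if first <= number <= last]
        for name, (first, last) in groups.items()
    }


if __name__ == "__main__":
    phrases = [
        "well my mother", "she was willing", "to give me a kidney", "but I didn't want it",
        "because", "well", "if she gives me a kidney", "that's to say", "if the kidney",
        "doesn't work on me", "then I will still be disabled", "and probably",
        "my mother starts feeling bad",
    ]
    numbered = number_phrases(phrases, start=113)
    # Illustrative grouping: the facts (113-116) and the reasons given (117-125)
    grouped = group_phrases(numbered, {"fact": (113, 116), "reason": (117, 125)})
    for name, texts in grouped.items():
        print(name, "->", " / ".join(texts))
```

Condensing thirteen numbered phrases into two categories is the same move, on a small scale, as condensing a long narrative into a small number of categories.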


## Example of discourse analysis (Gerhardt, 1996)?

Aim:

Explore the experience of dialysis and transplantation of patients with end-stage renal failure.

Methods:

In-depth interviews with 234 participants with end-stage renal failure from the South East of England; 600 hours of audio-recorded material.

Discourse analysis.

Extract: well my mother she was willing to give me a kidney but I didn't want it because well if she gives me a kidney that's to say if the kidney doesn't work on me then I will still be disabled and probably my mother starts feeling bad...


## Extract for discourse analysis?

113 P: well my mother

114 she was willing

115 to give me a kidney

116 but I didn’t want it

117 because

118 well

119 if she gives me a kidney

120 that’s to say

121 if the kidney

122 doesn’t work on me

123 then I will still be disabled

124 and probably

125 my mother starts feeling bad …


## Interpretation of discourse analysis?

“The second step in his [the patient’s] action story is his decision not to accept his mother’s offer of a live–donor organ (15-133).  He again tells an argumentative narrative rather than a fully-fledged story, stating the fact(s) and then giving reason(s).  The facts were: ‘My mother was willing to give me a kidney but I didn’t want it’ (113-16).  The reason is: if this live donor transplant would fail, the situation would be worse than now, that is he would still be ‘disabled’ (123), and she could ‘start feeling bad’ (125).  From this it follows that he rejected the offer…”

Strengths of qualitative methods. They enable researchers to:

Generate concepts (construct theories), understand the meanings of human behaviour, explore sensitive and/or complex phenomena, study atypical cases, set context, conceptualise and validate what is being counted in quantitative research.


## Weaknesses of qualitative methods?

Subjectivity: data are interpreted and therefore influenced by the personal experiences and beliefs of the researcher.

Replication: because the researcher influences the data, the study cannot necessarily be replicated.

Generalisability: there is no attempt to recruit a ‘representative’ sample, so the people selected do not necessarily reflect the population.

Transparency: it is sometimes difficult to establish exactly what was done and how conclusions were arrived at.


## Assessing rigour in qualitative research?

Credibility

Is the researcher’s interpretation of the data credible? Assessed through respondent validation and triangulation.

Transferability

Is the description provided rich enough in detail for others to make judgements about its transferability to other milieux?

Dependability

Can the research be “audited”?

Confirmability

Is it apparent that the researcher has not overtly allowed personal values or theoretical inclinations to influence the conduct of the research or the interpretation of the data?


## Conducting rigorous qualitative research?

More than one investigator (credibility). More than one method (triangulation) (credibility). “Thick description” (transferability). Research should be conducted in an explicit and systematic way (dependability). Meticulous record keeping, including a separate diary (dependability). Openness and honesty about theoretical perspectives and biases (confirmability).


## Steps for a questionnaire design?

1. Define goals and objectives

2. Design methodology

3. Determining feasibility

4. Developing the questionnaire

5. Select sample

6. Conduct pilot test.

Questionnaires can be:

Descriptive- how many, what proportion, e.g. census, opinion polls, e.g. how many children have experienced bullying.

Analytic- attempts to answer research questions by demonstrating associations between variables. Set up to explore relationships between particular variables.


## What is a questionnaire?

Form of survey method, used to collect information from people. Used in many areas (psychology, business, marketing etc.).

Goals of questionnaire design:

To obtain facts about a person. To obtain information about attitudes and beliefs. To find out about behaviour (past and intended). Often the aim is to measure a specific ‘construct’, e.g. depression.


## When to use a questionnaire?

What is your research question & how might this be addressed?

Appropriate…

…when the main purpose is information gathering:

used for gathering facts prior to designing research; used to obtain information anonymously; used to measure some specific construct (e.g. optimism); can be used as part of an experimental design.

…when it is not possible or ethical to use an experimental design:

i.e. you can’t manipulate key variables, e.g. does having children change parents’ attitudes to discipline?


## Advantages of questionnaires?

Format will be the same for all participants (standardised). Relatively quick to collect information. Lots of responses, so lots of information. More cost effective than interviews. Easy to analyse (if designed correctly). Familiar to most people. Can allow anonymity. Participants are not directly influenced by the presence of the experimenter (experimenter bias).


## Disadvantages of questionnaires?

Potentially low response rates. Cannot probe responses. The respondent may not be the person you sent the questionnaire to. Can’t control the order in which questions are answered. Can’t correct misunderstandings.

Difficult to establish causality


## Design methodology and feasibility?

Work through your research question before designing the questionnaire Have some idea about the form of data that will be analysed.

There are lots of established questionnaires, but:

- there may not be one that taps exactly into the construct you want to measure
- there may be problems with an existing questionnaire, e.g. it is too long or reflects a different cultural perspective.

Hence, you may wish to construct your own.

153 of 220

## What is operationalisation?

The sequence of steps, or procedure, that a researcher takes to get from what they want to measure to how they will measure it.

e.g. if you want to investigate “healthy eating” in students.

Do you wish to 'measure' attitudes or behaviour, or both? How can healthy eating be defined? How will you measure it: food items consumed, foods avoided, buying habits, etc.?

Defining a concept can be difficult. Operational definitions are often inadequate: they may not fully capture the concept (e.g. healthy eating), or may not enable accurate measurement. A single definition may capture a concept, e.g. eating five portions of fruit or vegetables a day and avoiding animal fats. However, multiple definitions might move beyond a unified concept, e.g. if we include alcohol consumption. Will this relate to healthy eating, or will we find that students' attitudes to healthy eating and to drinking are not correlated?

154 of 220

## What can you measure?

Attributes- facts about people, e.g. demographics.

Behaviours- things people do

Beliefs- things people believe to be true

Attitudes- things people believe to be desirable or undesirable.

Advantages and disadvantages of open questions:

Adv- gets all the information, does not lead respondent, is more naturalistic.

Dis- can be difficult to complete, difficult to code and analyse, poor when a numeric result is required.

Advantages and disadvantages of closed questions:

Adv- easy to fill in, easy to code and analyse, good when a numerical result is required.

Dis- can encourage bias, can miss possible answers, can create opinions where none exist.

155 of 220

## Developing the questionnaire?

Response options for closed questions:

Likert scale, e.g. 'My preferred method of cooking is to grill rather than fry': very strongly agree, strongly agree, agree, neither agree nor disagree, disagree, etc.

Semantic differentials, e.g. 'Do you enjoy fast food such as burgers and doner kebabs?' rated from 1 (enjoy) to 5 (hate), with points in between. This can force an opinion where there may not be one.

Types of data: if the dependent measure is categorical or ordinal (e.g. a Likert scale), you should use a non-parametric test; you need at least interval data for parametric analysis. Continuous visual analogue scales are gaining popularity, e.g. 'Do you believe that a healthy diet can help protect against cancer?' with 'not at all' and 'absolutely' at either end of a line; respondents mark the point on the line that reflects their view.

Questions on periodic behaviour:

- 'Did you eat a takeaway yesterday?': not really representative; do you ask all participants on the same day?
- 'Did you eat a takeaway last week?': again influenced by when you sample, e.g. holiday periods.
- 'When did you last eat a takeaway?': extending the time period causes memory problems.
- 'How often do you usually eat a takeaway?': what does 'usually' mean?

156 of 220

## Response biases and social desirability bias?

People might claim to drink less, or eat more vegetables, than they actually do, because they want to present a good impression of themselves. This is difficult to solve, but you can try not to encourage it, so do not use phrases like 'research has shown', 'it is commonly believed' or 'doctors say'.

Response biases:

Acquiescence- the tendency to agree with every statement. You can reduce this by introducing some reversed items, or by including pairs of mutually contradictory questions and checking for opposite responses to each, e.g. 'Please rate how much you think eating lots of fruit is good for you' and 'Please rate how much you think eating lots of fruit is bad for you'.

157 of 220

## Rules for question writing?

DO

- Use simple language
- Keep questions short
- Use “don’t know” and “not applicable” options
- Give relevant definitions

DON’T

- Use double-barrelled questions
- Use leading questions
- Use double negatives
- Use words with ambiguous meanings
- Use questions that create opinions
- Ask unanswerable questions
- Use abbreviations or information that the participant might not know

158 of 220

## Order of items in a questionnaire?

Initial information/questions may influence later responses e.g.

Asking about attitudes to healthy eating first may influence reports about which foods they eat. Telling people you are looking at ‘sensible eating’ might initiate social desirability bias.

Filter questions (e.g. 'Do you eat meat? Yes / No')

Used to exclude participants from some questions

- prevents participants being asked questions they can’t answer

Funnel approach

Move from very broad questions down to narrow, specific questions:

- to introduce key concept gradually (e.g. healthy eating) OR

- to encourage completion, questions start off easy, e.g. 'What is your favourite meal?'

159 of 220

## Order of items continued?

Demographic questions can be put at the end of the questionnaire, which avoids arousing suspicions. Questions needing effort should come earlier on, as respondents might get bored or tired.

PILOT work is the best way to determine if the order of the questionnaire is appropriate

Ethics:

You need to get your participants' informed consent. This should include:

- an explanation of what your questionnaire is about
- assurance that data are confidential and anonymous
- an understanding that they are free to leave out any question they wish, or not to complete it at all.

However…if they choose to complete and submit, consent is assumed.

160 of 220

## Selecting a sample?

Characteristics

It is important to select a representative sample and decide how many responses you will need. Think about the population you are targeting.

Size

Sample size usually depends on the statistical analysis you use ('power'). BSc undergraduate projects typically use “opportunity” samples.
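As a rough illustration of how 'power' drives sample size, the sketch below uses the Fisher z approximation to estimate how many participants are needed to detect a correlation of a given size with a two-tailed test. The target correlations, alpha = .05 and power = .80 are assumptions chosen for illustration; dedicated power software may give slightly different figures.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect a population correlation r (two-tailed test),
    using the Fisher z approximation."""
    std_normal = NormalDist()                     # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = std_normal.inv_cdf(power)           # z corresponding to the desired power
    effect = math.atanh(r)                       # Fisher z of the expected correlation
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: about {n_for_correlation(r)} participants")
```

Smaller expected effects need many more participants, which is worth bearing in mind when relying on an opportunity sample.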

161 of 220

## Increasing response rates?

You might find response rates to mailed-out questionnaires of around 10%-25%. Things that improve response rates:

- addressing respondents by name, in a professional-looking envelope (avoid looking like junk mail)
- small or future incentives
- assuring confidentiality and anonymity
- reminders after two weeks
- keeping the questionnaire as short as possible
- return postage-paid envelopes

162 of 220

## Conducting a pilot test?

You need to do this when you have:

- written the questions (items)
- ordered the items
- thought about the order of the questionnaire as a whole
- thought about the type of data the questions will produce.

Piloting allows you to:

- identify any questions that people do not like or misinterpret
- identify whether people are having difficulty using the answer scale
- see how long the questionnaire takes
- get some idea of response rates
- establish the reliability and validity of your scale.

Administer the questionnaire to a few experienced respondents. Their comments will enable:

- vetting of content (checking for obvious mistakes, or anything about the items or order that looks dubious)
- suggestions (e.g. improving ease of completion)
- alterations to structure.

During piloting, more questions should be included than are intended for the final version.

163 of 220

## Steps to conducting a pilot test?

1. A large number of questions is required because, no matter how well worded, some will simply fail to measure the concept of interest.
2. Questions may not produce the same results even when given to the same person again. This occurs because, although the questions are clear to you and other experts, they may appear ambiguous to respondents.
3. Therefore the questionnaire needs to be re-administered to a larger number of 'real' respondents (ideally at least 30). This allows the use of statistics to remove 'bad' questions.

Failure to measure the concept of interest- responses might show a lot of variation and not correlate well with other items measuring the construct.

Where does 30 come from? NB: 30 is not necessarily possible if you are looking at a select sample. For example, you might get a class of 30 children to pilot a questionnaire, but if your questionnaire is about Williams syndrome you won't find 30 parents. You could use one or two parents who won't be involved in the research and/or give it to 30 students who know something about Williams syndrome. You can ask for feedback on difficulties completing the questionnaire, and at this point you can also use statistical analysis to remove questions that are not working.

164 of 220

## Questionnaire analysis steps?

1. Item analysis of the questionnaire.
2. Testing for reliability.
3. Hypothesis testing.

Descriptive questionnaires (how many?): chi-square analysis.

Count data, e.g. 'How many days a week do you go to the gym?' (0, 1-3, 4-6, 7). Is there a difference between females and males?
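A minimal plain-Python sketch of a chi-square test of independence for the gym example. The counts are invented for illustration. Rather than computing an exact p-value, it compares the statistic with the tabled critical value for df = 3 at alpha = .05 (7.815). Chi-square also assumes expected counts of roughly 5 or more per cell, which these hypothetical counts satisfy.

```python
# Hypothetical counts: rows = gender, columns = gym days per week (0, 1-3, 4-6, 7)
observed = {
    "female": [12, 20, 10, 8],
    "male": [10, 15, 14, 11],
}


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # count if no association
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)
print(f"chi-square({df}) = {chi2:.2f}")
print("significant at .05" if chi2 > 7.815 else "not significant at .05")  # critical value for df = 3
```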

Analytic (Likert scale)- correlational.

Internal validity:

Item analysis: each question = 1 item. Performing item analysis allows us to remove items that are not measuring the concept; the aim is internal consistency within the questionnaire.

165 of 220

## Steps of internal validity?

Check for sufficient variance for each item! Does it measure differences between people?

- Do any items show a bimodal distribution?
- How well does each item correlate with the total score on the concept/construct?
- How well do items correlate with each other? Poorly correlated items will reduce the reliability.
- Are there questions that correlate well but are not measuring the construct of interest?
- Is there too much error variance (e.g. ambiguous items)?
- Is validity poor, i.e. not measuring the entire construct (total score)?

Example: we wish to construct a new questionnaire to measure a specific domain of locus of control (Rotter, 1966). Locus of control is the difference between individuals in the extent to which they perceive events as under their control:

- individuals who perceive events as under their control → high internal locus
- individuals who do not perceive events as under their control → high external locus

166 of 220

## Example continued?

We are interested in locus of control in a specific domain: relationships (i.e. a single construct). You can have a number of constructs (e.g. also academic performance), but you will need to analyse them separately.

What is item variance? Items with large variance are desirable because they hopefully show a difference between participants with internal versus external locus of control. The greater the overall variance, the more likely the item is measuring differences. Small variances do not tell us much about locus of control (e.g. responses clustered around the midpoint). Thus, we need items that discriminate between respondents high in external and high in internal locus of control.
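The item-variance check can be sketched in a few lines of Python. The responses and the 0.5 cut-off are hypothetical; in practice you would inspect the item descriptives in SPSS and judge what counts as too little spread.

```python
from statistics import variance

# Hypothetical responses on a 1-5 scale: one list per item, one value per respondent
items = {
    "q1": [1, 5, 2, 4, 5, 1, 3, 5],
    "q2": [3, 3, 3, 3, 4, 3, 3, 2],  # clustered around the midpoint
    "q3": [2, 4, 1, 5, 4, 2, 3, 5],
}


def low_variance_items(items, threshold):
    """Return the names of items whose sample variance falls below threshold."""
    return [name for name, scores in items.items() if variance(scores) < threshold]


for name, scores in items.items():
    print(f"{name}: variance = {variance(scores):.2f}")
print("Check these items:", low_variance_items(items, threshold=0.5))  # arbitrary cut-off
```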

Item analysis in SPSS:

Items need to be coded: use VALUE LABELS. To avoid acquiescence, it is not a good idea to have all questions worded in the same direction, so reversed items must be recoded before analysis. Click Transform > Recode into Different Variables, transfer the questions that need to be recoded to the Output Variable box, rename the output variable, then click Change.
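The arithmetic behind recoding a reversed item is simple: on a scale from min to max, the new score is (min + max) - old score, so on a 1-5 scale 1 becomes 5 and 3 stays 3. A sketch, where the set of reversed items is hypothetical:

```python
REVERSED_ITEMS = {"q2", "q4"}  # hypothetical: items worded in the opposite direction


def reverse_score(response: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Flip a response on a scale_min..scale_max scale (on 1-5: 1<->5, 2<->4, 3 stays 3)."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - response


respondent = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}
recoded = {q: reverse_score(v) if q in REVERSED_ITEMS else v for q, v in respondent.items()}
print(recoded)  # {'q1': 4, 'q2': 4, 'q3': 5, 'q4': 5}
```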

167 of 220

## Item analysis in SPSS?

Checking the internal reliability of the questionnaire: do all items belong with our construct (i.e. locus of control in relationships)?

- Inter-item correlations: items that fit should correlate well with one another.
- Item-total correlation for each item: calculate the correlation between scores on each item and the total score on the test/subtest.
- Cronbach's alpha.

Keep items that show good correlation with other items; items with poor inter-item correlations should be removed. Each time an item is removed, recheck the inter-item correlations! Why? Each time an item is removed it can affect the correlations between the other items. The larger the dataset, the more complex this can be.
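A sketch of the item-total check in plain Python. It correlates each item with the total of the *other* items, which is the "corrected" form that SPSS reports as "Corrected Item-Total Correlation"; including the item in its own total would inflate the correlation. The data are hypothetical, with q4 deliberately not tracking the other items.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items):
    """Correlate each item with the total of all the other items."""
    names = list(items)
    n_respondents = len(items[names[0]])
    result = {}
    for name in names:
        rest_total = [sum(items[other][i] for other in names if other != name)
                      for i in range(n_respondents)]
        result[name] = pearson(items[name], rest_total)
    return result


# Hypothetical data: one list per item, one value per respondent
items = {
    "q1": [1, 2, 3, 4, 5, 5],
    "q2": [1, 2, 3, 4, 4, 5],
    "q3": [2, 2, 3, 4, 5, 5],
    "q4": [4, 1, 5, 2, 3, 1],  # does not track the other items
}
for name, r in corrected_item_total(items).items():
    print(f"{name}: corrected item-total r = {r:.2f}")
```

After dropping an item like q4, rerun the function on the remaining items, since removing one item changes every other item's total.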

168 of 220

## What is a hypothesis?

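We can generate hypotheses that are testable using questionnaires. For example, we may hypothesise that there will be a difference between female and male respondents on locus of control (LOC) in relationships.

- H0: there is no difference between female and male respondents in LOC in relationships.
- H1: there is a difference between female and male respondents in LOC in relationships.

To test this, we first need to create an overall mean index of 'LOC in relationships' using SPSS's Compute function. SEE SLIDES FOR STEP-BY-STEP.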
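What the Compute step produces can be sketched in Python: one mean index per respondent, which can then be summarised by group. The responses are hypothetical and assumed to be reverse-scored already. The rough SPSS analogue would be something like `COMPUTE loc_index = MEAN(q1, q2, q3, q4).`, where `loc_index` is a made-up variable name.

```python
from statistics import mean

# Hypothetical LOC-in-relationships items (already reverse-scored), one entry per respondent
responses = [
    {"gender": "female", "items": [4, 5, 3, 4]},
    {"gender": "male", "items": [2, 3, 3, 2]},
    {"gender": "female", "items": [5, 4, 4, 5]},
    {"gender": "male", "items": [3, 2, 4, 3]},
]

# Mean index per respondent (the new variable that Compute would create)
for r in responses:
    r["loc_index"] = mean(r["items"])

group_means = {
    g: mean(r["loc_index"] for r in responses if r["gender"] == g)
    for g in ("female", "male")
}
print(group_means)
```

The two group means would then be compared with an appropriate test (e.g. an independent-samples test) to decide between H0 and H1.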

169 of 220

## What is psychometrics?

Measurement of the mind. Psychometric tests are designed to measure 'stable' characteristics of an individual, e.g. ability, personality or attitude, and are commonly used in occupational, educational and clinical settings. Standardised psychometric tests must demonstrate good psychometric properties: reliability and validity.

An example is the Myers-Briggs Type Indicator (not developed by psychologists), which has 4 types of sections in different colours, but psychologists believe it is not a reliable or valid way to study personality.

What is reliability and validity?

These terms are often applied to questionnaires or multi-item performance tests. But with any psychological measure, e.g. observation of behaviour, an interview, performance in an experiment, or a real-life achievement test such as a driving test or exams, you should consider whether it has good enough reliability and/or validity.

Targets example: hitting the middle of the target every time is reliable and valid; hitting the same place again and again, but not the middle, is reliable but not valid; hitting many different places is neither reliable nor valid.

170 of 220

## Reliability vs validity?

Reliability of measurement = consistency, steadiness. Does the test consistently measure anything at all? If the test were done again, would you get the same results? Reliability is much more straightforward to check.

Validity of measurement = does the test measure 'what it is supposed to measure'?

What is reliability?

The concept comes from classical test theory: measured score = 'true score' + random variation (error). The true score is the person's real underlying value on the measure. Random error can come from many sources, e.g.:

- poorly worded questions
- too few items or trials
- the fluctuating state of the person being tested
- test items that are idiosyncratic or narrow
- differences in skill or knowledge of testers or observers.

A reliable measure is one that is designed to minimise random error.
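A small simulation makes the classical test theory idea concrete. Each simulated person has a fixed true score, and each testing occasion adds fresh random error. The correlation between two occasions then comes out close to true-score variance / observed-score variance, which is the reliability in this framework. The spreads of true scores and error are assumed values chosen for illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the simulation is repeatable

N = 10_000
TRUE_SD, ERROR_SD = 2.0, 1.0  # assumed spread of true scores and of random error


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


true_scores = [random.gauss(50, TRUE_SD) for _ in range(N)]
# Two testing occasions: same true score each time, fresh random error each time
time1 = [t + random.gauss(0, ERROR_SD) for t in true_scores]
time2 = [t + random.gauss(0, ERROR_SD) for t in true_scores]

expected = TRUE_SD ** 2 / (TRUE_SD ** 2 + ERROR_SD ** 2)  # true variance / observed variance
simulated = pearson(time1, time2)
print(f"expected reliability = {expected:.2f}, simulated test-retest r = {simulated:.2f}")
```

Increasing `ERROR_SD` lowers the correlation, which is the sense in which a reliable measure is one that minimises random error.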

171 of 220

## How can reliability be assessed?

Reliability is assessed by some form of correlation, varying from 0 (unreliable) to 1 (maximum). If a test or measure picks up something consistent about a person ('true score'), then scores on different parts of the test, OR from the same people on different occasions, OR from the same people tested by different testers, should be positively correlated.

If a lot of what the test or measure picks up is random variation, there will be little correlation.

172 of 220

## Continued...

Acceptable or expected levels of reliability vary across types of test:

- published IQ tests are expected to have reliabilities of about 0.9
- personality tests reach a maximum of around 0.7
- more subjective or open-ended tests (e.g. 'creativity') rarely attain reliabilities higher than about 0.5
- projective (indirect) tests, e.g. the Rorschach ink-blot test, about 0.2.

There are four types of reliability assessment: test-retest, parallel-forms, inter-item and inter-rater. The type used depends partly on the type of test.

173 of 220

## Test-retest reliability and parallel-forms reliability

Test-retest: Give test twice to same people, and correlate scores at time 1 and time 2.

Appropriate when test measures trait assumed to be stable e.g. intelligence, personality.

Not appropriate when the measure might truly change between test occasions (e.g. mood, symptoms, skill, knowledge).

Very dependent on the type of test you are looking at. Mood can fluctuate, for example, so test-retest is not the best reliability check for mood.

Parallel-forms reliability:

Not something users of tests normally measure, but designers of some tests provide for it. Develop alternative versions of the test which give similar scores for the same person. This is assessed by giving both versions to the same group of people on the same occasion and correlating performance; if they are highly correlated, they are parallel or equivalent, so you can give the two different forms before and after treatment. When testing the same skill twice (e.g. to evaluate training or treatment), you must avoid memory/practice artefacts: you need to give two novel tests, so that people do not perform better just because they remember the first test. Many performance tests (e.g. Wechsler IQ/Memory, driving-test theory) provide 2 or more parallel forms.

174 of 220

## Inter-item reliability?

Many tests have a number of items designed to measure the same 'construct' (a complex psychological concept, e.g. a personality trait), and the intention is to average the scores on these items. It should be checked that they do measure a similar thing, otherwise it is not sensible to average them; in other words, the set of items should correlate with each other. One way to check is split-half reliability: split the test items into two halves (e.g. first v. second half, odd v. even numbers) and test the correlation between the half-totals.
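A sketch of split-half reliability using an odd v. even split. One addition not in these notes: because each half is only half the length of the full test, the half-test correlation is commonly stepped up with the Spearman-Brown formula, 2r / (1 + r), to estimate the reliability of the full-length test. The data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two half-tests."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list per respondent, one value per item (in questionnaire order)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd_totals, even_totals)
    return r_half, spearman_brown(r_half)


# Hypothetical data: 5 respondents x 6 items
data = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 3, 2, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
]
r_half, r_full = split_half_reliability(data)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```

A different split (e.g. first v. second half) can give a different value, which is one reason Cronbach's alpha is preferred.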

A better index is Cronbach’s alpha.

Regarded as the most important index of reliability; generally a reliable indicator of internal consistency.

It takes into account all the inter-associations between all items in the scale, and provides an average measure from all possible ways of splitting the items. The higher the reliability, the smaller the difference between measured scores and true scores; in other words, error has been kept to a minimum.
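Cronbach's alpha can be computed from item variances and the variance of the total scores: alpha = k / (k - 1) x (1 - sum of item variances / variance of totals), where k is the number of items. A plain-Python sketch with hypothetical data:

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, one value per item (at least 2 items)."""
    k = len(item_scores[0])  # number of items
    item_variances = [variance(column) for column in zip(*item_scores)]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 6 respondents x 4 items
data = [
    [4, 5, 4, 4],
    [2, 1, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
]
print(f"Cronbach's alpha = {cronbach_alpha(data):.2f}")
```

The same function applies to the inter-rater case below, with each rater treated as an "item".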

175 of 220

## Inter-rater reliability?

For some measures, we must ask if the people doing the measuring are reliable (agree with others), e.g.

Many psychometric tests such as IQ tests, or interviews, require training to administer correctly. When behaviour is being measured (e.g. aggression, social interaction) observers need training, and the instructions they receive must be very clear – are they clear enough? Some scales can be completed by several raters and we want to know whether the raters agree with each other.

People will administer tests slightly differently and interpret behaviour differently, e.g. is a slight tap in a nursery aggressive or not? The best thing to do is to look at the inter-rater reliability of these people; inter-rater reliability tries to control for these differences in interpretation.

Number ratings: when tests yield a number score, agreement between two raters/testers is checked by correlation. Example: trainees A and B, and an expert rater, all administer an anxiety test to the same 7 children. How well do the trainees' scores correlate with the expert's?

176 of 220

## Number ratings continued?

NB: This shows the correlations between each trainee and the expert. We cannot say, however, that Trainee A's ratings correlate better than Trainee B's: this requires an additional test.

Trainee A's correlation with the expert is significant but Trainee B's is not, so Trainee B's ratings do not appear reliable; comparing the two trainees directly still needs an additional test.

Agreement among 3 or more similar raters can be checked by Cronbach’s alpha. Example: Three carers X, Y, Z independently rate 8 patients on a motivation scale (0=low, 6=high)

The nonparametric equivalent (Kendall's coefficient of concordance) is used when judges rank cases, rather than rate them.

Data must be entered into SPSS the other way round (judges = rows not columns)
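For intuition about Kendall's coefficient of concordance (W), here is a sketch of the usual formula W = 12S / (m²(n³ - n)), where m is the number of judges, n the number of cases, and S the sum of squared deviations of each case's rank sum from the mean rank sum. Following the SPSS layout above, judges are rows. The rankings are hypothetical, and the sketch assumes no tied ranks (ties need a correction it does not apply).

```python
def kendalls_w(rankings):
    """rankings: one list per judge (judges = rows), giving the rank that judge
    assigned to each case. Assumes no tied ranks."""
    m = len(rankings)     # number of judges
    n = len(rankings[0])  # number of cases ranked
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical: 3 judges each rank 5 patients (1 = most motivated)
rankings = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 4, 5],
    [1, 3, 2, 5, 4],
]
print(f"Kendall's W = {kendalls_w(rankings):.2f}")
```

W runs from 0 (no agreement) to 1 (all judges give identical rankings).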

177 of 220

## Inter-rater reliability for behaviours?

When observers code the occurrence of certain behaviours, inter-observer reliability can be checked over time:

- time is divided into intervals (e.g. 20 secs)
- observers record whether they saw the behaviour occur (Y) or not (n) in each interval
- count agreements (YY, nn) and disagreements (Yn, nY) and fill in a table
- calculate the proportion of agreements, e.g. 13 out of 16 = 0.8125, i.e. about 81 percent.

This turns the behaviour into quantitative data: count how many times each observer said yes and how many times they said no.

A better measure than percentage agreement is Cohen's kappa.

It uses the same table of agreements and disagreements, but corrects for agreements that might happen by chance. It therefore usually gives a lower value than % agreement, and it also gives a measure of statistical significance. K = (OA - AC) / (1 - AC), where OA is the observed proportion of agreement and AC is the proportion of agreement expected by chance.
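A sketch of percentage agreement and Cohen's kappa from the four cells of the agreement table. The notes give only the total (13 agreements out of 16), so the split of those intervals into YY, nn, Yn and nY below is hypothetical. Chance agreement (AC) is worked out from how often each observer said Y. The significance test that SPSS also reports is not reproduced here.

```python
def agreement_stats(yy, yn, ny, nn):
    """Counts of intervals: yy = both observers said Y, nn = both said n,
    yn = observer 1 Y / observer 2 n, ny = observer 1 n / observer 2 Y."""
    total = yy + yn + ny + nn
    observed = (yy + nn) / total  # OA: proportion of agreements
    # AC: agreement expected by chance, from each observer's own rate of saying Y
    p1_yes = (yy + yn) / total
    p2_yes = (yy + ny) / total
    chance = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    kappa = (observed - chance) / (1 - chance)
    return observed, kappa


# Hypothetical split of the 16 intervals (13 agreements, 3 disagreements)
oa, k = agreement_stats(yy=8, yn=2, ny=1, nn=5)
print(f"agreement = {oa:.1%}, kappa = {k:.2f}")
```

With this split, kappa comes out at about 0.61, noticeably lower than the 81% raw agreement, as the notes predict.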

178 of 220

## Checking reliability yourself?

There are various ways to check reliability, depending on the measure; some are done by the designers of the test, and some should be checked by users.

- If you use a questionnaire and plan to average scores from a group of items, check alpha for those items first; this is crucial if you constructed the questionnaire yourself.
- If using rating scales, especially when raters are non-expert or scales are novel, check inter-rater agreement.
- If doing behavioural observation, check inter-observer reliability if possible; this may entail video-recording some sessions so that 2 observers can code the same session independently.

179 of 220

## What is validity?

A measure which is unreliable is useless, and can’t possibly be valid. But even a highly reliable (consistent) measure may not be valid. Types of validity: Face validity, content validity, predictive validity, concurrent validity, construct validity.

Face and content validity are both to do with the kind of items in a test- require judgement, they can't be checked by statistics.

Face validity- are items convincing/acceptable to users, i.e. people who will complete the test or use its results (family, teachers)? If a test is not face-valid, respondents may not take it seriously.

Content validity- are items appropriate for the intended purpose (face validity is just one aspect of this)? E.g. a selection test for a particular kind of course/job should include items relevant to performance in that course/job.

Both are judged by consulting expert and/or interested informants.

180 of 220

## Predictive and concurrent validity?

These are both checked by testing correlations.

Predictive validity: do scores on the test correlate with something else that we expect them to?

- Do A-level results predict university degree performance?
- Does a job-selection test predict job performance?
- Does a risk-taking measure show the expected sex difference?

Concurrent validity: do scores on the test correlate with an alternative, existing test of a similar thing?

- Does a behavioural measure of aggression correlate with parental ratings of aggressiveness?
- Does a new IQ test correlate with existing IQ tests?

This is not just a circular argument; it can be controversial, or practically important (if one test is cheaper/quicker).

181 of 220

## Construct validity?

To do with theoretical basis of assessment

Does the test properly reflect the theoretical nature of the psychological construct that it is intended to measure?

This depends on there being agreement on what the theoretical nature of the construct is (e.g. intelligence). Another approach:

Does test score correlate with other things (or differ between groups) in the way which, according to theory, it should?

Rust & Golombok's example: for Eysenck's 'extraversion', a matrix of interrelated experiments validated the construct.

182 of 220

## Construct validity continued?

This implies a test should NOT ALWAYS correlate with other things: it should differentiate when appropriate.

Convergent validity: does the test correlate (positively or negatively) with things that it theoretically should?

Discriminant validity: does the test NOT correlate with things it is theoretically different or independent from?

A good test should show both. Campbell & Fiske's multitrait-multimethod matrix is one way to check.

183 of 220

## Multi-trait multi-method matrix?

Suppose a test claims to measure one thing (e.g. anxiety) but not other, partly overlapping things (e.g. depression, physical health). Administer more than one anxiety test and more than one depression test to the same group of people, and examine the matrix of correlations.

Lindsay et al. (1994) adapted anxiety and depression self-report tests for people with intellectual disabilities, simplifying the wording of the following tests:

- Zung Anxiety and Depression scales
- GHQ (General Health Questionnaire) Anxiety, Depression, Health feelings and Social skills deficit subscales

If you pick out or adapt questions from a test, you may affect the validity of the test.

184 of 220

## Multi-trait multi-method matrix continued?

If the simplified Anxiety and Depression scales are valid for respondents with IDs, they should show:

Convergent validity- Zung anxiety and depression should correlate well with GHQ Anxiety and depression respectively.

Discriminant validity- Low correlation between Anx and Dep subscales, or with conceptually different subscales, e.g. health feelings.

All tests administered orally to a group of adults with moderate or mild IDs (IQ 40-69).

If convergent validity is good, the anx-anx and depr-depr correlations should be high.

If discriminant validity is good, the anx-depr correlations should be low.

If discriminant validity is good, the remaining correlations should also be low.

Look at slides for tables and graphs.

185 of 220

## What is the purpose of factor analysis?

Data reduction / simplification

Understand variables and their relationships; reduce them to a smaller number of underlying factors.

Variables

Scores or measures
e.g. questionnaire items, psychometric sub-tests

Factors (components)

Dimensions of intelligence or personality

e.g. extraversion, introversion, neuroticism

Constructs re beliefs or attitudes

e.g. risk acceptance / avoidance / seeking

186 of 220

## Basis of factor analysis?

Personality questionnaire, IQ sub-test data.

Factors are clusters of variables, e.g. vocabulary, similarities, information and comprehension form verbal comprehension; arithmetic, digit span and letter-number sequencing form working memory.

Factor analysis works with the correlations between variables. Here all the correlations are positive: a person with a high score on one subset is likely to score high on another.

Clusters of subtests inter-correlate more highly with each other than with other subtests, giving two specific ability factors corresponding to the clusters. The factors are not completely independent, and the clusters of subtests are not 'pure' (a pure cluster would correlate strongly within itself and only weakly with the other subtests).

187 of 220

## Aim of factor analysis?

Aim of FA is to explain variance

Why do individuals vary, i.e. why do they have high/low scores on certain variables?

- Account for as much variance as possible in the original variables by means of new dimensions or factors.
- Explain most of the variance in the original data by means of a small number of factors, each of which explains a 'good' proportion of variance; factors that explain only a little variance are useless.
- It is not always possible to explain all the original variables well, but the aim is to explain the majority.
- Factors should, as far as possible, be uncorrelated, so as to give independent information.

188 of 220

## What is principal components analysis?

PCA is the simplest form of factor analysis. Default in SPSS. In PCA output, the factors are called components.

PCA data:

Original variables- must be suitable for correlation, i.e. numeric (interval or ordinal). Categorical variables (e.g. gender, political party supported, ethnic group) are not suitable for PCA.

Sample size- preferably at least 100 cases in total, and at least 5 (if possible 10) cases per variable analysed. Cases with missing data on any variable being analysed will be omitted, so you need to know the number of cases actually analysed, which is usually smaller than the total number of people. If the number of cases is too small, PCA will still run (as long as there are more cases than variables), but the results will not be reliable and may not generalise well to other samples.

189 of 220

## PCA example?

Random subsample from 1606 people in USA (completed attitude survey in 1993)

15 variables considered

- Age (in years)
- Educ (years of full-time education)
- Income (21 bands, < \$1000 to \$75000+)
- TV hours (hours watching TV per day)
- Ratings of liking for 11 types of music: Bigband, Bluegrass, Country & Western, Blues/R&B, Broadway Musicals, Classical, Folk, Jazz, Opera, Rap, Heavy Metal (1-5 scale: 1 = Dislike very much, 5 = Like very much)

190 of 220

## Analysing data with PCA?

See How to guide on BB for details. You should always check your data before running any analyses. For PCA, check you have labelled missing data correctly.

Analysis 1: check data- descriptive statistics show how many cases were included in the analysis, after omitting any with missing data.

Analysis 1: factors- how many factors are needed to explain the majority of the variance? SPSS starts by finding as many factors as there are variables, but we only want to keep (extract) a few, i.e. those that explain most variance. The amount of variance explained by a factor is measured by its eigenvalue. Conventionally, the number of factors extracted is the number that have an eigenvalue greater than 1.

Analysis 1: total variance explained- one component (factor) is computed per variable. Components with a value below 1 in the first column (Total, i.e. the eigenvalue) do not add anything meaningful. The remaining components are extracted, and the cumulative % shows how much of the total variance these factors explain.
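The eigenvalue-greater-than-1 rule can be illustrated with NumPy (assumed to be installed) on a small, invented correlation matrix: six variables forming two clusters that are uncorrelated with each other. Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by that number gives the "% of variance" column SPSS shows. This is a sketch of the idea, not a reproduction of SPSS output.

```python
import numpy as np

# Hypothetical correlation matrix for 6 variables in two clusters:
# variables 1-3 inter-correlate at 0.6, variables 4-6 at 0.5, clusters uncorrelated
R = np.zeros((6, 6))
R[:3, :3] = 0.6
R[3:, 3:] = 0.5
np.fill_diagonal(R, 1.0)

eigenvalues = np.linalg.eigvalsh(R)[::-1]    # eigvalsh returns ascending order; reverse it
retained = eigenvalues[eigenvalues > 1]      # Kaiser criterion: keep eigenvalues > 1
pct = 100 * eigenvalues / eigenvalues.sum()  # % of variance (eigenvalues sum to 6 here)

for i, (ev, p) in enumerate(zip(eigenvalues, pct), start=1):
    print(f"Component {i}: eigenvalue = {ev:.2f}, % of variance = {p:.1f}")
print(f"Components retained: {len(retained)}, cumulative % = {pct[:len(retained)].sum():.1f}")
```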

Analysis 1: communalities- the proportion of variance in each variable explained by ALL the extracted factors (components). Variables with a communality of 0.5 or more are reasonably well explained by the extracted factors, others not so well; this is a rough criterion, not a rigid one.

191 of 220

## Analysing data with PCA continued?

Components- shows the initial factors; the numbers in the table are loadings (correlations between variables and factors/components).

Loadings: a variable with a strong loading (either + or -) is strongly correlated with, or representative of, the factor.

Loadings of ±0.6 or more are strong, ±0.4 to 0.59 are moderate (sometimes informative), and ±0.3 or less are weak (often suppressed in output).
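Continuing with the same invented correlation matrix, this NumPy sketch shows where loadings and communalities come from in PCA. Each loading is an eigenvector entry multiplied by the square root of its eigenvalue, which equals the correlation between the variable and the component. A variable's communality is the sum of its squared loadings on the retained components. The sign of a whole component is arbitrary, so SPSS may show the same loadings with flipped signs.

```python
import numpy as np

# Same hypothetical two-cluster correlation matrix as before
R = np.zeros((6, 6))
R[:3, :3] = 0.6
R[3:, 3:] = 0.5
np.fill_diagonal(R, 1.0)

eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

n_keep = int((eigenvalues > 1).sum())  # Kaiser criterion
# Loadings: correlation of each variable (rows) with each retained component (columns)
loadings = eigenvectors[:, :n_keep] * np.sqrt(eigenvalues[:n_keep])
# Communality: variance in each variable explained by all retained components together
communalities = (loadings ** 2).sum(axis=1)

print(np.round(loadings, 2))
print(np.round(communalities, 2))
```

Here every communality is above 0.5, so by the rough criterion above all six variables are reasonably well explained.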

Extracted factors can be 'rotated' to aid interpretation. Rotation makes strong loadings stronger and weak loadings weaker: it maximises the loading of each variable on one of the extracted factors while minimising its loadings on all other factors. The process makes it clearer which variables link to which factor.

The goal of rotation is to obtain a simpler factor loading pattern that is easier to interpret than the original factor pattern. The communalities are unchanged from the unrotated to the rotated solution.
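A compact NumPy sketch of varimax rotation, using the standard SVD-based iteration. To show what rotation does, it takes a hypothetical "simple structure", mixes it by rotating it 30 degrees (mimicking an unrotated solution), and lets varimax recover the pure loadings. It also checks the point made above that communalities (row sums of squared loadings) are unchanged by rotation. This sketch omits the Kaiser normalisation that SPSS applies by default ("Varimax with Kaiser Normalization"), so its numbers can differ slightly from SPSS output.

```python
import numpy as np


def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix for the varimax criterion, then best orthogonal fit via SVD
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion and new_criterion < criterion * (1 + tol):
            break  # criterion has stopped improving
        criterion = new_criterion
    return loadings @ rotation


# Hypothetical simple structure: variables 1-3 load on factor 1, variables 4-6 on factor 2
simple = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.7]] * 3)
theta = np.radians(30)
mix = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix  # a "mixed" pattern like an unrotated solution
rotated = varimax(unrotated)

print(np.round(unrotated, 2))
print(np.round(rotated, 2))  # factor order and signs may differ from `simple`
```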

192 of 220

## Analysing data with PCA continued?

Analysis 1: Rotated components- More interpretable, shows how each variable is represented in each of the four factors. Varimax rotation has adjusted loadings to achieve 'pure loadings'.

Effect of rotation:

Pure loading- each factor now correlates 'purely' with a small, separate set of variables, as different as possible from the variables of other factors. A pure loading (also called simple structure) means a strong loading (0.6 or more) on this factor AND low loadings (less than 0.3) on other factors. To interpret a factor, look at which variables load purely on it, but even factors with pure loadings may not have an obvious interpretation.

Analysis 1: interpret factors- identify which variables have pure loadings and whether they are positive or negative; small coefficients (loadings less than 0.3) are suppressed.

Independence of factors: Varimax-rotated factors are designed to be uncorrelated (statistically independent), so a person's score on one factor will have little or no correlation with their score on any other factor, i.e. the factors give independent information about people.

193 of 220

## Analysis 2?

Similar to analysis 1, but adding 3 demographic variables: e.g. age, educ, tvhours.

Will the demographic variables load on the same factors, or will they load on new factor(s)? The SPSS procedure is the same as before, but there will be 3 more variables.

Analysis 2: output- look at how many factors were extracted, what % of variance they explain and how they compare to Analysis 1.

Analysis 2: output- communalities- do any variables have less than satisfactory values?

Analysis 2: output- rotated component matrix- Loadings <.3 suppressed. Do original variables form similar factors to analysis 1? Demographic variables are also looked at in terms of loadings.

Types of factor analysis:

FA can be exploratory as described; trying to find out how many factors there are and what variables belong with them. Or it can be confirmatory, checking that the set of data does contain a specified or expected set of factors.

194 of 220

## Distinguishing between effect size and statistical significance

The effect in an effect size is the relationship/association/difference that you have set out to investigate in your research study. The American Psychological Association indicates that, when writing a research report, you should report the effect size associated with your statistical analysis in addition to information about statistical significance. So you should clearly indicate the effect in your research question/hypothesis.

The effect size is standardised so that effect sizes can be compared regardless of the units of the variables being investigated. Statistical significance provides information that allows you to make inferences about the population based on your sample. Basically, a significance test asks whether the hypothesis we have about the effect in the population is likely to be true or not. It is useful to know not only whether your hypothesis is likely to be true, but also how close your results are to your hypothesis. This is the value of the effect size.

An alternative approach to stating effect sizes is to state the confidence interval of your statistic.

## Exploring effect size for correlations

Correlation coefficients are the statistics you generate when you want to analyse the relationship between two variables; they tell you the size of the relationship on a standardised scale. Therefore, the correlation coefficient is itself the effect size.

The size of a correlation coefficient is reported on a scale from 0 to 1, with a plus or minus sign indicating the direction of the relationship (so r runs from -1 to +1). Zero indicates no relationship, and the further the coefficient is from 0 (the closer to +1 or -1), the stronger the relationship, so the larger the effect.

As a rule of thumb, think of a correlation coefficient (ignoring its sign) of 0.1 up to 0.3 as a small effect, 0.3 to 0.5 as a medium effect, and above 0.5 as a large effect.
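A short sketch, assuming hypothetical data (revision hours and exam scores are invented), of computing Pearson's r with NumPy and labelling it with the rule-of-thumb bands above. The function name `correlation_effect`, the "negligible" label for |r| < 0.1, and assigning boundary values (0.3, 0.5) to the higher band are all choices made for this illustration.

```python
import numpy as np


def correlation_effect(r):
    """Label |r| using the rule-of-thumb bands (boundaries go to the higher band)."""
    size = abs(r)              # the sign only gives the direction
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"


# Hypothetical data: hours of revision and exam score for 6 students
hours = np.array([1, 2, 3, 4, 5, 6])
score = np.array([52, 55, 61, 58, 66, 70])

r = np.corrcoef(hours, score)[0, 1]   # off-diagonal entry of the 2x2 matrix
print(f"r = {r:.2f} ({correlation_effect(r)} effect)")
```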

## Considering effect size when comparing differences between two sets of scores

When comparing differences between two sets of scores, these can be scores from independent groups or scores from repeated measurements. The effect size is calculated in a similar way for both types of design.

In these situations, the easiest effect size to calculate is obtained by finding the difference between the mean scores of the two sets and dividing by the standard deviation: $(\text{mean}_1 - \text{mean}_2)/SD$.

The order in which you put the mean scores into this formula determines whether you get a positive or negative effect size, but the size of the effect remains the same. Calculating this effect size is straightforward, except for the choice of standard deviation. There are two sets of scores here (either two groups or two repeated measures), so there will be two means, and you use both of them in calculating the effect size. You also have two SDs (one for each set of scores), but you need only one SD to calculate the effect size. The one you choose depends on whether you are examining differences between two groups or differences between two repeated measures.

## Effect size for the difference between two groups

In this case, the SD that you use to calculate the effect size should be the average of the two groups' SDs. So, you add the SD from one group to the SD of the other group and then divide by 2. The answer is the standard deviation that you use to calculate the effect size.

To find the effect size: subtract one mean score from the other, then divide by the average of the two SDs.

If the result is negative, the size of the effect is still the same; the sign just tells you the direction of the effect. Therefore, it is best to present the effect size as a positive value and describe the direction of the effect in words.
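A minimal sketch of the recipe above (difference in means divided by the average of the two group SDs), using invented reading scores. It assumes sample SDs (n - 1 denominator, as `statistics.stdev` computes); many texts use a pooled SD instead, which gives a similar value when group sizes are equal and the SDs are close.

```python
from statistics import mean, stdev


def cohens_d_independent(group1, group2):
    """(mean1 - mean2) / average of the two group SDs."""
    avg_sd = (stdev(group1) + stdev(group2)) / 2
    return (mean(group1) - mean(group2)) / avg_sd


# Hypothetical reading scores for two independent groups
large_text = [5, 6, 7, 8, 9]
normal_text = [3, 4, 5, 6, 7]

d = cohens_d_independent(large_text, normal_text)
direction = "large-text" if d > 0 else "normal-text"
print(f"d = {abs(d):.2f} (scores higher in the {direction} group)")
# d = 1.26 (scores higher in the large-text group)
```

Reporting `abs(d)` plus a direction in words follows the advice above; swapping the arguments flips the sign but not the size.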

## Effect size for the difference between two repeated measures

There are two choices:

1. Use the SD obtained at the first measurement point. The rationale is that this SD represents the scores as they were originally, for example more like a population that has not undergone an experimental condition.
2. Use the SD of the difference scores. Calculate the difference between the two repeated measurements for each person in the dataset, so that you have a single set of difference scores, and then work out the SD of this set. You obtain a difference score by subtracting the score a person obtained at one point in time from the score the same person obtained at the other point in time. You can then calculate the mean and SD of this set of difference scores.

To find the effect size in this case: subtract the mean scores, then divide the mean difference by an SD, using either the SD from the first time point or the SD of the difference scores. If you use the first option, the effect size you calculate is commonly referred to as Cohen's d. If you use the second option, it is commonly referred to as the standardised response mean. These labels are a shorthand way of telling the reader how you calculated the effect size.
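A sketch of both options for repeated measures, assuming invented before/after scores for the same five people. The function name `repeated_effect_sizes` is hypothetical, and sample SDs (n - 1) are assumed.

```python
from statistics import mean, stdev


def repeated_effect_sizes(time1, time2):
    """Return (Cohen's d using the time-1 SD, standardised response mean)."""
    diffs = [b - a for a, b in zip(time1, time2)]  # one difference score per person
    mean_diff = mean(diffs)                         # equals mean(time2) - mean(time1)
    d = mean_diff / stdev(time1)                    # option 1: SD at the first time point
    srm = mean_diff / stdev(diffs)                  # option 2: SD of the difference scores
    return d, srm


# Hypothetical scores for the same 5 people before and after an intervention
before = [10, 12, 14, 16, 18]
after = [13, 14, 17, 18, 23]

d, srm = repeated_effect_sizes(before, after)
print(f"Cohen's d = {d:.2f}, SRM = {srm:.2f}")
# Cohen's d = 0.95, SRM = 2.45
```

The two options can give very different numbers for the same data (here the SRM is much larger because each person's two scores are strongly related, so the difference scores vary little), which is why the label you report matters.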

## Interpreting an effect size for differences between two sets of scores

Cohen's d or the standardised response mean tells you the size of the difference between the two sets of scores, expressed in terms of their standard deviation. In other words, if the effect size is 0.5 (using Cohen's d), this tells you that the difference between group 1 and group 2 was approximately 0.5 SDs. An SD is a measure of how much you expect the scores to vary on average, so expressing a difference in terms of SDs provides some context for the size of the difference.

As a rule of thumb, consider an effect size small when it has a value of 0.2 up to 0.5, medium when it has a value of 0.5 to 0.8, and large when it has a value greater than 0.8.

## Looking at effect size when comparing differences between three or more sets of scores

When comparing differences between three or more sets of scores, these can be scores from independent groups or scores from repeated measures: for example, comparing three sets of scores from independent groups, or comparing three sets of scores on the same variable from the same people. The same type of effect size is used for both designs. A statistical test commonly used to examine differences between more than two sets of scores is ANOVA. The information you obtain from the ANOVA allows you to calculate an effect size known as eta-squared (η²). This is provided by SPSS, so it does not need to be calculated by hand. Strictly speaking, SPSS produces eta-squared for one-way ANOVAs; for two-way or mixed ANOVAs the effect size is known as partial eta-squared.

Eta-squared estimates the proportion of variance in the DV that is explained by the IV(s). When more variables are added to the model, the effect size is likely to decrease because the total sum of squares increases. Partial eta-squared is more useful when more variables are added: it is the sum of squares for the effect divided by the sum of squares for the effect plus the error sum of squares, $\eta_p^2 = SS_{effect}/(SS_{effect} + SS_{error})$. For both, small, medium and large effects are usually taken to be 0.01, 0.06 and 0.14. However, with repeated measures they behave more like squared correlation coefficients, so small, medium and large effects are 0.01, 0.09 and 0.25 respectively.
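A sketch contrasting the two measures, assuming a hypothetical two-way between-subjects ANOVA table (the sums of squares are invented, and the design is assumed balanced so that the effect and error sums of squares add up to the total). It shows why partial eta-squared is larger than eta-squared for each effect once several effects share the total variance.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total variance in the DV explained by this effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """SS_effect / (SS_effect + SS_error): leaves out variance due to other effects."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares from a balanced two-way ANOVA (factors A and B)
ss = {"A": 30.0, "B": 50.0, "A x B": 10.0}
ss_error = 110.0
ss_total = sum(ss.values()) + ss_error   # 200 when the design is balanced

for effect, value in ss.items():
    print(f"{effect}: eta2 = {eta_squared(value, ss_total):.3f}, "
          f"partial eta2 = {partial_eta_squared(value, ss_error):.3f}")
```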

## Understanding statistical power

The probability of correctly rejecting a null hypothesis that is false in reality is known as statistical power. In other words, statistical power is the likelihood that you will find a statistically significant result when there really is an effect. Generally, if a null hypothesis is false you want to be quite confident that your analysis will reject it, so it is important that your statistical analysis has high power.

Also, by increasing statistical power, you decrease the probability of making a Type 2 error (because power is equal to 1 minus the probability of making a Type 2 error).

These are two good reasons for ensuring that the power of your statistical analysis is maximised. By convention, it is preferable to have statistical power of at least 90%, although a minimum of 80% is also considered acceptable. So it is preferable to have a 90% chance of rejecting the null hypothesis when it is false, but you can settle for an 80% chance as an absolute minimum.

## What factors influence power?

A number of factors determine the level of statistical power in your analysis. These factors interact with one another, but in basic terms the following is true:

1. The larger the effect size, the more power you have in your analysis.
2. The more liberal your cut-off point (alpha) for determining statistical significance, the more power in your analysis. By convention, the maximum alpha value is 5%. If you choose a more conservative alpha value (0.01), your power decreases.
3. As the standard deviations of the variables included in your analysis decrease, the power of that analysis increases.
4. Power is greater when you use a one-tailed hypothesis test rather than a two-tailed hypothesis test.
5. As your sample size increases, power increases.
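The factors above can be seen in a rough calculation. This is a sketch only: it uses a normal (z) approximation to the power of an independent-groups comparison rather than the exact t-based calculation that tables and G*Power use, and the function name `approx_power` is hypothetical. The SD factor works through d, because d = mean difference / SD, so a smaller SD gives a larger d.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(d, n_per_group, alpha=0.05, tails=2):
    """Normal approximation to power for comparing two independent group means.

    d is the true effect size (Cohen's d). For a two-tailed test, the tiny
    chance of rejecting in the wrong direction is ignored.
    """
    z_crit = Z.inv_cdf(1 - alpha / tails)     # cut-off for significance
    expected_z = d * sqrt(n_per_group / 2)    # expected test statistic if H1 is true
    return Z.cdf(expected_z - z_crit)


print(f"baseline (d=0.5, n=30, alpha=.05, 2-tailed): {approx_power(0.5, 30):.2f}")
print(f"larger effect (d=0.8):  {approx_power(0.8, 30):.2f}")
print(f"stricter alpha (.01):   {approx_power(0.5, 30, alpha=0.01):.2f}")
print(f"one-tailed test:        {approx_power(0.5, 30, tails=1):.2f}")
print(f"larger sample (n=64):   {approx_power(0.5, 64):.2f}")
```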

In practice, the only thing you have much control over in terms of increasing power is the sample size. You cannot arbitrarily set the effect size, because the effect size is whatever you find it to be in your analysis. You cannot increase the alpha value beyond 0.05, because this would be unacceptable to an informed reader of your research report. And you cannot arbitrarily change the SDs of the variables in your analysis. So you rely on making sure your sample size is large enough for the power in your analysis to reach the required level. In fact, psychologists use the concept of power to work out the sample size required for a research study before collecting data, so that the amount of data collected gives them the power they need in their analysis.

## Considering power and sample size

Constraints on resources usually mean that you aim for the minimum sample size necessary for a research project. So, when calculating sample size, researchers often base the calculation on an alpha (cut-off) value of 0.05 (5 percent) and statistical power of 0.8 (80 percent). To calculate sample size, you also need to know the statistical analysis you are likely to conduct and the effect size for this analysis.

Calculating an effect size is relatively easy when you have data. But when you need an effect size to calculate sample size, you have to estimate the likely effect size before you collect any data: you are estimating the outcome of your analysis in order to work out how many people you need to recruit.

Solutions to this back-to-front problem:
- use data from a pilot study to calculate an effect size
- use data reported in similar, previously published studies to calculate an effect size
- estimate an effect size based on the minimum effect size you consider to be important; the idea is that you ensure you have a sufficient sample size to detect any important effects (those at least this large)
- finally, if you can't estimate a specific effect size, estimate whether you expect the effect to be small, medium or large, and then convert this into an effect size value using the guidelines presented.

## Calculating sample size

For correlation analysis:

Sample size required to attain 80% power with alpha = 0.05: using a one-tailed test, $n = 1 + (2.5/r)^2$; using a two-tailed test, $n = 1 + (2.8/r)^2$, where r = the expected effect size (correlation coefficient).

For independent t-tests:

Sample size required per group to attain 80% power with alpha = 0.05: using a one-tailed test, $n = 2(2.5/ES)^2$; using a two-tailed test, $n = 2(2.8/ES)^2$, where ES = the expected effect size expressed as Cohen's d.

For paired t-tests:

Number of participants required to attain 80% power with alpha = 0.05: using a one-tailed test, $n = (2.5/ES)^2$; using a two-tailed test, $n = (2.8/ES)^2$, where ES = the expected effect size expressed as Cohen's d. G*Power is a sample size calculator for more complex designs.
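A small calculator for the three rules of thumb above, rounding up because you cannot recruit part of a person. The multipliers are approximately the critical z for alpha plus the z for 80% power (1.96 + 0.84 ≈ 2.8 two-tailed, 1.645 + 0.84 ≈ 2.5 one-tailed). The dictionary `K` and the function names are hypothetical, chosen for this sketch.

```python
from math import ceil

# Multipliers from the rules of thumb (80% power, alpha = 0.05); key = number of tails
K = {1: 2.5, 2: 2.8}


def n_correlation(r, tails=2):
    """Total participants needed to detect a correlation of size r."""
    return ceil(1 + (K[tails] / abs(r)) ** 2)


def n_independent_t(d, tails=2):
    """Participants needed in EACH group for an independent t-test."""
    return ceil(2 * (K[tails] / abs(d)) ** 2)


def n_paired_t(d, tails=2):
    """Participants needed in total for a paired t-test."""
    return ceil((K[tails] / abs(d)) ** 2)


# Medium effects from Cohen's conventions, two-tailed
print(n_correlation(0.3), n_independent_t(0.5), n_paired_t(0.5))
# 89 63 32
```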

## Power is related to...

Effect size (ES): larger effect size = greater power.

Number of participants (N): more participants = greater power.

Significance criterion (α): larger (more liberal) α = greater power, and a one-tailed test = greater power than a two-tailed test. If power is less than 0.8, the study is likely to be a waste of time and something needs to be done to increase power.

Power and tests (conventional effect size values):

| Test | Effect size | Small | Medium | Large |
|---|---|---|---|---|
| Comparing means | Cohen's d | 0.2 | 0.5 | 0.8 |
| Correlation | Pearson's r | 0.1 | 0.3 | 0.5 |
| ANOVA | eta-squared (η²) | 0.01 | 0.06 | 0.14 |
| Multiple regression | Cohen's f² | 0.02 | 0.15 | 0.35 |
| Association (chi-square) | Cohen's w | 0.1 | 0.3 | 0.5 |

## Project planning

Calculate power before you start: it determines the number of participants you need and is part of good ethical standards.

Research process (a cycle): generalization, problem, model and hypothesis, research design, measurement, data collection, data analysis, then back to generalization, problem, etc.

When developing strong research questions, you should ask:
- Do I know the field and its literature well?
- What are the important research questions in my field?
- What areas need further exploration?
- Has a great deal of research already been conducted in this topic area?
- Is the timing right for this question to be answered? Is it a hot topic, or is it becoming obsolete?

The question needs to pass the 'so what?' test.

The need to READ: to answer these questions you must know the existing literature well, because theories and research build on each other. Books give basic information, but they are not peer-reviewed, lack detail and are out of date by the time they are published. It is therefore best to read recent papers published in peer-reviewed journals.

Study design: what observations/tests/measures, which participants and why, how many are needed (power analysis), and ethical considerations (school/university ethics approval and disclosure).

## Project planning continued

What and why- what are you aiming to do (research question, hypothesis), why are you planning to do this (why is it interesting, how does it connect to previous research, what will we know that we don't know now, is this the best way to answer the question).

How: this must be discussed with your supervisor (timeframe and sequence, e.g. a Gantt chart; proposed methods; proposed data analysis; resources needed, especially participants).

BPS ethics:

Three different policies are useful:
- Code of Ethics and Conduct (2009)
- Code of Human Research Ethics (2014)
- Ethics Guidelines for Internet Mediated Research (IMR) (2013)

## BPS Code of Ethics & Conduct (2009)

Based on four principles:

- Respect
  - general respect (individual differences)
  - privacy and confidentiality (disclosure of information)
  - informed consent (details of research)
  - self-determination (right to withdraw)
- Responsibility
  - protection of research participants (eliminate risks; effects of individual differences; right to refuse to answer certain questions)
  - debriefing (outcomes and nature of research)
- Competence
- Integrity

## BPS Code of Human Research Ethics (2014)

Four ‘first principles’, designed to be flexible to changing demands on ethics, which link together with the Code of Ethics & Conduct (2009):
- Respect for the autonomy, privacy and dignity of individuals and communities
- Scientific integrity
- Social responsibility
- Maximising benefit and minimising harm

## The DPA (1998)

Eight Data Protection Principles. In summary, these state that personal data shall:

1. be obtained and processed fairly and lawfully, and shall not be processed unless certain conditions are met;
2. be obtained for a specified and lawful purpose, and shall not be processed in any manner incompatible with that purpose;
3. be adequate, relevant and not excessive for those purposes;
4. be accurate and kept up to date;
5. not be kept for longer than is necessary for that purpose;
6. be processed in accordance with the data subject's rights;
7. be kept safe from unauthorised access, accidental loss or destruction;
8. not be transferred to a country outside the European Economic Area, unless that country has adequate levels of protection for personal data.

## The DPA & research

In some circumstances, some of these principles can be exempted. The circumstances are that:
- the personal data are not processed to support measures or decisions with respect to particular individuals (not just the individual subjects, but anyone who may be affected by the research), and
- the personal data are not processed in such a way that substantial damage or substantial distress is, or is likely to be, caused to any data subject.

If these circumstances are met, the following principles can be exempted:
- Principle 2: data can be processed for research purposes other than those for which they were obtained
- Principle 5: data can be held indefinitely
- Principle 6: no right of the data subject to access their own data

## What does this mean for the project?

You will need ethical approval for any research involving data from data subjects, whether or not you are collecting the data yourself. You will need to follow both the BPS ethical principles and the DPA legal requirements. However, you should exercise your judgement about ethical issues: what constitutes risk? Prolonged testing? Distress? Be sensible here.

Ethical approval: UoRM Research Ethics Committee (UoRM REC)

Chair of the committee: Dr Dan Jones. All applications go through a member of staff. There is a member from each section in the University: psychology, Henley Business School, pharmacy, built environment, foundation.

## Applying for ethical approval

Submit to your supervisor:
- ethics application form
- information sheet for participant or parent/guardian
- information sheet for children (if of reading age)
- consent form for participant or parent/guardian
- questionnaires to be used (if applicable)
- debriefing procedure (if applicable)
- recruitment letters (e.g. to schools)
- supervisor's checklist (for them to complete)

Submit two signed hard copies and one electronic copy.

Writing an ethics application: title of project, your details, summary (background, purpose and justification, research question and hypothesis), procedure, ethical issues, data protection & confidentiality, participant information (consent, sample size, recruitment).

Information sheet (general): it must be printed on School headed paper and written in simple language (avoid jargon). Information to include: your and your supervisor's names, together with contact details; the title and purpose of the study; a full and clear account of what is required of the participant; how the participant will be selected.

Information to include (continued): that participation is voluntary; the right to withdraw at any stage without having to explain; arrangements to ensure confidentiality; storage and disposal of data; arrangements for providing research results if required (debriefing); and a statement regarding the ethical review process.

## Child participants

Applying for a Disclosure: subject yourself to a search by the Disclosure & Barring Service (DBS; formerly a CRB check). Guidelines and submission forms will be available. Your information sheet must include a statement that you have gone through Disclosure.

Parental consent: required for children aged under 16 years. How this is done depends on the method of recruitment, e.g. if a school is helping, the Head will have a view. Assent vs consent: you should always ask for the child's consent too; if they are very young, their assent is okay.

Schools Procedure

Before approaching a school about assisting with your research project, you must inform the Schools co-ordinator (see Ethics Guidelines). On completion of the data collection phase of your research in a school, a questionnaire must be given to the Head (see the Schools co-ordinator about this).

## A priori power

You can calculate power a priori or post hoc. For a power analysis you need to know three of the following (effect size, sample size, power and alpha) in order to calculate the fourth. A priori power analysis is usually used to determine how many participants are needed for a fully powered study.

Calculating sample size: decide on the effect size to use for your study, then determine how many participants you need to detect that effect size with power of 0.8.

Power analysis resources:
- Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. [Available electronically]
  - quick reference guide for the most common tests
- Clark-Carter, D. (2004). Quantitative psychological research: A student's handbook. [Ch. 13 and Appendix XV Power Tables]
  - in course collection
  - same ES as Cohen: t-tests, correlation, chi-square
  - different ES to Cohen: ANOVA, regression
- G*Power (everything you need and a lot you don't!)

## What to do if power is too low?

If possible, increase the sample size. If you cannot, you could use a 1-tailed rather than a 2-tailed test (though that is not often justifiable), or maximise your effect size.

Maximising effect size: ESs get bigger if there is less random variation in the data, because as the SD gets smaller, d gets larger. To minimise random variation:
- use well-controlled procedures to minimise confounds
- use reliable measures (e.g. average across more trials or items; ensure validity).

Sometimes, using a within-subjects rather than a between-subjects design improves power, but within-subjects power is VERY hard to calculate!

Between-subjects t-test: use the Clark-Carter table (left side one-tailed, right side two-tailed). For the ES, use d from the two means, or r for a correlation, or else use the approximate values.

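ANOVA effect size: explained variability / total variability, $\eta^2 = SS_{between}/SS_{total}$. If the sums of squares are not known, calculate it from F, the number of groups (g) and the number of participants per group (n): $\eta^2 = \frac{(g-1)F}{(g-1)F + g(n-1)}$.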
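A sketch checking that the two routes to eta-squared agree for a one-way between-subjects ANOVA with equal group sizes: SS_between / SS_total from the raw scores, and (g - 1)F / ((g - 1)F + g(n - 1)) from the F ratio. The scores and the helper names `one_way_anova` and `eta_squared_from_f` are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS_between, SS_total, F) for a one-way between-subjects ANOVA."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_between + ss_within, f


def eta_squared_from_f(f, g, n):
    """eta^2 = (g - 1)F / ((g - 1)F + g(n - 1)), for g groups of n people each."""
    return (g - 1) * f / ((g - 1) * f + g * (n - 1))


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical scores: 3 groups of 3
ss_between, ss_total, f = one_way_anova(groups)
print(ss_between / ss_total, eta_squared_from_f(f, g=3, n=3))
# 0.9 0.9
```

The F-based version is handy when a published paper reports F and group sizes but not the sums of squares.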

Effect size for multiple regression (Cohen's f²): calculate it from previous research as $f^2 = R^2/(1 - R^2)$, or use Cohen's approximate values.
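A minimal sketch of turning an R² from previous research into Cohen's f² and comparing it with Cohen's approximate values (small 0.02, medium 0.15, large 0.35). The R² of 0.20 is invented, and the "below small" label and boundary handling are choices made for this illustration.

```python
def cohens_f2(r_squared):
    """Cohen's f^2 for a multiple regression, from the model's R^2."""
    return r_squared / (1 - r_squared)


def f2_label(f2):
    """Compare f^2 with Cohen's approximate values (boundaries go to the higher band)."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"


# Hypothetical R^2 taken from a previously published regression
f2 = cohens_f2(0.20)
print(f"f2 = {f2:.2f} ({f2_label(f2)})")
# f2 = 0.25 (medium)
```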

Association effect size: use Clark-Carter for chi-square, or the approximate values.

## Power analysis

Correlation: Clark-Carter has similar tables for correlation (Table A15.6) and association (Table A15.4). Example (correlation): if trying to detect r = 0.3 (a medium-size effect) with n = 50 (2-tailed), power = 0.56. For satisfactory power, you need either a larger sample (80-90) or a larger effect size (0.4).
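The correlation example above can be reproduced approximately with Fisher's z transformation, under which atanh(r) is roughly normal with standard error 1/sqrt(n - 3). This is a sketch using a normal approximation (the function name `correlation_power` is hypothetical), so values may differ slightly from published power tables.

```python
from math import atanh, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a population correlation r with n people."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = atanh(abs(r)) * sqrt(n - 3)   # Fisher z of r, in standard-error units
    # Probability that the test statistic lands beyond either critical value
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


for r, n in [(0.3, 50), (0.3, 85), (0.4, 50)]:
    print(f"r = {r}, n = {n}: power = {correlation_power(r, n):.2f}")
```

The first line should come out close to the 0.56 quoted above; the other two show the two remedies (a larger sample, or a larger effect) each bringing power to roughly 0.8.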

One-way ANOVA:
- Clark-Carter (2004): p.585 for effect size levels; Table A15.5 power tables; treatment df = number of conditions minus 1
- Cohen (1992): p.157 for effect size levels; p.158 for quick reference table

Multiple regression:
- Clark-Carter (2004): p.585 for effect size levels; Table A15.7 power tables
- Cohen (1992): p.157 for effect size levels; p.158 for quick reference table
- G*Power: test family 'F tests'; 'Linear multiple regression: Fixed model, R² deviation from zero'

## Null hypothesis significance testing

Confidence intervals are constructed at a confidence level, such as 95%, selected by the user. This means that if the same population is sampled on numerous occasions and an interval estimate is made on each occasion, the resulting intervals would bracket the true population parameter in approximately 95% of cases.

NHST: the majority of psychologists still use NHST, and many don't understand it! Historically, there were two different theoretical approaches:
- statistical testing, using p as an estimate of the strength of a finding (Fisher)
- deciding between H0 and H1 using a pre-determined α criterion (Neyman & Pearson)

We often operate in a weird hybrid of the two.
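The repeated-sampling meaning of a 95% confidence interval can be checked by simulation. This sketch assumes a normal population with a known SD (so a simple z-based interval can be used; real analyses usually estimate the SD and use t), and the population values (mean 100, SD 15) and sample size are invented.

```python
import random
from statistics import NormalDist, mean


def ci_coverage(n_samples=2000, n=25, mu=100.0, sigma=15.0, level=0.95, seed=42):
    """Share of confidence intervals (known sigma) that bracket the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_samples):
        sample_mean = mean([rng.gauss(mu, sigma) for _ in range(n)])
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / n_samples


print(f"{ci_coverage():.3f}")  # close to 0.95
```

Any single interval either contains the true mean or it does not; the 95% describes the long-run proportion of intervals that do.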

## What does the p value tell us?

The p value is the probability of obtaining an effect equal to or more extreme than the one observed, presuming the null hypothesis of no effect is true.

In other words, the p value tells us the probability of getting results at least as extreme as ours IF H0 is correct.

Many of us wrongly assume it is:
- the probability of H1 being wrong
- the likelihood of getting this result if H1 is correct
- simply a measure of the chance of getting this result.
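The definition of the p value as "the probability of a result at least this extreme, assuming H0 is true" can be made concrete with a permutation test. This technique is swapped in for illustration (it is not one of the parametric tests covered in these notes): if H0 is true, the group labels are arbitrary, so shuffling them shows how often a mean difference at least as large as the observed one arises by chance. The scores and the function name are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=10_000, seed=1):
    """Two-tailed p value: share of label shuffles giving a mean difference
    at least as extreme as the observed one (i.e. assuming H0 is true)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # under H0, labels are arbitrary
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:             # small tolerance for rounding error
            extreme += 1
    return extreme / n_perm


# Hypothetical scores for two small groups
treatment = [14, 16, 15, 18, 17]
control = [12, 13, 11, 14, 12]
print(f"p = {permutation_p_value(treatment, control):.3f}")
```

Note what the number is: how surprising the data would be if H0 were true. It is not the probability that H0 (or H1) is true.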

Try to stop thinking about p values as 'all or nothing':
- effect sizes are more useful, and should be considered in context
- p values can be useful, but use your judgement
- the journal Basic and Applied Social Psychology (BASP) has banned NHST: no p values, just ES
- the APA recommends reporting ES when reporting analyses.
