# Research Methods and Statistics


## Theories vs. Hypotheses

• Fact: a statement about a direct observation of nature that is so consistently repeated that virtually no doubt exists as to its truth.
• Theory: a collection of statements (propositions, hypotheses) that together attempt to explain a set of observed phenomena.
• Hypothesis: a clear but tentative explanation for an observed phenomenon.

## Theories

Integrated set of proposals that:

• Define
• Explain
• Organise
• Interrelate

Proposals that provide a model of how the observed phenomena 'work'.

Makes general predictions upon which specific hypotheses can be based.

Examples:

• Tiredness leads to poorer cognitive function.
• Lecturing improves student knowledge
• Schizophrenia is genetically determined
• Phonological skills underlie reading ability

## Hypotheses

Make specific predictions and must be:

• Falsifiable: can the hypothesis potentially be disproven?
• Testable: can a test be designed to adequately test the hypothesis?
• Precisely stated: are all terms clearly defined?
• Rational: is the hypothesis consistent with known information?
• Parsimonious: is the explanation the simplest possible?

## Generating Hypotheses

• Theory: Tiredness leads to poorer cognitive function
• Hypothesis: Students who have had less than 6 hours sleep the night before an exam will perform worse than those who have had 8 or more hours sleep the previous night.

## Hypothetico-Deductive Model

Start out with observation and intuition, which together form a theory

Generate hypotheses based on the theory

Conduct empirical tests to test the hypothesis

If the hypothesis is supported by the results, uphold theory as undefeated, with an estimate of confidence.

If the hypothesis is not supported, refine or abandon the theory.


## Constructs

Theoretical constructs formulated to serve as causal or descriptive explanations

• e.g. Psychosis: a mental state characterised by a "loss of contact with reality" (DSM IV)

Don't directly indicate a means by which they can be measured.


## Variables

Any characteristic that can assume multiple values (i.e. can vary)

• e.g. age, gender, body weight, alcohol consumption, occupation, test score, etc.

An event or condition the researcher observes or measures.

Variables must be operational

• i.e. explicitly stated in terms of how they are measured

## Constructs vs Variables

Constructs defined by theoretical definitions

• e.g. psychosis: a mental state characterised by a "loss of contact with reality"

Variables defined by operational definitions

• e.g. contact with reality "defined" by a score on a questionnaire.

## Scales of Measurement

Variables differ from one another in terms of their underlying properties

• Nominal (category membership)
• Ordinal (ranked or ordered)
• Interval (equal increments, but no real 0 point)
• Ratio (real 0 point)

## Nominal (Categorical) Data

• Category membership
• Numbers assigned serve as labels but do not indicate numerical relationship
• e.g. gender, political party, religion

## Ordinal Data

• Data can be ranked along a continuum
• Intervals between ranks are not equal
• e.g. race positions, attractiveness

## Interval Data

• Intervals between successive values are equal
• But no 'true' zero point
• e.g. temperature, shoe size

## Ratio Data

• Highest level of data
• Equal intervals and a true zero point
• e.g. height, distance

## Experimental Methods

A research design which allows us to make causal inferences about the influence of one or more variables on a variable of interest.

The researcher manipulates one or more variables and measures the effect on other variables.

e.g. effects of alcohol on memory function

• manipulated variable: amount of alcohol consumed
• measured variable: score on memory test

## Independent Variables

The variable that is manipulated and is hypothesised to bring about a change in the variable of interest

• aka the grouping variable

Independent variables each have at least two levels

E.g.

• Two levels: drug, no drug
• Four levels: drug, counselling, mentoring, group therapy

## Dependent Variable

The variable that is measured

• aka the outcome variable

We compare differences in the DV under the different levels of the IV

E.g.

• exam score
• score on a test of intelligence
• score on a test of mood
• reaction time

## Subjects Design

The assignment of participants to experimental conditions/levels of the IV

• Between-subjects/independent groups
• Within-subjects/repeated measures
• Mixed-designs

## Between-Subjects Design

Participants each exposed to one level of the IV

Example: effects of alcohol consumption on short term memory performance

• IV: alcohol consumption
• DV: memory performance
• Assign participants to one of two groups (alcohol or no alcohol)
• Measure each group's memory performance and compare

## Considerations

• How do we ensure that any differences in results are due to the two variables involved and not a third variable?
- e.g. age, experience, tiredness
• We can't eliminate the effects of these other variables.
• But we can minimise these effects by spreading their influence across the different levels of the IV(s).

Random Allocation

• Ensures that each participant is equally likely to be assigned to any IV level.
• Distributes the occurrence of potential moderating variables equally among experimental conditions.
• Prevents experimenters (un)intentionally biasing their results.
• Enables the use of powerful statistical tests that can help determine
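
A minimal sketch of random allocation, assuming participants are identified by number and the IV has two levels. The function name `randomly_allocate` and the `seed` argument are illustrative choices, not part of the original material.

```python
import random


def randomly_allocate(participants, levels, seed=None):
    """Shuffle participants, then deal them out to the IV levels in turn."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)  # every ordering is equally likely
    groups = {level: [] for level in levels}
    for i, person in enumerate(shuffled):
        # Dealing in turn keeps group sizes as equal as possible
        groups[levels[i % len(levels)]].append(person)
    return groups


groups = randomly_allocate(range(1, 21), ["alcohol", "no alcohol"], seed=1)
print({level: sorted(members) for level, members in groups.items()})
```

Because the shuffle decides who lands in which group, characteristics such as age or tiredness should, on average, be spread evenly across the levels.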

## Within-Subjects Design

Participants each exposed to all levels of the IV

Example: effects of alcohol consumption on short term memory performance

• IV: alcohol consumption
• DV: memory performance
• Participants now take part in both levels of the IV - test before alcohol and test after alcohol
• Measure each participant's performance before and after alcohol, and compare

## Considerations

• Potentially moderating characteristics are kept equal across the levels of the IV (each participant acts as their own control).
• Requires fewer participants than between-subjects design.
• Order effects - once participants have been exposed to one level of the IV there's no way to return them to their original state.

Counterbalancing

• Split the group of participants in half (A and B).
• Group A can participate in Level 1 then Level 2.
• Group B can participate in Level 2 then Level 1.
• Order effects will still influence participants' performance, but the effect of that influence will be evenly spread out across each level of the IV.
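
The counterbalancing steps above can be sketched as follows, assuming a two-level IV and an even number of participants. The function name `counterbalance` and the random split into halves are illustrative assumptions.

```python
import random


def counterbalance(participants, seed=None):
    """Split participants into two halves and give each half the opposite order."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # decide at random who ends up in Group A or B
    half = len(shuffled) // 2
    orders = {}
    for person in shuffled[:half]:
        orders[person] = ["Level 1", "Level 2"]  # Group A
    for person in shuffled[half:]:
        orders[person] = ["Level 2", "Level 1"]  # Group B
    return orders


orders = counterbalance(range(1, 21), seed=1)
starts_with_level_1 = sum(order[0] == "Level 1" for order in orders.values())
print(starts_with_level_1, "of", len(orders), "participants start with Level 1")
```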

## Factorial Designs

• Experimental designs with 2 or more IVs.
- What effect does IV (1) have on the DV?
- What effect does IV (2) have on the DV?
- What effect does the interaction of IV (1) and IV (2) have on the DV?
• Example: effects of alcohol consumption and work shift patterns on work productivity.
- DV: work output
- IV: shift pattern
- IV: alcohol consumption.
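
As a rough illustration of the three questions a factorial design asks, the sketch below crosses the two IVs and computes each main effect and the interaction from cell means. The work-output numbers are invented purely for illustration, and the helper name `marginal` is hypothetical.

```python
from itertools import product
from statistics import mean

shifts = ["dayshift", "nightshift"]
drinks = ["alcohol", "no alcohol"]

# A 2 x 2 design has one condition per combination of IV levels
conditions = list(product(shifts, drinks))

# Hypothetical mean work output per condition (invented for illustration only)
cell_means = dict(zip(conditions, [40, 50, 20, 45]))


def marginal(level, position):
    """Average the cell means over every condition that contains this level."""
    return mean(v for cond, v in cell_means.items() if cond[position] == level)


# Main effects: compare the levels of one IV, averaging over the other IV
shift_effect = marginal("dayshift", 0) - marginal("nightshift", 0)
alcohol_effect = marginal("no alcohol", 1) - marginal("alcohol", 1)

# Interaction: does the cost of alcohol differ between the two shifts?
cost_day = cell_means[("dayshift", "no alcohol")] - cell_means[("dayshift", "alcohol")]
cost_night = cell_means[("nightshift", "no alcohol")] - cell_means[("nightshift", "alcohol")]
interaction = cost_day - cost_night

print(shift_effect, alcohol_effect, interaction)
```

A non-zero interaction in this sketch means the effect of alcohol is not the same on both shifts; whether such a difference is reliable is a question for a statistical test, which this example does not perform.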

## Fully Independent Factorial Designs

Each participant takes part in just one experimental condition (one combination of IV levels).

Example: effects of alcohol consumption and work shift patterns on work productivity.

• Participants 1-20: dayshift, alcohol
• Participants 21 - 40: nightshift, alcohol
• Participants 41 - 60: dayshift, no alcohol
• Participants 61 - 80: nightshift, no alcohol

## Fully Repeated Measures Factorial Design

Each participant takes part in all experimental conditions.

Example: effects of alcohol consumption and work shift patterns on work productivity.

• Participants 1 - 20 take part in
- dayshift, alcohol
- nightshift, alcohol
- dayshift, no alcohol
- nightshift, no alcohol

## Factorial Mixed Designs

Always contain at least:

• 1 or more within-subjects IV(s)
• 1 or more between-subjects IV(s)

Each participant takes part in all levels of within-subjects IV(s) but just one level of between-subjects IV(s).

Example: effects of alcohol consumption and work shift patterns on work productivity.

• Participants 1-20 take part in both dayshift and nightshift conditions but only the alcohol condition.
• Participants 21-40 take part in both dayshift and nightshift conditions but only the no alcohol condition.

## Choice in Experimental Design

We are often able to choose which study design to employ:

• Independent (all IVs are between-subjects)
• Repeated measures (all IVs are within-subjects)
• Mixed (a mixture of between and within).

Choice depends on:

• Which effects need to be eliminated
- Between subjects - eliminate order effects
- Within subjects - eliminate individual difference effects
• Number of participants (availability)
- Within subjects requires fewer participants

## Between-Subjects Design without Random Allocation

• True-experimental designs: experimenter has complete control over the assignment of participants to experimental conditions (e.g. assign participants to groups that consume different amounts of alcohol)
• Quasi-experimental designs: the assignment of participants to experimental conditions is pre-determined (e.g. compare pre-existing alcohol consumption groups).
• Occasionally it's not possible to randomly assign participants to the levels of the IV, for example when looking at the short-term memory of alcoholics and controls.
• The assignment of participants to levels of the IV is based on fixed characteristics. This poses a serious problem as there are likely to be differences between the groups other than the variable of interest.
• These limitations mean that we have to be cautious about inferring causality on the basis of quasi-experimental designs.

## Matched Pairs

• For quasi-experimental between-subjects designs, where participants can't be randomly assigned to IV levels.
• Identify potentially moderating variables and match the groups on this basis.
• Even better than matching the groups on the basis of potentially moderating variables is a matched pairs design.
• Match (pair) individual participants on the basis of such variables.
• It's usually impossible to perfectly match participants in this way.
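
One way to picture matched pairs is the greedy sketch below, which pairs each member of one pre-existing group with the closest unused member of the other group on a single matching variable. The participant records, the choice of age as the matching variable, and the function name `match_pairs` are all hypothetical.

```python
def match_pairs(group_a, group_b, key):
    """Greedily pair each member of group_a with the closest unused member of group_b.

    Assumes group_b has at least as many members as group_a.
    """
    available = list(group_b)
    pairs = []
    for person in group_a:
        best = min(available, key=lambda other: abs(other[key] - person[key]))
        available.remove(best)  # each control can only be used once
        pairs.append((person, best))
    return pairs


# Hypothetical participants (invented for illustration only)
alcoholics = [{"id": "A1", "age": 34}, {"id": "A2", "age": 52}]
controls = [
    {"id": "C1", "age": 50},
    {"id": "C2", "age": 29},
    {"id": "C3", "age": 36},
]

pairs = match_pairs(alcoholics, controls, "age")
for a, c in pairs:
    print(a["id"], "matched with", c["id"], "age gap", abs(a["age"] - c["age"]))
```

The remaining age gaps in the output illustrate why perfect matching is rarely achievable, and a greedy pass like this one is not guaranteed to find the best overall set of pairs.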

## Within-Subjects Design without Counterbalancing

• Occasionally it's not possible to counterbalance the order in which participants are exposed to the levels of the IV, for example when examining the effectiveness of mnemonic training on memory performance.
• The order in which participants are exposed to levels of the IV is fixed. This poses a serious problem as there are likely to be differences between time 1 and time 2 other than the variable of interest.
• These limitations mean that we have to be cautious about inferring causality on the basis of within-subjects designs that don't allow for counterbalancing.

## Pretest Posttest Control Group Design

• For within-subjects designs where IV levels can't be counterbalanced.
• Split participants into 2 groups and manipulate the IV of interest in one group only.
• The inclusion of a control group allows us to account for any order effects that might be present.
• We can then statistically control for the difference in the treatment group accounted for by order effects.
• NB: this is a mixed design.
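
A simple arithmetic sketch of the logic: the control group's change from pretest to posttest estimates the order effect, and subtracting it from the treatment group's change leaves an estimate of the treatment effect. The scores are invented for illustration, and this difference of mean changes is only one simple way of making the adjustment.

```python
from statistics import mean

# Hypothetical memory scores (invented for illustration only)
treatment = {"pre": [10, 12, 11, 9], "post": [15, 16, 14, 13]}
control = {"pre": [11, 10, 12, 9], "post": [12, 12, 13, 10]}


def mean_change(group):
    """Average posttest-minus-pretest change across participants."""
    return mean(post - pre for pre, post in zip(group["pre"], group["post"]))


order_effect = mean_change(control)  # change with no manipulation
treatment_effect = mean_change(treatment) - order_effect

print(order_effect, treatment_effect)
```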

## Measurement Error

There are two broad categories of error associated with measurement:

• Random error - obscures the results (e.g. measuring the height of women with a tape measure and making small errors while reading it).
• Constant error - biases the results (e.g. you forget to ask the women to take their shoes off before you measure them).
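
The difference between the two kinds of error can be seen in a small simulation. The heights, the size of the reading errors and the 3 cm shoe height are all assumed values chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Simulated "true" heights in cm (assumed distribution, for illustration only)
true_heights = [rng.gauss(165, 6) for _ in range(2000)]

# Random error: small misreadings that are equally likely to be high or low
with_random_error = [h + rng.gauss(0, 1) for h in true_heights]

# Constant error: everyone is measured with shoes on (assumed to add 3 cm)
with_constant_error = [h + 3 for h in true_heights]

bias_random = mean(with_random_error) - mean(true_heights)
bias_constant = mean(with_constant_error) - mean(true_heights)

print(round(bias_random, 2), round(bias_constant, 2))
```

Random error adds noise to individual measurements but largely cancels out in the mean, whereas constant error shifts every measurement, and therefore the mean, in the same direction.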

## Extraneous Variables

Extraneous variables: undesirable variables that add error to experiments and to the measurement of the DV.

Aim of research design is to eliminate or at least control the influence of extraneous variables:

• Random allocation/counterbalancing
• Results in an even addition of error variance across levels of the IV.

## Confounding Variables

Confounding variables: extraneous variables that disproportionately affect one level of the IV more than the other levels. They add constant/systematic error at the level of the IV.

Confounding variables introduce a threat to the internal validity of our experiments. Random allocation/counterbalancing spreads the influence of extraneous variables (so that they do not become confounding variables).

Confounds can result in us measuring:

• an effect of the IV on the DV when it is not present
• no effect of the IV on the DV when it is present

As researchers we ideally want to eliminate these variables; where this is not possible we aim to control for these variables. At the very least we must acknowledge these variables.


## Threats to Internal Validity

There are many sources of confounding variables.

These can be categorised as arising due to:

• Selection
• History
• Maturation
• Instrumentation

## Selection

• Bias resulting from the selection or assignment of participants to different levels of the IV.
• Results if participants who are assigned to different levels of the IV differ systematically in some way that could influence the measurement of the DV (other than the manipulation of interest).
• Particular problem for quasi-experimental designs.

## Threats to Internal Validity

History

Uncontrolled events that take place between testing occasions.

Maturation

Intrinsic changes in the characteristics of participants between test occasions.

Instrumentation

Changes in the sensitivity or reliability of measurement instruments during the course of the study.


## Validity - Reactivity

Reactivity: awareness that they are being observed may alter participants' behaviour. Can threaten internal validity if participants are more influenced by reactivity at one level of the IV than the other.

Resulting artefacts can be:

• Subject related - demand characteristics
• Experimenter related - experimenter bias

Counteracting reactivity

• Blind procedures - single or double.

## Causation: Necessity and Sufficiency

When can we say that X caused Y?

Need to satisfy necessary and sufficient criteria in order to make true claims about causality.

• Sufficient - X is adequate to cause Y.
• Necessary - X must be present for Y to occur.

## Necessary not Sufficient

• To be good at psychology you need to be good at research methods (RM is necessary to make you good at psychology).
• But to be good at psychology, you also need to be good at other subjects in psychology, e.g. cognitive, developmental, perception, etc. (RM is not sufficient to make you good at psychology).

## Sufficient not Necessary

• Completing and passing an undergraduate degree in psychology at one university will get you a BSc: the degree is sufficient in order to obtain a BSc.
• There are other universities and other courses that upon completion award a BSc, therefore studying and completing undergraduate psychology at one university is not necessary to obtain a BSc.

## Necessary and Sufficient

• To obtain full marks on the final RM exam, it is necessary to answer every question correctly.
• To obtain full marks on the final RM exam, it is sufficient to answer every question correctly.

## True Causation

True causation can only be established when necessity and sufficiency criteria are satisfied.

• The manipulation of the IV, in the absence of all other factors, will always result in the DV change (sufficient).
• The DV change will not be measured in the absence of the IV manipulation, i.e. in response to other factors (necessary).
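
The two criteria can be written as simple logical checks over a set of observations. This is only a sketch: the observation records are hypothetical, and the function names `is_sufficient` and `is_necessary` are illustrative.

```python
def is_sufficient(observations):
    """Sufficient: every case with the IV manipulation also shows the DV change."""
    return all(dv_changed for iv_present, dv_changed in observations if iv_present)


def is_necessary(observations):
    """Necessary: every case with the DV change also had the IV manipulation."""
    return all(iv_present for iv_present, dv_changed in observations if dv_changed)


# Each record is (IV manipulation present?, DV change observed?)
# Hypothetical observations, invented for illustration only
observations = [
    (True, True),
    (True, True),
    (False, False),
    (False, True),  # the DV changed without the manipulation
]

print(is_sufficient(observations), is_necessary(observations))
```

In this invented data set the manipulation is sufficient but not necessary, because the last record shows the DV changing in response to something else.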

However, human behaviour is very complex and it's usually impossible to control for all other factors because it's usually impossible to identify them.

Multifactorial causation:

• Phenomenon is determined by many interacting factors.

## How Good are my Measures?

The data gathered to test a hypothesis are only as good as the measures that were used to obtain them.

As psychologists, we need to consider:

• Precision and accuracy
• Reliability and validity

## Reliability and Validity

Reliability: precision (consistency)

• The extent to which our measure would provide the same results under the same conditions.

Validity: accuracy (truthfulness)

• The extent to which it is measuring the construct we are interested in.

## Test-Retest Reliability

Measures fluctuations from one time to another

• If we administered our measure to the same participants on separate occasions, would we obtain the same results?
• Important for constructs which we expect to be stable (e.g. personality type)
• Beware of order effects
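
Test-retest reliability is commonly summarised as the correlation between scores from the two occasions. The sketch below computes a Pearson correlation by hand; the scores are invented, and using Pearson's r here is an assumption rather than something the original material specifies.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical scores for the same five participants on two occasions
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

r = pearson_r(time_1, time_2)
print(round(r, 2))
```

A value close to 1 indicates that participants kept roughly the same rank and spacing across the two occasions.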

## Inter-Rater Reliability

Measures fluctuations between observers

• If two different raters/observers measured the variable of interest, would they obtain the same results?
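
One very simple index of agreement between two observers is the proportion of cases they code identically. The sketch below assumes categorical ratings; the ratings and the function name `percent_agreement` are hypothetical, and this index does not correct for agreement expected by chance.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must rate the same cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical codings of ten observed behaviours by two raters
rater_a = ["on", "off", "on", "on", "off", "on", "on", "off", "on", "on"]
rater_b = ["on", "off", "on", "off", "off", "on", "on", "on", "on", "on"]

agreement = percent_agreement(rater_a, rater_b)
print(agreement)
```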

## Parallel Forms Reliability

• If we administer different versions of our measure to the same participants, would we obtain the same results?
• Beware of order effects

## Internal Consistency

Determines whether all items (e.g. in a questionnaire) are measuring the same construct.

Can be assessed in a number of ways. One example is:

• Split-half reliability: questionnaire items split into two groups and the halves are administered to participants on separate occasions.
• Beware of order effects!

## Content Validity

Does our test measure the construct fully?

• e.g. a RM exam should cover knowledge of quantitative and qualitative methods.

## Face Validity

Does it look like a good test?

• e.g. do the questions in the RM exam reflect the RM knowledge students should have learnt?

## Criterion Validity

Does the measure give results which are in agreement with other measures of the same thing?

• e.g. do RM exam scores correlate with grades on related assessments?

Concurrent: comparison of new test with established test

Predictive: does the test predict outcome on another variable?


## Construct Validity

Is the construct we are trying to measure valid?

• i.e. does the construct itself exist?

The validity of a construct is supported by cumulative research evidence collected over time.

• Together, supporting the existence of the construct itself.

In the short term construct validity can be assessed in terms of:

• Convergent validity: correlates with tests of related constructs.
• Discriminant validity: doesn't correlate with tests of different constructs.

## Samples vs. Populations

Most psychology students will obtain a sample and then try to generalise to the population.

• What is the population of interest?
• Is the sample representative?
• Is the sample free from bias?

## Populations

The entire collection of people, animals, plants or objects that we are interested in, sharing a common characteristic.

Defined by population parameters, i.e. measurements which describe the population.

Vary in size, e.g. all students, all university students, all university psychology students.


## Samples

A selection of individuals from the larger population.

• For any population there are many possible samples
• Vary in size, e.g. 100 students, 50 students, 20 students.

Defined by sample statistics, i.e. measurements which describe the sample.

Sample statistics are used to infer population parameters.


## Why do we Sample?

Number of reasons

• Time (difficult to collect data from everyone quickly, though not impossible, e.g. vote, census)
• Money (expensive to collect data from everyone, i.e. production costs, payment to participants)
• Access (not always possible to reach all members of a population)
• Sufficiency (the pattern of results doesn't change much even if we have data from everyone)

## Random Sample

The gold standard

Each member of the population has an equal chance of being selected

Usually quasi-random


## Systematic Sampling

Draw from the population at fixed intervals

Problematic in populations with a periodic function.
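
The sketch below shows why a periodic population is a problem for systematic sampling. The population is hypothetical: a street listing in which every fourth house is a corner house.

```python
def systematic_sample(population, interval, start=0):
    """Take every `interval`-th member of an ordered population."""
    return population[start::interval]


# Hypothetical periodic population: every 4th house on the list is a corner house
houses = ["corner" if i % 4 == 0 else "terrace" for i in range(40)]

# The sampling interval matches the period, so only corner houses are drawn
biased_sample = systematic_sample(houses, interval=4)

# An interval that does not line up with the period picks up both types
mixed_sample = systematic_sample(houses, interval=3)

print(set(biased_sample), set(mixed_sample))
```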


## Stratified Sample

Proportional

Specified groups appear in numbers proportional to their size in the population

Disproportional

Specified groups that are not equally represented in the population are selected in equal proportions.
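
A minimal sketch contrasting the two kinds of stratified sample, assuming a hypothetical student population of 800 undergraduates and 200 postgraduates. The function name `stratified_sample` and its arguments are illustrative.

```python
import random


def stratified_sample(strata, total, proportional=True, seed=None):
    """Draw a random sample from each stratum.

    proportional=True  -> stratum sample sizes mirror the population shares
    proportional=False -> every stratum contributes the same number
    Rounding means the sizes may not always sum exactly to `total`.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        if proportional:
            n = round(total * len(members) / population_size)
        else:
            n = total // len(strata)
        sample[name] = rng.sample(members, n)
    return sample


# Hypothetical population: participant numbers grouped by level of study
strata = {
    "undergraduate": list(range(800)),
    "postgraduate": list(range(800, 1000)),
}

proportional = stratified_sample(strata, total=50, seed=1)
disproportional = stratified_sample(strata, total=50, proportional=False, seed=1)

print({k: len(v) for k, v in proportional.items()})
print({k: len(v) for k, v in disproportional.items()})
```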


## Cluster Sample

Researcher samples an entire group or cluster from the population of interest.


## Opportunity/Convenience Sample

People who are easily available

But can lead to a biased sample


## Snowball Sampling

Recruit a small number of participants and then use those initial contacts to recruit further participants.

Biases the sample, but useful if you want to recruit very specific populations.


## External Validity

We gather data from samples in order to infer population parameters.

External validity refers to the ability to generalise our results.

• Population validity: is our sample representative?
• Ecological validity: does the behaviour measured reflect naturally occurring behaviour?

## Sample Size

Size matters!

Sampling error can result if your sample is not large enough

Trade off between size and time/cost

Factors in deciding on sample size:

• Analysis (subjects design, number of IVs or IV levels)
• Response rate
• Heterogeneity of population
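
The link between sample size and sampling error can be illustrated with a small simulation. The population (test scores with mean 100 and standard deviation 15), the sample sizes and the number of repeated draws are all assumptions made for this sketch.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated population of test scores (assumed values, for illustration only)
population = [rng.gauss(100, 15) for _ in range(10000)]


def spread_of_sample_means(n, draws=500):
    """Draw many random samples of size n and measure how much their means vary."""
    means = [mean(rng.sample(population, n)) for _ in range(draws)]
    return pstdev(means)


small = spread_of_sample_means(10)
large = spread_of_sample_means(100)

print(round(small, 2), round(large, 2))
```

The means of the larger samples cluster more tightly around the population mean, which is what a smaller sampling error looks like in practice.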