Research Methods and Statistics

Theories vs. Hypotheses

Fact: a statement about a direct observation of nature that is so consistently repeated that virtually no doubt exists as to its truth.
Theory: a collection of statements (propositions, hypotheses) that together attempt to explain a set of observed phenomena.
Hypothesis: a clear but tentative explanation for an observed phenomenon.

1 of 65

Theories

Integrated set of proposals that:

Define
Explain
Organise
Interrelate

Proposals that provide a model of how the observed phenomena 'work'.

Makes general predictions upon which specific hypotheses can be based.

Examples:

Tiredness leads to poorer cognitive function.
Lecturing improves student knowledge
Schizophrenia is genetically determined
Phonological skills underlie reading ability

2 of 65

Hypotheses

Make specific predictions and must be:

Falsifiable: can the hypothesis potentially be disproven?
Testable: can a test be designed to adequately test the hypothesis?
Precisely stated: are all terms clearly defined?
Rational: is the hypothesis consistent with known information?
Parsimonious: is the explanation the simplest possible?

3 of 65

Generating Hypotheses

Theory: Tiredness leads to poorer cognitive function
Hypothesis: Students who have had less than 6 hours sleep the night before an exam will perform worse than those who have had 8 or more hours sleep the previous night.

4 of 65

Hypothetico-Deductive Model

Start out with observation and intuition, which make a theory

Generate hypotheses based on the theory

Conduct empirical tests to test the hypothesis

If the hypothesis is supported by the results, uphold theory as undefeated, with an estimate of confidence.

If the hypothesis is not supported, refine or abandon the theory.

5 of 65

Constructs

Theoretical constructs formulated to serve as causal or descriptive explanations

e.g. Psychosis: a mental state characterised by a "loss of contact with reality" (DSM IV)

Don't directly indicate a means by which they can be measured.

6 of 65

Variables

Any characteristic that can assume multiple values (i.e. can vary)

e.g. age, gender, body weight, alcohol consumption, occupation, test score, etc.

An event or condition the researcher observes or measures.

Variables must be operational

i.e. explicitly stated

7 of 65

Constructs vs Variables

Constructs defined by theoretical definitions

e.g. psychosis: a mental state characterised by a "loss of contact with reality"

Variables defined by operational definitions

e.g. contact with reality "defined" by a score on a questionnaire.

8 of 65

Scales of Measurement

Variables differ from one another in terms of their underlying properties

Nominal (category membership)
Ordinal (ranked or ordered)
Interval (equal increments, but no real 0 point)
Ratio (real 0 point)

9 of 65

Nominal (Categorical) Data

Category membership
Numbers assigned serve as labels but do not indicate numerical relationship
e.g. gender, political party, religion

10 of 65

Ordinal Data

Data can be ranked along a continuum
Intervals between ranks are not equal
e.g. race positions, attractiveness

11 of 65

Interval Data

Intervals between successive values are equal
But no 'true' zero point
e.g. temperature in degrees Celsius, shoe size
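A minimal sketch of why the missing true zero matters, assuming two made-up temperatures and a hypothetical helper celsius_to_kelvin. Differences survive a change of unit, but ratios only make sense once the scale has a true zero (Kelvin).

```python
# Hypothetical temperatures, chosen only for illustration.
warm_c, cool_c = 20.0, 10.0


def celsius_to_kelvin(celsius):
    """Convert Celsius (no true zero) to Kelvin (true zero)."""
    return celsius + 273.15


warm_k = celsius_to_kelvin(warm_c)
cool_k = celsius_to_kelvin(cool_c)

# Differences are meaningful on an interval scale:
# the gap is the same size whichever of the two units is used.
difference_c = warm_c - cool_c
difference_k = warm_k - cool_k

# Ratios are not: "twice as hot" in Celsius disappears
# once the same temperatures are expressed on a scale with a true zero.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k
```

Here ratio_c comes out as 2, while ratio_k is only slightly above 1, so the "twice as hot" reading is an artefact of where the Celsius zero happens to sit.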

12 of 65

Ratio Data

Highest level of data
Equal intervals and a true zero point
e.g. height, distance

13 of 65

Experimental Methods

A research design which allows us to make causal inferences about the influence of one or more variables on a variable of interest.

The researcher manipulates one or more variables and measures the effect on other variables.

e.g. effects of alcohol on memory function

manipulated variable: amount of alcohol consumed
measured variable: score on memory test

14 of 65

Independent Variables

The variable that is manipulated and is hypothesised to bring about a change in the variable of interest

aka the grouping variable

Independent variables each have at least two levels

E.g.

Two levels: drug, no drug
Four levels: drug, counselling, mentoring, group therapy

15 of 65

Dependent Variable

The variable that is measured

aka the outcome variable

We compare differences in the DV under the different levels of the IV

E.g.

exam score
score on a test of intelligence
score on a test of mood
reaction time

16 of 65

Subjects Design

The assignment of participants to experimental condition/levels of the IV

Between-subjects/independent groups
Within-subjects/repeated measures
Mixed-designs

17 of 65

Between-Subjects Design

Participants each exposed to one level of the IV

Example: effects of alcohol consumption on short term memory performance

IV: alcohol consumption
DV: memory performance
Assign participants to one of two groups (alcohol or no alcohol)
Administer alcohol accordingly
Measure each group's memory performance and compare

18 of 65

Considerations

How do we ensure that any differences in results are due to the two variables involved and not a third variable?
- e.g. age, experience, tiredness
We can't eliminate the effects of these other variables.
But we can minimise these effects by spreading their influence across the different levels of the IV(s).

Random Allocation

Ensures that each participant is equally likely to be assigned to any IV level.
Distributes the occurrence of potential moderating variables equally among experimental conditions.
Prevents experimenters (un)intentionally biasing their results.
Enables the use of powerful statistical tests that can help determine
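Random allocation could be sketched as follows. This is only an illustration: the function name randomly_allocate, the participant numbers and the seed are assumptions, not part of the notes.

```python
import random


def randomly_allocate(participants, levels, seed=None):
    """Shuffle participants, then deal them out to the IV levels in turn."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for index, participant in enumerate(shuffled):
        # Dealing in rotation keeps the group sizes as equal as possible.
        groups[levels[index % len(levels)]].append(participant)
    return groups


# Hypothetical study: 20 participants, two levels of the IV.
participants = list(range(1, 21))
groups = randomly_allocate(participants, ["alcohol", "no alcohol"], seed=1)
```

Because the shuffle, not the experimenter, decides who goes where, every participant is equally likely to end up at either level.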

19 of 65

Within-Subjects Design

Participants each exposed to all levels of the IV

Example: effects of alcohol consumption on short term memory performance

IV: alcohol consumption
DV: memory performance
Participants now take part in both levels of the IV - test before alcohol and test after alcohol
Measure each participant's performance before and after alcohol, and compare

20 of 65

Considerations

Potentially moderating characteristics are kept equal across the levels of the IV (each participant acts as their own control).
Requires fewer participants than between-subjects design.
Order effects - once participants have been exposed to one level of the IV there's no way to return them to their original state.

Counterbalancing

Split the group of participants in half (A and B).
Group A can participate in Level 1 then Level 2.
Group B can participate in Level 2 then Level 1.
Order effects will still influence participants' performance, but the effect of that influence will be evenly spread out across each level of the IV.
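A minimal sketch of counterbalancing with two levels. The function name counterbalance is hypothetical, and splitting the participants at random (rather than, say, by sign-up order) is an assumption added here.

```python
import random


def counterbalance(participants, seed=None):
    """Give half the participants each order of the two IV levels."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # assumption: the halves are formed at random
    half = len(shuffled) // 2
    orders = {}
    for participant in shuffled[:half]:
        orders[participant] = ("Level 1", "Level 2")  # Group A
    for participant in shuffled[half:]:
        orders[participant] = ("Level 2", "Level 1")  # Group B
    return orders


# Hypothetical study with 20 participants.
orders = counterbalance(range(1, 21), seed=1)
level_1_first = sum(order[0] == "Level 1" for order in orders.values())
level_2_first = sum(order[0] == "Level 2" for order in orders.values())
```

Each level is met first by the same number of participants, so any practice or fatigue effect falls equally on both levels.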

21 of 65

Factorial Designs

Experimental designs with 2 or more IVs.
Allows us to ask:
- What effect does IV (1) have on the DV?
- What effect does IV (2) have on the DV?
- What effect does the interaction of IV (1) and IV (2) have on the DV?

Example: effects of alcohol consumption and work shift patterns on work productivity.
- DV: work output
- IV: shift pattern
- IV: alcohol consumption.
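The three questions can be illustrated with a 2 x 2 table of cell means. The numbers below are made up purely for illustration, and the contrasts are descriptive only; deciding whether they are reliable would need an inferential test such as a factorial ANOVA.

```python
# Hypothetical mean work output per condition (made-up numbers).
cell_means = {
    ("dayshift", "no alcohol"): 50.0,
    ("dayshift", "alcohol"): 40.0,
    ("nightshift", "no alcohol"): 45.0,
    ("nightshift", "alcohol"): 25.0,
}


def marginal_mean(factor_index, level):
    """Average the cell means for one level of an IV over the other IV."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


# Main effect of shift pattern, averaged over alcohol.
shift_effect = marginal_mean(0, "dayshift") - marginal_mean(0, "nightshift")

# Main effect of alcohol, averaged over shift pattern.
alcohol_effect = marginal_mean(1, "no alcohol") - marginal_mean(1, "alcohol")

# Interaction: is the alcohol effect different on the two shifts?
alcohol_effect_day = (
    cell_means[("dayshift", "no alcohol")] - cell_means[("dayshift", "alcohol")]
)
alcohol_effect_night = (
    cell_means[("nightshift", "no alcohol")] - cell_means[("nightshift", "alcohol")]
)
interaction = alcohol_effect_night - alcohol_effect_day
```

With these invented numbers alcohol lowers output on both shifts, but by more on the nightshift, which is what a non-zero interaction contrast expresses.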

22 of 65

Fully Independent Factorial Designs

Each participant takes part in just one experimental condition (one combination of the levels of each IV).

Example: effects of alcohol consumption and work shift patterns on work productivity.

Participants 1-20: dayshift, alcohol
Participants 21 - 40: nightshift, alcohol
Participants 41 - 60: dayshift, no alcohol
Participants 61 - 80: nightshift, no alcohol
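The allocation above can be generated by crossing the levels of the two IVs. This is a sketch only: the variable names are illustrative, and the sequential numbering simply mirrors the example, whereas in practice participants would be randomly allocated to the four conditions.

```python
from itertools import product

alcohol_levels = ["alcohol", "no alcohol"]
shift_levels = ["dayshift", "nightshift"]
group_size = 20

# Every combination of one level from each IV is one experimental condition.
conditions = [
    (shift, alcohol) for alcohol, shift in product(alcohol_levels, shift_levels)
]

# Give each condition its own block of 20 participant numbers.
allocation = {}
for block, condition in enumerate(conditions):
    first = block * group_size + 1
    for participant in range(first, first + group_size):
        allocation[participant] = condition
```

Two IVs with two levels each give 2 x 2 = 4 conditions, so a fully independent design with 20 participants per condition needs 80 participants.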

23 of 65

Fully Repeated Measures Factorial Design

Each participant takes part in all experimental conditions.

Example: effects of alcohol consumption and work shift patterns on work productivity.

Participants 1 - 20 take part in
- dayshift, alcohol
- nightshift, alcohol
- dayshift, no alcohol
- nightshift, no alcohol

24 of 65

Factorial Mixed Designs

Always contain at least:

1 or more within-subjects IV(s)
1 or more between-subjects IV(s)

Each participant takes part in all levels of within-subjects IV(s) but just one level of between-subjects IV(s).

Example: effects of alcohol consumption and work shift patterns on work productivity.

Participants 1-20 take part in both dayshift and nightshift conditions but only the alcohol condition.
Participants 21-40 take part in both dayshift and nightshift conditions but only the no alcohol condition.

26 of 65

Choice in Experimental Design

We are often able to choose which study design to employ:

Independent (all IVs are between-subjects)
Repeated measures (all IVs are within-subjects)
Mixed (a mixture of between and within).

Choice depends on:

Concerns about potential problems
- Between subjects - eliminate order effects
- Within subjects - eliminate individual difference effects
Number of participants (availability)
- Within subjects requires fewer participants

27 of 65

Between-Subjects Design without Random Allocation

True-experimental designs: experimenter has complete control over the assignment of participants to experimental conditions (e.g. assign participants to groups that consume different amounts of alcohol)
Quasi-experimental designs: the assignment of participants to experimental conditions is pre-determined (e.g. compare pre-existing alcohol consumption groups).
Occasionally it's not possible to randomly assign participants to the levels of the IV, for example looking at the short-term memory of alcoholics and controls.
The assignment of participants to levels of the IV is based on fixed characteristics. This poses a serious problem as there are likely to be differences between the groups other than the variable of interest.
These limitations mean that we have to be cautious about inferring causality on the basis of quasi-experimental designs.

28 of 65

Matched Pairs

For quasi-experimental between-subjects designs, where participants can't be randomly assigned to IV levels.
Identify potentially moderating variables and match the groups on this basis.

Even better than matching the groups on the basis of potentially moderating variables is a matched pairs design.
Match (pair) individual participants on the basis of such variables.
It's usually impossible to perfectly match participants in this way.

29 of 65

Within-Subjects Design without Counterbalancing

Occasionally it's not possible to counterbalance the order in which participants are exposed to the levels of the IV, for example examining the effectiveness of mnemonic training on memory performance.
The order in which participants are exposed to levels of the IV is fixed. This poses a serious problem as there are likely to be differences between time 1 and time 2 other than the variable of interest.
These limitations mean that we have to be cautious about inferring causality on the basis of within-subjects designs that don't allow for counterbalancing.

30 of 65

Pretest Posttest Control Group Design

For within-subjects designs where IV levels can't be counterbalanced.
Split participants into 2 groups and manipulate the IV of interest in one group only.
The inclusion of a control group allows us to account for any order effects that might be present.
We can then statistically control for the difference in the treatment group accounted for by order effects.
NB: this is a mixed design.
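One simple way to account for order effects in this design is to subtract the control group's gain from the treatment group's gain. The scores below are invented, and the gain-score comparison is just one possible analysis, assumed here for illustration; the notes do not specify which statistical control is used.

```python
def mean(values):
    return sum(values) / len(values)


def mean_gain(pre, post):
    """Average change from pretest to posttest within one group."""
    return mean([after - before for before, after in zip(pre, post)])


# Hypothetical memory scores (made-up numbers).
treatment_pre = [10, 12, 11, 9]
treatment_post = [15, 16, 15, 14]  # tested again after mnemonic training
control_pre = [10, 11, 12, 9]
control_post = [12, 12, 14, 10]    # tested again with no training

treatment_gain = mean_gain(treatment_pre, treatment_post)
control_gain = mean_gain(control_pre, control_post)

# The control gain estimates the order effect (e.g. practice);
# what remains is attributed to the manipulation.
adjusted_effect = treatment_gain - control_gain
```

Group (treatment vs control) is the between-subjects IV and test occasion (pretest vs posttest) is the within-subjects IV, which is why the design counts as mixed.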

31 of 65

Measurement Error

There are two broad categories of error associated with measurement:

Random error - obscures the results (e.g. measuring the height of women with a tape measure and making small errors while reading it).
Constant error - biases the results (e.g. you forget to ask the women to take their shoes off before you measure them).
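The difference between the two kinds of error can be simulated. All values here (the true height, the size of the reading errors, the 3 cm added by shoes, the seed) are assumptions chosen for illustration.

```python
import random

rng = random.Random(42)
true_height_cm = 165.0   # hypothetical true height
shoe_height_cm = 3.0     # hypothetical constant error from shoes
n_measurements = 10_000

# Random error: small reading mistakes that are as likely to be high as low.
random_error_readings = [
    true_height_cm + rng.gauss(0, 0.5) for _ in range(n_measurements)
]

# Constant error: the same readings, but every one is inflated by the shoes.
constant_error_readings = [
    reading + shoe_height_cm for reading in random_error_readings
]


def mean(values):
    return sum(values) / len(values)


# Random error largely cancels out in the average...
random_bias = mean(random_error_readings) - true_height_cm
# ...whereas constant error shifts the average by the full amount.
constant_bias = mean(constant_error_readings) - true_height_cm
```

Taking more measurements shrinks random_bias towards zero but leaves constant_bias at roughly 3 cm, which is why constant error is the more dangerous of the two.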

32 of 65

Extraneous Variables

Extraneous variables: undesirable variables that add error to the measurement of the DV.

Aim of research design is to eliminate or at least control the influence of extraneous variables:

Random allocation/counterbalancing
Results in an even addition of error variance across levels of the IV.

33 of 65

Confounding Variables

Confounding variables: extraneous variables that disproportionately affect one level of the IV more than the other levels. They add constant/systematic error at the level of the IV.

Confounding variables introduce a threat to the internal validity of our experiments. Random allocation/counterbalancing spreads the influence of extraneous variables (so that they do not become confounding variables).

Confounds can result in us measuring:

an effect of the IV on the DV when it is not present
no effect of the IV on the DV when it is present

As researchers we ideally want to eliminate these variables; where this is not possible we aim to control for these variables. At the very least we must acknowledge these variables.

34 of 65

Threats to Internal Validity

There are many sources of confounding variables.

These can be categorised as arising due to:

Selection
History
Maturation
Instrumentation

35 of 65

Selection

Bias resulting from the selection or assignment of participants to different levels of the IV.
Results if participants who are assigned to different levels of the IV differ systematically in some way that could influence the measurement of the DV (other than the manipulation of interest).
Particular problem for quasi-experimental designs.

36 of 65

Threats to Internal Validity

History

Uncontrolled events that take place between testing occasions.

Maturation

Intrinsic changes in the characteristics of participants between test occasions.

Instrumentation

Changes in the sensitivity or reliability of measurement instruments during the course of the study.

37 of 65

Validity - Reactivity

Reactivity: awareness that they are being observed may alter participants' behaviour. Can threaten internal validity if participants are more influenced by reactivity at one level of the IV than the other.

Resulting artefacts can be:

Subject related - demand characteristics
Experimenter related - experimenter bias

Counteracting reactivity

Blind procedures - single or double.

38 of 65

Causation: Necessity and Sufficiency

When can we say that X caused Y?

Need to satisfy necessary and sufficient criteria in order to make true claims about causality.

Sufficient - X is adequate to cause Y.
Necessary - X must be present to cause Y.
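The two criteria can be written as checks over a set of observed cases. This is a sketch with hypothetical function names and invented observations; passing the checks only shows that the criteria hold for the cases observed, not in general.

```python
def is_sufficient(cases):
    """True if the effect occurs in every case where the cause is present."""
    return all(effect for cause, effect in cases if cause)


def is_necessary(cases):
    """True if the cause is present in every case where the effect occurs."""
    return all(cause for cause, effect in cases if effect)


# Each case is (cause_present, effect_present); hypothetical observations.
necessary_only = [(True, True), (True, False), (False, False)]
sufficient_only = [(True, True), (False, True), (False, False)]
necessary_and_sufficient = [(True, True), (False, False)]
```

In necessary_only the effect never appears without the cause, but the cause sometimes fails to produce it. In sufficient_only the cause always produces the effect, but the effect also appears without it.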

39 of 65

Necessary not Sufficient

To be good at psychology you need to be good at research methods (RM is necessary to make you good at psychology).
But to be good at psychology, you also need to be good at other subjects in psychology, e.g. cognitive, developmental, perception, etc. (RM is not sufficient to make you good at psychology).

40 of 65

Sufficient not Necessary

Completing and passing an undergraduate degree in psychology at one university will get you a BSc: the degree is sufficient in order to obtain a BSc.
There are other universities and other courses that upon completion award a BSc, therefore studying and completing undergraduate psychology at one university is not necessary to obtain a BSc.

41 of 65

Necessary and Sufficient

To obtain full marks on the final RM exam, it is necessary to answer every question correctly.
To obtain full marks on the final RM exam, it is sufficient to answer every question correctly.

42 of 65

True Causation

True causation can only be established when necessity and sufficiency criteria are satisfied.

The manipulation of the IV, in the absence of all other factors, will always result in the DV change (sufficient).
The DV change will not be measured in the absence of the IV manipulation, i.e. in response to other factors (necessary).

However, human behaviour is very complex and it's usually impossible to control for all other factors because it's usually impossible to identify them.

Multifactorial causation:

Phenomenon is determined by many interacting factors.

43 of 65

How Good are my Measures?

The data gathered to test a hypothesis are only as good as the measures that were used to obtain them.

As psychologists, we need to consider:

Precision and accuracy
Reliability and validity

44 of 65

Reliability and Validity

Reliability: precision (consistency)

The extent to which our measure would provide the same results under the same conditions.

Validity: accuracy (truthfulness)

The extent to which it is measuring the construct we are interested in.

45 of 65

Test-Retest Reliability

Measures fluctuations from one time to another

If we administered our measure to the same participants on separate occasions, would we obtain the same results?
Important for constructs which we expect to be stable (e.g. personality type)
Beware of order effects
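Test-retest reliability is commonly summarised by correlating the scores from the two occasions. The sketch below uses a hand-written Pearson correlation and invented scores; the choice of Pearson's r is an assumption, as the notes do not name a statistic.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical scores for the same five participants on two occasions.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

test_retest_r = pearson_r(time_1, time_2)
```

A value close to 1 means participants keep roughly the same rank order across occasions, which is what we expect for a stable construct.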

46 of 65

Inter-Rater Reliability

Measures fluctuations between observers

If two different raters/observers measured the variable of interest, would they obtain the same results?
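Agreement between two observers can be quantified in more than one way. The sketch below shows simple percentage agreement and Cohen's kappa, which corrects for agreement expected by chance. Kappa is not mentioned in the notes and is introduced here as an assumption; the ratings are invented.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Proportion of observations given the same category by both raters."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for chance.

    Undefined (division by zero) if chance agreement is 1, e.g. when
    both raters use a single category throughout.
    """
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codings of four observed behaviours.
rater_1 = ["aggressive", "aggressive", "calm", "calm"]
rater_2 = ["aggressive", "calm", "calm", "calm"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on three of four observations, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement.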

47 of 65

Parallel Forms Reliability

If we administer different versions of our measure to the same participants, would we obtain the same results?
Beware of order effects

48 of 65

Internal Consistency

Determines whether all items (e.g. in a questionnaire) are measuring the same construct.

Can be assessed in a number of ways. One example is:

Split-half reliability: questionnaire items split into two groups and the halves are administered to participants on separate occasions.
Beware of order effects!

49 of 65

Content Validity

Does our test measure the construct fully?

e.g. a RM exam should cover knowledge of quantitative and qualitative methods.

50 of 65

Face Validity

Does it look like a good test?

e.g. do the questions in the RM exam reflect the RM knowledge students should have learnt?

51 of 65

Criterion Validity

Does the measure give results which are in agreement with other measures of the same thing?

e.g. do RM exam scores relate to grades on related courses?

Concurrent: comparison of new test with established test

Predictive: does the test predict outcome on another variable?

52 of 65

Construct Validity

Is the construct we are trying to measure valid?

i.e. does the construct itself exist?

The validity of a construct is supported by cumulative research evidence collected over time.

Together, supporting the existence of the construct itself.

In the short term construct validity can be assessed in terms of:

Convergent validity: correlates with tests of related constructs.
Discriminant validity: doesn't correlate with tests of different constructs.

53 of 65

Samples vs. Populations

Most psychology students will obtain a sample and then try to generalise to the population.

What is the population of interest?
Is the sample representative?
Is the sample free from bias?

54 of 65

Populations

The entire collection of people, animals, plants or objects that we are interested in, sharing a common characteristic.

Defined by population parameters, i.e. measurements which describe the population.

Vary in size, e.g. all students, all university students, all university psychology students.

55 of 65

Samples

A selection of individuals from the larger population.

For any population there are many possible samples
Vary in size, e.g. 100 students, 50 students, 20 students.

Defined by sample statistics, i.e. measurements which describe the sample.

Sample statistics are used to infer population parameters.

56 of 65

Why do we Sample?

Number of reasons

Time (difficult to collect data from everyone quickly, though not impossible, e.g. vote, census)
Money (expensive to collect data from everyone, i.e. production costs, payment to participants)
Access (not always possible to reach all members of a population)
Sufficiency (the pattern of results doesn't change much even if we have data from everyone)

57 of 65

Random Sample

The gold standard

Each member of the population has an equal chance of being selected

Usually quasi-random

58 of 65

Systematic Sampling

Draw from the population at fixed intervals

Problematic in populations with a periodic function.
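A minimal sketch of systematic sampling and of the periodicity problem. The population list, the interval of 10 and the "every tenth person is a supervisor" pattern are all hypothetical.

```python
def systematic_sample(population, interval, start=0):
    """Take every `interval`-th member, beginning at index `start`."""
    return population[start::interval]


# Hypothetical numbered population of 100 people.
population = list(range(1, 101))
sample = systematic_sample(population, interval=10, start=4)

# Periodicity problem: suppose the list is ordered so that every
# tenth person is a supervisor and the rest are workers.
roles = ["supervisor" if i % 10 == 0 else "worker" for i in range(100)]
periodic_sample = systematic_sample(roles, interval=10, start=0)
```

Because the sampling interval matches the cycle in the list, periodic_sample contains only supervisors even though they make up a tenth of the population.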

59 of 65

Stratified Sample

Proportional

Specified groups appear in numbers proportional to their size in the population

Disproportional

Specified groups that are not equally represented in the population are selected in equal numbers.
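Proportional stratified sampling could be sketched as below. The function name, the strata and their sizes are hypothetical, and rounding the quotas is a simplifying assumption, so in general the quotas may not sum exactly to the requested sample size.

```python
import random


def proportional_stratified_sample(strata, sample_size, seed=None):
    """Randomly sample each stratum in proportion to its population share."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        # Quota for this stratum, rounded to a whole number of people.
        quota = round(sample_size * len(members) / total)
        chosen[name] = rng.sample(members, quota)
    return chosen


# Hypothetical population: 80 undergraduates and 20 postgraduates.
strata = {
    "undergraduate": [f"U{i}" for i in range(80)],
    "postgraduate": [f"P{i}" for i in range(20)],
}
stratified = proportional_stratified_sample(strata, sample_size=10, seed=1)
```

A disproportional version would instead give every stratum the same quota (for example 5 and 5), over-representing the smaller group.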

60 of 65

Cluster Sample

Researcher samples an entire group or cluster from the population of interest.

61 of 65

Opportunity/Convenience Sample

People who are easily available

But can lead to a biased sample

62 of 65

Snowball Sampling

Recruit a small number of participants and then use those initial contacts to recruit further participants.

Biases the sample, but useful if you want to recruit very specific populations.

63 of 65

External Validity

We gather data from samples in order to infer population parameters.

External validity refers to the ability to generalise our results.

Population validity: is our sample representative?
Ecological validity: does the behaviour measured reflect naturally occurring behaviour?

64 of 65

Sample Size

Size matters!

Sampling error can result if your sample is not large enough

Trade off between size and time/cost

Factors in deciding on sample size:

Analysis (subjects design, number of IVs or IV levels)
Response rate
Heterogeneity of population
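The link between sample size and sampling error can be shown by simulation. Everything here is assumed for illustration: a made-up population of 10,000 scores with mean 100 and SD 15, sample sizes of 10 and 100, and a fixed seed.

```python
import random
from statistics import mean, pstdev

rng = random.Random(7)

# Hypothetical population of 10,000 test scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]


def spread_of_sample_means(sample_size, n_samples=500):
    """Draw many random samples and measure how much their means vary."""
    means = [
        mean(rng.sample(population, sample_size)) for _ in range(n_samples)
    ]
    return pstdev(means)


spread_small = spread_of_sample_means(10)    # many small samples
spread_large = spread_of_sample_means(100)   # many larger samples
```

The means of the larger samples cluster more tightly around the population mean, so any single large sample is less likely to mislead. A more heterogeneous population (a larger SD) would widen both spreads, which is why it calls for a larger sample.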

65 of 65
