# Data Analysis

PY3 Data analysis Revision

## Stages of Data Analysis

Data analysis is completed in two stages: descriptive statistics and inferential statistics.

**Descriptive Statistics** – Ways of summarising the data collected, such as graphs, measures of central tendency and measures of dispersal.

**Inferential Statistics** – Statistical tests that tell us whether differences or relationships found in the data are significant, i.e. unlikely to have occurred by chance.

## Descriptive statistics: Levels of Data

There are a variety of ways in which we can describe the data we have collected. Put simply, we can illustrate the data on a graph, find out what the average score was in each level of the IV (central tendency) and find out how the data is spread (dispersal). To decide which methods we should use to describe data, it is often useful to know what type of data we have.

**Nominal Data** – A level of data where scores of participants have been grouped into **categories**.

**Ordinal Data** – Data where it is possible to **rank** scores in order.

**Interval Data** – Data where each unit on a scale represents an equal interval, i.e. represents exactly the same quantity of the thing being measured, such as centimetres, seconds etc.

## Descriptive statistics: Measures of Central Tendency

**Measure of central tendency** – A way of establishing a central point in a set of data, i.e. **mean, mode, median**.

**Mean** – The arithmetic average, calculated by adding all the scores together and dividing by the number of scores (n).

Advantage: Takes into account all the values in the data set, so is the most sensitive to variations in the data

Disadvantage: Can be distorted by extreme values (outliers)

**Median** – The middle value when the scores are arranged in order of magnitude

Advantage: Less distorted by extreme values

Disadvantage: Less sensitive to variations in the data

**Mode** – The value that occurs most frequently in a given set of data

Advantage: Only measure for summarising category/frequency data

Disadvantage: For many data sets there is no modal value
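As a rough illustration of the three measures, the sketch below uses Python's standard `statistics` module on a made-up set of scores. The data and variable names are hypothetical and chosen only to show how one extreme value affects the mean but not the median.

```python
from statistics import mean, median, multimode

# Hypothetical scores; 40 is an extreme value (outlier)
scores = [2, 3, 3, 4, 5, 6, 40]

mean_score = mean(scores)      # adds every score, divides by n
median_score = median(scores)  # middle value once the scores are ordered
modes = multimode(scores)      # list of the most frequent value(s)

print(mean_score)    # 9  (pulled upwards by the outlier)
print(median_score)  # 4  (unaffected by the outlier)
print(modes)         # [3]

# When every value occurs once there is no single modal value:
# multimode simply returns all of the values.
no_clear_mode = multimode([1, 2, 3, 4])
print(no_clear_mode)  # [1, 2, 3, 4]
```

Removing the outlier would bring the mean down to about 3.83 while the median would move only from 4 to 3.5, which is the sense in which the median is "less distorted by extreme values".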

## Descriptive statistics: Measures of Dispersal

**Measure of dispersal** – A way of establishing how widely a set of data is spread, i.e. **range, standard deviation**.

**Range** – The difference between the highest and lowest scores in a set of data

Advantage: Quick and easy to calculate

Disadvantage: Distorted by extreme values

**Standard Deviation** – A measure of how far, on average, the scores in a group deviate from the mean

Advantage: Most sensitive of the measures of dispersion, as it uses every score

Disadvantage: More time consuming to calculate
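The two measures of dispersal can be compared on a pair of made-up data sets that share the same mean. This is only a sketch: the scores are hypothetical, `value_range` is an illustrative helper, the range is taken as highest minus lowest, and `stdev` from Python's `statistics` module gives the sample standard deviation (dividing by n − 1), which may differ from the formula used on your course.

```python
from statistics import mean, stdev

# Two hypothetical conditions with the same mean (5) but different spread
scores_a = [4, 5, 5, 6, 5]
scores_b = [1, 9, 5, 2, 8]


def value_range(scores):
    """Range taken as highest score minus lowest score."""
    return max(scores) - min(scores)


for label, scores in (("A", scores_a), ("B", scores_b)):
    print(label, mean(scores), value_range(scores), round(stdev(scores), 2))
# A 5 2 0.71
# B 5 8 3.54
```

Both sets have the same central tendency, so only the measures of dispersal reveal that the scores in B vary far more than those in A.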

## Inferential statistics

Descriptive statistics can give us some idea about what has happened in a piece of research. However, we still need to determine whether or not the results are significant. Significance is a statistical concept which is usually stated as **p ≤ 0.05**. For most research, this expression can be translated as ‘a 5% or smaller probability that differences occurred by chance’. (In correlational research it translates as ‘a 5% or smaller probability that relationships between co-variables occurred by chance’.) Remember that ‘p’ stands for probability and ‘0.05’ is the same as 5%.

Inferential statistics tell us if differences or relationships are significant by comparing an observed value with a critical value.

## Inferential Statistics: Key Words

**Observed value** – The result of a statistical test. This is a numerical representation of the difference between scores in two levels of an IV (or the strength of a relationship between co-variables).

**Critical value** – The minimum statistical value required for a difference between scores in levels of an IV (or the strength of a relationship between co-variables) to be considered significant.

**p ≤ 0.05** – The level of probability used to establish critical values for inferential statistics.

## Choosing a statistical test

To decide which statistical test is most appropriate (or to justify why a test was chosen), you need to answer two or three questions about the research and the data.

1. **Is this a test of difference or a test of relationship (i.e. a correlation)?**
2. **If it is a test of difference, was the experimental design independent groups or repeated measures?**
3. **Was the level of data nominal or ordinal / interval?**
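The questions above narrow the choice down to a single test, so the decision can be treated as a lookup. The sketch below is one way to express that, using the pairings of tests and criteria listed in this section; the function name `choose_test` and the string labels are illustrative assumptions.

```python
# Lookup keyed on (purpose, design, level group). Design is None for
# correlations, because the design question only applies to tests of difference.
TESTS = {
    ("difference", "independent", "ordinal/interval"): "Mann-Whitney",
    ("difference", "independent", "nominal"): "Chi-Squared",
    ("difference", "repeated", "ordinal/interval"): "Wilcoxon",
    ("difference", "repeated", "nominal"): "Sign test",
    ("correlation", None, "ordinal/interval"): "Spearman's",
    ("correlation", None, "nominal"): "Chi-Squared",
}


def choose_test(purpose, level, design=None):
    """Return the test for a purpose ('difference' or 'correlation'),
    a level of data ('nominal', 'ordinal' or 'interval') and, for tests
    of difference, a design ('independent' or 'repeated')."""
    if level not in ("nominal", "ordinal", "interval"):
        raise ValueError(f"unknown level of data: {level}")
    level_group = "nominal" if level == "nominal" else "ordinal/interval"
    if purpose == "correlation":
        design = None  # design is ignored for correlations
    key = (purpose, design, level_group)
    if key not in TESTS:
        raise ValueError(f"no test listed for {key}")
    return TESTS[key]


print(choose_test("difference", "ordinal", design="repeated"))  # Wilcoxon
print(choose_test("correlation", "interval"))                   # Spearman's
```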

**Mann-Whitney** (Difference / Independent / Interval or Ordinal)

**Chi-Squared** (Difference / Independent / Nominal)

**Wilcoxon** (Difference / Repeated / Interval or Ordinal)

**Sign** (Difference / Repeated / Nominal)

**Spearman's** (Correlation / Interval or Ordinal)

**Chi-Squared** (Correlation / Nominal)

## Using Observed and Critical Values

Whichever test you choose, it will provide you with an observed value for your data. You then need to compare this number with a critical value. The critical value indicates the level needed to be sure that there was a 5% or smaller probability that either differences between levels of the IV occurred by chance, or that relationships between co-variables occurred by chance (for correlations).

Some tests require that the observed value (what you got) is less than the critical value to be significant, whilst others require the observed value to be greater than the critical value.

Here are the rules for the tests you need to know:

**Chi-Squared, Spearman's** – Observed value needs to be **greater than** the critical value for results to be significant. (R in name of test)

**Mann-Whitney, Sign test, Wilcoxon** – Observed value needs to be **less than** the critical value for results to be significant. (No R in name of test)
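The two rules above can be written as a small check. This is a minimal sketch: `is_significant` is a hypothetical helper, the observed and critical values in the usage lines are made up rather than taken from real tables, and the strict comparisons follow the wording of the rules (check whether your table of critical values also counts an observed value equal to the critical value as significant).

```python
# Tests grouped by the direction of the comparison, as in the rules above
GREATER_THAN_TESTS = {"chi-squared", "spearman's"}            # "R" in the name
LESS_THAN_TESTS = {"mann-whitney", "sign test", "wilcoxon"}   # no "R" in the name


def is_significant(test, observed, critical):
    """Compare an observed value with a critical value for the named test."""
    name = test.lower()
    if name in GREATER_THAN_TESTS:
        return observed > critical
    if name in LESS_THAN_TESTS:
        return observed < critical
    raise ValueError(f"unknown test: {test}")


# Made-up values, for illustration only
print(is_significant("Chi-Squared", observed=5.2, critical=3.8))  # True
print(is_significant("Wilcoxon", observed=12, critical=8))        # False
```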

