Probability distributions/transformations


Interpreting p values

P = probability

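Probability of obtaining data at least as extreme as those observed, assuming the null hypothesis is true (it is not the probability that the null hypothesis itself is true)

Null hypothesis for Shapiro-Wilk: the data come from a normal distribution

Let's set alpha = 0.05 (cut-off for significantly unlikely results)

If p > 0.05, we do not have sufficient evidence to reject the null hypothesis (that the distribution is normal)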

This does not mean that we are certain that it is normal.


Probability distributions

Many different types

Important to estimate

 - Event probabilities - p values, outliers

 - Uncertainty - p values, confidence intervals.

Common types in psychology

 - Normal; lognormal; binomial; Poisson


The log-normal distribution

- Where the logarithm of the data is normally distributed

- Asymmetrical with right skew (long right tail)

 - Parameters: mean and SD of the log-transformed data

 - Higher chance of high values than under a normal distribution

 - Common in biology, finance, natural events

- Log-transformation yields normal data.
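As a rough illustration of the last point, the sketch below draws simulated log-normal values with Python's standard library and compares skewness before and after taking logs. The seed, sample size and the parameters mu = 0, sigma = 0.75 are arbitrary assumptions, and `skewness` is a helper written for this example only.

```python
import math
import random

def skewness(values):
    """Sample skewness: mean cubed deviation divided by SD cubed."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum((v - mean) ** 3 for v in values) / (n * sd ** 3)

rng = random.Random(1)  # fixed seed so the run is repeatable
raw = [rng.lognormvariate(0.0, 0.75) for _ in range(5000)]  # assumed parameters
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)        # positive: long right tail
logged_skew = skewness(logged)  # near 0: roughly symmetrical
```

With these settings the raw skewness should be clearly positive and the skewness of the logged values close to zero.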


Binary data

- Only two values possible, usually recorded as 0 and 1.

Key parameter: probability of success (p)

        - Often analysed as normal percentage data

Problem: variance is largest around p = 0.5 and shrinks as we move towards p = 0 or p = 1, where the distribution also becomes skewed.
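The variance problem can be checked numerically. A single 0/1 outcome with success probability p has variance p(1 - p) (a standard result, not stated on the card); the probabilities below are chosen only for illustration.

```python
def bernoulli_variance(p):
    """Variance of a single 0/1 outcome with success probability p."""
    return p * (1 - p)

probabilities = [0.05, 0.25, 0.5, 0.75, 0.95]  # illustrative values
variances = {p: bernoulli_variance(p) for p in probabilities}
# The largest variance is at p = 0.5; it shrinks towards p = 0 and p = 1.
```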


Binomial distribution

- For binary data

 - Parameter: P = success probability

                    N = number of trials

- Symmetrical if p = 0.5

- Asymmetry increases towards the extremes (p=0, or p=1)
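A minimal sketch of the binomial probabilities, using the standard formula C(N, k) p^k (1 - p)^(N - k). N = 10 and the two values of p are assumptions chosen to show the symmetrical and the skewed case.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 10  # assumed number of trials
fair = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]  # symmetrical around 5
rare = [binomial_pmf(k, n, 0.1) for k in range(n + 1)]  # piled up near 0
```

Comparing `fair[k]` with `fair[n - k]` shows the mirror symmetry at p = 0.5, which `rare` does not have.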


Count data

- Count of events in fixed period of time or space.

 - Only whole numbers are possible.

Key parameter: number of events counted. Often analysed as normal data.

Problem: the distribution is asymmetrical when the average count is very low (e.g. 0.5), so we cannot assume it is normal.


Poisson distribution

Probability that a given number of events occur independently in a fixed time/space interval.

Single parameter 'lambda': expected number of events, based on average rate and interval size.

Asymmetrical

Related to binomial

For event counts and signal detection theory in psychology.
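To see the asymmetry at a low average count, the sketch below evaluates the standard Poisson formula lambda^k e^(-lambda) / k! for lambda = 0.5 (a value assumed for illustration).

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected."""
    return lam ** k * exp(-lam) / factorial(k)

# Probabilities of 0, 1, ..., 5 events with an assumed lambda of 0.5.
low_rate = [poisson_pmf(k, 0.5) for k in range(6)]
# Most of the probability sits on 0 and falls away quickly, so the shape is far from normal.
```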


Importance of probability distributions

- Statistical tests calculate probabilities based on a known probability density function (PDF)

PDF - a function that describes how likely it is that a value of the variable you're measuring lies within a given interval; that probability is the area under the function over the interval.

- Hence, accurate statistical results depend entirely on using the most appropriate PDF to describe your data.

- Selection of appropriate PDF guided by:

          - Type of data (e.g. binomial for binary data)

          - Match to shape of data distribution (e.g. histogram)
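The idea that a probability is an area under the PDF can be sketched with `statistics.NormalDist`. The mean of 100 and SD of 15 describe a hypothetical score scale, and `area_under_pdf` is a helper written for this example using a simple midpoint rule.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical score scale

def area_under_pdf(dist, low, high, steps=1000):
    """Approximate the area under the PDF between low and high (midpoint rule)."""
    width = (high - low) / steps
    return sum(dist.pdf(low + (i + 0.5) * width) for i in range(steps)) * width

approx = area_under_pdf(dist, 85, 115)    # area within 1 SD of the mean
exact = dist.cdf(115) - dist.cdf(85)      # same probability from the CDF
```

The two numbers should agree closely, at roughly 68%.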


Practical uses of knowing your data is one of the above distribution types

- Specific stats tests for group comparisons

        - binomial test equivalent to t-test for binomial data

        - exact rate ratio test for Poisson data

- More generic non-parametric methods largely rely on ranked data

- Generalised linear models are multiple regression models where you can specify the distribution type of your outcome data, with no transformation needed.


Confidence intervals

- Symmetrical for a normal distribution

- 95% confidence interval often used, especially in Psychology.

       - Based on a distance of about 2 SD (more precisely 1.96 SD) from the mean

       - 2 sigma rule (SD)

- 3 sigma (99.7% confidence) usually considered near-certain.
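The 1.96 and 99.7% figures can be recovered from the standard normal distribution. This is a small check using `statistics.NormalDist`; the helper name `coverage` is made up for the example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Distance from the mean that leaves 2.5% in each tail (95% in the middle).
z_95 = z.inv_cdf(0.975)

def coverage(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return z.cdf(k) - z.cdf(-k)

within_2sd = coverage(2)  # a little over 95%
within_3sd = coverage(3)  # about 99.7%
```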


Outliers

Depends on definition

 - A very strict definition for normal data can be based on 2SD from the mean (95% confidence interval)

      - If you recorded 100 values, how many would be falsely considered as outliers?

      - What effect would the 2SD cut-off have on the SD of your data?

      - What is the likely effect on significance tests?

 - 3SD (inner 99.7% of values) is often used.

 - Often better to transform data than to exclude outliers, or to use both strategies.

- Run analyses with and without outliers to check if results are robust or driven by potential outliers.
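The questions on this card can be explored with a short simulation. The sketch below assumes perfectly normal data (simulated with a mean of 50 and SD of 10, values chosen arbitrarily) so that every flagged value is a false outlier.

```python
import random
from statistics import NormalDist, mean, stdev

z = NormalDist()
# Expected number of values flagged out of 100 when the data are truly normal.
expected_flagged_2sd = 100 * (1 - (z.cdf(2) - z.cdf(-2)))  # about 4.55
expected_flagged_3sd = 100 * (1 - (z.cdf(3) - z.cdf(-3)))  # about 0.27

rng = random.Random(7)  # fixed seed so the run is repeatable
values = [rng.gauss(50, 10) for _ in range(100)]  # assumed mean and SD
m, sd = mean(values), stdev(values)
kept = [v for v in values if abs(v - m) <= 2 * sd]  # apply the 2SD cut-off
sd_after = stdev(kept)  # never larger than sd: trimming shrinks the spread
```

Because trimming at 2SD shrinks the SD, a significance test run on the trimmed data may look stronger than the full data justify, which is one reason to report results with and without exclusions.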


Data transformations

- Can scale data without affecting distribution shape, e.g. z-score transformations.

- Can change the shape of your data distribution, e.g. log transformation.

- Should not re-order values (larger must remain larger)

- Transformed data may be better suited for analysis.

      - if they are a better fit to a known PDF.


Log transformations

- When data has right-hand skew and could be log-normal.

- Reduces larger values more than smaller values, hence reduces skew.
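A tiny example of the compression effect, using made-up values that grow tenfold each step and base-10 logs (any log base would show the same pattern):

```python
import math

values = [1, 10, 100, 1000]  # made-up, strongly right-skewed values
logged = [math.log10(v) for v in values]

# Gaps between neighbouring values before and after the transformation.
raw_gaps = [b - a for a, b in zip(values, values[1:])]     # grow from 9 to 900
logged_gaps = [b - a for a, b in zip(logged, logged[1:])]  # all about 1
```

The order of the values is unchanged, but the largest ones are pulled in the most.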


Z-score transformations

To normalise scale of distribution to mean =0, SD = 1.

Does not affect shape of the distribution
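A minimal z-score sketch: subtract the mean and divide by the SD. The scores are invented, and the population SD (`pstdev`) is used here as an assumption; using the sample SD instead would rescale the results slightly.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Rescale values to mean 0 and SD 1 without changing their order or shape."""
    m = mean(values)
    sd = pstdev(values)  # population SD, an assumption for this example
    return [(v - m) / sd for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented data with mean 5 and SD 2
z = z_scores(scores)
```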


Rank transformations

- Assign 1 to lowest value, 2 to the next etc.

- Tied values get the same rank score, or a tie-breaking method is used.

- Generic way to deal with heavily skewed or difficult data.

- E.g. identical rank scores for log-normal and normal data from the examples above.
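The sketch below ranks values from 1 upwards and gives tied values the average of their ranks, which is one common tie-handling choice among several. `average_ranks` and the data are written for this example; the second list is the exponential of the first, mimicking a normal and a log-normal version of the same data.

```python
import math

def average_ranks(values):
    """Rank from 1 (lowest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

normal_like = [0.3, -1.2, 0.3, 2.1, 0.0]            # invented values
lognormal_like = [math.exp(v) for v in normal_like]  # same data, skewed scale

ranks_normal = average_ranks(normal_like)
ranks_lognormal = average_ranks(lognormal_like)  # identical to ranks_normal
```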


Data cleaning

Always check your data before analysis

First make sure you have reasonable values - e.g. height in cm should not contain 0 or 354. Unreasonable numbers should be deleted, leaving the cell empty rather than entering a 0.

Then check for floor/ceiling effects - where data points bunch up around the lowest/highest possible values.

Then check distribution shape - apply transformations if necessary. Deal with outliers - filter or delete. Possibly run analyses with and without outliers to determine their effect on the overall result; a robust effect should not change.
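The first cleaning step might look like the sketch below, where impossible heights become missing values (`None`) rather than 0. The plausible range of 50 to 250 cm is an assumption chosen for illustration, as is the function name `clean_heights`.

```python
def clean_heights(heights_cm, low=50, high=250):
    """Replace values outside an assumed plausible range with None (missing)."""
    return [h if low <= h <= high else None for h in heights_cm]

raw_heights = [172, 0, 165, 354, 181]  # 0 and 354 are not plausible heights
cleaned = clean_heights(raw_heights)

# Later calculations should skip the missing entries instead of treating them as 0.
present = [h for h in cleaned if h is not None]
```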

