Statistics 01


Types of Data

Nominal:

  • Unrelated categories
  • No numerical relationship/order
  • Example = What type of pet do you own?

Ordinal:

  • Has an order or sequence
  • Cannot do maths with it
  • Example = How is your health? (good, reasonable, bad)

Scale:

  • Goes in a specific order
  • Can do maths with it
  • Example = What is your height?

Measures of Central Tendency

Mode = Most

  • Most frequent number
  • One mode = Unimodal
  • Two modes = Bimodal

Median = Middle

  • Sort the data from lowest to highest, find the middle
  • Cannot have more than one median; with an even number of scores, take the value midway between the two middle scores

Mean = Average

  • Add all the scores together and divide by how many there are

What are Outliers?

  • Outliers are extreme scores that can distort a description of the data, especially the mean
  • Use the mode or median instead, or remove the extreme values
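
The three measures above, and the pull an outlier has on the mean, can be sketched with Python's standard `statistics` module. The scores are made up for illustration and are not from the notes.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for this example
scores = [2, 3, 3, 4, 5, 6]

modes = multimode(scores)        # list of the most frequent value(s)
middle = median(scores)          # even count, so midway between 3 and 4
average = mean(scores)           # sum of scores divided by how many there are

# Add one extreme score and compare: the mean moves a lot, the median barely
with_outlier = scores + [60]
median_with_outlier = median(with_outlier)
mean_with_outlier = mean(with_outlier)

print(modes, middle, average)
print(median_with_outlier, mean_with_outlier)
```

With the extreme score added, the median shifts by half a point while the mean roughly triples, which is why the median or mode is preferred when outliers are present.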

Which data to use?

Nominal = Mode

Ordinal = Mode/Median

Scale = Any


Measures of Dispersion

Range

  • Highest value - Lowest value
  • Dispersion = how much spread there is in the scores

Variance

  • Sum of squared differences from the mean, divided by n - 1
  • Working out variance = Work out the mean, subtract it from each score and square the result, add up all the squared differences, then divide by the number of scores minus one (n - 1)

Standard Deviation

  • The square root of Variance
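
A minimal sketch of the range, variance and standard deviation calculations described above, written out step by step. The function names and the scores are assumptions made for illustration.

```python
from math import sqrt

def sample_variance(scores):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    squared_diffs = [(x - mean) ** 2 for x in scores]
    return sum(squared_diffs) / (n - 1)

def sample_sd(scores):
    """Standard deviation is the square root of the variance."""
    return sqrt(sample_variance(scores))

# Hypothetical scores, invented for this example
scores = [2, 4, 4, 4, 5, 5, 7, 9]
score_range = max(scores) - min(scores)   # highest value - lowest value

print(score_range, sample_variance(scores), sample_sd(scores))
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they can be used to check hand calculations.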

Histograms & Distributions

  • Shows how an attribute is distributed 
  • A histogram is not a bar chart 
  • Plot the number or percentage of observations at each level of the measure
  • The same data can give different-looking histograms: change the number of bins (2, 3, 4, or 10-20), or split by another variable

Known attributes of the normal distribution:

  • Symmetrical around the mean
  • Mean, median, and mode are equal
  • Bell shaped curve

Normal Distribution:

  • Population data is often assumed to be normally distributed 
  • This means that if we pull out a sample from a population, its mean is likely to be somewhere around the population mean
  • For this reason, we can use sample means as an estimate of population means
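
The idea that sample means land close to the population mean can be checked with a small simulation. This is only a sketch: the population (mean 100, SD 15), the sample size of 30, and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical normally distributed population (mean 100, SD 15)
population = [rng.gauss(100, 15) for _ in range(10_000)]
pop_mean = mean(population)

# Draw many samples of 30 and record each sample mean
sample_means = [mean(rng.sample(population, 30)) for _ in range(1_000)]

# Share of sample means that fall within 5 points of the population mean
share_close = sum(abs(m - pop_mean) <= 5 for m in sample_means) / len(sample_means)

print(round(pop_mean, 2), round(mean(sample_means), 2), share_close)
```

Most sample means fall within a few points of the population mean, which is the justification for using a sample mean as an estimate.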

Asking questions about people

Associations

  • Do more intelligent people have more Facebook friends?
  • Is there a relationship between study hours and exam results?

Differences

  • Do those who break the law have a higher level of extraversion?
  • Are there sex differences in IQ?
  • These are questions about differences in populations

Testing the Null Hypothesis

Null = true

  • We don't know what the population looks like
  • If the null is true, then it's highly likely that both of our means would come out somewhere in the middle
  • So we might get a difference, but be unable to reject the null, as our data could easily have come from populations with no difference
  • If the null is true, it's highly unlikely that our means would come out at the extremes
  • So if they do, we conclude that the null is unlikely to be true and reject it as a model of the data
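
One way to make this reasoning concrete is a permutation test. This is a different technique from the t-test covered later in these notes and is used here only as an illustration: if the null is true, group labels are interchangeable, so we can shuffle them and count how often a difference at least as large as the observed one turns up. The function name, data and defaults are assumptions.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=5_000, seed=0):
    """Estimate how often a mean difference this large arises when labels are random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)                      # reassign scores to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    return as_extreme / n_shuffles

# Hypothetical data: clearly separated groups vs. identical groups
p_separated = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
p_identical = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(p_separated, p_identical)
```

A small result for the separated groups means such a difference would rarely arise by chance alone; a large result for the identical groups means the data are entirely consistent with the null.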

How do we do this?

  • We never know whether a sample mean is higher than, lower than, or the same as the population mean
  • Inferential statistics use the size of the sample difference, the variability in the sample data, and the number of participants to tell us...

"The probability of getting the observed or more extreme results, given that the null hypothesis is true"


Likelihood = Probability

100% chance = 'p = 1.00'

50% chance = 'p = .50'

10% chance = 'p = .10'

  • If p > .05 then the difference is not significant, because the chance of pulling these two samples from two identical populations is more than 5%

5% chance = 'p = .05'

1% chance = 'p = .01'

<5% chance = 'p < .05'

  • If p < .05 then the difference is significant, because the chance of pulling these two samples from two identical populations is less than 5%

Two outcomes: Significant/Not significant

Significant

  • In our sample, we get a big difference between the two sample means with low variance:
  • The likelihood of getting this data from a population with no real difference would be very low (p < .05 - less than 5%)
  • So it's unlikely enough that the populations are the same that we can reject the null
  • Our difference is significant, which is evidence that there is a difference in the population

Not Significant

  • In our sample, we get a small difference between the two sample means, with high variance:
  • The likelihood of getting this data from a population with no real difference would be high (p>.05 - more than 5%)
  • So it's possible the population means are the same, and we fail to reject the null
  • Our difference is not significant, so we have no evidence that there is a difference in the population

Z-Scores

What is a Z-score?

  • A particular value expressed as the number of standard deviations that it lies away from the mean 
  • Example = Mean (10); SD (2); Your score (8); Z-score (-1)
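
The example above as a one-line calculation. The function name is an assumption made for illustration.

```python
def z_score(score, mean, sd):
    """Number of standard deviations that a score lies away from the mean."""
    return (score - mean) / sd

# The worked example from the card: mean 10, SD 2, score 8
print(z_score(8, 10, 2))   # -1.0
```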

What if your Z-score is not clearly shown on the histogram?

  • Using look-up tables 
  • For positive z-scores, read off the probability of obtaining that z-score or below
  • For negative z-scores, look up the positive version of the z-score and take 1 - that probability to get the probability of obtaining the negative z-score or lower
  • Example for negative - Z-score (-1.52); table value for 1.52 = 0.936; 1 - 0.936 = 0.064 - 6.4% chance of scoring z of -1.52 or below

What else can we do with Z-scores?

  • Example = Calculate the probability of scoring between a z-score of -2 and a z-score of 1
  • Below 1 = 84.13%, Below -2 = 2.28% (Below 1 - Below -2 = 81.85%)
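
As an alternative to a printed look-up table, the standard normal distribution in Python's `statistics` module gives the same "probability of this z-score or below" values. A sketch, with helper names that are assumptions:

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, SD 1

def prob_below(z):
    """Probability of obtaining this z-score or lower."""
    return standard_normal.cdf(z)

def prob_between(z_low, z_high):
    """Probability of a z-score falling between two values."""
    return prob_below(z_high) - prob_below(z_low)

print(round(prob_below(-1.52), 3))      # negative z handled directly, no "1 -" step needed
print(round(prob_between(-2, 1), 3))    # roughly 0.82
```

Because `cdf` accepts negative z-scores directly, the "1 -" step is only needed when working from a table that lists positive values.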

Choosing Inferentials

The test you want depends on:

  • The type of data you have
  • Whether you are looking for a difference or relationship between variables
  • How many conditions you have
  • Whether the data for those conditions come from different groups of people (between subjects) or the same people (within subjects)
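
The four questions above can be read as a small decision rule. The sketch below covers only the three tests that appear in these notes; the chi-squared branch is inferred from the later cards (nominal data, looking for a relationship), and the function name and string labels are assumptions. Anything outside those cases returns `None` rather than guessing.

```python
def choose_test(data_type, looking_for, n_conditions, design):
    """Return a test name for the combinations covered in these notes, else None."""
    if data_type == "scale" and looking_for == "difference" and n_conditions == 2:
        if design == "between":      # different groups of people
            return "independent t-test"
        if design == "within":       # the same people in both conditions
            return "paired t-test"
    if data_type == "nominal" and looking_for == "relationship":
        return "chi-squared"
    return None                      # not covered by these notes

print(choose_test("scale", "difference", 2, "between"))
print(choose_test("scale", "difference", 2, "within"))
print(choose_test("nominal", "relationship", 2, "between"))
```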

Independent t-test

Independent t-test:

  • Scale data
  • Looking for a difference 
  • Two conditions
  • Between subjects

What matters?

  • The size of the difference - Bigger difference = more likely to be significant 
  • The variance within each group - Smaller variance = more likely to be significant difference

T-test

  • A t-test essentially compares the within condition variance with the between condition variance
  • Difference between groups ÷ Variability within groups (the standard error of the difference) = t
  • Big difference between groups ÷ Small variability within groups = Big t value

All you need for a t-test

  • Hypothesis
  • Scale data from two groups
  • For each: Mean, variance, number of values

Calculating the t-test

  • Collect sample data
  • Test the null hypothesis
  • How likely is it that we would get the observed sample difference from a population in which the null hypothesis was true? 
  • If it's very unlikely (p < .05), then we can reject the null
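
A sketch of the independent t calculation from the ingredients listed above (mean, variance and number of values for each group). It assumes the standard pooled-variance (Student's) form, which the notes do not spell out; the function name and the numbers are made up for illustration.

```python
from math import sqrt

def independent_t(mean1, var1, n1, mean2, var2, n2):
    """Student's t for two independent groups, using a pooled variance."""
    df = (n1 - 1) + (n2 - 1)
    # Pool the two group variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two means
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Hypothetical groups: means 10 and 8, variance 4 in each, 10 people per group
t, df = independent_t(10, 4, 10, 8, 4, 10)
print(round(t, 3), df)
```

The resulting t and DF are then compared against a look-up table (or turned into a p value by software) to decide whether p < .05.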

Degrees of Freedom

  • Degrees of freedom come up with most statistics 
  • Calculation varies by statistic
  • Broadly a measure of sample size

If n = 10 (per group, or 10 pairs)...

For independent t-test

  • DF = (n1 - 1)+(n2 - 1)
  • DF = (10-1)+(10-1)
  • DF = 9+9
  • DF = 18

For paired t-test

  • DF = n - 1
  • DF = (10 - 1) 
  • DF = 9
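
The two degrees-of-freedom rules above as functions. The names are assumptions made for illustration.

```python
def df_independent(n1, n2):
    """Independent t-test: one degree of freedom lost per group."""
    return (n1 - 1) + (n2 - 1)

def df_paired(n_pairs):
    """Paired t-test: one degree of freedom lost from the set of differences."""
    return n_pairs - 1

print(df_independent(10, 10))   # 18
print(df_paired(10))            # 9
```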

Assumptions of the Independent t-test

Types of variable, Random sampling, Normal distribution, Homogeneity of variance

Types of variable

  • IV must be categorical
  • DV must be scale 

Random sampling

  • Quasi-random selection from the population
  • Not truly random, but no bias in allocation to groups or inclusion in experiment 
  • No participant can be in both conditions 

Normal Distribution

  • The DV should be normally distributed in each group
  • Much of our rationale depends on this

Homogeneity of Variance

  • The two groups should have similar variances

Paired t-test

  • Average size of change for each individual
  • Don't have to worry about individual differences
  • Every value is hooked up to its equivalent in the other condition 
  • Dealing with differences between values
  • Mean score change ÷ standard error of the change = t

Calculating a paired sample t-test

  • Mean difference = condition 1 - condition 2
  • Did everyone get exactly the same difference?
  • Did difference vary widely? 
  • Lots of variance means we can't be sure that the difference will go in the same direction in the population
  • Look up on table, t & DF
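
A sketch of the paired calculation described above: take each person's difference, then divide the mean difference by its standard error. The function name and the scores are assumptions; the data are invented and are not the caber results reported below.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(condition1, condition2):
    """Paired-samples t: mean difference divided by its standard error."""
    if len(condition1) != len(condition2):
        raise ValueError("every participant needs a score in both conditions")
    diffs = [a - b for a, b in zip(condition1, condition2)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)   # how much the differences vary
    return mean(diffs) / standard_error, n - 1

# Hypothetical scores for five people measured in both conditions
t, df = paired_t([12, 15, 11, 14, 13], [10, 12, 10, 11, 12])
print(round(t, 3), df)
```

If everyone changed by exactly the same amount, the standard error would be zero and t would be undefined; in practice some spread in the differences is expected.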

Reporting t-test - Cabers were thrown significantly further when contestants wore trainers (M = 11.70, SD = 3.86) than when they wore high heels (M = 5.00, SD = 2.40), t(9) = 3.87, p < .01


Assumptions of the Paired t-test

Types of variable, random sampling, normal distribution

Types of variable

  • IV must be categorical
  • DV must be scale

Random sampling

  • Quasi-random selection from the population
  • Not truly random, but no bias in inclusion in experiment
  • Every participant must be in both conditions

Normal distribution

  • The differences should be normally distributed

Homogeneity of variance

  • Not a separate assumption for the paired t-test: the test is run on the difference scores, so the two conditions do not need to have similar variances

The steps in Chi-Squared

1. Calculate frequencies (observed values) - Add up how many in each combination

2. Calculate frequencies we would expect if the null is true (expected values)

3. Calculate how far observed are from expected (χ²)

4. Calculate DF

5. Look up critical χ² in look-up table
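
Steps 1 to 4 can be sketched as follows. This assumes the usual Pearson form, where each expected count is row total x column total ÷ N, with no continuity correction; the function name and the observed table are assumptions made for illustration.

```python
def chi_squared(observed):
    """Return (chi-squared, DF, expected counts) for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count in each cell if the null (no relationship) is true
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # How far observed values are from expected, summed over all cells
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

# Hypothetical 2-by-2 table of observed frequencies
chi2, df, expected = chi_squared([[10, 20], [20, 10]])
print(round(chi2, 3), df, expected)
```

Step 5 then compares the result with the critical value for that DF in a look-up table.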


Chi Squared - DF

Calculate χ²

  • We need to think about how the observed values differ from the expected values
  • For each cell: (observed - expected) squared ÷ expected, then add up across all cells

Calculate Degrees of Freedom

  • (number of columns - 1) x (number of rows - 1), e.g. for a 2-by-2 table: (2 - 1) x (2 - 1) = 1 x 1 = 1

Reporting your χ² results -

  • Analysis using a Chi-Square test shows no significant relationship between sex and smoking, χ²(1, N = 50) = 0.927, p = .34
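
For the special case of DF = 1, the p value can be obtained without a look-up table, because a chi-squared value with one degree of freedom is the square of a z-score. The sketch below relies on that identity and is only valid for DF = 1; the function name is an assumption.

```python
from math import erfc, sqrt

def chi2_p_value_df1(chi2):
    """Upper-tail p for a chi-squared statistic with exactly 1 degree of freedom."""
    # P(chi-squared > x) equals P(|z| > sqrt(x)), which is erfc(sqrt(x / 2))
    return erfc(sqrt(chi2 / 2))

# The statistic from the reporting example above
p = chi2_p_value_df1(0.927)
print(round(p, 2))
```

The result should come out close to the p value given in the reporting example, which is above .05 and therefore not significant.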

Assumptions of Chi Squared

Random sampling, Sample size, and expected cell count

Random sampling

  • The sample data are a random sample from a population

Sample size

  • A sufficiently large sample is assumed. If a chi-squared test is run on too small a sample, it can yield an inaccurate inference

Expected cell count

  • Adequate expected cell counts. Some require 5 or more, and others 10 or more. A common rule is 5 or more in all cells of a 2-by-2 table and 5 or more in 80% of cells in larger tables, but no cells with zero expected count
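
The common rule described above can be written as a quick check on a table of expected counts. This is one reading of that rule, not a standard library function; the name and the example tables are assumptions.

```python
def expected_counts_ok(expected):
    """Check the 'common rule': all cells >= 5 in a 2-by-2 table, else >= 5 in 80% of cells."""
    cells = [e for row in expected for e in row]
    if any(e == 0 for e in cells):
        return False                               # no cell may have zero expected count
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    is_2x2 = len(expected) == 2 and all(len(row) == 2 for row in expected)
    if is_2x2:
        return share_at_least_5 == 1.0
    return share_at_least_5 >= 0.8

# Hypothetical expected-count tables
print(expected_counts_ok([[15, 15], [15, 15]]))          # 2-by-2, all cells large enough
print(expected_counts_ok([[3, 15], [15, 15]]))           # 2-by-2 with one small cell
print(expected_counts_ok([[3, 10, 10], [10, 10, 10]]))   # larger table, 5 of 6 cells large enough
```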