6. Which statement is incorrect about clustered column charts?

  • The standard deviations are different within the sample datasets because the standard deviation does not change with sample size, but the standard error bars are smaller for the full datasets as there is more data to balance it out
  • The more datasets there are, the less unreliable it is
  • When the error bars are confidence intervals, the rule of thumb is that two means differ significantly if each mean lies outside the confidence interval of the other one.
  • Clustered column charts show four variables: length, which changes continuously; country, which has four categories; market, which has two levels; and the mean of each
  • Each cluster shows the mean and the error bars
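
The relationship between standard deviation and standard error bars touched on in question 6 can be sketched in a few lines of Python. This is only an illustration: the grain lengths are made-up values, the helper name `standard_error` is hypothetical, and the "full" dataset is simply the sample repeated so that the spread stays about the same while N grows.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical grain lengths in mm (illustrative values, not from the quiz).
sample = [6.8, 7.1, 6.9, 7.0, 7.2, 6.7]
full = sample * 4  # same spread, four times as many observations

sd_sample = statistics.stdev(sample)
sd_full = statistics.stdev(full)
se_sample = standard_error(sample)
se_full = standard_error(full)

# The SD stays roughly the same; the SE shrinks as N increases.
print(f"SD: sample = {sd_sample:.3f}, full = {sd_full:.3f}")
print(f"SE: sample = {se_sample:.3f}, full = {se_full:.3f}")
```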

7. Which is not an R method of collecting a sample?

  • Reduction: making sure you use a precise, representative sample size
  • Replication: you need enough replicates. This gives us the power to achieve confidence in conclusions and the ability to state the mean with precision. The number depends on the statistical error and variability in the sample data; use as many as possible given the time and resources.
  • Randomisation: this is where you avoid bias so that estimates are accurate. Each participant has an equal chance of being chosen

8. Define confidence

  • The application of probability levels to statements made about the sample and whether they fit with the true population parameters
  • Statistics from samples for population
  • Statistical methods used to make statements about the population
  • The process of moving from information about samples to statements about the population

9. Which is not a reason for the principles of data display?

  • Make the data look appealing to interpret
  • Data junk can be distracting
  • Important information indicated by annotations adds value
  • Well-annotated illustrations save lengthy, wordy captions
  • The reader is less familiar with the data than you are
  • High quality data display gives a clear understanding
  • Different ways of illustrating data can reveal and hide different things

10. Why do you use a two-tailed test?

  • In order to incorporate both tails of the dataset
  • To work out the probability that we are at least 4 standard errors away from the sample mean, not that we are 4 SE to the right of the sample mean.
  • To work out the probability that we are at least 4 standard errors away from the sample mean, not that we are 4 SE to the left of the sample mean.
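
The difference between "at least 4 SE away" and "4 SE to one side" can be illustrated numerically. The sketch below assumes a standard normal distribution for simplicity (a t distribution would give somewhat larger probabilities for small samples); the variable names are illustrative.

```python
from statistics import NormalDist

z = 4.0  # distance from the mean, in standard errors

# One tail: probability of being at least 4 SE to the right.
one_tailed = 1.0 - NormalDist().cdf(z)

# Two tails: probability of being at least 4 SE away in either direction.
# The normal curve is symmetrical, so this is twice the one-tailed value.
two_tailed = 2.0 * one_tailed

print(f"one-tailed P = {one_tailed:.2e}")
print(f"two-tailed P = {two_tailed:.2e}")
```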

11. Which is not a characteristic of the explained and unexplained variance of the graph?

  • We can think of the difference between the two graphs as the amount of the overall variation that has been accounted for by using the country mean, instead of the overall mean.
  • The difference (1.32 – 1.08) is 0.24, and this is the variation we have accounted for. The remaining 1.08 is not explained by using the country mean instead of the overall mean: it is unexplained variance.
  • We have reduced the unexplained variance
  • There is more variation remaining to be accounted for (‘explained’) when the country mean is used than when the overall mean is used.
  • There is less variation remaining to be accounted for (‘explained’) when the country mean is used than when the overall mean is used.
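
The arithmetic quoted in question 11 can be checked directly. The two figures (1.32 and 1.08) come from the option text; the proportion explained is an extra step added here for illustration and is not stated in the quiz.

```python
total_variation = 1.32        # variation around the overall mean
unexplained_variation = 1.08  # variation remaining around the country means

# Variation accounted for by using the country mean instead of the overall mean.
explained_variation = total_variation - unexplained_variation

# Illustrative extra step: explained variation as a share of the total.
proportion_explained = explained_variation / total_variation

print(f"explained = {explained_variation:.2f}")
print(f"proportion explained = {proportion_explained:.3f}")
```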

12. Which statement is incorrect about reporting CI?

  • The mean length of rice grains in the sample was 6.95 mm (95% CI: lower limit = 6.75 mm, upper limit = 7.15 mm, N = 100).
  • The mean was 6.75 mm at 95% CI and N=100
  • 6.75 mm < mean length of rice grains in the sample < 7.15 mm (95% CI, N = 100).
  • Without this information we cannot sensibly interpret the statistic. Note that it is good practice to report the sample size, for similar reasons.
  • The mean length of rice grains in the sample was 6.95 ± 0.20 mm (95% CI, N = 100).
  • All sample statistics have confidence intervals and a statistic such as a sample mean is only meaningful when reported with its CI (or SD or SE, depending on context).
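
As a rough sketch, the reporting styles in question 12 that include the mean, both limits and the sample size can be generated from the same four numbers. The function name `report_ci` and its parameters are hypothetical, and the plus-or-minus form assumes the interval is symmetrical about the mean.

```python
def report_ci(mean, lower, upper, n, unit="mm", level=95):
    """Return two equivalent ways of reporting a mean with its CI."""
    half_width = (upper - lower) / 2  # assumes a symmetrical interval
    with_limits = (
        f"{mean:.2f} {unit} ({level}% CI: lower limit = {lower:.2f} {unit}, "
        f"upper limit = {upper:.2f} {unit}, N = {n})"
    )
    plus_minus = f"{mean:.2f} ± {half_width:.2f} {unit} ({level}% CI, N = {n})"
    return with_limits, plus_minus

# Values taken from the rice grain example in the question.
with_limits, plus_minus = report_ci(6.95, 6.75, 7.15, 100)
print(with_limits)
print(plus_minus)
```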

13. Which statement is incorrect about determining t?

  • The critical value of t is completely different from the tabulated t value
  • The areas that lie outside the confidence interval lines are symmetrical and are known as the tabulated t-value, which is also found in tables
  • Decide what level of confidence you want
  • Calculate your degrees of freedom by how many population parameters you have calculated

14. Which is not true of Type I (TI) and Type II (TII) errors under the Bonferroni correction?

  • The chance of a Type II error rapidly increases. When we reduce alpha for each test, we lose power to find any real differences that exist
  • We need a test that compares lots of means at the same time, with one overall TI error rate of 5%, so that we have a good chance of detecting any real differences that exist. This is called analysis of variance (ANOVA) and is based on the F test
  • Controls the TI rate
  • Controls the TII rate
  • We may well end up with no significant differences simply because we made it too difficult to find them. Type I and Type II errors trade off against each other.
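
To make the trade-off concrete, here is a minimal sketch of how the Bonferroni correction reduces alpha for each test. It assumes the tests are independent, and the number of tests (10) and the function names are hypothetical choices for illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after the Bonferroni correction."""
    return alpha / n_tests

n_tests = 10          # hypothetical number of pairwise comparisons
overall_alpha = 0.05  # desired overall Type I error rate

uncorrected = familywise_error_rate(overall_alpha, n_tests)
per_test = bonferroni_alpha(overall_alpha, n_tests)
corrected = familywise_error_rate(per_test, n_tests)

print(f"uncorrected overall TI rate = {uncorrected:.3f}")
print(f"per-test alpha after correction = {per_test:.4f}")
print(f"corrected overall TI rate = {corrected:.3f}")
```

The stricter per-test alpha keeps the overall Type I rate at or below 5%, but each individual test now needs stronger evidence, which is where the loss of power comes from.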

15. What is nominal data?

  • Where you rank the items that you measure depending on which has more or less of the influence that we want to measure. Intervals are not necessarily equal and there is no true zero point
  • Where there are equal intervals between the data and an absolute zero, e.g. time
  • Where you allocate a score to a category and it indicates a group of data
  • Where there are equal intervals of data on a continuous numerical scale, e.g. Fahrenheit

16. Which is not a feature of ANOVA comparing variances with an F test?

  • F test compares variances by variance 1 + variance 2
  • The F test needs to range from infinity because sometimes the explained variance can be smaller than the unexplained variance
  • F test compares variances by variance 1 - variance 2
  • ANOVA classes variance 1 as the variance accounted for (model) and variance 2 is the error variance (not accounted for)

17. What is rounding down also known as?

  • Decline
  • Truncation
  • Summation
  • Simplifying

18. Which statement is incorrect as to why P values should not be written on their own?

  • Because of the variability of data
  • Give information on only two of these to support the P value
  • Because of the sample size
  • Because of the size of difference

19. Which is not a behaviour of the degrees of freedom in equal and unequal variances?

  • When the variances, sample sizes and means of the two samples are equal, the df reduction is greater
  • If both the variances and the sample sizes are different, df ranges from a little less than Na+Nb–2 when the bigger sample has the bigger variance to a lot less when the smaller sample has the bigger variance.
  • The main thing to remember about the d.f. in the unequal-variances t-test is that it can be fractional.
  • The formula is used to allow for the fact that the variances (and often the sample sizes) are not equal. The adjustment produces what we can call ‘effective’ or ‘adjusted’ degrees of freedom. The t distribution can take fractional d.f. values
  • If the variances are different and the sample sizes are the same, df is less than Na+Nb–2, the reduction being more when the difference in variance is bigger.
  • The formula for d.f. in the unequal-variances two-sample t-test is quite complex
  • If the variances are the same and the sample sizes are different, df is less than Na+Nb–2, the reduction being greater the bigger the difference in sample size.
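
The behaviour of the adjusted degrees of freedom can be explored with a short sketch. It assumes the formula the question refers to is the Welch-Satterthwaite approximation commonly used for the unequal-variances t-test; the function name `welch_df` and the example variances and sample sizes are hypothetical.

```python
def welch_df(var_a, n_a, var_b, n_b):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    a = var_a / n_a
    b = var_b / n_b
    return (a + b) ** 2 / (a ** 2 / (n_a - 1) + b ** 2 / (n_b - 1))

# Equal variances, equal sample sizes: no reduction from Na+Nb-2.
df_equal = welch_df(1.0, 10, 1.0, 10)

# Different variances, same sample sizes: df falls below Na+Nb-2.
df_unequal_var = welch_df(1.0, 10, 4.0, 10)

# Same variances, different sample sizes: df also falls below Na+Nb-2.
df_unequal_n = welch_df(1.0, 5, 1.0, 15)

print(f"equal variances and sizes: df = {df_equal:.2f}")
print(f"unequal variances:         df = {df_unequal_var:.2f}")
print(f"unequal sample sizes:      df = {df_unequal_n:.2f}")
```

Note that the adjusted values are generally fractional, which is why the t distribution needs to accept non-integer d.f.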

20. Which statement is incorrect about a good measure?

  • A good measure should increase as sample size increases
  • It should increase when there is more variation
  • A good measurement is independent of its variation
  • Extreme data values should have a moderate influence on the statistic
  • Extreme data should not influence the statistics
  • A good measure of variation should change as the sample changes