# Statistics Key Words 9-1

## Raw Data

Data just as it has been collected

## Quantitative data

Numerical observations or measurements such as 10, 5.5 or 39cm

## Qualitative data

non-numerical observations such as blue or cat

## Continuous data

Quantitative data which can take any value on a continuous scale such as length or mass

## Discrete data

Quantitative data which can only take particular values such as shoe size or number of pets

## Categorical data

Can be sorted into non-overlapping categories

## Bivariate data

Involves pairs of related data, e.g. height and weight

## Multivariate data

Involves sets of three or more related data values, e.g. a plant's colour, leaf size and height

## Ordinal data

Can be written in order or can be given a numerical rating scale

## Primary data

Data collected by, or for, the person who is going to use it

## Secondary data

Has been collected by someone else, e.g. from websites, newspapers, etc.

## Advantages of primary data

- collection method known

- accuracy known

- can be very specific questions

## Disadvantages of primary data

- Time consuming

- Expensive to collect

## Advantages of secondary data

- easy to obtain

- cheap to obtain

- data from organisations can be more reliable

## Disadvantages of secondary data

- method of collection unknown

- data might be out of date

- may contain mistakes

- may be unreliable

- may be difficult to find specific answers

## Population

Everything or everybody that could possibly be involved in an investigation, e.g. a delivery company wants information about the number of miles travelled by its lorries. Its population would be all the company's lorries.

## Census

a survey or investigation with data taken from every member of a population

## Sample

You can take a sample from a population. It contains information about part of the population and can be used to draw conclusions about the whole population.

## Advantages of a census

- unbiased

- accurate

- takes the whole population into account

## Disadvantages of a census

- time-consuming

- expensive

- difficult to ensure the whole population is used

- lots of data to handle

## Advantages of a sample

- cheaper

- less time consuming

- less data to be considered

## Disadvantages of a sample

- may not fully represent the population

- may be biased

## Sampling Units

people or items that are to be sampled

## Sampling Frame

A list of all the sampling units

## Capture Recapture

number in first capture/total population = number tagged in second capture/size of second capture
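
Rearranging the proportion above gives an estimate of the total population. The sketch below is a minimal illustration of that rearrangement; the function name and the fish-tagging figures are made up for the example and it assumes at least one tagged individual is recaptured.

```python
def estimate_population(first_capture, second_capture, tagged_in_second):
    """Estimate the total population from a capture-recapture study."""
    if tagged_in_second <= 0:
        raise ValueError("at least one tagged individual must be recaptured")
    # Rearranged from: first_capture / total = tagged_in_second / second_capture
    return first_capture * second_capture / tagged_in_second


# Hypothetical study: 40 fish are tagged, then 50 are caught later, 8 of them tagged.
print(estimate_population(40, 50, 8))  # 250.0
```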

## Capture recapture Assumptions

- the population hasn't changed

- the probability of being caught is equal for each individual

- marks/tags are not lost and are always recognisable

- the sample size is large enough to be representative of the population

## Random Sampling

every member of the population has an equal chance of being picked
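
One way to give every member an equal chance is to let a random number generator pick from the sampling frame. This is a small sketch using Python's standard `random` module; the function name, the `seed` parameter and the numbered frame are assumptions made for the example.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size different members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: members numbered 1 to 100.
frame = list(range(1, 101))
print(simple_random_sample(frame, 10, seed=1))
```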

## Judgement Sampling

Non-random sampling where you use your judgement to select a sample that you think is representative of the population

## Opportunity sampling

Non-random sampling where you use the people available at the time

## Cluster Sampling

Non-random sampling used when the population naturally splits into groups (clusters); whole groups are chosen to form the sample

## Systematic sampling

Non-random sampling where you choose a random starting point from the sampling frame and then choose items at regular intervals, e.g. every 5th person
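
The idea of a random start followed by regular intervals can be sketched as below. The function name, the `seed` parameter and the numbered frame are assumptions for illustration only.

```python
import random


def systematic_sample(sampling_frame, interval, seed=None):
    """Take every interval-th member after a random starting point."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start within the first interval
    return sampling_frame[start::interval]


# Hypothetical sampling frame of 100 people, taking every 5th person.
frame = list(range(1, 101))
print(systematic_sample(frame, 5, seed=0))
```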

## Quota Sampling

Non-random sampling where you group the population by characteristics like hair colour or gender and interview a set number (quota) of people from each group

## Stratified sampling

Sampling in which the population is split into groups (strata) and members are chosen at random from each group in proportion to the size of the group
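
Working out how many members to take from each group is a proportion calculation. A minimal sketch follows; the function name and the school year-group figures are invented for the example, and rounding can make the total differ slightly from the intended sample size when the proportions are not whole numbers.

```python
def stratified_sample_sizes(stratum_sizes, sample_size):
    """Return how many members to pick from each group (stratum)."""
    population_size = sum(stratum_sizes.values())
    return {
        stratum: round(size * sample_size / population_size)
        for stratum, size in stratum_sizes.items()
    }


# Hypothetical school with three year groups and a sample of 50 students.
year_groups = {"Year 9": 120, "Year 10": 180, "Year 11": 300}
print(stratified_sample_sizes(year_groups, 50))
# {'Year 9': 10, 'Year 10': 15, 'Year 11': 25}
```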

## How to decide which sampling method to use

- biased?

- Sensible sample size?

- Quick and easy?

- Expensive?

## Direct observation

Collecting primary data systematically by recording events as you observe them

## data collection sheet

a table or tally chart for recording results

## explanatory/independent variable

what you change

## response/dependent variable

the one that is affected

## extraneous variable

variables you are not interested in but which could change the results

## replicating

if repeating the experiment gives you very similar results, they are likely to be reliable

## simulation

you can use simulation to model random real life events to help you predict what would actually happen. simulation can be easier and cheaper than analysing real data.
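
As an illustration, rolling a fair die can be simulated with random numbers instead of rolling a real die many times. This is only a sketch; the function name, the number of rolls and the seed are arbitrary choices for the example.

```python
import random


def simulate_dice_rolls(number_of_rolls, seed=None):
    """Model rolling a fair six-sided die number_of_rolls times."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(number_of_rolls)]


rolls = simulate_dice_rolls(600, seed=42)
proportion_of_sixes = rolls.count(6) / len(rolls)
# For a fair die this should be reasonably close to 1/6.
print(proportion_of_sixes)
```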

## laboratory experiments

conducted in controlled environments

advantages: easy to replicate, you can control extraneous variables

disadvantages: test subjects may behave differently from how they would in real life

## field experiments

carried out in the subjects' everyday environment, with one or more variables controlled

advantages: more likely to reflect real-life behaviour

disadvantages: can't control some extraneous variables

## natural experiments

carried out in the subjects' everyday environment with no variables controlled

advantages: more likely to reflect real-life behaviour

disadvantages: can't control any variables, harder to replicate

## Questionnaire

Set of questions designed to obtain data

## Respondent

Person completing questionnaire

## Open and Closed Questions

Closed: gives answers to choose from

Open: has no suggested answers, so respondents can reply in their own words

## Pilot survey

Conducted on a small sample to test the design and methods of the survey

## Closed questionnaires

often involve an opinion scale. The problem with an opinion scale is that most people answer somewhere in the middle

## Interview

advantages: interviewer can explain questions, respondent can explain answers, high response rate

disadvantages: respondents may be less honest, can take a long time, sample size is smaller, respondents could try to impress the interviewer

## Anonymous Questionnaire

advantages: respondents are more likely to be honest, takes much less time, large sample size, less bias

disadvantages: respondent may not understand a question, lower response rate because questions can be skipped

## anomalous data value

a value that does not fit the pattern of the data

## Cleaning data

Identifying and either correcting or removing inaccurate data values or extreme values, and removing units or other symbols from the data. You decide what to do with the data.

## Control group

A group that does not receive the treatment, used as a comparison to test the effectiveness of the treatment

## Matched pairs

Two groups of people are used to test the effects of a particular factor. Each person is paired with someone similar to them in the other group (e.g. same hair colour, intelligence, gender, etc.).

## Hypothesis

An idea that can be tested by collecting and analysing data

## Factors in designing investigations

- time

- cost

- ethical issues

- confidentiality

- convenience

- how to select the population/sample

- how to deal with non-response

- how to deal with unexpected results

## Two-Way Tables

Shows information in two categories

## Composite Bar Chart

Each bar shows how the frequency for that category is made up from different component groups. The total frequencies and the frequencies of each component group can be compared.

## Comparative Pie Charts

Can be used to compare two sets of data. The areas of the two circles should be in the same ratios as the two total frequencies. To compare the total frequencies, compare the areas. To compare proportions, compare the individual angles.
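
Because area is proportional to the radius squared, the radii must be in the ratio of the square roots of the total frequencies. The sketch below works out the radius of the second chart; the function name and the figures are assumptions for illustration.

```python
import math


def second_pie_radius(first_radius, first_total, second_total):
    """Radius of the second pie so that the areas match the total frequencies."""
    # Area scales with radius squared, so the radius scales with the
    # square root of the ratio of the total frequencies.
    return first_radius * math.sqrt(second_total / first_total)


# Hypothetical charts: the first has radius 3 cm and total frequency 100,
# the second has total frequency 400.
print(second_pie_radius(3, 100, 400))  # 6.0
```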

## Index number

index number = price/base year price x 100
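
A quick worked example of the formula above, with a made-up item whose price rises from 120 in the base year to 150:

```python
def index_number(price, base_year_price):
    """Price as a percentage of the base year price."""
    return price / base_year_price * 100


print(index_number(150, 120))  # 125.0
```

An index number of 125 means the price is 25% higher than in the base year.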

## retail price index

measures the rate of change of prices of everyday goods and services (e.g. mortgage payments, food, heating)

## consumer price index

same as the retail price index but does not include mortgage payments

## gross domestic product

the value of goods and services a country produces within a time period

## weighted index number

weighted index number = current weighted mean price/base year weighted mean price x 100
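
A sketch of the calculation above, taking the weighted mean price to be the sum of price x weight divided by the sum of the weights. The function names, items, prices and weights are all invented for the example.

```python
def weighted_mean_price(prices, weights):
    """Sum of price x weight divided by the sum of the weights."""
    total_weight = sum(weights[item] for item in prices)
    return sum(prices[item] * weights[item] for item in prices) / total_weight


def weighted_index_number(current_prices, base_prices, weights):
    """Current weighted mean price as a percentage of the base year's."""
    current_mean = weighted_mean_price(current_prices, weights)
    base_mean = weighted_mean_price(base_prices, weights)
    return current_mean / base_mean * 100


# Hypothetical recipe using flour and butter in the ratio 3 : 1.
weights = {"flour": 3, "butter": 1}
base_prices = {"flour": 2.00, "butter": 4.00}
current_prices = {"flour": 3.00, "butter": 5.00}
print(round(weighted_index_number(current_prices, base_prices, weights), 1))  # 140.0
```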

## chain base index number

chain base index number = price/last year's price x 100
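
Applied to a run of yearly prices, each year is compared with the year before it. A minimal sketch with made-up prices (the function name is an assumption):

```python
def chain_base_index_numbers(prices):
    """Index each year's price against the previous year's price."""
    return [
        current / previous * 100
        for previous, current in zip(prices, prices[1:])
    ]


# Hypothetical prices for three consecutive years.
yearly_prices = [200, 210, 231]
print([round(value, 1) for value in chain_base_index_numbers(yearly_prices)])
# [105.0, 110.0]
```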

## crude rate

crude rate = number of (deaths/births/etc.)/total population x1000

## standard population

standard population = number in age group/total population x 1000

## standardised rate

standardised rate = crude rate/1000 x standard population
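
The crude rate, standard population and standardised rate formulas chain together for a single age group. The sketch below follows them step by step; the function names and all the figures are invented for illustration.

```python
def crude_rate(events, total_population):
    """Number of events (deaths, births, etc.) per 1000 of the population."""
    return events / total_population * 1000


def standard_population(number_in_age_group, total_population):
    """Size of the age group per 1000 of the whole population."""
    return number_in_age_group / total_population * 1000


def standardised_rate(crude, standard_pop):
    """Crude rate scaled by the age group's share of the standard population."""
    return crude / 1000 * standard_pop


# Hypothetical age group: 30 deaths among 6000 people in a town, where
# that age group makes up 15000 of a 60000-strong reference population.
group_crude_rate = crude_rate(30, 6000)
group_standard_population = standard_population(15000, 60000)
print(round(group_crude_rate, 2))  # 5.0
print(round(group_standard_population, 2))  # 250.0
print(round(standardised_rate(group_crude_rate, group_standard_population), 2))  # 1.25
```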

## risk

number of trials where the event happens/ total number of trials

## relative risk

risk for those in that group/ risk for those not in the group
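
Risk and relative risk can be worked through together. In this sketch the function names and the trial figures are made up for the example.

```python
def risk(event_count, total_trials):
    """Proportion of trials in which the event happens."""
    return event_count / total_trials


def relative_risk(risk_in_group, risk_not_in_group):
    """How many times more likely the event is for those in the group."""
    return risk_in_group / risk_not_in_group


# Hypothetical study: the event happens to 50 of 200 people in the group
# and to 25 of 200 people not in the group.
risk_in_group = risk(50, 200)
risk_not_in_group = risk(25, 200)
print(risk_in_group)  # 0.25
print(risk_not_in_group)  # 0.125
print(relative_risk(risk_in_group, risk_not_in_group))  # 2.0
```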

## exhaustive

A set of events is exhaustive if it covers all possible outcomes. For a set of mutually exclusive, exhaustive events, the probabilities must add up to 1.

## binomial distribution

B(n,p)

n = the number of trials

p= the probability of success

the coefficients of the probabilities follow the pattern of Pascal's triangle
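
The link to Pascal's triangle shows up in the standard binomial formula, where the probability of r successes is (number of ways to choose r from n) x p^r x (1 - p)^(n - r). The sketch below uses that formula with Python's `math.comb`; the function name and the choice of n = 3, p = 0.5 are just for illustration.

```python
from math import comb


def binomial_probability(n, p, r):
    """Probability of exactly r successes in n trials for B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# The coefficients for n = 3 form a row of Pascal's triangle.
print([comb(3, r) for r in range(4)])  # [1, 3, 3, 1]

# Probabilities of 0, 1, 2 and 3 successes for B(3, 0.5).
print([binomial_probability(3, 0.5, r) for r in range(4)])
# [0.125, 0.375, 0.375, 0.125]
```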

## seasonal effect

real value - value from the trend line

## Pearson's product moment correlation coefficient

measures the linear correlation between bivariate data.

takes values between -1 and 1; values close to -1 or 1 show strong linear correlation and values near 0 show little or none
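
The card does not give the formula, so the sketch below uses the standard one, Sxy / sqrt(Sxx x Syy), where each S is a sum of products of deviations from the mean. The function name and the data are assumptions for the example, and it does not handle the case where all the x values or all the y values are identical.

```python
import math


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical bivariate data lying exactly on a straight line.
print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```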

## standardised score

(x - mean) / standard deviation
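
A short worked example, subtracting the mean before dividing by the standard deviation. The function name and the test-score figures are made up.

```python
def standardised_score(value, mean, standard_deviation):
    """Number of standard deviations a value lies above or below the mean."""
    return (value - mean) / standard_deviation


# Hypothetical test: a score of 70 when the mean is 60 and the
# standard deviation is 5.
print(standardised_score(70, 60, 5))  # 2.0
```

A positive score is above the mean and a negative score is below it, which lets results from different tests be compared.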
