Statistics Key Words 9-1

Raw Data
Data just as it has been collected
Quantitative data
Numerical observations or measurements such as 10, 5.5 or 39cm
Qualitative data
non-numerical observations such as blue or cat
Continuous data
Quantitative data which can take any value on a continuous scale such as length or mass
Discrete data
Quantitative data which can only take particular values such as shoe size or number of pets
Categorical data
Can be sorted into non-overlapping categories
Bivariate data
Involves pairs of related data, e.g. height and weight
Multivariate data
Involves sets of three or more related data values, e.g. a plant's colour, leaf size and height
Ordinal data
Can be written in order or can be given a numerical rating scale
Primary data
Data collected by, or for, the person who is going to use it
Secondary data
Data that has been collected by someone else, e.g. websites, newspapers etc.
Advantages of Primary Data
- collection method known - accuracy known - questions can be very specific
Disadvantages of Primary Data
- Time consuming - Expensive to collect
Advantages of Secondary Data
- easy to obtain - cheap to obtain - data from organisations can be more reliable
Disadvantages of Secondary Data
- method of collection unknown - data might be out of date - may contain mistakes - may be unreliable - may be difficult to find specific answers
Population
Everything or everybody that could possibly be involved in an investigation, e.g. a delivery company wants information about the number of miles travelled by its lorries. Its population would be all the company's lorries.
Census
A survey or investigation with data taken from every member of a population
Sample
A sample can be taken from a population. It contains information about part of the population and can be used to draw conclusions about the whole population.
Advantages of a Census
- unbiased - accurate - takes the whole population into account
Disadvantages of a Census
- time-consuming - expensive - difficult to ensure the whole population is used - lots of data to handle
Advantages of a sample
- cheaper - less time consuming - less data to be considered
Disadvantages of a sample
- may not fully represent the population - may be biased
Sampling Units
people or items that are to be sampled
Sampling Frame
is a list of all sampling units
Capture Recapture
number tagged in first capture / total population = number tagged in second capture / size of second capture
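As a rough illustration, the proportion above can be rearranged to estimate the total population. The sketch below assumes that "number tagged" means the tagged individuals found in the second capture; the function name and the figures are made up for the example.

```python
def estimate_population(first_capture, second_capture, tagged_in_second):
    """Estimate the total population N from
    first_capture / N = tagged_in_second / second_capture."""
    if tagged_in_second <= 0:
        raise ValueError("need at least one tagged individual in the second capture")
    return first_capture * second_capture / tagged_in_second

# Hypothetical data: 40 fish are tagged, later 50 are caught and 8 of them carry tags.
print(estimate_population(40, 50, 8))  # 250.0
```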
Capture recapture Assumptions
- the population hasn't changed - the probability of being caught is equal for each individual - marks/tags are not lost and are always recognisable - the sample size is large enough to be representative of the population
Random Sampling
Every member of the population has an equal chance of being picked. Advantage: it's fair and unbiased. Disadvantage: needs a large sample size.
Judgement Sampling
Non-random sampling where you use your judgement to select a sample that is representative of the population
Opportunity sampling
Non-random sampling where you use the people available at the time
Cluster Sampling
Non-random sampling used when the population naturally splits into groups (clusters)
Systematic sampling
Non-random sampling where you choose a random starting point from the sampling frame and then choose items at regular intervals, e.g. every 5th person
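A minimal sketch of this method, assuming the sampling frame is held as a Python list. The function name, the frame of 100 numbered people and the interval of 5 are illustrative choices only.

```python
import random

def systematic_sample(frame, interval, rng=random):
    """Pick a random start within the first interval, then take every interval-th unit."""
    start = rng.randrange(interval)
    return frame[start::interval]

frame = list(range(1, 101))  # hypothetical sampling frame: people numbered 1 to 100
sample = systematic_sample(frame, 5)
print(len(sample))  # 20
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed by the interval.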
Quota Sampling
Non-random sampling where you group the population by characteristics like hair colour or gender and interview a set number (quota) of people from each group
Stratified sampling
Random sampling in which the population is split into groups (strata) and the sample contains members of each group in proportion to the size of the group
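Sampling in proportion to group size comes down to one calculation per group: sample size x group size / population size. The sketch below is one way to do it; the year groups and their sizes are invented, and rounding can make the totals differ slightly from the intended sample size in less tidy cases.

```python
def stratified_sizes(group_sizes, sample_size):
    """Number to sample from each group, in proportion to the group's size."""
    total = sum(group_sizes.values())
    return {group: round(sample_size * n / total) for group, n in group_sizes.items()}

year_groups = {"Year 9": 120, "Year 10": 100, "Year 11": 80}  # hypothetical school
print(stratified_sizes(year_groups, 30))  # {'Year 9': 12, 'Year 10': 10, 'Year 11': 8}
```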
How to decide which sampling method to use
- biased? - Sensible sample size? - Quick and easy? - Expensive?
Direct observation
Collecting primary data systematically by observing events and recording them as they happen
data collection sheet
a table or tally chart for recording results
explanatory/independent variable
what you change
response/dependent variable
the one that is affected
extraneous variable
variables you are not interested in but that could change the results
If repeating the experiment gives you very similar results, they are likely to be reliable
Simulation
You can use simulation to model random real-life events to help you predict what would actually happen. Simulation can be easier and cheaper than analysing real data.
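As an illustration, a simulation can be as simple as generating random numbers to stand in for a real event. The sketch below models rolling a fair die; the choice of a die, the function name and the seed are assumptions made for the example.

```python
import random

def simulate_die_rolls(n_rolls, seed=None):
    """Simulate rolling a fair six-sided die and count how often each face appears."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return {face: rolls.count(face) for face in range(1, 7)}

counts = simulate_die_rolls(600, seed=1)
print(counts)  # each face should appear roughly 100 times
```

Passing a seed makes the run repeatable; leaving it out gives different results each time, as real rolls would.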
laboratory experiments
Conducted in a controlled environment. Advantages: easy to replicate, extraneous variables can be controlled. Disadvantage: test subjects may behave differently from how they would in real life.
field experiments
Carried out in the subjects' everyday environment, with one or more variables controlled. Advantage: more likely to reflect real-life behaviour. Disadvantage: can't control some extraneous variables.
natural experiments
Carried out in the subjects' everyday environment with no variables controlled. Advantage: more likely to reflect real-life behaviour. Disadvantages: can't control any variables, harder to replicate.
Questionnaire
Set of questions designed to obtain data
Respondent
Person completing a questionnaire
Open and Closed Questions
Open: has no suggested answers. Closed: gives answers to choose from.
Pilot survey
Conducted on a small sample to test the design and methods of the survey
Closed questionnaires
often involve an opinion scale. The problem with an opinion scale is that most people answer somewhere in the middle
Interview
Advantages: interviewer can explain questions, respondent can explain answers, high response rate. Disadvantages: respondents may be less honest, can take a long time, sample size is smaller, respondents could try to impress the interviewer.
Anonymous Questionnaire
Advantages: respondents are more likely to be honest, takes much less time, larger sample size, less bias. Disadvantages: respondents may not understand a question, lower response rate because questions can be skipped.
anomalous data value
a value that does not fit the pattern of the data
Cleaning data
Identifying and either correcting or removing inaccurate data values or extreme values, and removing units or other symbols from the data. It is up to you to decide how to deal with each problem value.
Control group
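A group that does not receive the treatment being tested. Its results are compared with those of the treated group to test the effectiveness of the treatment.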
Matched pairs
Two groups of people are used to test the effects of a particular factor. Each person is paired with someone similar to them in the other group (e.g. same hair, intelligence, gender etc.). Advantage: controls extraneous variables. Disadvantage: finding matc
Hypothesis
An idea that can be tested by collecting and analysing data
Factors in designing investigations
- time - cost - ethical issues - confidentiality - convenience - how to select population/ sample - how to deal with non-response - how to deal with unexpected results
Two-Way Tables
Show information that is sorted by two categories at once
Composite Bar Chart
Each bar shows how the frequency for that category is made up from different component groups. The total frequencies and the frequencies of each component group can be compared.
Comparative Pie Charts
Can be used to compare two sets of data. The areas of the two circles should be in the same ratios as the two total frequencies. To compare the total frequencies, compare the areas. To compare proportions, compare the individual angles.
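Because the area of a circle depends on the square of its radius, keeping the areas in the ratio of the total frequencies means scaling the radius by the square root of that ratio. A small sketch, with a hypothetical function name and made-up totals:

```python
import math

def second_radius(radius_1, total_1, total_2):
    """Radius of the second pie chart so that area_2 / area_1 == total_2 / total_1."""
    return radius_1 * math.sqrt(total_2 / total_1)

# Hypothetical: first chart has radius 3 cm for 100 items; second chart shows 400 items.
print(second_radius(3, 100, 400))  # 6.0
```

Four times the frequency gives four times the area, but only twice the radius.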
Index number
index number = (price / base year price) x 100
retail price index
Measures the rate of change of prices of everyday goods and services (e.g. mortgage payments, food, heating)
consumer price index
Same as the retail price index but does not include mortgage payments
gross domestic product
the value of goods and services a country produces within a time period
weighted index number
weighted index number = (current weighted mean price / base year weighted mean price) x 100
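A short worked sketch of the weighted index number, taking the weighted mean price as sum(price x weight) / sum(weights). The items, prices and weights are invented for illustration.

```python
def weighted_mean(prices, weights):
    """Weighted mean price: sum of price x weight, divided by the sum of the weights."""
    return sum(p * w for p, w in zip(prices, weights)) / sum(weights)

def weighted_index(current_prices, base_prices, weights):
    """Current weighted mean price as a percentage of the base year weighted mean price."""
    return weighted_mean(current_prices, weights) * 100 / weighted_mean(base_prices, weights)

base_prices = [10, 20]     # hypothetical base year prices of two items
current_prices = [12, 22]  # hypothetical current prices of the same items
weights = [3, 1]           # the first item counts three times as much as the second

print(weighted_index(current_prices, base_prices, weights))  # 116.0
```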
chain base index number
chain base index number = (price / last year's price) x 100
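Unlike a fixed base year index, each chain base index number compares a price with the year before. The sketch below applies the formula to a run of yearly prices; the prices are hypothetical.

```python
def chain_base_indices(prices):
    """Chain base index number for each year after the first: price / previous price x 100."""
    return [round(curr * 100 / prev, 1) for prev, curr in zip(prices, prices[1:])]

prices = [50, 55, 66]  # hypothetical prices in three consecutive years
print(chain_base_indices(prices))  # [110.0, 120.0]
```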
crude rate
crude rate = (number of deaths/births/etc. / total population) x 1000
standard population
standard population = (number in age group / total population) x 1000
standardised rate
standardised rate = (crude rate / 1000) x standard population
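The three formulas above fit together as follows. This sketch assumes the usual approach of working out a standardised rate for each age group and adding them up to get an overall figure; that summing step, the age groups and all the numbers are assumptions for illustration.

```python
def crude_rate(events, population):
    """Events (e.g. deaths) per 1000 of the population."""
    return events * 1000 / population

def standard_population(group_size, total_population):
    """Size of an age group per 1000 of the reference population."""
    return group_size * 1000 / total_population

def standardised_rate(crude, standard_pop):
    """Crude rate / 1000 x standard population."""
    return crude * standard_pop / 1000

# Hypothetical town: (deaths, population) for each age group.
town = {"0-39": (10, 5000), "40+": (60, 5000)}
# Hypothetical reference population used for standardising.
reference = {"0-39": 6000, "40+": 4000}
reference_total = sum(reference.values())

overall = 0.0
for group, (deaths, population) in town.items():
    overall += standardised_rate(
        crude_rate(deaths, population),
        standard_population(reference[group], reference_total),
    )

print(round(overall, 2))  # 6.0
```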
number of trials where the event happens / total number of trials
relative risk
risk for those in the group / risk for those not in the group
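A small worked example of relative risk, where each risk is taken as the number of times the event happens divided by the total number in that group. The smoker figures are made up.

```python
def risk(events, total):
    """Proportion of a group for whom the event happens."""
    return events / total

def relative_risk(risk_in_group, risk_not_in_group):
    """How many times more likely the event is for those in the group."""
    return risk_in_group / risk_not_in_group

# Hypothetical: 30 of 200 smokers and 10 of 200 non-smokers develop a condition.
rr = relative_risk(risk(30, 200), risk(10, 200))
print(round(rr, 2))  # 3.0
```

A relative risk above 1 means the event is more likely for those in the group; below 1 means it is less likely.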
For a set of mutually exclusive, exhaustive events (events that cover all possible outcomes), the probabilities must add up to 1
binomial distribution
B(n, p), where n = the number of trials and p = the probability of success. The coefficients of the terms follow the pattern of Pascal's triangle.
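To show the link with Pascal's triangle, the sketch below works out the probability of r successes in n trials as (number of ways) x p^r x (1 - p)^(n - r). The function name and the coin example are illustrative.

```python
from math import comb

def binomial_probability(n, p, r):
    """P(exactly r successes in n trials) for B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Row 4 of Pascal's triangle gives the number of ways to get 0 to 4 successes.
print([comb(4, r) for r in range(5)])   # [1, 4, 6, 4, 1]

# Hypothetical: probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_probability(4, 0.5, 2))  # 0.375
```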
seasonal effect
seasonal effect = actual value - value from the trend line
Pearson's product moment correlation coefficient
Measures the strength of the linear correlation in bivariate data. It takes values between -1 and 1.
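For reference, the coefficient can be computed from the standard formula r = Sxy / sqrt(Sxx x Syy). The sketch below uses tiny made-up data sets that lie exactly on a straight line, so the results sit at the two ends of the scale.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive correlation)
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative correlation)
```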
standardised score
standardised score = (x - mean) / standard deviation
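A quick sketch of the calculation: the mean is subtracted from the value first, and the difference is then divided by the standard deviation. The test scores are hypothetical.

```python
def standardised_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / standard_deviation

# Hypothetical: 70 in a test with mean 60 and standard deviation 5.
print(standardised_score(70, 60, 5))   # 2.0
# Hypothetical: 52 in a test with mean 60 and standard deviation 10.
print(standardised_score(52, 60, 10))  # -0.8
```

Standardised scores let results from tests with different means and spreads be compared on the same scale.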
