# Statistics Key Words 9-1


## Raw Data

Data just as it has been collected

## Quantitative data

Numerical observations or measurements such as 10, 5.5 or 39cm

## Qualitative data

non-numerical observations such as blue or cat

## Continuous data

Quantitative data which can take any value on a continuous scale such as length or mass

## Discrete data

Quantitative data which can only take particular values such as shoe size or number of pets

## Categorical data

Can be sorted into non-overlapping categories

## Bivariate data

Involves pairs of related data, e.g. height and weight

## Multivariate data

Involves sets of three or more related data values, e.g. a plant's colour, leaf size and height

## Ordinal data

Can be written in order or can be given a numerical rating scale

## Primary data

Data collected by, or for, the person who is going to use it

## Secondary data

Data that has been collected by someone else, e.g. websites, newspapers etc.

Primary data advantages:

- collection method known

- accuracy known

- questions can be very specific

Primary data disadvantages:

- time-consuming

- expensive to collect

Secondary data advantages:

- easy to obtain

- cheap to obtain

- data from organisations can be more reliable

Secondary data disadvantages:

- method of collection unknown

- data might be out of date

- may contain mistakes

- may be unreliable

- may be difficult to find specific answers

## Population

everything or everybody that could possibly be involved in an investigation. For example, if a delivery company wants information about the number of miles travelled by its lorries, its population would be all of the company's lorries.

## Census

a survey or investigation with data taken from every member of a population

## Sample

A sample is taken from a population. It contains information about part of the population and can be used to draw conclusions about the whole population.

Census advantages:

- unbiased

- accurate

- takes the whole population into account

Census disadvantages:

- time-consuming

- expensive

- difficult to ensure the whole population is used

- lots of data to handle

Sample advantages:

- cheaper

- less time-consuming

- less data to be considered

Sample disadvantages:

- may not be completely representative

- may be biased

## Sampling Units

people or items that are to be sampled

## Sampling Frame

a list of all the sampling units

## Capture Recapture

$$\frac{\text{number in first capture}}{\text{total population}} = \frac{\text{number tagged in second capture}}{\text{number in second capture}}$$
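Rearranging the proportion gives an estimate of the total population. A minimal sketch, where the function name `estimate_population` and the fish numbers are hypothetical:

```python
def estimate_population(first_capture, second_capture, tagged_in_second):
    """Estimate the total population N by capture-recapture.

    first_capture / N = tagged_in_second / second_capture,
    so N = first_capture * second_capture / tagged_in_second.
    """
    if tagged_in_second == 0:
        raise ValueError("no tagged individuals were recaptured, so N cannot be estimated")
    return first_capture * second_capture / tagged_in_second


# Hypothetical example: 40 fish tagged and released, then 50 caught, 8 of them tagged.
print(estimate_population(40, 50, 8))  # 250.0
```

The estimate is only as good as the assumptions listed for capture-recapture.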

## Capture recapture Assumptions

- the population hasn't changed

- the probability of being caught is equal for each individual

- marks/tags are not lost and are always recognisable

- the sample size is large enough to be representative of the population

## Random Sampling

every member of the population has an equal chance of being picked
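A simple random sample can be sketched with Python's standard `random` module; `simple_random_sample` and the numbered frame are hypothetical, and the seed is only there so the example repeats:

```python
import random


def simple_random_sample(sampling_frame, size, seed=None):
    """Pick `size` members without replacement; every member is equally likely."""
    rng = random.Random(seed)  # seeded generator so results can be reproduced
    return rng.sample(sampling_frame, size)


# Hypothetical sampling frame: 100 people numbered 1 to 100.
frame = list(range(1, 101))
print(simple_random_sample(frame, 10, seed=1))
```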

## Judgement Sampling

non-random sampling where you use your judgement to select a sample that you think is representative of the population

## Opportunity sampling

non-random sampling where you use the people available at the time

## Cluster Sampling

Sampling used when the population naturally splits into groups (clusters); clusters are chosen at random and data is collected from them

## Systematic sampling

Non-random sampling where you choose a random starting point in the sampling frame and then choose items at regular intervals, e.g. every 5th person
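A sketch of systematic sampling, assuming the interval is the frame size divided by the sample size and the random start falls within the first interval (`systematic_sample` is a hypothetical name):

```python
import random


def systematic_sample(sampling_frame, size, seed=None):
    """Random start within the first interval, then every k-th member."""
    k = len(sampling_frame) // size  # sampling interval
    if k == 0:
        raise ValueError("sample size is larger than the sampling frame")
    start = random.Random(seed).randrange(k)  # random starting position 0 to k-1
    return [sampling_frame[start + i * k] for i in range(size)]


# Hypothetical frame of 100 people; a sample of 20 means every 5th person.
frame = list(range(1, 101))
print(systematic_sample(frame, 20, seed=3))
```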

## Quota Sampling

non-random sampling where you group the population by characteristics such as hair colour or gender and interview a set number (a quota) from each group

## Stratified sampling

Sampling where the population is split into groups (strata) and a random sample is taken from each group, in proportion to the size of the group
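The proportional allocation can be sketched as below. It assumes members within each group are chosen at random; the function name and the school year sizes are hypothetical:

```python
import random


def stratified_sample(strata, size, seed=None):
    """Proportional stratified sample.

    strata maps each group name to a list of its members. Each group gets
    round(group size / total * sample size) places, filled at random.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        places = round(len(members) / total * size)  # proportional allocation
        sample[name] = rng.sample(members, places)
    return sample


# Hypothetical school: 200 in Year 7, 150 in Year 8, 150 in Year 9; sample of 50.
school = {
    "Year 7": [f"Y7-{i}" for i in range(200)],
    "Year 8": [f"Y8-{i}" for i in range(150)],
    "Year 9": [f"Y9-{i}" for i in range(150)],
}
result = stratified_sample(school, 50, seed=0)
print({year: len(chosen) for year, chosen in result.items()})  # {'Year 7': 20, 'Year 8': 15, 'Year 9': 15}
```

With less convenient numbers, rounding can make the group sizes add up to slightly more or less than the intended sample size.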

## How to decide which sampling method to use

- biased?

- Sensible sample size?

- Quick and easy?

- Expensive?

## Direct observation

collecting primary data systematically by observing and recording events as they happen

## data collection sheet

a table or tally chart for recording results

## explanatory/independent variable

the one you change

## response/dependent variable

the one that is affected

## extraneous variable

variables you are not interested in but that could change the results

## replicating

if repeating the experiment gives very similar results, the results are likely to be reliable

## simulation

You can use simulation to model random real-life events to help predict what would actually happen. Simulation can be easier and cheaper than analysing real data.
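As an illustration (a hypothetical event, not one from these notes), a short simulation can estimate the probability that two fair dice total 7 and compare it with the theoretical 6/36:

```python
import random


def simulate_two_dice(trials, seed=None):
    """Estimate P(total of two fair dice = 7) by simulating many rolls."""
    rng = random.Random(seed)
    sevens = sum(1 for _ in range(trials) if rng.randint(1, 6) + rng.randint(1, 6) == 7)
    return sevens / trials


estimate = simulate_two_dice(100_000, seed=42)
print(estimate)  # should be near 1/6 (about 0.167)
```

More trials usually bring the estimate closer to the true probability.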

## laboratory experiments

conducted in controlled environments

advantages: easy to replicate, you can control extraneous variables

disadvantages: test subjects may behave differently from how they would in everyday life

## field experiments

carried out in the subjects' everyday environment, with one or more variables controlled

advantages: more likely to reflect real-life behaviour

disadvantages: can't control some extraneous variables

## natural experiments

carried out in the subjects' everyday environment with no variables controlled

advantages: more likely to reflect real-life behaviour

disadvantages: can't control any variables, harder to replicate

## Questionnaire

Set of questions designed to obtain data

## Respondent

Person completing questionnaire

## Open and Closed Questions

Open: no answers are given, so respondents can answer in their own words

Closed: gives answers to choose from

## Pilot survey

Conducted on a small sample to test the design and methods of the survey

## Closed questionnaires

often involve an opinion scale. The problem with an opinion scale is that most people answer somewhere in the middle

## Interview

Advantages: interviewer can explain questions, respondent can explain answers, high response rate.

disadvantages: respondents may be less honest, can take a long time, sample size is smaller, respondents could try to impress the interviewer

## Anonymous Questionnaire

advantages: more likely to be honest, takes much less time, large sample size, less bias

disadvantages: respondent may not understand a question, lower response rate as respondents can skip questions

## anomalous data value

a value that does not fit the pattern of the data

## Cleaning data

Identifying and either correcting or removing inaccurate or extreme data values, and removing units or other symbols from the data. You decide what to do with each value.
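A minimal cleaning sketch for lengths recorded as text. The function name, the "cm" unit and the 0 to 100 range are hypothetical choices, and it needs Python 3.9+ for `str.removesuffix`:

```python
def clean_lengths(raw_values, lowest, highest):
    """Strip 'cm' units, convert to numbers, and drop anything outside [lowest, highest].

    Values that cannot be read as numbers are dropped too. Removing is only one
    option; an anomalous value could instead be corrected if the true value is known.
    """
    cleaned = []
    for value in raw_values:
        text = value.strip().lower().removesuffix("cm").strip()
        try:
            number = float(text)
        except ValueError:
            continue  # not a number even after removing units
        if lowest <= number <= highest:
            cleaned.append(number)
    return cleaned


raw = ["39cm", "41 cm", "40", "4000cm", "forty", " 38.5cm "]
print(clean_lengths(raw, 0, 100))  # [39.0, 41.0, 40.0, 38.5]
```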

## Control group

a group that does not receive the treatment, used for comparison to test the effectiveness of a treatment

## Matched pairs

Two groups of people are used to test the effects of a particular factor. Each person is paired with someone similar to them in the other group (e.g. same hair colour, intelligence, gender etc.).

## Hypothesis

An idea that can be tested by collecting and analysing data

## Factors in designing investigations

- time

- cost

- ethical issues

- confidentiality

- convenience

- how to select population/ sample

- how to deal with non-response

- how to deal with unexpected results

## Two-Way Tables

Shows how data is split across the categories of two variables

## Composite Bar Chart

Each bar shows how the frequency for that category is made up from different component groups. The total frequencies and the frequencies of each component group can be compared.

## Comparative Pie Charts

Can be used to compare two sets of data. The areas of the two circles should be in the same ratios as the two total frequencies. To compare the total frequencies, compare the areas. To compare proportions, compare the individual angles.
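Because the area of a circle is $\pi r^2$, making the areas proportional to the totals means $r_2 = r_1\sqrt{F_2/F_1}$. A sketch of that calculation and of the sector angles, with hypothetical function names and survey totals:

```python
import math


def second_radius(first_radius, first_total, second_total):
    """Radius for the second pie chart so that area ratio = frequency ratio."""
    return first_radius * math.sqrt(second_total / first_total)


def sector_angles(frequencies):
    """Angle of each sector in degrees: frequency / total * 360."""
    total = sum(frequencies.values())
    return {name: f / total * 360 for name, f in frequencies.items()}


# Hypothetical surveys: 100 people in the first, 225 in the second.
print(second_radius(4, 100, 225))  # 6.0
print(sector_angles({"bus": 50, "car": 25, "walk": 25}))  # {'bus': 180.0, 'car': 90.0, 'walk': 90.0}
```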

## Index number

$$\text{index number} = \frac{\text{price}}{\text{base year price}} \times 100$$
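A one-line sketch of the formula with a hypothetical price; an index above 100 means the price has risen since the base year:

```python
def index_number(price, base_year_price):
    """Price relative to the base year, where the base year = 100."""
    return price / base_year_price * 100


# Hypothetical: an item cost £2.00 in the base year and £2.50 now.
print(index_number(2.50, 2.00))  # 125.0, i.e. a 25% increase
```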

## retail price index

the rate of change of prices of everyday spending (e.g. mortgage, food, heating)

## consumer price index

same as the retail price index but does not include mortgage payments

## gross domestic product

the value of goods and services a country produces within a time period

## weighted index number

$$\text{weighted index number} = \frac{\text{current weighted mean price}}{\text{base year weighted mean price}} \times 100$$
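A sketch of the weighted index number, assuming the same weights are used for both years (the notes do not say); the basket of items is hypothetical:

```python
def weighted_mean(prices, weights):
    return sum(p * w for p, w in zip(prices, weights)) / sum(weights)


def weighted_index_number(current_prices, base_prices, weights):
    """Weighted index number, assuming the same weights in both years."""
    return weighted_mean(current_prices, weights) / weighted_mean(base_prices, weights) * 100


# Hypothetical basket of three items with weights 1, 1 and 2.
weights = [1, 1, 2]
base = [2.00, 4.00, 5.00]      # base year weighted mean = 16 / 4 = 4.00
current = [3.00, 4.00, 6.50]   # current weighted mean = 20 / 4 = 5.00
print(weighted_index_number(current, base, weights))  # 125.0
```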

## chain base index number

$$\text{chain base index number} = \frac{\text{price}}{\text{last year's price}} \times 100$$
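Applied to a run of yearly prices, each year is compared with the one before, so the first year has no chain base index. A sketch with hypothetical prices:

```python
def chain_base_indices(prices):
    """Index for each year relative to the previous year (first year has none)."""
    return [prices[i] / prices[i - 1] * 100 for i in range(1, len(prices))]


# Hypothetical prices over four consecutive years.
print(chain_base_indices([2.00, 2.50, 2.50, 2.00]))  # [125.0, 100.0, 80.0]
```

A value below 100 means the price fell compared with the previous year.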

## crude rate

$$\text{crude rate} = \frac{\text{number of deaths (or births, etc.)}}{\text{total population}} \times 1000$$
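A sketch of the crude rate with hypothetical town figures:

```python
def crude_rate(event_count, total_population):
    """Events (deaths, births, ...) per 1000 of the population."""
    return event_count / total_population * 1000


# Hypothetical town: 150 deaths in a year, population 12,000.
print(crude_rate(150, 12_000))  # about 12.5 deaths per 1000
```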

## standard population

$$\text{standard population} = \frac{\text{number in age group}}{\text{total population}} \times 1000$$

## standardised rate

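$$\text{standardised rate} = \frac{\text{crude rate}}{1000} \times \text{standard population}$$

This is worked out for each age group, and the results are added to give the overall standardised rate.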
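A sketch combining the standard population and standardised rate formulas, assuming the overall rate is the sum of the age-group results. The age groups and all figures are hypothetical:

```python
def standard_population(group_sizes):
    """Standard population per 1000 for each age group."""
    total = sum(group_sizes.values())
    return {group: n / total * 1000 for group, n in group_sizes.items()}


def standardised_rate(crude_rates, standard_pop):
    """Sum over age groups of crude rate / 1000 x standard population."""
    return sum(crude_rates[group] / 1000 * standard_pop[group] for group in crude_rates)


# Hypothetical national population used as the standard (600 and 400 per 1000).
standard = standard_population({"under 40": 6000, "40 and over": 4000})
# Hypothetical town's crude death rates per 1000 in each age group.
town_rates = {"under 40": 2, "40 and over": 20}
print(standardised_rate(town_rates, standard))  # about 9.2 per 1000
```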

## risk

$$\text{risk} = \frac{\text{number of trials where the event happens}}{\text{total number of trials}}$$

## relative risk

$$\text{relative risk} = \frac{\text{risk for those in the group}}{\text{risk for those not in the group}}$$
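Risk and relative risk as two small functions; the function names and the study figures are hypothetical:

```python
def risk(times_event_happens, total_trials):
    return times_event_happens / total_trials


def relative_risk(events_in_group, total_in_group, events_outside, total_outside):
    """Risk for those in the group divided by risk for those not in the group."""
    return risk(events_in_group, total_in_group) / risk(events_outside, total_outside)


# Hypothetical study: 30 of 200 people in the group and 10 of 400 outside it get the illness.
print(risk(30, 200))                    # 0.15
print(relative_risk(30, 200, 10, 400))  # about 6, i.e. six times the risk
```

A relative risk above 1 means the event is more likely for those in the group.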

## exhaustive

Events are exhaustive if they cover all possible outcomes. For a set of mutually exclusive, exhaustive events, the probabilities must add up to 1.

## binomial distribution

B(n,p)

n = the number of trials

p = the probability of success on each trial

the coefficients follow the pattern of Pascal's triangle
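The standard binomial formula $P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$ uses exactly the Pascal's triangle numbers as $\binom{n}{r}$. A sketch using `math.comb` (Python 3.8+); the function names are hypothetical:

```python
from math import comb


def binomial_probability(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def pascal_row(n):
    """Row n of Pascal's triangle = the binomial coefficients for n trials."""
    return [comb(n, r) for r in range(n + 1)]


print(pascal_row(4))                    # [1, 4, 6, 4, 1]
print(binomial_probability(2, 4, 0.5))  # 0.375
```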

## seasonal effect

real value - value from the trend line

## Pearson's product moment correlation coefficient

measures the linear correlation between bivariate data.

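takes values from -1 to 1: -1 is perfect negative correlation, 0 is no linear correlation and 1 is perfect positive correlation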
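A sketch of the coefficient using the usual definition $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$; the function name and the revision data are hypothetical:

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data (hours revised, test mark).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, marks), 3))  # 0.993, strong positive correlation
```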

## standardised score

$$\text{standardised score} = \frac{x - \text{mean}}{\text{standard deviation}}$$
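A sketch of the standardised score. It assumes the population standard deviation (dividing by n, via `statistics.pstdev`); the marks are hypothetical:

```python
import statistics


def standardised_score(x, mean, standard_deviation):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / standard_deviation


# Hypothetical marks; pstdev divides by n, which is an assumption here.
marks = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(marks)   # mean = 5
sd = statistics.pstdev(marks)   # population SD = 2
print(standardised_score(9, mean, sd))  # 2.0
```

Standardised scores let you compare values from data sets with different means and spreads.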

