# Statistics Edexcel GCSE

What is a hypothesis?
A testable prediction of what your data will show
What are the 5 constraints of data collection?
Cost, Confidentiality, Convenience, Ethics, Time
What is data cleaning?
Correcting the format of data or removing unusable values
What is quantitative data?
Numerical data
What is qualitative data?
Non-numerical data
What is categorical data?
Data put into categories
What is ordinal data?
Data which has an order or scale
What is discrete data?
Data which can only take certain values
What is continuous data?
Data which can take any value within a range (theoretically)
What is ungrouped data?
Data which isn't put into intervals
What is grouped data?
Data in intervals
What is bivariate data?
Data which has 2 variables, e.g. weight and height
What is multivariate data?
Data which has more than 2 variables
What is an issue in grouping data?
Calculations are less accurate, as the exact values are lost
What is an independent variable?
A variable which the investigator deliberately changes; it goes on the x-axis
What is a dependent variable?
A variable which is being measured
What is a control variable?
A variable which is kept the same throughout
What is primary data?
Data collected by the researcher
What is secondary data?
Data collected from outside sources
What is an advantage of primary data?
You can be sure it is reliable, as you know the method used
What is a disadvantage of primary data?
It takes a long time to collect
What is an advantage of secondary data?
It is faster and cheaper to collect
What is a disadvantage of secondary data?
It may be unreliable, as the method may be unknown
What is a population?
The group that your investigation applies to
What is a sample frame?
A list of all members of a population
What is a census?
Data coming from the whole population
What is a sample?
Data coming from only part of the population
What is cluster sampling?
Splitting the population into groups (clusters), choosing one or more, and taking a census of everyone in them
What is convenience sampling?
Asking whoever is easiest to reach
What is quota sampling?
Asking a certain number of people from different groups
What is systematic sampling?
Selecting at regular intervals from a list, e.g. every 2nd piece of data
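
A minimal Python sketch of systematic sampling, for illustration only. The function name `systematic_sample` and the random starting point are assumptions, not part of the flashcard.

```python
import random

def systematic_sample(sample_frame, interval, seed=None):
    """Take every `interval`-th item, starting from a random position."""
    rng = random.Random(seed)
    # Pick a random start within the first interval so the start is not always item 1.
    start = rng.randrange(interval)
    return sample_frame[start::interval]

# Usage: every 5th person from a list of 20 numbered people.
people = list(range(1, 21))
print(systematic_sample(people, 5, seed=1))
```
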
What is random sampling?
Numbering the population, randomly generating numbers, and selecting the members whose numbers match
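
The three steps (number, generate, match) could be sketched in Python roughly as follows. The function name `simple_random_sample` and the example names are made up for illustration.

```python
import random

def simple_random_sample(sample_frame, sample_size, seed=None):
    """Number the sample frame, draw random numbers, match them back to members."""
    rng = random.Random(seed)
    # Step 1: give every member a number, starting at 1.
    numbered = dict(enumerate(sample_frame, start=1))
    # Step 2: randomly generate distinct numbers (no repeats).
    chosen_numbers = rng.sample(sorted(numbered), sample_size)
    # Step 3: match each number to the member it belongs to.
    return [numbered[n] for n in chosen_numbers]

# Usage: choose 3 of 6 hypothetical students.
students = ["Ann", "Ben", "Cat", "Dev", "Eve", "Fay"]
print(simple_random_sample(students, 3, seed=0))
```
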
What is stratified sampling?
Selecting people from each group in proportion to the group's size in the population
What is an advantage of cluster sampling?
Easy, representative of area
What is a disadvantage of cluster sampling?
Unrepresentative of whole population
What is an advantage of convenience sampling?
Cheap, easy
What is a disadvantage of convenience sampling?
Unreliable, may be biased
What is an advantage of quota sampling?
It makes sure each group is included, and no sample frame is needed
What is a disadvantage of quota sampling?
It may be unrepresentative
What is an advantage of systematic sampling?
Easy, fast
What is a disadvantage of systematic sampling?
If the list follows a pattern, the sample may be biased, so it is unreliable
What is an advantage of random sampling?
Unbiased, reliable
What is a disadvantage of random sampling?
It may not be representative
What is an advantage of stratified sampling?
It is representative and reliable
What is a disadvantage of stratified sampling?
It takes a long time to calculate
What is a stratum?
One of the categories (strata) used in a stratified sample
What are 5 ways to collect data?
Experiment, observation, census, questionnaire, simulation
What should be done with secondary data in investigations?
It should be acknowledged
What is validity?
How well a test measures what it is meant to measure
What is reliability?
How likely a repeat of the investigation is to yield the same results
What is the formula for stratification?
(stratum size ÷ Σfrequencies) × sample size
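
As a worked illustration of this formula, here is a short Python sketch. The function name `stratified_allocation` and the year-group figures are invented for the example. Rounding to whole people can leave the total one above or below the intended sample size, so check the sum.

```python
def stratified_allocation(stratum_sizes, sample_size):
    """Apply (stratum size / total of all frequencies) x sample size to each stratum."""
    total = sum(stratum_sizes.values())  # the sum of the frequencies
    return {
        name: round(size * sample_size / total)
        for name, size in stratum_sizes.items()
    }

# Usage: a sample of 40 from a hypothetical school of 400 pupils.
year_groups = {"Year 7": 120, "Year 8": 180, "Year 9": 100}
print(stratified_allocation(year_groups, 40))
# {'Year 7': 12, 'Year 8': 18, 'Year 9': 10}
```
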
What is the random response technique used for?
Sensitive questions
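How is a random response technique question written?
A random event decides how each person responds, e.g. toss a coin: heads, answer "yes"; tails, answer truthfully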
What is done with the random response technique answers?
Estimate the number who answered because of the random event, subtract it from the total, find the proportion of those left, and apply that proportion to the sample
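
The calculation can be sketched in Python under one assumed design, which is not stated in the flashcard: each person tosses a fair coin, answers "yes" on heads, and answers truthfully on tails. The function name and the figures are hypothetical.

```python
def estimate_true_yes_proportion(total_yes, sample_size, p_forced_yes=0.5):
    """Estimate the proportion of truthful 'yes' answers (assumed coin-toss design)."""
    # Expected number who said "yes" only because of the random event.
    forced_yes = sample_size * p_forced_yes
    # Everyone else is assumed to have answered truthfully.
    truthful_respondents = sample_size - forced_yes
    # Subtract the forced answers from the total "yes" count.
    truthful_yes = total_yes - forced_yes
    return truthful_yes / truthful_respondents

# Usage: 100 people asked, 60 said "yes".
# About 50 were forced by the coin, leaving 10 "yes" out of 50 truthful answers.
print(estimate_true_yes_proportion(60, 100))  # 0.2
```

The resulting proportion (here 20%) is then applied to the whole sample or population.
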
What is a leading question?
A question which is worded to push the respondent towards a specific answer
For what 2 reasons are pilot questionnaires used?
Checking if there is a problem in a question, ensuring correct data is collected
In what 5 situations is data cleaned?
Missing data, incorrect format, non-responses, incomplete responses, outliers
What is matched pairs?
When 2 people with similar traits are paired and put into different groups
What is a control group?
A group which does not receive the treatment being tested, e.g. is given a placebo
