Statistics

a statistical experiment
a test in which you collect data to provide evidence for/against a hypothesis
an event
a subset of possible outcomes of an experiment
statistical model process
1) OBSERVE the real world 2) DEVISE a statistical model 3) use the model to PREDICT outcomes 4) COLLECT data 5) COMPARE the data with the predictions/hypothesis 6) TEST how well the model describes reality 7) REFINE the model if necessary
Advantages of statistical models
1) It simplifies the real world 2) It allows you to predict future outcomes 3) It is quicker and easier than using real data
Disadvantage of statistical models
The model doesn't replicate the real world in every detail
Comment on the assumption that A and B are independent
If A and B affect each other, they are not independent
Type of data in a normal distribution
Type of data: continuous
Discrete uniform distribution
Type of data: discrete; each outcome has an equal probability
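
A minimal sketch of a discrete uniform distribution on the outcomes 1 to n, using exact fractions. The function names `discrete_uniform` and `mean_and_variance` are hypothetical; the results it reproduces are the standard E(X) = (n + 1)/2 and Var(X) = (n² - 1)/12.

```python
from fractions import Fraction

def discrete_uniform(n):
    """Return {outcome: probability} for X uniform on 1, 2, ..., n."""
    return {x: Fraction(1, n) for x in range(1, n + 1)}

def mean_and_variance(dist):
    """E(X) = sum of x * P(X = x); Var(X) = E(X^2) - E(X)^2."""
    mean = sum(x * p for x, p in dist.items())
    mean_sq = sum(x * x * p for x, p in dist.items())
    return mean, mean_sq - mean ** 2

die = discrete_uniform(6)          # a fair six-sided die
mean, var = mean_and_variance(die)
print(mean, var)                   # 7/2 35/12
```
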
Shape of distribution
Describe the skew, usually backed up by comparing the mean with the median, or by comparing (upper quartile - median) with (median - lower quartile); you could also say 'most of the data is at the lower/upper end'
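
The two comparisons on this card can be sketched as small functions. The names and the quartile/mean values are hypothetical; the rule used is the usual one (Q3 - Q2 > Q2 - Q1, or mean > median, suggests positive skew).

```python
def skew_from_quartiles(q1, q2, q3):
    """Compare the upper and lower parts of the box either side of the median."""
    upper, lower = q3 - q2, q2 - q1
    if upper > lower:
        return "positive skew"
    if upper < lower:
        return "negative skew"
    return "symmetrical"

def skew_from_mean_median(mean, median):
    """A mean above the median suggests positive skew; below suggests negative."""
    if mean > median:
        return "positive skew"
    if mean < median:
        return "negative skew"
    return "symmetrical"

# Hypothetical summary values
print(skew_from_quartiles(12, 15, 22))    # positive skew
print(skew_from_mean_median(18.4, 15))    # positive skew
```
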
Reason to justify the use of a histogram to represent data
The variable (e.g. height) is continuous and the data is grouped
Describe the main feature of a histogram
The area of each bar is proportional to the frequency of that group
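
A sketch, assuming bar heights are drawn as frequency density = frequency / class width (so the constant of proportionality is 1 and each bar's area equals its frequency). The grouped heights and the function name are hypothetical.

```python
def frequency_densities(classes):
    """classes: list of (lower, upper, frequency) using continuous class boundaries.
    Returns the bar heights, frequency / class width."""
    return [f / (upper - lower) for lower, upper, f in classes]

# Hypothetical grouped heights in cm: (lower boundary, upper boundary, frequency)
heights = [(150, 160, 8), (160, 165, 12), (165, 170, 15), (170, 185, 9)]
densities = frequency_densities(heights)
print(densities)  # [0.8, 2.4, 3.0, 0.6]

# Area of each bar = density * width, which gives back the frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, heights)]
```

Note that the widest class (170 to 185) has a fairly large frequency but the shortest bar, which is why bar height alone would mislead.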
Main features to compare for box plots
1) Outliers 2) Skew 3) IQR 4) Range 5) Median. Make each comparison, then put it into context
An outlier
1) A value which is very different from the rest of the data 2) Should be treated with caution 3) Commonly identified as any value above Q3 + 1.5(Q3 - Q1) or below Q1 - 1.5(Q3 - Q1)
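
The 1.5 × IQR rule from this card, as a minimal sketch. The quartiles are passed in (as they often are in exam questions); for the hypothetical data below they were found as the medians of the lower and upper halves, which is one common convention. Function names are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """Values strictly below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]

# Hypothetical data: Q1 = (19 + 21)/2 = 20, Q3 = (29 + 31)/2 = 30, so IQR = 10
data = [3, 19, 21, 24, 26, 29, 31, 47]
print(outlier_fences(20, 30))        # (5.0, 45.0)
print(find_outliers(data, 20, 30))   # [3, 47]
```
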
When to use the median
If the data is skewed or has outliers, because the median is not affected by extreme values, whereas the mean is pulled up/down by particularly high/low values
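
A quick check with Python's `statistics` module on hypothetical, positively skewed data: one very high value drags the mean well above the median, and removing it moves the mean far more than the median.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5, 6, 47]   # hypothetical data with one very high value
print(mean(data), median(data))            # 10 4
# With the 47 removed, the mean drops far more than the median does
print(mean(data[:-1]), median(data[:-1]))
```
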
Effect on the mean if we add similar data
Adding/removing similar data will improve/reduce the reliability of the mean and standard deviation, but is unlikely to change their values much
Effect on the mean if we add outliers
Adding an outlier will pull the mean up/down (towards the outlier) and increase the standard deviation; removing an outlier has the opposite effect
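
A sketch comparing hypothetical data with and without one added outlier. It uses `statistics.pstdev`, which divides by n (matching σ = √(Sxx/n)); the data are made up for illustration.

```python
from statistics import mean, pstdev

base = [12, 14, 15, 15, 16, 18]   # hypothetical data
with_outlier = base + [40]        # the same data plus one high outlier

for label, values in [("without outlier", base), ("with outlier", with_outlier)]:
    # pstdev divides by n, i.e. the population standard deviation
    print(f"{label}: mean = {mean(values):.2f}, sd = {pstdev(values):.2f}")
```
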
Effect on the mean if we code the data
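Affects the mean and standard deviation in the same way as for discrete random variables: E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X), so the mean is multiplied by a and then increased by b, while the standard deviation is multiplied by |a| and is unaffected by b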
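
A numerical check of the coding rules, assuming a hypothetical coding y = 2x - 5 applied to hypothetical data. (A coding written as y = (x - c)/d is the same rule with a = 1/d and b = -c/d.)

```python
from statistics import mean, pstdev

x = [4, 7, 9, 10, 15]   # hypothetical raw data
a, b = 2, -5            # hypothetical coding y = a*x + b
y = [a * xi + b for xi in x]

# Mean: multiplied by a, then b is added
print(mean(y), a * mean(x) + b)                           # 13 13
# Standard deviation: multiplied by |a|; adding b has no effect
print(round(pstdev(y), 6), round(abs(a) * pstdev(x), 6))
```
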
Main features of a normal distribution
1) Bell-shaped curve 2) The curve is asymptotic to the horizontal axis 3) Symmetrical about the mean 4) Mean = mode = median 5) Approximately 68% of the data lies within mean +/- 1 sd 6) Approximately 95% of the data lies within mean +/- 2 sd 7) Almost all (about 99.7%) of the data lies within mean +/- 3 sd
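
The 68/95/99.7 percentages can be checked with `statistics.NormalDist` (Python 3.8+). The helper name `proportion_within` is hypothetical; the proportion does not depend on the mean or standard deviation chosen.

```python
from statistics import NormalDist

def proportion_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} sd: {proportion_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```
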
Why is the normal distribution a suitable model
1) The data is continuous (height, weight, length, width, etc.) 2) The data is clustered around a central value 3) The data is not skewed
Why is the normal distribution not a suitable model
1) Data is skewed 2) Data is bimodal
Say whether a linear regression model is suitable
The PMCC is close to 1 or -1, so the data lies close to a straight line and a linear regression model is suitable OR The PMCC is close to 0, so the data does not lie close to a straight line and a linear regression model is not suitable
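
A minimal PMCC calculation, r = Sxy / √(Sxx Syy), on hypothetical data. The function name is hypothetical, and how close to ±1 counts as "close" is a judgement the card leaves open; no fixed cut-off is implied here.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical data lying close to a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pmcc(xs, ys)
print(round(r, 3))   # very close to 1, so a linear model looks suitable
```
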
Interpret the PMCC
The PMCC is positive/negative, so the regression line has a positive/negative gradient. As (name of x) increases, (name of y) increases/decreases
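
Why the sign of the PMCC always matches the sign of the gradient b, written with the usual summary statistics Sxx, Syy and Sxy (standard notation, not defined on these cards):

```latex
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad b = \frac{S_{xy}}{S_{xx}}
% S_{xx} > 0 and S_{yy} > 0 (unless every x value or every y value is the same),
% so r and b both take the sign of S_{xy}.
```
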
Explain why x is the independent (explanatory) variable
Because it influences y
Explain why y is the dependent (response) variable
Because it is influenced by x
Interpret the value of a in the regression line y = a + bx (the y-intercept)
a is the value of y (what y measures) when x (what x measures) is zero
Why this value may be unrealistic
x (what x measures) being zero is well outside the range of the data, so using the line there is extrapolation
Interpret the value of b in the regression line (the gradient of the line)
For each increase of 1 (unit of x) in (what x measures), (what y measures) increases/decreases by (value of b)
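
A sketch of the least squares line y = a + bx, using b = Sxy / Sxx and a = ȳ - b x̄. The function name and the revision-hours data are hypothetical.

```python
def regression_line(xs, ys):
    """Least squares line y = a + b*x, with b = Sxy / Sxx and a = ybar - b * xbar."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: x = hours of revision, y = test score
hours = [1, 2, 4, 5, 8]
scores = [52, 56, 60, 65, 72]
a, b = regression_line(hours, scores)
print(f"y = {a:.2f} + {b:.2f}x")   # y = 49.67 + 2.83x
```

Here b would be read as "each extra hour of revision is associated with about 2.83 more marks", and a as the predicted score for zero hours, which lies outside the observed range of 1 to 8 hours.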
What the b value allows you to do
It allows you to estimate how much y will change when x increases by a given amount: change in y = b × change in x
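
The step behind this card, as a short derivation; the numbers in the example line are hypothetical.

```latex
y = a + bx \quad\Rightarrow\quad
\Delta y = \bigl(a + b(x + \Delta x)\bigr) - (a + bx) = b\,\Delta x
% e.g. with a hypothetical gradient b = 2.5, an increase of \Delta x = 4 gives \Delta y = 2.5 \times 4 = 10
```
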
Is the regression line suitable to make a prediction for this value of x?
Only if the value of x is within the range of the original data (interpolation). If it is outside that range (extrapolation), there is no evidence that the model still applies
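
A sketch of the interpolation check. Raising an error for extrapolation is just one possible design choice, and the function names, coefficients and x values are hypothetical.

```python
def is_interpolation(x, xs):
    """True if x lies within the range of the observed x values."""
    return min(xs) <= x <= max(xs)

def predict(a, b, x, xs):
    """Predict y = a + b*x, but refuse to extrapolate beyond the observed x range."""
    if not is_interpolation(x, xs):
        raise ValueError(
            f"x = {x} is outside the data range {min(xs)} to {max(xs)}: extrapolation"
        )
    return a + b * x

xs = [1, 2, 4, 5, 8]            # hypothetical observed x values
print(predict(50, 3, 6, xs))    # 68 (interpolation)
try:
    predict(50, 3, 20, xs)      # x = 20 is well outside the data
except ValueError as err:
    print(err)
```
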
Effect of adding/removing data or coding on the PMCC/regression line
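The PMCC is not affected by coding. The gradient of the regression line is not affected by coding that only adds or subtracts a constant, but it changes if the data is multiplied or divided by a constant. Both the PMCC and the gradient will be affected by adding/removing outliers from the data set. The intercept of the regression line will be affected by coding and by adding/removing outliers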
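
A check of these effects on hypothetical data, coding only x. The function name `fit` is hypothetical; it returns the PMCC and the least squares line y = a + bx.

```python
from math import sqrt

def fit(xs, ys):
    """Return (r, a, b): the PMCC and the least squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return sxy / sqrt(sxx * syy), my - b * mx, b

xs = [10, 20, 30, 40, 50]   # hypothetical data
ys = [3, 7, 8, 12, 15]
codings = {
    "x": xs,
    "x - 30": [x - 30 for x in xs],                # translation only
    "(x - 30) / 10": [(x - 30) / 10 for x in xs],  # translation and scaling
}
for label, coded in codings.items():
    r, a, b = fit(coded, ys)
    print(f"{label:>13}: r = {r:.4f}, a = {a:.2f}, b = {b:.2f}")
```

The PMCC is the same on every line; the gradient is unchanged by the translation but multiplied by 10 when x is divided by 10; the intercept changes with the coding.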