Types of Data
There are 2 types of data:
Continuous data cannot take exact values.
Discrete data can take exact values.
When continuous data is grouped:
e.g. 120-129 is the class interval.
120 is the lower class limit.
129 is the upper class limit.
We use: 119.5 ≤ h < 129.5
119.5 is the lower class boundary
129.5 is the upper class boundary.
The class width = u.c.b. - l.c.b. (upper class boundary - lower class boundary), e.g. 129.5 - 119.5 = 10.
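As a rough sketch of the boundary calculation, assuming the data is recorded to the nearest whole number so that each boundary sits 0.5 beyond the class limit. The function name class_boundaries is made up for illustration.

```python
def class_boundaries(lower_limit, upper_limit, precision=1):
    """Return (l.c.b., u.c.b., class width) for a class interval.

    Assumes values are rounded to the nearest `precision` unit,
    so each boundary sits half a unit beyond the class limit.
    """
    half = precision / 2
    lcb = lower_limit - half
    ucb = upper_limit + half
    return lcb, ucb, ucb - lcb


lcb, ucb, width = class_boundaries(120, 129)
print(lcb, ucb, width)  # 119.5 129.5 10.0
```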
Organising Data by Grouping- Stem and Leaf Diagram
The simplest method is a stem and leaf diagram.
It shows the distribution of the data.
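A minimal sketch of how a stem and leaf diagram could be built, assuming two-digit whole numbers where the tens digit is the stem and the units digit is the leaf. The data and the function name stem_and_leaf are invented for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem) and units digit (leaf)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return dict(rows)


# Made-up data, for illustration only.
data = [23, 31, 25, 47, 38, 31, 29, 42, 36]
print("Key: 2 | 3 means 23")
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Each row lists its leaves in order, so the length of a row shows how many values fall in that group.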
Grouped Frequency distributions
These are used for summarising large sets of data.
There are no fixed rules for selecting the groupings (use between 5 and 15 groupings, with equal class widths if possible).
You often have to use the u.c.b.s and the l.c.b.s of the intervals to make sure there are no gaps.
In a histogram the frequency is represented by the area of the bars.
Area is proportional to Frequency.
Frequency density = Frequency / Class width
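A short sketch of the frequency density calculation, taking frequency density as frequency divided by class width measured between the class boundaries. The grouped data here is made up for illustration.

```python
def frequency_density(frequency, lcb, ucb):
    """Frequency density = frequency / class width, using class boundaries."""
    return frequency / (ucb - lcb)


# Made-up grouped data: (lower class boundary, upper class boundary, frequency)
classes = [(119.5, 129.5, 8), (129.5, 139.5, 14), (139.5, 159.5, 10)]
for lcb, ucb, f in classes:
    print(f"{lcb}-{ucb}: frequency density = {frequency_density(f, lcb, ucb)}")
```

The last class is twice as wide as the others, so its bar is drawn lower than its frequency alone would suggest.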
Extra Histogram Questions
A histogram represents the age group distribution of people buying a magazine in a newsagent. There were 15 people aged 15-19 and the height and width of the rectangle are 8cm and 1cm respectively.
a) If there were 20 people aged 35-49, what is the height of the rectangle representing this group?
Sketch a rectangle and label 8 cm and 1 cm on it. Its area, 8 cm squared, represents 15 people.
So 1 cm squared represents 15/8 people and 1 person is represented by 8/15 cm squared. Also, 1 cm of width represents a class width of 5.
The 35-49 class has a class width of 15, so width = 3 cm. Area = 8/15 x 20 = 32/3 cm squared. 3h = 32/3, so h = 32/9 = 3 and 5/9 cm.
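The arithmetic in this question can be checked with exact fractions. This is only a sketch of the working above; the variable names are invented, and the class widths of 5 and 15 are taken from the scale used in the solution.

```python
from fractions import Fraction

# Known bar: 15 people aged 15-19, drawn 8 cm high and 1 cm wide.
known_freq, known_height, known_width = 15, Fraction(8), Fraction(1)
known_class_width = 5

area_per_person = known_height * known_width / known_freq   # 8/15 cm^2
cm_per_unit = known_width / known_class_width               # 1/5 cm per year

# Unknown bar: 20 people aged 35-49, class width 15.
freq, class_width = 20, 15
width = class_width * cm_per_unit     # 3 cm
area = area_per_person * freq         # 32/3 cm^2
height = area / width                 # 32/9 cm

print(width, area, height)  # 3 32/3 32/9
```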
Organising Data by ordering- Median and Quartiles.
In an Ordered set of Data the median is the middle value and it is an average.
The range is a measure of spread and is the highest value - the lowest value.
The interquartile range is the upper quartile - the lower quartile (Q3 - Q1) and is the spread of the middle 50% of the data (it ignores extreme values or anomalies).
For n values (a list or ungrouped frequency table):
To find Q2
Find (n+1)/2 to find the position of Q2 in the data.
To find Q1
Find n/4.
If it is a decimal then round up to the next whole number to get the position of Q1 in the whole of the data.
If it is a whole number then Q1 is the mean of the value at this position and the next value in the data.
To find Q3
Find 3n/4, then proceed as for Q1.
Deciles and percentiles are found in a similar way to the quartiles.
3rd decile: D3= (3n/10)
92nd percentile: P92= (92n/100)
Then proceed as with Q1 and Q3.
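The position rules above can be sketched as follows. This assumes the data is already sorted and that positions count from 1. It also assumes the usual convention that a median position ending in .5 means taking the mean of the two neighbouring values. The function names and the data are invented for illustration.

```python
import math


def value_at(data, position):
    """Apply the rounding rule to a position such as n/4 or 3n/4."""
    if position == int(position):
        # Whole number: mean of the value here and the next value.
        i = int(position)
        return (data[i - 1] + data[i]) / 2
    # Decimal: round up to the next whole position.
    return data[math.ceil(position) - 1]


def median(data):
    """Value at position (n + 1) / 2; a .5 position is averaged."""
    pos = (len(data) + 1) / 2
    if pos == int(pos):
        return data[int(pos) - 1]
    return (data[int(pos) - 1] + data[int(pos)]) / 2


def quantile(data, k, parts):
    """k-th of `parts` divisions: quartiles (4), deciles (10), percentiles (100)."""
    return value_at(data, k * len(data) / parts)


# Made-up ordered data, for illustration only (n = 12).
data = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
print("Q1 =", quantile(data, 1, 4))       # position 3    -> (5 + 7) / 2
print("Q2 =", median(data))               # position 6.5  -> (10 + 11) / 2
print("Q3 =", quantile(data, 3, 4))       # position 9    -> (14 + 16) / 2
print("D3 =", quantile(data, 3, 10))      # position 3.6  -> 4th value
print("P92 =", quantile(data, 92, 100))   # position 11.04 -> 12th value
```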
Two basic comparisons to make when comparing box plots are the interquartile range and the median.
Outliers are the extreme high or low values.
There are lots of rules for identifying them.
A common rule is that a value is an outlier if it is:
<Q1 - (1.5(Q3-Q1))
>Q3 + (1.5(Q3-Q1))
It all depends on how big the interquartile range is.
On a box plot diagram an outlier is marked with a cross.
If the highest value is an outlier, the top of the range (the end of the whisker) is the next lowest number inside the boundary.
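A small sketch of the 1.5 x interquartile range rule stated above. The data is made up, and the function names are invented for illustration. The quartiles 12 and 19 follow from the n/4 and 3n/4 position rules for this list.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the lower and upper boundaries Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(data, q1, q3):
    """Separate values inside the boundaries from the outliers."""
    low, high = outlier_fences(q1, q3)
    inside = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    return inside, outliers


# Made-up ordered data, for illustration only.
data = [3, 10, 12, 13, 15, 16, 18, 19, 21, 40]
q1, q3 = 12, 19

print(outlier_fences(q1, q3))            # (1.5, 29.5)
inside, outliers = split_outliers(data, q1, q3)
print(outliers)                          # [40]
print(min(inside), max(inside))          # whisker ends: 3 21
```

Here 40 would be marked with a cross, and the upper whisker would stop at 21, the highest value still inside the boundary.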
If we are working with data that is already grouped then we do not have the actual values and so can only find an estimate for the median and quartiles.
This method is called linear interpolation.
To find the median:
- Work out which class the median is in.
- Work out how far into the interval the median is (call this x) and then calculate:
- median = lower class boundary + (x / frequency of interval) × class width
- e.g. (n/2) = (310/2)= 155th value
- 155th value is in interval 9.5-12.5
- median = 9.5 + (115/150 × 3) = 11.8 (the median is 115 values into an interval with frequency 150 and class width 3)
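The interpolation in this example can be sketched as below. The function name and parameter names are invented for illustration. The cumulative frequency of 40 below 9.5 is not stated in the notes; it is inferred from the working, since 155 - 40 = 115.

```python
def interpolate(position, cum_freq_before, lcb, class_width, class_freq):
    """Estimate the value at `position` in a class by linear interpolation."""
    x = position - cum_freq_before          # how far into the interval
    return lcb + (x / class_freq) * class_width


# Numbers from the example above; cum_freq_before=40 is an inferred value.
estimate = interpolate(position=310 / 2, cum_freq_before=40,
                       lcb=9.5, class_width=3, class_freq=150)
print(round(estimate, 1))  # 11.8
```

The same function gives estimates for Q1 and Q3 by passing n/4 or 3n/4 as the position, together with the figures for whichever class contains that position.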