# Statistics - S1 Chapter 1


## Types of Data

There are 2 types of data:

• Continuous
• Discrete

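Continuous data can take any value within a given range (e.g. height or time), so it can never be recorded exactly.

Discrete data can only take specific values (e.g. the number of people in a queue).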

## Grouping Data

When continuous data is grouped:

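e.g. 120-129 is the class interval.

120 is the lower class limit.

129 is the upper class limit.

If the data are, for example, heights h recorded to the nearest whole number, this class covers $119.5 \le h < 129.5$.

119.5 is the lower class boundary (l.c.b.).

129.5 is the upper class boundary (u.c.b.).

The class width = u.c.b. − l.c.b. = 129.5 − 119.5 = 10.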
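A small sketch of the boundary rule, assuming the data are recorded to the nearest whole number (so the gap between recorded values is 1). `class_boundaries` is a hypothetical helper, not part of any library. It does not cover ages, where a class such as 15-19 runs from 15 up to 20.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower class boundary, upper class boundary, class width).

    gap is the step between recorded values (assumed 1, i.e. data recorded
    to the nearest whole number); each boundary lies half a gap beyond its limit.
    """
    lcb = lower_limit - gap / 2
    ucb = upper_limit + gap / 2
    return lcb, ucb, ucb - lcb


# The 120-129 class from the notes
print(class_boundaries(120, 129))  # (119.5, 129.5, 10.0)
```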

## Organising Data by Grouping - Stem and Leaf Diagram

The simplest method is a stem and leaf diagram.

It shows the distribution of the data while keeping the original data values.
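A minimal sketch of building a stem and leaf diagram, assuming non-negative whole numbers with the tens digit as the stem and the units digit as the leaf (other stem choices are possible). The function names and the sample values are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative whole numbers into stems (tens) and leaves (units)."""
    diagram = defaultdict(list)
    for v in sorted(values):           # sorting first keeps each row's leaves in order
        stem, leaf = divmod(v, 10)
        diagram[stem].append(leaf)
    return dict(diagram)


def show(diagram):
    """Print one row per stem, including empty stems, so gaps in the data stay visible."""
    for stem in range(min(diagram), max(diagram) + 1):
        leaves = " ".join(str(leaf) for leaf in diagram.get(stem, []))
        print(f"{stem} | {leaves}")
    print("Key: 2 | 3 means 23")


show(stem_and_leaf([23, 31, 27, 45, 38, 52, 41]))
```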

## Grouped Frequency Distributions

These are used for summarising large sets of data.

There are no fixed rules for selecting the groups, but aim for between 5 and 15 groups and use equal class widths if possible.

You often have to use the upper and lower class boundaries of the intervals to make sure there are no gaps between the bars.

In a histogram, the frequency is represented by the area of the bars.

Area is proportional to frequency.

$$\text{Frequency density} = \frac{\text{frequency}}{\text{class width}}$$
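The formula above as a short sketch, assuming each class is given as a (lower class boundary, upper class boundary, frequency) tuple. `frequency_densities` is a hypothetical name and the table's frequencies are made up; the point is that unequal class widths give different bar heights.

```python
def frequency_densities(classes):
    """classes: list of (lower class boundary, upper class boundary, frequency).

    Returns the frequency density (the bar height) for each class, so that
    bar area = frequency density x class width = frequency.
    """
    return [freq / (ucb - lcb) for lcb, ucb, freq in classes]


# Illustrative table with unequal class widths (made-up frequencies)
table = [(0.5, 5.5, 10), (5.5, 7.5, 8), (7.5, 17.5, 15)]
print(frequency_densities(table))  # [2.0, 4.0, 1.5]
```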

## Extra Histogram Questions

A histogram represents the age group distribution of people buying a magazine in a newsagent. There were 15 people aged 15-19 and the height and width of the rectangle are 8cm and 1cm respectively.

a) If there were 20 people aged 35-49, what is the height of the rectangle representing this group?

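Sketch the 15-19 rectangle and label it 8 cm high and 1 cm wide. Its area, 8 cm², represents 15 people.

So 1 cm² represents 15/8 people, and 1 person is represented by 8/15 cm². Also, the 15-19 class has width 5 (ages 15 up to 20), so 1 cm of width represents a class width of 5.

The 35-49 class has width 15, so its rectangle is 3 cm wide. Area = 8/15 × 20 = 32/3 cm², so 3h = 32/3 and h = 32/9 = 3 5/9 cm (about 3.56 cm).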
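The same scaling argument as a short sketch. Using `Fraction` keeps the answer exact (32/9) rather than a rounded decimal. `bar_height` and its parameter names are hypothetical, and it assumes whole-number inputs.

```python
from fractions import Fraction


def bar_height(ref_freq, ref_height, ref_width, ref_class_width, freq, class_width):
    """Height (cm) of a histogram bar, scaled from one bar whose size is known.

    ref_freq, ref_height, ref_width, ref_class_width describe the known bar
    (frequency, drawn height in cm, drawn width in cm, class width);
    freq and class_width describe the new bar. Inputs are assumed to be integers.
    """
    area_per_person = Fraction(ref_height * ref_width, ref_freq)  # cm² for one person
    cm_per_unit = Fraction(ref_width, ref_class_width)            # drawn cm per unit of class width
    width = class_width * cm_per_unit                             # drawn width of the new bar
    area = freq * area_per_person                                 # area of the new bar
    return area / width


# 15-19 bar: 15 people, 8 cm by 1 cm, class width 5; 35-49 bar: 20 people, class width 15
h = bar_height(15, 8, 1, 5, 20, 15)
print(h)  # 32/9
```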

## Organising Data by Ordering - Median and Quartiles

In an ordered set of data, the median is the middle value; it is a measure of average.

The range is a measure of spread: the highest value − the lowest value.

The interquartile range (IQR) is the upper quartile − the lower quartile (Q3 − Q1). It measures the spread of the middle 50% of the data, so it is not affected by extreme values or anomalies.

For n values (a list or an ungrouped frequency table), the quartiles are found as follows.

## To find Q2

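Find $\frac{n+1}{2}$ to get the position of Q2 (the median) in the ordered data. If this is a whole number, Q2 is the value at that position; if it ends in .5, Q2 is the mean of the two values either side.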
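A minimal sketch of the $\frac{n+1}{2}$ rule, using 1-based positions as in the notes (Python lists are 0-based, hence the `- 1`). `median` here is a hypothetical helper written for this example.

```python
def median(values):
    """Median (Q2) using the (n+1)/2 position rule on the ordered data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    pos = (n + 1) / 2                  # 1-based position of Q2
    if pos.is_integer():
        return data[int(pos) - 1]      # n odd: a single middle value
    # n even: pos ends in .5, so take the mean of the two middle values
    return (data[int(pos) - 1] + data[int(pos)]) / 2


print(median([7, 3, 9, 1, 5]))   # 5
print(median([4, 1, 3, 2]))      # 2.5
```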

## To find Q1

Find $\frac{n}{4}$.

If it is a decimal, round up to the next whole number to get the position of Q1 in the ordered data.

If it is a whole number, Q1 is the mean of the value at this position and the next value in the data.

## To find Q3

Find $\frac{3n}{4}$.

Then proceed as for Q1.
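A sketch of the $\frac{n}{4}$ and $\frac{3n}{4}$ rules, which also gives the interquartile range. The function names are hypothetical. Other textbooks and software (for example Python's `statistics.quantiles`) use different quartile conventions, so their answers can differ slightly from this method.

```python
import math


def value_at(data, pos):
    """Value at a 1-based position in sorted data, using the rule in these notes:
    a decimal position is rounded up; a whole-number position means
    the mean of the value there and the next value."""
    if pos == int(pos):
        k = int(pos)
        return (data[k - 1] + data[k]) / 2
    return data[math.ceil(pos) - 1]


def lower_upper_quartiles(values):
    """Return (Q1, Q3, IQR) using the n/4 and 3n/4 positions."""
    data = sorted(values)
    n = len(data)
    q1 = value_at(data, n / 4)
    q3 = value_at(data, 3 * n / 4)
    return q1, q3, q3 - q1


print(lower_upper_quartiles(range(1, 11)))  # n = 10: (3, 8, 5)
print(lower_upper_quartiles(range(1, 9)))   # n = 8: (2.5, 6.5, 4.0)
```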

## Deciles and Percentiles

These are found in a similar way to Q1 and Q3.

Deciles:

3rd decile: find $\frac{3n}{10}$ to get the position of $D_3$.

Percentiles:

92nd percentile: find $\frac{92n}{100}$ to get the position of $P_{92}$.

Then proceed as with Q1 and Q3.
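One way to see that quartiles, deciles and percentiles share a single rule is a general position function, sketched below with a hypothetical name. `Fraction` keeps positions such as $\frac{3n}{10}$ exact so whole numbers are detected reliably; the data (the values 1 to 20) are illustrative only.

```python
import math
from fractions import Fraction


def position_value(values, k, parts):
    """k-th of `parts` divisions of the ordered data.

    parts=4 gives quartiles, parts=10 deciles, parts=100 percentiles.
    """
    data = sorted(values)
    pos = Fraction(k * len(data), parts)    # e.g. 3n/10 for D3, 92n/100 for P92
    if pos.denominator == 1:                # whole number: mean of this value and the next
        i = int(pos)
        return (data[i - 1] + data[i]) / 2
    return data[math.ceil(pos) - 1]         # decimal: round up to the next position


scores = list(range(1, 21))                 # n = 20, values 1 to 20
print(position_value(scores, 3, 10))        # D3: 3n/10 = 6, mean of 6th and 7th -> 6.5
print(position_value(scores, 92, 100))      # P92: 92n/100 = 18.4, round up to 19th -> 19
```

With `parts=4` and `k=1` or `k=3` this reproduces the Q1 and Q3 method.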

## Box Plots

When comparing box plots, the two basic comparisons to make are the medians and the interquartile ranges.

## Outliers

Outliers are extreme high or low values.

There are several rules for identifying them.

A common rule is that a value is an outlier if it is:

• less than $Q_1 - 1.5(Q_3 - Q_1)$, or
• greater than $Q_3 + 1.5(Q_3 - Q_1)$.

So whether a value counts as an outlier depends on how big the interquartile range is.

On a box plot diagram an outlier is marked with a cross.

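The upper whisker then ends at the highest data value inside the boundary (the next value down from the outlier), not at the outlier itself; likewise, the lower whisker ends at the lowest value inside the lower boundary.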
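A sketch of the outlier rule and the whisker ends, assuming Q1 and Q3 have already been found with the method earlier in these notes. `outlier_check` is a hypothetical name; `k=1.5` is the common rule given here, and other rules use different multipliers. The data are illustrative (their Q1 and Q3 by the $\frac{n}{4}$ rule are 10 and 20).

```python
def outlier_check(values, q1, q3, k=1.5):
    """Apply the boundaries Q1 - k(Q3 - Q1) and Q3 + k(Q3 - Q1).

    Returns (outliers, lower whisker end, upper whisker end); each whisker
    ends at the most extreme value that is still inside its boundary.
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return outliers, min(inside), max(inside)


# Illustrative data with Q1 = 10 and Q3 = 20, so the boundaries are -5 and 35
print(outlier_check([3, 10, 12, 15, 18, 20, 40], q1=10, q3=20))  # ([40], 3, 20)
```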

## Linear Interpolation

If we are working with data that is already grouped, we do not have the actual values, so we can only estimate the median and quartiles.

The method used to find these estimates is called linear interpolation.

To find the median:

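• Work out which class the median is in.
• Work out how far into that interval the median is; call this x (the median's position minus the cumulative frequency before the interval).
• Then calculate: median ≈ lower class boundary + $\frac{x}{\text{frequency of interval}}$ × class width
• e.g. with n = 310: n/2 = 310/2 = 155, so the median is the 155th value.
• The 155th value is in the interval 9.5-12.5 (40 values lie below 9.5 and this interval has frequency 150).
• x = 155 − 40 = 115
• median ≈ 9.5 + (115/150 × 3) = 9.5 + 2.3 = 11.8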
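A sketch of these steps, assuming the grouped data are given as (lower boundary, upper boundary, frequency) tuples in order with no gaps. `interpolate_position` is a hypothetical name. In the illustrative table, only the 40 values below 9.5 and the 150 values in 9.5-12.5 come from the example; the other boundaries and frequencies are made up so that n = 310.

```python
def interpolate_position(classes, position):
    """Estimate the value at `position` (e.g. n/2 for the median) by linear interpolation.

    classes: list of (lower boundary, upper boundary, frequency), in order, no gaps.
    """
    cumulative = 0                               # number of values below the current class
    for lower, upper, freq in classes:
        if cumulative + freq >= position:        # this is the class containing the position
            x = position - cumulative            # how far into the class the position is
            return lower + (x / freq) * (upper - lower)
        cumulative += freq
    raise ValueError("position is larger than the total frequency")


table = [(0.5, 9.5, 40), (9.5, 12.5, 150), (12.5, 20.5, 120)]  # illustrative, n = 310
n = sum(freq for _, _, freq in table)
print(round(interpolate_position(table, n / 2), 2))  # 11.8
```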