# Stats Unit 1

- Created by: Meg
- Created on: 10-03-13 16:55

## Mean Mode Median

**MEAN: Add up all the x values and divide by the number of values.**

**MEDIAN: The middle data value when all the data values are placed in order of size.**

**MODE: The most frequently occurring data value.**

**MEDIAN POSITION: (n+1)/2**

**If n/2 is a whole number (n is even), the median is the AVERAGE of this term and the one above.**

**If n/2 is not a whole number (n is odd), just ROUND UP to find the position of the median.**
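
As a rough check of the three definitions, here is a minimal Python sketch. The function names and the sample list are made up for illustration; the median follows the n/2 rule from these notes.

```python
from collections import Counter


def mean(values):
    # Add all the values up and divide by how many there are.
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 0:
        # n/2 is a whole number: average this term and the one above.
        k = n // 2
        return (ordered[k - 1] + ordered[k]) / 2
    # n/2 is not a whole number: round up to get the position.
    k = (n + 1) // 2
    return ordered[k - 1]


def modes(values):
    # Return every value that shares the highest frequency.
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


sample = [2, 3, 4, 7, 7, 9]  # made-up data
print(mean(sample), median(sample), modes(sample))
```

`modes` returns a list because some data sets have more than one mode.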

## Locations

**Linear Interpolation: assumes the values within a class are evenly spread**

**Example:** For 60 readings, the median position is (60+1)/2 = 30.5

So the median is the 30.5th reading. Your 'running total' (cumulative frequency) tells you the median must be in the '6-10' class. Now you have to assume that all the readings in this class are evenly spread.

There are 26 trees before class 6-10, so the 30.5th tree is the 4.5th value of this class.

The class runs from 5.5 to 10.5 (a width of 5) and holds 17 readings. Divide the class into 17 equally wide parts and assume there's a reading at the end of each part. Then you want the 4.5th reading, which is 4.5 × (width of each part) into the class.

So estimated median = 5.5 + (4.5 × 5/17) = 6.8 mm
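
The interpolation step can be sketched as a small Python function. The function and parameter names are hypothetical; the numbers are the ones from the tree example.

```python
def interpolated_median(lower_boundary, class_width, class_freq,
                        cum_freq_before, position):
    # How far into the median class the wanted reading sits.
    readings_into_class = position - cum_freq_before
    # Each reading is assumed to take up an equal share of the class width.
    return lower_boundary + readings_into_class * class_width / class_freq


estimate = interpolated_median(
    lower_boundary=5.5,   # lower class boundary of class 6-10
    class_width=5,        # 10.5 - 5.5
    class_freq=17,        # readings in the class
    cum_freq_before=26,   # readings before the class
    position=30.5,        # (60 + 1) / 2
)
print(round(estimate, 1))  # 6.8
```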

The modal class is the class with the highest frequency density.

Mean: Good average, uses all your data

Heavily affected by outliers

Only used with quantitative data

Median: Not affected by outliers, so good to use when you do have outliers

Mode: Can be used with qualitative data

Some data sets have more than one mode, which is not helpful

## Histograms

**Vertical Axis:** Frequency Density

**Frequency Density:** Frequency divided by width of class

**NO GAPS BETWEEN COLUMNS**

Find the upper and lower class boundaries for each class, then work out its frequency density.
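
A minimal Python sketch of the frequency density calculation. The function name is made up, and the class used is the 6-10 class from the Locations example (boundaries 5.5 and 10.5, 17 readings).

```python
def frequency_density(lower_boundary, upper_boundary, frequency):
    # Class width comes from the class boundaries, not the class limits.
    width = upper_boundary - lower_boundary
    return frequency / width


# Class '6-10' has boundaries 5.5 and 10.5 and holds 17 readings.
density = frequency_density(5.5, 10.5, 17)
print(density)
```

The height of each histogram column is this density, so the area of the column is proportional to the frequency.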

## Interquartile Range

**Range:** a measure of dispersion

RANGE = HIGHEST VALUE - LOWEST VALUE

Quartiles divide the data into 4 parts: 25% of the data is lower than the lower quartile and 75% is lower than the upper quartile.

Lower Quartile: n/4

- if this is a whole number, the quartile is the average of this term and the one above
- if this is not a whole number, just round up to find the position

Upper Quartile: 3n/4

- if this is a whole number, the quartile is the average of this term and the one above
- if this is not a whole number, just round up to find the position

**Interquartile Range:** a measure of dispersion

INTERQUARTILE RANGE = UPPER QUARTILE - LOWER QUARTILE

**Percentiles:** divide the data into 100 parts
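
The quartile rule can be sketched in Python as below. The helper names and the sample list are made up. Statistics libraries often use different quartile conventions, so their answers may not match this rule exactly.

```python
import math


def term_at(ordered, position):
    # Whole-number position: average this term and the one above.
    if float(position).is_integer():
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round the position up and take that term.
    return ordered[math.ceil(position) - 1]


def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    return term_at(ordered, n / 4), term_at(ordered, 3 * n / 4)


sample = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up data
q1, q3 = quartiles(sample)
iqr = q3 - q1
data_range = max(sample) - min(sample)
print(q1, q3, iqr, data_range)
```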

## Standard Deviation

**Standard Deviation is the Square Root of the Variance**

**VARIANCE: (Σx² / n) - x̄²** Basically find the mean (x̄), then find the sum of squares (Σx²).

Example:

The mean height of 10 boys is 180 cm and the standard deviation is 10 cm. The mean height of 9 girls is 165 cm and the standard deviation is 8 cm. Find the mean and standard deviation for all 19 girls and boys.

Boys' heights = x, girls' heights = y

Mean boys = Σx/10 = 180, therefore Σx = 1800

Mean girls = Σy/9 = 165, therefore Σy = 1485

Sum of boys' and girls' heights = Σx + Σy = 3285

Mean height of girls and boys = 3285/19 = 172.9 cm
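
The standard deviation the question asks for can be found the same way, by rearranging the variance formula to recover the sum of squares for each group. The Python sketch below is one way to do it; the function and variable names are made up.

```python
import math


def sum_and_sum_of_squares(n, mean, sd):
    # From mean = total / n and variance = total_sq / n - mean**2.
    total = n * mean
    total_sq = n * (sd ** 2 + mean ** 2)
    return total, total_sq


boys_sum, boys_sq = sum_and_sum_of_squares(10, 180, 10)   # 1800, 325000
girls_sum, girls_sq = sum_and_sum_of_squares(9, 165, 8)   # 1485, 245601

n = 10 + 9
combined_mean = (boys_sum + girls_sum) / n
combined_variance = (boys_sq + girls_sq) / n - combined_mean ** 2
combined_sd = math.sqrt(combined_variance)

print(round(combined_mean, 1), round(combined_sd, 1))  # 172.9 11.8
```

The combined standard deviation (about 11.8 cm) is larger than either group's own, because the two group means are far apart.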

## Dispersion

**If data is in a table, use mid-class values**

Example: The heights of sunflowers in a garden were measured and recorded in a table. Estimate the mean height and standard deviation.

Table columns: Height, Mid-class value (x), x², f, fx, fx²

150
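
The mid-class method can be sketched in Python as below. The sunflower table itself is not reproduced in these notes, so the mid-class values and frequencies here are made up purely to show the method.

```python
import math


def grouped_mean_and_sd(midpoints, frequencies):
    n = sum(frequencies)                                              # sum of f
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))       # sum of fx
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))  # sum of fx²
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, math.sqrt(variance)


# Hypothetical mid-class values (cm) and frequencies, NOT the original table.
midpoints = [155, 165, 175]
frequencies = [2, 5, 3]
est_mean, est_sd = grouped_mean_and_sd(midpoints, frequencies)
print(est_mean, est_sd)
```

Both answers are estimates, because every reading in a class is treated as if it sat at the mid-class value.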

## Outliers

OUTLIERS fall OUTSIDE the fences

An outlier lies a long way from the rest of the readings. To decide whether a value is an outlier you have to measure how far away from the rest of the data it is.

A data value is considered to be an outlier if it is:

More than 1.5 times the IQR ABOVE the upper quartile

More than 1.5 times the IQR BELOW the lower quartile

Example:

The lower and upper quartiles of a data set are 70 and 100. Decide if 30 and 210 are outliers.

IQR = Q3 - Q1 = 100 - 70 = 30

Lower fence = 70 - (1.5 × 30) = 25

Upper fence = 100 + (1.5 × 30) = 145

Therefore 30 is inside the fences and is NOT an outlier, whereas 210 IS an outlier.
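
The fence check in this example can be sketched in Python. The function names are made up; the quartiles and test values are the ones from the example.

```python
def outlier_fences(q1, q3, multiplier=1.5):
    # Fences sit 1.5 x IQR below the lower quartile and above the upper quartile.
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def is_outlier(value, q1, q3):
    lower_fence, upper_fence = outlier_fences(q1, q3)
    return value < lower_fence or value > upper_fence


print(outlier_fences(70, 100))   # (25.0, 145.0)
print(is_outlier(30, 70, 100))   # False
print(is_outlier(210, 70, 100))  # True
```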

Outliers affect which measure of dispersion is best to use.

They affect whether the variance and standard deviation are good measures of dispersion.

They can make the variance much larger than it would otherwise be, which gives the outliers more influence.

If a data set contains outliers, the interquartile range is a better measure of dispersion.

## Coding

Makes numbers much easier to work with.

You usually change your original variable, x, to an easier one to work with, y:

y=(x-b)/a

Example:

Find the mean and standard deviation of 1000020, 1000040, 1000010, 1000050

1) SUBTRACT a million from every reading, leaving 20, 40, 10, 50

2) DIVIDE by 10 to give 2, 4, 1, 5

3) So y = (x - 1000000)/10. Then ȳ = (x̄ - 1000000)/10 and sy = sx/10

4) Find the mean and standard deviation of the y values:

ȳ = (2+4+1+5)/4 = 3, sy = 1.58 to 3 sig fig

5) Then use the formulas to find the mean and standard deviation of the original values:

x̄ = 10ȳ + 1000000 = (10 × 3) + 1000000 = 1000030, sx = 10sy = 10 × 1.58 = 15.8
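
The coding steps above can be checked with a short Python sketch. The helper name is made up; a and b are the values chosen in the example.

```python
import math


def mean_and_sd(values):
    # Variance = (sum of squares / n) - mean squared.
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean ** 2
    return mean, math.sqrt(variance)


x = [1000020, 1000040, 1000010, 1000050]
a, b = 10, 1000000

y = [(value - b) / a for value in x]   # coded values: 2, 4, 1, 5
y_bar, s_y = mean_and_sd(y)

x_bar = a * y_bar + b                  # undo the coding for the mean
s_x = a * s_y                          # only the scaling affects the spread

print(x_bar, round(s_x, 1))            # 1000030.0 15.8
```

Subtracting b shifts every value by the same amount, so it changes the mean but not the standard deviation.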

## Skewness

- Tells you whether your data is symmetrical or lopsided
- Mean = Median = Mode gives symmetrical data
- Tail on the left, most data values on the higher side: gives negative skewness
- Tail on the right, most data values on the lower side: gives positive skewness

MEAN - MODE ≈ 3 × (MEAN - MEDIAN) (an approximate rule)

Pearson's Coefficient of Skewness = 3(Mean - Median)/Standard Deviation

Quartile Coefficient of Skewness = (Q3 - 2Q2 + Q1)/(Q3 - Q1)

- Q3 - Q2 = Q2 - Q1: skewness is zero
- Q3 - Q2 < Q2 - Q1: negative skewness
- Q3 - Q2 > Q2 - Q1: positive skewness

## Comparing Distributions

Box Plots: a visual summary of a distribution

Show the median and quartiles

Use location, dispersion and skewness to compare distributions

Example:

| | Calculator Paper | Non-Calculator Paper |
|---|---|---|
| Q1 | 40 | 35 |
| Q2 | 58 | 42 |
| Q3 | 70 | 56 |
| Mean | 55 | 46.1 |
| S.D. | 21.2 | 17.8 |

Calculate the skewness for each paper and comment on location and dispersion.

PEARSON'S SKEWNESS:

Calc Paper: 3 × (55 - 58)/21.2 = -0.425

Non-Calc Paper: 3 × (46.1 - 42)/17.8 = 0.691

QUARTILE SKEWNESS:

Calc Paper: (70 - 2(58) + 40)/30 = -0.2

Non-Calc Paper: (56 - 2(42) + 35)/21 = 0.333

Non-Calc Paper = positive skewness, Calc Paper = negative skewness

LOCATION: Mean, median and quartiles are all higher in the Calc Paper.

DISPERSION: IQR for Calc Paper = 30, IQR for Non-Calc Paper = 21

IQR and standard deviation are both higher for the Calc Paper, therefore its marks are more spread out.
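
The two skewness calculations above can be reproduced with a short Python sketch. The function and dictionary names are made up; the figures are the ones given for the two papers.

```python
def pearson_skewness(mean, median, sd):
    return 3 * (mean - median) / sd


def quartile_skewness(q1, q2, q3):
    # (Q3 - Q2) - (Q2 - Q1), scaled by the interquartile range.
    return (q3 - 2 * q2 + q1) / (q3 - q1)


calc = {"q1": 40, "q2": 58, "q3": 70, "mean": 55, "sd": 21.2}
non_calc = {"q1": 35, "q2": 42, "q3": 56, "mean": 46.1, "sd": 17.8}

for name, paper in (("Calc", calc), ("Non-Calc", non_calc)):
    pearson = pearson_skewness(paper["mean"], paper["q2"], paper["sd"])
    quartile = quartile_skewness(paper["q1"], paper["q2"], paper["q3"])
    iqr = paper["q3"] - paper["q1"]
    print(name, round(pearson, 3), round(quartile, 3), iqr)
```

Both coefficients agree in sign for each paper, which supports the conclusion that the Calc Paper is negatively skewed and the Non-Calc Paper is positively skewed.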
