Stats Unit 1

Mean Mode Median

MEAN- Add all x values up and divide by the number of values

MEDIAN-Middle data value when all the data values are placed in order of size

MODE-Most frequently occurring data value

MEDIAN POSITION: (n+1)/2

If n/2 is a whole number (n is even) then the median is the AVERAGE of this term and the one above.

If n/2 is not a whole number (n is odd) just ROUND UP the number to find the position of the median.
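
The n/2 rule above can be sketched in Python. This is only an illustration; the function name median_by_position is made up for these notes.

```python
def median_by_position(values):
    """Median found with the n/2 rule from the notes (positions count from 1)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    if n % 2 == 0:
        # n/2 is a whole number: average this term and the one above
        k = n // 2
        return (data[k - 1] + data[k]) / 2
    # n/2 is not a whole number: round up to get the position
    k = (n + 1) // 2
    return data[k - 1]


odd_median = median_by_position([3, 1, 4, 1, 5])       # 5 values, 3rd term
even_median = median_by_position([3, 1, 4, 1, 5, 9])   # 6 values, average of 3rd and 4th
print(odd_median, even_median)
```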

Locations

Linear Interpolation- assumes values are evenly spread

Example:
Median position above is (60+1)/2 = 30.5
So the median is the 30.5th reading. Your 'running total' tells you the median must be in the '6-10' class. Now you have to assume that all the readings in this class are evenly spread.
There are 26 trees before class 6-10, so the 30.5th tree is the 4.5th value of this class.
Divide the class into 17 equally wide parts (the class is 5 wide, so each part is 5/17 wide) and assume there's a reading at the end of each part. Then you want the 4.5th reading, which is 4.5 * width of each part above the lower class boundary of 5.5.

So estimated median = 5.5 + (4.5 * 5/17) = 6.8mm
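
The interpolation in this example can be checked with a short Python sketch. The function and parameter names are made up for illustration; the numbers (lower boundary 5.5, class width 5, class frequency 17, running total 26, n = 60) come from the example.

```python
def interpolated_median(lower_boundary, class_width, class_freq, cum_freq_before, n):
    """Estimate the median of grouped data, assuming readings are evenly spread in the class."""
    position = (n + 1) / 2                      # median position, e.g. 30.5
    into_class = position - cum_freq_before     # how far into the median class, e.g. 4.5
    return lower_boundary + into_class * class_width / class_freq


estimate = interpolated_median(5.5, 5, 17, 26, 60)
print(round(estimate, 1))
```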
The modal class is the class with the highest frequency density.
Mean: Good average- uses all your data
Heavily affected by outliers
Only used with quantitative data
Median: Not affected by outliers- good to use when you have outliers
Mode: Can be used with qualitative data
Some data sets have more than one mode- not helpful

Histograms

Vertical Axis-Frequency Density

Frequency Density- Frequency Divided by Width of Class

NO GAPS BETWEEN COLUMNS

Find Upper and Lower Class Boundaries And Frequency Density
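
A small Python sketch of the frequency density calculation. The class boundaries and frequencies below are hypothetical, chosen only to show the arithmetic.

```python
def frequency_density(lower_boundary, upper_boundary, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper_boundary - lower_boundary)


# Hypothetical classes: (lower boundary, upper boundary, frequency)
classes = [(0.5, 5.5, 26), (5.5, 10.5, 17), (10.5, 20.5, 17)]
densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
print(densities)
```

The last two classes have the same frequency, but the wider class gets a shorter column because its frequency density is lower.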

Interquartile Range

Range-measure of dispersion

RANGE=HIGHEST VALUE-LOWEST VALUE

Quartiles divide the data into 4- 25% of the data is lower than the lower quartile and 75% is lower than the upper quartile.
Lower Quartile: n/4- if a whole number then it's the average of this term and the one above.
- if not a whole number just round the number up to find the position
Upper Quartile: 3n/4- if a whole number then it's the average of this term and the one above
- if not a whole number just round the number up to find the position.

Interquartile Range-measure of dispersion

INTERQUARTILE RANGE=UPPER QUARTILE-LOWER QUARTILE
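
The n/4 and 3n/4 rules can be sketched in Python as follows. The function names are made up, and the data is hypothetical. Other textbooks and software use slightly different quartile rules, so answers may differ a little from a calculator or spreadsheet.

```python
def quartile(values, k):
    """k = 1 for the lower quartile, k = 3 for the upper quartile (rule from the notes)."""
    data = sorted(values)
    n = len(data)
    if (k * n) % 4 == 0:
        # kn/4 is a whole number: average this term and the one above
        pos = k * n // 4
        return (data[pos - 1] + data[pos]) / 2
    # kn/4 is not a whole number: round up to find the position
    pos = -(-k * n // 4)   # integer ceiling of kn/4
    return data[pos - 1]


def interquartile_range(values):
    return quartile(values, 3) - quartile(values, 1)


data_8 = [1, 2, 3, 4, 5, 6, 7, 8]     # n/4 = 2 is whole, so average 2nd and 3rd terms
data_10 = list(range(1, 11))          # n/4 = 2.5, so round up to the 3rd term
print(quartile(data_8, 1), quartile(data_8, 3), interquartile_range(data_8))
print(quartile(data_10, 1), quartile(data_10, 3), interquartile_range(data_10))
```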

Percentiles- divide the data into 100 equal parts

Standard Deviation

Standard Deviation is the Square Root of the Variance

VARIANCE: (Ex^2 / n) - xbar^2 (E means 'sum of')
Basically find the mean (xbar), then find the sum of squares (Ex^2), divide it by n and subtract the mean squared.
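
A minimal Python sketch of the variance formula above (sum of squares over n, minus the mean squared). The data set is hypothetical.

```python
import math


def variance(values):
    """Variance = (sum of x squared) / n - (mean squared)."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


def standard_deviation(values):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(values))


heights = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical readings
print(variance(heights), standard_deviation(heights))
```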

Example:
The mean height of 10 boys is 180cm and the standard deviation is 10cm. The mean for 9 girls is 165cm and the standard deviation is 8cm. Find the mean and standard deviation for all 19 girls and boys.

Girls' height=y Boys' height=x
Mean boys =Ex/10=180 therefore Ex=1800
Mean girls=Ey/9=165 therefore Ey=1485
Sum of boy and girls heights=Ex+Ey=3285
Mean Height of girls and boys=3285/19=172.9cm
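
The working above stops at the combined mean. One way to finish the question is to rearrange the variance formula to get each group's sum of squares, Ex^2 = n*(sd^2 + mean^2), then use the variance formula on the combined totals. The Python sketch below follows that approach; the function name is made up for illustration.

```python
import math


def totals(n, mean, sd):
    """Recover the sum and sum of squares of a group from n, mean and standard deviation."""
    total = n * mean
    total_of_squares = n * (sd ** 2 + mean ** 2)   # rearranged variance formula
    return total, total_of_squares


sum_x, sum_x2 = totals(10, 180, 10)   # boys
sum_y, sum_y2 = totals(9, 165, 8)     # girls

n = 10 + 9
combined_mean = (sum_x + sum_y) / n
combined_variance = (sum_x2 + sum_y2) / n - combined_mean ** 2
combined_sd = math.sqrt(combined_variance)
print(round(combined_mean, 1), round(combined_sd, 1))
```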

Dispersion

If data is in a table, use mid-class values
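
As a rough sketch of the mid-class method in Python: the table below is hypothetical (it is not the sunflower data), and the answers are only estimates because every reading in a class is treated as sitting at the mid-class value.

```python
import math

# Hypothetical grouped data: (mid-class value x, frequency f)
table = [(155, 3), (165, 5), (175, 2)]

n = sum(f for _, f in table)                  # total frequency
sum_fx = sum(f * x for x, f in table)         # the fx column added up
sum_fx2 = sum(f * x * x for x, f in table)    # the fx^2 column added up

est_mean = sum_fx / n
est_variance = sum_fx2 / n - est_mean ** 2
est_sd = math.sqrt(est_variance)
print(est_mean, est_sd)
```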

Example: Heights of sunflowers in a garden were measured and recorded in table. Estimate the mean height and standard deviation.

Height    Mid-class (x)    x^2    f    fx    fx^2

150

Outliers

OUTLIERS fall OUTSIDE fences

An outlier lies a long way from the rest of the readings. To decide whether a value is an outlier you have to measure how far away from the rest of the data it is.

A data value is considered to be an outlier if it is:
More than 1.5 times the IQR ABOVE the upper quartile
More than 1.5 times the IQR BELOW the lower quartile

Example:
The lower and upper quartiles of a data set are 70 and 100. Decide if 30 and 210 are outliers.
IQR=Q3-Q1=100-70=30
Lower fence=70-(1.5*30)=25
Upper fence=100+(1.5*30)=145
Therefore 30 is inside the fences and is NOT an outlier, whereas 210 IS an outlier.
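
The fence check in this example can be sketched in Python. The function names are made up for illustration; the quartiles and test values are the ones from the example.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence), each k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    """A value is an outlier if it falls outside the fences."""
    lower, upper = fences(q1, q3)
    return value < lower or value > upper


lower_fence, upper_fence = fences(70, 100)
print(lower_fence, upper_fence)
print(is_outlier(30, 70, 100), is_outlier(210, 70, 100))
```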

Outliers affect which measure of dispersion is best to use.
They affect whether the variance and standard deviation are good measures of dispersion.
Outliers can make the variance much larger than it would otherwise be, giving the outliers more influence.
If a data set contains outliers, then a better measure of dispersion is the Interquartile Range.

Coding

Makes numbers much easier to work with.

You usually change your original variable, x, to an easier one to work with, y
y=(x-b)/a

Example:
Find mean and standard deviation of 1000020, 1000040, 1000010, 1000050
1) SUBTRACT a million from every reading leaving 20, 40, 10, 50
2) DIVIDE by 10 to give 2,4,1,5
3) So y=(x-1000000)/10. Then ybar=(xbar-1000000)/10 and sy=sx/10
4) Find mean and standard deviation of y values:
ybar=(2+4+1+5)/4=3 sy=1.58 to 3 sig fig
5) Then use formulas to find the mean and standard deviation of original values:
xbar=10ybar+1000000=(10*3)+1000000=1000030 sx=10sy=10*1.58=15.8
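
The coding steps above can be checked with a short Python sketch, using a=10 and b=1000000 in y=(x-b)/a. The helper name mean_and_sd is made up for illustration.

```python
import math


def mean_and_sd(values):
    """Mean and standard deviation using (sum of squares)/n - mean squared."""
    n = len(values)
    mean = sum(values) / n
    var = sum(v * v for v in values) / n - mean ** 2
    return mean, math.sqrt(var)


a, b = 10, 1_000_000
xs = [1000020, 1000040, 1000010, 1000050]

ys = [(x - b) / a for x in xs]        # coded values: 2, 4, 1, 5
y_bar, s_y = mean_and_sd(ys)

x_bar = a * y_bar + b                 # undo the coding for the mean
s_x = a * s_y                         # subtracting b does not change the spread
print(x_bar, round(s_x, 1))
```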

Skewness

Tells you whether your data is symmetrical or lopsided
Mean=Median=Mode Gives Symmetrical Data
Tail on the left. Most data values on higher side.
Gives Negative Skewness

Tail on the right. Most data values on lower side.
Gives Positive Skewness

MEAN-MODE is approximately 3*(MEAN-MEDIAN)

Pearson's Coefficient of Skewness=3(Mean-Median)/Standard Deviation

Quartile Coefficient of Skewness=(Q3-2Q2+Q1)/(Q3-Q1)

Q3-Q2=Q2-Q1 Skewness is Zero
Q3-Q2<Q2-Q1 Negative Skewness
Q3-Q2>Q2-Q1 Positive Skewness
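
A small Python sketch of the two skewness checks on this card. The function names and the numbers passed in are hypothetical, chosen only to show each case.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


def quartile_skew_direction(q1, q2, q3):
    """Compare Q3-Q2 with Q2-Q1 to decide the direction of skew."""
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    if upper_gap > lower_gap:
        return "positive"
    if upper_gap < lower_gap:
        return "negative"
    return "zero"


print(quartile_skew_direction(10, 20, 30))   # equal gaps
print(quartile_skew_direction(10, 12, 30))   # longer gap above the median
print(quartile_skew_direction(10, 28, 30))   # longer gap below the median
print(pearson_skewness(50, 50, 10))          # mean equals median
```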

Comparing Distributions

Box Plots- Visual Summary of a Distribution
Show median and quartiles

Use Locations, Dispersion and Skewness to Compare Distributions
Example:
Calculator Paper          Non-Calculator Paper
40              Q1        35
58              Q2        42
70              Q3        56
55              MEAN      46.1
21.2            S.D.      17.8
Calculate Skewness for each paper and comment on location and dispersion.
PEARSONS SKEWNESS:
3*(55-58)/21.2=-0.425 Calc Paper 3*(46.1-42)/17.8=0.691 Non Calc
QUARTILE SKEWNESS:
(70-2(58)+40)/30=-0.2 Calc Paper (56-2(42)+35)/21=0.333 Non Calc
Non Calc= Positive Skewness Calc Paper=Negative Skewness
LOCATIONS: Mean, Median and Quartiles all higher in Calc Paper.
DISPERSION: IQR for Calc Paper=30 IQR for Non-Calc Paper=21
IQR and Standard Deviation both higher for Calc, therefore more spread out.
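
The figures in this comparison can be reproduced with a short Python sketch. The dictionary layout and key names are made up for illustration; the numbers are the ones from the table.

```python
papers = {
    "Calculator":     {"q1": 40, "q2": 58, "q3": 70, "mean": 55,   "sd": 21.2},
    "Non-calculator": {"q1": 35, "q2": 42, "q3": 56, "mean": 46.1, "sd": 17.8},
}

results = {}
for name, s in papers.items():
    iqr = s["q3"] - s["q1"]
    results[name] = {
        # Pearson: 3 * (mean - median) / standard deviation
        "pearson": 3 * (s["mean"] - s["q2"]) / s["sd"],
        # Quartile coefficient: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)
        "quartile": (s["q3"] - 2 * s["q2"] + s["q1"]) / iqr,
        "iqr": iqr,
    }

for name, r in results.items():
    print(name, round(r["pearson"], 3), round(r["quartile"], 3), r["iqr"])
```

Both coefficients agree on the sign for each paper, which supports the conclusion of negative skew for the calculator paper and positive skew for the non-calculator paper.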
