Stats Unit 1
- Created by: Meg
- Created on: 10-03-13 16:55
Mean Mode Median
MEAN: add up all the x values and divide by the number of values, n.
MEDIAN: the middle data value when all the data values are placed in order of size.
MODE: the most frequently occurring data value.
MEDIAN position: (n+1)/2
If n/2 is a whole number (n even), the median is the AVERAGE of the (n/2)th term and the one above it.
If n/2 is not a whole number (n odd), ROUND it up to find the position of the median.
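The position rule above can be sketched in Python as follows. The function name median is just an illustrative choice. Positions are counted from 1, so the kth term in order is data[k - 1].

```python
import math

def median(values):
    """Median using the n/2 position rule from these notes."""
    data = sorted(values)
    n = len(data)
    half = n / 2
    if half.is_integer():
        k = int(half)
        # n even: average the (n/2)th term and the one above it
        return (data[k - 1] + data[k]) / 2
    # n odd: round n/2 up to get the position of the median
    return data[math.ceil(half) - 1]

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```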
Locations
Linear Interpolation: assumes the values in each class are evenly spread.
Example:
The median position for the 60 trees is (60+1)/2 = 30.5.
So the median is the 30.5th reading. Your running total tells you the median must be in the '6-10' class. Now you have to assume that all the readings in this class are evenly spread.
There are 26 trees before the 6-10 class, so the 30.5th tree is the 4.5th value in this class.
The class contains 17 readings, so divide it into 17 equally wide parts and assume there's a reading at the end of each part. You want the 4.5th reading, which lies 4.5 * (width of each part) above the lower class boundary.
So estimated median= 5.5+(4.5*5/17) =6.8mm
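The interpolation can be sketched as a small Python function. The name interpolate and its parameters are illustrative. The inputs come from the worked example: the 6-10 class has lower boundary 5.5 and width 5, there are 26 trees before it, and it holds 17 readings.

```python
def interpolate(lower_boundary, class_width, cum_freq_before, class_freq, position):
    """Estimate the reading at `position` inside a grouped class,
    assuming the class_freq readings are evenly spread across it."""
    readings_into_class = position - cum_freq_before   # e.g. 30.5 - 26 = 4.5
    part_width = class_width / class_freq              # width of each equal part
    return lower_boundary + readings_into_class * part_width

n = 60
position = (n + 1) / 2                                 # 30.5
estimate = interpolate(5.5, 5, 26, 17, position)
print(round(estimate, 1))                              # 6.8
```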
The modal class is the class with the highest frequency density.
Mean: a good average because it uses all your data
Heavily affected by outliers
Only used with quantitative data
Median: not affected by outliers, so it is a good choice when the data contain outliers
Mode: can be used with qualitative data
Some data sets have more than one mode, which is not helpful
Histograms
Vertical Axis: Frequency Density
Frequency Density = Frequency / Class Width
NO GAPS BETWEEN COLUMNS
Find the Upper and Lower Class Boundaries and the Frequency Density of each class.
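Here is a short Python sketch of frequency density, which also picks out the modal class as the one with the highest frequency density. The classes below are hypothetical (1-5, 6-10 and 11-20 for whole-number data, written as class boundaries) and do not come from these notes.

```python
def frequency_densities(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency) tuples."""
    return [f / (upper - lower) for lower, upper, f in classes]

def modal_class(classes):
    """The class with the highest frequency density (not the highest frequency)."""
    densities = frequency_densities(classes)
    return classes[densities.index(max(densities))]

# Hypothetical data: classes 1-5, 6-10, 11-20 as boundaries, with frequencies
classes = [(0.5, 5.5, 12), (5.5, 10.5, 15), (10.5, 20.5, 24)]
print(frequency_densities(classes))  # [2.4, 3.0, 2.4]
print(modal_class(classes))          # (5.5, 10.5, 15)
```

The widest class has the largest frequency, but its frequency density is lower, so it is not the modal class.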
Interquartile Range
Range: a measure of dispersion
RANGE = HIGHEST VALUE - LOWEST VALUE
Quartiles divide the data into 4: 25% of the data are lower than the lower quartile and 75% are lower than the upper quartile.
Lower Quartile: n/4. If this is a whole number, the lower quartile is the average of this term and the one above.
- if not a whole number, just round the number up to find the position
Upper Quartile: 3n/4. If this is a whole number, the upper quartile is the average of this term and the one above.
- if not a whole number, just round the number up to find the position
Interquartile Range: a measure of dispersion
INTERQUARTILE RANGE = UPPER QUARTILE - LOWER QUARTILE
Percentiles: divide the data into 100 equal parts.
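The quartile rules above can be sketched in Python. The helper names value_at and quartiles_and_iqr are illustrative. The same "whole number: average with the next term, otherwise round up" rule is applied to the positions n/4 and 3n/4.

```python
import math

def value_at(data, position):
    """Apply the notes' rule to a position such as n/4 or 3n/4 (counted from 1)."""
    if float(position).is_integer():
        k = int(position)
        return (data[k - 1] + data[k]) / 2   # average this term and the one above
    return data[math.ceil(position) - 1]     # round the position up

def quartiles_and_iqr(values):
    data = sorted(values)
    n = len(data)
    q1 = value_at(data, n / 4)
    q3 = value_at(data, 3 * n / 4)
    return q1, q3, q3 - q1

print(quartiles_and_iqr(range(1, 11)))  # (3, 8, 5)
```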
Standard Deviation
Standard Deviation is the Square Root of the Variance
VARIANCE = (Σx² / n) - xbar²
Basically, find the mean (xbar), then find the sum of squares Σx². Divide Σx² by n and subtract xbar².
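The variance formula can be checked with a minimal Python sketch. The function names are illustrative. It uses the divide-by-n form from these notes, which matches Python's statistics.pvariance rather than statistics.variance (the latter divides by n - 1).

```python
import math
import statistics

def variance(values):
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum(x * x for x in values)   # Σx²
    return sum_sq / n - mean ** 2         # Σx²/n - xbar²

def standard_deviation(values):
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), standard_deviation(data))  # 4.0 2.0
print(statistics.pvariance(data))                # same divide-by-n value
```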
Example:
The mean height of 10 boys is 180cm and their standard deviation is 10cm. The mean height of 9 girls is 165cm and their standard deviation is 8cm. Find the mean and standard deviation for all 19 boys and girls.
Boys' heights = x, Girls' heights = y
Mean for boys = Σx/10 = 180, therefore Σx = 1800
Mean for girls = Σy/9 = 165, therefore Σy = 1485
Sum of boys' and girls' heights = Σx + Σy = 3285
Mean height of boys and girls = 3285/19 = 172.9cm
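The worked example stops at the combined mean, although the question also asks for the standard deviation. The sketch below extends the same "find the totals" idea to Σx². Rearranging VARIANCE = Σx²/n - xbar² gives Σx² = n(sd² + xbar²) for each group. The function name combine_groups is illustrative. With these figures the combined standard deviation comes out at about 11.8cm.

```python
import math

def combine_groups(groups):
    """groups: list of (n, mean, sd) tuples, with sd in the divide-by-n sense."""
    total_n = sum(n for n, _, _ in groups)
    total_sum = sum(n * mean for n, mean, _ in groups)                  # Σx + Σy
    # Σx² = n(sd² + mean²), from variance = Σx²/n - mean²
    total_sum_sq = sum(n * (sd ** 2 + mean ** 2) for n, mean, sd in groups)
    mean = total_sum / total_n
    sd = math.sqrt(total_sum_sq / total_n - mean ** 2)
    return mean, sd

mean, sd = combine_groups([(10, 180, 10), (9, 165, 8)])
print(round(mean, 1), round(sd, 1))  # 172.9 11.8
```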
Dispersion
If the data are grouped in a table, use the mid-class values as x.
Example: Heights of sunflowers in a garden were measured and recorded in table. Estimate the mean height and standard deviation.
Columns: Height, Mid-class value (x), x², Frequency (f), fx, fx²
150
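The grouped-data method (using the totals Σf, Σfx and Σfx² from those columns) can be sketched in Python. The mid-class values and frequencies below are made-up figures for illustration only, not the sunflower measurements.

```python
import math

def grouped_mean_sd(mid_values, frequencies):
    """Estimate the mean and SD of grouped data from mid-class values."""
    sum_f = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(mid_values, frequencies))        # Σfx
    sum_fx2 = sum(f * x * x for x, f in zip(mid_values, frequencies))   # Σfx²
    mean = sum_fx / sum_f
    sd = math.sqrt(sum_fx2 / sum_f - mean ** 2)
    return mean, sd

# Hypothetical table: mid-class heights (cm) and frequencies
mean, sd = grouped_mean_sd([155, 165, 175], [2, 5, 3])
print(mean, sd)  # 166.0 7.0
```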
Outliers
OUTLIERS fall OUTSIDE the fences
An outlier lies a long way from the rest of the readings. To decide whether a value is an outlier, you have to measure how far away from the rest of the data it is.
A data value is considered to be an outlier if it is:
More than 1.5 times the IQR ABOVE the upper quartile
More than 1.5 times the IQR BELOW the lower quartile
Example:
The lower and upper quartiles of a data set are 70 and 100. Decide whether 30 and 210 are outliers.
IQR = Q3 - Q1 = 100 - 70 = 30
Lower fence = 70 - (1.5*30) = 25
Upper fence = 100 + (1.5*30) = 145
Therefore 30 lies inside the fences and is NOT an outlier, whereas 210 IS an outlier.
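The fence test from this example can be sketched in Python. The function names fences and is_outlier are illustrative. The multiplier k defaults to the 1.5 used in these notes.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fences: k * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3):
    low, high = fences(q1, q3)
    return x < low or x > high

print(fences(70, 100))           # (25.0, 145.0)
print(is_outlier(30, 70, 100))   # False
print(is_outlier(210, 70, 100))  # True
```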
Outliers affect which measure of dispersion is best to use.
They affect whether the variance and standard deviation are good measures of dispersion.
They can make the variance much larger than it would otherwise be, because squaring the deviations gives outliers more influence.
If a data set contains outliers, a better measure of dispersion is the Interquartile Range.
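Here is a small Python comparison that illustrates the point. The two data sets are hypothetical, and the second simply replaces the largest value with an outlier. The standard deviation (divide-by-n, via statistics.pstdev) jumps from about 3.0 to about 28.5, while the IQR (found with the n/4 and 3n/4 rules from these notes) stays at 4.5.

```python
import math
import statistics

def iqr(values):
    """IQR using the n/4 and 3n/4 position rules from these notes."""
    data = sorted(values)
    n = len(data)

    def value_at(pos):
        if float(pos).is_integer():
            k = int(pos)
            return (data[k - 1] + data[k]) / 2
        return data[math.ceil(pos) - 1]

    return value_at(3 * n / 4) - value_at(n / 4)

# Hypothetical data, then the same data with the largest value replaced by an outlier
clean = [10, 12, 13, 14, 15, 16, 18, 20]
with_outlier = [10, 12, 13, 14, 15, 16, 18, 100]

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(label, "SD:", round(statistics.pstdev(data), 2), "IQR:", iqr(data))
```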
Coding
Coding makes numbers much easier to work with.
You usually change your original variable, x, to an easier one to work with, y:
y=(x-b)/a
Example:
Find the mean and standard deviation of 1000020, 1000040, 1000010, 1000050
1) SUBTRACT a million from every reading, leaving 20, 40, 10, 50
2) DIVIDE by 10 to give 2, 4, 1, 5
3) So y = (x-1000000)/10. Then ybar = (xbar-1000000)/10 and sy = sx/10
4) Find the mean and standard deviation of the y values:
ybar = (2+4+1+5)/4 = 3, sy = 1.58 to 3 sig fig
5) Then use the formulas to find the mean and standard deviation of the original values:
xbar = 10ybar + 1000000 = (10*3) + 1000000 = 1000030, sx = 10sy = 10*1.58 = 15.8
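The coding steps can be checked in Python, with a = 10 and b = 1000000 as in the example. statistics.pstdev is used because these notes divide by n. Subtracting b shifts the mean but leaves the spread unchanged, so only the division by a has to be undone for the standard deviation.

```python
import statistics

xs = [1000020, 1000040, 1000010, 1000050]
a, b = 10, 1000000

ys = [(x - b) / a for x in xs]      # coded values: 2, 4, 1, 5
y_mean = statistics.mean(ys)        # ybar = 3
s_y = statistics.pstdev(ys)         # sy, about 1.58

x_mean = a * y_mean + b             # xbar = 10ybar + 1000000
s_x = a * s_y                       # sx = 10sy; subtracting b does not change spread
print(x_mean, round(s_x, 1))        # 1000030.0 15.8
```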
Skewness
- Tells you whether your data is symmetrical or lopsided
- Mean = Median = Mode: Symmetrical Data
- Tail on the left, most data values on the higher side: Negative Skewness
- Tail on the right, most data values on the lower side: Positive Skewness
MEAN - MODE ≈ 3*(MEAN - MEDIAN) (an approximate rule for moderately skewed data)
Pearson's Coefficient of Skewness = 3(Mean - Median)/Standard Deviation
Quartile Coefficient of Skewness = (Q3 - 2Q2 + Q1)/(Q3 - Q1)
- Q3-Q2 = Q2-Q1: Skewness is Zero
- Q3-Q2 < Q2-Q1: Negative Skewness
- Q3-Q2 > Q2-Q1: Positive Skewness
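Both coefficients can be written as small Python functions. The names are illustrative. The sign of each coefficient gives the direction of skew, matching the three rules listed above.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient: 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

def quartile_skewness(q1, q2, q3):
    """(Q3 - 2Q2 + Q1) / (Q3 - Q1), i.e. ((Q3 - Q2) - (Q2 - Q1)) / IQR."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)

def describe(coefficient):
    if coefficient > 0:
        return "positive skewness"
    if coefficient < 0:
        return "negative skewness"
    return "zero skewness"

print(describe(quartile_skewness(10, 15, 30)))  # positive skewness
```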
Comparing Distributions
Box Plots- Visual Summary of a Distribution
Show the median and quartiles
Use Locations, Dispersion and Skewness to Compare Distributions
Example:
Statistic   Calculator Paper   Non-Calculator Paper
Q1          40                 35
Q2          58                 42
Q3          70                 56
Mean        55                 46.1
S.D.        21.2               17.8
Calculate Skewness for each paper and comment on location and dispersion.
PEARSONS SKEWNESS:
3*(55-58)/21.2=-0.425 Calc Paper 3*(46.1-42)/17.8=0.691 Non Calc
QUARTILE SKEWNESS:
(70-2(58)+40)/30 = -0.2 Calc Paper (56-2(42)+35)/21 = 0.333 Non Calc
Non Calc= Positive Skewness Calc Paper=Negative Skewness
LOCATIONS: Mean, median and quartiles are all higher for the Calc Paper.
DISPERSION: IQR for Calc Paper = 30, IQR for Non-Calc Paper = 21
IQR and standard deviation are both higher for the Calc Paper, so its results are more spread out.
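The whole comparison can be reproduced in Python from the summary table. The dictionary layout is just an illustrative choice. It recomputes both skewness coefficients and the IQR for each paper, treating Q2 as the median.

```python
papers = {
    "Calc": {"q1": 40, "q2": 58, "q3": 70, "mean": 55, "sd": 21.2},
    "Non-Calc": {"q1": 35, "q2": 42, "q3": 56, "mean": 46.1, "sd": 17.8},
}

results = {}
for name, s in papers.items():
    pearson = 3 * (s["mean"] - s["q2"]) / s["sd"]   # Q2 is the median
    quartile = (s["q3"] - 2 * s["q2"] + s["q1"]) / (s["q3"] - s["q1"])
    iqr = s["q3"] - s["q1"]
    results[name] = (pearson, quartile, iqr)
    print(f"{name}: Pearson {pearson:.3f}, quartile {quartile:.3f}, IQR {iqr}")
```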