# AQRM part 1

• Created by: charlie
• Created on: 21-05-17 10:51
What does OLS estimator estimate?
the coefficients B1,B2,...,Bk of a linear regression model Y=B1+B2X2+...+BkXk+ui, using a random sample of observations (Yi, X2i,...,Xki)
1 of 82
Deterministic and random component
deterministic component: B1 + B2X, the conditional mean of Y given X, E(Y|X); random component: ui (unobservable characteristics)
2 of 82
How do you compute OLS estimates?
choose b1, b2 to minimise the residual sum of squares sum(ei^2); in the two-variable model b2 = sum[(Xi - Xbar)(Yi - Ybar)] / sum[(Xi - Xbar)^2] and b1 = Ybar - b2*Xbar
3 of 82
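The computation can be sketched in numpy rather than Stata (a minimal illustration on hypothetical, noise-free data, not part of the deck):

```python
import numpy as np

# Minimal OLS sketch: the estimates minimise the sum of squared residuals,
# which np.linalg.lstsq solves directly.
def ols(X, y):
    X = np.column_stack([np.ones(len(y)), X])  # prepend intercept column for B1
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]  # no error term, so exact recovery
b = ols(X, y)
```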
population regression function characteristics: Yi, E(Y|X), ui
Yi = E(Y|X) + ui = B1 + B2Xi + ui; E(Y|X) is the population regression line and ui the unobservable error term
4 of 82
sample regression function characteristics: Yi, Y(hat), ei
Yi = Yi(hat) + ei = b1 + b2Xi + ei; Yi(hat) is the fitted value and ei the residual
5 of 82
3 desirable properties of OLS
1. consistent (larger samples get closer to the true parameter) 2. unbiased (for any sample size, the average value across repeated samples equals the true parameter, E(bk)=Bk) 3. minimum variance
6 of 82
b1 and b2 (estimates) characteristics
have a multivariate normal distribution centred at (B1,B2), where the variances/covariances depend on: 1. sample size 2. unknown population characteristics
7 of 82
standard error equation (relation to s.d and sample size)
se(b2) = s.d.(u) / sqrt(sum[(Xi - Xbar)^2]); proportional to the error s.d. and shrinking roughly with 1/sqrt(n)
8 of 82
OLS 7 assumptions
1. Linear in coefficients 2. regressors fixed/ non-stochastic (not random) 3. exogeneity 4. homoscedasticity 5. uncorrelated errors 6. no multicollinearity 7. normality of error term
9 of 82
OLS 7 assumptions: (1)
linear in the regression coefficients B1,B2,... (NOT necessarily in the regressors X1,X2,...)
10 of 82
OLS 7 assumptions: (2)
Regressors are fixed/ non-stochastic (not random)
11 of 82
OLS 7 assumptions: (3)
Exogeneity holds (error term is uncorrelated with the regressors)
12 of 82
OLS 7 assumptions: (4)
Homoscedasticity (error term has constant variance)
13 of 82
OLS 7 assumptions: (5)
Uncorrelated error terms
14 of 82
OLS 7 assumptions: (6)
No multicollinearity (no exact linear relationship between regressors - each Xi has to take at least 2 different values)
15 of 82
OLS 7 assumptions: (7)
Normality of error term (ui has a normal distribution centred at 0 with variance sigma^2)
16 of 82
if OLS 7 assumptions hold, OLS estimator is...
BLUE (best linear unbiased estimator): amongst linear unbiased estimators (E(bk)=Bk), OLS has minimum variance
17 of 82
t-ratio equation
t = (bk - Bk*) / se(bk), where Bk* is the value under H0 (usually 0); compare to the t-distribution with n - k d.f.
18 of 82
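The standard errors and t-ratios can be sketched in numpy on simulated data (assumed example, not from the deck): var(b) = s^2 (X'X)^-1 with s^2 = RSS/(n - k), and tk = bk/se(bk) for H0: Bk = 0.

```python
import numpy as np

# OLS coefficients, standard errors and t-ratios (sketch on simulated data).
def ols_t_ratios(X, y):
    X = np.column_stack([np.ones(len(y)), X])   # intercept column
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)       # b = (X'X)^-1 X'y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)                # s^2 = RSS / (n - k)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se, b / se                        # t-ratio for H0: Bk = 0

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
b, se, t = ols_t_ratios(X, y)
```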
Type 1 error
probability of incorrectly rejecting the null hypothesis when it is true (rate = significance level alpha)
19 of 82
Type 2 error
probability that we fail to reject the null hypothesis when it is false (rate = beta; power = 1 - beta) - more likely when the true B2 is small
20 of 82
p-value
probability of observing a result as extreme as, or more extreme than, the one observed when H0 is true
21 of 82
confidence interval equation
bk +/- t(critical) x se(bk); an interval with coverage probability (1-alpha) includes the true parameter with probability (1-alpha) across repeated samples (reject H0 if the hypothesised value lies outside)
22 of 82
testing some other value equation
t = (bk - c) / se(bk) to test H0: Bk = c for a hypothesised value c other than 0
23 of 82
dummy variables (large changes)
qualitative in nature, quantified by taking the value 0 or 1; compared to the base category (the coefficient is the difference between them)
24 of 82
multiple linear regression
include important confounders, as ignoring them will bias the estimates (omitted variable bias)
25 of 82
OVB when correlated with independent variables (causes endogeneity)
not including a variable that belongs in the regression model causes the error term to be correlated with X (endogeneity), so the coefficients on X are over/underestimated
26 of 82
OVB when uncorrelated with independent variables
OLS will still produce unbiased estimates
27 of 82
non-linear effects (small changes)
captured by adding a new regressor (X^2 or X^3); the marginal effect of X then varies with its level (small changes)
28 of 82
interactions
captured by adding a new variable (X1X2) which captures the possibility that the effect of one regressor on the outcome varies with the level of another regressor (the regressors' effects on the outcome are non-additive)
29 of 82
interactions including non linear effects
.
30 of 82
functional form
Cobb-Douglas production function, take logs to get linear coefficients (OLS assumption 1)
31 of 82
functional form: log-log
coefficients are elasticities, % changes (relative), good for large values (monetary). log(1)=0
32 of 82
functional form: log-lin
coefficients are semi-elasticities: % change (relative) in Y for a unit change (absolute) in X; multiply the coefficient by 100 for an approximate % change (fine for small coefficients), or use 100 x (exp(coef) - 1) for the exact % change when the coefficient is large
33 of 82
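An illustrative calculation (the coefficient value is assumed, not from the deck) of the two log-lin conversion rules:

```python
import math

# Hypothetical log-lin model, e.g. log(wage) = B1 + B2*D + u, with b2 = 0.25.
b = 0.25

approx_pct = 100 * b                 # x100 rule: fine for small coefficients
exact_pct = 100 * (math.exp(b) - 1)  # exact % change for larger coefficients
```

Here the x100 rule gives 25%, while the exact change is about 28.4%, which is why the exponential form matters once the coefficient is large.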
functional form: lin-log
unit change (absolute) in Y for a % change (relative) in X; divide the coefficient by 100 to get the effect of a 1% change in X
34 of 82
functional form: lin-lin
unit changes (absolute), take note of units that they're in
35 of 82
interpreting coefficients
always ceteris paribus (keeping other variables constant)
36 of 82
including reciprocal/ inverse regressor
as X increases, the B2/X term tends to 0 and Yi tends to B1 (asymptote); the slope is the derivative of the reciprocal, -B2/X^2 (so the slope is -ve if the coefficient B2 is +ve)
37 of 82
Dummy variables: structural change
include a dummy = 1 after the change (= 0 before); intercept shifts from B1 to B1+B3 (how much Yi changes after the structural change for the same Xi); include an interaction (dummy x Xi) to allow a slope change from B2 to B2+B4
38 of 82
Dummy variables: structural change (2) issues
1. there may be more than 1 structural change 2. don't want to include too many dummy variables (uses up degrees of freedom (n-k) needed for statistical significance)
39 of 82
Dummy variables: seasonally adjust data (explains whats unexpected)
include (m-1) dummies to remove seasonal effects from data (CPI, ...); each = 1 for that time period (= 0 otherwise); all dummy coefficients are measured against the baseline/reference category (the intercept)
40 of 82
Dummy variables: dummy variable trap
distinguish between m categories, can only introduce (m-1) dummy variables if regression model includes an intercept (baseline/reference category) to avoid multicollinearity (get unique set of parameters allowing comparison)
41 of 82
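A small numpy sketch (made-up quarterly data, illustrative only) of building m - 1 dummies to avoid the trap:

```python
import numpy as np

# m = 4 quarters; with an intercept, include only m - 1 = 3 dummies.
seasons = np.array(["Q1", "Q2", "Q3", "Q4"] * 5)   # hypothetical data
categories = ["Q1", "Q2", "Q3", "Q4"]

# Drop the first category ("Q1") as the baseline/reference; each remaining
# dummy coefficient is then measured relative to Q1.
dummies = np.column_stack([(seasons == c).astype(float) for c in categories[1:]])
```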
Dummy variables: how to seasonally adjust the data (2 steps to smoothen)
(1) run regression with dummy variables + obtain residuals (2) add sample mean to residuals (brings back to same scale for comparison)
42 of 82
Residuals
residuals ei are estimates of the unobservable error terms ui; the difference between Yi and Yi(hat) (the fitted values predicted by OLS)
43 of 82
when adding new variables to see their effect on Y, they will automatically be seasonally adjusted by the dummies already in the regression (no need to seasonally adjust separately and run a de-seasonalised combined regression)
44 of 82
testing for linear combinations (lincom after regression)
tests hypotheses involving multiple coefficients using the correct s.e. (can't combine individual s.e.s as the coefficients could be correlated), e.g. the null hypothesis that Cobb-Douglas has CRS, or: is the gender wage gap at 12 years statistically significant? (interaction coefficient); then use the normal t-ratio
45 of 82
coefficient of determination: ESS (model/explained sum of squares)
EXPLAINED VARIATION, ESS = sum[(Yi(hat) - Ybar)^2] (outcome variation the regressors explain)
46 of 82
coefficient of determination: RSS (residual sum of squares)
UNEXPLAINED VARIATION, RSS = sum(ei^2) (outcome variation the regression doesn't explain)
47 of 82
coefficient of determination: TSS (total sum of squares)
total variation of the dependent variable around its sample mean, TSS = sum[(Yi - Ybar)^2]; how well you could predict the outcome without any regressors
48 of 82
coefficient of determination: R^2
overall measure of goodness of fit of the estimated regression line (0 --> 1); R^2 = ESS/TSS = 1 - RSS/TSS, the proportion of total variation in the dependent variable explained by the regressors
49 of 82
coefficient of determination: effects of adding another variable
ESS = higher (more variation explained) RSS = lower (less unexplained) TSS = no change (same outcomes + sample mean) R^2 = higher (rises even if the added regressor is random)
50 of 82
coefficient of determination: adjusted R^2
takes into account the number of regressors in the model (penalises/ decreases for adding more random regressors), can be -ve
51 of 82
coefficient of determination: (2) problems with R^2
1. can't compare models with different units/forms of the dependent variable (relative vs absolute) 2. doesn't answer research questions (just tells us how PRECISELY we can predict the outcome)
52 of 82
F-stat
tests the joint significance of all coefficients (null hypothesis H0 that none of the coefficients matter); compare to F-tables/ Prob>F
53 of 82
F-stat: relationship to R^2 equation
F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k))
54 of 82
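The relationship can be checked numerically (the R^2, n and k values below are assumed for illustration):

```python
# F-stat from R^2: F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k)).
def f_from_r2(r2, n, k):
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# e.g. R^2 = 0.5 with n = 103 observations and k = 3 parameters gives F = 50
f = f_from_r2(0.5, 103, 3)
```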
heteroscedasticity: definition (against assump (4) of CLRM)
the ui's have unequal variance across observations (e.g. people with 20 years of education will have different variation in wages than people with 10 years of education)
55 of 82
heteroscedasticity: consequence
OLS (good): still consistent + unbiased; OLS (bad): incorrect s.e. (so incorrect hypothesis tests), OLS no longer BLUE (inefficient, not minimum variance)
56 of 82
heteroscedasticity: detection/ test
(small samples): graphically, plot squared residuals against regressors/fitted values and look for correlation; (large samples): Breusch-Pagan + White tests
57 of 82
heteroscedasticity: using res2
OLS assumes the error term is uncorrelated with the regressors (exogeneity), so under homoscedasticity a regression of squared residuals on the regressors should give coefficients of 0
58 of 82
heteroscedasticity: Breusch-Pagan test (manual 4 stage regression)
(1) regress Yi on the X's (2) compute fitted values and obtain residuals (3) regress squared residuals on the fitted values (4) check the F-stat for H0: no relationship between the squared residuals and the fitted values (homoscedasticity)
59 of 82
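The four manual stages can be sketched in numpy (simulated heteroscedastic data; an illustration only, with critical values omitted):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * x   # error s.d. grows with x

# (1) regress y on x  (2) compute fitted values and obtain residuals
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b
resid = y - fitted

# (3) regress squared residuals on the fitted values
e2 = resid**2
Z = np.column_stack([np.ones(n), fitted])
g = np.linalg.lstsq(Z, e2, rcond=None)[0]
e2_hat = Z @ g

# (4) F-stat for H0: fitted values don't explain resid^2 (homoscedasticity)
r2 = 1 - np.sum((e2 - e2_hat) ** 2) / np.sum((e2 - e2.mean()) ** 2)
f_stat = (r2 / 1) / ((1 - r2) / (n - 2))
```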
heteroscedasticity: White test
picks up more complicated cases of heteroscedasticity (non-linear and interactive effects on variance of ui's), add squares of X's and interactions
60 of 82
heteroscedasticity: Stata (shortcut 2 stage regression)
(1) regress Yi on Xi (2) estat hettest, fstat (look at the F-stat/ Prob>F)
61 of 82
heteroscedasticity: tests are no proof of homoscedasticity (2 reasons)
(1) detecting heteroscedasticity may require a larger sample size (2) you have to guess how the errors are related to the X's to construct the test (linear/ non-linear)
62 of 82
heteroscedasticity: 3 solutions
(1) change specification (e.g. in a log model the relative variances of the ui's may be similar) (2) compute heteroscedasticity-robust s.e. (same coefficients) (3) use the WLS estimator (divide each observation by its s.d., then use OLS)
63 of 82
heteroscedasticity: WLS (2 stages)/ results/ limitations (not widely used)
(1) divide each observation by its error s.d. (2) use OLS on the transformed data/ results: noisy observations get less weight, precise observations more/ limitations: true variances unknown, so you have to estimate how the variance is related to the regressors
64 of 82
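A WLS sketch in numpy (assuming, purely for illustration, that the error s.d. is known and proportional to x):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 10, size=n)
sd = x                                   # assumed: s.d. of u_i proportional to x
y = 1.0 + 2.0 * x + rng.normal(size=n) * sd

# (1) divide each observation (all columns and y) by its error s.d.
X = np.column_stack([np.ones(n), x])
Xw = X / sd[:, None]                     # noisy observations get less weight
yw = y / sd

# (2) apply OLS to the transformed data
b_wls = np.linalg.lstsq(Xw, yw, rcond=None)[0]
```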
perfect multicollinearity: definition
perfect linear relationship between 2 or more regressors (1 regressor = linear combination of others)
65 of 82
perfect multicollinearity: (2) consequences
(1) cannot estimate coefficient (2) effects of variables on outcome cannot be separated (move together)
66 of 82
perfect multicollinearity: detection/ test + solution
Stata automatically detects + drops a variable (= 0)/ only need m-1 variables (e.g. dummy-variable seasonal adjustment), since the omitted category's effect is captured by the baseline
67 of 82
imperfect multicollinearity: definition
one regressor (independent variable) is approximately equal to a linear combination of the others plus a small error term vi
68 of 82
imperfect multicollinearity: (2) consequences
(1) harder (but, unlike perfect multicollinearity, not impossible) to make statistical inferences (2) OLS still BLUE but estimates have larger variances + covariances (tests find it harder to reject the null/ estimates sensitive to small changes)
69 of 82
imperfect multicollinearity: (5) detection/ test
in Stata add variation ('noise') to a perfectly collinear regression, then detect by (1) R^2 (2) pairwise correlation (3) partial correlation coef (4) auxiliary regressions (5) variance inflation factors
70 of 82
imperfect multicollinearity: detection (1) R^2 (2) pairwise correl (3) partial correl coef
(1) high R^2 with few significant coefficients (2) high pairwise correlation (correlate...) - but only detects collinearity between 2 variables (3) high partial correlation coefficients - the correlation between the residuals of 2 regressions (run 2 regressions, then pcorr for all variables)
71 of 82
imperfect multicollinearity: detection (4) auxiliary regression (if others insufficient)
regress each regressor on the remaining regressors (takes time) and check R^2 (high = one regressor is almost perfectly explained by a linear combination of the others)
72 of 82
imperfect multicollinearity: detection (5) VIF (larger s.e)
degree to which the variance of an OLS coefficient is 'inflated' due to collinearity of Xi with the other X's (estat vif); = 1 if uncorrelated, > 10 indicates a high degree of multicollinearity
73 of 82
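The VIF via the auxiliary-regression route, sketched in numpy on constructed near-collinear data (illustrative only):

```python
import numpy as np

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the others.
def vif(X, j):
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(X)), others])
    g = np.linalg.lstsq(Z, target, rcond=None)[0]
    resid = target - Z @ g
    r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)    # nearly perfect collinearity
X = np.column_stack([x1, x2])
```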
imperfect multicollinearity: (3) solutions/ things to think about
(1) solve data deficiency problem (find data providing more independent variation) (2) careful about coef interpretation (some always move together) (3) rethink model (combine related variables into single index/avg.)
74 of 82
regression diagnostics: omission of relevant variables (serious)
if correlated with the X's (serious) = OVB (endogeneity); if uncorrelated with the X's (less serious) = estimates remain unbiased but the error variance is larger, so standard errors are larger
75 of 82
regression diagnostics: inclusion of irrelevant variables (less serious)
estimates still CONSISTENT and UNBIASED, but INEFFICIENT (fewer d.f.; could get more precise results without them), possible MULTICOLLINEARITY problems
76 of 82
Autocorrelation: definition
correlation between error terms in the model (correlated regressors are not a problem); in time series, u at time t is correlated with u at t-1, t-2, ...; CLRM assumes cov(ui, uj) = 0
77 of 82
Autocorrelation: (3) consequences
(1) OLS inefficient (similar to heterosced.) (2) OLS s.e. incorrect (usually underestimated = inflated t-ratios = Type 1 errors) (3) possible endogeneity problem (when a previous period's outcome is a regressor)
78 of 82
Autocorrelation: detection
run regression + obtain residuals then... (1) plot against time on line graph + observe (look for period of correlation) (2) plot against lagged residuals on scatter graph (look for correlation)
79 of 82
Autocorrelation: test (null hypothesis)
regress residuals on lagged residuals (null hypothesis of no autocorrelation, H0: coefficient on lagged residuals = 0); if we reject the null, there is evidence of autocorrelation
80 of 82
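The detection-and-test idea, sketched in numpy on simulated AR(1) errors (illustrative data, not from the deck):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + rng.normal()   # autocorrelated errors, rho = 0.8

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + u

# run the regression and obtain residuals
X = np.column_stack([np.ones(n), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# regress e_t on e_{t-1}: the slope estimates rho; H0 (no autocorrelation)
# says this slope is 0
Z = np.column_stack([np.ones(n - 1), resid[:-1]])
rho_hat = np.linalg.lstsq(Z, resid[1:], rcond=None)[0][1]
```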
Autocorrelation: solution
use Newey-West/HAC (heteroscedasticity- and autocorrelation-consistent) s.e. to correct the OLS s.e. (doesn't change the coefficients)
81 of 82
Autocorrelation: (3) types of autocorrelation
(1) time series (2) Clustering (X-section) (ui will be correlated amongst obs that belong to same group, solve using cluster-robust s.e) (3) spatial (obs geographically close may have same ui)
82 of 82
