* AED08.do  March 2015  For Stata version 12

log using AED08.txt, text replace

********** OVERVIEW OF AED08.do **********

* STATA Program
* copyright (C) 2015 by A. Colin Cameron
* Used for "Analysis of Economics Data: An Introduction to Econometrics"
* by A. Colin Cameron (2015) W.W. Norton

* To run you need files
*   AED_EARNINGS_CH19.DTA
*   AED_HOUSE.DTA
* in your directory

********** SETUP **********

set more off
version 12
clear all
set scheme s1manual   // Graphics scheme

************

* This Stata program does the analysis for Chapter 19: CROSS-SECTION DATA
*   19.1 DATA DESCRIPTION
*   19.2 HETEROSKEDASTIC ERRORS
*   19.3 CLUSTERED ERRORS
*   19.4 MODELS FOR BINARY OUTCOME DATA
*   19.5 NONREPRESENTATIVE SAMPLES
*   19.A APPENDIX: WEIGHTED LEAST SQUARES
*   19.B APPENDIX: NONLINEAR MODELS BONUS
*   19.C APPENDIX: CATEGORICAL DATA

**** 19.1 DATA EXAMPLE

clear
use AED_EARNINGS_CH19.DTA
summarize

* Table 19.1
describe
summarize earnings education age agesq gender

* Table 19.2
* OLS with default standard errors
regress earnings education age agesq gender
estimates store olsdef
test age agesq

**** 19.2 HETEROSKEDASTIC ERRORS

* Table 19.3
* OLS with heteroskedastic-robust standard errors
regress earnings education age agesq gender, vce(robust)

* Figure 19.1 - two panels
quietly regress earnings education age agesq gender
predict uhat, resid
graph twoway (scatter earnings education)
graph twoway (scatter uhat education)

* Compare low and high levels of education
summarize earnings uhat if education <= 12
summarize earnings uhat if education > 12
drop uhat

* Generalization of the Koenker Breusch-Pagan LM test: F-statistic version (fstat option)
quietly regress earnings education age agesq gender
estat hettest education age agesq gender, fstat

* Same test done manually: regress the normalized squared residual
* on the regressors and test their joint significance
quietly regress earnings education age agesq gender
predict uhat, resid
gen uhatsqnormed = (uhat/e(rmse))^2
regress uhatsqnormed education age agesq gender
test education age agesq gender
drop uhat

**** 19.3 CLUSTERED ERRORS

* See how earnings vary by state
bysort statefip: egen aveearnings = mean(earnings)
tabulate aveearnings

* Intraclass correlation coefficient squared for the data and the OLS residual
loneway earnings statefip
loneway education statefip
loneway age statefip
loneway gender statefip
quietly regress earnings education age agesq gender
predict uhat, resid
loneway uhat statefip
drop uhat

* Table 19.4
* OLS with cluster-robust standard errors
regress earnings education age agesq gender, vce(cluster statefip)

* Bonus - not done in text
* Need to give Stata a cluster identifier
xtset statefip
* Random effects (xtreg default) with cluster-robust standard errors
xtreg earnings education age agesq gender, re vce(cluster statefip)
* Fixed effects using xtreg, fe (could also use areg or regress)
* With xtreg, vce(robust) gives standard errors clustered on statefip
xtreg earnings education age agesq gender, fe vce(robust)

**** 19.4 MODELS FOR BINARY OUTCOMES

** Figure 19.2 uses generated data
clear
set obs 100
set seed 12345
gen x = rnormal(5,2)
replace x = -0.5 if _n == 1
gen ystar = 2 + x + rnormal(0,2)
gen y = ystar > 6
regress y x
predict yols
logit y x
predict plogit
probit y x
predict pprobit
sort x

* Figure 19.2 - two panels
graph twoway (scatter y x) (line yols x)
graph twoway (scatter y x) (line plogit x)

use AED_EARNINGS_CH19.DTA, clear

** LOGIT

* Table 19.5 - first column
* Logit regression with default standard errors
logit dearnings education age agesq gender
predict plogit
sum plogit, d

* Table 19.5 - second column
* Factor-variable notation c.age##c.age lets margins compute the marginal
* effect of age, which enters the model as both age and age squared
quietly logit dearnings education c.age##c.age gender
margins, dydx(*)

* Table 19.6 - classification table
quietly logit dearnings education c.age##c.age gender
estat classification

* Bonus: heteroskedastic-robust standard errors
logit dearnings education age agesq gender, vce(robust)

** PROBIT

* Table 19.5 - third column
* Probit regression with default standard errors
probit dearnings education age agesq gender
predict pprobit
summarize pprobit, d
correlate pprobit plogit

* Table 19.5 - fourth column
* Factor-variable notation c.age##c.age lets margins compute the marginal
* effect of age, which enters the model as both age and age squared
quietly probit dearnings education c.age##c.age gender
margins, dydx(*)

* Bonus: heteroskedastic-robust standard errors
probit dearnings education age agesq gender, vce(robust)

** OLS = LINEAR PROBABILITY MODEL

* Table 19.5 - fifth column
regress dearnings education age agesq gender
predict plpm
sum plpm, d
count if plpm < 0

* Table 19.5 - sixth column
* Factor-variable notation c.age##c.age lets margins compute the marginal
* effect of age, which enters the model as both age and age squared
quietly regress dearnings education c.age##c.age gender
margins, dydx(*)

* Bonus: heteroskedastic-robust standard errors
regress dearnings education age agesq gender, vce(robust)

**** 19.A APPENDIX: WEIGHTED LEAST SQUARES

* Weighted least squares
* This example supposes Var[u | regressors] = education * sigma^2
* Complication: some individuals have zero education, so the weight
* 1/education would be 1/0. So drop those with zero education.
drop if education == 0

* Weight automatically using the aweight option of regress
* Default standard errors
regress earnings education age agesq gender [aweight=1/education]
* Heteroskedastic-robust standard errors
regress earnings education age agesq gender [aweight=1/education], vce(robust)

* Do the same thing manually: OLS after dividing every variable,
* including the intercept, by sqrt(education)
generate weight = sqrt(education)
generate trintercept = 1/weight
generate trearnings = earnings/weight
generate treducation = education/weight
generate trage = age/weight
generate tragesq = agesq/weight
generate trgender = gender/weight
* Default standard errors
regress trearnings treducation trage tragesq trgender trintercept, noconstant
* Heteroskedastic-robust standard errors
regress trearnings treducation trage tragesq trgender trintercept, noconstant vce(robust)

**** 19.B APPENDIX: NONLINEAR MODELS: BONUS

* Consider a model with exponential conditional mean
* Here only positive earnings are considered

** Nonlinear least squares with heteroskedastic-robust standard errors
nl (earnings = exp({xb: education age agesq gender} + {_cons})), vce(robust)

** Log-linear model
regress lnearnings education age agesq gender, vce(robust)

** Poisson
poisson earnings education age agesq gender, vce(robust)

**** 19.C APPENDIX: CATEGORICAL DATA

clear
use AED_HOUSE.DTA

* Create categorical variables
generate pricerange = price
recode pricerange (1/249999=1) (250000/400000=2)
generate sizerange = size
recode sizerange (1/1799=1) (1800/2399=2) (2400/4000=3)
tabulate pricerange sizerange, expected
tabulate pricerange sizerange, expected chi2 cchi2

* Pearson's chi-squared test done manually
* Note that tabulate rounds expected frequencies to one decimal place,
* and the test statistic needs more accuracy, so recompute them.
* Expected count = (row total)*(column total)/N, where the row totals
* are 17 and 12, the column totals are 13, 13 and 3, and N = 29
di "First row expected: " 17*13/29 "  " 17*13/29 "  " 17*3/29
di "Second row expected: " 12*13/29 "  " 12*13/29 "  " 12*3/29
di "Pearson = " (11-7.62)^2/7.62 + (6-7.62)^2/7.62 + (0-1.76)^2/1.76 ///
   + (2-5.38)^2/5.38 + (7-5.38)^2/5.38 + (3-1.24)^2/1.24

********** CLOSE OUTPUT

log close
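
* Bonus (sketch, not in text): the LM (n times R-squared) form of the
* Koenker Breusch-Pagan test, from the auxiliary regression of the squared
* OLS residual on the regressors. Rescaling uhat^2 by rmse^2, as in the
* manual test in 19.2, does not change R-squared, so the squared residual
* is used directly. Under homoskedasticity LM is asymptotically
* chi-squared(4), one degree of freedom per auxiliary regressor. We expect
* it to agree with estat hettest's iid option. The names uhatb, uhatsqb,
* lmstat and lmp are new names chosen for this sketch.
use AED_EARNINGS_CH19.DTA, clear
quietly regress earnings education age agesq gender
estat hettest education age agesq gender, iid
predict uhatb, resid
generate uhatsqb = uhatb^2
quietly regress uhatsqb education age agesq gender
scalar lmstat = e(N)*e(r2)
scalar lmp = chi2tail(4, scalar(lmstat))
display "LM = nR^2 = " scalar(lmstat) "   p-value = " scalar(lmp)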
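
* Bonus (sketch, not in text): show default, heteroskedastic-robust and
* cluster-robust standard errors for the same OLS regression side by side.
* The coefficients are identical in all three columns; only the standard
* errors change. The stored-estimate names olsdefb, olsrobb and olsclub
* are arbitrary labels chosen for this sketch.
use AED_EARNINGS_CH19.DTA, clear
quietly regress earnings education age agesq gender
estimates store olsdefb
quietly regress earnings education age agesq gender, vce(robust)
estimates store olsrobb
quietly regress earnings education age agesq gender, vce(cluster statefip)
estimates store olsclub
estimates table olsdefb olsrobb olsclub, b(%10.2f) se(%10.2f)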
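
* Bonus (sketch, not in text): margins, dydx(*) as used for Table 19.5
* reports average marginal effects (AME), the average over observations of
* each individual's marginal effect. Adding the atmeans option instead
* evaluates the marginal effects at the sample means of the regressors
* (MEM). In a nonlinear model such as logit the two generally differ.
use AED_EARNINGS_CH19.DTA, clear
quietly logit dearnings education c.age##c.age gender
margins, dydx(*)
margins, dydx(*) atmeans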
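
* Bonus (sketch, not in text): linear probability model fitted values can
* fall outside [0,1] at either end, so count both tails. Stata treats a
* missing value as larger than any number, hence the !missing() condition
* for the upper tail. The names plpmb, nbelow and nabove are new names
* chosen for this sketch.
use AED_EARNINGS_CH19.DTA, clear
quietly regress dearnings education age agesq gender
predict plpmb
count if plpmb < 0
scalar nbelow = r(N)
count if plpmb > 1 & !missing(plpmb)
scalar nabove = r(N)
display "Predicted below 0: " scalar(nbelow) "   above 1: " scalar(nabove)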
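
* Bonus (sketch, not in text): Poisson regression of a non-count outcome
* such as earnings is a quasi-MLE for the exponential conditional mean
* E[earnings | x] = exp(x'b). It remains consistent if that mean is
* correctly specified, which is why robust standard errors are used.
* glm with a Poisson family and log link should give the same coefficient
* estimates. This sketch uses the full sample, without dropping zero
* education as in 19.A. The scalar name bpois is chosen for this sketch.
use AED_EARNINGS_CH19.DTA, clear
quietly poisson earnings education age agesq gender, vce(robust)
scalar bpois = _b[education]
glm earnings education age agesq gender, family(poisson) link(log) vce(robust)
display "education coefficient, poisson: " scalar(bpois) "   glm: " _b[education]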
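
* Bonus (sketch, not in text): Pearson's statistic computed from unrounded
* expected counts, with its p-value. The observed 2 x 3 table is typed in
* from the counts used in 19.C, so df = (2-1)*(3-1) = 2. By hand the
* statistic is about 8.70 (p about 0.013), slightly below the 8.71 obtained
* from the rounded expected counts; tabulate's chi2 option is the reference
* value. The matrix and scalar names are chosen for this sketch.
matrix OBS = (11, 6, 0 \ 2, 7, 3)
matrix RTOT = OBS * J(3, 1, 1)     // row totals (2 x 1)
matrix CTOT = J(1, 2, 1) * OBS     // column totals (1 x 3)
scalar ntot = RTOT[1,1] + RTOT[2,1]
scalar pearsonstat = 0
forvalues i = 1/2 {
    forvalues j = 1/3 {
        * Expected count = row total * column total / N
        scalar expct = RTOT[`i',1]*CTOT[1,`j']/scalar(ntot)
        scalar pearsonstat = scalar(pearsonstat) + (OBS[`i',`j'] - scalar(expct))^2/scalar(expct)
    }
}
scalar pearsonp = chi2tail(2, scalar(pearsonstat))
display "Pearson = " scalar(pearsonstat) "   p-value = " scalar(pearsonp)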