* AED08.do March 2015 For Stata version 12 

log using AED08.txt, text replace

********** OVERVIEW OF AED08.do **********

* STATA Program 
* copyright C 2015 by A. Colin Cameron
* Used for "Analysis of Economics Data: An Introduction to Econometrics"
* by A. Colin Cameron (2015) W.W. Norton

* To run you need files
*   AED_EARNINGS_CH19.DTA
*   AED_HOUSE.DTA
* in your directory

********** SETUP **********

set more off
version 12
clear all
set scheme s1manual  // Graphics scheme

************

* This STATA program does analysis for Chapter 19: CROSS-SECTION DATA
*    19.1 DATA DESCRIPTION
*    19.2 HETEROSKEDASTIC ERRORS
*    19.3 CLUSTERED ERRORS
*    19.4 MODELS FOR BINARY OUTCOME DATA
*    19.5 NONREPRESENTATIVE SAMPLES
*    19.A APPENDIX: WEIGHTED LEAST SQUARES
*    19.B APPENDIX: NONLINEAR MODELS BONUS
*    19.C APPENDIX: CATEGORICAL DATA

****  19.1 DATA EXAMPLE

clear
use AED_EARNINGS_CH19.DTA
summarize

* Table 19.1
describe
summarize earnings education age agesq gender

* Table 19.2
* OLS with default standard errors
regress earnings education age agesq gender
estimates store olsdef 
test age agesq
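
* Illustrative addition, not part of the text: with a quadratic in age the
* fitted earnings profile turns at age = -b_age/(2*b_agesq).
* This sketch assumes agesq equals age^2 in this dataset.
display "Age at turning point of earnings profile: " (-_b[age]/(2*_b[agesq]))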

****  19.2 HETEROSKEDASTIC ERRORS

* Table 19.3
* OLS with heteroskedastic robust standard errors
regress earnings education age agesq gender, vce(robust) 
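
* Illustrative addition, not part of the text: compare default and robust
* standard errors side by side. The name olsrob is chosen here for
* illustration; olsdef was stored after the default regression above.
* The coefficients should match; only the standard errors should differ.
estimates store olsrob
estimates table olsdef olsrob, b(%10.3f) se(%10.3f)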

* Figure 19.1 - two panels
quietly regress earnings education age agesq gender
predict uhat, resid
graph twoway (scatter earnings education)
graph twoway (scatter uhat education)

* Compare high and low levels of education
summarize earnings uhat if education <= 12
summarize earnings uhat if education > 12
drop uhat

* Generalization of Koenker Breusch-Pagan LM test
quietly regress earnings education age agesq gender
estat hettest education age agesq gender, fstat

* Same test done manually
quietly regress earnings education age agesq gender
predict uhat, resid
gen uhatsqnormed = (uhat/e(rmse))^2
regress uhatsqnormed education age agesq gender
test education age agesq gender
drop uhat
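
* Illustrative addition, not part of the text: the N*R-squared (LM) form of
* the test, computed from the auxiliary regression above. Under the null of
* homoskedasticity it is chi-squared with e(df_m) degrees of freedom.
* It should be comparable to: estat hettest education age agesq gender, iid
display "LM = N*R2 = " e(N)*e(r2) "   p-value = " chi2tail(e(df_m), e(N)*e(r2))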

****  19.3 CLUSTERED ERRORS

* See how earnings vary by state
bysort statefip: egen aveearnings = mean(earnings)
tabulate aveearnings

* Intraclass correlation coefficient for the data and OLS residual
loneway earnings statefip
loneway education statefip
loneway age statefip
loneway gender statefip
quietly regress earnings education age agesq gender
predict uhat, resid
loneway uhat statefip
drop uhat

* Table 19.4
* OLS with cluster robust standard errors
regress earnings education age agesq gender, vce(cluster statefip)

* Bonus - not done in text
* Need to give Stata a cluster identifier
xtset statefip
* Random effects default - with cluster robust standard errors
xtreg earnings education age agesq gender, re vce(cluster statefip)
* Fixed effects using xtreg, fe (could also use areg or regress) 
xtreg earnings education age agesq gender, fe vce(robust)

****  19.4 MODELS FOR BINARY OUTCOMES

** Figure 19.2 uses generated data
clear
set obs 100
set seed 12345
gen x = rnormal(5,2)
replace x = -0.5 if _n == 1
gen ystar = 2 + x + rnormal(0,2)
gen y = ystar > 6
regress y x
predict yols
logit y x
predict plogit
probit y x
predict pprobit
sort x

* Figure 19.2 - two panels
graph twoway (scatter y x) (line yols x)
graph twoway (scatter y x) (line plogit x)
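
* Illustrative addition, not part of the text: overlay the OLS, logit and
* probit fitted values (all computed above) in one graph for comparison.
graph twoway (scatter y x) (line yols x) (line plogit x) (line pprobit x), ///
    legend(order(1 "y" 2 "OLS" 3 "Logit" 4 "Probit"))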

use AED_EARNINGS_CH19.DTA, clear

** LOGIT

* Table 19.5 - first column
* Logit regression with default standard errors
logit dearnings education age agesq gender
predict plogit
sum plogit, d

* Table 19.5 - second column
* Factor-variable notation c.age##c.age lets margins compute the marginal
* effect of age, which enters the model as both age and age squared
quietly logit dearnings education c.age##c.age gender
margins, dydx(*)
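
* Illustrative addition, not part of the text: margins, dydx(*) above reports
* average marginal effects. For comparison, this sketch evaluates the
* marginal effects at the sample means of the regressors instead.
margins, dydx(*) atmeans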

* Table 19.6 - classification table
quietly logit dearnings education c.age##c.age gender
estat classification

* Bonus: Heteroskedastic-robust
logit dearnings education age agesq gender, vce(robust)

** PROBIT

* Table 19.5 - third column
* Probit regression with default standard errors
probit dearnings education age agesq gender 
predict pprobit
summarize pprobit, d
correlate pprobit plogit

* Table 19.5 - fourth column
* Factor-variable notation c.age##c.age lets margins compute the marginal
* effect of age, which enters the model as both age and age squared
quietly probit dearnings education c.age##c.age gender
margins, dydx(*)

* Bonus: Heteroskedastic-robust
probit dearnings education age agesq gender, vce(robust)

** OLS = LINEAR PROBABILITY MODEL

* Table 19.5 - fifth column
regress dearnings education age agesq gender
predict plpm
sum plpm, d
count if plpm < 0

* Table 19.5 - sixth column
* Factor-variable notation c.age##c.age lets margins compute the marginal
* effect of age, which enters the model as both age and age squared
quietly regress dearnings education c.age##c.age gender
margins, dydx(*)

* Bonus: Heteroskedastic-robust
regress dearnings education age agesq gender, vce(robust)

**** 19.A APPENDIX: WEIGHTED LEAST SQUARES

* Weighted least squares
* This example supposes Var[u | regressors] = education * sigma^2
* Complication is some have zero education so weight is then 1/0
* So drop those with zero education
drop if education == 0

* Automatically weight using aweight option of regress
* Default standard errors
regress earnings education age agesq gender [aweight=1/education]
* Heteroskedastic robust standard errors
regress earnings education age agesq gender [aweight=1/education], vce(robust)

* Do the same thing manually
generate weight = sqrt(education)
generate trintercept = 1/weight
generate trearnings = earnings/weight
generate treducation = education/weight
* Note: treducsq is not used in the regressions below
generate treducsq = educsq/weight
generate trage = age/weight
generate tragesq = agesq/weight
generate trgender = gender/weight
* Default standard errors
regress trearnings treducation trage tragesq trgender trintercept, noconstant
* Heteroskedastic robust standard errors
regress trearnings treducation trage tragesq trgender trintercept, noconstant vce(robust)

****  19.B  APPENDIX: NONLINEAR MODELS: BONUS

* Consider model with exponential conditional mean
* Here only positive earnings are considered

** Nonlinear least squares with heteroskedastic robust standard errors
nl (earnings = exp({xb: education age agesq gender} + {_cons})), vce(robust)

* loglinear model
regress lnearnings education age agesq gender, vce(robust)

** Poisson
poisson earnings education age agesq gender, vce(robust)
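
* Illustrative addition, not part of the text: with an exponential
* conditional mean, a one-unit change in a regressor multiplies the mean
* by exp(b). This sketch converts the education coefficient from the
* Poisson fit above into a percentage change.
display "Percent change in mean earnings per extra year of education: " 100*(exp(_b[education])-1)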

****  19.C APPENDIX: CATEGORICAL DATA

clear
use AED_HOUSE.DTA

* Create categorical variables
generate pricerange = price
recode pricerange (1/249999=1) (250000/400000=2)
generate sizerange = size
recode sizerange (1/1799=1) (1800/2399=2) (2400/4000=3)
tabulate pricerange sizerange, expected
tabulate pricerange sizerange, expected chi2 cchi2

* Pearson's chi-squared test computed manually
* Note that tabulate rounds expected frequencies to one decimal place
* To get the chi-squared test statistic we need more accuracy
di "First row expected:  " 17*13/29 "   " 17*13/29  "  " 17*3/29
di "Second row expected: " 12*13/29 "   " 12*13/29  "  " 12*3/29
di "Pearson = " (11-7.62)^2/7.62 + (6-7.62)^2/7.62 + (0-1.76)^2/1.76 /// 
    + (2-5.38)^2/5.38 + (7-5.38)^2/5.38 + (3-1.24)^2/1.24
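
* Illustrative addition, not part of the text: the hand calculation above
* uses expected frequencies rounded to two decimals. tabulate stores the
* unrounded statistic and its p-value in r(chi2) and r(p).
quietly tabulate pricerange sizerange, chi2
display "Pearson chi2 (full precision) = " r(chi2) "   p-value = " r(p)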

********** CLOSE OUTPUT
log close



