* MMA16P1TOBIT.DO  March 2005 for Stata version 8.0

log using mma16p1tobit.txt, text replace

********** OVERVIEW OF MMA16P1TOBIT.DO **********

* STATA Program 
* copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi 
* used for "Microeconometrics: Methods and Applications" 
* by A. Colin Cameron and Pravin K. Trivedi (2005)
* Cambridge University Press 

* Chapter 16.2.1 pages 530-1 and 16.9.2 page 565
* Classic Tobit model with generated data
* Provides
*   (1) Graph of various conditional means Figure 16.1 (ch16condmeans.wmf) 
*   (2) Tobit model estimation: various estimators not reported in book
*   (3) Tobit model estimation: CLAD estimation mentioned on page 565 
* using generated data (see below)

********** SETUP **********

set more off
version 8.0
set scheme s1mono   /* Used for graphs */
  
********** GENERATE DATA **********

* Data generating process is 
* Regressor:           lnwage ~ N(2.75, 0.6^2)
* Error term:               e ~ N(0, 1000^2)
* Latent variable:      ystar = -2500 + 1000*lnwage + e
* Truncated variable:  ytrunc = 1(ystar>0)*ystar
* Censored variable:    ycens = 1(ystar<=0)*0 + 1(ystar>0)*ystar
* Censoring Indicator:     dy = 1(ycens>0) 
 
set seed 10101
set obs 200    
gen e = 1000*invnorm(uniform( ))
gen lnwage = 2.75 + 0.6*invnorm(uniform( ))
gen ystar = -2500 + 1000*lnwage + e
gen ytrunc = ystar
replace ytrunc = . if (ystar < 0)
gen ycens = ystar
replace ycens = 0 if (ystar < 0)
gen dy = ycens
replace dy = 1 if (ycens>0)

summarize

* Save data as text (ascii) so that can use programs other than Stata
outfile e lnwage ystar ytrunc ycens dy using mma16p1tobit.asc, replace

********** (1) PLOT THEORETICAL CONDITIONAL MEANS **********

* Here we use the true parameter values used in the dgp 

* Compute the censored and truncated means
gen xb = -2500 + 1000*lnwage
gen sigma = 1000
gen capphixb = normprob(xb/sigma)
gen phixb = normd(xb/sigma)
gen lamda = phixb/capphixb
gen eytrunc = xb + sigma*lamda
gen eycens = capphixb*eytrunc

* Descriptive Statistics
summarize

* Plot Figure 16.1 on page 531
sort lnwage
graph twoway (scatter ystar lnwage, msize(small)) /* 
  */ (scatter eytrunc lnwage, c(l) msize(vtiny) clstyle(p3) clwidth(medthick)) /*
  */ (scatter eycens lnwage, c(l) msize(vtiny) clstyle(p2) clwidth(medthick)) /*
  */ (scatter xb lnwage, c(l) msize(vtiny) clstyle(p1) clwidth(medthick)), /*
  */ scale (1.2) plotregion(style(none)) /*
  */ title("Tobit: Censored and Truncated Means") /*
  */ xtitle("Natural Logarithm of Wage", size(medlarge)) xscale(titlegap(*5)) /* 
  */ ytitle("Different Conditional Means", size(medlarge)) yscale(titlegap(*5)) /*
  */ legend(pos(5) ring(0) col(1)) legend(size(small)) /*
  */ legend( label(1 "Actual Latent Variable") label(2 "Truncated Mean") /*
  */        label(3 "Censored Mean") label(4 "Uncensored Mean"))
graph export ch16condmeans.wmf, replace

********** (2) TOBIT MODEL ESTIMATION FOR THESE DATA **********

* These are computations not reported in the book.

* With only 200 observations the Heckman 2-step estimates given below 
* are very inefficient.  To verify that they are consistent 
* increase the sample size e.g. set obs 20000

* (2A) ESTIMATE THE VARIOUS MODELS

*** UNCENSORED OLS REGRESSION 
* Possible here since for these generated data we actually know ystar
* Yelds consistent estimate. Expect slope = 1000 approximately.
regress ystar lnwage, robust
estimates store ols
predict ystarols

*** CENSORED OLS REGRESSION
* Yields inconsistent estimates
* From subsection 16.3.6 for slope coefficient OLS converges to p times b
* where p is fraction of sample with positive values. Here 0.65*1000 = 650.
regress ycens lnwage, robust
estimates store censols
predict ycensols

*** TRUNCATED OLS REGRESSION for POSITIVE WAGE
* Yields inconsistent estimates
* See subsection 16.3.6 for discussion.
regress ytrunc lnwage, robust
estimates store truncols
predict ytrunols

*** CENSORED TOBIT MLE REGRESSION for HWAGE
* Yields consistent estimates
tobit ycens lnwage, ll(0)
estimates store censtobit
predict ycenstob

*** TRUNCATED TOBIT MLE REGRESSION for HWAGE
* If done propoerly yields consistent estimates
* Not sure how to do this in Stata 
* The obvious command is 
*    tobit ytrunc lnwage, ll(0)
* but this gives the same estimates as truncated OLS

*** PROBIT REGRESSION for HWAGE
* Yields consistent estimates for slope b/s = 1000/1000 = 1 
* but uses less information so expect less efficient than tobit
probit dy lnwage
estimates store probit
predict yprobit

*** HECKMAN 2-STEP ESTIMATOR DONE MANUALLY
* Yields consistent estimates but less efficient than censored tobit MLE
* The second stage standard errors will be incorrect
probit dy lnwage
predict probity, xb
gen invmills = normd(probity)/normprob(probity)
summarize dy probity invmills
regress ytrunc lnwage invmills
estimates store heck2step
correlate lnwage invmills
* And more robust standard errors may be found by
regress ytrunc lnwage invmills, robust
estimates store heck2srobust

*** HECKMAN 2-STEP ESTIMATOR DONE USING BUILT-IN HECKMAN COMMAND
* Yields consistent estimates but less efficient than censored tobit MLE
heckman ytrunc lnwage, select(lnwage) twostep
estimates store heckman
predict ystarhec, xb
predict ytrunhec, ycond
predict ycenshec, yexpected
predict yinvmill, mills
predict yprobsel, psel
correlate lnwage yinvmill

* (2B) DISPLAY COEFFICIENT ESTIMATES

* OLS estimates  True model is -2500 + 1000*lnwage
estimates table ols censols truncols, b(%10.2f) se(%10.2f) t stats(N ll) 

* Tobit estimates  True model is -2500 + 1000*lnwage
estimates table censtobit probit, b(%10.2f) se(%10.2f) t stats(N ll) 

* Tobit estimates using Heckman manual  True model is -2500 + 1000*lnwage
estimates table heck2step heck2srobust, b(%10.2f) se(%10.2f) t stats(N ll) 

* Tobit estimates using Heckman built-in True model is -2500 + 1000*lnwage
estimates table heckman, b(%10.2f) se(%10.2f) t stats(N ll)  

********** (3) CLAD ESTIMATION FOR THESE DATA page 565 **********

* Compare tobit MLE with censored least absolute deviations (CLAD) estimator
* Gives results at end of section 16.9.3 page 565

tobit ycens lnwage, ll(0)
clad ycens lnwage, reps(100) ll(0)

********** CLOSE OUTPUT
log close
clear 
exit