* MMA27P3MILOGIT.DO January 2007 for Stata version 8.0
* based on logitmcar.do
clear
capture log close
log using mma27p3milogit.txt, text replace
********** OVERVIEW OF MMA27P3MILOGIT.DO **********
* STATA Program by A. Colin Cameron and Pravin K. Trivedi (2005) for
* "Microeconometrics: Methods and Applications", Cambridge University Press
* Chapter 27.8.2 pp. 937-939 Missing Data Imputation in a Logit Model
* This program creates the first three columns of Tables 27.5-27.6
* and the data sets analyzed by SAS for multiple imputation,
* which gives the remaining columns of Tables 27.5-27.6
* There are four cases
* 1: 10% missing rho=0.64 for Table 27.5 and mma27logit1.asc
* 2: 25% missing rho=0.64 for mma27logit2.asc
* 3: 10% missing rho=0.36 for mma27logit3.asc
* 4: 25% missing rho=0.36 for Table 27.6 and mma27logit4.asc
* THIS PROGRAM DIFFERS FROM THE PROGRAM THAT CREATED THE TABLE GIVEN IN THE BOOK.
* IT USES A DIFFERENT SEED LEADING TO DIFFERENT DATA SETS
* The created data are then analyzed using MMA27P4MILOGIT.SAS
* to construct the remaining columns of Tables 27.5-27.6
********** SETUP **********
set more off
version 8.0
set scheme s1mono /* Graphics scheme */
********** SIMULATION OVERVIEW **********
* The data generating process is logit with
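* y = 1(ystar <= 0) (as coded in the program below)
* ystar = constant + x1 + x2 + u, with constant = 1
* x1, x2 ~ bivariate normal with means 0 and covariance matrix (1,rho \ rho,1)
* u = sqrt(pi^2/3) * e, where e ~ standard logistic with variance pi^2/3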
* N = 1000
* The missing data process is
* 10% (or 25%) of x1 are randomly missing
* 10% (or 25%) of x2 are randomly missing
* The missing values of x1 and x2 need not fall on the same observations.
* Note that the estimated model will give intercept and slope
* coefficients of approximately -1/sqrt(pi^2/3) = -0.551.
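* A quick check of the value quoted above (a sketch added for illustration,
* not part of the original program). The program below codes y = 1 if
* ystar <= 0 and draws u as sqrt(pi^2/3) times a standard logistic draw, so
*   Pr[y = 1 | x1, x2] = Lambda( -(1 + x1 + x2)/sqrt(pi^2/3) )
* where Lambda is the logistic cdf. The intercept and both slopes of the
* fitted logit should therefore all be close to -1/sqrt(pi^2/3).
di "Implied logit coefficient: " %6.3f (-1/sqrt(_pi^2/3))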
************ PROGRAM TO CREATE AND ANALYZE MISSING DATA ***********
* This program has four arguments
* `1' is rho - correlation between x1 and x2
* `2' is percentage nonmissing (so 100 - `2' is percentage missing)
* `3' is the number for the data set created
* `4' is described as the variance of u set so that R^2 = 0.25 in true OLS
*     regression, but it is not referenced anywhere in the program body
* The program
* creates a missing data set
* estimates using listwise deletion and mean imputation
* writes out data set for later multiple imputation by SAS
capture program drop missing
program define missing
/* (1) Create complete data set */
di
clear
set obs 1000 /* set sample size*/
matrix covvar = (1,`1' \ `1',1) /* set covariance matrix for x1, x2*/
matrix means = (0,0) /* set mean for x1, x2*/
drawnorm x1 x2, seed(123) cov(covvar) means(means) /* draw x1, x2*/
sum x1 x2 /* check x1, x2 correctly drawn*/
corr x1 x2
gen u = sqrt(_pi^2/3)*logit(uniform()) /* draw logistic error u, scaled by sqrt(pi^2/3) */
sum u /* check draws of u*/
gen cons = 1
gen ystar = x1 + x2 + u + cons /* generate ystar */
gen y = 0 /* generate y*/
replace y=1 if ystar<=0
gen id = _n
sort id
save x1x2uy.dta, replace
/* (2) Create data set with some observations missing */
use x1x2uy.dta, clear /* randomly set 100-`2' % of x1 missing*/
keep x1
gen id=_n
sample `2'
sort id
rename x1 x1missing /* rename resulting x1 as x1missing*/
save x1.dta, replace
use x1x2uy.dta, clear /* randomly set 100-`2' % of x2 missing*/
keep x2
gen id=_n
sample `2'
sort id
rename x2 x2missing /*rename resulting x2 as x2missing*/
save x2.dta, replace
use x1x2uy, clear /* merge x1missing and x2missing */
sort id
merge id using x1
rename _merge merge1
sort id
merge id using x2
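/* Optional check (a sketch added for illustration, not part of the */
/* original program): count the observations missing x1, missing x2, */
/* and missing both. Each of the first two counts should be roughly */
/* (100 - `2')% of the 1000 observations, and the third shows that the */
/* missing values need not fall on the same observations. */
count if x1missing==.
count if x2missing==.
count if x1missing==. & x2missing==.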
/* (3) Create the first three columns of Tables 27.5-27.6 */
/* Logit with no data missing */
di _n "Column 1: Logit with no data missing"
logit y x1 x2
/* Logit with listwise deletion of missing data */
di _n "Column 2: Logit with listwise deletion of missing data"
logit y x1missing x2missing /* observations with missing x1 or x2 are dropped automatically */
/* Logit with mean imputation of missing data */
/* Generate mean imputations of x1 and x2 */
gen x1meanimpute=x1missing
gen x2meanimpute=x2missing
sum x1missing
replace x1meanimpute=r(mean) if x1meanimpute==.
sum x2missing
replace x2meanimpute=r(mean) if x2meanimpute==.
di _n "Column 3: Logit with mean imputation of missing data"
logit y x1meanimpute x2meanimpute
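/* Optional diagnostic (a sketch added for illustration, not part of the */
/* original program): mean imputation leaves the mean of the observed */
/* values unchanged but shrinks the standard deviation, which can be */
/* seen by comparing the observed and mean-imputed regressors. */
sum x1missing x1meanimpute x2missing x2meanimpute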
/* Save data for later SAS multiple imputation use */
/* save x1x2missuy.dta, replace */
outfile y x1missing x2missing using mma27logit`3'.asc, replace
clear
end
************ RUN THE PROGRAM TO CREATE SEVERAL MISSING DATA SETS ***********
* This program has four arguments
* `1' is rho - correlation between x1 and x2
* `2' is percentage nonmissing (so 100 - `2' is percentage missing)
* `3' is the number for the data set created
* e.g. the first will be mma27logit1.asc
* `4' is described as the variance of u set so that R^2 = 0.25 in true OLS
*     regression, but it is not referenced anywhere in the program body
* Table 27.5
missing 0.64 90 1 10 /* Case 1: high correlation and low missing */
* Not tabulated
missing 0.64 75 2 10 /* Case 2: high correlation and high missing */
* Not tabulated
missing 0.36 90 3 10 /* Case 3: low correlation and low missing */
* Table 27.6
missing 0.36 75 4 10 /* Case 4: low correlation and high missing */
********** CLOSE OUTPUT **********
log close
clear
exit