* MMA27P1MILINEAR.DO January 2007 for Stata version 8.0
* based on mcar.do
clear
capture log close
log using mma27p1milinear.txt, text replace
********** OVERVIEW OF MMA27P1MILINEAR.DO **********
* STATA Program by A. Colin Cameron and Pravin K. Trivedi (2005) for
* "Microeconometrics: Methods and Applications", Cambridge University Press
* Section 27.8.1, pp. 936-937: Missing Data Imputation in a Linear Model
* This program creates the first three columns of Tables 27.2-27.4.
* It also creates the data sets that SAS analyzes by multiple imputation
* to give the remaining columns of Tables 27.2-27.4.
* There are four cases:
* 1: 10% missing, rho=0.64: Table 27.2 and mma27linear1.asc
* 2: 25% missing, rho=0.64: Table 27.3 and mma27linear2.asc
* 3: 10% missing, rho=0.36: not tabulated; mma27linear3.asc
* 4: 25% missing, rho=0.36: Table 27.4 and mma27linear4.asc
* THIS PROGRAM DIFFERS FROM THE PROGRAM THAT CREATED THE TABLES GIVEN IN THE BOOK.
* IT USES A DIFFERENT SEED, LEADING TO DIFFERENT DATA SETS.
* The created data are then analyzed using MMA27P2MILINEAR.SAS
* to construct the remaining columns of Tables 27.2-27.4
********** SETUP **********
set more off
version 8.0
set scheme s1mono /* Graphics scheme */
********** SIMULATION OVERVIEW **********
* The data generating process is
* y = 1 + x1 + x2 + u,
* x1, x2 ~ bivariate normal with means (0,0) and covariance matrix (1,rho \ rho,1)
* u ~ normal with mean 0 and variance set so that R^2 is about 0.25 in the true OLS regression
* N = 1000
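* Worked check of the R^2 setting (a sketch, assuming var(x1) = var(x2) = 1
* and the var(u) = 10 used in the runs below).
* Since var(x1 + x2) = 2 + 2*rho, the population R^2 is
*   R^2 = (2 + 2*rho) / (2 + 2*rho + var(u)),
* so R^2 = 0.25 exactly would require var(u) = 3*(2 + 2*rho) = 6*(1 + rho).
* With var(u) = 10 this gives about 0.247 for rho = 0.64 and 0.214 for rho = 0.36.
display "R^2 with rho = 0.64 and var(u) = 10: " %6.4f (2+2*0.64)/(2+2*0.64+10)
display "R^2 with rho = 0.36 and var(u) = 10: " %6.4f (2+2*0.36)/(2+2*0.36+10)
display "var(u) giving R^2 = 0.25 exactly, rho = 0.64: " 6*(1+0.64)
display "var(u) giving R^2 = 0.25 exactly, rho = 0.36: " 6*(1+0.36)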
* The missing data process is
* 10% (or 25%) of x1 are randomly missing
* 10% (or 25%) of x2 are randomly missing
* x1 and x2 are made missing independently, so they need not be missing
* for the same observations.
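* Worked arithmetic for listwise deletion (a sketch, assuming x1 and x2 are
* made missing independently at the same rate m): the expected share of
* complete cases is (1 - m)^2, i.e. about 81% of N for m = 10% and about
* 56% of N for m = 25%.
display "Expected complete-case share, 10% missing in each: " %6.4f (1-0.10)^2
display "Expected complete-case share, 25% missing in each: " %6.4f (1-0.25)^2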
************ PROGRAM TO CREATE AND ANALYZE MISSING DATA ***********
* This program has four arguments
* `1' is rho - correlation between x1 and x2
* `2' is percentage nonmissing (so 100 - `2' is percentage missing)
* `3' is the number for the data set created
* `4' is the variance of u, chosen to give R^2 of approximately 0.25 in the true OLS regression
* The program
*   creates a data set with x1 and x2 missing completely at random
*   estimates by OLS with complete data, listwise deletion, and mean imputation
*   writes out the data set for later multiple imputation by SAS
capture program drop missing
program define missing
/* (1) Create complete data set */
di
clear
set obs 1000 /* set sample size*/
matrix covvar = (1,`1' \ `1',1) /* set covariance matrix for x1, x2*/
matrix means = (0,0) /* set mean for x1, x2*/
drawnorm x1 x2, seed(123) cov(covvar) means(means) /* draw x1, x2*/
sum x1 x2 /* check x1, x2 correctly drawn*/
corr x1 x2
drawnorm u, seed(1234) means(0) cov(`4') /* draw error u*/
sum u /* check draws of u*/
gen cons = 1
gen y = x1 + x2 + u + cons /* generate y*/
gen id = _n
sort id
save x1x2uy.dta, replace
/* (2) Create data set with some observations missing */
use x1x2uy.dta, clear /* randomly set 100-`2' % of x1 missing*/
keep x1
gen id=_n
sample `2'
sort id
rename x1 x1missing /* rename resulting x1 as x1missing*/
save x1.dta, replace
use x1x2uy.dta, clear /* randomly set 100-`2' % of x2 missing*/
keep x2
gen id=_n
sample `2'
sort id
rename x2 x2missing /*rename resulting x2 as x2missing*/
save x2.dta, replace
use x1x2uy, clear /* merge x1missing and x2missing */
sort id
merge id using x1
rename _merge merge1
sort id
merge id using x2
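/* Illustrative check (a sketch, not needed for the tables): count the */
/* complete cases that listwise deletion will use. Because x1 and x2 are */
/* sampled independently, roughly (p/100)^2*N cases should remain, where */
/* p is the percentage nonmissing passed as the second argument. */
quietly count if x1missing < . & x2missing < .
display "Complete cases: " r(N) "  (about " (`2'/100)^2*_N " expected)"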
/* (3) Create the first three columns of Tables 27.2-27.4 */
/* OLS with no data missing */
di _n "Column 1: OLS with no data missing"
reg y x1 x2
/* OLS with listwise deletion of missing data */
di _n "Column 2: OLS with listwise deletion of missing data"
reg y x1missing x2missing
/* OLS with mean imputation of missing data */
/* Generate mean imputations of x1 and x2 */
gen x1meanimpute=x1missing
gen x2meanimpute=x2missing
sum x1missing
replace x1meanimpute=r(mean) if x1meanimpute==.
sum x2missing
replace x2meanimpute=r(mean) if x2meanimpute==.
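/* Illustrative check (a sketch): mean imputation fills every gap, but each */
/* imputed value adds a zero deviation from the mean, so the imputed */
/* regressors have a smaller sample variance than the observed values. */
assert x1meanimpute < . & x2meanimpute < .
summarize x1missing x1meanimpute x2missing x2meanimpute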
di _n "Column 3: OLS with mean imputation of missing data"
reg y x1meanimpute x2meanimpute
/* Save data for later SAS multiple imputation use */
/* save x1x2missuy.dta, replace */
outfile y x1missing x2missing using mma27linear`3'.asc, replace
clear
end
************ RUN THE PROGRAM TO CREATE SEVERAL MISSING DATA SETS ***********
* This program has four arguments
* `1' is rho - correlation between x1 and x2
* `2' is percentage nonmissing (so 100 - `2' is percentage missing)
* `3' is the number for the data set created
*   e.g. the first will be mma27linear1.asc
* `4' is the variance of u, chosen to give R^2 of approximately 0.25 in the true OLS regression
* Table 27.2
missing 0.64 90 1 10 /* Case 1: high correlation, low missingness */
* Table 27.3
missing 0.64 75 2 10 /* Case 2: high correlation, high missingness */
* Not tabulated
missing 0.36 90 3 10 /* Case 3: low correlation, low missingness */
* Table 27.4
missing 0.36 75 4 10 /* Case 4: low correlation, high missingness */
********** CLOSE OUTPUT **********
log close
clear
exit