-------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\section6jan2007\mma27p3milogit.txt
  log type:  text
 opened on:  30 Jan 2007, 21:42:51

. 
. ********** OVERVIEW OF MMA27P3MILOGIT.DO **********
. 
. * STATA Program by A. Colin Cameron and Pravin K. Trivedi (2005) for
. * "Microeconometrics: Methods and Applications", Cambridge University Press
. 
. * Chapter 27.8.2 pp. 937-939 Missing Data Imputation in a Logit Model
. 
. * This program creates the first three columns of Tables 27.5-27.6
. * and it creates the data sets analyzed by SAS for multiple imputations
. * to give the remaining columns of Tables 27.5-27.6
. 
. * There are four cases
. *   1: 10% missing rho=0.64   for Table 27.5 and mma27logit1.asc
. *   2: 25% missing rho=0.64   for mma27logit2.asc
. *   3: 10% missing rho=0.36   for mma27logit3.asc
. *   4: 25% missing rho=0.36   for Table 27.6 and mma27logit4.asc
. 
. * THIS PROGRAM DIFFERS FROM THE PROGRAM THAT CREATED THE TABLE GIVEN IN THE B
> OOK.
. * IT USES A DIFFERENT SEED LEADING TO DIFFERENT DATA SETS
. 
. * The created data are then analyzed using MMA27P4MILOGIT.SAS
. * to construct the remaining columns of Tables 27.5-27.6
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
. 
. ********** SIMULATION OVERVIEW **********
. 
. * The data generating process is logit with
. *   y = 1(ystar <= 0)    (as coded in the program below)
. *   ystar = constant + x1 + x2 + u,
. *   x1, x2 ~ bivariate normal with covariance matrix (1,rho\rho,1)
. *   u = sqrt(pi^2/3) times a standard logistic draw (which has variance pi^2/3)
. *   N = 1000
. 
. * The missing data process is
. *   10% (or 25%) of x1 are randomly missing
. *   10% (or 25%) of x2 are randomly missing
. * The missing x1 and x2 values need not fall on the same observation.
. 
. * Note that because u is scaled by sqrt(pi^2/3) and y = 1 if ystar <= 0,
. * the estimated logit coefficients are approx. -1/sqrt(pi^2/3) = -0.551
. 
. ************ PROGRAM TO CREATE AND ANALYZE MISSING DATA ***********
. 
. * This program has four arguments
. * `1' is rho - correlation between x1 and x2
. * `2' is percentage nonmissing (so 100 - `2' is percentage missing)
. * `3' is the number for the data set created
. * `4' is the variance of u set so that R^2 = 0.25 in true OLS regression
. *     (`4' is not used anywhere in this logit program)
. 
. * The program
. *   creates a missing data set
. *   estimates using listwise deletion and mean imputation
. *   writes out data set for later multiple imputation by SAS
. 
. capture program drop missing
. 
. program define missing
  1. 
.   /* (1) Create complete data set */
.   di 
  2.   clear
  3.   set obs 1000                      /* set sample size*/
  4.   matrix covvar = (1,`1' \ `1',1)   /* set covariance matrix for x1, x2*/
  5.   matrix means = (0,0)              /* set mean for x1, x2*/
  6.   drawnorm x1 x2, seed(123) cov(covvar) means(means)   /* draw x1, x2*/
  7.   sum x1 x2                         /* check x1, x2 correctly drawn*/
  8.   corr x1 x2
  9.   gen u = sqrt(_pi^2/3)*logit(uniform())   /* draw logistic error u */
 10.   sum u                             /* check draws of u*/
 11.   gen cons = 1
 12.   gen ystar = x1 + x2 + u + cons    /* generate ystar */
 13.   gen y = 0                         /* generate y*/
 14.   replace y=1 if ystar<=0
 15.   gen id = _n
 16.   sort id
 17.   save x1x2uy.dta, replace
 18. 
.   /* (2) Create data set with some observations missing */
.   use x1x2uy.dta, clear     /* randomly set 100-`2' % of x1 missing*/
 19.   keep x1
 20.   gen id=_n
 21.   sample `2'
 22.   sort id
 23.   rename x1 x1missing       /* rename resulting x1 as x1missing*/
 24.   save x1.dta, replace
 25.   use x1x2uy.dta, clear     /* randomly set 100-`2' % of x2 missing*/
 26.   keep x2
 27.   gen id=_n
 28.   sample `2'
 29.   sort id
 30.   rename x2 x2missing       /* rename resulting x2 as x2missing*/
 31.   save x2.dta, replace
 32.   use x1x2uy, clear         /* merge x1missing and x2missing */
 33.   sort id
 34.   merge id using x1
 35.   rename _merge merge1
 36.   sort id
 37.   merge id using x2
 38. 
.   /* (3) Create the first three columns of Tables 27.5-27.6 */
. 
.   /* OLS with no data missing */
.   di _n "Column 1: OLS with no data missing"
 39.   logit y x1 x2
 40. 
.   /* OLS with listwise deletion of missing data */
.   di _n "Column 2: OLS with listwise deletion of missing data"
 41.   logit y x1missing x2missing
 42. 
.   /* OLS with mean imputation of missing data */
.   /* Generate mean imputations of x1 and x2 */
.   gen x1meanimpute=x1missing
 43.   gen x2meanimpute=x2missing
 44.   sum x1missing
 45.   replace x1meanimpute=r(mean) if x1meanimpute==.
 46.   sum x2missing
 47.   replace x2meanimpute=r(mean) if x2meanimpute==.
 48.   di _n "Column 3: OLS with mean imputation of missing data"
 49.   logit y x1meanimpute x2meanimpute
 50. 
.   /* Save data for later SAS multiple imputation use */
.   /* save x1x2missuy.dta, replace */
.   outfile y x1missing x2missing using mma27logit`3'.asc, replace
 51.   clear
 52. 
. end
. 
. ************ RUN THE PROGRAM TO CREATE SEVERAL MISSING DATA SETS ***********
. 
. * This program has four arguments
. * `1' is rho - correlation between x1 and x2
. * `2' is percentage nonmissing (so 100 - `2' is percentage missing)
. * `3' is the number for the data set created
. *     e.g. the first will be mma27logit1.asc
. * `4' is the variance of u set so that R^2 = 0.25 in true OLS regression
. *     (`4' is not used anywhere in this logit program)
. 
. * Table 27.5
. missing 0.64 90 1 10   /* Case 1: high correlation and low missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0081246    1.009194  -3.609674   3.751572
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.6459   1.0000


    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           u |      1000    .0201264    3.337423  -10.88489   12.81112
(394 real changes made)
file x1x2uy.dta saved
(100 observations deleted)
file x1.dta saved
(100 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

Iteration 0:   log likelihood = -670.50375
Iteration 1:   log likelihood = -573.21465
Iteration 2:   log likelihood = -569.95242
Iteration 3:   log likelihood = -569.92808
Iteration 4:   log likelihood = -569.92807

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     201.15
                                                  Prob > chi2     =     0.0000
Log likelihood = -569.92807                       Pseudo R2       =     0.1500

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.6716367   .1001088    -6.71   0.000    -.8678463   -.4754271
          x2 |  -.5018925    .096534    -5.20   0.000    -.6910956   -.3126893
       _cons |  -.5271033   .0731481    -7.21   0.000     -.670471   -.3837357
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

Iteration 0:   log likelihood = -540.88154
Iteration 1:   log likelihood = -460.87405
Iteration 2:   log likelihood = -458.07626
Iteration 3:   log likelihood = -458.05342
Iteration 4:   log likelihood = -458.05341

Logistic regression                               Number of obs   =        813
                                                  LR chi2(2)      =     165.66
                                                  Prob > chi2     =     0.0000
Log likelihood = -458.05341                       Pseudo R2       =     0.1531

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |  -.6148284    .110222    -5.58   0.000    -.8308595   -.3987972
   x2missing |    -.57235   .1092985    -5.24   0.000     -.786571   -.3581289
       _cons |  -.5876585   .0820429    -7.16   0.000    -.7484597   -.4268573
------------------------------------------------------------------------------
(100 missing values generated)
(100 missing values generated)

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   x1missing |       900    -.001332    1.000239   -4.27458   3.299405
(100 real changes made)

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   x2missing |       900    .0021292     1.01061  -3.609674   3.751572
(100 real changes made)

Column 3: OLS with mean imputation of missing data

Iteration 0:   log likelihood = -670.50375
Iteration 1:   log likelihood = -582.79803
Iteration 2:   log likelihood = -579.99608
Iteration 3:   log likelihood = -579.97793
Iteration 4:   log likelihood = -579.97793

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     181.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -579.97793                       Pseudo R2       =     0.1350

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |  -.6634577   .0999748    -6.64   0.000    -.8594047   -.4675108
x2meanimpute |  -.5288173   .0961733    -5.50   0.000    -.7173135    -.340321
       _cons |  -.5169505   .0721383    -7.17   0.000     -.658339   -.3755621
------------------------------------------------------------------------------

. 
. * Not tabulated
. missing 0.64 75 2 10   /* Case 2: high correlation and high missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0081246    1.009194  -3.609674   3.751572
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.6459   1.0000


    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           u |      1000    .0201264    3.337423  -10.88489   12.81112
(394 real changes made)
file x1x2uy.dta saved
(250 observations deleted)
file x1.dta saved
(250 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

Iteration 0:   log likelihood = -670.50375
Iteration 1:   log likelihood = -573.21465
Iteration 2:   log likelihood = -569.95242
Iteration 3:   log likelihood = -569.92808
Iteration 4:   log likelihood = -569.92807

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     201.15
                                                  Prob > chi2     =     0.0000
Log likelihood = -569.92807                       Pseudo R2       =     0.1500

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.6716367   .1001088    -6.71   0.000    -.8678463   -.4754271
          x2 |  -.5018925    .096534    -5.20   0.000    -.6910956   -.3126893
       _cons |  -.5271033   .0731481    -7.21   0.000     -.670471   -.3837357
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

Iteration 0:   log likelihood = -381.57758
Iteration 1:   log likelihood =  -328.9304
Iteration 2:   log likelihood = -327.38974
Iteration 3:   log likelihood = -327.38047
Iteration 4:   log likelihood = -327.38047

Logistic regression                               Number of obs   =        572
                                                  LR chi2(2)      =     108.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -327.38047                       Pseudo R2       =     0.1420

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |  -.5979623   .1286703    -4.65   0.000    -.8501514   -.3457732
   x2missing |  -.5262477   .1239634    -4.25   0.000    -.7692115   -.2832839
       _cons |  -.5464325   .0963105    -5.67   0.000    -.7351975   -.3576675
------------------------------------------------------------------------------
(250 missing values generated)
(250 missing values generated)

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   x1missing |       750   -.0009517    .9978155  -3.408802   3.299405
(250 real changes made)

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   x2missing |       750    .0187426     1.02357  -3.609674   3.751572
(250 real changes made)

Column 3: OLS with mean imputation of missing data

Iteration 0:   log likelihood = -670.50375
Iteration 1:   log likelihood = -591.92702
Iteration 2:   log likelihood = -589.49443
Iteration 3:   log likelihood = -589.47999
Iteration 4:   log likelihood = -589.47999

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     162.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -589.47999                       Pseudo R2       =     0.1208

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |   -.643689   .1022226    -6.30   0.000    -.8440416   -.4433363
x2meanimpute |  -.6281951   .0990987    -6.34   0.000    -.8224249   -.4339653
       _cons |  -.4946273   .0710693    -6.96   0.000    -.6339206   -.3553339
------------------------------------------------------------------------------

. 
. * Not tabulated
. missing 0.36 90 3 10   /* Case 3: low correlation and low missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0105351    1.007028  -2.773818   3.677286
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.3702   1.0000


    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           u |      1000    .0201264    3.337423  -10.88489   12.81112
(395 real changes made)
file x1x2uy.dta saved
(100 observations deleted)
file x1.dta saved
(100 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

Iteration 0:   log likelihood = -670.93218
Iteration 1:   log likelihood =  -585.9526
Iteration 2:   log likelihood = -583.71967
Iteration 3:   log likelihood = -583.70909

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     174.45
                                                  Prob > chi2     =     0.0000
Log likelihood = -583.70909                       Pseudo R2       =     0.1300

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.6774234   .0826805    -8.19   0.000    -.8394743   -.5153726
          x2 |  -.4898934   .0793405    -6.17   0.000    -.6453978   -.3343889
       _cons |   -.504826   .0717842    -7.03   0.000    -.6455205   -.3641315
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

Iteration 0:   log likelihood = -541.82874
Iteration 1:   log likelihood = -471.54478
Iteration 2:   log likelihood = -469.61024
Iteration 3:   log likelihood =  -469.6002

Logistic regression                               Number of obs   =        813
                                                  LR chi2(2)      =     144.46
                                                  Prob > chi2     =     0.0000
Log likelihood =  -469.6002                       Pseudo R2       =     0.1333

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |  -.6479046   .0908898    -7.13   0.000    -.8260454   -.4697638
   x2missing |  -.5408394   .0897945    -6.02   0.000    -.7168334   -.3648455
       _cons |  -.5567487   .0803693    -6.93   0.000    -.7142696   -.3992278
------------------------------------------------------------------------------
(100 missing values generated)
(100 missing values generated)

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   x1missing |       900    -.001332    1.000239   -4.27458   3.299405
(100 real changes made)

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   x2missing |       900    .0029466    1.002902  -2.773818   3.677286
(100 real changes made)

Column 3: OLS with mean imputation of missing data

Iteration 0:   log likelihood = -670.93218
Iteration 1:   log likelihood = -597.00123
Iteration 2:   log likelihood = -595.28881
Iteration 3:   log likelihood = -595.28272

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     151.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -595.28272                       Pseudo R2       =     0.1128

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |  -.6718073   .0857127    -7.84   0.000     -.839801   -.5038136
x2meanimpute |  -.4814971   .0820387    -5.87   0.000      -.64229   -.3207042
       _cons |   -.494628   .0707196    -6.99   0.000     -.633236   -.3560201
------------------------------------------------------------------------------

. 
. * Table 27.6
. missing 0.36 75 4 10   /* Case 4: low correlation and high missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0105351    1.007028  -2.773818   3.677286
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.3702   1.0000


    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           u |      1000    .0201264    3.337423  -10.88489   12.81112
(395 real changes made)
file x1x2uy.dta saved
(250 observations deleted)
file x1.dta saved
(250 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

Iteration 0:   log likelihood = -670.93218
Iteration 1:   log likelihood =  -585.9526
Iteration 2:   log likelihood = -583.71967
Iteration 3:   log likelihood = -583.70909

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     174.45
                                                  Prob > chi2     =     0.0000
Log likelihood = -583.70909                       Pseudo R2       =     0.1300

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.6774234   .0826805    -8.19   0.000    -.8394743   -.5153726
          x2 |  -.4898934   .0793405    -6.17   0.000    -.6453978   -.3343889
       _cons |   -.504826   .0717842    -7.03   0.000    -.6455205   -.3641315
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

Iteration 0:   log likelihood = -382.03652
Iteration 1:   log likelihood = -337.02328
Iteration 2:   log likelihood =  -336.0382
Iteration 3:   log likelihood = -336.03485

Logistic regression                               Number of obs   =        572
                                                  LR chi2(2)      =      92.00
                                                  Prob > chi2     =     0.0000
Log likelihood = -336.03485                       Pseudo R2       =     0.1204

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |  -.6039695   .1061647    -5.69   0.000    -.8120485   -.3958905
   x2missing |  -.5008986   .1017432    -4.92   0.000    -.7003115   -.3014857
       _cons |  -.5194839   .0943204    -5.51   0.000    -.7043485   -.3346194
------------------------------------------------------------------------------
(250 missing values generated)
(250 missing values generated)

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   x1missing |       750   -.0009517    .9978155  -3.408802   3.299405
(250 real changes made)

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   x2missing |       750    .0200107     1.02082  -2.773818   3.677286
(250 real changes made)

Column 3: OLS with mean imputation of missing data

Iteration 0:   log likelihood = -670.93218
Iteration 1:   log likelihood =  -608.7628
Iteration 2:   log likelihood = -607.52511
Iteration 3:   log likelihood = -607.52193

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     126.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -607.52193                       Pseudo R2       =     0.0945

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |   -.628373   .0909477    -6.91   0.000    -.8066273   -.4501188
x2meanimpute |  -.5418008   .0874381    -6.20   0.000    -.7131763   -.3704254
       _cons |  -.4727023    .069536    -6.80   0.000    -.6089903   -.3364142
------------------------------------------------------------------------------

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\section6jan2007\mma27p3milogit.txt
  log type:  text
 closed on:  30 Jan 2007, 21:42:51
-------------------------------------------------------------------------------
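
The log leaves three pieces of arithmetic implicit: the coefficient scale of about -0.551, the listwise-deletion sample sizes (813 and 572 above), and what mean imputation does to a column. The following is a minimal Python sketch of those three steps, not part of the original Stata program. All function and constant names are hypothetical, and the complete-case count assumes the x1 and x2 missingness draws are independent, which the log suggests but does not state.

```python
import math

# Assumption taken from the program text: u = sqrt(pi^2/3) * (standard logistic
# draw) and y = 1 when ystar <= 0. A standard logit fit then recovers slopes
# near -1/sqrt(pi^2/3) instead of the nominal coefficient of 1.
SCALE = math.sqrt(math.pi ** 2 / 3)
EXPECTED_COEF = -1.0 / SCALE


def expected_complete(n, keep_x1, keep_x2):
    """Expected number of complete cases when x1 and x2 are kept independently."""
    return n * keep_x1 * keep_x2


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def listwise_delete(rows):
    """Keep only rows in which no field is missing (None)."""
    return [row for row in rows if all(v is not None for v in row)]


if __name__ == "__main__":
    print(round(EXPECTED_COEF, 3))
    print(expected_complete(1000, 0.90, 0.90), expected_complete(1000, 0.75, 0.75))
    print(mean_impute([1.0, None, 3.0]))
    print(listwise_delete([(1, 0.5, None), (0, 0.2, 0.1)]))
```

Under the independence assumption the expected complete-case counts are roughly 810 and 562.5, which is in line with the 813 and 572 observations reported for the Column 2 regressions.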