---------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\section6jan2007\mma27p3milogit.txt
  log type:  text
 opened on:  30 Jan 2007, 21:42:51

. 
. ********** OVERVIEW OF MMA27P3MILOGIT.DO **********
. 
. * Stata program by A. Colin Cameron and Pravin K. Trivedi (2005) for
. * "Microeconometrics: Methods and Applications", Cambridge University Press.
. 
. * Section 27.8.2, pp. 937-939: Missing Data Imputation in a Logit Model
. 
. * This program creates the first three columns of Tables 27.5-27.6.
. * It also creates the data sets that SAS analyzes by multiple imputation
. * to give the remaining columns of Tables 27.5-27.6.
. 
. * There are four cases
. *  1: 10% missing rho=0.64 for Table 27.5 and mma27logit1.asc  
. *  2: 25% missing rho=0.64 for                mma27logit2.asc  
. *  3: 10% missing rho=0.36 for                mma27logit3.asc  
. *  4: 25% missing rho=0.36 for Table 27.6 and mma27logit4.asc  
. 
. * THIS PROGRAM DIFFERS FROM THE PROGRAM THAT CREATED THE TABLES IN THE BOOK:
. * IT USES A DIFFERENT SEED, LEADING TO DIFFERENT DATA SETS.
. 
. * The created data are then analyzed using MMA27P4MILOGIT.SAS 
. * to construct the remaining columns of Tables 27.5-27.6
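For readers without SAS, a possible Stata counterpart to the multiple-imputation step is sketched below. It is an assumption, not the book's method: it needs Stata 11 or later for the `mi` commands, uses multivariate normal imputation of x1 and x2 with y included as a predictor, and picks 10 imputations and the seed arbitrarily, so its estimates need not match the SAS columns of Tables 27.5-27.6.

```stata
* Hypothetical Stata 11+ alternative to MMA27P4MILOGIT.SAS (not the book's method)
* Read one of the data sets written by outfile below (missing values are ".")
infile y x1missing x2missing using mma27logit1.asc, clear
mi set mlong
mi register imputed x1missing x2missing
* Multivariate normal imputation of x1, x2 conditional on y; 10 imputations
mi impute mvn x1missing x2missing = y, add(10) rseed(10101)
* Fit the logit on each completed data set and combine with Rubin's rules
mi estimate: logit y x1missing x2missing
```

The number of imputations (`add()`) and the imputation model are the main design choices; more imputations reduce simulation noise in the combined standard errors.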
. 
. ********** SETUP ********** 
. 
. set more off

. version 8.0

. set scheme s1mono  /* Graphics scheme */

. 
. ********** SIMULATION OVERVIEW ********** 
. 
. * The data generating process is logit with
. *   y = 1(ystar <= 0)
. *   ystar = constant + x1 + x2 + u, with constant = 1
. *   x1, x2 ~ bivariate normal with means 0 and covariance matrix (1,rho\rho,1)
. *   u = sqrt(pi^2/3) times a standard logistic draw, so Var(u) = (pi^2/3)^2
. *   N = 1000
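As a sanity check on this DGP, the sketch below (not part of the original do-file) simulates it with a much larger N, following the program's conventions: u is sqrt(pi^2/3) times a standard logistic draw and y = 1 when ystar <= 0. With N that large the logit estimates should sit close to -1/sqrt(pi^2/3), about -0.551. The seed, N = 100000 and rho = 0.64 are arbitrary choices, and `runiform()` assumes Stata 10.1 or later.

```stata
* Hypothetical large-N check of the DGP (not part of the original do-file)
clear
set seed 10101
set obs 100000
matrix C = (1, 0.64 \ 0.64, 1)
drawnorm x1 x2, cov(C)
* u = sqrt(pi^2/3) times a standard logistic draw, as in the program
generate double u = sqrt(_pi^2/3)*logit(runiform())
* Same sign convention as the program: y = 1 when ystar <= 0
generate byte y = (1 + x1 + x2 + u <= 0)
logit y x1 x2
display "target coefficient: " %6.4f (-1/sqrt(_pi^2/3))
```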
. 
. * The missing data process is
. *   10% (or 25%) of x1 are randomly missing
. *   10% (or 25%) of x2 are randomly missing
. * The two variables need not be missing for the same observations.
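Because x1 and x2 are made missing separately, listwise deletion should keep roughly N(p/100)^2 observations, where p is the percentage nonmissing (with exact-count sampling as done by `sample`, the expected overlap and hence the expectation is the same). The two `display` lines below evaluate to 810 and 562.5, close to the 813 and 572 observations used by the listwise-deletion logits in this log.

```stata
* Expected number of complete cases when x1 and x2 are each missing
* in (100 - p)% of N = 1000 observations, independently of each other
display 1000*(90/100)^2     /* 10% missing each */
display 1000*(75/100)^2     /* 25% missing each */
```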
. 
. * Because y = 1 when ystar <= 0, the estimated coefficients on the constant,
. * x1 and x2 should all be approximately -1/sqrt(pi^2/3) = -0.551.
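The -0.551 value follows from the program's sign convention (y = 1 when ystar <= 0) and the scaling of u. Writing u = s e with s = sqrt(pi^2/3) and e a standard logistic draw whose cdf is Lambda(z) = e^z/(1+e^z):

```latex
\Pr(y=1 \mid x_1,x_2) = \Pr(1 + x_1 + x_2 + s\,e \le 0)
  = \Pr\!\left(e \le -\frac{1 + x_1 + x_2}{s}\right)
  = \Lambda\!\left(-\frac{1}{s} - \frac{1}{s}\,x_1 - \frac{1}{s}\,x_2\right),
\qquad s = \sqrt{\pi^2/3},
\qquad -\frac{1}{s} = -\frac{\sqrt{3}}{\pi} \approx -0.5513 .
```

So the model is a correctly specified logit in which the constant, x1 and x2 all have the same coefficient -1/s.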
. 
. ************ PROGRAM TO CREATE AND ANALYZE MISSING DATA ***********
. 
. * This program has four arguments
. *   `1' is rho - correlation between x1 and x2
. *   `2' is percentage nonmissing (so 100 - `2' is percentage missing)
. *   `3' is the number for the data set created
. *   `4' is the variance of u set so that R^2 = 0.25 in true OLS regression;
. *       note that the program below does not actually use `4'
. 
. * The program 
. *    creates a missing data set
. *    estimates the logit model with no data missing, with listwise deletion
. *    and with mean imputation
. *    writes out data set for later multiple imputation by SAS 
. 
. capture program drop missing

. 
. program define missing
  1. 
.   /* (1) Create complete data set */
.   di
  2.   clear 
  3.   set obs 1000                       /* set sample size*/
  4.   matrix covvar = (1,`1' \ `1',1)    /* set covariance matrix for x1, x2*/
  5.   matrix means = (0,0)               /* set mean for x1, x2*/
  6.   drawnorm x1 x2, seed(123) cov(covvar) means(means)  /* draw x1, x2*/
  7.   sum x1 x2                          /* check x1, x2 correctly drawn*/
  8.   corr x1 x2
  9.   gen u = sqrt(_pi^2/3)*logit(uniform())     /* draw logistic error u */
 10.   sum u                                      /* check draws of u*/
 11.   gen cons = 1
 12.   gen ystar = x1 + x2 + u + cons      /* generate ystar */
 13.   gen y = 0                           /* generate y*/
 14.   replace y=1 if ystar<=0
 15.   gen id = _n
 16.   sort id
 17.   save x1x2uy.dta, replace
 18. 
.   /* (2) Create data set with some observations missing */
.   use x1x2uy.dta, clear       /* randomly set 100-`2' % of x1 missing*/
 19.   keep x1
 20.   gen id=_n
 21.   sample `2'
 22.   sort id
 23.   rename x1 x1missing         /* rename resulting x1 as x1missing*/
 24.   save x1.dta, replace
 25.   use x1x2uy.dta, clear       /* randomly set 100-`2' % of x2 missing*/
 26.   keep x2
 27.   gen id=_n
 28.   sample `2'
 29.   sort id
 30.   rename x2 x2missing         /*rename resulting x2 as x2missing*/
 31.   save x2.dta, replace
 32.   use x1x2uy, clear           /* merge x1missing and x2missing */
 33.   sort id
 34.   merge id using x1
 35.   rename _merge merge1
 36.   sort id
 37.   merge id using x2
 38. 
.   /* (3) Create the first three columns of Tables 27.5-27.6 */
. 
.   /* Logit with no data missing */
.   di _n "Column 1: OLS with no data missing"
 39.   logit y x1 x2                  
 40. 
.   /* Logit with listwise deletion of missing data */
.   di _n "Column 2: OLS with listwise deletion of missing data"
 41.   logit y x1missing x2missing                
 42. 
.   /* Logit with mean imputation of missing data */
.   /* Generate mean imputations of x1 and x2 */
.   gen x1meanimpute=x1missing    
 43.   gen x2meanimpute=x2missing
 44.   sum x1missing
 45.   replace x1meanimpute=r(mean) if x1meanimpute==.
 46.   sum x2missing
 47.   replace x2meanimpute=r(mean) if x2meanimpute==.
 48.   di _n "Column 3: OLS with mean imputation of missing data"
 49.   logit y x1meanimpute x2meanimpute 
 50. 
.   /* Save data for later SAS multiple imputation use */
.   /* save x1x2missuy.dta, replace */
.   outfile y x1missing x2missing using mma27logit`3'.asc, replace
 51.   clear
 52. 
. end
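Step (2) of the program creates the missing values by saving, sampling and merging three files. A shorter in-memory alternative is sketched below; it is an assumption, not the authors' code, needs Stata 10.1 or later for `runiform()`, and picks different observations than `sample` does, so it will not reproduce this log. Like `sample`, it sets an exact number of values missing: x1 and x2 each get their own random ordering, and the first (100 - pct)% of observations in that ordering are set to missing. The local names `pct` and `nmiss` and the variable `rnd` are hypothetical.

```stata
* Hypothetical in-memory version of step (2): exactly (100 - pct)% of
* x1 and of x2 set missing, chosen independently for the two variables
use x1x2uy.dta, clear
set seed 10101
local pct 90                                   /* percentage nonmissing */
local nmiss = round(_N*(100 - `pct')/100)      /* 100 when N = 1000 */
foreach v in x1 x2 {
    generate double rnd = runiform()
    sort rnd
    generate `v'missing = `v'
    replace `v'missing = . in 1/`nmiss'
    drop rnd
}
sort id                                        /* restore original order */
```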

. 
. ************ RUN THE PROGRAM TO CREATE SEVERAL MISSING DATA SETS ***********
. 
. * This program has four arguments
. *   `1' is rho - correlation between x1 and x2
. *   `2' is percentage nonmissing (so 100 - `2' is percentage missing)
. *   `3' is the number for the data set created
. *       e.g. the first will be mma27logit1.asc 
. *   `4' is the variance of u set so that R^2 = 0.25 in true OLS regression;
. *       note that the program does not actually use `4'
. 
. * Table 27.5
. missing 0.64 90 1 10   /* Case 1: high correlation and low missing  */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0081246    1.009194  -3.609674   3.751572
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.6459   1.0000


    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      1000    .0201264    3.337423  -10.88489   12.81112
(394 real changes made)
file x1x2uy.dta saved
(100 observations deleted)
file x1.dta saved
(100 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

Iteration 0:   log likelihood = -670.50375
Iteration 1:   log likelihood = -573.21465
Iteration 2:   log likelihood = -569.95242
Iteration 3:   log likelihood = -569.92808
Iteration 4:   log likelihood = -569.92807

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     201.15
                                                  Prob > chi2     =     0.0000
Log likelihood = -569.92807                       Pseudo R2       =     0.1500

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.6716367   .1001088    -6.71   0.000    -.8678463   -.4754271
          x2 |  -.5018925    .096534    -5.20   0.000    -.6910956   -.3126893
       _cons |  -.5271033   .0731481    -7.21   0.000     -.670471   -.3837357
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

Iteration 0:   log likelihood = -540.88154
Iteration 1:   log likelihood = -460.87405
Iteration 2:   log likelihood = -458.07626
Iteration 3:   log likelihood = -458.05342
Iteration 4:   log likelihood = -458.05341

Logistic regression                               Number of obs   =        813
                                                  LR chi2(2)      =     165.66
                                                  Prob > chi2     =     0.0000
Log likelihood = -458.05341                       Pseudo R2       =     0.1531

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |  -.6148284    .110222    -5.58   0.000    -.8308595   -.3987972
   x2missing |    -.57235   .1092985    -5.24   0.000     -.786571   -.3581289
       _cons |  -.5876585   .0820429    -7.16   0.000    -.7484597   -.4268573
------------------------------------------------------------------------------
(100 missing values generated)
(100 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x1missing |       900    -.001332    1.000239   -4.27458   3.299405
(100 real changes made)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x2missing |       900    .0021292     1.01061  -3.609674   3.751572
(100 real changes made)

Column 3: OLS with mean imputation of missing data

Iteration 0:   log likelihood = -670.50375
Iteration 1:   log likelihood = -582.79803
Iteration 2:   log likelihood = -579.99608
Iteration 3:   log likelihood = -579.97793
Iteration 4:   log likelihood = -579.97793

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     181.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -579.97793                       Pseudo R2       =     0.1350

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |  -.6634577   .0999748    -6.64   0.000    -.8594047   -.4675108
x2meanimpute |  -.5288173   .0961733    -5.50   0.000    -.7173135    -.340321
       _cons |  -.5169505   .0721383    -7.17   0.000     -.658339   -.3755621
------------------------------------------------------------------------------

.  
. * Not tabulated 
. missing 0.64 75 2 10   /* Case 2: high correlation and high missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0081246    1.009194  -3.609674   3.751572
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.6459   1.0000


    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      1000    .0201264    3.337423  -10.88489   12.81112
(394 real changes made)
file x1x2uy.dta saved
(250 observations deleted)
file x1.dta saved
(250 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

Iteration 0:   log likelihood = -670.50375
Iteration 1:   log likelihood = -573.21465
Iteration 2:   log likelihood = -569.95242
Iteration 3:   log likelihood = -569.92808
Iteration 4:   log likelihood = -569.92807

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     201.15
                                                  Prob > chi2     =     0.0000
Log likelihood = -569.92807                       Pseudo R2       =     0.1500

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.6716367   .1001088    -6.71   0.000    -.8678463   -.4754271
          x2 |  -.5018925    .096534    -5.20   0.000    -.6910956   -.3126893
       _cons |  -.5271033   .0731481    -7.21   0.000     -.670471   -.3837357
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

Iteration 0:   log likelihood = -381.57758
Iteration 1:   log likelihood =  -328.9304
Iteration 2:   log likelihood = -327.38974
Iteration 3:   log likelihood = -327.38047
Iteration 4:   log likelihood = -327.38047

Logistic regression                               Number of obs   =        572
                                                  LR chi2(2)      =     108.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -327.38047                       Pseudo R2       =     0.1420

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |  -.5979623   .1286703    -4.65   0.000    -.8501514   -.3457732
   x2missing |  -.5262477   .1239634    -4.25   0.000    -.7692115   -.2832839
       _cons |  -.5464325   .0963105    -5.67   0.000    -.7351975   -.3576675
------------------------------------------------------------------------------
(250 missing values generated)
(250 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x1missing |       750   -.0009517    .9978155  -3.408802   3.299405
(250 real changes made)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x2missing |       750    .0187426     1.02357  -3.609674   3.751572
(250 real changes made)

Column 3: OLS with mean imputation of missing data

Iteration 0:   log likelihood = -670.50375
Iteration 1:   log likelihood = -591.92702
Iteration 2:   log likelihood = -589.49443
Iteration 3:   log likelihood = -589.47999
Iteration 4:   log likelihood = -589.47999

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     162.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -589.47999                       Pseudo R2       =     0.1208

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |   -.643689   .1022226    -6.30   0.000    -.8440416   -.4433363
x2meanimpute |  -.6281951   .0990987    -6.34   0.000    -.8224249   -.4339653
       _cons |  -.4946273   .0710693    -6.96   0.000    -.6339206   -.3553339
------------------------------------------------------------------------------

.  
. * Not tabulated
. missing 0.36 90 3 10   /* Case 3: low correlation and low missing   */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0105351    1.007028  -2.773818   3.677286
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.3702   1.0000


    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      1000    .0201264    3.337423  -10.88489   12.81112
(395 real changes made)
file x1x2uy.dta saved
(100 observations deleted)
file x1.dta saved
(100 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

Iteration 0:   log likelihood = -670.93218
Iteration 1:   log likelihood =  -585.9526
Iteration 2:   log likelihood = -583.71967
Iteration 3:   log likelihood = -583.70909

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     174.45
                                                  Prob > chi2     =     0.0000
Log likelihood = -583.70909                       Pseudo R2       =     0.1300

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.6774234   .0826805    -8.19   0.000    -.8394743   -.5153726
          x2 |  -.4898934   .0793405    -6.17   0.000    -.6453978   -.3343889
       _cons |   -.504826   .0717842    -7.03   0.000    -.6455205   -.3641315
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

Iteration 0:   log likelihood = -541.82874
Iteration 1:   log likelihood = -471.54478
Iteration 2:   log likelihood = -469.61024
Iteration 3:   log likelihood =  -469.6002

Logistic regression                               Number of obs   =        813
                                                  LR chi2(2)      =     144.46
                                                  Prob > chi2     =     0.0000
Log likelihood =  -469.6002                       Pseudo R2       =     0.1333

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |  -.6479046   .0908898    -7.13   0.000    -.8260454   -.4697638
   x2missing |  -.5408394   .0897945    -6.02   0.000    -.7168334   -.3648455
       _cons |  -.5567487   .0803693    -6.93   0.000    -.7142696   -.3992278
------------------------------------------------------------------------------
(100 missing values generated)
(100 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x1missing |       900    -.001332    1.000239   -4.27458   3.299405
(100 real changes made)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x2missing |       900    .0029466    1.002902  -2.773818   3.677286
(100 real changes made)

Column 3: OLS with mean imputation of missing data

Iteration 0:   log likelihood = -670.93218
Iteration 1:   log likelihood = -597.00123
Iteration 2:   log likelihood = -595.28881
Iteration 3:   log likelihood = -595.28272

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     151.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -595.28272                       Pseudo R2       =     0.1128

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |  -.6718073   .0857127    -7.84   0.000     -.839801   -.5038136
x2meanimpute |  -.4814971   .0820387    -5.87   0.000      -.64229   -.3207042
       _cons |   -.494628   .0707196    -6.99   0.000     -.633236   -.3560201
------------------------------------------------------------------------------

.  
. * Table 27.6
. missing 0.36 75 4 10   /* Case 4: low correlation and high missing  */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0105351    1.007028  -2.773818   3.677286
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.3702   1.0000


    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      1000    .0201264    3.337423  -10.88489   12.81112
(395 real changes made)
file x1x2uy.dta saved
(250 observations deleted)
file x1.dta saved
(250 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

Iteration 0:   log likelihood = -670.93218
Iteration 1:   log likelihood =  -585.9526
Iteration 2:   log likelihood = -583.71967
Iteration 3:   log likelihood = -583.70909

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     174.45
                                                  Prob > chi2     =     0.0000
Log likelihood = -583.70909                       Pseudo R2       =     0.1300

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.6774234   .0826805    -8.19   0.000    -.8394743   -.5153726
          x2 |  -.4898934   .0793405    -6.17   0.000    -.6453978   -.3343889
       _cons |   -.504826   .0717842    -7.03   0.000    -.6455205   -.3641315
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

Iteration 0:   log likelihood = -382.03652
Iteration 1:   log likelihood = -337.02328
Iteration 2:   log likelihood =  -336.0382
Iteration 3:   log likelihood = -336.03485

Logistic regression                               Number of obs   =        572
                                                  LR chi2(2)      =      92.00
                                                  Prob > chi2     =     0.0000
Log likelihood = -336.03485                       Pseudo R2       =     0.1204

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |  -.6039695   .1061647    -5.69   0.000    -.8120485   -.3958905
   x2missing |  -.5008986   .1017432    -4.92   0.000    -.7003115   -.3014857
       _cons |  -.5194839   .0943204    -5.51   0.000    -.7043485   -.3346194
------------------------------------------------------------------------------
(250 missing values generated)
(250 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x1missing |       750   -.0009517    .9978155  -3.408802   3.299405
(250 real changes made)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x2missing |       750    .0200107     1.02082  -2.773818   3.677286
(250 real changes made)

Column 3: OLS with mean imputation of missing data

Iteration 0:   log likelihood = -670.93218
Iteration 1:   log likelihood =  -608.7628
Iteration 2:   log likelihood = -607.52511
Iteration 3:   log likelihood = -607.52193

Logistic regression                               Number of obs   =       1000
                                                  LR chi2(2)      =     126.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -607.52193                       Pseudo R2       =     0.0945

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |   -.628373   .0909477    -6.91   0.000    -.8066273   -.4501188
x2meanimpute |  -.5418008   .0874381    -6.20   0.000    -.7131763   -.3704254
       _cons |  -.4727023    .069536    -6.80   0.000    -.6089903   -.3364142
------------------------------------------------------------------------------

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\section6jan2007\mma27p3milogit.txt
  log type:  text
 closed on:  30 Jan 2007, 21:42:51
-------------------------------------------------------------------------------