-------------------------------------------------------------------------------
      name:
       log:  c:\acdbookrevision\webpage_finalize\racd06p4.txt
  log type:  text
 opened on:  6 Jun 2013, 15:06:35

. 
. ********** OVERVIEW OF racd06p4.do **********
. 
. * STATA Program
. * copyright C 2013 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Regression Analysis of Count Data" SECOND EDITION
. * by A. Colin Cameron and Pravin K. Trivedi (2013)
. * Cambridge University Press
. 
. * Chapter 6.6 only
. *   6.6 MODEL SELECTION CRITERIA: A DIGRESSION
. 
. * To run you need no dataset, as data are simulated,
. * but you need the user-written Stata addon
. *   fmm
. * in your directory
. 
. ********** SETUP **********
. 
. set more off
. version 12
. clear all
. set mem 10m
set memory ignored.
    Memory no longer needs to be set in modern Statas; memory adjustments are
    performed on the fly automatically.
. * set linesize 82
. set scheme s1mono  /* Graphics scheme */
. 
. ********** DATA DESCRIPTION
. 
. * Generated data
. 
. ********** RESULTS HERE DIFFER FROM THE BOOK
. 
. * The book presents results summarized in section 4 of Deb and Trivedi (1997).
. * The more detailed results given in Table 6.18 (2nd ed.) and Table 6.15 (1st ed.)
. * were not given in the Deb and Trivedi article.
. * Furthermore we do not have the code that produced these tables.
. 
. * Instead, here we simply generate data from similar dgp's,
. * and we do a small amount of the analysis in Table 6.18 (part of column 1).
. 
. ********** 6.6 MODEL SELECTION CRITERIA: A DIGRESSION
. 
. * Three data generating processes
. * (1) Poisson                     mu = exp(-1.445 + 3.0*x)
. * (2) Poisson Hurdle   Zeros      mu = exp(-1.6 + 3.0*x)
. *                      Positives  mu = exp(-1.35 + 3.0*x)
. * (3) FM Poisson       Component1 mu = exp(-1.225 + 3.0*x)
. *                      Component2 mu = exp(-1.5 + 0.75*x)
. 
. 
. **** DATA GENERATING PROCESSES
. 
. clear
. set obs 100000
obs was 0, now 100000
. set seed 10101
. generate x = runiform()
. 
. * Poisson
. * generate yP = rpoisson(exp(-1.6 + 3.0*x))
. generate yP = rpoisson(exp(-1.445 + 3.0*x))
. 
. * Hurdle Poisson - not so easy to draw from
. generate yH = 0
. generate muH2 = exp(-1.35 + 3.0*x)
. scalar miny = 0
. * The following draws from a zero-truncated Poisson
. while miny == 0 {
  2.   generate yph = rpoisson(muH2)
  3.   quietly replace yH = yph if yH==0
  4.   drop yph
  5.   quietly summarize yH
  6.   scalar miny = r(min)
  7. }
. * Now bring in the hurdle
. replace yH = 0 if runiform() < (1 - (1-exp(-exp(-1.6 + 3.0*x))))
(40426 real changes made)
. 
. * Finite mixture
. generate yFM = rpoisson(exp(-1.225 + 3.0*x))
. replace yFM = rpoisson(exp(-1.5 + .75*x)) if runiform() > .75
(17544 real changes made)
. generate yP0 = yP==0
. generate yH0 = yH==0
. generate yFM0 = yFM==0
. 
. * Calculate the fraction of positives
. generate dyP = yP > 0
. generate dyH = yH > 0
. generate dyFM = yFM > 0
. 
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |    100000    .4988239     .288796   7.72e-06    .999995
          yP |    100000     1.49485    1.726645          0         14
          yH |    100000      1.5215    1.860052          0         14
        muH2 |    100000    1.644359    1.335116   .2592463   5.206902
         yFM |    100000     1.47886    1.903552          0         15
-------------+--------------------------------------------------------
         yP0 |    100000      .36454    .4813033          0          1
         yH0 |    100000      .40426    .4907507          0          1
        yFM0 |    100000      .41024    .4918796          0          1
         dyP |    100000      .63546    .4813033          0          1
         dyH |    100000      .59574    .4907507          0          1
-------------+--------------------------------------------------------
        dyFM |    100000      .58976    .4918796          0          1

. 
. *** Check the dgps
. 
. * Poisson: dgp was rpoisson(exp(-1.445 + 3.0*x))
. poisson yP x

Iteration 0:   log likelihood = -135086.53
Iteration 1:   log likelihood = -135084.56
Iteration 2:   log likelihood = -135084.56

Poisson regression                                Number of obs   =     100000
                                                  LR chi2(1)      =   92154.15
                                                  Prob > chi2     =     0.0000
Log likelihood = -135084.56                       Pseudo R2       =     0.2543

------------------------------------------------------------------------------
          yP |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   3.004097   .0109232   275.02   0.000     2.982688   3.025506
       _cons |  -1.448268   .0082638  -175.25   0.000    -1.464465  -1.432071
------------------------------------------------------------------------------

. 
. * Finite mixture: dgp was pi = .75 mixture of
. *   rpoisson(exp(-1.225 + 3.0*x)) & rpoisson(exp(-1.5 + .75*x))
. fmm yFM x, components(2) mixtureof(poisson)

Fitting Poisson model:

Iteration 0:   log likelihood = -148969.14
Iteration 1:   log likelihood = -148967.97
Iteration 2:   log likelihood = -148967.97

Fitting 2 component Poisson model:

Iteration 0:   log likelihood =  -148961.8  (not concave)
Iteration 1:   log likelihood = -148781.18  (not concave)
Iteration 2:   log likelihood = -146838.28  (not concave)
Iteration 3:   log likelihood = -145740.73
Iteration 4:   log likelihood = -144883.39  (not concave)
Iteration 5:   log likelihood = -143885.92  (not concave)
Iteration 6:   log likelihood = -143697.95  (not concave)
Iteration 7:   log likelihood = -143592.13  (not concave)
Iteration 8:   log likelihood = -143515.87
Iteration 9:   log likelihood = -143437.18
Iteration 10:  log likelihood = -143420.88
Iteration 11:  log likelihood = -143418.04
Iteration 12:  log likelihood = -143418.04

2 component Poisson regression                    Number of obs   =     100000
                                                  Wald chi2(2)    =   50277.10
Log likelihood = -143418.04                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         yFM |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
component1   |
           x |   .8005967   .0984805     8.13   0.000     .6075784   .9936149
       _cons |   -1.51907   .0759676   -20.00   0.000    -1.667963  -1.370176
-------------+----------------------------------------------------------------
component2   |
           x |   3.002581   .0159919   187.76   0.000     2.971237   3.033925
       _cons |  -1.233385   .0128465   -96.01   0.000    -1.258564  -1.208206
-------------+----------------------------------------------------------------
 /imlogitpi1 |  -1.120153   .0214457   -52.23   0.000    -1.162186    -1.07812
------------------------------------------------------------------------------
         pi1 |   .2459829   .0039776                      .2382703   .2538619
         pi2 |   .7540171   .0039776                      .7461381   .7617297
------------------------------------------------------------------------------

. scalar AICFM = -2*e(ll) + 2*e(k)
. scalar BICFM = -2*e(ll) + ln(e(N))*e(k)
. display "BIC Finite Mixture = " BICFM
BIC Finite Mixture = 286893.64
. 
. * Hurdle:
. * Hurdle first component: Poisson for the zeroes
. program lfpoissonbinary
  1.   version 10.1
  2.   args lnf theta1
  3.   tempvar mu p0
  4.   local y $ML_y1
  5.   generate double `mu' = exp(`theta1')
  6.   generate double `p0' = exp(-`mu')
  7.   quietly replace `lnf' = ln(`p0')   if $ML_y1 == 0
  8.   quietly replace `lnf' = ln(1-`p0') if $ML_y1 == 1
  9. end
. ml model lf lfpoissonbinary (dyH = x)
. ml maximize, nolog

                                                  Number of obs   =     100000
                                                  Wald chi2(1)    =   26046.66
Log likelihood = -51369.712                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         dyH |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   2.997719   .0185744   161.39   0.000     2.961313   3.034124
       _cons |  -1.596699   .0114249  -139.76   0.000    -1.619091  -1.574307
------------------------------------------------------------------------------

. estimates store H1p
. scalar llH1p = e(ll)
. scalar kH1p = e(k)
. scalar nH = e(N)
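The information criteria used throughout are AIC = -2 ln L + 2k and BIC = -2 ln L + k ln N, computed via the `scalar` commands above. As a check on the arithmetic, a small Python sketch (Python rather than Stata, purely illustrative) reproduces the logged finite-mixture BIC from ll = -143418.04, k = 5 parameters, and N = 100000:

```python
import math

def aic(loglik, k):
    # Akaike information criterion: -2 ln L + 2k
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    # Bayesian (Schwarz) criterion: -2 ln L + k ln N
    return -2.0 * loglik + k * math.log(n)

# Values taken from the fmm output above (2 slopes, 2 intercepts, 1 mixing parameter)
ll, k, n = -143418.04, 5, 100000
print(round(aic(ll, k), 2))  # 286846.08
print(round(bic(ll, k, n), 2))  # 286893.64, matching "BIC Finite Mixture" above
```

With N = 100000 the BIC penalty per parameter is ln N = 11.51 against AIC's 2, which is why BIC leans harder toward the parsimonious model.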
. * Hurdle second component: Poisson for the positives
. ztp yH x if yH>0

Iteration 0:   log likelihood = -88620.795
Iteration 1:   log likelihood = -84951.234
Iteration 2:   log likelihood = -84950.333
Iteration 3:   log likelihood = -84950.333

Zero-truncated Poisson regression                 Number of obs   =      59574
                                                  LR chi2(1)      =   43580.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -84950.333                       Pseudo R2       =     0.2041

------------------------------------------------------------------------------
          yH |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   3.012353   .0171157   176.00   0.000     2.978807   3.045899
       _cons |   -1.36163   .0140886   -96.65   0.000    -1.389244  -1.334017
------------------------------------------------------------------------------

. estimates store H2p
. scalar llH2p = e(ll)
. scalar kH2p = e(k)
. * Combine two parts
. scalar llHp = llH1p + llH2p
. scalar kHp = kH1p + kH2p
. scalar AICHp = -2*llHp + 2*kHp
. scalar BICHp = -2*llHp + ln(nH)*kHp
. 
. * Display hurdle results
. estimates table H1p H2p, eq(1) b se

----------------------------------------
    Variable |    H1p          H2p      
-------------+--------------------------
           x |  2.9977185    3.0123527  
             |   .0185744    .01711565  
       _cons |  -1.596699   -1.3616305  
             |  .01142485    .01408858  
----------------------------------------
                            legend: b/se

. display "BIC Hurdle Poisson = " BICHp
BIC Hurdle Poisson = 272686.14
. 
. 
. **** SIMULATIONS
. 
. * This does part of the first column of Table 6.18
. * The data generating process is Poisson
. * Just Poisson and Poisson FM models are estimated
. * The models are compared using LR test, AIC and BIC
. 
. * To speed up the program there are just 50 simulations
. * numsims should be increased
. 
. clear
. set seed 10101
. global numsims 50
. global numobs 500
. program simbypost
  1.   version 10.1
  2.   tempname simfile
  3.   postfile `simfile' bpcons bpx bfmp1cons bfmp1x bfmp2cons bfmp2x pi lrtest lrreject AICplow BICplow using simresults, replace
  4.   quietly {
  5.     forvalues i = 1/$numsims {
  6.       display "simulation: " `i'
  7.       drop _all
  8.       set obs $numobs
  9.       generate x = runiform()
 10.       generate y = rpoisson(exp(-1.445 + 3.0*x))
 11.       poisson y x
 12.       scalar bpx = _b[x]
 13.       scalar bpcons = _b[_cons]
 14.       scalar llp = e(ll)
 15.       scalar AICp = -2*e(ll) + 2*e(k)
 16.       scalar BICp = -2*e(ll) + ln(e(N))*e(k)
 17.       fmm y x, components(2) mixtureof(poisson) iter(20)
 18.       scalar llfmm = e(ll)
 19.       scalar AICFM = -2*e(ll) + 2*e(k)
 20.       scalar BICFM = -2*e(ll) + ln(e(N))*e(k)
 21.       scalar converge = e(converge)
 22.       matrix bfmp = e(b)
 23.       scalar bfmp1x = exp(bfmp[1,1])
 24.       scalar bfmp1cons = exp(bfmp[1,2])
 25.       scalar bfmp2x = exp(bfmp[1,3])
 26.       scalar bfmp2cons = exp(bfmp[1,4])
 27.       scalar pi = exp(bfmp[1,5])/(1+exp(bfmp[1,5]))
 28.       scalar lrtest = 2*(llfmm - llp)
 29.       scalar lrreject = lrtest > invchi2(1,.95)
 30.       scalar AICplow = AICp < AICFM
 31.       scalar BICplow = BICp < BICFM
 32.       post `simfile' (bpcons) (bpx) (bfmp1cons) (bfmp1x) (bfmp2cons) (bfmp2x) (pi) (lrtest) (lrreject) (AICplow) (BICplow)
 33.     }
 34.   }
 35.   postclose `simfile'
 36. end
. simbypost
. use simresults, clear
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      bpcons |        50    -1.43788    .0983307  -1.629938  -1.163989
         bpx |        50    2.984132    .1324086   2.657044   3.233268
   bfmp1cons |        50    .2362758    .1124856   3.79e-31   .8152104
      bfmp1x |        50    3.34e+29    2.36e+30   1.208939   1.67e+31
   bfmp2cons |        50    .2400324    .0840982   .0252217   .6401486
-------------+--------------------------------------------------------
      bfmp2x |        50    35.78528    76.34528    8.66124   553.6423
          pi |        50    .6642202     .279737   .0261622          1
      lrtest |        50    1.316403    2.131809  -.0001312   10.78937
    lrreject |        50          .1    .3030458          0          1
     AICplow |        50         .96    .1979487          0          1
-------------+--------------------------------------------------------
     BICplow |        50         .96    .1979487          0          1
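Each simulation draw compares the one-component and two-component Poisson fits via `scalar lrtest = 2*(llfmm - llp)` and rejects when the statistic exceeds `invchi2(1,.95)`. A minimal Python analogue of that decision rule (the log-likelihood values below are hypothetical, not the simulation's):

```python
# Likelihood-ratio comparison of nested models, mirroring
#   scalar lrtest = 2*(llfmm - llp)
#   scalar lrreject = lrtest > invchi2(1,.95)
CHI2_95_DF1 = 3.8414588  # invchi2(1,.95), as displayed later in the log

def lr_test(ll_restricted, ll_unrestricted, crit=CHI2_95_DF1):
    # LR statistic for nested models; reject H0 when it exceeds the critical value
    lr = 2.0 * (ll_unrestricted - ll_restricted)
    return lr, lr > crit

lr, reject = lr_test(-250.0, -248.5)  # hypothetical log likelihoods
print(lr, reject)  # 3.0 False: below the 5% critical value
```

The small negative lrtest values in the summary above can arise because `fmm` is capped at `iter(20)`, so the nominally unrestricted fit may stop slightly below the one-component maximum.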
. mean bpcons bpx bfmp1cons bfmp1x bfmp2cons bfmp2x pi lrtest lrreject AICplow BICplow, noheader

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      bpcons |   -1.43788   .0139061     -1.465825   -1.409935
         bpx |   2.984132   .0187254      2.946502    3.021762
   bfmp1cons |   .2362758   .0159079      .2043077    .2682439
      bfmp1x |   3.34e+29   3.34e+29     -3.37e+29    1.01e+30
   bfmp2cons |   .2400324   .0118933       .216132    .2639328
      bfmp2x |   35.78528   10.79685      14.08819    57.48237
          pi |   .6642202   .0395608      .5847198    .7437206
      lrtest |   1.316403   .3014833      .7105498    1.922257
    lrreject |         .1   .0428571      .0138753    .1861247
     AICplow |        .96   .0279942      .9037436    1.016256
     BICplow |        .96   .0279942      .9037436    1.016256
--------------------------------------------------------------

. 
. summarize lrtest, detail

                           lrtest
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.0001312      -.0001312
 5%     -.000065      -.0000928
10%    -.0000478       -.000065       Obs                  50
25%    -6.98e-06      -.0000494       Sum of Wgt.          50

50%     .1231615                      Mean           1.316403
                        Largest       Std. Dev.      2.131809
75%     2.224793       4.802664
90%     3.642071       4.986424       Variance       4.544609
95%     4.986424       7.072522       Skewness       2.389082
99%     10.78937       10.78937       Kurtosis       9.741848

. centile lrtest, centile(10 20 30 40 50 60 70 80 90 95)

                                                     -- Binom. Interp. --
    Variable |     Obs  Percentile    Centile        [95% Conf. Interval]
-------------+-----------------------------------------------------------
      lrtest |      50         10   -.0000483       -.0001046   -.0000245
             |                 20    -.000023       -.0000481    .0000196
             |                 30   -3.60e-07       -.0000271    .0041848
             |                 40    .0019106       -2.27e-06    .3670091
             |                 50    .1231615        .0005006    .9623025
             |                 60    .7864019        .0350388    1.973008
             |                 70    1.668329        .4769537    2.615625
             |                 80    2.569803        1.574888     3.76231
             |                 90     3.81399        2.574371     8.21305
             |                 95    5.925168        3.401095   10.78937*
* Lower (upper) confidence limit held at minimum (maximum) of sample
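Stata's `centile` reports empirical percentiles of lrtest with binomial interpolation for the confidence limits. A simpler nearest-rank version of the percentile itself can be sketched in Python (the data below are illustrative, not the simulation's draws):

```python
import math

def empirical_percentile(vals, p):
    # nearest-rank percentile: smallest sample value with at least
    # p percent of the observations at or below it
    s = sorted(vals)
    idx = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[idx]

# Illustrative values only, not the simulation's lrtest draws
vals = [0.0, 0.1, 0.5, 1.2, 2.6, 3.8, 5.9, 7.1, 9.4, 10.8]
print(empirical_percentile(vals, 50))  # 2.6
print(empirical_percentile(vals, 90))  # 9.4
```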
. display "chisquare 1 degree at 80% " invchi2(1,.80) " and 90% " invchi2(1,.90) " and 95% " invchi2(1,.95)
chisquare 1 degree at 80% 1.6423744 and 90% 2.7055435 and 95% 3.8414588
. display "chisquare 2 degrees at 80% " invchi2(2,.80) " and 90% " invchi2(2,.90) " and 95% " invchi2(2,.95)
chisquare 2 degrees at 80% 3.2188758 and 90% 4.6051702 and 95% 5.9914645
. 
. kdensity lrtest
. quietly sum lrtest
. generate evalpoint = r(max)*_n/_N
. generate chi1dens = (1/sqrt(2))*exp(-lngamma(1/2))*(evalpoint^(-1/2))*exp(-evalpoint/2)
. generate chi2dens = (1/2)*exp(-evalpoint/2)
. graph twoway kdensity lrtest || line chi1dens evalpoint || line chi2dens evalpoint
. 
. ********** CLOSE OUTPUT
. 
. * log close
. * clear
. * exit
. 
end of do-file

. exit, clear
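The chi1dens and chi2dens variables evaluate the chi-square(1) and chi-square(2) densities f(x) = x^(k/2-1) exp(-x/2) / (2^(k/2) Gamma(k/2)); the `exp(-lngamma(1/2))` factor is 1/Gamma(1/2) = 1/sqrt(pi). A Python sketch of the same formula, as a cross-check:

```python
import math

def chi2_pdf(x, k):
    # pdf of the chi-square distribution with k degrees of freedom:
    # x^(k/2 - 1) * exp(-x/2) / (2^(k/2) * Gamma(k/2))
    return (x ** (k / 2 - 1) * math.exp(-x / 2)) / (2 ** (k / 2) * math.gamma(k / 2))

print(round(chi2_pdf(1.0, 1), 4))  # 0.242
print(round(chi2_pdf(1.0, 2), 4))  # 0.3033, i.e. 0.5*exp(-0.5)
```

For k = 2 the formula collapses to (1/2) exp(-x/2), the expression used for chi2dens above.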