-------------------------------------------------------------------------------
      name:
       log:  c:\acdbookrevision\webpage_finalize\racd06p4.txt
  log type:  text
 opened on:  6 Jun 2013, 15:06:35

. 
. ********** OVERVIEW OF racd06p4.do **********
. 
. * STATA Program
. * copyright C 2013 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Regression Analysis of Count Data" SECOND EDITION
. * by A. Colin Cameron and Pravin K. Trivedi (2013)
. * Cambridge University Press
. 
. * Chapter 6.6 only
. *   6.6 MODEL SELECTION CRITERIA: A DIGRESSION
. 
. * To run you need no dataset, as data are simulated,
. * but you need the user-written Stata addon
. *   fmm
. * in your directory
. 
. ********** SETUP **********
. 
. set more off
. version 12
. clear all
. set mem 10m
set memory ignored.
    Memory no longer needs to be set in modern Statas; memory adjustments are
    performed on the fly automatically.
. * set linesize 82
. set scheme s1mono  /* Graphics scheme */
. 
. ********** DATA DESCRIPTION
. 
. * Generated data
. 
. ********** RESULTS HERE DIFFER FROM THE BOOK
. 
. * The book presents results summarized in section 4 of Deb and Trivedi (1997).
. * The more detailed results given in Table 6.18 (2nd ed.) and Table 6.15 (1st ed.)
. * were not given in the Deb and Trivedi article.
. * Furthermore we do not have the code that produced these tables.
. 
. * Instead, here we simply generate data from similar dgp's,
. * and we do a small amount of the analysis in Table 6.18 (part of column 1).
. 
. ********** 6.6 MODEL SELECTION CRITERIA: A DIGRESSION
. 
. * Three data generating processes
. * (1) Poisson                     mu = exp(-1.445 + 3.0*x)
. * (2) Poisson Hurdle   Zeros      mu = exp(-1.6 + 3.0*x)
. *                      Positives  mu = exp(-1.35 + 3.0*x)
. * (3) FM Poisson       Component1 mu = exp(-1.225 + 3.0*x)
. *                      Component2 mu = exp(-1.5 + 0.75*x)
. 
. 
. **** DATA GENERATING PROCESSES
. 
. clear
. set obs 100000
obs was 0, now 100000
. set seed 10101
. generate x = runiform()
. 
. * Poisson
. * generate yP = rpoisson(exp(-1.6 + 3.0*x))
. generate yP = rpoisson(exp(-1.445 + 3.0*x))
. 
. * Hurdle Poisson - not so easy to draw from
. generate yH = 0
. generate muH2 = exp(-1.35 + 3.0*x)
. scalar miny = 0
. * The following draws from a zero-truncated Poisson
. while miny == 0 {
  2.   generate yph = rpoisson(muH2)
  3.   quietly replace yH = yph if yH==0
  4.   drop yph
  5.   quietly summarize yH
  6.   scalar miny = r(min)
  7. }
. * Now bring in the hurdle
. replace yH = 0 if runiform() < (1 - (1-exp(-exp(-1.6 + 3.0*x))))
(40426 real changes made)
. 
. * Finite mixture
. generate yFM = rpoisson(exp(-1.225 + 3.0*x))
. replace yFM = rpoisson(exp(-1.5 + .75*x)) if runiform() > .75
(17544 real changes made)
. generate yP0 = yP==0
. generate yH0 = yH==0
. generate yFM0 = yFM==0
. 
. * Calculate the fraction of positives
. generate dyP = yP > 0
. generate dyH = yH > 0
. generate dyFM = yFM > 0
. 
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |    100000    .4988239     .288796   7.72e-06    .999995
          yP |    100000     1.49485    1.726645          0         14
          yH |    100000      1.5215    1.860052          0         14
        muH2 |    100000    1.644359    1.335116   .2592463   5.206902
         yFM |    100000     1.47886    1.903552          0         15
-------------+--------------------------------------------------------
         yP0 |    100000      .36454    .4813033          0          1
         yH0 |    100000      .40426    .4907507          0          1
        yFM0 |    100000      .41024    .4918796          0          1
         dyP |    100000      .63546    .4813033          0          1
         dyH |    100000      .59574    .4907507          0          1
-------------+--------------------------------------------------------
        dyFM |    100000      .58976    .4918796          0          1

. 
. *** Check the dgps
. 
. * Poisson: dgp was rpoisson(exp(-1.445 + 3.0*x))
. poisson yP x

Iteration 0:   log likelihood = -135086.53
Iteration 1:   log likelihood = -135084.56
Iteration 2:   log likelihood = -135084.56

Poisson regression                                Number of obs   =     100000
                                                  LR chi2(1)      =   92154.15
                                                  Prob > chi2     =     0.0000
Log likelihood = -135084.56                       Pseudo R2       =     0.2543

------------------------------------------------------------------------------
          yP |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   3.004097   .0109232   275.02   0.000     2.982688   3.025506
       _cons |  -1.448268   .0082638  -175.25   0.000    -1.464465  -1.432071
------------------------------------------------------------------------------

. 
. * Finite mixture: dgp was pi = .75 mixture of
. *   rpoisson(exp(-1.225 + 3.0*x)) & rpoisson(exp(-1.5 + .75*x))
. fmm yFM x, components(2) mixtureof(poisson)

Fitting Poisson model:

Iteration 0:   log likelihood = -148969.14
Iteration 1:   log likelihood = -148967.97
Iteration 2:   log likelihood = -148967.97

Fitting 2 component Poisson model:

Iteration 0:   log likelihood =  -148961.8  (not concave)
Iteration 1:   log likelihood = -148781.18  (not concave)
Iteration 2:   log likelihood = -146838.28  (not concave)
Iteration 3:   log likelihood = -145740.73
Iteration 4:   log likelihood = -144883.39  (not concave)
Iteration 5:   log likelihood = -143885.92  (not concave)
Iteration 6:   log likelihood = -143697.95  (not concave)
Iteration 7:   log likelihood = -143592.13  (not concave)
Iteration 8:   log likelihood = -143515.87
Iteration 9:   log likelihood = -143437.18
Iteration 10:  log likelihood = -143420.88
Iteration 11:  log likelihood = -143418.04
Iteration 12:  log likelihood = -143418.04

2 component Poisson regression                    Number of obs   =     100000
                                                  Wald chi2(2)    =   50277.10
Log likelihood = -143418.04                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         yFM |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
component1   |
           x |   .8005967   .0984805     8.13   0.000     .6075784   .9936149
       _cons |   -1.51907   .0759676   -20.00   0.000    -1.667963  -1.370176
-------------+----------------------------------------------------------------
component2   |
           x |   3.002581   .0159919   187.76   0.000     2.971237   3.033925
       _cons |  -1.233385   .0128465   -96.01   0.000    -1.258564  -1.208206
-------------+----------------------------------------------------------------
 /imlogitpi1 |  -1.120153   .0214457   -52.23   0.000    -1.162186    -1.07812
------------------------------------------------------------------------------
         pi1 |   .2459829   .0039776                      .2382703   .2538619
         pi2 |   .7540171   .0039776                      .7461381   .7617297
------------------------------------------------------------------------------

. scalar AICFM = -2*e(ll) + 2*e(k)
. scalar BICFM = -2*e(ll) + ln(e(N))*e(k)
. display "BIC Finite Mixture = " BICFM
BIC Finite Mixture = 286893.64
. 
. * Hurdle:
. * Hurdle first component: Poisson for the zeroes
. program lfpoissonbinary
  1.   version 10.1
  2.   args lnf theta1
  3.   tempvar mu p0
  4.   local y $ML_y1
  5.   generate double `mu' = exp(`theta1')
  6.   generate double `p0' = exp(-`mu')
  7.   quietly replace `lnf' = ln(`p0')   if $ML_y1 == 0
  8.   quietly replace `lnf' = ln(1-`p0') if $ML_y1 == 1
  9. end
. ml model lf lfpoissonbinary (dyH = x)
. ml maximize, nolog

                                                  Number of obs   =     100000
                                                  Wald chi2(1)    =   26046.66
Log likelihood = -51369.712                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         dyH |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   2.997719   .0185744   161.39   0.000     2.961313   3.034124
       _cons |  -1.596699   .0114249  -139.76   0.000    -1.619091  -1.574307
------------------------------------------------------------------------------

. estimates store H1p
. scalar llH1p = e(ll)
. scalar kH1p = e(k)
. scalar nH = e(N)
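The information criteria used throughout are AIC = -2 ln L + 2k and BIC = -2 ln L + k ln N, computed via the `scalar` commands above. As a check on the arithmetic, a small Python sketch (Python rather than Stata, purely illustrative) reproduces the logged finite-mixture BIC from ll = -143418.04, k = 5 parameters, and N = 100000:

```python
import math

def aic(loglik, k):
    # Akaike information criterion: -2 ln L + 2k
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    # Bayesian (Schwarz) criterion: -2 ln L + k ln N
    return -2.0 * loglik + k * math.log(n)

# Values taken from the fmm output above (2 slopes, 2 intercepts, 1 mixing parameter)
ll, k, n = -143418.04, 5, 100000
print(round(aic(ll, k), 2))  # 286846.08
print(round(bic(ll, k, n), 2))  # 286893.64, matching "BIC Finite Mixture" above
```

With N = 100000 the BIC penalty per parameter is ln N = 11.51 against AIC's 2, which is why BIC leans harder toward the parsimonious model.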
. * Hurdle second component: Poisson for the positives
. ztp yH x if yH>0

Iteration 0:   log likelihood = -88620.795
Iteration 1:   log likelihood = -84951.234
Iteration 2:   log likelihood = -84950.333
Iteration 3:   log likelihood = -84950.333

Zero-truncated Poisson regression                 Number of obs   =      59574
                                                  LR chi2(1)      =   43580.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -84950.333                       Pseudo R2       =     0.2041

------------------------------------------------------------------------------
          yH |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   3.012353   .0171157   176.00   0.000     2.978807   3.045899
       _cons |   -1.36163   .0140886   -96.65   0.000    -1.389244  -1.334017
------------------------------------------------------------------------------

. estimates store H2p
. scalar llH2p = e(ll)
. scalar kH2p = e(k)
. * Combine two parts
. scalar llHp = llH1p + llH2p
. scalar kHp = kH1p + kH2p
. scalar AICHp = -2*llHp + 2*kHp
. scalar BICHp = -2*llHp + ln(nH)*kHp
. 
. * Display hurdle results
. estimates table H1p H2p, eq(1) b se

----------------------------------------
    Variable |    H1p          H2p      
-------------+--------------------------
           x |  2.9977185    3.0123527  
             |   .0185744    .01711565  
       _cons |  -1.596699   -1.3616305  
             |  .01142485    .01408858  
----------------------------------------
                            legend: b/se

. display "BIC Hurdle Poisson = " BICHp
BIC Hurdle Poisson = 272686.14
. 
. 
. **** SIMULATIONS
. 
. * This does part of the first column of Table 6.18
. * The data generating process is Poisson
. * Just Poisson and Poisson FM models are estimated
. * The models are compared using LR test, AIC and BIC
. 
. * To speed up the program there are just 50 simulations
. * numsims should be increased
. 
. clear
. set seed 10101
. global numsims 50
. global numobs 500
. program simbypost
  1.   version 10.1
  2.   tempname simfile
  3.   postfile `simfile' bpcons bpx bfmp1cons bfmp1x bfmp2cons bfmp2x pi lrtest lrreject AICplow BICplow using simresults, replace
  4.   quietly {
  5.     forvalues i = 1/$numsims {
  6.       display "simulation: " `i'
  7.       drop _all
  8.       set obs $numobs
  9.       generate x = runiform()
 10.       generate y = rpoisson(exp(-1.445 + 3.0*x))
 11.       poisson y x
 12.       scalar bpx = _b[x]
 13.       scalar bpcons = _b[_cons]
 14.       scalar llp = e(ll)
 15.       scalar AICp = -2*e(ll) + 2*e(k)
 16.       scalar BICp = -2*e(ll) + ln(e(N))*e(k)
 17.       fmm y x, components(2) mixtureof(poisson) iter(20)
 18.       scalar llfmm = e(ll)
 19.       scalar AICFM = -2*e(ll) + 2*e(k)
 20.       scalar BICFM = -2*e(ll) + ln(e(N))*e(k)
 21.       scalar converge = e(converge)
 22.       matrix bfmp = e(b)
 23.       scalar bfmp1x = exp(bfmp[1,1])
 24.       scalar bfmp1cons = exp(bfmp[1,2])
 25.       scalar bfmp2x = exp(bfmp[1,3])
 26.       scalar bfmp2cons = exp(bfmp[1,4])
 27.       scalar pi = exp(bfmp[1,5])/(1+exp(bfmp[1,5]))
 28.       scalar lrtest = 2*(llfmm - llp)
 29.       scalar lrreject = lrtest > invchi2(1,.95)
 30.       scalar AICplow = AICp < AICFM
 31.       scalar BICplow = BICp < BICFM
 32.       post `simfile' (bpcons) (bpx) (bfmp1cons) (bfmp1x) (bfmp2cons) (bfmp2x) (pi) (lrtest) (lrreject) (AICplow) (BICplow)
 33.     }
 34.   }
 35.   postclose `simfile'
 36. end
. simbypost
. use simresults, clear
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      bpcons |        50    -1.43788    .0983307  -1.629938  -1.163989
         bpx |        50    2.984132    .1324086   2.657044   3.233268
   bfmp1cons |        50    .2362758    .1124856   3.79e-31   .8152104
      bfmp1x |        50    3.34e+29    2.36e+30   1.208939   1.67e+31
   bfmp2cons |        50    .2400324    .0840982   .0252217   .6401486
-------------+--------------------------------------------------------
      bfmp2x |        50    35.78528    76.34528    8.66124   553.6423
          pi |        50    .6642202     .279737   .0261622          1
      lrtest |        50    1.316403    2.131809  -.0001312   10.78937
    lrreject |        50          .1    .3030458          0          1
     AICplow |        50         .96    .1979487          0          1
-------------+--------------------------------------------------------
     BICplow |        50         .96    .1979487          0          1
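Each simulation draw compares the one-component and two-component Poisson fits via `scalar lrtest = 2*(llfmm - llp)` and rejects when the statistic exceeds `invchi2(1,.95)`. A minimal Python analogue of that decision rule (the log-likelihood values below are hypothetical, not the simulation's):

```python
# Likelihood-ratio comparison of nested models, mirroring
#   scalar lrtest = 2*(llfmm - llp)
#   scalar lrreject = lrtest > invchi2(1,.95)
CHI2_95_DF1 = 3.8414588  # invchi2(1,.95), as displayed later in the log

def lr_test(ll_restricted, ll_unrestricted, crit=CHI2_95_DF1):
    # LR statistic for nested models; reject H0 when it exceeds the critical value
    lr = 2.0 * (ll_unrestricted - ll_restricted)
    return lr, lr > crit

lr, reject = lr_test(-250.0, -248.5)  # hypothetical log likelihoods
print(lr, reject)  # 3.0 False: below the 5% critical value
```

The small negative lrtest values in the summary above can arise because `fmm` is capped at `iter(20)`, so the nominally unrestricted fit may stop slightly below the one-component maximum.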
. mean bpcons bpx bfmp1cons bfmp1x bfmp2cons bfmp2x pi lrtest lrreject AICplow BICplow, noheader

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      bpcons |   -1.43788   .0139061     -1.465825   -1.409935
         bpx |   2.984132   .0187254      2.946502    3.021762
   bfmp1cons |   .2362758   .0159079      .2043077    .2682439
      bfmp1x |   3.34e+29   3.34e+29     -3.37e+29    1.01e+30
   bfmp2cons |   .2400324   .0118933       .216132    .2639328
      bfmp2x |   35.78528   10.79685      14.08819    57.48237
          pi |   .6642202   .0395608      .5847198    .7437206
      lrtest |   1.316403   .3014833      .7105498    1.922257
    lrreject |         .1   .0428571      .0138753    .1861247
     AICplow |        .96   .0279942      .9037436    1.016256
     BICplow |        .96   .0279942      .9037436    1.016256
--------------------------------------------------------------

. 
. summarize lrtest, detail

                           lrtest
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.0001312      -.0001312
 5%     -.000065      -.0000928
10%    -.0000478       -.000065       Obs                  50
25%    -6.98e-06      -.0000494       Sum of Wgt.          50

50%     .1231615                      Mean           1.316403
                        Largest       Std. Dev.      2.131809
75%     2.224793       4.802664
90%     3.642071       4.986424       Variance       4.544609
95%     4.986424       7.072522       Skewness       2.389082
99%     10.78937       10.78937       Kurtosis       9.741848

. centile lrtest, centile(10 20 30 40 50 60 70 80 90 95)

                                                     -- Binom. Interp. --
    Variable |     Obs  Percentile    Centile        [95% Conf. Interval]
-------------+-----------------------------------------------------------
      lrtest |      50         10   -.0000483       -.0001046   -.0000245
             |                 20    -.000023       -.0000481    .0000196
             |                 30   -3.60e-07       -.0000271    .0041848
             |                 40    .0019106       -2.27e-06    .3670091
             |                 50    .1231615        .0005006    .9623025
             |                 60    .7864019        .0350388    1.973008
             |                 70    1.668329        .4769537    2.615625
             |                 80    2.569803        1.574888     3.76231
             |                 90     3.81399        2.574371     8.21305
             |                 95    5.925168        3.401095   10.78937*
* Lower (upper) confidence limit held at minimum (maximum) of sample
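Stata's `centile` reports empirical percentiles of lrtest with binomial interpolation for the confidence limits. A simpler nearest-rank version of the percentile itself can be sketched in Python (the data below are illustrative, not the simulation's draws):

```python
import math

def empirical_percentile(vals, p):
    # nearest-rank percentile: smallest sample value with at least
    # p percent of the observations at or below it
    s = sorted(vals)
    idx = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[idx]

# Illustrative values only, not the simulation's lrtest draws
vals = [0.0, 0.1, 0.5, 1.2, 2.6, 3.8, 5.9, 7.1, 9.4, 10.8]
print(empirical_percentile(vals, 50))  # 2.6
print(empirical_percentile(vals, 90))  # 9.4
```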
. display "chisquare 1 degree at 80% " invchi2(1,.80) " and 90% " invchi2(1,.90) " and 95% " invchi2(1,.95)
chisquare 1 degree at 80% 1.6423744 and 90% 2.7055435 and 95% 3.8414588
. display "chisquare 2 degrees at 80% " invchi2(2,.80) " and 90% " invchi2(2,.90) " and 95% " invchi2(2,.95)
chisquare 2 degrees at 80% 3.2188758 and 90% 4.6051702 and 95% 5.9914645
. 
. kdensity lrtest
. quietly sum lrtest
. generate evalpoint = r(max)*_n/_N
. generate chi1dens = (1/sqrt(2))*exp(-lngamma(1/2))*(evalpoint^(-1/2))*exp(-evalpoint/2)
. generate chi2dens = (1/2)*exp(-evalpoint/2)
. graph twoway kdensity lrtest || line chi1dens evalpoint || line chi2dens evalpoint
. 
. ********** CLOSE OUTPUT
. 
. * log close
. * clear
. * exit
. 
end of do-file

. exit, clear
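The chi1dens and chi2dens variables evaluate the chi-square(1) and chi-square(2) densities f(x) = x^(k/2-1) exp(-x/2) / (2^(k/2) Gamma(k/2)); the `exp(-lngamma(1/2))` factor is 1/Gamma(1/2) = 1/sqrt(pi). A Python sketch of the same formula, as a cross-check:

```python
import math

def chi2_pdf(x, k):
    # pdf of the chi-square distribution with k degrees of freedom:
    # x^(k/2 - 1) * exp(-x/2) / (2^(k/2) * Gamma(k/2))
    return (x ** (k / 2 - 1) * math.exp(-x / 2)) / (2 ** (k / 2) * math.gamma(k / 2))

print(round(chi2_pdf(1.0, 1), 4))  # 0.242
print(round(chi2_pdf(1.0, 2), 4))  # 0.3033, i.e. 0.5*exp(-0.5)
```

For k = 2 the formula collapses to (1/2) exp(-x/2), the expression used for chi2dens above.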