------------------------------------------------------------------------------------------------------------------------------- name: log: c:\acdbookrevision\stata_final_programs_2013\racd04.txt log type: text opened on: 14 Jan 2013, 20:37:54 . . ********** OVERVIEW OF racd04.do ********** . . * STATA Program . * copyright C 2013 by A. Colin Cameron and Pravin K. Trivedi . * used for "Regression Analyis of Count Data" SECOND EDITION . * by A. Colin Cameron and Pravin K. Trivedi (2013) . * Cambridge University Press . . * This STATA program analyzes docor visits data for chapter 4 . * 4.2 TWO CROSSINGS THEOREM . * 4.8.1 MIXTURE OF 2 POISSON . * 4.8.8 EXAMPLE: PATENTS . . * To run you need file . * racd09data.dta . * and user-written Stata addon . * fmm . * in your directory . . ********** SETUP ********** . . set more off . version 12 . clear all . * set linesize 82 . set scheme s1mono /* Graphics scheme */ . . ********** DATA DESCRIPTION . . * The original data is from . * Bronwyn Hall, Zvi Griliches, and Jerry Hausman (1986), . * "Patents and R&D: Is There a Lag?", . * International Economic Review, 27, 265-283. . * See this article for more detailed discussion . * Also see racd09makedata.do for further details . * NOTE: Here we use just 1979 data . . ********** 4.2 TWO CROSSINGS THEOREM . . * Poisson with mean mu = 10 . * Negative binomial with mean mu = 10 and alpha = 0.2 . * so 1/alpha = 1/.2 = 5 and 1/(1+alpha*mu) = 1/(1+0.2*10) = 1/3 . * and alpha/(1+alpha*mu) = 0.2*10/(1+0.2*10) = 2/3 . . * ASIDE: The following gave unexpected wrong result . * Stata function nbinomialp(n,k,p) returns the probability of . * observing k or fewer failures before the nth success . * So p = a*mu/(1+a*mu) = 2/3 = 0.666 and n = 1/a = 1/.2 = 5 . * But generate ynegbin = nbinomialp(5,k,0.6666666) gave mean 2.5, var = 3.75 . . set obs 21 obs was 0, now 21 . generate k = _n - 1 . generate prob_poisson = exp(-10)*(10^k)/exp(lngamma(k+1)) . generate check_poisson = poissonp(10,k) . generate prob_nb = exp(lngamma(k+5)-lngamma(k+1)-lngamma(5))*((1/3)^5)*((2/3)^k) . generate badprob_nb = nbinomialp(5,k,0.6666666) . list, clean k prob_p~n check_~n prob_nb badpro~b 1. 0 .0000454 .0000454 .0041152 .1316872 2. 1 .000454 .000454 .0137174 .2194787 3. 2 .00227 .00227 .0274348 .2194787 4. 3 .0075667 .0075667 .0426764 .1707057 5. 4 .0189166 .0189166 .0569019 .1138038 6. 5 .0378333 .0378333 .0682823 .0682823 7. 6 .0630555 .0630555 .0758692 .0379346 8. 7 .0900792 .0900792 .079482 .0198705 9. 8 .112599 .112599 .079482 .0099353 10. 9 .12511 .12511 .0765382 .0047836 11. 10 .12511 .12511 .0714357 .0022324 12. 11 .1137364 .1137364 .0649415 .0010147 13. 12 .0947803 .0947803 .0577258 .000451 14. 13 .0729079 .0729079 .0503251 .0001966 15. 14 .0520771 .0520771 .0431358 .0000842 16. 15 .0347181 .0347181 .0364258 .0000356 17. 16 .0216988 .0216988 .0303548 .0000148 18. 17 .012764 .012764 .0249981 6.10e-06 19. 18 .0070911 .0070911 .0203688 2.49e-06 20. 19 .0037322 .0037322 .016438 1.00e-06 21. 20 .0018661 .0018661 .0131504 4.01e-07 . . graph twoway (line prob_nb k, connect(stairstep)) /// > (line prob_poisson k, connect(stairstep) lstyle(p3)), scale(1.2) /// > legend( ring(0) rows(2) pos(1) label(1 "NB {&mu} = 10 {&alpha} = .2") /// > label(2 "Poisson {&mu} = 10")) ytitle("Probability that Y = y") xtitle("y") . . *** FIGURE 4.1: TWO CROSSINGS THEOREM . . graph export racd04fig1.eps, replace (file racd04fig1.eps written in EPS format) . graph export racd04fig1.wmf, replace (file c:\acdbookrevision\stata_final_programs_2013\racd04fig1.wmf written in Windows Metafile format) . . ********** 4.8.1 MIXTURE OF 2 POISSON . . clear . set obs 100000 obs was 0, now 100000 . set seed 10101 . generate xpmix = rpoisson(0.2) . replace xpmix = rpoisson(6) if runiform() > 0.5 (49857 real changes made) . label variable xpmix "X is .5xP[0.2] + .5xP[6.0]" . tabulate xpmix X is | .5xP[0.2] + | .5xP[6.0] | Freq. Percent Cum. ------------+----------------------------------- 0 | 40,949 40.95 40.95 1 | 8,945 8.95 49.89 2 | 3,048 3.05 52.94 3 | 4,498 4.50 57.44 4 | 6,818 6.82 64.26 5 | 8,112 8.11 72.37 6 | 8,084 8.08 80.45 7 | 6,780 6.78 87.23 8 | 5,174 5.17 92.41 9 | 3,418 3.42 95.83 10 | 1,997 2.00 97.82 11 | 1,146 1.15 98.97 12 | 560 0.56 99.53 13 | 278 0.28 99.81 14 | 127 0.13 99.93 15 | 42 0.04 99.98 16 | 13 0.01 99.99 17 | 6 0.01 100.00 18 | 4 0.00 100.00 22 | 1 0.00 100.00 ------------+----------------------------------- Total | 100,000 100.00 . . * For appearance drop a few of the largest values . histogram xpmix if xpmix < 17, discrete scale(1.2) (start=0, width=1) . . *** FIGURE 4.2: 50/50 MIXTURE OF POISSONS . . graph export racd04fig2.eps, replace (file racd04fig2.eps written in EPS format) . graph export racd04fig2.wmf, replace (file c:\acdbookrevision\stata_final_programs_2013\racd04fig2.wmf written in Windows Metafile format) . . ********** 4.8.8 EXAMPLE: PATENTS . . use racd09data.dta, clear . . * Create log of total R&D over five years . generate LOGRandD = ln(exp(LOGR)+exp(LOGR1)+exp(LOGR2)+exp(LOGR3)+exp(LOGR4)+exp(LOGR5)) . . * Use only 1979 data . keep if YEAR==5 (1384 observations deleted) . . * Regressor list . global XLIST LOGRandD LOGK SCISECT . . * Variable descriptions and summary statistics . describe PAT $XLIST storage display value variable name type format label variable label ------------------------------------------------------------------------------------------------------------------------------- PAT float %9.0g Number of (successful) patents applied for this year LOGRandD float %9.0g LOGK float %9.0g Logarithm of the book value of capital in 1972 SCISECT float %9.0g Equals 1 if firm in the scientific sector . summarize PAT $XLIST Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- PAT | 346 32.10116 66.36197 0 515 LOGRandD | 346 3.071489 1.965798 -1.543733 8.723914 LOGK | 346 3.921063 2.095542 -1.76965 9.66626 SCISECT | 346 .4248555 .4950369 0 1 . . *** TABLE 4.2: FREQUENCIES OF PAT (with grouping) . . recode PAT (0=0) (1=1) (2/5=2) (6/20=6) (21/50=31) (51/100=51) /// > (101/200=101) (201/300=201) (301/600=301), gen(PATgrouped) (194 differences between PAT and PATgrouped) . tabulate PATgrouped RECODE of | PAT (Number | of | (successful | ) patents | applied for | this year) | Freq. Percent Cum. ------------+----------------------------------- 0 | 76 21.97 21.97 1 | 41 11.85 33.82 2 | 74 21.39 55.20 6 | 51 14.74 69.94 31 | 43 12.43 82.37 51 | 30 8.67 91.04 101 | 19 5.49 96.53 201 | 8 2.31 98.84 301 | 4 1.16 100.00 ------------+----------------------------------- Total | 346 100.00 . . * Poisson . poisson PAT $XLIST, vce(robust) Iteration 0: log pseudolikelihood = -3366.1483 Iteration 1: log pseudolikelihood = -3365.8303 Iteration 2: log pseudolikelihood = -3365.8303 Poisson regression Number of obs = 346 Wald chi2(3) = 641.60 Prob > chi2 = 0.0000 Log pseudolikelihood = -3365.8303 Pseudo R2 = 0.7632 ------------------------------------------------------------------------------ | Robust PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- LOGRandD | .4753631 .064298 7.39 0.000 .3493413 .601385 LOGK | .2708473 .0552599 4.90 0.000 .1625398 .3791547 SCISECT | .4672292 .1549081 3.02 0.003 .163615 .7708435 _cons | -.3402801 .1900477 -1.79 0.073 -.7127667 .0322066 ------------------------------------------------------------------------------ . estimates store POISS . . * NB models: NB1 and NB2 . nbreg PAT $XLIST, dispersion(constant) vce(robust) Fitting Poisson model: Iteration 0: log pseudolikelihood = -3366.1483 Iteration 1: log pseudolikelihood = -3365.8303 Iteration 2: log pseudolikelihood = -3365.8303 Fitting constant-only model: Iteration 0: log pseudolikelihood = -7036.5891 Iteration 1: log pseudolikelihood = -3372.0324 Iteration 2: log pseudolikelihood = -1554.1264 (backed up) Iteration 3: log pseudolikelihood = -1346.9519 Iteration 4: log pseudolikelihood = -1346.8822 Iteration 5: log pseudolikelihood = -1346.8822 Fitting full model: Iteration 0: log pseudolikelihood = -1346.8822 Iteration 1: log pseudolikelihood = -1224.7323 Iteration 2: log pseudolikelihood = -1180.7271 Iteration 3: log pseudolikelihood = -1148.109 Iteration 4: log pseudolikelihood = -1147.9317 Iteration 5: log pseudolikelihood = -1147.9316 Negative binomial regression Number of obs = 346 Dispersion = constant Wald chi2(3) = 714.08 Log pseudolikelihood = -1147.9316 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ | Robust PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- LOGRandD | .4895402 .0484591 10.10 0.000 .3945622 .5845183 LOGK | .1727126 .0434001 3.98 0.000 .08765 .2577752 SCISECT | .3958147 .1096853 3.61 0.000 .1808356 .6107939 _cons | .2449866 .1689932 1.45 0.147 -.086234 .5762072 -------------+---------------------------------------------------------------- /lndelta | 3.101643 .1525779 2.802596 3.40069 -------------+---------------------------------------------------------------- delta | 22.23446 3.392486 16.48739 29.98479 ------------------------------------------------------------------------------ . estimates store NB1 . nbreg PAT $XLIST, vce(robust) Fitting Poisson model: Iteration 0: log pseudolikelihood = -3366.1483 Iteration 1: log pseudolikelihood = -3365.8303 Iteration 2: log pseudolikelihood = -3365.8303 Fitting constant-only model: Iteration 0: log pseudolikelihood = -1551.5708 Iteration 1: log pseudolikelihood = -1347.2714 Iteration 2: log pseudolikelihood = -1346.8824 Iteration 3: log pseudolikelihood = -1346.8822 Iteration 4: log pseudolikelihood = -1346.8822 Fitting full model: Iteration 0: log pseudolikelihood = -1277.5529 (not concave) Iteration 1: log pseudolikelihood = -1196.6667 Iteration 2: log pseudolikelihood = -1137.5362 Iteration 3: log pseudolikelihood = -1127.8388 Iteration 4: log pseudolikelihood = -1127.5664 Iteration 5: log pseudolikelihood = -1127.5662 Iteration 6: log pseudolikelihood = -1127.5662 Negative binomial regression Number of obs = 346 Dispersion = mean Wald chi2(3) = 616.24 Log pseudolikelihood = -1127.5662 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ | Robust PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- LOGRandD | .8159469 .0827754 9.86 0.000 .6537101 .9781837 LOGK | .114636 .0687745 1.67 0.096 -.0201595 .2494315 SCISECT | -.0897712 .1381794 -0.65 0.516 -.3605979 .1810555 _cons | -.8106295 .180078 -4.50 0.000 -1.163576 -.4576832 -------------+---------------------------------------------------------------- /lnalpha | -.1224513 .1134868 -.3448814 .0999788 -------------+---------------------------------------------------------------- alpha | .884749 .1004074 .7083043 1.105148 ------------------------------------------------------------------------------ . estimates store NB2 . . * PIG model . * Not included though ideally would be included. . . * Finite mixture models: NB1 and NB2 . fmm PAT $XLIST, components(2) mixtureof(negbin1) vce(robust) Fitting Negative Binomial-1 model: Iteration 0: log likelihood = -3366.1483 Iteration 1: log likelihood = -3365.8303 Iteration 2: log likelihood = -3365.8303 Iteration 0: log likelihood = -7036.5891 Iteration 1: log likelihood = -3372.0324 Iteration 2: log likelihood = -1554.1264 (backed up) Iteration 3: log likelihood = -1346.9519 Iteration 4: log likelihood = -1346.8822 Iteration 5: log likelihood = -1346.8822 Iteration 0: log likelihood = -1346.8822 Iteration 1: log likelihood = -1224.7323 Iteration 2: log likelihood = -1180.7271 Iteration 3: log likelihood = -1148.109 Iteration 4: log likelihood = -1147.9317 Iteration 5: log likelihood = -1147.9316 Fitting 2 component Negative Binomial-1 model: Iteration 0: log pseudolikelihood = -1147.9298 (not concave) Iteration 1: log pseudolikelihood = -1146.4074 (not concave) Iteration 2: log pseudolikelihood = -1139.7786 (not concave) Iteration 3: log pseudolikelihood = -1137.6292 (not concave) Iteration 4: log pseudolikelihood = -1136.5663 Iteration 5: log pseudolikelihood = -1135.2088 Iteration 6: log pseudolikelihood = -1133.2384 Iteration 7: log pseudolikelihood = -1132.8083 Iteration 8: log pseudolikelihood = -1132.8081 Iteration 9: log pseudolikelihood = -1132.8081 2 component Negative Binomial-1 regression Number of obs = 346 Wald chi2(6) = 1038.98 Log pseudolikelihood = -1132.8081 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ | Robust PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- component1 | LOGRandD | .525363 .0642356 8.18 0.000 .3994636 .6512624 LOGK | .188418 .0692903 2.72 0.007 .0526116 .3242244 SCISECT | .6013736 .1455978 4.13 0.000 .3160073 .88674 _cons | -.4335583 .2665898 -1.63 0.104 -.9560647 .0889482 -------------+---------------------------------------------------------------- component2 | LOGRandD | .6806978 .0996255 6.83 0.000 .4854354 .8759602 LOGK | .1690618 .0905431 1.87 0.062 -.0083994 .3465231 SCISECT | -.2368876 .2142517 -1.11 0.269 -.6568131 .1830379 _cons | .1118132 .2601031 0.43 0.667 -.3979795 .6216059 -------------+---------------------------------------------------------------- /imlogitpi1 | .8732938 .5833803 1.50 0.134 -.2701105 2.016698 /lndelta1 | 2.504354 .2737051 9.15 0.000 1.967902 3.040806 /lndelta2 | 2.734771 .4449584 6.15 0.000 1.862668 3.606873 ------------------------------------------------------------------------------ delta1 | 12.23565 3.34896 7.155645 20.9221 delta2 | 15.40621 6.855124 6.4409 36.85065 pi1 | .7054306 .1212254 .43288 .8825392 pi2 | .2945694 .1212254 .1174608 .56712 ------------------------------------------------------------------------------ . estimates store FMMNB1 . . fmm PAT $XLIST, components(2) mixtureof(negbin2) vce(robust) Fitting Negative Binomial-2 model: Iteration 0: log likelihood = -3366.1483 Iteration 1: log likelihood = -3365.8303 Iteration 2: log likelihood = -3365.8303 Iteration 0: log likelihood = -1551.5708 Iteration 1: log likelihood = -1347.2714 Iteration 2: log likelihood = -1346.8824 Iteration 3: log likelihood = -1346.8822 Iteration 4: log likelihood = -1346.8822 Iteration 0: log likelihood = -1277.5529 (not concave) Iteration 1: log likelihood = -1196.6667 Iteration 2: log likelihood = -1137.5362 Iteration 3: log likelihood = -1127.8388 Iteration 4: log likelihood = -1127.5664 Iteration 5: log likelihood = -1127.5662 Iteration 6: log likelihood = -1127.5662 Fitting 2 component Negative Binomial-2 model: Iteration 0: log pseudolikelihood = -1127.5701 (not concave) Iteration 1: log pseudolikelihood = -1127.5046 (not concave) Iteration 2: log pseudolikelihood = -1118.0772 (not concave) Iteration 3: log pseudolikelihood = -1115.7068 Iteration 4: log pseudolikelihood = -1115.4488 (not concave) Iteration 5: log pseudolikelihood = -1109.4996 (not concave) Iteration 6: log pseudolikelihood = -1107.1628 Iteration 7: log pseudolikelihood = -1106.4365 Iteration 8: log pseudolikelihood = -1106.1363 Iteration 9: log pseudolikelihood = -1106.1329 Iteration 10: log pseudolikelihood = -1106.1329 2 component Negative Binomial-2 regression Number of obs = 346 Wald chi2(6) = 1496.35 Log pseudolikelihood = -1106.1329 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ | Robust PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- component1 | LOGRandD | .9181075 .0634763 14.46 0.000 .7936963 1.042519 LOGK | .0653257 .056885 1.15 0.251 -.0461668 .1768182 SCISECT | .1028518 .1416338 0.73 0.468 -.1747454 .380449 _cons | -1.273755 .1607671 -7.92 0.000 -1.588853 -.9586572 -------------+---------------------------------------------------------------- component2 | LOGRandD | .3519998 .0896255 3.93 0.000 .1763371 .5276625 LOGK | .2771243 .0617372 4.49 0.000 .1561216 .398127 SCISECT | -.5731597 .3459274 -1.66 0.098 -1.251165 .1048455 _cons | 1.516435 .1911198 7.93 0.000 1.141847 1.891023 -------------+---------------------------------------------------------------- /imlogitpi1 | 2.571207 .3377545 7.61 0.000 1.90922 3.233194 /lnalpha1 | -.4287745 .1222261 -3.51 0.000 -.6683332 -.1892157 /lnalpha2 | -2.77605 .7533299 -3.69 0.000 -4.252549 -1.29955 ------------------------------------------------------------------------------ alpha1 | .6513068 .0796067 .5125622 .827608 alpha2 | .0622841 .0469204 .0142279 .2726544 pi1 | .9289854 .0222822 .8709315 .9620645 pi2 | .0710146 .0222822 .0379355 .1290685 ------------------------------------------------------------------------------ . estimates store FMMNB2 . predict mu1, equation(component1) . predict mu2, equation(component2) . summarize mu* Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- mu1 | 346 39.611 132.7543 .0689383 1511.818 mu2 | 346 67.0761 117.1558 1.166299 1176.551 . . * estimates table (including NB1 and FMNB2) . estimates table POISS NB1 NB2 FMMNB1 FMMNB2, b(%9.3f) se stats(ll aic bic N k) equations(1) -------------------------------------------------------------------------- Variable | POISS NB1 NB2 FMMNB1 FMMNB2 -------------+------------------------------------------------------------ #1 | LOGRandD | 0.475 0.490 0.816 0.525 0.918 | 0.064 0.048 0.083 0.064 0.063 LOGK | 0.271 0.173 0.115 0.188 0.065 | 0.055 0.043 0.069 0.069 0.057 SCISECT | 0.467 0.396 -0.090 0.601 0.103 | 0.155 0.110 0.138 0.146 0.142 _cons | -0.340 0.245 -0.811 -0.434 -1.274 | 0.190 0.169 0.180 0.267 0.161 -------------+------------------------------------------------------------ lndelta | _cons | 3.102 | 0.153 -------------+------------------------------------------------------------ lnalpha | _cons | -0.122 | 0.113 -------------+------------------------------------------------------------ component2 | LOGRandD | 0.681 0.352 | 0.100 0.090 LOGK | 0.169 0.277 | 0.091 0.062 SCISECT | -0.237 -0.573 | 0.214 0.346 _cons | 0.112 1.516 | 0.260 0.191 -------------+------------------------------------------------------------ imlogitpi1 | _cons | 0.873 2.571 | 0.583 0.338 -------------+------------------------------------------------------------ lndelta1 | _cons | 2.504 | 0.274 -------------+------------------------------------------------------------ lndelta2 | _cons | 2.735 | 0.445 -------------+------------------------------------------------------------ lnalpha1 | _cons | -0.429 | 0.122 -------------+------------------------------------------------------------ lnalpha2 | _cons | -2.776 | 0.753 -------------+------------------------------------------------------------ Statistics | ll | -3365.830 -1147.932 -1127.566 -1132.808 -1106.133 aic | 6739.661 2305.863 2265.132 2287.616 2234.266 bic | 6755.046 2325.095 2284.365 2329.927 2276.577 N | 346 346 346 346 346 k | 4.000 5.000 5.000 11.000 11.000 -------------------------------------------------------------------------- legend: b/se . . *** TABLE 4.3: MODEL ESTIMATES . . estimates table POISS NB2 FMMNB2, b(%9.3f) se stats(ll aic bic N k) equations(1) -------------------------------------------------- Variable | POISS NB2 FMMNB2 -------------+------------------------------------ #1 | LOGRandD | 0.475 0.816 0.918 | 0.064 0.083 0.063 LOGK | 0.271 0.115 0.065 | 0.055 0.069 0.057 SCISECT | 0.467 -0.090 0.103 | 0.155 0.138 0.142 _cons | -0.340 -0.811 -1.274 | 0.190 0.180 0.161 -------------+------------------------------------ lnalpha | _cons | -0.122 | 0.113 -------------+------------------------------------ component2 | LOGRandD | 0.352 | 0.090 LOGK | 0.277 | 0.062 SCISECT | -0.573 | 0.346 _cons | 1.516 | 0.191 -------------+------------------------------------ imlogitpi1 | _cons | 2.571 | 0.338 -------------+------------------------------------ lnalpha1 | _cons | -0.429 | 0.122 -------------+------------------------------------ lnalpha2 | _cons | -2.776 | 0.753 -------------+------------------------------------ Statistics | ll | -3365.830 -1127.566 -1106.133 aic | 6739.661 2265.132 2234.266 bic | 6755.046 2284.365 2276.577 N | 346 346 346 k | 4.000 5.000 11.000 -------------------------------------------------- legend: b/se . . ********** CLOSE OUTPUT . . * log close . * clear . * exit . end of do-file . exit, clear