* racd06p2.do  January 2013 for Stata version 12

capture log close
log using racd06p2.txt, text replace

********** OVERVIEW OF racd06p2.do **********

* STATA Program 
* copyright C 2013 by A. Colin Cameron and Pravin K. Trivedi 
* used for "Regression Analyis of Count Data" SECOND EDITION
* by A. Colin Cameron and Pravin K. Trivedi (2013)
* Cambridge University Press

* To run you need file
*   racd06data2rectrips.dta
* and user-written Stata addon
*   hnblgit
* in your directory

********** SETUP **********

set more off
version 12
clear all
* set linesize 82
set scheme s1mono  /* Graphics scheme */

************

* This STATA program analyzes doctor visits data for chapter 6.3
*   6.4 RECREATIONAL TRIPS
 
********** DATA DESCRIPTION

* A detailed discussion of the variables can be found in 
* C. Sellar, J.R. Stoll and J.P. Chavas (1985), 
* "Validation of Empirical Measures of Welfare Change: A Comparison of nonmarket 
* Techniques," Land Economics, 61, 156-175.  
* Data used with permission of Sellar et al. (1985)
* And also T. Ozuna and I. Gomaz (1995) 
* "Specification and Testing of Count Data Recreation Demand Functions," 
* Empirical Economics, 20, 543-550.

* See these articles for more detailed discussion 
* Also see racd06makedata2rectrips.dta.do for further details

********** RESULTS FOR ONE MODEL HERE DIFFER FROM THE BOOK

* This Stata program reanalyzes the data given in the published paper by
* Gurmu and Trivedi (1996). Their results used quite different code written 
* in a program other than Stata.

* Virtually all the results are reproduced here, except chisquare goodness-of-fit
* tests and predicted probabilities are obtained only for some of the models.

* Also the results obtained here for the finite mixture Poisson two-component
* model differ from the Gurmu and Trivedi (1996) estimates.
* Their paper found a higher log-likelihood for this model than we find here.
* So the book reports the original Gurmu and Trivedi (1996) estimates

* The results obtained below for Table 6.12 are 
*   Finite mixture 2 component Poisson regression
*     Variable      Low Users              High users
* -------------+----------------------------------------------------------------
*        _cons | -1.766     6.19         2.479     6.19
*           SO |   .655    14.97          .086     0.63
*          SKI |   .438     2.45          .631     3.43
*            I |  -.010     0.20          .003     0.02 
*          FC3 |  1.543     8.04         -.687     1.90
*           C1 |  -.044     2.40          .074     3.36
*           C3 |  -.030     2.85         -.073     5.66
*           C4 |   .060     5.03         -.014     0.83
*           pi |   .909                   .092 
*         -lnL |                   939 
*          BIC |                  1987
*   
*     Variable      Low Users              High users
* -------------+----------------------------------------------------------------
*        _cons | -1.243    -5.09         4.707     6.46
*           SO |   .616    16.84         -.053    -0.63
*          SKI |   .476     2.67          .363     1.64
*            I |  -.073    -1.60         -.374    -3.24 
*          FC3 |  1.316     7.02         -.849    -1.54
*           C1 |  -.002    -0.14          .005     0.30
*           C3 |  -.058    -7.53         -.012    -1.04
*           C4 |   .054     5.22         -.005    -0.58 
*           pi |   .920                   .080 
*          lnL |                   956 
*          BIC |                  1947


********** 6.4.1 RECREATIONAL TRIPS DATA: READ AND SUMMARIZE 

use racd06data2rectrips.dta, clear

********* Tables 6.9 and 6.10  Data Description

*** TABLE 6.9: FREQUENCIES

tabulate TRIPS

*** TABLE 6.10: VARIABLE DESCRIPTIONS AND SUMMARY STATISTICS

describe
summarize
correlate

********* 6.4.2 INITIAL SPECIFICATIONS: POISSON and NB2 MODELS (Table 6.11)

* Global for the regressors
global XLIST SO SKI I FC3 C1 C3 C4 

* Poisson
poisson TRIPS $XLIST
estimates store POISSdef
poisson TRIPS $XLIST, vce(robust) 
estimates store POISSON

* Diagnostics
quietly glm TRIPS SO SKI I C1 C3 C4 FC3, vce(robust) family(poisson)
display "Pearson statistic = " e(dispers_p)
display "Deviance statistic = " e(dispers)

* Now get various Rsquareds
predict yhat, mu
quietly correlate TRIPS yhat
display "Squared correlation of TRIPS and predicted mean = " r(rho)^2
scalar deviance = e(deviance)
scalar pearson = e(deviance_p)
* Need to compare to intercept only model
quietly glm TRIPS, vce(robust) family(poisson)
display "Pearson in fitted model = " pearson "  and in intercept model = " e(deviance_p)
display "Pearson R-squared = " pearson/e(deviance_p)
display "Deviance in fitted model = " deviance "  and in intercept model = " e(deviance)
display "Deviance R-squared = " deviance/e(deviance)
drop yhat
scalar drop pearson deviance

* Overdispersion tests
quietly poisson TRIPS $XLIST
predict muhat, n
generate ystar = ((TRIPS-muhat)^2-TRIPS)/muhat  
regress ystar muhat, noconstant vce(robust)     // NB2 form
regress ystar, vce(robust)   // NB1 form
drop muhat ystar

* Predicted probabilities
quietly poisson TRIPS $XLIST, vce(robust)
foreach y of numlist 0/5 8 11 14 17 62 {
   predict pp`y', pr(`y')
   predict cump`y', pr(0,`y')
   }
replace pp8 = cump8 - cump5
replace pp11 = cump11 - cump8
replace pp14 = cump14 - cump11
replace pp17 = cump17 - cump14
replace pp62 = cump62 - cump17
* sum pp*
* sum cump*

* Chisquare goodness of fit test
* The cells are 0, 1, 2, ...., $MAXCOUNT or more
global MAXCOUNT 5
generate ycensored = TRIPS
replace ycensored = $MAXCOUNT + 1 if TRIPS >= $MAXCOUNT + 1
generate one = 1
quietly poisson TRIPS $XLIST, vce(robust)
capture drop pyhat dy* pf* mf* py* pres* pscore*
predict pyhat, n
generate pres = TRIPS - pyhat
foreach var in $XLIST {
  generate pscore`var' = pres*`var'
  }
* The cells are 0, 1, 2, ...., $MAXCOUNT or more
generate pfitsum = 0
forvalues i = 0/$MAXCOUNT {
   generate dy`i' = ycensored == `i' 
   predict pfit`i', pr(`i')
   generate mfit`i' = dy`i' - pfit`i'
   quietly replace pfitsum = pfitsum + pfit`i'
   }
local i = $MAXCOUNT+1
generate dy`i' = ycensored == `i' 
generate pfit`i' = 1 - pfitsum
generate mfit`i' = dy`i' - pfit`i'
drop pfitsum
* Generate Pearson
scalar Pearson = 0
scalar range = $MAXCOUNT+1
global MAXPLUSONE = range
forvalues i = 0/$MAXPLUSONE {
   quietly sum mfit`i'
   scalar diffsquared = r(mean)^2
   quietly sum pfit`i'
   * display "count" `i' "  " r(N)*diffsquared/r(mean)
   scalar Pearson = Pearson + r(N)*diffsquared/r(mean) 
   }
display Pearson
* countfit TRIPS $XLIST, prm nograph maxcount(6)   
* Generate Andrews chisquare goodness of fittest
scalar range = $MAXCOUNT+1
global MAXPLUSONE = range
* NR^2 from the uncentered regression has Chisq distribution
local i = $MAXCOUNT+1
drop mfit`i'
regress one mfit* pres pscore*, noconstant
scalar Andrews = e(N)*e(r2)
display "GoF Test N R^2 = " e(N)*e(r2) " with p-value = " chi2tail($MAXCOUNT,e(N)*e(r2))

* Negbin2
nbreg TRIPS $XLIST, dispersion(mean)
estimates store NB2def
nbreg TRIPS $XLIST, vce(robust) dispersion(mean)
estimates store NB2
foreach y of numlist 0/5 8 11 14 17 62 {
   predict pnb`y', pr(`y')
   predict cumnb`y', pr(0,`y')
   }
replace pnb8 = cumnb8 - cumnb5
replace pnb11 = cumnb11 - cumnb8
replace pnb14 = cumnb14 - cumnb11
replace pnb17 = cumnb17 - cumnb14
replace pnb62 = cumnb62 - cumnb17

* Negbin2 check: fitted mean is unusually large
predict munb2
summarize munb2, detail

* Negbin1
nbreg TRIPS $XLIST, vce(robust) dispersion(constant) 
estat ic
estimates store NB1
predict munb1
summarize munb1, detail

* Negbin2 with quadratic and interactions in costs
generate C1sq = C1*C1
generate C3sq = C3*C3
generate C4sq = C4*C4
generate C1C3 = C1*C3
generate C1C4 = C1*C4
generate C3C4 = C3*C4
nbreg TRIPS SO I SKI FC3 C*, vce(robust)
estat ic
predict munb2interact
summarize munb2interact, detail

*** TABLE 6.11: POISSON and NB2 ESTIMATES

estimates table POISSdef POISSON NB2def NB2, b(%9.3f) t(%10.2f) ///
   stats(ll aic bic N k) equations(1)

*** TABLE 6.14 (Part 1): PREDICTED PROBABILITIES FROM POISSON AND NB2

tabulate TRIPS 
sum pp*
sum cump*
sum pnb*
sum cumnb*

********* 6.4.3 MODIFIED COUNT MODELS (Tables 6.12 and 6.13)

*** FINITE MIXTURES MODELS (Table 6.12) 

** Note: finite Mixtrues here does not reproduce earlier results of Gurmu and Trivedi
* They have lnL = -916.63
* Here lnL = -938.66 if use difficult option and
* and  lnL = -956.84 if do not.
* Gurmu and Trivedi found lower lnL so report that.

* Finite mixtures Poisson - 2 components unconstrained
* Without difficult option
fmm TRIPS $XLIST, components(2) mixtureof(poisson) vce(robust)

* Finite mixtures Poisson - 2 components unconstrained
* With difficult option
fmm TRIPS $XLIST, components(2) mixtureof(poisson) vce(robust) difficult
estimates store FMP2
predict fmmpmu1, eq(component1)
predict fmmpmu2, eq(component2)
summarize fmmpmu1 fmmpmu2, detail

* Finite mixtures NB2 - 2 components unconstrained
fmm TRIPS $XLIST, components(2) mixtureof(negbin2) vce(robust)
estimates store FMNB2
predict fmmnb2mu1, eq(component1)
predict fmmnb2mu2, eq(component2)
summarize fmmnb2mu1 fmmnb2mu2, detail

* Finite mixtures NB1 - 2 components unconstrained
fmm TRIPS $XLIST, components(2) mixtureof(negbin1) vce(robust)
estimates store FMNB1
predict fmmnb1mu1, eq(component1)
predict fmmnb1mu2, eq(component2)
summarize fmmnb1mu1 fmmnb1mu2, detail

* TO DO - get predicted probabilities

* Finite mixtures NB2 with quadratic and interactions in costs
fmm TRIPS SO I SKI FC3 C*, components(2) mixtureof(negbin2) vce(robust) iter(120)
estimates store FMNB2interact

*** TABLE 6.12: FINITE MIXTURES MODELS

estimates table FMP2 FMNB2 FMNB1, b(%10.3f) t(%10.2f) eq(1) stats(ll aic bic N k)

*** HURDLE AND ZERO-INFLATED MODELS (Table 6.13)

generate DTRIPS = TRIPS > 0

* Problem: TRIPS are necessarily > 0 if FC3 = 1
tabulate FC3 DTRIPS

global XLIST SO SKI I FC3 C1 C3 C4 
global XLISTSHORT SO SKI I C1 C3 C4 

* Hurdle first component: NB2 - recode a to exp(a)
* Has problems after about 20 iterations

program lfNB2binary
  version 10.1
  args lnf theta1 a               
  tempvar mu p expa
  local y "$ML_y1"                 
  generate double `mu'      = exp(`theta1')
  generate double `expa'    = exp(`a')
  generate double `p'       = 1 - (1/(1+`expa'*`mu'))^(1/`expa') 
  quietly replace `lnf'     = `y'*ln(`p') + ln(1-`p') - `y'*ln(1-`p')
end

* This includes SKI which is not identified
ml model lf lfNB2binary (DTRIPS = $XLIST) (), vce(robust)
ml maximize, iter(20)

* This drops SKI which is not identified
ml model lf lfNB2binary (DTRIPS = $XLISTSHORT) (), vce(robust)
ml maximize, iter(12)
estimates store H1NB2
scalar llH1NB2 = e(ll)
scalar kH1NB2 = e(k)

* Get the predicted probability of a zero 
* This code only works for this example
matrix bhurd = e(b)
scalar bSO = bhurd[1,1]
scalar bSKI = bhurd[1,2]
scalar bI = bhurd[1,3]
scalar bC1 = bhurd[1,4]
scalar bC3 = bhurd[1,5]
scalar bC4 = bhurd[1,6]
scalar bcons = bhurd[1,7]
* Recall that reparameterized as exp(alpha) not alpha
scalar alpha1 = ln(bhurd[1,e(k)])
di "alpha for first part of hurdle NB2 = " alpha1
generate xbhurd = bSO*SO + bSKI*SKI + bI*I + bC1*C1 + bC3*C3 + bC4*C4 + bcons
generate f10 = (1 + alpha1*exp(xbhurd))^(-1/alpha1)

* Check by compare to logit predicted probability
quietly logit DTRIPS $XLISTSHORT
predict plogit
generate f10logit = 1 - plogit
correlate f10 f10logit
* scatter f10 f10logit

* Hurdle second component: NB2
ztnb TRIPS $XLIST if TRIPS>0, dispersion(mean) vce(robust)
estimates store H2NB2
scalar llH2NB2 = e(ll)
scalar kH2NB2 = e(k)

* Get the predicted probability of a zero in the second part of model
predict mu2, n
scalar alpha2 = e(alpha)
generate f20 = (1 + alpha2*mu2)^(-1/alpha2)

* Combine to get fitted mean
generate muhurdle = mu2*(1-f10)/(1-f20)
* or using binary logit
generate muhurdlelogit = mu2*(1-f10logit)/(1-f20)

* Combine to get fitted probabilities
* Not done. Need to code predictions from ztnb as predict after ztnb
* does not have option pr( ). THen multiply these by (1-f10)/(1-f20)

* ZIP with only an interept for the zeros
zip TRIPS $XLIST, inflate($XLIST) vce(robust)
estimates store ZIPint

* ZIP
zip TRIPS $XLIST, inflate($XLIST) vce(robust)
estimates store ZIP

* ZINB with only an interept for the zeros
zinb TRIPS $XLIST, inflate(_cons) vce(robust)
estimates store ZINBint

* ZINB
zinb TRIPS $XLIST, inflate($XLIST) vce(robust)
estat ic
estimates store ZINB
quietly zinb TRIPS $XLIST, inflate($XLIST) vuong
display "Vuong statistic: " e(vuong)
foreach y of numlist 0/5 8 11 14 17 62 {
   predict pzinb`y', pr(`y')
   predict cumzinb`y', pr(0,`y')
   }
replace pzinb8 = cumzinb8 - cumzinb5
replace pzinb11 = cumzinb11 - cumzinb8
replace pzinb14 = cumzinb14 - cumzinb11
replace pzinb17 = cumzinb17 - cumzinb14
replace pzinb62 = cumzinb62 - cumzinb17
predict muZINB
summarize muZINB, detail

*** TABLE 6.13: HURDLE and ZINB 

estimates table H1NB2 H2NB2 ZINB , b(%10.3f) t(%10.2f) eq(1) stats(ll aic bic N k)

summarize muhurdle muhurdlelogit muZINB

*** TABLE 6.14 (Part 2): PREDICTED PROBABILITIES FROM ZERO-INFLATED NB2

* This program does not compute predicted probabilities for hurdle NB2
sum pzinb*
sum cumzinb*

********** CLOSE OUTPUT

* log close
* clear
* exit
