-------------------------------------------------------------------------------------------------------------------------------
      name:  <unnamed>
       log:  c:\acdbookrevision\stata_final_programs_2013\racd05.txt
  log type:  text
 opened on:  15 Jan 2013, 16:36:44

. 
. ********** OVERVIEW OF racd05.do **********
. 
. * STATA Program 
. * copyright C 2013 by A. Colin Cameron and Pravin K. Trivedi 
. * used for "Regression Analysis of Count Data" SECOND EDITION
. * by A. Colin Cameron and Pravin K. Trivedi (2013)
. * Cambridge University Press
. 
. * Chapter 5
. *  5.2.5 BASICS
. *  5.3.4 GOODNESS-OF-FIT
. 
. * To run you need file
. *   racd05data.dta
. * in your directory
. * and Stata user-written command
. *   countfit
. 
. ********** SETUP **********
. 
. set more off

. version 12

. clear all

. set linesize 82

. set scheme s1mono  // Graphics scheme

. 
. ************
. 
. * This STATA program does analysis of takeover bids studied in chapter 5
. *  5.2.5 RESIDUALS
. *  5.3.4 R-SQUARED and GOODNESS-OF-FIT
. *  5.4.2 TESTS OF NONNESTED MODELS 
. 
. 
. ********** DATA DESCRIPTION
. 
. * The original data are from Sanjiv Jaggia and Satish Thosar, 1993,
. * "Multiple Bids as a Consequence of Target Management Resistance"
. * Review of Quantitative Finance and Accounting, 447-457.
. * The data are also used in 
. * A.C. Cameron and Per Johansson (1997), 
. * "Count Data Regression Models using Series Expansions: with Applications", 
. * Journal of Applied Econometrics, May, Vol. 12, pp.203-223.
. 
. * For more details see these datasets and racd05makedata.dta
. 
. *************** 5.2.5 TAKEOVER BIDS: DESCRIPTIVE STATISTICS 
. 
. use racd05data.dta, clear

. 
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       DOCNO |       126    82174.41    2251.783      78001      85059
       WEEKS |       126    11.44898    7.711424      2.857     41.429
     NUMBIDS |       126    1.738095    1.432081          0         10
    TAKEOVER |       126           1           0          1          1
     BIDPREM |       126    1.346806     .189325    .942675   2.066366
-------------+--------------------------------------------------------
    INSTHOLD |       126    .2518175    .1856136          0       .904
        SIZE |       126    1.219031    3.096624    .017722     22.169
    LEGLREST |       126    .4285714    .4968472          0          1
    REALREST |       126    .1825397    .3878308          0          1
     FINREST |       126    .1031746    .3054011          0          1
-------------+--------------------------------------------------------
    REGULATN |       126    .2698413    .4456492          0          1
    WHTKNGHT |       126    .5952381    .4928054          0          1
      SIZESQ |       126    10.99902    59.91479    .000314   491.4646
    CONSTANT |       126           1           0          1          1

. describe 

Contains data from racd05data.dta
  obs:           126                          
 vars:            14                          7 Jun 2011 10:36
 size:         7,056                          
----------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
----------------------------------------------------------------------------------
DOCNO           float  %9.0g                  Document Number
WEEKS           float  %9.0g                  Weeks
NUMBIDS         float  %9.0g                  Number of takeover bids
TAKEOVER        float  %9.0g                  Equals 1 if taken over
BIDPREM         float  %9.0g                  Bid price divided by price 14
                                                working days before bid
INSTHOLD        float  %9.0g                  Percentage of stock held by
                                                institutions
SIZE            float  %9.0g                  Total book value of assets in
                                                billions of dollars
LEGLREST        float  %9.0g                  Equals 1 if legal defense by lawsuit
REALREST        float  %9.0g                  Equals 1 if proposed changes in
                                                asset structure
FINREST         float  %9.0g                  Equals 1 if proposed changes in
                                                ownership structure
REGULATN        float  %9.0g                  Equals 1 if intervention by federal
                                                regulators
WHTKNGHT        float  %9.0g                  Equals 1 if management invitation
                                                for friendly third-party bid
SIZESQ          float  %9.0g                  SIZE Squared
CONSTANT        float  %9.0g                  
----------------------------------------------------------------------------------
Sorted by:  

. 
. global XLIST LEGLREST REALREST FINREST WHTKNGHT BIDPREM INSTHOLD SIZE SIZESQ REG
> ULATN

. 
. *** TABLE 5.1: ACTUAL FREQUENCY DISTRIBUTION
. 
. tabulate NUMBIDS

  Number of |
   takeover |
       bids |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          9        7.14        7.14
          1 |         63       50.00       57.14
          2 |         31       24.60       81.75
          3 |         12        9.52       91.27
          4 |          6        4.76       96.03
          5 |          1        0.79       96.83
          6 |          2        1.59       98.41
          7 |          1        0.79       99.21
         10 |          1        0.79      100.00
------------+-----------------------------------
      Total |        126      100.00

. 
. *** TABLE 5.2: VARIABLE DEFINITIONS AND SUMMARY STATISTICS
. 
. describe NUMBIDS $XLIST

              storage  display     value
variable name   type   format      label      variable label
----------------------------------------------------------------------------------
NUMBIDS         float  %9.0g                  Number of takeover bids
LEGLREST        float  %9.0g                  Equals 1 if legal defense by lawsuit
REALREST        float  %9.0g                  Equals 1 if proposed changes in
                                                asset structure
FINREST         float  %9.0g                  Equals 1 if proposed changes in
                                                ownership structure
WHTKNGHT        float  %9.0g                  Equals 1 if management invitation
                                                for friendly third-party bid
BIDPREM         float  %9.0g                  Bid price divided by price 14
                                                working days before bid
INSTHOLD        float  %9.0g                  Percentage of stock held by
                                                institutions
SIZE            float  %9.0g                  Total book value of assets in
                                                billions of dollars
SIZESQ          float  %9.0g                  SIZE Squared
REGULATN        float  %9.0g                  Equals 1 if intervention by federal
                                                regulators

. summarize NUMBIDS $XLIST

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     NUMBIDS |       126    1.738095    1.432081          0         10
    LEGLREST |       126    .4285714    .4968472          0          1
    REALREST |       126    .1825397    .3878308          0          1
     FINREST |       126    .1031746    .3054011          0          1
    WHTKNGHT |       126    .5952381    .4928054          0          1
-------------+--------------------------------------------------------
     BIDPREM |       126    1.346806     .189325    .942675   2.066366
    INSTHOLD |       126    .2518175    .1856136          0       .904
        SIZE |       126    1.219031    3.096624    .017722     22.169
      SIZESQ |       126    10.99902    59.91479    .000314   491.4646
    REGULATN |       126    .2698413    .4456492          0          1

. 
. *************** 5.2.5 (Continued) RESIDUALS
. 
. *** TABLE 5.3: POISSON QMLE Estimates, Standard Errors, T-statistics
. 
. poisson NUMBIDS $XLIST, vce(robust)

Iteration 0:   log pseudolikelihood =  -184.9518  
Iteration 1:   log pseudolikelihood = -184.94833  
Iteration 2:   log pseudolikelihood = -184.94833  

Poisson regression                                Number of obs   =        126
                                                  Wald chi2(9)    =      34.98
                                                  Prob > chi2     =     0.0001
Log pseudolikelihood = -184.94833                 Pseudo R2       =     0.0825

------------------------------------------------------------------------------
             |               Robust
     NUMBIDS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    LEGLREST |   .2601464   .1250534     2.08   0.037     .0150463    .5052465
    REALREST |  -.1956597   .1816167    -1.08   0.281    -.5516219    .1603025
     FINREST |   .0740301    .263571     0.28   0.779    -.4425597    .5906198
    WHTKNGHT |   .4813822   .1064947     4.52   0.000     .2726563     .690108
     BIDPREM |  -.6776958   .2974241    -2.28   0.023    -1.260636   -.0947553
    INSTHOLD |  -.3619912   .3231799    -1.12   0.263    -.9954122    .2714297
        SIZE |   .1785026   .0623544     2.86   0.004     .0562902    .3007149
      SIZESQ |  -.0075693   .0027788    -2.72   0.006    -.0130157    -.002123
    REGULATN |  -.0294392   .1420508    -0.21   0.836    -.3078537    .2489753
       _cons |   .9860598   .4137383     2.38   0.017     .1751477    1.796972
------------------------------------------------------------------------------

. 
. *** OVERDISPERSION TESTS presented in text - here underdispersion
. * Estimate from Pearson statistic divided by (n-k)
. quietly glm NUMBIDS $XLIST, family(poisson)

. display "Var = phi*E[y] where phi = " e(dispers_ps)
Var = phi*E[y] where phi = .74639511
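
. * Sketch (not run in this log): phi can also be computed by hand as the
. * Pearson statistic divided by (n-k), assuming e(N) and e(k) from poisson
. * give n and k. The names muchk and pearsq are hypothetical.
. /*
> quietly poisson NUMBIDS $XLIST
> predict double muchk, n
> generate double pearsq = (NUMBIDS - muchk)^2 / muchk
> quietly summarize pearsq
> display "phi by hand = " r(sum)/(e(N) - e(k))
> drop muchk pearsq
> */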

. * LM Overdispersion test - here underdispersion
. quietly poisson NUMBIDS $XLIST 

. predict mu, n

. generate ystar = ((NUMBIDS - mu)^2 - NUMBIDS) / mu

. * Test against NB2 variance
. regress ystar mu, noconstant

      Source |       SS       df       MS              Number of obs =     126
-------------+------------------------------           F(  1,   125) =    1.41
       Model |  2.06997594     1  2.06997594           Prob > F      =  0.2377
    Residual |  183.855052   125  1.47084041           R-squared     =  0.0111
-------------+------------------------------           Adj R-squared =  0.0032
       Total |  185.925028   126  1.47559546           Root MSE      =  1.2128

------------------------------------------------------------------------------
       ystar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          mu |  -.0682968   .0575706    -1.19   0.238    -.1822362    .0456425
------------------------------------------------------------------------------

. * Test against NB1 variance
. regress ystar, vce(robust)

Linear regression                                      Number of obs =     126
                                                       F(  0,   125) =    0.00
                                                       Prob > F      =       .
                                                       R-squared     =  0.0000
                                                       Root MSE      =  1.1772

------------------------------------------------------------------------------
             |               Robust
       ystar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -.3175595   .1048714    -3.03   0.003     -.525113    -.110006
------------------------------------------------------------------------------

. sum mu ystar

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          mu |       126    1.738095    .7106812   .7156423   4.427624
       ystar |       126   -.3175595    1.177179  -1.284358   5.945698
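
. * Sketch (not run in this log): the two regressions above are auxiliary
. * regressions on ystar = ((y - mu)^2 - y)/mu. Under the usual forms,
. *   Var[y] = mu + alpha*mu^2  (NB2) gives E[ystar] = alpha*mu
. *   Var[y] = (1 + alpha)*mu   (NB1) gives E[ystar] = alpha
. * so the t-statistic on the single coefficient tests alpha = 0, and a
. * negative estimate points to underdispersion. The lines below only
. * redisplay those estimates and assume mu and ystar still exist.
. /*
> quietly regress ystar mu, noconstant
> display "NB2: alpha = " _b[mu] "   t = " _b[mu]/_se[mu]
> quietly regress ystar, vce(robust)
> display "NB1: alpha = " _b[_cons] "   t = " _b[_cons]/_se[_cons]
> */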

. drop mu ystar

. 
. *** CONSTRUCT RESIDUALS after command glm
. * NOTE: Stata glm uses different terminology from the book
. * Stata standardized multiplies residual by (1-h_ii)^(-1/2)
. * We call this studentized (our star)
. * Stata studentized multiplies residual by one over the 
. * square root of the estimated scale parameter
. * NOTE: Deviance residual differs from that in First Edition (error in first)
. quietly glm NUMBIDS $XLIST, family(poisson)

. predict mu, mu

. generate raw = NUMBIDS - mu

. predict pear, pearson

. predict pearstar, pearson standardized

. predict dev, deviance

. predict devstar, deviance standardized

. generate devadj = dev + 1/(6*sqrt(mu))

. predict anscombe, anscombe

. predict hat, hat
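
. * Sketch (not run in this log): for the Poisson, the Pearson and deviance
. * residuals and the (1-h_ii)^(-1/2) rescaling can be built by hand from
. * mu and hat, assuming the usual Poisson GLM formulas. The *chk and ylny
. * names are hypothetical; compare them with pear, dev and pearstar.
. /*
> generate double pearchk = (NUMBIDS - mu)/sqrt(mu)
> generate double ylny = cond(NUMBIDS > 0, NUMBIDS*ln(NUMBIDS/mu), 0)
> generate double devchk = sign(NUMBIDS - mu)*sqrt(2*(ylny - (NUMBIDS - mu)))
> generate double pearstarchk = pearchk/sqrt(1 - hat)
> summarize pear pearchk dev devchk pearstar pearstarchk
> drop pearchk ylny devchk pearstarchk
> */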

. * Extras for completeness
. predict pearstud, pearson studentized

. predict pearstan, pearson standardized

. predict devstud, deviance studentized

. predict devstan, deviance standardized

. 
. *** TABLE 5.4: DESCRIPTIVE STATISTICS FOR VARIOUS RESIDUALS
. 
. summarize raw pear pearstar dev devstar devadj anscombe

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         raw |       126   -4.73e-09     1.22863  -3.225367   5.572376
        pear |       126    .0015625    .8322573  -1.606458   3.026831
    pearstar |       126   -.0028455    .8857773   -1.87161   3.111568
         dev |       126   -.0898632    .8371262  -2.271875   2.397711
     devstar |       126   -.0991351    .8906361  -2.365741   2.666335
-------------+--------------------------------------------------------
      devadj |       126    .0436852     .838656  -2.168127   2.519731
    anscombe |       126   -.0968019    .8535656  -2.409687   2.415112

. tabstat raw pear pearstar dev devstar devadj anscombe, ///
>   statistics(mean sd skew kurt min p10 p90 max) col(stat) format(%9.2f)

    variable |      mean        sd  skewness  kurtosis       min       p10
-------------+------------------------------------------------------------
         raw |     -0.00      1.23      1.38      7.48     -3.23     -1.29
        pear |      0.00      0.83      1.12      4.99     -1.61     -0.96
    pearstar |     -0.00      0.89      1.12      5.19     -1.87     -0.99
         dev |     -0.09      0.84      0.29      3.88     -2.27     -1.11
     devstar |     -0.10      0.89      0.30      4.03     -2.37     -1.27
      devadj |      0.04      0.84      0.25      3.82     -2.17     -1.03
    anscombe |     -0.10      0.85      0.21      3.93     -2.41     -1.11
--------------------------------------------------------------------------

    variable |       p90       max
-------------+--------------------
         raw |      1.33      5.57
        pear |      1.01      3.03
    pearstar |      1.05      3.11
         dev |      0.93      2.40
     devstar |      0.94      2.67
      devadj |      1.03      2.52
    anscombe |      0.93      2.42
----------------------------------

. 
. *** TABLE 5.5: CORRELATIONS OF VARIOUS RESIDUALS
. 
. correlate raw pear pearstar dev devstar devadj anscombe
(obs=126)

             |      raw     pear pearstar      dev  devstar   devadj anscombe
-------------+---------------------------------------------------------------
         raw |   1.0000
        pear |   0.9759   1.0000
    pearstar |   0.9830   0.9976   1.0000
         dev |   0.9564   0.9839   0.9813   1.0000
     devstar |   0.9637   0.9822   0.9843   0.9975   1.0000
      devadj |   0.9549   0.9830   0.9804   0.9996   0.9973   1.0000
    anscombe |   0.9512   0.9801   0.9772   0.9997   0.9969   0.9992   1.0000


. 
. *** RESIDUAL PLOTS (several)
. 
. * Anscombe residual plotted against y
. label variable anscombe "Anscombe residual"

. graph twoway scatter anscombe NUMBIDS, msize(medium) xlabel(#6) saving(racd05gra
> ph1, replace)
(file racd05graph1.gph saved)

. * graph twoway (scatter anscombe NUMBIDS, msize(medium)) ///
> *  (lowess anscombe NUMBIDS, lwidth(medthick)), xlabel(#6) saving(racd05graph1, 
> replace)
. 
. * Anscombe residual plotted against fitted mean
. label variable mu "Predicted bids"

. graph twoway scatter anscombe mu, msize(medium) xlabel(#6) saving(racd05graph2, 
> replace)
(file racd05graph2.gph saved)

. * graph twoway (scatter anscombe mu, msize(medium)) ///
> *   (lowess anscombe mu, lwidth(medthick)), xlabel(#6) saving(racd05graph2, repl
> ace)
. 
. * Ordered anscombe residual plotted against standard normal ordinates 
. * NOTE: Axes reversed from the First Edition
. qnorm anscombe, msize(medium) xlabel(#6) saving(racd05graph3, replace)
(file racd05graph3.gph saved)

. 
. * Diagonal entries in Hat matrix for each observation plotted against observatio
> n number
. generate obsno = _n

. label variable obsno "Observation number"

. label variable hat "Diagonal entry in H"

. graph twoway scatter hat obsno, msize(medium) xlabel(#6) saving(racd05graph4, re
> place)
(file racd05graph4.gph saved)

. 
. *** FIGURE 5.1: RESIDUAL PLOTS
. 
. graph combine racd05graph1.gph racd05graph2.gph racd05graph3.gph racd05graph4.gp
> h, ///
>    iscale(0.7) ysize(5) xsize(6) rows(2)

. graph export racd05fig1.eps, replace
(file racd05fig1.eps written in EPS format)

. graph export racd05fig1.wmf, replace
(file c:\acdbookrevision\stata_final_programs_2013\racd05fig1.wmf written in Windo
> ws Metafile format)

. 
. * Identify and drop the observations with largest HAT matrix diagonal term
. poisson NUMBIDS $XLIST, vce(robust)

Iteration 0:   log pseudolikelihood =  -184.9518  
Iteration 1:   log pseudolikelihood = -184.94833  
Iteration 2:   log pseudolikelihood = -184.94833  

Poisson regression                                Number of obs   =        126
                                                  Wald chi2(9)    =      34.98
                                                  Prob > chi2     =     0.0001
Log pseudolikelihood = -184.94833                 Pseudo R2       =     0.0825

------------------------------------------------------------------------------
             |               Robust
     NUMBIDS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    LEGLREST |   .2601464   .1250534     2.08   0.037     .0150463    .5052465
    REALREST |  -.1956597   .1816167    -1.08   0.281    -.5516219    .1603025
     FINREST |   .0740301    .263571     0.28   0.779    -.4425597    .5906198
    WHTKNGHT |   .4813822   .1064947     4.52   0.000     .2726563     .690108
     BIDPREM |  -.6776958   .2974241    -2.28   0.023    -1.260636   -.0947553
    INSTHOLD |  -.3619912   .3231799    -1.12   0.263    -.9954122    .2714297
        SIZE |   .1785026   .0623544     2.86   0.004     .0562902    .3007149
      SIZESQ |  -.0075693   .0027788    -2.72   0.006    -.0130157    -.002123
    REGULATN |  -.0294392   .1420508    -0.21   0.836    -.3078537    .2489753
       _cons |   .9860598   .4137383     2.38   0.017     .1751477    1.796972
------------------------------------------------------------------------------

. estimates store PFULL

. scalar kreg = e(k)

. scalar Nobs = e(N)

. list obsno hat if hat > 3*kreg/Nobs

     +------------------+
     | obsno        hat |
     |------------------|
 36. |    36   .2756452 |
 80. |    80   .3174862 |
 83. |    83   .6960669 |
 85. |    85   .3207826 |
102. |   102   .2830565 |
     |------------------|
126. |   126   .2971494 |
     +------------------+
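
. * Sketch (not run in this log): the cutoff used above is 3*k/n, three
. * times the average hat value k/n. With k = 10 and n = 126, as in this
. * model, that is about 0.238.
. /*
> display "Hat cutoff 3*k/n = " 3*kreg/Nobs
> count if hat > 3*kreg/Nobs & !missing(hat)
> */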

. poisson NUMBIDS $XLIST if hat < 3*kreg/Nobs, vce(robust) 

Iteration 0:   log pseudolikelihood = -170.38986  
Iteration 1:   log pseudolikelihood = -170.38984  

Poisson regression                                Number of obs   =        120
                                                  Wald chi2(9)    =      48.01
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -170.38984                 Pseudo R2       =     0.0698

------------------------------------------------------------------------------
             |               Robust
     NUMBIDS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    LEGLREST |   .2588834   .1232742     2.10   0.036     .0172705    .5004964
    REALREST |  -.3575586   .1975979    -1.81   0.070    -.7448433     .029726
     FINREST |   .2322629   .2469026     0.94   0.347    -.2516573    .7161831
    WHTKNGHT |   .4961819   .1059455     4.68   0.000     .2885325    .7038313
     BIDPREM |  -.9555442   .2964309    -3.22   0.001    -1.536538   -.3745504
    INSTHOLD |  -.2576956   .3202307    -0.80   0.421    -.8853361     .369945
        SIZE |   .0887005    .140571     0.63   0.528    -.1868135    .3642146
      SIZESQ |   .0059204   .0263884     0.22   0.822    -.0457998    .0576407
    REGULATN |  -.0430669   .1382526    -0.31   0.755     -.314037    .2279033
       _cons |   1.381025    .399858     3.45   0.001     .5973183    2.164733
------------------------------------------------------------------------------

. estimates store PNOOUTLIERS

. estimates table PFULL PNOOUTLIERS, b(%9.3f) se stats(ll)

--------------------------------------
    Variable |   PFULL     PNOOUTL~S  
-------------+------------------------
    LEGLREST |     0.260       0.259  
             |     0.125       0.123  
    REALREST |    -0.196      -0.358  
             |     0.182       0.198  
     FINREST |     0.074       0.232  
             |     0.264       0.247  
    WHTKNGHT |     0.481       0.496  
             |     0.106       0.106  
     BIDPREM |    -0.678      -0.956  
             |     0.297       0.296  
    INSTHOLD |    -0.362      -0.258  
             |     0.323       0.320  
        SIZE |     0.179       0.089  
             |     0.062       0.141  
      SIZESQ |    -0.008       0.006  
             |     0.003       0.026  
    REGULATN |    -0.029      -0.043  
             |     0.142       0.138  
       _cons |     0.986       1.381  
             |     0.414       0.400  
-------------+------------------------
          ll |  -184.948    -170.390  
--------------------------------------
                          legend: b/se

. 
. *************** 5.3.4 R-SQUARED and CHISQUARE GOODNESS-OF-FIT
. 
. *** Deviance, Pearson and R-squared measures presented in text
. * Fitted model
. glm NUMBIDS $XLIST, family(poisson) vce(robust)

Iteration 0:   log pseudolikelihood = -185.75208  
Iteration 1:   log pseudolikelihood = -184.95135  
Iteration 2:   log pseudolikelihood = -184.94833  
Iteration 3:   log pseudolikelihood = -184.94833  

Generalized linear models                          No. of obs      =       126
Optimization     : ML                              Residual df     =       116
                                                   Scale parameter =         1
Deviance         =  88.61503283                    (1/df) Deviance =  .7639227
Pearson          =  86.58183302                    (1/df) Pearson  =  .7463951

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  3.094418
Log pseudolikelihood = -184.9483263                BIC             = -472.3937

------------------------------------------------------------------------------
             |               Robust
     NUMBIDS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    LEGLREST |   .2601464   .1250534     2.08   0.037     .0150463    .5052465
    REALREST |  -.1956597   .1816167    -1.08   0.281    -.5516219    .1603025
     FINREST |   .0740301    .263571     0.28   0.779    -.4425597    .5906198
    WHTKNGHT |   .4813822   .1064947     4.52   0.000     .2726563     .690108
     BIDPREM |  -.6776958   .2974241    -2.28   0.023    -1.260636   -.0947553
    INSTHOLD |  -.3619912   .3231799    -1.12   0.263    -.9954122    .2714297
        SIZE |   .1785026   .0623544     2.86   0.004     .0562902    .3007149
      SIZESQ |  -.0075693   .0027788    -2.72   0.006    -.0130157    -.002123
    REGULATN |  -.0294392   .1420508    -0.21   0.836    -.3078537    .2489753
       _cons |   .9860598   .4137383     2.38   0.017     .1751477    1.796972
------------------------------------------------------------------------------

. display "Deviance Statistic = " e(deviance)
Deviance Statistic = 88.615033

. display "Pearson Statistic  = " e(deviance_p)
Pearson Statistic  = 86.581833

. scalar Devfitted = e(deviance)

. scalar Pearsfitted = e(deviance_p)

. * Intercept-only model
. quietly glm NUMBIDS, family(poisson) vce(robust)

. display "Deviance Statistic = " e(deviance)
Deviance Statistic = 121.86157

. display "Pearson Statistic  = " e(deviance_p)
Pearson Statistic  = 147.49315

. scalar Devintercept = e(deviance)

. scalar Pearsintercept = e(deviance_p)

. * Calculate R-squared Deviance and Pearson
. scalar R2_Dev = 1 - Devfitted/Devintercept

. scalar R2_Pears = 1 - Pearsfitted/Pearsintercept

. display "Deviance R-squared = " R2_Dev "   Fitted = " Devfitted "   Intercept = 
> " Devintercept 
Deviance R-squared = .27282218   Fitted = 88.615033   Intercept = 121.86157

. display "Pearson R-squared  = " R2_Pears "   Fitted = " Pearsfitted "   Intercep
> t = " Pearsintercept
Pearson R-squared  = .41297726   Fitted = 86.581833   Intercept = 147.49315

. * Squared correlation coefficient
. capture drop mu

. quietly poisson NUMBIDS $XLIST, vce(robust)

. predict mu, n

. quietly correlate NUMBIDS mu

. display "Squared correlation coefficient = " r(rho)^2
Squared correlation coefficient = .26426804

. * Compare to OLS
. quietly regress NUMBIDS $XLIST

. display "OLS R-squared = " e(r2)
OLS R-squared = .23730025

. 
. *** Predicted Probabilities and begin Chi-square Goodness-of-fit test
. 
. ** In January 2013 there was a forthcoming Stata Journal article and 
. ** user-written addon to implement the chi-square goodness-of-fit test.
. 
. * This program written for categories j = 0, 1, 2, ..., $REST or more
. global Y NUMBIDS

. global MAXCOUNT = 4        // Form cells y = 0, 1, 2, ... , maxcount

. global REST = 5            // The remaining category y >= $REST

. * Create indicators for y = 0, 1, 2, ...., maxcount and y >= $REST
. forvalues i = 0/$MAXCOUNT {
  2.    generate Dummy`i' = $Y==`i'
  3.    }

. generate Dummy$REST = $Y > $MAXCOUNT

. * Create corresponding predicted probabilities of y = 0, 1, 2, ...
. quietly poisson $Y $XLIST

. forvalues i = 0/$MAXCOUNT {
  2.    predict Predicted`i', pr(`i')
  3.    }

. predict Predicted$REST, pr($REST,.)

. * The preceding required Stata 12. Could instead use user-written addon countfit
. * or use recursion for Poisson probabilities as follows ..
. /*
> quietly poisson $Y $XLIST
> capture drop mu
> predict mu, n
> generate Predicted0 = exp(-mu)
> forvalues i = 1/$MAXCOUNT {
>    local j = `i' - 1
>    generate Predicted`i' = Predicted`j'*mu/`i'
>    }
> generate Predicted$REST = 1
> forvalues i = 0/$MAXCOUNT {
>    replace Predicted$REST = Predicted$REST - Predicted`i'
>    }
> */
. * Create differences between actual and predicted
. forvalues i = 0/$REST {
  2.    generate Difference`i' = Dummy`i' - Predicted`i'
  3.    }

. 
. *** TABLE 5.6: ACTUAL AND PREDICTED FREQUENCIES
. 
. summarize P* D*

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  Predicted0 |       126    .2132476    .1130779   .0119428    .488878
  Predicted1 |       126    .2976829    .0747807   .0528784   .3678793
  Predicted2 |       126     .232677    .0388698   .1170628   .2706697
  Predicted3 |       126    .1366833    .0563692   .0298633   .2236638
  Predicted4 |       126    .0680005    .0497715   .0053429   .1953667
-------------+--------------------------------------------------------
  Predicted5 |       126    .0517086    .0767211   .0008662   .4541059
       DOCNO |       126    82174.41    2251.783      78001      85059
      Dummy0 |       126    .0714286    .2585675          0          1
      Dummy1 |       126          .5     .501996          0          1
      Dummy2 |       126    .2460317    .4324166          0          1
-------------+--------------------------------------------------------
      Dummy3 |       126    .0952381    .2947154          0          1
      Dummy4 |       126     .047619     .213809          0          1
      Dummy5 |       126    .0396825    .1959916          0          1
 Difference0 |       126   -.1418191    .2665421   -.488878   .9242796
 Difference1 |       126    .2023171    .4843285  -.3677845   .9382253
-------------+--------------------------------------------------------
 Difference2 |       126    .0133547    .4267367  -.2706697   .8536685
 Difference3 |       126   -.0414452    .2909759  -.2236638   .9470747
 Difference4 |       126   -.0203815    .2075399  -.1953667   .9305699
 Difference5 |       126   -.0120261     .181479  -.4151017    .958654

. 
. *** Continue Chi-square Goodness-of-fit test
. 
. * Obtain the scores to be used later
. generate score = $Y - mu

. foreach var of varlist $XLIST {
  2.    generate scorefor`var' = score*`var'
  3.    local i = `i' + 1
  4.    }

. * Run the auxiliary regression
. generate ones = 1

. quietly regress ones Difference* score scorefor*, noconstant

. scalar CGOF = e(N)*e(r2)

. di "Chi-square GOF Test: " CGOF     "  p-value: " chi2tail($MAXCOUNT,CGOF)
Chi-square GOF Test: 48.659953  p-value: 6.875e-10
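
. * Sketch (not run in this log): the call above uses $MAXCOUNT = 4 degrees
. * of freedom. If the reference distribution is taken to be chi-square with
. * (number of cells - 1) degrees of freedom, the six cells 0,...,4 and 5+
. * would give $REST = 5. That reading is an assumption, not confirmed here.
. /*
> display "p-value with df = $REST: " chi2tail($REST,CGOF)
> */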

. 
. * Compare to Stata user-written command countfit
. countfit NUMBIDS $XLIST, maxcount(10) prm nograph noestimates nofit
Comparison of Mean Observed and Predicted Count

            Maximum       At      Mean
Model     Difference    Value    |Diff|
---------------------------------------------
PRM         0.202         1      0.042

PRM: Predicted and actual probabilities

Count   Actual    Predicted    |Diff|   Pearson
------------------------------------------------
0        0.071       0.213      0.142    11.884
1        0.500       0.298      0.202    17.325
2        0.246       0.233      0.013     0.097
3        0.095       0.137      0.041     1.583
4        0.048       0.068      0.020     0.770
5        0.008       0.031      0.023     2.106
6        0.016       0.013      0.003     0.090
7        0.008       0.005      0.003     0.187
8        0.000       0.002      0.002     0.253
9        0.000       0.001      0.001     0.095
10       0.008       0.000      0.008    27.275
------------------------------------------------
Sum      1.000       1.000      0.458    61.666
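
Note (not part of the original run): the Pearson column above appears to follow N * (actual - predicted)^2 / predicted for each count, with N = 126. That formula is inferred from the table values, not taken from the countfit documentation, so treat it as an assumption. A minimal sketch using the rounded proportions printed for counts 0 and 1:

```stata
* Assumed formula: N * (actual - predicted)^2 / predicted
* Inputs are the rounded table values, so results only approximate
* the printed Pearson entries (11.884 and 17.325).
display 126*(0.071 - 0.213)^2/0.213
display 126*(0.500 - 0.298)^2/0.298
```

Small discrepancies from the printed entries are expected because the table shows proportions rounded to three decimals.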

. 
. * Aside: Stata's estat gof is a quite different test: it checks whether the
. * deviance statistic is consistent with a chi-square(n-k) distribution
. quietly glm NUMBIDS $XLIST, family(poisson) vce(robust)

. display chi2tail((e(N)-e(k)),e(deviance))
.97244872

. quietly poisson NUMBIDS $XLIST, vce(robust)

. estat gof

         Deviance goodness-of-fit =  88.61504
         Prob > chi2(116)         =    0.9724

         Pearson goodness-of-fit  =  86.58183
         Prob > chi2(116)         =    0.9812

. 
. * Classification table (confusion matrix)
. * Find the modal count for each observation (the k that maximizes Pr[y = k])
. generate mode = 0

. forvalues i = 1/5 {
  2.    local j = `i' - 1
  3.    quietly replace mode = `i'   if Predicted`i'> Predicted`j'
  4.    }
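
Note (not part of the original run): the loop above compares each category only with its immediate predecessor, so it returns the maximizing k only when the predicted probabilities rise and then fall across categories 0 to 5. A minimal sketch of a running-maximum alternative follows. The names modealt and pmax are hypothetical, and the sketch assumes Predicted0 through Predicted5 exist as used in the loop above. It was not run for this log, so its results may differ from the tabulations that follow.

```stata
* Track the largest predicted probability seen so far (pmax) and
* the category at which it occurs (modealt).
generate modealt = 0
generate double pmax = Predicted0
forvalues i = 1/5 {
    quietly replace modealt = `i'        if Predicted`i' > pmax
    quietly replace pmax = Predicted`i'  if Predicted`i' > pmax
}
* Cross-check against the original variable
tabulate mode modealt
```

The order of the two replace statements matters: modealt must be updated before pmax, otherwise the comparison in the first statement would use the already-updated maximum.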

. * Compare the actual count to the predicted mode 
. generate NUMBIDSgrouped = NUMBIDS

. replace NUMBIDSgrouped = $REST if NUMBIDS > $REST
(4 real changes made)

. tabulate NUMBIDSgrouped mode

NUMBIDSgro |                    mode
      uped |         0          1          2          5 |     Total
-----------+--------------------------------------------+----------
         0 |         2          5          2          0 |         9 
         1 |         9         43         10          1 |        63 
         2 |         0         22          6          3 |        31 
         3 |         1          5          4          2 |        12 
         4 |         0          1          3          2 |         6 
         5 |         0          1          2          2 |         5 
-----------+--------------------------------------------+----------
     Total |        12         77         27         10 |       126 


. tabulate mode

       mode |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         12        9.52        9.52
          1 |         77       61.11       70.63
          2 |         27       21.43       92.06
          5 |         10        7.94      100.00
------------+-----------------------------------
      Total |        126      100.00

. count if NUMBIDSgrouped == mode
   53
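
Note (not part of the original run): the count of 53 is the sum of the matching cells in the cross-tabulation above (2 + 43 + 6 + 2), which is about 42% of the 126 observations. A short sketch that reports this share directly, assuming NUMBIDSgrouped and mode are defined as above:

```stata
* Share of observations whose grouped actual count equals the predicted mode
quietly count if NUMBIDSgrouped == mode
display "Correctly classified: " r(N) " of " _N " (" %4.1f 100*r(N)/_N "%)"
```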

. 
. *************** 5.4.2 NON-NESTED MODELS: AIC, BIC and Vuong TEST
. 
. * Poisson
. poisson NUMBIDS $XLIST

Iteration 0:   log likelihood =  -184.9518  
Iteration 1:   log likelihood = -184.94833  
Iteration 2:   log likelihood = -184.94833  

Poisson regression                                Number of obs   =        126
                                                  LR chi2(9)      =      33.25
                                                  Prob > chi2     =     0.0001
Log likelihood = -184.94833                       Pseudo R2       =     0.0825

------------------------------------------------------------------------------
     NUMBIDS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    LEGLREST |   .2601464   .1509594     1.72   0.085    -.0357286    .5560213
    REALREST |  -.1956597   .1926309    -1.02   0.310    -.5732093    .1818899
     FINREST |   .0740301   .2165219     0.34   0.732    -.3503452    .4984053
    WHTKNGHT |   .4813822   .1588698     3.03   0.002      .170003    .7927613
     BIDPREM |  -.6776958   .3767372    -1.80   0.072    -1.416087    .0606956
    INSTHOLD |  -.3619912   .4243292    -0.85   0.394    -1.193661    .4696788
        SIZE |   .1785026   .0600221     2.97   0.003     .0608614    .2961438
      SIZESQ |  -.0075693   .0031217    -2.42   0.015    -.0136878   -.0014509
    REGULATN |  -.0294392   .1605682    -0.18   0.855     -.344147    .2852686
       _cons |   .9860598   .5339201     1.85   0.065    -.0604044    2.032524
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    126   -201.5716   -184.9483     10     389.8967    418.2595
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

. estimates store POISSON

. * Hurdle logit / Poisson
. hplogit NUMBIDS $XLIST

initial:       log likelihood = -254.30412
alternative:   log likelihood = -226.28023
rescale:       log likelihood = -226.28023
rescale eq:    log likelihood = -195.17087
Iteration 0:   log likelihood = -195.17087  
Iteration 1:   log likelihood = -187.21209  
Iteration 2:   log likelihood = -160.07158  
Iteration 3:   log likelihood = -159.50181  
Iteration 4:   log likelihood = -159.48681  
Iteration 5:   log likelihood = -159.47862  
Iteration 6:   log likelihood = -159.47747  
Iteration 7:   log likelihood = -159.47746  

Poisson-Logit Hurdle Regression                   Number of obs   =        126
                                                  Wald chi2(9)    =      11.83
Log likelihood = -159.47746                       Prob > chi2     =     0.2230

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
logit        |
    LEGLREST |   .9712886    .976998     0.99   0.320    -.9435923     2.88617
    REALREST |  -2.722899   .9997995    -2.72   0.006     -4.68247    -.763328
     FINREST |  -1.466672   1.174169    -1.25   0.212    -3.768001    .8346562
    WHTKNGHT |   1.192886   .8733562     1.37   0.172    -.5188602    2.904633
     BIDPREM |   .8245185   2.483786     0.33   0.740    -4.043612    5.692649
    INSTHOLD |  -1.838757   2.411417    -0.76   0.446    -6.565049    2.887534
        SIZE |   .3478178   1.019675     0.34   0.733    -1.650708    2.346344
      SIZESQ |   .0126446   .1849397     0.07   0.945    -.3498306    .3751198
    REGULATN |  -1.141261   .9822143    -1.16   0.245    -3.066366    .7838435
       _cons |    2.14836   3.472277     0.62   0.536    -4.657177    8.953898
-------------+----------------------------------------------------------------
poisson      |
    LEGLREST |   .4356921   .2145263     2.03   0.042     .0152282    .8561559
    REALREST |  -.0038302   .2473243    -0.02   0.988    -.4885769    .4809165
     FINREST |   .2651092    .273213     0.97   0.332    -.2703785    .8005969
    WHTKNGHT |   .8780368   .2760094     3.18   0.001     .3370683    1.419005
     BIDPREM |  -1.347424   .5342481    -2.52   0.012    -2.394531   -.3003171
    INSTHOLD |  -.6607018   .6081372    -1.09   0.277    -1.852629    .5312252
        SIZE |   .2381462   .0756031     3.15   0.002     .0899668    .3863256
      SIZESQ |  -.0102873   .0039627    -2.60   0.009    -.0180541   -.0025205
    REGULATN |  -.0571749   .2231117    -0.26   0.798    -.4944658    .3801159
       _cons |   1.136037   .7603113     1.49   0.135    -.3541459     2.62622
------------------------------------------------------------------------------
AIC Statistic =     2.690

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    126           .   -159.4775     20     358.9549    415.6806
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

. estimates store PHURDLE

. * ZIP
. quietly zip NUMBIDS $XLIST, inflate($XLIST)

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    126   -194.5065   -179.9954     20     399.9908    456.7164
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

. estimates store ZIP

. 
. *** VUONG TEST presented in text
. zip NUMBIDS $XLIST, inflate($XLIST) vuong

Fitting constant-only model:

Iteration 0:   log likelihood = -271.80732  
Iteration 1:   log likelihood = -203.50285  (not concave)
Iteration 2:   log likelihood = -203.04135  (not concave)
Iteration 3:   log likelihood = -201.03128  
Iteration 4:   log likelihood = -197.55236  
Iteration 5:   log likelihood = -195.80465  
Iteration 6:   log likelihood = -194.77842  
Iteration 7:   log likelihood = -194.57139  
Iteration 8:   log likelihood = -194.52119  
Iteration 9:   log likelihood = -194.50947  
Iteration 10:  log likelihood = -194.50697  
Iteration 11:  log likelihood = -194.50653  
Iteration 12:  log likelihood = -194.50648  
Iteration 13:  log likelihood = -194.50647  

Fitting full model:

Iteration 0:   log likelihood = -194.50647  
Iteration 1:   log likelihood = -181.46232  
Iteration 2:   log likelihood = -179.99738  
Iteration 3:   log likelihood =  -179.9954  
Iteration 4:   log likelihood =  -179.9954  

Zero-inflated Poisson regression                  Number of obs   =        126
                                                  Nonzero obs     =        117
                                                  Zero obs        =          9

Inflation model = logit                           LR chi2(9)      =      29.02
Log likelihood  = -179.9954                       Prob > chi2     =     0.0006

------------------------------------------------------------------------------
     NUMBIDS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
NUMBIDS      |
    LEGLREST |   .2177038    .152802     1.42   0.154    -.0817825    .5171902
    REALREST |  -.0315728   .1993239    -0.16   0.874    -.4222405    .3590949
     FINREST |   .1535876    .220583     0.70   0.486    -.2787472    .5859224
    WHTKNGHT |   .3814168   .1611181     2.37   0.018     .0656311    .6972024
     BIDPREM |  -.6668731   .3744586    -1.78   0.075    -1.400798    .0670522
    INSTHOLD |  -.3662654   .4213391    -0.87   0.385    -1.192075     .459544
        SIZE |   .1645811   .0606754     2.71   0.007     .0456594    .2835027
      SIZESQ |  -.0070204   .0031485    -2.23   0.026    -.0131913   -.0008494
    REGULATN |   .0367102   .1630259     0.23   0.822    -.2828146     .356235
       _cons |   1.042256   .5298909     1.97   0.049     .0036886    2.080823
-------------+----------------------------------------------------------------
inflate      |
    LEGLREST |  -67.86428   42421.59    -0.00   0.999    -83212.65    83076.92
    REALREST |   122.6682   76025.39     0.00   0.999    -148884.4    149129.7
     FINREST |   37.73304   111246.2     0.00   1.000    -218000.8    218076.3
    WHTKNGHT |  -38.80397   146692.6    -0.00   1.000    -287551.1    287473.5
     BIDPREM |  -49.96548     384647    -0.00   1.000    -753944.2    753844.3
    INSTHOLD |   116.0081   186299.9     0.00   1.000    -365025.2    365257.2
        SIZE |  -7.262702   66890.29    -0.00   1.000    -131109.8    131095.3
      SIZESQ |   .3918664   3302.474     0.00   1.000    -6472.339    6473.122
    REGULATN |   76.77467   56540.79     0.00   0.999    -110741.1    110894.7
       _cons |  -76.91042   531108.8    -0.00   1.000     -1041031     1040877
------------------------------------------------------------------------------
Vuong test of zip vs. standard Poisson:            z =     2.05  Pr>z = 0.0200

. 
. *** TABLE 5.7: AIC and BIC
. 
. * Lists only the LEGLREST coefficient rather than all the regressors
. estimates table POISSON PHURDLE ZIP, b(%9.1f) keep(LEGLREST) ///
>    stats(N k ll aic bic) equations(1)

--------------------------------------------------
    Variable |  POISSON     PHURDLE       ZIP     
-------------+------------------------------------
    LEGLREST |       0.3         1.0         0.2  
-------------+------------------------------------
           N |       126         126         126  
           k |      10.0        20.0        20.0  
          ll |    -184.9      -159.5      -180.0  
         aic |     389.9       359.0       400.0  
         bic |     418.3       415.7       456.7  
--------------------------------------------------
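
Note (not part of the original run): the AIC and BIC entries can be recomputed from the reported log likelihood ll, parameter count k, and N = 126 using AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N). The sketch below hard-codes the ll and k values printed by estat ic earlier in this log; the loop variable spec is a hypothetical name.

```stata
* Each item is "label ll k", taken from the estat ic output above
local N = 126
foreach spec in "POISSON -184.94833 10" "PHURDLE -159.47746 20" "ZIP -179.9954 20" {
    tokenize `spec'                      // `1' = label, `2' = ll, `3' = k
    local aic = -2*`2' + 2*`3'
    local bic = -2*`2' + `3'*ln(`N')
    display "`1': AIC = " %9.4f `aic' "  BIC = " %9.4f `bic'
}
```

On both criteria the hurdle model has the smallest value in the table, while ZIP has the largest because its modest gain in log likelihood over Poisson does not offset the ten extra parameters.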

. 
. ********** CLOSE OUTPUT
. 
. * log close
. 
end of do-file

. exit, clear