------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt
  log type:  text
 opened on:  17 May 2005, 14:10:13

. 
. ********** OVERVIEW OF MMA08P3DIAGNOSTICS.DO **********
. 
. * STATA Program 
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi 
. * used for "Microeconometrics: Methods and Applications" 
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press 
. 
. * Chapter 8.7.3 pages 290-1
. * Model diagnostics example  (Table 8.3)
. 
. * (A) DIFFERENT R-SQUAREDS
. * (B) CALCULATION OF RESIDUALS
. * for a Poisson model with simulated data (see below).
. 
. * The data generation requires free Stata add-on command rndpoix
. * In Stata: search rndpoix
. 
. * This program gives results for model 2
. * For model 1 need to rerun with only x3 as regressor
. 
. ********** SETUP **********
. 
. set more off

. version 8.0

. set scheme s1mono   /* Used for graphs */

.   
. ********** GENERATE DATA **********
. 
. * Model is
. *   y ~ Poisson[exp(b1 + b2*x2 + b3*x3]
. * where 
. *    x2 and x3 are iid ~ N[0,1] 
. * and b1=0.5 and b2=0.5 and b3=0.5.
. 
. * The Diagnostics below are from Poisson regression of y on x3 alone
. * or from Poisson regression of y on x3 and x3sq.   [Note" x2 is omitted]
. 
. set seed 10001

. set obs 100
obs was 0, now 100

. scalar b1 = 0.5

. scalar b2 = 0.5

. scalar b3 = 0.5

. 
. * Generate regressors
. gen x2 = invnorm(uniform())

. gen x3 = invnorm(uniform())

. 
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3)

. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ......... )
Variable xp created.

. gen y = xp

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       100    .0053689    1.000686  -2.173506   2.106561
          x3 |       100   -.0235884    1.024207  -2.857666   2.149822
     mupoiss |       100    2.020511    1.400564   .3380426   7.029678
          xp |       100        1.92    1.835013          0          8
           y |       100        1.92    1.835013          0          8

. 
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 using mma08p3diagnostics.asc, replace

. 
. ********* SETUP FOR THIS PROGRAM **********
. 
. * Change this if want different regressors
. gen x3sq = x3*x3

. * global XLIST x3       /* Model 1 */
. global XLIST x3 x3sq  /* Model 2 */

. 
. ********* R-SQUARED (reported in Table 8.3 p.291) **********
. 
. * The following code can be changed to diffferent models than poisson
. * For RsqRES, RsqEXP and RsqCOR need  
. *     y        dependent variable
. *     yhat     predicted value of dependent variable
. * For RsqWRSS additionally need
. *     sigmasq  predicted variance of dependent variable
. * For RsqRG need log density evaluated at values given below
. 
. * Obtain exp(x'b)   Will vary with the model
. poisson y $XLIST

Iteration 0:   log likelihood = -176.09611  
Iteration 1:   log likelihood = -176.09119  
Iteration 2:   log likelihood = -176.09119  

Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------

. predict yhat       
(option n assumed; predicted number of events)

. scalar dof = e(N)-e(k)

. 
. * RsqRES and RsqEXP are R-squared from sums of squares
. * First get TSS, ESS and RSS
. egen ybar = mean(y)

. gen ylessybarsq = (y - ybar)^2

. quietly sum ylessybarsq

. scalar totalss = r(mean)

. gen yhatlessybarsq = (yhat - ybar)^2

. quietly sum yhatlessybarsq

. scalar explainedss = r(mean)

. gen residualsq = (y - yhat)^2

. quietly sum residualsq

. scalar residualss = r(mean)

. * Second computed the rsquared
. scalar sereg = sqrt(residualss/dof)

. scalar RsqRES = 1 - residualss/totalss

. scalar RsqEXP = explainedss/totalss

. 
. * RsqCOR uses sample correlation
. quietly correlate y yhat

. scalar RsqCOR = r(rho)^2

. 
. di "standard error of regression: " sereg
standard error of regression: .16620308

. di "totalss:     " totalss _n "explainedss: " explainedss _n "residualss:  " residualss
totalss:     3.3336
explainedss: .69556676
residualss:  2.6794761

. di "RsqRES: " RsqRES _n  "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR
RsqRES: .19622149
RsqEXP: .20865333
RsqCOR: .19640666

. 
. * RsqWRSS uses weighted sums of squares
. * First generate estimated variance of y
. * Here for Poisson use fact that variance = mean
. gen sigmasq = yhat

. gen weightedylessybarsq = ((y - ybar)^2) / sigmasq

. quietly sum weightedylessybarsq

. scalar weightedtotalss = r(mean)

. gen weightedresidualsq = ((y - yhat)^2) / sigmasq

. quietly sum weightedresidualsq

. scalar weightedresidualss = r(mean)

. scalar RsqWRSS = 1 - weightedresidualss/weightedtotalss

. di "RsqWRSS: " RsqWRSS
RsqWRSS: .16945018

. 
. * RsqRG is from ML. Difficult to generalize beyond LEF models.
. * Need 
. *   lnL_fit  log-likelihood at fitted values  (the usual)
. *   lnL_0    log-likelihood at intecept only 
. *   lnL_max  log-likelihood at best fit 
. quietly poisson y $XLIST

. scalar lnL_fit = e(ll)

. scalar lnL_0 = e(ll_0)

. * The following applies only for Poisson. Differs for otehr models.
. * lnf(y) = -mu + y*ln(mu) - ln(y!) 
. * is maximized at mu = y
. * so compute lnL_max = sum of [-y + y*ln(y) - lny!]
. * Following sets 0*ln0 = 0
. gen ylny = 0

. replace ylny = y*ln(y)  if y > 0
(51 real changes made)

. gen lnfyatmax = -y + ylny - lnfact(y)

. quietly sum lnfyatmax

. scalar lnL_max = r(sum)

. scalar RsqRG = (lnL_fit - lnL_0) / (lnL_max - lnL_0)

. 
. * RsqQ should only be used for binary and other discrete choice models
. * And definitely use only if lnL_fit < 0
. scalar RsqQ = 1 - lnL_fit/lnL_0

. 
. di "lnL_0:   " lnL_0 _n "lnL_fit: " lnL_fit _n "lnL_max: " lnL_max
lnL_0:   -191.57162
lnL_fit: -176.09119
lnL_max: -101.12402

. di "RsqRG: " RsqRG _n "RsqQ: " RsqQ
RsqRG: .17115358
RsqQ: .08080754

. 
. * Check
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       100    .0053689    1.000686  -2.173506   2.106561
          x3 |       100   -.0235884    1.024207  -2.857666   2.149822
     mupoiss |       100    2.020511    1.400564   .3380426   7.029678
          xp |       100        1.92    1.835013          0          8
           y |       100        1.92    1.835013          0          8
-------------+--------------------------------------------------------
        x3sq |       100    1.039067    1.446146   .0000877   8.166255
        yhat |       100        1.92     .838208   1.150405   5.398193
        ybar |       100        1.92           0       1.92       1.92
 ylessybarsq |       100      3.3336    5.966374      .0064    36.9664
yhatlessyb~q |       100    .6955668    1.572256   4.82e-06   12.09783
-------------+--------------------------------------------------------
  residualsq |       100    2.679476    4.830379   .0000825   36.93972
     sigmasq |       100        1.92     .838208   1.150405   5.398193
weightedyl~q |       100    1.681324    2.560112   .0018502   19.23135
weightedre~q |       100    1.396423    2.424518   .0000276   19.21747
        ylny |       100     2.15694     3.48234          0   16.63553
-------------+--------------------------------------------------------
   lnfyatmax |       100    -1.01124    .6233793  -1.969071          0

. poisson y $XLIST   /* Stata Rsq = RsqQ */

Iteration 0:   log likelihood = -176.09611  
Iteration 1:   log likelihood = -176.09119  
Iteration 2:   log likelihood = -176.09119  

Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------

. 
. *** The following results are for Model 2 in Table 8.3 p.291
. *** For model 1 R-squareds need to rerun with only x3 as regressor
. di "standard error of regression: " sereg
standard error of regression: .16620308

. di "RsqRES: " RsqRES _n  "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR 
RsqRES: .19622149
RsqEXP: .20865333
RsqCOR: .19640666

. di "RsqWRSS: " RsqWRSS _n "RsqRG: " RsqRG _n "RsqQ: " RsqQ
RsqWRSS: .16945018
RsqRG: .17115358
RsqQ: .08080754

. 
. ********* RESIDUAL ANALYSIS (text bottom p.290 to top p.291) **********
. 
. * Assume that from earlier have yhat
. 
. * raw residual
. gen raw = y - yhat

. gen sigma = sqrt(yhat)

. gen Pearson = (y - yhat)/sigma

. * Note that earlier defined ylny = 0 if y=0 and = yln(y) otherwise
. gen deviance = sign(y-yhat)*sqrt(2*(-y+ylny)-2*(-yhat+y*ln(yhat)))

. 
. *** The following are results reported in text bottom p.290 to top p.291
. sum raw Pearson deviance

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         raw |       100   -2.38e-09    1.645157  -2.993904   6.077806
     Pearson |       100   -.0014455    1.187656  -1.498094   4.383774
    deviance |       100   -.2103819    1.212345  -2.016939   3.264961

. corr raw Pearson deviance
(obs=100)

             |      raw  Pearson deviance
-------------+---------------------------
         raw |   1.0000
     Pearson |   0.9852   1.0000
    deviance |   0.9625   0.9818   1.0000


. * Example of use to find whether x3 belongs in the model
. * graph twoway scatter Pearson x3
. 
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt
  log type:  text
 closed on:  17 May 2005, 14:10:13
----------------------------------------------------------------------------------------------------
