------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma08p2nonnested.txt
  log type:  text
 opened on:  18 May 2005, 21:27:00

. 
. ********** OVERVIEW OF MMA08P2NONNESTED.DO **********
. 
. * STATA Program 
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi 
. * used for "Microeconometrics: Methods and Applications" 
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press 
. 
. * Chapter 8.5.3 pages 283-4
. * Nonnested model comparison given in Table 8.2:
. 
. *  (A) AIC AND VARIATIONS
. *  (B) VUONG TEST for Overlapping Models
. * for a Poisson model with simulated data (see below).
. 
. * This example requires the free Stata add-on command rndpoix. 
. * In Stata: search rndpoix
. 
. ********** SETUP **********
. 
. set more off

. version 8.0

. set scheme s1mono   /* Used for graphs */

.   
. ********** GENERATE DATA ********** 
.  
. * Dgp is
. *   y ~ Poisson[exp(b1 + b2*x2 + b3*x3]
. * where 
. *    x2, x3 is iid ~ N[0,1]
. * and b1=0 and b2=1 and b3=1.
. 
. * The Models compared are 
. * Poisson of y on x2
. * Poisson of y on x3 and x3^2 
. 
. set seed 10001

. set obs 100
obs was 0, now 100

. scalar b1 = 0.5

. scalar b2 = 0.5

. scalar b3 = 0.5

. 
. * Generate regressors
. gen x2 = invnorm(uniform())

. gen x3 = invnorm(uniform())

. gen x2sq = x2*x2

. gen x3sq = x3*x3

. 
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3)

. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ......... )
Variable xp created.

. gen y = xp

. 
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 x2sq x3sq using mma08p2nonnested.asc, replace

. 
. ********* SETUP FOR THIS PROGRAM *********
. 
. * Change this if want different regressors
. * Here both models differ from the dgp
. * The Vuong test below assumes that the two models are OVERLAPPING
. global XLISTMODEL1 x2

. global XLISTMODEL2 x3 x3sq

. 
. ********* (A) AIC AND VARIATIONS *********
. 
. * Stata output from Poisson saves much of this.
. * Also calculate manually.
. 
. * The following code can be changed to different models than poisson
. * provided 
. * ereturn list yields N = e(N); q = e(k); and LnL = e(ll)
. * We use AIC = -2lnL+2q; BIC = -2lnL+lnN*q; CAIC = -2lnL+(1+lnN)*q
. 
. poisson y $XLISTMODEL1

Iteration 0:   log likelihood = -183.43146  
Iteration 1:   log likelihood = -183.43146  

Poisson regression                                Number of obs   =        100
                                                  LR chi2(1)      =      16.28
                                                  Prob > chi2     =     0.0001
Log likelihood = -183.43146                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    .291164    .072311     4.03   0.000     .1494371    .4328909
       _cons |   .6084331   .0752833     8.08   0.000     .4608806    .7559857
------------------------------------------------------------------------------

. estimates store model1

. scalar ll1 = e(ll)

. scalar q1 = e(k)

. scalar N1 = e(N)

. scalar aic1 = -2*ll1 + 2*q1

. scalar bic1 = -2*ll1 + ln(N1)*q1

. scalar caic1 = -2*ll1 + (1 + ln(N1))*q1

. 
. poisson y $XLISTMODEL2

Iteration 0:   log likelihood = -176.09611  
Iteration 1:   log likelihood = -176.09119  
Iteration 2:   log likelihood = -176.09119  

Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------

. estimates store model2

. scalar ll2 = e(ll)

. scalar q2 = e(k)

. scalar N2 = e(N)

. scalar aic2 = -2*ll2 + 2*q2

. scalar bic2 = -2*ll2 + ln(N2)*q2

. scalar caic2 = -2*ll2 + (1 + ln(N2))*q2

. 
. * Display results given in first three rows of Table 8.2 page 284
. 
. estimates table model1 model2, stats(N k ll aic bic)

----------------------------------------
    Variable |   model1       model2    
-------------+--------------------------
          x2 |  .29116396               
          x3 |               .35884118  
        x3sq |               .09129986  
       _cons |  .60843314    .49265596  
-------------+--------------------------
           N |        100          100  
           k |          2            3  
          ll | -183.43146   -176.09119  
         aic |  370.86292    358.18238  
         bic |  376.07326    365.99789  
----------------------------------------

. 
. di "Model 1: " _n  "lnL:  " ll1  " q: " q1 _n " N: " N1
Model 1: 
lnL:  -183.43146 q: 2
 N: 100

. di "-2lnL: " -2*ll1 _n "AIC:  " aic1  _n " BIC: " bic1 _n "caic: " caic1
-2lnL: 366.86292
AIC:  370.86292
 BIC: 376.07326
caic: 378.07326

. 
. di "Model 2: " _n  "lnL:  " ll2  " q: " q2 _n " N: " N2
Model 2: 
lnL:  -176.09119 q: 3
 N: 100

. di "-2lnL: " -2*ll2 _n "AIC:  " aic2  _n " BIC: " bic2 _n "caic: " caic2
-2lnL: 352.18238
AIC:  358.18238
 BIC: 365.99789
caic: 368.99789

. 
. ********* (B) VUONG TEST FOR OVERLAPPING MODELS *********
. 
. * The test has three variants
. * (1) Nested models: G is contained in F 
. * (2) Strictly non-nested models: F intersection G equals null set
. * (3) Overlapping models: F intersection G does not equal null set
. 
. * Need to compute lnf(y) for models 1 and 2, 
. * where density f is model 1 and density g is model 2
. 
. * The procedures will vary with model. Here use Poisson.
. 
. * (0) COMPUTE THE LR TEST STATISTIC
. 
. * This is LR = Sum_i [ ln (fy1_i / gy2_i) ]
. *            = Sum_i lnfy1_i - Sum_i lngy2_i
. *            = difference in log-likelihood for the two models
. 
. * Easiest if program output gives logL
. * Otherwise need to generate manually
. 
. quietly poisson y $XLISTMODEL1

. scalar llf = e(ll)

. quietly poisson y $XLISTMODEL2

. scalar llg = e(ll)

. scalar LR = llf - llg

. di "LR = " LR "  and llf = " llf " llg = " llg
LR = -7.3402698  and llf = -183.43146 llg = -176.09119

. 
. * (1) NESTED MODELS
. 
. * Not done here as not relevant for the example of this application.
. 
. * (1A) Usual LR test if assume densities correctly specified.
. 
. * (1B) If instead want robustified version then need to compute W 
. * and use the weighted chi-square test.
. * This is not the appropriate test here,
. * but in 3(A) below W is computed and a weighted chi-square test used. 
. * This code could be easily adapted to here.
. 
. * (2) STRICTLY NON-NESTED MODELS
. 
. * Not done here as not relevant for the example of this application. 
. * Test uses LR/what ~ normal where what is computed in 3(B) below.
. 
. * (3) OVERLAPPING MODELS
. 
. * This is the relevant test here
. * First test whether overlapping (even though here know that is)
. * THen do the test
. 
. * (3A-1) Compute what^2 
. 
. * Calculate what^2 
. * = (1/N)*Sum_i[ln(fy1_i/gy2_i)^2] - [(1/N)*Sum_i[ln(fy1_i/gy2_i)]^2
. * = (1/N) * Sum_i [(ln(fy1_i) - ln(gy2_i))^2] - (LR/N)^2  
. 
. * For the Poisson
. *      f(y) = exp(-mu)*mu^y/y!
. * so lnf(y) = -mu + y*ln(mu) - lny!
. quietly poisson y $XLISTMODEL1

. predict yhatf
(option n assumed; predicted number of events)

. * Poisson default predict gives yhat = exp(x'b)
. gen lnf = -yhatf + y*ln(yhatf) - lnfact(y)

. quietly poisson y $XLISTMODEL2

. predict yhatg
(option n assumed; predicted number of events)

. gen lng = -yhatg + y*ln(yhatg) - lnfact(y)

. gen lnratiosq = (lnf-lng)^2

. sum lnratiosq

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   lnratiosq |       100    .6967792    1.816804   .0000331   13.85592

. scalar whatsq = r(sum)/_N - (LR/_N)^2

. scalar Nwhatsq = _N*whatsq

. di "First-stage test statistic whatsq - still need to find critical value"
First-stage test statistic whatsq - still need to find critical value

. di "N*omegahatsq = " Nwhatsq
N*omegahatsq = 69.139128

. 
. * Aside: Check by recomputing LR this long way
. gen lnratio = (lnf-lng)

. sum lnratio

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     lnratio |       100   -.0734027    .8356883  -3.722355   2.571382

. scalar LRcheck = r(sum)

. 
. *** Display results given in second last row of Table 8.2 page 284
. 
. di "LR = " LR "  and LRcheck = " LRcheck
LR = -7.3402698  and LRcheck = -7.3402702

. 
. * (3A-2) Find the critical value by first find W, then eigenvalues lamda, then simulate
. 
. * Calculate estimate of the W matrix on page ?? of Vuong. 
. * (a) Can estimate Af = E[d2lnf(y)/dbdb'] as inverse of usual ML variance matrix
. * (b) Since the robust ML variance matrix is V = Ainv*B*Ainv 
. *     can estimate Bf = -E[dlnf(y)/dbxdlnf(y)/db'] by A*V*A   where A is in (a) 
. * (c) For Ag same as in part (a) except for model g
. * (d) For Bg same as in part (a) except for model g
. * (e) The only tricky bit is computation of Bfg
. 
. gen one = 1

. * (a) Af
. quietly poisson y one $XLISTMODEL1, noconstant

. matrix Af = syminv(e(V))

. * (b) Bf
. quietly poisson y one $XLISTMODEL1, noconstant robust

. * robust gives Ainv*B*Ainv so pre and post multiply by A gives B
. * Also make adjustment s Stata divides by (_N-1). Here use _N.
. matrix Bf = Af*e(V)*Af*(_N-1)/_N

. * (c) Ag
. quietly poisson y one $XLISTMODEL2, noconstant

. matrix Ag = syminv(e(V))

. * (d) Bg
. quietly poisson y one $XLISTMODEL2, noconstant robust

. matrix Bg = Ag*e(V)*Ag*(_N-1)/_N

. 
. * (e) Bfg requires more specialized code pecuuliar to this example
. *     For Poisson  dlnf(y)/db  = Sum_I (y_i - mu_i)*x_i 
. *     so Bfg = (1/N)*Sum_i [(y_i - muf_i)*xf_i]*[(y_i - mug_i)*xg_i]'  
. * For model 1 x is intercept and x2  (global XLISTMODEL1 x2)
. gen bf1 = (y - yhatf)         /* yhatf saved earlier = y - muf */

. gen bf2 = (y - yhatf)*x2

. * For model 2 x is intercept, x3 and x3sq  (global XLISTMODEL2 x3 x3sq)
. gen bg1 = (y - yhatg)         /* yhatg saved earlier = y - mug */

. gen bg2 = (y - yhatg)*x3

. gen bg3 = (y - yhatg)*x3sq

. * Create Bfg 
. matrix accum BfBg = bf1 bf2 bg1 bg2 bg3, noconstant
(obs=100)

. * and Bfg is the (1,2) submatrix: rows 1 to 2 and columns 3 to 5
. matrix Bfg = BfBg[1..2,3..5]

. 
. * Form the matrix W
. * Note there is no need for minus sign as A has been defined as -A
. matrix W11 = Bf*syminv(Af)

. matrix W12 = Bfg*syminv(Ag)

. matrix W21 = Bfg'*syminv(Af)

. matrix W22 = Bg*syminv(Ag)

. matrix W = W11,W12\W21,W22

. matrix list W

W[5,5]
                y:          y:          y:          y:          y:
              one          x2         one          x3        x3sq
y:one   1.5571072   .01745302   1.3738479   .03868485   -.1702893
 y:x2   .05110494   1.4484966   .61074273   .07847014  -.15039712
  bg1   1.1488275    .1064062   1.6030095    .0647251  -.18944561
  bg2   .39558125   .08428705   .20709641   1.0650899  -.05677421
  bg3   1.1180355   -.0564763   .19914593   .07617139   .90718177

. 
. * Calculate the eigenvalues of W
. matrix eigenvalues reigvalW ceigvalW = W

. * Real eigenvalues
. matrix list reigvalW

reigvalW[1,5]
              y:         y:         y:         y:         y:
            one         x2        one         x3       x3sq
real  2.7511946  .29082285  1.4750881  1.0021719  1.0616075

. * Complex eigenvalues - hopefully none
. matrix list ceigvalW

ceigvalW[1,5]
            y:    y:    y:    y:    y:
          one    x2   one    x3  x3sq
complex     0     0     0     0     0

. 
. * This gives the vector lamda of eigenvalus of W
. matrix lamda = reigvalW

. scalar l1 = lamda[1,1]

. scalar l2 = lamda[1,2]

. scalar l3 = lamda[1,3]

. scalar l4 = lamda[1,4]

. scalar l5 = lamda[1,5]

. 
. * Now obtain the p-value and critical value at level 0.05
. preserve

. * Obtain the 5 percent critical value by simulating 10000 draws from 
. * M_p+q(lamda) = Sum_j lamda*j*z_j^2 where z_j are N[0,1] so z_j^2 are chi(1)
. set seed 10101

. set obs 10000
obs was 100, now 10000

. gen randomdraw = l1*invnorm(uniform())^2 + l2*invnorm(uniform())^2 +  /*
>   */ l3*invnorm(uniform())^2 + l4*invnorm(uniform())^2 + l5*invnorm(uniform())^2 

. gen indicator = Nwhatsq >= randomdraw

. quietly sum indicator

. di "p-value for the Omegahatsq test = "  1-r(mean)
p-value for the Omegahatsq test = 0

. sum randomdraw, detail

                         randomdraw
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .6438425       .0756691
 5%     1.286375       .1250253
10%     1.850972       .1326376       Obs               10000
25%     3.137835       .1402145       Sum of Wgt.       10000

50%     5.359223                      Mean           6.614841
                        Largest       Std. Dev.       4.90562
75%     8.751276       38.32291
90%      12.8871       38.75208       Variance       24.06511
95%     16.10237       40.94431       Skewness       1.733549
99%     23.85304       44.08449       Kurtosis       7.514808

. di "Reject overlapping at level .05 if N*omegahatsq exceeds " r(p95)
Reject overlapping at level .05 if N*omegahatsq exceeds 16.102374

. restore

. di "where N*omegahatequals " Nwhatsq 
where N*omegahatequals 69.139128

. di "If reject then continue to second step."
If reject then continue to second step.

. di "Otherwise stop as cannot determine whether models are overlapping."
Otherwise stop as cannot determine whether models are overlapping.

. 
. * (3B) Do the second stage test if reject at (3A)
. gen TLR = (LR/sqrt(whatsq))/sqrt(_N)

. 
. *** Display results given in second last row of Table 8.2 page 284
. 
. di "TLR is N[0,1]. Here TLR = " TLR
TLR is N[0,1]. Here TLR = -.88277513

. di "Two-tailed test p-value: " chi2tail(1,TLR^2)
Two-tailed test p-value: .37735778

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma08p2nonnested.txt
  log type:  text
 closed on:  18 May 2005, 21:27:00
----------------------------------------------------------------------------------------------------