------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma07p4boot.txt
  log type:  text
 opened on:  18 May 2005, 21:36:29

. 
. ********** OVERVIEW OF MMA07BOOT4.DO **********
. 
. * STATA Program 
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi 
. * used for "Microeconometrics: Methods and Applications" 
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press 
. 
. * Chapter 7.8 pages 254-256
. * Bootstrap applied to probit model
. * Provides
. *  (1) Bootstrap confidence intervals
. *  (2) Bootstrap hypothesis test without refinement
. *  (3) Bootstrap hypothesis test with refinement: percentile-t method
. 
. * Note corrections to book
. *  - sample size is N=40  not N=30
. *  - use 999 bootstrap replications not 1000
. *  - for asymptotic refinement p.256 the critical region 
. *      is (-1.89, 1.80) not (-2.62, 1.83)
. 
. * For more detail on bootstrap see 
. * Chapter 11: Bootstrap Methods pages 355-383
. * and program mma11p1boot.do
. 
. ********** SETUP **********
. 
. set more off

. version 8

.   
. ********** GENERATE DATA **********
. 
. * DGP is Probit: Pr[y=1] = PHI(a + bx)
. * where  x is N[0,1]
. * and  a = 0  and b = 1
. 
. * Change the following for different sample size N
. global numobs "40"

. 
. * Probit example with slope coefficient equal to 1
. set seed 10105

. set obs $numobs
obs was 0, now 40

. gen x = invnorm(uniform())

. gen y = 0

. replace y = 1 if 0+1.0*x+invnorm(uniform()) > 0
(19 real changes made)

. save xyforsim, replace
file xyforsim.dta saved

. summarize 

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |        40   -.0359197    .9203391  -2.210579    1.45199
           y |        40        .475    .5057363          0          1

. probit y x

Iteration 0:   log likelihood = -27.675866
Iteration 1:   log likelihood = -22.927488
Iteration 2:   log likelihood = -22.735204
Iteration 3:   log likelihood = -22.733966
Iteration 4:   log likelihood = -22.733966

Probit estimates                                  Number of obs   =         40
                                                  LR chi2(1)      =       9.88
                                                  Prob > chi2     =     0.0017
Log likelihood = -22.733966                       Pseudo R2       =     0.1786

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .8168831   .2942893     2.78   0.006     .2400867    1.393679
       _cons |  -.0725436   .2162576    -0.34   0.737    -.4964006    .3513135
------------------------------------------------------------------------------

. save mma07p4boot, replace
file mma07p4boot.dta saved

. 
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x using mma07p4boot.asc, replace

. 
. ********** (1) BOOTSTRAP CONFIDENCE INTERVALS ********** 
. 
. * Stata produces four bootstrap 100*(1-alpha) confidence intervals
. * (1)-(2) have no asymptotic refinement
. * (3)-(4) have asymptotic refinement
. 
. * (1) Regular asymptotic normal: bhat +/- t(S-1)_alpha/2*se(bhat)
. *     except instead of using the initial se(bhat) 
. *     we use the standard deviation of bhat from the bootstrap reps
. *     and use t(S-1) rather than z for critical value
. *     where S = number of bootstrap reps
. 
. * (2) Percentile method: which orders the bhat(s) from simulations and
. *     goes from alpha/2 lowest bhat(s) to the alpha/2 highest bhat(s) 
. *     where (s) denotes the s-th bootstrap sample
. 
. * (3) Bootstrap-corrected. Same as (4) with a=0
. 
. * (4) Bootstrap-corrected and accelerated. 
. *     This works with the pivotal Wald statistic.
. *     See the manual [R]bootstrap or a textbook.
. *     e.g. Efron and Tibsharani (1993, pp.184-188) with a=0
. *     This orders the bhats from simulations and
. *     goes from p1 to the p2 highest
. *     where p1 and p2 are bias-correction adjustments to alpha/2 and 1-alpha/2
. *     Let p1 = Phi(2z0 - z_alpha/2)
. *         p2 = Phi(2z0 + z_alpha/2)
. *         z0 measures the median bias in bhat with
. *         z0 = Phi-inv(fraction of the bhat(s) < bhat)
. *     And if z0=0 then p1 = alpha/2 and no correction 
. 
. * Change the following for different number of simulations S
. * From page 399, for testing better to use 999 than 1000
. global breps "999"   /* The number of bootstrap reps used below */ 

. 
. * (1A) Simplest bootstrap is of all the estimated coefficients
. set seed 10105

. bootstrap "probit y x" _b, reps($breps) bca

command:      probit y x
statistics:   b_x        = _b[x]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
         b_x |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
      b_cons |   999 -.0725436 -.0176301  .2448404  -.5530047   .4079175   (N)
             |                                       -.596443   .4247662   (P)
             |                                      -.5528302   .4381396  (BC)
             |                                      -.5205303   .4445401 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated

. 
. * (1B) This bootstrap is of MLE of b2 and the associated standard error
. *     and additionally gives the bias-accelerated method of Efron
. set seed 10105

. bootstrap "probit y x" _b[x] _se[x], reps($breps) bca

command:      probit y x
statistics:   _bs_1      = _b[x]
              _bs_2      = _se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
             |                                       .2162534   .4143377 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated

. 
. * (1C) This bootstrap repeats (2) 
. *     but will permit bootstrapping if Stata commands are more than one line
. use mma07p4boot, clear

. program define commandtobootstrap, rclass
  1.   version 8.0
  2.   quietly probit y x
  3.   return scalar b2hat=_b[x]
  4.   return scalar seb2hat=_se[x]
  5. end

. set seed 10105

. bootstrap "commandtobootstrap" r(b2hat) r(seb2hat), reps($breps)

command:      commandtobootstrap
statistics:   _bs_1      = r(b2hat)
              _bs_2      = r(seb2hat)

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

. 
. ********** (2) BOOTSTRAP HYPOTHESIS TESTS - NO REFINEMENT p.255 ********** 
. 
. * We want to test H0: b2 = 1 against Ha: b2 not equal 1
. 
. * For a simple test such as this we can just use
. * the bootstrap confidence intervals from (1) 
. * and reject if bhat2 is not in the confidence interval
. 
. * Here we instead present a common method without refinement
. * essentially (1) above, performing the usual Wald test, 
. * except the standard error is estimated by bootstrap.
. * This is useful when hard to obtain standard error by other means.
. * Here  W = (b2hat - b2_0) / seb2hat_boot  where b2_0 = 1
. * and reject at level .05 if |W| > z_.025 = 1.96
. 
. use mma07p4boot, clear

. * Save the estimate 
. quietly probit y x

. scalar b2est = _b[x]

. * Obtain the bootstrap standard error
. set seed 10105

. bootstrap "probit y x" _b, reps($breps) bca

command:      probit y x
statistics:   b_x        = _b[x]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
         b_x |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
      b_cons |   999 -.0725436 -.0176301  .2448404  -.5530047   .4079175   (N)
             |                                       -.596443   .4247662   (P)
             |                                      -.5528302   .4381396  (BC)
             |                                      -.5205303   .4445401 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated

. matrix sebboot = e(se)

. scalar seb2boot = sebboot[1,1]   /* x is first then constant */

. * Calculate the test statistic
. scalar Wald = (b2est - 1)/seb2boot

. 
. * DISPLAY RESULTS at bottom p.255
. * Note: Text had typo: 
. * (1-0.817)/0.376 = -0.487 should be (0.817-1)/0.376 = -0.487
. 
. di "Probit slope estimate is:        "   b2est
Probit slope estimate is:        .8168831

. di "Bootstrap standard estimate is:  "   seb2boot 
Bootstrap standard estimate is:  .37638029

. di "Wald statistic (no refinement) is: " Wald
Wald statistic (no refinement) is: -.48652096

. di "Reject at level .05 if |Wald| > 1.96"
Reject at level .05 if |Wald| > 1.96

. 
. ********** (3) BOOTSTRAP HYPOTHESIS TESTS - PERCENTILE-T p.256 ********** 
. 
. * Stata does not give this. For methods see
. * e.g. Efron and Tibsharani (1993, pp.160-162)  
. * e.g. Cameron and Trivedi (2005) Chapter 11.2.6-11.2.7
. * For sample s compute t-test(s) = (bhat(s)-bhat) / se(s)
. * where bhat is initial estimate 
. * and bhat(s) and se(s) are for sth round.
. * Order the t-test(s) statistics and choose the alpha/2 percentiles
. * which give the critical values for the t-test
. 
. * Implementation requires saving the results from each bootstrap replication 
. * in order to obtain ccritical values from percentiles of bootstrap distribution
. 
. * (3A) Here bootstrap computes (b(s) - bhat) / se(s)  s = 1,...,S
.  
. use mma07p4boot, clear

. * Save the estimate and the Wald test statistic 
. quietly probit y x

. scalar b2est = _b[x]

. scalar Wald = (_b[x] - 1)/_se[x]

. * Then bootstrap calculates (b(s) - bhat) / se(s)
. set seed 10105

. bootstrap "probit y x" ((_b[x]-b2est)/_se[x]), reps($breps) /*
>    */ level(95) saving(mma07p4bootreps) replace

command:      probit y x
statistic:    _bs_1      = (_b[x]-b2est)/_se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999         0  .1003619  .9350234  -1.834837   1.834837   (N)
             |                                      -1.890602   1.801358   (P)
             |                                      -2.101316   1.565618  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

. * Then get data sets with result from each bootstrap
. use mma07p4bootreps, clear
(bootstrap: probit y x)

. sum                 /* Here just _bs_1 */

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       _bs_1 |       999    .1003619    .9350234  -3.032139   2.572848

. gen b2test = _bs_1  /* _bs_1 is the bootstrap result of interest */

. sum b2test, detail  /* Gives percentiles but not 2.5% and 97.5%  */

                           b2test
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.188575      -3.032139
 5%    -1.540843      -2.605178
10%    -1.137846      -2.599248       Obs                 999
25%    -.4995352      -2.566578       Sum of Wgt.         999

50%     .1238111                      Mean           .1003619
                        Largest       Std. Dev.      .9350234
75%     .7789762        2.22565
90%     1.338348       2.359132       Variance       .8742688
95%     1.560646       2.377491       Skewness      -.2505319
99%     2.014282       2.572848       Kurtosis       2.853737

. _pctile b2test, p(2.5,97.5)

. 
. * DISPLAY RESULTS on p.256
. 
. * Note: Error on p.256  Here get (-1.89, 1.80) not (-2.62, 1.83)
. di "Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) "  and  " r(r2) 
Lower 2.5 and upper 2.5 percentile of coeff b for z: -1.8906019  and  1.8013585

. di "Reject H0 if Wald = " Wald " lies outside " r(r1) " ," r(r2) ")"
Reject H0 if Wald = -.62223436 lies outside -1.8906019 ,1.8013585)

. 
. * (3B) Equivalently bootstrap calculates b(s) and se(s)   s = 1,...,S
. *      and then later calculate (b(s) - bhat) / se(s)
. 
. use mma07p4boot, clear

. * Save the estimate and the Wald test statistic 
. quietly probit y x

. scalar b2est = _b[x]

. scalar Wald = (_b[x] - 1)/_se[x]

. * Then bootstrap calculates b(s) and se(s)
. set seed 10105

. bootstrap "probit y x" _b[x] _se[x], reps($breps) /*
>    */ level(95) saving(mma07p4bootreps) replace

command:      probit y x
statistics:   _bs_1      = _b[x]
              _bs_2      = _se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

. * Then get data sets with result from each bootstrap
. use mma07p4bootreps, clear
(bootstrap: probit y x)

. sum                 /* Here _bs_1 and _bs_2 */

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       _bs_1 |       999     .918616    .3763803   .0030288   3.806198
       _bs_2 |       999    .3364898    .0932673   .2162534    1.34312

. gen b2test = (_bs_1 - b2est)/_bs_2

. _pctile b2test, p(2.5,97.5)

. 
. * DISPLAY RESULTS on p.256
. * Note: Error on p.256  Here get (-1.89, 1.80) not (-2.62, 1.83)
. di "Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) "  and  " r(r2) 
Lower 2.5 and upper 2.5 percentile of coeff b for z: -1.8906019  and  1.8013583

. di "Reject H0 if Wald = " Wald " lies outside " r(r1) " ," r(r2) ")"
Reject H0 if Wald = -.62223436 lies outside -1.8906019 ,1.8013583)

. 
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section2\mma07p4boot.txt
  log type:  text
 closed on:  18 May 2005, 21:36:36
----------------------------------------------------------------------------------------------------