-------------------------------------------------------------------------------
       log:  e:\webpage\stata\stsim.txt
  log type:  text
 opened on:   6 Sep 2001, 15:53:05

. di "stsim.txt by Colin Cameron: Stata simulation example"
stsim.txt by Colin Cameron: Stata simulation example

. 
. 
. **********  (A) SIMULATE ONCE  ******************************
. 
. * Heteroscedastic error example
. * y = a + bx + u  where a=1 b=1 u ~ N(0,1*x^2)
. * Sample size n is defined in the set obs command
. 
. * (1) Generate data
. * Set seed so get same random data in future runs of same program
. set seed 10101      

. drop _all       /* Drop previous data */

. * Change this for different sample size
. set obs 20
obs was 0, now 20

. * Generate N(0,1) using inverse-transformation method
. * If x ~ N(0,1) then PHI(x) = u so x = PHI-inverse(u) where PHI is N(0,1) cdf
> .
. gen x = invnorm(uniform())

. gen u = x*invnorm(uniform())

. gen y = 1 + 1*x + u

. summarize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           x |      20    .0963758   1.128545  -1.621421   2.943727
           u |      20   -.1186879   1.030822  -3.268378   1.110334
           y |      20    .9776879   1.282844  -3.055149   2.816744

. 
. * (2) Estimate model by OLS with usual and robust standard errors
. regress y x

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =   12.57
       Model |   12.857086     1   12.857086           Prob > F      =  0.0023
    Residual |  18.4109884    18  1.02283269           R-squared     =  0.4112
-------------+------------------------------           Adj R-squared =  0.3785
       Total |  31.2680744    19  1.64568812           Root MSE      =  1.0114

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .7289128   .2055922     3.55   0.002     .2969797    1.160846
       _cons |   .9074384   .2270115     4.00   0.001     .4305049    1.384372
------------------------------------------------------------------------------

. regress y x, robust

Regression with robust standard errors                 Number of obs =      20
                                                       F(  1,    18) =    3.87
                                                       Prob > F      =  0.0647
                                                       R-squared     =  0.4112
                                                       Root MSE      =  1.0114

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .7289128   .3704015     1.97   0.065    -.0492719    1.507098
       _cons |   .9074384   .2249449     4.03   0.001     .4348468     1.38003
------------------------------------------------------------------------------

. 
. 
. **********  (B) SIMULATE A NUMBER OF TIMES  ******************** 
. 
. * To perform a simulation
. * (1) Setup: generate and/or read in initial data
. * (2) Define the program that will be run a number of times
. *     In Stata this is command program and has an arcane format
. * (3) Run the simulations
. *     In Stata use command simul
. * (4) Summarize the results using descriptive statistics and/or graph
. 
. * Step (2) in Stata uses an arcane format
. * To define the progam called progname, say, use the following outline:
. *        program define progname
. *                if "`1'"=="?" {
. *                        global S_1 "variable names"
. *                        exit
. *                }
. *                perform single simulation
. *                post `1' results
. *        end
. * There must be the same number of results following the post command as vari
> able
. * names following the global S_1 command. See example below.
. 
. * Then Step (3) with 100 reps, say, uses the command
. * simul progname reps (100) 
. 
. * Heteroscedastic error example
. * y = a + bx + u  where a=1 b=1 u ~ N(0,1*x^2)
. * Post coefficient estimates and standard errors: a se(a) b se(b) c se(c)
. 
. * A big difference is between 
. * (B1) Completely random sampling, with both new draws of x and y.
. * (B2) x fixed in repeated samples, with just new draws of y.
. 
. 
. **** (B1) Completely random sampling
. 
. * For large n,S use n=100, S=10000. For small n,S use n=20, S=100.
. * As written there n = ?? and S = ?? simulations
. * Change n in set obs ?? and change S in reps(??)
. 
. * (1) Setup: 
. * Set seed so get same random data in future runs of same program
. set seed 10101      

. 
. * (2) Define the simulation program. 
. *     Compute both usual and White standard errors of slope coefficient.     
>   
. program define simrandm
  1. version 6.0 
  2. if "`1'" == "?" {
  3.    global S_1 "b_const se_const b_slope se_usual se_white"
  4.    exit  
  5.    }
  6.    drop _all
  7.    set obs 20          /* Change sample size by change set obs */
  8.    gen x = invnorm(uniform())
  9.    gen y = 1 + 1*x + x*invnorm(uniform())
 10.    regress y x, robust
 11.    scalar sewhite = _se[x]
 12.    regress y x
 13.    post `1' (_b[_cons]) (_se[_cons]) (_b[x]) (_se[x]) (sewhite)
 14. end

. 
. * (3) Do the simulations
. * Change number of simulations by changing reps 
. simul simrandm, reps(1000)

. 
. * (4) Post results
. * Theory says that for the dgp y = 1+1*x+x*e where x~N(0,1) and e~N(0,1)
. *   b_slope equals 1  (from the dgp)
. *   se_usual equals square root (1/n) where n is sample size
. *   se_white equals square root (3/n) where n is sample size
. describe

Contains data
  obs:         1,000                          
 vars:             5                          6 Sep 2001 15:53
 size:        24,000 (97.2% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
b_const         float  %9.0g                  
se_const        float  %9.0g                  
b_slope         float  %9.0g                  
se_usual        float  %9.0g                  
se_white        float  %9.0g                  
-------------------------------------------------------------------------------
Sorted by:  

. summarize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     b_const |    1000    1.007521    .217602   .3210715   1.875728
    se_const |    1000    .2118845   .0646813   .0618716   .5141954
     b_slope |    1000    .9967469   .3831677  -.3140723    2.10088
    se_usual |    1000    .2142737   .0557043   .0883832   .4814843
    se_white |    1000    .3036586   .1135306   .0890235   .9309203

. 
. kdensity b_slope, normal s(.) xlab ylab /*
>   */ title("Density of slope coeff in heteroskedasticity example")

. 
. 
. **** (B2) x fixed in repeated sampling
. 
. * For large n,S use n=100, S=10000. For small n,S use n=20, S=100.
. * As written there n = ?? and S = ?? simulations
. * Change n in set obs ?? and change S in reps(??)
. 
. * (1) Setup: First generate x to be used in repeated samples 
. set seed 10101              /* So get same random data in future runs */

. drop _all                   /* Drop previous data */

. set obs 20                  /* Change sample size by change set obs   */
obs was 0, now 20

. gen x = invnorm(uniform())  /* Using inverse-transformation method    */

. gen mean_y = 1+1*x          /* Saves recalculating in each simulation */  

. save fixedx, replace        /* Data to be re-used in each simulation  */
file fixedx.dta saved

. sum

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           x |      20    .0963758   1.128545  -1.621421   2.943727
      mean_y |      20    1.096376   1.128545  -.6214205   3.943727

. 
. * (2) Define the simulation program
. program define simfixed
  1. version 6.0 
  2. if "`1'" == "?" {
  3.    global S_1 "b_const se_const b_slope se_usual se_white"
  4.    exit  
  5.    }
  6.    use fixedx, clear
  7.    gen y = mean_y + x*invnorm(uniform())
  8.    regress y x, robust
  9.    scalar sewhite = _se[x]
 10.    regress y x
 11.    post `1' (_b[_cons]) (_se[_cons]) (_b[x]) (_se[x]) (sewhite)
 12. end

. 
. * (3) Do the simulations
. * Change number of simulations by changing reps 
. simul simfixed, reps(1000)

. 
. * (4) Post results
. * Theory gives results for random x. See above. For fixed ??
. describe

Contains data
  obs:         1,000                          
 vars:             5                          6 Sep 2001 15:53
 size:        24,000 (96.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
b_const         float  %9.0g                  
se_const        float  %9.0g                  
b_slope         float  %9.0g                  
se_usual        float  %9.0g                  
se_white        float  %9.0g                  
-------------------------------------------------------------------------------
Sorted by:  

. summarize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     b_const |    1000    .9881501   .2334827   .1737309   1.656656
    se_const |    1000    .2212264   .0606328   .0862795   .4275818
     b_slope |    1000     .968175   .4290912  -.3860942     2.5669
    se_usual |    1000    .2003529   .0549119   .0781388    .387238
    se_white |    1000    .3103773   .1311619   .0757304   .7829539

. 
. kdensity b_slope, normal s(.) xlab ylab /*
>   */ title("Density of slope coeff in heteroskedasticity example")

. 
. 
. **********  (C) BOOTSTRAP  ******************** 
. 
. * This does some bootstrap.
. * But for more details go to separate program stboot.do
. 
. * The program produces three bootstrap 100*(1-alpha) confidence intervals
. * (1) Regular asymptotic normal: bhat +/- z_alpha/2*se(bhat)
. *     where se(bhat) is the standard deviation of bhat from the bootstrap rep
> s
. * (2) Percentile method: This orders the bhat's from simulations and
. *     goes from alpha/2 lowest bhat's to the alpha/2 highest bhat's 
. * (3) Bootstrap-corrected. This works with the pivotal Wald statistic.
. *     See manual or a textbook. e.g. Efron/Tibsharani (1993, pp.184-188) with
>  a=0
. *     This orders the bhat's from simulations and
. *     goes from p1 to the p2 highest
. *     where p1 and p2 are bias-correction adjustments to alpha/2 and 1-alpha/
> 2
. *     Let p1 = Phi(2z0 - z_alpha/2)  p2 = Phi(2z0 + z_alpha/2)
. *         z0 measures the median bias in bhat with
. *         z0 = Phi-inv(fraction of the bhat(s) < bhat)
. *     And if z0=0 then p1 = alpha/2 and no correction 
. 
. * (1) and (2) are not asymptotic refinements.
. * (3) is an asymptotic refinement and is better in small samples.
. * (4) Stata does not give the bootstrap t-interval. See progran stboot.do for
>  this.
. 
. * Generate the data set to be used
. set seed 10101              /* So get same random data in future runs */

. drop _all                   /* Drop previous data */

. set obs 20                  /* Change sample size by change set obs   */
obs was 0, now 20

. gen x = invnorm(uniform())  /* Using inverse-transformation method    */

. gen u = x*invnorm(uniform())

. gen y = 1 + 1*x + u

. sum

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           x |      20    .0963758   1.128545  -1.621421   2.943727
           u |      20   -.1186879   1.030822  -3.268378   1.110334
           y |      20    .9776879   1.282844  -3.055149   2.816744

. 
. * Do the bootstrap
. regress y x

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =   12.57
       Model |   12.857086     1   12.857086           Prob > F      =  0.0023
    Residual |  18.4109884    18  1.02283269           R-squared     =  0.4112
-------------+------------------------------           Adj R-squared =  0.3785
       Total |  31.2680744    19  1.64568812           Root MSE      =  1.0114

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .7289128   .2055922     3.55   0.002     .2969797    1.160846
       _cons |   .9074384   .2270115     4.00   0.001     .4305049    1.384372
------------------------------------------------------------------------------

. bs "regress y x"   "_b[x]", reps(999) level(95)

command:     regress y x
statistic:   _b[x]
(obs=20)

Bootstrap statistics

Variable |   Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
---------+-------------------------------------------------------------------
     bs1 |    999   .7289128   .0966306   .3837522    -.024141  1.481967  (N)
         |                                            .1951937  1.664412  (P)
         |                                            .1289247   1.50334 (BC)
-----------------------------------------------------------------------------
                              N = normal, P = percentile, BC = bias-corrected

. 
. 
. ********** CLOSE OUTPUT
. log close
       log:  e:\webpage\stata\stsim.txt
  log type:  text
 closed on:   6 Sep 2001, 15:53:14
-------------------------------------------------------------------------------