------------------------------------------------------------------------------- log: e:\webpage\stata\stsim.txt log type: text opened on: 6 Sep 2001, 15:53:05 . di "stsim.txt by Colin Cameron: Stata simulation example" stsim.txt by Colin Cameron: Stata simulation example . . . ********** (A) SIMULATE ONCE ****************************** . . * Heteroscedastic error example . * y = a + bx + u where a=1 b=1 u ~ N(0,1*x^2) . * Sample size n is defined in the set obs command . . * (1) Generate data . * Set seed so get same random data in future runs of same program . set seed 10101 . drop _all /* Drop previous data */ . * Change this for different sample size . set obs 20 obs was 0, now 20 . * Generate N(0,1) using inverse-transformation method . * If x ~ N(0,1) then PHI(x) = u so x = PHI-inverse(u) where PHI is N(0,1) cdf > . . gen x = invnorm(uniform()) . gen u = x*invnorm(uniform()) . gen y = 1 + 1*x + u . summarize Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- x | 20 .0963758 1.128545 -1.621421 2.943727 u | 20 -.1186879 1.030822 -3.268378 1.110334 y | 20 .9776879 1.282844 -3.055149 2.816744 . . * (2) Estimate model by OLS with usual and robust standard errors . regress y x Source | SS df MS Number of obs = 20 -------------+------------------------------ F( 1, 18) = 12.57 Model | 12.857086 1 12.857086 Prob > F = 0.0023 Residual | 18.4109884 18 1.02283269 R-squared = 0.4112 -------------+------------------------------ Adj R-squared = 0.3785 Total | 31.2680744 19 1.64568812 Root MSE = 1.0114 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .7289128 .2055922 3.55 0.002 .2969797 1.160846 _cons | .9074384 .2270115 4.00 0.001 .4305049 1.384372 ------------------------------------------------------------------------------ . regress y x, robust Regression with robust standard errors Number of obs = 20 F( 1, 18) = 3.87 Prob > F = 0.0647 R-squared = 0.4112 Root MSE = 1.0114 ------------------------------------------------------------------------------ | Robust y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .7289128 .3704015 1.97 0.065 -.0492719 1.507098 _cons | .9074384 .2249449 4.03 0.001 .4348468 1.38003 ------------------------------------------------------------------------------ . . . ********** (B) SIMULATE A NUMBER OF TIMES ******************** . . * To perform a simulation . * (1) Setup: generate and/or read in initial data . * (2) Define the program that will be run a number of times . * In Stata this is command program and has an arcane format . * (3) Run the simulations . * In Stata use command simul . * (4) Summarize the results using descriptive statistics and/or graph . . * Step (2) in Stata uses an arcane format . * To define the progam called progname, say, use the following outline: . * program define progname . * if "`1'"=="?" { . * global S_1 "variable names" . * exit . * } . * perform single simulation . * post `1' results . * end . * There must be the same number of results following the post command as vari > able . * names following the global S_1 command. See example below. . . * Then Step (3) with 100 reps, say, uses the command . * simul progname reps (100) . . * Heteroscedastic error example . * y = a + bx + u where a=1 b=1 u ~ N(0,1*x^2) . * Post coefficient estimates and standard errors: a se(a) b se(b) c se(c) . . * A big difference is between . * (B1) Completely random sampling, with both new draws of x and y. . * (B2) x fixed in repeated samples, with just new draws of y. . . . **** (B1) Completely random sampling . . * For large n,S use n=100, S=10000. For small n,S use n=20, S=100. . * As written there n = ?? and S = ?? simulations . * Change n in set obs ?? and change S in reps(??) . . * (1) Setup: . * Set seed so get same random data in future runs of same program . set seed 10101 . . * (2) Define the simulation program. . * Compute both usual and White standard errors of slope coefficient. > . program define simrandm 1. version 6.0 2. if "`1'" == "?" { 3. global S_1 "b_const se_const b_slope se_usual se_white" 4. exit 5. } 6. drop _all 7. set obs 20 /* Change sample size by change set obs */ 8. gen x = invnorm(uniform()) 9. gen y = 1 + 1*x + x*invnorm(uniform()) 10. regress y x, robust 11. scalar sewhite = _se[x] 12. regress y x 13. post `1' (_b[_cons]) (_se[_cons]) (_b[x]) (_se[x]) (sewhite) 14. end . . * (3) Do the simulations . * Change number of simulations by changing reps . simul simrandm, reps(1000) . . * (4) Post results . * Theory says that for the dgp y = 1+1*x+x*e where x~N(0,1) and e~N(0,1) . * b_slope equals 1 (from the dgp) . * se_usual equals square root (1/n) where n is sample size . * se_white equals square root (3/n) where n is sample size . describe Contains data obs: 1,000 vars: 5 6 Sep 2001 15:53 size: 24,000 (97.2% of memory free) ------------------------------------------------------------------------------- storage display value variable name type format label variable label ------------------------------------------------------------------------------- b_const float %9.0g se_const float %9.0g b_slope float %9.0g se_usual float %9.0g se_white float %9.0g ------------------------------------------------------------------------------- Sorted by: . summarize Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- b_const | 1000 1.007521 .217602 .3210715 1.875728 se_const | 1000 .2118845 .0646813 .0618716 .5141954 b_slope | 1000 .9967469 .3831677 -.3140723 2.10088 se_usual | 1000 .2142737 .0557043 .0883832 .4814843 se_white | 1000 .3036586 .1135306 .0890235 .9309203 . . kdensity b_slope, normal s(.) xlab ylab /* > */ title("Density of slope coeff in heteroskedasticity example") . . . **** (B2) x fixed in repeated sampling . . * For large n,S use n=100, S=10000. For small n,S use n=20, S=100. . * As written there n = ?? and S = ?? simulations . * Change n in set obs ?? and change S in reps(??) . . * (1) Setup: First generate x to be used in repeated samples . set seed 10101 /* So get same random data in future runs */ . drop _all /* Drop previous data */ . set obs 20 /* Change sample size by change set obs */ obs was 0, now 20 . gen x = invnorm(uniform()) /* Using inverse-transformation method */ . gen mean_y = 1+1*x /* Saves recalculating in each simulation */ . save fixedx, replace /* Data to be re-used in each simulation */ file fixedx.dta saved . sum Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- x | 20 .0963758 1.128545 -1.621421 2.943727 mean_y | 20 1.096376 1.128545 -.6214205 3.943727 . . * (2) Define the simulation program . program define simfixed 1. version 6.0 2. if "`1'" == "?" { 3. global S_1 "b_const se_const b_slope se_usual se_white" 4. exit 5. } 6. use fixedx, clear 7. gen y = mean_y + x*invnorm(uniform()) 8. regress y x, robust 9. scalar sewhite = _se[x] 10. regress y x 11. post `1' (_b[_cons]) (_se[_cons]) (_b[x]) (_se[x]) (sewhite) 12. end . . * (3) Do the simulations . * Change number of simulations by changing reps . simul simfixed, reps(1000) . . * (4) Post results . * Theory gives results for random x. See above. For fixed ?? . describe Contains data obs: 1,000 vars: 5 6 Sep 2001 15:53 size: 24,000 (96.3% of memory free) ------------------------------------------------------------------------------- storage display value variable name type format label variable label ------------------------------------------------------------------------------- b_const float %9.0g se_const float %9.0g b_slope float %9.0g se_usual float %9.0g se_white float %9.0g ------------------------------------------------------------------------------- Sorted by: . summarize Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- b_const | 1000 .9881501 .2334827 .1737309 1.656656 se_const | 1000 .2212264 .0606328 .0862795 .4275818 b_slope | 1000 .968175 .4290912 -.3860942 2.5669 se_usual | 1000 .2003529 .0549119 .0781388 .387238 se_white | 1000 .3103773 .1311619 .0757304 .7829539 . . kdensity b_slope, normal s(.) xlab ylab /* > */ title("Density of slope coeff in heteroskedasticity example") . . . ********** (C) BOOTSTRAP ******************** . . * This does some bootstrap. . * But for more details go to separate program stboot.do . . * The program produces three bootstrap 100*(1-alpha) confidence intervals . * (1) Regular asymptotic normal: bhat +/- z_alpha/2*se(bhat) . * where se(bhat) is the standard deviation of bhat from the bootstrap rep > s . * (2) Percentile method: This orders the bhat's from simulations and . * goes from alpha/2 lowest bhat's to the alpha/2 highest bhat's . * (3) Bootstrap-corrected. This works with the pivotal Wald statistic. . * See manual or a textbook. e.g. Efron/Tibsharani (1993, pp.184-188) with > a=0 . * This orders the bhat's from simulations and . * goes from p1 to the p2 highest . * where p1 and p2 are bias-correction adjustments to alpha/2 and 1-alpha/ > 2 . * Let p1 = Phi(2z0 - z_alpha/2) p2 = Phi(2z0 + z_alpha/2) . * z0 measures the median bias in bhat with . * z0 = Phi-inv(fraction of the bhat(s) < bhat) . * And if z0=0 then p1 = alpha/2 and no correction . . * (1) and (2) are not asymptotic refinements. . * (3) is an asymptotic refinement and is better in small samples. . * (4) Stata does not give the bootstrap t-interval. See progran stboot.do for > this. . . * Generate the data set to be used . set seed 10101 /* So get same random data in future runs */ . drop _all /* Drop previous data */ . set obs 20 /* Change sample size by change set obs */ obs was 0, now 20 . gen x = invnorm(uniform()) /* Using inverse-transformation method */ . gen u = x*invnorm(uniform()) . gen y = 1 + 1*x + u . sum Variable | Obs Mean Std. Dev. Min Max -------------+----------------------------------------------------- x | 20 .0963758 1.128545 -1.621421 2.943727 u | 20 -.1186879 1.030822 -3.268378 1.110334 y | 20 .9776879 1.282844 -3.055149 2.816744 . . * Do the bootstrap . regress y x Source | SS df MS Number of obs = 20 -------------+------------------------------ F( 1, 18) = 12.57 Model | 12.857086 1 12.857086 Prob > F = 0.0023 Residual | 18.4109884 18 1.02283269 R-squared = 0.4112 -------------+------------------------------ Adj R-squared = 0.3785 Total | 31.2680744 19 1.64568812 Root MSE = 1.0114 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .7289128 .2055922 3.55 0.002 .2969797 1.160846 _cons | .9074384 .2270115 4.00 0.001 .4305049 1.384372 ------------------------------------------------------------------------------ . bs "regress y x" "_b[x]", reps(999) level(95) command: regress y x statistic: _b[x] (obs=20) Bootstrap statistics Variable | Reps Observed Bias Std. Err. [95% Conf. Interval] ---------+------------------------------------------------------------------- bs1 | 999 .7289128 .0966306 .3837522 -.024141 1.481967 (N) | .1951937 1.664412 (P) | .1289247 1.50334 (BC) ----------------------------------------------------------------------------- N = normal, P = percentile, BC = bias-corrected . . . ********** CLOSE OUTPUT . log close log: e:\webpage\stata\stsim.txt log type: text closed on: 6 Sep 2001, 15:53:14 -------------------------------------------------------------------------------