** STATA Demonstration Program by Colin Cameron
** Program stdemo.do written July 2003 for Stata version 8

version 8

********** OVERVIEW OF STDEMO.DO **********

* This program demonstrates many basic Stata commands for a new user
*  (1) How to run a Stata program
*  (2) Initial Stata commands: write output to an output file
*  (3) Comments and how to input lengthy commands
*  (4) Read in data (for more complex data see stinfile.do)
*  (5) Data transformations
*  (6) Descriptive statistics: essential check of data
*  (7) Run a regression
*  (8) Saving and displaying regression results
*  (9) Post-regression analysis
* (10) Matrix commands
* (11) Graph

***** (1) HOW TO RUN A STATA PROGRAM

* To run this program in Windows
*  - Start Stata (by clicking on the Stata icon)
*    and within Stata give the Stata command:  do stdemo
*  - Or double-click on the do file (works if it has a .do extension)
* To run this program in Unix batch give the Unix command
*    stata -b do stdemo.do

* This program accesses data. It assumes you have the data file
*    jaggia.asc
* in an appropriate directory - see READ IN DATA below.

***** (2) INITIAL STATA COMMANDS

* Clear memory and close files possibly open from previous Stata execution
* capture in front means program continues even if no log file open
clear
capture log close

* Create output file here called stdemo.log
*   replace here means existing file of same name will be overwritten
*   text ensures desired column alignment for statistical output
di "stdemo.do by Colin Cameron: Basic Stata Demonstration example"
log using stdemo.log, text replace

* Change Stata default settings, especially if large data set
* See Stata manual [R] set or in Stata give command help set
* Common examples are set memory, set maxvar and set matsize
set more off

***** (3) COMMENTS AND LENGTHY COMMANDS

* A single line beginning with * is a comment
* Alternatively anything between /* and */ is a comment

* If a command spans more than one line then
* end the line with /* and begin the next line with */
* See the infile command below in READ IN DATA for an example.
* An alternative is to use the #delimit command. See [R] delimit.

***** (4) READ IN DATA

* The original data are from Sanjiv Jaggia and Satish Thosar, 1993,
* "Multiple Bids as a Consequence of Target Management Resistance"
* Review of Quantitative Finance and Accounting, 447-457.
* This is used e.g. in A.C.Cameron and P.K.Trivedi (1998)
* "Regression Analysis of Count Data", Cambridge University Press, pp.146-151.

* The data are in ascii file jaggia.asc
* There are 126 observations on 12 variables with three lines per obs
* The data are fixed format with
*   line 1 variables 1-4   F16.8,F17.9,F17.9,F17.9
*   line 2 variables 5-8   F16.8,F17.9,F17.9,F17.9
*   line 3 variables 9-12  F16.8,F17.9,F17.9,F17.9
* There is at least one space between variables.

* The 12 variables will be denoted (Note: Stata names are case sensitive)
*  1. docno     Document Number
*  2. weeks     Weeks
*  3. numbids   Number of Takeover Bids (after the initial bid)
*  4. takeover  1 if firm taken over and 0 otherwise
*  5. bidprem   Bid Premium (bid price divided by price 14 working days before bid)
*  6. insthold  Institutional Holdings (percentage of stock held by institutions)
*  7. size      Total book value of assets in billions of dollars
*  8. leglrest  Legal Restructuring: 1 if legal defense by lawsuit
*  9. realrest  Real Restructuring: 1 if proposed changes in asset structure
* 10. finrest   Financial Restructuring: 1 if proposed changes in ownership structure
* 11. regulatn  Regulation: 1 if intervention by federal regulators
* 12. whtknght  White Knight: 1 if management invitation for friendly third-party bid
* and additionally this program will create
* 13. sizesq    Size Squared

* There are several ways to read in this data.
* We use a simple read command here, appropriate as the data are space delimited.
* For more complex reads see stinfile.do

* Infile: FREE FORMAT WITHOUT DICTIONARY
* As there is at least one space between values, the data can also be read as
* space-delimited free format, and then there is no need for a dictionary file
* The following command spans more than one line so use /* and */
infile docno weeks numbids takeover bidprem insthold size leglrest /*
   */ realrest finrest regulatn whtknght using jaggia.asc

* To drop off extra blanks (if any) at end of file jaggia.asc
drop if _n>126

* If your data have special missing value codes see the Stata manual.

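* A minimal sketch, not part of the original program: check that the read worked.
* It assumes only what is stated above, namely that jaggia.asc holds
* 126 observations and that takeover is a 0/1 indicator.
* assert stops the program with an error if the condition is false
assert _N == 126
* Count observations with a missing dependent variable (missing is stored as .)
count if numbids >= .
* The tabulation should show only the values 0 and 1
tabulate takeover
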
***** (5) DATA TRANSFORMATIONS

* Use the Stata generate command
gen sizesq = size*size

* The Stata label command gives longer descriptions of variables
label variable sizesq "size squared"

* Save the Stata data set as jaggia.dta
* This is not necessary here but given for completeness
save jaggia, replace

***** (6) DESCRIPTIVE STATISTICS: ESSENTIAL CHECK OF DATA

* It is essential to check that the data were read in correctly
* and that the transformations were done correctly

* Stata command describe gives a description of a Stata data set
describe

* Stata command summarize gives descriptive statistics:
* number of non-missing observations, mean, standard deviation, min and max
summarize

* Stata command summarize with d option gives more detail such as quantiles
summarize numbids, d

* Stata command list gives complete listing of data, here first 2 observations
list in 1/2

***** (7) REGRESSION

* As command spans more than one line use /* and */ here
* The robust option gives White heteroskedasticity-consistent standard errors
regress numbids leglrest realrest finrest whtknght bidprem insthold size /*
   */ sizesq regulatn, robust

* It is convenient to define a regressor list to shorten Stata commands
global XLIST leglrest realrest finrest whtknght bidprem insthold size /*
   */ sizesq regulatn
* Example of use
regress numbids $XLIST, robust

***** (8) SAVING AND DISPLAYING REGRESSION RESULTS

* Store OLS estimates with default (iid errors) standard errors
quietly regress numbids $XLIST
estimates store olsiiderrors
* Store OLS estimates with heteroskedasticity-robust standard errors
quietly regress numbids $XLIST, robust
estimates store olsheterrors
* Display both sets of results side by side
estimates table olsiiderrors olsheterrors, se /*
   */ stats(N ll r2 rss mss rmse df_r) b(%10.4f)

***** (9) POST-REGRESSION ANALYSIS

*** (A) Accessing saved results from Stata commands.
***     See Stata 8 [U] chapter 16.6.

* Results are stored in
*   r( ) for r-class commands (general commands such as summarize)
*   e( ) for e-class commands (estimation commands such as regress)
*   s( ) for s-class commands (less-used programming commands for parsing)

* For Stata 8 do the following (replaces estimates list in Stata 7)
return list
ereturn list
sreturn list

* Example: construct likelihood ratio test assuming iid normal errors
scalar lrt = 2*(e(ll)-e(ll_0))
di "Likelihood ratio test statistic = " lrt
di "Chi-squared with deg of freedom = " e(df_m)

*** (B) Accessing coefficients and standard errors.
***     See Stata 8 [U] chapter 16.5 and 17.5

* Coefficients are in _b[varname] and standard errors in _se[varname]
* The following is a long way to get fitted values from previous OLS
generate predbids = _b[_cons] + _b[leglrest]*leglrest + _b[realrest]*realrest /*
   */ + _b[finrest]*finrest + _b[whtknght]*whtknght + _b[bidprem]*bidprem /*
   */ + _b[insthold]*insthold + _b[size]*size + _b[sizesq]*sizesq /*
   */ + _b[regulatn]*regulatn
sum predbids numbids

* There are two ways to put coefficient estimates in a 1 x q vector
matrix bols = e(b)            /* OLS coefficient vector */
matrix bolsalt = get(_b)
matrix list bols
matrix list bolsalt

* There are two ways to put the coefficient variance estimate in a matrix
matrix varols = e(V)          /* OLS coefficient variance matrix */
matrix varolsalt = get(VCE)
matrix list varols
matrix list varolsalt

*** (C) For post-estimation commands see Stata 8 [U] chapter 23

* In-sample predictions along with standard error stored separately
predict yhat, xb
predict seyhat, stdp
sum yhat seyhat

***** (10) MATRIX COMMANDS

* For matrices see Stata 8 [U] chapter 17 and [P] matrix

* Most programs limit matrix size.
* So cannot define an N x k matrix X for large N.
* For Intercooled Stata default is 40 x 40 and largest is 800 x 800
* Can get around this by instead working with X'X which is k x k

* Example: OLS estimation of NUMBIDS on regressors in $XLIST

* The simple way works only for small N (here N = 126)
set matsize 150
gen ONE = 1
mkmat $XLIST ONE, matrix(X)
mkmat numbids, matrix(y)
matrix bOLS = syminv(X'*X)*X'*y
matrix list bOLS

* Now get usual standard errors
scalar numregs = colsof(X)
scalar numobs = rowsof(X)
matrix residuals = y - X*bOLS
matrix s_squared = residuals'*residuals/(numobs-numregs)
matrix vmOLS = s_squared*syminv(X'*X)
* Stata does not do square root on a matrix or vector
* so getting standard errors is tricky
* Note that vecdiag creates a row vector so transpose to get column vector
matrix vOLS = vecdiag(vmOLS)'
matrix seOLS = J(numregs,1,0)   /* Initially column vector of zeroes */
scalar icol = 1
/* Need loop here as Stata does not do square root on a vector */
while icol <= numregs {
    matrix seOLS[icol,1] = sqrt(vOLS[icol,1])
    scalar icol = icol+1
}
matrix list seOLS

* Should be same as regress command
regress numbids $XLIST

* The more complicated way works for big N
* Rather than use mkmat to define a large N x k matrix X
* it uses accum to define the smaller k x k matrix X'X.

* First create X'X
* This uses accum which automatically adds a constant at the end
matrix accum XX = $XLIST

* Second create X'y which is trickier
* Create Z'Z = [y X]' [y X] using accum which automatically adds a constant
matrix accum ZZ = numbids $XLIST
* Then take out the first column (excluding first entry) which is X'y
matrix Xy = ZZ[2...,1]

* Then calculate (X'X)inverse*X'y
matrix bmanual = syminv(XX)*Xy
matrix list bmanual

* The following lists the coefficient of the second regressor
scalar b2 = bmanual[2,1]
scalar list b2

********** CLOSE OUTPUT
log close
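
***** (11) GRAPH

* A minimal sketch, not part of the original program, for item (11) of the overview.
* It assumes the Stata 8 graphics commands histogram, twoway scatter and graph save,
* and that the data read in above are still in memory.
* The file names numbids_hist and bids_scatter are hypothetical choices.
* Graphs are not written to the text log, so this section can follow log close.

* Histogram of the count variable numbids, one bar per integer value
histogram numbids, discrete frequency
graph save numbids_hist, replace

* Scatterplot of number of bids against the bid premium
twoway scatter numbids bidprem
graph save bids_scatter, replace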