** STATA Demonstration Program by Colin Cameron
** Program stdemo.do written July 2003 for Stata version 8

version 8

********** OVERVIEW OF STDEMO.DO **********

* This program demonstrates many basic Stata commands for a new user
* (1) How to Run a Stata program
* (2) Initial Stata commands: write output to an output file
* (3) Comments and how to input lengthy commands
* (4) Read in data (for more complex data see stinfile.do)
* (5) Data transformations 
* (6) Descriptive Statistics: essential check of data
* (7) Run a regression
* (8) Saving and Displaying Regression Results
* (9) Post-regression analysis
* (10) Matrix commands
* (11) Graph 


***** (1) HOW TO RUN A STATA PROGRAM

* To run this program in Windows 
*  - Start Stata (by clicking on the Stata icon) 
*    and within Stata give the Stata command:  do stdemo
*  - Or Double-click on the do file (works if it has a .do extension)
* To run this program in Unix batch mode give the Unix command:  stata -b do stdemo.do

* This program accesses data. It assumes you have the data file
*    jaggia.asc
* in an appropriate directory - see READ DATA below.


***** (2) INITIAL STATA COMMANDS

* Clear memory and close files possibly open from previous Stata execution
* capture in front means the program continues even if no log file is open
clear
capture log close    

* Create output file here called stdemo.log
* replace here means existing file of same name will be overwritten
* text ensures desired column alignment for statistical output
di "stdemo.do by Colin Cameron: Basic Stata Demonstration example"
log using stdemo.log, text replace

* Change Stata default settings, especially if large data set
* See Stata manual [R] set or in Stata give command help set
* Common examples are set memory, set maxvar and set matsize
set more off
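
* A minimal sketch of one such setting (an assumption, not part of the
* original program): the 10m value is illustrative only, and set memory
* must be given while no data are in memory, as is the case here after clear
set memory 10m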


***** (3) COMMENTS AND LENGTHY COMMANDS

* A single line beginning with * is a comment

* Alternatively anything between  /* and */ is a comment

* If a command spans more than one line then 
* end the line with /*   and begin the next line with */ 
* See the infile command below in READ IN DATA for an example.
* An alternative is to use the #delimit command. See [R] delimit.
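
* A minimal sketch of the #delimit alternative (illustrative addition):
* once the delimiter is changed to a semicolon each command ends at the
* semicolon rather than at the end of the line, and #delimit cr restores
* the default
#delimit ;
di "This display command is written "
   "over two lines" ;
#delimit cr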


***** (4) READ IN DATA 

* The original data are from Sanjiv Jaggia and Satish Thosar, 1993,
* "Multiple Bids as a Consequence of Target Management Resistance"
* Review of Quantitative Finance and Accounting, 447-457.
* This is used e.g. in A.C.Cameron and P.K.Trivedi (1998)
* "Regression Analysis of Count Data", Cambridge University Press, pp.146-151. 

* The data are in ascii file jaggia.asc
* There are 126 observations on 12 variables with three lines per obs
* The data are fixed format with 
*   line 1  variables 1-4  F16.8,F17.9,F17.9,F17.9
*   line 2  variables 5-8  F16.8,F17.9,F17.9,F17.9
*   line 3  variables 9-12 F16.8,F17.9,F17.9,F17.9
* There is at least one space between each variable.

* The 12 variables will be denoted (Note: Stata names are case sensitive)
* 1. docno      Document Number
* 2. weeks      Weeks
* 3. numbids    Number of Takeover Bids (after the initial bid)
* 4. takeover   1 if firm taken over and 0 otherwise
* 5. bidprem    Bid Premium (Bid price divided by price 14 working days before bid)
* 6. insthold   Institutional Holdings (percentage of stock held by institutions)
* 7. size       Total book value of assets in billions of dollars
* 8. leglrest   Legal Restructuring: 1 if legal defense by lawsuit 
* 9. realrest   Real Restructuring: 1 if proposed changes in asset structure
* 10. finrest   Financial Restructuring: 1 if proposed changes in ownership structure
* 11. regulatn  Regulation: 1 if intervention by federal regulators
* 12. whtknght  White Knight: 1 if management invitation for friendly third-party bid

* and additionally this program will create
* 13. sizesq    Size Squared

* There are several ways to read in this data.
* We use a simple read command here, appropriate as the data are space delimited.
* For more complex reads see stinfile.do 

* Infile: FREE FORMAT WITHOUT DICTIONARY
* As the values are separated by spaces, the data can be read as
* space-delimited free format, so no dictionary file is needed

* The following command spans more than one line so use /* and */

infile docno weeks numbids takeover bidprem insthold size leglrest /*
   */ realrest finrest regulatn whtknght using jaggia.asc
 
* To drop off extra blanks (if any) at end of file jaggia.asc
drop if _n>126

* If your data have special missing value codes see the Stata manual.
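
* A minimal sketch, assuming a hypothetical missing-value code of -999
* (jaggia.asc is not known to use any such code, in which case this
* command changes nothing): mvdecode recodes the value to Stata missing
mvdecode _all, mv(-999)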


***** (5) DATA TRANSFORMATIONS

* Use the Stata generate command
gen sizesq = size*size

* The Stata label command gives a longer description of a variable
label variable sizesq "size squared"

* Save the Stata data set as jaggia.dta
* This is not necessary here but given for completeness
save jaggia, replace
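
* A quick check of what was saved (illustrative addition): describe using
* reports the contents of jaggia.dta on disk without loading it
describe using jaggia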


***** (6) DESCRIPTIVE STATISTICS: ESSENTIAL CHECK OF DATA

* It is essential to check that the data were read in correctly
* and that the transformations were done correctly

* Stata command describe gives a description of a Stata data set
describe

* Stata command summarize gives descriptive statistics:  
* number of non-missing observations, mean, standard deviation, min and max
summarize

* Stata command summarize with d option gives more detail such as quantiles 
summarize numbids, d

* Stata command list gives complete listing of data, here first 2 observations
list in 1/2
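
* Further checks one might add (a sketch, not part of the original program)
* tabulate gives the frequency distribution of the count variable numbids
tabulate numbids
* Count the observations with numbids missing (missing values sort highest)
count if numbids >= .
* assert stops the program if the number of observations is not the 126 stated above
assert _N == 126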


***** (7) REGRESSION 

* As command spans more than one line use /* and */ here
* The robust option gives White heteroskedasticity-consistent standard errors

regress numbids leglrest realrest finrest whtknght bidprem insthold size /*
    */ sizesq regulatn, robust

* It is convenient to define a regressor list to shorten Stata commands
global XLIST leglrest realrest finrest whtknght bidprem insthold size /*
    */ sizesq regulatn

* Example of use
regress numbids $XLIST, robust


***** (8) SAVING AND DISPLAYING REGRESSION RESULTS

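* estimates store saves the results of the most recent estimation under a name
* Here the same OLS model is stored twice: with default and with robust standard errors
quietly regress numbids $XLIST
estimates store olsiiderrors
quietly regress numbids $XLIST, robust
estimates store olsheterrors
* estimates table displays the stored results side by side:
* se adds standard errors, stats() adds saved e() scalars, b() sets the coefficient format
estimates table olsiiderrors olsheterrors, se stats(N ll r2 rss mss rmse df_r) b(%10.4f)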


***** (9) POST-REGRESSION ANALYSIS

*** (A) Accessing saved results from Stata commands. 
***     See Stata 8 [U] chapter 16.6.

* Results are stored in
*   r( ) for r-class commands  (general commands such as summarize)
*   e( ) for e-class commands  (estimation commands such as regress)
*   s( ) for s-class commands  (less-used programming commands for parsing)

* For Stata 8 do the following (replaces estimates list in Stata 7) 
return list
ereturn list
sreturn list

* Example: construct likelihood ratio test that all slope coefficients are zero,
* assuming iid normal errors. Here e(ll_0) is the constant-only log likelihood
scalar lrt = 2*(e(ll)-e(ll_0))
di "Likelihood ratio test statistic = " lrt
di "Chi-squared with deg of freedom = " e(df_m)
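
* A sketch of the corresponding p-value (illustrative addition): chi2tail
* gives the upper-tail probability of the chi-squared distribution
di "p-value of likelihood ratio test = " chi2tail(e(df_m), lrt)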

*** (B) Accessing coefficients and standard errors. 
***     See Stata 8 [U] chapter 16.5 and 17.5

* Coefficients are in _b[varname] and standard errors in _se[varname]

* The following is a long way to get fitted values from previous OLS
generate predbids = _b[_cons] + _b[leglrest]*leglrest +_b[realrest]*realrest /*
  */ + _b[finrest]*finrest + _b[whtknght]*whtknght + _b[bidprem]*bidprem /*
  */ + _b[insthold]*insthold + _b[size]*size + _b[sizesq]*sizesq /*
  */ + _b[regulatn]*regulatn
sum predbids numbids
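
* Similarly a t statistic can be formed by hand (illustrative addition);
* it should match the t column for size in the robust regress output in (7)
di "t statistic for size = " _b[size]/_se[size]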

* There are two ways to put coefficient estimates in a 1 x q vector
matrix bols = e(b)          /* OLS coefficient vector */
matrix bolsalt = get(_b)
matrix list bols
matrix list bolsalt

* There are two ways to put coefficient variance estimate in a matrix
matrix varols = e(V)          /* OLS coefficient variance matrix */
matrix varolsalt = get(VCE)
matrix list varols
matrix list varolsalt
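
* Individual matrix elements can be used in expressions (illustrative addition):
* the square root of the first diagonal element of varols is the standard
* error of leglrest, the first regressor in XLIST
di "se of leglrest from e(V) = " sqrt(varols[1,1])
di "se of leglrest from _se  = " _se[leglrest]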

*** (C) For post-estimation commands see Stata 8 [U] chapter 23

* In-sample predictions along with standard error stored separately  
predict yhat, xb
predict seyhat, stdp
sum yhat seyhat
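
* As a check (illustrative addition), compare reports the differences
* between the hand-computed predbids and yhat from predict; any
* differences should be no larger than float rounding error
compare predbids yhat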


***** (10) MATRIX COMMANDS 

* For matrices see Stata 8 [U] chapter 17 and [P] matrix

* Most programs limit matrix size.
* So one cannot define an N x k matrix X for large N.
* For Intercooled Stata the default is 40 x 40 and the largest is 800 x 800
* Can get around this by instead working with X'X which is k x k

* Example: OLS estimation of NUMBIDS on regressors in $XLIST

* The simple way works only for small N  (here N = 126)
set matsize 150
gen ONE = 1
mkmat $XLIST ONE, matrix(X)
mkmat numbids, matrix(y)
matrix bOLS = syminv(X'*X)*X'*y
matrix list bOLS
* Now get usual standard errors
scalar numregs = colsof(X)
scalar numobs = rowsof(X)
matrix residuals = y - X*bOLS
matrix s_squared = residuals'*residuals/(numobs-numregs)
matrix vmOLS = s_squared*syminv(X'*X)
* Stata does not do square root on a matrix or vector 
* so getting standard errors is tricky
* Note that vecdiag creates a row vector so transpose to get column vector
matrix vOLS = vecdiag(vmOLS)'
matrix seOLS = J(numregs,1,0)   /* Initially column vector of zeroes */        
scalar icol = 1
/* Loop over the elements of vOLS, taking the square root of each */
while icol <= numregs {
  matrix seOLS[icol,1] = sqrt(vOLS[icol,1])
  scalar icol = icol+1
  }
matrix list seOLS
* Should be same as regress command
regress numbids $XLIST
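
* A direct comparison for one coefficient (illustrative addition):
* size is the 7th regressor in XLIST, hence row 7 of bOLS and seOLS
di "size: matrix b  = " bOLS[7,1] "   regress b  = " _b[size]
di "size: matrix se = " seOLS[7,1] "   regress se = " _se[size]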

* The more complicated way works for big N
* Rather than use mkmat to define a large N x k matrix X 
* it uses accum to define the smaller k x k matrix X'X.

* First create X'X 
* This uses accum which automatically adds a constant at the end
matrix accum XX = $XLIST
* Second create X'y which is trickier
* Create Z'Z = [y X]' [y X] using accum which automatically adds a constant
matrix accum ZZ = numbids $XLIST
* Then take out the first column (excluding first entry) which is X'y
matrix Xy = ZZ[2...,1]
* Then calculate (X'X)inverse*X'y
matrix bmanual = syminv(XX)*Xy
matrix list bmanual
* The following lists coefficient of second regressor
scalar b2 = bmanual[2,1]
scalar list b2 
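
* A sketch of how the usual standard errors follow from the same
* accumulated matrices (illustrative addition, assuming no missing values
* so that the number of observations is _N)
* Residual sum of squares is y'y - b'X'y where y'y is the first entry of ZZ
matrix bXy = bmanual'*Xy
scalar rssman = ZZ[1,1] - bXy[1,1]
scalar s2man = rssman/(_N - rowsof(XX))
matrix vmanual = s2man*syminv(XX)
* Standard error of the second regressor; compare with the regress output above
scalar se2 = sqrt(vmanual[2,2])
scalar list se2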

********** CLOSE OUTPUT
log close