/* hhgpan1.prg does linear panel data using Hausman, Hall and Griliches data
Short panel with few years and many individuals.
TO RUN THIS PROGRAM ....
(1) You need my program SELDATA.SRC in the same directory (to select data)
(2) You need data patr7079.dat in the same directory.
You can get seldata.src and patr7079.dat from my web site.
Program hhgpanl.prg
1. Reads in a flat ascii data set.
2. Creates new data from transformation of original data.
This sets up the data in appropriate form for panel analysis.
3. Converts the data to a Gauss data set, with names.
4. Selects a subset of the data to be used in regression
(This uses my separate procedure seldata.src)
5. Estimates the fixed effects estimator (random effects still to come)
Program to estimate linear panel data model.
This uses original data PATR7079.DAT with data ordered by each of 346 firms.
Here one observation is all years for one firm.
This program uses 1975-79: 5 observations lost due to lagging LOGR 5 times.
Here I use the 346 firms from 1970 to 1979.
The data are those used in Hall, Griliches, and Hausman (1986),
"Patents and R&D: Is There a Lag?", IER 27: 265-283. The selection
of the firms, and creation of the data are described in Hall,
Griliches, and Hausman, NBER Technical Working Paper
No. 72 by Cummins, Hall, Laderman, and Mundy, and Bound, Cummins,
Griliches, Hall, and Jaffe, "Who Does R&D and Who Patents?," in
Griliches, Zvi (ed), R&D, Patents, and Productivity, University of
Chicago Press (1984).
The files have the same organization, fixed format ~80 column length,
4 records per firm-observation, data separated by blanks, but in fixed
fields so the file can be read either by fixed or free format commands.
The order of the 25 variables is the following:
CUSIP Compustat's identifying number for the firm (Committee on
Uniform Security Identification Procedures number).
ARDSIC A two-digit code for the applied R&D industrial classification
(roughly that in Bound, Cummins, Griliches, Hall, and Jaffe, in
the Griliches R&D, Patents, and Productivity volume).
SCISECT Dummy equal to one for firms in the scientific sector.
LOGK The logarithm of the book value of capital in 1972.
SUMPAT The sum of patents applied for between 1972-1979.
LOGR70- The logarithm of R&D spending during the year (in 1972 dollars).
LOGR79
PAT70- The number of patents applied for during the year that were
PAT79 eventually granted.
The PATR7279 file (not used here) has the same variables, but contains only
LOGR72-LOGR79 and PAT71-PAT79 (note difference).
*/
/******* Initial setup: libraries used etc. and output file ********
*/
New; @ clear workspace @
/*
library maxlik; @ So Gauss knows we use procedures in maxlik @
maxset; @ A maxlik command @
*/
disable; @ Allows missing values in calculation @
outfile = "hhgpan1.out"; @ all output goes to this file @
output file = ^outfile reset; @ delete earlier contents @
print "File hhgpan1.out is output from Gauss program hhgpan1.src";
print "Date is:";
print date;
print;
/******* Include additional Gauss code in separate files **********
seldata.src: Gauss program to select the subset of data
and variable names used in analysis.
This includes procedures opendat and readdat used below.
I assume you have these files in the same directory as this program.
*/
#include seldata.src;
@ or if e.g. in directory gauss then include c:\gauss\seldata.src @
/******* Read in the data and create variables **********
Here we first read the data in as a flat ascii file.
The data are ordered by firm with 8, 6, 6 and 5 variables
in lines 1, 2, 3 and 4. Some of these observations
Variables in order are
5 time invariant: CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT,
Regressor for 10 years: LOGR70,LOGR71,LOGR72,LOGR73,LOGR74,
LOGR75,LOGR76,LOGR77,LOGR78,LOGR79,
Dependent for 10 years: PAT70,PAT71,PAT72,PAT73,PAT74,
PAT75,PAT76,PAT77,PAT78,PAT79
Then we create a Gauss data set with names provided.
*/
load data1[346,25] = patr7079.dat; @ Data read in to matrix @
patents = data1[.,21:25]; @ PAT75,PAT76,PAT77,PAT78,PAT79 @
one = ones(346,1); @ A column vector of ones @
static = data1[.,1:5]; @ CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT @
xvec75 = data1[.,11]~data1[.,10]~data1[.,9]~data1[.,8]~data1[.,7]
~data1[.,6]; @ LOGR75,LOGR74,LOGR73,LOGR72,LOGR71,LOGR70 @
xvec76 = data1[.,12]~data1[.,11]~data1[.,10]~data1[.,9]~data1[.,8]
~data1[.,7]; @ LOGR76,LOGR75,LOGR74,LOGR73,LOGR72,LOGR71 @
xvec77 = data1[.,13]~data1[.,12]~data1[.,11]~data1[.,10]~data1[.,9]
~data1[.,8]; @ LOGR77,LOGR76,LOGR75,LOGR74,LOGR73,LOGR72 @
xvec78 = data1[.,14]~data1[.,13]~data1[.,12]~data1[.,11]~data1[.,10]
~data1[.,9]; @ LOGR75,LOGR74,LOGR73,LOGR72,LOGR71,LOGR70 @
xvec79 = data1[.,15]~data1[.,14]~data1[.,13]~data1[.,12]~data1[.,11]
~data1[.,10]; @ LOGR75,LOGR74,LOGR73,LOGR72,LOGR71,LOGR70 @
yeardums = one~one~one~one~one;
data2 = patents~one~static~xvec75~xvec76~xvec77~xvec78~xvec79;
clear data1,one,static,xvec75,xvec76,xvec77,xvec78,xvec79;
@ New variables added to existing data @
/******* Create Gauss data set with names *********
The created Gauss data set will be called hhgpan1.dat
and the associated names will be in file hhgpan1.dht
The data set will contain the time invariant data,
PAT75,PAT76,PAT77,PAT78,PAT79,
ONE,CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT,
XVEC75 = LOGR75,LOGR74,LOGR73,LOGR72,LOGR71,LOGR70
XVEC76 = LOGR76,LOGR75,LOGR74,LOGR73,LOGR72,LOGR71
XVEC77 = LOGR77,LOGR76,LOGR75,LOGR74,LOGR73,LOGR72
XVEC78 = LOGR78,LOGR77,LOGR76,LOGR75,LOGR74,LOGR73
XVEC79 = LOGR79,LOGR78,LOGR77,LOGR76,LOGR75,LOGR74
Note that an alternative is to use the Gauss utility program ATOG386.EXE
(where ATOG means ascii to Gauss). See Gauss Systems and Graphics manual.
*/
dataset = "hhg7079r"; @ or e.g. "c:\\gauss\\myex\\hhg7079r" @
let vnames = PAT75,PAT76,PAT77,PAT78,PAT79,
ONE,CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT,
LOGR75,LOGR75L1,LOGR75L2,LOGR75L3,LOGR75L4,LOGR75L5,
LOGR76,LOGR76L1,LOGR76L2,LOGR76L3,LOGR76L4,LOGR76L5,
LOGR77,LOGR77L1,LOGR77L2,LOGR77L3,LOGR77L4,LOGR77L5,
LOGR78,LOGR78L1,LOGR78L2,LOGR78L3,LOGR78L4,LOGR78L5,
LOGR79,LOGR79L1,LOGR79L2,LOGR79L3,LOGR79L4,LOGR79L5;
call saved(data2,dataset,vnames);
/* Check the data by getting descriptive statistics.
*/
data ="hhg7079r"; @ or e.g. "c:\\gauss\\myex\\jaggiatr" @
print;
call dstat(data,0); @ check data set correct @
print;
/******** Extract the subset of the Gauss data set to be used *********
*/
let YVAR = PAT75 PAT76 PAT77 PAT78 PAT79;
let XVAR = LOGR75,LOGR75L1,LOGR75L2,LOGR75L3,LOGR75L4,LOGR75L5,
LOGR76,LOGR76L1,LOGR76L2,LOGR76L3,LOGR76L4,LOGR76L5,
LOGR77,LOGR77L1,LOGR77L2,LOGR77L3,LOGR77L4,LOGR77L5,
LOGR78,LOGR78L1,LOGR78L2,LOGR78L3,LOGR78L4,LOGR78L5,
LOGR79,LOGR79L1,LOGR79L2,LOGR79L3,LOGR79L4,LOGR79L5;
/* Set data size dimensions and some other variables.
*/
m = 5; @ Number of dependent variables = number of years @
@ Redefine if different number of years @
nobs = 346; @ Number of observations @
/* Call opendat, rad, readdat in seldata.src
This gives
x - nobs*k matrix of explanatory variables
y - nobs*m matrix of dependent variables
indy - Index for the dependent variables
indx - Index for the exogenous variables
f1 - File handle for the open data set
n - Number of observations in the data set
nror - number of times data set needs to be read to be complete
rest - number of obervations in the time data is read
The data are read into memory one time.
Nobs must be equal to number of observations in the data set
*/
{f1,n,indy,indx} = opendat(data,yvar,xvar,&indices,m);
{nror, rest } = rad(nobs,n);
{ y,x } = Readdat(f1,1,nobs,indy,indx);
/* Do linear fixed effects for short panel.
This includes time dummies for each year (with first year omitted).
To drop dummies set x1 = x1s, ... , x5s.
This program should still work if
- the number of regressors are changed
- the number of observations are changed.
If the number of years are changed, however, then you need to change code.
e.g. if a sixth year is added then need to have T=s, define y6 and x6,
and redefine xbari, ybari, xx and xy in the obvious manner.
*/
z = y~x;
n = rows(z); @ number of observations @
T = m; @ number of years @
p = (cols(z) - T)/T; @ number of regressors excluding year dummies @
one = ones(n,1);
zero = zeros(n,1);
y1 = z[.,1];
y2 = z[.,2];
y3 = z[.,3];
y4 = z[.,4];
y5 = z[.,5];
x1s = z[.,T+1:T+p];
x1 =x1s~zero~zero~zero~zero;
x2s = z[.,T+1+p:T+2*p];
x2 = x2s~one~zero~zero~zero;
x3s = z[.,T+1+2*p:T+3*p];
x3 = x3s~zero~one~zero~zero;
x4s = z[.,T+1+3*p:T+4*p];
x4 = x4s~zero~zero~one~zero;
x5s = z[.,T+1+4*p:T+5*p];
x5= x5s~zero~zero~zero~one;
xbari = (x1+x2+x3+x4+x5)/5;
ybari = (y1+y2+y3+y4+y5)/5;
one = ones(n,1);
xx = (x1-xbari)'(x1-xbari)+(x2-xbari)'(x2-xbari)+(x3-xbari)'(x3-xbari)+
(x4-xbari)'(x4-xbari)+(x5-xbari)'(x5-xbari);
xy = (x1-xbari)'(y1-ybari)+(x2-xbari)'(y2-ybari)+(x3-xbari)'(y3-ybari)+
(x4-xbari)'(y4-ybari)+(x5-xbari)'(y5-ybari);
/* The fixed effects estimator of b and the estimated individual effects a
*/
bfe = invpd(xx)*xy;
afe = ybari - xbari*bfe;
/* Estimate the error variance.
*/
sumres = (y1-afe-x1*bfe)'(y1-afe-x1*bfe)+(y2-afe-x2*bfe)'(y2-afe-x2*bfe)
+(y3-afe-x3*bfe)'(y3-afe-x3*bfe)+(y4-afe-x4*bfe)'(y4-afe-x4*bfe)
+(y5-afe-x5*bfe)'(y5-afe-x5*bfe);
sigmasq = sumres/(n*(T-1)-p-T+1);
/* Estimate covariance matrix of estimated b.
*/
covbfe = sigmasq*invpd(xx);
vbfe = diag(covbfe);
sebfe = sqrt(vbfe);
tbfe = bfe./sebfe;
/* Print the results */
print "Sum of squared residuals and estimated error variance";
print sumres;
print sigmasq;
print;
print "Linear Fixed effects estimator";
print " bfe sebfe tbfe";
print bfe~sebfe~tbfe;
print;
f1 = close(f1);