STATA: Data Entry
A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis
This September 1999 help sheet gives information on:
- Stata Data Sets
- Read in Non-Stata Data Sets
- Stata command infile
- Saving Stata files
As with many programs, one of the greatest challenges can be the initial
reading in of data, especially if the data are in fixed format, if each
observation spans several lines, or if some of the data are character (string)
rather than numeric.
A detailed example is given in stjaggia.do,
which requires the files jaggia.asc and jaggia.dct.
STATA DATA SETS
Stata data sets have the extension .dta.
They can be read in using the Stata command use.
For example,
use mydata.dta
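As a minimal sketch of reading in a .dta file, the lines below first build a tiny data set with input and save it, so that use has something to read. They assume a recent version of Stata and a writable working directory; the file name mydata.dta and the variables v1 and v2 are hypothetical.

```stata
* Build a tiny hypothetical data set so the example is self-contained
clear
input v1 v2
1 10
2 20
3 30
end
save mydata.dta, replace   // write it out in Stata (.dta) format

* Start from an empty memory and read the .dta file back in with use
clear
use mydata.dta
describe                   // lists the variables that were loaded
```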
READ IN NON-STATA DATA SETS
Ways to proceed include:
- Use a proprietary program such as DBMSCOPY or STAT/TRANSFER to convert
  other data sets, such as SAS data sets, to a Stata data set.
- Use the Stata command infile (or insheet or infix) to read unformatted or
  formatted ascii (text) data. (Stata will not read in binary data: it must
  first be converted to ascii data.)
The Stata command help infiling gives an overview of reading in ascii (text)
data. We have:
- For free format or unformatted data that are space-separated, tab-separated,
  or comma-separated, use the command infile (see help infile1). Note that
  strings with embedded spaces or commas must be enclosed in quotes (even
  if the file is tab- or comma-separated).
- For free format or unformatted data created by a spreadsheet, one can instead
  use insheet, provided the data are tab-separated or comma-separated and each
  observation is on a single line. The first line of the file can optionally
  contain the names of the variables, making this an easy command to use
  (see help insheet).
- For fixed format data use the command infile (see help infile2).
- For fixed format data with each observation on a single line, use infix
  (see help infix).
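To make the insheet and infix descriptions concrete, here is a sketch that writes two small hypothetical files, example.csv (comma-separated, with variable names on the first line) and example.raw (the same data in fixed columns), and reads each one back. It assumes a recent Stata, where the file command is available for writing the test files; the file names, column positions and variable names are illustrative only.

```stata
capture file close fh
* Hypothetical comma-separated file with variable names on the first line
file open fh using "example.csv", write text replace
file write fh "name,v1,v2" _n
file write fh "Smith,1,2" _n
file write fh "Jones,3,4" _n
file close fh

* insheet: one observation per line, names taken from the first line
insheet using "example.csv", comma names clear
list

* The same data in fixed columns: name in 1-5, v1 in 6-7, v2 in 8-9
file open fh using "example.raw", write text replace
file write fh "Smith 1 2" _n
file write fh "Jones 3 4" _n
file close fh

* infix: each variable is given its column range; str marks a string variable
infix str name 1-5 v1 6-7 v2 8-9 using "example.raw", clear
list
```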
STATA COMMAND INFILE
The basic and most flexible command is infile.
For free format or unformatted data the basic command is, for
example,
infile v1 v2 v3 using mydata.raw
which creates variables, here named v1, v2 and v3, from the data in mydata.raw.
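The following sketch, assuming a recent Stata, creates a hypothetical free-format mydata.raw in which one field contains an embedded space and is therefore enclosed in quotes, then reads it with infile. The str20 prefix is how infile is told that name is a string variable; v1 and v2 are read as numbers.

```stata
capture file close fh
* Hypothetical free-format file mydata.raw: fields separated by spaces, and the
* string field contains a space, so it is enclosed in double quotes
file open fh using "mydata.raw", write text replace
file write fh `""Ann Lee" 1 2"' _n
file write fh `""Bob Ray" 3 4"' _n
file close fh

* str20 declares name as a string variable of up to 20 characters;
* v1 and v2 are read as numeric variables
clear
infile str20 name v1 v2 using mydata.raw
list
```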
For fixed format data there are two variations on the command:
- infile using mydict, using(mydata.raw)
  in which case the file mydict.dct contains the names of the variables (in a
  special dictionary format) and mydata.raw contains the ascii data.
- infile using mydict
  in which case the file mydict.dct contains both the names of the variables
  and either the data themselves or the name of the data file.
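A sketch of the first variation, assuming a recent Stata: it writes a hypothetical dictionary mydict.dct that places id in columns 1-3, name in columns 4-11 and age in columns 12-13, writes matching fixed-format data to mydata.raw, and then reads the data with infile using mydict, using(mydata.raw). The column layout and variable names are made up for illustration; see help infile2 for the full dictionary syntax.

```stata
capture file close fh
* Hypothetical dictionary mydict.dct: column positions, storage types,
* variable names and input formats (%#f numeric, %#s string)
file open fh using "mydict.dct", write text replace
file write fh "dictionary {" _n
file write fh "    _column(1)   int   id    %3f" _n
file write fh "    _column(4)   str8  name  %8s" _n
file write fh "    _column(12)  byte  age   %2f" _n
file write fh "}" _n
file close fh

* Hypothetical fixed-format data mydata.raw: no separators between fields
file open fh using "mydata.raw", write text replace
file write fh "001Smith   25" _n
file write fh "002Jones   31" _n
file close fh

* First variation: dictionary in mydict.dct, data taken from mydata.raw
clear
infile using mydict, using(mydata.raw)
list
```

For the second variation, the data lines would instead follow the closing brace inside mydict.dct itself, and the command would be just infile using mydict.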
To see the form of the dictionary, type help infile2, but even better is to
look at the Reference Guide entry [R] Infile.
If you have problems, also see the User's Guide section [U] Commands to Input
Data; in the Stata 6 manual this is chapter 24.
A detailed example is given in stjaggia.do,
which requires the files jaggia.asc and jaggia.dct.
SAVING STATA DATA SETS
Once data are read into Stata, you should save them in Stata format using
the save command. For example,
save mydata
saves the data in the file mydata.dta, while
save mydata, replace
is needed if you are replacing an existing Stata data set.
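The sketch below, assuming a recent Stata and a writable working directory, shows why the replace option matters: once mydata.dta exists, a plain save stops with an error (caught here with capture), whereas save with replace overwrites the file. It uses auto.dta, the example data set shipped with Stata, via sysuse; mydata is a hypothetical file name.

```stata
* auto.dta ships with Stata; mydata is a hypothetical file name
clear
sysuse auto
save mydata, replace          // creates mydata.dta (or overwrites an old copy)

* Without replace, save refuses to overwrite the existing mydata.dta
capture save mydata
local rc = _rc                // capture stores the nonzero error code in _rc
display "save without replace returned error code `rc'"

* With replace, the existing file is overwritten
save mydata, replace
```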
You should also immediately check that the data were read in correctly by
giving the Stata command
summarize
which gives the number of observations, mean, standard deviation, minimum and
maximum of each variable.
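As a sketch of this check, assuming a recent Stata, the lines below load the shipped example data set auto.dta in place of freshly read data, run summarize, and then go one step further by writing an expected range as an assert statement, which stops a do-file with an error if any observation violates it. The condition on price is only an illustration; in practice it would encode what you know about your own variables.

```stata
* auto.dta stands in for data you have just read in with infile, insheet or infix
clear
sysuse auto
summarize                      // Obs, Mean, Std. Dev., Min and Max for each variable

* summarize leaves its results in r(), which can be displayed or tested
summarize price
display "price: N = " r(N) ", min = " r(min) ", max = " r(max)

* Stop with an error if any price is zero, negative or missing
assert price > 0 & price < .
```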
For further information on how to use Stata go to
http://www.econ.ucdavis.edu/faculty/cameron