A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis
This March 2016 help sheet gives information on installing and using R.
WHAT IS R?
R is a free matrix programming language and software environment
that is widely used among statisticians for developing statistical
software and data analysis. It is based on an earlier language S
that became the commercial product S-Plus. It runs on Windows, Mac
and Linux. Unlike S-Plus, R is free. See http://en.wikipedia.org/wiki/R_(programming_language)
For regression analysis one needs
Necessary: The latest base version of R. This includes
some basic regression commands as part of the R Stats package.
Strongly Recommended: RStudio, which is a simpler front-end to R.
Perhaps useful for beginners: R Commander, which is a GUI for which
you don't need to know R commands in advance (it shows the commands
it produces, so you know them for the future, but it covers only a
restricted number of commands).
As Needed: Relevant user-written programs called packages that
include additional commands (functions) that are not part of the
base version of R.
If you install RStudio then from now on initiate RStudio, not R.
(Note: R Commander runs both under R and under RStudio.)
OPTIONAL: INSTALL R COMMANDER, A GUI FRONT-END FOR R
Once in R give the command
> install.packages("Rcmdr", dependencies = TRUE)
and then the command
> library(Rcmdr)
If you install R Commander then from now on initiate R Commander by
giving the command library(Rcmdr) once in RStudio.
INSTALL OTHER PACKAGES
Go to http://cran.r-project.org/web/views
to see lists of potentially useful packages organized for
convenience by task. Tasks include Econometrics, Finance,
SocialSciences and TimeSeries.
Install a package. For example, to install package np using
RStudio, open RStudio, go to the Install Packages window, search
for np, and click on np.
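The same installation can also be done from the R prompt. The following is a minimal sketch, assuming an internet connection and a reachable CRAN mirror; the helper name install_if_missing is hypothetical and not part of R.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)          # already installed, nothing to do
  }
  install.packages(pkg)    # downloads the package from CRAN
  TRUE
}

# install_if_missing("np")  # uncomment to install np if needed
# library(np)               # then load np for the current session
```

A package needs to be installed only once, but library() must be called in every new R session that uses it.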
> is the R prompt
<- is the R assignment operator (often can instead use = )
( , ) R function arguments are given in parentheses and are comma
separated
? is the help prefix, e.g. ?lm gives help on the lm function
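As a small illustration of these basics, the sketch below assigns a vector in two ways and calls functions with comma-separated arguments. The object names a, b, m and r are arbitrary and chosen only for this example.

```r
# Assignment: the arrow and the equals sign give the same result at the prompt.
a <- c(1, 2, 3)
b = c(1, 2, 3)
identical(a, b)                  # TRUE

# Note: <= (with an equals sign) is the less-than-or-equal comparison,
# not assignment.
a <= 2                           # TRUE TRUE FALSE

# Function arguments go in parentheses, separated by commas,
# and can be matched by name.
m <- mean(a)                     # 2
r <- round(3.14159, digits = 2)  # 3.14

# ?mean                          # opens the help page for mean
```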
First remove all variables from the workspace
> rm(list=ls())
Consider linear regression of y on x with five observations
(y,x)=(1,1),(2,2),(2,3),(2,4),(3,5).
To type in data use the c function (here > is the prompt from R
and is not typed).
> y = c(1,2,2,2,3)
> x = c(1,2,3,4,5)
To see the data (the [1] at the start of the output line is the
index of the first element printed on that line)
> y
[1] 1 2 2 2 3
> x
[1] 1 2 3 4 5
To see the mean of y
> mean(y)
[1] 2
To summarize y
> summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       2       2       2       2       3
To plot y against x (x on the horizontal axis, y on the vertical axis)
> plot(x,y)
.... output omitted ...
To OLS regress y on x and have coefficients reported.
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
        0.8          0.4
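To see where these coefficients come from, the slope and intercept can be computed directly from the textbook formulas for bivariate OLS: slope = cov(x,y)/var(x) and intercept = mean(y) - slope*mean(x). This is a sketch for illustration only; the names slope and intercept are chosen for this example.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)

# Bivariate OLS by hand
slope <- cov(x, y) / var(x)             # sample covariance over sample variance
intercept <- mean(y) - slope * mean(x)  # fitted line passes through the means

c(intercept = intercept, slope = slope) # approximately 0.8 and 0.4

# Compare with the coefficients reported by lm
coef(lm(y ~ x))
```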
To regress y on x and obtain more complete regression output, first
save the results in lm.cars and then use summary (which calls summary.lm for lm objects) to print out complete results.
> lm.cars <- lm(y~x)
> summary(lm.cars)
Call:
lm(formula = y ~ x)
Residuals:
         1          2          3          4          5
-2.000e-01  4.000e-01 -6.855e-17 -4.000e-01  2.000e-01
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.8000     0.3830   2.089   0.1279
x             0.4000     0.1155   3.464   0.0405 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.3651 on 3 degrees of freedom
Multiple R-squared: 0.8, Adjusted R-squared: 0.7333
F-statistic: 12 on 1 and 3 DF, p-value: 0.04052
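The main numbers in this output can be reproduced with a few lines of matrix algebra. The sketch below uses the usual formula s^2 (X'X)^{-1} for the default variance matrix; the object names fit, X, e, s2, V, se and r2 are chosen for this example. The third residual, -6.855e-17, is zero up to floating-point rounding.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)
fit <- lm(y ~ x)

X <- model.matrix(fit)              # 5 x 2 matrix: column of ones and x
e <- residuals(fit)                 # OLS residuals

s2 <- sum(e^2) / df.residual(fit)   # residual variance with 3 degrees of freedom
V <- s2 * solve(crossprod(X))       # default variance matrix s^2 (X'X)^{-1}
se <- sqrt(diag(V))                 # default standard errors

r2 <- 1 - sum(e^2) / sum((y - mean(y))^2)   # R-squared

sqrt(s2)       # residual standard error
round(se, 4)   # compare with the Std. Error column
r2             # compare with Multiple R-squared
```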
MORE REGRESSION
To obtain heteroskedasticity-consistent (sandwich) standard errors use
function vcovHC in package sandwich, which may first need to be
installed.
(Or can use coeftest in package lmtest.)
> library(sandwich)
Error in library(sandwich): there is no package called `sandwich'
> install.packages("sandwich")
.... output omitted
> library(sandwich)
> model <- lm(y ~x)
> vcovHC(model)
            (Intercept)           x
(Intercept)  0.28489796 -0.07959184
x           -0.07959184  0.02653061
Function vcovHC gives the variance matrix. The standard errors are the square roots
of the diagonal entries. These are 0.5338 and 0.1629, compared to the
earlier default standard errors of 0.3830 and 0.1155.
> sqrt(diag(vcovHC(model)))
(Intercept) x
0.5337583 0.1628822
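For readers who want to see what the sandwich estimator does, the same standard errors can be reproduced in base R. The default type in vcovHC is HC3, which weights each squared residual by 1/(1-h)^2, where h is the leverage of the observation. The sketch below is an illustration under that assumption, not the code used inside package sandwich; the object names bread, meat and V_hc3 are chosen for this example.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)
fit <- lm(y ~ x)

X <- model.matrix(fit)                # regressor matrix with intercept column
e <- residuals(fit)                   # OLS residuals
h <- hatvalues(fit)                   # leverage of each observation

bread <- solve(crossprod(X))          # (X'X)^{-1}
meat <- crossprod(X * (e / (1 - h)))  # X' diag(e^2 / (1 - h)^2) X
V_hc3 <- bread %*% meat %*% bread     # HC3 sandwich variance matrix

robust_se <- sqrt(diag(V_hc3))
robust_se                             # compare with sqrt(diag(vcovHC(model)))
```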
To produce the original OLS results (using default standard errors)
as a LaTeX table