A. Colin Cameron, Dept. of Economics, Univ. of Calif. -
This March 2016 help sheet gives information on installing and using
WHAT IS R?
R is a free matrix programming language and software environment
that is widely used among statisticians for developing statistical
software and data analysis. It is based on an earlier language S
that became the commercial product S-Plus. It runs on Windows, Mac
and Linux. Unlike S-Plus, R is free. See http://en.wikipedia.org/wiki/R_(programming_language)
For regression analysis one needs
Necessary: The latest base version of R. This includes
some basic regression commands as part of the R Stats package.
Strongly Recommended: RStudio, which is a simpler
front-end to R.
Perhaps useful for beginners: R Commander, which is a GUI
for which you don't need to know R commands in advance.
(It shows the commands produced, so you learn them for future use, but it
only covers a restricted number of commands.)
As Needed: Relevant user-written programs called packages that
include additional commands (functions) that are not part of the
base version of R.
> is the R prompt
<- is the R assignment ("equality") operator (often can
instead use = )
( , ) R function arguments are given in parentheses and are comma
separated
? is the help prefix e.g. ?lm gives help on the lm function
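A short sketch of these syntax elements in use. The variable names a and b are illustrative assumptions and do not appear elsewhere in this sheet.

```r
# Assignment with <- and with = (equivalent at the top level)
a <- c(10, 20, 30)
b = c(10, 20, 30)
identical(a, b)              # checks that both assignments gave the same vector

# Function arguments go in parentheses and are separated by commas
round(mean(a), digits = 1)

# Help on a function (opens the help page in an interactive session)
# ?lm
```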
First remove all variables from the workspace
> rm(list=ls())
Consider linear regression of y on x with five observations
To type in data use the c function (here > is the prompt from R
and is not typed).
> y = c(1,2,2,2,3)
> x = c(1,2,3,4,5)
To see the data (the [1] is because y is a column vector that is a
5x1 vector and the listing starts at the first element of this vector -
here shown as a row)
> y
[1] 1 2 2 2 3
> x
[1] 1 2 3 4 5
To see the mean of y
> mean(y)
[1] 2
To summarize y
> summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       2       2       2       2       3
To plot y against x
> plot(x,y)
.... output omitted ...
To OLS regress y on x and have coefficients reported.
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
        0.8          0.4
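The coefficients from lm can be checked by hand. The sketch below uses the textbook formulas for simple regression (slope = sample covariance over sample variance, intercept chosen so the line passes through the means); the names slope and intercept are illustrative. For these five observations it gives an intercept of 0.8 and a slope of 0.4.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)

# OLS slope = sample covariance of x and y / sample variance of x
slope <- cov(x, y) / var(x)
# OLS intercept makes the fitted line pass through (mean(x), mean(y))
intercept <- mean(y) - slope * mean(x)

c(intercept = intercept, slope = slope)
coef(lm(y ~ x))   # should agree with the hand computation
```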
To regress y on x and obtain more complete regression output, first
save the results in lm.cars and then use summary.lm to print out complete results.
> lm.cars <- lm(y~x)
> summary.lm(lm.cars)
Call:
lm(formula = y ~ x)
Residuals:
         1          2          3          4          5
-2.000e-01  4.000e-01 -6.855e-17 -4.000e-01  2.000e-01
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.8000     0.3830   2.089   0.1279
x             0.4000     0.1155   3.464   0.0405 *
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3651 on 3 degrees of freedom
Multiple R-squared: 0.8, Adjusted R-squared: 0.7333
F-statistic: 12 on 1 and 3 DF, p-value: 0.0405
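Individual pieces of the regression output can also be pulled out of the saved results rather than read off the printout. A minimal sketch using base R accessors; the object name s is an illustrative assumption.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)
lm.cars <- lm(y ~ x)
s <- summary(lm.cars)   # summary() dispatches to summary.lm for lm objects

coef(s)                 # matrix with Estimate, Std. Error, t value, Pr(>|t|)
s$sigma                 # residual standard error
s$r.squared             # R-squared
s$adj.r.squared         # adjusted R-squared
residuals(lm.cars)      # the five residuals
```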
To obtain heteroskedastic-consistent (sandwich) standard errors use
function vcovHC in package sandwich that may first need to be
installed.
(Or can use coeftest in package lmtest)
> library(sandwich)
Error in library(sandwich): there is no package called 'sandwich'
> install.packages("sandwich")
.... output omitted
> library(sandwich)
> model <- lm(y ~ x)
> vcovHC(model)
            (Intercept)           x
(Intercept)  0.28489796 -0.07959184
x           -0.07959184  0.02653061
Function vcovHC gave the variance matrix. We need to get the standard errors, the square root
of the diagonal entries. These are 0.5337 and 0.1628 compared to
earlier default standard errors of 0.3830 and 0.1155.
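To see what lies behind these numbers, the robust standard errors can be reproduced in base R without the sandwich package. The sketch below assumes the HC3 form of the estimator, which is the default type for vcovHC; the names X, e, h, bread, meat, vcov_hc3 and robust_se are illustrative.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)
model <- lm(y ~ x)

X <- model.matrix(model)       # 5 x 2 design matrix (intercept and x)
e <- residuals(model)          # OLS residuals
h <- hatvalues(model)          # leverage of each observation

bread <- solve(crossprod(X))   # (X'X)^(-1)
# HC3 "meat": X' diag(e^2 / (1 - h)^2) X, built by scaling each row of X
meat <- crossprod(X * (e / (1 - h)))
vcov_hc3 <- bread %*% meat %*% bread

# Standard errors are the square roots of the diagonal entries
robust_se <- sqrt(diag(vcov_hc3))
print(robust_se)
```

With package sandwich loaded, sqrt(diag(vcovHC(model))) should give the same two values.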
To produce the original OLS results (using default standard errors)
as a LaTeX table