**WHAT IS R?**

R is a free matrix programming language and software environment
that is widely used among statisticians for developing statistical
software and for data analysis. It is based on an earlier language, S,
that became the commercial product S-Plus; unlike S-Plus, R is free.
It runs on Windows, Mac and Linux. See http://en.wikipedia.org/wiki/R_(programming_language)

For regression analysis one needs

- Necessary: the latest **base version of R**. This includes some basic regression commands as part of the R stats package.
- Strongly recommended: **RStudio**, a front-end to R that is easier to work in than R itself.
- Perhaps useful for beginners: **R Commander**, a graphical user interface (GUI) for which you don't need to know R commands in advance. It shows the commands it produces, so you can learn them for future use, but it covers only a restricted set of commands.
- As needed: relevant user-written programs called packages that include additional commands (functions) that are not part of the base version of R.

INSTALL R

First you need to install R.

- Go to http://www.r-project.org and install the latest version of R (the base distribution).

INSTALL RSTUDIO

Go to http://www.rstudio.com and install the front-end RStudio. If you install RStudio, then from now on start RStudio rather than R.

(Note: R Commander runs both under R and under RStudio.)

OPTIONAL: INSTALL R COMMANDER

Once in R, give the command install.packages("Rcmdr", dependencies = TRUE) and then the command library(Rcmdr).

If you install R Commander, then from now on start it by giving the command library(Rcmdr) once you are in RStudio.

INSTALL OTHER PACKAGES

- Go to http://cran.r-project.org/web/views to see lists of potentially useful packages organized for convenience by task. Tasks include Econometrics, Finance, SocialSciences and TimeSeries.
- Install a package. For example, to install package np using
RStudio, open RStudio, choose Tools > Install Packages..., type np
in the Packages field, and click Install.
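The same installation can be done from the R prompt with install.packages(). The sketch below wraps it in a small helper that installs a package only if it is not already available; the helper name ensure_package is hypothetical and is not part of R.

```r
# Hypothetical helper: install a CRAN package only if it is missing,
# then report (TRUE/FALSE) whether it can now be loaded
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN, so needs internet access
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (commented out here because it downloads np from CRAN):
# ensure_package("np")
# library(np)
```

Installing is needed once per R installation; library() must be called in each new session before the package's functions can be used.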

The repository for R packages is CRAN (The Comprehensive R Archive Network).

Packages are only put on CRAN if they pass quality assurance checks, especially for stability. Packages that are also published in journals like

These can be accessed from http://cran.r-project.org/web/packages/ either by individual package or by Task.

Basic R documentation is at https://cran.r-project.org/manuals.html

See especially "An Introduction to R". https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf

R Commander documentation is installed in the Rcmdr directory

RStudio has useful documentation at http://www.rstudio.org/docs/ including http://www.rstudio.org/docs/help_with_r

A useful introduction for econometricians is Jeff Racine and Rob Hyndman (2002), "Using R to Teach Econometrics," Journal of Applied Econometrics, 17, 175-189.

Jeff Racine has useful general information on getting going in R (in addition to his own package np). See https://socialsciences.mcmaster.ca/racinej/Gallery/Home.html

The website http://www.statmethods.net/ is useful.

SOME R BASICS

> is the R prompt

<- is the R assignment operator (often = can be used instead)

( , ) R function arguments are given in parentheses and are comma separated

? is the help prefix e.g. ?lm gives help on the lm function
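A few lines typed at the prompt illustrate these basics. This is a minimal sketch; the object names z, w and s are hypothetical.

```r
# Assignment uses <- ; at the top level = usually works too
z <- c(1, 2, 3)
w = z * 2  # arithmetic on a vector acts element by element

# Function arguments go in parentheses and are separated by commas;
# they can be given by position or by name (here digits = 2)
s <- round(mean(w), digits = 2)
print(s)

# Help on a function: ?mean is shorthand for help("mean")
# ?mean
```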

For some sample programs see http://cameron.econ.ucdavis.edu/R/R.html

R EXAMPLE: LINEAR REGRESSION

First remove all variables from the workspace

> rm(list=ls())

Consider linear regression of y on x with five observations (y,x)=(1,1),(2,2),(2,3),(2,4),(3,5).

To type in data use the c function (here > is the prompt from R and is not typed).

> y = c(1,2,2,2,3)

> x = c(1,2,3,4,5)

> y

[1] 1 2 2 2 3

> x

[1] 1 2 3 4 5

> mean(y)

[1] 2

> summary(y)

Min. 1st Qu. Median Mean 3rd Qu. Max.

1 2 2 2 2 3
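The numbers reported by summary() can be reproduced with quantile() and mean(). A short sketch, assuming R's default quantile method (which summary() also uses by default):

```r
y <- c(1, 2, 2, 2, 3)

# Minimum, first quartile, median, third quartile and maximum
q <- quantile(y, probs = c(0, 0.25, 0.5, 0.75, 1))
m <- mean(y)
print(q)
print(m)
```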

> plot(x,y)

.... output omitted ...
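To see the fitted line together with the data, it can be added to a scatterplot that has x on the horizontal axis. A minimal sketch using abline(), which accepts a fitted lm object; the plot title and axis labels are illustrative choices.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)

# Scatterplot with x on the horizontal axis and y on the vertical axis
plot(x, y, main = "y against x", xlab = "x", ylab = "y")

# Draw the OLS line using the intercept and slope stored in the fit
fit <- lm(y ~ x)
abline(fit)
```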

> lm(y~x)

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

0.8 0.4
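The two coefficients can be extracted with coef() and checked against the closed-form formulas for regression on a single regressor, slope = cov(x, y)/var(x) and intercept = mean(y) - slope * mean(x). A short sketch:

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)
fit <- lm(y ~ x)

# Estimated coefficients as a named vector
b <- coef(fit)

# Closed-form OLS estimates for one regressor
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
print(b)
print(c(intercept, slope))
```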

> lm.cars <- lm(y~x)

> summary(lm.cars)

Call:

lm(formula = y ~ x)

Residuals:

1 2 3 4 5

-2.000e-01 4.000e-01 -6.855e-17 -4.000e-01 2.000e-01

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.8000 0.3830 2.089 0.1279

x 0.4000 0.1155 3.464 0.0405 *

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3651 on 3 degrees of freedom

Multiple R-squared: 0.8, Adjusted R-squared: 0.7333

F-statistic: 12 on 1 and 3 DF, p-value: 0.04052
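The printed summary is also an object whose parts can be extracted for later use. The sketch below pulls out the coefficient table, R-squared and the residual standard error, and recomputes the last two from the residuals as a check; the names s, ctab, rss and tss are hypothetical.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)
lm.cars <- lm(y ~ x)
s <- summary(lm.cars)

# Coefficient table: estimates, standard errors, t values, p-values
ctab <- coef(s)
se <- ctab[, "Std. Error"]

# Fit statistics stored in the summary object
r2 <- s$r.squared
sigma_hat <- s$sigma

# The same quantities computed directly from the residuals
rss <- sum(residuals(lm.cars)^2)
tss <- sum((y - mean(y))^2)
r2_manual <- 1 - rss / tss
sigma_manual <- sqrt(rss / df.residual(lm.cars))
print(ctab)
```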

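To obtain heteroskedasticity-consistent (sandwich) standard errors, use function vcovHC in package sandwich, which may first need to be installed.

(Alternatively, use function coeftest in package lmtest, which can combine the regression estimates with the vcovHC covariance matrix to give t tests.)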

> library(sandwich)

Error in library(sandwich) : there is no package called ‘sandwich’

> install.packages("sandwich")

.... output omitted

> library(sandwich)

> model <- lm(y ~x)

> vcovHC(model)

(Intercept) x

(Intercept) 0.28489796 -0.07959184

x -0.07959184 0.02653061

> sqrt(diag(vcovHC(model)))

(Intercept) x

0.5337583 0.1628822
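By default vcovHC() computes the HC3 variant. As a check on what it computes, the same matrix can be built in base R from (X'X)^{-1} X' diag(e_i^2/(1-h_i)^2) X (X'X)^{-1}, where e_i are the OLS residuals and h_i the leverages. This is a minimal sketch for this small example only; in practice use vcovHC(). The names bread and meat follow the usual sandwich terminology and are otherwise arbitrary.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)
model <- lm(y ~ x)

X <- model.matrix(model)      # n x 2 matrix: intercept column and x
e <- residuals(model)         # OLS residuals e_i
h <- hatvalues(model)         # leverages h_i (diagonal of the hat matrix)

bread <- solve(crossprod(X))  # (X'X)^{-1}

# HC3 "meat": X' diag(e_i^2 / (1 - h_i)^2) X, built by scaling row i of X
# by e_i / (1 - h_i) and then taking the cross-product
meat <- crossprod(X * (e / (1 - h)))

V_hc3 <- bread %*% meat %*% bread
se_hc3 <- sqrt(diag(V_hc3))
print(V_hc3)
print(se_hc3)
```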

To produce a LaTeX table of the regression results, use function xtable in package xtable.

> install.packages("xtable")

> library(xtable)

> xtable(model)

To do the regression manually using matrix commands, first create the matrix X (including a column of ones for the intercept) and the vector y, and then form the inverse of X'X times X'y.
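Worked by hand for the five observations above (sum of x = 15, sum of x squared = 55, sum of y = 10, sum of xy = 34), the calculation reproduces the lm() estimates:

```latex
X'X = \begin{pmatrix} 5 & 15 \\ 15 & 55 \end{pmatrix}, \quad
X'y = \begin{pmatrix} 10 \\ 34 \end{pmatrix}, \quad
(X'X)^{-1} = \frac{1}{50}\begin{pmatrix} 55 & -15 \\ -15 & 5 \end{pmatrix}

\hat{\beta} = (X'X)^{-1} X'y
  = \frac{1}{50}\begin{pmatrix} 55 \cdot 10 - 15 \cdot 34 \\ -15 \cdot 10 + 5 \cdot 34 \end{pmatrix}
  = \frac{1}{50}\begin{pmatrix} 40 \\ 20 \end{pmatrix}
  = \begin{pmatrix} 0.8 \\ 0.4 \end{pmatrix}
```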

> x

[1] 1 2 3 4 5

> X <- cbind(1,x)

> X

x

[1,] 1 1

[2,] 1 2

[3,] 1 3

[4,] 1 4

[5,] 1 5

> bhat <- solve(t(X)%*%X)%*%t(X)%*%y

> bhat

[,1]

0.8

x 0.4
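Explicitly inverting X'X is fine for this tiny example, but it is usually cleaner and numerically safer to solve the normal equations (X'X)b = X'y directly. A sketch using base R's crossprod() and the two-argument form of solve(); the names XtX, Xty and bhat2 are hypothetical.

```r
y <- c(1, 2, 2, 2, 3)
x <- c(1, 2, 3, 4, 5)
X <- cbind(1, x)

# crossprod(X) is t(X) %*% X; crossprod(X, y) is t(X) %*% y
XtX <- crossprod(X)
Xty <- crossprod(X, y)

# Solve the normal equations without forming the inverse
bhat2 <- solve(XtX, Xty)
print(bhat2)

# Compare with the lm() estimates
fit <- lm(y ~ x)
print(coef(fit))
```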

READ IN A COMMA SEPARATED DATASET

To read in a comma-separated values (CSV) file, use function read.csv.

> mydata <- read.csv("http://cameron.econ.ucdavis.edu/excel/carsdata.csv")

> summary(mydata)

CARS HH.SIZE

Min. :1 Min. :1

1st Qu.:2 1st Qu.:2

Median :2 Median :3

Mean :2 Mean :3

3rd Qu.:2 3rd Qu.:4

Max. :3 Max. :5
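Once a file is read in, variables are referred to by column name, and the data frame is passed to lm() through its data argument. The sketch below avoids the download by reading an inline CSV through the text argument of read.csv(); the five rows are hypothetical data (the (y, x) pairs from the earlier example) with the same column names as carsdata.csv, not a copy of that file.

```r
# Hypothetical inline data with the same column names as carsdata.csv
csv_text <- "CARS,HH.SIZE
1,1
2,2
2,3
2,4
3,5"

# The text argument makes read.csv() parse a string instead of a file or URL
mydata <- read.csv(text = csv_text)
str(mydata)

# Refer to variables by column name and pass the data frame via data =
fit <- lm(CARS ~ HH.SIZE, data = mydata)
coef(fit)
```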


READ IN A STATA DATASET (The file cannot be from too recent a Stata version; version 12 or earlier is okay.)

> install.packages("foreign")

> library(foreign)

> mydata2 <- read.dta("http://cameron.econ.ucdavis.edu/stata/carsdata.dta")

> summary(mydata2)
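read.dta() works equally well on a local file path. A minimal round-trip sketch using write.dta(), also in package foreign, and a temporary file; the data frame and its variable names are hypothetical (simple names without "." are used because Stata variable names cannot contain a period).

```r
library(foreign)

# Hypothetical data frame to write out in Stata format
df <- data.frame(cars = c(1, 2, 2, 2, 3), hhsize = c(1, 2, 3, 4, 5))

# Write a temporary .dta file, then read it back with read.dta()
path <- tempfile(fileext = ".dta")
write.dta(df, path)
mydata3 <- read.dta(path)
summary(mydata3)
```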