Title: | Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics |
---|---|
Description: | Instrumental variable estimation for linear models by two-stage least-squares (2SLS) regression or by robust-regression via M-estimation (2SM) or MM-estimation (2SMM). The main ivreg() model-fitting function is designed to provide a workflow as similar as possible to standard lm() regression. A wide range of methods is provided for fitted ivreg model objects, including extensive functionality for computing and graphing regression diagnostics in addition to other standard model tools. |
Authors: | John Fox [aut] , Christian Kleiber [aut] , Achim Zeileis [aut, cre] , Nikolas Kuschnig [ctb] , R Core Team [ctb] |
Maintainer: | Achim Zeileis <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.6-4 |
Built: | 2025-01-17 05:24:09 UTC |
Source: | https://github.com/zeileis/ivreg |
Determinants of cigarette demand for the 48 continental US States in 1995 and compared between 1995 and 1985.
data("CigaretteDemand", package = "ivreg")
data("CigaretteDemand", package = "ivreg")
A data frame with 48 rows and 10 columns.
Number of cigarette packs per capita sold in 1995.
Real price in 1995 (including sales tax).
Real per capita income in 1995.
Sales tax in 1995.
Cigarette-specific taxes (federal and average local excise taxes) in 1995.
Difference in log(packs)
(between 1995 and 1985).
Difference in log(rprice)
(between 1995 and 1985).
Difference in log(rincome)
(between 1995 and 1985).
Difference in salestax
(between 1995 and 1985).
Difference in cigtax
(between 1995 and 1985).
The data are taken from the online complements to Stock and Watson (2007) and
had been prepared as panel data (in long form) in CigarettesSW
from the AER package (Kleiber and Zeileis 2008). Here, the data are provided by
state (in wide form), readily preprocessed to contain all variables needed for
illustrations of OLS and IV regressions. More related examples from Stock and
Watson (2007) are provided in the AER package in StockWatson2007
.
A detailed discussion of the various cigarette demand examples with R code
is provided by Hanck et al. (2020, Chapter 12).
Online complements to Stock and Watson (2007).
Hanck, C., Arnold, M., Gerber, A., and Schmelzer, M. (2020). Introduction to Econometrics with R. https://www.econometrics-with-r.org/
Kleiber, C. and Zeileis, A. (2008). Applied Econometrics with R. Springer-Verlag
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed., Addison Wesley.
## load data data("CigaretteDemand", package = "ivreg") ## basic price elasticity: OLS vs. IV cig_ols <- lm(log(packs) ~ log(rprice), data = CigaretteDemand) cig_iv <- ivreg(log(packs) ~ log(rprice) | salestax, data = CigaretteDemand) cbind(OLS = coef(cig_ols), IV = coef(cig_iv)) ## adjusting for income differences (exogenous) cig_iv2 <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome), data = CigaretteDemand) ## adding a second instrument for log(rprice) cig_iv3 <- update(cig_iv2, . ~ . | . + cigtax) ## comparison using heteroscedasticity-consistent standard errors library("lmtest") library("sandwich") coeftest(cig_iv2, vcov = vcovHC, type = "HC1") coeftest(cig_iv3, vcov = vcovHC, type = "HC1") ## long-run price elasticity using differences between 1995 and 1985 cig_ivdiff1 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff + salestaxdiff, data = CigaretteDemand) cig_ivdiff2 <- update(cig_ivdiff1, . ~ . | . - salestaxdiff + cigtaxdiff) cig_ivdiff3 <- update(cig_ivdiff1, . ~ . | . + cigtaxdiff) coeftest(cig_ivdiff1, vcov = vcovHC, type = "HC1") coeftest(cig_ivdiff2, vcov = vcovHC, type = "HC1") coeftest(cig_ivdiff3, vcov = vcovHC, type = "HC1")
## load data data("CigaretteDemand", package = "ivreg") ## basic price elasticity: OLS vs. IV cig_ols <- lm(log(packs) ~ log(rprice), data = CigaretteDemand) cig_iv <- ivreg(log(packs) ~ log(rprice) | salestax, data = CigaretteDemand) cbind(OLS = coef(cig_ols), IV = coef(cig_iv)) ## adjusting for income differences (exogenous) cig_iv2 <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome), data = CigaretteDemand) ## adding a second instrument for log(rprice) cig_iv3 <- update(cig_iv2, . ~ . | . + cigtax) ## comparison using heteroscedasticity-consistent standard errors library("lmtest") library("sandwich") coeftest(cig_iv2, vcov = vcovHC, type = "HC1") coeftest(cig_iv3, vcov = vcovHC, type = "HC1") ## long-run price elasticity using differences between 1995 and 1985 cig_ivdiff1 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff + salestaxdiff, data = CigaretteDemand) cig_ivdiff2 <- update(cig_ivdiff1, . ~ . | . - salestaxdiff + cigtaxdiff) cig_ivdiff3 <- update(cig_ivdiff1, . ~ . | . + cigtaxdiff) coeftest(cig_ivdiff1, vcov = vcovHC, type = "HC1") coeftest(cig_ivdiff2, vcov = vcovHC, type = "HC1") coeftest(cig_ivdiff3, vcov = vcovHC, type = "HC1")
"ivreg"
ObjectsVarious methods for processing "ivreg"
objects; for diagnostic methods,
see ivregDiagnostics
.
## S3 method for class 'ivreg' coef(object, component = c("stage2", "stage1"), complete = TRUE, ...) ## S3 method for class 'ivreg' vcov(object, component = c("stage2", "stage1"), complete = TRUE, ...) ## S3 method for class 'ivreg' confint( object, parm, level = 0.95, component = c("stage2", "stage1"), complete = TRUE, vcov. = NULL, df = NULL, ... ) ## S3 method for class 'ivreg' bread(x, ...) ## S3 method for class 'ivreg' estfun(x, ...) ## S3 method for class 'ivreg' vcovHC(x, ...) ## S3 method for class 'ivreg' terms(x, component = c("regressors", "instruments", "full"), ...) ## S3 method for class 'ivreg' model.matrix( object, component = c("regressors", "projected", "instruments"), ... ) ## S3 method for class 'ivreg_projected' model.matrix(object, ...) ## S3 method for class 'ivreg' predict( object, newdata, type = c("response", "terms"), na.action = na.pass, se.fit = FALSE, interval = c("none", "confidence", "prediction"), df = Inf, level = 0.95, weights, ... ) ## S3 method for class 'ivreg' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'ivreg' summary(object, vcov. = NULL, df = NULL, diagnostics = NULL, ...) ## S3 method for class 'summary.ivreg' print( x, digits = max(3, getOption("digits") - 3), signif.stars = getOption("show.signif.stars"), ... ) ## S3 method for class 'ivreg' anova(object, object2, test = "F", vcov. = NULL, ...) ## S3 method for class 'ivreg' update(object, formula., ..., evaluate = TRUE) ## S3 method for class 'ivreg' residuals( object, type = c("response", "projected", "regressors", "working", "deviance", "pearson", "partial", "stage1"), ... ) ## S3 method for class 'ivreg' Effect(focal.predictors, mod, ...) ## S3 method for class 'ivreg' formula(x, component = c("complete", "regressors", "instruments"), ...) ## S3 method for class 'ivreg' find_formula(x, ...) ## S3 method for class 'ivreg' Anova(mod, test.statistic = c("F", "Chisq"), ...) ## S3 method for class 'ivreg' linearHypothesis( model, hypothesis.matrix, rhs = NULL, test = c("F", "Chisq"), ... ) ## S3 method for class 'ivreg' alias(object, ...) ## S3 method for class 'ivreg' qr(x, ...) ## S3 method for class 'ivreg' weights(object, type = c("variance", "robustness"), ...)
## S3 method for class 'ivreg' coef(object, component = c("stage2", "stage1"), complete = TRUE, ...) ## S3 method for class 'ivreg' vcov(object, component = c("stage2", "stage1"), complete = TRUE, ...) ## S3 method for class 'ivreg' confint( object, parm, level = 0.95, component = c("stage2", "stage1"), complete = TRUE, vcov. = NULL, df = NULL, ... ) ## S3 method for class 'ivreg' bread(x, ...) ## S3 method for class 'ivreg' estfun(x, ...) ## S3 method for class 'ivreg' vcovHC(x, ...) ## S3 method for class 'ivreg' terms(x, component = c("regressors", "instruments", "full"), ...) ## S3 method for class 'ivreg' model.matrix( object, component = c("regressors", "projected", "instruments"), ... ) ## S3 method for class 'ivreg_projected' model.matrix(object, ...) ## S3 method for class 'ivreg' predict( object, newdata, type = c("response", "terms"), na.action = na.pass, se.fit = FALSE, interval = c("none", "confidence", "prediction"), df = Inf, level = 0.95, weights, ... ) ## S3 method for class 'ivreg' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'ivreg' summary(object, vcov. = NULL, df = NULL, diagnostics = NULL, ...) ## S3 method for class 'summary.ivreg' print( x, digits = max(3, getOption("digits") - 3), signif.stars = getOption("show.signif.stars"), ... ) ## S3 method for class 'ivreg' anova(object, object2, test = "F", vcov. = NULL, ...) ## S3 method for class 'ivreg' update(object, formula., ..., evaluate = TRUE) ## S3 method for class 'ivreg' residuals( object, type = c("response", "projected", "regressors", "working", "deviance", "pearson", "partial", "stage1"), ... ) ## S3 method for class 'ivreg' Effect(focal.predictors, mod, ...) ## S3 method for class 'ivreg' formula(x, component = c("complete", "regressors", "instruments"), ...) ## S3 method for class 'ivreg' find_formula(x, ...) ## S3 method for class 'ivreg' Anova(mod, test.statistic = c("F", "Chisq"), ...) ## S3 method for class 'ivreg' linearHypothesis( model, hypothesis.matrix, rhs = NULL, test = c("F", "Chisq"), ... ) ## S3 method for class 'ivreg' alias(object, ...) ## S3 method for class 'ivreg' qr(x, ...) ## S3 method for class 'ivreg' weights(object, type = c("variance", "robustness"), ...)
object , object2 , model , mod
|
An object of class |
component |
For |
complete |
If |
... |
arguments to pass down. |
parm |
parameters for which confidence intervals are to be computed; a vector or numbers or names; the default is all parameters. |
level |
for confidence or prediction intervals, default |
vcov. |
Optional coefficient covariance matrix, or a function to compute the covariance matrix, to use in computing the model summary. |
df |
For |
x |
An object of class |
newdata |
Values of predictors for which to obtain predicted values; if missing predicted (i.e., fitted) values are computed for the data to which the model was fit. |
type |
For |
na.action |
|
se.fit |
Compute standard errors of predicted values (default |
interval |
Type of interval to compute for predicted values: |
weights |
Either a numeric vector or a one-sided formula to provide weights for prediction
intervals when the fit is weighted. If |
digits |
For printing. |
diagnostics |
Report 2SLS "diagnostic" tests in model summary (default is |
signif.stars |
Show "significance stars" in summary output. |
test , test.statistic
|
Test statistics for ANOVA table computed by |
formula. |
To update model. |
evaluate |
If |
focal.predictors |
Focal predictors for effect plot, see |
hypothesis.matrix , rhs
|
For formulating a linear hypothesis; see the documentation
for |
ivreg
, ivreg.fit
, ivregDiagnostics
"ivreg"
ObjectsMethods for computing deletion and other regression diagnostics for 2SLS regression.
It's generally more efficient to compute the deletion diagnostics via the influence
method and then to extract the various specific diagnostics with the methods for
"influence.ivreg"
objects. Other diagnostics for linear models, such as
added-variable plots (avPlots
) and component-plus-residual
plots (crPlots
), also work, as do effect plots
(e.g., predictorEffects
) with residuals (see the examples below).
The pointwise confidence envelope for the qqPlot
method assumes an independent random sample
from the t distribution with degrees of freedom equal to the residual degrees of
freedom for the model and so are approximate, because the studentized residuals aren't
independent.
For additional information, see the vignette Diagnostics for 2SLS Regression.
## S3 method for class 'ivreg' influence( model, sigma. = n <= 1000, type = c("stage2", "both", "maximum"), applyfun = NULL, ncores = NULL, ... ) ## S3 method for class 'ivreg' rstudent(model, ...) ## S3 method for class 'ivreg' cooks.distance(model, ...) ## S3 method for class 'influence.ivreg' dfbeta(model, ...) ## S3 method for class 'ivreg' dfbeta(model, ...) ## S3 method for class 'ivreg' hatvalues(model, type = c("stage2", "both", "maximum", "stage1"), ...) ## S3 method for class 'influence.ivreg' rstudent(model, ...) ## S3 method for class 'influence.ivreg' hatvalues(model, ...) ## S3 method for class 'influence.ivreg' cooks.distance(model, ...) ## S3 method for class 'influence.ivreg' qqPlot( x, ylab = paste("Studentized Residuals(", deparse(substitute(x)), ")", sep = ""), distribution = c("t", "norm"), ... ) ## S3 method for class 'ivreg' influencePlot(model, ...) ## S3 method for class 'influence.ivreg' influencePlot(model, ...) ## S3 method for class 'ivreg' infIndexPlot(model, ...) ## S3 method for class 'influence.ivreg' infIndexPlot(model, ...) ## S3 method for class 'influence.ivreg' model.matrix(object, ...) ## S3 method for class 'ivreg' avPlots(model, terms, ...) ## S3 method for class 'ivreg' avPlot(model, ...) ## S3 method for class 'ivreg' mcPlots(model, terms, ...) ## S3 method for class 'ivreg' mcPlot(model, ...) ## S3 method for class 'ivreg' Boot( object, f = coef, labels = names(f(object)), R = 999, method = "case", ncores = 1, ... ) ## S3 method for class 'ivreg' crPlots(model, terms, ...) ## S3 method for class 'ivreg' crPlot(model, ...) ## S3 method for class 'ivreg' ceresPlots(model, terms, ...) ## S3 method for class 'ivreg' ceresPlot(model, ...) ## S3 method for class 'ivreg' plot(x, ...) ## S3 method for class 'ivreg' qqPlot(x, distribution = c("t", "norm"), ...) ## S3 method for class 'ivreg' outlierTest(model, ...) ## S3 method for class 'ivreg' spreadLevelPlot(x, main = "Spread-Level Plot", ...) ## S3 method for class 'ivreg' ncvTest(model, ...) ## S3 method for class 'ivreg' deviance(object, ...) ## S3 method for class 'rivreg' influence(model, ...)
## S3 method for class 'ivreg' influence( model, sigma. = n <= 1000, type = c("stage2", "both", "maximum"), applyfun = NULL, ncores = NULL, ... ) ## S3 method for class 'ivreg' rstudent(model, ...) ## S3 method for class 'ivreg' cooks.distance(model, ...) ## S3 method for class 'influence.ivreg' dfbeta(model, ...) ## S3 method for class 'ivreg' dfbeta(model, ...) ## S3 method for class 'ivreg' hatvalues(model, type = c("stage2", "both", "maximum", "stage1"), ...) ## S3 method for class 'influence.ivreg' rstudent(model, ...) ## S3 method for class 'influence.ivreg' hatvalues(model, ...) ## S3 method for class 'influence.ivreg' cooks.distance(model, ...) ## S3 method for class 'influence.ivreg' qqPlot( x, ylab = paste("Studentized Residuals(", deparse(substitute(x)), ")", sep = ""), distribution = c("t", "norm"), ... ) ## S3 method for class 'ivreg' influencePlot(model, ...) ## S3 method for class 'influence.ivreg' influencePlot(model, ...) ## S3 method for class 'ivreg' infIndexPlot(model, ...) ## S3 method for class 'influence.ivreg' infIndexPlot(model, ...) ## S3 method for class 'influence.ivreg' model.matrix(object, ...) ## S3 method for class 'ivreg' avPlots(model, terms, ...) ## S3 method for class 'ivreg' avPlot(model, ...) ## S3 method for class 'ivreg' mcPlots(model, terms, ...) ## S3 method for class 'ivreg' mcPlot(model, ...) ## S3 method for class 'ivreg' Boot( object, f = coef, labels = names(f(object)), R = 999, method = "case", ncores = 1, ... ) ## S3 method for class 'ivreg' crPlots(model, terms, ...) ## S3 method for class 'ivreg' crPlot(model, ...) ## S3 method for class 'ivreg' ceresPlots(model, terms, ...) ## S3 method for class 'ivreg' ceresPlot(model, ...) ## S3 method for class 'ivreg' plot(x, ...) ## S3 method for class 'ivreg' qqPlot(x, distribution = c("t", "norm"), ...) ## S3 method for class 'ivreg' outlierTest(model, ...) ## S3 method for class 'ivreg' spreadLevelPlot(x, main = "Spread-Level Plot", ...) ## S3 method for class 'ivreg' ncvTest(model, ...) ## S3 method for class 'ivreg' deviance(object, ...) ## S3 method for class 'rivreg' influence(model, ...)
model , x , object
|
A |
sigma. |
If |
type |
If |
applyfun |
Optional loop replacement function that should work like
|
ncores |
Numeric, number of cores to be used in parallel computations. If set
to an integer the
|
... |
arguments to be passed down. |
ylab |
The vertical axis label. |
distribution |
|
terms |
Terms for which added-variable plots are to be constructed; the default,
if the argument isn't specified, is the |
f , labels , R
|
see |
method |
only |
main |
Main title for the graph. |
In the case of influence.ivreg
, an object of class "influence.ivreg"
with the following components:
coefficients
the estimated regression coefficients
model
the model matrix
dfbeta
influence on coefficients
sigma
deleted values of the residual standard deviation
dffits
overall influence on the regression coefficients
cookd
Cook's distances
hatvalues
hatvalues
rstudent
Studentized residuals
df.residual
residual degrees of freedom
In the case of other methods, such as rstudent.ivreg
or
rstudent.influence.ivreg
, the corresponding diagnostic statistics.
Many other methods (e.g., crPlot.ivreg
, avPlot.ivreg
, Effect.ivreg
)
draw graphs.
ivreg
, avPlots
,
crPlots
, predictorEffects
,
qqPlot
, influencePlot
,
infIndexPlot
, Boot
,
outlierTest
, spreadLevelPlot
,
ncvTest
.
kmenta.eq1 <- ivreg(Q ~ P + D | D + F + A, data = Kmenta) summary(kmenta.eq1) car::avPlots(kmenta.eq1) car::mcPlots(kmenta.eq1) car::crPlots(kmenta.eq1) car::ceresPlots(kmenta.eq1) car::influencePlot(kmenta.eq1) car::influenceIndexPlot(kmenta.eq1) car::qqPlot(kmenta.eq1) car::spreadLevelPlot(kmenta.eq1) plot(effects::predictorEffects(kmenta.eq1, residuals = TRUE)) set.seed <- 12321 # for reproducibility confint(car::Boot(kmenta.eq1, R = 250)) # 250 reps for brevity car::outlierTest(kmenta.eq1) car::ncvTest(kmenta.eq1)
kmenta.eq1 <- ivreg(Q ~ P + D | D + F + A, data = Kmenta) summary(kmenta.eq1) car::avPlots(kmenta.eq1) car::mcPlots(kmenta.eq1) car::crPlots(kmenta.eq1) car::ceresPlots(kmenta.eq1) car::influencePlot(kmenta.eq1) car::influenceIndexPlot(kmenta.eq1) car::qqPlot(kmenta.eq1) car::spreadLevelPlot(kmenta.eq1) plot(effects::predictorEffects(kmenta.eq1, residuals = TRUE)) set.seed <- 12321 # for reproducibility confint(car::Boot(kmenta.eq1, R = 250)) # 250 reps for brevity car::outlierTest(kmenta.eq1) car::ncvTest(kmenta.eq1)
Fit instrumental-variable regression by two-stage least squares (2SLS). This is equivalent to direct instrumental-variables estimation when the number of instruments is equal to the number of regressors. Alternative robust-regression estimators are also provided, based on M-estimation (2SM) and MM-estimation (2SMM).
ivreg( formula, instruments, data, subset, na.action, weights, offset, contrasts = NULL, model = TRUE, y = TRUE, x = FALSE, method = c("OLS", "M", "MM"), ... )
ivreg( formula, instruments, data, subset, na.action, weights, offset, contrasts = NULL, model = TRUE, y = TRUE, x = FALSE, method = c("OLS", "M", "MM"), ... )
formula , instruments
|
formula specification(s) of the regression
relationship and the instruments. Either |
data |
an optional data frame containing the variables in the model.
By default the variables are taken from the environment of the
|
subset |
an optional vector specifying a subset of observations to be used in fitting the model. |
na.action |
a function that indicates what should happen when the data
contain |
weights |
an optional vector of weights to be used in the fitting process. |
offset |
an optional offset that can be used to specify an a priori known component to be included during fitting. |
contrasts |
an optional list. See the |
model , x , y
|
logicals. If |
method |
the method used to fit the stage 1 and 2 regression:
|
... |
further arguments passed to |
ivreg
is the high-level interface to the work-horse function
ivreg.fit
. A set of standard methods (including print
,
summary
, vcov
, anova
, predict
, residuals
,
terms
, model.matrix
, bread
, estfun
) is available
and described in ivregMethods
. For methods related to regression
diagnostics, see ivregDiagnostics
.
Regressors and instruments for ivreg
are most easily specified in a
formula with two parts on the right-hand side, e.g., y ~ x1 + x2 | z1
+ z2 + z3
, where x1
and x2
are the explanatory variables and z1
,
z2
, and z3
are the instrumental variables. Note that exogenous regressors
have to be included as instruments for themselves.
For example, if there is
one exogenous regressor ex
and one endogenous regressor en
with instrument in
, the appropriate formula would be y ~ en +
ex | in + ex
. Alternatively, a formula with three parts on the right-hand
side can also be used: y ~ ex | en | in
. The latter is typically more convenient, if
there is a large number of exogenous regressors.
Moreover, two further equivalent specification strategies are possible that are
typically less convenient compared to the strategies above. One option is to use
an update formula with a .
in the second part of the formula is used:
y ~ en + ex | . - en + in
. Another option is to use a separate formula
for the instruments (only for backward compatibility with earlier versions):
formula = y ~ en + ex, instruments = ~ in + ex
.
Internally, all specifications are converted to the version with two parts on the right-hand side.
ivreg
returns an object of class "ivreg"
that inherits from
class "lm"
, with the following components:
coefficients |
parameter estimates, from the stage-2 regression. |
residuals |
vector of model residuals. |
residuals1 |
matrix of residuals from the stage-1 regression. |
residuals2 |
vector of residuals from the stage-2 regression. |
fitted.values |
vector of predicted means for the response. |
weights |
either the vector of weights used (if any) or |
offset |
either the offset used (if any) or |
estfun |
a matrix containing the empirical estimating functions. |
n |
number of observations. |
nobs |
number of observations with non-zero weights. |
p |
number of columns in the model matrix x of regressors. |
q |
number of columns in the instrumental variables model matrix z |
rank |
numeric rank of the model matrix for the stage-2 regression. |
df.residual |
residual degrees of freedom for fitted model. |
cov.unscaled |
unscaled covariance matrix for the coefficients. |
sigma |
residual standard deviation. |
qr |
QR decomposition for the stage-2 regression. |
qr1 |
QR decomposition for the stage-1 regression. |
rank1 |
numeric rank of the model matrix for the stage-1 regression. |
coefficients1 |
matrix of coefficients from the stage-1 regression. |
df.residual1 |
residual degrees of freedom for the stage-1 regression. |
exogenous |
columns of the |
endogenous |
columns of the |
instruments |
columns of the |
method |
the method used for the stage 1 and 2 regressions, one of |
rweights |
a matrix of robustness weights with columns for each of the stage-1
regressions and for the stage-2 regression (in the last column) if the fitting method is
|
hatvalues |
a matrix of hatvalues. For |
df.residual |
residual degrees of freedom for fitted model. |
call |
the original function call. |
formula |
the model formula. |
na.action |
function applied to missing values in the model fit. |
terms |
a list with elements |
levels |
levels of the categorical regressors. |
contrasts |
the contrasts used for categorical regressors. |
model |
the full model frame (if |
y |
the response vector (if |
x |
a list with elements |
Greene, W.H. (2003) Econometric Analysis, 5th ed., Upper Saddle River: Prentice Hall.
ivreg.fit
, ivregDiagnostics
, ivregMethods
,
lm
, lm.fit
## data data("CigaretteDemand", package = "ivreg") ## model m <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome), data = CigaretteDemand) summary(m) summary(m, vcov = sandwich::sandwich, df = Inf) ## ANOVA m2 <- update(m, . ~ . - log(rincome) | . - log(rincome)) anova(m, m2) car::Anova(m) ## same model specified by formula with three-part right-hand side ivreg(log(packs) ~ log(rincome) | log(rprice) | salestax, data = CigaretteDemand) # Robust 2SLS regression data("Kmenta", package = "ivreg") Kmenta1 <- Kmenta Kmenta1[20, "Q"] <- 95 # corrupted data deq <- ivreg(Q ~ P + D | D + F + A, data=Kmenta) # demand equation, uncorrupted data deq1 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1) # standard 2SLS, corrupted data deq2 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1, subset=-20) # standard 2SLS, removing bad case deq3 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1, method="MM") # 2SLS MM estimation car::compareCoefs(deq, deq1, deq2, deq3) round(deq3$rweights, 2) # robustness weights
## data data("CigaretteDemand", package = "ivreg") ## model m <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome), data = CigaretteDemand) summary(m) summary(m, vcov = sandwich::sandwich, df = Inf) ## ANOVA m2 <- update(m, . ~ . - log(rincome) | . - log(rincome)) anova(m, m2) car::Anova(m) ## same model specified by formula with three-part right-hand side ivreg(log(packs) ~ log(rincome) | log(rprice) | salestax, data = CigaretteDemand) # Robust 2SLS regression data("Kmenta", package = "ivreg") Kmenta1 <- Kmenta Kmenta1[20, "Q"] <- 95 # corrupted data deq <- ivreg(Q ~ P + D | D + F + A, data=Kmenta) # demand equation, uncorrupted data deq1 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1) # standard 2SLS, corrupted data deq2 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1, subset=-20) # standard 2SLS, removing bad case deq3 <- ivreg(Q ~ P + D | D + F + A, data=Kmenta1, method="MM") # 2SLS MM estimation car::compareCoefs(deq, deq1, deq2, deq3) round(deq3$rweights, 2) # robustness weights
Fit instrumental-variable regression by two-stage least squares (2SLS). This is equivalent to direct instrumental-variables estimation when the number of instruments is equal to the number of predictors. Alternative robust-regression estimation is also supported, based on M-estimation (22M) or MM-estimation (2SMM).
ivreg.fit( x, y, z, weights, offset, method = c("OLS", "M", "MM"), rlm.args = list(), ... )
ivreg.fit( x, y, z, weights, offset, method = c("OLS", "M", "MM"), rlm.args = list(), ... )
x |
regressor matrix. |
y |
vector for the response variable. |
z |
instruments matrix. |
weights |
an optional vector of weights to be used in the fitting process. |
offset |
an optional offset that can be used to specify an a priori known component to be included during fitting. |
method |
the method used to fit the stage 1 and 2 regression:
|
rlm.args |
a list of optional arguments to be passed to the |
... |
further arguments passed to |
ivreg
is the high-level interface to the work-horse function
ivreg.fit
. ivreg.fit
is essentially a convenience interface to
lm.fit
(or lm.wfit
)
for first projecting x
onto the image of
z
, then running a regression of y
on the projected
x
, and computing the residual standard deviation.
ivreg.fit
returns an unclassed list with the following
components:
coefficients |
parameter estimates, from the stage-2 regression. |
residuals |
vector of model residuals. |
residuals1 |
matrix of residuals from the stage-1 regression. |
residuals2 |
vector of residuals from the stage-2 regression. |
fitted.values |
vector of predicted means for the response. |
weights |
either the vector of weights used (if any) or |
offset |
either the offset used (if any) or |
estfun |
a matrix containing the empirical estimating functions. |
n |
number of observations. |
nobs |
number of observations with non-zero weights. |
p |
number of columns in the model matrix x of regressors. |
q |
number of columns in the instrumental variables model matrix z |
rank |
numeric rank of the model matrix for the stage-2 regression. |
df.residual |
residual degrees of freedom for fitted model. |
cov.unscaled |
unscaled covariance matrix for the coefficients. |
sigma |
residual standard error; when method is |
x |
projection of x matrix onto span of z. |
qr |
QR decomposition for the stage-2 regression. |
qr1 |
QR decomposition for the stage-1 regression. |
rank1 |
numeric rank of the model matrix for the stage-1 regression. |
coefficients1 |
matrix of coefficients from the stage-1 regression. |
df.residual1 |
residual degrees of freedom for the stage-1 regression. |
exogenous |
columns of the |
endogenous |
columns of the |
instruments |
columns of the |
method |
the method used for the stage 1 and 2 regressions, one of |
rweights |
a matrix of robustness weights with columns for each of the stage-1
regressions and for the stage-2 regression (in the last column) if the fitting method is
|
hatvalues |
a matrix of hatvalues. For |
ivreg
, lm.fit
,
lm.wfit
, rlm
, mad
## data data("CigaretteDemand", package = "ivreg") ## high-level interface m <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome), data = CigaretteDemand) ## low-level interface y <- m$y x <- model.matrix(m, component = "regressors") z <- model.matrix(m, component = "instruments") ivreg.fit(x, y, z)$coefficients
## data data("CigaretteDemand", package = "ivreg") ## high-level interface m <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome), data = CigaretteDemand) ## low-level interface y <- m$y x <- model.matrix(m, component = "regressors") z <- model.matrix(m, component = "instruments") ivreg.fit(x, y, z)$coefficients
These are partly contrived data from Kmenta (1986), constructed to illustrate estimation of a simultaneous-equation econometric model. The data are an annual time-series for the U.S. economy from 1922 to 1941. The values of the exogenous variables D, and F, and A are real, while those of the endogenous variables Q and P are simulated according to the linear simultaneous equation model fit in the examples.
data("Kmenta", package = "ivreg")
data("Kmenta", package = "ivreg")
A data frame with 20 rows and 5 columns.
food consumption per capita.
ratio of food prices to general consumer prices.
disposable income in constant dollars.
ratio of preceding year's prices received by farmers to general consumer prices.
time in years.
Kmenta, J. (1986) Elements of Econometrics, 2nd ed., Macmillan.
data("Kmenta", package = "ivreg") deq <- ivreg(Q ~ P + D | D + F + A, data = Kmenta) # demand equation seq <- ivreg(Q ~ P + F + A | D + F + A, data = Kmenta) # supply equation summary(deq, tests = TRUE) summary(seq, tests = TRUE)
data("Kmenta", package = "ivreg") deq <- ivreg(Q ~ P + D | D + F + A, data = Kmenta) # demand equation seq <- ivreg(Q ~ P + F + A | D + F + A, data = Kmenta) # supply equation summary(deq, tests = TRUE) summary(seq, tests = TRUE)
Data from the U.S. National Longitudinal Survey of Young Men (NLSYM) in 1976 but using some variables dating back to earlier years.
data("SchoolingReturns", package = "ivreg")
data("SchoolingReturns", package = "ivreg")
A data frame with 3010 rows and 22 columns.
Raw wages in 1976 (in cents per hour).
Education in 1976 (in years).
Years of labor market experience, computed as age - education - 6
.
Factor indicating ethnicity. Is the individual African-American
("afam"
) or not ("other"
)?
Factor. Does the individual reside in a SMSA (standard metropolitan statistical area) in 1976?
Factor. Does the individual reside in the South in 1976?
Age in 1976 (in years).
Factor. Did the individual grow up near a 4-year college?
Factor. Did the individual grow up near a 2-year college?
Factor. Did the individual grow up near a 4-year public or private college?
Factor. Is the individual enrolled in college in 1976?
factor. Is the individual married in 1976?
Education in 1966 (in years).
Factor. Does the individual reside in a SMSA in 1966?
Factor. Does the individual reside in the South in 1966?
Father's educational attainment (in years). Imputed with average if missing.
Mother's educational attainment (in years). Imputed with average if missing.
Ordered factor coding family education class (from 1 to 9).
Knowledge world of work (KWW) score.
Normed intelligence quotient (IQ) score
Factor coding living with parents at age 14: both parents, single mother, step parent, other
Factor. Was there a library card in home at age 14?
Investigating the causal link of schooling on earnings in a classical model for wage determinants is problematic because it can be argued that schooling is endogenous. Hence, one possible strategy is to use an exogonous variable as an instrument for the years of education. In his well-known study, Card (1995) uses geographical proximity to a college when growing up as such an instrument, showing that this significantly increases both the years of education and the wage level obtained on the labor market. Using instrumental variables regression Card (1995) shows that the estimated returns to schooling are much higher than when simply using ordinary least squares.
The data are taken from the supplementary material for Verbeek (2004) and are based
on the work of Card (1995). The U.S. National Longitudinal Survey of Young Men
(NLSYM) began in 1966 and included 5525 men, then aged between 14 and 24.
Card (1995) employs labor market information from the 1976 NLSYM interview which
also included information about educational attainment. Out of the 3694 men
still included in that wave of NLSYM, 3010 provided information on both wages
and education yielding the subset of observations provided in SchoolingReturns
.
The examples replicate the results from Verbeek (2004) who used the simplest specifications from Card (1995). Including further region or family background characteristics improves the model significantly but does not affect much the main coefficients of interest, namely that of years of education.
Supplementary material for Verbeek (2004).
Card, D. (1995). Using Geographical Variation in College Proximity to Estimate the Return to Schooling. In: Christofides, L.N., Grant, E.K., and Swidinsky, R. (eds.), Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, University of Toronto Press, Toronto, 201-222.
Verbeek, M. (2004). A Guide to Modern Econometrics, 2nd ed. John Wiley.
## load data data("SchoolingReturns", package = "ivreg") ## Table 5.1 in Verbeek (2004) / Table 2(1) in Card (1995) ## Returns to education: 7.4% m_ols <- lm(log(wage) ~ education + poly(experience, 2, raw = TRUE) + ethnicity + smsa + south, data = SchoolingReturns) summary(m_ols) ## Table 5.2 in Verbeek (2004) / similar to Table 3(1) in Card (1995) m_red <- lm(education ~ poly(age, 2, raw = TRUE) + ethnicity + smsa + south + nearcollege, data = SchoolingReturns) summary(m_red) ## Table 5.3 in Verbeek (2004) / similar to Table 3(5) in Card (1995) ## Returns to education: 13.3% m_iv <- ivreg(log(wage) ~ education + poly(experience, 2, raw = TRUE) + ethnicity + smsa + south | nearcollege + poly(age, 2, raw = TRUE) + ethnicity + smsa + south, data = SchoolingReturns) summary(m_iv)
## load data data("SchoolingReturns", package = "ivreg") ## Table 5.1 in Verbeek (2004) / Table 2(1) in Card (1995) ## Returns to education: 7.4% m_ols <- lm(log(wage) ~ education + poly(experience, 2, raw = TRUE) + ethnicity + smsa + south, data = SchoolingReturns) summary(m_ols) ## Table 5.2 in Verbeek (2004) / similar to Table 3(1) in Card (1995) m_red <- lm(education ~ poly(age, 2, raw = TRUE) + ethnicity + smsa + south + nearcollege, data = SchoolingReturns) summary(m_red) ## Table 5.3 in Verbeek (2004) / similar to Table 3(5) in Card (1995) ## Returns to education: 13.3% m_iv <- ivreg(log(wage) ~ education + poly(experience, 2, raw = TRUE) + ethnicity + smsa + south | nearcollege + poly(age, 2, raw = TRUE) + ethnicity + smsa + south, data = SchoolingReturns) summary(m_iv)