# Instrumental variable


In statistics and econometrics, an instrumental variable (IV, or instrument) can be used in structural equation models to produce a consistent estimator of a structural (or causal) parameter when the explanatory variables (covariates) are correlated with the error terms. This correlation can be caused by endogeneity, by omitted covariates, or by measurement errors in the covariates. In this situation, ordinary linear regression produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation, that is correlated with the suspect explanatory variable, and that is uncorrelated with the error term. Formal definitions of instrumental variables, using counterfactuals and graphical criteria, are given by Pearl (2000).[1]

In linear models, there are two main requirements for using an IV:

• The instrument must be correlated with the explanatory variable.
• The instrument cannot be correlated with the error term in the explanatory equation, nor with any other variable in that equation (that is, the instrument cannot suffer from the same problem as the original predicting variable).

In non-linear systems, instruments cannot in general be used to produce a consistent estimator of the desired causal effects. However, they can be used to produce tight bounds on those effects.[1]

## Econometrics

The ordinary least squares estimator ($\widehat{\beta}_\mathrm{OLS}$) is used to estimate the mean structure of a model of the form

$y_i = \beta x_i + \varepsilon_i$

and takes the form

$\widehat{\beta}_\mathrm{OLS} = \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{\sum_i x_i (x_i \beta + \varepsilon_i)}{\sum_i x_i^2} = \beta + \frac{\sum_i x_i \varepsilon_i}{\sum_i x_i^2}.$

When $x$ and $\varepsilon$ are uncorrelated, the second term goes to zero in the limit: the estimator is unbiased, its variance decreases as the number of sampled units increases, and it is therefore consistent. When $x$ and $\varepsilon$ are correlated, however, the estimator is biased and inconsistent.

An instrumental variable is one that is correlated with the independent variable but not with the error term. Denoting the instrument by $z$, the IV estimator is

$\widehat{\beta}_\mathrm{IV} = \frac{\sum_i z_i y_i}{\sum_i z_i x_i} = \frac{\sum_i z_i (x_i \beta + \varepsilon_i)}{\sum_i z_i x_i} = \beta + \frac{\sum_i z_i \varepsilon_i}{\sum_i z_i x_i}.$

When $z$ and $\varepsilon$ are uncorrelated, the final term approaches zero in the limit, providing a consistent estimator. Note that when $x$ is uncorrelated with the error term, $x$ serves as its own instrument. In this light, OLS is a special case of IV estimation.
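The contrast between the two estimators can be illustrated with a small simulation. The following is only a sketch under assumed conditions: the data-generating process, the unobserved confounder `u`, the sample size, and the function names are all hypothetical choices made for illustration, not part of the source.

```python
import numpy as np

def ols_slope(x, y):
    # beta_OLS = sum(x_i y_i) / sum(x_i^2), as in the formula above
    return np.sum(x * y) / np.sum(x * x)

def iv_slope(z, x, y):
    # beta_IV = sum(z_i y_i) / sum(z_i x_i), as in the formula above
    return np.sum(z * y) / np.sum(z * x)

def simulate(n=100_000, beta=2.0, seed=0):
    # Assumed (hypothetical) data-generating process:
    # u is an unobserved confounder that enters both x and the error term.
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)            # instrument: independent of the error
    u = rng.normal(size=n)            # unobserved confounder
    x = z + u + rng.normal(size=n)    # x is correlated with z and with u
    eps = u + rng.normal(size=n)      # error is correlated with x, not with z
    y = beta * x + eps
    return ols_slope(x, y), iv_slope(z, x, y)

b_ols, b_iv = simulate()
print(f"OLS estimate: {b_ols:.3f}")
print(f"IV estimate:  {b_iv:.3f}")
```

Under this assumed design, the OLS estimate should settle noticeably above the true slope of 2, because $x$ and $\varepsilon$ share the confounder, while the IV estimate should be close to 2.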

The approach above generalizes in a straightforward way to a regression with multiple explanatory variables. Suppose $X$ is the $T \times K$ matrix of explanatory variables resulting from $T$ observations on $K$ variables. Let $Z$ be a $T \times K$ matrix of instruments. Then

$\widehat{\beta}_\mathrm{IV} = (Z'X)^{-1}Z'Y = (Z'X)^{-1}Z'(X\beta+\varepsilon) = \beta + (Z'X)^{-1}Z'\varepsilon.$

One computational method often used for implementing the technique is two-stage least squares (2SLS). One advantage of this approach is that it can efficiently combine information from multiple instruments in over-identified regressions, that is, regressions with more instruments than covariates. Under the 2SLS approach, in the first stage, each endogenous covariate (predictor variable) is regressed on all valid instruments, including the full set of exogenous covariates in the main regression. Since the instruments are exogenous, these approximations of the endogenous covariates will not be correlated with the error term. Intuitively, they therefore provide a way to analyze the relationship between the outcome variable and the endogenous covariates. In the second stage, the regression of interest is estimated as usual, except that each endogenous covariate is replaced with its approximation estimated in the first stage. The slope estimator thus obtained is consistent. A small correction must be made to the sum of squared residuals in the second-stage fitted model so that the associated standard errors are computed correctly.

Stage 1: $\widehat{X}= Z(Z'Z)^{-1}Z'X$
Stage 2: $\widehat{\beta}_\mathrm{IV} = (\widehat{X}'\widehat{X})^{-1}\widehat{X}'Y$

Mathematically, this estimator is identical to the single stage estimator presented above when the number of instruments is the same as the number of covariates.
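The equivalence can be checked numerically. The sketch below implements the two stages and the single-stage estimator $(Z'X)^{-1}Z'Y$ with NumPy and compares them in an exactly identified case. The simulated data, the coefficient matrix, and the function names are assumptions made for illustration only.

```python
import numpy as np

def iv_estimator(Z, X, y):
    # Single-stage estimator: (Z'X)^{-1} Z'y
    return np.linalg.solve(Z.T @ X, Z.T @ y)

def two_stage_least_squares(Z, X, y):
    # Stage 1: X_hat = Z (Z'Z)^{-1} Z'X, the projection of X onto the instruments
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    # Stage 2: regress y on X_hat
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Hypothetical simulated data: two endogenous covariates, two instruments.
rng = np.random.default_rng(1)
T = 5000
Z = rng.normal(size=(T, 2))
u = rng.normal(size=T)                      # unobserved confounder
first_stage = np.array([[1.0, 0.5],
                        [0.3, 1.0]])        # assumed first-stage coefficients
X = Z @ first_stage + u[:, None] + rng.normal(size=(T, 2))
beta = np.array([1.0, -2.0])
y = X @ beta + u + rng.normal(size=T)

b_iv = iv_estimator(Z, X, y)
b_2sls = two_stage_least_squares(Z, X, y)
print(b_iv, b_2sls)
```

With as many instruments as covariates, the two results should agree up to floating-point error. `two_stage_least_squares` also accepts a `Z` with more columns than `X` (the over-identified case), where `iv_estimator` no longer applies because $Z'X$ is not square.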

## Applications and problems

The instrumental variables estimation (IVE) technique often provides a useful, convenient and ethical alternative to the classical randomized experiment. In the randomized experiment, exogenous variation in treatment is provided by the random assignment of participants to the treatment and control conditions, which requires the investigator to deny the treatment to the control participants. Using IVE, participants can be permitted to self-select into treatment and control, and the investigator can subsequently tease out the exogenous component of the treatment variation using the instrument. Of course, one does not get anything for nothing: the IVE technique is only as good as the instruments it employs.

In comparison to randomized experiments, IV estimates local average treatment effects (LATE) rather than average treatment effects (ATE). The effect of a program is only identified for the subpopulation that is affected by the instrument. For example, using financial aid as an instrument for college (assuming financial aid changed exogenously due to a policy change) only identifies the returns to education for students who attend college solely because of financial aid. Students who receive no financial aid are not affected by the instrument.

The technique is useful for solving the errors-in-variables problem and for recovering structural parameters from simultaneous equations models such as supply and demand. Unfortunately, there is no way to prove that the instruments are uncorrelated with the error term, since the error is by definition unobservable. Consequently, one problem lies in the selection and defense of suitable instruments. Good instruments are often created by exogenous policy changes (e.g., the cancellation of a federal student aid scholarship program), geographic differences in the application of standards (e.g., different states implementing different passing standards for a common exam), or generic randomness (e.g., the Vietnam Draft Lottery) that have led to exogenous disruptions in the values of the construct being measured by the selected instrument.

Another problem is caused by the selection of "weak" instruments. These are instruments that are very poor predictors of the endogenous predictor in the first-stage equation. In this case, the prediction of the endogenous predictor by the instrument will be poor and the predicted values will have very little variation. Consequently, they are unlikely to have much success in predicting the ultimate outcome when they are used to replace the endogenous predictor in the second-stage equation.
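The practical consequence of a weak first stage can be illustrated by repeating the IV estimate over many simulated samples and comparing how far the estimates typically fall from the true slope. This is a hedged sketch: the data-generating process, the `strength` parameter (the assumed first-stage coefficient), and the use of the median absolute error as a summary are illustrative choices, not taken from the source.

```python
import numpy as np

def iv_slope(z, x, y):
    # Simple IV estimator: sum(z_i y_i) / sum(z_i x_i)
    return np.sum(z * y) / np.sum(z * x)

def median_abs_error(strength, n=500, reps=500, beta=1.0, seed=0):
    # Hypothetical design: x = strength * z + u + noise, error = u + noise.
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)
        u = rng.normal(size=n)                   # unobserved confounder
        x = strength * z + u + rng.normal(size=n)
        y = beta * x + u + rng.normal(size=n)
        estimates[r] = iv_slope(z, x, y)
    # The median is used because weak-instrument estimates can have extreme outliers.
    return np.median(np.abs(estimates - beta))

strong = median_abs_error(strength=1.0)
weak = median_abs_error(strength=0.02)
print(f"typical error, strong instrument: {strong:.3f}")
print(f"typical error, weak instrument:   {weak:.3f}")
```

In this assumed setup the weak instrument should produce a typical error several times larger than the strong one, since the denominator $\sum_i z_i x_i$ is dominated by sampling noise.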

## Hypothesis testing

The IV estimator can be written as

$\widehat{\beta}=\left(Z' X\right)^{-1} Z' y$

By using the fact that $y=X \beta + \varepsilon$, it follows that, conditional on $X$ and $Z$ and assuming normally distributed errors, $\widehat{\beta}$ is normally distributed with mean $\beta$ and covariance matrix

$\Sigma = \sigma^2 \left( Z' X\right)^{-1} \left(Z' Z \right) \left(X' Z \right)^{-1} = \sigma^2 A$

where $\sigma^2$ is the variance of $\varepsilon$.

The residual sum of squares is computed with:

$RSS=\widehat{\varepsilon}'\widehat{\varepsilon}=y' \left(I - Z \left( X' Z \right)^{-1} X' \right) \left( I - X \left( Z' X \right)^{-1} Z' \right) y$

where

$\widehat{\varepsilon} = y - X \widehat{\beta}.\,$

The variance of the error is estimated with

$\widehat{\sigma}^2 = \frac{RSS}{r}\,$

where $r$ is the rank of $\left(I - Z \left( X' Z \right)^{-1} X' \right) \left( I - X \left( Z' X \right)^{-1} Z' \right).$

The variable

$t_i = \frac{\widehat{\beta}_i }{\sqrt{\widehat{\sigma}^2 A_{ii}}}$

follows a Student's t-distribution with $r$ degrees of freedom.
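The quantities defined in this section can be computed step by step. The following NumPy sketch follows the formulas above for the exactly identified case. The simulated data and the function name `iv_t_statistics` are assumptions for illustration, and the rank $r$ is obtained from the matrix $I - X(Z'X)^{-1}Z'$ itself, which has the same rank as the product that appears in the text.

```python
import numpy as np

def iv_t_statistics(Z, X, y):
    """Return (beta_hat, t, r) for the exactly identified IV estimator."""
    T = len(y)
    ZX_inv = np.linalg.inv(Z.T @ X)
    beta_hat = ZX_inv @ Z.T @ y                  # (Z'X)^{-1} Z'y
    A = ZX_inv @ (Z.T @ Z) @ ZX_inv.T            # (Z'X)^{-1} (Z'Z) (X'Z)^{-1}
    resid = y - X @ beta_hat                     # epsilon_hat = y - X beta_hat
    rss = resid @ resid                          # residual sum of squares
    M = np.eye(T) - X @ ZX_inv @ Z.T             # epsilon_hat = M y
    r = np.linalg.matrix_rank(M)                 # rank(M'M) equals rank(M)
    sigma2_hat = rss / r
    t = beta_hat / np.sqrt(sigma2_hat * np.diag(A))
    return beta_hat, t, r

# Hypothetical simulated data with K = 2 covariates and 2 instruments.
rng = np.random.default_rng(2)
T, K = 300, 2
Z = rng.normal(size=(T, K))
u = rng.normal(size=T)                           # unobserved confounder
X = Z + u[:, None] + rng.normal(size=(T, K))
beta = np.array([1.0, -2.0])
y = X @ beta + u + rng.normal(size=T)

beta_hat, t, r = iv_t_statistics(Z, X, y)
print(beta_hat, t, r)
```

Because $I - X(Z'X)^{-1}Z'$ is idempotent, its rank equals its trace, $T - K$, so `r` should come out as 298 here. Forming the full $T \times T$ matrix is only reasonable for small samples; it is done in this sketch to mirror the text.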

## Mediating Instrumental Variables

Judea Pearl (1993, 2000)[2][1] has developed an alternative instrumental variables method, called "Mediating Instrumental Variables" (MIV), which relies on finding an auxiliary variable $Z^\prime$ lying on the causal pathway between $X$ and $Y$. The MIV method can provide unbiased estimates under conditions where the standard IV method fails; hence, agreement between the two methods would constitute strong evidence for the reliability of the estimation. Moreover, the MIV approach extends naturally to nonlinear and nonparametric models, and thus requires only minimal structural assumptions on the part of the investigator, with no commitment to any particular functional form.

## Testable implications

The assumptions defining the classical IV equations:

$Y = b X + \epsilon$
$E(Z \epsilon) = 0, E(ZX) \neq 0$

are not testable. In other words, for every trivariate covariance matrix on $X$, $Y$ and $Z$ we can always find a $b$ that renders the matrix compatible with the equations above. This means that, absent a priori assumptions, no $Z$ (satisfying $E(ZX) \neq 0$) can be ruled out as an instrument for $b$. Things are different in nonlinear systems; the IV equation

$Y = f(X, \epsilon), Z \bot\!\!\!\bot \epsilon$

may have testable implications. Pearl (2000)[1] has shown that, for discrete variables, a necessary condition for $Z$ to be an instrument is that the inequality

$\sum_y \left[ \max_z \Pr(Y=y, X=x \mid Z=z) \right] \leq 1$

is satisfied for every $x$.
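For discrete variables the inequality can be checked directly from a table of conditional probabilities. The sketch below assumes the distribution is supplied as an array indexed as `p[z, x, y]`. This layout, the function name, and the two example distributions are hypothetical and chosen only for illustration.

```python
import numpy as np

def satisfies_instrumental_inequality(p_xy_given_z, tol=1e-12):
    """Check sum_y max_z Pr(Y=y, X=x | Z=z) <= 1 for every x.

    p_xy_given_z[z, x, y] is assumed to hold Pr(X=x, Y=y | Z=z).
    Returns (ok, bound), where bound[x] is the left-hand side for each x.
    """
    p = np.asarray(p_xy_given_z, dtype=float)
    # Maximise over z for each (x, y), then sum over y for each x.
    bound = p.max(axis=0).sum(axis=1)
    return bool(np.all(bound <= 1.0 + tol)), bound

# Example 1 (hypothetical): Z shifts X and Y simply equals X.
compatible = np.array([
    [[0.8, 0.0], [0.0, 0.2]],   # Z = 0
    [[0.3, 0.0], [0.0, 0.7]],   # Z = 1
])

# Example 2 (hypothetical): X is always 0, yet Y changes with Z,
# so Z must influence Y through some path other than X.
incompatible = np.array([
    [[1.0, 0.0], [0.0, 0.0]],   # Z = 0: X = 0, Y = 0
    [[0.0, 1.0], [0.0, 0.0]],   # Z = 1: X = 0, Y = 1
])

ok_1, bound_1 = satisfies_instrumental_inequality(compatible)
ok_2, bound_2 = satisfies_instrumental_inequality(incompatible)
print(ok_1, bound_1)
print(ok_2, bound_2)
```

The inequality is only a necessary condition: a distribution that violates it rules $Z$ out as an instrument, whereas passing the check does not establish that $Z$ is valid.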