Here’s an excerpt from an introduction-to-econometrics paper written for lawyers, which I will present at the ABA Antitrust Spring meeting in DC in late March. If you are interested in reading the whole submission (a little over 4,000 words), please write me.

* * *

Well, if that’s all regression does, you might ask, why in the heck do we need it? The answer is that many factors in addition to the challenged conduct likely affect prices in this market, and we need to control for those factors in case they changed around the time that the challenged conduct ended. Prices are typically determined as a markup over the cost of serving the customer. Suppose the seller (the defendant) in this case always imposes a markup of 50 percent over costs; in the during period, average costs were $667 and average prices were $1,000. Suppose further that average costs declined by $100 in the after period relative to the during period, bringing average costs to $567 and average prices to about $850 (equal to $567 plus 0.5 × $567). We now have an independent reason, unrelated in any way to the challenged conduct, for why prices would have declined in the after period!
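The markup arithmetic can be written out as a short worked calculation. The symbols $P$ (average price) and $\Delta P_{\text{cost}}$ (the price change driven by costs alone) are labels introduced here for illustration; all figures are in dollars and rounded as in the text.

$$
\begin{aligned}
P_{\text{during}} &= 1.5 \times 667 = 1000.5 \approx 1{,}000 \\
P_{\text{after}} &= 1.5 \times 567 = 850.5 \approx 850 \\
\Delta P_{\text{cost}} &= 1.5 \times 100 = 150
\end{aligned}
$$

Under the 50 percent markup assumption, then, roughly $150 of any price decline between the two periods would have occurred even if the conduct had never taken place.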

Suppose the analyst is unaware that costs had changed, or that cost data are not available. He estimates the simple model from equation [1]. The estimated parameter on the conduct indicator comes back at $250, but we know that this estimate is biased. Technically, this means the expected value of the estimate in repeated samples will not equal the true value of the parameter. The regression is attributing too much of the change in prices between the during and the after period to the challenged conduct. This problem is referred to in the econometrics literature as “omitted variable bias,” and it represents a major challenge for applied economists.
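A small simulation may help make the bias concrete. What follows is a minimal sketch under stated assumptions, not the analysis from any actual case: the true conduct effect of $100, the noise levels, the sample size, and the intercept are all hypothetical values chosen so that the simulated data roughly match the averages in the text. To stay within Python's standard library, the "controlled" regression is computed by partialling out cost from both price and the conduct indicator (the Frisch-Waugh-Lovell approach), which gives the same conduct coefficient as a regression of price on both conduct and cost.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 100.0  # hypothetical true effect of the conduct on price
MARKUP = 1.5         # price = cost plus a 50 percent markup
N = 2000             # hypothetical number of transactions


def slope_and_intercept(x, y):
    """Simple one-regressor OLS of y on x; returns (slope, intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Part of y left unexplained after regressing y on x."""
    slope, intercept = slope_and_intercept(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Conduct indicator: 1 in the during period, 0 in the after period.
conduct = [1.0] * (N // 2) + [0.0] * (N // 2)

# Costs average about 667 during the conduct and about 567 afterward.
cost = [567.0 + 100.0 * d + random.gauss(0.0, 20.0) for d in conduct]

# Assumed pricing rule: markup over cost plus the conduct effect.
# The -100 intercept is an assumption that keeps during-period prices near 1,000.
price = [
    -100.0 + MARKUP * c + TRUE_EFFECT * d + random.gauss(0.0, 10.0)
    for c, d in zip(cost, conduct)
]

# Naive model: price regressed on the conduct indicator alone.
naive_estimate, _ = slope_and_intercept(conduct, price)

# Controlled model: strip out what cost explains, then regress the leftovers.
controlled_estimate, _ = slope_and_intercept(
    residuals(cost, conduct), residuals(cost, price)
)

print(f"naive conduct estimate:      {naive_estimate:.1f}")
print(f"controlled conduct estimate: {controlled_estimate:.1f}")
```

Under these assumptions the naive estimate should land near $250 and the controlled estimate near the assumed $100, with the difference of about $150 being the cost-driven price change that the naive model wrongly credits to the conduct.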

Here’s why: Remember that assumption on the error term in equation [1]? It required that the error term not be correlated with the conduct indicator. By omitting cost from the regression, however, we violated it. In particular, we know that costs declined markedly right around the time that the conduct ended; hence, when the conduct was absent (present), costs were lower (higher). Without controlling for costs, *B* will now capture the *sum* of the direct effect of the conduct on prices (what we want) plus the effect on prices of the cost change that coincided with the conduct, which in this case is positive. So when we omit costs from the regression, the prediction errors from equation [1] will differ systematically depending on whether the conduct is present; that is, the error term is now correlated with the conduct indicator. In general, whenever the omitted variable (in this case, cost) is positively correlated with *both* the included regressor (the conduct) and the dependent variable (the price), the estimate of the included variable’s coefficient will be upwardly biased. Because this rule is hard to memorize, I’ve presented a simple table for reference below.

| Correlation between omitted variable and included regressor | Correlation between omitted variable and dependent variable | Direction of bias on included regressor |
| --- | --- | --- |
| Positive | Positive | Upward |
| Negative | Positive | Downward |
| Positive | Negative | Downward |
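The sign rule in the table can be read off the standard textbook expression for omitted variable bias when there is a single included regressor. The symbols are assumptions introduced here, not notation from equation [1]: $\gamma$ is the effect of the omitted variable on the dependent variable, and $\delta$ is the slope from regressing the omitted variable on the included regressor.

$$
E[\hat{B}] = B + \gamma\,\delta
$$

The bias term $\gamma\,\delta$ is positive when the two factors share a sign and negative when they differ, which reproduces the rows above (and implies that two negative relationships also produce upward bias). In the cost example, $\gamma = 1.5$ (the markup) and $\delta = 100$ (costs were $100 higher while the conduct was present), so the bias is about $150. If the $250 estimate is taken at face value, that would leave roughly $100 attributable to the conduct, although the excerpt itself does not state the true effect.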

It bears noting that most if not all regressions ever estimated have omitted at least some explanatory variables from the equation (otherwise, there would be no error term, and the R-squared would be 100 percent). But that does not imply that the resulting parameter estimates of the imperfect model are biased. Two conditions must be present for an omitted variable to result in a biased regression estimate: (1) the omitted variable must be a factor that explains the dependent variable; and (2) the omitted variable must be correlated with an independent variable specified in the regression. The second condition is a generalization of the phenomenon we just encountered with costs and the challenged conduct. This means that it is not sufficient for an opposing economist to merely point out that a regression is missing a key variable. For the critique to be valid, the opposing economist must demonstrate that both conditions are satisfied. One way to do this is indirectly, by providing an evidentiary basis that the allegedly omitted variable is a factor in the defendant’s pricing, and that it is correlated with the conduct variable. Alternatively, the opposing economist can demonstrate omitted-variable bias directly by re-running the regression with the omitted variable included, and showing not only that it belongs in the regression (as evidenced by a statistically and economically significant effect), but also that the revised estimate of the conduct parameter is no longer statistically or economically significant or of the expected sign.
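The indirect route can be sketched as a quick screen of the two conditions. The six observations below are hypothetical numbers invented for illustration, loosely patterned on the cost example, and a simple correlation is only a rough first check, not proof that a variable belongs in the model.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: three during-period and three after-period observations.
conduct = [1, 1, 1, 0, 0, 0]
cost = [660, 670, 671, 560, 570, 571]
price = [995, 1000, 1005, 745, 750, 755]

# Condition (1): does the omitted variable move with the dependent variable?
r_cost_price = pearson(cost, price)

# Condition (2): is the omitted variable correlated with the included regressor?
r_cost_conduct = pearson(cost, conduct)

print(f"corr(cost, price):   {r_cost_price:.3f}")
print(f"corr(cost, conduct): {r_cost_conduct:.3f}")
```

If either correlation were close to zero, leaving cost out would be unlikely to bias the conduct estimate much, and the critique that a variable is "missing" would carry little weight on its own.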