IV with Panel Data — Interactive Lab

A pedagogical companion to IV Estimation with Panel Data: Economic Shocks and Civil Conflict

Why instrumental variables?

Does poverty cause violence? The correlation is real, but correlation is not causation. Conflict destroys economic activity (reverse causality), and the nighttime lights we use to measure activity are a noisy proxy (measurement error). Both contaminate ordinary least squares. Instrumental variables — using lagged rainfall and drought as a clean source of variation — slice through both problems.

In this lab you can turn the knobs yourself. Three tabs let you: explore the first-stage relationship between weather and economic activity; watch OLS and 2SLS recover (or fail to recover) a true causal effect you set; and inspect the post's actual estimates with confidence intervals and first-stage F-statistics.

The IV identification path

The instrument Z (weather, two years ago) shifts the endogenous regressor D (lagged economic activity), which in turn shifts the outcome Y (conflict today). Confounders U and measurement error ME contaminate D but are unrelated to Z (instrument exogeneity), and Z reaches Y only through D (the exclusion restriction). The animated dot traces the clean IV path Z → D → Y.

Tab 2

First-Stage Lab

Slide instrument strength π and watch the first-stage F-statistic cross the Stock-Yogo threshold. See what a "strong" vs "weak" instrument looks like.

Tab 3

OLS vs 2SLS Showdown

You set the true effect. Slide measurement-error and confounding to bias OLS. Watch 2SLS recover the truth. Run 100 simulations to see the bias-variance picture.

Tab 4

Forest Plot

The post's headline estimates, interactively. Hover any point to see the SE, 95% CI, and the first-stage F. Toggle outcomes and methods.

Three takeaways from the post

  1. Economic shocks cause civil conflict. A 10% decline in nighttime light intensity raises the probability of conflict (1+ deaths) by about 3 percentage points — a 66% jump above the 4.6% baseline.
  2. OLS massively underestimates the effect. OLS returns 0.001; 2SLS returns about -0.30. A 300-fold difference, driven by attenuation bias from measurement error and omitted-variable bias from unobserved confounders.
  3. The instruments are strong and valid. First-stage F-statistics (24.6 to 40.3) clear the Stock-Yogo 10% threshold of 16.38. Hansen J p-value of 0.93 says rainfall and drought tell the same story.

Glossary

Endogeneity
A regressor is endogenous when it correlates with the error term. This arises from omitted variables, reverse causality, or measurement error, and it biases OLS.
Instrument Z
An external variable that drives variation in the regressor without belonging in the outcome equation. Here: lagged rainfall and drought.
First-stage F
The F-statistic from regressing the endogenous regressor on the instrument(s). Measures instrument strength.
Stock-Yogo critical value
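A formal threshold for "weak instruments." With one endogenous regressor and one instrument, a first-stage F above 16.38 keeps the true size of a nominal 5% Wald test on the 2SLS coefficient at or below 10%.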
Exclusion restriction
Z affects Y only through D. Untestable in just-identified models. This is the most contested part of any IV design.
2SLS
Two-Stage Least Squares. Stage 1: regress D on Z. Stage 2: regress Y on fitted D. The slope is the IV estimate.
Attenuation bias
Classical measurement error in a regressor shrinks the OLS slope toward zero. IV with a clean instrument removes the shrinkage.
Hansen J test
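A joint test of instrument validity when you have more instruments than endogenous regressors. Failure to reject (high p) is consistent with validity but does not prove it, because the test presumes that enough of the instruments are valid to identify the effect.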
LATE
Local Average Treatment Effect. The IV estimand: the causal effect for the subpopulation whose D responds to Z (compliers).
Reduced form
The regression of Y directly on Z. Its sign tells you whether the instrument predicts the outcome. With a single instrument, the IV estimate = reduced form / first stage.
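The ratio identity in the last glossary entry can be checked numerically. The following is a minimal sketch under an assumed simulated setup with one instrument; the data-generating process, the seed, and the values of `pi` and `delta` are illustrative and are not the post's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
pi, delta = 0.5, -0.30            # illustrative first-stage and causal slopes

z = rng.normal(size=n)            # instrument
u = rng.normal(size=n)            # unobserved confounder
d = pi * z + u + rng.normal(size=n)            # endogenous regressor
y = delta * d + 0.8 * u + rng.normal(size=n)   # outcome

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

first_stage = slope(z, d)         # D on Z
reduced_form = slope(z, y)        # Y on Z
wald = reduced_form / first_stage

# Two-stage version: regress Y on the first-stage fitted values of D.
d_hat = d.mean() + first_stage * (z - z.mean())
two_stage = slope(d_hat, y)

print(f"reduced form / first stage = {wald:.4f}")
print(f"2SLS by hand               = {two_stage:.4f}")
```

The two printed numbers agree up to floating-point error, because the fitted values of D are a linear function of Z alone.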

First-Stage Lab — what does instrument strength look like?

The first stage regresses the endogenous regressor D on the instrument Z. The F-statistic on Z measures instrument strength: above the Stock-Yogo (10%) critical value of 16.38, the instrument is "strong"; below 10, it is dangerously weak. Drag the instrument strength slider and watch the binned scatter, the slope, and the F-statistic respond in real time.
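As a rough illustration of the statistic the slider drives, here is a sketch of a first-stage F for a single instrument under homoskedastic errors. The function names `first_stage_f` and `verdict` and the simulated values are hypothetical, and the post's reported F-statistics may use a robust or clustered variant instead.

```python
import numpy as np

def first_stage_f(z, d):
    """Slope and F-statistic for H0: pi = 0 in D = a + pi * Z + v (one instrument)."""
    n = len(z)
    zc = z - z.mean()
    dc = d - d.mean()
    pi_hat = float(zc @ dc / (zc @ zc))
    resid = dc - pi_hat * zc
    sigma2 = float(resid @ resid) / (n - 2)        # residual variance
    se = float(np.sqrt(sigma2 / (zc @ zc)))        # standard error of pi_hat
    return pi_hat, (pi_hat / se) ** 2              # with one instrument, F = t^2

def verdict(f, strong=16.38, weak=10.0):
    """Classify F using the two thresholds quoted in the text."""
    if f >= strong:
        return "strong"
    if f >= weak:
        return "borderline"
    return "weak"

rng = np.random.default_rng(1)
n, pi = 300, 0.3                                   # illustrative values only
z = rng.normal(size=n)
d = pi * z + rng.normal(size=n)
pi_hat, f_stat = first_stage_f(z, d)
print(f"pi_hat = {pi_hat:.3f}, F = {f_stat:.1f}, verdict = {verdict(f_stat)}")
```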

Instrument strength π: larger π ⇒ Z explains more of D. In the post, π_rain ≈ 0.036 and π_drought ≈ 0.006; both are small, but the F-stats are large because n = 96,591.
Sample size n: more observations ⇒ tighter estimates ⇒ higher F-stat for the same π.
Confounding ρ: how much an unobserved variable also drives D. Does not change the first stage on its own.
Readouts: first-stage slope π̂ (how Z moves D); first-stage F (vs Stock-Yogo 16.38); verdict (strong / borderline / weak); sample size n (300 observations).

What to look for

  • Slide π toward zero. The orange binned dots flatten, the teal regression line goes horizontal, and the F-stat crashes. Below 10, the verdict turns red — the instrument is weak.
  • Now boost n. Even a tiny π becomes "strong" once n is large enough. This is why the post's tiny coefficients (0.036, 0.006) still produce F = 24.6 and F = 40.3 — there are 96,591 observations.
  • Confounding ρ does not affect the first stage. Slide it and watch: the F-stat is unchanged. Confounding contaminates OLS, not the Z-D relationship. (You will see ρ's effect in Tab 3.)
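The sample-size effect in the second bullet follows from how the first-stage F is built. As a sketch, assuming a single instrument and homoskedastic errors (the post's reported F may use a robust or clustered variant), F is the squared t-statistic on π̂:

```latex
\[
F = \left(\frac{\hat{\pi}}{\operatorname{SE}(\hat{\pi})}\right)^{2},
\qquad
\operatorname{SE}(\hat{\pi})^{2} = \frac{\hat{\sigma}_v^{2}}{\sum_{i=1}^{n}(Z_i-\bar{Z})^{2}}
\quad\Longrightarrow\quad
F = \frac{(n-1)\,\hat{\pi}^{2}\,s_Z^{2}}{\hat{\sigma}_v^{2}}
\approx \frac{n\,\hat{\pi}^{2}\,s_Z^{2}}{\hat{\sigma}_v^{2}}
\]
```

Here s_Z² is the sample variance of Z and σ̂_v² is the first-stage residual variance. Holding those and π̂ fixed, F grows linearly in n, so a small slope can still clear the threshold in a large panel.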

OLS vs 2SLS Showdown — recover a causal effect you set

You set the true causal effect δ. The simulator contaminates the observed regressor with measurement error and/or correlates it with an unobserved confounder. Watch OLS bend toward zero (attenuation) or away from truth (omitted-variable bias), while 2SLS using the clean instrument Z stays on target. The headline pattern of the post — OLS ≈ 0, 2SLS ≈ -0.30 — is reproducible in seconds.

True effect δ: the causal slope of D on Y. Set it to match the post (-0.30) or any value you like.
Measurement error σ_ME: noise added to the observed D. Larger ⇒ stronger attenuation bias in OLS.
Confounding ρ: how much an unobserved variable drives D. Combined with τ ≠ 0, this creates omitted-variable bias.
Confounder effect τ: direct effect of the confounder on Y. With ρ × τ ≠ 0, OLS is biased; 2SLS is not.
Instrument strength π: same as Tab 2. Weak instruments make 2SLS imprecise; strong ones make it reliable.
Sample size n: capped at 500 so the "Run 100 sims" button finishes quickly.

Readouts:
OLS: slope of Y on observed D, contaminated by measurement error and confounding. Reports δ̂, SE(δ̂), and bias.
2SLS: slope of Y on D, instrumented by Z (clean). Reports δ̂, SE(δ̂), first-stage F, and bias.

Why this happens

  • Measurement error attenuates OLS. Slide σ_ME up. The OLS bar shrinks toward zero. 2SLS holds steady because Z is uncorrelated with the measurement noise in D.
  • Confounding biases OLS in either direction. Slide ρ and τ around. The OLS bar swings; 2SLS does not, because the instrument is uncorrelated with the confounder.
  • Weak instruments hurt 2SLS too. Push π toward zero. The first-stage F drops, and 2SLS becomes wildly variable across reseeds. Strong instruments are not optional — they are the precondition for trustworthy IV.
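The first two bullets can be made precise with large-sample limits. The derivation below is a sketch under an assumed data-generating process, which may differ from the simulator's exact one: true regressor D* = πZ + ρU + v, observed regressor D = D* + e with Var(e) = σ_ME², outcome Y = δD* + τU + ε, and Z, U, v, e, ε mutually uncorrelated with Var(U) = 1.

```latex
\[
\operatorname{plim}\hat{\delta}_{\mathrm{OLS}}
= \frac{\operatorname{Cov}(Y,D)}{\operatorname{Var}(D)}
= \frac{\delta\,\operatorname{Var}(D^{*}) + \tau\rho}{\operatorname{Var}(D^{*}) + \sigma_{ME}^{2}},
\qquad
\operatorname{plim}\hat{\delta}_{\mathrm{IV}}
= \frac{\operatorname{Cov}(Y,Z)}{\operatorname{Cov}(D,Z)}
= \frac{\delta\,\pi\,\operatorname{Var}(Z)}{\pi\,\operatorname{Var}(Z)}
= \delta \quad (\pi \neq 0)
\]
```

With τρ = 0 the OLS limit is δ scaled by Var(D*) / (Var(D*) + σ_ME²), a factor below one whenever σ_ME > 0, which is the attenuation. A nonzero τρ shifts the numerator in either direction. Neither term enters the IV ratio.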

Bias vs variance over 100 simulations

Single runs are noisy. Run the same DGP 100 times with fresh draws (same parameters, different noise) to see whether the OLS bias is systematic and how dispersed the 2SLS estimates are.
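A repeated-draw experiment of this kind can be sketched in a few lines. The data-generating process, parameter defaults, seed, and the function name `simulate_once` below are assumptions made for illustration, not the simulator's actual code.

```python
import numpy as np

def simulate_once(rng, n=500, delta=-0.30, pi=0.5, rho=0.5, tau=0.5, sigma_me=1.0):
    """One draw from an assumed DGP; returns (OLS slope, IV slope)."""
    z = rng.normal(size=n)                            # instrument
    u = rng.normal(size=n)                            # unobserved confounder
    d_true = pi * z + rho * u + rng.normal(size=n)    # true regressor
    d_obs = d_true + sigma_me * rng.normal(size=n)    # observed with measurement error
    y = delta * d_true + tau * u + rng.normal(size=n)

    zc, dc, yc = z - z.mean(), d_obs - d_obs.mean(), y - y.mean()
    ols = (dc @ yc) / (dc @ dc)
    iv = (zc @ yc) / (zc @ dc)    # just-identified 2SLS is a ratio of covariances
    return ols, iv

rng = np.random.default_rng(42)
draws = np.array([simulate_once(rng) for _ in range(100)])
ols, iv = draws[:, 0], draws[:, 1]
print(f"OLS : mean = {ols.mean():+.3f}, sd = {ols.std(ddof=1):.3f}")
print(f"2SLS: mean = {iv.mean():+.3f}, sd = {iv.std(ddof=1):.3f}")
```

Under these assumed values the OLS estimates should cluster tightly around a biased value well short of -0.30, while the 2SLS estimates should center near -0.30 with a visibly wider spread. That is the bias-variance trade the histogram is meant to show.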

The post's forest plot — interactively

These numbers come straight from coef_comparison_conflict01.csv and Tables 2–3 in the post folder — the same estimates used to produce Figure: "OLS vs 2SLS coefficient comparison". Toggle outcomes and methods to compare. Hover any point to see its SE, 95% CI, and the first-stage F.

What to look for

  • OLS is glued to zero. Both conflict outcomes show δ̂_OLS ≈ 0.001 with tight standard errors. A classic attenuation signature.
  • All three 2SLS rows cluster around δ̂ = -0.30 for Conflict 1+ and around -0.09 for Conflict 25+. Three different instrument specifications give the same answer, which is consistent with the IV strategy identifying a real causal effect.
  • The "Both" specification is the tightest. Standard errors shrink from 0.111 (Rain) and 0.085 (Drought) to 0.076 (Both). Adding more instruments improves efficiency without sacrificing consistency, as long as they all satisfy exclusion.
  • Hover the 2SLS dots to see the first-stage F. All three specifications clear the Stock-Yogo 10% threshold of 16.38. The instruments are not weak.
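The intervals shown on hover can be approximated from an estimate and its standard error. The sketch below uses a normal approximation with the 2SLS (Both) figures quoted on this page, -0.296 and 0.076. The helper name `normal_ci` is hypothetical, and the post's own intervals may rest on a different critical value.

```python
def normal_ci(estimate, se, z=1.96):
    """Two-sided 95% confidence interval under a normal approximation."""
    return estimate - z * se, estimate + z * se

lo, hi = normal_ci(-0.296, 0.076)   # 2SLS (Both): estimate and SE quoted in the text
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")   # prints: 95% CI: [-0.445, -0.147]
```

The interval lies entirely below zero, so the negative effect is distinguishable from no effect at the 5% level under this approximation.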

Toggles: Outcomes (Conflict 1+, Conflict 25+) and Methods (OLS; 2SLS with Rain, Drought, or Both).

IV diagnostics from the post

Three checks support the IV strategy:

  • First-stage F (Rain alone): 24.62 — above the Stock-Yogo (10%) critical value of 16.38. Strong.
  • First-stage F (Drought alone): 40.33 — well above 16.38. Strong.
  • Hansen J overidentification test (Both): J = 0.007, p = 0.932. We fail to reject instrument validity by a wide margin — rainfall and drought tell the same story.
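The Hansen J p-value can be sanity-checked from the statistic alone. With two instruments and one endogenous regressor there is one overidentifying restriction, so J is compared against a chi-squared distribution with 1 degree of freedom. A minimal sketch follows, using only the standard library; the helper name `chi2_sf_df1` is hypothetical.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

j_stat = 0.007                       # J as reported in the text (rounded)
p_value = chi2_sf_df1(j_stat)
print(f"p = {p_value:.3f}")
```

The result is about 0.93, in line with the reported p = 0.932. The small gap is what one would expect from J being rounded to three decimals.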

Connecting back to Tab 3

The OLS vs 2SLS gap you simulated in Tab 3 has the same shape as the one in the real Hodler-Raschky (2014) panel of 5,689 African regions over 1994–2010:

  • OLS: 0.0008 — essentially zero. The attenuation bias from measuring nighttime lights swamps the true effect.
  • 2SLS (Both instruments): -0.296. A 10% drop in economic activity raises the probability of conflict by ~3 percentage points.
  • Ratio: roughly 370-to-1 in absolute value (0.296 / 0.0008), or about 300-to-1 with the rounded figures 0.30 and 0.001. The measurement-error story is quantitatively enormous.

The simulator in Tab 3 lets you reproduce a gap of any size you like — turn σ_ME and ρ up, and the OLS bar collapses toward zero while the 2SLS bar stays on the truth.