Synthetic Control with Prediction Intervals

How do you measure uncertainty when there is only one treated unit?

West Germany was reunified once, in 1990. There is no untreated twin to compare it against. The synthetic control method builds an artificial twin from a weighted blend of donor countries, then reads off the post-1990 gap. The SCPI framework of Cattaneo, Feng & Titiunik (2021) extends this with prediction intervals: a formal way to ask whether that gap is real or just noise.

This app lets you turn the dials. In four tabs you will: watch a synthetic West Germany self-assemble from 6 donor countries; widen or shrink the prediction-interval band and see how the “outside the band” years change; explore a simulated gap with an adjustable PI; and finally compare four weight-constraint methods on the actual reunification data.

The donor blend — animated

The simplex synthetic West Germany is a weighted average of 6 of the 16 donor countries. The bars below animate the weights filling in (highlighted bar = Austria, the largest contributor). Notice how concentrated the blend is — just two donors (Austria and the USA) account for 56% of the weight.

Tab 2

Donor Pool

Plot the actual West Germany alongside its synthetic counterfactual and a 95% prediction-interval band. Drag the slider to widen the band and watch how many years remain “outside”.

Tab 3

PI Simulator

The treatment gap on its own, with a symmetric PI band around zero. Adjust the “PI band multiplier” slider to see why even 99% intervals fail to cover the late-1990s gap.

Tab 4

Method Forest

The four weighting methods from §9 (Simplex, Lasso, Ridge, OLS) as a forest plot. Hover any point for the SE and CI; compare pre-RMSE versus the gap each method estimates.

Three takeaways the app is built around

The gap is large and growing. By 2003 West Germany's GDP per capita was about $3,465 below the synthetic counterfactual, roughly 11% lower than what the synthetic control predicts in the absence of reunification.
The gap is statistically significant. From 1997 onward, actual GDP falls below the lower bound of the 95% prediction interval. Even at the 99% confidence level, the actual GDP falls outside the band for 7 of 13 post-treatment years.
The synthetic is sparse and interpretable. Only 6 of 16 donors get non-zero weight: Austria 0.291, USA 0.273, Italy 0.191, Netherlands 0.133, Switzerland 0.081, France 0.030.
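
The sparsity claim can be checked by hand. A minimal sketch that uses only the six weights quoted above; the variable names are illustrative, and the published weights are rounded to three decimals, so the sum is close to one rather than exactly one.

```python
# Simplex weights as quoted in the third takeaway (rounded to 3 decimals).
weights = {
    "Austria": 0.291,
    "USA": 0.273,
    "Italy": 0.191,
    "Netherlands": 0.133,
    "Switzerland": 0.081,
    "France": 0.030,
}

# A simplex blend has non-negative weights that sum to (approximately) one.
total = sum(weights.values())

# Concentration: share of the blend carried by the two largest donors.
top_two_share = sum(sorted(weights.values(), reverse=True)[:2])

print(f"sum of weights: {total:.3f}")
print(f"top-two share:  {top_two_share:.3f}")
```

The top-two share is where the “56% of the weight” figure for Austria and the USA comes from.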

Glossary (open a card if a term is unfamiliar)

Synthetic control

A weighted average of donor units chosen to match the treated unit during the pre-treatment period. After treatment, the same weights produce the counterfactual.

Donor pool

The untreated units used to build the synthetic. Here, 16 OECD countries that did not experience reunification in 1990.

Simplex constraint

Weights are non-negative and sum to one. Guarantees a convex blend of real donors — no extrapolation.
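
To make the constraint concrete, here is a minimal sketch of fitting simplex weights by projected gradient descent on toy data. This is an illustration only: it is not the solver that scpi() uses, and the function names and the three made-up donor series are assumptions introduced for the example.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # threshold from the last index that still qualifies
    return [max(x - theta, 0.0) for x in v]


def fit_simplex_weights(donors, treated, steps=5000):
    """Minimise the squared pre-treatment error subject to the simplex constraint.

    donors: one pre-treatment series per donor; treated: the target series.
    """
    n = len(donors)
    periods = range(len(treated))
    # Conservative step size: 2 * (squared Frobenius norm) bounds the
    # Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(x * x for d in donors for x in d))
    w = [1.0 / n] * n  # start from the equal-weight blend
    for _ in range(steps):
        resid = [sum(w[j] * donors[j][t] for j in range(n)) - treated[t]
                 for t in periods]
        grad = [2.0 * sum(donors[j][t] * resid[t] for t in periods)
                for j in range(n)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n)])
    return w


# Toy data (hypothetical): the treated unit is exactly 70% donor 0 + 30% donor 1.
donors = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [0.0, 1.0, 0.0, 5.0]]
treated = [0.7 * a + 0.3 * b for a, b in zip(donors[0], donors[1])]
w = fit_simplex_weights(donors, treated)
print([round(x, 3) for x in w])
```

Because the projection clips negative entries to zero, irrelevant donors tend to end up with exactly zero weight, which is one way to see why simplex blends are often sparse.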

Pre-treatment RMSE

Root-mean-square error of the synthetic vs the treated unit, computed on the pre-treatment window only. Low RMSE = credible counterfactual.
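
As a small illustration, the pre-treatment RMSE can be computed as below. The series are made-up numbers, not the reunification data, and the function name and the 1990 cutoff argument are assumptions for the sketch.

```python
import math


def pre_treatment_rmse(actual, synthetic, years, last_pre_year):
    """RMSE between actual and synthetic, using pre-treatment years only."""
    squared_errors = [
        (a - s) ** 2
        for a, s, y in zip(actual, synthetic, years)
        if y <= last_pre_year
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical series in thousand USD; 1991 and 1992 are post-treatment
# and must not influence the fit statistic.
years = [1988, 1989, 1990, 1991, 1992]
actual = [10.0, 10.5, 11.0, 11.2, 11.3]
synthetic = [10.1, 10.4, 11.0, 11.8, 12.1]

rmse = pre_treatment_rmse(actual, synthetic, years, last_pre_year=1990)
print(f"pre-treatment RMSE: {rmse:.4f}")
```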

Treatment gap

Actual outcome minus synthetic outcome in the post-treatment period. The point estimate of the causal effect.

Prediction interval (PI)

A range that covers the counterfactual with stated probability (e.g., 95%). When actual GDP falls outside the band, the gap is statistically distinguishable from zero.

In-sample uncertainty

Imperfect pre-treatment fit. With only 31 years to estimate 16 weights, the weights themselves carry sampling noise.

Out-of-sample uncertainty

Post-treatment shocks the model could not have predicted from pre-treatment data alone. Dominates the PI width in later years.

Actual vs Synthetic West Germany — with a prediction interval band

The orange line is West Germany's actual GDP per capita; the dashed steel line is the synthetic counterfactual built from the 6 donor weights. The shaded band is the prediction interval. Drag the slider to widen or shrink the band — orange-highlighted dots mark years where the actual line falls outside the band (i.e., a statistically significant gap).

PI band multiplier 1.00

1.0 = baseline 95% PI from scpi() (simplex, HC1, gaussian). Drag right to mimic a higher confidence level; drag left to see additional years fall outside the band.

Years outside PI: value shown in the app (out of 13 post-treatment)
Avg PI half-width: value shown in the app (thousand USD)
2003 gap: −3.465 thousand USD per capita
Avg gap 1991–2003: −1.668 thousand USD per capita

What to look for

The two lines coincide before 1991. That is not a coincidence — the simplex weights were chosen to minimise pre-treatment RMSE (= 0.072 thousand USD, about 0.3% of West Germany's pre-1990 GDP).
At 1.0 the actual line drops out of the band from 1997 onward. Earlier years sit inside the PI: the effect is emerging, but the in-sample uncertainty is large enough to absorb it. Late years sit clearly outside.
Shrink the band to 0.5. Almost every post-1992 year becomes “significant”. Stretch to 2.0 and only the deepest 1999–2003 years stay outside.
The band widens over time even at 1.0. That is the out-of-sample component: forecasting noise compounds with the distance from the pre-treatment window.

The treatment gap, isolated

Subtracting the synthetic line from the actual line gives the year-by-year gap — the SCPI point estimate of the causal effect. The band shows the prediction interval translated to the gap scale: any year where the gap crosses outside the shaded region is statistically distinguishable from zero. Move the multiplier to mimic different confidence levels.
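
The slider logic described above can be sketched in a few lines. This assumes a band that is symmetric around zero on the gap scale and a multiplier that simply rescales each year's half-width; the function name and all numbers are hypothetical, not the app's code or data.

```python
def years_outside_band(years, actual, synthetic, half_width, multiplier=1.0):
    """Return (gaps, flagged_years).

    gap = actual - synthetic; a year is flagged when |gap| exceeds
    multiplier * half_width for that year (symmetric band around zero).
    """
    gaps = [a - s for a, s in zip(actual, synthetic)]
    flagged = [
        y for y, g, h in zip(years, gaps, half_width)
        if abs(g) > multiplier * h
    ]
    return gaps, flagged


# Hypothetical post-treatment values in thousand USD.
years = [1991, 1992, 1993, 1994]
actual = [20.0, 20.5, 21.0, 21.2]
synthetic = [20.2, 21.0, 22.0, 23.0]
half_width = [0.5, 0.6, 0.8, 1.0]  # band widens with the forecast horizon

for m in (0.5, 1.0, 2.0):
    _, flagged = years_outside_band(years, actual, synthetic, half_width, m)
    print(f"multiplier {m}: {len(flagged)} years outside -> {flagged}")
```

Shrinking the multiplier flags more years and stretching it flags fewer, which is the behaviour the slider exposes.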

PI band multiplier 1.00

≈ 1.30 widens the 95% band to roughly 99% coverage under a Gaussian approximation; ≈ 0.80 narrows it toward 90%.

Years outside band: value shown in the app (significant gap, out of 13)
Largest gap (year): value shown in the app (most negative effect)
Avg post-treatment gap: −1.668 (≈ 5.5% of pre-1990 GDP)

Why the late years matter most

The gap is monotone after 1992. A short-lived disturbance would generate a transient gap. A persistent gap with the same sign across thirteen years is the signature of a structural effect.
The PI band is widest where the gap is largest. Out-of-sample uncertainty accumulates over time — yet even with that increasing width, the gap still falls outside at the end. That is the §10 sensitivity result made visible.
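Try multiplier = 1.30. That approximates the 99% PI if the band scales like a Gaussian quantile (2.576 / 1.960 ≈ 1.31). The post's own average widths (3.30 for 99% vs 2.84 for 95%) imply a smaller ratio of about 1.16, so 1.30 is on the conservative side. You should still see 7 of 13 years outside the band.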

Sensitivity table (from §10 of the post)

Confidence	α (per side)	Avg PI width	Years outside
99%	0.01	3.298	7 / 13
95%	0.05	2.842	7 / 13
90%	0.10	2.583	9 / 13
80%	0.20	2.304	9 / 13
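
One way to relate the slider multipliers to this table is to compare two ratios: the ratio of two-sided Gaussian quantiles, and the ratio of the average widths reported above. The sketch below uses only the table's numbers and Python's standard library. Reading the 1.30 and 0.80 multipliers as Gaussian quantile ratios is an assumption, not something the post states.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

# Two-sided Gaussian critical values relative to the 95% level.
gaussian_ratio_99 = z(0.995) / z(0.975)
gaussian_ratio_90 = z(0.95) / z(0.975)

# Average PI widths from the sensitivity table, relative to the 95% row.
avg_width = {"99%": 3.298, "95%": 2.842, "90%": 2.583, "80%": 2.304}
table_ratio = {level: w / avg_width["95%"] for level, w in avg_width.items()}

print(f"Gaussian 99%/95%: {gaussian_ratio_99:.3f}")
print(f"Gaussian 90%/95%: {gaussian_ratio_90:.3f}")
for level, r in table_ratio.items():
    print(f"table {level}/95%: {r:.3f}")
```

The table's widths grow more slowly with the confidence level than a pure Gaussian rescaling would suggest, so a multiplier of 1.30 overshoots the tabulated 99% width.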

Robustness across weight constraints

Section 9 of the post compares four weighting methods — Simplex, Lasso, Ridge, and OLS. The forest plot below shows the estimated gap in 2003 and the average post-treatment gap under each method, with approximate 95% confidence intervals. All four methods agree that the gap is negative; they differ only in magnitude.

What to look for

Simplex and Lasso are nearly identical (gap 2003: −3.465 vs −3.426). Lasso is a relaxation of Simplex; on this data the relaxation barely kicks in.
Ridge and OLS estimate a smaller gap (−2.72 and −2.38). Their pre-RMSE is lower (0.04 vs 0.07) — they fit the pre-treatment period better, but at the cost of using all 16 donors with possibly negative weights. Over-fitting the pre-period compresses the post-treatment divergence.
None of the 95% CIs cross zero, on either outcome. The qualitative conclusion (reunification reduced West German GDP) is robust to the choice of constraint.

Connecting back to Tab 2

The PI band you played with on Tab 2 used the simplex constraint (the leftmost bar in the forest plot above). Ridge and OLS would shrink the gap by about 20–30%, but their pre-fit is also tighter — so their PI bands would be narrower too. Whether you read the gap as “about −$3,500” (Simplex/Lasso) or “about −$2,500” (Ridge/OLS), it is in the same order of magnitude and survives uncertainty quantification.
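
The “about 20–30%” shrinkage follows from the gap estimates quoted in this section. A minimal check, assuming the Ridge and OLS figures refer to the 2003 gap in thousand USD like the Simplex and Lasso figures do:

```python
# 2003 gap estimates in thousand USD per capita, as quoted in this section.
gap_2003 = {"Simplex": -3.465, "Lasso": -3.426, "Ridge": -2.72, "OLS": -2.38}

baseline = gap_2003["Simplex"]

# Fractional shrinkage of each method's gap relative to the simplex estimate.
shrinkage = {method: 1.0 - gap / baseline for method, gap in gap_2003.items()}

for method, s in shrinkage.items():
    print(f"{method:8s} gap {gap_2003[method]:+.3f}  shrinkage {s:6.1%}")
```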
