Why pooled PCA? The yardstick problem.
You want a single development index that tracks 153 South American regions in both 2013 and 2019. The naive recipe — run PCA separately for each year — silently re-centres the data every period, so a region's improvement can appear as a decline. Pooled PCA fixes the yardstick: it standardises and computes eigenvector weights from the stacked panel (all 306 observations), producing one set of weights that applies to both periods.
This app walks you through the choice in four tabs. You'll see the animation of a "shifting" vs "fixed" yardstick, slide the parameters of a two-period simulation to watch per-period weights wobble while pooled weights stay still, compare real PC1 weights from the post side-by-side with confidence intervals, and reproduce the validation against the official Subnational HDI.
Shifting yardstick (per-period) vs fixed yardstick (pooled)
The animation below shows the same coefficient under two regimes as the "penalty knob" sweeps — read it as how much the standardisation baseline can move between periods. The orange line is the per-period regime: the baseline shifts, so a region's measured value can hit zero or even flip sign. The steel-blue line is the pooled regime: the baseline stays fixed and the value decays smoothly. Pooled keeps genuine improvements visible; per-period can hide them.
PCA Simulator
Move the income-shock and education-gain sliders to control a two-period data-generating process (DGP). Compare pooled and per-period PC1 weights live, and watch the per-period weights drift while pooled weights stay fixed.
Weight Comparison
The real numbers from the post: PC1 weights for Education, Health, and Income under pooled vs per-period 2013 vs per-period 2019, with 95% intervals.
SHDI Validation
Which method better tracks the official Subnational HDI? Compare R² for levels (pooled 0.9823 vs per-period 0.9750) and changes (pooled 0.9964 vs per-period 0.9913) at a glance.
Glossary (open a card if a term is unfamiliar)
PCA (Principal Component Analysis)
Pooled standardisation
Per-period standardisation
PC1 weight
[0.5642, 0.5448, 0.6204] for Education, Health, Income in this post. PCA's data-driven recipe for combining indicators into one score.
Variance explained
Sign convention
Spearman rank correlation
Subnational HDI (SHDI)
PCA Simulator — two periods, two recipes
The simulator generates two periods of synthetic development data with three indicators (mimicking Education / Health / Income). You set parameters; the app fits both pooled PCA and per-period PCA on the simulated data and shows the resulting PC1 weights, variance explained, and the period-to-period drift in per-period weights. The pooled weights should stay nearly identical when you re-seed. The per-period weights should wobble — that's the bug pooled PCA fixes.
Pooled PCA
Standardise on stacked data (300 obs), one set of weights.
Per-period PCA
Standardise on each period separately, two sets of weights.
What to look for
- Pooled weights are stable across re-seeds and across parameter changes. Try sliding the income shock from −0.40 to +0.20: pooled weights barely move.
- Per-period weights drift between P1 and P2. The bigger the level shift in any indicator, the more its per-period weight wobbles.
- PC1 mean shift is informative under pooled (positive for improvement, negative for decline) but is exactly zero under per-period by construction. That's how per-period hides real changes.
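The contrast in the last bullet can be checked with a small sketch. It uses only the standard library so it runs anywhere; a real analysis would more likely use NumPy or scikit-learn. The DGP, the shift sizes, and the helper names (`standardise`, `pc1_weights`, `mean_score`) are illustrative assumptions, not the simulator's actual code.

```python
import math
import random

def standardise(rows, ref_rows):
    """Z-score rows using column means and SDs taken from ref_rows."""
    n, k = len(ref_rows), len(ref_rows[0])
    means = [sum(r[j] for r in ref_rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in ref_rows) / n)
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def pc1_weights(z_rows, iters=200):
    """Leading eigenvector of Z'Z/n by power iteration (unit length)."""
    n, k = len(z_rows), len(z_rows[0])
    moment = [[sum(r[a] * r[b] for r in z_rows) / n for b in range(k)]
              for a in range(k)]
    v = [1.0] * k  # positive start; stays positive for positively correlated indicators
    for _ in range(iters):
        w = [sum(moment[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def mean_score(z_rows, weights):
    """Average PC1 score of a set of standardised rows."""
    scores = [sum(z * w for z, w in zip(r, weights)) for r in z_rows]
    return sum(scores) / len(scores)

rng = random.Random(42)
n_regions = 150
shifts = [0.30, 0.20, -0.10]  # hypothetical Education, Health, Income changes

# Hypothetical panel: the same regions observed twice, period 2 = period 1 + shift + noise.
period1, period2 = [], []
for _ in range(n_regions):
    latent = rng.gauss(0, 1)  # shared "development" factor
    base = [0.8 * latent + rng.gauss(0, 0.6) for _ in range(3)]
    period1.append(base)
    period2.append([b + s + rng.gauss(0, 0.1) for b, s in zip(base, shifts)])

# Pooled: one baseline and one weight vector from the stacked panel.
stacked = period1 + period2
w_pooled = pc1_weights(standardise(stacked, stacked))
shift_pooled = (mean_score(standardise(period2, stacked), w_pooled)
                - mean_score(standardise(period1, stacked), w_pooled))

# Per-period: each period gets its own baseline and its own weights.
z1 = standardise(period1, period1)
z2 = standardise(period2, period2)
w_p1, w_p2 = pc1_weights(z1), pc1_weights(z2)
shift_per_period = mean_score(z2, w_p2) - mean_score(z1, w_p1)

print("pooled weights      :", [round(w, 3) for w in w_pooled])
print("per-period weights  :", [round(w, 3) for w in w_p1],
      [round(w, 3) for w in w_p2])
print("PC1 mean shift, pooled    :", round(shift_pooled, 3))
print("PC1 mean shift, per-period:", round(shift_per_period, 12))
```

Because each period is centred on its own mean, the per-period scores average to zero in both periods, so their difference is zero up to floating-point error regardless of the shifts chosen. The pooled shift keeps the sign of the net improvement.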
Bias vs variance over many simulations
Single runs are noisy. Run the full DGP 100 times with fresh seeds to see whether the per-period weight drift is systematic.
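A repeated-simulation loop of this kind might look like the sketch below. The DGP (one latent factor, equal loadings, independent samples per period), the shift sizes, and the helper names are assumptions made for illustration; the app's own DGP may differ.

```python
import math
import random

def standardise(rows):
    """Z-score rows using their own column means and SDs."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def pc1_weights(z_rows, iters=100):
    """Leading eigenvector of Z'Z/n by power iteration (unit length)."""
    n, k = len(z_rows), len(z_rows[0])
    moment = [[sum(r[a] * r[b] for r in z_rows) / n for b in range(k)]
              for a in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(moment[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def simulate_period(rng, n, shifts):
    """Draw n regions from a one-factor model, then add level shifts."""
    rows = []
    for _ in range(n):
        latent = rng.gauss(0, 1)
        rows.append([0.8 * latent + rng.gauss(0, 0.6) + s for s in shifts])
    return rows

def mean_sd(values):
    """Sample mean and sample standard deviation."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

n_runs, n_regions = 100, 150
shifts = [0.30, 0.20, -0.10]  # hypothetical Education, Health, Income shifts

drifts = []  # per-period weight change (P2 minus P1) in each run
pooled = []  # pooled weights in each run
for seed in range(n_runs):
    rng = random.Random(seed)  # fresh seed per run
    p1 = simulate_period(rng, n_regions, [0.0, 0.0, 0.0])
    p2 = simulate_period(rng, n_regions, shifts)
    w1, w2 = pc1_weights(standardise(p1)), pc1_weights(standardise(p2))
    drifts.append([b - a for a, b in zip(w1, w2)])
    pooled.append(pc1_weights(standardise(p1 + p2)))

names = ["Education", "Health", "Income"]
drift_summary = [mean_sd([d[j] for d in drifts]) for j in range(3)]
pooled_summary = [mean_sd([w[j] for w in pooled]) for j in range(3)]
for name, (dm, ds), (pm, ps) in zip(names, drift_summary, pooled_summary):
    print(f"{name:<10} drift mean {dm:+.3f}, drift SD {ds:.3f} | "
          f"pooled mean {pm:.3f}, pooled SD {ps:.3f}")
```

Read the two columns separately: a drift mean far from zero would indicate systematic bias, while a large drift SD with a mean near zero indicates run-to-run noise. In this particular DGP the correlation structure is the same in both periods, so any drift comes from sampling noise; a DGP in which the correlations themselves change would behave differently.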
Real PC1 weights — pooled vs per-period 2013 vs per-period 2019
These weights come straight from the post's script.py run on
the 153-region South American panel. Each weight tells you how much PC1
weighs that indicator. Toggle methods and indicators to see
how the per-period weights shift between 2013 and 2019 while the pooled
weights sit in between as a fixed compromise. Confidence intervals are
approximate (±1.96 × bootstrap SE).
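As a rough illustration of how an interval of the form ±1.96 × bootstrap SE can be built for a PC1 weight, here is a minimal sketch on synthetic data. The resampling scheme (rows drawn with replacement), the number of replicates, and the data itself are assumptions; the post's script.py may resample differently, for example by region.

```python
import math
import random

def standardise(rows):
    """Z-score rows using their own column means and SDs."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def pc1_weights(z_rows, iters=100):
    """Leading eigenvector of Z'Z/n by power iteration (unit length)."""
    n, k = len(z_rows), len(z_rows[0])
    moment = [[sum(r[a] * r[b] for r in z_rows) / n for b in range(k)]
              for a in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(moment[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

rng = random.Random(0)

# Synthetic stand-in for a stacked panel: 300 rows, three correlated indicators.
data = []
for _ in range(300):
    latent = rng.gauss(0, 1)
    data.append([0.8 * latent + rng.gauss(0, 0.6) for _ in range(3)])

estimate = pc1_weights(standardise(data))

# Bootstrap: resample rows with replacement and refit the weights each time.
n_boot = 200
boot = []
for _ in range(n_boot):
    sample = [data[rng.randrange(len(data))] for _ in range(len(data))]
    boot.append(pc1_weights(standardise(sample)))

intervals = []
for j, name in enumerate(["Education", "Health", "Income"]):
    column = [w[j] for w in boot]
    mean = sum(column) / n_boot
    se = math.sqrt(sum((x - mean) ** 2 for x in column) / (n_boot - 1))
    lower, upper = estimate[j] - 1.96 * se, estimate[j] + 1.96 * se
    intervals.append((lower, estimate[j], upper, se))
    print(f"{name:<10} {estimate[j]:.3f}  [{lower:.3f}, {upper:.3f}]")
```

The interval is "approximate" in the sense that it relies on the bootstrap distribution being roughly normal; a percentile interval from the same replicates would be an alternative.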
What to look for
- Education's weight drops from 0.583 (2013) to 0.541 (2019) under per-period — a real shift of −0.043. The pooled weight (0.564) sits between them.
- Health's weight rises from 0.510 (2013) to 0.566 (2019) — a jump of +0.056. Pooled gives 0.545, again the compromise.
- Income's weight is the most stable across methods (0.620–0.633): all three approaches agree Income carries the heaviest weight.
- The forest plot makes the recipe-instability of per-period PCA visible at a glance. Pooled PCA fixes one recipe across both years.
Indicators
Methods
Why does Income carry the heaviest weight?
In the South American panel the three indicators are positively but unequally correlated: Education–Income r = 0.68, Health–Income r = 0.63, Education–Health r = 0.44. Income has the two strongest correlations, so it shares the most variance with the other indicators. PCA's first eigenvector loads most heavily on the variable that is most strongly correlated with the others, hence w(Income) = 0.620 > w(Education) = 0.564 > w(Health) = 0.545.
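The ordering of the weights can be recovered from the three quoted correlations alone. The sketch below builds the 3×3 correlation matrix and extracts its leading eigenvector by power iteration. It assumes the quoted correlations describe the pooled, standardised panel; since they are rounded to two decimals, the result should match the reported weights only approximately.

```python
import math

# Pairwise correlations quoted in the text, ordered Education, Health, Income.
corr = [
    [1.00, 0.44, 0.68],
    [0.44, 1.00, 0.63],
    [0.68, 0.63, 1.00],
]

# Power iteration: repeatedly multiply by the matrix and renormalise.
v = [1.0, 1.0, 1.0]
for _ in range(200):
    w = [sum(corr[a][b] * v[b] for b in range(3)) for a in range(3)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

for name, weight in zip(["Education", "Health", "Income"], v):
    print(f"{name:<10} {weight:.3f}")
```

The resulting vector lands close to the reported [0.564, 0.545, 0.620], with Income largest and Health smallest, which is consistent with the explanation above: the indicator with the strongest ties to the other two receives the largest loading.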
Connecting back to Tab 2
The per-period weight drift you explored with the sliders in Tab 2 is exactly what shows up here on real data:
- Education weight: 0.583 (2013) → 0.541 (2019), drift = −0.043
- Health weight: 0.510 (2013) → 0.566 (2019), drift = +0.056
- Pooled holds the recipe fixed at [0.564, 0.545, 0.620] across both years.
The post's message becomes visible twice: once on synthetic data you control, and once on the original 306 region-period panel.
Validation — which PCA tracks the official SHDI better?
The Global Data Lab publishes an official Subnational HDI (SHDI) using a geometric mean methodology similar to the UNDP's. Both pooled and per-period PCA correlate strongly with it — but pooled PCA wins on both the level fit and the change fit. The differences are small in absolute terms (≈ 0.5–0.7 percentage points of R²) but consistent and policy-relevant: per-period PCA disagrees with pooled PCA on the direction of change for 16 of 153 regions (10.5%).
Cross-sectional fit (levels)
Correlation between PCA-based HDI and official SHDI across all 306 region-period observations.
Dynamic fit (changes)
Correlation between PCA-based HDI change (2019 − 2013) and official SHDI change.
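The difference between the level fit and the change fit can be sketched as follows. The example assumes R² is the squared Pearson correlation (which equals the R² of a simple linear regression with an intercept); the four regions and all their values are hypothetical and serve only to show which quantities are compared.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical PCA index and reference SHDI for four regions in two years.
index_2013 = [0.10, -0.50, 0.80, -0.20]
index_2019 = [0.25, -0.30, 0.85, 0.05]
shdi_2013 = [0.74, 0.66, 0.82, 0.70]
shdi_2019 = [0.76, 0.69, 0.83, 0.73]

# Levels: stack both years, one observation per region-period.
r2_levels = r_squared(index_2013 + index_2019, shdi_2013 + shdi_2019)

# Changes: one observation per region, 2019 minus 2013.
index_change = [b - a for a, b in zip(index_2013, index_2019)]
shdi_change = [b - a for a, b in zip(shdi_2013, shdi_2019)]
r2_changes = r_squared(index_change, shdi_change)

print(f"R² levels : {r2_levels:.4f}")
print(f"R² changes: {r2_changes:.4f}")
```

The level fit uses every region-period observation, while the change fit uses one difference per region, so the two numbers answer different questions: how well the index ranks regions at a point in time, and how well it tracks their movement.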
Direction-of-change disagreement
Even when the methods agree on average (Spearman ρ for HDI change ranks = 0.9818), they disagree on the sign of the change for a non-trivial slice of the sample: 16 of 153 regions (10.5%).
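A high rank correlation and a sign disagreement can coexist, as this small sketch illustrates. The first pair of values is the Buenos Aires change quoted in this tab (+0.019 pooled, −0.040 per-period); the other three pairs are hypothetical. The rank helper assumes there are no ties.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = float(position)
    return result

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# HDI change (2019 minus 2013) per region under each method.
change_pooled = [0.019, 0.030, -0.010, 0.050]
change_per_period = [-0.040, 0.010, -0.020, 0.060]

# Spearman rho is the Pearson correlation of the ranks.
rho = pearson(ranks(change_pooled), ranks(change_per_period))

# Count regions where the two methods report opposite directions of change.
disagree = sum(1 for a, b in zip(change_pooled, change_per_period) if a * b < 0)
share = disagree / len(change_pooled)

print(f"Spearman rho      : {rho:.2f}")
print(f"Sign disagreements: {disagree} of {len(change_pooled)} ({share:.0%})")
```

Rank correlation only asks whether regions are ordered similarly by the size of their change. A uniform downward shift of every per-period change leaves the ranks intact while pushing small positive changes below zero, which is why the sign count is reported separately.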
Buenos Aires — the running example
Argentina's capital improved in Education (0.926 → 0.946) and Health (0.858 → 0.872), with a modest Income decline (0.850 → 0.832). Pooled PCA correctly reports a modest improvement of +0.019. Per-period PCA reports a decline of −0.040. The shifting yardstick of per-period standardisation is what flips the sign.
A policymaker using per-period PCA might conclude that Buenos Aires "fell behind". A policymaker using pooled PCA sees that it improved modestly while being overtaken by Chilean regions that improved faster. Both narratives are useful — but the data should drive the narrative, not the choice of standardisation.