C-LASSO Latent Groups — Interactive Lab

A pedagogical companion to Identifying Latent Group Structures in Panel Data: The classifylasso Command in Stata

Why search for latent groups in a panel?

Most panel-data models assume every country shares the same slope on every regressor. That is a strong assumption, and often a wrong one. The companion post shows that the pooled democracy-growth coefficient of +1.055 for 98 countries hides a sharp split: +2.151 in 57 countries and −0.936 in 41 countries. Pooling averaged a positive and a negative effect into a single misleading number.

The Classifier-LASSO (C-LASSO) of Su, Shi & Phillips (2016) discovers these latent groups directly from the data. This app lets you turn the dials yourself: sweep the LASSO penalty and watch coefficients snap to zero, simulate panels with two hidden groups and see when pooled OLS misleads, and explore the post's group-specific results in a forest plot.

LASSO (L1) vs Ridge (L2) — why LASSO selects, Ridge does not

Both methods shrink coefficients toward zero, but only LASSO drives them exactly to zero. The animation below shows the same coefficient under the two penalties as the penalty λ grows: the orange L1 estimate hits zero at a finite λ, while the steel-blue L2 estimate approaches zero asymptotically but never reaches it. C-LASSO uses the same selection mechanic to collapse country-specific slopes onto a small number of group centres.
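To see the difference in numbers, here is a minimal Python sketch. It assumes the simplest textbook case, a single coefficient under an orthonormal design, where both penalties have closed forms. The function names and the OLS value of 0.5 are illustrative choices, not taken from the app's source.

```python
def lasso_shrink(b_ols, lam):
    """L1 solution of min_b 0.5*(b - b_ols)**2 + lam*abs(b): soft-thresholding."""
    magnitude = abs(b_ols) - lam
    if magnitude <= 0:
        return 0.0  # the coefficient is set exactly to zero
    return magnitude if b_ols > 0 else -magnitude


def ridge_shrink(b_ols, lam):
    """L2 solution of min_b 0.5*(b - b_ols)**2 + 0.5*lam*b**2: proportional shrinkage."""
    return b_ols / (1.0 + lam)


b_ols = 0.5  # illustrative unpenalized estimate
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"lambda={lam:4.2f}  lasso={lasso_shrink(b_ols, lam):.3f}  "
          f"ridge={ridge_shrink(b_ols, lam):.3f}")
```

In this setup the LASSO value is exactly zero once λ reaches 0.5, while the ridge value keeps shrinking but stays positive for any finite λ.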

Tab 2

Penalty Lab

Slide λ and watch which controls survive. The same penalty mechanic that selects predictors here is what C-LASSO uses to sort countries into groups.

Tab 3

Pooled vs Group

Simulate a panel with two hidden groups whose slopes have opposite signs. Watch pooled OLS shrink to near zero while two-group estimates recover the truth.

Tab 4

Forest Plot

The post's headline numbers, interactively. Compare pooled fixed effects against C-LASSO Group 1 and Group 2 for both savings and democracy.

Glossary (open a card if a term is unfamiliar)

Slope heterogeneity
When the same regressor has a different coefficient across units. A pooled regression imposes one slope; if reality is heterogeneous, the pooled slope is a contaminated average.
Latent groups
Unobserved subsets of units sharing the same slope vector. Latent because membership is not known in advance — the algorithm discovers it.
Penalty λ
The knob controlling shrinkage. In C-LASSO, larger λ pulls more country-specific slopes toward a common group centre. This is the main slider in Tab 2.
Classifier-LASSO (C-LASSO)
A penalized panel estimator (Su, Shi & Phillips 2016) that jointly estimates group memberships and group-specific slopes in one optimization for a given number of groups K. K itself is chosen by the information criterion.
Information criterion (IC)
Balances fit against complexity to pick the optimal K. In both applications, IC selects K = 2 groups (savings IC: 0.054 → −0.028 → 0.059 → 0.131 → 0.213 for K = 1…5).
Postlasso step
Once C-LASSO assigns countries to groups, plain OLS is re-run within each group. The selection stage uses the penalty; the inference stage does not.
Simpson's paradox
When an aggregate trend reverses inside subgroups. The pooled democracy coefficient (+1.055) sits between the C-LASSO group estimates (+2.151 and −0.936) — it describes neither group accurately.
Nickell bias
The downward bias of the coefficient on the lagged dependent variable when fixed effects are applied to short panels. Within-demeaning correlates the lagged regressor with the demeaned error. C-LASSO's dynamic option uses a half-panel jackknife to correct for it.
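The Nickell-bias entry can be made concrete with a small simulation. The sketch below is a plain-Python illustration, not the classifylasso implementation: it simulates a pure AR(1) panel with fixed effects, computes the within estimate of the autoregressive coefficient, and applies the standard half-panel jackknife formula (twice the full-sample estimate minus the average of the two half-sample estimates). All names and parameter values (400 units, 10 usable periods, ρ = 0.5) are assumptions chosen for illustration.

```python
import random


def simulate_panel(n_units, n_obs, rho, rng, burn_in=50):
    """Simulate y_it = a_i + rho * y_i,t-1 + e_it and keep the last n_obs observations."""
    panel = []
    for _ in range(n_units):
        a = rng.gauss(0, 1)          # unit fixed effect
        y = a / (1 - rho)            # start at the unit's long-run mean
        series = []
        for t in range(burn_in + n_obs):
            y = a + rho * y + rng.gauss(0, 1)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def within_ar1(panel):
    """Within (fixed-effects) estimate of rho: demean by unit, then pooled OLS."""
    num = den = 0.0
    for y in panel:
        lag, cur = y[:-1], y[1:]
        lag_bar = sum(lag) / len(lag)
        cur_bar = sum(cur) / len(cur)
        for l, c in zip(lag, cur):
            num += (l - lag_bar) * (c - cur_bar)
            den += (l - lag_bar) ** 2
    return num / den


rng = random.Random(0)
rho = 0.5
panel = simulate_panel(n_units=400, n_obs=11, rho=rho, rng=rng)  # 10 transitions per unit

full = within_ar1(panel)
mid = 5  # split the 10 transitions into two blocks of 5
first_half = within_ar1([y[: mid + 1] for y in panel])
second_half = within_ar1([y[mid:] for y in panel])

# Half-panel jackknife: 2 * full-sample estimate - average of half-sample estimates
jackknife = 2 * full - 0.5 * (first_half + second_half)

print(f"true rho = {rho}, within = {full:.3f}, jackknife = {jackknife:.3f}")
```

The within estimate should land well below 0.5, and the jackknife estimate should sit noticeably closer to it. The correction works because the bias is roughly proportional to 1/T, so the half-panel estimates carry about twice the bias of the full-panel one.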

Penalty Lab — turn the LASSO knob yourself

The simulated data has one focal regressor and many candidates. The focal coefficient α = 0.5 (orange curve below). Drag the λ slider and watch coefficients shrink to exactly zero, one at a time. The same shrinkage mechanic is what C-LASSO uses to pull country-specific slopes toward a small number of group centres in the post.

Slider notes:

  • Sample size: more data ⇒ each coefficient is estimated more precisely.
  • Number of candidates p: about 15% have a true nonzero effect; the rest are noise, mirroring the unit-by-unit C-LASSO first stage.
  • Signal strength: magnitude of the truly relevant coefficients relative to noise.
  • Penalty λ: slide left for less shrinkage (more coefficients survive); right for more.

Readouts:

  • Coefficients kept, out of all candidates.
  • α̂ from raw LASSO: shrunk toward zero.
  • α̂ from post-OLS: refit on selected support.
  • True α = 0.50: held fixed for comparison.

What to look for

  • Sparsity grows with λ. Slide right: more coefficients are pinned to zero. Slide left: more re-enter. At λ ≈ 0 you recover OLS. In C-LASSO, this same dial controls how many distinct group centres survive.
  • The post-OLS α̂ tracks the true α more closely than the raw LASSO α̂. LASSO biases all coefficients toward zero. The postlasso refit removes that bias on the selected support — which is exactly why the post reports postlasso group coefficients.
  • The orange focal coefficient stays in. Try a large p and large λ: most predictors disappear, but the focal effect remains visible. C-LASSO does the analogous thing — it does not zero out group slopes, only the differences between units within a group.
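The last point can be read off the C-LASSO objective. As a sketch for the linear panel case, following Su, Shi & Phillips (2016) up to scaling constants, the criterion for a given number of groups K is roughly as below. Here β_i is unit i's slope vector, ỹ and x̃ are within-demeaned data, and the group centres are written γ_k (a notational choice made here to avoid a clash with the focal α above).

```latex
Q^{(K)}_{NT,\lambda}(\beta_1,\dots,\beta_N,\gamma_1,\dots,\gamma_K)
  = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}
      \bigl(\tilde{y}_{it} - \beta_i'\tilde{x}_{it}\bigr)^{2}
  + \frac{\lambda}{N}\sum_{i=1}^{N}\prod_{k=1}^{K}
      \lVert \beta_i - \gamma_k \rVert
```

The penalty for unit i is a product over groups, so it vanishes as soon as β_i coincides with any one centre γ_k. Nothing in it pulls the centres themselves toward zero.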

Pooled vs Group — when averaging cancels the truth

The post's headline finding is Simpson's paradox in panel data: the pooled democracy coefficient (+1.055) sits between +2.151 (Group 1, 57 countries) and −0.936 (Group 2, 41 countries). It describes neither group. Below, we simulate a panel where the focal coefficient is positive in one half of the units and negative in the other half. Watch what pooling does to the estimate, and how recovering the two groups uncovers the truth.

Capped at 300 so the "Run 100 sims" button finishes quickly.
Capped at 50 for the 100-sim run.
Common scale for both groups.
0 = both groups share the same slope · 1 = groups have strongly opposite slopes (the post's setting).

Pooled FE: single slope for everyone

Assumes all units share β_i = β — averages opposite signs together.

Readouts: α̂, SE(α̂), |I_y|, |I_d|, |I_y ∪ I_d|, λ_y, λ_d

C-LASSO-style: select-then-refit on groups

A data-driven selection step finds the right support; postlasso OLS refits without shrinkage — the procedure the post uses for inference.

Readouts: α̂, SE(α̂), |I_y|, |I_d|, |I_y ∪ I_d|, λ_y, λ_d

What to look for

  • Crank up the asymmetry slider. When groups disagree strongly (asymmetry → 1), the pooled estimate collapses toward zero even though the truth is far from zero in each group. This is exactly the savings model: pooled CPI = +0.030 hides −0.181 and +0.478.
  • Reduce asymmetry to 0. Both groups now share the same slope. The pooled estimate is now unbiased — there is no latent structure to discover. C-LASSO would correctly pick K = 1 here.
  • Variance vs bias. The pooled estimate has lower variance (it uses all N units) but is badly biased under heterogeneity. C-LASSO trades a little variance for a lot of bias reduction — when the data supports a multi-group structure.
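The first point can be reproduced offline with a few lines of Python. This is a simplified sketch, not the app's code: group membership is treated as known rather than estimated by C-LASSO, slopes are +β and −β in two equal-sized groups (the asymmetry = 1 case), and every name and parameter value is an illustrative assumption.

```python
import random


def within_slope(units):
    """Within (fixed-effects) slope from a list of (x_series, y_series) units."""
    num = den = 0.0
    for x, y in units:
        x_bar = sum(x) / len(x)
        y_bar = sum(y) / len(y)
        for xi, yi in zip(x, y):
            num += (xi - x_bar) * (yi - y_bar)
            den += (xi - x_bar) ** 2
    return num / den


def simulate(n_units=100, n_periods=20, beta=1.0, seed=0):
    """Two hidden groups: slope +beta in the first half of units, -beta in the second."""
    rng = random.Random(seed)
    groups = {1: [], 2: []}
    for i in range(n_units):
        g = 1 if i < n_units // 2 else 2
        slope = beta if g == 1 else -beta
        fixed_effect = rng.gauss(0, 1)
        x = [rng.gauss(0, 1) for _ in range(n_periods)]
        y = [fixed_effect + slope * xi + rng.gauss(0, 1) for xi in x]
        groups[g].append((x, y))
    return groups


groups = simulate()
pooled = within_slope(groups[1] + groups[2])  # one slope imposed on everyone
g1 = within_slope(groups[1])                  # oracle group-specific slopes
g2 = within_slope(groups[2])

print(f"pooled = {pooled:+.3f}, group 1 = {g1:+.3f}, group 2 = {g2:+.3f}")
```

The pooled slope should come out small relative to the true ±1, while the two group slopes land close to +1 and −1. A full C-LASSO run would also have to estimate who belongs to which group, which this sketch skips.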

Bias vs. variance over many simulations

Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different random shocks) to see whether the pooled bias is systematic.

The post's headline numbers — interactively

These coefficients come straight from the post's Stata C-LASSO output for the savings and democracy applications. Toggle outcomes and methods to compare. Hover a point to see its standard error, 95% CI, and the number of countries each estimate is built on.

What to look for

  • The democracy split is the most dramatic. Pooled FE gives +1.055. C-LASSO Group 1 (57 countries) gives +2.151. Group 2 (41 countries) gives −0.936. The confidence bands of G1 and G2 do not overlap.
  • The CPI sign reversal survives the dynamic specification. Toggle between "CPI → Savings (static)" and "CPI → Savings (dynamic)". Group 1 stays negative (−0.181 → −0.160); Group 2 stays positive (+0.478 → +0.197). The groups are robust.
  • Pooled FE bars hug zero or sit between the groups. For both CPI and Interest, the pooled estimate is small and indistinguishable from zero — averaging opposite-signed effects cancels them out.

Why does C-LASSO matter here?

The pooled democracy coefficient of +1.055 sits between +2.151 and −0.936 — it describes neither group. A policymaker reading the pooled result would conclude democracy universally promotes growth. They would miss that for 41 countries (42% of the sample), the conditional association runs in the opposite direction. The same logic flips an "insignificant" pooled CPI coefficient (+0.030) into two highly significant group-specific effects of opposite sign (−0.181 and +0.478). C-LASSO does not invent these patterns — it reveals them.
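A quick arithmetic check on these figures, as a Python sketch that uses only numbers quoted above. It computes Group 2's share of the sample and the headcount-weighted average of the two group coefficients. The variable names are illustrative.

```python
# Group sizes and democracy coefficients as quoted in the text
n_g1, n_g2 = 57, 41
coef_g1, coef_g2 = 2.151, -0.936
pooled = 1.055

n_total = n_g1 + n_g2
share_g2 = n_g2 / n_total
headcount_avg = (n_g1 * coef_g1 + n_g2 * coef_g2) / n_total

print(f"countries: {n_total}")                          # 98
print(f"Group 2 share: {share_g2:.1%}")                 # 41.8%
print(f"headcount-weighted average: {headcount_avg:.2f}")  # 0.86
print(f"pooled FE estimate: {pooled}")
```

The headcount-weighted average (about 0.86) is in the same region as the pooled +1.055 but does not equal it. A pooled fixed-effects slope is not, in general, a simple headcount average of group slopes, so the two need not match. Either way, a single positive number hides the negative group.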

Connecting back to Tab 3

The pooled-versus-group simulation in Tab 3 is what happens here on the real data:

  • Democracy: pooled gives +1.055; group split is +2.151 vs −0.936 — sign reversal across 42% of countries.
  • CPI on savings (static): pooled gives +0.030 (insignificant); group split is −0.181 vs +0.478 — both highly significant, opposite signs.
  • CPI on savings (dynamic): pooled is still small; group split survives at −0.160 vs +0.197 — robustness across specifications.

The lesson from §10 of the post is therefore visible twice: once on a simulation where you set the truth, and once on the original panels of 56 countries (savings) and 98 countries (democracy).