<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>sdid | Carlos Mendez</title><link>https://carlos-mendez.org/tag/sdid/</link><atom:link href="https://carlos-mendez.org/tag/sdid/index.xml" rel="self" type="application/rss+xml"/><description>sdid</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2018–2026 Carlos Mendez. All rights reserved.</copyright><lastBuildDate>Sun, 07 Jun 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>sdid</title><link>https://carlos-mendez.org/tag/sdid/</link></image><item><title>Staggered Synthetic Difference-in-Differences (SDID) in Stata: Gender Quotas and Women in Parliament</title><link>https://carlos-mendez.org/post/stata_sdid_staggered/</link><pubDate>Sun, 07 Jun 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_sdid_staggered/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Most real-world policies are not adopted on a single clock — parliamentary gender quotas, minimum-wage laws, and carbon taxes arrive in different units in different years, a staggered-adoption design where naive two-way fixed-effects difference-in-differences quietly breaks by using already-treated units as controls and placing negative weights on some effects. This tutorial extends synthetic difference-in-differences (SDID) to staggered adoption and applies it in Stata to a question in political economy: do parliamentary gender quotas raise the share of women in national parliaments? It uses the &lt;code>quota_example&lt;/code> dataset distributed with the &lt;code>sdid&lt;/code> package (Bhalotra, Clarke, Gomes &amp;amp; Venkataramani, 2023) — a balanced panel of 119 countries observed annually from 1990 to 2015 (3,094 observations), in which 9 countries adopt a quota across 7 cohorts (2000, 2002, 2003, 2005, 2010, 2012, 2013) and 110 remain never-treated. The method estimates a separate, clean SDID per cohort against the never-treated donor pool, then aggregates the cohort effects into the overall ATT with non-negative treated-period-share weights, complemented by the &lt;code>sdid_event&lt;/code> event study and bootstrap, jackknife, and placebo inference. The overall ATT is +8.03 percentage points (SE 3.74, p = 0.032), robust to a log-GDP control (8.05 optimized, 8.06 projected), but the cohort effects swing from −3.5 to +21.8 points, with flat pre-adoption placebos supporting parallel synthetic trends and dynamic effects that appear immediately and persist for over a decade. The lesson is that a single headline number summarizes real heterogeneity, and that transparent, non-negative cohort weighting is essential when treatment timing is staggered.&lt;/p>
&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>In a &lt;a href="https://carlos-mendez.org/post/stata_sdid/">previous tutorial&lt;/a>, one unit — California — adopted one policy — Proposition 99 — in one year — 1989. That &lt;strong>block design&lt;/strong> is the textbook setting for synthetic difference-in-differences (SDID). But most real policies do not arrive on a single clock. Parliamentary gender quotas, minimum-wage laws, carbon taxes, and clean-air regulations are adopted by &lt;strong>different units in different years&lt;/strong>. This is the &lt;strong>staggered adoption&lt;/strong> design, and it is where naive panel methods quietly break.&lt;/p>
&lt;p>This tutorial extends SDID to staggered adoption and applies it in Stata to a real question in political economy: &lt;strong>do parliamentary gender quotas raise the share of women in national parliaments?&lt;/strong> We use the &lt;code>quota_example&lt;/code> dataset that ships with the &lt;code>sdid&lt;/code> package — 119 countries observed annually from 1990 to 2015, in which 9 countries adopt a gender quota across 7 different cohorts (2000, 2002, 2003, 2005, 2010, 2012, and 2013).&lt;/p>
&lt;p>The headline is a story about heterogeneity. The overall effect of quotas is about &lt;strong>+8 percentage points&lt;/strong> of women in parliament, but the cohort-by-cohort effects swing from &lt;strong>−3.5 to +21.8 points&lt;/strong>. A single number hides that range — and, as we will see, the naive two-way fixed-effects regression that most people reach for first can hide even more.&lt;/p>
&lt;details>
&lt;summary>&lt;b>Why does staggered timing break the naive regression?&lt;/b> (click to expand)&lt;/summary>
&lt;p>The workhorse for panel policy evaluation is the &lt;strong>two-way fixed-effects (TWFE)&lt;/strong> regression: unit dummies, time dummies, and a treatment dummy. With one adoption date it estimates a clean difference-in-differences. With &lt;em>staggered&lt;/em> timing and &lt;em>heterogeneous&lt;/em> effects, the same regression implicitly uses &lt;strong>already-treated units as controls for later adopters&lt;/strong> (&amp;ldquo;forbidden comparisons&amp;rdquo;). The coefficient is a variance-weighted average of every 2×2 comparison in the panel (Goodman-Bacon, 2021). Because the forbidden comparisons subtract the still-evolving outcomes of earlier adopters, some underlying treatment effects can enter with &lt;strong>negative&lt;/strong> weight (de Chaisemartin &amp;amp; D&amp;rsquo;Haultfœuille, 2020), so the estimate can even take the wrong sign. Staggered SDID sidesteps this by estimating a &lt;strong>separate, clean&lt;/strong> SDID effect for each adoption cohort and aggregating with transparent, non-negative weights.&lt;/p>
&lt;/details>
&lt;pre>&lt;code class="language-mermaid">graph TD
subgraph &amp;quot;Block design — predecessor (Prop 99)&amp;quot;
B1[&amp;quot;California&amp;lt;br/&amp;gt;adopts 1989&amp;quot;] --&amp;gt; BATT[&amp;quot;one ATT&amp;quot;]
B2[&amp;quot;other states&amp;lt;br/&amp;gt;never treated&amp;quot;] --&amp;gt; BATT
end
subgraph &amp;quot;Staggered design — this post (gender quotas)&amp;quot;
S1[&amp;quot;cohort 2000&amp;quot;] --&amp;gt; SATT[&amp;quot;aggregate ATT&amp;quot;]
S2[&amp;quot;cohort 2002&amp;quot;] --&amp;gt; SATT
S3[&amp;quot;cohorts 2003 to 2013&amp;quot;] --&amp;gt; SATT
SC[&amp;quot;110 never-treated&amp;lt;br/&amp;gt;controls&amp;quot;] -.donor pool.-&amp;gt; SATT
end
style B1 fill:#d97757,stroke:#141413,color:#fff
style B2 fill:#6a9bcc,stroke:#141413,color:#fff
style BATT fill:#00d4c8,stroke:#141413,color:#141413
style S1 fill:#d97757,stroke:#141413,color:#fff
style S2 fill:#d97757,stroke:#141413,color:#fff
style S3 fill:#d97757,stroke:#141413,color:#fff
style SC fill:#6a9bcc,stroke:#141413,color:#fff
style SATT fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;h3 id="11-learning-objectives">1.1 Learning objectives&lt;/h3>
&lt;p>By the end of this tutorial you will be able to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Explain&lt;/strong> why staggered adoption breaks naive TWFE difference-in-differences, and how per-cohort SDID avoids the forbidden-comparison problem.&lt;/li>
&lt;li>&lt;strong>Derive&lt;/strong> the SDID estimator from first principles — unit weights $\omega$, time weights $\lambda$, and the weighted two-way fixed-effects objective — and the rule that aggregates cohort-specific effects $\hat{\tau}_a$ into one overall ATT.&lt;/li>
&lt;li>&lt;strong>Estimate&lt;/strong> the effect of gender quotas with &lt;code>sdid&lt;/code> on a staggered panel, add a covariate two different ways (&lt;code>optimized&lt;/code> vs &lt;code>projected&lt;/code>), and choose among bootstrap, jackknife, and placebo inference.&lt;/li>
&lt;li>&lt;strong>Read&lt;/strong> an SDID event-study plot produced by &lt;code>sdid_event&lt;/code>, distinguishing pre-trend placebo coefficients from post-period dynamic effects.&lt;/li>
&lt;/ul>
&lt;h2 id="2-key-concepts-at-a-glance">2. Key concepts at a glance&lt;/h2>
&lt;p>Each card gives a plain-language &lt;strong>definition&lt;/strong>, a concrete &lt;strong>example&lt;/strong> from this quota study, and an everyday &lt;strong>analogy&lt;/strong>. Open any term that is unfamiliar.&lt;/p>
&lt;details>
&lt;summary>&lt;b>1. ATT (average treatment effect on the treated)&lt;/b> — the question we actually answer.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> The effect of adopting a quota on the women-in-parliament share, &lt;em>in the countries that adopted one&lt;/em>, averaged over their post-adoption years. It is not the effect a quota would have everywhere — only where one was actually tried.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> Our headline ATT is &lt;strong>+8.0 percentage points&lt;/strong>: across the nine adopting countries, quotas raised women&amp;rsquo;s parliamentary share by about eight points relative to their no-quota counterfactual.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Like asking &amp;ldquo;how much did the patients who &lt;em>took&lt;/em> the drug improve?&amp;rdquo; — not &amp;ldquo;how much would everyone improve?&amp;rdquo; You measure only the units that were actually treated.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>2. Synthetic control&lt;/b> — a made-to-order comparison country.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> A weighted blend of never-treated &amp;ldquo;donor&amp;rdquo; countries, built so its pre-adoption path mimics the treated cohort. It stands in for the unobservable counterfactual: what the cohort&amp;rsquo;s outcome &lt;em>would&lt;/em> have been without a quota.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> The 2002 cohort&amp;rsquo;s synthetic control mixes dozens of donors (Belgium, Paraguay, Cuba, …) so that, before 2002, the blend tracks the cohort&amp;rsquo;s trend — then keeps going as the cohort would have without the law.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> A stunt double cast to match the lead actor&amp;rsquo;s build and movement — close enough that, in the shots you cannot film the star, the double stands in convincingly.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>3. Unit weights (ω)&lt;/b> — how much each donor counts.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Non-negative weights, one per donor country, summing to one, that build the synthetic control. Each cohort gets its own ω.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> In the 2000 cohort, 80 donors receive nonzero weight — Argentina ≈ 0.061, Guatemala ≈ 0.057, Austria ≈ 0.045 — a &lt;em>diffuse&lt;/em> blend rather than one or two stand-ins.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> A recipe calling for many ingredients in small, precise amounts: no single one dominates, so the dish survives a bad batch of any one ingredient.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>4. Time weights (λ)&lt;/b> — which "before" years matter.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Non-negative weights on the pre-adoption years, summing to one, that decide which pre-periods define the baseline. They up-weight the years most like the post-period.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> For the 2002 cohort, λ concentrates on the late 1990s and 2001 rather than spreading evenly across 1990–2001 — the recent past is the relevant baseline.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Forecasting tomorrow&amp;rsquo;s weather, you trust last week far more than the same date five years ago. Time weights formalize &amp;ldquo;recent and similar counts more.&amp;rdquo;&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>5. Adoption cohort (a)&lt;/b> — units that switch on together.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> The set of countries that first adopt a quota in the same calendar year. Staggered SDID runs one self-contained SDID per cohort, always against the never-treated controls.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> There are seven cohorts — 2000, 2002, 2003, 2005, 2010, 2012, 2013 — with two countries each in 2002 and 2003, and one in the rest.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> School graduating classes: the &amp;ldquo;class of 2002&amp;rdquo; and the &amp;ldquo;class of 2010&amp;rdquo; share a start date and are analyzed as groups, even though all attend the same school.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>6. Staggered adoption &amp;amp; the forbidden comparison&lt;/b> — why the naive regression breaks.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Staggered adoption means units are treated at different times. The hazard: a two-way fixed-effects regression can use &lt;em>already-treated&lt;/em> units as controls for &lt;em>later&lt;/em> adopters — a &amp;ldquo;forbidden comparison&amp;rdquo; that places negative weights on some effects and can flip the sign.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> When the 2012 cohort adopts, a naive TWFE quietly treats the 2002 cohort — already treated, already changed — as part of its control group. Staggered SDID never does this: each cohort is compared only to the 110 never-treated countries.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Timing a late runner against runners who already crossed the line and slowed to a walk — your &amp;ldquo;control&amp;rdquo; is contaminated because it has already run the race.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>7. Event time (relative period)&lt;/b> — every cohort on its own clock.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Time measured relative to each cohort&amp;rsquo;s &lt;em>own&lt;/em> adoption year (… −2, −1, 0, +1 …), so cohorts that adopted in different calendar years can be lined up and averaged.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> Event time 0 is the year 2000 for the first cohort but 2013 for the last; re-centring lets us ask &amp;ldquo;what happens three years &lt;em>after&lt;/em> a quota?&amp;rdquo; across all cohorts at once.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Comparing marathon runners by their own start gun, not the wall clock: a runner who started at 9:05 and one who started at 9:20 are both &amp;ldquo;at mile 10&amp;rdquo; measured from their own start.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>8. ATT aggregation&lt;/b> — from many cohort effects to one number.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> The overall ATT is a weighted average of the cohort effects, each weighted by its share of treated unit-by-post-period observations — earlier, longer-exposed, larger cohorts count more.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> The seven cohort effects span &lt;strong>−3.5 to +21.8&lt;/strong>; weighted by treated country-years they average to &lt;strong>+8.0&lt;/strong> (the plain unweighted mean would be ≈ 7.1).&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> A course grade that weights the final exam more than a pop quiz: the cohorts you observe for longer carry more of the final mark.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>9. Pre-trend placebo test&lt;/b> — the assumption you can see.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Event-study coefficients for the &lt;em>pre-adoption&lt;/em> periods. If treated and synthetic-control countries moved in parallel before treatment, these sit near zero — a falsification check.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> For the 2002 cohort, all twelve pre-period placebos fall in &lt;strong>[−0.2, +0.8]&lt;/strong> points — flat, so we cannot reject parallel synthetic trends.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Checking a scale by weighing nothing first: if it does not read zero when empty, you distrust every later reading. Flat placebos are that &amp;ldquo;reads zero when empty&amp;rdquo; check.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>10. Bootstrap, jackknife, placebo&lt;/b> — three rulers for uncertainty.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Three ways to attach a standard error to the ATT. With many treated units all three are available; they share one point estimate but report different spread.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> On the two-cohort subsample the ATT is &lt;strong>10.3&lt;/strong> for all three, but the SE is &lt;strong>4.7&lt;/strong> (bootstrap), &lt;strong>6.0&lt;/strong> (jackknife, most conservative), and &lt;strong>2.3&lt;/strong> (placebo, tightest).&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Measuring a table with a tape, a folding ruler, and a laser: they agree on the length but disagree on the error bars — the cautious carpenter reports the widest.&lt;/p>
&lt;/details>
&lt;h2 id="3-the-data-gender-quotas-across-119-countries">3. The data: gender quotas across 119 countries&lt;/h2>
&lt;p>We use &lt;code>quota_example.dta&lt;/code>, the balanced panel from Bhalotra, Clarke, Gomes &amp;amp; Venkataramani (2023) distributed with the &lt;code>sdid&lt;/code> package. The outcome is the percentage of seats held by women in the national parliament; the treatment is the adoption of a reserved-seat gender quota; the covariate is log GDP per capita.&lt;/p>
&lt;pre>&lt;code class="language-stata">webuse set www.damianclarke.net/stata/
webuse quota_example, clear
label variable quota &amp;quot;Parliamentary gender quota&amp;quot;
xtset country year
codebook country year quota womparl lngdp, compact
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Variable Obs Unique Mean Min Max Label
----------------------------------------------------------------------------
country 3094 119 . . . Country
year 3094 26 2002.5 1990 2015 Year
quota 3094 2 .0303814 0 1 =1 if country has a gender quota
womparl 3094 449 14.96531 0 63.8 Women in parliament
lngdp 2990 2956 9.154291 5.8701 11.61789 log(GDP)
----------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The panel is &lt;strong>balanced&lt;/strong>: 119 countries times 26 years equals 3,094 observations, with no gaps in the outcome or treatment (&lt;code>lngdp&lt;/code> has 104 missing values, which will matter only when we add the covariate). The treatment indicator &lt;code>quota&lt;/code> equals one for just 3% of observations, a reminder that treated country-years are scarce. Crucially, &lt;code>quota&lt;/code> is &lt;strong>absorbing&lt;/strong> — once a country adopts a quota it stays treated — which SDID requires.&lt;/p>
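&lt;p>A minimal sketch of how the balance and absorbing-treatment claims can be checked before estimation, assuming the data are loaded as in the block above. The &lt;code>assert&lt;/code> lines are illustrative checks rather than part of the &lt;code>sdid&lt;/code> workflow; each stops with an error if its condition fails.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Illustrative checks (not part of the sdid package)
* xtset should report a strongly balanced panel
xtset country year
* every country contributes all 26 years
bysort country: assert _N == 26
* absorbing treatment: within a country, quota never falls back from 1 to 0
bysort country (year): assert quota &amp;gt;= quota[_n-1] if _n &amp;gt; 1
* country-years without the covariate
count if missing(lngdp)
&lt;/code>&lt;/pre>
&lt;p>If the description above holds, both asserts pass silently and &lt;code>count&lt;/code> returns 104, the gap between 3,094 observations and the 2,990 non-missing &lt;code>lngdp&lt;/code> values in the codebook.&lt;/p>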
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Role&lt;/th>
&lt;th>Symbol&lt;/th>
&lt;th>Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>country&lt;/code>&lt;/td>
&lt;td>unit&lt;/td>
&lt;td>$i$&lt;/td>
&lt;td>119 countries (9 ever-treated, 110 never-treated)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>year&lt;/code>&lt;/td>
&lt;td>time&lt;/td>
&lt;td>$t$&lt;/td>
&lt;td>1990–2015 (26 years)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>womparl&lt;/code>&lt;/td>
&lt;td>outcome&lt;/td>
&lt;td>$Y_{it}$&lt;/td>
&lt;td>% women in the national parliament&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>quota&lt;/code>&lt;/td>
&lt;td>treatment&lt;/td>
&lt;td>$W_{it}$&lt;/td>
&lt;td>1 once a country has a quota, 0 before / never&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>lngdp&lt;/code>&lt;/td>
&lt;td>covariate&lt;/td>
&lt;td>$X_{it}$&lt;/td>
&lt;td>log GDP per capita&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>The estimand.&lt;/strong> Our target is the &lt;strong>average treatment effect on the treated (ATT)&lt;/strong>: the effect of adopting a quota on the women-in-parliament share &lt;em>in the countries that adopted one&lt;/em>, averaged over their post-adoption years. Formally,&lt;/p>
&lt;p>$$
\tau = \frac{1}{N_{tr}\, T_{post}} \sum_{i:\, W_i = 1}\ \sum_{t &amp;gt; T_{pre}} \left[\, Y_{it}(1) - Y_{it}(0) \,\right]
$$&lt;/p>
&lt;p>In words: for every treated country and every post-adoption year, take the gap between the share of women &lt;em>with&lt;/em> a quota, $Y_{it}(1)$, and the share that &lt;em>would have occurred without one&lt;/em>, $Y_{it}(0)$ — then average. The first term is observed; the second is the counterfactual that the synthetic control must impute, because we never see a quota-adopting country in the parallel world where it abstained.&lt;/p>
&lt;p>&lt;strong>An observational, not experimental, setting.&lt;/strong> Quotas are not randomly assigned. Countries that adopt them early may differ systematically — they may be wealthier, more democratic, or already on a rising trajectory of women&amp;rsquo;s representation. That is exactly why we need a method that builds a &lt;em>credible counterfactual&lt;/em> from comparison countries rather than assuming a simple before/after change would have held. Identification rests on assumptions we will keep visible: that treated and synthetic-control countries share a &lt;strong>common (synthetic) trend&lt;/strong> absent treatment, &lt;strong>no anticipation&lt;/strong> of the quota, &lt;strong>no spillovers&lt;/strong> across countries, and that adoption timing is not itself driven by the outcome&amp;rsquo;s future path.&lt;/p>
&lt;h3 id="31-the-staggered-structure">3.1 The staggered structure&lt;/h3>
&lt;p>Before modelling, let us see the timing directly. The adoption year is the first year a country is treated; we tabulate the cohorts.&lt;/p>
&lt;pre>&lt;code class="language-stata">bysort country (year): egen firsttreat = min(cond(quota==1, year, .))
preserve
keep country firsttreat
duplicates drop
tab firsttreat, missing
restore
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> firsttreat | Freq. Percent Cum.
------------+-----------------------------------
2000 | 1 0.84 0.84
2002 | 2 1.68 2.52
2003 | 2 1.68 4.20
2005 | 1 0.84 5.04
2010 | 1 0.84 5.88
2012 | 1 0.84 6.72
2013 | 1 0.84 7.56
. | 110 92.44 100.00
------------+-----------------------------------
Total | 119 100.00
&lt;/code>&lt;/pre>
&lt;p>Nine countries adopt a quota, spread across &lt;strong>seven cohorts&lt;/strong>; the 2002 and 2003 cohorts contain two countries each, the rest one. The remaining &lt;strong>110 countries are never treated&lt;/strong> — they form the donor pool from which every cohort&amp;rsquo;s synthetic control is built. This staircase of adoption dates is the defining feature of a staggered design, and the reason a single &amp;ldquo;post&amp;rdquo; dummy is too blunt.&lt;/p>
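&lt;p>The cohort sizes translate directly into treated country-years, the quantity that later sets each cohort&amp;rsquo;s weight in the overall ATT. A short sketch, assuming &lt;code>firsttreat&lt;/code> from the block above is still in memory; the variable name &lt;code>treated_cy&lt;/code> is introduced here only for illustration.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Count treated country-years (quota == 1) by adoption cohort
preserve
keep if quota == 1
* total number of treated country-years
count
* one row per cohort, with its number of treated country-years
contract firsttreat, freq(treated_cy)
list, noobs
restore
&lt;/code>&lt;/pre>
&lt;p>With an absorbing treatment and data through 2015, the counts should be 16, 28, 26, 11, 6, 4, and 3 for the seven cohorts, 94 in total, which matches the 3% treated share in the codebook (94 / 3,094).&lt;/p>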
&lt;h2 id="4-exploratory-analysis-with-panelview">4. Exploratory analysis with &lt;code>panelview&lt;/code>&lt;/h2>
&lt;p>A staggered design is best understood by &lt;em>looking&lt;/em> at it. The &lt;code>panelview&lt;/code> command (Xu &amp;amp; Hua) draws two pictures we need: a heatmap of &lt;em>who is treated when&lt;/em>, and the raw outcome trajectories colored by treatment status.&lt;/p>
&lt;pre>&lt;code class="language-stata">ssc install panelview, replace
panelview womparl quota, i(country) t(year) type(treat) bytiming
panelview womparl quota, i(country) t(year) type(outcome)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_panelview_treat.png" alt="Treatment-timing heatmap: countries sorted by adoption year reveal the staggered staircase">&lt;/p>
&lt;p>The treatment heatmap (&lt;code>type(treat)&lt;/code>, sorted with &lt;code>bytiming&lt;/code>) makes the staggered structure unmistakable: the dark treated cells appear in the &lt;strong>top-right corner as a staircase&lt;/strong>, each step a different cohort switching on between 2000 and 2013, against a sea of never-treated controls. This is the visual opposite of a block design, where every treated cell would switch on in the same column.&lt;/p>
&lt;p>&lt;img src="stata_sdid_staggered_panelview_outcome.png" alt="Outcome trajectories: treated countries (orange) against the control spaghetti (blue)">&lt;/p>
&lt;p>The outcome plot (&lt;code>type(outcome)&lt;/code>) overlays all 119 women-in-parliament series, with the 9 treated countries in orange. Several treated countries start near the bottom of the distribution and climb steeply after their adoption year — a hint of a positive effect — but the climbs begin at different times, and a few treated countries barely move. No single &amp;ldquo;treated average&amp;rdquo; line could summarize this; we need cohort-specific counterfactuals.&lt;/p>
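&lt;pre>&lt;code class="language-stata">* flag ever-adopters first (firsttreat is missing for never-treated countries)
gen byte evertreat = !missing(firsttreat)
collapse (mean) womparl, by(evertreat year)
* ... reshape and plot ever- vs never-adopting means ...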
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_raw_trends.png" alt="Mean outcome: ever-adopting vs never-adopting countries">&lt;/p>
&lt;p>Collapsing to group means tells a cautionary tale. The ever-adopting countries (orange) start the 1990s &lt;strong>below&lt;/strong> the never-adopting countries (about 4% vs 10% women in parliament) and end &lt;strong>above&lt;/strong> them by 2015 (about 23% vs 22%). A naive eyeball difference-in-differences on these two lines would be badly confounded: the groups began at different levels and the &amp;ldquo;treated&amp;rdquo; line aggregates countries that switched on in seven different years. The raw means motivate the machinery to come — we must compare each cohort to a &lt;em>tailored&lt;/em> synthetic control, not to the grand average.&lt;/p>
&lt;h2 id="5-synthetic-difference-in-differences-from-first-principles">5. Synthetic difference-in-differences from first principles&lt;/h2>
&lt;p>Before tackling staggered timing, fix ideas with a single cohort. SDID (Arkhangelsky et al., 2021) is a &lt;strong>weighted two-way fixed-effects regression&lt;/strong>. It chooses an ATT, a constant, unit fixed effects, and time fixed effects to minimize a weighted sum of squared residuals:&lt;/p>
&lt;p>$$
\left(\hat{\tau}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right) = \arg\min_{\tau,\mu,\alpha,\beta} \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau\right)^{2}\, \hat{\omega}_i\, \hat{\lambda}_t
$$&lt;/p>
&lt;p>In words: run a difference-in-differences regression, but weight each observation by a &lt;strong>unit weight&lt;/strong> $\hat{\omega}_i$ times a &lt;strong>time weight&lt;/strong> $\hat{\lambda}_t$. Here $\alpha_i$ is a country fixed effect, $\beta_t$ a year fixed effect, $W_{it}$ the treatment dummy, and $\tau$ the ATT we want. Set all weights equal and you recover ordinary DiD; the weights are what make SDID special. They are not free parameters — each solves its own optimization.&lt;/p>
&lt;p>The &lt;strong>unit weights&lt;/strong> are chosen so that a weighted blend of control countries tracks the treated cohort across the pre-period:&lt;/p>
&lt;p>$$
\hat{\omega} = \arg\min_{\omega_0,\, \omega \ge 0} \sum_{t=1}^{T_{pre}} \left(\omega_0 + \sum_{i=1}^{N_{co}} \omega_i\, Y_{it} - \frac{1}{N_{tr}} \sum_{i=N_{co}+1}^{N} Y_{it}\right)^{2} + \zeta^{2}\, T_{pre}\, \lVert \omega \rVert^{2}
$$&lt;/p>
&lt;p>The bracketed term asks the synthetic control $\sum_i \omega_i Y_{it}$ (plus an intercept $\omega_0$) to match the treated average in every pre-adoption year. The intercept $\omega_0$ is the SDID twist: it lets the synthetic match the treated &lt;em>trend&lt;/em> without matching its &lt;em>level&lt;/em>, because any constant level gap is later absorbed by the unit fixed effect $\alpha_i$. The final term is a &lt;strong>ridge penalty&lt;/strong> with regularization strength $\zeta$; it spreads weight across many donors instead of concentrating it on a few, which stabilizes the estimate. (Synthetic control, by contrast, drops $\omega_0$ and the penalty and must match the level too.)&lt;/p>
&lt;p>The &lt;strong>time weights&lt;/strong> are the mirror image — they pick the pre-period years that best predict each control country&amp;rsquo;s post-period average:&lt;/p>
&lt;p>$$
\hat{\lambda} = \arg\min_{\lambda_0,\, \lambda \ge 0} \sum_{i=1}^{N_{co}} \left(\lambda_0 + \sum_{t=1}^{T_{pre}} \lambda_t\, Y_{it} - \frac{1}{T_{post}} \sum_{t=T_{pre}+1}^{T} Y_{it}\right)^{2} + \zeta_{\lambda}^{2}\, N_{co}\, \lVert \lambda \rVert^{2}
$$&lt;/p>
&lt;p>Years that look most like the post-period get the most weight, so the &amp;ldquo;before&amp;rdquo; comparison is built from the most relevant history rather than a flat average over possibly-irrelevant early years. The two weighting schemes together are what distinguish SDID from its cousins, as the table summarizes.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Unit weights $\omega$&lt;/th>
&lt;th>Time weights $\lambda$&lt;/th>
&lt;th>Unit FE $\alpha_i$&lt;/th>
&lt;th>Must match&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>DiD&lt;/strong>&lt;/td>
&lt;td>uniform&lt;/td>
&lt;td>uniform&lt;/td>
&lt;td>yes&lt;/td>
&lt;td>trend on &lt;em>all&lt;/em> controls&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Synthetic control&lt;/strong>&lt;/td>
&lt;td>optimized&lt;/td>
&lt;td>uniform&lt;/td>
&lt;td>&lt;strong>no&lt;/strong>&lt;/td>
&lt;td>level &lt;em>and&lt;/em> trend&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>SDID&lt;/strong>&lt;/td>
&lt;td>optimized&lt;/td>
&lt;td>optimized&lt;/td>
&lt;td>yes&lt;/td>
&lt;td>trend (level gap allowed)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
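&lt;p>The three rows of the table can be compared on the quota panel itself. The sketch below assumes the installed version of &lt;code>sdid&lt;/code> accepts the &lt;code>method()&lt;/code> option (with &lt;code>did&lt;/code>, &lt;code>sc&lt;/code>, and &lt;code>sdid&lt;/code>) and &lt;code>vce(noinference)&lt;/code> to skip standard errors; consult &lt;code>help sdid&lt;/code> if either is rejected.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Point estimates only: same data, three weighting schemes
* uniform unit and time weights (ordinary DiD)
sdid womparl country year quota, vce(noinference) method(did)
* optimized unit weights, no unit fixed effect (synthetic control)
sdid womparl country year quota, vce(noinference) method(sc)
* optimized unit and time weights (SDID, the default)
sdid womparl country year quota, vce(noinference) method(sdid)
&lt;/code>&lt;/pre>
&lt;p>Only the last call corresponds to the estimator used in the rest of this tutorial; the other two show how far the answer moves when the weights or the unit fixed effect are switched off.&lt;/p>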
&lt;h2 id="6-the-staggered-extension-per-cohort-effects-and-their-aggregation">6. The staggered extension: per-cohort effects and their aggregation&lt;/h2>
&lt;p>Staggered SDID is a disarmingly simple idea: &lt;strong>do the single-cohort analysis once per adoption cohort, then average.&lt;/strong> For each cohort $a$, take only that cohort&amp;rsquo;s treated countries plus the pure never-treated controls, solve the SDID problem above on that sub-panel to get its own $\hat{\omega}_a$, $\hat{\lambda}_a$, and cohort effect $\hat{\tau}_a$. Because each cohort is compared &lt;strong>only to never-treated controls&lt;/strong>, an already-treated unit is never used as a control for a later adopter — precisely the contamination that breaks naive TWFE.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
POOL[&amp;quot;110 never-treated&amp;lt;br/&amp;gt;controls (donor pool)&amp;quot;]
C1[&amp;quot;Cohort 2000&amp;lt;br/&amp;gt;+ controls&amp;quot;]
C2[&amp;quot;Cohort 2002&amp;lt;br/&amp;gt;+ controls&amp;quot;]
CD[&amp;quot;Cohorts 2003…2013&amp;lt;br/&amp;gt;+ controls&amp;quot;]
T1[&amp;quot;SDID &amp;amp;rarr; &amp;amp;tau;&amp;lt;sub&amp;gt;2000&amp;lt;/sub&amp;gt; = 8.4&amp;quot;]
T2[&amp;quot;SDID &amp;amp;rarr; &amp;amp;tau;&amp;lt;sub&amp;gt;2002&amp;lt;/sub&amp;gt; = 7.0&amp;quot;]
TD[&amp;quot;SDID &amp;amp;rarr; &amp;amp;tau;&amp;lt;sub&amp;gt;a&amp;lt;/sub&amp;gt;&amp;lt;br/&amp;gt;(&amp;amp;minus;3.5 … +21.8)&amp;quot;]
ATT[&amp;quot;Aggregate ATT = 8.0&amp;lt;br/&amp;gt;weighted by treated periods&amp;quot;]
POOL --&amp;gt; C1 --&amp;gt; T1 --&amp;gt; ATT
POOL --&amp;gt; C2 --&amp;gt; T2 --&amp;gt; ATT
POOL --&amp;gt; CD --&amp;gt; TD --&amp;gt; ATT
style POOL fill:#6a9bcc,stroke:#141413,color:#fff
style C1 fill:#d97757,stroke:#141413,color:#fff
style C2 fill:#d97757,stroke:#141413,color:#fff
style CD fill:#d97757,stroke:#141413,color:#fff
style T1 fill:#1f2b5e,stroke:#6a9bcc,color:#fff
style T2 fill:#1f2b5e,stroke:#6a9bcc,color:#fff
style TD fill:#1f2b5e,stroke:#6a9bcc,color:#fff
style ATT fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
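&lt;p>The per-cohort step can be sketched by hand for a single cohort. This assumes &lt;code>firsttreat&lt;/code> from Section 3.1 is in memory and that the stored results &lt;code>e(ATT)&lt;/code> and &lt;code>e(lambda)&lt;/code> are available after &lt;code>sdid&lt;/code>; the choice of the 2002 cohort is only for illustration.&lt;/p>
&lt;pre>&lt;code class="language-stata">* One cohort at a time: 2002 adopters plus the never-treated donor pool
preserve
keep if firsttreat == 2002 | missing(firsttreat)
* point estimate only, to keep the run short
sdid womparl country year quota, vce(noinference)
display e(ATT)
* time weights for this cohort's pre-adoption years
matrix list e(lambda)
restore
&lt;/code>&lt;/pre>
&lt;p>If the staggered routine works as described, the displayed effect should be close to the value of about 7.0 shown for the 2002 cohort in the diagram.&lt;/p>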
&lt;p>The overall ATT aggregates the cohort effects with &lt;strong>non-negative&lt;/strong> weights equal to each cohort&amp;rsquo;s share of treated unit-by-post-period observations:&lt;/p>
&lt;p>$$
\widehat{ATT} = \sum_{a \in \mathcal{A}} \frac{N_{tr}^{a}\, T_{post}^{a}}{\sum_{b \in \mathcal{A}} N_{tr}^{b}\, T_{post}^{b}}\ \hat{\tau}_a
$$&lt;/p>
&lt;p>In words: a cohort counts in proportion to how many treated country-years it contributes. The 2000 cohort, treated for 16 years (2000–2015), carries more weight than the 2013 cohort, treated for only 3. This is the staggered generalization of single-cohort SDID, and — unlike TWFE — every weight is positive and interpretable. (When each cohort has one treated unit, this reduces to the post-period share $T_{post}^{a}/T_{post}$ from Clarke et al., 2024.)&lt;/p>
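&lt;p>As a worked instance of these weights, take the cohort sizes from Section 3.1 and note that the panel ends in 2015, so a cohort adopting in year $a$ has $T_{post}^{a} = 2015 - a + 1$ treated years. The denominator is then:&lt;/p>
&lt;p>$$
\sum_{b \in \mathcal{A}} N_{tr}^{b}\, T_{post}^{b} = \underbrace{1 \cdot 16}_{2000} + \underbrace{2 \cdot 14}_{2002} + \underbrace{2 \cdot 13}_{2003} + \underbrace{1 \cdot 11}_{2005} + \underbrace{1 \cdot 6}_{2010} + \underbrace{1 \cdot 4}_{2012} + \underbrace{1 \cdot 3}_{2013} = 94
$$&lt;/p>
&lt;p>and the seven cohort weights, in adoption order, are:&lt;/p>
&lt;p>$$
\left(\tfrac{16}{94},\ \tfrac{28}{94},\ \tfrac{26}{94},\ \tfrac{11}{94},\ \tfrac{6}{94},\ \tfrac{4}{94},\ \tfrac{3}{94}\right) \approx \left(0.170,\ 0.298,\ 0.277,\ 0.117,\ 0.064,\ 0.043,\ 0.032\right)
$$&lt;/p>
&lt;p>The three earliest cohorts carry about three quarters of the total weight (70 of 94 treated country-years), while the 2012 and 2013 cohorts together carry under 8%.&lt;/p>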
&lt;h2 id="7-estimation-in-stata">7. Estimation in Stata&lt;/h2>
&lt;p>One command does the whole staggered procedure. We request bootstrap inference and a fixed seed for reproducibility.&lt;/p>
&lt;pre>&lt;code class="language-stata">sdid womparl country year quota, vce(bootstrap) seed(1213)
matrix list e(tau)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Synthetic Difference-in-Differences Estimator
-----------------------------------------------------------------------------
womparl | ATT Std. Err. t P&amp;gt;|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
quota | 8.03410 3.74040 2.15 0.032 0.70305 15.36516
-----------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The overall &lt;strong>ATT is +8.03 percentage points&lt;/strong> (SE 3.74, $t=2.15$, $p=0.032$), with a 95% confidence interval of [0.70, 15.37] that excludes zero. Substantively: adopting a parliamentary gender quota raises the share of women in parliament by about &lt;strong>eight percentage points&lt;/strong> in the adopting countries — a large effect against a sample mean of 15%, and statistically distinguishable from no effect at the 5% level.&lt;/p>
&lt;p>The single number, though, is the average of a very heterogeneous set of cohort effects, returned in &lt;code>e(tau)&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-text">T[7,3]
Tau Std.Err. Time
r1 8.3888685 .68278345 2000
r2 6.9677465 .64102999 2002
r3 13.952256 9.1289943 2003
r4 -3.4505431 .75603453 2005
r5 2.7490355 .44799502 2010
r6 21.762716 .91589982 2012
r7 -.82032354 .83151601 2013
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_cohort_taus.png" alt="Cohort-specific SDID effects with 95% confidence intervals and the aggregate ATT">&lt;/p>
&lt;p>The cohort effects span an enormous range: from &lt;strong>−3.5 points&lt;/strong> (2005 cohort) to &lt;strong>+21.8 points&lt;/strong> (2012 cohort), with the 2003 cohort essentially uninformative (SE 9.13, a confidence interval that runs from −4 to +32). The teal line marks the aggregate ATT of 8.0. Notice that this aggregate is &lt;strong>not&lt;/strong> the simple average of the seven cohort effects, which would be about 7.1. It is the &lt;em>treated-period-weighted&lt;/em> average from the aggregation formula, which up-weights the earlier, longer-exposed 2000, 2002, and 2003 cohorts (the 2002 and 2003 cohorts also contain two countries each). The lesson of the figure is that &amp;ldquo;+8 points on average&amp;rdquo; is a summary of real heterogeneity, not a universal constant; some quotas were transformative, others did nothing measurable.&lt;/p>
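&lt;p>The treated-period weighting can be checked directly from the stored results. The following is a minimal sketch, not the package&amp;rsquo;s internal code. It assumes that &lt;code>quota&lt;/code> is a 0/1 indicator that switches on in the adoption year and stays on, that the rows of &lt;code>e(tau)&lt;/code> are sorted by adoption year as in the listing above, and that it is run immediately after the &lt;code>sdid&lt;/code> call so that &lt;code>e(tau)&lt;/code> is still in memory. The names &lt;code>T&lt;/code>, &lt;code>cohort&lt;/code>, &lt;code>n_obs&lt;/code>, and &lt;code>tau&lt;/code> are introduced here for illustration.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: rebuild the aggregate ATT from e(tau) and treated country-year counts
matrix T = e(tau)
preserve
    keep if quota == 1
    gen long n_obs = 1
    * one row per treated country: adoption year and number of treated years
    collapse (min) cohort = year (sum) n_obs, by(country)
    * one row per cohort: total treated country-years
    collapse (sum) n_obs, by(cohort)
    sort cohort
    gen double tau = .
    forvalues i = 1/`=_N' {
        assert cohort[`i'] == T[`i', 3]        // row i of e(tau) is this cohort
        quietly replace tau = T[`i', 1] in `i'
    }
    summarize tau                               // unweighted mean of cohort effects
    summarize tau [aweight = n_obs]             // treated-period-weighted mean
restore
&lt;/code>&lt;/pre>
&lt;p>If the cohort sizes are as described in this tutorial, the weighted mean should come out at the 8.03 reported by &lt;code>sdid&lt;/code>, while the unweighted mean is about 7.08.&lt;/p>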
&lt;p>To see the synthetic-control machinery underneath one cohort, the figure below plots the 2002 cohort against its synthetic control. Because SDID matches the pre-period &lt;em>trend&lt;/em> and lets the unit fixed effect absorb the &lt;em>level&lt;/em> gap, we anchor the synthetic to the treated cohort by its $\lambda$-weighted pre-period gap so the two align before adoption.&lt;/p>
&lt;p>&lt;img src="stata_sdid_staggered_cohort2002_path.png" alt="SDID counterfactual for the 2002 cohort (synthetic anchored to the treated pre-period)">&lt;/p>
&lt;p>The treated 2002 cohort (orange) and its anchored synthetic control (blue dashed) track each other closely &lt;strong>before 2002&lt;/strong> — the synthetic was built precisely to do so — and then diverge: the treated cohort climbs to roughly 15% women in parliament while the synthetic counterfactual reaches only about 9–10%. That post-2002 gap is the cohort effect, about +7 points, matching $\hat{\tau}_{2002}=6.97$ from &lt;code>e(tau)&lt;/code>.&lt;/p>
&lt;p>Which pre-period years anchor that comparison? The time weights $\hat{\lambda}_t$ for the 2002 cohort do not spread evenly over 1990–2001 — they concentrate on the years just before adoption.&lt;/p>
&lt;p>&lt;img src="stata_sdid_staggered_lambda.png" alt="SDID pre-period time weights (λ) for the 2002 cohort">&lt;/p>
&lt;p>The bars show SDID&amp;rsquo;s baseline for the 2002 cohort leaning on the late 1990s and 2001 — the pre-adoption years whose level most resembles the post-adoption period — rather than weighting all twelve pre-years equally as a plain difference-in-differences would. This is the time-weighting half of SDID at work: it builds the &amp;ldquo;before&amp;rdquo; from the most relevant history, which is also the baseline the event study below measures against.&lt;/p>
&lt;h2 id="8-adding-a-covariate-optimized-vs-projected">8. Adding a covariate: optimized vs projected&lt;/h2>
&lt;p>Does the quota effect simply reflect economic development — richer countries both grow GDP and elect more women? We can condition on log GDP per capita. The &lt;code>sdid&lt;/code> command offers two routes, and SDID needs a balanced panel, so we first drop the country-years with missing &lt;code>lngdp&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">drop if missing(lngdp)
sdid womparl country year quota, vce(bootstrap) seed(2022) covariates(lngdp, optimized)
sdid womparl country year quota, vce(bootstrap) seed(1213) covariates(lngdp, projected)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">SDID + lngdp (optimized) ATT = 8.0515 SE = 3.0466
SDID + lngdp (projected) ATT = 8.0593 SE = 3.1191
&lt;/code>&lt;/pre>
&lt;p>The two methods differ in &lt;em>how&lt;/em> they estimate the covariate&amp;rsquo;s coefficient. The &lt;strong>optimized&lt;/strong> method (Arkhangelsky et al., 2021) folds the covariate adjustment into the SDID optimization itself, estimating it jointly with the weights; it is flexible but computationally heavy. The &lt;strong>projected&lt;/strong> method (Kranz, 2022) instead regresses the outcome on the covariate among the &lt;em>untreated&lt;/em> observations first, then runs SDID on the residuals, which is much faster and numerically more stable. Reassuringly, here the two point estimates agree to within 0.01 points: &lt;strong>8.05 and 8.06&lt;/strong>, essentially unchanged from the no-covariate estimate of 8.03. Controlling for income does &lt;strong>not&lt;/strong> explain away the quota effect; the result is robust to the most obvious confounder.&lt;/p>
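&lt;p>Because &lt;code>sdid&lt;/code> requires a balanced panel, it is worth confirming that the &lt;code>drop&lt;/code> left every remaining country with the same set of years. A minimal sketch of such a check, meant to be run after &lt;code>drop if missing(lngdp)&lt;/code> and before the &lt;code>sdid&lt;/code> calls; &lt;code>n_years&lt;/code> is a scratch variable introduced here:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: verify the panel is still balanced after dropping missing lngdp
isid country year                        // exactly one row per country-year
bysort country: gen long n_years = _N
quietly summarize n_years
assert r(min) == r(max)                  // same number of years for every country
local n = r(min)
quietly tabulate year
assert r(r) == `n'                       // and they are the same years for all countries
display &amp;quot;years per country: &amp;quot; `n'
drop n_years
&lt;/code>&lt;/pre>
&lt;p>If either &lt;code>assert&lt;/code> fails, some country lost only part of its history, and those countries would need to be dropped entirely before estimation.&lt;/p>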
&lt;h2 id="9-the-event-study-with-sdid_event">9. The event study with &lt;code>sdid_event&lt;/code>&lt;/h2>
&lt;p>A single ATT — even per cohort — cannot tell us &lt;em>when&lt;/em> the effect appears, or whether treated and control countries were already diverging &lt;em>before&lt;/em> the quota. For that we need an &lt;strong>event study&lt;/strong>: the treatment effect traced out by years relative to adoption. The modern &lt;code>sdid_event&lt;/code> command (Ciccia, Clarke &amp;amp; Pailañir, 2024) computes exactly this for SDID, including pre-period &lt;strong>placebo&lt;/strong> estimates that serve as a parallel-trends test.&lt;/p>
&lt;p>The dynamic effect at event time $\ell$ is the treated-minus-synthetic gap in that period, &lt;em>net of the same gap at baseline&lt;/em>, where — characteristically for SDID — the baseline is the $\lambda$-weighted pre-period average rather than a single &amp;ldquo;year −1&amp;rdquo;:&lt;/p>
&lt;p>$$
\delta_{\ell} = \left(\bar{Y}_{\ell}^{tr} - \bar{Y}_{\ell}^{co}\right) - \left(\bar{Y}_{base}^{tr} - \bar{Y}_{base}^{co}\right), \qquad \bar{Y}_{base}^{g} = \sum_{t=1}^{T_{pre}} \hat{\lambda}_t\, \bar{Y}_t^{g}, \quad g \in \{tr, co\}
$$&lt;/p>
&lt;p>&lt;code>sdid_event&lt;/code> handles the full staggered panel directly, returning a cohort-aggregated ATT plus dynamic effects. To read the dynamics transparently we focus the &lt;em>plot&lt;/em> on the 2002 cohort — the package authors&amp;rsquo; own worked example — which gives a clean event-time axis; the full-panel call confirms the same aggregated ATT (≈ 8.06).&lt;/p>
&lt;pre>&lt;code class="language-stata">ssc install sdid_event, replace
* full staggered panel: aggregated ATT + cohort-aggregated dynamic effects
sdid_event womparl country year quota, vce(bootstrap) brep(100) effects(8) placebo(5) covariates(lngdp)
* clean event study on the 2002 cohort, with all placebos
keep if quotaYear==2002 | quotaYear==.
sdid_event womparl country year quota, vce(placebo) brep(100) placebo(all) covariates(lngdp)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> | Estimate SE LB CI UB CI Switchers
-------------+------------------------------------------------------
ATT | 6.853472 3.372744 .2428928 13.46405 2
Effect_1 | 4.086404 1.191517 1.75103 6.421778 2
Effect_2 | 9.164442 1.522799 6.179756 12.14913 2
Effect_3 | 7.938504 2.182572 3.660663 12.21635 2
... |
Placebo_1 | -.218417 .470226 -1.14006 .703227 2
Placebo_2 | .242148 .884557 -1.491584 1.975880 2
... |
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_event_study.png" alt="Event-study SDID for the 2002 cohort: flat placebos before adoption, rising effects after">&lt;/p>
&lt;p>This plot rewards careful reading, and there are three things to look for.&lt;/p>
&lt;p>&lt;strong>First, the baseline is $\lambda$-weighted, not &amp;ldquo;the year before.&amp;rdquo;&lt;/strong> Unlike a textbook event study that normalizes to $t=-1$, SDID measures everything against the optimally weighted pre-period average. That is why the zero line is a &lt;em>weighted&lt;/em> baseline; do not read it as the single pre-adoption year.&lt;/p>
&lt;p>&lt;strong>Second, the points to the &lt;em>left&lt;/em> of zero are placebo tests.&lt;/strong> Every pre-adoption coefficient (&lt;code>Placebo_1&lt;/code> through &lt;code>Placebo_12&lt;/code>, event times −1 to −12) sits within a whisker of zero, ranging only from about −0.2 to +0.8. Because the treated cohort and its synthetic control moved in parallel &lt;em>before&lt;/em> 2002, we cannot reject the parallel-(synthetic-)trends assumption. This is the identifying assumption made visible, and here it survives the test.&lt;/p>
&lt;p>&lt;strong>Third, the points to the &lt;em>right&lt;/em> of zero are the dynamic ATT.&lt;/strong> The effect appears immediately at adoption (&lt;code>Effect_1&lt;/code> = +4.1 points at event time 0), roughly doubles within a year or two (&lt;code>Effect_2&lt;/code> = +9.2), and then settles in the +6 to +9 range for over a decade. Quotas do not just shift the level once; they sustain a higher share of women in parliament. Aggregated by the same treated-period logic as before, these dynamic effects reproduce the cohort&amp;rsquo;s overall ATT of about +7 points — but the plot shows the &lt;em>shape&lt;/em> the single number conceals.&lt;/p>
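&lt;p>The intervals in the table can be reproduced by hand, which also gives a quick way to read a placebo row as a test. This sketch assumes the printed bounds are normal-approximation intervals (estimate ± 1.96 × SE), which is consistent with the values shown; the numbers are copied from the output above, and the scalar &lt;code>z&lt;/code> is introduced here for illustration.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: check the printed intervals and turn a placebo row into a z-test
display 4.086404 - 1.96*1.191517        // lower bound for Effect_1
display 4.086404 + 1.96*1.191517        // upper bound for Effect_1
scalar z = -.218417/.470226             // Placebo_1 estimate divided by its SE
display z
display 2*normal(-abs(z))               // two-sided p-value under normality
&lt;/code>&lt;/pre>
&lt;p>The first two lines return approximately 1.751 and 6.422, matching the table. The placebo z-statistic is about −0.46, with a two-sided p-value near 0.64, so that pre-period gap is far from distinguishable from zero.&lt;/p>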
&lt;h2 id="10-inference-bootstrap-jackknife-and-placebo">10. Inference: bootstrap, jackknife, and placebo&lt;/h2>
&lt;p>With one treated unit (California), the previous tutorial could only use placebo/permutation inference. With &lt;strong>nine&lt;/strong> treated units here, all three of &lt;code>sdid&lt;/code>&amp;rsquo;s variance estimators are on the table. To keep the comparison clean — jackknife needs more than one treated unit &lt;em>per adoption period&lt;/em> — we follow Clarke et al. (2024) and restrict to the two-country 2002 and 2003 cohorts by dropping the five single-country cohorts.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
Q1{&amp;quot;How many&amp;lt;br/&amp;gt;treated units?&amp;quot;}
Q1 --&amp;gt;|&amp;quot;One (e.g. California)&amp;quot;| PL1[&amp;quot;Placebo only&amp;lt;br/&amp;gt;jackknife undefined&amp;quot;]
Q1 --&amp;gt;|&amp;quot;Many (e.g. 9 quota adopters)&amp;quot;| Q2{&amp;quot;More controls than treated?&amp;lt;br/&amp;gt;no singleton cohorts?&amp;quot;}
Q2 --&amp;gt;|&amp;quot;Yes&amp;quot;| ALL[&amp;quot;All three available&amp;quot;]
Q2 --&amp;gt;|&amp;quot;Singleton cohorts&amp;quot;| PL2[&amp;quot;Placebo / bootstrap&amp;lt;br/&amp;gt;jackknife drops out&amp;quot;]
ALL --&amp;gt; BOOT[&amp;quot;bootstrap&amp;lt;br/&amp;gt;SE 4.7 (default)&amp;quot;]
ALL --&amp;gt; JACK[&amp;quot;jackknife&amp;lt;br/&amp;gt;SE 6.0 (most conservative)&amp;quot;]
ALL --&amp;gt; PLAC[&amp;quot;placebo&amp;lt;br/&amp;gt;SE 2.3 (homoskedastic)&amp;quot;]
style Q1 fill:#141413,stroke:#6a9bcc,color:#fff
style Q2 fill:#141413,stroke:#6a9bcc,color:#fff
style PL1 fill:#d97757,stroke:#141413,color:#fff
style PL2 fill:#d97757,stroke:#141413,color:#fff
style ALL fill:#00d4c8,stroke:#141413,color:#141413
style BOOT fill:#6a9bcc,stroke:#141413,color:#fff
style JACK fill:#6a9bcc,stroke:#141413,color:#fff
style PLAC fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-stata">drop if inlist(country,&amp;quot;Algeria&amp;quot;,&amp;quot;Kenya&amp;quot;,&amp;quot;Samoa&amp;quot;,&amp;quot;Swaziland&amp;quot;,&amp;quot;Tanzania&amp;quot;)
sdid womparl country year quota, vce(bootstrap) seed(1213)
sdid womparl country year quota, vce(placebo) seed(1213)
sdid womparl country year quota, vce(jackknife)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">method att se ci_l ci_u
bootstrap 10.33066 4.7291 1.0618 19.5995
placebo 10.33066 2.3404 5.7436 14.9178
jackknife 10.33066 6.0056 -1.4401 22.1014
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_inference.png" alt="Same ATT, three variance estimators">&lt;/p>
&lt;p>The point estimate is &lt;strong>identical&lt;/strong> across all three methods (10.33 points on this subsample) because the inference procedure changes only the &lt;em>standard error&lt;/em>, never the estimate. But the standard errors differ by a factor of about 2.6: &lt;strong>jackknife is the most conservative&lt;/strong> (SE 6.01, a confidence interval that crosses zero), &lt;strong>placebo is the tightest&lt;/strong> (SE 2.34) but rests on a homoskedasticity assumption and requires more controls than treated units, and &lt;strong>bootstrap sits in between&lt;/strong> (SE 4.73) and is the default. The practical takeaway: with only a handful of treated units, report the bootstrap as your headline but cross-check it; a result that is &amp;ldquo;significant&amp;rdquo; under placebo but not under jackknife deserves caution. (The subsample ATT of 10.3 is larger than the full-sample 8.0 because dropping the five single-country cohorts discards the negative 2005 and 2013 effects and leaves the 2003 cohort&amp;rsquo;s +13.95 with 26 of the 54 remaining treated country-years.)&lt;/p>
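&lt;p>The subsample estimate can be traced back to the cohort effects in &lt;code>e(tau)&lt;/code>. Assuming the 2002 and 2003 cohort estimates are unchanged by dropping the other cohorts (each is estimated separately against the never-treated pool) and that each cohort contains two countries, the aggregation formula gives:&lt;/p>
&lt;p>$$
\widehat{ATT}_{sub} = \frac{2 \cdot 14 \cdot 6.9677 + 2 \cdot 13 \cdot 13.9523}{2 \cdot 14 + 2 \cdot 13} = \frac{557.86}{54} \approx 10.33
$$&lt;/p>
&lt;p>This matches the 10.33 reported under all three variance estimators.&lt;/p>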
&lt;h2 id="11-robustness-and-discussion">11. Robustness and discussion&lt;/h2>
&lt;p>Three caveats keep the result honest. &lt;strong>Effect concentration:&lt;/strong> the +8 aggregate leans heavily on a few cohorts: the 2012 cohort has by far the largest effect (+21.8), and the early 2000/2002/2003 cohorts carry most of the aggregation weight. Dropping the 2012 cohort and reweighting the remaining six estimates in &lt;code>e(tau)&lt;/code> lowers the aggregate from 8.0 to about 7.4. &lt;strong>Fragile counterfactuals:&lt;/strong> with as few as one or two treated countries per cohort, some cohort estimates are imprecise even with 110 controls; the 2003 cohort&amp;rsquo;s standard error of 9.13 is the tell. &lt;strong>Identifying assumptions:&lt;/strong> SDID still requires no anticipation, an absorbing treatment, no cross-country spillovers, and that quota timing is not itself a response to the outcome&amp;rsquo;s trajectory; the flat event-study placebos support, but cannot prove, the parallel-trends part. Finally, &lt;code>quota_example&lt;/code> is a teaching subset of Bhalotra et al. (2023); these numbers illustrate the &lt;em>method&lt;/em>, not a final verdict on quota policy.&lt;/p>
&lt;h2 id="12-summary-and-key-takeaways">12. Summary and key takeaways&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Method.&lt;/strong> Staggered SDID estimates a &lt;em>separate, clean&lt;/em> synthetic difference-in-differences for each adoption cohort — comparing it only to never-treated controls — and aggregates the cohort effects $\hat{\tau}_a$ with non-negative, treated-period-share weights. This avoids the negative-weighting trap that contaminates naive two-way fixed-effects DiD under staggered timing.&lt;/li>
&lt;li>&lt;strong>Result.&lt;/strong> Gender quotas raise the share of women in parliament by an overall &lt;strong>ATT of +8.0 percentage points&lt;/strong> (SE 3.74, $p=0.032$), robust to a log-GDP control (8.05 optimized, 8.06 projected). Cohort effects range widely, from &lt;strong>−3.5 to +21.8 points&lt;/strong> — heterogeneity the single number hides.&lt;/li>
&lt;li>&lt;strong>Event study.&lt;/strong> The &lt;code>sdid_event&lt;/code> plot shows pre-adoption placebo coefficients near zero (parallel synthetic trends) and post-adoption effects that appear immediately and persist for over a decade — the dynamics behind the average.&lt;/li>
&lt;li>&lt;strong>Inference.&lt;/strong> Bootstrap, jackknife, and placebo are all available once the single-country cohorts are dropped (jackknife needs more than one treated unit per adoption period). On that two-cohort illustration they share one point estimate (10.3) but report standard errors of 4.7, 6.0, and 2.3. Jackknife is the most conservative.&lt;/li>
&lt;li>&lt;strong>Bridge.&lt;/strong> The block design (Proposition 99, the &lt;a href="https://carlos-mendez.org/post/stata_sdid/">previous tutorial&lt;/a>) and the staggered design here are two faces of one estimator — the staggered version is just single-cohort SDID, done once per cohort and averaged.&lt;/li>
&lt;/ul>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Re-aggregate by hand.&lt;/strong> Pull &lt;code>e(tau)&lt;/code> and each cohort&amp;rsquo;s treated-unit count and post-period length. Verify that the treated-period-weighted average of the seven $\hat{\tau}_a$ reproduces the overall ATT of 8.03, and show that it differs from the unweighted mean (≈ 7.1). Which cohorts move the aggregate the most?&lt;/li>
&lt;li>&lt;strong>Inference sensitivity.&lt;/strong> Re-run the full nine-country sample with &lt;code>vce(bootstrap)&lt;/code> and then &lt;code>vce(placebo)&lt;/code> at &lt;code>reps(500)&lt;/code>. How much do the standard error and confidence interval move, and which would you report given only nine treated units?&lt;/li>
&lt;li>&lt;strong>Drop the outlier cohort.&lt;/strong> Re-estimate the overall ATT excluding the 2012 cohort (the +21.8 outlier). How far does the aggregate fall, and what does that tell you about how concentrated the average effect is?&lt;/li>
&lt;/ol>
&lt;h2 id="14-references">14. References&lt;/h2>
&lt;ol>
&lt;li>Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., &amp;amp; Wager, S. (2021). &lt;a href="https://doi.org/10.1257/aer.20190159" target="_blank" rel="noopener">Synthetic Difference-in-Differences&lt;/a>. &lt;em>American Economic Review&lt;/em>, 111(12), 4088–4118.&lt;/li>
&lt;li>Clarke, D., Pailañir, D., Athey, S., &amp;amp; Imbens, G. (2024). &lt;a href="https://doi.org/10.1177/1536867X241297184" target="_blank" rel="noopener">On Synthetic Difference-in-Differences and Related Estimation Methods in Stata&lt;/a>. &lt;em>The Stata Journal&lt;/em>, 24(4). Package: &lt;code>ssc install sdid&lt;/code>.&lt;/li>
&lt;li>Ciccia, D. (2024). &lt;a href="https://arxiv.org/abs/2407.09565" target="_blank" rel="noopener">A Short Note on Event-Study Synthetic Difference-in-Differences Estimators&lt;/a>. Package: &lt;code>ssc install sdid_event&lt;/code>.&lt;/li>
&lt;li>Bhalotra, S., Clarke, D., Gomes, J. F., &amp;amp; Venkataramani, A. (2023). &lt;a href="https://doi.org/10.1093/jeea/jvad043" target="_blank" rel="noopener">Maternal Mortality and Women&amp;rsquo;s Political Power&lt;/a>. &lt;em>Journal of the European Economic Association&lt;/em>. (Source of the &lt;code>quota_example&lt;/code> data.)&lt;/li>
&lt;li>Goodman-Bacon, A. (2021). &lt;a href="https://doi.org/10.1016/j.jeconom.2021.03.014" target="_blank" rel="noopener">Difference-in-Differences with Variation in Treatment Timing&lt;/a>. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 254–277.&lt;/li>
&lt;li>de Chaisemartin, C., &amp;amp; D&amp;rsquo;Haultfœuille, X. (2020). &lt;a href="https://doi.org/10.1257/aer.20181169" target="_blank" rel="noopener">Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects&lt;/a>. &lt;em>American Economic Review&lt;/em>, 110(9), 2964–2996.&lt;/li>
&lt;li>Xu, Y., &amp;amp; Hua, L. &lt;a href="https://yiqingxu.org/packages/panelview_stata/" target="_blank" rel="noopener">panelView: Visualizing Panel Data&lt;/a>. Package: &lt;code>ssc install panelview&lt;/code>.&lt;/li>
&lt;/ol>
&lt;p>&lt;em>Related tutorials on this site:&lt;/em> &lt;a href="https://carlos-mendez.org/post/stata_sdid/">Synthetic Difference-in-Differences (the block design)&lt;/a> · &lt;a href="https://carlos-mendez.org/post/stata_did/">Difference-in-Differences&lt;/a>.&lt;/p>
&lt;h2 id="15-acknowledgments">15. Acknowledgments&lt;/h2>
&lt;p>This tutorial uses the &lt;code>sdid&lt;/code> command (Clarke, Pailañir, Athey &amp;amp; Imbens), the &lt;code>sdid_event&lt;/code> command (Ciccia, Clarke &amp;amp; Pailañir), and &lt;code>panelview&lt;/code> (Xu &amp;amp; Hua). The data, &lt;code>quota_example&lt;/code>, is distributed with &lt;code>sdid&lt;/code> and draws on Bhalotra, Clarke, Gomes &amp;amp; Venkataramani (2023). All estimates were produced by the companion &lt;code>analysis.do&lt;/code> and verified against Clarke et al. (2024). AI tools (Claude Code) assisted with drafting and figure preparation; all code was executed and every number checked by the author.&lt;/p>
</description></item><item><title>Synthetic Difference-in-Differences (SDID) in Stata: Re-evaluating California's Proposition 99</title><link>https://carlos-mendez.org/post/stata_sdid/</link><pubDate>Sun, 07 Jun 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_sdid/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Comparative case studies—where a single large unit adopts a policy and the analyst must recover its causal effect without ever observing the untreated counterfactual—are a recurring challenge in policy evaluation, and the canonical example is California&amp;rsquo;s Proposition 99, the 1988 ballot measure that raised the cigarette excise tax by 25 cents a pack and funded an anti-smoking campaign. This tutorial introduces and derives synthetic difference-in-differences (SDID) and applies it to re-evaluate Proposition 99, contrasting it with classic difference-in-differences (DiD) and synthetic control (SC). The data are the canonical strongly balanced panel distributed with the &lt;code>sdid&lt;/code> package (originally from Abadie, Diamond, and Hainmueller 2010): 39 US states observed annually from 1970 to 2000—1,209 observations, of which only 12 are treated—with annual cigarette sales in packs per capita as the sole outcome and California as the single treated unit from 1989. The methods estimate the average treatment effect on the treated by writing DiD, SC, and SDID as one weighted two-way fixed-effects regression in Stata, using the &lt;code>sdid&lt;/code> command (Clarke et al. 2024) and cross-checking SC against &lt;code>synth2&lt;/code>. All three estimators agree the policy reduced smoking but disagree on magnitude: the 2×2 DiD gives −27.35 packs per capita, synthetic control −19.48 (pre-period RMSE 1.66, R² 0.98, leaning on Utah, Montana, and Nevada), and SDID −15.60—roughly a 20% reduction—with SDID&amp;rsquo;s time weights concentrated entirely on 1986–1988. With one treated unit, placebo inference is the only valid procedure: the placebo standard error is 9.88 (95% CI [−35.0, 3.8], including zero) while the permutation test ranks California&amp;rsquo;s effect extreme (p = 0.026). 
The implication is that a single &lt;code>sdid&lt;/code> command unifies all three estimators, and SDID is the preferred single number because, by allowing a constant level gap and up-weighting the informative late-1980s years, it relies least on the exact parallel-trends assumption the others lean on hardest.&lt;/p>
&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>In November 1988 California voters passed &lt;strong>Proposition 99&lt;/strong>, which raised the cigarette excise tax by 25 cents a pack and funded a large anti-smoking campaign. Did it actually reduce smoking? This is the textbook question of &lt;strong>comparative case study&lt;/strong> research: a single, large unit (California) adopts a policy, and we want the causal effect even though we can never observe the California that &lt;em>did not&lt;/em> pass Proposition 99.&lt;/p>
&lt;p>This tutorial builds up to &lt;strong>synthetic difference-in-differences (SDID)&lt;/strong>, the estimator of Arkhangelsky, Athey, Hsiao, Imbens, and Wager (2021), and applies it with the &lt;code>sdid&lt;/code> command of Clarke, Pailañir, Athey, and Imbens (2024). SDID is best understood as the marriage of two older ideas:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Difference-in-differences (DiD)&lt;/strong> — compare California&amp;rsquo;s before/after change to the before/after change of &lt;em>all&lt;/em> control states.&lt;/li>
&lt;li>&lt;strong>Synthetic control (SC)&lt;/strong> — build a &amp;ldquo;synthetic California&amp;rdquo; as a weighted average of control states that tracks California before the policy.&lt;/li>
&lt;/ul>
&lt;p>SDID keeps the best of both: like SC it chooses &lt;strong>unit weights&lt;/strong> so the comparison group resembles California, and like DiD it allows a &lt;strong>constant level gap&lt;/strong> between California and its comparison group (a unit fixed effect). It then adds one more ingredient SC lacks — &lt;strong>time weights&lt;/strong> that emphasize the pre-policy years most predictive of the post-policy period.&lt;/p>
&lt;p>A second theme runs through the whole tutorial, and it is worth stating up front. As Clarke et al. (2024) put it, &lt;em>along with SDID, the &lt;code>sdid&lt;/code> command implements standard synthetic control and difference-in-differences in an &lt;strong>identical framework&lt;/strong>, allowing estimation, inference, and graphical output in a computationally efficient way.&lt;/em> We will show this concretely: the &lt;strong>same command&lt;/strong>, changing only one option, reproduces the raw difference-in-differences and the classic synthetic control — and we cross-check the latter against the dedicated &lt;code>synth2&lt;/code> command.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;p>By the end you will be able to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Derive&lt;/strong> the SDID estimator as a weighted two-way fixed-effects regression and read its unit-weight and time-weight optimization problems.&lt;/li>
&lt;li>&lt;strong>Distinguish&lt;/strong> SDID from the original DiD and SC — conceptually (which weights, which fixed effects) and quantitatively (on the same data).&lt;/li>
&lt;li>&lt;strong>Estimate&lt;/strong> the effect of Proposition 99 with &lt;code>sdid&lt;/code>, and reproduce DiD and SC from the very same command.&lt;/li>
&lt;li>&lt;strong>Compare&lt;/strong> the SDID synthetic against a classical synthetic control fit with &lt;code>synth2&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Conduct&lt;/strong> valid inference when there is a single treated unit, using placebo (permutation) methods — and recognize when other procedures (bootstrap, jackknife) do and do not apply.&lt;/li>
&lt;/ul>
&lt;h3 id="what-we-are-estimating">What we are estimating&lt;/h3>
&lt;p>Throughout, the estimand is the &lt;strong>average treatment effect on the treated (ATT)&lt;/strong> — the effect of Proposition 99 &lt;em>on California&lt;/em>, over the post-1988 period:&lt;/p>
&lt;p>$$
\tau = \frac{1}{N_{tr}\, T_{post}} \sum_{i:\, W_i = 1}\ \sum_{t &amp;gt; T_{pre}} \left[\, Y_{it}(1) - Y_{it}(0) \,\right]
$$&lt;/p>
&lt;p>In words: average, over treated units and post-treatment years, the difference between the outcome with the policy, $Y_{it}(1)$, and the outcome that &lt;em>would have occurred&lt;/em> without it, $Y_{it}(0)$. Here there is exactly one treated unit ($N_{tr} = 1$, California), and $Y_{it}(0)$ is never observed after 1988 — every method in this tutorial is a different way of &lt;strong>imputing that missing counterfactual&lt;/strong>. Because California was &lt;em>not&lt;/em> randomly assigned to treatment, this is an &lt;strong>observational&lt;/strong> design: identification rests on assumptions (a stable comparison group, no large contemporaneous shocks unique to California) rather than on randomization.&lt;/p>
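&lt;p>With a single treated unit and twelve post-treatment years, the general formula collapses to a simple time average. As a worked specialization (the subscript $CA$ for California is notation introduced here for illustration):&lt;/p>
&lt;p>$$
\tau = \frac{1}{12} \sum_{t=1989}^{2000} \left[\, Y_{CA,t}(1) - Y_{CA,t}(0) \,\right]
$$&lt;/p>
&lt;p>The first term is observed in every post-policy year; only the second has to be imputed.&lt;/p>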
&lt;h3 id="key-concepts-at-a-glance">Key concepts at a glance&lt;/h3>
&lt;details>
&lt;summary>&lt;b>Counterfactual&lt;/b> — what California's smoking would have been without Proposition 99.&lt;/summary>
&lt;p>Every estimator here is a recipe for the dashed line &amp;ldquo;California if the policy had never passed.&amp;rdquo; DiD, SC, and SDID disagree only about how to build it.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>Unit weights (ω)&lt;/b> — how much each control state counts toward the synthetic California.&lt;/summary>
&lt;p>DiD gives every control the same weight ($1/N_{co}$). SC and SDID instead pick weights so the weighted controls reproduce California&amp;rsquo;s pre-policy outcome path. SC concentrates weight on a handful of states; SDID spreads it more widely.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>Time weights (λ)&lt;/b> — how much each pre-policy year counts.&lt;/summary>
&lt;p>This is SDID&amp;rsquo;s signature. Rather than treat every pre-1989 year equally, SDID up-weights the pre-period years that best predict the post-period — here, 1986–1988. SC and DiD have no time weights.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>Unit fixed effects (α)&lt;/b> — a constant level gap between California and its synthetic comparison.&lt;/summary>
&lt;p>DiD and SDID include them, so the comparison group only needs to move &lt;em>in parallel&lt;/em> with California, not sit at the same level. Classic SC omits them and instead tries to match California&amp;rsquo;s level outright.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>Placebo inference&lt;/b> — how we get a standard error with only one treated unit.&lt;/summary>
&lt;p>We pretend, one at a time, that a control state was &amp;ldquo;treated,&amp;rdquo; re-estimate the effect, and build the distribution of these placebo effects. If California&amp;rsquo;s real effect is extreme relative to that distribution, it is unlikely to be noise.&lt;/p>
&lt;/details>
&lt;hr>
&lt;h2 id="2-the-proposition-99-case-study">2. The Proposition 99 case study&lt;/h2>
&lt;p>We use the canonical dataset distributed with the &lt;code>sdid&lt;/code> package (originally from Abadie, Diamond, and Hainmueller 2010, and used by Arkhangelsky et al. 2021). It is a &lt;strong>strongly balanced panel&lt;/strong>: 39 US states observed annually from 1970 to 2000, with one outcome — annual cigarette sales in &lt;strong>packs per capita&lt;/strong>. California is the single treated unit; the policy bites from &lt;strong>1989&lt;/strong> onward. The remaining 38 states (which did not pass comparable large-scale tobacco programs in this window) form the &lt;strong>donor pool&lt;/strong>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Role&lt;/th>
&lt;th>Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>state&lt;/code>&lt;/td>
&lt;td>unit id&lt;/td>
&lt;td>39 US states (California + 38 controls)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>year&lt;/code>&lt;/td>
&lt;td>time id&lt;/td>
&lt;td>1970–2000 (19 pre-, 12 post-treatment years)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>packspercapita&lt;/code>&lt;/td>
&lt;td>outcome $Y_{it}$&lt;/td>
&lt;td>annual cigarette pack sales per capita&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>treated&lt;/code>&lt;/td>
&lt;td>treatment $W_{it}$&lt;/td>
&lt;td>1 for California in 1989–2000, else 0&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>One feature matters for a fair comparison: this panel contains &lt;strong>only the outcome&lt;/strong> — no income, price, or demographic covariates. That is deliberate here. It means synthetic control and SDID see &lt;em>exactly the same information&lt;/em> (California&amp;rsquo;s and the donors&amp;rsquo; pre-period smoking paths), so any difference in their answers comes from the &lt;strong>estimator&lt;/strong>, not from a different set of predictors.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
POOL[&amp;quot;&amp;lt;b&amp;gt;Donor pool&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;38 control states&amp;lt;br/&amp;gt;Utah, Nevada, Montana, …&amp;quot;]
CA[&amp;quot;&amp;lt;b&amp;gt;California&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;treated 1989&amp;quot;]
SYN[&amp;quot;&amp;lt;b&amp;gt;Synthetic California&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;counterfactual Y(0)&amp;quot;]
POOL --&amp;gt;|weighted average ω| SYN
CA --&amp;gt;|compare after 1989| SYN
style POOL fill:#6a9bcc,stroke:#141413,color:#fff
style CA fill:#d97757,stroke:#141413,color:#fff
style SYN fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>Let us first look at the data with no model at all — California against the simple average of the 38 control states.&lt;/p>
&lt;pre>&lt;code class="language-stata">use prop99_example.dta, clear
describe
encode state, gen(id)
xtset id year
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Contains data from prop99_example.dta
Observations: 1,209
Variables: 4
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
state str14 %14s State
year int %8.0g Year
packspercapita float %9.0g PacksPerCapita
treated byte %8.0g
-------------------------------------------------------------------------------
Panel variable: id (strongly balanced)
Time variable: year, 1970 to 2000
Delta: 1 unit
&lt;/code>&lt;/pre>
&lt;p>The panel is strongly balanced (no gaps), which every method below requires. The figure compares California to the raw control average.&lt;/p>
&lt;p>&lt;img src="stata_sdid_raw_trends.png" alt="California&amp;rsquo;s cigarette sales fall faster than the average of the 38 control states after 1989, but the two series were already on different levels and slopes before the policy, which is exactly why a naive comparison is not enough.">&lt;/p>
&lt;p>California (orange) already smoked &lt;strong>less&lt;/strong> than the average control state and was declining throughout the 1980s. After 1989 the gap widens visibly. But two problems jump out: California sits on a &lt;strong>different level&lt;/strong> than the average donor, and it was already on a &lt;strong>different trend&lt;/strong> before 1989. A credible estimate must deal with both — the job of the three estimators below.&lt;/p>
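&lt;p>A minimal sketch of how such a raw comparison could be drawn, assuming the dataset loaded above. The helper variable &lt;code>is_ca&lt;/code> and the plot styling are illustrative choices, not necessarily those behind the figure.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: California vs. the unweighted mean of the 38 control states
preserve
gen byte is_ca = state == &amp;quot;California&amp;quot;   // hypothetical helper variable
collapse (mean) packspercapita, by(is_ca year)
twoway (line packspercapita year if is_ca == 1) ///
       (line packspercapita year if is_ca == 0), ///
       xline(1988.5) ytitle(&amp;quot;Packs per capita&amp;quot;) ///
       legend(order(1 &amp;quot;California&amp;quot; 2 &amp;quot;Control average&amp;quot;))
restore
&lt;/code>&lt;/pre>
&lt;p>Wrapping the steps in &lt;code>preserve&lt;/code> and &lt;code>restore&lt;/code> leaves the original panel untouched for the estimators that follow.&lt;/p>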
&lt;hr>
&lt;h2 id="3-three-estimators-one-equation">3. Three estimators, one equation&lt;/h2>
&lt;p>The cleanest way to see how DiD, SC, and SDID relate is to write them all as the &lt;strong>same&lt;/strong> weighted two-way fixed-effects (TWFE) regression and change only the weights. This is the unifying view of Arkhangelsky et al. (2021).&lt;/p>
&lt;h3 id="synthetic-difference-in-differences">Synthetic difference-in-differences&lt;/h3>
&lt;p>SDID solves a weighted TWFE regression:&lt;/p>
&lt;p>$$
\left(\hat{\tau}^{sdid}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right) = \underset{\tau,\mu,\alpha,\beta}{\arg\min} \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau\right)^{2} \hat{\omega}_i^{sdid}\ \hat{\lambda}_t^{sdid}
$$&lt;/p>
&lt;p>Reading the symbols against the Stata variables: $Y_{it}$ is &lt;code>packspercapita&lt;/code>; $W_{it}$ is &lt;code>treated&lt;/code>; $\alpha_i$ is a state fixed effect (one per &lt;code>state&lt;/code>); $\beta_t$ is a year fixed effect (one per &lt;code>year&lt;/code>); and $\tau$ is the ATT we want. The two extra terms are the difference from ordinary regression: $\hat{\omega}_i^{sdid}$ is a &lt;strong>unit weight&lt;/strong> (how much state $i$ counts) and $\hat{\lambda}_t^{sdid}$ is a &lt;strong>time weight&lt;/strong> (how much year $t$ counts). Set those weights to special values and you recover the older estimators.&lt;/p>
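&lt;p>In a block design like this one, where all treated units adopt in the same year, the solution for $\tau$ can also be written as a weighted double difference, as shown in Arkhangelsky et al. (2021). The shorthand $\hat{\delta}_i$ is introduced here for readability:&lt;/p>
&lt;p>$$
\hat{\delta}_i = \frac{1}{T_{post}} \sum_{t=T_{pre}+1}^{T} Y_{it} - \sum_{t=1}^{T_{pre}} \hat{\lambda}_t^{sdid}\, Y_{it}, \qquad \hat{\tau}^{sdid} = \frac{1}{N_{tr}} \sum_{i=N_{co}+1}^{N} \hat{\delta}_i - \sum_{i=1}^{N_{co}} \hat{\omega}_i^{sdid}\, \hat{\delta}_i
$$&lt;/p>
&lt;p>Each $\hat{\delta}_i$ is a unit&amp;rsquo;s post-period average minus its time-weighted pre-period average, and the estimate compares the treated change with the unit-weighted control change. With uniform $\omega$ and $\lambda$ this is the familiar difference of before/after changes.&lt;/p>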
&lt;h3 id="the-original-difference-in-differences">The original difference-in-differences&lt;/h3>
&lt;p>DiD is the &lt;strong>special case with no weighting&lt;/strong> — every unit and every year counts equally:&lt;/p>
&lt;p>$$
\left(\hat{\tau}^{did}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right) = \underset{\tau,\mu,\alpha,\beta}{\arg\min} \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau\right)^{2}
$$&lt;/p>
&lt;p>This is just two-way fixed-effects regression. Its credibility hinges entirely on &lt;strong>parallel trends&lt;/strong>: the assumption that, absent the policy, California would have moved in lockstep with the &lt;em>average&lt;/em> control state. The raw-trends figure already makes that assumption look shaky.&lt;/p>
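&lt;p>As a sketch, this unweighted regression can be run with explicit state and year fixed effects, assuming the panel has been &lt;code>xtset&lt;/code> on the numeric &lt;code>id&lt;/code> created with &lt;code>encode&lt;/code> above. In a balanced panel where every treated unit adopts in the same year, the coefficient on &lt;code>treated&lt;/code> is algebraically the 2×2 difference of group means.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Unweighted two-way fixed effects: state FE via the fe option, year FE via i.year
xtreg packspercapita treated i.year, fe
* The coefficient on the treatment dummy is the DiD estimate of tau
display _b[treated]
&lt;/code>&lt;/pre>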
&lt;h3 id="the-original-synthetic-control">The original synthetic control&lt;/h3>
&lt;p>SC keeps &lt;strong>unit weights&lt;/strong> but drops the &lt;strong>time weights&lt;/strong> &lt;em>and&lt;/em> the &lt;strong>unit fixed effects&lt;/strong> $\alpha_i$:&lt;/p>
&lt;p>$$
\left(\hat{\tau}^{sc}, \hat{\mu}, \hat{\beta}\right) = \underset{\tau,\mu,\beta}{\arg\min} \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \beta_t - W_{it}\,\tau\right)^{2} \hat{\omega}_i^{sc}
$$&lt;/p>
&lt;p>Without $\alpha_i$, SC cannot absorb a level gap, so it must build a synthetic California that matches California&amp;rsquo;s pre-period outcomes in &lt;strong>both level and trend&lt;/strong>. That is a demanding requirement — and the reason SC sometimes cannot find a good fit.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
OBJ[&amp;quot;&amp;lt;b&amp;gt;One weighted two-way&amp;lt;br/&amp;gt;fixed-effects regression&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;min Σ (Y − μ − α − β − Wτ)² · ω · λ&amp;lt;/i&amp;gt;&amp;quot;]
OBJ --&amp;gt; DID[&amp;quot;&amp;lt;b&amp;gt;DiD&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ω uniform, λ uniform&amp;lt;br/&amp;gt;α included&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;parallel trends on all controls&amp;lt;/i&amp;gt;&amp;quot;]
OBJ --&amp;gt; SC[&amp;quot;&amp;lt;b&amp;gt;Synthetic control&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ω optimized, no λ&amp;lt;br/&amp;gt;&amp;lt;b&amp;gt;no&amp;lt;/b&amp;gt; unit FE α&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;match level AND trend&amp;lt;/i&amp;gt;&amp;quot;]
OBJ --&amp;gt; SDID[&amp;quot;&amp;lt;b&amp;gt;SDID&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ω optimized + λ optimized&amp;lt;br/&amp;gt;α included&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;match trend, allow level gap&amp;lt;/i&amp;gt;&amp;quot;]
style OBJ fill:#141413,stroke:#6a9bcc,color:#fff
style DID fill:#d97757,stroke:#141413,color:#fff
style SC fill:#6a9bcc,stroke:#141413,color:#fff
style SDID fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;h3 id="how-the-weights-are-chosen">How the weights are chosen&lt;/h3>
&lt;p>The &lt;strong>unit weights&lt;/strong> make the weighted controls track California&amp;rsquo;s pre-period path, with a small ridge penalty for stability:&lt;/p>
&lt;p>$$
\hat{\omega}^{sdid} = \underset{\omega \in \Omega}{\arg\min} \sum_{t=1}^{T_{pre}} \left(\omega_0 + \sum_{i=1}^{N_{co}} \omega_i\, Y_{it} - \frac{1}{N_{tr}} \sum_{i=N_{co}+1}^{N} Y_{it}\right)^{2} + \zeta^{2}\, T_{pre}\, \lVert \omega \rVert_2^{2}
$$&lt;/p>
&lt;p>In words: choose nonnegative weights summing to one (the set $\Omega$) so the weighted control outcome, plus an intercept $\omega_0$, comes as close as possible to the treated outcome &lt;strong>in every pre-treatment year&lt;/strong>. The intercept $\omega_0$ is what lets SDID match California&amp;rsquo;s &lt;em>trend&lt;/em> without matching its &lt;em>level&lt;/em>. The penalty $\zeta^{2} T_{pre} \lVert \omega \rVert_2^2$ discourages putting all weight on one or two donors; Arkhangelsky et al. set $\zeta = (N_{tr} T_{post})^{1/4}\, \hat{\sigma}$, with $\hat{\sigma}$ the standard deviation of first-differenced control outcomes.&lt;/p>
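&lt;p>For this application $N_{tr} = 1$ and $T_{post} = 12$, so $\zeta = 12^{1/4}\,\hat{\sigma} \approx 1.86\,\hat{\sigma}$. A rough sketch of that calculation follows, assuming the panel has been &lt;code>xtset&lt;/code> as above. The variable &lt;code>d_packs&lt;/code> and the scalar names are illustrative, and &lt;code>summarize&lt;/code> uses an $n-1$ denominator, so the result may differ slightly from the value &lt;code>sdid&lt;/code> computes internally.&lt;/p>
&lt;pre>&lt;code class="language-stata">* First differences of control-state outcomes over the pre-period
gen double d_packs = D.packspercapita if state != &amp;quot;California&amp;quot; &amp;amp; year &amp;lt;= 1988
quietly summarize d_packs
scalar sigma_hat = r(sd)
* zeta = (N_tr * T_post)^(1/4) * sigma_hat, with N_tr = 1 and T_post = 12
scalar zeta = (1 * 12)^(1/4) * sigma_hat
display &amp;quot;sigma_hat = &amp;quot; sigma_hat &amp;quot;   zeta = &amp;quot; zeta
drop d_packs
&lt;/code>&lt;/pre>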
&lt;p>The &lt;strong>time weights&lt;/strong> are the mirror image — they find pre-period years whose weighted average lines up with the post-period:&lt;/p>
&lt;p>$$
\hat{\lambda}^{sdid} = \underset{\lambda \in \Lambda}{\arg\min} \sum_{i=1}^{N_{co}} \left(\lambda_0 + \sum_{t=1}^{T_{pre}} \lambda_t\, Y_{it} - \frac{1}{T_{post}} \sum_{t=T_{pre}+1}^{T} Y_{it}\right)^{2} + \zeta_{\lambda}^{2}\, N_{co}\, \lVert \lambda \rVert_2^{2}
$$&lt;/p>
&lt;p>This says: find pre-period year weights so the weighted pre-period control outcome matches each control&amp;rsquo;s &lt;em>post-period average&lt;/em>. Years that look most like the post-period get the most weight. We will see SDID place essentially all pre-period weight on &lt;strong>1986–1988&lt;/strong>.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th style="text-align:center">Unit weights ω&lt;/th>
&lt;th style="text-align:center">Time weights λ&lt;/th>
&lt;th style="text-align:center">Unit FE α&lt;/th>
&lt;th>Must match&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>DiD&lt;/strong>&lt;/td>
&lt;td style="text-align:center">uniform&lt;/td>
&lt;td style="text-align:center">uniform&lt;/td>
&lt;td style="text-align:center">yes&lt;/td>
&lt;td>parallel trends vs. all controls&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>SC&lt;/strong>&lt;/td>
&lt;td style="text-align:center">optimized&lt;/td>
&lt;td style="text-align:center">none&lt;/td>
&lt;td style="text-align:center">&lt;strong>no&lt;/strong>&lt;/td>
&lt;td>California&amp;rsquo;s level &lt;em>and&lt;/em> trend&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>SDID&lt;/strong>&lt;/td>
&lt;td style="text-align:center">optimized&lt;/td>
&lt;td style="text-align:center">optimized&lt;/td>
&lt;td style="text-align:center">yes&lt;/td>
&lt;td>California&amp;rsquo;s trend (level gap allowed)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="4-loading-the-data">4. Loading the data&lt;/h2>
&lt;p>We already loaded and &lt;code>xtset&lt;/code> the panel above. The &lt;code>sdid&lt;/code> command takes the data in &lt;strong>long form&lt;/strong> and needs four arguments — outcome, unit, time, and a 0/1 treatment indicator — so no further reshaping is required. The &lt;code>synth2&lt;/code> command additionally needs a numeric panel id and &lt;code>xtset&lt;/code>, which we created with &lt;code>encode&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">summarize packspercapita
tab treated
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
packsperca~a | 1,209 122.6493 35.04942 40.7 296.2
treated | Freq. Percent Cum.
------------+-----------------------------------
0 | 1,197 99.01 99.01
1 | 12 0.99 100.00
------------+-----------------------------------
&lt;/code>&lt;/pre>
&lt;p>Only &lt;strong>12&lt;/strong> of 1,209 observations are treated — California in its 12 post-1988 years. This extreme imbalance (one treated unit) is the defining feature of a comparative case study and, as we will see in Section 9, dictates how inference must be done.&lt;/p>
&lt;hr>
&lt;h2 id="5-a-first-look-the-original-difference-in-differences">5. A first look: the original difference-in-differences&lt;/h2>
&lt;p>The simplest credible estimate is a &lt;strong>2×2 difference-in-differences&lt;/strong>: compare California&amp;rsquo;s change from before to after 1989 with the control states&amp;rsquo; change over the same window. The &amp;ldquo;difference in differences&amp;rdquo; removes anything common to all states (the nationwide decline in smoking) and anything fixed about California (its lower baseline level).&lt;/p>
&lt;pre>&lt;code class="language-stata">gen byte cal = state==&amp;quot;California&amp;quot;
gen byte post = year&amp;gt;=1989
reg packspercapita i.cal##i.post
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">------------------------------------------------------------------------------
packsperca~a | Coefficient Std. err. t P&amp;gt;|t| [95% conf. interval]
-------------+----------------------------------------------------------------
1.cal | -14.359 6.788699 -2.12 0.035 -27.67799 -1.040019
1.post | -28.51142 1.747208 -16.32 0.000 -31.93932 -25.08351
|
cal#post |
1 1 | -27.34911 10.91131 -2.51 0.012 -48.75638 -5.941839
|
_cons | 130.5695 1.087062 120.11 0.000 128.4368 132.7023
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The interaction &lt;code>cal#post&lt;/code> &lt;strong>= −27.35&lt;/strong> is the DiD estimate: relative to the control states, California&amp;rsquo;s smoking fell by about &lt;strong>27 packs per capita&lt;/strong> after Proposition 99. The four group means follow from the coefficients: control states averaged 130.57 packs before (&lt;code>_cons&lt;/code>) and $130.57 - 28.51 = 102.06$ after (a drop of 28.51), while California went from $130.57 - 14.36 = 116.21$ to $116.21 - 28.51 - 27.35 = 60.35$ (a drop of 55.86). The difference of those drops, $-55.86 - (-28.51) = -27.35$, is the DiD.&lt;/p>
&lt;p>But this number trusts the &lt;strong>parallel-trends&lt;/strong> assumption against the &lt;em>simple average&lt;/em> of 38 very different states — and the raw-trends figure showed California was already drifting away from that average before 1989. If California was on a steeper downward path for reasons unrelated to the policy, DiD will overstate the effect. This is the weakness synthetic methods are designed to fix.&lt;/p>
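&lt;p>The four-means arithmetic behind the DiD can also be checked directly from the data rather than from the regression coefficients. A small sketch, assuming &lt;code>cal&lt;/code> and &lt;code>post&lt;/code> were generated as above; the scalar names are illustrative:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Mean outcome in each of the four cal x post cells, stored as m_[cal][post]
foreach c in 0 1 {
    foreach p in 0 1 {
        quietly summarize packspercapita if cal == `c' &amp;amp; post == `p'
        scalar m_`c'`p' = r(mean)
    }
}
* (California after - before) minus (controls after - before)
display (m_11 - m_10) - (m_01 - m_00)
&lt;/code>&lt;/pre>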
&lt;hr>
&lt;h2 id="6-the-original-synthetic-control-with-synth2">6. The original synthetic control with &lt;code>synth2&lt;/code>&lt;/h2>
&lt;p>Synthetic control replaces the &lt;em>simple&lt;/em> average of controls with a &lt;em>weighted&lt;/em> average chosen to track California before 1989. We fit it with &lt;strong>&lt;code>synth2&lt;/code>&lt;/strong> (Yan and Chen 2023), a modern wrapper around Abadie&amp;rsquo;s &lt;code>synth&lt;/code> that adds placebo tests and visualization. Because our panel has only the outcome, we match on the &lt;strong>full pre-period path&lt;/strong> — each pre-1989 year of &lt;code>packspercapita&lt;/code> enters as its own predictor. This is the fair, like-for-like analog to what SDID uses.&lt;/p>
&lt;pre>&lt;code class="language-stata">* California is id 3 after encode (alphabetical)
local preds
forvalues y = 1970/1988 {
local preds &amp;quot;`preds' packspercapita(`y')&amp;quot;
}
synth2 packspercapita `preds', trunit(3) trperiod(1989) figure
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Number of Control Units = 38 Root Mean Squared Error = 1.65640
Number of Covariates = 19 R-squared = 0.97699
Optimal Unit Weights:
---------------------------
Unit | U.weight
--------------+------------
Utah | 0.3940
Montana | 0.2320
Nevada | 0.2050
Connecticut | 0.1090
NewHampshire | 0.0450
Colorado | 0.0150
---------------------------
Note: The average treatment effect over the posttreatment period is -19.4814.
&lt;/code>&lt;/pre>
&lt;p>The pre-period fit is excellent: a root mean squared prediction error of &lt;strong>1.66 packs&lt;/strong> and an $R^2$ of &lt;strong>0.98&lt;/strong>, meaning synthetic California reproduces real California almost exactly before 1989. The synthetic is built from just &lt;strong>six&lt;/strong> donors, dominated by &lt;strong>Utah (0.39), Montana (0.23), and Nevada (0.21)&lt;/strong>, states that smoked like California before the program. The estimated effect averages &lt;strong>−19.48 packs per capita&lt;/strong> over 1989–2000, smaller in magnitude than the naive DiD&amp;rsquo;s −27.35: once we compare California to states that actually looked like it, part of the apparent drop turns out to be an artifact of the wrong comparison group, not the policy.&lt;/p>
&lt;p>&lt;img src="stata_sdid_sc_path.png" alt="Synthetic California (blue dashed) tracks real California (orange) almost perfectly before 1989, then the two separate sharply.">&lt;/p>
&lt;p>The fit before 1989 is the whole credibility argument for synthetic control: if the synthetic matches California for nineteen years and then diverges exactly when the policy starts, the divergence is plausibly the policy. The next figure shows the same thing as a single &lt;strong>gap&lt;/strong> series.&lt;/p>
&lt;p>&lt;img src="stata_sdid_sc_gap.png" alt="The estimated gap (California minus synthetic) hugs zero before 1989 and then falls steadily to about −27 packs by 2000.">&lt;/p>
&lt;p>The gap is essentially flat and near zero through 1988 — the pre-period fit is good — and then opens up after the policy, reaching roughly &lt;strong>−27 packs by 2000&lt;/strong>. Averaged over the post-period, that is the −19.5 headline. The growing gap is consistent with a program whose effect compounds as the tax and campaign change long-run behavior.&lt;/p>
&lt;hr>
&lt;h2 id="7-synthetic-difference-in-differences-with-sdid">7. Synthetic difference-in-differences with &lt;code>sdid&lt;/code>&lt;/h2>
&lt;p>Now SDID. The syntax mirrors the data structure — outcome, unit, time, treatment — and one option, &lt;code>vce()&lt;/code>, selects the inference method. We start with &lt;code>vce(noinference)&lt;/code> to focus on the point estimate and the diagnostic graph.&lt;/p>
&lt;pre>&lt;code class="language-stata">sdid packspercapita state year treated, method(sdid) vce(noinference) graph
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Synthetic Difference-in-Differences Estimator
-----------------------------------------------------------------------------
packsperca~a | ATT Std. Err. t P&amp;gt;|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
treated | -15.60383 . . . . .
-----------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The SDID estimate is &lt;strong>−15.60 packs per capita&lt;/strong>, smaller in magnitude than both DiD (−27.35) and SC (−19.48). Relative to the level SDID implies California &lt;em>would&lt;/em> have smoked, this is roughly a &lt;strong>20% reduction&lt;/strong>, and it is the number reported in Arkhangelsky et al. (2021). Why is it smaller than SC&amp;rsquo;s? Because SDID does two things SC does not: it allows a constant level gap (so it is not forced to fit California&amp;rsquo;s &lt;em>level&lt;/em>, only its &lt;em>trend&lt;/em>), and it down-weights pre-period years that look nothing like the late 1980s. Both make the comparison more conservative.&lt;/p>
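&lt;p>One way to reproduce the 20% figure, offered as a back-of-the-envelope reading rather than the authors&amp;rsquo; exact calculation, is to add the estimated effect back to California&amp;rsquo;s observed post-period mean of 60.35 packs (from Section 5) to get the implied counterfactual level:&lt;/p>
&lt;p>$$
\hat{Y}_{CA}^{post}(0) \approx 60.35 + 15.60 = 75.95, \qquad \frac{15.60}{75.95} \approx 0.205
$$&lt;/p>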
&lt;p>The &lt;code>graph&lt;/code> option produces SDID&amp;rsquo;s signature diagnostic.&lt;/p>
&lt;p>&lt;img src="stata_sdid_sdid_main.png" alt="The SDID diagnostic: California (red) versus the trend-matched synthetic control (blue dashed), which sits above California by a roughly constant gap because SDID matches trends, not levels. The green ribbon at the bottom shows the time weights, concentrated on 1986–1988.">&lt;/p>
&lt;p>Two things are worth noticing. First, the synthetic &amp;ldquo;Control&amp;rdquo; line sits &lt;strong>above&lt;/strong> California throughout — SDID does not try to close that level gap, because the unit fixed effect absorbs it. What SDID cares about is whether the two lines stay &lt;strong>parallel&lt;/strong> before 1989 (they do) and then diverge after (they do). Second, the green shaded ribbon shows the &lt;strong>time weights&lt;/strong> $\hat{\lambda}_t$ — and they are not uniform.&lt;/p>
&lt;p>&lt;img src="stata_sdid_lambda.png" alt="SDID&amp;rsquo;s pre-period time weights fall almost entirely on 1986, 1987, and 1988 (0.37, 0.21, 0.43); earlier years get zero.">&lt;/p>
&lt;p>This is SDID&amp;rsquo;s distinctive move. Of the nineteen pre-policy years, it places &lt;strong>all&lt;/strong> pre-period weight on &lt;strong>1986–1988&lt;/strong>, the years most similar to the post-1989 period, and zero on 1970–1985. Intuitively, smoking behavior and its determinants in 1972 tell us little about the counterfactual for 1995; the late 1980s tell us much more. DiD, by contrast, treats 1972 and 1988 as equally informative, and SC has no time weights at all. We can confirm which states and years carry weight by asking &lt;code>sdid&lt;/code> to return them:&lt;/p>
&lt;pre>&lt;code class="language-stata">sdid packspercapita state year treated, vce(noinference) returnweights mattitles
&lt;/code>&lt;/pre>
&lt;p>The returned unit weights $\hat{\omega}_i$ are &lt;strong>diffuse&lt;/strong> compared with synthetic control&amp;rsquo;s: the largest are Nevada (0.12), New Hampshire (0.11), Connecticut (0.08), Delaware (0.07), and Colorado (0.06), with positive weight spread across roughly twenty states. Where &lt;code>synth2&lt;/code> leaned on six donors, SDID&amp;rsquo;s ridge penalty spreads the weight — trading a little pre-period fit for a more stable, less idiosyncratic comparison group. Both methods nonetheless agree on the &lt;em>kind&lt;/em> of state that resembles California: Nevada, Utah, Montana, Connecticut, and Colorado appear prominently in both.&lt;/p>
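&lt;p>A short sketch of how the returned weights might be inspected after the call above. It assumes the stored-result names &lt;code>e(lambda)&lt;/code> and &lt;code>e(omega)&lt;/code> described in the &lt;code>sdid&lt;/code> help file; check &lt;code>help sdid&lt;/code> for the exact layout in your installed version.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Time weights (lambda) stored by the last sdid call
matrix list e(lambda)
* Unit weights (omega) stored by the last sdid call
matrix list e(omega)
&lt;/code>&lt;/pre>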
&lt;hr>
&lt;h2 id="8-one-command-three-estimators">8. One command, three estimators&lt;/h2>
&lt;p>Here is the practical payoff emphasized by Clarke et al. (2024): the &lt;code>sdid&lt;/code> command implements all three estimators in an &lt;strong>identical framework&lt;/strong>. You do not switch packages or rewrite your model — you change the single option &lt;code>method()&lt;/code>. Estimation, inference (&lt;code>vce()&lt;/code>), and the diagnostic &lt;code>graph&lt;/code> all work the same way for each.&lt;/p>
&lt;pre>&lt;code class="language-stata">sdid packspercapita state year treated, method(did) vce(noinference) graph
sdid packspercapita state year treated, method(sc) vce(noinference) graph
sdid packspercapita state year treated, method(sdid) vce(noinference) graph
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">DiD (sdid framework) = -27.34911
SC (sdid framework) = -19.61966
SDID = -15.60383
&lt;/code>&lt;/pre>
&lt;p>This is a strong internal consistency check. The framework&amp;rsquo;s &lt;code>method(did)&lt;/code> returns &lt;strong>−27.349&lt;/strong>, which matches, to the reported decimals, the raw 2×2 interaction we computed by hand with &lt;code>reg&lt;/code> in Section 5. And &lt;code>method(sc)&lt;/code> returns &lt;strong>−19.620&lt;/strong>, within 0.14 packs of the &lt;strong>−19.481&lt;/strong> from the standalone &lt;code>synth2&lt;/code> command (the small gap reflects different regularization: &lt;code>sdid&lt;/code> matches the full pre-period path with a ridge penalty, while &lt;code>synth2&lt;/code> optimizes Abadie&amp;rsquo;s predictor-weighting V-matrix). In other words, the unified command reproduces the hand-computed DiD exactly and the standalone synthetic control closely, even though we obtained them by entirely separate routes. That is what we would expect if both are special cases of one weighted regression. Because the three calls differ only in &lt;code>method()&lt;/code> and skip resampling under &lt;code>vce(noinference)&lt;/code>, running them side by side is computationally cheap.&lt;/p>
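&lt;p>Because only &lt;code>method()&lt;/code> changes, the comparison can also be written as a short loop. This is a sketch that assumes the point estimate is stored in &lt;code>e(ATT)&lt;/code>; it drops the &lt;code>graph&lt;/code> option to keep the output compact.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: run the three estimators in one loop and print each ATT
foreach m in did sc sdid {
    quietly sdid packspercapita state year treated, method(`m') vce(noinference)
    display &amp;quot;`m'&amp;quot; _col(8) %9.3f e(ATT)
}
&lt;/code>&lt;/pre>
&lt;p>On the Proposition 99 panel this should print the same three estimates as the separate calls.&lt;/p>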
&lt;p>The same &lt;code>graph&lt;/code> option yields each method&amp;rsquo;s diagnostic, so they can be read side by side.&lt;/p>
&lt;p>&lt;img src="stata_sdid_did_panel.png" alt="Difference-in-differences in the sdid framework: California versus the equally-weighted control average. Note there are no time weights — every pre-period year counts the same.">&lt;/p>
&lt;p>&lt;img src="stata_sdid_sc_panel.png" alt="Synthetic control in the sdid framework: California versus an optimally weighted synthetic, again with uniform time weights but optimized unit weights.">&lt;/p>
&lt;p>Stacking all four counterfactuals on one chart makes the ranking transparent. To put them on a common scale, the SDID counterfactual is anchored to California by its $\lambda$-weighted pre-period gap (recall SDID identifies effects only up to a constant level, which the unit fixed effect absorbs).&lt;/p>
&lt;p>&lt;img src="stata_sdid_compare_paths.png" alt="All four counterfactuals track California before 1989, then separate. The DiD counterfactual sits highest (largest estimated effect, −27), synthetic control next (−19.5), and SDID closest to California (smallest effect, −15.6).">&lt;/p>
&lt;p>The story is consistent across methods: Proposition 99 &lt;strong>reduced&lt;/strong> smoking. The magnitude, however, depends on how the counterfactual is built. The naive DiD is the most extreme because it compares California to a control average that was already on a different trajectory. Synthetic control fixes the comparison group and shrinks the estimate to about −19.5. SDID, by additionally allowing a level gap and weighting the informative late-1980s years, is the most conservative at −15.6. The three estimates span a plausible range rather than pin down a single number; SDID&amp;rsquo;s contribution is to be robust to the assumption, exact parallel trends, that the others lean on hardest.&lt;/p>
&lt;p>Collecting every estimate in one place:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Command&lt;/th>
&lt;th style="text-align:center">ATT (packs per capita)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Raw 2×2 DiD&lt;/td>
&lt;td>&lt;code>reg y i.cal##i.post&lt;/code>&lt;/td>
&lt;td style="text-align:center">−27.35&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DiD (unified)&lt;/td>
&lt;td>&lt;code>sdid …, method(did)&lt;/code>&lt;/td>
&lt;td style="text-align:center">−27.35&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Synthetic control&lt;/td>
&lt;td>&lt;code>synth2 …&lt;/code>&lt;/td>
&lt;td style="text-align:center">−19.48&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SC (unified)&lt;/td>
&lt;td>&lt;code>sdid …, method(sc)&lt;/code>&lt;/td>
&lt;td style="text-align:center">−19.62&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>SDID&lt;/strong>&lt;/td>
&lt;td>&lt;code>sdid …, method(sdid)&lt;/code>&lt;/td>
&lt;td style="text-align:center">&lt;strong>−15.60&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="9-inference-how-sure-are-we">9. Inference: how sure are we?&lt;/h2>
&lt;p>A point estimate is not enough; we need a standard error. SDID&amp;rsquo;s variance feeds a familiar normal-approximation confidence interval:&lt;/p>
&lt;p>$$
\hat{\tau}^{sdid} \pm z_{\alpha/2} \sqrt{\hat{V}_{\tau}}
$$&lt;/p>
&lt;p>Arkhangelsky et al. (2021) offer three ways to estimate $\hat{V}_{\tau}$: a &lt;strong>bootstrap&lt;/strong>, a &lt;strong>jackknife&lt;/strong>, and a &lt;strong>placebo&lt;/strong> (permutation) procedure. The choice is not free here — it is forced by our design. With a &lt;strong>single treated unit&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>The &lt;strong>jackknife&lt;/strong> is literally &lt;strong>undefined&lt;/strong>. It works by deleting one unit at a time and re-estimating; when it deletes California, there is no treated unit left, so the treated-removed estimate does not exist.&lt;/li>
&lt;li>The &lt;strong>bootstrap&lt;/strong> relies on resampling &lt;em>many&lt;/em> treated units; its asymptotics require the number of treated units to grow. With one treated unit it is unreliable.&lt;/li>
&lt;li>The &lt;strong>placebo&lt;/strong> procedure is the one valid option. It keeps the controls, repeatedly assigns the treatment structure to a &lt;em>control&lt;/em> state as a fake &amp;ldquo;placebo&amp;rdquo; treatment, re-estimates the effect, and uses the spread of those placebo estimates as the variance.&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-mermaid">graph TD
Q{&amp;quot;How many&amp;lt;br/&amp;gt;treated units?&amp;quot;}
Q --&amp;gt;|&amp;quot;One — e.g. California&amp;quot;| PL[&amp;quot;&amp;lt;b&amp;gt;Placebo / permutation&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;the valid choice here&amp;lt;/i&amp;gt;&amp;quot;]
Q --&amp;gt;|&amp;quot;Many — e.g. staggered adoption&amp;quot;| BJ[&amp;quot;Bootstrap or jackknife&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;asymptotics in number of treated units&amp;lt;/i&amp;gt;&amp;quot;]
PL --&amp;gt; THIS[&amp;quot;this tutorial&amp;lt;br/&amp;gt;vce(placebo)&amp;quot;]
BJ --&amp;gt; OOS[&amp;quot;out of scope&amp;lt;br/&amp;gt;(needs another design)&amp;quot;]
style Q fill:#141413,stroke:#6a9bcc,color:#fff
style PL fill:#00d4c8,stroke:#141413,color:#141413
style THIS fill:#6a9bcc,stroke:#141413,color:#fff
style BJ fill:#6a9bcc,stroke:#141413,color:#fff
style OOS fill:#d97757,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>So we run placebo inference, the appropriate choice for a comparative case study.&lt;/p>
&lt;pre>&lt;code class="language-stata">sdid packspercapita state year treated, vce(placebo) seed(1213)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Synthetic Difference-in-Differences Estimator
-----------------------------------------------------------------------------
packsperca~a | ATT Std. Err. t P&amp;gt;|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
treated | -15.60383 9.87941 -1.58 0.114 -34.96712 3.75946
-----------------------------------------------------------------------------
95% CIs and p-values are based on large-sample approximations.
&lt;/code>&lt;/pre>
&lt;p>The placebo standard error is &lt;strong>9.88&lt;/strong>, giving a 95% interval of roughly &lt;strong>[−35.0, 3.8]&lt;/strong>. Notice this interval &lt;strong>includes zero&lt;/strong>: by the normal-approximation criterion, we cannot reject &amp;ldquo;no effect&amp;rdquo; at the 5% level ($p = 0.114$). With a single treated unit and a noisy donor pool, the SDID interval is genuinely wide — honest about how hard it is to be certain from one case.&lt;/p>
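&lt;p>As a check on the table, the interval and test statistic can be reproduced by hand from the reported estimate and standard error, using $z_{0.025} \approx 1.96$:&lt;/p>
&lt;p>$$
-15.604 \pm 1.96 \times 9.879 = -15.604 \pm 19.363 \ \Rightarrow\ [-34.97,\ 3.76], \qquad z = \frac{-15.604}{9.879} \approx -1.58
$$&lt;/p>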
&lt;p>But the normal approximation is not the only — or the sharpest — way to use the placebo distribution. We can also run an explicit &lt;strong>permutation test&lt;/strong>: assign the placebo treatment to each control state in turn, collect the placebo effects, and ask how California&amp;rsquo;s real estimate ranks against them.&lt;/p>
&lt;pre>&lt;code class="language-stata">* assign each control as a placebo-treated unit, collect placebo ATTs
drop if state==&amp;quot;California&amp;quot;
levelsof state, local(ctrls)
capture matrix drop P                      // start with no stored placebo ATTs
foreach s of local ctrls {
    preserve
    gen byte ptreat = (state==&amp;quot;`s'&amp;quot;) &amp;amp; (year&amp;gt;=1989)
    quietly sdid packspercapita state year ptreat, vce(noinference)
    matrix P = (nullmat(P) \ e(ATT))       // store e(ATT): one row per control state
    restore
}
matrix list P                              // placebo ATTs, in levelsof order
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_placebo_hist.png" alt="California&amp;amp;rsquo;s estimated effect (orange line, −15.6) sits in the extreme left tail of the placebo distribution; almost every control state shows an effect near zero.">&lt;/p>
&lt;p>The placebo effects for control states cluster around &lt;strong>zero&lt;/strong> — reassuring, since those states passed no comparable policy — while California&amp;rsquo;s &lt;strong>−15.6&lt;/strong> lands far in the left tail. Only &lt;strong>1 of 38&lt;/strong> control states produced a placebo effect as large in magnitude as California&amp;rsquo;s, a permutation &lt;strong>p-value of 0.026&lt;/strong>. So the two inferential lenses tell complementary stories: the rank-based permutation test says California&amp;rsquo;s drop is very unlikely to be noise (significant at 5%), while the conservative normal-approximation interval reminds us that, with a single treated unit, the &lt;em>precision&lt;/em> of the magnitude is limited. Reporting both is the honest summary.&lt;/p>
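&lt;p>Written out, the permutation p-value used here is the share of control states whose placebo effect is at least as large in magnitude as California&amp;rsquo;s. The symbols $\hat{\tau}_i^{placebo}$ (the placebo estimate for control state $i$) and $N_{co}$ (the number of control states) are introduced only for this illustration:&lt;/p>
&lt;p>$$
p = \frac{1}{N_{co}} \sum_{i=1}^{N_{co}} \mathbf{1}\left\{ |\hat{\tau}_i^{placebo}| \ge |\hat{\tau}^{sdid}| \right\} = \frac{1}{38} \approx 0.026
$$&lt;/p>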
&lt;h3 id="other-inference-designs-out-of-scope">Other inference designs (out of scope)&lt;/h3>
&lt;p>It would be wrong to conclude that bootstrap and jackknife are &amp;ldquo;bad&amp;rdquo; — they are simply built for a &lt;strong>different design&lt;/strong>. They come into their own when there are &lt;strong>many treated units&lt;/strong>, especially under &lt;strong>staggered adoption&lt;/strong>, where units adopt the policy at different times. In that setting the ATT is an average of adoption-cohort-specific effects,&lt;/p>
&lt;p>$$
\widehat{ATT} = \sum_{a \in A} \frac{T_{post}^{a}}{T_{post}}\ \hat{\tau}_a^{sdid}
$$&lt;/p>
&lt;p>and with many treated units the asymptotic arguments behind the bootstrap and jackknife hold. The &lt;code>sdid&lt;/code> command supports all of this — &lt;code>vce(bootstrap)&lt;/code>, &lt;code>vce(jackknife)&lt;/code>, covariate adjustment, and staggered timing — but those tools require a genuinely different research design (multiple treated units adopting at multiple times) than California&amp;rsquo;s single 1989 intervention. We deliberately keep this tutorial to the &lt;strong>block design with one treated unit&lt;/strong>, where the placebo procedure is the right and sufficient tool. The staggered case, with its own estimation and inference, is a natural next tutorial.&lt;/p>
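&lt;p>To see how the aggregation formula weights cohorts, consider a purely hypothetical example (the numbers are invented for illustration and are not estimates): two adoption cohorts with $T_{post}^{a}$ of 10 and 5 treated periods, so $T_{post} = 15$, and cohort effects of 6 and 12:&lt;/p>
&lt;p>$$
\widehat{ATT} = \frac{10}{15} \times 6 + \frac{5}{15} \times 12 = 4 + 4 = 8
$$&lt;/p>
&lt;p>The cohort with more treated periods gets the larger weight, and because every weight is a non-negative share, the overall ATT always lies between the smallest and largest cohort effect.&lt;/p>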
&lt;hr>
&lt;h2 id="10-robustness-and-discussion">10. Robustness and discussion&lt;/h2>
&lt;p>What should we take away about Proposition 99? Three independent constructions of the counterfactual — DiD, synthetic control, and SDID — all agree the policy &lt;strong>reduced&lt;/strong> smoking, with estimates from −15.6 to −27.3 packs per capita. The disagreement is informative rather than alarming: it maps directly onto how much each method trusts the comparison group.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>DiD (−27.35)&lt;/strong> trusts that California would have moved parallel to the &lt;em>average&lt;/em> of 38 heterogeneous states. The pre-period figure shows that average was already diverging from California, so DiD likely overstates the effect.&lt;/li>
&lt;li>&lt;strong>Synthetic control (−19.48)&lt;/strong> fixes the comparison group to states that actually resembled California (Utah, Montana, Nevada). Its pre-period fit is excellent (RMSE 1.66), which is the evidence for its credibility.&lt;/li>
&lt;li>&lt;strong>SDID (−15.60)&lt;/strong> additionally allows a constant level gap and concentrates on the informative late-1980s years. It is the most robust to a violation of exact parallel trends, and the most conservative.&lt;/li>
&lt;/ul>
&lt;p>The honest range, then, is something like &amp;ldquo;Proposition 99 cut cigarette consumption by &lt;strong>roughly 16–20 packs per capita per year&lt;/strong>, plausibly larger by the end of the 1990s,&amp;rdquo; with SDID the preferred single number because it leans least on the assumption most likely to fail.&lt;/p>
&lt;p>A few caveats apply to all three estimates. With &lt;strong>one treated unit&lt;/strong>, statistical power is inherently limited — the SDID confidence interval includes zero even though the permutation test is significant, and no method can fully escape that. The placebo variance assumes &lt;strong>homoskedasticity across units&lt;/strong> (the placebo treatments are drawn only from controls). And like every comparative case study, identification assumes &lt;strong>no other large shock hit California alone&lt;/strong> in 1989 and &lt;strong>no spillovers&lt;/strong> to the donor states (if Californians bought cigarettes across state lines, neighboring donors are contaminated). These are assumptions to argue substantively, not settle statistically.&lt;/p>
&lt;hr>
&lt;h2 id="11-summary-and-key-takeaways">11. Summary and key takeaways&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Method.&lt;/strong> SDID is one weighted two-way fixed-effects regression. DiD is the special case with uniform weights; synthetic control is the special case with unit weights but no time weights and no unit fixed effect. SDID uses &lt;strong>both&lt;/strong> unit and time weights and keeps the unit fixed effect, so it matches California&amp;rsquo;s pre-period &lt;em>trend&lt;/em> while allowing a constant &lt;em>level&lt;/em> gap.&lt;/li>
&lt;li>&lt;strong>Data.&lt;/strong> On the Proposition 99 panel, the estimates are DiD &lt;strong>−27.35&lt;/strong>, synthetic control &lt;strong>−19.48&lt;/strong>, and SDID &lt;strong>−15.60&lt;/strong> packs per capita — the same direction, with magnitude shrinking as the comparison group becomes more credible. SDID&amp;rsquo;s time weights land entirely on &lt;strong>1986–1988&lt;/strong>.&lt;/li>
&lt;li>&lt;strong>One framework.&lt;/strong> The single &lt;code>sdid&lt;/code> command reproduced the hand-computed 2×2 DiD &lt;em>exactly&lt;/em> (−27.35) and the standalone &lt;code>synth2&lt;/code> synthetic control closely (−19.62 vs −19.48), confirming that all three are special cases of one estimator and can be run, with inference and graphs, from one command.&lt;/li>
&lt;li>&lt;strong>Inference.&lt;/strong> With a single treated unit, &lt;strong>placebo&lt;/strong> is the valid procedure: jackknife is undefined and the bootstrap is unreliable. The placebo SE is 9.88 (95% CI [−35.0, 3.8], which includes zero), while the permutation test gives &lt;strong>p = 0.026&lt;/strong>. Report both.&lt;/li>
&lt;li>&lt;strong>Limitation and next step.&lt;/strong> One treated unit means limited power. The natural extension is &lt;strong>staggered adoption&lt;/strong> with many treated units, where &lt;code>vce(bootstrap)&lt;/code> and &lt;code>vce(jackknife)&lt;/code> become appropriate and covariates can be added — a different design, and a good follow-up tutorial.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="12-exercises">12. Exercises&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Weights side by side.&lt;/strong> Re-run &lt;code>sdid …, method(sc) vce(noinference) returnweights&lt;/code> and compare its unit weights to the &lt;code>synth2&lt;/code> donor weights from Section 6. Which states appear in both? Why does the &lt;code>sdid&lt;/code> version spread weight more widely? (Hint: the ridge penalty $\zeta$.)&lt;/li>
&lt;li>&lt;strong>Placebo stability.&lt;/strong> Re-estimate &lt;code>sdid …, vce(placebo) seed(1213)&lt;/code> with a different &lt;code>seed()&lt;/code> and with more replications via &lt;code>reps()&lt;/code>. How much does the standard error move? What does that tell you about reading a single placebo SE to three decimal places?&lt;/li>
&lt;li>&lt;strong>Time weights matter.&lt;/strong> Inspect &lt;code>e(lambda)&lt;/code> after the SDID run and confirm the weight on 1986–1988. Then think through: if you forced uniform time weights (as DiD and SC do), would you expect the estimate to move toward or away from the DiD number? Check your intuition by comparing &lt;code>method(sdid)&lt;/code> with &lt;code>method(sc)&lt;/code>.&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2021). &lt;a href="https://doi.org/10.1257/aer.20190159" target="_blank" rel="noopener">Synthetic Difference-in-Differences&lt;/a>. &lt;em>American Economic Review&lt;/em> 111(12): 4088–4118.&lt;/li>
&lt;li>Clarke, D., Pailañir, D., Athey, S., and Imbens, G. (2024). &lt;a href="https://doi.org/10.1177/1536867X241297184" target="_blank" rel="noopener">On Synthetic Difference-in-Differences and Related Estimation Methods in Stata&lt;/a>. &lt;em>The Stata Journal&lt;/em> (st0757). The &lt;code>sdid&lt;/code> command.&lt;/li>
&lt;li>Abadie, A., Diamond, A., and Hainmueller, J. (2010). &lt;a href="https://doi.org/10.1198/jasa.2009.ap08746" target="_blank" rel="noopener">Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California&amp;rsquo;s Tobacco Control Program&lt;/a>. &lt;em>Journal of the American Statistical Association&lt;/em> 105(490): 493–505.&lt;/li>
&lt;li>Abadie, A., and Gardeazabal, J. (2003). &lt;a href="https://doi.org/10.1257/000282803321455188" target="_blank" rel="noopener">The Economic Costs of Conflict: A Case Study of the Basque Country&lt;/a>. &lt;em>American Economic Review&lt;/em> 93(1): 113–132.&lt;/li>
&lt;li>Yan, G., and Chen, Q. (2023). &lt;a href="https://doi.org/10.1177/1536867X231195278" target="_blank" rel="noopener">synth2: Synthetic Control Method with Placebo Tests, Robustness Test and Visualization&lt;/a>. &lt;em>The Stata Journal&lt;/em> 23(3): 597–624. The &lt;code>synth2&lt;/code> command.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Related tutorials on this site:&lt;/strong> &lt;a href="https://carlos-mendez.org/post/stata_sc/">Synthetic control in Stata&lt;/a> · &lt;a href="https://carlos-mendez.org/post/stata_did/">Difference-in-differences in Stata&lt;/a> · &lt;a href="https://carlos-mendez.org/post/stata_honestdid/">Sensitivity analysis for parallel trends (honestdid)&lt;/a> · &lt;a href="https://carlos-mendez.org/post/r_sc_bayes_spatial/">Bayesian spatial synthetic control for Proposition 99 (R)&lt;/a>&lt;/p>
&lt;h2 id="acknowledgments">Acknowledgments&lt;/h2>
&lt;p>The analysis uses the &lt;code>sdid&lt;/code> (Clarke, Pailañir, Athey, and Imbens) and &lt;code>synth2&lt;/code> (Yan and Chen) Stata packages and the Proposition 99 dataset distributed with &lt;code>sdid&lt;/code>. AI tools (Claude Code, with NotebookLM for the audio summary) assisted in drafting and exposition; all code was executed and all numbers verified by the author, who is responsible for any remaining errors.&lt;/p>
</description></item></channel></rss>