<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>DiD | Carlos Mendez</title><link>https://carlos-mendez.org/category/did/</link><atom:link href="https://carlos-mendez.org/category/did/index.xml" rel="self" type="application/rss+xml"/><description>DiD</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Mon, 27 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>DiD</title><link>https://carlos-mendez.org/category/did/</link></image><item><title>Introduction to Difference-in-Differences (DiD) in Python</title><link>https://carlos-mendez.org/post/python_did101/</link><pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_did101/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>How much does an after-school tutoring program improve student performance? A school district implemented a new after-school tutoring program in 10 of its 35 high schools. After one year, the average GPA in tutored schools jumped from &lt;strong>60.17&lt;/strong> to &lt;strong>96.37&lt;/strong> — a staggering &lt;strong>36.20-point&lt;/strong> increase. Case closed?&lt;/p>
&lt;p>Not quite. Over the same period, GPA also rose in the 25 schools that &lt;em>did not&lt;/em> receive the program — from &lt;strong>71.22&lt;/strong> to &lt;strong>82.10&lt;/strong>. Some of the improvement in tutored schools simply reflects a region-wide upward trend. &lt;strong>Difference-in-Differences (DiD)&lt;/strong> strips away that common trend and reveals the tutoring program&amp;rsquo;s true causal effect: an &lt;strong>ATT of approximately 25.32 GPA points&lt;/strong>.&lt;/p>
&lt;p>This tutorial walks through DiD estimation in Python using &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">PyFixest&lt;/a> — a fast, Stata-flavored econometrics package — alongside &lt;a href="https://posit-dev.github.io/great-tables/" target="_blank" rel="noopener">Great Tables&lt;/a> for publication-quality output. We use the simulated case study from &lt;a href="https://doi.org/10.1007/s12564-024-09984-9" target="_blank" rel="noopener">Corral and Yang (2024)&lt;/a>, the same dataset used in the &lt;a href="https://carlos-mendez.org/post/stata_did/">Stata companion tutorial&lt;/a>.&lt;/p>
&lt;h3 id="11-learning-objectives">1.1 Learning objectives&lt;/h3>
&lt;p>By the end of this tutorial, you will be able to:&lt;/p>
&lt;ul>
&lt;li>Explain why &lt;strong>naive before-after comparisons overstate&lt;/strong> treatment effects&lt;/li>
&lt;li>Compute &lt;strong>2×2 DiD manually&lt;/strong> and via PyFixest&amp;rsquo;s &lt;code>feols()&lt;/code> function&lt;/li>
&lt;li>Estimate DiD using &lt;strong>multiple equivalent approaches&lt;/strong> with a unified formula syntax&lt;/li>
&lt;li>Compare inference under &lt;strong>iid, HC1, CRV1, and CRV3&lt;/strong> standard errors&lt;/li>
&lt;li>Build &lt;strong>publication-quality regression tables&lt;/strong> with &lt;code>etable()&lt;/code> and Great Tables&lt;/li>
&lt;li>Estimate and plot &lt;strong>event study models&lt;/strong> with &lt;code>i()&lt;/code> for dynamic treatment effects&lt;/li>
&lt;/ul>
&lt;h3 id="12-study-design">1.2 Study design&lt;/h3>
&lt;pre>&lt;code class="language-mermaid">graph LR
subgraph &amp;quot;Case Study Setting&amp;quot;
A[&amp;quot;&amp;lt;b&amp;gt;35 High Schools&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;in One Region&amp;quot;]
B[&amp;quot;&amp;lt;b&amp;gt;10 Treated Schools&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(tutoring program)&amp;quot;]
C[&amp;quot;&amp;lt;b&amp;gt;25 Comparison Schools&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(no program)&amp;quot;]
A --&amp;gt; B
A --&amp;gt; C
end
subgraph &amp;quot;DiD Design&amp;quot;
D[&amp;quot;&amp;lt;b&amp;gt;Pre-Program&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;GPA at baseline&amp;quot;]
E[&amp;quot;&amp;lt;b&amp;gt;Post-Program&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;GPA after intervention&amp;quot;]
F[&amp;quot;&amp;lt;b&amp;gt;DiD Estimate&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ATT = 25.32&amp;quot;]
D --&amp;gt; E --&amp;gt; F
end
subgraph &amp;quot;Estimation Methods&amp;quot;
G[&amp;quot;&amp;lt;b&amp;gt;Manual 2x2&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Subtraction&amp;quot;]
H[&amp;quot;&amp;lt;b&amp;gt;TWFE Regression&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;PyFixest feols()&amp;quot;]
I[&amp;quot;&amp;lt;b&amp;gt;Event Study&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Dynamic effects&amp;quot;]
G --&amp;gt; H --&amp;gt; I
end
C --&amp;gt; D
F --&amp;gt; G
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style F fill:#00d4c8,stroke:#141413,color:#fff
style I fill:#00d4c8,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The data has a clean &lt;strong>panel structure&lt;/strong>: each of the 35 schools is observed in two time periods (pre and post), giving us 70 observations for the 2×2 design. A second dataset extends this to 8 periods (280 observations) for the event study analysis.&lt;/p>
&lt;hr>
&lt;h2 id="2-setup-and-imports">2. Setup and Imports&lt;/h2>
&lt;p>Install the required packages:&lt;/p>
&lt;pre>&lt;code class="language-python">pip install pyfixest great_tables pandas matplotlib
&lt;/code>&lt;/pre>
&lt;p>Import the libraries:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyfixest as pf
from great_tables import GT, md, style, loc
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Package&lt;/th>
&lt;th>Purpose&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>pyfixest&lt;/code>&lt;/td>
&lt;td>Fast fixed-effects estimation with Stata-like formula syntax&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>great_tables&lt;/code>&lt;/td>
&lt;td>Publication-quality HTML/PNG tables from DataFrames&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>pandas&lt;/code>&lt;/td>
&lt;td>Data loading, manipulation, and summary statistics&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>matplotlib&lt;/code>&lt;/td>
&lt;td>Custom figure generation with dark theme styling&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;details>
&lt;summary>&lt;strong>Dark theme figure styling&lt;/strong> (click to expand)&lt;/summary>
&lt;pre>&lt;code class="language-python"># Site color palette
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
NEAR_BLACK = &amp;quot;#141413&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
# Dark theme palette
DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
plt.rcParams.update({
&amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.edgecolor&amp;quot;: DARK_NAVY,
&amp;quot;axes.linewidth&amp;quot;: 0,
&amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
&amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
&amp;quot;axes.spines.top&amp;quot;: False,
&amp;quot;axes.spines.right&amp;quot;: False,
&amp;quot;axes.spines.left&amp;quot;: False,
&amp;quot;axes.spines.bottom&amp;quot;: False,
&amp;quot;axes.grid&amp;quot;: True,
&amp;quot;grid.color&amp;quot;: GRID_LINE,
&amp;quot;grid.linewidth&amp;quot;: 0.6,
&amp;quot;grid.alpha&amp;quot;: 0.8,
&amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
&amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
&amp;quot;text.color&amp;quot;: WHITE_TEXT,
&amp;quot;font.size&amp;quot;: 12,
&amp;quot;legend.frameon&amp;quot;: False,
&amp;quot;savefig.facecolor&amp;quot;: DARK_NAVY,
})
&lt;/code>&lt;/pre>
&lt;/details>
&lt;h2 id="3-data-loading-and-exploration">3. Data Loading and Exploration&lt;/h2>
&lt;p>We load the 2×2 dataset directly from GitHub. This Stata &lt;code>.dta&lt;/code> file contains 35 schools observed across 2 time periods:&lt;/p>
&lt;pre>&lt;code class="language-python">url_did = &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/isds/tutoring_did.dta&amp;quot;
df = pd.read_stata(url_did).astype(float)
print(df.shape)
print(df.dtypes)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">(70, 7)
id float64
time float64
treated float64
post float64
txp float64
gpa float64
female_share float64
dtype: object
&lt;/code>&lt;/pre>
&lt;p>The dataset has &lt;strong>70 observations&lt;/strong> (35 schools × 2 periods) and &lt;strong>7 variables&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>&lt;code>id&lt;/code> — School identifier (1–35)&lt;/li>
&lt;li>&lt;code>time&lt;/code> — Time period (1 = pre, 2 = post)&lt;/li>
&lt;li>&lt;code>treated&lt;/code> — Treatment indicator (1 = received tutoring program)&lt;/li>
&lt;li>&lt;code>post&lt;/code> — Post-period indicator (1 = after program implementation)&lt;/li>
&lt;li>&lt;code>txp&lt;/code> — Interaction term (treated × post)&lt;/li>
&lt;li>&lt;code>gpa&lt;/code> — Outcome: average GPA of low-income students (0–100 scale)&lt;/li>
&lt;li>&lt;code>female_share&lt;/code> — Share of female students (covariate)&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-python">print(df.describe().round(2))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> id time treated post txp gpa female_share
count 70.00 70.0 70.00 70.0 70.00 70.00 70.00
mean 18.00 1.5 0.29 0.5 0.14 77.12 0.53
std 10.17 0.5 0.46 0.5 0.35 10.88 0.03
min 1.00 1.0 0.00 0.0 0.00 59.39 0.47
25% 9.25 1.0 0.00 0.0 0.00 70.68 0.51
50% 18.00 1.5 0.00 0.5 0.00 76.27 0.53
75% 26.75 2.0 1.00 1.0 0.00 82.66 0.55
max 35.00 2.0 1.00 1.0 1.00 99.15 0.57
&lt;/code>&lt;/pre>
&lt;p>A crosstab confirms the balanced 2×2 design:&lt;/p>
&lt;pre>&lt;code class="language-python">ct = pd.crosstab(df[&amp;quot;treated&amp;quot;], df[&amp;quot;post&amp;quot;], margins=True)
print(ct)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Pre (0) Post (1) Total
Comparison (0) 25 25 50
Treated (1) 10 10 20
Total 35 35 70
&lt;/code>&lt;/pre>
&lt;p>We have &lt;strong>10 treated schools&lt;/strong> observed in 2 periods (20 observations) and &lt;strong>25 comparison schools&lt;/strong> (50 observations). This is a perfectly balanced panel — every school appears exactly once in each period.&lt;/p>
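&lt;p>A quick programmatic check of this balance:&lt;/p>
&lt;pre>&lt;code class="language-python"># Every school should contribute exactly one observation per period
assert df.groupby(&amp;quot;id&amp;quot;)[&amp;quot;time&amp;quot;].nunique().eq(2).all()
&lt;/code>&lt;/pre>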
&lt;h3 id="31-panel-structure-visualization">3.1 Panel structure visualization&lt;/h3>
&lt;p>The heatmap below shows treatment assignment across schools and time. Steel blue cells represent the comparison group, while orange cells mark treated schools: light orange before the program and dark orange after.&lt;/p>
&lt;p>&lt;img src="did101_panelview.png" alt="Panel structure showing 35 schools across 2 time periods. Treated schools (10) switch from light orange to dark orange after the intervention, while comparison schools (25) remain in steel blue.">&lt;/p>
&lt;p>This is a &lt;em>clean&lt;/em> 2×2 design: treatment timing is simultaneous (all 10 schools receive the program at the same time), and no school switches treatment status.&lt;/p>
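&lt;p>The heatmap itself is not produced by any listing in this tutorial; a minimal matplotlib sketch along the following lines could generate a similar figure (the color coding and layout are assumptions based on the image above, and the palette constants come from the dark-theme block in Section 2):&lt;/p>
&lt;pre>&lt;code class="language-python">from matplotlib.colors import ListedColormap

# 0 = comparison, 1 = treated (pre-program), 2 = treated (post-program)
status = (df.pivot(index=&amp;quot;id&amp;quot;, columns=&amp;quot;time&amp;quot;, values=&amp;quot;treated&amp;quot;)
          + df.pivot(index=&amp;quot;id&amp;quot;, columns=&amp;quot;time&amp;quot;, values=&amp;quot;txp&amp;quot;))

fig, ax = plt.subplots(figsize=(5, 8))
ax.imshow(status.values, aspect=&amp;quot;auto&amp;quot;,
          cmap=ListedColormap([STEEL_BLUE, &amp;quot;#e8956a&amp;quot;, WARM_ORANGE]))
ax.grid(False)
ax.set_xticks([0, 1])
ax.set_xticklabels([&amp;quot;Pre&amp;quot;, &amp;quot;Post&amp;quot;])
ax.set_xlabel(&amp;quot;Period&amp;quot;)
ax.set_ylabel(&amp;quot;School&amp;quot;)
ax.set_title(&amp;quot;Treatment assignment by school and period&amp;quot;)
plt.show()
&lt;/code>&lt;/pre>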
&lt;h2 id="4-the-problem-with-naive-comparisons">4. The Problem with Naive Comparisons&lt;/h2>
&lt;p>The most intuitive approach to measuring the program&amp;rsquo;s effect is a simple before-after comparison for the treated schools:&lt;/p>
&lt;pre>&lt;code class="language-python">treated_means = df[df[&amp;quot;treated&amp;quot;] == 1].groupby(&amp;quot;post&amp;quot;)[&amp;quot;gpa&amp;quot;].mean()
print(f&amp;quot;Pre-program: {treated_means[0]:.2f}&amp;quot;)
print(f&amp;quot;Post-program: {treated_means[1]:.2f}&amp;quot;)
print(f&amp;quot;Naive change: {treated_means[1] - treated_means[0]:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Pre-program: 60.17
Post-program: 96.37
Naive change: 36.20
&lt;/code>&lt;/pre>
&lt;p>The naive estimate says the program boosted GPA by &lt;strong>36.20 points&lt;/strong>. But this ignores everything else that may have changed over the same period — curriculum reforms, new textbooks, regional economic shifts, or simply students maturing. Any of these factors could drive GPA upward in &lt;em>all&lt;/em> schools, not just the treated ones.&lt;/p>
&lt;p>&lt;img src="did101_its.png" alt="Naive before-after comparison showing the treated group&amp;rsquo;s GPA rising from 60.17 to 96.37. The entire 36.20-point increase is attributed to the program, ignoring secular trends.">&lt;/p>
&lt;p>The naive approach &lt;em>overstates&lt;/em> the effect by conflating the treatment effect with time trends that would have occurred regardless of the program.&lt;/p>
&lt;h2 id="5-the-did-design-using-a-comparison-group">5. The DiD Design: Using a Comparison Group&lt;/h2>
&lt;p>The key insight of DiD is to use the &lt;strong>comparison group&lt;/strong> as a mirror for what would have happened to the treated schools &lt;em>without&lt;/em> the program. We compute all four group means:&lt;/p>
&lt;pre>&lt;code class="language-python">means = df.groupby([&amp;quot;treated&amp;quot;, &amp;quot;post&amp;quot;])[&amp;quot;gpa&amp;quot;].mean()
pre_control = means[(0, 0)] # 71.22
post_control = means[(0, 1)] # 82.10
pre_treated = means[(1, 0)] # 60.17
post_treated = means[(1, 1)] # 96.37
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Group means:
Comparison Pre: 71.22
Comparison Post: 82.10
Treated Pre: 60.17
Treated Post: 96.37
&lt;/code>&lt;/pre>
&lt;p>The comparison schools' GPA rose by &lt;strong>10.88 points&lt;/strong> (from 71.22 to 82.10) — this is the &lt;em>secular trend&lt;/em>. We assume the treated schools would have experienced the same trend absent the program. This gives us the &lt;strong>counterfactual&lt;/strong>:&lt;/p>
&lt;pre>&lt;code class="language-python">counterfactual = pre_treated + (post_control - pre_control)
did_estimate = post_treated - counterfactual
print(f&amp;quot;Counterfactual: {pre_treated:.2f} + ({post_control:.2f} - {pre_control:.2f}) = {counterfactual:.2f}&amp;quot;)
print(f&amp;quot;DiD estimate: {post_treated:.2f} - {counterfactual:.2f} = {did_estimate:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Counterfactual: 60.17 + (82.10 - 71.22) = 71.05
DiD estimate: 96.37 - 71.05 = 25.32
&lt;/code>&lt;/pre>
&lt;p>The causal effect of the tutoring program is &lt;strong>25.32 GPA points&lt;/strong> — not 36.20. The naive approach overstated the effect by &lt;strong>43%&lt;/strong> because it attributed the 10.88-point common trend entirely to the program.&lt;/p>
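&lt;p>A quick arithmetic check of that overstatement figure:&lt;/p>
&lt;pre>&lt;code class="language-python">naive, did = 36.20, 25.32
print(f&amp;quot;Overstatement: {(naive - did) / did:.0%}&amp;quot;)  # 43%
&lt;/code>&lt;/pre>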
&lt;p>&lt;img src="did101_counterfactual.png" alt="DiD design showing three lines: the comparison group (steel blue, 71.22 to 82.10), the treated group (orange, 60.17 to 96.37), and the counterfactual path (teal dashed, 60.17 to 71.05). The DiD estimate of 25.32 is the gap between the actual and counterfactual treated outcomes.">&lt;/p>
&lt;h3 id="51-the-parallel-trends-assumption">5.1 The parallel trends assumption&lt;/h3>
&lt;p>DiD rests on one critical assumption: &lt;strong>parallel trends&lt;/strong>. In the absence of treatment, treated and comparison groups would have followed the &lt;em>same trajectory&lt;/em> over time. Formally:&lt;/p>
&lt;p>$$E[Y_{i,1}(0) - Y_{i,0}(0) \mid D=1] = E[Y_{i,1}(0) - Y_{i,0}(0) \mid D=0]$$&lt;/p>
&lt;p>In words: the &lt;em>change&lt;/em> in potential untreated outcomes is the same for both groups. Think of it like two runners on parallel tracks: they may start at different positions (treated schools have a lower baseline GPA), but they run at the same pace. If one runner suddenly speeds up after receiving coaching, the coached runner&amp;rsquo;s change in pace minus the other runner&amp;rsquo;s change in pace measures the coaching effect.&lt;/p>
&lt;p>Note what parallel trends does &lt;em>not&lt;/em> require: the two groups do not need the same &lt;em>level&lt;/em> of GPA, only the same &lt;em>trend&lt;/em>. This is why DiD is powerful — it naturally handles time-invariant differences between groups (like school quality or student demographics).&lt;/p>
&lt;h3 id="52-sutva">5.2 SUTVA&lt;/h3>
&lt;p>The &lt;strong>Stable Unit Treatment Value Assumption (SUTVA)&lt;/strong> requires that one school&amp;rsquo;s treatment does not affect another school&amp;rsquo;s outcome. If untreated schools lost students to tutored schools, or if tutored schools drew resources away from comparison schools, the DiD estimate would be biased. In this setting, schools serve distinct geographic catchments, making spillovers unlikely.&lt;/p>
&lt;h2 id="6-manual-did-calculation">6. Manual DiD Calculation&lt;/h2>
&lt;p>We can organize the four group means into a 2×2 table and compute the DiD as a &lt;em>double difference&lt;/em>:&lt;/p>
&lt;pre>&lt;code class="language-python">means_table = df.groupby([&amp;quot;treated&amp;quot;, &amp;quot;post&amp;quot;])[&amp;quot;gpa&amp;quot;].mean().unstack()
means_table[&amp;quot;Difference&amp;quot;] = means_table[1.0] - means_table[0.0]
print(means_table.round(2))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Pre (0) Post (1) Difference
Comparison (0) 71.22 82.10 10.88
Treated (1) 60.17 96.37 36.20
&lt;/code>&lt;/pre>
&lt;p>The DiD formula takes the &lt;em>difference of differences&lt;/em>:&lt;/p>
&lt;p>$$DiD = \Big(E[Y_{i,1} \mid D=1] - E[Y_{i,0} \mid D=1]\Big) - \Big(E[Y_{i,1} \mid D=0] - E[Y_{i,0} \mid D=0]\Big)$$&lt;/p>
&lt;p>Plugging in the numbers:&lt;/p>
&lt;p>$$DiD = (96.37 - 60.17) - (82.10 - 71.22) = 36.20 - 10.88 = 25.32$$&lt;/p>
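&lt;p>The same double difference in one line of code, using the &lt;code>means_table&lt;/code> built above:&lt;/p>
&lt;pre>&lt;code class="language-python">did = (means_table.loc[&amp;quot;Treated (1)&amp;quot;, &amp;quot;Difference&amp;quot;]
       - means_table.loc[&amp;quot;Comparison (0)&amp;quot;, &amp;quot;Difference&amp;quot;])
print(f&amp;quot;DiD = {did:.2f}&amp;quot;)  # 25.32
&lt;/code>&lt;/pre>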
&lt;p>Think of it this way: the treated schools improved by 36.20 points, but 10.88 of those points would have happened anyway (as evidenced by the comparison group). The remaining &lt;strong>25.32 points&lt;/strong> is the causal effect of the tutoring program.&lt;/p>
&lt;p>Going back to the runner analogy: the treated runner sped up by 36.20 units while the comparison runner sped up by 10.88. The coaching effect is the extra 25.32 units of speed that only the coached runner gained.&lt;/p>
&lt;p>&lt;img src="did101_diff_plot.png" alt="Manual DiD calculation showing both groups with labeled means. The comparison group change (10.88) represents the secular trend, while the treated group change (36.20) combines the trend and the treatment effect. The DiD of 25.32 isolates the causal effect.">&lt;/p>
&lt;h2 id="7-did-via-regression">7. DiD via Regression&lt;/h2>
&lt;h3 id="71-classical-ols-with-interaction">7.1 Classical OLS with interaction&lt;/h3>
&lt;p>The manual calculation is equivalent to an OLS regression with the treatment indicator, time indicator, and their interaction:&lt;/p>
&lt;p>$$Y_{it} = \alpha + \beta_1 \text{Treat}_i + \beta_2 \text{Post}_t + \beta_3 (\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}$$&lt;/p>
&lt;p>Where:&lt;/p>
&lt;ul>
&lt;li>$\alpha$ is the comparison group&amp;rsquo;s pre-period mean (intercept)&lt;/li>
&lt;li>$\beta_1$ captures the baseline difference between groups&lt;/li>
&lt;li>$\beta_2$ captures the common time trend&lt;/li>
&lt;li>$\beta_3$ is the &lt;strong>DiD estimate&lt;/strong> — the causal effect of treatment&lt;/li>
&lt;/ul>
&lt;p>In PyFixest, the &lt;code>feols()&lt;/code> function handles this with a familiar formula syntax:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_ols = pf.feols(&amp;quot;gpa ~ treated + post + txp&amp;quot;, data=df, vcov=&amp;quot;HC1&amp;quot;)
fit_ols.summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: gpa, Fixed effects: 0
Inference: HC1
Observations: 70
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| Intercept | 71.215 | 0.218 | 326.123 | 0.000 | 70.779 | 71.651 |
| treated | -11.049 | 0.288 | -38.388 | 0.000 | -11.624 | -10.475 |
| post | 10.886 | 0.339 | 32.116 | 0.000 | 10.209 | 11.563 |
| txp | 25.315 | 0.615 | 41.164 | 0.000 | 24.087 | 26.543 |
---
RMSE: 1.15 R2: 0.989
&lt;/code>&lt;/pre>
&lt;p>Every coefficient maps directly to our group means:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Intercept (71.22)&lt;/strong> — Comparison group pre-period mean&lt;/li>
&lt;li>&lt;strong>treated (−11.05)&lt;/strong> — Treated schools start 11 points &lt;em>below&lt;/em> comparison schools&lt;/li>
&lt;li>&lt;strong>post (10.89)&lt;/strong> — Common time trend (comparison group&amp;rsquo;s improvement)&lt;/li>
&lt;li>&lt;strong>txp (25.32)&lt;/strong> — The DiD estimate, matching our manual calculation&lt;/li>
&lt;/ul>
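&lt;p>The mapping can be verified directly from the fitted coefficients (&lt;code>coef()&lt;/code> returns a Series indexed by coefficient name):&lt;/p>
&lt;pre>&lt;code class="language-python">b = fit_ols.coef()
print(f&amp;quot;Comparison pre:  {b['Intercept']:.2f}&amp;quot;)                                        # 71.22
print(f&amp;quot;Treated pre:     {b['Intercept'] + b['treated']:.2f}&amp;quot;)                         # 60.17
print(f&amp;quot;Comparison post: {b['Intercept'] + b['post']:.2f}&amp;quot;)                            # 82.10
print(f&amp;quot;Treated post:    {b['Intercept'] + b['treated'] + b['post'] + b['txp']:.2f}&amp;quot;)  # 96.37
&lt;/code>&lt;/pre>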
&lt;p>The &lt;code>vcov=&amp;quot;HC1&amp;quot;&lt;/code> option requests heteroskedasticity-robust (White) standard errors, the most common choice for cross-sectional data.&lt;/p>
&lt;h3 id="72-twfe-with-fixed-effects">7.2 TWFE with fixed effects&lt;/h3>
&lt;p>A more flexible approach absorbs school-level and time-level heterogeneity using &lt;strong>two-way fixed effects (TWFE)&lt;/strong>. PyFixest uses the &lt;code>|&lt;/code> pipe syntax to specify absorbed fixed effects:&lt;/p>
&lt;p>$$Y_{it} = \beta_3 (\text{Treat}_i \times \text{Post}_t) + \gamma_i + \vartheta_t + \varepsilon_{it}$$&lt;/p>
&lt;p>Here $\gamma_i$ are school fixed effects (absorbing all time-invariant school characteristics) and $\vartheta_t$ are time fixed effects (absorbing all common time shocks). Since &lt;code>treated&lt;/code> is perfectly collinear with $\gamma_i$ and &lt;code>post&lt;/code> is perfectly collinear with $\vartheta_t$, only the interaction term &lt;code>txp&lt;/code> remains:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_twfe = pf.feols(&amp;quot;gpa ~ txp | id + time&amp;quot;, data=df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;id&amp;quot;})
fit_twfe.summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: gpa, Fixed effects: id+time
Inference: CRV1
Observations: 70
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| txp | 25.315 | 0.585 | 43.265 | 0.000 | 24.126 | 26.504 |
---
RMSE: 0.788 R2: 0.995 R2 Within: 0.981
&lt;/code>&lt;/pre>
&lt;p>The estimate is unchanged: &lt;strong>25.315&lt;/strong>. But the standard errors now use &lt;strong>CRV1 (cluster-robust variance)&lt;/strong> clustered at the school level — the appropriate choice when treatment varies at the school level and observations within the same school are correlated.&lt;/p>
&lt;p>The formula &lt;code>&amp;quot;gpa ~ txp | id + time&amp;quot;&lt;/code> is one of PyFixest&amp;rsquo;s key strengths: everything to the left of &lt;code>|&lt;/code> is estimated, everything to the right is &lt;em>absorbed&lt;/em>. No need to manually create dummy variables.&lt;/p>
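&lt;p>As a sanity check, the absorbed model should match a version with explicit dummies. A sketch, assuming the formula API accepts &lt;code>C()&lt;/code> categorical terms:&lt;/p>
&lt;pre>&lt;code class="language-python"># Same regression with explicit dummy variables instead of absorbed fixed effects
fit_dummies = pf.feols(&amp;quot;gpa ~ txp + C(id) + C(time)&amp;quot;, data=df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;id&amp;quot;})
print(f&amp;quot;txp with explicit dummies: {fit_dummies.coef()['txp']:.3f}&amp;quot;)  # should equal 25.315
&lt;/code>&lt;/pre>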
&lt;h3 id="73-twfe-with-covariate">7.3 TWFE with covariate&lt;/h3>
&lt;p>We can add &lt;code>female_share&lt;/code> as a time-varying covariate to check robustness:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_cov = pf.feols(&amp;quot;gpa ~ txp + female_share | id + time&amp;quot;, data=df,
vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;id&amp;quot;})
fit_cov.summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: gpa, Fixed effects: id+time
Inference: CRV1
Observations: 70
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| txp | 25.328 | 0.605 | 41.881 | 0.000 | 24.099 | 26.557 |
| female_share | -3.216 | 8.700 | -0.370 | 0.714 | -20.898 | 14.465 |
---
RMSE: 0.785 R2: 0.995 R2 Within: 0.982
&lt;/code>&lt;/pre>
&lt;p>Adding &lt;code>female_share&lt;/code> barely changes the DiD estimate (25.315 → 25.328, a shift of just 0.013). The covariate itself is statistically insignificant (p = 0.714), confirming that the two-way fixed effects already capture the relevant variation. This is reassuring — the treatment effect estimate is robust to the inclusion of observable covariates.&lt;/p>
&lt;h3 id="74-programmatic-access-to-results">7.4 Programmatic access to results&lt;/h3>
&lt;p>PyFixest provides tidy methods for extracting specific quantities — useful for post-estimation workflows and building custom tables:&lt;/p>
&lt;pre>&lt;code class="language-python">print(f&amp;quot;Coefficient: {fit_twfe.coef().values[0]:.4f}&amp;quot;)
print(f&amp;quot;Std. Error: {fit_twfe.se().values[0]:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {fit_twfe.tstat().values[0]:.4f}&amp;quot;)
print(f&amp;quot;p-value: {fit_twfe.pvalue().values[0]:.4f}&amp;quot;)
print(f&amp;quot;95% CI: [{fit_twfe.confint().values[0, 0]:.2f}, {fit_twfe.confint().values[0, 1]:.2f}]&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Coefficient: 25.3149
Std. Error: 0.5851
t-statistic: 43.2655
p-value: 0.0000
95% CI: [24.13, 26.50]
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>.tidy()&lt;/code> method returns a full DataFrame of results:&lt;/p>
&lt;pre>&lt;code class="language-python">print(fit_twfe.tidy())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Estimate Std. Error t value Pr(&amp;gt;|t|) 2.5% 97.5%
Coefficient
txp 25.314897 0.585106 43.265472 0.0 24.125818 26.503976
&lt;/code>&lt;/pre>
&lt;h3 id="75-comparison-across-specifications">7.5 Comparison across specifications&lt;/h3>
&lt;p>All three specifications produce essentially the same DiD estimate:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Estimate&lt;/th>
&lt;th>Std. Error&lt;/th>
&lt;th>95% CI&lt;/th>
&lt;th>SE Type&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>OLS / HC1&lt;/td>
&lt;td>25.315&lt;/td>
&lt;td>0.615&lt;/td>
&lt;td>[24.09, 26.54]&lt;/td>
&lt;td>Heteroskedasticity-robust&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>TWFE / CRV1&lt;/td>
&lt;td>25.315&lt;/td>
&lt;td>0.585&lt;/td>
&lt;td>[24.13, 26.50]&lt;/td>
&lt;td>Cluster-robust (school)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>TWFE + Cov / CRV1&lt;/td>
&lt;td>25.328&lt;/td>
&lt;td>0.605&lt;/td>
&lt;td>[24.10, 26.56]&lt;/td>
&lt;td>Cluster-robust (school)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The point estimates range from 25.315 to 25.328 — a difference of just 0.013 GPA points. The design (treatment assignment, fixed effects) does the heavy lifting; the choice of specification has negligible impact on the estimate.&lt;/p>
&lt;h2 id="8-inference-comparison">8. Inference Comparison&lt;/h2>
&lt;p>One of PyFixest&amp;rsquo;s strengths is the ability to quickly compare different inference approaches on the same model. Here we estimate the TWFE model four times, each with a different variance-covariance estimator:&lt;/p>
&lt;pre>&lt;code class="language-python">vcov_types = {
&amp;quot;iid&amp;quot;: &amp;quot;iid&amp;quot;,
&amp;quot;HC1&amp;quot;: &amp;quot;HC1&amp;quot;,
&amp;quot;CRV1&amp;quot;: {&amp;quot;CRV1&amp;quot;: &amp;quot;id&amp;quot;},
&amp;quot;CRV3&amp;quot;: {&amp;quot;CRV3&amp;quot;: &amp;quot;id&amp;quot;},
}
for label, vcov_spec in vcov_types.items():
fit_tmp = pf.feols(&amp;quot;gpa ~ txp | id + time&amp;quot;, data=df, vcov=vcov_spec)
tidy = fit_tmp.tidy()
txp_row = tidy[tidy.index == &amp;quot;txp&amp;quot;].iloc[0]
print(f&amp;quot; {label:5s}: SE = {txp_row['Std. Error']:.4f}, &amp;quot;
f&amp;quot;t = {txp_row['t value']:.2f}, &amp;quot;
f&amp;quot;p = {txp_row['Pr(&amp;gt;|t|)']:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> iid : SE = 0.6071, t = 41.70, p = 0.0000
HC1 : SE = 0.5852, t = 43.26, p = 0.0000
CRV1 : SE = 0.5851, t = 43.27, p = 0.0000
CRV3 : SE = 0.6373, t = 39.72, p = 0.0000
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>SE Type&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>SE&lt;/th>
&lt;th>t-stat&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>iid&lt;/td>
&lt;td>Classical (assumes homoskedasticity)&lt;/td>
&lt;td>0.607&lt;/td>
&lt;td>41.70&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>HC1&lt;/td>
&lt;td>Heteroskedasticity-robust (White)&lt;/td>
&lt;td>0.585&lt;/td>
&lt;td>43.26&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CRV1&lt;/td>
&lt;td>Cluster-robust at school level&lt;/td>
&lt;td>0.585&lt;/td>
&lt;td>43.27&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CRV3&lt;/td>
&lt;td>Bias-corrected cluster-robust (Bell-McCaffrey)&lt;/td>
&lt;td>0.637&lt;/td>
&lt;td>39.72&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;strong>iid&lt;/strong> assumes constant variance — a strong assumption rarely justified in practice&lt;/li>
&lt;li>&lt;strong>HC1&lt;/strong> allows for heteroskedasticity but not within-cluster correlation&lt;/li>
&lt;li>&lt;strong>CRV1&lt;/strong> is the workhorse for panel data: it accounts for arbitrary within-school correlation&lt;/li>
&lt;li>&lt;strong>CRV3&lt;/strong> applies a small-sample bias correction, producing the most conservative SEs&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="did101_se_comparison.png" alt="Standard errors across four inference methods (iid, HC1, CRV1, CRV3). CRV3 produces the largest SE (0.637) while HC1 and CRV1 are nearly identical (0.585). All methods yield overwhelmingly significant results.">&lt;/p>
&lt;p>The key takeaway: &lt;strong>inference choice matters less than research design&lt;/strong>. Standard errors range from 0.585 to 0.637, but all four methods produce p-values that are essentially zero. When the treatment effect is 25 GPA points and the largest SE is 0.64, the t-statistic is still above 39. The signal-to-noise ratio is so strong that the choice of variance estimator is practically irrelevant here.&lt;/p>
&lt;p>In applications with smaller effects or fewer clusters, the choice between CRV1 and CRV3 can make the difference between statistical significance and not. With only 35 clusters, CRV3 is the safer default.&lt;/p>
&lt;h2 id="9-publication-quality-tables-with-etable-and-great-tables">9. Publication-Quality Tables with etable() and Great Tables&lt;/h2>
&lt;h3 id="91-stepwise-specifications-with-csw0">9.1 Stepwise specifications with csw0()&lt;/h3>
&lt;p>PyFixest&amp;rsquo;s &lt;code>csw0()&lt;/code> operator lets you estimate multiple specifications in a single call. The &lt;code>csw0&lt;/code> (&amp;ldquo;cumulative stepwise from zero&amp;rdquo;) starts with a baseline model and progressively adds covariates:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_multi = pf.feols(&amp;quot;gpa ~ txp + csw0(female_share) | id + time&amp;quot;,
data=df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;id&amp;quot;})
&lt;/code>&lt;/pre>
&lt;p>This single line estimates &lt;strong>two models&lt;/strong>: (1) &lt;code>gpa ~ txp | id + time&lt;/code> and (2) &lt;code>gpa ~ txp + female_share | id + time&lt;/code>. In Stata, you would need two separate regression commands; PyFixest handles it in one formula.&lt;/p>
&lt;h3 id="92-etable-output">9.2 etable() output&lt;/h3>
&lt;p>The &lt;code>etable()&lt;/code> method generates a publication-style regression table as a Great Tables object:&lt;/p>
&lt;pre>&lt;code class="language-python">print(fit_multi.etable())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> (1) (2)
txp 25.315*** (0.585) 25.328*** (0.605)
female_share -3.216 (8.700)
---
FE: time x x
FE: id x x
Observations 70 70
S.E. type by: id by: id
R² 0.995 0.995
R² Within 0.981 0.982
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>etable()&lt;/code> output shows significance stars, standard errors in parentheses, fixed effects indicators, and model diagnostics — all formatted for immediate inclusion in a paper.&lt;/p>
&lt;h3 id="93-custom-great-tables-table">9.3 Custom Great Tables table&lt;/h3>
&lt;p>For full control over formatting, we can build a table from the &lt;code>.tidy()&lt;/code> DataFrames:&lt;/p>
&lt;pre>&lt;code class="language-python">rows = []
for name, fit in [(&amp;quot;(1) OLS&amp;quot;, fit_ols),
(&amp;quot;(2) TWFE&amp;quot;, fit_twfe),
(&amp;quot;(3) TWFE + Cov&amp;quot;, fit_cov)]:
tidy = fit.tidy()
txp_row = tidy[tidy.index == &amp;quot;txp&amp;quot;].iloc[0]
rows.append({
&amp;quot;Model&amp;quot;: name,
&amp;quot;Estimate&amp;quot;: txp_row[&amp;quot;Estimate&amp;quot;],
&amp;quot;Std. Error&amp;quot;: txp_row[&amp;quot;Std. Error&amp;quot;],
&amp;quot;t value&amp;quot;: txp_row[&amp;quot;t value&amp;quot;],
&amp;quot;p-value&amp;quot;: txp_row[&amp;quot;Pr(&amp;gt;|t|)&amp;quot;],
&amp;quot;95% CI Lower&amp;quot;: txp_row[&amp;quot;2.5%&amp;quot;],
&amp;quot;95% CI Upper&amp;quot;: txp_row[&amp;quot;97.5%&amp;quot;],
&amp;quot;N&amp;quot;: fit._N,
})
gt_df = pd.DataFrame(rows)
gt_table = (
GT(gt_df)
.tab_header(
title=md(&amp;quot;**Table 2: DiD Estimates Across Specifications**&amp;quot;),
subtitle=&amp;quot;Dependent variable: GPA&amp;quot;
)
.fmt_number(columns=[&amp;quot;Estimate&amp;quot;, &amp;quot;Std. Error&amp;quot;, &amp;quot;t value&amp;quot;,
&amp;quot;95% CI Lower&amp;quot;, &amp;quot;95% CI Upper&amp;quot;], decimals=3)
.fmt_number(columns=[&amp;quot;p-value&amp;quot;], decimals=4)
.fmt_integer(columns=[&amp;quot;N&amp;quot;])
.tab_source_note(
&amp;quot;Notes: (1) OLS with HC1 robust SE. (2) TWFE with CRV1 &amp;quot;
&amp;quot;clustered at school level. (3) TWFE with female_share covariate and CRV1.&amp;quot;
)
)
gt_table.save(&amp;quot;did101_table2.png&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did101_table2.png" alt="Table 2: DiD estimates across three specifications. All models produce an estimate of approximately 25.32–25.33 with highly significant p-values.">&lt;/p>
&lt;p>Great Tables provides fine-grained control over number formatting (&lt;code>.fmt_number()&lt;/code>), column labels (&lt;code>.cols_label()&lt;/code>), headers (&lt;code>.tab_header()&lt;/code>), and styling (&lt;code>.tab_style()&lt;/code>). The &lt;code>.save()&lt;/code> method exports to PNG using a headless browser.&lt;/p>
&lt;h3 id="94-exporting-latex-tables-for-manuscripts">9.4 Exporting LaTeX tables for manuscripts&lt;/h3>
&lt;p>When submitting to academic journals, you need LaTeX-formatted tables rather than HTML or PNG. PyFixest&amp;rsquo;s &lt;code>etable()&lt;/code> can generate publication-ready LaTeX directly by setting &lt;code>type=&amp;quot;tex&amp;quot;&lt;/code>. The output uses &lt;code>booktabs&lt;/code> for clean horizontal rules and &lt;code>threeparttable&lt;/code> for properly aligned footnotes — the standard format expected by most economics and social science journals.&lt;/p>
&lt;pre>&lt;code class="language-python">latex_output = pf.etable(
[fit_ols, fit_twfe, fit_cov],
type=&amp;quot;tex&amp;quot;,
labels={
&amp;quot;txp&amp;quot;: &amp;quot;Treatment $\\times$ Post&amp;quot;,
&amp;quot;treated&amp;quot;: &amp;quot;Treatment&amp;quot;,
&amp;quot;post&amp;quot;: &amp;quot;Post&amp;quot;,
&amp;quot;female_share&amp;quot;: &amp;quot;Female Share&amp;quot;,
&amp;quot;Intercept&amp;quot;: &amp;quot;Constant&amp;quot;,
},
notes=&amp;quot;Standard errors in parentheses. * p&amp;lt;0.05, ** p&amp;lt;0.01, *** p&amp;lt;0.001.&amp;quot;,
)
print(latex_output)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">\begin{threeparttable}
\begin{tabular}{lcccc}
\toprule
&amp;amp; \multicolumn{3}{c}{gpa} \\
\cmidrule(lr){2-4}
&amp;amp; (1) &amp;amp; (2) &amp;amp; (3) \\
\midrule
Treatment &amp;amp; \makecell{-11.049*** \\ (0.288)} &amp;amp; &amp;amp; \\
Post &amp;amp; \makecell{10.886*** \\ (0.339)} &amp;amp; &amp;amp; \\
Treatment:Post &amp;amp; \makecell{25.315*** \\ (0.615)} &amp;amp; \makecell{25.315*** \\ (0.585)} &amp;amp; \makecell{25.328*** \\ (0.605)} \\
Female Share &amp;amp; &amp;amp; &amp;amp; \makecell{-3.216 \\ (8.700)} \\
Constant &amp;amp; \makecell{71.215*** \\ (0.218)} &amp;amp; &amp;amp; \\
\midrule
id &amp;amp; - &amp;amp; x &amp;amp; x \\
time &amp;amp; - &amp;amp; x &amp;amp; x \\
\midrule
Observations &amp;amp; 70 &amp;amp; 70 &amp;amp; 70 \\
S.E. type &amp;amp; hetero &amp;amp; by: id &amp;amp; by: id \\
$R^2$ &amp;amp; 0.989 &amp;amp; 0.995 &amp;amp; 0.995 \\
$R^2$ Within &amp;amp; - &amp;amp; 0.981 &amp;amp; 0.982 \\
\bottomrule
\end{tabular}
\footnotesize Standard errors in parentheses. * p&amp;lt;0.05, ** p&amp;lt;0.01, *** p&amp;lt;0.001.
\end{threeparttable}
&lt;/code>&lt;/pre>
&lt;p>To save the table directly to a &lt;code>.tex&lt;/code> file that you can &lt;code>\input{}&lt;/code> in your manuscript, use the &lt;code>file_name&lt;/code> parameter:&lt;/p>
&lt;pre>&lt;code class="language-python">pf.etable(
[fit_ols, fit_twfe, fit_cov],
type=&amp;quot;tex&amp;quot;,
labels={
&amp;quot;txp&amp;quot;: &amp;quot;Treatment $\\times$ Post&amp;quot;,
&amp;quot;treated&amp;quot;: &amp;quot;Treatment&amp;quot;,
&amp;quot;post&amp;quot;: &amp;quot;Post&amp;quot;,
&amp;quot;female_share&amp;quot;: &amp;quot;Female Share&amp;quot;,
&amp;quot;Intercept&amp;quot;: &amp;quot;Constant&amp;quot;,
},
notes=&amp;quot;Standard errors in parentheses. * p&amp;lt;0.05, ** p&amp;lt;0.01, *** p&amp;lt;0.001.&amp;quot;,
file_name=&amp;quot;did101_table2.tex&amp;quot;,
)
&lt;/code>&lt;/pre>
&lt;p>This saves the file to &lt;code>did101_table2.tex&lt;/code>. In your LaTeX manuscript, include it with:&lt;/p>
&lt;pre>&lt;code class="language-text">\begin{table}[htbp]
\centering
\caption{DiD Estimates Across Specifications}
\label{tab:did-results}
\input{did101_table2.tex}
\end{table}
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>labels&lt;/code> dictionary maps internal variable names to publication-friendly labels (e.g., &lt;code>&amp;quot;txp&amp;quot;&lt;/code> becomes &lt;code>&amp;quot;Treatment $\times$ Post&amp;quot;&lt;/code>). The &lt;code>notes&lt;/code> parameter adds a footnote below the table. Your LaTeX document needs the &lt;code>booktabs&lt;/code>, &lt;code>makecell&lt;/code>, and &lt;code>threeparttable&lt;/code> packages in the preamble:&lt;/p>
&lt;pre>&lt;code class="language-text">\usepackage{booktabs}
\usepackage{makecell}
\usepackage{threeparttable}
&lt;/code>&lt;/pre>
&lt;h2 id="10-coefficient-comparison">10. Coefficient Comparison&lt;/h2>
&lt;p>A coefficient plot provides a visual comparison of the DiD estimate across specifications, including 95% confidence intervals:&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(9, 5))
model_names = [&amp;quot;(1) OLS\nHC1&amp;quot;, &amp;quot;(2) TWFE\nCRV1&amp;quot;, &amp;quot;(3) TWFE+Cov\nCRV1&amp;quot;]
estimates = [fit.tidy().loc[&amp;quot;txp&amp;quot;, &amp;quot;Estimate&amp;quot;] for fit in [fit_ols, fit_twfe, fit_cov]]
ci_lower = [fit.tidy().loc[&amp;quot;txp&amp;quot;, &amp;quot;2.5%&amp;quot;] for fit in [fit_ols, fit_twfe, fit_cov]]
ci_upper = [fit.tidy().loc[&amp;quot;txp&amp;quot;, &amp;quot;97.5%&amp;quot;] for fit in [fit_ols, fit_twfe, fit_cov]]
ax.errorbar(estimates, range(3), xerr=[[e-l for e,l in zip(estimates, ci_lower)],
[u-e for e,u in zip(estimates, ci_upper)]],
fmt=&amp;quot;o&amp;quot;, color=TEAL, markersize=10, capsize=6, elinewidth=2)
ax.set_xlabel(&amp;quot;DiD Estimate (txp coefficient)&amp;quot;)
ax.set_title(&amp;quot;Coefficient Comparison Across Specifications&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did101_coefplot.png" alt="Coefficient plot showing the txp estimate across three specifications. All estimates cluster tightly around 25.32–25.33 with narrow, non-overlapping-with-zero confidence intervals.">&lt;/p>
&lt;p>The near-identical point estimates and overlapping confidence intervals across all three specifications reinforce that the DiD estimate is robust. The point estimates span a range of just 0.013 GPA points (25.315 to 25.328), demonstrating remarkable stability regardless of whether we include school fixed effects, time fixed effects, or time-varying covariates.&lt;/p>
&lt;h2 id="11-event-study-dynamic-treatment-effects">11. Event Study: Dynamic Treatment Effects&lt;/h2>
&lt;h3 id="111-loading-the-event-study-data">11.1 Loading the event study data&lt;/h3>
&lt;p>The 2×2 design tells us &lt;em>whether&lt;/em> the program had an effect, but not &lt;em>when&lt;/em> the effect kicked in or whether it grew or faded over time. The event study dataset extends the analysis to &lt;strong>8 time periods&lt;/strong> (4 pre-treatment and 4 post-treatment):&lt;/p>
&lt;pre>&lt;code class="language-python">url_event = &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/isds/tutoring_didevent.dta&amp;quot;
df_event = pd.read_stata(url_event).astype(float)
print(f&amp;quot;Shape: {df_event.shape}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Shape: (280, 8)
&lt;/code>&lt;/pre>
&lt;p>The new variable &lt;code>timeToTreat&lt;/code> measures &lt;strong>periods relative to treatment onset&lt;/strong> for treated schools: −4 through −1 are pre-treatment periods, 0 through 3 are post-treatment. Untreated schools have &lt;code>NaN&lt;/code> for this variable (they are never treated).&lt;/p>
&lt;pre>&lt;code class="language-python">print(df_event[&amp;quot;timeToTreat&amp;quot;].value_counts().sort_index())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">timeToTreat
-4.0 10
-3.0 10
-2.0 10
-1.0 10
0.0 10
1.0 10
2.0 10
3.0 10
&lt;/code>&lt;/pre>
&lt;p>Each of the 10 treated schools contributes one observation per relative time period, giving 10 observations at each event time.&lt;/p>
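&lt;p>Had the file not shipped with &lt;code>timeToTreat&lt;/code>, it could be reconstructed from the panel. A sketch, assuming (as the figure below shows) that treatment begins in period 5:&lt;/p>
&lt;pre>&lt;code class="language-python"># Hypothetical reconstruction of the event-time variable (onset assumed at period 5)
onset = 5
ttt = np.where(df_event[&amp;quot;treated&amp;quot;] == 1, df_event[&amp;quot;time&amp;quot;] - onset, np.nan)
print(pd.Series(ttt).value_counts().sort_index())  # 10 observations per event time, as above
&lt;/code>&lt;/pre>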
&lt;p>&lt;img src="did101_panelview_event.png" alt="Panel structure for the event study design showing 35 schools across 8 time periods. Treatment begins at period 5, with 10 treated schools switching from light to dark orange while 25 comparison schools remain in steel blue.">&lt;/p>
&lt;h3 id="112-the-event-study-specification">11.2 The event study specification&lt;/h3>
&lt;p>The event study model replaces the single &lt;code>txp&lt;/code> interaction with a full set of &lt;strong>event-time indicators&lt;/strong>, one for each period relative to treatment. We omit one period (the reference period, $t = -1$) to avoid perfect collinearity:&lt;/p>
&lt;p>$$Y_{it} = \sum_{j=-4,\, j \neq -1}^{3} \theta_j \cdot \mathbf{1}[\,t = k_i + j\,] \cdot \text{Treat}_i + \gamma_i + \vartheta_t + \varepsilon_{it}$$&lt;/p>
&lt;p>Here $k_i$ is the period in which school $i$ starts treatment, so the indicators switch on only for treated schools; the intercept is absorbed by the fixed effects.&lt;/p>
&lt;p>Each $\theta_j$ measures the treatment effect at event time $j$:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Pre-treatment coefficients&lt;/strong> ($j &amp;lt; 0$): These should be near zero if parallel trends holds. Significant pre-treatment coefficients would indicate that treated schools were already diverging &lt;em>before&lt;/em> the program — a red flag for the DiD design.&lt;/li>
&lt;li>&lt;strong>Post-treatment coefficients&lt;/strong> ($j \geq 0$): These capture the dynamic treatment effect at each lag after the program starts.&lt;/li>
&lt;/ul>
&lt;h3 id="113-estimation-with-i">11.3 Estimation with i()&lt;/h3>
&lt;p>PyFixest&amp;rsquo;s &lt;code>i()&lt;/code> function creates factor (indicator) variables with a specified reference level — perfect for event study designs:&lt;/p>
&lt;pre>&lt;code class="language-python">df_event[&amp;quot;timeToTreat&amp;quot;] = df_event[&amp;quot;timeToTreat&amp;quot;].fillna(-99)
fit_event = pf.feols(&amp;quot;gpa ~ i(timeToTreat, ref=-1) | id + time&amp;quot;,
                     data=df_event, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;id&amp;quot;})
fit_event.summary()  # summary() prints the table itself and returns None
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: gpa, Fixed effects: id+time
Inference: CRV1
Observations: 280
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| t = -4 | 0.342 | 0.401 | 0.852 | 0.400 | -0.474 | 1.157 |
| t = -3 | -0.322 | 0.441 | -0.730 | 0.471 | -1.219 | 0.575 |
| t = -2 | 0.593 | 0.423 | 1.401 | 0.170 | -0.267 | 1.454 |
| t = 0 | 25.028 | 0.445 | 56.232 | 0.000 | 24.123 | 25.932 |
| t = 1 | 24.705 | 0.559 | 44.174 | 0.000 | 23.569 | 25.842 |
| t = 2 | 24.768 | 0.739 | 33.534 | 0.000 | 23.267 | 26.270 |
| t = 3 | 25.701 | 0.797 | 32.268 | 0.000 | 24.083 | 27.320 |
---
RMSE: 1.134 R2: 0.991 R2 Within: 0.961
&lt;/code>&lt;/pre>
&lt;p>&lt;em>Note: Coefficient names simplified for readability. PyFixest outputs &lt;code>C(timeToTreat, contr.treatment(base=-1))[T.X.0]&lt;/code> notation, where X is the relative time period.&lt;/em>&lt;/p>
&lt;p>The &lt;code>i(timeToTreat, ref=-1)&lt;/code> syntax tells PyFixest to create indicator variables for each unique value of &lt;code>timeToTreat&lt;/code>, using $t = -1$ as the reference period (coefficient normalized to zero). We fill &lt;code>NaN&lt;/code> values with −99 for untreated schools — this creates a dummy that gets absorbed by the school fixed effects, keeping the remaining coefficients interpretable.&lt;/p>
&lt;h3 id="114-event-study-plot">11.4 Event study plot&lt;/h3>
&lt;p>The event study plot is the signature visualization for DiD designs. Pre-treatment coefficients near zero validate the parallel trends assumption, while post-treatment coefficients reveal the dynamic treatment effect:&lt;/p>
&lt;p>&lt;img src="did101_event_study.png" alt="Event study plot showing coefficients at each period relative to treatment. Pre-treatment coefficients (t = -4 to -2) hover near zero with confidence intervals that include zero, supporting parallel trends. Post-treatment coefficients (t = 0 to 3) jump to approximately 25 and remain stable.">&lt;/p>
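&lt;p>A plot like the one above can be rebuilt in a few lines of matplotlib from the same &lt;code>tidy()&lt;/code> columns used for the coefficient plot earlier. This is a minimal sketch, not the exact plotting code behind the figure; the regex that recovers each event time from PyFixest&amp;rsquo;s coefficient labels is an assumption based on the naming convention noted in the table footnote:&lt;/p>
&lt;pre>&lt;code class="language-python">import matplotlib.pyplot as plt

# Estimates and 95% CIs for the event-time coefficients; tidy() is indexed
# by coefficient name with &amp;quot;Estimate&amp;quot;, &amp;quot;2.5%&amp;quot;, and &amp;quot;97.5%&amp;quot; columns.
tab = fit_event.tidy().reset_index()
tab = tab[tab[&amp;quot;Coefficient&amp;quot;].str.contains(&amp;quot;timeToTreat&amp;quot;)]  # keep the i() terms
tab = tab[~tab[&amp;quot;Coefficient&amp;quot;].str.contains(&amp;quot;-99&amp;quot;)]         # drop placeholder, if present
tab[&amp;quot;t&amp;quot;] = (tab[&amp;quot;Coefficient&amp;quot;]
            .str.extract(r&amp;quot;\[T\.(-?\d+)&amp;quot;, expand=False)       # assumed label format
            .astype(float))

fig, ax = plt.subplots(figsize=(8, 5))
ax.errorbar(tab[&amp;quot;t&amp;quot;], tab[&amp;quot;Estimate&amp;quot;],
            yerr=[tab[&amp;quot;Estimate&amp;quot;] - tab[&amp;quot;2.5%&amp;quot;],
                  tab[&amp;quot;97.5%&amp;quot;] - tab[&amp;quot;Estimate&amp;quot;]],
            fmt=&amp;quot;o&amp;quot;, capsize=4)
ax.axhline(0, linestyle=&amp;quot;--&amp;quot;, linewidth=1)    # reference period normalized to zero
ax.axvline(-0.5, linestyle=&amp;quot;:&amp;quot;, linewidth=1)  # treatment onset between t = -1 and t = 0
ax.set_xlabel(&amp;quot;Periods relative to treatment&amp;quot;)
ax.set_ylabel(&amp;quot;Effect on GPA&amp;quot;)
&lt;/code>&lt;/pre>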
&lt;p>The plot shows a textbook event study pattern:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Pre-treatment (t = −4 to −2):&lt;/strong> Coefficients are small (0.34, −0.32, 0.59) and statistically insignificant (all p ≥ 0.17). The confidence intervals comfortably include zero. This is strong evidence supporting the &lt;strong>parallel trends assumption&lt;/strong> — treated and comparison schools were following similar trajectories before the program.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Post-treatment (t = 0 to 3):&lt;/strong> An immediate, sharp jump to ≈25 points at the moment of treatment, with the effect remaining remarkably stable across all four post-treatment periods (24.71 to 25.70). There is &lt;strong>no evidence of fade-out&lt;/strong> or dynamic adjustment — the program&amp;rsquo;s effect is both immediate and sustained.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="115-event-study-coefficients-table">11.5 Event study coefficients table&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Period&lt;/th>
&lt;th>Estimate&lt;/th>
&lt;th>95% CI&lt;/th>
&lt;th>Significant?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>t = −4&lt;/td>
&lt;td>0.342&lt;/td>
&lt;td>[−0.47, 1.16]&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>t = −3&lt;/td>
&lt;td>−0.322&lt;/td>
&lt;td>[−1.22, 0.57]&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>t = −2&lt;/td>
&lt;td>0.593&lt;/td>
&lt;td>[−0.27, 1.45]&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>t = −1&lt;/td>
&lt;td>0.000&lt;/td>
&lt;td>(reference)&lt;/td>
&lt;td>—&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>t = 0&lt;/td>
&lt;td>25.028&lt;/td>
&lt;td>[24.12, 25.93]&lt;/td>
&lt;td>Yes (p &amp;lt; 0.001)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>t = 1&lt;/td>
&lt;td>24.705&lt;/td>
&lt;td>[23.57, 25.84]&lt;/td>
&lt;td>Yes (p &amp;lt; 0.001)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>t = 2&lt;/td>
&lt;td>24.768&lt;/td>
&lt;td>[23.27, 26.27]&lt;/td>
&lt;td>Yes (p &amp;lt; 0.001)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>t = 3&lt;/td>
&lt;td>25.701&lt;/td>
&lt;td>[24.08, 27.32]&lt;/td>
&lt;td>Yes (p &amp;lt; 0.001)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;img src="did101_event_table.png" alt="Table 4: Event study coefficients with estimates, 95% confidence intervals, and significance indicators for each period relative to treatment.">&lt;/p>
&lt;p>The event study confirms three critical findings: (1) &lt;strong>no pre-trends&lt;/strong> — the design is credible, (2) &lt;strong>immediate effect&lt;/strong> — the program works from day one, and (3) &lt;strong>sustained impact&lt;/strong> — no fade-out over four post-treatment periods.&lt;/p>
&lt;h2 id="12-discussion">12. Discussion&lt;/h2>
&lt;h3 id="121-four-key-findings">12.1 Four key findings&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>The naive before-after comparison overstates the effect by 43%.&lt;/strong> The raw change in treated schools is 36.20 GPA points, but 10.88 of these points reflect a common upward trend shared by all schools. DiD correctly attributes only 25.32 points to the program.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The event study confirms no differential pre-trends.&lt;/strong> All three pre-treatment coefficients (0.34, −0.32, 0.59) are small, close to zero, and statistically insignificant. The parallel trends assumption is well-supported by the data.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The effect is immediate and sustained.&lt;/strong> Post-treatment coefficients range from 24.71 to 25.70, showing no evidence of delayed onset, gradual ramp-up, or fade-out. The tutoring program&amp;rsquo;s impact is both immediate and remarkably stable.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Inference choice matters less than design.&lt;/strong> Standard errors ranged from 0.585 (CRV1) to 0.637 (CRV3) across four inference methods, but all produced t-statistics above 39. When the research design is clean and the signal is strong, the choice of variance estimator is practically irrelevant.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="122-caveats">12.2 Caveats&lt;/h3>
&lt;p>This tutorial uses simulated data designed to illustrate DiD mechanics cleanly. In real applications, you should expect:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>R-squared below 0.99:&lt;/strong> The R² of 0.995 in our TWFE models is unrealistically high. Real education data has far more noise.&lt;/li>
&lt;li>&lt;strong>Smaller treatment effects:&lt;/strong> A 25-point GPA increase is enormous. Real programs typically produce single-digit effects.&lt;/li>
&lt;li>&lt;strong>Imperfect parallel trends:&lt;/strong> Pre-treatment coefficients may not be exactly zero, requiring judgment about how much deviation is acceptable.&lt;/li>
&lt;li>&lt;strong>Staggered treatment timing:&lt;/strong> When different units receive treatment at different times, the standard TWFE estimator can be biased. Modern DiD estimators (Callaway &amp;amp; Sant&amp;rsquo;Anna, 2021; Gardner, 2022) address this; a minimal sketch follows this list.&lt;/li>
&lt;/ul>
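&lt;p>As a preview of the staggered case (see also Exercise 3 below), here is a minimal sketch using PyFixest&amp;rsquo;s &lt;code>did2s()&lt;/code> implementation of the Gardner (2022) estimator. The variable names follow this tutorial&amp;rsquo;s conventions, but &lt;code>df_staggered&lt;/code> is a hypothetical panel in which the 10 treated schools adopt the program at different times:&lt;/p>
&lt;pre>&lt;code class="language-python">import pyfixest as pf

# Two-stage DiD: stage 1 estimates unit and time fixed effects on untreated
# observations only; stage 2 regresses the residualized outcome on event-time
# indicators. df_staggered is a hypothetical staggered panel (assumption).
fit_staggered = pf.did2s(
    df_staggered,
    yname=&amp;quot;gpa&amp;quot;,
    first_stage=&amp;quot;~ 0 | id + time&amp;quot;,
    second_stage=&amp;quot;~ i(timeToTreat, ref=-1)&amp;quot;,
    treatment=&amp;quot;txp&amp;quot;,  # 0/1 indicator for treated school-periods
    cluster=&amp;quot;id&amp;quot;,
)
fit_staggered.summary()
&lt;/code>&lt;/pre>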
&lt;h2 id="13-summary-and-takeaways">13. Summary and Takeaways&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>DiD removes common time trends.&lt;/strong> The naive approach overstated the effect by 10.88 GPA points — exactly the secular trend captured by the comparison group.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Multiple approaches produce one answer.&lt;/strong> Three specifications (OLS, TWFE, TWFE with covariate) all yield a DiD estimate of 25.32–25.33, demonstrating the robustness of the design.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Event studies test parallel trends.&lt;/strong> Pre-treatment coefficients (0.34, −0.32, 0.59) are close to zero and insignificant, providing strong evidence that treated and comparison schools were on similar trajectories before the program.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The effect is immediate and sustained.&lt;/strong> Post-treatment coefficients (24.71–25.70) show no evidence of delayed onset or fade-out.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Inference flexibility is a PyFixest strength.&lt;/strong> Switching between iid, HC1, CRV1, and CRV3 requires only changing the &lt;code>vcov&lt;/code> argument — no need for separate packages or commands.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>etable() and Great Tables replace manual table construction.&lt;/strong> The &lt;code>csw0()&lt;/code> operator estimates multiple specifications in one call, and &lt;code>etable()&lt;/code> produces publication-ready output. For custom formatting, Great Tables provides full control via &lt;code>.tab_header()&lt;/code>, &lt;code>.fmt_number()&lt;/code>, &lt;code>.tab_style()&lt;/code>, and &lt;code>.save()&lt;/code>. A short example follows after this list.&lt;/p>
&lt;/li>
&lt;/ol>
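&lt;p>As a quick illustration of that workflow (a sketch, assuming the 2×2 dataframe from the earlier sections is still loaded as &lt;code>df&lt;/code>):&lt;/p>
&lt;pre>&lt;code class="language-python">import pyfixest as pf

# csw0() fits the model cumulatively: first without female_share, then with it.
# Swapping the vcov argument to &amp;quot;iid&amp;quot;, &amp;quot;HC1&amp;quot;, or {&amp;quot;CRV3&amp;quot;: &amp;quot;id&amp;quot;} changes inference only.
fits = pf.feols(&amp;quot;gpa ~ txp + csw0(female_share) | id + time&amp;quot;,
                data=df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;id&amp;quot;})
pf.etable(fits.to_list())
&lt;/code>&lt;/pre>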
&lt;h2 id="14-exercises">14. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Robustness check:&lt;/strong> Load the event study dataset and collapse it to a 2×2 design by averaging GPA across all pre-treatment periods and all post-treatment periods for each school. Re-estimate the DiD. Does the estimate match the 2×2 result?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Inference sensitivity:&lt;/strong> Estimate the TWFE model with all four SE types (iid, HC1, CRV1, CRV3). At what significance level (α) would the choice of SE type change your conclusion about statistical significance? How few clusters would you need before the choice starts to matter?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Staggered DiD:&lt;/strong> Read about PyFixest&amp;rsquo;s &lt;code>did2s()&lt;/code> function for the Gardner (2022) two-stage DiD estimator. How would you adapt this tutorial&amp;rsquo;s code to handle a setting where the 10 treated schools received the program at different times?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>Corral, D. &amp;amp; Yang, M. (2024). An introduction to the difference-in-differences design in education policy research. &lt;em>Asia Pacific Education Review&lt;/em>.&lt;/li>
&lt;li>Callaway, B. &amp;amp; Sant&amp;rsquo;Anna, P.H. (2021). Difference-in-differences with multiple time periods. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 200–230.&lt;/li>
&lt;li>Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 254–277.&lt;/li>
&lt;li>Sun, L. &amp;amp; Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 175–199.&lt;/li>
&lt;li>Borusyak, K., Jaravel, X. &amp;amp; Spiess, J. (2024). Revisiting event-study designs: robust and efficient estimation. &lt;em>Review of Economic Studies&lt;/em>, 91(6), 3253–3285.&lt;/li>
&lt;li>Baker, A.C., Larcker, D.F. &amp;amp; Wang, C.C.Y. (2022). How much should we trust staggered difference-in-differences estimates? &lt;em>Journal of Financial Economics&lt;/em>, 144(2), 370–395.&lt;/li>
&lt;li>Correia, S. (2016). REGHDFE: Stata module to perform linear or instrumental-variable regression absorbing any number of high-dimensional fixed effects.&lt;/li>
&lt;li>Fischer, A. &amp;amp; Schar, A. (2024). PyFixest: Fast high-dimensional fixed effects estimation in Python.&lt;/li>
&lt;li>Great Tables: Presentation-ready display tables. Posit PBC.&lt;/li>
&lt;li>Gardner, J. (2022). Two-stage differences in differences. Working paper.&lt;/li>
&lt;/ol></description></item><item><title>Introduction to Difference-in-Differences (DiD) in Stata</title><link>https://carlos-mendez.org/post/stata_did/</link><pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_did/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>How can we evaluate whether a government program actually works when a randomized controlled trial (RCT) is not feasible? Education researchers frequently face this challenge: a new policy is rolled out in some schools but not others, and we need to know whether it made a difference. &lt;strong>Difference-in-Differences (DiD)&lt;/strong> is one of the most widely used quasi-experimental designs for answering this kind of causal question.&lt;/p>
&lt;p>In this tutorial, we introduce the DiD method through a case study based on Corral and Yang (2024). A fictitious government implements an after-school tutoring program in 10 of 35 high schools to improve the GPA of low-income students. We compare these treated schools against 25 comparison schools that did not receive the program. Our goal is to estimate the &lt;strong>Average Treatment Effect on the Treated (ATT)&lt;/strong> &amp;mdash; by how many GPA points did the program improve academic performance?&lt;/p>
&lt;p>We progress from a naive before-after comparison (which overstates the effect) to the full DiD regression framework, demonstrate five equivalent estimation approaches in Stata, and extend the analysis with an event study design that tests whether the parallel trends assumption holds. By the end, we find that the tutoring program increased GPA by approximately &lt;strong>25.32 points&lt;/strong> on a 0-100 scale &amp;mdash; a large and statistically significant effect.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;ul>
&lt;li>Understand why naive before-after comparisons overstate treatment effects&lt;/li>
&lt;li>Implement the 2x2 DiD design manually and via regression&lt;/li>
&lt;li>Estimate the DiD using five equivalent Stata commands (&lt;code>diff&lt;/code>, &lt;code>reg&lt;/code>, &lt;code>didregress&lt;/code>, &lt;code>xtreg&lt;/code>, &lt;code>reghdfe&lt;/code>)&lt;/li>
&lt;li>Assess the parallel trends assumption using an event study design&lt;/li>
&lt;li>Interpret event study coefficients as evidence for or against parallel pre-trends&lt;/li>
&lt;/ul>
&lt;h3 id="study-design">Study design&lt;/h3>
&lt;p>The following diagram summarizes the case study setup and the analytical approach we follow throughout this tutorial.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
subgraph &amp;quot;Case Study Setting&amp;quot;
A[&amp;quot;&amp;lt;b&amp;gt;35 High Schools&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;in One Region&amp;quot;]
B[&amp;quot;&amp;lt;b&amp;gt;10 Treated Schools&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(tutoring program)&amp;quot;]
C[&amp;quot;&amp;lt;b&amp;gt;25 Comparison Schools&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(no program)&amp;quot;]
A --&amp;gt; B
A --&amp;gt; C
end
subgraph &amp;quot;DiD Design&amp;quot;
D[&amp;quot;&amp;lt;b&amp;gt;Pre-Program&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;GPA at baseline&amp;quot;]
E[&amp;quot;&amp;lt;b&amp;gt;Post-Program&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;GPA after intervention&amp;quot;]
F[&amp;quot;&amp;lt;b&amp;gt;DiD Estimate&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ATT = 25.32&amp;quot;]
D --&amp;gt; E --&amp;gt; F
end
subgraph &amp;quot;Estimation Methods&amp;quot;
G[&amp;quot;&amp;lt;b&amp;gt;Manual 2x2&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Subtraction&amp;quot;]
H[&amp;quot;&amp;lt;b&amp;gt;TWFE Regression&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;5 approaches&amp;quot;]
I[&amp;quot;&amp;lt;b&amp;gt;Event Study&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Dynamic effects&amp;quot;]
G --&amp;gt; H --&amp;gt; I
end
C --&amp;gt; D
F --&amp;gt; G
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style F fill:#00d4c8,stroke:#141413,color:#fff
style I fill:#00d4c8,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The study uses panel data: the same 35 schools are observed at two time points (pre- and post-program), giving us 70 school-period observations. For the event study extension, we use an expanded dataset with 8 time periods (280 observations), allowing us to test for parallel pre-trends and examine dynamic treatment effects.&lt;/p>
&lt;hr>
&lt;h2 id="setup-and-packages">Setup and packages&lt;/h2>
&lt;p>Before running the analysis, we install the required Stata packages. The &lt;code>capture&lt;/code> prefix ensures the script does not fail if a package is already installed.&lt;/p>
&lt;pre>&lt;code class="language-stata">capture ssc install diff_plot, replace
capture ssc install diff, replace
capture net install ftools, from(&amp;quot;https://raw.githubusercontent.com/sergiocorreia/ftools/master/src/&amp;quot;) replace
capture ftools, compile
capture net install reghdfe, from(&amp;quot;https://raw.githubusercontent.com/sergiocorreia/reghdfe/master/src/&amp;quot;) replace
capture ssc install panelview, replace
capture ssc install eventdd, replace
capture ssc install matsort, replace
capture ssc install outreg2, replace
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Package&lt;/th>
&lt;th>Purpose&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>diff&lt;/code>, &lt;code>diff_plot&lt;/code>&lt;/td>
&lt;td>Simple DiD estimation and visualization&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ftools&lt;/code>, &lt;code>reghdfe&lt;/code>&lt;/td>
&lt;td>High-dimensional fixed effects regression&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>panelview&lt;/code>&lt;/td>
&lt;td>Treatment timing visualization&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>eventdd&lt;/code>&lt;/td>
&lt;td>Event study estimation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>outreg2&lt;/code>&lt;/td>
&lt;td>Formatted regression tables&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="data-loading-and-exploration">Data loading and exploration&lt;/h2>
&lt;p>We load the 2x2 DiD dataset directly from GitHub. This simulated dataset contains school-level panel data with GPA outcomes for low-income students.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/isds/tutoring_did.dta&amp;quot;, clear
describe
summarize
xtset id time
xtsum
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Observations: 70
Variables: 7
Variable | Obs Mean Std. dev. Min Max
-------------+-------------------------------------------------
id | 70 18 10.17 1 35
time | 70 1.5 0.50 1 2
treated | 70 0.286 0.46 0 1
gpa | 70 77.12 10.88 59.39 99.15
female_share | 70 0.528 0.03 0.47 0.57
Panel variable: id (strongly balanced)
Time variable: time, 1 to 2
&lt;/code>&lt;/pre>
&lt;p>The dataset covers 35 schools observed at two time points (70 total observations). Ten schools (28.6%) are in the treated group and received the after-school tutoring program, while 25 schools serve as the comparison group. The panel is strongly balanced, meaning every school is observed in both periods with no missing data. GPA ranges from 59.4 to 99.2 on a 0-100 scale, with substantial variation (SD = 10.88). The &lt;code>xtsum&lt;/code> output reveals that most GPA variation is within-school over time (within SD = 10.82) rather than between schools (between SD = 1.12), suggesting that a large treatment effect drives the time-series variation.&lt;/p>
&lt;h3 id="treatment-visualization">Treatment visualization&lt;/h3>
&lt;p>The &lt;code>panelview&lt;/code> command provides a visual overview of the treatment timing. Each row is a school, and the shading indicates treatment status across time periods.&lt;/p>
&lt;pre>&lt;code class="language-stata">panelview gpa txp, i(id) t(time) type(treat) ///
prepost bytiming ///
xtitle(&amp;quot;Time Period&amp;quot;) ytitle(&amp;quot;School ID&amp;quot;) ///
legend(position(6))
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_did_panelview_2x2.png" alt="Treatment timing for the 2x2 DiD dataset">&lt;/p>
&lt;p>The heatmap confirms a clean treatment design: all 10 treated schools (IDs 26-35) switch from pre-treatment (teal) to post-treatment (dark blue) simultaneously at time 2, while the 25 comparison schools (IDs 1-25) remain untreated throughout. There is no staggering &amp;mdash; every treated school receives the program at the same time. This is the ideal setup for the standard 2x2 DiD design.&lt;/p>
&lt;hr>
&lt;h2 id="the-problem-with-naive-comparisons">The problem with naive comparisons&lt;/h2>
&lt;p>Before introducing the DiD method, let us see what happens if we simply compare the treated group&amp;rsquo;s GPA before and after the program. This approach is called an &lt;strong>Interrupted Time Series (ITS)&lt;/strong> &amp;mdash; it tracks a single group over time and attributes any change to the intervention.&lt;/p>
&lt;pre>&lt;code class="language-stata">preserve
collapse (mean) gpa, by(time treated)
twoway (connected gpa time if treated==1, ///
msymbol(O) mcolor(gs1) lcolor(gs1) ///
ylab(0(10)100) xlab(1(1)2)), ///
ytitle(&amp;quot;GPA&amp;quot;) xtitle(&amp;quot;Time&amp;quot;) ///
xline(1.5, lcolor(red) lpattern(dash))
graph export &amp;quot;stata_did_its.png&amp;quot;, replace width(2400)
restore
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_did_its.png" alt="Figure 1: Interrupted Time Series showing treated group only">&lt;/p>
&lt;p>The treated group&amp;rsquo;s average GPA jumped from 60.17 (pre-program) to 96.37 (post-program), a raw increase of 36.20 GPA points. At first glance, this looks like a spectacular program effect. However, this naive comparison is misleading because it ignores &lt;strong>secular time trends&lt;/strong> &amp;mdash; students' GPA may naturally improve over time due to maturation, grade inflation, or other factors unrelated to the tutoring program. Without a comparison group, we cannot distinguish the program&amp;rsquo;s causal effect from these natural trends. This is precisely where the DiD design helps.&lt;/p>
&lt;hr>
&lt;h2 id="the-did-design-using-a-comparison-group">The DiD design: using a comparison group&lt;/h2>
&lt;p>The key insight of DiD is to use the comparison group&amp;rsquo;s change over time as a proxy for what &lt;em>would have happened&lt;/em> to the treated group in the absence of the program. This unobserved scenario is called the &lt;strong>counterfactual&lt;/strong>.&lt;/p>
&lt;h3 id="the-counterfactual-and-parallel-trends">The counterfactual and parallel trends&lt;/h3>
&lt;pre>&lt;code class="language-stata">preserve
collapse (mean) gpa, by(time treated)
* Add counterfactual observations
* Counterfactual = treated_pre + control_change
insobs 2
replace time = 1 in 5
replace time = 2 in 6
replace treated = 2 in 5
replace treated = 2 in 6
replace gpa = 60.17 in 5
replace gpa = 71.05 in 6
twoway (connected gpa time if treated==1, msymbol(O) mcolor(gs1) lcolor(gs1)) ///
(connected gpa time if treated==0, msymbol(+) mcolor(gs5) lcolor(gs5)) ///
(connected gpa time if treated==2, msymbol(O) mcolor(gs1) lcolor(gs1) lpattern(shortdash_dot)), ///
ylab(0(10)100) xlab(1(1)2) ///
legend(order(1 &amp;quot;Treated&amp;quot; 2 &amp;quot;Comparison&amp;quot; 3 &amp;quot;Counterfactual&amp;quot;)) ///
ytitle(&amp;quot;GPA&amp;quot;) xtitle(&amp;quot;Time&amp;quot;)
graph export &amp;quot;stata_did_counterfactual.png&amp;quot;, replace width(2400)
restore
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_did_counterfactual.png" alt="Figure 2: DiD design with counterfactual trend">&lt;/p>
&lt;p>Figure 2 shows three lines: the actual treated group (solid, rising sharply from 60.17 to 96.37), the comparison group (rising gently from 71.22 to 82.10), and the &lt;strong>counterfactual&lt;/strong> (dashed line, showing where the treated group would have ended up without the program, at approximately 71.05). The gap between the actual treated outcome (96.37) and the counterfactual (71.05) is the DiD estimate of approximately 25.32 GPA points. The counterfactual is constructed by assuming the treated group would have experienced the same time trend as the comparison group &amp;mdash; this is the &lt;strong>parallel trends assumption&lt;/strong>, the fundamental assumption underlying DiD.&lt;/p>
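&lt;p>Numerically, the counterfactual endpoint is the treated group&amp;rsquo;s baseline plus the comparison group&amp;rsquo;s change:&lt;/p>
&lt;p>$$Y^{CF}_{\text{treated},\, \text{post}} = 60.17 + (82.10 - 71.22) = 60.17 + 10.88 = 71.05$$&lt;/p>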
&lt;h3 id="the-parallel-trends-assumption">The parallel trends assumption&lt;/h3>
&lt;p>The parallel trends assumption states that in the absence of treatment, the difference between the treated and comparison groups would have remained constant over time. Formally:&lt;/p>
&lt;p>$$E[Y_{i,1}(0) - Y_{i,0}(0) \mid D=1] = E[Y_{i,1}(0) - Y_{i,0}(0) \mid D=0]$$&lt;/p>
&lt;p>In words, this says that the expected change in the untreated potential outcome over time is the same for both groups. Here, $Y_{i,t}(0)$ is the potential outcome for school $i$ at time $t$ without treatment, and $D$ is the treatment indicator. If this assumption holds, then the comparison group&amp;rsquo;s observed change serves as a valid estimate of what the treated group&amp;rsquo;s change would have been without the program. We cannot test this assumption directly (because we never observe the treated group&amp;rsquo;s outcome without treatment), but we can check whether the two groups followed &lt;strong>parallel pre-trends&lt;/strong> before the intervention &amp;mdash; a topic we address in the event study section.&lt;/p>
&lt;h3 id="the-sutva-assumption">The SUTVA assumption&lt;/h3>
&lt;p>A second assumption, the &lt;strong>Stable Unit Treatment Value Assumption (SUTVA)&lt;/strong>, requires two conditions: (1) one school&amp;rsquo;s treatment does not affect another school&amp;rsquo;s outcome (no spillovers &amp;mdash; for example, students do not transfer between treated and untreated schools in response to the program), and (2) the treatment is applied consistently across all treated schools (no hidden variations in the tutoring program). SUTVA matters because if students transfer to treated schools or if the program varies in quality, our estimate could be biased.&lt;/p>
&lt;hr>
&lt;h2 id="manual-did-calculation">Manual DiD calculation&lt;/h2>
&lt;p>The 2x2 DiD estimate is computed by subtracting the comparison group&amp;rsquo;s change from the treated group&amp;rsquo;s change. This &amp;ldquo;double difference&amp;rdquo; removes both baseline differences between groups and common time trends.&lt;/p>
&lt;h3 id="did-means-table-table-1">DiD means table (Table 1)&lt;/h3>
&lt;pre>&lt;code class="language-stata">table treated post, stat(mean gpa) nformat(%12.2f)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> | Pre Post Diff
--------------------------+----------------------------
Control (25 schools) | 71.22 82.10 10.88
Treated (10 schools) | 60.17 96.37 36.20
--------------------------+----------------------------
DiD estimate | 25.32
&lt;/code>&lt;/pre>
&lt;p>Formally, the DiD estimator takes the following form:&lt;/p>
&lt;p>$$DiD = \Big(E[Y_{i,1} \mid D=1] - E[Y_{i,0} \mid D=1]\Big) - \Big(E[Y_{i,1} \mid D=0] - E[Y_{i,0} \mid D=0]\Big)$$&lt;/p>
&lt;p>In words, this says: take the treated group&amp;rsquo;s change over time (36.20) and subtract the comparison group&amp;rsquo;s change over time (10.88). The result (25.32) is the causal effect of the program, after removing the natural time trend. Think of it like measuring two runners' speed improvements between races: if both were expected to improve equally due to training, any &lt;em>extra&lt;/em> improvement by the runner who received coaching can be attributed to the coaching itself. The comparison group&amp;rsquo;s 10.88-point improvement represents the natural &amp;ldquo;training effect,&amp;rdquo; and the remaining 25.32 points represent the &amp;ldquo;coaching effect&amp;rdquo; &amp;mdash; the tutoring program.&lt;/p>
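&lt;p>Plugging the cell means from Table 1 into this formula reproduces the estimate directly:&lt;/p>
&lt;p>$$\widehat{DiD} = (96.37 - 60.17) - (82.10 - 71.22) = 36.20 - 10.88 = 25.32$$&lt;/p>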
&lt;h3 id="did-visualization">DiD visualization&lt;/h3>
&lt;p>The &lt;code>diff_plot&lt;/code> command produces a visual summary of the DiD, showing both groups' trajectories and the parallel trend line.&lt;/p>
&lt;pre>&lt;code class="language-stata">diff_plot gpa, group(treated) time(post)
graph export &amp;quot;stata_did_diff_plot.png&amp;quot;, replace width(2400)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_did_diff_plot.png" alt="DiD plot showing both groups with labeled values">&lt;/p>
&lt;p>The plot labels each group&amp;rsquo;s mean GPA at both time points (60.17, 71.22, 96.37, 82.10) and displays the intervention effect of 25.31 GPA points. The dashed green line extending from the treated group&amp;rsquo;s pre-period mean shows the counterfactual trajectory under the parallel trends assumption. The vertical gap between the actual treated outcome and this counterfactual is the DiD estimate.&lt;/p>
&lt;h3 id="formal-did-table">Formal DiD table&lt;/h3>
&lt;p>The &lt;code>diff&lt;/code> command provides a formal DiD estimation with standard errors and significance tests.&lt;/p>
&lt;pre>&lt;code class="language-stata">diff gpa, treated(treated) period(post)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 70
Outcome var. | gpa | S. Err. | |t| | P&amp;gt;|t|
----------------+---------+---------+---------+---------
Before
Diff (T-C) | -11.049 | 0.443 | -24.94 | 0.000***
After
Diff (T-C) | 14.266 | 0.443 | 32.20 | 0.000***
Diff-in-Diff | 25.315 | 0.627 | 40.40 | 0.000***
R-square: 0.99
&lt;/code>&lt;/pre>
&lt;p>The DiD estimate of 25.315 (SE = 0.627, t = 40.40, p &amp;lt; 0.001) is highly statistically significant and precisely estimated. Before the program, treated schools had GPAs 11.05 points &lt;em>lower&lt;/em> than comparison schools (p &amp;lt; 0.001). After the program, treated schools had GPAs 14.27 points &lt;em>higher&lt;/em> than comparison schools (p &amp;lt; 0.001). This reversal from a significant deficit to a significant advantage is one of the most compelling patterns in the data, and it is entirely attributable to the tutoring program under the DiD assumptions.&lt;/p>
&lt;hr>
&lt;h2 id="did-via-regression">DiD via regression&lt;/h2>
&lt;p>While the manual subtraction approach is intuitive, researchers typically prefer &lt;strong>regression-based methods&lt;/strong> because they allow for the inclusion of control variables, flexible standard error estimation, and extension to more complex designs. We demonstrate five equivalent approaches that all converge on the same DiD estimate.&lt;/p>
&lt;h3 id="classical-did-regression">Classical DiD regression&lt;/h3>
&lt;p>The simplest regression formulation explicitly includes the treatment indicator, the time indicator, and their interaction:&lt;/p>
&lt;p>$$Y_{it} = \alpha + \beta_1 \text{Treat}_i + \beta_2 \text{Post}_t + \beta_3 (\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, this says: the outcome for school $i$ at time $t$ is a function of group membership ($\beta_1$), time period ($\beta_2$), and their interaction ($\beta_3$). The coefficient $\beta_3$ is the DiD estimate &amp;mdash; the additional change in the treated group beyond what the comparison group experienced. Here, $\alpha$ is the comparison group&amp;rsquo;s pre-period mean, $\beta_1$ captures the baseline group difference, $\beta_2$ captures the common time trend, and $\varepsilon_{it}$ is the error term.&lt;/p>
&lt;pre>&lt;code class="language-stata">reg gpa treated post txp, robust
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> gpa | Coefficient std. err. t P&amp;gt;|t| [95% conf. interval]
-------------+----------------------------------------------------------------
treated | -11.04936 .2878309 -38.39 0.000 -11.62404 -10.47469
post | 10.88589 .3389564 32.12 0.000 10.20915 11.56264
txp | 25.3149 .6149733 41.16 0.000 24.08706 26.54273
_cons | 71.21514 .2183689 326.12 0.000 70.77915 71.65113
&lt;/code>&lt;/pre>
&lt;p>The regression decomposes the DiD into its building blocks. The constant (71.22) is the comparison group&amp;rsquo;s pre-period mean GPA. The &lt;code>treated&lt;/code> coefficient (-11.05) tells us treated schools started with 11 fewer GPA points than comparison schools at baseline. The &lt;code>post&lt;/code> coefficient (10.89) captures the natural time trend shared by both groups. The interaction &lt;code>txp&lt;/code> (25.31, SE = 0.61, 95% CI: [24.09, 26.54]) is the DiD estimate, confirming the manual calculation. The tight 95% confidence interval (width of 2.46 points) indicates precise estimation.&lt;/p>
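&lt;p>Because the 2x2 specification is saturated, each coefficient is an exact function of the four cell means from Table 1 (up to rounding):&lt;/p>
&lt;p>$$\hat{\alpha} = \bar{Y}_{C,\text{pre}} = 71.22, \qquad \hat{\beta}_1 = \bar{Y}_{T,\text{pre}} - \bar{Y}_{C,\text{pre}} = -11.05$$&lt;/p>
&lt;p>$$\hat{\beta}_2 = \bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}} = 10.89, \qquad \hat{\beta}_3 = \widehat{DiD} = 25.31$$&lt;/p>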
&lt;h3 id="stata-built-in-did">Stata built-in DiD&lt;/h3>
&lt;p>Stata 17 introduced the &lt;code>didregress&lt;/code> command, which estimates the DiD directly and labels the output as ATET (Average Treatment Effect on the Treated).&lt;/p>
&lt;pre>&lt;code class="language-stata">didregress (gpa) (txp), group(id) time(time)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">ATET
txp (1 vs 0) | 25.3149 .8337103 30.36 0.000 23.62059 27.0092
&lt;/code>&lt;/pre>
&lt;p>The point estimate (25.31) is identical, but the standard error is larger (0.83 vs. 0.61) because &lt;code>didregress&lt;/code> automatically clusters standard errors at the school level, accounting for within-school correlation of errors. The 95% CI [23.62, 27.01] is wider but still excludes zero by a large margin.&lt;/p>
&lt;h3 id="two-way-fixed-effects-twfe">Two-Way Fixed Effects (TWFE)&lt;/h3>
&lt;p>The TWFE model replaces the explicit &lt;code>Treat&lt;/code> and &lt;code>Post&lt;/code> indicators with &lt;strong>unit fixed effects&lt;/strong> ($\gamma_i$) and &lt;strong>time fixed effects&lt;/strong> ($\vartheta_t$):&lt;/p>
&lt;p>$$Y_{it} = \beta_3 (\text{Treat}_i \times \text{Post}_t) + \gamma_i + \vartheta_t + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, this says: after removing all time-invariant school characteristics (captured by $\gamma_i$) and all common time shocks (captured by $\vartheta_t$), the remaining variation in GPA attributable to the treatment interaction is the DiD estimate $\beta_3$. Think of fixed effects like a before-and-after photo filter: by comparing each school only to itself over time, the unit fixed effects automatically strip away all permanent differences between schools &amp;mdash; whether they are rich or poor, urban or rural, large or small. The time fixed effects then remove any changes that hit all schools equally (like a nationwide curriculum reform). What remains is the treatment effect. This is equivalent to the classical regression but more flexible for larger panels.&lt;/p>
&lt;pre>&lt;code class="language-stata">xtreg gpa txp i.time, fe vce(cluster id)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Fixed-effects (within) regression Number of obs = 70
Group variable: id Number of groups = 35
R-squared:
Within = 0.9946
txp | 25.3149 .5851062 43.27 0.000 24.12582 26.50398
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>xtreg&lt;/code> command with &lt;code>fe&lt;/code> estimates the within-school regression with clustered standard errors. The within R-squared of 0.9946 indicates that the treatment interaction alone explains 99.5% of the within-school GPA variation after removing fixed effects. The very high R-squared reflects the simulated nature of the data; real-world applications typically show lower values.&lt;/p>
&lt;h3 id="high-dimensional-twfe-with-reghdfe">High-dimensional TWFE with reghdfe&lt;/h3>
&lt;p>The &lt;code>reghdfe&lt;/code> command provides a computationally faster alternative for models with many fixed effects. It produces identical results to &lt;code>xtreg&lt;/code> but scales better to large datasets.&lt;/p>
&lt;pre>&lt;code class="language-stata">reghdfe gpa txp, absorb(id time) cluster(id)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> txp | 25.3149 .5851062 43.27 0.000 24.12582 26.50398
&lt;/code>&lt;/pre>
&lt;p>The estimate is identical: 25.31 with clustered SE of 0.585 and a 95% CI of [24.13, 26.50].&lt;/p>
&lt;h3 id="adding-covariates">Adding covariates&lt;/h3>
&lt;p>Researchers may include exogenous control variables to improve the precision of the DiD estimate. An important caveat is to &lt;strong>never control for variables that are affected by the treatment&lt;/strong> (known as post-treatment bias). The share of female students (&lt;code>female_share&lt;/code>) is a safe control because it is determined by school demographics, not by the tutoring program.&lt;/p>
&lt;pre>&lt;code class="language-stata">reghdfe gpa txp female_share, absorb(id time) cluster(id)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> txp | 25.32806 .6047651 41.88 0.000 24.09903 26.55709
female_share | -3.216239 8.700428 -0.37 0.714 -20.89764 14.46516
&lt;/code>&lt;/pre>
&lt;p>Adding the female share control has virtually no effect on the DiD estimate, which shifts from 25.31 to 25.33 (a change of ~0.01 points). The control itself is not statistically significant (p = 0.71), confirming it is unrelated to GPA in this dataset. This result demonstrates that in well-designed DiD settings with proper fixed effects, adding unrelated covariates does not change the estimate but may slightly increase standard errors.&lt;/p>
&lt;h3 id="comparing-all-five-approaches">Comparing all five approaches&lt;/h3>
&lt;p>All five estimation methods converge on the same DiD estimate, as summarized below:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Estimate&lt;/th>
&lt;th>SE&lt;/th>
&lt;th>95% CI&lt;/th>
&lt;th>Clustered&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>diff&lt;/code> (manual)&lt;/td>
&lt;td>25.315&lt;/td>
&lt;td>0.627&lt;/td>
&lt;td>&amp;ndash;&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>reg&lt;/code> (OLS interaction)&lt;/td>
&lt;td>25.315&lt;/td>
&lt;td>0.615&lt;/td>
&lt;td>[24.09, 26.54]&lt;/td>
&lt;td>No (robust)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>didregress&lt;/code> (Stata 17+)&lt;/td>
&lt;td>25.315&lt;/td>
&lt;td>0.834&lt;/td>
&lt;td>[23.62, 27.01]&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>xtreg&lt;/code> (TWFE)&lt;/td>
&lt;td>25.315&lt;/td>
&lt;td>0.585&lt;/td>
&lt;td>[24.13, 26.50]&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>reghdfe&lt;/code> (HD-TWFE)&lt;/td>
&lt;td>25.315&lt;/td>
&lt;td>0.585&lt;/td>
&lt;td>[24.13, 26.50]&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>reghdfe&lt;/code> + covariate&lt;/td>
&lt;td>25.328&lt;/td>
&lt;td>0.605&lt;/td>
&lt;td>[24.10, 26.56]&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The consistency across methods confirms the robustness of the 25.32-point DiD estimate. The differences in standard errors reflect whether and how clustering is applied. In this simulated dataset, clustering has minimal impact; in real-world applications, school-level clustering typically increases standard errors substantially.&lt;/p>
&lt;hr>
&lt;h2 id="table-2-three-regression-specifications">Table 2: Three regression specifications&lt;/h2>
&lt;p>Following Corral and Yang (2024), we replicate their Table 2 with three specifications to show the stability of the estimate across modeling choices.&lt;/p>
&lt;pre>&lt;code class="language-stata">* (1) Baseline TWFE, no controls, no clustering
reghdfe gpa i.txp, absorb(id time)
outreg2 using table2.doc, replace keep(1.txp) ///
addtext(Controls, No, Clustered SEs, No) dec(2)
* (2) + Covariate (female_share), no clustering
reghdfe gpa i.txp c.female_share, absorb(id time)
outreg2 using table2.doc, append keep(1.txp) ///
addtext(Controls, Yes, Clustered SEs, No) dec(2)
* (3) No controls, + clustered SEs at school level
reghdfe gpa i.txp, absorb(id time) cluster(id)
outreg2 using table2.doc, append keep(1.txp) ///
addtext(Controls, No, Clustered SEs, Yes) dec(2)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Table 2: Difference-in-Differences Regression Coefficients
(1) (2) (3)
GPA GPA GPA
Treatment 25.31*** 25.33*** 25.31***
(0.607) (0.615) (0.585)
Observations 70 70 70
R-squared 0.99 0.99 0.99
Controls No Yes No
Clustered SEs No No Yes
&lt;/code>&lt;/pre>
&lt;p>The three specifications produce nearly identical estimates (25.31, 25.33, 25.31), all significant at the 1% level. This stability is encouraging: the result does not depend on whether we include covariates or cluster standard errors. In column (2), adding the female share control changes the estimate by only 0.02 points. In column (3), clustering at the school level slightly &lt;em>reduces&lt;/em> the standard error (from 0.607 to 0.585), which is unusual &amp;mdash; in practice, clustering almost always increases SEs because it accounts for within-school error correlation. The R-squared of 0.99 across all specifications reflects the strong treatment effect in the simulated data.&lt;/p>
&lt;hr>
&lt;h2 id="event-study-dynamic-treatment-effects">Event study: dynamic treatment effects&lt;/h2>
&lt;p>The 2x2 DiD assumes that the treatment effect is constant over time. But what if the program takes time to show results, or its effect fades out? An &lt;strong>event study&lt;/strong> design addresses this by replacing the single treatment interaction with a set of time-specific treatment indicators &amp;mdash; &lt;strong>leads&lt;/strong> (pre-treatment periods) and &lt;strong>lags&lt;/strong> (post-treatment periods). This serves two purposes: (1) it tests the parallel trends assumption by checking whether pre-treatment coefficients are near zero, and (2) it reveals the dynamic trajectory of the treatment effect.&lt;/p>
&lt;h3 id="event-study-data">Event study data&lt;/h3>
&lt;p>We load the expanded dataset with 8 time periods (4 pre-treatment, 4 post-treatment).&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/isds/tutoring_didevent.dta&amp;quot;, clear
describe
summarize
xtset id time
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Observations: 280 (35 schools x 8 periods)
Variables: 8 (includes timeToTreat: relative time to treatment onset)
Panel variable: id (strongly balanced)
Time variable: time, 1 to 8
&lt;/code>&lt;/pre>
&lt;p>The event study dataset extends the case study to 8 time periods, with the tutoring program starting at period 5. The &lt;code>timeToTreat&lt;/code> variable measures relative time to treatment onset, ranging from -4 (four periods before treatment) to +3 (three periods after treatment). This variable is defined only for the 10 treated schools (80 observations).&lt;/p>
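&lt;p>A quick check confirms this structure before estimation:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Verify the relative-time variable (a sketch)
tab timeToTreat, missing         // -4 to +3 for treated schools, missing otherwise
count if !missing(timeToTreat)   // should return 80 (10 treated schools x 8 periods)
&lt;/code>&lt;/pre>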
&lt;h3 id="treatment-visualization-1">Treatment visualization&lt;/h3>
&lt;pre>&lt;code class="language-stata">panelview gpa txp, i(id) t(time) type(treat) ///
prepost bytiming ///
xtitle(&amp;quot;Time Period&amp;quot;) ytitle(&amp;quot;School ID&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_did_panelview_event.png" alt="Treatment timing for the event study dataset">&lt;/p>
&lt;p>The heatmap shows the same 10 treated schools now observed over 8 periods. The pre-treatment phase (periods 1-4, teal) allows us to assess whether treated and comparison schools followed similar GPA trajectories before the program, while the post-treatment phase (periods 5-8, dark blue) captures the dynamic treatment effects.&lt;/p>
&lt;h3 id="event-study-model">Event study model&lt;/h3>
&lt;p>The event study replaces the single treatment interaction from the TWFE model with a vector of lead and lag indicators:&lt;/p>
&lt;p>$$Y_{it} = \alpha + \sum_{j=-m}^{q} \theta_j \cdot \text{treat}_i \cdot \mathbf{1}(t = k + j) + \gamma_i + \vartheta_t + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, this says: the outcome for school $i$ at time $t$ equals a constant, plus a separate coefficient ($\theta_j$) for each relative time period $j$ from the treatment onset at time $k$, plus school and time fixed effects. The leads ($j &amp;lt; 0$) capture pre-treatment differences, and the lags ($j \geq 0$) capture post-treatment effects. The reference period (typically $j = -1$, the period just before treatment) is omitted, so all coefficients are measured relative to this baseline.&lt;/p>
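&lt;p>To make the specification concrete, the lead and lag dummies can be built by hand. The sketch below uses the same &lt;code>lead#&lt;/code>/&lt;code>lag#&lt;/code> names that &lt;code>eventdd&lt;/code> generates with &lt;code>keepdummies&lt;/code>, and omits $j = -1$ as the baseline; the &lt;code>eventdd&lt;/code> command that follows automates exactly this bookkeeping (drop these variables before re-running it, to avoid name clashes):&lt;/p>
&lt;pre>&lt;code class="language-stata">* Manual event study: build the indicators the equation describes (a sketch)
forvalues j = 2/4 {
    gen lead`j' = (timeToTreat == -`j')   // pre-treatment indicators
}
forvalues j = 0/3 {
    gen lag`j' = (timeToTreat == `j')     // post-treatment indicators
}
* j = -1 is the omitted baseline; comparison schools (missing timeToTreat) get zeros
reghdfe gpa lead4 lead3 lead2 lag0 lag1 lag2 lag3, absorb(id time) cluster(id)
&lt;/code>&lt;/pre>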
&lt;pre>&lt;code class="language-stata">eventdd gpa i.time, timevar(timeToTreat) ///
method(hdfe, absorb(id time) cluster(id)) ///
keepdummies ///
graph_op(ylab(-10(5)30) ///
ytitle(&amp;quot;GPA Effect&amp;quot;) ///
xtitle(&amp;quot;Time to Treatment&amp;quot;) ///
xlab(-4(1)4))
graph export &amp;quot;stata_did_event_study.png&amp;quot;, replace width(2400)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_did_event_study.png" alt="Figure 3: Event study showing dynamic treatment effects">&lt;/p>
&lt;p>The event study plot is the most informative figure in the analysis. The pre-treatment coefficients (periods -4 through -2) cluster around zero, with point estimates of 0.34, -0.32, and 0.59 &amp;mdash; all statistically insignificant (p = 0.40, 0.47, 0.17). This provides compelling evidence that the parallel trends assumption holds: treated and control schools were following similar GPA trajectories in the four periods before the program started. At the moment of treatment (period 0), the effect jumps sharply to approximately 25 GPA points and remains stable through period +3. The tight confidence intervals (shown in blue) confirm that the effect is precisely estimated in every post-treatment period.&lt;/p>
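&lt;p>Beyond eyeballing the plot, the leads can be tested jointly. Assuming the underlying regression remains the active estimation after &lt;code>eventdd&lt;/code>, a standard Wald test on the &lt;code>keepdummies&lt;/code> names is a one-liner (a sketch):&lt;/p>
&lt;pre>&lt;code class="language-stata">* Joint test that all pre-treatment coefficients are zero
test lead4 lead3 lead2
&lt;/code>&lt;/pre>
&lt;p>Failing to reject (a large p-value) is consistent with parallel pre-trends, although no pre-trend test can prove the assumption.&lt;/p>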
&lt;h3 id="event-study-coefficients-table-4">Event study coefficients (Table 4)&lt;/h3>
&lt;pre>&lt;code class="language-stata">outreg2 using table4.doc, replace ///
keep(lead4 lead3 lead2 lag0 lag1 lag2 lag3) dec(2)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Table 4: Event Study Results
Pre-treatment (leads):
lead4 = 0.342 (SE = 0.401) p = 0.400
lead3 = -0.322 (SE = 0.441) p = 0.471
lead2 = 0.593 (SE = 0.423) p = 0.170
Post-treatment (lags):
lag0 = 25.028 (SE = 0.445) p = 0.000
lag1 = 24.705 (SE = 0.559) p = 0.000
lag2 = 24.768 (SE = 0.739) p = 0.000
lag3 = 25.701 (SE = 0.797) p = 0.000
N = 280, 35 schools, R-squared = 0.991
&lt;/code>&lt;/pre>
&lt;p>The event study coefficients tell a clear story. Before the program, none of the lead coefficients are statistically significant, and they range from -0.32 to 0.59 &amp;mdash; fluctuations well within normal sampling variation. After the program begins, the treatment effect is immediate and persistent: lag coefficients range from 24.71 to 25.70, a span of less than 1 GPA point over four periods. There is no evidence of fade-out (declining effect over time) or ramp-up (gradually increasing effect). The program delivered its full benefit from the first period and maintained it consistently, suggesting a sustained structural change in academic support rather than a temporary boost.&lt;/p>
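&lt;p>If a single post-treatment summary is needed, the lags can be averaged with &lt;code>lincom&lt;/code>; a sketch, run directly after the event study estimation:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Average effect across the four post-treatment periods
lincom (lag0 + lag1 + lag2 + lag3)/4   // roughly 25 GPA points given Table 4
&lt;/code>&lt;/pre>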
&lt;hr>
&lt;h2 id="discussion">Discussion&lt;/h2>
&lt;p>Returning to our case study question: &lt;strong>Did the after-school tutoring program improve the GPA of low-income students?&lt;/strong> The evidence is clear. The DiD estimate of 25.32 GPA points is large, statistically significant (p &amp;lt; 0.001), and robust across five estimation methods, multiple regression specifications, and an event study design. The program transformed treated schools from having the lowest average GPA (60.17) to having the highest (96.37).&lt;/p>
&lt;p>Three findings merit special attention for policymakers:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>The naive before-after comparison overstates the effect by 43%.&lt;/strong> The ITS approach attributes the entire 36.20-point increase to the program, but 10.88 points (30% of the raw gain) are attributable to natural time trends. DiD corrects for this by netting out the comparison group&amp;rsquo;s change; the arithmetic check after this list makes the decomposition explicit.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The event study confirms there were no differential pre-trends.&lt;/strong> All pre-treatment coefficients are near zero and insignificant, supporting the causal interpretation. If treated schools had been improving faster than comparison schools even before the program, our DiD estimate would be biased upward.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The effect is constant over time.&lt;/strong> The event study shows no fade-out, suggesting the program produces sustained benefits rather than temporary gains. This is important for cost-benefit analyses: policymakers can expect the GPA improvement to persist as long as the program continues.&lt;/p>
&lt;/li>
&lt;/ol>
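&lt;p>The arithmetic check promised in point 1, using only the numbers reported above:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Raw before-after gain = DiD effect + common time trend
display as text &amp;quot;Common trend removed: &amp;quot; 10.88
display as text &amp;quot;DiD estimate (ATT):   &amp;quot; 36.20 - 10.88   // = 25.32 GPA points
&lt;/code>&lt;/pre>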
&lt;h3 id="important-caveats">Important caveats&lt;/h3>
&lt;p>This tutorial uses simulated data designed to illustrate DiD mechanics cleanly. Several features of this example would be unusual in a real-world application:&lt;/p>
&lt;ul>
&lt;li>The R-squared of 0.99 reflects the simulated data&amp;rsquo;s low noise. Real educational interventions typically explain a much smaller share of outcome variation.&lt;/li>
&lt;li>A 25-point GPA increase on a 100-point scale is unrealistically large. Real after-school programs typically produce effect sizes of 0.1-0.3 standard deviations.&lt;/li>
&lt;li>The parallel pre-trends are nearly perfect by construction. In practice, researchers must carefully argue for the plausibility of this assumption using domain knowledge, pre-trend tests, and robustness checks.&lt;/li>
&lt;li>This example uses simultaneous treatment timing (all schools treated at once). When treatment timing varies across units &amp;mdash; called &lt;strong>staggered DiD&lt;/strong> &amp;mdash; the standard TWFE estimator can produce biased estimates. Modern estimators by Callaway and Sant&amp;rsquo;Anna (2021), Sun and Abraham (2021), and Borusyak et al. (2024) address this issue; a hypothetical sketch follows this list.&lt;/li>
&lt;/ul>
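&lt;p>For illustration only, here is how a staggered analysis might look with the community-contributed &lt;code>csdid&lt;/code> package, which implements the Callaway and Sant&amp;rsquo;Anna (2021) estimator. The tutorial dataset has uniform timing, so the &lt;code>first_treat&lt;/code> variable below is hypothetical (the first treated period for each school, 0 for never-treated):&lt;/p>
&lt;pre>&lt;code class="language-stata">* Hypothetical staggered DiD sketch; first_treat does not exist in this dataset
ssc install csdid, replace
ssc install drdid, replace   // csdid dependency
csdid gpa, ivar(id) time(time) gvar(first_treat)   // group-time ATT(g,t) estimates
estat event                                        // aggregate into an event study
&lt;/code>&lt;/pre>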
&lt;hr>
&lt;h2 id="summary-and-takeaways">Summary and takeaways&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>DiD removes time trends:&lt;/strong> The naive ITS comparison overstated the program effect by 10.88 GPA points (43%). DiD corrects this by subtracting the comparison group&amp;rsquo;s change, yielding a causal estimate of 25.32 points.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Five methods, one answer:&lt;/strong> Classical OLS, &lt;code>didregress&lt;/code>, &lt;code>xtreg&lt;/code>, &lt;code>reghdfe&lt;/code>, and &lt;code>reghdfe&lt;/code> with covariates all produce essentially the same DiD estimate (25.31-25.33), demonstrating the equivalence of these approaches in the standard 2x2 case.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Event studies test parallel trends:&lt;/strong> Pre-treatment coefficients (0.34, -0.32, 0.59, all p &amp;gt; 0.10) provide evidence that treated and comparison schools followed similar trajectories before the program, strengthening the causal claim.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The effect is immediate and sustained:&lt;/strong> Post-treatment coefficients range from 24.71 to 25.70 with no fade-out pattern, suggesting the program delivers lasting benefits.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Covariates matter less than design:&lt;/strong> Adding the female share control changed the estimate by only about 0.01 points (from 25.315 to 25.328). In a well-designed DiD with proper fixed effects, the research design does the heavy lifting.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Limitations:&lt;/strong> This tutorial covers the standard 2x2 DiD and event study. For staggered treatment timing (where units receive treatment at different times), modern estimators that avoid the negative-weights problem in TWFE are recommended.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="exercises">Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Robustness check:&lt;/strong> Re-estimate the DiD using only the event study dataset (280 observations) with a simple 2x2 specification (collapsing to pre/post). Does the estimate change compared to the 2-period dataset? Why or why not?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Placebo test:&lt;/strong> Using the event study dataset, restrict the sample to pre-treatment periods only (time 1-4) and assign a &amp;ldquo;fake&amp;rdquo; treatment at time 3. Run the DiD. If the parallel trends assumption holds, you should find no significant effect. What do you find? (A starter sketch appears after this list.)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Staggered DiD:&lt;/strong> Read about the Callaway and Sant&amp;rsquo;Anna (2021) estimator and the &lt;code>csdid&lt;/code> Stata package. How would the analysis change if schools adopted the tutoring program at different times?&lt;/p>
&lt;/li>
&lt;/ol>
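&lt;p>A possible starting point for Exercise 2, deriving the treated-school dummy from &lt;code>timeToTreat&lt;/code> (which is set only for treated schools); a sketch, not the only valid approach:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Placebo test starter: fake treatment at time 3 within the pre-period only
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/isds/tutoring_didevent.dta&amp;quot;, clear
gen treated = !missing(timeToTreat)     // 1 for the 10 treated schools
keep if time &amp;lt;= 4                       // pre-treatment periods only
gen fake_txp = treated * (time &amp;gt;= 3)   // pretend the program started at time 3
reghdfe gpa fake_txp, absorb(id time) cluster(id)
* A fake_txp coefficient near zero supports parallel pre-trends
&lt;/code>&lt;/pre>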
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://doi.org/10.1007/s12564-024-09959-0" target="_blank" rel="noopener">Corral, D. &amp;amp; Yang, M. (2024). An introduction to the difference-in-differences design in education policy research. &lt;em>Asia Pacific Education Review&lt;/em>.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2020.12.001" target="_blank" rel="noopener">Callaway, B. &amp;amp; Sant&amp;rsquo;Anna, P.H. (2021). Difference-in-differences with multiple time periods. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 200-230.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2021.03.014" target="_blank" rel="noopener">Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 254-277.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2020.09.006" target="_blank" rel="noopener">Sun, L. &amp;amp; Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 175-199.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.48550/arXiv.2108.12419" target="_blank" rel="noopener">Borusyak, K., Jaravel, X. &amp;amp; Spiess, J. (2024). Revisiting event study designs: Robust and efficient estimation. &lt;em>Review of Economic Studies&lt;/em>, 91(6), 3253-3285.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jfineco.2022.01.004" target="_blank" rel="noopener">Baker, A.C., Larcker, D.F. &amp;amp; Wang, C.C.Y. (2022). How much should we trust staggered difference-in-differences estimates? &lt;em>Journal of Financial Economics&lt;/em>, 144(2), 370-395.&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://scorreia.com/research/hdfe.pdf" target="_blank" rel="noopener">Correia, S. (2016). Linear models with high-dimensional fixed effects: An efficient and feasible estimator. Working paper.&lt;/a>&lt;/li>
&lt;/ol></description></item></channel></rss>