Capstone Project — Session 3: Difference-in-Differences & Final Presentation

Causal design on your panel, heterogeneity by expectations, final presentation

Published

May 4, 2026



Where we are

  • Session 1 → match + manager-change panel.
  • Session 2 → expectation score per change.
  • Session 3 (today) → run a Difference-in-Differences (DiD) analysis, explore heterogeneity (including by expectations), and present your findings.

Learning objectives

By the end of this session your team will:

  • Understand DiD as a tool for this exact question, and its main assumptions.
  • Specify a DiD that treats manager-change teams as treated and never-changed (or not-yet-changed) teams as controls.
  • Report a main effect with confidence intervals, a heterogeneity slice, and an expectations-vs-reality comparison.
  • Deliver a clean 10–12 minute presentation.

DiD in one page (draft — use AI to go deeper)

This is a stub. Treat it as the minimum vocabulary to work with your AI helper. Use the intro block to plug gaps, then have your AI explain the rest in context of your panel.

📐 Key concepts

  • Treatment — the event whose effect you want to measure. Here: a manager change.
  • Treated unit — a team that experiences a manager change. Control unit — a team that does not (in the same window).
  • Pre / post — periods before vs after the change.
  • DiD estimator — the difference between treated and control in the change from pre to post. In plain words: how much more (or less) did the treated teams’ performance move than the control teams’?
  • Two-way fixed effects (team FE + time FE) — the standard panel regression that, under conditions, recovers the DiD.
  • Event-study spec — one coefficient per lead / lag around the event, to see the dynamics instead of a single number.
  • Staggered treatment — different teams get treated at different times. Modern issue: simple two-way FE can be biased here; look up Callaway-Sant’Anna, de Chaisemartin-D’Haultfoeuille, or Sun-Abraham.
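To make the estimator concrete, here is a minimal 2×2 DiD computed by hand on a toy panel. All numbers, team labels, and the `ppg` (points per game) column are invented purely for illustration; this is not the course dataset.

```python
import pandas as pd

# Toy team-period panel; every number is invented for illustration.
df = pd.DataFrame({
    "team":    ["A", "A", "B", "B", "C", "C", "D", "D"],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],   # A, B change manager; C, D don't
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],   # before / after the change window
    "ppg":     [1.0, 1.5, 1.2, 1.7, 1.1, 1.3, 0.9, 1.1],  # points per game
})

means = df.groupby(["treated", "post"])["ppg"].mean()
# DiD = (treated post - treated pre) - (control post - control pre)
did = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])
print(round(did, 2))  # 0.3: treated teams improved 0.3 ppg more than controls
```

Here treated teams went from 1.1 to 1.6 ppg (+0.5) while controls went from 1.0 to 1.2 (+0.2), so the DiD attributes +0.3 to the manager change.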

⚠️ Three key challenges

  1. Parallel trends. DiD assumes treated and control teams would have moved in parallel absent the change. Plot pre-treatment trends. If they diverge, you have a problem — address it or say so.
  2. Selection / endogeneity. Managers are not sacked at random. Teams change managers because they are doing badly — you are likely picking up mean reversion, not manager skill. Think carefully about the control group, the window, and what your coefficient is really measuring.
  3. Staggered timing. Teams change managers at different moments over 10 seasons. A naive two-way FE can give you a number that is not the average treatment effect you think it is. Acknowledge this, and use (or at least mention) a staggered-DiD estimator.
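A quick way to eyeball challenge 1 is to fit a straight line to each group's pre-treatment outcomes and compare the slopes. A minimal sketch on synthetic data (the data-generating process and all numbers are assumptions, chosen so the groups genuinely share a trend):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic pre-treatment panel where both groups share a 0.02 ppg/week trend
# (by construction -- this is the parallel-trends case, for illustration).
weeks = np.arange(10)
pre = pd.DataFrame({
    "week":    np.tile(weeks, 2),
    "treated": np.repeat([1, 0], 10),
})
pre["ppg"] = (1.0 + 0.02 * pre["week"] + 0.1 * pre["treated"]
              + rng.normal(0, 0.01, len(pre)))

# One fitted slope per group; parallel trends => the slopes should be similar.
slopes = {g: np.polyfit(d["week"], d["ppg"], 1)[0]
          for g, d in pre.groupby("treated")}
print({g: round(s, 3) for g, s in slopes.items()})  # both close to 0.02
```

On your real panel, plot the two group means over pre-treatment time as well; a slope comparison alone can hide a divergence that only starts near the event.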
Tip

Use AI to learn the details — ask for worked mini-examples in your setup: “I have a team-week panel with variable X, treatment at irregular times; walk me through a Callaway-Sant’Anna estimator in Python/R.” Then check the code, don’t just run it.

A starter prompt for unpacking a DiD paper: did-understand-prompt.txt.
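To see what the event-study spec from the concepts list looks like in code, here is a sketch on simulated data. The data-generating process, variable names, window, and the +0.3 true effect are all assumptions for illustration; with staggered real-world timing you would reach for the estimators named under challenge 3 instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
# Simulated team-week panel: 15 treated teams change manager at week 10,
# with a true post-change effect of +0.3 ppg (all numbers invented).
rows = []
for t in range(30):
    treated = t < 15
    fe = rng.normal(0, 0.2)  # team fixed effect
    for w in range(20):
        effect = 0.3 if treated and w >= 10 else 0.0
        rows.append({"team": t, "week": w,
                     "ppg": 1.2 + fe + effect + rng.normal(0, 0.1)})
df = pd.DataFrame(rows)
df["treated"] = (df["team"] < 15).astype(int)

# One dummy per lead/lag for treated teams, omitting week 9 (k = -1) as the
# reference period; the leads ev_7, ev_8 act as a pre-trend check.
terms = []
for w in [7, 8] + list(range(10, 20)):
    name = f"ev_{w}"
    df[name] = ((df["treated"] == 1) & (df["week"] == w)).astype(int)
    terms.append(name)

m = smf.ols("ppg ~ " + " + ".join(terms) + " + C(team) + C(week)",
            data=df).fit(cov_type="cluster", cov_kwds={"groups": df["team"]})
# Lead coefficients should hover near 0; lag coefficients near the true 0.3.
print(m.params[["ev_8", "ev_12"]].round(2))
```

Plot the `ev_*` coefficients with their confidence intervals against relative time: flat around zero before the event and a jump after it is the picture you want.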


Work tasks (2-hour block)

🔬 Analysis workflow

  1. Plan the spec in writing before any code:
    • Outcome (points per game? goal difference? rolling form?)
    • Unit (team-week? team-match?)
    • Treatment definition (new manager’s first match? the announcement date?)
    • Window (how many matches pre/post?)
    • Controls (team FE, season FE, opponent strength?)
  2. Test on a subset (one league, two seasons) — get the plumbing right.
  3. Plot parallel trends for treated vs control.
  4. Estimate the main DiD coefficient with a proper standard error (cluster by team).
  5. Event study — leads and lags around the change.
  6. Heterogeneity — interact treatment with manager experience, team budget, league, and your Session 2 expectation score.
  7. Robustness — different outcome, window, control group; staggered-DiD estimator if feasible.
  8. Draft the slides while the code runs — don’t leave slides for the last 15 minutes.
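Steps 3–4 above can be sketched like this with a two-way FE regression and team-clustered standard errors. Everything here is simulated with an assumed true effect of +0.3 ppg; variable names are placeholders, not the course dataset's schema.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Simulated team-week panel: half the teams change manager at week 15,
# true effect +0.3 ppg (illustrative assumption, not the course data).
rows = []
for t in range(20):
    fe = rng.normal(0, 0.2)  # team fixed effect
    for w in range(30):
        d = int(t < 10 and w >= 15)          # treatment indicator D_it
        y = 1.3 + fe + 0.01 * w + 0.3 * d + rng.normal(0, 0.1)
        rows.append({"team": t, "week": w, "D": d, "ppg": y})
df = pd.DataFrame(rows)

# Two-way FE regression: team dummies + week dummies, SEs clustered by team
# (workflow step 4). With a single common treatment date, TWFE recovers the DiD.
m = smf.ols("ppg ~ D + C(team) + C(week)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["team"]})
print(round(m.params["D"], 2), "95% CI:",
      [round(x, 2) for x in m.conf_int().loc["D"]])
```

Note the caveat from the stub: this simulation has one common treatment date, which is exactly when plain TWFE is safe. With your staggered manager changes, treat this coefficient with suspicion and compare it against a staggered-DiD estimator (step 7).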

Presentation (last hour of class)

🎤 10–12 minutes + Q&A per team

Refer back to the presentation principles we discussed in Week 07 review and the Week 06 “report vs vibe report” contrast: less is more, show only what you understand deeply, precise interpretation beats fancy graphs.

Slide outline

  1. Question & context — league, time span, why it matters.
  2. Data & sample — number of matches, number of manager changes, known gaps.
  3. Identification — DiD; what is treatment, what is control; parallel-trends plot.
  4. Main result — coefficient + 95% CI + plain-English interpretation.
  5. Heterogeneity — by manager / team / league; one clear chart.
  6. Expectations vs reality — do positive-news changes deliver bigger gains?
  7. Limitations — selection, staggered timing, data gaps. Be honest.

Format options: Quarto HTML slides, Google Slides, or a short dashboard. Whichever you pick, the repo must reproduce the numbers on the slides.


Delivery (Session 3 — end of the project)

📦 Final submission

  • Who: by group — the same team you formed in Session 1; finish together what you started together.
  • Deadline: Sunday 23:55 (the Sunday after this session).
  • What:
    • GitHub repo with: data (or fetcher), Session 1 tests, Session 2 article + expectations tables, Session 3 analysis code.
    • README.md describing how to reproduce all numbers end to end.
    • Slides (as used in class, possibly polished).
    • DATA.md and a short METHODS.md explaining the DiD spec, parallel-trends check, and the robustness you ran.
    • Individual reflection (1–2 pages per person): hardest part and how you solved it, what you would do differently, how labour was split.

See evaluation criteria in the project description.