Capstone Project — Session 3: Causal Analysis & Final Presentation

Data to answer — Diff-in-diffs, heterogeneity, and presentation

Published

April 27, 2026

Capstone Project — Session 3

From panel to causal effect — and how to present what you found

Where we are

Session 1 → match + manager-change dataset.
Session 2 → a text-based expectation score per change.
Session 3 (today) → build the analysis panel, run a causal design (difference-in-differences), explore heterogeneity — especially by expectations — and present your findings as an HTML report.

Data plumbing and panel construction

🧩 Your two datasets don’t quite fit together

You arrive at Session 3 with two outputs:

From Session 1:

a match/manager-change panel. Rows indexed by some combination of team, season, date, match_id.
team characteristics
manager characteristics

From Session 2:

an article-level table
a (team, gameweek) → avg_score aggregation

potential issues

We have to create a single panel dataset.

A short list of what to expect:

Entity resolution
Time-unit problems: inconsistencies and mismatches
Coverage gaps regarding the expectation data
Date alignment

These are not bugs in your pipeline. They are the substance of working with real data. Fixing them is the first half of this session. Don’t start the regression until you have a merge you trust.
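
One way to get a merge you can audit is sketched below. It assumes pandas; the column names (`team`, `gameweek`, `avg_score`, `points`), the toy rows, and the `TEAM_ALIASES` map are all hypothetical stand-ins for your own Session 1 and Session 2 outputs.

```python
import pandas as pd

# Hypothetical Session 1 panel: one row per (team, gameweek).
matches = pd.DataFrame({
    "team": ["Arsenal", "Arsenal", "Man Utd", "Man Utd"],
    "gameweek": [1, 2, 1, 2],
    "points": [3, 1, 0, 3],
})

# Hypothetical Session 2 aggregation. Note the different spelling of one team
# and the missing (Arsenal, 2) row.
scores = pd.DataFrame({
    "team": ["Arsenal", "Manchester United", "Manchester United"],
    "gameweek": [1, 1, 2],
    "avg_score": [0.2, 0.7, 0.9],
})

# Entity resolution: map every spelling to one canonical team key.
TEAM_ALIASES = {"Man Utd": "Manchester United"}
for df in (matches, scores):
    df["team"] = df["team"].replace(TEAM_ALIASES)

# validate= raises if the keys are not unique where they should be;
# indicator= adds a _merge column showing where each row came from.
panel = matches.merge(
    scores,
    on=["team", "gameweek"],
    how="left",
    validate="one_to_one",
    indicator=True,
)

print(len(matches), len(panel))        # row counts before and after
print(panel["_merge"].value_counts())  # coverage gaps show up as left_only
```

A left join keeps every match row, so the row count should not change; any `left_only` rows are team-weeks without expectation data, which your missing-coverage policy then has to handle.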

🏗️ Building the analysis panel — decisions you have to name

Before you merge anything, pick and write down (in PANEL.md) your answers to these:

Unit of observation. (team, match)? (team, calendar_week)? (team, gameweek)?
Time index.
Entity keys. One clean team identifier across all tables. One clean manager identifier.
Treatment timing. What is “the” date of the change?
Pre / post window. How many periods before and after the change do you include? 5 matches? 10 weeks? To end of season?
Missing-coverage policy. A (team, week) with zero articles: do you drop it, impute zero, or carry forward the previous week’s score?

None of these has a universally right answer. But you do need to commit to one answer per question and defend it in the slides.
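
To make the missing-coverage decision concrete, here is a small sketch of the three policies named above. It assumes pandas and a hypothetical weekly table with columns `team`, `week`, and `avg_score`; the numbers are made up.

```python
import pandas as pd

# Hypothetical weekly expectation scores for one team; week 3 had no articles.
scores = pd.DataFrame({
    "team": ["A", "A", "A"],
    "week": [1, 2, 4],
    "avg_score": [0.2, 0.5, 0.8],
})

# Build the full (team, week) grid so that missing weeks become explicit rows.
weeks = range(int(scores["week"].min()), int(scores["week"].max()) + 1)
grid = pd.MultiIndex.from_product(
    [scores["team"].unique(), weeks], names=["team", "week"]
)
full = scores.set_index(["team", "week"]).reindex(grid).reset_index()

# Policy 1: drop weeks without coverage.
dropped = full.dropna(subset=["avg_score"])

# Policy 2: treat "no articles" as a score of zero.
zero_filled = full.assign(avg_score=full["avg_score"].fillna(0.0))

# Policy 3: carry the previous week's score forward, within team only.
carried = full.assign(avg_score=full.groupby("team")["avg_score"].ffill())

# Keep a flag either way, so the analysis can tell observed from filled values.
full["observed"] = full["avg_score"].notna()
```

Each policy encodes a different belief about what silence in the press means, so it is worth stating that belief in `PANEL.md` next to the choice.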

The causal question

🎯 What are we actually trying to answer?

Research question: Does changing a manager improve team performance? And does the impact vary by manager and team characteristics?

Your first task is to translate it into a very specific question you will estimate.

Define all aspects of the question
Make editorial decisions.

Think about what can go wrong

sample design
confounders
anything else

Difference-in-Differences — overview and vocabulary

This is a vocabulary to get you started, not an attempt to teach DiD from scratch. You can learn the rest with AI.

📐 Key concepts

setup

Treatment — the event whose effect you want to measure. Here: a manager change.
Treated unit — a team that experiences a manager change. Control unit — a team that does not (in the same window).

methods

Two-way fixed effects (team FE + time FE) — the standard panel regression, and a reasonable place to start.
Event-study specification — transform the data into event time (periods measured relative to when the intervention happens) and estimate one coefficient per lead / lag around the event, to see the dynamics rather than a single number.
Pre / post — periods before vs after the change.
DiD estimator — the difference between treated and control in the change from pre to post. In plain words: how much more (or less) did treated teams’ performance move than control teams’?
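
The DiD estimator in its simplest two-group, two-period form can be computed from four means. The sketch below assumes pandas and uses made-up average points per match; the group and period labels are illustrative only.

```python
import pandas as pd

# Made-up average points per match, for illustration only.
df = pd.DataFrame({
    "group":  ["treated", "treated", "control", "control"],
    "period": ["pre", "post", "pre", "post"],
    "points": [1.0, 1.5, 1.4, 1.6],
})

means = df.pivot(index="group", columns="period", values="points")

change_treated = means.loc["treated", "post"] - means.loc["treated", "pre"]
change_control = means.loc["control", "post"] - means.loc["control", "pre"]

# DiD: how much more treated teams moved than control teams.
did = change_treated - change_control
print(round(did, 2))
```

Here treated teams improved by 0.5 points and control teams by 0.2, so the DiD estimate is the remaining 0.3. With real data you would estimate this in a regression to get standard errors.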

key econometric challenges

Parallel trends — the core assumption: absent treatment, treated and control would have moved in parallel. Mostly untestable, but pre-trends are a useful proxy.
Staggered treatment — different teams are treated at different times. This has important econometric implications that you should investigate.
Ask AI what can go wrong with a naive DiD in your setup.
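
With staggered timing, a useful first step is to put every treated team on a common event-time clock and look at raw averages around the change. The sketch below assumes pandas; the panel, the column names, and the `change_week` mapping are hypothetical.

```python
import pandas as pd

# Hypothetical panel: teams A and B change manager in different gameweeks,
# team C never does.
panel = pd.DataFrame({
    "team": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "gameweek": [1, 2, 3, 4] * 3,
    "points": [0, 1, 3, 3, 1, 0, 0, 3, 1, 1, 3, 1],
})
change_week = {"A": 3, "B": 4}  # first gameweek under the new manager

# Teams missing from the mapping get NaN, i.e. they are never treated.
panel["change_week"] = panel["team"].map(change_week)

# Event time: periods relative to the change; NaN for never-treated teams.
panel["event_time"] = panel["gameweek"] - panel["change_week"]
panel["post"] = (panel["event_time"] >= 0).astype(int)

# Average outcome by event time for treated teams: a raw look at dynamics.
# Flat pre-period values are consistent with parallel trends; they do not prove it.
dynamics = (
    panel.dropna(subset=["event_time"])
    .groupby("event_time")["points"]
    .mean()
)
print(dynamics)
```

This descriptive average is only a diagnostic. It is not a substitute for an estimator designed for staggered timing, and it ignores the control teams entirely.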

📦 Package pointers

Python

diff-diff — a general difference-in-differences / event-study package with sensible defaults. New but already great.
pyfixest — a general regression package with difference-in-differences options; see its tutorial at https://pyfixest.org/tutorials/difference-in-differences.html

R (if you prefer)

did — the original Callaway–Sant’Anna implementation.
fixest — fastest FE in the ecosystem; feols, etable, coefplot.
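
To see what a two-way fixed effects regression computes before handing the job to one of these packages, here is a hand-rolled sketch. It does not use any of the package APIs: it assumes pandas only, a balanced panel, a single treated team, and noise-free made-up data with a built-in effect of 0.5. The helper `demean_two_way` is hypothetical.

```python
import pandas as pd

# Made-up balanced panel: y = team effect + time effect + 0.5 * treatment.
team_effects = {"A": 1.0, "B": 1.2, "C": 0.8}
rows = []
for team, alpha in team_effects.items():
    for t in range(1, 5):
        d = int(team == "A" and t >= 3)  # only team A is treated, from period 3
        rows.append({"team": team, "period": t, "d": d,
                     "y": alpha + 0.1 * t + 0.5 * d})
panel = pd.DataFrame(rows)


def demean_two_way(s: pd.Series, df: pd.DataFrame) -> pd.Series:
    """Remove team and period means (exact only for balanced panels)."""
    return (
        s
        - s.groupby(df["team"]).transform("mean")
        - s.groupby(df["period"]).transform("mean")
        + s.mean()
    )


y_dd = demean_two_way(panel["y"], panel)
d_dd = demean_two_way(panel["d"], panel)

# OLS slope of demeaned outcome on demeaned treatment = TWFE coefficient.
beta = (d_dd * y_dd).sum() / (d_dd ** 2).sum()
print(round(beta, 3))
```

The packages do the same job on unbalanced panels, and they add clustered standard errors and event-study tooling, which this sketch leaves out.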

Tip

Use AI to learn the details

Ask for worked mini-examples in your setup, then read the code; don’t just run it.
Use the AI-interviews-me technique to better understand the setup, what you expect, and what your key modelling choices are.

What can we use expectations for?

🧭 What is the purpose of knowing if a manager change was expected?

look at data

Focus on the season(s) for which you have expectations data.
Descriptive: What share of changes were expected? Were there cases where the expectation of a change was high but the manager was not fired?

econometrics

Parallel trends: what does it mean in this context?
Anticipation: what does it mean for the causal estimation?

use

Heterogeneity: do expected changes have a different effect than unexpected ones?
- compare dynamics and simple diff-in-diffs estimates
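
A first descriptive pass at the expected-versus-unexpected comparison might look like the sketch below. It assumes pandas and a hypothetical change-level table; the column names, the numbers, and the 0.5 cut-off are all assumptions. It compares only the pre/post change for treated teams, so a full DiD would still need to subtract the control-group change.

```python
import pandas as pd

# Hypothetical change-level table: one row per manager change, with the team's
# average points per match before and after, and the expectation score.
changes = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "expectation": [0.9, 0.8, 0.2, 0.1],
    "points_pre": [0.8, 1.0, 1.2, 1.4],
    "points_post": [1.4, 1.4, 1.3, 1.3],
})

THRESHOLD = 0.5  # assumed cut-off; defend your own choice
is_expected = changes["expectation"] >= THRESHOLD
changes["timing"] = is_expected.map({True: "expected", False: "unexpected"})
changes["delta"] = changes["points_post"] - changes["points_pre"]

# Share of changes that were expected, and mean pre/post change by group.
share_expected = is_expected.mean()
by_group = changes.groupby("timing")["delta"].mean()
print(share_expected)
print(by_group)
```

If expected changes follow a visible decline in results, part of the larger post-change improvement may be recovery from a bad run; that is the anticipation problem raised above.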

Analysis work block (90 min)

🔬 Suggested workflow

Merge & validate. Join Session 1 and Session 2 on your chosen keys. Print counts before and after. Eyeball a few rows.
Write the spec — in METHODS.md, before any regression:
- Outcome.
- Unit.
- Treatment definition (date, caretaker handling).
- Window (pre/post periods).
- Controls (team FE, time FE, opponent strength).
- Cluster level for SEs (team).
Test on a subset — one season. Get the plumbing right before you scale.
Try out one or two models – think about which econometric models you should use. Start with something you understand.
Test an additional econometric model – think about time and dynamics. Work carefully with AI to pick one or two extra models.
Heterogeneity 1 — interact the treatment with a binary indicator of a manager characteristic. Repeat for a team characteristic.
Expectations — interact the treatment with the expectation score.
Robustness — think about 2 key decisions you have made. Test how much they matter.
Draft slides while code runs. Don’t leave them for the last 15 minutes.
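
For the robustness step, one pattern is to wrap the estimate in a function of the decision you want to stress and loop over alternatives. The sketch below does this for the pre/post window. It assumes pandas and a hypothetical event-time panel of treated teams; `pre_post_change` is an illustrative helper, and it computes a simple pre/post difference, not a full DiD.

```python
import pandas as pd

# Hypothetical event-time panel for treated teams only: points per match
# at each event time (negative = before the change).
panel = pd.DataFrame({
    "team": ["A"] * 6 + ["B"] * 6,
    "event_time": [-3, -2, -1, 0, 1, 2] * 2,
    "points": [1, 0, 0, 3, 1, 1, 3, 1, 0, 3, 3, 1],
})


def pre_post_change(df: pd.DataFrame, window: int) -> float:
    """Mean post-change points minus mean pre-change points within the window."""
    pre = df[(df["event_time"] < 0) & (df["event_time"] >= -window)]
    post = df[(df["event_time"] >= 0) & (df["event_time"] < window)]
    return post["points"].mean() - pre["points"].mean()


# Re-run the same estimate under each window choice and compare.
results = {w: pre_post_change(panel, w) for w in (1, 2, 3)}
for window, estimate in results.items():
    print(window, round(estimate, 3))
```

In this toy data the estimate shrinks as the window widens, which is the kind of sensitivity worth reporting in `METHODS.md`. The same loop works for other decisions, such as the treatment date or the missing-coverage policy.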

Final deliverable — “presentation”

📑 One piece, several optional delivery formats

Delivery may be any of these three options.

HTML presentation
Interactive Dashboard
PDF report generated from Markdown or LaTeX

Whichever format you choose, it must be fully human-reviewed.

Delivery (Session 3 — end of the project)

📦 Final submission

Who: by group — same team you started with in Session 1.
Deadline: Sunday 23:55 (the Sunday after this session).
What:
- GitHub repo with: data (or fetcher), Session 1 tests, Session 2 article + expectations tables, Session 3 analysis code, and the rendered HTML report.
- README.md describing how to reproduce all numbers end to end (make all should work from a cold clone — see Reproducible Research).
- PANEL.md — the six panel-construction decisions and your answers.
- METHODS.md — the DiD spec, parallel-trends check, packages used, robustness run.
- DATA.md — updated with the final merged panel documented.
- Individual reflection (1–2 pages per person): hardest part and how you solved it, what you would do differently, how labour was split.

See evaluation criteria in the project description.

Further resources

For going deeper on modern DiD, two canonical stops:

Pedro Sant’Anna — DiD resources — the maintained hub for modern difference-in-differences. Papers, code, slides, videos, curated by one of the authors of the Callaway–Sant’Anna estimator. Start here for anything staggered.
Carlos Mendez — DiD in Python — a worked Python tutorial walking through classic and event-study / staggered DiD on real data. Good companion to the Python package pointers above.
