Capstone Project — Session 3: Causal Analysis & Final Presentation
Data to answer — Diff-in-diffs, heterogeneity, and presentation
Capstone Project — Session 3
From panel to causal effect — and how to present what you found
1. Where we are
- Session 1 → match + manager-change dataset.
- Session 2 → a text-based expectation score per change.
- Session 3 (today) → build the analysis panel, run a causal design (difference-in-differences), explore heterogeneity — especially by expectations — and present your findings.
Review week 02
- Data collection
- scraping (technical issues)
- API calls (technical issues)
- key decision points
- Review classification solutions
- simple
- two-step
- tool use
2. The causal question
🎯 What are we actually trying to answer?
Research question: Does changing a manager improve team performance? And does the impact vary by manager and team characteristics?
Your first task is to translate it into a very specific question you will estimate.
- Define all aspects of the question
- Make editorial decisions.
Think about what can go wrong:
- sample design
- confounders
- anything else
3. Building the analysis panel
🧩 Your two datasets don’t quite fit together
You arrive at Session 3 with two outputs:
From Session 1:
- a match/manager-change panel, with rows indexed by some combination of `team`, `season`, `date`, `match_id`
- team characteristics
- manager characteristics
From Session 2:
- an article-level table
- a `(team, gameweek) → avg_score` aggregation
Potential issues
We have to create a single panel dataset.
A short list of what to expect:
- Entity resolution
- Time-unit inconsistencies and mismatches
- Coverage gaps in the expectation data
- Date alignment
These are not bugs in your pipeline. They are the substance of working with real data. Fixing them is the first half of this session. Don’t start the regression until you have a merge you trust.
🏗️ Decisions you have to name (write them in PANEL.md)
Before you merge anything, pick and write down your answers to these:
- Unit of observation. `(team, match)`? `(team, calendar_week)`? `(team, gameweek)`?
- Time index.
- Entity keys. One clean team identifier across all tables. One clean manager identifier.
- Treatment timing. What is “the” date of the change?
- Pre / post window. How many periods before and after the change do you include? 5 matches? 10 weeks? To end of season?
- Missing-coverage policy. A `(team, week)` with zero articles: do you drop it, impute zero, or carry forward the previous week’s score?
None of these has a universally right answer. But you do need to commit to one answer per question and defend it in the slides.
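To make the last two decisions concrete, here is a minimal pandas sketch of the merge and the coverage-policy options. File paths and column names (`team`, `gameweek`, `avg_score`) are placeholders; substitute whatever your Session 1 and Session 2 outputs actually use.

```python
import pandas as pd

# Hypothetical file names and columns -- substitute your Session 1 / Session 2 outputs.
matches = pd.read_csv("data/matches_panel.csv")      # team, season, gameweek, points, ...
expect = pd.read_csv("data/expectation_scores.csv")  # team, gameweek, avg_score

# Left join: keep every match row, even when no articles covered that (team, gameweek).
panel = matches.merge(expect, on=["team", "gameweek"], how="left", validate="many_to_one")
panel = panel.sort_values(["team", "gameweek"])

# Make the missing-coverage policy an explicit, named choice:
panel["avg_score_zero"] = panel["avg_score"].fillna(0)                 # impute zero
panel["avg_score_ffill"] = panel.groupby("team")["avg_score"].ffill()  # carry forward
panel_covered = panel.dropna(subset=["avg_score"])                     # or drop uncovered rows
```

Whichever column you keep, name the choice in `PANEL.md` so a reader can see it.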
4. Difference-in-Differences — vocabulary & tools
This is vocabulary to get you started, not a from-scratch DiD lecture. You can learn the rest with AI.
📐 Key concepts
Setup
- Treatment — the event whose effect you want to measure. Here: a manager change.
- Treated unit — a team that experiences a manager change. Control unit — a team that does not (in the same window).
Methods
- Two-way fixed effects (team FE + time FE) — the standard panel regression and a good starting point (see the sketch after this list).
- Event-study specification — re-index the data in event time (periods relative to the intervention); one coefficient per lead / lag around the event, so you see the dynamics rather than a single number.
- Pre / post — periods before vs after the change.
- DiD estimator — the difference between treated and control in the change from pre to post. In plain words: how much more (or less) did treated teams’ performance move than control teams’?
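As a starting point, the two-way fixed-effects version can be run in a few lines with `pyfixest`. This is only a sketch under assumed variable names: `points` as the outcome and `post_change` equal to 1 after a manager change for treated teams and 0 otherwise.

```python
import pyfixest as pf

# Two-way fixed-effects DiD sketch (assumed column names, see lead-in above).
twfe = pf.feols(
    "points ~ post_change | team + gameweek",  # team FE + time FE
    data=panel,                                # the merged panel from Section 3
    vcov={"CRV1": "team"},                     # cluster standard errors by team
)
twfe.summary()
```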
Key challenges
- Parallel trends — the core assumption: absent treatment, treated and control would have moved in parallel. Mostly untestable, but pre-trends are a useful proxy.
- Staggered treatment — different teams are treated at different times. This has important econometric implications that you should investigate.
- Ask AI about what can go wrong with a naive DiD in your setup.
📦 Package pointers
Python
- `diff-diff` — a general difference-in-differences / event-study package with sensible defaults. New but already great.
- `pyfixest` — a general regression package with options for difference-in-differences.
R (if you prefer)
Use AI to learn the details
- Ask for worked mini-examples in your setup. Then read the code, don’t just run it.
- Use the AI-interviews-me technique to better understand the setup, what you expect, and what your key modelling choices are.
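For instance, an event-study version of the same regression is a natural worked mini-example to request and then read line by line. The sketch below assumes a hypothetical `rel_time` column (periods relative to the manager change) and the same outcome and panel as above.

```python
import pyfixest as pf

# Event-study sketch. `rel_time` = periods relative to the manager change (assumed
# column); -1 is the omitted reference period. Never-treated control teams need a
# convention for rel_time (one of the modelling choices to document).
es = pf.feols(
    "points ~ i(rel_time, ref=-1) | team + gameweek",
    data=panel,
    vcov={"CRV1": "team"},
)
es.iplot()  # one coefficient per lead / lag around the change
```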
5. The expectations angle
🧭 What is the purpose of knowing whether a manager change was expected?
Look at the data
- Focus on the season(s) you have data on expectations.
- Descriptive: What share of changes were expected? Were there cases where the expectation was high but the manager was not fired?
Econometrics
- Parallel trends: what does it mean in this context?
- What does anticipation imply for the causal estimation?
Use
- Heterogeneity: do expected changes have a different effect than unexpected ones?
- Compare dynamics and simple diff-in-diffs estimates.
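One simple way to operationalise the heterogeneity question is to interact the post-change indicator with an expectation measure. The sketch below assumes a hypothetical 0/1 flag `expected_change` built from the Session 2 score; the continuous score works the same way.

```python
import pyfixest as pf

# Heterogeneity sketch. `expected_change` = 1 if the pre-change expectation score was
# above a threshold you choose and document (hypothetical column), 0 otherwise.
het = pf.feols(
    "points ~ post_change * expected_change | team + gameweek",
    data=panel,
    vcov={"CRV1": "team"},
)
het.summary()  # the interaction term: how the effect differs for expected changes
```

If `expected_change` is constant within a team it is absorbed by the team fixed effects, so only the interaction is identified; that is expected, not a bug.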
6. Analysis work block (90 min)
🔬 Suggested workflow
Here is a suggested workflow. But you can proceed differently.
- Merge & validate. Join Session 1 and Session 2 on your chosen keys. Print counts before and after. Eyeball a few rows.
- Write the spec — in `METHODS.md`, before any regression:
  - Outcome.
  - Unit.
  - Treatment definition (date, caretaker handling).
  - Window (pre/post periods).
  - Controls (team FE, time FE, opponent strength).
  - Cluster level for SEs (team).
- Test on a subset — one season. Get the plumbing right before you scale.
- Try out one or two models — think about which econometric models you should use. Start with something you understand.
- Test an additional econometric model — think about time and dynamics. Carefully interact with AI to pick one or two extra models.
- Heterogeneity 1 — interact the treatment with a binary indicator of a manager characteristic. Repeat for a team characteristic.
- Expectations — interact with expectation score.
- Robustness — think about two key decisions you have made and test how much they matter (see the sketch after this list).
- Draft slides while code runs. Don’t leave them for the last 15 minutes.
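For the robustness item, one cheap pattern is to re-run the main specification under each alternative decision and compare the estimates. The sketch below varies a hypothetical pre/post window length, reusing the assumed column names from the earlier sketches.

```python
import pyfixest as pf

# Robustness sketch: vary one decision (the pre/post window) and compare estimates.
# `rel_time` is NaN for never-treated control teams, so keep those rows in every run.
for window in (5, 10):
    in_window = panel["rel_time"].abs() <= window
    sub = panel[in_window | panel["rel_time"].isna()]
    fit = pf.feols("points ~ post_change | team + gameweek",
                   data=sub, vcov={"CRV1": "team"})
    print(f"window of ±{window} matches: {fit.coef()['post_change']:.3f}")
```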
7. Deliverables
📑 The presentation
One piece, with a choice of delivery format:
- HTML presentation
- Interactive dashboard
- PDF report generated from markdown or LaTeX
This must be fully human-reviewed.
📦 Submission: reproducible research, presentation, reflection
- Who: by group (the same team you started with in Session 1), plus an individual note.
- What:
Presentation file (HTML, PDF, or dashboard link)
GitHub repo that allows reproduction (see Reproducible Research).
- Ingests raw data and reproduces all figures and tables in a clear workflow
- HINT: re-running the scraping and AI classification is not needed; treat the classified data as your raw input.
`METHODS-AI.md` — where and how AI was used
INDIVIDUAL reflection (3 paragraphs per person): (1) the hardest part and how you solved it, (2) the most important or notable learning experience, and (3) how you felt working in the group and whether you would aim for a different work process next time.
8. Further resources
For going deeper on modern DiD, two canonical stops:
- Pedro Sant’Anna — DiD resources — the maintained hub for modern difference-in-differences. Papers, code, slides, videos, curated by one of the authors of the Callaway–Sant’Anna estimator. Start here for anything staggered.
- Carlos Mendez — DiD in Python — a worked Python tutorial walking through classic and event-study / staggered DiD on real data. Good companion to the
`csdid` / `pyfixest` pointers above.