Week 4 — Econometrics with AI
AI as research companion: controls, instruments, and DiD
AI as a research companion for causal identification — designing controls, finding instruments, and difference-in-differences

This is the first research-focused week. Here AI acts like a researcher, helping you reason toward causal identification. We cover three tools (control variables, instrumental variables, and difference-in-differences) and the prompting strategies that make AI genuinely useful for each. DiD overlaps with the capstone on purpose, so expect some repetition.
Before you come to class (30–60 min)
✅ Pre-class checklist — this is where the causality recap lives now
Learning objectives
By the end of this unit you will be able to:
- Use a two-session “helpful vs adversarial” prompt pattern to design a control set.
- Use prompt chaining to surface candidate instruments without leaking that you want IVs.
- State the DiD setup and assumptions, and have AI sketch a worked design in your setting.
- Translate a research question into a defensible identification strategy.
Session shape (200 min · 50·50·50·50)
| Chunk | Focus | Mode |
|---|---|---|
| 1 (50) | AI-as-research-companion; framing | Talk |
| 2 (50) | Designing controls (Z) | Talk + run prompts |
| 3 (50) | Instrumental variables | Talk + run prompts |
| 4 (50) | Difference-in-differences + work-together | Talk + group |
Chunk 1 — AI as a research companion (50 min)
🤖 A different way to prompt
Until now AI wrote code. Now we ask it to reason like a researcher. The skill is not asking “what’s my instrument?” — it’s structuring the conversation so the AI reasons from the setting up to the econometrics. Two ideas we’ll use repeatedly:
- Adversarial pairing — one session proposes, a second session (or model) attacks, then you feed the critique back. A debate between a helpful and a suspicious AI.
- Prompt chaining — break a hard question (find an IV) into steps: map relationships first, then narrow to candidates. This steers the model away from generic textbook answers and from the popular examples that dominate its training data.
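The adversarial pairing loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not tied to any chatbot or API: `proposer` and `critic` are hypothetical callables standing in for two separate chat sessions (or two models), and the stubs only echo text so the control flow can be run offline.

```python
from typing import Callable

# A "session" is anything that maps a prompt string to a reply string.
# Hypothetical stand-in: in class this is a chat window, not a function.
Session = Callable[[str], str]


def adversarial_pairing(task: str, proposer: Session, critic: Session) -> dict:
    """Run one propose -> attack -> revise round and return all three replies."""
    proposal = proposer(task)
    critique = critic(
        "Argue against each item in the following proposal:\n" + proposal
    )
    revision = proposer(
        "Below are counter-arguments to your proposal. Revise it.\n" + critique
    )
    return {"proposal": proposal, "critique": critique, "revision": revision}


# Offline stubs (assumption): they echo the first line of what they were sent.
def stub_proposer(prompt: str) -> str:
    return "PROPOSER saw: " + prompt.splitlines()[0]


def stub_critic(prompt: str) -> str:
    return "CRITIC saw: " + prompt.splitlines()[0]


result = adversarial_pairing(
    "Pick controls for my regression.", stub_proposer, stub_critic
)
for step, text in result.items():
    print(step, "->", text)
```

The critic only ever sees the proposal, never the original task framing, which is the point of running it in a separate session. A real proposer session also remembers its first answer when it revises; the stub does not.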
Chunk 2 — Designing controls “Z” (50 min · run prompts)
Running question: Do firms with better management export more of their production? — World Management Survey data, codebook.
😇😈 Helpful vs adversarial, in three prompts
🤖 Prompt 1 (Original LLM). You are a researcher who wants to find control variables to estimate the association between percent of production exported (outcome) and management quality (variable of interest). Attached is a codebook. Choose variables for a multivariate OLS regression. Return each with expected direction of association, then a plain copyable list (one variable per line).
😈 Prompt 2 (Adversarial LLM, separate session). Below is a list of variables someone wants in a regression of percent exported on management quality. Argue, for each, why it should not be included. [paste list]
🤖 Prompt 3 (back to Original). Below are counter-arguments to your selection. Revise and give a final list. [paste critique]
As you go: how do the answers differ from a one-shot prompt? Which proposed controls are bad controls? What got left out?
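To make the "bad control" question concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the data are simulated, the coefficients are made up, and `mediator` is a hypothetical variable that sits on the path from management to exporting (it is not a World Management Survey variable). Controlling for it removes part of the very association you are trying to measure.

```python
import random

random.seed(0)
n = 20_000

# Simulated data (all numbers invented for illustration).
# management affects export directly (0.5) and through a mediator (1.0 * 1.0),
# so the total association built into the data is 1.5.
management = [random.gauss(0, 1) for _ in range(n)]
mediator = [m + random.gauss(0, 1) for m in management]
export = [
    0.5 * m + 1.0 * z + random.gauss(0, 1)
    for m, z in zip(management, mediator)
]


def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


# Short regression: export on management only.
b_short = cov(export, management) / cov(management, management)

# Long regression: export on management AND the mediator,
# using the closed-form OLS solution for two regressors.
s_mm, s_zz = cov(management, management), cov(mediator, mediator)
s_mz = cov(management, mediator)
s_ym, s_yz = cov(export, management), cov(export, mediator)
b_long = (s_ym * s_zz - s_yz * s_mz) / (s_mm * s_zz - s_mz ** 2)

print("without mediator:", round(b_short, 2))
print("with mediator:   ", round(b_long, 2))
```

By construction the first coefficient should land near 1.5 and the second near 0.5. When the adversarial session argues against a control, ask whether its objection is of this kind (the variable is an outcome of management) or something else.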
Chunk 3 — Instrumental variables (50 min · run prompts)
Prompting approach adapted from Sukjin Han (2024), “Mining Causality: AI-Assisted Search for Instrumental Variables”.
🔗 The two-step chain — and why we don’t say “IV”
Why not say “IV”: if you ask for an instrument directly, the model tends to default to textbook instruments (weather, distance to school) regardless of context. The answers are shallow and biased toward popular examples. Instead, describe the setting and let the model reason toward candidates.
🤖 Step 1 (search; targets relevance, REL, and exclusion, EX). You are [agent] who must make a [treatment] decision in [scenario]. What factors determine your decision but do not directly affect [outcome] except through [treatment]? List [K] quantifiable factors and explain each.
🤖 Step 2 (refine; targets independence, IND). Among those factors, choose the ones most likely unassociated with [confounders] of [outcome], while still influencing [treatment]. Explain each.
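The two steps above can be kept as fill-in templates so that every group runs the same chain. This is a minimal sketch: the function names are hypothetical, and the filled-in values (a beer vendor setting) are illustrative guesses, not the wording used in class. The small `leaked_terms` check flags prompts that mention instruments by name.

```python
import re

STEP1 = (
    "You are {agent} who must make a {treatment} decision in {scenario}. "
    "What factors determine your decision but do not directly affect {outcome} "
    "except through {treatment}? List {k} quantifiable factors and explain each."
)
STEP2 = (
    "Among those factors, choose the ones most likely unassociated with "
    "{confounders} of {outcome}, while still influencing {treatment}. "
    "Explain each."
)

# Words that would reveal we are hunting for an instrument.
BANNED = {"iv", "ivs", "instrument", "instruments", "instrumental"}


def build_chain(agent, treatment, scenario, outcome, confounders, k=10):
    """Return the Step 1 and Step 2 prompts with the placeholders filled in."""
    fields = dict(
        agent=agent, treatment=treatment, scenario=scenario,
        outcome=outcome, confounders=confounders, k=k,
    )
    return [STEP1.format(**fields), STEP2.format(**fields)]


def leaked_terms(prompt):
    """Return any banned whole words that appear in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return sorted(words & BANNED)


# Illustrative values only (assumption).
prompts = build_chain(
    agent="a beer vendor",
    treatment="pricing",
    scenario="a beach kiosk market",
    outcome="beer sales",
    confounders="unobserved demand shifters",
    k=10,
)
for p in prompts:
    print(p)
    print("leaks:", leaked_terms(p))
```

Send Step 2 in the same session as Step 1, since "those factors" refers back to the model's own list.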
Worked examples to run together: demand estimation (Copacabana beer price) and peer effects on microfinance adoption. For each: does the AI seem to realise we’re testing something deeper? Are the instruments believable?
Chunk 4 — Difference-in-differences + work-together (50 min)
📐 DiD vocabulary (new) — overlaps the capstone on purpose
- Treatment / treated unit / control unit — the event and who does/doesn’t experience it in a window.
- Two-way fixed effects (unit FE + time FE) — the standard panel starting point.
- Event-study spec — one coefficient per lead/lag around the event, to see the dynamics.
- DiD estimator — the difference between treated and control in the change from pre to post.
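- Parallel trends — the core assumption: absent treatment, treated and control units would have followed the same trend. It is mostly untestable because the untreated path of treated units is never observed after the event; check pre-trends as a proxy.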
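- Staggered treatment — units treated at different times. A naïve two-way fixed effects DiD then uses already-treated units as controls, which can bias the estimate when effects vary across units or over time; ask AI what else can go wrong.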
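For reference, the vocabulary above can be written out in one common notation. The symbols are assumptions introduced here, not taken from the course materials: $i$ indexes units, $t$ periods, $D_{it}$ equals 1 once unit $i$ is treated, $E_i$ is unit $i$'s event date, and $Y_{it}(0)$ is the untreated potential outcome.

```latex
\begin{align*}
% 2x2 DiD estimator: change for treated minus change for control
\hat{\delta}_{\text{DiD}}
  &= \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right)
   - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right) \\[4pt]
% Two-way fixed effects: unit FE + time FE
Y_{it} &= \alpha_i + \lambda_t + \delta D_{it} + \varepsilon_{it} \\[4pt]
% Event study: one coefficient per lead/lag, k = -1 omitted as reference
Y_{it} &= \alpha_i + \lambda_t
   + \sum_{k \neq -1} \beta_k \, \mathbf{1}[t - E_i = k] + \varepsilon_{it} \\[4pt]
% Parallel trends in untreated potential outcomes
\mathbb{E}\!\left[Y_{it}(0) - Y_{i,t-1}(0) \mid \text{treated}\right]
  &= \mathbb{E}\!\left[Y_{it}(0) - Y_{i,t-1}(0) \mid \text{control}\right]
\end{align*}
```

In the event-study line, the lead coefficients ($k < -1$) are the pre-trend check: if they are far from zero, parallel trends is hard to defend.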
Packages to ask AI for worked mini-examples in your setting: Python — diff-diff, pyfixest; R — did, fixest. Read the code, don’t just run it.
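Before reading package output, it helps to have computed a 2x2 DiD by hand once. The sketch below uses only the Python standard library and a toy panel whose numbers are invented for illustration; it is a check on your understanding, not a substitute for the packages listed above.

```python
from statistics import mean

# Toy panel: (unit, group, period, outcome). All numbers are invented.
rows = [
    ("a", "treated", "pre", 10.0), ("a", "treated", "post", 15.0),
    ("b", "treated", "pre", 12.0), ("b", "treated", "post", 17.0),
    ("c", "control", "pre", 8.0), ("c", "control", "post", 10.0),
    ("d", "control", "pre", 9.0), ("d", "control", "post", 11.0),
]


def cell_mean(group, period):
    """Average outcome for one group in one period."""
    return mean(y for _, g, p, y in rows if g == group and p == period)


# Change from pre to post within each group.
treated_change = cell_mean("treated", "post") - cell_mean("treated", "pre")
control_change = cell_mean("control", "post") - cell_mean("control", "pre")

# DiD estimator: difference between the two changes.
did = treated_change - control_change
print(treated_change, control_change, did)  # 5.0 2.0 3.0
```

If AI-generated package code on the same toy data does not return 3.0 for the simple two-group, two-period case, read the code again before trusting it on real data.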
👥 Work together (groups of 3–4)
Research question: What is the effect of migration on wages in the target country? Pin down context (from→to country, industry, period, skill type). Then:
- Design a control set (helpful vs adversarial).
- Build the Step 1 / Step 2 chain to surface an instrument; compare what you’d keep vs what the AI kept, and interrogate the difference.
- Sketch a DiD alternative: what’s the treatment, who are controls, what threatens parallel trends?
Delivery
📦 What to hand in (Sunday 23:55)
- A one-page identification memo for the migration→wages question (or your own RQ): at least one of a proposed control set, an instrument, or a DiD design, with assumptions stated and the main threats named.
- Links to the chats you used, plus 2–3 sentences on which prompting pattern worked best and why.
- No regressions required — this unit is about identification, not estimation.