From Prompting to Context Engineering
2026-02-22
This slideshow is part of my data analysis with AI material.
Check out the course website gabors-data-analysis.com/ai-course/
What changed?
“The delicate art and science of filling the context window with just the right information for the next step.” — Andrej Karpathy
Context engineering is not only about wording prompts. It is about choosing the right combination of instructions, data, tools, and constraints for the task.
One motivating specification is:
Why this specification?
Recommended protocol for empirical tasks:
What would you add?
You are assisting with empirical economics research.
Objective:
[one-sentence goal]
Data context:
- Dataset: [name]
- Key variables: [list]
- Unit of observation: [e.g., individual]
Method constraints:
- Preferred language: R
- Required method: [OLS + robustness checks]
- Must report: robust SE, N, R^2, key diagnostics
Output format:
- Publication-ready table
- Brief interpretation + limitations
Quality bar:
- Explain assumptions
- Include explicit validation checks
Core principle: modern models are literal.
Prompt example:
If not specified
<context>
You are analyzing cross-sectional wage data.
</context>
<data_description>
Variables: wage, education, experience, female, age, industry
Sample: workers in 2025 survey, N=50,000
</data_description>
<task>
Estimate OLS of log(wage) on education and experience with robust SEs.
Report a publication-ready table and diagnostics.
</task>
Using XML tags in system prompts offers several practical benefits:
Use XML when your prompt is complex – the structure pays off in reliability and maintainability.
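The analysis requested in the `<task>` block above could be sketched in R roughly as follows. This is a minimal sketch, not the course's reference implementation: the data frame is simulated here as a stand-in for the survey described in `<data_description>`, and it assumes the `sandwich` and `lmtest` packages for robust standard errors.

```r
library(sandwich)  # heteroskedasticity-robust variance estimators
library(lmtest)    # coeftest() for reporting coefficients with a custom vcov

# Simulated stand-in for the 2025 wage survey (hypothetical data)
set.seed(1)
n <- 1000
df <- data.frame(education = rnorm(n, 13, 2), experience = rnorm(n, 15, 8))
df$wage <- exp(1 + 0.08 * df$education + 0.02 * df$experience + rnorm(n, 0, 0.4))

# OLS of log(wage) on education and experience
ols_wage <- lm(log(wage) ~ education + experience, data = df)

# Report with HC1 robust standard errors
coeftest(ols_wage, vcov = vcovHC(ols_wage, type = "HC1"))

# Key diagnostics to report alongside the table
nobs(ols_wage)
summary(ols_wage)$r.squared
```

A publication-ready table would then be built from these pieces (e.g., with a table package of your choice), which is exactly what the prompt's output-format constraints ask the model to do.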
Use references to ground outputs:
Grounding details:
Best practice now:
Break into checkpoints instead of one giant request.
This is not only about context limits. It is about error localization and better iteration.
Even with large context windows:
Use reasoning requests when method choice is non-trivial.
Ask for:
Example:
Reasoning-capable models are stronger at multi-step logic and coding, but:
Use them for difficult specification choices, then verify with tests and diagnostics.
Do not rely on text-only answers for computational tasks.
Use tools for:
Example:
Every data-analysis pipeline carries two things you can test:
| | Pipe (code & structure) | Water (data & results) |
|---|---|---|
| Nature | Deterministic — pass / fail | Probabilistic — plausible / suspect |
| Failure looks like | Error, warning, wrong type | Implausible sign, odd magnitude, unexpected N |
| Examples | PDF parser crash, mixed types in a column, silent row drops | Negative wage premium for education, SE larger than the estimate, distribution anomaly |
Structural and mechanical checks — if something is broken you should get a clear signal.
These are classic unit-test territory: the answer is yes or no.
Analytical and domain checks — there is a distribution of acceptable results, but you need anchors.
These range from hard constraints (a coefficient must be positive) to softer judgments (is this effect size surprising?).
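The hard-versus-soft distinction can be sketched in R. This is an illustrative sketch with simulated data (the model and the 2-20% "typical range" anchor are assumptions for the example, not course-mandated values): hard constraints fail loudly with `stopifnot()`, while soft anchors only flag a result for inspection.

```r
# Simulated stand-in for a fitted log-wage model
set.seed(1)
edu <- rnorm(500, 13, 2)
lw  <- 1 + 0.08 * edu + rnorm(500, 0, 0.4)
fit <- lm(lw ~ edu)

b <- coef(fit)["edu"]
stopifnot(b > 0)            # hard constraint: a negative return to education fails the run
if (b < 0.02 || b > 0.20) { # soft anchor: flag, don't fail
  warning("Return to education outside the usual 2-20% range; inspect before reporting")
}
```

Hard checks belong in automated tests; soft checks are prompts for human judgment about whether an effect size is surprising.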
Unit tests are yes/no assertions. Most guard the pipe, but the powerful ones also guard the water.
# --- Pipe tests (structure & cleaning) ---
stopifnot(nrow(df_wage) == expected_n_wage)
stopifnot(sum(is.na(df_wage$wage)) == 0)
stopifnot(is.numeric(df_wage$education))
# --- Water tests (results & plausibility) ---
stopifnot(coef(ols_wage)["education"] > 0)
stopifnot(summary(ols_wage)$sigma < 1e6)
stopifnot(summary(ols_wage)$df[2] > 30)
Ask AI to generate both kinds of checks, then run them before interpreting results.
Result: less time on “clever phrasing,” more time on context design and verification design.
Plan mode pattern:
Why it matters:
Long sessions can degrade quality:
Recovery pattern:
Reusable instructions (skills/gems/custom prompts) help with:
Think of them as analysis playbooks for AI workflows.
Next week we extend this framework to:
Project setup:
Project closeout:
Course reference pages:
General documentation:
From Anthropic’s guidance:
From Google’s guidance:
Examples:
Reference links:
Answer: It does not materially change output quality. Use the tone that feels natural.
This version: 2026-02-22 (v0.8.0)
Previous versions: v0.7.2 (2026-02-22), v0.7.1 (2026-02-16), v0.7.0 (2026-02-16), v0.6.0 (2026-01-19), v0.3.2 (2025-05-28), v0.1.2 (2025-04-21)
Gabors Data Analysis with AI – Prompting – 2026-02-23 v0.8.1