Data Analysis with AI 02: Prompting

From Prompting to Context Engineering

Gábor Békés (CEU)

2026-01-19

The AI course

This slideshow is part of my data analysis with AI material.

Check out the course website gabors-data-analysis.com/ai-course/

About me and this slideshow

  • I am an economist, not an AI developer, expert, guru, or evangelist
  • I am an active AI user in teaching and research
  • I teach a series of Data Analysis courses based on my textbook
    • This project is closely related to concepts and material in the book, but it can also be used on its own.
  • This slideshow was created to help students and instructors doing data analysis in education, research, public policy, or business
  • Enjoy.

From Prompting to Context Engineering

The Big Picture: 2024 → 2026

  • 2024: “Prompt engineering” was the key skill
  • 2025: Rise of “context engineering”
  • 2026: Context engineering + agentic workflows

What changed?

  • Models became better at following instructions (literally!)
  • Longer context windows (200k → 1M tokens)
  • Tool use and agentic capabilities became standard

What is Context Engineering?

“The delicate art and science of filling the context window with just the right information for the next step.” — Andrej Karpathy

It’s not just about which words to use, but about which configuration of context will produce the desired behavior.

Context = More than Prompts

Prompting (old view)

  • System prompt
  • User message
  • Maybe a few examples

Context Engineering (new view)

  • Instructions & constraints
  • Reference materials
  • Memory / history
  • Tool definitions
  • Retrieved documents (RAG)
  • Environmental state

Core Prompting Strategies (Updated)

Prompting Still Matters

  • Prompting is the foundation of context engineering
  • Think of it as giving instructions to someone else, like a research assistant (= the RA approach)
  • But modern models are more literal — they do exactly what you ask

Key Change: Precise Instruction Following

Claude 4.x and GPT-4.1+ models:

  • Take instructions literally
  • Don’t “go above and beyond” unless asked
  • If you say “suggest changes” → they suggest, not implement

Implication: Be more explicit about what you want.

“Create a dashboard” → might give minimal output

“Create a fully-featured analytics dashboard with filters, charts, and interactive elements” → what you probably wanted

Six Key Prompting Strategies

  1. Write clear instructions
  2. Provide reference materials
  3. Break complex tasks into subtasks
  4. Request step-by-step reasoning
  5. Use external tools
  6. Test systematically

Source: OpenAI guide to prompt engineering

Still valid, but implementation has evolved.

1. Clear Instructions

General Principles

  • Be specific about style and requirements
  • Include relevant context and constraints
  • Use example formats when needed
  • New: Explain why — motivation helps

Data Analysis Apps 📊

  • Define audience (academic/business/public)
  • Define desired statistical approaches
  • Clarify output format (tables, graphs, reports)
  • Specify level of detail expected

1. Clear Instructions: Implementation

General Tactics ⚡

  • Adopt personas
  • Specify output format explicitly
  • Define scope clearly
  • Ask for specific extensions
  • Use XML tags for structure

Example Prompts 💡

  • “Write a report for government officials”
  • “Format correlation matrices as heatmaps”
  • “Show full statistical tests including p-values”
  • “Go beyond the basics — include sensitivity checks”

1. Clear Instructions: XML Tags

Claude specifically benefits from XML tags to structure inputs:

<context>
You are analyzing panel data on firm productivity.
</context>

<data_description>
Variables: firm_id, year, revenue, employees, industry
Panel: 2010-2020, N=5000 firms
</data_description>

<task>
Estimate a fixed effects regression with year dummies.
Report results in a publication-ready table.
</task>

2. Provide References

General Tactics ⚡

  • Share relevant documentation
  • Include descriptions
  • Require citations
  • Much improved vs 2024

Data Analysis Apps 📊

  • Reference statistical methodology
  • Share domain context and assumptions
  • Provide data quality metrics
  • Upload data dictionaries

2. Provide References: What Changed

  • 2024: Limited context, frequent hallucinations
  • 2026: 200k-1M token contexts, much better grounding

New best practice: Upload full documentation

  • Entire codebooks
  • Methodology sections from papers
  • Sample outputs you want to replicate
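
For example, a grounded request might look like this (file names illustrative):

I have uploaded codebook.pdf and the methodology section of our
working paper. Using only these documents: (1) list the variables
needed for the wage regression, (2) quote the codebook definition
of each, and (3) flag anything the paper uses that the codebook
does not define.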

3. Break Complex Tasks

General Tactics ⚡

  • Divide into logical steps
  • Build complexity gradually
  • Validate intermediate steps

Data Analysis Applications 📊

  • Cleaning → EDA → Analysis
  • Cross-sectional OLS → panel FE → event study
  • Does a simple OLS of y on x make sense?

3. Break Complex Tasks: Why Still Important

Even with massive context windows, breaking tasks helps:

  • Not because of context limits
  • But because focused tasks produce higher quality
  • Easier to catch errors at each step
  • Better for iteration

4. Step by Step Reasoning

General Tactics ⚡

  • Request step-by-step reasoning
  • Explain assumptions
  • Validate interim results

Data Analysis Tactics 📊

  • Show data exploration process
  • Justify method selection
  • Document assumption checks
  • Explain statistical decisions
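
A prompt along these lines (wording illustrative) makes the plan visible before any code is run:

Before estimating anything, explain your plan step by step:
1. Which estimator do you propose, and why?
2. Which assumptions does it require, and how will we check them?
3. What output will you report?
Wait for my confirmation before writing code.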

4. Step by Step: Reasoning Models

Big development: Dedicated reasoning models

  • OpenAI o3, o4-mini
  • Claude with extended thinking
  • Gemini 3 with thinking levels

These models “think before responding” — internal chain of thought

5. External Tools

General Tactics ⚡

  • Use code execution (Python, R)
  • Interactive tools: canvas, artifacts
  • Project folders for context
  • MCP for tool integration

Data Analysis Tactics 📊

  • Ask for a direct solution with code execution
  • Upload data files
  • Use web search for recent methods
  • Connect to databases via MCP
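
With code execution enabled, ask for verified results rather than untested code. For example (file name illustrative):

Here is sales.csv [upload]. Load it, report summary statistics
and missing values, then plot monthly revenue. Run the code and
show the actual output, not just the script.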

6. Test Systematically

General Tactics ⚡

  • Validate against known results
  • Check validity
  • Compare methods

Data Analysis Tactics 📊

  • Cross-validation
  • Benchmark datasets
  • Statistical assumption verification
  • Sensitivity analysis
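
For instance, a systematic-testing prompt might look like this (variable names illustrative):

Re-run the regression with these checks: plot residuals against
fitted values, test for heteroskedasticity, drop X2 as a
sensitivity check, and report whether the coefficient on X1 is
stable across specifications.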

What’s New in 2025-2026

Model-Specific Prompting

Different models, different strategies:

Model            Key insight
Claude 4.x       Precise instruction following; use XML tags
GPT o3/o4-mini   Reasoning automatic; state objective + format
GPT-4.1/5        Classic prompting; break into subtasks
Gemini 3         Keep it simple; use default temperature

Reasoning Models: o3 and o4-mini

Released April 2025. Key features:

  • Internal chain-of-thought (you don’t see it)
  • Better at math, coding, multi-step logic
  • Can “think with images”
  • Less need for “think step-by-step” hacks

Prompting tip: State objective + output format. Reasoning is automatic.
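
A minimal sketch in Python, assuming the OpenAI SDK’s Responses API (model name, effort level, and prompt are illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# With a reasoning model, state the objective and the output format;
# the step-by-step reasoning happens internally.
response = client.responses.create(
    model="o4-mini",                 # illustrative model choice
    reasoning={"effort": "medium"},  # low / medium / high
    input="Plan a difference-in-differences analysis of a minimum "
          "wage change. Output: a numbered plan, then R code.",
)
print(response.output_text)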

Claude 4.x Best Practices

From Anthropic’s guidance:

  1. Be explicit — models take you literally
  2. Use XML tags — structure your inputs
  3. Provide motivation — explain why you want something
  4. Request verbosity — model is more concise by default
  5. Extended thinking — enable for complex problems

Gemini 3 Best Practices

Released November 2025. Google’s guidance:

  1. Simplify prompts — Gemini 3 rewards clarity over elaborate scaffolding
  2. Keep temperature at 1.0 — reasoning optimized for default; lower values can cause loops
  3. Use thinking levels — LOW/MEDIUM/HIGH instead of chain-of-thought hacks
  4. Put directive after input — place instructions after the data block
  5. Use grounding for facts — connect to search for current information
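
A minimal sketch with the google-genai Python SDK; thinking_level follows the guidance above, but the exact field and model names are assumptions and may differ:

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

codebook = open("codebook.txt").read()  # illustrative input file
directive = "Summarize the key variables in a two-column list."

response = client.models.generate_content(
    model="gemini-3-pro-preview",            # illustrative model name
    contents=codebook + "\n\n" + directive,  # directive after the data block
    config=types.GenerateContentConfig(
        temperature=1.0,  # keep the default
        thinking_config=types.ThinkingConfig(thinking_level="high"),
    ),
)
print(response.text)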

Gemini 3: Unique Strengths

What Gemini 3 does especially well:

  • Massive context: 1M tokens standard (2M in some tiers)
  • Multimodal reasoning: Excellent on image + text tasks
  • Structured output: Infers JSON/table formats with minimal cues
  • Cost efficiency: Flash tier very cheap for high-volume tasks

For data analysis: Great for document analysis, long codebooks, and visual data (charts, graphs)

Extended Thinking

Claude and other models now support “extended thinking”:

  • Model explicitly reasons through problem
  • Visible chain of thought
  • Much better for complex analysis

When to use: Complex reasoning, multi-step problems, data analysis decisions

Cost: Higher latency, more tokens
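
A minimal sketch with the Anthropic Python SDK (model name, budgets, and prompt are illustrative):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=4096,
    # Extended thinking: give the model an explicit reasoning budget
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{
        "role": "user",
        "content": "Should I cluster standard errors by firm or by "
                   "industry in this panel? Reason it through first.",
    }],
)

# The reply interleaves visible thinking blocks with the final answer
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)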

Agentic Workflows

Models can now:

  • Execute code directly
  • Use multiple tools in sequence
  • Browse the web
  • Read/write files
  • Operate autonomously for longer tasks

This changes how we work with AI.

Agentic Coding Tools

Command line tools:

  • Claude Code
  • OpenAI Codex

IDEs (Integrated Development Environments):

  • GitHub Copilot: Seamless integration in VS Code
  • Cursor: AI-native code editor
  • Antigravity: Google’s IDE with generous quota limits

See Claude Code Best Practices

Context Engineering in Practice

For data analysis, think about:

  1. System context: Your role, constraints, style
  2. Data context: Variable descriptions, data dictionary
  3. Method context: Statistical approach, assumptions
  4. Output context: Format, audience, length
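
Putting the four layers together, mirroring the XML-tag style above (all details illustrative):

<system_context>
Academic economics research; concise, code-first answers in R.
</system_context>

<data_context>
Firm-year panel, 2010-2020; see the attached data dictionary.
</data_context>

<method_context>
Fixed effects OLS, standard errors clustered by firm.
</method_context>

<output_context>
Publication-ready table, 3 columns, plus a short interpretation.
</output_context>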

Context Rot and Pollution

Long conversations degrade performance:

  • Context rot: Performance degrades as context fills
  • Context pollution: Irrelevant info distracts model
  • Context confusion: Model loses track of instructions

Solution: Start fresh for new tasks, use memory tools

Next-level solution: Claude Code’s auto-compact keeps your sessions running indefinitely by automatically summarizing the conversation when you approach the context limit.

Coming in Week 03: System Prompts & Skills

We’ll dive deeper into:

  • Building custom system prompts for data analysis
  • Creating Skills (Claude) and Gems (Gemini)
  • Reusable prompt templates for common tasks
  • Project-level context management

See Week 03: System Prompts and Skills

Practical Advice for Data Analysis

Iteration Over One-Shot

More important than ever:

  1. Start with clear objective
  2. Get initial output
  3. Refine with specific feedback
  4. Validate results
  5. Document process

Example Workflow: Regression Analysis

Turn 1: "Here's my data [upload]. Describe the variables 
        and check for issues."

Turn 2: "Run OLS of Y on X1, X2 with robust SEs. 
        Show diagnostics."

Turn 3: "Now add fixed effects. Compare results."

Turn 4: "Create publication-ready table. 
        AER style, 3 columns."

What to Put in System Prompts

For data analysis work:

You are helping with academic research in economics. 

Preferences:
- R with tidyverse
- viridis color scheme
- Publication-quality output
- Show code and explain decisions
- Flag potential issues proactively

Current project: [brief description]

More on this in Week 03

Should We Say Please/Thank You?

Answer: Doesn’t matter for output quality. Do what feels natural.

Summary

Key Takeaways

  1. Context engineering > prompt engineering
  2. Modern models are more literal — be explicit
  3. Reasoning models handle complexity automatically
  4. Tools and agents expand what’s possible
  5. Iteration remains essential
  6. Structure your inputs (XML tags, clear sections)

Resources

Date stamp

This version: 2026-01-19 (v0.6.0)

Previous versions: v0.3.2 (2025-05-28), v0.1.2 (2025-04-21)

bekesg@ceu.edu