Data Analysis with AI 02: Prompting

Prompting for Data Analysis

Gábor Békés (CEU)

2025-05-28

The AI course

This slideshow is part of my data analysis with AI material.

Check out the course website gabors-data-analysis.com/ai-course/

About me and this slideshow

  • I am an economist, not an AI developer, expert, guru, or evangelist
  • I am an active AI user in teaching and research
  • I teach a series of data analysis courses based on my textbook
    • This project is closely related to concepts and material in the book, but can be consumed alone.
  • This slideshow was created to help students and instructors active in data analysis in education, research, public policy or business
  • Enjoy.

Prompting strategies

Prompting

  • It is helpful to think carefully about prompts

  • But it’s not magic: just focus on defining what you want.

  • Think of it as giving instructions to someone else (= the RA approach: treat the AI as a research assistant)

Prompting for Effective LLM Use

  • An AI model is a prediction machine: more relevant input yields better predictions.
  • Clear communication with LLMs is crucial for accurate results
  • Specific, structured prompts lead to better outputs
  • Different strategies for different analytical needs

Six Key Prompting Strategies

  1. Write clear instructions
  2. Provide reference materials
  3. Break complex tasks into subtasks
  4. Step-by-step reasoning
  5. Use external tools
  6. Test systematically

Source: OpenAI guide to prompt engineering

1. Clear Instructions

General Principles

  • Be specific about style and requirements
  • Include relevant context and constraints
  • Use example formats when needed

Data Analysis Apps 📊

  • Define audience (academic/business/public)
  • Define desired statistical approaches
  • Clarify output format (tables, graphs, reports)

1. Clear Instructions: Implementation

General Tactics ⚡

  • Adopt personas
  • Specify output format explicitly
  • Define scope clearly
  • Ask for specific extensions

Example Prompts 💡

  • “Write a report for government officials”
  • “Format correlation matrices as heatmaps”
  • “Show full statistical tests including p-values”
  • “Report extreme values and influential points”
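The tactics and example prompts above can be combined into one structured prompt. A minimal sketch, assuming you assemble prompts as plain text in Python before pasting them into a chat or sending them through an API; `build_prompt` and its parameters are hypothetical, and the task is only an illustration.

```python
def build_prompt(persona, audience, task, output_format, scope, extensions=()):
    """Hypothetical helper: assemble a prompt from the 'clear instructions' parts."""
    lines = [
        f"You are {persona}.",              # adopt a persona
        f"Audience: {audience}.",           # who will read the output
        f"Task: {task}",                    # what to do
        f"Output format: {output_format}",  # specify the format explicitly
        f"Scope: {scope}",                  # define the scope clearly
    ]
    # Specific extensions, one per line
    for ext in extensions:
        lines.append(f"Also: {ext}")
    return "\n".join(lines)


prompt = build_prompt(
    persona="an applied economist",
    audience="government officials with no statistics training",
    task="Summarize how hotel prices relate to distance from the city center.",
    output_format="a one-page report with one table and one graph",
    scope="Use only the uploaded dataset; do not add outside data.",
    extensions=[
        "Show full statistical tests including p-values.",
        "Report extreme values and influential points.",
    ],
)
print(prompt)
```

Writing the parts as separate arguments makes it harder to forget one of them, such as the audience or the output format.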

1. Clear Instructions: A comment

  • High-level knowledge is important
    • to guide the process
    • to check for errors in execution

2. Provide References

General Tactics ⚡

  • Share relevant documentation
  • Include descriptions of the data and context
  • Require citations

Data Analysis Apps 📊

  • Reference statistical methodology
  • Share domain context and assumptions
  • Provide data quality metrics

2. Provide References: Tactics

General Tactics ⚡

  • Insert documentation at the start of the prompt
  • Include sample outputs

Data Analysis Tactics 📊

  • Link or upload statistical papers, or paste the paragraphs that describe the methods
  • Share data dictionaries and variable descriptions
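A data dictionary is easy to generate and paste into a prompt. A minimal sketch using only the Python standard library, assuming rows come as dicts of strings (as `csv.DictReader` returns them); `describe_columns` is a hypothetical helper, the rows are made-up toy data, and the numeric/text type guess is deliberately crude.

```python
from collections import Counter


def describe_columns(rows, descriptions):
    """Hypothetical helper: return a plain-text data dictionary.

    rows: list of dicts with string values, e.g. from csv.DictReader.
    descriptions: {column name: human-readable meaning}.
    """
    lines = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        present = [v for v in values if v not in ("", None)]
        missing = len(values) - len(present)
        # Crude type guess: numeric if every non-missing value parses as a float
        try:
            for v in present:
                float(v)
            kind = "numeric"
        except (TypeError, ValueError):
            kind = "text"
        # Up to three most frequent values as examples
        examples = ", ".join(str(v) for v, _ in Counter(present).most_common(3))
        meaning = descriptions.get(col, "(no description)")
        lines.append(f"- {col} [{kind}, {missing} missing]: {meaning}. Examples: {examples}")
    return "\n".join(lines)


# Made-up toy rows; in practice, read them with csv.DictReader
rows = [
    {"price": "120", "distance_km": "0.5", "district": "center"},
    {"price": "85", "distance_km": "2.1", "district": "outer"},
    {"price": "", "distance_km": "3.0", "district": "outer"},
]
descriptions = {
    "price": "nightly price in EUR",
    "distance_km": "distance to the city center in km",
}
data_dictionary = describe_columns(rows, descriptions)
print(data_dictionary)
```

The "(no description)" entry for `district` flags a variable you still need to explain before the AI can use it sensibly.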

3. Break Complex Tasks

General Tactics ⚡

  • Divide into logical steps
  • Build complexity gradually
  • Validate intermediate steps

Data Analysis Applications 📊

  • Cleaning, EDA, Analysis
  • Cross-sectional OLS, panel FE, event-time designs
  • Does an OLS regression of \(y\) on \(x\) make sense?

3. Break Complex Tasks: Tactics

General Tactics ⚡

  • Create checkpoints
  • Request validations
  • Chain related tasks

Data Analysis Tactics 📊

  • Verify distributions before tests
  • Check assumptions step by step
  • Build analysis pipeline incrementally
  • Validate transformations at each stage
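These tactics can be sketched as a small pipeline with explicit checkpoints, the kind of structure you can ask the AI to build one stage at a time. A minimal sketch in plain Python, assuming made-up toy data; `check` is a hypothetical helper, and the specific checks (including the arbitrary 25% row-loss threshold) are examples, not a complete list.

```python
import math


def check(name, condition):
    """Hypothetical checkpoint helper: stop with a clear message if a validation fails."""
    if not condition:
        raise ValueError(f"Checkpoint failed: {name}")
    print(f"OK: {name}")


# Made-up raw data with one missing price
raw = [
    {"price": "120", "distance_km": "0.5"},
    {"price": "85", "distance_km": "2.1"},
    {"price": "", "distance_km": "3.0"},
    {"price": "95", "distance_km": "1.2"},
]

# Stage 1: cleaning (drop rows with a missing price, convert strings to numbers)
clean = [
    {"price": float(r["price"]), "distance_km": float(r["distance_km"])}
    for r in raw
    if r["price"] != ""
]
check("lost at most 25% of rows (arbitrary threshold)", len(clean) >= 0.75 * len(raw))
check("all prices are positive", all(r["price"] > 0 for r in clean))

# Stage 2: transformation (log price), validated before any analysis
for r in clean:
    r["ln_price"] = math.log(r["price"])
check("log prices are finite", all(math.isfinite(r["ln_price"]) for r in clean))

# Stage 3: pre-analysis check; an OLS of y on x needs variation in x
distances = [r["distance_km"] for r in clean]
check("distance varies across observations", max(distances) > min(distances))
```

Because `check` raises an error, a later stage runs only if the earlier checkpoints passed, so a problem shows up at the stage that introduced it rather than in the final regression.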

4. Step-by-Step Reasoning

General Tactics ⚡

  • Request step-by-step reasoning
  • Explain assumptions
  • Validate interim results

Data Analysis Tactics 📊

  • Show data exploration process
  • Justify method selection
  • Document assumption checks
  • Explain statistical decisions

5. External Tools

General Tactics ⚡

  • Use code execution (Python)
  • Use interactive tools like /canvas
  • Share full folders, e.g. via Claude Projects

Data Analysis Tactics 📊

  • Ask for a direct solution
  • Upload your code stack

6. Test Systematically

General Tactics ⚡

  • Validate against known results
  • Check validity
  • Compare methods

Data Analysis Tactics 📊

  • Cross-validation
  • Benchmark datasets
  • Statistical assumption verification
  • Sensitivity analysis
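Validating against known results and sensitivity analysis can be sketched in a few lines: before trusting AI-written estimation code, run it on simulated data where the true answer is known, then check how much a single extreme observation moves the estimate. A minimal sketch using only the Python standard library; `ols_slope` is a hypothetical helper, and the data are simulated, not real.

```python
import random


def ols_slope(x, y):
    """Hypothetical helper: slope from a simple OLS regression of y on x, cov(x, y) / var(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# 1) Validate against a known result: simulated data with a true slope of 2
random.seed(42)
x = [random.uniform(0, 10) for _ in range(1000)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
estimate = ols_slope(x, y)
print(f"true slope: 2.000, estimated: {estimate:.3f}")

# 2) A simple sensitivity check: add one extreme, influential observation and compare
with_extreme = ols_slope(x + [10.0], y + [200.0])
print(f"with one extreme point: {with_extreme:.3f}")
```

If the first estimate is far from 2, suspect the code rather than the data. A noticeable shift in the second line is a cue to report extreme values and influential points.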

2025 May update

Very useful advice from Anthropic following the release of Claude 4

  • Some generic advice, similar to OpenAI's, such as being explicit and defining a target audience
  • Some specific advice, such as using XML tags

Source: Claude 4 best practices
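Anthropic's advice includes using XML tags to separate the parts of a prompt. A minimal sketch of how this might look when the prompt is built in Python; the helper `tag` and the tag names (`context`, `data_dictionary`, `instructions`, `output_format`) are illustrative choices, not required names.

```python
def tag(name, content):
    """Hypothetical helper: wrap content in an XML-style tag, e.g. <data>...</data>."""
    return f"<{name}>\n{content.strip()}\n</{name}>"


data_dictionary = """
- price: nightly price in EUR
- distance_km: distance to the city center in km
"""

prompt = "\n\n".join([
    tag("context", "I am analyzing hotel prices for a short policy brief."),
    tag("data_dictionary", data_dictionary),
    tag("instructions", "Regress price on distance_km. Report the slope with its standard error."),
    tag("output_format", "A markdown table followed by at most three sentences."),
])
print(prompt)
```

Tagging each part lets you refer to it by name in later instructions, for example "use only the variables listed in data_dictionary".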

Update: What is new in 2025 (2025-04-21)

Provide References: A comment

  • 2025-02 vs 2024-07: Improved
  • Larger context windows and fewer hallucinations made this much better

Agentic models

  • Various models “collaborate” and offer feedback
  • Can do more on their own

Reasoning models (OpenAI ChatGPT o3): prompting tips

GPT o3

  • State objective + output format once; reasoning is automatic.
  • Less need for “think step‑by‑step” hacks.
  • Ask explicitly for citations / brevity / tables if required.

vs

GPT-4.5

  • Classic “let’s think step‑by‑step” improves depth.
  • Break large tasks into numbered subtasks to stay within context.
  • Be explicit: “Give only runnable Python; no commentary” to control style.

Agentic coding – Claude

Should we say thank you?

Date stamp

This version: 2025-04-21 (minor edits from v0.1.2)

bekesg@ceu.edu