Data Analysis with AI: Prompting

Prompting for Data Analysis

Gábor Békés

2025-01-01

Prompting strategies

Prompting

  • It is helpful to think carefully about how you write prompts

  • But it’s not magic: just focus on defining what you want.

  • Think of it as giving instructions to someone else (the research assistant, or RA, approach)

Prompting for Effective LLM Use

  • An AI model is a prediction machine: more input yields better predictions.
  • Clear communication with LLMs is crucial for accurate results
  • Specific, structured prompts lead to better outputs
  • Different strategies for different analytical needs

Six Key Prompting Strategies

  1. Write clear instructions
  2. Provide reference materials
  3. Break complex tasks into subtasks
  4. Step-by-step reasoning
  5. Use external tools
  6. Test systematically

Source: OpenAI guide to prompt engineering

1. Clear Instructions

General Principles

  • Be specific about style and requirements
  • Include relevant context and constraints
  • Use example formats when needed

Data Analysis Applications 📊

  • Define audience (academic/business/public)
  • Define desired statistical approaches
  • Clarify output format (tables, graphs, reports)

1. Clear Instructions: Implementation

General Tactics ⚡

  • Adopt personas
  • Specify output format explicitly
  • Define scope clearly
  • Ask for specific extensions

Example Prompts 💡

  • “Write a report for government officials”
  • “Format correlation matrices as heatmaps”
  • “Show full statistical tests including p-values”
  • “Report extreme values and influential points”
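The tactics above (persona, audience, explicit output format, specific requirements) can be combined in one structured prompt. The following is a minimal sketch, not a fixed recipe: the function name `build_prompt`, the persona, and the wage-report task are hypothetical and only illustrate the pattern.

```python
def build_prompt(task, persona, audience, output_format, requirements):
    """Assemble a structured prompt from explicit instruction parts."""
    lines = [
        f"You are {persona}.",
        f"Task: {task}",
        f"Audience: {audience}",
        f"Output format: {output_format}",
        "Requirements:",
    ]
    # One bullet per requirement keeps each instruction easy to check later.
    lines += [f"- {item}" for item in requirements]
    return "\n".join(lines)


# Hypothetical task and persona, chosen only for illustration.
prompt = build_prompt(
    task="Write a report on regional wage differences.",
    persona="an experienced applied economist",
    audience="government officials",
    output_format="short report with one table and one heatmap",
    requirements=[
        "Format correlation matrices as heatmaps",
        "Show full statistical tests including p-values",
        "Report extreme values and influential points",
    ],
)
print(prompt)
```

Keeping each part as a separate argument makes it easy to change the audience or the output format without rewriting the rest of the prompt.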

1. Clear Instructions: A comment

  • High-level knowledge is important
    • to guide the process
    • to check for errors in execution

2. Provide References

General Tactics ⚡

  • Share relevant documentation
  • Include descriptions
  • Require citations

Data Analysis Applications 📊

  • Reference statistical methodology
  • Share domain context and assumptions
  • Provide data quality metrics

2. Provide References: Tactics

General Tactics ⚡

  • Insert documentation at start
  • Include sample outputs

Data Analysis Tactics 📊

  • Link/upload statistical papers or paragraphs for methods
  • Share data dictionaries and variable descriptions
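One way to share a data dictionary is to turn it into a reference block placed at the start of the prompt. This is a sketch under stated assumptions: the variables (`price`, `distance`, `stars`), their descriptions, and the helper `reference_block` are all hypothetical.

```python
# Hypothetical data dictionary: variable name -> (type, description).
data_dictionary = {
    "price": ("float", "Hotel price per night in USD"),
    "distance": ("float", "Distance from city center in km"),
    "stars": ("int", "Official star rating, 1 to 5"),
}


def reference_block(dictionary, method_note):
    """Format variable descriptions and a method note as reference text."""
    lines = ["Reference material (cite variable names when you use them):",
             "Variables:"]
    for name, (dtype, description) in dictionary.items():
        lines.append(f"- {name} ({dtype}): {description}")
    lines.append(f"Method: {method_note}")
    return "\n".join(lines)


# The documentation goes first, the question goes last.
reference = reference_block(data_dictionary, "OLS with robust standard errors")
prompt = reference + "\n\nQuestion: How is price related to distance?"
print(prompt)
```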

2. Provide References: A comment

  • 2025-02 vs 2024-07: providing references works noticeably better
  • Larger context windows and fewer hallucinations made this much more reliable

3. Break Complex Tasks

General Tactics ⚡

  • Divide into logical steps
  • Build complexity gradually
  • Validate intermediate steps

Data Analysis Applications 📊

  • Cleaning, EDA, Analysis
  • Cross-sectional OLS, panel fixed effects (FE), event-time analysis
  • Does an OLS regression of \(y\) on \(x\) make sense?
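Before moving on to panel or event-time models, a first subtask can be a plain OLS of \(y\) on \(x\) that is small enough to check by hand. The sketch below uses the textbook formulas (slope = S_xy / S_xx, intercept = mean of y minus slope times mean of x); the data points are made up for illustration.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, roughly linear with slope near 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_fit(x, y)
print(round(intercept, 2), round(slope, 2))
```

If the sign or size of the slope is implausible at this stage, that is a signal to stop and revisit the data before asking for a more complex model.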

3. Break Complex Tasks: Tactics

General Tactics ⚡

  • Create checkpoints
  • Request validations
  • Chain related tasks

Data Analysis Tactics 📊

  • Verify distributions before tests
  • Check assumptions step by step
  • Build analysis pipeline incrementally
  • Validate transformations at each stage
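Building a pipeline incrementally with a checkpoint after each stage can be sketched as follows. This is an illustration only: `run_pipeline`, the stage names, and the raw values are hypothetical, and the transforms stand in for code that an AI assistant might produce one step at a time.

```python
def run_pipeline(data, stages):
    """Run stages in order; stop at the first failed checkpoint."""
    log = []
    for name, transform, check in stages:
        data = transform(data)
        if not check(data):
            log.append(f"{name}: FAILED")
            break
        log.append(f"{name}: ok")
    return data, log


raw = [12.0, None, 15.5, -1.0, 14.2]  # hypothetical raw values

stages = [
    # Cleaning: drop missing values; checkpoint: none remain.
    ("drop_missing",
     lambda xs: [x for x in xs if x is not None],
     lambda xs: all(x is not None for x in xs)),
    # Cleaning: drop impossible negatives; checkpoint: all values positive.
    ("drop_negative",
     lambda xs: [x for x in xs if x > 0],
     lambda xs: all(x > 0 for x in xs)),
    # Before analysis: confirm enough observations are left.
    ("enough_data",
     lambda xs: xs,
     lambda xs: len(xs) >= 3),
]

clean, log = run_pipeline(raw, stages)
print(clean)
print(log)
```

Stopping at the first failed check keeps an early cleaning error from silently propagating into the analysis stage.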

4. Step-by-Step Reasoning

General Tactics ⚡

  • Request step-by-step reasoning
  • Explain assumptions
  • Validate interim results

Data Analysis Tactics 📊

  • Show data exploration process
  • Justify method selection
  • Document assumption checks
  • Explain statistical decisions
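A request for step-by-step reasoning can be attached to any question as a numbered list. A minimal sketch, assuming the four steps mirror the tactics above; the helper `with_reasoning` and the sample question are hypothetical.

```python
REASONING_STEPS = [
    "Describe the data exploration you did",
    "State and justify the method you chose",
    "List each assumption and how you checked it",
    "Explain every statistical decision before giving the result",
]


def with_reasoning(question, steps=REASONING_STEPS):
    """Append numbered reasoning steps to a question."""
    numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join([question, "Work step by step:"] + numbered)


# Hypothetical question, used only to show the resulting prompt.
prompt = with_reasoning("Is hotel price related to distance from the center?")
print(prompt)
```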

5. External Tools

General Tactics ⚡

  • Use code execution (Python)
  • Use interactive tools like /canvas
  • Share full folders, as in Claude Projects

Data Analysis Tactics 📊

  • Ask for a direct solution
  • Upload for code stack

6. Test Systematically

General Tactics ⚡

  • Validate against known results
  • Check validity
  • Compare methods

Data Analysis Tactics 📊

  • Cross-validation
  • Benchmark datasets
  • Statistical assumption verification
  • Sensitivity analysis
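Validating against known results and running a sensitivity analysis can both be done with a few lines of independent code. The sketch below assumes a hypothetical sample and a hypothetical mean copied from an AI answer; it recomputes the statistic and then drops each observation in turn.

```python
import statistics


def matches_known(reported, data, tol=1e-6):
    """Check a reported mean against an independent recomputation."""
    return abs(reported - statistics.fmean(data)) <= tol


def leave_one_out_means(data):
    """Sensitivity check: the mean after dropping each observation in turn."""
    return [statistics.fmean(data[:i] + data[i + 1:]) for i in range(len(data))]


data = [2.0, 4.0, 6.0, 8.0, 100.0]  # hypothetical sample with one extreme value
reported_mean = 24.0                # hypothetical value copied from an AI answer

print(matches_known(reported_mean, data))
print(leave_one_out_means(data))
```

In this made-up sample the mean falls sharply when the last observation is dropped, which is the kind of influential point worth reporting alongside the headline result.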