Data Analysis with AI 02: Prompting
Prompting for Data Analysis
2025-05-28
About me and this slideshow
- I am an economist, not an AI developer, expert, guru, or evangelist
- I am an active AI user in teaching and research
- I teach a series of Data Analysis courses based on my textbook
- This project is closely related to concepts and material in the book, but can be used on its own.
- This slideshow was created to help students and instructors active in data analysis in education, research, public policy or business
- Enjoy.
Prompting
It helps to think carefully about prompts.
But it’s not magic: just focus on defining what you want.
Think of it as giving instructions to someone else (= the research assistant, or RA, approach)
Prompting for Effective LLM Use
- An AI model is a prediction machine. More relevant input yields better predictions.
- Clear communication with LLMs is crucial for accurate results
- Specific, structured prompts lead to better outputs
- Different strategies for different analytical needs
Six Key Prompting Strategies
- Write clear instructions
- Provide reference materials
- Break complex tasks into subtasks
- Step-by-step reasoning
- Use external tools
- Test systematically
Source: OpenAI guide to prompt engineering
1. Clear Instructions
General Principles
- Be specific about style, requirements
- Include relevant context and constraints
- Use example formats when needed
Data Analysis Applications 📊
- Define audience (academic/business/public)
- Define desired statistical approaches
- Clarify output format (tables, graphs, reports)
1. Clear Instructions: Implementation
General Tactics ⚡
- Adopt personas
- Specify output format explicitly
- Define scope clearly
- Ask for specific extensions
Example Prompts 💡
- “Write a report for government officials”
- “Format correlation matrices as heatmaps”
- “Show full statistical tests including p-values”
- “Report extreme values and influential points”
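A minimal sketch of how the tactics above (persona, audience, explicit output format, specific requests) could be assembled into one prompt. The function name `build_prompt` and its arguments are illustrative assumptions, not part of any library; the resulting text would be pasted into whichever chat tool you use.

```python
def build_prompt(persona, audience, task, output_format, requirements):
    """Assemble a prompt from explicit, separately stated parts (hypothetical helper)."""
    lines = [
        f"You are {persona}.",
        f"Audience: {audience}.",
        f"Task: {task}",
        f"Output format: {output_format}",
        "Requirements:",
    ]
    # One bullet per specific request, so nothing is left implicit.
    lines += [f"- {item}" for item in requirements]
    return "\n".join(lines)


prompt = build_prompt(
    persona="an experienced data analyst",
    audience="government officials",
    task="Write a report on the relationships between the variables in the attached data.",
    output_format="Short report; correlation matrices formatted as heatmaps.",
    requirements=[
        "Show full statistical tests including p-values.",
        "Report extreme values and influential points.",
    ],
)
print(prompt)
```

Keeping each part in its own argument makes it easy to change the audience or the output format without rewriting the whole prompt.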
2. Provide References
General Tactics ⚡
- Share relevant documentation
- Include descriptions
- Require citations
Data Analysis Applications 📊
- Reference statistical methodology
- Share domain context and assumptions
- Provide data quality metrics
2. Provide References: Tactics
General Tactics ⚡
- Insert documentation at start
- Include sample outputs
Data Analysis Tactics 📊
- Link/upload statistical papers or paragraphs for methods
- Share data dictionaries and variable descriptions
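One way to follow these tactics is to place the reference material at the start of the prompt and the question after it. The sketch below assumes a small, made-up data dictionary; the variable names and the helper `prompt_with_reference` are hypothetical.

```python
# Hypothetical data dictionary: variable name -> description.
data_dictionary = {
    "price": "Hotel price per night, in USD",
    "distance": "Distance from city center, in km",
    "stars": "Star rating, 1 to 5",
}


def prompt_with_reference(dictionary, question):
    """Put the reference material first, then the question (illustrative helper)."""
    header = "Data dictionary (use only these variables):"
    entries = [f"- {name}: {desc}" for name, desc in dictionary.items()]
    return "\n".join([header, *entries, "", f"Question: {question}"])


prompt = prompt_with_reference(
    data_dictionary,
    "How is price related to distance? Cite the variables you use.",
)
print(prompt)
```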
3. Break Complex Tasks
General Tactics ⚡
- Divide into logical steps
- Build complexity gradually
- Validate intermediate steps
Data Analysis Applications 📊
- Cleaning, EDA, Analysis
- Cross-sectional OLS, then panel fixed effects, then event-time analysis
- Does an OLS regression of \(y\) on \(x\) make sense?
3. Break Complex Tasks: Tactics
General Tactics ⚡
- Create checkpoints
- Request validations
- Chain related tasks
Data Analysis Tactics 📊
- Verify distributions before tests
- Check assumptions step by step
- Build analysis pipeline incrementally
- Validate transformations at each stage
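The same idea applies to the code you ask an LLM to write: build the pipeline in small steps and check each one before moving on. A minimal sketch, assuming a tiny made-up dataset and illustrative step names (`clean`, `log_transform`, `mean`):

```python
import math

# Hypothetical raw data: one missing value and one non-positive value.
raw = [12.0, 15.5, None, 9.0, -1.0, 20.0]


def clean(values):
    """Step 1: drop missing and non-positive observations."""
    return [v for v in values if v is not None and v > 0]


def log_transform(values):
    """Step 2: take natural logs."""
    return [math.log(v) for v in values]


def mean(values):
    """Step 3: a simple summary statistic."""
    return sum(values) / len(values)


# Checkpoint after cleaning: right sample size, and logs are well defined.
cleaned = clean(raw)
assert len(cleaned) == 4, "unexpected number of observations after cleaning"
assert all(v > 0 for v in cleaned), "log needs positive values"

# Checkpoint after the transformation: exp(log(x)) should recover x.
logged = log_transform(cleaned)
assert all(math.isclose(math.exp(l), v) for l, v in zip(logged, cleaned))

result = mean(logged)
print(f"Mean of log values: {result:.3f}")
```

If a checkpoint fails, the error points to the step that broke, which is much easier to debug (or to report back to the LLM) than a wrong final number.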
4. Step-by-Step Reasoning
General Tactics ⚡
- Request step-by-step reasoning
- Explain assumptions
- Validate interim results
Data Analysis Tactics 📊
- Show data exploration process
- Justify method selection
- Document assumption checks
- Explain statistical decisions
6. Test Systematically
General Tactics ⚡
- Validate against known results
- Check validity
- Compare methods
Data Analysis Tactics 📊
- Cross-validation
- Benchmark datasets
- Statistical assumption verification
- Sensitivity analysis
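A minimal sketch of validating against a known result, followed by a simple sensitivity check. It assumes a benchmark dataset constructed so that the true answer is known (y = 2 + 3x exactly); the function `ols` is a hand-written illustration, not a library call.

```python
import math


def ols(x, y):
    """Slope and intercept of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Benchmark data with a known answer: y = 2 + 3x exactly.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]

# Validation: the estimate must recover the known coefficients.
slope, intercept = ols(x, y)
assert math.isclose(slope, 3.0) and math.isclose(intercept, 2.0)

# Sensitivity check: add one extreme value and see how much the slope moves.
slope_out, _ = ols(x + [5.0], y + [40.0])
print(f"slope: {slope:.2f}, with extreme value: {slope_out:.2f}")
```

The same pattern works for AI-generated code: first run it on data where you already know the answer, then on the real data.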
2025 May update
Very useful advice from Anthropic following the release of Claude 4
- Some generic advice, similar to OpenAI, like being explicit and having a target audience
- Some specific advice, like using XML tags
Source: Claude 4 best practices
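To illustrate the XML tag advice, here is a minimal sketch that wraps each part of a prompt in its own tag so instructions, data description, and format request stay clearly separated. The tag names and the helper `tag` are illustrative assumptions; they are not prescribed names.

```python
def tag(name, content):
    """Wrap content in a simple XML-style tag (illustrative helper)."""
    return f"<{name}>\n{content}\n</{name}>"


prompt = "\n".join([
    tag("instructions", "Summarize the main patterns in the data for a business audience."),
    tag("data_description", "price: USD per night; distance: km from city center"),
    tag("output_format", "Three bullet points, then one table."),
])
print(prompt)
```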
Update: What is new in 2025 (2025-04-21)
Agentic models
- Various models “collaborate” and offer feedback to each other
- Models can do more on their own
Reasoning models (OpenAI o3): Prompting tips
OpenAI o3
- State objective + output format once; reasoning is automatic.
- Less need for “think step‑by‑step” hacks.
- Ask explicitly for citations / brevity / tables if required.
vs
GPT-4.5
- Classic “let’s think step‑by‑step” improves depth.
- Break large tasks into numbered subtasks to stay within context.
- Be explicit: “Give only runnable Python; no commentary” to control style.
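The contrast above can be sketched as one small function: state the objective and output format in both cases, and add the step-by-step cue only when the model is not a reasoning model. The function `make_prompt` and its `reasoning_model` flag are hypothetical names used for illustration.

```python
def make_prompt(task, output_format, reasoning_model):
    """Build a prompt; add a step-by-step cue only for non-reasoning models."""
    parts = [f"Objective: {task}", f"Output format: {output_format}"]
    if not reasoning_model:
        parts.append("Let's think step by step.")
    return "\n".join(parts)


task = "Estimate the relationship between price and distance in the attached data."
fmt = "Give only runnable Python; no commentary."

print(make_prompt(task, fmt, reasoning_model=True))
print("---")
print(make_prompt(task, fmt, reasoning_model=False))
```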
Should we say thank you?
Date stamp
This version: 2025-04-21 (minor edits from v0.1.2)
bekesg@ceu.edu