Data Analysis with AI: Prompting
Prompting for Data Analysis
2025-01-01
Prompting
It helps to think carefully about how you write prompts.
But it's not magic: just focus on defining what you want.
Think of it as giving instructions to someone else (the research assistant, or RA, approach).
Prompting for Effective LLM Use
- An AI model is a prediction machine: more relevant input generally yields better predictions
- Clear communication with LLMs is crucial for accurate results
- Specific, structured prompts lead to better outputs
- Different strategies for different analytical needs
Six Key Prompting Strategies
- Write clear instructions
- Provide reference materials
- Break complex tasks into subtasks
- Step-by-step reasoning
- Use external tools
- Test systematically
Source: OpenAI guide to prompt engineering
1. Clear Instructions
General Principles
- Be specific about style, requirements
- Include relevant context and constraints
- Use example formats when needed
Data Analysis Applications 📊
- Define audience (academic/business/public)
- Define desired statistical approaches
- Clarify output format (tables, graphs, reports)
1. Clear Instructions: Implementation
General Tactics ⚡
- Adopt personas
- Specify output format explicitly
- Define scope clearly
- Ask for specific extensions
Example Prompts 💡
- “Write a report for government officials”
- “Format correlation matrices as heatmaps”
- “Show full statistical tests including p-values”
- “Report extreme values and influential points”
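The tactics above (persona, audience, explicit output format, specific requirements) can be combined into one structured prompt. The following is a minimal sketch, not a prescribed template: the function name `build_prompt`, its parameters, and the example task are all hypothetical, chosen only for illustration.

```python
def build_prompt(task, persona, audience, output_format, requirements):
    """Assemble a structured prompt from explicit components (illustrative helper)."""
    lines = [
        f"You are {persona}.",
        f"Audience: {audience}.",
        f"Task: {task}",
        f"Output format: {output_format}",
        "Requirements:",
    ]
    # One bullet per requirement keeps each instruction easy to check later.
    lines += [f"- {req}" for req in requirements]
    return "\n".join(lines)


# Hypothetical example values; the two requirements reuse the example prompts above.
prompt = build_prompt(
    task="Summarize the relationship between education and wages in the attached data.",
    persona="an experienced applied economist",
    audience="government officials",
    output_format="a short report with one table",
    requirements=[
        "Show full statistical tests including p-values",
        "Report extreme values and influential points",
    ],
)
print(prompt)
```

Keeping each component as a separate argument makes it easy to change the audience or output format without rewriting the whole prompt.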
2. Provide References
General Tactics ⚡
- Share relevant documentation
- Include descriptions
- Require citations
Data Analysis Applications 📊
- Reference statistical methodology
- Share domain context and assumptions
- Provide data quality metrics
2. Provide References: Tactics
General Tactics ⚡
- Insert documentation at start
- Include sample outputs
Data Analysis Tactics 📊
- Link/upload statistical papers or paragraphs for methods
- Share data dictionaries and variable descriptions
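As an illustration of inserting reference material at the start of a prompt, the sketch below places a data dictionary and a method note before the question. The variable names (`wage`, `educ`, `female`), their descriptions, and the helper `prompt_with_references` are assumptions made up for this example, not part of any real dataset or library.

```python
# Hypothetical data dictionary; in practice this would come from the dataset's documentation.
data_dictionary = {
    "wage": "hourly wage in USD",
    "educ": "years of completed schooling",
    "female": "1 if respondent is female, 0 otherwise",
}


def prompt_with_references(question, dictionary, method_note):
    """Put reference material first, then the question (illustrative helper)."""
    dict_lines = [f"- {name}: {desc}" for name, desc in dictionary.items()]
    parts = [
        "DATA DICTIONARY",
        *dict_lines,
        "",
        "METHOD NOTES",
        method_note,
        "",
        "QUESTION",
        question,
        "Cite the data dictionary entry for every variable you use.",
    ]
    return "\n".join(parts)


ref_prompt = prompt_with_references(
    question="Is there a gender wage gap after controlling for education?",
    dictionary=data_dictionary,
    method_note="Use OLS with heteroskedasticity-robust standard errors.",
)
print(ref_prompt)
```

The final line of the prompt is one way to apply the "require citations" tactic: it asks the model to tie each variable back to the supplied documentation.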
3. Break Complex Tasks
General Tactics ⚡
- Divide into logical steps
- Build complexity gradually
- Validate intermediate steps
Data Analysis Applications 📊
- Cleaning, EDA, Analysis
- Cross-sectional OLS, panel fixed effects, event-time analysis
- Does an OLS regression of \(y\) on \(x\) make sense?
3. Break Complex Tasks: Tactics
General Tactics ⚡
- Create checkpoints
- Request validations
- Chain related tasks
Data Analysis Tactics 📊
- Verify distributions before tests
- Check assumptions step by step
- Build analysis pipeline incrementally
- Validate transformations at each stage
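The checkpoint idea can be sketched as a small pipeline in which every stage is validated before the next one runs. The same logic applies when each stage is requested from an LLM as a separate subtask. The data below is a made-up toy sample, and the helper names (`clean`, `transform`, `check`) are hypothetical.

```python
import math
import statistics


def clean(rows):
    """Step 1: drop rows with missing or non-positive wages."""
    return [r for r in rows if r["wage"] is not None and r["wage"] > 0]


def transform(rows):
    """Step 2: add the log of the wage to each row."""
    return [{**r, "log_wage": math.log(r["wage"])} for r in rows]


def check(condition, message):
    """Checkpoint: stop the pipeline if an intermediate result looks wrong."""
    if not condition:
        raise ValueError(f"Checkpoint failed: {message}")


# Toy data, invented for illustration only.
raw = [
    {"wage": 12.0, "educ": 12},
    {"wage": None, "educ": 16},
    {"wage": 30.0, "educ": 18},
    {"wage": -1.0, "educ": 10},
    {"wage": 20.0, "educ": 16},
]

cleaned = clean(raw)
check(len(cleaned) > 0, "no rows left after cleaning")
check(all(r["wage"] > 0 for r in cleaned), "non-positive wage survived cleaning")

transformed = transform(cleaned)
check(all(math.isfinite(r["log_wage"]) for r in transformed), "non-finite log wage")

# Step 3: only analyze once both checkpoints have passed.
mean_log_wage = statistics.mean([r["log_wage"] for r in transformed])
print(f"rows kept: {len(transformed)}, mean log wage: {mean_log_wage:.3f}")
```

Failing loudly at a checkpoint is a deliberate choice here: an error at the cleaning stage is cheaper to diagnose than a strange coefficient at the end.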
4. Step-by-Step Reasoning
General Tactics ⚡
- Request step-by-step reasoning
- Explain assumptions
- Validate interim results
Data Analysis Tactics 📊
- Show data exploration process
- Justify method selection
- Document assumption checks
- Explain statistical decisions
6. Test Systematically
General Tactics ⚡
- Validate against known results
- Check validity
- Compare methods
Data Analysis Tactics 📊
- Cross-validation
- Benchmark datasets
- Statistical assumption verification
- Sensitivity analysis
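Two of these tactics, validating against a known result and running a sensitivity analysis, can be sketched with simulated data where the true answer is known by construction. This is a minimal example under stated assumptions: the true slope of 2.0, the sample size, the noise level, and the trimming rule are all arbitrary choices for illustration, and `ols_slope` is a hand-written helper rather than a library function.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x: cov(x, y) / var(x)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Known result: simulate data whose true slope is 2.0 (an assumption of this sketch).
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

slope = ols_slope(x, y)

# Sensitivity analysis: re-estimate after dropping the 5 observations with the largest x.
keep = sorted(range(len(x)), key=lambda i: x[i])[:-5]
slope_trimmed = ols_slope([x[i] for i in keep], [y[i] for i in keep])

print(f"full sample: {slope:.3f}, trimmed sample: {slope_trimmed:.3f}")
```

If the estimate on simulated data is far from the value built into the simulation, or moves a lot when a few observations are dropped, that is a signal to inspect the code or the method before trusting results on real data.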