Week 01: LLM Review
Introduction to Large Language Models and AI in Data Analysis
Week 01: LLM Review
Introduction to Large Language Models and their applications in data analysis
Learning Objectives
By the end of this session, students will:
- Understand core concepts and architecture behind large language models (LLMs)
- Learn how to incorporate AI into data analysis workflows effectively
- Distinguish between “Cyborg” and “Centaur” approaches to AI collaboration
- Experience the “jagged frontier” of LLM capabilities through hands-on practice
- Critically assess capabilities and limitations of AI tools in academic contexts
- Develop foundational prompt engineering skills for data analysis tasks
Preparation / Before Class
📚 No Required Reading
This is Week 1 - come ready to explore and discuss!
Reflection Preparation:
- Think about your current experience with AI tools (if any)
- Consider examples where you’ve encountered AI in your work/studies
- Identify one data analysis task you find time-consuming or repetitive
Optional Background:
- Ethan Mollick: “Co-Intelligence: Living and Working with AI” (Chapters 1-2)
Class Material
📊 Core LLM Concepts (60 min)
Slideshow: LLM Concepts and Applications
Key Topics Covered:
- What are LLMs? Statistical models predicting next tokens from massive training data
- The Transformer Revolution: How 2017’s “Attention is All You Need” changed everything
- Context Windows: why this matters for data analysis
- Training Process: Resources and human feedback
- Collaborative Frameworks: How to integrate human and AI work
🔧 Hands-on: FT Graph Challenge (25 min)
The Challenge:
Recreate the Financial Times tariff/yield visualization as accurately as possible using AI assistance.
- Note the sophisticated design: multiple data series, annotations, professional styling
- Consider: What would you need to recreate this manually vs. with AI assistance?
Learning Approach:
- Analyze the original - what data sources, what design elements?
- Use AI to find data and generate initial code
- Iterate and refine based on results
Discussion Questions
End of Week Reflection:
Personal AI Experience: How have you already incorporated AI into your routine? Which model feels most natural to you?
Error Management: How do you currently deal with AI hallucinations or imperfect answers? What strategies emerged during the FT graph exercise?
The Jagged Frontier: What tasks do you expect AI to excel at? Where do you think it will struggle? Did the visualization exercise match your expectations?
Assignment
Due: Before Week 2
Choose your approach:
- Option A (Standard): Use AI to find data and recreate the visualization with same design principles
- Option B (Advanced): Build an interactive dashboard that updates dynamically
Background, Tools and Resources
🤖 Recommended for Data Analysis:
Platform | Strengths | Best For | Cost |
---|---|---|---|
ChatGPT 4o/o1 | Excellent coding, data analysis tools | R/Python code, statistical methods | $20/month |
Claude 4 Sonnet | Great reasoning, document analysis | Research, writing, complex prompts | $20/month |
GitHub Copilot | Integrated coding assistance | Real-time code completion | $10/month |
Free Tier Capabilities:
- Most course tasks achievable with free versions
- Paid subscriptions recommended for intensive work – maybe try during this course
- Start free, upgrade if you hit limits
Detailed Comparison:
Academic Integrity and AI use
Course Philosophy:
- AI as Assistant: Use AI to enhance your capabilities, not replace your thinking
- Maintain Authority: You remain responsible for all outputs and interpretations
- Verify Everything: Always validate AI suggestions, especially statistical claims
- Document Usage: Keep track of how AI helped – useful for methodology sections
Red Lines:
- Never submit unverified AI output as your own work
- Always understand the analysis you’re presenting
- Cite AI assistance appropriately (we’ll discuss standards as they evolve)
The Goal:
Become a more capable data analyst who can leverage AI tools effectively while maintaining scientific rigor.
Next Week:
Week 2 - Data Discovery and Documentation where we’ll use AI to understand and document complex datasets.
Some personal comments on AI and this class
- I needed to rewrite, edit the slideshow every other month. Bloody hell, this course material is tricky. (While Gosset’s t-test has been around since 1908…)
- It was an AI (Claude Sonnet 4.0) that suggest to include the last bit on Academic Integrity and it wrote many lines such as having red lines. Hahh.