Large Language Models: Key Concepts
2025-02-18
Teaching Data Analysis courses + prepping for 2nd edition of Data Analysis textbook
AI is both an amazing help and scary as #C!*
This is a class to
Share cool stuff
Key Breakthrough: “Attention is All You Need” (2017)
Vaswani et al. (2017). “Attention Is All You Need”. NeurIPS 30.
Tokenization example showing how text is processed
Note
Larger context = Better understanding but higher computational cost
1 token ≈ 4 characters, 4 tokens ≈ 3 words (in English)
ChatGPT 2022: context window of 4,000 tokens
ChatGPT 2025: 128,000 tokens ≈ a 250-page book
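The rules of thumb in the note above can be turned into a rough back-of-the-envelope calculation. The sketch below is only an approximation for English text: the two ratios come from the note, while the function names and the words-per-page figure are illustrative assumptions (the page figure is simply what the 250-page comparison implies), not properties of any real tokenizer.

```python
# Rough token arithmetic from the rules of thumb in the note.
# Approximations for English text, not real tokenizer output.

CHARS_PER_TOKEN = 4        # from the note: 1 token is about 4 characters
WORDS_PER_TOKEN = 3 / 4    # from the note: 4 tokens are about 3 words
WORDS_PER_PAGE = 384       # assumption: implied by 128,000 tokens ~ 250 pages


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its character length."""
    return round(len(text) / CHARS_PER_TOKEN)


def tokens_to_words(tokens: int) -> int:
    """Convert a token count to an approximate English word count."""
    return round(tokens * WORDS_PER_TOKEN)


def tokens_to_pages(tokens: int) -> float:
    """Convert a token count to approximate book pages."""
    return tokens_to_words(tokens) / WORDS_PER_PAGE


if __name__ == "__main__":
    for label, window in [("ChatGPT 2022", 4_000), ("ChatGPT 2025", 128_000)]:
        words = tokens_to_words(window)
        pages = tokens_to_pages(window)
        print(f"{label}: {window:,} tokens ~ {words:,} words ~ {pages:.0f} pages")
```

For an exact count, a real tokenizer for the specific model would be needed; the character-based estimate is only useful for judging whether a document is likely to fit in the context window.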
The Centaur and Cyborg Approaches, based on Co-Intelligence: Living and Working with AI by Ethan Mollick
Co-Intelligence
Image created with Claude.ai
Image created in detailed photorealistic style by Ralph Losey with ChatGPT4 Visual Muse version
Planning 👤 Design research plan
🤖 Suggest variables
Data Prep 👤 Define cleaning rules
🤖 Execute cleaning code
👤 Validate cleaning
Analysis 👤 Choose methods
🤖 Implement code
👤 Validate results
Reporting 👤 Outline findings
🤖 Draft sections
👤 Finalize
Planning 👤🤖 Interactive brainstorming
👤🤖 Collaborative refinement
Data Prep 👤🤖 Iterative cleaning
👤🤖 Real-time modification
👤🤖 Joint discovery
Analysis 👤🤖 Exploratory conversation
👤🤖 Dynamic adjustment
👤🤖 Continuous validation
Reporting 👤🤖 Co-writing process
👤🤖 Real-time feedback
👤🤖 Iterative improvement
Prompt as small task
Built into coding
Specialized tools (ChatGPT Canvas, Claude Projects)
Anthropic’s “prompt generator” for optimizing prompts, available via the Anthropic Console dashboard (click “Generate a Prompt”).
Agents
Workspace | Key Features |
---|---|
Anthropic Claude Artifacts | • Dedicated output window • Supports text, code, flowcharts, SVG graphics, websites, dashboards • Real-time refinement and modification • Sharing and remixing capabilities |
ChatGPT Canvas | • Separate collaboration window • Text editing and coding capabilities • Options for edits, length adjustment, reading level changes • Code review and porting features |
OpenAI Advanced Data Analysis | • Data upload and analysis • Visualization capabilities • Python code execution in back end • Error correction and refinement |
Source: Korinek “Generative AI for Economic Research: Use Cases and Implications for Economists,” Journal of Economic Literature 61(4) December 2024 Update 1–74
Workspace | Key Features |
---|---|
Claude Analysis Tool | • Fast exploratory data analysis • Interactive visualizations with real-time adjustments |
Google NotebookLM | • Document upload for research grounding • Citation and quote provision • “Deep dive conversation” podcast generation |
Microsoft Copilot | • Assistance in Word, Excel, PowerPoint, etc. • Data analysis, formula construction |
Google Gemini for Workspace | • Integration with Google’s office suite • Assistance in Docs, etc. |
Cursor AI Code Editor | • AI-assisted coding • Code suggestions and queries, optimization, debugging • Real-time collaboration |
Source: Korinek “Generative AI for Economic Research: Use Cases and Implications for Economists,” Journal of Economic Literature 61(4) December 2024 Update 1–74
Image created in detailed photorealistic style by Ralph Losey with ChatGPT4 Visual Muse version
Stochastic = when prompted repeatedly, LLMs may give different answers
Parrot = LLMs can repeat information without understanding
Philosophy = to what extent do they understand the state of the world?
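The "stochastic" part can be illustrated with a toy sampler. This is a minimal sketch, not how any particular LLM is implemented: the candidate words, their scores, and the function names are all invented for illustration. It shows only that drawing the next token from a probability distribution can give different answers on repeated runs, while always picking the most likely token gives the same answer every time.

```python
import math
import random

# Hypothetical next-token scores for the prompt "The sky is";
# the words and numbers are made up for illustration.
SCORES = {"blue": 2.0, "clear": 1.5, "falling": 0.2}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(scores, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(scores)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def greedy_token(scores):
    """Always pick the single most likely token (deterministic)."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Sampled answers can differ from run to run; greedy answers cannot.
    print("sampled:", [sample_token(SCORES) for _ in range(5)])
    print("greedy: ", [greedy_token(SCORES) for _ in range(5)])
```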
U.S. Copyright Office, January 2025 report, “Copyright and Artificial Intelligence, Part 2: Copyrightability”: copyright protection is intended for human-created works.
Note
“[Prompts] do not provide sufficient human control to make users of an AI system the authors of the output. Prompts essentially function as instructions that convey unprotectable ideas. While highly detailed prompts could contain the user’s desired expressive elements, at present they do not control how the AI system processes them in generating the output.”
Note
Artificial intelligence software, such as chatbots or other large language models, may not be listed as an author. If artificial intelligence software was used in the preparation of the manuscript, including drafting or editing text, this must be briefly described during the submission process, which will help us understand how authors are or are not using chatbots or other forms of artificial intelligence. Authors are solely accountable for, and must thoroughly fact-check, outputs created with the help of artificial intelligence software.
Literature Review & Summarization
AI helps quickly find relevant papers, summarize key arguments, and extract citations, saving time in reviewing large bodies of work.
Data Analysis & Coding Assistance
AI supports coding in R, Python, and Stata, assisting with debugging, automating repetitive tasks, and suggesting statistical methods for empirical research.
Writing & Editing Support
AI aids in drafting, structuring, and refining academic writing, improving clarity, grammar, and coherence while maintaining academic integrity.
All the time: ChatGPT 4o (Canvas), Claude (Projects), ChatGPT o1 (rarely); paid tiers of both
GitHub Copilot in VS Code and RStudio
This presentation is massively helped by AI
Glossary of LLM terms
What’s an LLM context window and why is it getting larger? IBM Research on context windows
Strategy in business: “Build a winning AI strategy”, HBR 2023
This version: 2025-02-18
Gabors Data Analysis with AI - 2025-02-18 v0.3