Coding for Data Analysis – teaching in the era of AI
How should we teach coding for data analysis in the era of LLM-based AI assistants? This note focuses on in-person college education, especially at the senior undergraduate and graduate levels.
This version: 0.1, 2024-08-08 (Gábor Békés, bekesg@ceu.edu)
I. Starting points
- We teach coding for Data Analysis in different languages.
- People are now used to working with LLMs:
  - Direct communication through chats, such as ChatGPT or Claude
  - Built-in copilots such as GitHub Copilot: tools that monitor your coding and make suggestions in real time
- Copilots are built into modern coding environments such as VS Code, RStudio, or Anaconda, and they are often switched on by default.
- Currently, LLMs produce excellent answers to easy or common coding prompts, but their output needs debugging for less frequent use cases.
- LLMs may be used in different ways, for instance:
  - As a source of input: an improved Google search
  - As a source of code suggestions: an improved Stack Overflow
  - As a tutor explaining concepts, like a TA
  - As a code writer for a given problem, which is a new possibility
II. Our approach
- We believe that learning to code has value even if LLMs provide a great deal of assistance, and even if the next iteration of AI is markedly better at coding.
  - It helps students understand the code an LLM suggests and fine-tune it.
  - Thinking in terms of code has intrinsic value. This will be important for prompting and for designing workflows.
- Our teaching material will be mostly the same.
  - It is based on https://github.com/gabors-data-analysis/da-coding-python
  - But it will move towards why we do certain tasks (such as using functions) rather than their syntax.
- What will differ is how we examine and motivate students.
III. The proposal: Staggered phase-in
We propose a three-step phase-in of LLMs: cut the term(s) into three distinct periods, with different rules in each phase. (We’ll have 6 weeks at CEU.)
- Phase 1: Prohibit AI.
  - Focus on core structures, objects, and basic skills.
  - AI shall not be used in class, and students are encouraged to avoid it at home.
  - The exam is pen-and-paper style.
- Phase 2: Tolerate AI.
  - Focus on core problem-solving skills.
  - AI shall not be used in class, but it is encouraged as a tutor (not as a copilot).
  - A short in-class quiz without AI. The exam is online, so AI use cannot be prevented, but debugging is key.
- Phase 3: Encourage AI.
  - Focus on projects.
  - AI is now a copilot that will raise the bar.
  - The exam is assignment-focused: high-quality, well-designed products are expected. AI use is assumed by default.
IV. Data Analysis with AI
Yes. We are creating brand new course material. Coming soon.