Week 06: Sentiment Analysis with AI

Using API to AI to create data from text

Published

May 20, 2025

Using API to AI to create data from text

Overview

Continue using text for research with AI. Still on sentiment analysis but now we focus on scaling text analysis with APIs: from 20 texts to hundreds and more.

Learning Outcomes

By the end of the session, students will:

Gain hands-on experience with sentiment analysis.
Have experience integrating NLP in research
Think about what is ground truth
Get a taste of building an analytical pipeline with API usage

Preparation / Before Class

🔧 API Setup

API Access Setup:

How to get AI API keys (OpenAI ChatGPT and Anthropic Claude)
Budget: ~$5 minimum for course exercises

API Learning Resources:

Introduction to APIs - fundamental concepts
Advanced API knowledge - how APIs work under the hood

📊 Preparation

Materials from Moodle/Course Repository:

Combined interview dataset (text_id level)
Game information with results (win/draw/loss encoding)
Combined student ratings from Week 5 assignments (aggregated, anonymized)
Domain lexicon ratings

Data Structure:

texts: Individual interview transcripts with metadata
games: Match results and context information (soccer has 3 outcomes: win, draw, loss – impcation for result encoding)
ratings: Human ratings, AI ratings, lexicon scores by text_id

Code Examples Available:

R implementation requires API key setup in R
Python implementation

text and prompts

sentiment guidelines

Other

More advanced knowledge on APIs – how APIs work

Review

💬 Assignment 5 Discussion (20 min)

Review of transformation options

lexicon based counting numbers –> generate a transparent script
Machine-learning classifiers, n-grams etc.
LLM (transformer-based) one-shot: treating the LLM like a giant classifier: you hand it raw text and ask classes of sentiment. (Deep contextual understanding—word embeddings, attention across the whole sentence etc decide the sentiment.) No separate sentiment lexicon; it’s all encoded in the model weights.
You can also take a pretrained LLM and continue training it on thousands of labeled review. See our example guidelines HERE

Sharing Experiences:

Human vs. AI rating differences: Where did you disagree most?
Rating challenges: What aspects of manager interviews were hardest to classify?
Consistency issues: Did you rate similar texts consistently? Did AI? Was there a consistent gap?
Domain knowledge impact: How much did football expertise affect your ratings?

Class Material

🔍 API Walkthrough Sessions (30 min)

Introduction

How to use APIs

Beginner-Friendly Examples:

Simple walkthrough with GDP data – uses World Bank and FRED APIs
Bit harder walkthrough with football data – uses FBREF soccer data. Guess the club for example.

More advanced stuff

More advanced knowledge on APIs – how APIs work

📈 Data Analysis Workshop (30 min)

Data Integration Task:

Take the aggregated file and ask AI for a readme. Discuss what is in the data
Compare human, domain lexicon and AI rating. For human and AI take the average.
Think of an interesting comparison using AI rating
Compare results by human and lexicon rating

Discussion 2

What is ground truth

How to integrate AI into research

combine data with text
think RQ and how you’d use AI

🎯 Advanced Applications (20 min)

Additional tasks if time permits

Predict gender and result

Note: Men’s teams have male managers, women’s teams have female managers in this dataset

Show AI all texts and ask to predict the gender of speaker
Show AI all texts and ask to predict the result (manager’s team won, drew, lost)

Discussion:

What linguistic cues might reveal gender? Are these reliable?

End of Week Discussion points

Ground Truth Problem: In sentiment analysis, what constitutes the “correct” answer? How do we validate when humans disagree?
API Integration: What are the benefits and costs (risks) of using API to scale text analysis?

Assignment

Assignment 6: Gender Classification Pipeline

Due: Optional extension exercise

Create a similar pipeline for predicting manager gender from interview text, including AI explanations of classification decisions.

Full Assignment Details

Some personal comments on AI and this class

AI helped writing the Python code and translating to R. But it needed a great deal of debugging: working with tests and building stable pipelines are hard.

--- title: "Week 06: Sentiment Analysis with AI" subtitle: "Using API to AI to create data from text" date: "2025-05-20" --- ::: {.hero-section} ::: {.container} ::: {.hero-title} Week 06: Sentiment Analysis with AI ::: ::: {.hero-subtitle} Using API to AI to create data from text ::: ::: ::: ![](../images/week6_pic.png) ## Overview Continue using text for research with AI. Still on sentiment analysis but now we focus on scaling text analysis with APIs: from 20 texts to hundreds and more. ### Learning Outcomes By the end of the session, students will: - Gain hands-on experience with sentiment analysis. - Have experience integrating NLP in research - Think about what is ground truth - Get a taste of building an analytical pipeline with API usage ## Preparation / Before Class ::: {.week-card .card} ::: {.card-header} 🔧 **API Setup** ::: ::: {.card-body} **API Access Setup:** - [How to get AI API keys](/week06/assets/get-ai-api-key.html) (OpenAI ChatGPT and Anthropic Claude) - **Budget:** ~$5 minimum for course exercises **API Learning Resources:** - [Introduction to APIs](/week06/assets/api-use.html) - fundamental concepts - [Advanced API knowledge](/week06/assets/api-advanced.html) - how APIs work under the hood ::: ::: ::: {.week-card .card} ::: {.card-header} 📊 **Preparation** ::: ::: {.card-body} **Materials from Moodle/Course Repository:** - Combined interview dataset (text_id level) - Game information with results (win/draw/loss encoding) - Combined student ratings from Week 5 assignments (aggregated, anonymized) - Domain lexicon ratings **Data Structure:** - **texts:** Individual interview transcripts with metadata - **games:** Match results and context information (soccer has 3 outcomes: win, draw, loss -- impcation for result encoding) - **ratings:** Human ratings, AI ratings, lexicon scores by text_id **Code Examples Available:** - [R implementation](/code/interviews/sentiment-analysis.R) requires [API key setup in R](/code/interviews/api-key.R) - [Python implementation](/code/interviews/sentimet-analysis.py) **text and prompts** - [sentiment guidelines](/week05/assets/sentiment-guidelines.html) **Other** - [More advanced knowledge on APIs](assets/api-advanced.html) -- *how* APIs work ::: ::: ## Review ::: {.week-card .card} ::: {.card-header} 💬 **Assignment 5 Discussion (20 min)** ::: ::: {.card-body} **Review of transformation options** * lexicon based counting numbers --> generate a transparent script * Machine-learning classifiers, n-grams etc. * LLM (transformer-based) one-shot: treating the LLM like a giant classifier: you hand it raw text and ask classes of sentiment. (Deep contextual understanding—word embeddings, attention across the whole sentence etc decide the sentiment.) No separate sentiment lexicon; it’s all encoded in the model weights. * You can also take a pretrained LLM and continue training it on thousands of labeled review. See our example guidelines [HERE](/week05/assets/sentiment-guidelines.html) **Sharing Experiences:** - **Human vs. AI rating differences:** Where did you disagree most? - **Rating challenges:** What aspects of manager interviews were hardest to classify? - **Consistency issues:** Did you rate similar texts consistently? Did AI? Was there a consistent gap? - **Domain knowledge impact:** How much did football expertise affect your ratings? ::: ::: ## Class Material ::: {.week-card .card} ::: {.card-header} 🔍 **API Walkthrough Sessions (30 min)** ::: ::: {.card-body} **Introduction** - [How to use APIs](assets/api-use.html) **Beginner-Friendly Examples:** - [Simple walkthrough with GDP data](assets/walkthrough-wb-fred.html) -- uses World Bank and FRED APIs - [Bit harder walkthrough with football data](assets/walkthrough-fbref.html) -- uses FBREF soccer data. Guess the club for example. **More advanced stuff** - [More advanced knowledge on APIs](assets/api-advanced.html) -- *how* APIs work ::: ::: ::: {.week-card .card} ::: {.card-header} 📈 **Data Analysis Workshop (30 min)** ::: ::: {.card-body} **Data Integration Task:** - Take the aggregated file and ask AI for a readme. Discuss what is in the data - Compare human, domain lexicon and AI rating. For human and AI take the average. - Think of an interesting comparison using AI rating - Compare results by human and lexicon rating **Discussion 2** - What is ground truth **How to integrate AI into research** - combine data with text - think RQ and how you'd use AI ::: ::: ::: {.week-card .card} ::: {.card-header} 🎯 **Advanced Applications (20 min)** ::: ::: {.card-body} **Additional tasks if time permits** **Predict gender and result** Note: Men's teams have male managers, women's teams have female managers in this dataset - Show AI all texts and ask to predict the gender of speaker - Show AI all texts and ask to predict the result (manager's team won, drew, lost) **Discussion:** What linguistic cues might reveal gender? Are these reliable? ::: ::: ## End of Week Discussion points - **Ground Truth Problem:** In sentiment analysis, what constitutes the "correct" answer? How do we validate when humans disagree? - **API Integration:** What are the benefits and costs (risks) of using API to scale text analysis? ## Assignment ::: {.callout-note icon="📝"} ## Assignment 6: Gender Classification Pipeline **Due:** Optional extension exercise Create a similar pipeline for predicting manager gender from interview text, including AI explanations of classification decisions. [Full Assignment Details](../assignments/assignment_06.html){.assignment-badge} ::: ## Some personal comments on AI and this class * AI helped writing the Python code and translating to R. But it needed a great deal of debugging: working with tests and building stable pipelines are hard.