Course Datasets
There are three main case studies used throughout the course:
1. Simulated Austrian Hotels
A realistic simulated dataset of hotels across Austria for practicing data wrangling and table joins. Contains 8 related tables with hotels, cities, occupancy, tourism, and economic data.
Used in: Week 4 (Joining Tables)
Key features: Multiple join types, one-to-many relationships, composite keys
Files: 200 hotels across 10 Austrian cities with 2 years of monthly data
→ Explore Austrian Hotels Data
2. World Values Survey (WVS)
The 7th Wave of the World Values Survey dataset, cleaned and merged with World Bank GDP data. Available both as individual responses and country-level aggregations.
Used in: Week 2 (Data Documentation), Week 3 (Report Writing)
Key features: International survey data, economic indicators, multiple aggregation levels
Files: Individual responses (~2,000 sample) and country-year summaries
3. Football Manager Interviews
Post-match interview texts from football managers for sentiment analysis and NLP practice. Includes multiple rating systems for comparison and validation.
Used in: Week 5 (Text as Data I), Week 6 (Text as Data II)
Key features: Sentiment analysis, API integration, human vs AI comparison
Files: Interview texts, domain lexicons, rating comparisons
Data Access
All datasets are included in the course repository. Each folder contains: - Raw and processed data files - Detailed documentation and codebooks
- Example code for data processing - README files with usage instructions
Technical Notes
- Data formats: CSV, Excel (.xlsx), R scripts
- All code tested with R (tidyverse) and Python (pandas)
- Reproducible workflows with documented processing steps
- API examples require OpenAI account (budget ~$5 for course)