Chapter 10: Multiple Linear Regression

Understanding associations while controlling for multiple factors

Chapter Motivation

Gender wage gaps. There is a substantial difference in the average earnings of women and men in all countries. You want to understand more about the potential origins of that difference, focusing on employees with a graduate degree. You have data on a large sample of employees with their earnings and characteristics like age and degree type. How can you uncover gender differences that are not due to differences in these other characteristics?

Finding hotel deals. You’ve analyzed hotel prices in a city to find hotels that are underpriced relative to their distance from the city center. But hotels also differ in quality features related to price. How can you find hotels that are underpriced relative to all their features?


What You’ll Learn

This chapter introduces multiple linear regression – the most widely used method to uncover patterns of associations between variables. You’ll learn:

  • Why and when to use multiple regression
  • How to interpret coefficients in the presence of multiple explanatory variables
  • The concept of omitted variable bias and why it matters
  • Statistical inference with multiple regression
  • How to include categorical variables and interactions
  • Applications to causal analysis and prediction

📖 Chapter Structure

This chapter is organized into 4 pages for optimal learning, with 7 case studies using real data (A1-A6 and B1):

Page 1: Foundation → ✅

Sections 10.1-10.3
  • Why and when to use multiple regression (Section 10.1)
  • Multiple linear regression with two explanatory variables (Section 10.2)
  • Multiple regression and simple regression: Omitted variable bias (Section 10.3)

Case Study A1: Understanding the Gender Difference in Earnings
Multiple linear regression

What you’ll master: Core concepts of multiple regression and why controlling for other variables matters


Page 2: Statistical Inference → ✅

Sections 10.4-10.6
  • Multiple linear regression terminology (Section 10.4)
  • Standard errors and confidence intervals (Section 10.5)
  • Hypothesis testing in multiple linear regression (Section 10.6)

Case Study A2: Understanding the Gender Difference in Earnings
Statistical inference

What you’ll master: Statistical inference in multiple regression and interpreting uncertainty


Page 3: Extensions → ✅

Sections 10.7-10.8
  • Multiple linear regression with three or more explanatory variables (Section 10.7)
  • Nonlinear patterns and multiple linear regression (Section 10.8)

Case Study A3: Understanding the Gender Difference in Earnings
Nonlinear patterns and multiple linear regression

What you’ll master: Working with many variables and capturing nonlinear relationships


Page 4: Applications → ✅

Sections 10.9-10.12
  • Qualitative right-hand-side variables (Section 10.9)
  • Interactions: Different slopes across groups (Section 10.10)
  • Multiple regression and causal analysis (Section 10.11)
  • Multiple regression and prediction (Section 10.12)

Case Studies:
  • A4: Gender Earnings – Qualitative variables
  • A5: Gender Earnings – Interactions
  • A6: Gender Earnings – Causal interpretation
  • B1: Hotel Prices – Prediction with multiple regression

What you’ll master: Categorical variables, interactions, and applying multiple regression to causal questions and prediction


🎯 Learning Objectives

By the end of this chapter, you will be able to:

  1. βœ… Identify questions best answered with multiple regression from available data
  2. βœ… Estimate multiple linear regression coefficients and present and interpret them
  3. βœ… Estimate appropriate standard errors, create confidence intervals and test coefficients
  4. βœ… Select variables to include in a multiple regression guided by the purpose of analysis
  5. βœ… Understand the relationship between multiple regression results and causal effects
  6. βœ… Use multiple regression for prediction and residual analysis
  7. βœ… Include categorical variables using dummy variables
  8. βœ… Use interaction terms to allow different relationships across groups

📊 Case Studies

This chapter includes 7 case studies using real data (A1-A6 and B1), distributed across the 4 pages:

Case Studies A1-A6: Gender Earnings Gap

📂 Code Repository

  • Data: Current Population Survey (CPS), USA, 2014
  • Sample: 18,241 employees with graduate degrees (ages 24-65)
  • Question: Understanding the gender wage gap

The six case studies progressively build understanding:
  • A1 (Page 1): Basic multiple regression with age
  • A2 (Page 2): Statistical inference on coefficients
  • A3 (Page 3): Nonlinear age patterns
  • A4 (Page 4): Education categories
  • A5 (Page 4): Gender × age interactions
  • A6 (Page 4): Causal interpretation and many covariates
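
To give a feel for what A1 does, here is a minimal Python sketch of comparing a simple and a multiple regression. It runs on synthetic data, not the CPS sample; the variable names (`female`, `age`, `ln_wage`), the helper `ols`, and every number in the data-generating step are assumptions made up for illustration, and the chapter's own code may be organized differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (NOT the CPS sample): in this made-up population women are
# younger on average, and log earnings rise with age.
female = rng.integers(0, 2, size=n)
age = 45 - 4 * female + rng.normal(0, 8, size=n)
ln_wage = 3.0 - 0.10 * female + 0.02 * age + rng.normal(0, 0.3, size=n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) computed by least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


simple = ols(ln_wage, female)         # unconditional gender difference
multiple = ols(ln_wage, female, age)  # difference among employees of the same age
aux = ols(age, female)                # average age difference by gender

print("female coefficient, simple regression:  ", round(simple[1], 3))
print("female coefficient, controlling for age:", round(multiple[1], 3))
print("age difference x age coefficient:       ", round(aux[1] * multiple[2], 3))
```

In this sketch the two `female` coefficients differ because age is related to both gender and earnings. The last printed number equals the gap between the two coefficients, which is the omitted variable bias logic introduced on Page 1.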

Case Study B1: Hotel Prices

📂 Code Repository

  • Data: ~217 hotels in Vienna, November 2017
  • Sample: Hotels with 3-4 stars within 8 miles of city center
  • Question: Finding underpriced hotels using multiple features
  • Location: Page 4
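
The idea behind B1 can be sketched in a few lines of Python: predict price from several features, then look for hotels whose actual price is furthest below the prediction. The data below are synthetic, not the Vienna sample; the feature names (`distance`, `stars`, `rating`) and all coefficients used to generate prices are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic hotels (NOT the Vienna data): price depends on three features
# plus noise that the features do not explain.
distance = rng.uniform(0, 8, size=n)         # miles from the city center
stars = rng.choice([3.0, 3.5, 4.0], size=n)  # star category
rating = rng.uniform(2.5, 5.0, size=n)       # guest rating
price = 60 - 6 * distance + 20 * stars + 15 * rating + rng.normal(0, 15, size=n)

# Multiple regression of price on all three features, with an intercept.
X = np.column_stack([np.ones(n), distance, stars, rating])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

predicted = X @ coef
residual = price - predicted  # negative = cheaper than the features predict

# Candidate deals: the five hotels with the most negative residuals.
best_deals = np.argsort(residual)[:5]
for i in best_deals:
    print(f"hotel {i}: price {price[i]:.0f}, predicted {predicted[i]:.0f}, "
          f"residual {residual[i]:.0f}")
```

A negative residual only says a hotel is cheap relative to the features included in the regression. A feature left out of the model could still explain the low price.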

💻 Code & Data

All case studies include:
  • R Code: Complete, reproducible analysis with tidyverse
  • Python Code: Python equivalents
  • Stata Code: For regression-focused analyses
  • Datasets: Cleaned and ready to use
  • Codespaces: One-click cloud coding environment

🚀 Quick Start with Code

Click any “Open in Codespace” button to:
  1. Launch a pre-configured coding environment in your browser
  2. Run all analyses without installing anything
  3. Modify code and experiment with the data
  4. See exactly how tables and figures were created

No setup required – just click and code!


🤖 AI Practice Tasks

Each page includes interactive AI practice tasks where you can:
  • Copy prompts to use with AI assistants
  • Get personalized explanations of concepts
  • Generate practice problems tailored to your learning
  • Check your understanding with worked examples

How to use:
  1. Click “📋 Copy & Open in AI Chat” on any AI task
  2. Work through the explanation or problem
  3. Return to the textbook to continue learning


⏱️ Time Estimates

  • Quick overview: 2-3 hours (read all pages, skim examples)
  • Deep learning: 6-8 hours (work through all examples and AI tasks)
  • With hands-on coding: 10-12 hours (replicate all analyses)
  • Complete mastery: 15+ hours (coding + practice problems + extensions)

Recommended pace: 1-2 pages per study session


📚 Part II: Regression Analysis Context

Chapter 10 is part of the Regression Analysis section of the textbook:

Chapter 7: Simple Regression
Foundation – one explanatory variable

Chapter 8: Complicated Patterns and Messy Data
Nonlinear patterns and robust methods

Chapter 9: Generalizing Results of a Regression
Statistical inference in simple regression

→ Chapter 10: Multiple Linear Regression ✅ You are here
Multiple explanatory variables and controlling for covariates

Chapter 11: Modeling Probabilities
Binary outcomes and logistic regression

Chapter 12: Regression with Time Series Data
Temporal patterns and forecasting


📖 Prerequisites

Required knowledge:
  • Chapter 7: Simple Regression (essential)
  • Chapter 9: Generalizing Regression Results (essential)
  • Basic statistics: mean, variance, correlation, standard deviation
  • Understanding of confidence intervals and hypothesis tests

Helpful but not essential:
  • Chapter 8: Complicated Patterns (for nonlinear models)
  • Linear algebra basics (not covered in this book)
  • Matrix notation (not used in this chapter)

Important Note on Sequence

Do not skip Chapters 7 and 9! Multiple regression builds directly on simple regression concepts. Without understanding simple regression, you will struggle with:
  • What regression coefficients mean
  • How to interpret standard errors and confidence intervals
  • The logic of hypothesis testing
  • The difference between correlation and causation


🎓 Study Strategies

For First-Time Learners

  1. Read sequentially – Don’t skip ahead; concepts build on each other
  2. Pause at examples – Try to interpret results before reading the interpretation
  3. Use AI tasks actively – They reinforce learning better than passive reading
  4. Focus on intuition first – Understand “why” before memorizing formulas
  5. Return to review boxes – They summarize key concepts

For Review or Reference

  1. Start with review boxes – Get the key concepts quickly
  2. Jump to specific sections – Use the detailed table of contents
  3. Check the glossary – Quick definitions of all terms
  4. Review case study summaries – See applications without details

For Instructors

  1. Assign pages progressively – 4 natural units for homework/discussion
  2. Use AI tasks as assignments – Students submit their AI conversations
  3. Focus on case study interpretation – Better than just running code
  4. Emphasize review boxes – Core concepts students must master
  5. Page 4 is comprehensive – May need two class sessions to cover fully

📈 What Makes This Chapter Unique

Compared to other textbooks:

  1. Integrated case studies – Same dataset across 6 studies showing progression
  2. Practical focus – Always connects theory to real applications
  3. Modern tools – Robust standard errors, emphasis on interpretation
  4. Honest about causality – Clear about what regression can and cannot show
  5. Visual learning – Extensive use of graphs and intuitive explanations
  6. Accessible math – Formulas included but explained intuitively

🔍 Key Concepts Preview

By the end of this chapter, you’ll deeply understand:

Core concepts:
  • Multiple linear regression equation and interpretation
  • Conditional vs. unconditional differences
  • Controlling for covariates
  • Omitted variable bias

Statistical concepts:
  • Standard errors in multiple regression
  • Confidence intervals and hypothesis tests
  • F-tests for joint hypotheses
  • Multicollinearity and its consequences

Practical tools:
  • Dummy variables for categories
  • Interaction terms for different slopes
  • Nonlinear patterns in multiple regression
  • Prediction and residual analysis

Big picture:
  • When multiple regression helps with causality
  • When to focus on prediction vs. causal inference
  • How to select variables for different purposes
  • Limits of observational data
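
As a preview of the first group of concepts, the link between the simple and the multiple regression coefficient can be written compactly. The notation below is an assumption chosen for illustration and may differ from the symbols used on Page 1: y^E is the expected outcome, x_1 the variable of interest, and x_2 the variable left out of the simple regression.

```latex
\begin{align*}
y^E &= \alpha + \beta x_1 && \text{(simple regression)} \\
y^E &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 && \text{(multiple regression)} \\
x_2^E &= \gamma + \delta x_1 && \text{(regression of $x_2$ on $x_1$)} \\
\beta - \beta_1 &= \delta \beta_2 && \text{(omitted variable bias)}
\end{align*}
```

Read this way, leaving out x_2 changes the coefficient on x_1 only if x_2 is related to both x_1 (δ ≠ 0) and y (β_2 ≠ 0).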


🎉 Ready to Start?

Begin with Page 1: Foundation →

Or explore:
  • Page 2: Statistical Inference →
  • Page 3: Extensions →
  • Page 4: Applications →
  • View Glossary – Quick reference for all terms

Chapter complete! All 4 pages covering Sections 10.1-10.12 are now available.


💡 Pro Tips for Success

  1. Keep a notebook – Write down key insights and questions
  2. Work through formulas – Don’t just read them, calculate examples
  3. Compare specifications – Notice how results change with different models
  4. Think causally – Always ask “what’s omitted?” when interpreting coefficients
  5. Visualize results – Draw graphs to understand patterns
  6. Discuss with peers – Explaining concepts helps you learn
  7. Apply to your data – Think how methods apply to your research

📚 Book Information

Full Title: Data Analysis for Business, Economics, and Policy
Authors: Gábor Békés & Gábor Kézdi
Publisher: Cambridge University Press (2021)
Interactive Edition: 2025

Resources:
  • 📖 Main textbook site
  • 💻 Code repository
  • 📊 Datasets
  • 🎓 Instructor resources