Chapters

Each chapter entry provides a summary, an outline, slides, and links to case studies.

Table of Contents

Part                     Ch.  Title                                            Links
I: Data Exploration      01   Origins of Data                                  slides
                         02   Preparing Data for Analysis                      slides
                         03   Exploratory Data Analysis                        slides
                         04   Comparison and Correlation                       slides
                         05   Generalizing from Data                           slides
                         06   Testing Hypotheses                               slides
II: Regression Analysis  07   Simple Regression                                slides
                         08   Complicated Patterns and Messy Data              slides
                         09   Generalizing Results of a Regression             slides
                         10   Multiple Linear Regression                       slides
                         11   Modeling Probabilities                           slides
                         12   Regression with Time Series Data                 slides
III: Prediction          13   A Framework for Prediction                       slides
                         14   Model Building for Prediction                    slides
                         15   Regression Trees                                 slides
                         16   Random Forest and Boosting                       slides
                         17   Probability Prediction and Classification        slides
                         18   Forecasting from Time Series Data                slides
IV: Causal Analysis      19   A Framework for Causal Analysis                  slides
                         20   Designing and Analyzing Experiments              slides
                         21   Regression and Matching with Observational Data  slides
                         22   Difference-in-Differences                        slides
                         23   Methods for Panel Data                           slides
                         24   Appropriate Control Groups for Panel Data        slides

Downloads: Full contents (PDF), Index (PDF), Sample Chapters 10 & 14
Slides: For LaTeX versions, contact us.

PART I: DATA EXPLORATION

Chapter 01: Origins of Data

This chapter is about data collection and data quality.

chapter outline → slides CH01A CH01B CH01C CH01B1 CH01B2 CH01B3 CH01C1 CH01C2 CH01C3

Section Title
1.1 What Is Data?
1.2 Data Structures
1.A1 CASE STUDY – Finding a Good Deal among Hotels: Data Collection
1.3 Data Quality
1.B1 CASE STUDY – Comparing Online and Offline Prices: Data Collection
1.C1 CASE STUDY – Management Quality and Firm Size: Data Collection
1.4 How Data Is Born: The Big Picture
1.5 Collecting Data from Existing Sources
1.A2 CASE STUDY – Finding a Good Deal among Hotels: Data Collection
1.B2 CASE STUDY – Comparing Online and Offline Prices: Data Collection
1.6 Surveys
1.C2 CASE STUDY – Management Quality and Firm Size: Data Collection
1.7 Sampling
1.8 Random Sampling
1.B3 CASE STUDY – Comparing Online and Offline Prices: Data Collection
1.C3 CASE STUDY – Management Quality and Firm Size: Data Collection
1.9 Big Data
1.10 Good Practices in Data Collection
1.11 Ethical and Legal Issues of Data Collection
1.12 Main Takeaways
  Practice Questions
  Data Exercises
  References and Further Reading

Chapter 02: Preparing Data for Analysis

This chapter is about preparing data for analysis: how to start working with data.

slides CH02A CH02B CH02C

Chapter 03: Exploratory Data Analysis

The chapter starts with why exploratory data analysis is important.

slides CH03A CH03B CH03C CH03D CH03U1

Chapter 04: Comparison and Correlation

Most methods of data analysis are based on comparing values of one variable, y, across observations with different values of another variable, x, or more such variables. This chapter introduces simple methods of such comparison.

slides CH04A
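
As a rough illustration of the kind of comparison described in the Chapter 04 summary, the sketch below computes the mean of y for each distinct value of x. It is only a minimal example under stated assumptions: the function name `conditional_means` and the hotel data are hypothetical and are not taken from the book or its case studies.

```python
from collections import defaultdict
from statistics import mean


def conditional_means(x_values, y_values):
    """Return the mean of y for each distinct value of x."""
    groups = defaultdict(list)
    for x, y in zip(x_values, y_values):
        groups[x].append(y)
    return {x: mean(ys) for x, ys in groups.items()}


# Hypothetical data: hotel price (y) by star rating (x).
stars = [3, 3, 4, 4, 4, 5]
prices = [80, 100, 120, 140, 130, 200]

means_by_stars = conditional_means(stars, prices)
for rating, avg_price in sorted(means_by_stars.items()):
    print(rating, avg_price)
```

Comparing the group means across star ratings is one simple way to see how y differs across observations with different values of x.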

Chapter 05: Generalizing from Data

This chapter introduces the conceptual issues with generalizing results from our data to the general pattern we care about, and the methods of statistical inference.

slides CH05A

Chapter 06: Testing Hypotheses

This chapter introduces the logic and practice of testing hypotheses.

slides CH06A CH06B

PART II: REGRESSION ANALYSIS

Chapter 07: Simple Regression

In this chapter, we introduce simple non-parametric regression and simple linear regression.

slides CH07A

Chapter 08: Complicated Patterns and Messy Data

The first part of this chapter covers how linear regression analysis can accommodate nonlinear patterns.

slides CH08A CH08B CH08C

Chapter 09: Generalizing Results of a Regression

This chapter discusses the methods of generalizing results of a linear regression from our data to the general pattern we care about.

slides CH09A CH09B

Chapter 10: Multiple Linear Regression

This chapter introduces multiple regression.

slides CH10A CH10B

Chapter 11: Modeling Probabilities

This chapter introduces probability models that have a binary dependent variable.

slides CH11A CH11B

Chapter 12: Regression with Time Series Data

In this chapter we discuss the opportunities and challenges brought about by regression analysis of time series data and how to address those challenges.

slides CH12A CH12B

PART III: PREDICTION

Chapter 13: A Framework for Prediction

This chapter introduces a framework for prediction.

slides CH13A

Chapter 14: Model Building for Prediction

This chapter discusses how to build regression models for prediction and how to evaluate the predictions they produce.

slides CH14A CH14B

Chapter 15: Regression Trees

This chapter introduces the regression tree, an alternative to linear regression for prediction purposes that can find the most important predictor variables and their interactions and can approximate any functional form automatically.

slides CH15A

Chapter 16: Random Forest and Boosting

This chapter introduces two ensemble methods based on regression trees: the random forest and boosting.

slides CH16A

Chapter 17: Probability Prediction and Classification

This chapter introduces the framework and methods of probability prediction and classification analysis for binary y variables.

slides CH17A

Chapter 18: Forecasting from Time Series Data

This chapter discusses forecasting: prediction from time series data for one or more time periods in the future.

slides CH18A CH18B

PART IV: CAUSAL ANALYSIS

Chapter 19: A Framework for Causal Analysis

This chapter introduces a framework for causal analysis.

slides CH19A

Chapter 20: Designing and Analyzing Experiments

This chapter discusses the most important questions about designing an experiment and analyzing data from an experiment to estimate the average effect of an intervention.

slides CH20A CH20B

Chapter 21: Regression and Matching with Observational Data

In this chapter we discuss how to condition on potential confounder variables in practice, and how to interpret the results when our question is causal.

slides CH21A

Chapter 22: Difference-in-Differences

This chapter introduces difference-in-differences analysis, or diff-in-diffs for short, and its use in understanding the effect of an intervention.

slides CH22A
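
To make the idea behind the Chapter 22 summary concrete, the sketch below shows the basic two-group, two-period diff-in-diffs calculation: the change in the average outcome of the treated group minus the change in the average outcome of the untreated group. It is a minimal example, not the book's implementation. The function name `diff_in_diffs` and the numbers are hypothetical, and the calculation can be read as an effect estimate only under the usual assumption that both groups would have followed parallel trends without the intervention.

```python
from statistics import mean


def diff_in_diffs(treated_before, treated_after, control_before, control_after):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical outcomes for two groups, observed before and after an intervention.
estimate = diff_in_diffs(
    treated_before=[10, 12],
    treated_after=[15, 17],
    control_before=[9, 11],
    control_after=[10, 12],
)
print(estimate)
```

In practice the same quantity is usually estimated with a regression so that standard errors and additional control variables can be included.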

Chapter 23: Methods for Panel Data

This chapter introduces the most widely used regression methods to uncover the effect of an intervention when observational time series (tseries) data or cross-section time-series (xt) panel data is available with more than two time periods.

slides CH23A CH23B

Chapter 24: Appropriate Control Groups for Panel Data

This chapter discusses how data analysts can select the subset of untreated observations in the data that is best suited for learning about the counterfactual, and when that needs to be a conscious choice instead of using all available observations.

slides CH24A CH24B