Additional Reading Suggestions

Life did not stop when we finished the manuscript; we keep finding great material. So here are some suggestions for additional reading, organized by chapter.

Part I

Chapter 01

On surveys, a great review is “How to Run Surveys: A Guide to Creating Your Own Identifying Variation and Revealing the Invisible”, an NBER working paper by Stefanie Stantcheva.

Chapter 02

On the nature of variables, DuckDB has a great post on data types that details, for instance, how much storage different numeric types take and what range of values they can hold.
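To make the point about the size of numbers concrete, here is a minimal sketch of how the number of bits in a signed integer type determines the range it can store. The type names follow common SQL usage and are listed here as an assumption; check the documentation of your own database for the exact set it supports.

```python
def signed_range(bits):
    """Smallest and largest value of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

# Common SQL integer type names and their widths in bits (assumed for illustration).
INTEGER_TYPES = {"TINYINT": 8, "SMALLINT": 16, "INTEGER": 32, "BIGINT": 64}

for name, bits in INTEGER_TYPES.items():
    low, high = signed_range(bits)
    print(f"{name:<9} {bits // 8} byte(s)  {low:,} to {high:,}")
```

A practical takeaway: an ID or a count that outgrows its type (say, more than about 2.1 billion in a 32-bit integer) needs a wider type, while storing small categories in wide types wastes space.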

Chapter 06

On p-hacking, a fantastic story about a body of research in social psychology was written up in the New York Times Magazine in 2017. The review of methods that started in 2012 soon led to the birth of the data investigation team Data Colada in 2013, founded by Profs. Uri Simonsohn, Leif Nelson, and Joe Simmons. They also wrote a paper on the p-curve, a tool for analyzing a body of literature. Read any of the other Data Colada pieces on challenges to reproducibility. Amazing stuff.
A nice and fun piece on the birthday effect appeared in The Pudding, written by Russell Samora. There is a commonly held view that you are more likely to die on your birthday. Is this true? The piece uses data on millions of Massachusetts residents and is a great illustration of a step-by-step investigation.

Part II

Chapter 09

Regarding external validity, one way to check robustness is to take out 1% of the data and repeat the exercise. The simple take is to do it many times at random, plus many times by dropping observations at the edges of the distribution of key variables. The smart take is suggested by Tamara Broderick, Ryan Giordano, and Rachael Meager in “An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?” (preprint). Hard-core statistics.
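The simple take can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated, the statistic of interest is the slope of a simple regression, and the helper names are made up here. It is not the method of the paper, which finds the most influential observations far more cleverly than random dropping does.

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple linear regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def drop_random_share(xs, ys, share, rng):
    """Drop a random `share` of observations and return the rest."""
    n = len(xs)
    k = max(1, round(share * n))
    dropped = set(rng.sample(range(n), k))
    keep = [i for i in range(n) if i not in dropped]
    return [xs[i] for i in keep], [ys[i] for i in keep]

def drop_tail(xs, ys, share, upper=True):
    """Drop the top (or bottom) `share` of observations ranked by x."""
    n = len(xs)
    k = max(1, round(share * n))
    order = sorted(range(n), key=lambda i: xs[i])
    keep = order[:-k] if upper else order[k:]
    return [xs[i] for i in keep], [ys[i] for i in keep]

rng = random.Random(42)
# Simulated data (an assumption for illustration): the true slope is 2.
x = [rng.gauss(0, 1) for _ in range(1000)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]

full = ols_slope(x, y)
random_slopes = [ols_slope(*drop_random_share(x, y, 0.01, rng)) for _ in range(200)]
top = ols_slope(*drop_tail(x, y, 0.01, upper=True))
bottom = ols_slope(*drop_tail(x, y, 0.01, upper=False))

print(f"full sample slope:   {full:.3f}")
print(f"random 1% drops:     {min(random_slopes):.3f} to {max(random_slopes):.3f}")
print(f"drop top 1% of x:    {top:.3f}")
print(f"drop bottom 1% of x: {bottom:.3f}")
```

If the estimate barely moves across all these versions, that is reassuring. If it changes sign or loses its meaning when a handful of observations are removed, the conclusion rests on very few data points.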

Chapter 10

On Simpson’s paradox, one article where it is front and center is about identifying who migrates, published in the [Journal of Development Economics](https://www.sciencedirect.com/science/article/abs/pii/S0304387824001081?via%3Dihub): Michael A. Clemens and Mariapia Mendola, “Migration from developing countries: Selection, income elasticity, and Simpson’s paradox”, Journal of Development Economics, Volume 171, 2024.
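As a reminder of what the paradox looks like in numbers, here is a small sketch with made-up counts. It is a generic illustration, not the data or the analysis of the article: option X has the higher success rate within each group, yet the lower rate once the groups are pooled.

```python
# Hypothetical counts, made up for illustration: (successes, total)
counts = {
    "group 1": {"X": (90, 100), "Y": (800, 1000)},
    "group 2": {"X": (300, 1000), "Y": (20, 100)},
}

def rate(successes, total):
    """Share of successes."""
    return successes / total

def pooled(option):
    """Success rate of an option after pooling all groups."""
    successes = sum(counts[g][option][0] for g in counts)
    total = sum(counts[g][option][1] for g in counts)
    return successes / total

for group, options in counts.items():
    rx, ry = rate(*options["X"]), rate(*options["Y"])
    print(f"{group}: X = {rx:.2f}, Y = {ry:.2f}")

print(f"pooled:  X = {pooled('X'):.2f}, Y = {pooled('Y'):.2f}")
```

Within groups, X wins (0.90 vs 0.80 and 0.30 vs 0.20). Pooled, Y wins (0.75 vs 0.35), because most X observations come from group 2, where success rates are low for everyone. The reversal comes entirely from the different group composition.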

Part III

Chapter 16

On partial dependence plots, you may check out both a very useful review of R’s pdp package and Christoph Molnar’s Interpretable ML book.
On a similar house price prediction project, Julia Silge does a super nice job going through the steps and showing graphs, making great use of text. She covers boosted trees, tidymodels, and more. Check out her post and video: “Predict housing prices in Austin TX with tidymodels and xgboost”.
On why random forests work, there is a useful paper by Alicia Curth, Alan Jeffares, and Mihaela van der Schaar.
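For the partial dependence plots mentioned in this chapter's list, the underlying computation is simple enough to sketch by hand: fix one feature at a grid value for every observation, predict, and average. The `predict` function below is a hypothetical stand-in for a fitted model, and the data are simulated; with a real model you would pass its own prediction function instead.

```python
import random

def partial_dependence(predict, rows, feature, grid):
    """Average prediction over the data with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def predict(row):
    """Hypothetical stand-in for a fitted model's price prediction."""
    return 50 + 2 * row["sqm"] - 0.5 * row["age"] + 0.01 * row["sqm"] * row["age"]

rng = random.Random(0)
# Simulated houses (an assumption for illustration): size in sqm, age in years.
rows = [{"sqm": rng.randint(30, 200), "age": rng.randint(0, 80)} for _ in range(500)]

grid = [50, 100, 150]
curve = partial_dependence(predict, rows, "sqm", grid)
for value, avg in zip(grid, curve):
    print(f"sqm = {value}: average predicted price = {avg:.1f}")
```

Plotting `curve` against `grid` gives the partial dependence plot for size. Because every other feature keeps its observed values, the curve shows the average association between the feature and the prediction, not the effect for any single house.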

Part IV

Chapter 19

On DAGs and potential outcomes, a deep discussion for social scientists is Imbens, Guido W. 2020. “Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics.” Journal of Economic Literature, 58 (4): 1129-79. An amazing review that includes Twitter quotes.

Beetroot juice is said to be great: see a review study and another review. For example, one RCT with beetroot juice finds that dietary inorganic nitrate acutely reduces blood pressure (study). There is also a review in a medical journal.

Chapter 20

On A/B testing, there are some neat ideas in a presentation by Harlan Harris, with code in R.

Chapter 21

On the empirics of management, a great review study is “The international empirics of management” by Daniela Scur and her co-authors, published in PNAS in 2024.