Reading recommendations
Books
Popular but high-quality books on data
- Nate Silver (2012), The Signal and the Noise. A great book on prediction from a leading expert in polling and sports statistics. It paints a big picture covering statistical and other kinds of predictions; a must-read for anyone who wants to do predictive analytics. [Recommended]
- Hans Rosling et al. (2018), Factfulness: Ten Reasons We’re Wrong About the World–and Why Things Are Better Than You Think. A book summarizing decades of public advocacy by the late doctor and epidemiologist Hans Rosling and his collaborators to help us understand the world around us by making sense of cross-country data. A must-read for everyone, really.
- David Salsburg (2002), The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. A great book about the history of statistics and statistical ideas, with many great stories. A must-read for statistics nerds.
- Philip Tetlock, Superforecasting: The Art and Science of Prediction. A great book summarizing some of the research and ideas of one of the leading experts in prediction. Not explicitly about statistical predictions; more of a big-picture read for those who want to evaluate predictions.
- Nassim Nicholas Taleb, Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets. A fascinating walk through examples of the role of luck in human decision-making. Shows just how important randomness really is.
- Seth Stephens-Davidowitz (2018), Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are. So you think you can use data to answer some important questions? Watch out: people often hide things. Look at what they do, not what they say. An amazing collection of stories.
- Cathy O’Neil, Weapons of Math Destruction. A more skeptical take on the role of algorithms.
- Albert-László Barabási (2014), Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life. A great introduction to networks.
- Gary Smith (2016), Standard Deviations: Flawed Assumptions, Tortured Data and Other Ways to Lie with Statistics. Looks interesting.
Books related to decision-making with data
- Alex Edmans (2024), May Contain Lies: How Stories, Statistics, and Studies Exploit Our Biases―And What We Can Do about It (maycontainlies.com). One of my planned reads for summer 2024.
- Ethan Mollick (2024), Co-Intelligence: Living and Working with AI. A super useful book on how we might integrate AI into our work. Includes a supernatural animal.
- Daniel Kahneman (2011), Thinking, Fast and Slow. A great book summarizing a lifetime of research by the psychologist who won the Nobel Prize in economics.
- Michael Pollan (2008), In Defense of Food. A great book from an investigative journalist on what we should eat and why, with a very good description of what nutrition research can and cannot uncover using observational data.
- Andrew Leigh (2018), Randomistas: How Radical Researchers Are Changing Our World. An interesting review of experiments in business as well as government, from an academic turned politician.
- Michael Luca and Max H. Bazerman (2020), The Power of Experiments: Decision Making in a Data-Driven World. On the new dawn of experiments using large datasets, with a focus on testing at businesses such as Airbnb or Uber.
- Carl Bergstrom and Jevin West (2021), Calling Bullshit. A fantastic book, based on a very famous course, that will help you see through deception by statistics.
- Tim Harford (2021), The Data Detective. Great storytelling on how we use and feel about data and statistics. FYI, it starts with a quote from The Empire Strikes Back.
Sports and data
- Stefan Szymanski, Money and Soccer; and Simon Kuper and Stefan Szymanski, Soccernomics. Two great books on understanding football via data.
- Michael Lewis, Moneyball. A super famous book on how finding mispriced players can give you an edge in baseball.
- Chris Anderson and David Sally (2013), The Numbers Game: Why Everything You Know About Soccer Is Wrong. I just realized one of the chapters inspired a case study in our book. If you know which one, DM me, and we’ll get a drink sometime.
- Yves Dominicy and Christophe Ley (eds) (2023), What We Can Learn from Sports Data. A nice collection on sports analytics.
- Ben Lindbergh and Sam Miller (2017), The Only Rule Is It Has to Work: Our Wild Experiment Building a New Kind of Baseball Team. Suggested by a dear reader; I’m excited to have a look.
- Chris Wiggins and Matthew L. Jones (2023), How Data Happened: A History from the Age of Reason to the Age of Algorithms. Looks interesting: a history of data and its technical, political, and ethical impact.
Intro data science / statistics books
- Andriy Burkov, The Hundred-Page Machine Learning Book. A popular and concise review.
- Roger Peng and Elizabeth Matsui, The Art of Data Science. An introductory review of the steps of analyzing data.
- David Spiegelhalter, The Art of Statistics. A great review of some key statistics concepts from a great statistician. A very nice introduction for any data, stats or metrics course. [Recommended]
- Ian Hacking (1990), The Taming of Chance. Okay, I have not read it. But it promises to take you through the emergence of statistics, starting in the 19th century. Suggested by a dear reader.
- Elena Llaudet and Kosuke Imai (2022), Data Analysis for Social Science: A Friendly and Practical Introduction. Concepts and skills, indeed.
- Ethan Bueno de Mesquita and Anthony Fowler (2021), Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis. A nice, non-technical intro to data science, focusing on ideas such as what a relationship (in data…) actually is.
Data visualization
- Cole Nussbaumer Knaflic: Storytelling with Data
- Kieran Healy Data Visualization - A practical introduction
- Claus Wilke, Fundamentals of Data Visualization
- Alberto Cairo, How Charts Lie: Getting Smarter about Visual Information
- Jonathan Schwabish (2021) Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks – practical guide to dataviz tools; more advanced.
- Online and in print – The Economist Graphic Detail - amazing resource with really good explanations
Our book
Okay, so you have read some nice books. Why not read our book:
- Gábor Békés and Gábor Kézdi (2021) Data Analysis for Business, Economics, and Policy :-)
More advanced stuff
Advanced/technical books on data science and prediction
- Ajay Agrawal, Joshua Gans and Avi Goldfarb, Prediction Machines: The Simple Economics of Artificial Intelligence
- Eric Siegel Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
- Bradley Efron and Trevor Hastie, 2016, Computer Age Statistical Inference. An intense stats book with Big Data in mind.
- Christopher M. Bishop Pattern Recognition and Machine Learning
- Nina Zumel and John Mount Practical Data Science with R – nice collection of coding/analysis ideas - often related to our book.
- Hyndman and Athanasopoulos, 2020, Forecasting: Principles and Practice. A useful time series book that matches how we think about time series; a good way to continue.
- John D. Kelleher and Brendan Tierney, 2018, Data Science
- Sarah Guido and Andreas Müller, 2016 Introduction to Machine Learning with Python: A Guide for Data Scientists
Advanced/technical books on causal inference
- Joshua Angrist and Jörn-Steffen Pischke (2009), Mostly Harmless Econometrics. The book that started it all: it discusses key econometrics tools in a precise yet accessible and focused way. Aimed at postgraduate economics students.
- Scott Cunningham, 2020, Causal Inference: The Mixtape. An advanced, formal but highly accessible discussion of key tools of causal inference, using examples from some great academic papers.
- Judea Pearl, The Book of Why. An intermediate book on causality, with interesting stories and great care in developing theoretical structures and the measurement of causal links.
Advanced book on the intersection of Econometrics and Machine Learning
- Martin Huber, 2023, Causal Analysis: Impact Evaluation and Causal Machine Learning with Applications in R. A great new book, accessible yet advanced, with, yes, applications, including applications of machine learning.
- Felix Chan and László Mátyás (eds) (2022), Econometrics with Machine Learning. A great collection on, well, how machine learning has infused causal methods in a variety of areas such as policy evaluation or development studies.
Other post-course books
- Marc F. Bellemare (2022), Doing Economics: What You Should Have Learned in Grad School—But Didn’t. The title says it all: how to extend what you have learned, especially for economics students. I know folks who read it on the plane to a job interview.
- Lionel Page (2023), Optimally Irrational. It has data, yes, but also a lot of economics. I have yet to read it, but I like Lionel’s work a lot. If you liked Kahneman and Tversky, you will like this too.
Blogs and more
Interesting, non-technical articles
- Roger Peng on data science principles
- McKinsey’s non-technical discussion of machine learning
- Mike Yeomans in Harvard Business Review, an older but good piece: What Every Manager Should Know About Machine Learning
- How Uber uses ML/AI, a high-level piece: Medium post by Jamal Robinson
- David Donoho 50 years of data science
- Susan Athey in Science (2017) Beyond prediction: Using big data for policy problems
- American Statistical Association on research and the p-value: Statistical Significance and the Dichotomization of Evidence
- Time series forecasting competition materials https://www.m4.unic.ac.cy/
Blog posts
- Roger Peng on good data science
- NYT Upshot on polling errors
- 538 on nutrition. https://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/
- Nick Barrowman on why data is not independent of judgement: Why Data Is Never Raw
Podcasts, blogs to follow
- http://nssdeviations.com/ - The Data Science Podcast Roger Peng and Hilary Parker talk about the latest in data science and data analysis in academia and industry. [recommended]
- https://simplystatistics.org/ - A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek
- http://andrewgelman.com/ - Statistical Modeling, Causal Inference, and Social Science
Practice data and code
- Nice collection of data collections - https://www.columnfivemedia.com/100-best-free-data-sources-infographic
- Weekly newsletter - tinyletter.com/data-is-plural
- Nicely searchable source - public.enigma.com/#data-connections
- Nice educational collection of coding resources - http://Idre.ucla.edu
- Very nice initiative for collaborative data projects. Includes many datasets with documentation. https://data.world/
- A collection of ML/AI papers with code. Mostly very technical - paperswithcode.com/
- Amazing collection by Hadley Wickham - DS Stats337
- U Washington Data Lab - Interviews on business data viz: Enterprise-analysis-interviews