Popular but high-quality books on data
- Nate Silver (2012): The Signal and the Noise. A great book on prediction from a leading expert in polling and sports statistics. It gives a big picture of prediction, statistical and otherwise; a must-read for anyone who wants to do predictive analytics. [Recommended]
- Hans Rosling et al. (2018): Factfulness: Ten Reasons We’re Wrong About the World–and Why Things Are Better Than You Think. A book summarizing decades of public advocacy by the late doctor and epidemiologist Hans Rosling and his collaborators, aimed at helping us understand the world around us by making sense of cross-country data. A must-read for everyone, really.
- David Salsburg (2002): The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. A great book about the history of statistics and statistical ideas, full of great stories. A must-read for statistics nerds.
- Philip Tetlock: Superforecasting: The Art and Science of Prediction. A great book summarizing some of the research and ideas of one of the leading experts in prediction. Not explicitly about statistical predictions; more of a big-picture read for those who want to evaluate predictions.
- Nassim Nicholas Taleb: Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets. A fascinating walk through examples of the role of luck in life and in human decision-making. Shows just how important randomness really is.
- Seth Stephens-Davidowitz (2018): Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are. So you think you can use data to answer some important questions? Watch out: people often hide stuff. Look at what they do, not what they say. An amazing collection of stories.
- Cathy O’Neil: Weapons of Math Destruction. A more sceptical take on the role of algorithms.
- Albert-László Barabási (2014): Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life. A great intro to networks.
Books related to decision-making with data
- Daniel Kahneman (2011): Thinking, Fast and Slow. A great book summarizing a lifetime of research by the psychologist who won the Nobel Prize in economics.
- Michael Pollan (2008): In Defense of Food. A great book from an investigative journalist on what we should eat and why, with a very good description of what nutrition research can and cannot uncover using observational data.
- Andrew Leigh (2018): Randomistas: How Radical Researchers Are Changing Our World. An interesting review of experiments in business as well as in government, written by an academic and politician.
- Michael Luca and Max H. Bazerman (2020): The Power of Experiments: Decision Making in a Data-Driven World. On the new dawn of experiments using large datasets, with a focus on testing at businesses such as Airbnb or Uber.
- Carl Bergstrom and Jevin West (2021): Calling Bullshit. A fantastic book, based on a very famous course, that will help you see through deception by statistics.
- Tim Harford (2021): The Data Detective. Great storytelling on how we use and feel about data and statistics. FYI, it starts with a quote from The Empire Strikes Back.
Sports and data
- Stefan Szymanski: Money and Soccer; Simon Kuper and Stefan Szymanski: Soccernomics. Two great books on understanding football through data.
- Michael Lewis: Moneyball. A super-famous book on how finding mispriced players can give a team an edge in baseball.
- Chris Anderson and David Sally (2013): The Numbers Game: Why Everything You Know About Soccer Is Wrong. I just realized one of the chapters inspired a case study in our book. If you know which one, DM me, and we’ll get a drink sometime.
Intro data science / statistics books
- Andriy Burkov: The Hundred-Page Machine Learning Book. A popular and concise review.
- Roger Peng and Elizabeth Matsui: The Art of Data Science. An introductory review of the steps of analyzing data.
- David Spiegelhalter: The Art of Statistics. A great review of some key statistical concepts from a great statistician. A very nice introduction to any data, statistics, or metrics course. [Recommended]
- Cole Nussbaumer Knaflic: Storytelling with Data
- Kieran Healy: Data Visualization: A Practical Introduction
- Claus Wilke: Fundamentals of Data Visualization
- Alberto Cairo: How Charts Lie: Getting Smarter about Visual Information
- Jonathan Schwabish (2021) Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks – practical guide to dataviz tools; more advanced.
- The Economist’s Graphic Detail (online and in print): an amazing resource with really good explanations.
Okay, so you have read some nice books. Why not read our book:
- Gábor Békés and Gábor Kézdi (2021) Data Analysis for Business, Economics, and Policy :-)
More advanced stuff
Advanced/technical books on data science and prediction
- Ajay Agrawal, Joshua Gans and Avi Goldfarb, Prediction Machines: The Simple Economics of Artificial Intelligence
- Eric Siegel Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
- Bradley Efron and Trevor Hastie (2016): Computer Age Statistical Inference. An intense statistics book with big data in mind.
- Christopher M. Bishop Pattern Recognition and Machine Learning
- Nina Zumel and John Mount: Practical Data Science with R. A nice collection of coding and analysis ideas, often related to our book.
- Hyndman and Athanasopoulos (2020): Forecasting: Principles and Practice. A useful time series book that matches how we think about time series; a good way to continue.
- John D. Kelleher and Brendan Tierney (2018): Data Science
- Sarah Guido and Andreas Müller, 2016 Introduction to Machine Learning with Python: A Guide for Data Scientists
Advanced/technical books on causal inference
- Joshua Angrist and Jörn-Steffen Pischke (2009): Mostly Harmless Econometrics. The book that started it all: it discusses key econometric tools in a precise yet accessible and focused way. Aimed at postgraduate economics students.
- Scott Cunningham (2020): Causal Inference: The Mixtape. An advanced, formal, but highly accessible discussion of key tools of causal inference, using examples from some great academic papers.
- Judea Pearl: The Book of Why. An intermediate book on causality, with interesting stories and great care taken in developing theoretical structures for, and measurement of, causal links.
Blogs and more
Interesting, non-technical articles
- Roger Peng on data science principles
- McKinsey’s non-technical discussion of machine learning
- Mike Yeomans in Harvard Business Review, an older but good piece: What Every Manager Should Know About Machine Learning
- How Uber uses ML/AI: a high-level Medium post by Jamal Robinson
- David Donoho 50 years of data science
- Susan Athey in Science (2017) Beyond prediction: Using big data for policy problems
- The American Statistical Association on research and the p-value: Statistical Significance and the Dichotomization of Evidence
- Time series forecasting competition materials https://www.m4.unic.ac.cy/
- Roger Peng on good data science
- NYT Upshot on Polling errors
- 538 on nutrition. https://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/
- Nick Barrowman on why data is not independent of judgement: Why Data Is Never Raw
Podcasts, blogs to follow
- http://nssdeviations.com/ - The Data Science Podcast Roger Peng and Hilary Parker talk about the latest in data science and data analysis in academia and industry. [recommended]
- https://simplystatistics.org/ - A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek
- http://andrewgelman.com/ - Statistical Modeling, Causal Inference, and Social Science
Practice data and code
- A nice collection of data sources - https://www.columnfivemedia.com/100-best-free-data-sources-infographic
- Weekly newsletter - tinyletter.com/data-is-plural
- Nicely searchable source - public.enigma.com/#data-connections
- A nice educational collection of coding resources - http://idre.ucla.edu
- A very nice initiative for collaborative data projects. Includes many datasets with documentation. https://data.world/
- A collection of ML/AI papers with code. Mostly very technical - paperswithcode.com/
- Amazing collection by Hadley Wickham - DS Stats337
- U Washington Data Lab - Interviews on business data viz: Enterprise-analysis-interviews