This textbook provides future data analysts with the tools, methods, and skills needed to answer data-focused, real-life questions, to choose and apply appropriate methods to answer those questions, and to visualize and interpret results to support better decisions in business, economics, and public policy. Data wrangling and exploration, regression analysis, prediction with machine learning, and causal analysis are comprehensively covered, as well as when, why, and how the methods work, and how they relate to each other.
As the most effective way to communicate data analysis, running case studies play a central role in this textbook. Each case starts with an industry-relevant question and answers it by using real-world data and applying the tools and methods covered in the textbook. Learning is then consolidated by over 360 practice questions and 120 data exercises. Extensive online resources, including raw and cleaned data and code for all analyses in Stata, R, and Python, are available on this site.
“This exciting new text covers everything today’s aspiring data scientist needs to know, managing to be comprehensive as well as accessible. Like a good confidence interval, the Gabors have got you almost completely covered!”
Joshua Angrist, MIT, Nobel laureate in Economics 2021
“A beautiful integration of Econometrics and Data Science that provides a direct path from data collection and exploratory analysis to conventional regression modeling, then on to prediction and causal modeling. Exactly what is needed to equip the next generation of students with the tools and insights from the two fields.”
David Card, University of California, Berkeley, Nobel laureate in Economics 2021
Key information materials
- Front matter
- Table of contents
- Sample chapters 10 and 15
- Short summary of why to use this book
- A one-page summary, also available as PDF
Why use this book?
Data analysis is a process. It starts with formulating a question and collecting appropriate data, or assessing whether the available data can help answer the question. Then comes cleaning and organizing the data, tedious but essential tasks that affect the results of the analysis as much as any other step in the process. Exploratory data analysis gives context to the eventual results and helps decide the details of the analytical method to be applied. The main analysis consists of choosing and implementing the method to answer the question, with potential robustness checks. Along the way, correct interpretation and effective presentation of the results are crucial. Carefully crafted data visualizations help summarize our findings and convey key messages. The final task is to answer the original question, with potential qualifications and directions for future inquiries.
Our textbook equips future data analysts with the most important tools, methods, and skills they need through the entire process of data analysis to answer data-focused, real-life questions. We cover all the fundamental methods that help along the process of data analysis. The textbook is divided into four parts covering data wrangling and exploration, regression analysis, prediction with machine learning, and causal analysis. We explain when, why, and how the various methods work, and how they are related to each other.
The 47 case studies, which take up one-third of our material, are a cornerstone of this textbook. This reflects our view that working through case studies is the best way to learn data analysis. Each of our case studies starts with a relevant question and answers it in the end, using real-life data and applying the tools and methods covered in the particular chapter.
We share all raw and cleaned data we use in the case studies. We also share the code that cleans the data and produces all results, tables, and graphs in Stata, R, and Python, so students can tinker with our code and compare the solutions across the different software packages.
This textbook was written to be a complete course in data analysis. It could be useful for university students in graduate programs as a core text in applied statistics and econometrics, quantitative methods, or data analysis. It may also complement online courses that teach specific methods by giving more context and explanation. Undergraduate courses can also make use of this textbook, even though the workload on students exceeds the typical undergraduate workload. Finally, the textbook can serve as a handbook for practitioners to guide them through all steps of real-life data analysis.
Gábor Békés is an Assistant Professor at the Department of Economics and Business of the Central European University and director of the MS in Business Analytics program. He is a senior fellow at KRTK and a research affiliate at the Center for Economic Policy Research (CEPR). He has published in top economics journals on multinational firm activities and productivity, business clusters, and innovation spillovers. He has managed international data collection projects on firm performance and supply chains. He has done both policy advising (for the European Commission and the ECB) and private-sector consultancy (in finance, business intelligence, and real estate). He has taught graduate-level data analysis and economic geography courses since 2012.
Balatonudvari, Hungary, July 2018. Photo by Anna Fetter.
Gábor Kézdi is a Research Associate Professor at the University of Michigan’s Institute for Social Research. He has published in top journals in economics, statistics, and political science on topics including household finances, health, education, demography, and ethnic disadvantages and prejudice. He has managed several data collection projects in Europe; currently, he is co-investigator of the Health and Retirement Study in the U.S. He has consulted for various governmental and non-governmental organizations on the disadvantage of the Roma minority and the evaluation of social interventions. He has taught data analysis, econometrics, and labor economics from undergraduate to PhD levels since 2002 and supervised a number of MA and PhD students.
We could not have done this alone. Far from it. So, we are grateful, really.
We provide access to all the code we used, in R, Stata, and Python.
For the code that reproduces all the tables and graphs in the textbook, visit the GitHub page github.com/gabors-data-analysis/da_case_studies, where the live version of the code is available.
You may download the latest release, v0.8.1, as a zip file.
Learning to code for data analysis
Free, fully online courses on learning to code for data analysis are now available!
For R, Stata and Python!
We provide access to all the data we used; see our dataset summaries.
Teaching material for instructors
We have prepared several materials for instructors:
- This textbook may be used for a variety of courses, and it is indeed used in Management PhD, Applied Economics MA, Data Science MSc, and even Executive MBA programs. Let us offer some experience and advice on how to teach this textbook in different courses
- Frequently asked questions and answers for a variety of course and program types
- Answers to all 360 practice questions for instructors, available from Cambridge University Press Instructor Resources
- Slideshows, one for each of the 24 chapters, available through Cambridge University Press Instructor Resources
- Adopting instructors may get access to slides in LaTeX. Contact us for access
Coding help and info
Users can find:
- Review of data and code, and information on how to set up folders
- Brief summary of languages used
- Some advice on learning to code
- Technical help on
The book has many applications.
Summary from JEL
Textbook for graduate students discusses the most important tools, methods, and skills necessary for carrying out a data analysis project, presenting case studies from around the world linking business or policy questions to decisions in data selection and the application of methods. Covers data collection and quality, exploratory data analysis and visualization, generalizing from data, and hypothesis testing. Provides an overview of regression analysis, including probability models and time series regressions. Explores predictive analytics, cross-validation, tree-based machine learning methods, classification, and forecasting from time series data. Focuses on causal analysis, the potential outcomes framework and causal maps, difference-in-differences analysis, various panel data methods, and the event study approach.