Data Analysis for Business, Economics, and Policy

This textbook

This textbook provides future data analysts with the tools, methods, and skills needed to answer data-focused, real-life questions, to choose and apply appropriate methods to answer those questions, and to visualize and interpret results to support better decisions in business, economics, and public policy. Data wrangling and exploration, regression analysis, prediction with machine learning, and causal analysis are comprehensively covered, as well as when, why, and how the methods work, and how they relate to each other.

As the most effective way to communicate data analysis, running case studies play a central role in this textbook. Each case starts with an industry-relevant question and answers it by using real-world data and applying the tools and methods covered in the textbook. Learning is then consolidated by over 360 practice questions and 120 data exercises. Extensive online resources, including raw and cleaned data and code for all analyses in Stata, R, and Python, are available on this site.


This exciting new text covers everything today’s aspiring data scientist needs to know, managing to be comprehensive as well as accessible. Like a good confidence interval, the Gabors have got you almost completely covered!
Joshua Angrist, MIT Economics

MORE endorsements

Buy it or ask for an inspection copy


Published on 6 May 2021, the book is available from Cambridge University Press or from a wide range of retailers worldwide.

You may also request an inspection copy from the Publisher!

Key information to download

Why use this book?

Data analysis is a process. It starts with formulating a question and collecting appropriate data, or assessing whether the available data can help answer the question. Then comes cleaning and organizing the data, tedious but essential tasks that affect the results of the analysis as much as any other step in the process. Exploratory data analysis gives context to the eventual results and helps decide the details of the analytical method to be applied. The main analysis consists of choosing and implementing the method to answer the question, with potential robustness checks. Along the way, correct interpretation and effective presentation of the results are crucial. Carefully crafted data visualizations help summarize our findings and convey key messages. The final task is to answer the original question, with potential qualifications and directions for future inquiries.

Our textbook equips future data analysts with the most important tools, methods, and skills they need through the entire process of data analysis to answer data-focused, real-life questions. We cover all the fundamental methods that help along the process of data analysis. The textbook is divided into four parts covering data wrangling and exploration, regression analysis, prediction with machine learning, and causal analysis. We explain when, why, and how the various methods work, and how they are related to each other. MORE on content

A cornerstone of this textbook is its 47 case studies, spread over one-third of our material. This reflects our view that working through case studies is the best way to learn data analysis. Each of our case studies starts with a relevant question and answers it in the end, using real-life data and applying the tools and methods covered in the particular chapter. MORE on case studies

We share all raw and cleaned data we use in the case studies. We also share the codes that clean the data and produce all results, tables, and graphs in Stata, R, and Python so students can tinker with our code and compare the solutions in the different software. MORE on data and code

Follow your heart! Code is available in major scripting languages!

This textbook was written to be a complete course in data analysis. It could be useful for university students in graduate programs as a core text in applied statistics and econometrics, quantitative methods, or data analysis. It may also complement online courses that teach specific methods to give more context and explanation. Undergraduate courses can also make use of this textbook, even though the workload on students exceeds the typical undergraduate workload. Finally, the textbook can serve as a handbook for practitioners to guide them through all steps of real-life data analysis. MORE on why use this book?


For a quick review, watch our pre-launch slideshow presentation.

About the authors

Gábor Békés

Gábor Békés is an Assistant Professor at the Department of Economics and Business of the Central European University and director of the MS in Business Analytics program. He is a senior fellow at KRTK and a research affiliate at the Center for Economic Policy Research (CEPR). He has published in top economics journals on multinational firm activities and productivity, business clusters, and innovation spillovers. He has managed international data collection projects on firm performance and supply chains. He has done both policy advising (the European Commission, ECB) and private sector consultancy (in finance, business intelligence, and real estate). He has taught graduate-level data analysis and economic geography courses since 2012. Personal website


Gábor Kézdi

Gábor Kézdi is a Research Associate Professor at the University of Michigan’s Institute for Social Research. He published in top journals in economics, statistics, and political science on topics including household finances, health, education, demography, and ethnic disadvantages and prejudice. He has managed several data collection projects in Europe; currently, he is co-investigator of the Health and Retirement Study in the U.S. He has consulted various governmental and non-governmental institutions on the disadvantage of the Roma minority and the evaluation of social interventions. He has taught data analysis, econometrics, and labor economics from undergraduate to Ph.D. levels since 2002 and supervised a number of MA and PhD students. Personal website

We could not have done this alone. Far from it. So, we are grateful, really.


We provide access to all the code we used, in R, Stata, and Python.

For all the code that reproduces all the tables and graphs in the textbook, visit the Github page where the live version of the code is available.

Status update:

  • R – All codes ready. Used for graphs in textbook.
  • Stata – All codes ready. Because Stata lacks machine learning capabilities, there is no code for chapters 15, 16, and 17, and there are some limitations for chapter 18.
  • Python – Under preparation. Chapters 01-12 ready, chapters 13-24 are under development. Should be ready by early 2021.

Coding help and info

Users can see a


We provide access to all the data we used; see our dataset summaries.

Data is shared via an OSF project repository.
You can download it and use it; see Data and code for more information.

Teaching material for instructors

There are several materials we prepare for instructors: