Data and code

Publish the data and code or it didn’t happen *

We have created the textbook as a complete package of text, code and data. While the textbook itself is for sale, the code and data are free.
Read summary information: Case studies summary page
Read summary information: Datasets summary page

Basic setup

To ensure smooth sailing, you will need to create two folders on your computer, anywhere you like.

For the code: da_case_studies

For the data: da_data_repo
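The two folders can also be created from a script. The following is a minimal sketch in Python, assuming you pass in whatever parent folder you prefer; the function name create_course_folders is hypothetical and not part of the textbook's code.

```python
from pathlib import Path


def create_course_folders(base_dir):
    """Create da_case_studies and da_data_repo under base_dir.

    The function name is illustrative; only the two folder names
    come from the setup instructions.
    """
    base = Path(base_dir)
    folders = {
        "code": base / "da_case_studies",
        "data": base / "da_data_repo",
    }
    for path in folders.values():
        # exist_ok=True makes the call safe to repeat
        path.mkdir(parents=True, exist_ok=True)
    return folders
```

For example, create_course_folders(Path.home() / "Documents") would place both folders under your Documents folder.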

Getting code

All the code needed to reproduce the tables and graphs in the textbook is freely available.


  1. Each case study has a separate folder.
  2. Within each case study folder, code files in different languages are stored together.
  3. Some intermediary files (csv, rds) may be saved there, too.
  4. Output folders are created when you run the code.

All code in R and Stata should work well, but some improvements may still take place. A locked version 1.0 is expected in March 2021.
Python is under preparation; see the Github page for details.

Option A: Download in one go [advised]

The whole codebase for the textbook can be downloaded in one step. The current pre-release version is v.0.7.0, codename: Clear Air Turbulence.


  1. Download it as a zip file.
  2. Unzip it and rename the extracted folder to da_case_studies.
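The unzip-and-rename step could be scripted along these lines. This is only a sketch: it assumes the downloaded archive contains a single top-level folder, and both the function name unzip_and_rename and the paths you pass in are hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_and_rename(zip_path, target_dir):
    """Extract zip_path and rename its single top-level folder to target_dir.

    Assumption: the archive holds exactly one top-level folder.
    """
    zip_path = Path(zip_path)
    target = Path(target_dir)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    with zipfile.ZipFile(zip_path) as archive:
        # Collect the first path component of every archive member
        top_levels = {name.split("/")[0] for name in archive.namelist()}
        if len(top_levels) != 1:
            raise ValueError("expected exactly one top-level folder in the archive")
        archive.extractall(target.parent)
    extracted = target.parent / top_levels.pop()
    extracted.rename(target)
    return target
```

The function refuses to overwrite an existing da_case_studies folder, so an earlier download is not silently replaced.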

Option B: Fork and clone from Github [advanced]

Visit the Github page where the live version of the code is available.


  1. Sign up to Github
  2. Visit the Github page
  3. Fork the da_case_studies repository
  4. Clone it to a local drive and name the folder da_case_studies

Getting data

Data is shared via an OSF project repository.

Option A: Download dataset folders [advised]


  1. Create a da_data_repo folder on your local computer.
  2. Visit the OSF project repository. You will see a list of datasets. You will need to download each dataset folder one by one.
  3. For each dataset, click on the OSF Storage (United States) or OSF Storage (Germany - Frankfurt) icon and download as zip.
  4. Extract from the zip, making sure that the folder name is exactly the same as in the OSF repository.
  5. Repeat for all the datasets you need.
  6. Add the dataset folders to your da_data_repo folder to ensure all code works smoothly.
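Since the code relies on exact folder names, it may help to check which dataset folders are in place before running anything. Below is a minimal sketch; the function name missing_datasets is hypothetical, and the dataset names you pass in should be copied exactly from the OSF repository.

```python
from pathlib import Path


def missing_datasets(data_repo, needed):
    """Return the names in `needed` that have no folder under data_repo.

    The function name is illustrative; the names in `needed` must
    match the OSF folder names exactly.
    """
    repo = Path(data_repo)
    return [name for name in needed if not (repo / name).is_dir()]
```

Calling missing_datasets("da_data_repo", ["hotel-europe"]) returns an empty list once that folder has been downloaded and extracted with the matching name.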

Option B: Download the whole textbook material - NOT READY YET

Yes, you will be able to download the whole material as a single .zip file. Forthcoming: March 2021

Option C: Directly open from script

In addition, each dataset is a separate component, so files may be opened directly from code. For example, with the hotel-europe dataset:

R: data1<-read.csv(url(""))

Python: import pandas as pd; data1 = pd.read_csv("")

Stata: import delimited ""

Really, really simple.

Setting up to run code

You will need to install libraries and make minor edits in some parts of the code. Tasks vary depending on the coding language. This textbook is coding language neutral: our code is written in all three of the most widely used tools for data analysis. See our brief summaries below, pick one language and follow the instructions!

How to set up for Stata?
How to set up for R?
How to set up for Python?

Some advice on learning to code

Differences in output

The graphs and results in the textbook come from R. However, most results and graphs should be the same when running from Stata or Python.

There could be some differences across output from different languages.

  1. Graphs may vary as some settings vary. We made a great effort to reduce this as much as possible, sometimes adding more parameters to the graph-making code than we would normally do.
  2. Whenever there is any randomization in the background, results will differ (an example is cross-validation).
  3. Some minor differences are caused by variation in the defaults of some formulas, such as the degrees of freedom used (an example is BIC).
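Points 2 and 3 can be illustrated with a small sketch in Python. It is not taken from the textbook code: the helper assign_folds and the sample values are made up for illustration. The first part shows that a random split into cross-validation folds is only reproducible with the same seed in the same tool; the second shows how a degrees-of-freedom default (dividing by n or by n - 1) changes a result.

```python
import random
import statistics


def assign_folds(n_obs, k, seed):
    """Randomly assign n_obs observations to k cross-validation folds."""
    rng = random.Random(seed)
    folds = [i % k for i in range(n_obs)]  # balanced fold labels
    rng.shuffle(folds)                     # random assignment
    return folds


# Same seed, same tool: the split is reproduced exactly
same_a = assign_folds(10, 5, seed=1)
same_b = assign_folds(10, 5, seed=1)
print(same_a == same_b)  # True

# A different seed generally gives a different split
other = assign_folds(10, 5, seed=2)
print(same_a, other)

# Illustrative values, not from any textbook dataset
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
var_n = statistics.pvariance(values)        # divides by n
var_n_minus_1 = statistics.variance(values)  # divides by n - 1
print(var_n)           # 4.0
print(var_n_minus_1)   # about 4.57
```

Random number generators typically differ across R, Stata and Python, so even an identical seed is unlikely to reproduce the same folds in another language.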