Publish the data and code or it didn’t happen *
The textbook is a complete, reproducible package of text, code, and data. The code and datasets are freely available, and each case study provides what you need to reproduce its analyses, figures, and tables.
Quick links: Data (OSF), Dataset summaries, Code on GitHub
Getting data
OSF project repository: raw and clean datasets.
Download everything or only the data for specific case studies; see Step 6, ‘Getting data’.
Dataset summaries: descriptions, key variables, sources, and links to case studies.
Check out the code
All code needed to reproduce the tables and graphs is on GitHub and free to use.
- Easy option: Download the latest release, unzip it, and rename the folder to `da_case_studies`.
- Recommended option: Fork and clone the repository directly from GitHub for easier updates.
Organization:
- Each case study has a separate folder.
- Within case study folders, code in different languages is stored together.
- Some intermediate files (`.csv`, `.rds`) may be created.
- Output folders are generated when you run the code.
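As a quick way to see this layout on disk, the minimal sketch below groups the code files in one case study folder by language. The function name `code_files_by_language` and the extension-to-language mapping are illustrative assumptions, not part of the repository.

```python
from collections import defaultdict
from pathlib import Path

# Assumed mapping from file extension to language; adjust it if the
# repository uses other extensions.
LANGUAGE_BY_EXTENSION = {
    ".py": "Python",
    ".ipynb": "Python",
    ".r": "R",
    ".rmd": "R",
    ".do": "Stata",
}


def code_files_by_language(case_study_dir):
    """Group code files directly inside one case study folder by language."""
    grouped = defaultdict(list)
    for path in sorted(Path(case_study_dir).iterdir()):
        language = LANGUAGE_BY_EXTENSION.get(path.suffix.lower())
        # Skip subfolders (such as generated output) and non-code files.
        if path.is_file() and language is not None:
            grouped[language].append(path.name)
    return dict(grouped)
```

Calling the function on a case study folder inside `da_case_studies` returns a dictionary keyed by language, which makes it easy to check that the scripts for your language are present.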
Full setup guides
You will need to install libraries and make minor edits to parts of the code. The tasks vary by coding language. Each guide explains how to install the software, prepare folders, and connect code with data:
- Python setup provides two options:
  - Full environment (recommended): install Anaconda, VS Code, Git, and GitHub for a professional workflow and maximum reproducibility.
  - Minimum requirements option: install Python and Jupyter Notebook. This is faster to get started if you just want to focus on the content of the chapters and case studies.
- Stata setup, R setup
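Connecting code with data usually comes down to telling each script where the two folders live. Here is a minimal sketch in Python under an assumed layout: the helper `prepare_folders`, the one-data-folder-per-case-study structure, and the `output` subfolder name are all hypothetical placeholders. The setup guides describe the actual steps.

```python
from pathlib import Path


def prepare_folders(code_root, data_root, case_study):
    """Return (data_dir, output_dir) for one case study, creating output_dir.

    All folder names and the layout are hypothetical placeholders.
    """
    data_dir = Path(data_root) / case_study
    if not data_dir.is_dir():
        # Fail early with a clear message if the data was not downloaded.
        raise FileNotFoundError(f"Data folder not found: {data_dir}")
    output_dir = Path(code_root) / case_study / "output"
    output_dir.mkdir(parents=True, exist_ok=True)
    return data_dir, output_dir
```

Keeping the two root paths in one place means a script only needs a single edit when the folders move to another machine.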
Differences in output
The graphs and results in the textbook come from R. However, most results and graphs should be the same when running from Stata or Python.
- Graphs may differ because some default settings differ across languages. We worked hard to minimize this, sometimes adding more parameters to the graph-making code than we normally would.
- Whenever there is randomization in the background, results will differ (cross-validation is an example).
- Some minor differences come from different defaults in some formulas, such as degrees of freedom (BIC is an example).
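The last two points can be illustrated with a small sketch. It is not taken from the case study code: the function names are hypothetical, the numbers are made up, and which parameter-counting convention each package uses is left open. The first part shows that a fixed seed makes a random fold split repeatable within one language; the second shows how counting one extra parameter shifts BIC.

```python
import math
import random


def kfold_indices(n_obs, k, seed):
    """Shuffle observation indices with a fixed seed and split them into k folds."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# The same seed reproduces the same folds on every run.
same = kfold_indices(100, 5, seed=42) == kfold_indices(100, 5, seed=42)

# Made-up example: 3 coefficients, 200 observations, log-likelihood of -350.
# Counting the error variance as a fourth parameter shifts BIC by ln(200).
gap = bic(-350.0, 4, 200) - bic(-350.0, 3, 200)
```

A seed does not carry over between languages, since R, Stata, and Python use different random number generators, so seeded results match across runs of the same script but not necessarily across languages.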
See also:
- This textbook is coding-language neutral. For choosing between the languages, see the programming languages overview.
- Advice on learning to code: practical tips for getting started or improving your skills.