Publish the data and code or it didn’t happen *
We created the textbook as a complete package of text, code and data. The textbook itself is sold, but the code and data are free.
Read summary information: Case studies summary page
Read summary information: Datasets summary page
To ensure smooth sailing, you will need to create two folders on your computer, anywhere you like.
For the code:
For the data:
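As a minimal sketch, the two folders can also be created from Python. The data folder name `da_data_repo` is the one used later in this guide; the code folder name `da_case_studies` is only an assumption borrowed from the GitHub repository name, and `create_project_folders` is a hypothetical helper, not part of the textbook code.

```python
from pathlib import Path


def create_project_folders(base_dir, code_name="da_case_studies", data_name="da_data_repo"):
    """Create the code and data folders under base_dir and return their paths.

    code_name is an assumption (borrowed from the GitHub repository name);
    data_name is the folder name used later in this guide.
    """
    base = Path(base_dir).expanduser()
    code_dir = base / code_name
    data_dir = base / data_name
    for folder in (code_dir, data_dir):
        # exist_ok=True makes the call safe to repeat
        folder.mkdir(parents=True, exist_ok=True)
    return code_dir, data_dir
```

For example, `create_project_folders("~/Documents")` would create both folders inside your Documents folder, if that is where you want them.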
All the code that reproduces all the tables and graphs in the textbook is available freely to use.
- Each case study has a separate folder.
- Within each case study folder, code files in different languages are simply stored together.
- Some intermediary files (csv, rds) may be saved there, too.
- Output folders are created when you run the code.
All code in R and Stata should work well, although some improvements may still take place. A locked version 1.0 is expected in March 2021.
Python code is under preparation; see the Github page for details.
Option A: Download in one go [advised]
The whole codebase for the textbook can be downloaded at once. The current pre-release version is v.0.7.0, codename Clear Air Turbulence.
- Download it in a zipped file
- Unzip and rename
Option B: Fork and clone from Github [advanced]
Visit the Github page github.com/gabors-data-analysis/da_case_studies where the live version of the code is available.
- Sign up to Github
- Visit the Github page github.com/gabors-data-analysis/da_case_studies
- Fork the da_case_studies repository
- Clone to a local drive, name it
Data is shared via an OSF project repository.
Option A: download dataset folders [advised]
- Create a da_data_repo folder on your local computer.
- Visit the OSF project repository. You will see a list of datasets. You will need to download each dataset folder one by one.
- For each dataset, click on the OSF Storage (United States) or OSF Storage (Germany - Frankfurt) icon and download as zip.
- Extract from the zip, making sure that the folder name is exactly the same as in the OSF repository.
- Repeat for all the datasets you need.
- Add the dataset folders to the da_data_repo folder to ensure all code works smoothly.
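The extraction steps above can be sketched in Python as follows. This is an illustration only: it assumes each downloaded zip holds the dataset's files directly at its top level, and the helper names `extract_dataset` and `missing_datasets` are hypothetical. You pass the dataset name yourself, spelled exactly as it appears in the OSF repository.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, repo_dir, dataset_name):
    """Extract zip_path into repo_dir/dataset_name and return that folder."""
    target = Path(repo_dir) / dataset_name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def missing_datasets(repo_dir, expected_names):
    """Return the expected dataset folder names that are not present in repo_dir."""
    repo = Path(repo_dir)
    return [name for name in expected_names if not (repo / name).is_dir()]
```

After extracting everything you need, `missing_datasets("da_data_repo", [...])` with your own list of dataset names gives a quick check that no folder is absent or misspelled.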
Option B: Download the whole textbook material - NOT READY YET
Yes, you will be able to download the whole material as a single .zip file. Forthcoming: March 2021
Option C: Directly open from script
At the same time, each dataset is a component of the OSF project, and files may be opened directly from code. For example, in Stata:
import delimited "https://osf.io/p6tyr/download"
Really, really simple.
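A rough Python counterpart, using only the standard library, might look like the sketch below. It assumes the file behind the link is comma-separated text with a header row, encoded as UTF-8; the function names are hypothetical, and only the URL pattern is taken from the example above.

```python
import csv
import io
import urllib.request


def osf_download_url(file_id):
    """Build the direct-download URL for an OSF file id such as 'p6tyr'."""
    return f"https://osf.io/{file_id}/download"


def parse_csv_text(text):
    """Parse comma-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def read_osf_csv(file_id, encoding="utf-8"):
    """Download a CSV file from OSF and return its rows (needs network access)."""
    with urllib.request.urlopen(osf_download_url(file_id)) as response:
        text = response.read().decode(encoding)
    return parse_csv_text(text)
```

Calling `read_osf_csv("p6tyr")` would then return the rows as dictionaries keyed by column name. All values come back as strings, so numeric columns need converting before analysis.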
Setting up to run code
You will need to install libraries and make minor edits to some of the code. The tasks vary by coding language. This textbook is coding-language neutral: our code is written in the three most widely used tools for data analysis (R, Stata and Python). See our brief summary, pick one, and follow the instructions!
Differences in output
The graphs and results in the textbook come from R. However, most results and graphs should be the same when running from Stata or Python.
There could be some differences in output across languages.
- Graphs may vary because some settings vary. We made a great effort to reduce this as much as possible, sometimes adding more parameters to the graph-making code than we normally would.
- Whenever there is any randomization in the background, results will differ (cross-validation is an example).
- Some minor differences are caused by different defaults in some formulas, such as how degrees of freedom are counted (BIC is an example).
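The last two points can be illustrated with a small Python sketch. It is not taken from the textbook code: the function names are hypothetical, and the two BIC conventions shown (counting or not counting the error variance as a parameter) are only one plausible source of such differences, without any claim about which software uses which.

```python
import math
import random


def assign_folds(n_obs, n_folds, seed):
    """Randomly assign each observation to one of n_folds cross-validation folds."""
    rng = random.Random(seed)  # the seed fixes the random sequence
    folds = [i % n_folds for i in range(n_obs)]
    rng.shuffle(folds)
    return folds


def gaussian_loglik(rss, n_obs):
    """Maximized Gaussian log-likelihood of a linear regression with residual sum of squares rss."""
    return -0.5 * n_obs * (math.log(2 * math.pi) + math.log(rss / n_obs) + 1)


def bic(rss, n_obs, n_coefs, count_variance):
    """BIC = k * ln(n) - 2 * loglik, where k optionally counts the error variance."""
    k = n_coefs + (1 if count_variance else 0)
    return k * math.log(n_obs) - 2 * gaussian_loglik(rss, n_obs)
```

With the same seed, `assign_folds` returns the same fold assignment on every run within Python, but another language's random number generator will generally produce a different split even for the same seed. For the same fitted model, the two BIC conventions differ by exactly ln(n), which changes the reported number but not the ranking of models fitted to the same data.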