This textbook is coding-language neutral. This means that the book does not include code snippets, and nothing in it depends on which coding (scripting) language you use. There are many tools that can carry out the tasks in our case studies. We decided to focus on three of the most widely used tools for data analysis: R, Stata, and Python.
Social scientists, especially economists, like Stata for its power and sophisticated econometrics capabilities. It has a great interface, and it is very easy to start doing analysis. It also has a point-and-click user interface.
Learn more about Stata
How to set up for Stata?
Social scientists, data scientists, and statisticians like R for its great mix of data management, statistical, and visualization capabilities. It has a large array of machine learning and natural language processing tools, and it is great for web scraping or creating dashboards. It has a neatly assembled set of libraries, called the tidyverse, which helps you learn elementary tools fast. R is free and open source.
Learn more about R
How to set up for R?
Python is the number one coding language for computer scientists and is widely used in data science applications from banking and finance to the Internet of Things. Python is great for web scraping, building and maintaining databases, and all kinds of machine learning tasks. Python is free and open source.
We show code in Python notebooks.
Learn more about Python
How to set up for Python?
R vs Python vs Stata - which coding language to use?
Super hard question. AcademicTwitter and EconTwitter are full of discussions and burns on all sides. We have no preference either, which is why we provide code in all three languages.
- At some level, it does not matter. All three languages are great, and it is a good idea to become good at any one of them. Learning a second language is always easier than learning the first.
- If you plan to work in academia or in international organizations like the World Bank, Stata is great. Even Nate Silver at fivethirtyeight.com uses Stata. It is still the most widely used statistical tool in universities. The main con is that it costs a lot of money, so your institution or future place of work may not have it. Often, new methods are coded first in Stata and R.
- R has some major strengths in data visualization and is great in all relevant domains of analysis: data wrangling and exploration as well as statistical analysis. R is great regardless of your plans: it is more and more frequently used in economics, and it is the dominant language in political science, epidemiology, and many other social sciences. It is also used in industry. Most new solutions will have an R version.
- If your aim is to use data science to build products and services, and you will integrate data analysis into an application, Python could be the answer.
- If you are not a techie, Stata is the easiest, as it also has a point-and-click interface. Some aspects of analysis, like running a regression or drawing a histogram, are easiest in Stata. Because it has a single version for a given year, it is also the most stable.
- If machine learning matters to you, R and Python are equally good (Python may get some new libraries first), while Stata is currently much weaker.
- If you plan to work with large datasets and speed matters, R and Python often have some very fast solutions.
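To give a feel for the two everyday tasks mentioned in the list above (running a regression and producing a histogram), here is a minimal Python sketch. It is only an illustration under stated assumptions: the data are simulated, the variable names are made up, and it uses plain NumPy rather than the libraries used in our case study code.

```python
import numpy as np

# Simulated data (an assumption for illustration, not from the case studies)
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=200)

# Simple linear regression of y on x by ordinary least squares;
# polyfit returns coefficients from the highest degree down
slope, intercept = np.polyfit(x, y, deg=1)

# Histogram of y: observation counts per bin and the bin edges
counts, edges = np.histogram(y, bins=10)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print("histogram counts:", counts.tolist())
```

The same two steps take a line or two each in Stata and R as well; the sketch is meant to show the scale of the task, not to recommend one language over another.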
There are many other popular tools, such as SAS, SPSS, and MATLAB, and newer languages like Julia. We do not have code for these tools, but if you translate our scripts, we would love to share your versions, too.