Learning to code

Here are some tips and advice on how to learn coding.

Big picture

A well-respected resource that introduces how to think about coding for data analysis is Code and Data for the Social Sciences: A Practitioner’s Guide by Matthew Gentzkow and Jesse M. Shapiro. They discuss issues such as replication, project organization, and version control.

Learning to code for data analysis

R

There are two popular dialects in R beyond base R: data.table and tidyverse. We use tidyverse.
There are a great many resources for learning R for data analysis. Here are some ideas:

  1. To learn tidyverse, you may start with the wonderful book R for Data Science by Hadley Wickham and Garrett Grolemund.
  2. A wonderful intro, with a focus on getting started with R and data wrangling, is Jenny Bryan’s course Data wrangling, exploration, and analysis with R, aka STAT545.
  3. The University of Cincinnati has a very nice guide, R Programming Guide, with discussions of basics, workflow, and data manipulation.
  4. At CMU, Alexandra Chouldechova has nice Programming in R materials.
  5. A great online course is R Programming by Roger Peng, Jeff Leek and Brian Caffo on Coursera.
  6. At Data Carpentry, François Michonneau and Auriel Fournier have fantastic content: Data Analysis and Visualization in R for Ecologists.
  7. Grant McDermott has a more advanced lecture series with amazing content: Data Science for Economists.
  8. Working with dates and time series is hard. A great resource is Chapter 10, Dates and times, of Data Science for Psychologists by Hansjörg Neth.

Stata

There are many great materials; here are some we like:

  1. UCLA has extensive material at UCLA IDRE Stats.
  2. An amazing two-part series by Kurt Schmidheiny: Part 1, Part 2.
  3. At Data Carpentry, CEU’s Miklós Koren and Arieda Muco are developing a Stata course for economists.
  4. Plus, a Stata cheat sheet.

Python

Python is a general-purpose language, used for many applications beyond data science and statistics. There are a great many resources for learning Python for data analysis. Here are some ideas:

  1. Very nice courses are widely available, for instance on DataCamp and Codecademy.
  2. A set of very nice lessons is available at Python for Everybody.
  3. NYU has a great group also offering a Python course: QuantEcon.

Learning a second language

Some people have experience with one language and now want to learn a second one. Here are some ideas we found useful:

R for Stata users

In economics and many other social sciences, we use Stata for research and have learned R or Python as a second language. Here are some links and tutorials we found useful.

  1. Matthieu Gomez has a wonderful intro to R for Stata users. For instance, the bit on regressions is pretty useful; I come back to it regularly.
  2. John Ricco has a short intro to the basics of data wrangling.

R to/from Python

For this textbook, the Stata and R code were developed early on, and we started to work on the Python code only after the proof was ready. Some ideas we (and our RAs) found useful:

  1. ggplot for Python by Monash
  2. Pandas and tidyverse

Python for Stata users

Other useful sites

Get datasets for exercises, projects

Social Science Data Sources & Statistical Methods

Tools

A great list of data tools from the Research Data Management (RDM) Program, run by the UC Berkeley Library and Research IT.