Data source ideas

Many of you, dear readers, are either teaching or studying metrics, and are looking for nice data sources for assignments, term projects, or just to practice new skills. Here are some suggestions.


The textbook

Data about the economy, society - mostly country level

  • World Bank – international data on almost everything, partly used in the textbook
  • Our World in Data – a recent and great collection of data that became famous because of its Covid coverage
  • FRED – mostly USA, but some international
  • OECD – standard macro data for OECD countries
  • World Inequality Database – on the historical evolution of the world distribution of income and wealth within and between countries
  • NBER business cycles – information on GDP growth and contraction by the committee that calls recessions
  • FED consumer finance survey – The Survey of Consumer Finances (SCF) is a triennial cross-sectional survey of U.S. families. The study is sponsored by the Federal Reserve Board in cooperation with the Department of the Treasury, with data collected by NORC at the University of Chicago.
  • USA inequality historical data – Ellora Derenoncourt, Chi Hyun Kim, Moritz Kuhn, and Moritz Schularick's wonderful dataset on USA racial inequality (and more), used in their research

Data about firms, business

Data about people

Global trade

  • UN ComTrade – the most well-known and widely used trade data
  • WTO datasets – you may download several datasets here, covering goods and services.
  • CEPII datasets BACI – BACI provides data on bilateral trade flows for 200 countries at the product level (5000 products). Products correspond to the “Harmonized System” nomenclature (6-digit code).
  • US product-level data – by Peter Schott at Yale. Also technical data on matching datasets
  • CEPII Gravity country pair data – trade, distance between country pairs

Data on cities, locations


Culture and language

Climate, environment, energy

Government, policy

Sports data

Transport, travel, commute

Health, medical, Covid

  • Covid data hub – a unified dataset that collects worldwide fine-grained case data, merged with exogenous variables helpful for a better understanding of COVID-19, by Emanuele Guidotti. Now has an R package.
  • Our World in Data / Covid page
  • SGIM Research Dataset Compendium – designed to assist investigators conducting research on existing datasets, with a particular emphasis on health services research, clinical epidemiology, and research on medical education. Public dataset list.

Historical data

