Many of you, dear readers, are either teaching or studying metrics, and are looking for good data sources for assignments, term projects, or just to practice new skills. Here are some suggestions.
- We used dozens of datasets. Check the dataset review section
Data about the economy and society – mostly country level
- World Bank – international data on almost everything, partly used in the textbook
- Our World in Data – a recent and great collection of data that became famous because of its Covid coverage
- FRED – mostly USA, but some international
- OECD – standard macro data for OECD countries
- World Inequality Database – on the historical evolution of the world distribution of income and wealth within and between countries
Data about firms, business
- World Bank microdata
- EBRD business surveys
- World Management Survey – used in several case studies in the textbook
- OECD – data on multinational companies, their measurement, and controlled foreign firms
- ECB Compnet European firm data – harmonized firm level datasets
- US historical industry data – 60 years
Data about people
- World Values Survey – a regular international survey on values
- CDC NHANES – US health surveys, partly used in the textbook
- IPUMS – census and survey data from around the world, integrated across time and space
Trade
- UN ComTrade – the most well-known and widely used trade data
- WTO datasets – several datasets available for download, covering goods and services
- CEPII BACI – data on bilateral trade flows for 200 countries at the product level (5000 products). Products correspond to the “Harmonized System” nomenclature (6-digit code).
- US product-level data by Peter Schott at Yale – also technical data on matching datasets
- CEPII Gravity country pair data – trade, distance between country pairs
Data on cities, locations
- Eurostat European Cities – socio-economic, population, and transport data on European cities
- City-data.com – USA city information on a variety of features, from schools to restaurant inspections
- Airbnb – a massive resource of Airbnb offers around the world by Inside Airbnb, partly used in the textbook
Finance
- Yahoo Finance – historical data on stocks, bonds, and indices (such as Microsoft, used in the textbook); also has a great Python package, yfinance
- Google Finance – great API, and can also be linked to Google Sheets
Culture and language
- Open Movie Database – get movie data via API
- IMDb data – get movie data from the popular site IMDb
- CEPII language and history – country pair level info on shared common language, historical links (like colonial ties)
- Domestic and International Common Language Database (DICL)
Climate, environment, energy
- NOAA Climate Data Online – provides free access to NCDC’s archive of global historical weather and climate data
- City climate data
- BP global energy – energy consumption and CO2 emissions
- Air quality – air pollution data via an API
Politics, government
- Open tenders – government contracting datasets from 33 countries
- Election results and institutions around the globe – a collaborative project
- Quality of Government – several open-source datasets and some extra information, all about the quality of government
- Global Open Data Index – provides the most comprehensive snapshot available of the state of open government data publication. Read about the methodology
- PPEG (Political Parties, Presidents, Elections, and Governments) – a database covering countries around the world. It brings together a range of datasets produced by the department “Democracy & Democratization” of the WZB Berlin Social Science Center.
Sports
- Football/Soccer: Football-data.co.uk – teams, games, odds, partly used in the textbook – a great way to simply download data
- Football/Soccer: Soccerway and WhoScored – a great deal of football data, but you may need web scraping to collect datasets
- Baseball: Sean Lahman Baseball collection
- Baseball: Baseball-Reference
- Tennis: tennis-data.co.uk
Transport, travel, commute
- Open flights data – flight routes, airport locations. Data for 2014-2017 only.
- US airline tickets – Bureau of Transportation Statistics’ Passenger Origin and Destination (O&D) Survey. An earlier version is used in the textbook
- Commuting zones datasets – Facebook collected data on users’ position to estimate commuting zone areas. Check out the Data overview as well
Health, medical, Covid
- Covid data hub – a unified dataset by Emanuele Guidotti that collects worldwide fine-grained case data, merged with exogenous variables helpful for a better understanding of COVID-19. Now has an R package
- Our World in Data / Covid page
- SGIM Research Dataset Compendium – designed to assist investigators conducting research on existing datasets, with a particular emphasis on health services research, clinical epidemiology, and research on medical education. Public dataset list.
History
- Medieval and Early Modern Data Bank (MEMDB) – a project established at Rutgers University to provide scholars with an expanding library of information in electronic format on the medieval and early modern periods of European history, circa 800-1815 C.E. It has six different datasets on prices and currencies and textile production, for example on European currency exchange rates in medieval times.
- Yale historical financial research data – old stock markets, plus cool stuff like data on the South Sea Bubble of 1720
Collections of datasets
- Data Is Plural spreadsheet – one of the coolest data collections, based on the Data Is Plural newsletter by Jeremy Singer-Vine
- Public API collection – just a wonderful collection of APIs to a plethora of sources, really great on environment, finance, popular culture, transport, and many more
- 538 shared datasets – Fivethirtyeight.com is a politics, sports, and entertainment website focusing on data-driven analysis.
- Tableau – If interested in more sports data, check out the collection by Tableau
- Machine Learning Repository – for machine learning projects, a gateway to a wealth of resources is the University of California, Irvine’s repository
- Tidy Tuesdays – a weekly data project series. Not only for R users: a great collection for data wrangling and visualization projects.
- Data.world – a fantastic collection of datasets, such as various sources on the environment. Partly free, may need sign-up
- R datasets - a collection of datasets included in various R packages by Vincent Arel-Bundock
- Datahub’s collection – another collection of data on the economy, environment, and more
- Social Science Data Sources & Statistical Methods
- Trade, globalization, tax datasets by Baptiste Souillard