Each dataset entry shows its content and coverage, key variables, related case studies, data access, source, and copyright, with links to the full description.

airbnb

Content and coverage: The airbnb dataset includes information on the prices and features of apartments rented out via Airbnb. It covers Greater London. It is a single data table. The data refer to rental prices for one night in March 2017. N=51,646 more about the data → access dataset (OSF) data dictionary

Key variables: price per night per person, number of people that can be accommodated, apartment features, location (borough).

Case studies:
CH14B - predicting Airbnb apartment prices: selecting a regression model
CH16B - predicting Airbnb apartment prices with random forest

Source: Downloaded from the Inside Airbnb website.
Copyright: “The data behind the Inside Airbnb site is sourced from publicly available information from the Airbnb site. The data has been analyzed, cleansed and aggregated where appropriate to facilitate public discussion. Creative Commons CC0 1.0 Universal (CC0 1.0) ‘Public Domain Dedication’ license.”

airline-tickets-usa

Content and coverage: The airline-tickets-usa dataset is a 10 percent sample of all tickets sold on the US market, taken in each quarter starting with 2010. For a single quarter, the raw ticket data has about 3–3.5 million observations. The total data used for the case study is around 15 GB in size. The unit of observation in the data is an airline ticket. N=112,632 more about the data → access dataset (OSF)

Key variables: airports visited, origin airport, subsequent airports, ticket price, number of passengers, airline.

Case studies: CH22A - How does a merger between airlines affect prices?

arizona-electricity

Content and coverage: The arizona-electricity dataset includes monthly residential electricity consumption data for the state of Arizona and monthly weather data with cooling degree days and heating degree days for Phoenix Airport. N=204 more about the data → access dataset (OSF)

Key variables: monthly residential electricity consumption, monthly cooling degree days, monthly heating degree days.

Case studies: CH12B - Electricity consumption and temperature

Source: Downloaded from the US Energy Information Administration (EIA) and National Oceanic and Atmospheric Administration (NOAA)
Copyright: EIA: Public domain; NOAA: Public domain

asia-industry

Content and coverage: The asia-industry data consists of monthly time series of industrial production from four countries and monthly total imports into the USA. N=243 access dataset (OSF)

Key variables: industrial production of Thailand, industrial production of Malaysia, industrial production of Singapore, industrial production of the Philippines, total US imports (bn US dollars).

Case studies: CH23A - Import demand and industrial production

Source: Downloaded from The World Bank: World Development Indicators
Copyright: Creative Commons Attribution 4.0 International License (CC BY 4.0)

australia-weather-forecasts

Content and coverage: The australia-weather-forecasts dataset includes data on daily rain forecasts and actual rain for the Northern Australian city of Darwin. N=350 access dataset (OSF)

Key variables: actual frequency of rain, predictions of rain.
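
A minimal sketch, in Python, of how predictions of rain could be compared with the actual frequency of rain. It assumes that each forecast is a probability between 0 and 1 and that actual rain is recorded as 0 or 1; the function name, the binning into ten equal-width groups, and the toy numbers are illustrative and are not taken from the dataset or from the case study code.

```python
from collections import defaultdict


def calibration_table(predicted_probs, rained, n_bins=10):
    """Bin forecasts by predicted probability.

    Returns {bin index: (mean forecast, actual rain frequency, number of days)}.
    """
    bins = defaultdict(list)
    for p, y in zip(predicted_probs, rained):
        # A forecast of exactly 1.0 goes into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    table = {}
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        rain_frequency = sum(y for _, y in pairs) / len(pairs)
        table[idx] = (mean_forecast, rain_frequency, len(pairs))
    return table


# Hypothetical forecasts and outcomes (not from the dataset).
toy_probs = [0.05, 0.05, 0.85, 0.85, 0.85, 0.85]
toy_rain = [0, 0, 1, 1, 1, 0]
toy_table = calibration_table(toy_probs, toy_rain)
```

In a well-calibrated forecast, the mean forecast and the actual rain frequency are close to each other within every bin.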

Case studies: CH11B - Are Australian weather forecasts well calibrated?

Source: Downloaded from the Australian government. The file is called “bometa20150501-20160430.zip”.
Copyright: Creative Commons Attribution Share-Alike 3.0 Australia

billion-prices

Content and coverage: The billion-prices data includes online and offline prices of selected products sold by selected retailers in the USA. N=6,439 access dataset (OSF)

Key variables: online-offline price difference (US dollars).

Case studies:
CH01B - Comparing online and offline prices: data collection
CH06A - Comparing online and offline prices: testing the difference

Source: From the paper Cavallo, Alberto (2017), “Are Online and Offline Prices Similar? Evidence from Large Multi-Channel Retailers”, American Economic Review, Vol. 107(1), pp. 283–303. Data: Harvard Dataverse, V4 (2016).
Copyright: CC0 Public Domain

bisnode-firms

Content and coverage: The bisnode-firms data includes wide-ranging business information on firms operating in a few industries in manufacturing and services in a European country. N=19,036 access dataset (OSF)

Key variables: firm size, management, financial variables, other characteristics.

Case studies: CH17A - Predicting firm exit: probability and classification

Source: From Bisnode, a major European business information company.
Copyright: The dataset, as is, may be used for educational purposes. Bisnode retains all other rights.

case-shiller-la

Content and coverage: The case-shiller-la data includes monthly time series of the S&P/Case-Shiller Greater Los Angeles Home Price Index and monthly time series of the unemployment rate and total employment for California. The dataset covers the 1990–2018 period. access dataset (OSF)

Key variables: Case-Shiller Home Price Index, unemployment rate, total employment.

Case studies: CH18B - Forecasting a home price index

Source: S&P Dow Jones Indices LLC, S&P/Case-Shiller CA-Los Angeles Home Price Index [LXXRNSA]. Retrieved from FRED, Federal Reserve Bank of St. Louis; December 1, 2019.
Copyright: S&P Dow Jones Indices LLC. All rights reserved. Reproduction of Home Price Index for Los Angeles, California in any form is prohibited except with the prior written permission of S&P Dow Jones Indices LLC “S&P”.
Source 2: Employment data: U.S. Bureau of Labor Statistics.
Copyright 2: Public domain.

city-size-japan

Content and coverage: The city-size-japan data includes population data on Japanese cities. N=159 access dataset (OSF)

Key variables: rank of Japanese cities, population of Japanese cities.

Case studies: CH03U1 - City size distribution in Japan

Source: From Wikipedia contributors. (2020, August 4). List of cities in Japan. In Wikipedia, The Free Encyclopedia. Retrieved 14:09, September 2, 2020.
Copyright: Creative Commons Attribution-ShareAlike License

cps-earnings

Content and coverage: The cps-earnings data includes earnings data for 2014, taken from the Merged Outgoing Rotation Groups (MORG) datasets of the Current Population Survey (CPS) of the USA. N=149,316 more about the data → access dataset (OSF)

Key variables: female-male wage difference among market analysts, hourly wage, age of market analysts, gender, graduate degree category (master's, professional, PhD).

Case studies:
CH09A - Estimating gender and age differences in earnings
CH10A - Understanding the gender difference in earnings

Source: Downloaded from the National Bureau of Economic Research
Copyright: No copyright restrictions on the use of extracts.

food-health

Content and coverage: The food-health data includes data on the health status of the population in the USA. N=7,358 more about the data → access dataset (OSF)

Key variables: blood pressure, fruit and vegetables consumed per day, household income, days per week of exercising.

Case studies:
CH19A - Food and health

Source: Downloaded and combined from the National Health and Nutrition Examination Survey (NHANES) of the CDC’s National Center for Health Statistics (NCHS)
Copyright: CDC: Public domain

football

Content and coverage: The football data includes data on games and teams of the English Premier League, the top football division in England. The dataset covers 11 seasons: from 2008/09 to 2018/19. access dataset (OSF)

Key variables: home team - away team goal difference, average points before manager change, average points after manager change.

Case studies:
CH02C - Identifying successful football managers
CH03C - Measuring home team advantage in football
CH24 - Estimating the impact of replacing football team managers

Source 1: Game results come from the football-data.co.uk website.
Copyright 1: N/A
Source 2: Manager data come from Wikipedia contributors. (2020, August 15). List of Premier League managers. In Wikipedia, The Free Encyclopedia. Retrieved July 1, 2019.
Copyright 2: Creative Commons Attribution-ShareAlike License

haiti-earthquake

Content and coverage: The haiti-earthquake data includes economic indicators for Haiti and 21 other countries for 2004-2015. access dataset (OSF)

Key variables: total GDP in Haiti (bn US dollars), total GDP in synthetic Haiti (bn US dollars).

Source: Data and code available from authors Best, R. and Burke, P. J.
Copyright: N/A

height-income-distributions

Content and coverage: The height-income-distributions data includes data on height and household income retrieved from the Health and Retirement Study taken in 2014 in the USA. N=1,988 access dataset (OSF)

Key variables: height, household income.

Case studies:
CH03D - Distributions of body height and income

Source:
Copyright:

hotels-europe

Content and coverage: The hotels-europe data includes information on the prices and features of hotels in 46 European cities on 10 different dates. N=148,021 more about the data → access dataset (OSF)

Key variables: hotel price, hotel's distance from the center of the city.

Case studies:
CH03B - Comparing hotel prices in Europe: Vienna vs. London
CH09B - How stable is the hotel price–distance to center relationship?

Source: Authors’ collection.
Copyright: N/A

hotels-vienna

Content and coverage: The hotels-vienna data includes information on price and features of hotels in Vienna for one date. N=428 more about the data → access dataset (OSF)

Key variables: hotel price, hotel's distance from the center of the city.

Case studies:
CH01A - Finding a good deal among hotels: data collection
CH02A - Finding a good deal among hotels: data preparation
CH03A - Finding a good deal among hotels: data exploration
CH07A - Simple regression
CH08A - Non-linear function (logs)
CH08C - Measurement error in hotel ratings
CH10B - Multiple regression

Source: Authors’ collection.
Copyright: N/A

sp500

Content and coverage: The sp500 data includes day-to-day returns on the S&P 500 stock market index. N=2,519 access dataset (OSF)

Key variables: daily returns on S&P 500 stock market index, percent of days with losses of 5% or more.
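
A minimal sketch, in Python, of how the percent of days with losses of 5% or more could be computed from daily returns. It assumes that returns are expressed in percent; the function name and the toy numbers are illustrative and are not taken from the dataset or from the case study code.

```python
def share_of_large_loss_days(returns_pct, threshold_pct=5.0):
    """Return the percent of days with a loss of threshold_pct percent or more."""
    if not returns_pct:
        raise ValueError("need at least one daily return")
    # A loss of 5% or more is a return of -5 or lower.
    loss_days = sum(1 for r in returns_pct if r <= -threshold_pct)
    return 100.0 * loss_days / len(returns_pct)


# Hypothetical daily returns in percent (not from the dataset).
toy_returns = [0.4, -5.2, 1.1, -0.3, -6.0, 2.5, 0.0, -1.9]
loss_share = share_of_large_loss_days(toy_returns)
```

With these toy numbers, two of the eight days have a loss of 5% or more.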

Case studies:
CH05A - What likelihood of loss to expect on a stock portfolio?
CH06B - Testing the likelihood of loss on a stock portfolio

Source:
Copyright:

share-health

Content and coverage: The raw data is easySHARE version 6.0.0 (N=288,736). Our share-health dataset includes information on the health of people aged 50 to 60 from 14 European countries who reported being healthy in 2011. N=3,109 more about the data - incl. how to get it → access dataset (OSF)

Key variables: current smoker, years of education (three categories), gender.

Case studies:
CH11A - Does smoking pose a health risk?

Source: SHARE Project
Copyright: SHARE. Access is provided after filling in and submitting a data user statement

stocks-sp500

Content and coverage: The stocks-sp500 data consists of daily data on the closing prices of Microsoft stock and the S&P 500 stock market index. The dataset covers 21 years: from 31 December 1997 to 31 December 2018. access dataset (OSF)

Key variables: monthly returns on Microsoft stock, monthly returns on S&P 500 index.
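
The dataset holds daily closing prices, while the key variables are monthly returns. A minimal sketch, in Python, of one way to get from one to the other: keep the last closing price of each month and take the percent change between consecutive month-ends. This is an assumption about the calculation, not the case study's own code; the function name and the toy prices are illustrative.

```python
from datetime import date


def monthly_returns(daily_closes):
    """daily_closes: list of (date, closing price) pairs.

    Returns {(year, month): percent return from the previous month-end close}.
    """
    month_end = {}
    for day, close in sorted(daily_closes):
        # Later days overwrite earlier ones, leaving the last close of each month.
        month_end[(day.year, day.month)] = close

    months = sorted(month_end)
    return {
        cur: 100.0 * (month_end[cur] / month_end[prev] - 1.0)
        for prev, cur in zip(months, months[1:])
    }


# Hypothetical closing prices (not from the dataset).
toy_closes = [
    (date(2018, 1, 30), 95.0),
    (date(2018, 1, 31), 100.0),
    (date(2018, 2, 27), 104.0),
    (date(2018, 2, 28), 110.0),
    (date(2018, 3, 29), 99.0),
]
toy_monthly = monthly_returns(toy_closes)
```

The first month in the input has no return because there is no earlier month-end price to compare it with.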

Case studies:
CH12A - Returns on a company stock and market returns

Source:
Copyright:

swim-transactions

Content and coverage: The swim-transactions data includes information on daily ticket sales of an outdoor swimming pool operating in Albuquerque (New Mexico, USA). N=2,522 access dataset (OSF)

Key variables: daily ticket sales, monthly binary variables, day-of-the-week binary variables.
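
A minimal sketch, in Python, of how monthly and day-of-the-week binary variables could be built from a sale date. The variable names and the choice of January and Monday as omitted base categories are assumptions made for illustration; the dataset and the case study code may define them differently.

```python
from datetime import date


def calendar_dummies(day):
    """Return 0/1 indicators for month and day of the week.

    January and Monday are the omitted base categories (an illustrative choice).
    """
    dummies = {f"month_{m}": int(day.month == m) for m in range(2, 13)}
    # date.weekday() counts Monday as 0 and Sunday as 6.
    dummies.update({f"dow_{d}": int(day.weekday() == d) for d in range(1, 7)})
    return dummies


# Tuesday, 5 July 2016 (an arbitrary example date).
toy_row = calendar_dummies(date(2016, 7, 5))
```

Each date gets 11 month indicators and 6 day-of-the-week indicators; for the example date only month_7 and dow_1 are equal to 1.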

Case studies:
CH18A - Forecasting daily ticket sales for a swimming pool

Source: Downloaded from the City of Albuquerque Open Data (New Mexico, USA).
Thanks to the city for help!
Copyright: Public domain. City of Albuquerque Data Disclaimer: “This site provides applications using data that has been modified for use from its original source, www.cabq.gov, the official website of the City of Albuquerque. The City of Albuquerque makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.”

used-cars

Content and coverage: The used-cars data includes data on offers of used Toyota Camry cars advertised in the Chicago and Los Angeles areas, in 2018. N=477 access dataset (OSF)

Key variables: price, age, type of car, odometer.

Case studies:
CH13A - Predicting used car value with linear regressions
CH14A - Predicting used car value: log prices
CH15A - Predicting used car value with regression trees

Source: Authors’ collection.
Copyright: N/A

wms-management-survey

Content and coverage: The wms-management-survey data includes data on manufacturing companies from 24 countries and was collected between 2004 and 2015. more about the data → access dataset (OSF)

Key variables: management score, founder/family ownership.

Case studies:
CH01C - Management quality: data collection
CH04A - Management quality and firm size
CH21A - Founder/family ownership and quality of management

Source: Prepared for this study by the World Management Survey project.
Thanks to Scur, Bloom and Van Reenen!
Copyright: The source must be referenced.

working-from-home

Content and coverage: The working-from-home data includes information about the employees of a travel agency in China from 2010. N=249 more about the data → access dataset (OSF)

Key variables: employee retention, employee performance.

Case studies:
CH20A - Working from home and employee performance

Source: The data and Stata do-files used to replicate clean datasets and results are available from Nick Bloom’s website.
The case study is based on the paper by Nicholas Bloom, James Liang, John Roberts, Zhichun Jenny Ying, “Does Working from Home Work? Evidence from a Chinese Experiment”, The Quarterly Journal of Economics, Volume 130, Issue 1, February 2015, Pages 165–218.

world-bank-immunization

Content and coverage: The world-bank-immunization data includes data on the immunization rate against measles and the child survival rate in 172 countries, among children aged 12 to 23 months, from 1998 to 2017. N=3,440 access dataset (OSF)

Key variables: immunization rate, child survival rate, population, GDP per capita.

Case studies:
CH02B - Displaying immunization rates across countries
CH23B - Immunization against measles and saving children

Source: Downloaded from The World Bank: World Development Indicators.
Copyright: Creative Commons Attribution 4.0 (CC-BY 4.0)

worldbank-lifeexpectancy

Content and coverage: The worldbank-lifeexpectancy data includes data on life expectancy and GDP per capita for 182 countries in 2017. N=182 access dataset (OSF)

Key variables: life expectancy, GDP per capita.

Case studies: CH08B - How is life expectancy related to the average income of a country?

Source: Downloaded from The World Bank: World Development Indicators
Copyright: Creative Commons Attribution 4.0 (CC-BY 4.0)