Datasets summary

airbnb

Content and coverage: The airbnb dataset includes information on the prices and features of apartments let out via Airbnb. It covers Greater London. The full London data has N=51,646 observations. It is a single data table. The data refer to rental prices for one night in March 2017.

Key variables: price per night per person, number of people that can be accommodated, apartment features, location (borough).

Used in case studies: Predicting Airbnb apartment prices: selecting a regression model
Predicting Airbnb apartment prices with random forest

Data: Access dataset

Source: Downloaded from the Inside Airbnb website.
Copyright: “The data behind the Inside Airbnb site is sourced from publicly available information from the Airbnb site. The data has been analyzed, cleansed and aggregated where appropriate to facilitate public discussion. Creative Commons CC0 1.0 Universal (CC0 1.0) “Public Domain Dedication” license.”

MORE about the data Data library

airline-tickets-usa

Content and coverage: The airline-tickets-usa dataset is a 10 percent sample of all tickets sold on the US market, taken in each quarter, starting with 2010. For a single quarter, the raw data on tickets has about 3–3.5 million observations. The total data used for the case study is around 15 GB in size. The unit of observation in the data is an airline ticket. The dataset has N=112632 observations.

Key variables: the airports visited including the origin and all subsequent airports, ticket price, number of passengers and airline.

Used in case studies: How does a merger between airlines affect prices?

Data: Access dataset

Source: Downloaded from the US Bureau of Transportation Statistics
Copyright: N/A

MORE about the data

arizona-electricity

Content and coverage: The arizona-electricity dataset includes monthly residential electricity consumption data for the state of Arizona and monthly weather data with cooling degree days and heating degree days for Phoenix Airport. The dataset has N=204 observations.

Key variables: monthly residential electricity consumption, monthly cooling degree days and monthly heating degree days.

Used in case studies: Electricity consumption and temperature

Data: Access dataset

Source: Downloaded from the US Energy Information Administration (EIA) and National Oceanic and Atmospheric Administration (NOAA)
Copyright: EIA: Public domain and NOAA: Public domain

MORE about the data

asia-industry

Content and coverage: The asia-industry data consists of monthly time series of industrial production from four countries and monthly total imports into the USA. The dataset has N=243 observations.
Key variables: industrial production of Thailand, Malaysia, Singapore and the Philippines, and total US imports (bn US dollars).
Used in case studies: Import demand and industrial production

Data: Access dataset

Source: Downloaded from The World Bank: World Development Indicators
Copyright: Creative Commons Attribution 4.0 International License (CC BY 4.0)

australia-weather-forecasts

Content and coverage: The australia-weather-forecasts dataset includes data on daily rain forecasts and actual rain for the Northern Australian city of Darwin. The dataset has N=350 observations.
Key variables: actual frequency of rain, predictions of rain.
Used in case studies: Are Australian weather forecasts well calibrated?

Data: Access dataset

Source: Downloaded from the Australian government. The file is called “bometa20150501-20160430.zip”.
Copyright: Creative Commons Attribution Share-Alike 3.0 Australia

billion-prices

Content and coverage: The billion-prices data includes online and offline prices of selected products sold by selected retailers in the USA. The dataset has N=6439 observations.
Key variables: online-offline price difference (US dollars).
Used in case studies:
Comparing online and offline prices: data collection
Comparing online and offline prices: testing the difference

Data: Access dataset

Source: From the paper Cavallo, Alberto, 2016, “Cavallo (2017) “Are Online and Offline Prices Similar? Evidence from Large Multi-Channel Retailers” - American Economic Review - Vol. 107(1), p.283–303”, Harvard Dataverse, V4
Copyright: CC0 Public Domain

bisnode-firms

Content and coverage: The bisnode-firms data includes wide-ranging business information on firms operating in a few industries in manufacturing and services in a European country. The dataset has N=19036 observations.
Key variables: many variables that fall into four groups: firm size, management, financial variables, and other characteristics.
Used in case studies: Predicting firm exit: probability and classification

Data: Access dataset

Source: From Bisnode, a major European business information company.
Copyright: The dataset, as is, may be used for educational purposes. Bisnode retains all other rights.

case-shiller-la

Content and coverage: The case-shiller-la data includes monthly time series of the S&P/Case-Shiller Greater Los Angeles Home Price Index and monthly time series of the unemployment rate and total employment for California. The dataset covers the 1990-2018 period.
Key variables: Case-Shiller Home Price Index, unemployment rate, total employment.
Used in case studies: Forecasting a home price index

Data: Access dataset

Source 1: S&P Dow Jones Indices LLC, S&P/Case-Shiller CA-Los Angeles Home Price Index [LXXRNSA]. Retrieved from FRED, Federal Reserve Bank of St. Louis, December 1, 2019.
Copyright 1: S&P Dow Jones Indices LLC. All rights reserved. Reproduction of Home Price Index for Los Angeles, California in any form is prohibited except with the prior written permission of S&P Dow Jones Indices LLC “S&P”.
Source 2: Employment data from the U.S. Bureau of Labor Statistics.
Copyright 2: Public domain.

city-size-japan

Content and coverage: The city-size-japan data includes population data on Japanese cities. The dataset has N=159 observations.
Key variables: rank and population of Japanese cities.
Used in case studies: City size distribution in Japan

Data: Access dataset

Source: From Wikipedia contributors. (2020, August 4). List of cities in Japan. In Wikipedia, The Free Encyclopedia. Retrieved 14:09, September 2, 2020.
Copyright: Creative Commons Attribution-ShareAlike License

cps-earnings

Content and coverage: The cps-earnings data includes earnings data for 2014, taken from the Merged Outgoing Rotation Groups (MORG) datasets of the Current Population Survey (CPS) of the USA. The dataset has N=149316 observations.
Key variables: female-male wage difference among market analysts, hourly wage and age of market analysts, age and gender of employees with a graduate degree, three categories of graduate degree (master’s, professional and PhDs).
Used in case studies: Estimating gender and age differences in earnings
Understanding the gender difference in earnings

Data: Access dataset

Source: Downloaded from the National Bureau of Economic Research
Copyright: No copyright restrictions on the use of extracts.

MORE about the data

food-health

Content and coverage: The food-health data includes data on the health status of the population in the USA. The dataset has N=7358 observations.
Key variables: blood pressure, fruit and vegetables consumed per day, household income, days per week of exercising.
Used in case studies:
Food and health

Data: Access dataset

Source: Downloaded and combined from the National Health and Nutrition Examination Survey (NHANES) of the CDC’s National Center for Health Statistics (NCHS)
Copyright: CDC: Public domain

MORE about the data

football

Content and coverage: The football data includes data on games and teams of the English Premier League, the top football division in England. The dataset covers 11 seasons: from 2008/09 to 2018/19.
Key variables: home team - away team goal difference, average points before and after manager change.
Used in case studies:
Identifying successful football managers
Measuring home team advantage in football
Estimating the impact of replacing football team managers

Data: Access dataset

Source 1: Game results come from the football-data.co.uk website.
Copyright 1: N/A
Source 2: Managers data come from Wikipedia contributors. (2020, August 15). List of Premier League managers. In Wikipedia, The Free Encyclopedia. Retrieved July 1, 2019.
Copyright 2: Creative Commons Attribution-ShareAlike License

haiti-earthquake

Content and coverage: The haiti-earthquake data includes economic indicators for Haiti and 21 other countries for 2004-2015.
Key variables: total GDP in Haiti and synthetic Haiti (bn US dollars).
Used in case studies:
Estimating the effect of the 2010 Haiti earthquake on GDP
The case study is based on Best, R., & Burke, P. J. (2019). Macroeconomic impacts of the 2010 earthquake in Haiti. Empirical Economics, 56(5), 1647–1681.

Data: Access dataset

Source: Data and code available from authors Best, R. and Burke, P. J.
Copyright: N/A

height-income-distributions

Content and coverage: The height-income-distributions data includes data on height and household income retrieved from the Health and Retirement Study taken in 2014 in the USA. The dataset has N=1988 observations.
Key variables: height and household income.
Used in case studies:
Distributions of body height and income

Data: Access dataset

Source:
Copyright:

hotels-europe

Content and coverage: The hotels-europe data includes information on the prices and features of hotels in 46 European cities for 10 different dates. The dataset has N=148,021 observations.
Key variables: hotel price, hotel’s distance from the center of the city.
Used in case studies:
Comparing hotel prices in Europe: Vienna vs. London
How stable is the hotel price - distance to center relationship?

Data: Access dataset

Source: Authors’ collection.
Copyright: N/A

MORE about the data

hotels-vienna

Content and coverage: The hotels-vienna data includes information on price and features of hotels in Vienna for one date. The dataset has N=428 observations.
Key variables: hotel price, hotel’s distance from the center of the city.
Used in case studies: Finding a good deal among hotels: data collection
Finding a good deal among hotels: data preparation
Finding a good deal among hotels: data exploration
Finding a good deal among hotels with simple regression
Measurement error in hotel ratings
Finding a good deal among hotels with non-linear function
Finding a good deal among hotels with multiple regression

Data: Access dataset

Source: Authors’ collection.
Copyright: N/A

MORE about the data

sp500

Content and coverage: The sp500 data includes day-to-day returns on the S&P 500 stock market index. The dataset has N=2519 observations.
Key variables: percent of days with losses of 5% or more.
Used in case studies: What likelihood of loss to expect on a stock portfolio?
Testing the likelihood of loss on a stock portfolio

Data: Access dataset

Source:
Copyright:
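
The key variable above can be derived from a series of daily index values. The following is a minimal sketch in plain Python, assuming a list of closing values in date order; the function names and the numbers are made up for illustration and are not taken from the dataset or from the case study code.

```python
def pct_change(values):
    """Day-to-day percent returns from a list of closing values in date order."""
    return [(curr / prev - 1) * 100 for prev, curr in zip(values, values[1:])]


def share_of_large_losses(returns, threshold=5.0):
    """Percent of days whose return is a loss of `threshold` percent or more."""
    losses = sum(1 for r in returns if r <= -threshold)
    return 100 * losses / len(returns)


# Hypothetical closing values, for illustration only (not from the dataset).
closes = [100.0, 94.0, 95.0, 96.0, 91.0]
returns = pct_change(closes)
large_loss_share = share_of_large_losses(returns)
print(f"{large_loss_share:.1f}% of days had a loss of 5% or more")
```

The 5% threshold is a parameter here, so the same function can be reused for other loss definitions.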

share-health

Content and coverage: The raw data is EasySHARE version 6.0.0 (N=288,736). Our share-health dataset includes information on the health of people aged 50 to 60 from 14 European countries who reported being healthy in 2011. The dataset has N=3109 observations.
Key variables: current smoker, three categories for years of education, gender.
Used in case studies:
Does smoking pose a health risk?

Data: Access dataset

Source: SHARE Project
Copyright: SHARE. Access is provided after filling in and submitting a data user statement

MORE about the data - incl. how to get it

stocks-sp500

Content and coverage: The stocks-sp500 data consists of daily data on the closing price of the Microsoft company stock and the S&P 500 stock market index. The dataset covers 21 years: from 31 December 1997 to 31 December 2018.
Key variables: monthly returns on the Microsoft stock, monthly returns on the S&P 500 index.
Used in case studies:
Returns on a company stock and market returns

Data: Access dataset

Source:
Copyright:
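
The data holds daily closing prices, while the key variables are monthly returns. One common way to get from one to the other is to keep the last closing price of each month and take percent changes between consecutive month-ends. The sketch below assumes this approach; the source does not state how the case study computes monthly returns, and the function names and prices are hypothetical.

```python
from datetime import date


def month_end_closes(daily):
    """Keep the last closing price observed in each calendar month.

    `daily` is a list of (date, close) pairs sorted by date.
    """
    last = {}
    for day, close in daily:
        last[(day.year, day.month)] = close  # later days overwrite earlier ones
    return [last[key] for key in sorted(last)]


def percent_returns(prices):
    """Percent change between consecutive prices."""
    return [(curr / prev - 1) * 100 for prev, curr in zip(prices, prices[1:])]


# Hypothetical daily closing prices, for illustration only (not from the dataset).
daily = [
    (date(2018, 1, 30), 100.0),
    (date(2018, 1, 31), 102.0),
    (date(2018, 2, 27), 101.0),
    (date(2018, 2, 28), 107.1),
    (date(2018, 3, 29), 96.39),
]
monthly = month_end_closes(daily)
monthly_returns = percent_returns(monthly)
print([round(r, 2) for r in monthly_returns])
```

Applying the same two functions to the stock and to the index gives two aligned series of monthly returns that can then be compared.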

swim-transactions

Content and coverage: The swim-transactions data includes information on daily ticket sales of an outdoor swimming pool operating in Albuquerque (New Mexico, USA). The dataset has N=2522 observations.
Key variables: daily ticket sales, monthly binary variables, day-of-the-week binary variables.

Used in case studies:
Forecasting daily ticket sales for a swimming pool

Data: Access dataset

Source: Downloaded from the City of Albuquerque Open Data (New Mexico, USA).
Thanks to the city for help!

Copyright: Public domain. City of Albuquerque Data Disclaimer: “This site provides applications using data that has been modified for use from its original source, www.cabq.gov, the official website of the City of Albuquerque. The City of Albuquerque makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.”

used-cars

Content and coverage: The used-cars data includes data on offers of used Toyota Camry cars advertised in the Chicago and Los Angeles areas, in 2018. The dataset has N=477 observations.
Key variables: price, age and type of car, odometer.
Used in case studies:
Predicting used car value with linear regressions
Predicting used car value: log prices
Predicting used car value with regression trees

Data: Access dataset

Source: Authors’ collection.
Copyright: N/A

wms-management-survey

Content and coverage: The wms-management-survey data includes data on manufacturing companies from 24 countries, collected between 2004 and 2015.
Key variables: management score, founder/family ownership.
Used in case studies: Management quality and firm size: data collection
Management quality and firm size: describing patterns of association
Founder/family ownership and quality of management

Data: Access dataset

Source: Prepared for this study by the World Management Survey project.
Thanks to Scur, Bloom and Van Reenen!
Copyright: Must reference

MORE about the data

working-from-home

Content and coverage: The working-from-home data includes information about the employees of a travel agency in China from 2010. The dataset has N=249 observations.
Key variables: employee retention, employee performance.
Used in case studies:
Working from home and employee performance

Data: Access dataset

Source: The data and Stata do-files used to replicate clean datasets and results are available from Nick Bloom’s website.
The case study is based on the paper by Nicholas Bloom, James Liang, John Roberts, Zhichun Jenny Ying, Does Working from Home Work? Evidence from a Chinese Experiment, The Quarterly Journal of Economics, Volume 130, Issue 1, February 2015, Pages 165–218.

MORE about the data

world-bank-immunization

Content and coverage: The world-bank-immunization data includes data on the immunization rate against measles and the child survival rate in 172 countries, among children aged 12 to 23 months, from 1998 to 2017. The dataset has N=3440 observations.
Key variables: immunization rate, child survival rate, population, GDP per capita.
Used in case studies: Displaying immunization rates across countries
Immunization against measles and saving children

Data: Access dataset

Source: Downloaded from The World Bank: World Development Indicators.
Copyright: Creative Commons Attribution 4.0 (CC-BY 4.0)

worldbank-lifeexpectancy

Content and coverage: The worldbank-lifeexpectancy data includes data on life expectancy and GDP per capita for 182 countries in 2017. The dataset has N=182 observations.
Key variables: life expectancy, GDP per capita.
Used in case studies: How is life expectancy related to the average income of a country?

Data: Access dataset

Source: Downloaded from The World Bank: World Development Indicators
Copyright: Creative Commons Attribution 4.0 (CC-BY 4.0)