README: airline-tickets-usa dataset
This is a README file for the airline-tickets-usa
dataset.
Used in case study 22A How does a merger between airlines affect prices?
Data source
Airline Origin and Destination Survey (DB1B) collected by the Office of Airline Information of the Bureau of Transportation Statistics of the United States Department of Transportation.
Data access and copyright
- The data is part of the United States Government Information System.
- You can use this publicly available data for educational purposes.
Note for all users
- y = year (for now 2011 and 2016 only)
- q = quarter (1 to 4)
- aiport code used here is the 3-letter string
About the data
Original data collection at US DT
More info about data collection as well some explanation
Raw data tables
Stata data file: DB1B_COUPONS_y_q.dta
- observations: itinerary level, n ~ 3 million per quarter
- ID variable: itinid
Important variable
variable name | info | type |
dest_id | “destination”: all airports in the route except origin | string |
Origin_and_Destination_Survey_DB1BTicket_y_q.csv
- observations: itinerary level, n ~ 3 million per quarter
- ID variable: itinid
Important variables
variable name | info | type |
---|---|---|
origin | origin airport | string |
dollarcred | dollar credibility indicator | string |
rpcarrier | reporting carrier (airline) | string |
passengers | number of passengers on itinerary (ticket) | numeric |
itinfare | airfare (total price of ticket) | numeric |
Tidy data tables
Stata data file: airline-route-panel.dta
- observations: airline X route X year X quarter level,
- n ~ 600-700 thousand per quarter (total n=18,410,466)
- route: unique combination of airports in itinerary (e.g., DTW MSP DTW)
- merged DB1B_COUPONS and DB!B_Ticket files, by itinid wihtin each quarter
- ISSUE: some 20% of itineraries don’t match, mostly only in DB1B_Ticket
- ID variables: airline (string) route (string) year quarter (created from “origin” and “dest_id”)
Important variables
variable name | info | type |
---|---|---|
passengers | total number of passengers | numeric |
itinfare | sum of airfare | numeric |
return | whether return route | binary |
return_sym | whether symmretric return route | binary |
finaldest | final destination, created from route | string |
stops | number of stops | numeric |
Notes:
- return: calculated from route, if the first 3 letters are the same as the last 3 letters
- non-return routes: last airport
- return routes: middle airport, defined only for symmetric routes
- missing value for 10% of passengers b/c non-symmetric return route
- stops in non-return routes: number of airports between first and last airport
- stops in return routes: only if symmetric route, number of airport between first and middle airport
Stata data file: airline-originfinaldest-panel.dta
- observations: airline X origin X finaldest X return X year X quarter level
- n ~ 230 thousand per quarter (total n=6,530,571)
- aggregated from airline-route-panel.dta (missing finaldest dropped)
- ID variables: airline (string), origin (string), finaldest (string) return year quarter
Important variables
variable name | info | type |
---|---|---|
passengers | total number of passengers | numeric |
itinfare | sum of airfare | numeric |
return_sym | whether symmretric return route | binary |
stops | number of stops | numeric |
Stata data file: originfinaldest-panel.dta
- observations: origin X finaldest X return X year X quarter level
- n ~ 100 thousand per quarter (total n=2,670,174), aggregated from airline-route-panel.dta (missing finaldest dropped)
- ID variables: origin (string), finaldest (string), return year quarter
Important variables
variable name | info | type |
---|---|---|
passengers | total number of passengers | numeric |
average price | average of itinfare | numeric |
return_sym | whether symmretric return route | binary |
stops | number of stops | numeric |
shareXX | market share of each (XX) airline | numeric |
sharelargest | market share of largest airline | numeric |
MORE details
The merger
- American Airlines filed for bankruptcy in November 2011.
- US Airways, a competitor, announced its intent to take over American Airlines in 2012.
- After years of legal and regulatory deliberations, the merger was allowed in April 2015. The reservation systems of American and US Airways were merged in the second half of 2015.
The data setup
- Our data comes from a very large database maintained by the Department of Transportation (DB1B data).
- It is a 10\% sample of all tickets sold on the U.S. market taken in each quarter, starting with 2010.
- The unit of observation in the data is an airline ticket. For confidentiality reasons, the date of each flight is unknown in the data – only the quarter is known.
- The variables include the airports visited including the origin and all subsequent airports, ticket price, number of passengers and airline.
- The data we use is another example of Big Data. It comes from automatic ticketing system, it is complicated in nature, and it is very large. For a single quarter, the raw data on tickets has about 3-3.5 million observations. The total data used for the case study has the size of around 15GB.
Market definition
Defining the market in this setup is not straightforward:
- Complicated trips. Here considered routes with the same origin and the same final destination as one market.
- One-way tickets versus return tickets. More than one third of the tickets are return tickets. Consider return tickets where the final destination is reasonably clear and drop the other return tickets.
- Market one-way: a market defined by the origin and the final destination = last airport here.
- Market return: selected routes with a clear middle airport only: routes with an odd number of airports. We dropped all other return routes.
- This affected many routes
- Affected less than 10\% of the passengers.
- Asymmetric routes have different airports between the origin and the designated destination and may have a different destination in fact, but this is the less likely case and thus we decided to keep them.
Summary
- Summary: Markets are defined by their origin airport and their destination airport, and whether they are one-way or return routes.
- Size: 500 airports in the U.S.A., 2 types (one-way, return) = 500,000 possibilities, passengers flying in less than 150,000 actual markets in our data in any year.
- Most of these markets have very few passengers and a few markets have many passengers
- number of passengers was 7
- mean was 170.
- The JFK–LAX one-way market had 31 thousand passengers
- Variation in markets. We defined small markets as those with less than 5000 passengers, 99\% of the markets are small according to this definition, with around 60\% of the passengers.
- Average price of an airline ticket: skewed, long right tail
- For some markets, the data shows zero average price, which are likely errors - very small, with 450 passengers altogether across the two years, and we dropped them from the data.
- Diff-in-diff data panel: There are around 140 thousand markets in both 2011 and 2016;
- 113 thousand are in both years
- 30 thousand are only in one of the years.