README: cps-earnings dataset

This is a README file for the cps-earnings dataset.
It is used in case studies 9A (Estimating gender and age differences in earnings) and 10A (Understanding the gender difference in earnings).

Data source

CPS Annual Earnings File (also known as the Merged Outgoing Rotation Groups, or morg), extracted by the National Bureau of Economic Research (NBER) in the USA.

The Current Population Survey (CPS) is the monthly household survey conducted in the USA by the Bureau of Labor Statistics. Its primary goal is to measure labor force participation and employment. It interviews about 50,000 to 60,000 households per month.
New households enter the CPS each month through careful random sampling. Each household is interviewed once a month for 4 consecutive months, is not interviewed for the following 8 months, and is then interviewed again for 4 more months. As a result, the data collected in any given month come from eight subsamples (rotation groups), each giving one of its 1st to 8th interviews in that month. The questions about usual weekly hours and earnings are asked only of households in their 4th and 8th interviews; for each participating household, these two interviews are 12 months apart.
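The rotation pattern above can be sketched in a few lines of Python. This is a minimal illustration, not CPS or NBER code: months are assumed to be plain integers counted from an arbitrary origin, and the helper names (interview_months, outgoing_months, month_in_sample) are hypothetical. It reproduces two consequences of the pattern: the 4th and 8th interviews are 12 months apart, and in any month households from eight different entry cohorts are interviewed.

```python
from __future__ import annotations

# Assumption for illustration: months are integers counted from an arbitrary
# origin. A household entering in month e follows the 4-8-4 pattern:
# 4 months interviewed, 8 months out, 4 months interviewed again.


def interview_months(entry_month: int) -> list[int]:
    """Months of a household's 8 interviews, in interview order."""
    first_spell = [entry_month + k for k in range(4)]        # interviews 1-4
    second_spell = [entry_month + 12 + k for k in range(4)]  # interviews 5-8, after an 8-month break
    return first_spell + second_spell


def outgoing_months(entry_month: int) -> tuple[int, int]:
    """Months of the 4th and 8th interviews, when earnings questions are asked."""
    months = interview_months(entry_month)
    return months[3], months[7]


def month_in_sample(entry_month: int, month: int) -> int | None:
    """Interview number (1-8) given in `month`, or None if not interviewed then."""
    months = interview_months(entry_month)
    return months.index(month) + 1 if month in months else None


if __name__ == "__main__":
    fourth, eighth = outgoing_months(0)
    print(fourth, eighth, eighth - fourth)  # prints: 3 15 12

    # Which interview numbers are given in month 20? Only households that
    # entered in months 5..20 can be in the sample in that month.
    target = 20
    numbers = [month_in_sample(e, target) for e in range(target - 15, target + 1)]
    print(sorted(n for n in numbers if n is not None))  # prints: [1, 2, 3, 4, 5, 6, 7, 8]
```

Only two of the eight interview numbers (4 and 8) carry the earnings questions, which is why the morg extracts keep just those households.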

The NBER morg extracts include data on households in their 4th or 8th interview. These monthly data tables are then combined into yearly data tables.

All household members aged 16 or over are included in the extract files.

No copyright restrictions; you can use this dataset for educational purposes.

About the data

Raw data tables

morg79 - morg19

  • yearly cross-sectional data tables
  • originally in Stata format (.dta extension)
  • We converted the 2014 file into CSV for our data repository.

morg14.csv

  • observations are individuals aged 16 or over
  • ID variables:
    hhid: household ID
    hrhhid2: second part of the household ID (some original hhid values are split)
    lineno: person (“line”) number within the household
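A minimal sketch for combining the ID variables into a single person key, assuming that hhid, hrhhid2 and lineno together identify one person in morg14.csv and that the CSV header uses exactly these column names. Rather than trusting that assumption, the hypothetical helper duplicate_keys checks it. Values are kept as strings so that long household IDs are not altered by numeric conversion.

```python
import csv
from collections import Counter
from typing import Iterable

# ID variables listed in this README; assumed to be the exact CSV column names.
ID_VARS = ("hhid", "hrhhid2", "lineno")


def person_key(row: dict) -> tuple:
    """Combine household ID parts and line number into one key (kept as strings)."""
    return tuple(row[name].strip() for name in ID_VARS)


def duplicate_keys(rows: Iterable[dict]) -> list:
    """Return keys that occur more than once; an empty list means the key is unique."""
    counts = Counter(person_key(row) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def check_file(path: str) -> list:
    """Run the uniqueness check on a CSV file such as morg14.csv."""
    with open(path, newline="") as f:
        return duplicate_keys(csv.DictReader(f))
```

If check_file("morg14.csv") returns an empty list, the three variables jointly identify every row; otherwise the result lists the keys that repeat.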

Important variables

variable name   info                                            type
age             age                                             numeric
sex             gender                                          binary
earnwke         weekly earnings                                 numeric
uhourse         usual weekly work hours                         numeric
occ2012         occupational code (Census 2010 classification)  categorical
grade92         highest educational grade completed             categorical

Note:

  • Variable labels can be downloaded from http://data.nber.org/morg/docs/cpsx.pdf
  • Occupational classification (Census 2010) codes are listed at https://www.bls.gov/cps/cenocc2010.htm

Tidy data table

morg-2014-emp

  • cross-sectional data
  • observations are individuals aged 16 or over with non-missing, positive usual weekly hours and non-missing, positive weekly earnings
  • ID variables:
    hhid: household ID
    hrhhid2: second part of the household ID (some original hhid values are split)
    lineno: person (“line”) number within the household
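The sample restrictions that define morg-2014-emp can be sketched with Python's csv module. This is an illustration under stated assumptions, not the authors' actual cleaning code: missing values are assumed to appear in the CSV as empty cells, "NA" or ".", and the function names and the output file name are hypothetical.

```python
import csv
from typing import Iterable, Iterator, Optional

MISSING = {"", "NA", "."}  # assumed encodings of missing values in the CSV


def to_float(value: Optional[str]) -> Optional[float]:
    """Parse a CSV cell as a number; return None for missing or non-numeric cells."""
    if value is None or value.strip() in MISSING:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def is_employed_with_earnings(row: dict) -> bool:
    """Sample rule from this README: age 16+, positive usual hours and weekly earnings."""
    age = to_float(row.get("age"))
    hours = to_float(row.get("uhourse"))
    earnings = to_float(row.get("earnwke"))
    return (
        age is not None and age >= 16
        and hours is not None and hours > 0
        and earnings is not None and earnings > 0
    )


def filter_rows(rows: Iterable[dict]) -> Iterator[dict]:
    """Lazily keep only the rows that satisfy the sample rule."""
    return (row for row in rows if is_employed_with_earnings(row))


def write_tidy(src: str, dst: str) -> int:
    """Copy rows of `src` that pass the filter to `dst`; return the number kept."""
    kept = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in filter_rows(reader):
            writer.writerow(row)
            kept += 1
    return kept
```

For example, write_tidy("morg14.csv", "morg-2014-emp.csv") writes the filtered rows and returns how many were kept. The age check is redundant for morg14.csv, whose observations are already aged 16 or over, but it keeps the rule explicit.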

Important variables

variable name   info                                            type
age             age                                             numeric
sex             gender                                          binary
earnwke         weekly earnings                                 numeric
uhourse         usual weekly work hours                         numeric
occ2012         occupational code (Census 2010 classification)  categorical
grade92         highest educational grade completed             categorical

Note:

  • Variable labels can be downloaded from http://data.nber.org/morg/docs/cpsx.pdf
  • Occupational classification (Census 2010) codes are listed at https://www.bls.gov/cps/cenocc2010.htm