Abstract:
This paper conducts univariate multiple imputation for employee income data comprising continuously distributed observations, observations bounded by consecutive income brackets, and missing observations. A variable with this mixture of data types is a form of coarsened data. An interval-censored regression imputation procedure is used to generate plausible draws for the bracketed and nonresponse subsets of income. We test the sensitivity of the results to mis-specification in the prediction equations of the imputation algorithm, and we test their stability as the number of imputations increases from two to five to twenty. We find that, for missing data, imputed draws differ substantially between respondents who state that they don't know their income and those who refuse to disclose it. The upper tail of the income distribution is most sensitive to mis-specification in the imputation algorithm, and we discuss how best to conduct multiple imputation to take this into account. Lastly, stability in parameter estimates of the income distribution is achieved with as few as two imputations, due largely to (a) the small fraction of missing data, in combination with (b) reduced within- and between-imputation components of variance for imputed draws of the bracketed income subset, because the defined lower and upper bounds of the brackets restrict the range of plausible imputed draws. This is a joint SALDRU and DataFirst working paper.
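As a rough illustration of why bracket bounds and a small missing fraction can make a handful of imputations sufficient, the sketch below draws values from a normal distribution truncated to a bracket and then pools estimates with Rubin's combining rules. It is not the paper's interval-censored regression procedure: the function names, the mean `mu`, the standard deviation `sigma` (which in practice would come from a fitted prediction equation) and the bracket bounds are all hypothetical, and the sketch ignores parameter uncertainty in the imputation model.

```python
import random
from statistics import NormalDist, mean, variance


def draw_in_bracket(mu, sigma, lower, upper, rng):
    """Draw one value from N(mu, sigma^2) truncated to [lower, upper].

    mu and sigma are assumed inputs (e.g. a predicted log income and a
    residual standard deviation); they are illustrative, not from the paper.
    """
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    # Inverse-CDF sampling; keep p strictly inside (0, 1) so inv_cdf is defined.
    p = min(max(rng.uniform(p_lo, p_hi), 1e-12), 1 - 1e-12)
    # Guard against floating-point rounding just outside the bracket.
    return min(max(dist.inv_cdf(p), lower), upper)


def rubin_combine(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, w, b, t


def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many,
    given a fraction of missing information gamma."""
    return 1 / (1 + gamma / m)
```

Under these assumptions, `relative_efficiency(0.1, 2)` is about 0.952, so with a 10% fraction of missing information two imputations already recover most of the attainable efficiency. Narrow brackets also limit how far draws from `draw_in_bracket` can vary across imputations, which keeps the between-imputation component `b` small.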
Description:
Keywords: Multiple Imputation, Coarse Data, Income Distribution
Classification-JEL: C15, C83, D31