Abstract:
The October Household Surveys (OHS) (1994-1999) and the General Household Surveys (GHS) (2002-present) collected by StatsSA comprise South Africa's only nationally representative time series with information on both people and households for (almost) every year of the post-apartheid period. However, the quality of these data has been compromised in three ways by how the survey weights have been calibrated. We document these problems and their implications in detail, and then use cross-entropy estimation to recalibrate the survey weights for a stacked version of these surveys between 1995 and 2011. First, the weight calibration procedure breaks with sampling practice by calibrating person and household weights separately. This creates conceptual problems because the data are not properly representative of the population. It also creates statistical problems: in particular, total population and household counts cannot be reliably extracted from the series, although such counts are typically a first-order output of a time series of this kind. Second, issues with the benchmarks StatsSA uses mean that the series of household counts extracted from the GHS is probably too low. Third, the survey weights make no compensation for the chronic undersampling of small households over the entire period. Our new weights make headway in resolving these issues: for the first time, they yield consistent counts of people and households benchmarked on both person and household auxiliary information, as well as benchmarked counts of one-, two-, and three-person households. Work is ongoing to improve the weights.
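To illustrate the general idea of cross-entropy recalibration, the following is a minimal sketch and not the procedure used for these surveys. It assumes one weight per household, which every household member inherits, so that person and household counts are consistent by construction. The function name `cross_entropy_calibrate`, the six toy households, their design weights, and the two benchmark totals are all hypothetical. The sketch minimises the cross-entropy distance sum_i [w_i log(w_i / d_i) - w_i + d_i] between new weights w and design weights d, subject to the weighted auxiliary totals matching the benchmarks, using Newton's method on the dual problem.

```python
import numpy as np

def cross_entropy_calibrate(design_w, X, totals, tol=1e-10, max_iter=100):
    """Hypothetical helper: find weights w = d * exp(X @ lam) whose weighted
    column totals of X match `totals`, by Newton iteration on lam."""
    d = np.asarray(design_w, dtype=float)
    X = np.asarray(X, dtype=float)
    t = np.asarray(totals, dtype=float)
    lam = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(X @ lam)          # exponential-tilting form of the solution
        gap = X.T @ w - t                # calibration constraints, zero at the optimum
        if np.max(np.abs(gap)) <= tol * max(1.0, np.max(np.abs(t))):
            return w
        H = X.T @ (X * w[:, None])       # Jacobian of the constraints w.r.t. lam
        lam -= np.linalg.solve(H, gap)   # undamped Newton step
    raise RuntimeError("calibration did not converge")

# Toy data (illustrative only): six sampled households with these sizes.
sizes = np.array([1, 1, 2, 3, 4, 5])
design_w = np.full(6, 100.0)

# Auxiliary variables per household: a household indicator and the person count.
X = np.column_stack([np.ones_like(sizes), sizes])

# Assumed benchmarks: 700 households and 1800 people.
totals = np.array([700.0, 1800.0])

w = cross_entropy_calibrate(design_w, X, totals)
household_count = w.sum()            # weighted number of households
person_count = (w * sizes).sum()     # each person carries the household weight
print(household_count, person_count)
```

Because a single set of weights reproduces both benchmarks, the household and person counts cannot contradict one another, which is the property that separately calibrated person and household weights lack. Adding indicator columns for one-, two-, and three-person households to `X` would benchmark those counts in the same way, provided suitable benchmark totals exist.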