20 September 2016

Research lessons from Census 2016 - making sense of a senseless fail

Let's admit - it's a trainwreck 
The ABS is threatening two million households with fines if they do not complete their census, while at the same time being "adamant the quality of data has not been compromised."

It's not so long ago that they reassured the public they would be able to cope with the demand on census night. They were wrong then - and they are wrong now.

The data quality has been hopelessly compromised. For one thing, at this point, two million out of "close to 10 million dwellings" are yet to complete. That is, roughly 20% have not completed yet.

In the short run, #censusfail was about a data collection problem. The website for collecting the census data was inaccessible on the night of the census at the time that most people would have completed the census form. It remained inaccessible for a further 48 hours. Even longer for some.

The initial response to this colossal data collection glitch was a flurry of fingerpointing and promises that "heads will roll."

However, the bigger problem appears to have been totally overlooked. The real crux of #censusfail is less a data collection fail and more a data quality fail.   

Two million households or 20% of households have not completed at this point. The first threat to data quality that this creates is a non-response error, or in ABS terminology, "undercount".

The statistics derived from this 80% sample will be wrong - as acknowledged by ABS itself.

In the best case, the error is simply noise. That is, the people missed are the same as everyone else, and the final estimate might be a little higher or a little lower than the true number (we don't know which), but the estimate is "unbiased".

In the worst case, the estimate will be systematically wrong, that is, biased. For instance, we can guess that a large proportion of those two million non-responders were those who chose to do the census online and were foiled by #censusfail.

So are those who chose to do the online version systematically different from those who chose to do the paper version? Do we actually have a nonresponse bias as opposed to simple noise?

You bet. We can guess that the non-responders are more likely to be younger, live in major cities, have children under 15 years, etc. A guess based on nothing less than ABS research.

So now those who have not completed the census will bias the results: we know the characteristics of non-responders will be under-represented, we know our picture of Australia will be wrong. We don't know by how much.
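To make the difference between noise and bias concrete, here is a minimal sketch. Every number in it is invented for illustration (the 40% share of younger households and the 65% and 90% response rates are assumptions, not ABS figures). It compares what an 80% response would show when everyone responds at the same rate with what it would show when younger households respond less often.

```python
# Hypothetical illustration: all numbers below are invented, not ABS figures.

def observed_share(pop_share_young, resp_young, resp_old):
    """Share of 'young' households among those that actually responded."""
    young = pop_share_young * resp_young
    old = (1 - pop_share_young) * resp_old
    return young / (young + old)

true_share = 0.40  # assumed true share of younger households

# Best case: every group responds at the same 80% rate (no bias).
unbiased = observed_share(true_share, 0.80, 0.80)

# Worst case: younger households respond at 65%, older ones at 90%.
# Overall response is still 0.4 * 0.65 + 0.6 * 0.90 = 0.80.
biased = observed_share(true_share, 0.65, 0.90)

print(f"true share of younger households: {true_share:.3f}")
print(f"seen with equal response rates:   {unbiased:.3f}")
print(f"seen with unequal response rates: {biased:.3f}")
```

Under these assumed rates, the same 80% response leaves the share untouched in the first case and drags it from 40% down to 32.5% in the second. Collecting more of the same kind of response does not fix the second case, because the gap comes from who responds, not from how many.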

However, there's still time to complete. Maybe the ABS will deliver on its threat of issuing fines of $180/day until non-responders complete the census as required, but this simply introduces respondent error.

Respondent error is when people do respond, but respond incorrectly. It comes in two forms: intentional and unintentional.

Frustrated respondents might intentionally falsify or fabricate responses. Who wouldn't be angry after the outage on census night and beyond, suspicions about the reasons for #censusfail, fears about privacy, and general heavy-handedness with threats of heavy fines for not completing the census? Others might simply go through the motions (i.e., make up responses) so as to avoid the fine.

Or perhaps it is unintentional: they have just forgotten to complete the census. But if that is the case, how likely are they to remember who was staying in the household that night and who was not, not to mention remembering (if they ever knew) all their relevant details: age, previous addresses, religion, race, occupation, income, education?

Perhaps you would not be frustrated and wouldn't forget - but then we'd have an over-representation of saintly types and mnemonists. You may not be a problem, but a successful census is a group effort. Can you really be sure that everyone else did their duty and did it dutifully?

Too big to fail

Bigger than the fail itself is the reluctance to admit to the extent of the fail. As one columnist at The Australian noted, it is a mystery why the government has not acknowledged that the 2016 census is a complete loss. 

The reason for the non-admission is perhaps that the census is simply too big to fail. The data provided by the census are very important.

In the words of the ABS, the census provides "a snapshot of Australia, helping to shape our nation’s education, health, transport and infrastructure", and allows us to "plan for the future".

The census provides data used by and guiding government, businesses, and social institutions.

But the data from the 2016 census will almost certainly be inaccurate and will therefore presumably misguide. 

What do we do?

First, let's admit we got it wrong! We messed up the data collection at a critical time. And then, we have almost certainly spent much more money than initially planned in an ultimately doomed effort to get complete data - even though that pressure is very likely to encourage intentional respondent error. We have poor data quality as a result.

Second, let's resolve not to do that again. Census 2016 was a write-off. Maybe online is trickier than we thought. Maybe we should admit when we have it wrong and question the value of sending good money after bad, of pressing on when the process is doomed.

Maybe we should rethink the census altogether. A sample could do as good a job as a census. Indeed, at this point, a survey conducted on a random sample of the Australian population will almost certainly return a truer result than the 2016 census. And at a fraction of the price.
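The claim that a small random sample can beat a large but lopsided count can be checked with a toy simulation. The sketch below rests on invented numbers (a scaled-down population of 200,000 households, 40% of them "younger", younger households responding to the census at 65% and older ones at 90%), and it assumes the random sample gets full response, which a real survey would not. It illustrates the principle and does not model Census 2016.

```python
import random

# Hypothetical illustration: every number here is invented, not ABS data.
random.seed(2016)

N = 200_000              # scaled-down population of households
TRUE_SHARE_YOUNG = 0.40  # assumed share of younger households

# True means the household is "younger".
population = [random.random() < TRUE_SHARE_YOUNG for _ in range(N)]
true_share = sum(population) / N

def responds(is_young):
    """Assumed census response rates: 65% if younger, 90% otherwise."""
    return random.random() < (0.65 if is_young else 0.90)

# "Census": everyone is asked, but response depends on who you are.
census = [h for h in population if responds(h)]
census_est = sum(census) / len(census)

# Survey: a simple random sample of 10,000 households, assumed to all respond.
sample = random.sample(population, 10_000)
sample_est = sum(sample) / len(sample)

print(f"true share:       {true_share:.3f}")
print(f"census estimate:  {census_est:.3f} from {len(census)} responses")
print(f"sample estimate:  {sample_est:.3f} from {len(sample)} responses")
```

With these assumptions the census collects many times more responses than the survey yet lands further from the true share, because its error is systematic while the survey's error is only sampling noise.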


  1. Anonymous 21/9/16

    Hmm - Isn't Big Brother watching us anyway? why do we need to supply this extra so called mandatory insight into our souls - surely we have given enough information already - via - passports - licence - birth certificates - tax file & medicare # - bank cards - google - outlook - Medical data - social media (to boot) financial institutions, conveyancing, property management - the list goes on.
    My question is why do we need to hand over this extra information that is going to be incorrect if our personal information is already exposed/gathered. Very curious.

    1. Good point - don't we already have all this data? The answer however is that we may have much of this information about people (eg, we know Australia's population from birth and death records, not from the census), but we don't have it all in one source. ABS might be able to 'scrape' all the data from multiple sources, but it is hard to match all the data up so we have a complete picture of each individual - and that's important. However, this does not change the fact that this census was a colossal fail, and anyway, we ought to use what we know from sampling science - a sample can be as good as a census. Even better in the case of census 2016!