
Debating Ideas is a new section that aims to reflect the values and editorial ethos of the African Arguments book series, publishing engaged, often radical, scholarship, original and activist writing from within the African continent and beyond. It will offer debates and engagements, contexts and controversies, and reviews and responses flowing from the African Arguments books.

What do we know about Covid in African countries? Here is one thing we know for certain: we know less than we would like to know. There are some data, but how should we approach them? Here, I provide a few pointers on studying how the numbers work. But first, a caveat drawn from hard-learned lessons of writing about numbers in this context. Please do not read this as a “picking on African countries” type of comment. The measurement problems I address here are generic. In Sweden, the epidemiologist Anders Tegnell contested the WHO’s numbers when the organisation put the country in the red zone, arguing that Swedish health statistics were being misread and that caseloads were driven up by Sweden’s testing efforts. In Norway, and indeed the UK, the slogan has been “Data not Dates”, but when it comes to actual politics, data has had to give way to important dates for re-opening, as politicians have kept as keen an eye on dates (and the economy) as on data.
So please keep in mind that these lessons about Covid numbers apply generally, but there are a few reasons why the knowledge problems are bigger in many African contexts. First off, we do not know the size of the denominator. Total population estimates are weak. The extreme example is Nigeria: we do not know the size of the population of the most populous country on the continent, and any estimate given should be taken as a guess with an error margin of 30-40 million people. It is indicative of the general state of administrative records that no country on the continent has a universal system of civil registration and vital statistics; for most countries we do not even have estimates of how many births and deaths are registered as a proportion of the probable total. For most indicators of development, this means that we rely on surveys, in which semi-regular questionnaires are administered to a sample of the population.
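To see why the denominator matters, here is a minimal sketch in Python. All figures are illustrative, not actual Covid data: the spread of population estimates simply echoes the 30-40 million error margin just mentioned.

```python
# Illustrative only: how denominator uncertainty propagates into rates.
# The case count and population estimates below are hypothetical; the
# +/- 35 million spread mirrors the error margin discussed in the text.
cases = 250_000
for population in (170_000_000, 205_000_000, 240_000_000):
    rate_per_100k = cases / population * 100_000
    print(f"Population {population / 1e6:.0f}m -> "
          f"{rate_per_100k:.0f} cases per 100,000")
```

The same case count yields visibly different per-capita rates, so a country can appear “safer” or “riskier” purely as an artefact of which population guess is used.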
The representativeness and reliability of surveys are always an issue. Even for some of the best-funded survey instruments, such as the Living Standards Measurement Surveys, which amongst other things give us the poverty numbers, there have been only 154 surveys over three decades. That averages out at well under two surveys per country per decade, not enough to draw a trend line, and the coverage is uneven: 43 of the surveys come from just 6 countries, and only 27 out of 48 countries in Sub-Saharan Africa had at least two surveys between 1990 and 2012 with which to track poverty.
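As a back-of-envelope check, the coverage arithmetic can be made explicit, using only the figures cited above (154 surveys, 48 countries, three decades):

```python
# Survey coverage arithmetic, using the figures cited in the text.
surveys, countries, decades = 154, 48, 3
print(f"{surveys / countries:.1f} surveys per country overall")       # ~3.2
print(f"{surveys / countries / decades:.1f} per country per decade")  # ~1.1
```

Three or so data points per country over thirty years is a snapshot, not a trend.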
There are three main statistical sources of knowledge: the administrative record, the census, and the survey. We know that the administrative record is limited; we know that the census is unreliable and sometimes missing; and we know that the sample surveys that rely on the census for representativeness have uneven coverage. This is the state of statistical knowledge.
That has implications for how we should approach Covid numbers in Africa, where, according to the WHO, 6 out of 7 cases go undetected and unreported. The approach is to use our knowledge of the consequences of not knowing in other domains to draw inferences about the kinds of knowledge problems we should expect in the Covid numbers.
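That WHO detection figure translates into a simple rule of thumb, sketched below with a purely hypothetical reported count:

```python
# If only 1 in 7 infections is detected (the WHO estimate cited above),
# reported cases understate the true burden roughly sevenfold.
detection_rate = 1 / 7            # 6 out of 7 cases undetected
reported_cases = 100_000          # hypothetical reported count
implied_true_cases = reported_cases / detection_rate
print(f"Implied true cases: {implied_true_cases:,.0f}")  # ~700,000
```

The point is not the exact multiplier, which is itself a modelled estimate, but that any reported count should be read as a floor rather than a total.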
Better data usually means worse statistics
This sounds counterintuitive, but it is the same defence that Tegnell used above: the more you know, the worse it looks. This is particularly true when it comes to testing. If you are only recording confirmed positive cases, then the more you test, the more cases you record. The same dynamic holds for indicators such as infant mortality, where the rule is that once you increase testing, surveying and sampling, you find more of what you are measuring. This was the experience with infant mortality statistics: once the indicator was included in the Millennium Development Goals, measurement efforts intensified, and as surveys were implemented in poorer countries, or in poorer regions of countries, recorded rates rose.
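A toy simulation can make the mechanism explicit. In the sketch below (all numbers hypothetical), the true prevalence never changes; recorded cases rise purely because testing expands:

```python
import random

random.seed(0)
TRUE_PREVALENCE = 0.05   # assumed constant share of the population infected

for tests in (1_000, 10_000, 100_000):
    # Each test samples one person at random; a case is recorded only
    # when a sampled person happens to be infected.
    recorded = sum(random.random() < TRUE_PREVALENCE for _ in range(tests))
    print(f"{tests:>7} tests -> {recorded:>5} recorded cases")
```

Recorded cases grow roughly tenfold with each tenfold increase in testing, even though the epidemic itself is unchanged. A country that tests more will look worse on paper than an identical country that tests less.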
The difference between case data and inferential statistics
When testing or surveying capacity is low, one often relies on models and inferential statistics to generate infection rates. There are key differences between case data and inferential statistics. With inferential statistics, you cannot disaggregate the data to individuals: the numbers are just a signal, and the signal might be noise. Operationally, if you decide to act on the signal, the data does not contain the information needed to actually track the infected cases.

One starting point for thinking about such asymmetries in statistical information across countries is unemployment statistics. In Norway, unemployment statistics are generated from people registering as unemployed in order to qualify for benefits. This underreports those who have given up as jobseekers, but for every registered case there is an individual data point with a name and a number. In most African countries unemployment statistics are not available or reliable, but where they do exist, they are drawn from a labour force survey. These statistics are generated from sample data, and you cannot disaggregate the total rate into individual data points: the total is generated from a few observations, and depends on the assumptions that the sample is representative and that parameters estimated in the sample hold for the whole population. HIV and AIDS prevalence statistics in Africa were subject to the same weaknesses, and this fed controversy and debate, such as whether the Uganda success story was as successful as the statistics indicated.
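The contrast can be sketched in a few lines of Python (all names and numbers hypothetical). A registry yields identifiable records; a survey yields only an estimated rate with a margin of error:

```python
import math
import random

random.seed(1)

# Registry-style data: each positive case is an individual, traceable record.
registry = [("id-0001", "positive"), ("id-0002", "positive")]
print(f"Registry: {len(registry)} named cases")

# Survey-style data: interview n people, observe k positives, infer a rate.
n, true_rate = 1_200, 0.08       # assumed sample size and underlying rate
k = sum(random.random() < true_rate for _ in range(n))
p_hat = k / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error, simple random sample
print(f"Estimated rate: {p_hat:.3f} +/- {1.96 * se:.3f} (95% CI)")
```

The survey estimate applies to the whole population only if the sample is representative, and no amount of arithmetic can turn it back into named cases to follow up.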
Incentives rule when recording is sparse
Statistics are political products. If recording were complete and transparent, and the final counts were subject to no disputable assumptions, relationships and guesswork, we could confidently say the numbers don’t lie. In reality, however, most final official numbers are subject to judgement and negotiation. That is why saying “data, not dates” as a guideline for opening or restricting mobility might sound smart in theory but may be foolish in practice. One can say that as soon as an indicator becomes important, it very quickly becomes useless, a version of Goodhart’s law. The recent controversy surrounding the World Bank’s Ease of Doing Business index shows this clearly: the indicator became so important that countries concentrated their efforts on manipulating the index so as to be measured as business-friendly, while actually reforming the business environment became a second priority. When the outcomes of a measurement carry easily identifiable incentives, and you are in terrain where things are not well measured, you risk corrupting the measurement process itself.

Many countries pay a hefty financial penalty if movement to and from their airports is suspended, and thus staying in the green zone is essential. This does not mean that manipulation is inevitable, but bear in mind that more testing might yield more cases; it stands to reason that this might dampen rather than encourage the will to get better data. We have already seen countries such as Sweden take issue with how they were reported by the WHO. When the incentives to be measured as performing well are high, attention may shift to contestation around rankings rather than monitoring of the actual health situation.
Without ground data Big Data has a small impact
Does anyone still remember Google Flu Trends? It was launched in 2008, and it seemed just a matter of when, not whether, it would replace institutions such as the U.S. Centers for Disease Control and Prevention (CDC) in reporting on and monitoring flu outbreaks. In the first wave of “bird flu” it was claimed that the tool could predict flu outbreaks with 97 percent accuracy, and because it predicted rather than reported actual cases like the CDC, it was faster and more up to date. It turned out in the second wave of “swine flu” that the parameters and algorithms were completely off, and in 2015 the project was closed down. In retrospect, it is hard to imagine that the ambitious blueprints for “a data revolution in Development”, and the claims surrounding the Sustainable Development Goals, would have been possible without this belief in the kind of monitoring that Big Data and algorithms were supposed to deliver. As one technical report proclaimed: “Never again should it be possible to say ‘we didn’t know’. No one should be invisible. This is the world we want – a world that counts.”
The current pandemic has reminded us that the data revolution is still some way off, and that we continue to depend crucially on official institutions linked to the actual delivery of health services. Recent events, such as the unfair and unwarranted travel bans imposed on Southern Africa in response to its reporting of cases, further jeopardize this precarious state of knowledge.