Big Questions for Big Data and what it can do for African Economic Development – By Morten Jerven
Is your public health information system broken? Not to worry, Google Flu Trends has got it covered. Is your autocrat overstating economic growth? Fear not, true growth can be measured from space with satellite imagery. Has more than a decade passed since poor households in your country were surveyed? Relax, these data can be harvested off mobile phone payment records.
For every dire knowledge problem, 'Big Data' and the 'data revolution' seem to provide a solution. According to the UN's 'A World that Counts' report on the 'data revolution', poor numbers will soon be a problem of the past: "Never again should it be possible to say 'we didn't know'. No one should be invisible. This is the world we want – a world that counts." On the contrary, I maintain that it should always be possible to say "we didn't know".
The authors of the report can perhaps be excused for reverting to sloganism; the phrase 'a world that counts' was simply intended to make a report on data for development exciting, but what's next? Poverty Reduction Strategy Papers: 'You can't beat the feeling', or the Data Quality Assessment Framework: 'Because you're worth it'? Call me old-fashioned, but I preferred it when background documents from international organisations were boring.
To think that one can simply count and therefore know everything is a misleading notion. Not knowing something is fine, but thinking that you do know when you don’t is foolish. And if there is one narrative that is guilty of this particular brand of hubris it is the one on Big Data.
In large part, the enthusiasm comes from the unknown bounties that 'big' and 'open' data will bring. The narrative on Big Data and what it can do for development is both hyped and confused. This is partly because you can never be sure what Big Data actually means. It is a moving target (what was a 'big' dataset last year is 'small' next year). Moreover, it is often used as shorthand for the application, collection and transmission of old 'small' data with new information and communication technologies.
Here I present some big questions for Big Data and its use for African economic development. Until we get good answers to these, we are stuck with some of the same old knowledge problems we had before the advent of Big Data.
In an excellent post for the World Bank's Development Impact blog, Florence Kondylis outlines one immediate payoff of new technologies for researchers based in the US and Europe: "Instead of writing large grants, spending days traveling to remote field sites, hiring and training enumerators, and dealing with inevitable survey hiccups, what if instead you could sit at home in your pajamas and, with a few clicks of a mouse, download the data you needed to study the impacts of a particular program or intervention?" As it turns out, there are many real-world obstacles that have to be negotiated before this dream of doing field research in one's pajamas can become a reality.
How do you overcome the need for a benchmark?
The genius of satellite data is that you can observe things like agricultural production from space; Kondylis shows how maize crops in Mexico can be measured quite accurately in this way. But there is a catch: this only works if you already know what is being grown, and where. Satellite imagery gives an approximate measure that is not fine-grained enough to detect the kind of yield changes that closely watched interventions capture; moreover, as anyone who has tried to use Google Maps to get around a small old town can attest, handheld GPS devices measure with some inaccuracy and delay. The technology is not currently appropriate for studying smallholder mixed cropping in sub-Saharan Africa – and that is even if you had an extremely accurate agricultural census as a benchmark. Fewer than half of African countries have conducted such a census in the past decade.
This is true even for the most impressive and promising Big Data applications. The MIT Billion Prices Project captures prices across the world and aggregates them into an alternative to the official consumer price index measure of inflation. Yet the project measures its own success by how closely it tracks the official CPI. Without that benchmark, and without an official process of agreeing upon appropriate weights, it would just be an ungrounded collection of many prices.
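To see why the weights matter, here is a minimal sketch of a Laspeyres-style index against a naive unweighted average of price changes. The basket items, price relatives and expenditure weights below are invented for illustration; they are not drawn from the Billion Prices Project or any official CPI.

```python
# Hypothetical basket: price relatives (current price / base price)
# and official expenditure weights (shares of household spending).
price_relatives = {"maize": 1.30, "rent": 1.05, "fuel": 1.20, "airtime": 0.95}
weights = {"maize": 0.40, "rent": 0.30, "fuel": 0.20, "airtime": 0.10}

# Weighted index: each price change counts in proportion to spending on it.
weighted_index = sum(price_relatives[item] * weights[item] for item in weights)

# Unweighted mean: an "ungrounded collection of prices" that treats
# airtime as if it mattered as much to households as maize.
unweighted_index = sum(price_relatives.values()) / len(price_relatives)

print(round(weighted_index, 3))    # 1.17  (17% inflation)
print(round(unweighted_index, 3))  # 1.125 (12.5% inflation)
```

The two figures diverge because the weights, which only an expenditure survey can supply, determine how much each price movement matters; scraping more prices does not remove the need for them.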
How do you overcome sample bias?
The common defense of Big Data is that when a dataset is big enough, measurement errors don't matter. That may be true, but sample bias still does. One celebrated Big Data application was the idea of using smartphones to detect potholes so that road maintenance departments could accurately devote resources to fixing them. But the approach hit a bump of its own when it became apparent that road departments would react more quickly to problems in neighborhoods with higher smartphone density. The same can happen in any situation where a simple sampling bias is aggravated and translated into a serious problem of political bias. The uncounted in official statistics are also uncounted in Big Data.
Our knowledge problem in development statistics is doubly biased. We know less about poor countries, and less about the poor people in those poorer countries. This knowledge problem is replicated in Big Data, and in fact may be worse. A traditional survey misses, by design, criminals, the homeless, refugees, nomads and the sick; if you do a survey on mobile phones, or passively capture the data exhaust from smartphones, you miss most of the population, and you lose the poorest part (most of these people live in rural areas).
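The mechanics of that double bias can be sketched in a few lines. The population sizes, incomes and phone-ownership rates below are invented purely to illustrate the argument: if the rural poor rarely own smartphones, a phone-based sample overstates average income no matter how many observations it collects.

```python
import random

random.seed(0)

# Hypothetical population: a rural-poor majority with low smartphone
# ownership, and a smaller urban group that is richer and better connected.
# All numbers are made up for illustration.
population = (
    [{"income": 1.0, "has_phone": random.random() < 0.10} for _ in range(8000)]
    + [{"income": 5.0, "has_phone": random.random() < 0.80} for _ in range(2000)]
)

true_mean = sum(p["income"] for p in population) / len(population)

# A "survey" that can only reach smartphone owners.
phone_sample = [p for p in population if p["has_phone"]]
sampled_mean = sum(p["income"] for p in phone_sample) / len(phone_sample)

print(f"true mean income:  {true_mean:.2f}")
print(f"phone-survey mean: {sampled_mean:.2f}")  # biased upward
```

The phone-based estimate lands roughly twice the true mean here, and collecting more observations from the same biased channel would not move it any closer.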
How do you get states and commercial operators to share data?
It is my impression that while we want all data to be open, some data are more open than others. The irony of this was well displayed when the African Development Bank paid McKinsey to tell them how much it would be worth to have open data. If McKinsey could have made those estimates public, the AfDB would not have had to pay them in the first place. Note as well that this evaluation covered the value of opening public data, not the opening up of commercial data.
Since the publication of my book Poor Numbers, I have frequently sat on panels with investment bankers, who operate on the one hand as salesmen of sovereign bonds issued on private markets for profit, and on the other as writers of briefs and reports on the rise of economies and their middle classes on the African continent. One line I hear often is: "We have access to other data". These data come at a premium and are only available to paying clients. The bigger the problem with official statistics, the bigger the premium on private data – and currently, datasets on African business follow the toothbrush approach to data management: no sharing.
Should we expect it to be any different when it comes to other Big Data? What would compel the big companies suddenly to share all their data with states, citizens and competitors? Big Data is being hoarded in the hope that someday it will have big value. And even if all the benchmark and bias problems did one day disappear, is it conceivable, or even advisable, for policy-makers and statistical officers to entrust data availability, validity and trends to commercial operators?
When Google Flu Trends data predicted the spread of flu faster and more accurately than official statistics from the Centers for Disease Control, the correct policy implication seemed to be to scrap the official statistics approach and go with the Big Data. But the algorithm was based on symptoms – online mentions of flu – rather than actual cases of flu. Then the next flu season came along, and Google Flu Trends massively overestimated its spread. And when we were wondering how fast, and to how many people, Ebola was spreading, researchers did turn to mobile phone companies for help; but the companies do not share their data, and we had to rely on conventional information systems operated through the public health system.
In the best of worlds, counting and accounting are related, and through this process we may end up with actual accountability. But there is no guarantee of this chain of events. If you separate who counts from who is responsible, then no one can be held accountable. There is one view of the world which holds that objective data and true evidence are out there, and that all we need to do is collect them and we will know everything. I do not subscribe to this view. Data are social products, and Big Data, just like official statistics, is a fingerprint of the place we live in. Our knowledge of the world is structured by current patterns of power and poverty – in both 'small' and 'big' data.
Morten Jerven is Associate Professor at the Simon Fraser University School for International Studies. His book Poor Numbers: How We Are Misled by African Development Statistics and What to Do about It is published by Cornell University Press.
I largely agree, and I otherwise support statisticians' approach when they scientifically analyse how Big Data can contribute to the production of official statistics.
My only point is that if a state considers big companies' data holdings strategically vital for the production of official statistics and their national use, then existing or new national laws can permit access, backed by principles 5 and 6 of the UN Fundamental Principles of Official Statistics. Note too that the issue of combining data held by different operators is never raised by Big Data aficionados – as if every operator would return the same answer to the same question.
Sincerely
Gérard Chenais