Democracy and data following Covid-19

Updated: Apr 28

The quick spread of Covid-19 and the subsequent confinement regulations have shifted economic activity at historically quick speeds. Our best indication is that unemployment across the developed world has likely increased by double digits in a matter of weeks, and we are likely to have had one of the sharpest declines in GDP in history. Luckily, we live in a time when information collection over the internet is both abundant and quick, as businesses have improved their analytical capabilities dramatically in the past decade. However, the level of uncertainty in the economy is still extremely high because so much of this data is held privately.

Both Mastercard and Visa sell data analytics services that provide up-to-date GDP projections based on their own privately collected customer data. This data is not available to government, international, and independent policy organizations for use in their publicly published reports. Nor do most academic institutions have subscriptions to these services. Google famously hired Hal Varian, one of the most prolific microeconomists currently practicing, back in 2002 exclusively to focus on their privately collected data.

The Federal Reserve has recently teamed up with ADP, a payroll processing company, to augment its understanding of unemployment figures, and some academic economists have gained access to individual datasets on a case-by-case basis (see here for another example using data from BBVA bank). These exceptions only highlight the wider trend of private data proliferation and hoarding, while publicly available data continues to rely on costly, labor-intensive, and slow surveys. Next week, we will receive US consumption data for March, 4 weeks after the end of the period concerned. Unemployment comes to us a bit faster: we have the survey data in the US on the first Friday after the end of the month concerned. GDP has a longer delay: we don’t see preliminary estimates for January until the end of April for most countries (this is partly due to the fact that it is reported quarterly).

There is a wide variety of pressing policymaking topics where high-frequency, user-sourced data could really help. For one, understanding regional production and consumption declines from the Covid-19 confinement could help target government response at a geographical level. I have also written in earlier posts on how the speed of the recovery is likely to depend on bankruptcy rates and continued disruptions to supply chains. Real-time monitoring of these trends by industry and location could allow for much more nuanced and targeted policy in this respect. Real-time aggregate price and sales quantity data on key consumption goods could allow policymakers to understand where particular items are in short supply. Much of the current data infrastructure in the United States dates back to the command economy of the Second World War, and a comparable effort to improve our data infrastructure would be just as useful now that the government is taking a more active role during this crisis.

The private hoarding of data should be seen as a market failure, requiring policy intervention to rectify the problem. There is a series of economic problems with allowing private corporations exclusive access to private data:

1) Information asymmetries: this is a huge issue in economics, and it is the reason why insider trading can lead to jail time for millionaires and why your car loses significant value as soon as you drive it off the lot. If a seller of a product knows more about the product or market than the buyer, then the entire market mechanism is undermined (lack of information asymmetry is actually one of the conditions behind the fundamental theorems stating when markets are efficient). If the big banks were able to use their customers’ consumption patterns to guide the investments of their hedge funds, would other investors ever want to buy a stock that they had decided to sell?[1]

2) Natural monopolies: since the benefit of an aggregated dataset is only felt if the sample size is large enough, it gives a distinct advantage to companies that command a large number of customers. This creates network externalities, or value beyond the market price of a transaction due to the fact that other people also use the same supplier. Meanwhile, the marginal cost of collecting data on one extra customer is negligible. These effects create economies of scale, or economic conditions naturally favoring larger companies. All of this creates a natural landscape for monopolies, decreasing the bargaining power of the individual customer in the long term.

3) Social benefits: economists who study technology development love to make a big deal over the fact that information is non-rival, meaning many people can use it at once without harming its value. If economic policymakers were to have access to the data used by my credit card company for its marketing purposes, their use of it would not inhibit the marketing managers at my credit card company from using the data to improve their products. Within the class of non-rival goods, there is a further distinction of whether those goods are excludable, or whether owners can make others pay for access. Non-excludable, non-rival goods are the definition of public goods, whereas excludable, non-rival goods are called club goods. Thus, this data is something that naturally could be a public good if we allowed it to be, allowing everyone to use this information to make better policies, better advertising tools, better academic papers, and better public understanding of social issues. Alternatively, it could be used by a few people to make their friends rich.

4) Moral hazard: when it comes to devising data regulation policy, this is the issue that I think is overlooked the most. In economics, moral hazard exists when there is an opportunity for an agent to act secretively. Under these conditions, economists believe that we have to just assume people will act selfishly, often to the detriment of other people. When data-gathering happens within the confines of a private organization, where the information gathered is itself hidden from view to give it value, it is secretive by nature. And the scope for hurting other people with data is enormous. As a result, all the legal limitations imposed by the government have the fundamental issue of enforcement. If we can’t see whether someone is doing something illegal, then how effective can the law really be?

5) Income inequality: this is really just an accumulation of the effects above. But together, these effects have a profound impact on our democratic institutions. In an economy in which information is a key source of growth, concentrated in a few monopolies, not monitored properly for exploitation, and kept from socially beneficial uses, the overall result is one where privileged people leverage their wealth and power to hoard and exploit data.

The frustrating thing about data privacy policy is that there is an easy and simple solution: require all firms engaged in aggregate data collection and analytics to provide aggregated and anonymized data publicly for free. If they collect it at an aggregated level, then they should have to share it. That would bring far more transparency to what is now an extremely opaque practice, with little regulatory oversight needed. People would be able to see for themselves what data was being collected. If a business is able to identify people as ethnic minorities or LGBT for statistical purposes, then it would be there for everyone to see, and we would see the purposes for which they use that information, as well. You could call it the social cost of keeping people’s personal data.
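To make the idea of an aggregated and anonymized release a bit more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the record layout (customer, region, week, amount), the function name `aggregate_for_release`, and the rule of dropping any cell with fewer than five customers. Small-cell suppression is one common way statistical releases reduce re-identification risk; it is not a complete anonymization scheme, and a real requirement would need something more careful.

```python
from collections import defaultdict

# Assumed disclosure threshold (hypothetical): cells built from fewer
# customers than this are withheld from the public release.
MIN_CUSTOMERS = 5


def aggregate_for_release(transactions, min_customers=MIN_CUSTOMERS):
    """Aggregate (customer_id, region, week, amount) records by region and week.

    Returns a dict mapping (region, week) to the number of distinct customers
    and their total spend, keeping only cells with at least `min_customers`
    distinct customers.
    """
    totals = defaultdict(float)
    customers = defaultdict(set)
    for customer_id, region, week, amount in transactions:
        key = (region, week)
        totals[key] += amount
        customers[key].add(customer_id)

    released = {}
    for key, total in totals.items():
        n_customers = len(customers[key])
        if n_customers >= min_customers:
            released[key] = {
                "customers": n_customers,
                "total_spend": round(total, 2),
            }
    return released


# Made-up records, purely for illustration.
sample_transactions = [
    ("c1", "North", "2020-W14", 20.00),
    ("c2", "North", "2020-W14", 35.50),
    ("c3", "North", "2020-W14", 12.25),
    ("c4", "North", "2020-W14", 40.00),
    ("c5", "North", "2020-W14", 8.00),
    ("c1", "North", "2020-W14", 4.25),
    ("c6", "South", "2020-W14", 50.00),
    ("c7", "South", "2020-W14", 60.00),
]

released = aggregate_for_release(sample_transactions)
for cell, stats in sorted(released.items()):
    print(cell, stats)
```

In this toy example the North cell is published because five distinct customers contribute to it, while the South cell is withheld because only two do. No customer identifiers appear in the output, only regional weekly totals of the kind that policymakers could use.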

The current dialogue on data privacy misses this issue completely. For example, the OECD guidelines on data privacy include an “openness principle,” in which they claim data privacy laws should require “establishing the existence and nature of personal data, and the main purposes of their use, as well as the identity and usual residence of the data controller.” This simply says that companies should be open about what data they have and where they put it. There is no suggestion that the data itself should be shared for any public benefit, nor is there any indication that they believe that there is any social responsibility for people collecting the data.

Critics will argue that such a requirement would eliminate the value of collecting such data, depleting the overall supply. They would also claim that allowing access to that data would be cumbersome and expensive. This latter point is probably not a big issue, as most businesses would have very little demand for their data, and the large companies that would have the bulk of requests already have large compliance teams for other types of regulatory burdens. I see no reason why data collection should have a lower priority than other compliance issues, and the higher costs associated with size would serve as a natural counterbalance to the economies of scale of data gathering mentioned above. As I was writing this post, The Economist published an article detailing how Microsoft is trying to push for data-openness, creating data-sharing groups and sharing some of its own data. The article cites Microsoft's president's claim that fewer than 100 firms collect more than half of the data generated online, and his further claim that greater openness would "counteract the concentration of economic--and political--power."

As for the incentives to collect the data in the first place, the answer is simple: most of these companies already claim that their value comes not exclusively from access to data (as they realize that this would sound like extortion), but from their brilliant market insights based on models and industry experience. They could still charge for their models and industry insight, but the underlying data would have to be publicly available. Make them compete on a level playing field, and see whether their models are really that good. If anything, this would bring competition into the consulting sphere, a benefit to anyone arguing that the free market should be allowed to operate unperturbed.

[1] This specific example is illegal in the US and Europe, but it illustrates a fundamental issue of big data and informational asymmetries in markets.