COVID-19: We Need More Than Data, We Need Insights!

Written by federico | Published 2020/04/08
Tech Story Tags: data-science | data-analysis | hackernoon-top-story | covid-19 | living-in-lockdown | corona-data | south-korea | data-analytics


TL;DR We are managing the pandemic with only part of the data, and that part is not necessarily representative of reality. We must take a census of the positive and negative cases within a population. The officially reported positive cases are biased: they are cases that already show the disease in a more or less serious form. In the long term, aggressive testing (the South Korea model) is the only viable and sustainable strategy for managing coexistence between the virus and human beings until a vaccine is available.
What is the difference between a data set and a list of insights? It is roughly the difference between the ingredients of a recipe and the instructions for executing it. If you only read the list of ingredients, you are left to interpret the recipe subjectively; if you follow the instructions, the list of ingredients makes sense and you arrive at the final dish.
Another example. We are all athletes, and in Italy we are all "football coaches". Today I offer you two strikers for your favorite team. You can buy only one of them, and I give you some data: one has scored 25 goals, the other only 10. Which one do you buy? The problem is that we have data but lack insights. If I told you that the first scored 25 goals in 100 games while the second scored 10 goals in 15 games, we could calculate the average goals per game (0.25 against roughly 0.67) and get a precise insight into the performance of the two players. With the time dimension, we can also compare the figures on a common basis and generate more insights.
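The striker comparison can be written down in a few lines. This is only a minimal sketch of the arithmetic in the example; the function name `goals_per_game` is made up for illustration.

```python
def goals_per_game(goals: int, games: int) -> float:
    """Scoring rate: goals divided by games played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return goals / games


# Figures taken from the two-striker example in the text.
striker_a = goals_per_game(25, 100)
striker_b = goals_per_game(10, 15)

print(f"Striker A: {striker_a:.2f} goals per game")  # Striker A: 0.25 goals per game
print(f"Striker B: {striker_b:.2f} goals per game")  # Striker B: 0.67 goals per game
```

The raw totals favor the first striker; once both are put on the same per-game basis, the second one comes out ahead.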
The first day I looked at the Johns Hopkins University dashboard with my daughter, I realized that probably the only certain number was the number of COVID-19 deaths. Analyzing the dashboard in detail, I understood that it listed a lot of data but did not let us extract insights that would help us better understand the situation. This became very clear when Germany recorded its first cases and started reporting its numbers. My sister, who lives in Veneto (Italy), asked: "But are the data from Germany right, in your opinion?"
The main problem is that when we compare the data in the dashboard, we are in the same situation as with the two football players: we have data, but we completely lack insights. What are we missing?
Let's start with the first piece of data that lets us size the problem and understand something more: the population we are observing. In statistics, N denotes the whole population under observation, while n (lowercase) denotes the sample drawn from that population. We perform our statistical calculations on the sample and then infer the results for the whole population. This is inferential statistics, but let's not get lost in technicalities for the moment.
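For readers who do want one technicality, the step from a sample n to the population N can be sketched as follows. This is an illustration under stated assumptions, not a method used in this article: the sample is assumed to be random, the test is assumed to be error-free, the interval is the simple normal-approximation (Wald) interval, and the sample figures (10,000 people tested, 300 positive) are purely hypothetical.

```python
import math


def estimate_positives(N: int, n: int, positives_in_sample: int, z: float = 1.96):
    """Estimate the share of positives in a population of size N
    from a random sample of size n.

    Returns (p_hat, low, high, estimated_count), where low/high bound an
    approximate 95% interval (for z = 1.96) using the normal approximation.
    """
    if not 0 < n <= N:
        raise ValueError("sample size n must satisfy 0 < n <= N")
    if not 0 <= positives_in_sample <= n:
        raise ValueError("positives_in_sample must lie between 0 and n")
    p_hat = positives_in_sample / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    low = max(0.0, p_hat - z * std_err)
    high = min(1.0, p_hat + z * std_err)
    return p_hat, low, high, round(p_hat * N)


# Hypothetical sample: 10,000 randomly chosen people, 300 of them positive.
p_hat, low, high, estimated = estimate_positives(N=8_544_527, n=10_000, positives_in_sample=300)
print(f"Estimated share of positives: {p_hat:.1%} (interval {low:.2%} to {high:.2%})")
print(f"Estimated positives in the population: {estimated:,}")
```

The point of the sketch is only that a properly drawn sample turns a count into a statement about the whole population, with an explicit margin of uncertainty.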
So if we take Switzerland as an example, the population we want to observe is N = 8,544,527, according to the official statistics of the Confederation as of 2018. There are 3.8 million households, and in that year there were 87,851 births and 67,088 deaths.
To these data we must now add what we know about COVID-19, and here the first problems begin. Cases are recorded at cantonal level and then sent to the Confederation, so there is a certain transmission delay. To get the latest count I consulted the corona-data.ch site: at the time of writing it shows about 11,888 registered cases and 193 deaths.
Unfortunately, we are missing a fundamental number for creating insights: the number of tests carried out on the population. The only two certain numbers are N (8.5 million) and the 193 declared deaths. That gives a COVID-19 mortality rate over the entire population of about 0.002%, against an overall mortality rate of about 0.79% in 2018. Judged on these two numbers alone, COVID-19 is far from being the main cause of death in Switzerland at the moment.
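The two rates can be reproduced directly from the figures quoted above. This is a minimal sketch of the arithmetic only; the variable names are illustrative, and note that the COVID-19 count covers only the weeks since the outbreak began, while the 2018 figure covers a full year.

```python
# Figures quoted in the text (Swiss population and deaths in 2018, COVID-19 deaths so far).
population = 8_544_527
covid_deaths = 193
deaths_2018 = 67_088

covid_rate_pct = covid_deaths / population * 100
overall_rate_pct = deaths_2018 / population * 100

print(f"COVID-19 deaths over the whole population: {covid_rate_pct:.3f}%")  # 0.002%
print(f"All deaths in 2018 over the whole population: {overall_rate_pct:.1f}%")  # 0.8%
```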
But at this point the main problem arises. To create serious insights, we would need to know how many of the 8.5 million are positive to the virus at this time and how many are negative. Unfortunately, we do not know. The test is only done on people with symptoms, as stated in the announcement of the University Hospital of Zurich.
We need to consider these "missing tests" in the light of what Sergio Romagnani, professor of Clinical Immunology at the University of Florence, told the Italian newspaper La Repubblica on the basis of the study in Vo' Euganeo (Veneto), where the town's 3,000 inhabitants were tested: "The vast majority of people infected by Covid-19, between 50 and 75%, is completely asymptomatic but represents a formidable source of contagion".
As in the case of our two hypothetical players, we are missing a fundamental piece of information: the number of games played to score those goals. In our case, the number of infected cases in Switzerland. That is, how many of the 8.5 million are positive and how many are negative? With this information, we would understand exactly how high and dangerous the infection is... how many people are carriers of the virus, how many were positive and healed without being hospitalized, etc.
Consider also: "At Vo'," Professor Romagnani points out, "with the isolation of infected subjects, the total number of sick people fell from 88 to 7 (at least 10 times less) within 7-10 days. The isolation of the infected (symptomatic or not symptomatic) not only proved to be able to protect other people from the infection, but it also appeared to be able to protect from the serious evolution of the disease in the infected subjects because the healing rate in the infected patients, if isolated, was in 60% of the cases equal to only 8 days." Unfortunately, we lack this key insight.
A glimmer of light comes from the University Hospital of Zurich, where Adriano Aguzzi, head of the Institute of Neuropathology, is experimenting with a test that would make it possible to count virus-positive cases much faster. For the time being, we have to rely on empirical numbers without a solid statistical basis.
This gap needs to be filled immediately and with very high priority. Without this data, and without a vaccine that may arrive in perhaps 18 months, we cannot manage the situation rationally.
The same must be done in all the other countries. Without this data, for example, it is impossible to give a logical explanation for the enormous difference in positive cases and deaths between Germany and Italy.
What we need is a system that makes the following data available: population (N), total negatives, total positives, hospitalized, hospitalized in intensive care, deceased. With this set of reliable data, we could assess a situation that at the moment is very confused. Valuable insights could be extracted to manage the situation correctly, not only from a health point of view but also from an economic and social one.
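To make the list of fields concrete, here is a minimal sketch of such a record and of the ratios that become computable once negatives and positives are both known. The class name `EpidemicSnapshot` and its properties are hypothetical, and in the usage example only the population, the positives (11,888) and the deceased (193) come from this article; the negatives, hospitalized and intensive-care counts are placeholders invented to show the mechanics. The sketch also assumes that intensive-care patients are a subset of the hospitalized and that the deceased are counted among the positives.

```python
from dataclasses import dataclass
from typing import Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


@dataclass(frozen=True)
class EpidemicSnapshot:
    population: int        # N
    total_negatives: int
    total_positives: int
    hospitalized: int
    intensive_care: int    # assumed to be a subset of hospitalized
    deceased: int

    def __post_init__(self) -> None:
        counts = (self.population, self.total_negatives, self.total_positives,
                  self.hospitalized, self.intensive_care, self.deceased)
        if any(c < 0 for c in counts):
            raise ValueError("counts cannot be negative")
        if self.total_negatives + self.total_positives > self.population:
            raise ValueError("tested people cannot exceed the population")
        if self.intensive_care > self.hospitalized:
            raise ValueError("intensive care cannot exceed hospitalized")

    @property
    def tested(self) -> int:
        return self.total_negatives + self.total_positives

    @property
    def test_coverage(self) -> Optional[float]:
        """Share of the population whose status is known."""
        return _ratio(self.tested, self.population)

    @property
    def positivity_rate(self) -> Optional[float]:
        """Share of tested people who are positive."""
        return _ratio(self.total_positives, self.tested)

    @property
    def hospitalization_rate(self) -> Optional[float]:
        return _ratio(self.hospitalized, self.total_positives)

    @property
    def case_fatality_rate(self) -> Optional[float]:
        return _ratio(self.deceased, self.total_positives)


# Population, positives and deceased are the figures quoted in the article;
# the other three values are HYPOTHETICAL placeholders.
snapshot = EpidemicSnapshot(
    population=8_544_527,
    total_negatives=90_000,   # hypothetical
    total_positives=11_888,
    hospitalized=1_500,       # hypothetical
    intensive_care=300,       # hypothetical
    deceased=193,
)
print(f"Test coverage:      {snapshot.test_coverage:.2%}")
print(f"Positivity rate:    {snapshot.positivity_rate:.2%}")
print(f"Case fatality rate: {snapshot.case_fatality_rate:.2%}")
```

Without the number of negatives, `tested`, `test_coverage` and `positivity_rate` simply cannot be computed, which is exactly the gap described above.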
Let's go back to the known facts. The Swiss government announced important and drastic new social-distancing measures on Monday, 16 March 2020, to fight further propagation of this novel virus. The spread of SARS-CoV-2 must be slowed dramatically and immediately. The Italian situation demonstrates how quickly the healthcare system can be overwhelmed. The number of deaths from COVID-19 in Italy (8,215 as of 27 March 2020) has already surpassed that of the whole of China (3,291), and unfortunately the count in Spain (4,365) is growing faster.
Basic epidemiological models of the spread of the SARS-CoV-2 virus suggest that owing to its contagiousness and the lack of immunity in the population, 40-70% of the population could become infected unless strong measures are taken. Data from China and Italy indicate that a sizable fraction (5-10%) of the symptomatic cases will need hospitalization. The overall fraction of SARS-CoV-2 infections that cause serious illness or death is still uncertain, but mortality from COVID-19 increases with age and exceeds that from seasonal influenza. (Source Swiss Medical Weekly - https://smw.ch/)
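To get a feeling for what these percentages would mean in absolute numbers, one can apply them to the Swiss population used earlier. This is only a back-of-the-envelope sketch: the 40-70% and 5-10% ranges are the ones quoted above, while the asymptomatic share of 50-75% is borrowed from Professor Romagnani's statement, and combining figures from different sources in this way is an assumption made here purely for illustration. The function name `scenario_range` is hypothetical.

```python
def scenario_range(population, infected_share, asymptomatic_share, hospitalization_share):
    """Turn (low, high) percentage ranges into (low, high) ranges of people.

    Each *_share argument is a (low, high) tuple of fractions between 0 and 1.
    hospitalization_share applies to symptomatic cases only.
    """
    infected = (population * infected_share[0], population * infected_share[1])
    # Fewest symptomatic cases when the asymptomatic share is at its highest.
    symptomatic = (
        infected[0] * (1 - asymptomatic_share[1]),
        infected[1] * (1 - asymptomatic_share[0]),
    )
    hospitalized = (
        symptomatic[0] * hospitalization_share[0],
        symptomatic[1] * hospitalization_share[1],
    )
    return {"infected": infected, "symptomatic": symptomatic, "hospitalized": hospitalized}


result = scenario_range(
    population=8_544_527,
    infected_share=(0.40, 0.70),         # range quoted from Swiss Medical Weekly
    asymptomatic_share=(0.50, 0.75),     # range quoted from Romagnani (assumed to apply here)
    hospitalization_share=(0.05, 0.10),  # share of symptomatic cases needing hospitalization
)
for name, (low, high) in result.items():
    print(f"{name:>12}: {low:,.0f} to {high:,.0f}")
```

The width of the resulting ranges is itself the message: with inputs this uncertain, the output spans almost an order of magnitude, which is why measured data on positives and negatives matters.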
Until an efficacious and safe vaccine becomes available - with even the most optimistic estimates putting this at 9 to 18 months - the only way to prevent the above scenario is to control the spread of SARS-CoV-2. While strict social distancing measures are necessary, nobody can imagine such measures being enforceable for extended periods.
In the absence of pharmaceutical measures, the only way to return more quickly to normal life is to keep the spread of the virus under control by preventing transmission, and with active, forceful and rapid extinction of local outbreaks. A liberal strategy for testing, contact tracing and subsequent self-isolation of individuals who test positive for SARS-CoV-2, and precautionary self-isolation of close contacts, is critical to achieving this goal. (Aggressive Testing Strategy)
We need a robust dataset to manage the situation by making decisions based on insights, not just a list of messy data. What do we need?
We have the total population (N), the total number of deceased, the total number of hospitalized cases, and the number hospitalized in intensive care. These are the numbers we can consider reliable enough. What we urgently need is the number of infected, so that we can distinguish positives from negatives. This would give us a reliable number of infected, the split between infected and not infected, and the exact number of infected who are hospitalized and in intensive care.
To all these data we can then apply the temporal and geographical dimensions, if available, to create both statistical models and machine learning models that support decisions rationally, backed by the numbers. The idea is to be able to manage the situation as in Vo' Euganeo in Italy where, thanks to aggressive testing, it was possible to reduce the number of infected people to zero in 2 weeks.
I will never tire of repeating it: we must implement a strategy of aggressive testing, otherwise the long-term situation, in terms of health and society but especially the economy, will no longer be sustainable.

Published by HackerNoon on 2020/04/08