Covid-19: Analysing The Spread Across Populations

Written by AIRam | Published 2020/04/02

TLDR: The current coronavirus disease, Covid-19, has been called a once-in-a-century pandemic. As of April 1st we have 935,817 confirmed cases, 193,700 recovered and 47,208 dead. The data will never be perfect; the true number of cases is likely much larger, as testing frequency and effectiveness vary across regions. Relative to population, countries like Spain, Italy, France and Iran have been hit harder than other countries with bigger populations.


Overview
The current coronavirus disease, Covid-19, has been called a once-in-a-century pandemic. As of April 1st we have 935,817 confirmed cases, 193,700 recovered and 47,208 dead. The USA has claimed the undesirable distinction of most Coronavirus cases worldwide with 216,154 cases.
In many countries, including the US, people are only being tested if they have symptoms. That means a large portion of cases may go unreported if they are mild or asymptomatic.
The data will never be perfect; the true number of cases is likely much larger, as testing frequency and effectiveness vary across regions.
Every day in the media we see the number of people who have tested positive for Covid-19 and the number of fatalities. Looking at these shocking numbers, a question that comes to mind is how they relate to the overall population. Some of the variables include:
  • Percentage of the population affected
  • Tests performed relative to the population
  • Impact of population density
  • Percentage of the population tested, compared across countries
The source for the data I used for my analysis is here. I enriched this data with world population. Source for the population data is here.

Population affected by coronavirus

Here I only included countries with confirmed cases of more than 15,000 as of March 31.
Confirmed cases, Population and Affected population sorted by Total cases.
As you can see, as of now the USA has the largest number of Covid-19 cases, even though it is second in population after China in the chart, and its percentage of affected population is not as high as that of Italy, Spain or Switzerland.
Confirmed cases, Population and Affected population sorted by Population
Even though China is the country with the highest population, its number of cases is less than half that of the USA, and only a small percentage of its people were affected. The USA has close to ¼ of the population of China, but has 2.3 times the number of cases and roughly 10 times the percentage of people affected.
Confirmed cases, Population and Affected population sorted by Affected Population.
By percentage of population affected, Spain, Switzerland and Italy rank at the top.
The above chart is as of March 31.
Please note that, due to limitations in testing, this number excludes people who are affected but were not tested.
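The "affected population" column in the charts above is simple arithmetic: confirmed cases divided by population. Here is a minimal sketch of that calculation, using the USA figures quoted in this article (216,154 cases as of April 1 and a population of 331 million). The function names are my own for illustration, and the charts themselves use March 31 counts, so the exact values will differ slightly.

```python
def percent_affected(confirmed_cases: int, population: int) -> float:
    """Confirmed cases as a percentage of the total population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return confirmed_cases / population * 100


def cases_per_million(confirmed_cases: int, population: int) -> float:
    """Confirmed cases per one million people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return confirmed_cases / population * 1_000_000


# Figures quoted in this article (April 1 case count, rounded population).
usa_cases = 216_154
usa_population = 331_000_000

usa_pct = percent_affected(usa_cases, usa_population)
usa_per_million = cases_per_million(usa_cases, usa_population)

print(f"USA: {usa_pct:.4f}% of population affected")
print(f"USA: {usa_per_million:.0f} confirmed cases per million people")
```

Both numbers count confirmed cases only, so they inherit the testing limitation noted above.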
Comparing impact between the USA and other countries
One of the next questions to address is how the USA compares with other countries.
In the chart below, the last 3 columns show each country's numbers relative to the USA. For example, the third row, for Italy, shows that it has 0.56 times the number of cases in the USA and 0.18 times the US population, but about 3 times as many people affected per million population as the USA.
Cases, population and affected population in various countries as a ratio of the USA, sorted by affected population ratio
Even though countries such as Spain, Switzerland, Italy, Germany and France have a smaller population and fewer cases than the USA, their affected population ratio is a lot higher.
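The third ratio in that chart follows from the first two. Cases per million is cases divided by population, so a country's affected-population ratio against the USA is its case ratio divided by its population ratio. Below is a small sketch using the Italy row quoted above; the function name is hypothetical, and because the inputs are already rounded to two decimals the result only approximately matches the "3 times" in the chart.

```python
def affected_ratio(case_ratio: float, population_ratio: float) -> float:
    """Ratio of cases-per-capita versus the baseline country.

    (cases_a / pop_a) / (cases_b / pop_b)
        == (cases_a / cases_b) / (pop_a / pop_b)
    """
    if population_ratio <= 0:
        raise ValueError("population_ratio must be positive")
    return case_ratio / population_ratio


# Italy relative to the USA, as quoted in the article (rounded values).
italy_case_ratio = 0.56
italy_population_ratio = 0.18

italy_affected_ratio = affected_ratio(italy_case_ratio, italy_population_ratio)
print(f"Italy vs USA, affected per capita: {italy_affected_ratio:.1f}x")
```

This is also a handy sanity check: if the three columns in a row do not satisfy this relationship, one of them was computed from different data.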
Geographical view of these numbers
The following map shades each country by population.
The countries in dark blue are the ones one might expect to be hit hardest.
The same map is plotted below with total cases overlaid instead of population.
The dark blue regions in the map below are where the number of cases is rising fast.
Source: Google
As more testing happens, it will be interesting to see how the countries in light blue do over time. Relative to population, countries like Spain, Italy, France and Iran have been hit harder than other countries with bigger populations.

Impact of testing volume

Population may not be the only factor. How many people are tested per million is also important to consider. The more tests performed relative to the population, the more trust we can place in the numbers.
Let's take the USA, for example. As of March 31, the USA had performed 1,048,971 tests for a total population of 331 million, which is close to 3,169 tests for every 1 million people, with 18% testing positive.
South Korea, on the other hand, had performed 410,564 tests for a population of 51 million over the same period. That is close to 8,050 tests for every 1 million people, about 2.5 times the rate at which the USA is testing. South Korea has a 4.8% positive testing rate, compared with 18% in the USA.

Impact of Population density

Even though it appears Italy is affected the most, one important factor the analysis so far does not take into account is population density.
The more important factor here would be the population density in areas where there are a significant number of coronavirus cases. For example, New York City has a population density of about 10,000 people per square kilometer, while London has 4,542 people per square kilometer. As of March 31, there were 43,139 cases in New York City versus 25,150 cases in London.
On the other hand, in places in India like Dharavi in Mumbai, the population density is roughly 280,000 people per square kilometer. Population density and the effectiveness of quarantining will play a significantly larger role in how the virus spreads than the total population of the country.
Population density by itself is an interesting topic for further analysis. I will cover it in detail in a subsequent article.
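To put the figures above side by side, here is a rough sketch that expresses each area relative to another. It uses only the densities and case counts quoted in this section, and the variable names are my own. Two cities are far too few data points to establish a relationship between density and cases, so treat this as arithmetic on the quoted numbers, not as a model.

```python
# Densities (people per square kilometer) and case counts as quoted
# in the article, as of March 31. No case count is quoted for Dharavi.
areas = {
    "New York City": {"density_per_km2": 10_000, "cases": 43_139},
    "London": {"density_per_km2": 4_542, "cases": 25_150},
    "Dharavi": {"density_per_km2": 280_000, "cases": None},
}

nyc = areas["New York City"]
london = areas["London"]
dharavi = areas["Dharavi"]

# New York City relative to London.
density_ratio = nyc["density_per_km2"] / london["density_per_km2"]
case_ratio = nyc["cases"] / london["cases"]

# Dharavi relative to New York City (density only).
dharavi_vs_nyc = dharavi["density_per_km2"] / nyc["density_per_km2"]

print(f"NYC vs London: {density_ratio:.1f}x the density, {case_ratio:.1f}x the cases")
print(f"Dharavi vs NYC: {dharavi_vs_nyc:.0f}x the density")
```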

Written by AIRam | Product Management
Published by HackerNoon on 2020/04/02