How Data Analysis Helps Unveil the Truth of Coronavirus

Written by xyng17 | Published 2020/02/09

TLDR The public needs a fair-minded understanding of this outbreak, backed by transparent data sources. The goal of this article is to collect data from primary sources and make the data reliable and transparent. The death rate in Hubei province so far is 2.7%, compared to 0.19% elsewhere in China. The data indicates that the number of infected cases is increasing without any sign of slowing down. The number of suspected infections is declining steadily, indicating that the pool of suspected infections has been shrinking as suspected cases turn into confirmed cases.

These days we are all scared of the new airborne, contagious coronavirus (2019-nCoV). Even a slight cough or a low fever might seem like a sign of something lethal. But what is the truth?
On January 28th, someone posted a tweet falsely claiming that a coronavirus case had been confirmed near the Lorenzo housing complex at USC, a densely populated area with many Chinese international students. Then another tweet came along, claiming that the poster’s friend’s roommate’s brother’s girlfriend had also been infected. People retweeted and panicked. Later, the University clarified that this was a mix-up and that no coronavirus cases had been suspected or confirmed.
This is how rumors spread rapidly and eventually become “fact”: people do not know the truth. I thought it necessary to collect data from both official and unofficial sources and to stay impartial. More importantly, the public needs a fair-minded understanding of this outbreak, backed by transparent data sources.
The goal of this article is to collect data from primary sources and make the data reliable and transparent. As we collect more accurate information, it will help the public discover the facts and restrain extreme opinions.

Collect Data From Primary Sources

I used a web scraping tool to avoid having to build a separate scraper for each website. There are many options, but I found Octoparse to be the best. They recently created a scraping “recipe” to extract live data from the China Healthcare Department’s database. This is much easier because I don’t need to configure the task myself, as most scraping tools require, which makes the data more accessible to everyone.

Data Analysis

I collected data ranging from January 22nd to February 4th. The data indicates that the number of infected cases is increasing without any sign of slowing down. Yet the number of suspected infections is declining steadily, which indicates that the pool of suspected infections has been shrinking as suspected cases turn into confirmed cases.
However, some people noticed that the death counts look a little odd. I pulled the numbers and did a little research. According to the data, the death rate in Hubei province so far is 2.7%, compared to 0.19% elsewhere in China. That means the death rate in Hubei is roughly 14 times that of the rest of the country.
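To make the comparison concrete, here is a minimal Python sketch of the arithmetic behind it. It uses only the two death rates quoted above; the function name `rate_ratio` is made up for illustration and is not part of any tool used in this article. Dividing 2.7 by 0.19 gives a factor of a little over 14.

```python
# Death rates quoted in this article, in percent of confirmed cases.
HUBEI_DEATH_RATE_PCT = 2.7
REST_OF_CHINA_DEATH_RATE_PCT = 0.19


def rate_ratio(numerator_pct: float, denominator_pct: float) -> float:
    """Return how many times larger one rate is than another.

    Hypothetical helper for illustration only.
    """
    if denominator_pct <= 0:
        raise ValueError("denominator rate must be positive")
    return numerator_pct / denominator_pct


ratio = rate_ratio(HUBEI_DEATH_RATE_PCT, REST_OF_CHINA_DEATH_RATE_PCT)
print(f"Hubei's death rate is about {ratio:.1f} times the rate elsewhere in China")
```

Because both inputs are already rounded percentages, the ratio should be read as an order-of-magnitude comparison, not a precise figure.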
In this case, we consider two assumptions that might explain such a large discrepancy:
Assumption 1: The government has under-reported the actual number of infected cases.
Disproof: If the statement were true, the actual number of infected would be the death toll divided by 0.19%, which equals 288,947. That result conflicts with the R0 (the basic reproduction number of an infection), the metric used to measure how contagious a virus is likely to be. Most studies [MacIntyre, 2020] show that the R0 in this outbreak is between 2 and 2.5, a little higher than that of seasonal flu. As a result, it is unlikely that the coronavirus is contagious enough to infect around 300,000 people in less than a month.
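The arithmetic in this disproof can be checked with a short Python sketch. This article gives the 0.19% rate and the 288,947 result but not the death toll itself, so the sketch back-computes the toll those two numbers imply (roughly 549); treat that figure as an inference, not a reported statistic. The function name `implied_infections` is hypothetical.

```python
# Figures taken from the text of this article.
IMPLIED_INFECTIONS = 288_947            # result quoted under Assumption 1
REST_OF_CHINA_DEATH_RATE = 0.19 / 100   # 0.19% expressed as a fraction


def implied_infections(deaths: float, death_rate: float) -> float:
    """Infections needed to produce `deaths` at the given death rate.

    Hypothetical helper for illustration only.
    """
    if death_rate <= 0:
        raise ValueError("death_rate must be positive")
    return deaths / death_rate


# The death toll is not stated in the text, so it is back-computed here
# from the two quoted numbers (an inference, not a reported statistic).
implied_deaths = round(IMPLIED_INFECTIONS * REST_OF_CHINA_DEATH_RATE)
print(f"Implied death toll: {implied_deaths}")

# Running the calculation forward again reproduces the quoted figure.
infections = implied_infections(implied_deaths, REST_OF_CHINA_DEATH_RATE)
print(f"Implied infections at 0.19%: {infections:,.0f}")
```

The round trip only confirms that the quoted numbers are internally consistent; it says nothing about whether 0.19% is the right rate to apply to Hubei.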
So what other factors would cause such a high fatality rate?
Assumption 2: People can’t be cured because of a shortage of healthcare resources.
This statement makes more sense. The shortage of medical supplies, hospital beds, and staff leaves more people with no choice but to self-quarantine at home. Improper self-medication exacerbates the illness. In addition, the coronavirus can be fatal to seniors with pre-existing health issues, all the more so when they lack proper and timely treatment.
That being said, the coronavirus isn’t as lethal as it appears to the general public in the U.S. Because the United States has more healthcare resources available than most other countries, we really should not be terrified of a disease that is a Pacific Ocean away. Furthermore, the U.S. government has already banned foreign nationals who have traveled in China in the past 14 days from entering the country (except immediate relatives of citizens and permanent residents). Meanwhile, the ongoing seasonal flu, which has caused 19 million illnesses and 10,000 deaths, is more worrisome than the new coronavirus.

News Reports Collection

Using the scraping tool, I also collected news reports since the outbreak from dozens of media channels. Just in case you haven’t used a scraping tool, this video may help create advanced scraping tasks.
I scraped articles from the Wall Street Journal, the New York Times, and Reuters using the search term “coronavirus” in order to compare coverage across a few news outlets.
Many news articles put a lot of emphasis on the severity of the outbreak and downplay other metrics such as suspected infections and recovery numbers. Such incomplete narratives create a false impression of not only the Chinese government but also the disease itself. As a result, we become paranoid when others cough or catch a cold, or even when we shake hands with colleagues of other races.
I came across a WSJ article by Walter Russell Mead entitled “China Is the Real Sick Man of Asia.” Setting aside the extremely xenophobic title, the article contains disinformation in dozens of places. He wrote, “We do not know how dangerous the new coronavirus will be. There are signs that Chinese authorities are still trying to conceal the true scale of the problem.” By the article’s publication date, the WHO had already reported that the R0 was around 2 and that the fatality rate was less than 3 percent, which is close to a seasonal flu. In addition, there is no evidence that the Chinese government tried to hide anything. In fact, the data I got from the open database on the Chinese government website was consistent with the data from the WHO, CDC, ECDC, NHC, and DXY. Some factors could affect the accuracy, but any resulting error should be small enough that it does not justify the doubts raised by mainstream media worldwide.
Mead also posted a video entitled “A Communist Coronavirus,” which presented the cheering of the public in the video (“Wuhan, Jia You!”) as an SOS signal of “a total lockdown in Wuhan, Hubei, China.” “Communist” is a term for a political party. Mead used it as an adjective to describe an illness, which implies a sense of ownership. Moreover, the Chinese people in the video were shouting “Hang in there, Wuhan!” Yet the narrative made it seem as if people were desperately yelling for help because of the lockdown.
This reminds me of a tweet posted on January 31st, in which an Asian woman said a patient had joked about shaking her hand because of the coronavirus. This was not the only joke spread on Twitter. While thousands of people are living in fear, the coronavirus outbreak has become entertainment used to discriminate against certain groups of people.
Just as the HIV panic of the 1980s led to the criminalization of LGBTQ people, infectious disease is once again being used by the general public to justify prejudice. Doesn’t that make you furious?
I would like to quote Frank Shyong, a columnist at the Los Angeles Times: “Our willingness to understand each other is what protects us from fear and its disastrous consequences.” Let’s not exaggerate the threat of the disease, nor fuel the racism that is already far too prevalent in this country. It is our responsibility to learn the facts and not to spread xenophobic comments.

Source


Published by HackerNoon on 2020/02/09