Fake News and Its Detection Methods, from Psychology to Machine Learning: Part 1

Written by parsayousefi | Published 2018/10/03
Tech Story Tags: machine-learning | deep-learning | fake-news | deepfakes | filestack


Part 1

Bio of the Author

Parsa Yousefi is a Machine Learning Engineer at Filestack, where his main role is building and implementing deep learning microservices, APIs, and software as they relate to technology such as optical character recognition and other machine learning services. He is a PhD Candidate currently pursuing his degree in Electrical and Computer Engineering at the University of Texas at San Antonio. His main fields of interest are Computer Vision and Natural Language Processing.

Introduction

Social media has reshaped society over the last decade by giving everyone a free venue to share their thoughts, ideas, and news. As a negative side effect, these platforms have also been used to propagate low-quality, content-poor, and even outright “Fake” news. The spread of Fake news has serious effects on individuals and societies, such as eroding trust in all sources of news and making readers defensive toward most news channels. This is why the detection of Fake news has recently become one of the top research trends. According to one of the latest works in this area, detecting Fake news in social media involves distinctive features that cannot be handled by traditional approaches to verifying truth. In this blog post, we present a comprehensive analysis of Fake news, covering both its characteristics and detection scenarios. In the first section, we examine its characteristics: definitions, basic concepts, and features; in the second part, we discuss detection approaches together with their corresponding feature extraction and modeling procedures. Figure 1 shows the general map of Fake news analysis in this blog post.

Figure 1: General map of Fake news analysis

Characteristics of Fake News

This section covers the characteristics of Fake news in both traditional news media and modern social media. To keep this blog clear for everyone, let us first define Fake news:

Fake News, also known as “yellow journalism”, consists of articles that are deliberately fabricated or altered so that they convey false information to recipients for a specific purpose.

If we divide the history of news, there are two periods: before social media (referred to here as “Traditional Media”) and after social media. First we discuss the characteristics and features of Fake news in both periods, and then we turn to detection methods.

Characteristics of Fake News in Traditional News Media

Before the advent of social media, the characteristics of Fake news centered on psychological and social factors. When a person hears a piece of news, he or she may not be able to tell whether it is real. People naturally believe that their own knowledge of the facts is the most accurate, and they tend to be biased toward it. This makes them ideal targets for yellow journalists who want to plant that first impression before other opinions can form. Psychology studies have shown that correcting misperceived information takes longer than conveying accurate information in the first place, and in most cases the correction pushes people into a defensive mode because of biases whose origins they may not even recognize.

Another feature of Fake news in traditional media is social acceptance. One of the levers available to yellow journalists is the effect of repetition within a society. When a person hears the same claim from different sources, such as several media channels or several people, they become more likely to believe that what they have heard is real. Repetition acts like a positive feedback loop in people’s minds, encouraging them to accept the incoming information.

Characteristics of Fake News in Social Media

In this section we discuss the effect of social media on Fake news and its new characteristics and features. The most important tools for spreading Fake news in social media are “Malicious Accounts” and the “Echo Chamber Effect”. In the early years of social networks, their creators could not have imagined that their products would become one of the most powerful tools for yellow journalists to spread false information. Building on the lessons of Fake news in traditional news media, these actors realized that by creating multiple accounts in social networks they could not only accelerate its proliferation around the world, but also exploit its psychological and social effects to make it stick in people’s minds.

The second characteristic of social media that plays a central role in multiplying Fake information is the “Echo Chamber Effect”. This effect arises from features specific to social networks, such as following famous people and celebrities. Most celebrities have accounts on nearly every social network, and millions of people follow them; through this following mechanism, followers are notified about their recent updates and everything they publish. In a natural chain, the famous people (also known as “influencers”) follow each other as well. This pyramid of followers and followees therefore plays an unavoidable role in spreading Fake news from influencers to their followers, who are ordinary people susceptible to the same psychological factors as anyone else.

Detecting Fake News

In the first part of this blog post we discussed the characteristics and properties of Fake News in both traditional media and modern social media. Now it is time to define the detection problem as a function. A news article can be mapped to a binary string that indicates whether any part of it is Fake. For example, consider the following news:

“The United States of America has 51 states.”

It is obvious that this information is wrong. So how can we establish the truth of this statement?

There is a function that maps this sentence to a sequence of binary values 0 and 1. For each semantic part of the news we define a binary Fake label (0 for a true part, 1 for a Fake part) and, from these labels, a probability of Fakeness for the whole article, namely the fraction of parts labeled 1.

Under these definitions, the sentence above maps to a two-element sequence: the entity “The United States of America” is real (label 0), while the claim “has 51 states” is false (label 1).

The sentence therefore consists of two semantic parts, one true and one false, which means the probability of Fake news in the provided statement is 50%.

Such mapping functions are used as training targets when Machine Learning models and Deep Neural Networks learn to estimate the rate of truth versus Fake.
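As a minimal illustration of this idea (not the original post’s exact formulation), the sketch below splits the example article into hypothetical semantic parts, assigns each a binary Fake label, and computes the overall Fake probability as the mean of those labels; in practice the labels would come from a trained model rather than being hard-coded.

```python
# A minimal sketch of the binary Fake-mapping idea.
# The semantic split and the labels here are hand-written for illustration;
# a real system would predict each label with a trained classifier.

def fake_probability(labels):
    """Given binary labels (0 = true part, 1 = Fake part), return the Fake probability."""
    return sum(labels) / len(labels) if labels else 0.0

# Hypothetical semantic parts of the example sentence and their labels.
parts = ["The United States of America", "has 51 states"]
labels = [0, 1]  # the entity is real, the claim is false

print(fake_probability(labels))  # -> 0.5, i.e. the article is 50% Fake
```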

Feature Extraction

The features of a news piece generally fall into two categories: “Linguistic-based” features, such as the news source, headline, and body text, and “Visual-based” features, such as images and videos, which are the main subject of this blog post. As Figure 2 and Figure 3 illustrate, a color image is a matrix of pixels in which each value represents the intensity of the corresponding color channel (values from 0–255).

Figure 2: A basic representation of color images

Figure 3: Intensity-based mapping of images for each channel
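To make this concrete, here is a small sketch (using NumPy, an assumed tool rather than anything prescribed by the original post) showing that a color image is simply a height × width × 3 array of per-channel intensities in the range 0–255:

```python
import numpy as np

# A toy 2x2 color image: shape (height, width, 3), one 0-255 intensity per channel.
image = np.array([
    [[255,   0,   0], [  0, 255,   0]],   # red pixel, green pixel
    [[  0,   0, 255], [128, 128, 128]],   # blue pixel, gray pixel
], dtype=np.uint8)

red, green, blue = image[..., 0], image[..., 1], image[..., 2]
print(image.shape)   # (2, 2, 3)
print(red)           # the intensity matrix of the red channel alone
```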

The features in images and video frames are the main content that can reveal whether a Fake part is present. In computer-vision-based Machine Learning algorithms, image features are produced by convolutional and pooling layers. Figure 4 shows examples of feature maps computed from two image samples using convolutional layers.

Figure 4: Feature maps of two different images using convolutional layers
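The following sketch shows, under simplified assumptions (a single channel, one hand-picked 3×3 edge-detection kernel, no padding or pooling), how a convolution turns an image into one such feature map; in a real network the kernels are learned from data rather than fixed:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    feature_map = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

# A toy grayscale image with a vertical edge (dark left half, bright right half).
image = np.zeros((6, 6))
image[:, 3:] = 255.0

# A hand-picked vertical-edge kernel; a CNN would learn such kernels during training.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

print(conv2d_valid(image, kernel))  # strong responses where the edge lies
```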

In Part II, within this category of detecting Fake information in images and video frames, we will concentrate on such feature maps to see whether any artificially inserted layers are present.

Modeling and Evaluation

When modeling Fake-news detection as either Supervised Learning (i.e., Classification) or Unsupervised Learning (i.e., Clustering), we use the following definitions:

  • True Positive (TP): the news part was Fake, and it was predicted Fake
  • True Negative (TN): the news part was real, and it was predicted non-Fake
  • False Positive (FP): the news part was real, and it was predicted Fake
  • False Negative (FN): the news part was Fake, and it was predicted real

Following the above-mentioned definitions, the main validation metrics for evaluating ML models are as follows:
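These counts yield the standard metrics of accuracy, precision, recall, and the F1 score. Below is a minimal sketch assuming those conventional definitions (the original post did not spell out its exact formulas):

```python
def evaluate(tp, tn, fp, fn):
    """Standard classification metrics computed from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall    = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example: 40 Fake parts caught, 45 real parts kept, 5 false alarms, 10 misses.
print(evaluate(tp=40, tn=45, fp=5, fn=10))
```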

Conclusion

In this blog post we provided an overview of Fake news: how it is created and which characteristics of human psychology it exploits. We first described the properties of Fake news in traditional news media, then the new dynamics introduced by social media, and finally we turned our focus to Fake news in images and videos and how it can be evaluated automatically.

Part 2

The second part of this two-part blog series will focus on one of the top trends at the intersection of Deep Learning and Fake news, “DeepFake”: a comprehensive discussion of Fake content in images and videos using Machine Learning approaches and Deep Neural Networks. Two main solutions will be presented: the first uses “Generative Adversarial Networks” for generating and detecting fake images, and the second uses “DeConvolutional Neural Networks” to regenerate an image and detect the fake layers inside it.
