How To Predict Election Results using Twitter

Elections play a crucial role in all democracies, and social media is an important part of the process. Political parties increasingly rely on social media platforms like Twitter and Facebook for political communication. The use of social media in political marketing campaigns has grown dramatically over the past few years. It is also expected to become even more critical to future political campaigns, as it creates two-way communication and engagement that fosters candidates' relationships with their supporters.

If you are interested in deep learning, check out my other articles on deep learning for artistic applications: AI for artists: Part 1, Part 2, and GANs: Art of Creating Fakes.

Research papers have proposed different methods for predicting elections, such as volumetric analysis, sentiment analysis, and social network analysis. In one volumetric analysis, the number of votes was taken as the dependent variable, and the number of Facebook connections, along with other factors, as the independent variables. The results showed that the size of the network and the chances of a win are significantly correlated. Hence, the authors reported that election results could be inferred from the size of the social network.

Social network techniques, i.e. volumetric analysis and sentiment analysis, have also been used by authors to evaluate the predictive power of Twitter data for inferring electoral results in three countries: Pakistan, India, and Malaysia. Preprocessing was performed on approximately 3.4 million tweets collected using the Twitter Streaming API. The Natural Language Toolkit (NLTK) for Python was used to separate out the English-language tweets. The results showed that Twitter data was not effective for making election predictions for Malaysia, but for Pakistan and India it appeared to be effective and efficient. By combining multiple techniques, the proposed model for predicting electoral outcomes was also effective for candidates and parties with small vote counts.

In this article, we will discuss how to use a method called sentiment analysis to predict elections using data acquired from Twitter.

Sentiment Analysis

Sentiment analysis is the process of determining the emotion underlying a piece of text. It helps us understand the attitudes, opinions, and emotions expressed within an online mention. In data science, sentiment analysis has been used predominantly to analyze customer feedback on products and reviews. It can be used to understand user ratings of different kinds of products and of hospitality services such as travel and hotel bookings. The Obama administration used sentiment analysis to gauge public opinion on policy announcements and campaign messages ahead of the 2012 presidential election.

We will be discussing mechanisms for labeling tweets, and classifying and summarizing them from different viewpoints. We can use Python / R for the process.

Let’s start!

We start by getting the Twitter data, which requires authenticating with the Twitter API. We need to get access from developer.twitter.com and then create an app. This generates a set of keys that can be used for authentication.
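As a rough illustration of the authentication step, here is a minimal sketch. It assumes the tweepy library (version 4 or later), which this article does not name, and it reads the four keys from environment variables whose names are made up for this example.

```python
import os

# Hypothetical environment variable names; use whatever names you prefer.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four keys as a tuple, or raise KeyError listing the missing ones."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return tuple(env[key] for key in REQUIRED_KEYS)


def build_api(env=None):
    """Create an authenticated client (assumes tweepy >= 4 is installed)."""
    import tweepy  # third-party; imported here so the helper above works without it

    consumer_key, consumer_secret, access_token, access_secret = load_credentials(env)
    auth = tweepy.OAuth1UserHandler(
        consumer_key, consumer_secret, access_token, access_secret
    )
    return tweepy.API(auth)
```

Keeping the keys out of the source file means the script can be shared without leaking them.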


After authentication, just run the Python file and search for keywords related to the topic of interest. For this example, let’s take Narendra Modi as the search term.
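A keyword search could look roughly like the sketch below. It assumes `api` is an authenticated tweepy (version 4 or later) client; `fetch_tweet_texts` is a hypothetical helper name, not something taken from the original code.

```python
def fetch_tweet_texts(api, query, count=100):
    """Return the text of recent tweets matching `query`.

    Assumes `api` exposes tweepy's `search_tweets` method (tweepy >= 4).
    """
    statuses = api.search_tweets(q=query, count=count)
    return [status.text for status in statuses]
```

With a real client this would be called as `fetch_tweet_texts(api, "Narendra Modi")`.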

To understand what is happening behind the scenes, we need to look at the process, which consists of the following steps:

1. Send request to Twitter API to fetch tweets associated with a particular query.

2. Now we clean the tweet to remove unnecessary characters.

3. Then we pass the tweet to create a TextBlob object, which tokenizes the text, performs POS (part of speech) tagging, and selects only significant features. Finally, the text is passed to a sentiment analyzer, which assigns the tweet a polarity score between -1.0 and 1.0. TextBlob's default analyzer is lexicon-based; its optional NaiveBayesAnalyzer is a Naive Bayes classifier trained on a movie reviews corpus.

4. Now we can classify each tweet based on its polarity value into positive, negative and neutral. Here we will be taking only the positive and negative comments.
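Steps 2 to 4 can be sketched as follows. The cleaning rules and the zero threshold are assumptions chosen for illustration, not necessarily what the original script does, and `tweet_sentiment` needs the third-party textblob package.

```python
import re


def clean_tweet(text):
    """Remove URLs, @mentions, and special characters, then collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)     # links
    text = re.sub(r"@\w+", " ", text)             # user mentions
    text = re.sub(r"[^0-9A-Za-z \t]", " ", text)  # punctuation, emoji, '#'
    return " ".join(text.split())


def label_from_polarity(polarity):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def tweet_sentiment(text):
    """Clean a tweet and classify it with TextBlob's default analyzer."""
    from textblob import TextBlob  # third-party; imported lazily

    polarity = TextBlob(clean_tweet(text)).sentiment.polarity
    return label_from_polarity(polarity)
```

For example, `clean_tweet("@user Loving the rally! https://t.co/abc #vote")` returns `"Loving the rally vote"`.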

             Wordcloud for Positive sentiment

             Wordcloud for Negative sentiment

Here we can see that, for Narendra Modi, the analyzed tweets show more positive sentiment.
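A word cloud is essentially a picture of word frequencies, so one way to build it is to count the words in each group of tweets first. The sketch below does that with the standard library; the stopword list is a small illustrative one, and `save_wordcloud` assumes the third-party wordcloud package, which this article does not name.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in", "for", "on", "rt"}


def word_frequencies(tweets, stopwords=STOPWORDS):
    """Count how often each non-stopword appears across the given tweets."""
    counts = Counter()
    for tweet in tweets:
        for word in re.findall(r"[a-z]+", tweet.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


def save_wordcloud(frequencies, path):
    """Render the frequencies as an image (assumes the wordcloud package is installed)."""
    from wordcloud import WordCloud  # third-party; imported lazily

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies).to_file(path)
```

Calling `word_frequencies` separately on the positive and the negative tweets gives the input for the two word clouds.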

Now let us consider Rahul Gandhi,

             Wordcloud for Positive sentiment

             Wordcloud for Negative sentiment

We can see a higher percentage of negative sentiment compared with Narendra Modi.
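To compare two search terms by percentage, the labels from the classification step can be tallied as in this sketch. It keeps only positive and negative tweets, as described in step 4; `sentiment_shares` is a hypothetical helper name.

```python
from collections import Counter


def sentiment_shares(labels):
    """Return the percentage of positive and negative labels, ignoring neutral ones."""
    counts = Counter(label for label in labels if label in ("positive", "negative"))
    total = sum(counts.values())
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {key: 100.0 * counts[key] / total for key in ("positive", "negative")}
```

Running this once per search term gives two pairs of percentages that can be compared directly.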

In this article, I have described a simple method for analyzing public sentiment on Twitter toward a particular party. There are some challenges, such as language barriers, misclassification, data imbalance, and data reliability, which need to be considered when making predictions.

Even though we can’t predict exactly who will win an election based only on Twitter data, politicians can get an idea of the most important problems that need to be addressed.

If you are interested in learning more about social media analytics, check out this book (really useful if you are starting from scratch!):

“Learning Social Media Analytics with R”: https://amzn.to/2Ljfjlh

References

Predicting elections: Social media data and techniques

For the code in Python, check out here