A Fact-checking Telegram Bot that Busts Fake News

Introduction

Although fake news has been around for decades, the rise of social media has created turmoil through the sheer volume of data it brings. With everyone having free access and no filter on what they post, it is hard to choose what to believe. Checking the facts behind a news item is important for making an informed decision.

NaradaFakeBuster is a Telegram bot developed for fact-checking. It takes phrases or article titles and runs them against databases to check whether a fact check has already been done. NaradaFakeBuster uses an AWS Lambda function as its backend. The Lambda function calls an API to verify the facts and logs the queries and user analytics in an Elasticsearch index; a Python notebook is then used to examine that index. AWS has made creating and deploying chatbots easier: with AWS Lambda, a simple chatbot can be created in as little as five minutes.

Architecture

VERiCLAIM Architecture

Similar works that served as inspiration include VERiCLAIM, an automated fact-checking system that uses state-of-the-art NLP and evidence retrieval methods.

Claimbuster Framework

Another is ClaimBuster, an automated live fact-checking system. The ClaimBuster project was initially developed as an AI model that could automatically detect claims worth checking, but it has since become a full-fledged automated fact-checking system.

Below is a system diagram that depicts the current framework, with the claim-spotting component highlighted in the light-blue outlined box.

NaradaFakeBuster Architecture

The Lambda function behind NaradaFakeBuster has a modular architecture in which the text is parsed to extract the required information. The user enters a message in the Telegram bot, which calls the webhook, an API endpoint in front of the Lambda function. The text is passed to the Lambda function, where it is parsed further and used to call the Google fact check API. The output obtained from the API is parsed and sent back to the front-end to be displayed on the screen. All input queries and user analytics are stored in an elasticsearch index, which can then be queried to obtain user analytics.
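The first step of that request path, getting from the webhook call to the user's text, can be sketched as follows. This is a minimal sketch, assuming API Gateway uses Lambda proxy integration, so that the Telegram update arrives as a JSON string in event["body"]. The helper name extract_message is hypothetical and not taken from the bot's source.

```python
import json


def extract_message(event):
    """Pull the chat id and text out of a Telegram update.

    Assumes an API Gateway proxy event whose "body" is the JSON update
    that Telegram POSTs to the webhook. Returns (chat_id, text), or
    (None, None) when the update carries no text message.
    """
    update = json.loads(event.get("body") or "{}")
    message = update.get("message") or {}
    chat_id = message.get("chat", {}).get("id")
    text = message.get("text")
    if chat_id is None or text is None:
        return None, None
    return chat_id, text.strip()


# Example update shaped like the ones Telegram sends to a webhook.
sample_event = {
    "body": json.dumps(
        {"message": {"chat": {"id": 42}, "text": "/factcheck moon landing"}}
    )
}
print(extract_message(sample_event))
```

Returning a pair of None values for updates without text (stickers, edits, and so on) lets the caller skip them without raising.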

Why Web-hooks and not polling

The API integration should be efficient enough that sharing of data between apps provides great value to the user. In the polling method, we send a request for new events (specifically, Create, Retrieve, and Delete events, which signal changes in data) at a predetermined frequency and wait for the endpoint to respond. If the endpoint doesn’t respond, there are no new events to share. Similar to polling, webhooks provide your application a way of consuming new event data from an endpoint. However, instead of sending repeated requests for new events, you provide the endpoint with a URL, usually within the endpoint UI, which your application monitors. Whenever a new event occurs within the endpoint app, it posts the event data to your specified URL, updating your application in real-time.

Initially, this bot was built with the polling method; it was later changed to the web-hook method, which is more efficient. With polling, the data is always somewhat out of date. For example, if the polling frequency is set to every 6 hours, a returned event could have happened at any time within that 6-hour gap, whereas with a web-hook the event is posted immediately to the monitored URL and the app is updated with new data instantly. Zapier found that over 98.5% of polls are wasted. In contrast, webhooks only transfer data when there is new data to send, so no request is wasted. That means polling creates, on average, about 66x more server load than webhooks. That’s a lot of wasted time and, if you’re paying per API call, a whole lot of wasted money. To avoid these losses, NaradaFakeBuster uses the web-hook method.
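The 66x figure follows from the wasted-poll rate: if only 1.5% of polls return new data, roughly 1 / 0.015 polls are sent for every useful response. A short sketch of that arithmetic (the function name is illustrative):

```python
def polls_per_useful_response(wasted_fraction):
    """Average number of polls sent for each poll that returns new data."""
    useful_fraction = 1.0 - wasted_fraction
    return 1.0 / useful_fraction


# With 98.5% of polls wasted, about 66.7 polls are sent per useful one,
# which is where the "66x more load than webhooks" estimate comes from.
ratio = polls_per_useful_response(0.985)
print(round(ratio, 1))
```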

Difference Between Polling and Webhook

Setting up web-hook with API Gateways

There’s a feature on Amazon API Gateway called stage variables. Stage variables act like environment variables and can be used to change the behavior of your API Gateway methods for each deployment stage; for example, making it possible to reach a different back end depending on which stage the API is running on. Environment variables are helpful because they allow you to change which of your environments use which third party service environment by changing an API key, endpoint, token, or whatever the service uses to distinguish between environments.
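As an illustration, a Lambda function behind a proxy integration can read stage variables from the incoming event. The sketch below assumes a stage variable named esEndpoint and placeholder URLs; both are hypothetical and not taken from the bot's configuration.

```python
def resolve_backend(event, default="https://es.example.invalid"):
    """Pick a back-end URL from API Gateway stage variables.

    With Lambda proxy integration, API Gateway passes the stage's
    variables in event["stageVariables"] (None when none are defined).
    "esEndpoint" is a made-up variable name used for illustration.
    """
    stage_vars = event.get("stageVariables") or {}
    return stage_vars.get("esEndpoint", default)


dev_event = {"stageVariables": {"esEndpoint": "https://es-dev.example.invalid"}}
prod_event = {"stageVariables": None}
print(resolve_backend(dev_event))
print(resolve_backend(prod_event))
```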

API Gateway can block improper requests without invoking the backing Lambda functions, which saves Lambda invocation costs and offloads request validation from your Lambda function. Metrics can be enabled from the stage settings console, and logging comes in two forms: execution logs and access logs. The metadata captured from each message is stored in the elasticsearch index. With the AWS serverless platform you can build a webhook that runs independently, without having to monitor and manage it yourself. For NaradaFakeBuster, the webhook was created using AWS Lambda and API Gateway.

To set up the API Gateway URL as a web-hook for the Telegram bot, we open the following URL in a browser window (note that the Bot API expects the token to be prefixed with "bot"):

https://api.telegram.org/bot<bot-token>/setWebhook?url=<api endpoint>
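The same URL can be assembled in Python, which also takes care of URL-encoding the endpoint. This is a sketch: Telegram's Bot API expects the token in the path with a "bot" prefix, and the token and endpoint below are placeholders rather than real values.

```python
from urllib.parse import urlencode


def build_set_webhook_url(bot_token, endpoint):
    """Return the Bot API URL that registers `endpoint` as the webhook."""
    query = urlencode({"url": endpoint})
    return f"https://api.telegram.org/bot{bot_token}/setWebhook?{query}"


# Placeholder token and endpoint; substitute real values before use.
print(build_set_webhook_url("123456:ABC", "https://example.invalid/prod/bot"))
```

Opening the resulting URL in a browser, or requesting it with any HTTP client, registers the webhook.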

Development Issues

The two issues faced during development were:

1. A Google protobuf error was seen: the Lambda function was unable to recognize the google packages.

The following command was run, which resulted in a distutils error about being unable to locate the prefix.

python3 -m pip install --target=./ --user --upgrade --force-reinstall -r requirements.txt

The command was then run again without the --user flag, but that did not work either.

python3 -m pip install --target=./ --upgrade --force-reinstall -r requirements.txt

Building the setup took some time. Once the setup works, Google offers two different kinds of packages: one for calling APIs that have no native client library, and another for APIs that do have a client library. After the setup was completed, the following command was run:

python3 -c "import google.cloud.bigquery; print(google.cloud.bigquery.__version__)"

The result of the above command indicated that the library had not been installed and that some package dependencies were being installed in a different folder; this looked like something to raise as a GitHub issue.

Running the following command gives us the location of the lib directories.

python3 -m pip show

Next, we tried the following command. It did not work either, because protobuf is only installed within the google folder, and some aberrant behavior was noticed.

python3 -m pip install --user --target=./ --upgrade --force-reinstall -r requirements.txt

Finally, the following command installed all the required packages, i.e. the google > cloud > bigquery packages were now present at the install location. However, they were removed on every upgrade.

python3 -m pip install --user -t . google-cloud-bigquery

It was after this that the original error came up again, i.e. google.protobuf was not installed. According to our analysis, the package was not found within the google folder because of the --target option. The following finally solved the issue: we installed google-cloud-bigquery in the current directory without the --user flag, then changed into the google directory and installed protobuf there:

python3 -m pip install -t . protobuf

The folder structure is as follows:

We then moved the protobuf folder one level up, so the final structure is as follows:

...and this worked!
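A quick way to catch this class of problem before uploading the package is to check, from the deployment directory, whether the dotted module names can actually be resolved. The helper below is a hypothetical sketch using only the standard library; it is not part of the bot's code.

```python
import importlib.util


def missing_modules(names):
    """Return the dotted module names that cannot be found on sys.path."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. "google") is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Run from the deployment directory before zipping it for Lambda.
print(missing_modules(["google.protobuf", "google.cloud.bigquery"]))
```

An empty list means both packages resolve; any name that is printed still needs to be installed or moved into place.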

2. A connection pool error was seen in the CloudWatch logs, and accessing the Elasticsearch index also failed. The cause was the expiry of AWS credits on a third-party account. To resolve this, the index was recreated on a personal account and the Lambda function was pointed to the new ES domain endpoint through the configuration tab. The endpoint is kept as a configuration variable so that URL changes do not affect the Lambda code. For a better understanding, watch the following video: NaradaFakeBuster Walkthrough.

Code Walkthrough

The main function is lambda_handler. It prints the incoming message, checks whether it contains a bot command or comes from a group, and invokes the checkCommand function.
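A handler of this kind might look roughly like the sketch below. It assumes a Lambda proxy event and relies on two fields Telegram includes in messages: entities of type "bot_command" and the chat type. The helper names and the exact condition are assumptions, since the article does not show the bot's own logic.

```python
import json


def is_bot_command(message):
    """True when Telegram tagged the start of the text as a bot command."""
    return any(
        entity.get("type") == "bot_command" and entity.get("offset") == 0
        for entity in message.get("entities", [])
    )


def is_group_message(message):
    """True when the message was sent in a group or supergroup chat."""
    return message.get("chat", {}).get("type") in ("group", "supergroup")


def lambda_handler(event, context):
    """Entry point: log the update and decide whether to handle it."""
    message = json.loads(event.get("body") or "{}").get("message", {})
    print(message)  # shows up in the CloudWatch logs
    # The real bot would call its checkCommand function when this is True.
    handled = is_bot_command(message) or is_group_message(message)
    # Answer 200 either way so Telegram does not retry the update.
    return {"statusCode": 200, "body": json.dumps({"handled": handled})}
```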

The following are imported or set up at the top of the module:

1. os: functions for interacting with the operating system.

2. re: regular expressions, which describe a set of strings to match against.

3. googleapiclient.discovery: constructs a Resource object for interacting with an API.

4. elasticsearch: the Elasticsearch client and the helpers from the elasticsearch library.

5. datetime: classes for working with dates and times.

6. requests: allows you to send HTTP requests from Python. Each request returns a Response object with all the response data (content, encoding, status, etc.).

7. logging: a means of tracking events that happen while the software runs. The logger is set up with logging.getLogger. Sensitive values such as the Telegram token are then read from environment variables, which prevents them from being inadvertently exposed.

8. API token: an alphanumeric code, unique to your account, that can be used from any system to validate your API calls. It is read using os.getenv.

9. build: used to create the fact check API service as a global variable.
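The setup steps in items 7 to 9 can be sketched as follows. The environment variable names are hypothetical placeholders, and the function takes the environment as a parameter (instead of calling os.getenv directly) only so that the example can be run without real secrets.

```python
import logging
import os

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def load_settings(environ=os.environ):
    """Read secrets from environment variables instead of hard-coding them.

    The variable names below are hypothetical placeholders.
    """
    settings = {
        "telegram_token": environ.get("TELEGRAM_TOKEN"),
        "api_key": environ.get("FACTCHECK_API_KEY"),
        "es_endpoint": environ.get("ES_ENDPOINT"),
    }
    missing = sorted(name for name, value in settings.items() if not value)
    if missing:
        logger.warning("Missing settings: %s", ", ".join(missing))
    return settings


settings = load_settings({"TELEGRAM_TOKEN": "123456:ABC"})
# The fact check service object would then be created once, at import
# time, with googleapiclient.discovery.build using settings["api_key"].
```

Creating the service at module level means it is reused across invocations while the Lambda container stays warm.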

The checkCommand function contains all the messages shown when we press START on the bot. When “/start” is entered, the if branch runs and sends an introductory message. If you type “/help”, the text from the matching elif branch is shown. If “/factcheck” is passed along with a string, the factcheck function is called.

The above screenshots show the bot's user interface. As soon as we press the START button, the bot replies with a welcome message, which is produced by the if branch of checkCommand. If we send /help, one of the elif branches runs and the bot sends a guide on how to use it. When you send a /factcheck message, it invokes the factcheck function, which is described below.
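The dispatch logic described here can be sketched as a small function. This is an illustration under assumptions: the reply strings are placeholders rather than the bot's actual wording, and returning an (action, payload) pair is a choice made for this example.

```python
def check_command(text):
    """Map an incoming message to (action, payload).

    The reply strings are placeholders, not the bot's actual wording.
    """
    command, _, argument = text.strip().partition(" ")
    # In groups Telegram may append "@botname" to the command.
    command = command.split("@", 1)[0].lower()
    if command == "/start":
        return "reply", "Welcome! Send /factcheck <claim> to check a claim."
    elif command == "/help":
        return "reply", "Usage: /factcheck <phrase or article title>"
    elif command == "/factcheck":
        return "factcheck", argument.strip()
    return "ignore", ""


print(check_command("/factcheck  moon landing was staged"))
```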

The saveFactCheck function saves the fact check entered by the user together with their chat id, text message, and timestamp, and writes it to the elasticsearch index with a claimcheck text. In short, saveFactCheck helps us collect useful metadata from the users.
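The shape of such a record might look like the sketch below. The field names and the "claimcheck" marker field are assumptions made for illustration; the bot's actual index mapping is not shown in this article.

```python
from datetime import datetime, timezone


def build_fact_check_doc(chat_id, text, now=None):
    """Build the document that would be indexed for one query.

    Field names are illustrative, not the bot's real mapping.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "chat_id": chat_id,
        "text": text,
        "timestamp": now.isoformat(),
        "type": "claimcheck",
    }


doc = build_fact_check_doc(42, "moon landing was staged")
# This dictionary could then be passed to the elasticsearch client's
# index call (or to the bulk helper) to store it.
print(doc)
```

Storing the timestamp in ISO 8601 form with an explicit UTC offset keeps later date-range queries unambiguous.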

The factcheck function takes the text written by the user. Only if the text is a non-empty string does it save the message, by calling the saveFactCheck function, and run the Google API; the search itself happens inside a try block. If an error occurs while searching, it sends a response to the user. If the bot cannot find the fact in its database, it asks you to try another string. If the bot finds claims matching your search, it shows all their URLs. The createKb function extracts part of each fact-check URL and adds it to a list, which is used to display a list of buttons for the user to select.
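The button-building step can be sketched as follows. It assumes the search results follow the Google Fact Check Tools API shape, where each claim carries a "claimReview" list whose entries have a "url", and it uses the host name of each URL as the button label. The name create_kb, the label choice, and the limit parameter are assumptions for this example, not the bot's actual createKb code.

```python
from urllib.parse import urlparse


def create_kb(claims, limit=5):
    """Build a Telegram inline keyboard with one URL button per review."""
    buttons = []
    for claim in claims:
        for review in claim.get("claimReview", []):
            url = review.get("url")
            if not url:
                continue
            # Use the host part of the fact-check URL as the button text.
            label = urlparse(url).netloc or url
            buttons.append([{"text": label, "url": url}])
    return {"inline_keyboard": buttons[:limit]}


sample_claims = [
    {
        "text": "Example claim",
        "claimReview": [{"url": "https://example.invalid/review/1"}],
    }
]
print(create_kb(sample_claims))
```

The returned dictionary has the structure Telegram expects for the reply_markup field of sendMessage, with each inner list forming one row of buttons.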

Conclusion

Fact-checking claims with reliable information from credible sources is perhaps the best way to fight the spread of misinformation. By double-checking a claim you see on social media or in an online article, you can verify whether or not it’s true. NaradaFakeBuster is a smart approach in the fact-checking world that uses top-grade technologies; building the bot on AWS Lambda and coupling it with a web-hook makes it one of a kind.

NaradaFakeBuster was created with several advantages in mind, such as 24/7 availability to users and cost optimization. In keeping with the latest technology and the move to go serverless, the bot uses AWS Lambda. It gets all the benefits of being serverless, from having no servers to manage to cost optimization and consistent performance at all scales. We are working to take this technology further by using it as a back-end for bots on other social media platforms like Twitter, Reddit, LinkedIn, etc.