How to Protect Chatbots from Machine Learning Attacks

Written by Limarc | Published 2020/08/20

TLDR: Machine learning security is a growing field powered by advancements from large tech companies, new startups, and university research teams alike. In this article, we look at the importance of machine learning cyber security. Failure to protect your machine learning models can lead to data breaches, hyperparameter theft, or worse. Chatbot vulnerabilities can even result in the theft of private user data. Scanta, an ML security company, provides machine learning security services for chatbots and virtual assistants. We spoke to Scanta to get an understanding of the most common cyber attacks that chatbots face.

Artificial Intelligence is a growing industry powered by advancements from large tech companies, new startups, and university research teams alike. But while AI technology is advancing at a rapid pace, the regulations and failsafes around machine learning security are an entirely different story.
Failure to protect your ML models from cyber attacks such as data poisoning can be extremely costly, and chatbot vulnerabilities can even result in the theft of private user data. In this article, we'll look at the importance of machine learning cyber security and explain how Scanta, an ML security company, protects chatbots with its Virtual Assistant Shield.

Why is Machine Learning Security Important?

Protecting machine learning models against cyber attacks is similar to making sure your vehicle has passed safety inspections. Just because a car can move doesn’t mean it’s safe to drive on public roads. Failure to protect your machine learning models can lead to data breaches, hyperparameter theft, or worse.  
A great example is how McAfee technicians hacked one of Tesla's autonomous vehicles. An earlier version of Tesla's road sign detection system was vulnerable to very simple attacks: by adding a few inches of black tape to a 35 MPH speed limit sign, the technicians tricked the vehicle into reading it as an 85 MPH sign. As a result, the car accelerated well past 35 MPH until the McAfee tester applied the brakes.
Vulnerabilities in autonomous vehicles could lead to fatal accidents. For chatbots and virtual assistants, these machine learning hacks could lead to large breaches of private customer data, phishing attacks, and costly lawsuits for your company, which is exactly what happened to Delta Airlines. 
In 2019, the company sued its chatbot developer over a passenger data breach that occurred in 2017. Hackers had gained access to Delta's chatbot system and modified the source code, which allowed them to scrape data entered by users. The fallout was costly for Delta, which spent millions of dollars investigating the breach and protecting the customers that were affected.

Machine Learning Security Vulnerabilities in Chatbots

Chatbots are particularly vulnerable to machine learning attacks due to their constant user interactions, which are often completely unsupervised. We spoke to Scanta to get an understanding of the most common cyber attacks that chatbots face.
Scanta CTO Anil Kaushik tells us that one of the most common attacks they see is data poisoning through adversarial inputs.
What is Data Poisoning?
Data poisoning is a machine learning attack in which hackers contaminate the training data of a machine learning model.
They do this by injecting adversarial inputs, which are purposefully altered data samples meant to trick the system into producing false outputs.
Systems that are continuously trained on user-inputted data, like customer service chatbots, are especially vulnerable to these kinds of attacks. Most modern chatbots operate autonomously and answer customer inquiries without human intervention. Often, the conversations between chatbot and user are never monitored unless the query is escalated to a human staff member. This lack of supervision makes chatbots a prime target for hackers to exploit. 
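To make the idea concrete, here is a minimal, hypothetical sketch of how incoming messages might be screened before they are allowed into a chatbot's retraining data. The features, thresholds, and helper names are illustrative placeholders for this example, not any vendor's actual defense.

```python
# Simplified sketch: screen user messages before they enter a chatbot's
# retraining pool. Features and thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class Message:
    user_id: str
    text: str

def poisoning_risk_score(msg: Message, known_vocab: set) -> float:
    """Return a crude 0..1 risk score for a candidate training sample."""
    tokens = msg.text.lower().split()
    if not tokens:
        return 1.0  # empty input is never useful training data
    # A high out-of-vocabulary ratio or extreme length are cheap red flags.
    oov_ratio = sum(t not in known_vocab for t in tokens) / len(tokens)
    length_penalty = min(len(tokens) / 200, 1.0)
    return max(oov_ratio, length_penalty)

def filter_training_batch(batch: list, known_vocab: set,
                          threshold: float = 0.6) -> list:
    """Keep only messages below the risk threshold; quarantine the rest."""
    clean, quarantined = [], []
    for msg in batch:
        if poisoning_risk_score(msg, known_vocab) >= threshold:
            quarantined.append(msg)   # would go to human review, not the model
        else:
            clean.append(msg)
    return clean
```

In practice a real defense would combine many more signals, but the principle is the same: nothing reaches the training set without being scored first.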
To help companies protect their chatbots and virtual assistants, Scanta is continuously improving their ML security system, VA Shield (Virtual Assistant Shield).

How To Protect Chatbots and Virtual Assistants

Founded in 2016 by Chaitanya Hiremath, Scanta is a tech company that provides machine learning security services for chatbots and virtual assistants. 
Scanta’s VA Shield is a machine learning security system that protects chatbots at the model, dataset, and conversational levels. “VA Shield uses ML to protect against ML attacks. We do behavior analysis for each user and flag any anomalous behavior,” says CTO Anil Kaushik. “Behavior analysis is done for the end user as well as the chatbot. All input, output, and input-output combined entities are analyzed to detect any malicious activities.”
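As a rough illustration of what per-user behavior analysis can look like (and explicitly not VA Shield's internals, which Scanta does not disclose), the sketch below keeps an online baseline of request features for each user and flags requests that deviate sharply from it. The feature vector and z-score threshold are assumptions made for the example.

```python
# Minimal sketch of per-user behavioral baselining, assuming each request is
# reduced to a numeric feature vector (e.g., message length, request rate,
# intent id). Illustration of the general idea only.
import numpy as np

class UserBaseline:
    """Tracks a running mean/variance of request features for one user."""
    def __init__(self, n_features: int):
        self.n = 0
        self.mean = np.zeros(n_features)
        self.m2 = np.zeros(n_features)   # Welford's online variance accumulator

    def update(self, x: np.ndarray) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomalous(self, x: np.ndarray, z_threshold: float = 3.0) -> bool:
        if self.n < 30:                  # not enough history to judge yet
            return False
        std = np.sqrt(self.m2 / (self.n - 1)) + 1e-9
        z = np.abs((x - self.mean) / std)
        return bool(np.any(z > z_threshold))
```

A flagged request would then be logged, rate-limited, or escalated rather than silently answered.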
On the conversational level, Scanta evaluates the chatbot’s output to both block malicious exploits and capture business insights. “Contextual analysis is a simple concept where response from the chatbot is viewed in context to the request,” says Kaushik. “Also, the next request in a conversation is seen in context to the previous request. To do these analyses, we use historical data. For example, we look at the user’s historical request characteristics and responses from the chatbot, as well as the chatbot’s response characteristics.”
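The sketch below shows one simple way such a contextual check could be implemented: comparing the bot's reply to the request, and the new request to the previous one, using embedding similarity. The embed function and similarity threshold are placeholders for whatever model and tuning a real deployment would use; this is not Scanta's implementation.

```python
# Sketch of a contextual check: compare the bot's reply to the request, and
# the new request to the previous request, via cosine similarity.
# `embed` is a placeholder for any sentence-embedding model already in use.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def flag_out_of_context(prev_request: str, request: str, response: str,
                        embed, min_similarity: float = 0.2) -> bool:
    """Flag turns where the reply or the conversation drifts abruptly."""
    req_vec, resp_vec = embed(request), embed(response)
    prev_vec = embed(prev_request)
    reply_off_topic = cosine(req_vec, resp_vec) < min_similarity
    abrupt_topic_shift = cosine(prev_vec, req_vec) < min_similarity
    return reply_off_topic or abrupt_topic_shift
```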

Why can’t regular IT teams handle these attacks?

When speaking to Scanta CEO Chaitanya Hiremath, I asked him why companies with their own IT teams would bother outsourcing machine learning security services. Couldn’t those IT teams incorporate ML security protocols on their own?
“We’ve spoken to many companies and I was quite surprised to learn that these ML threats are something that most people are unaware of,” says Hiremath. “The reality is many people don’t even know that this is something they have to protect against.”
“Most IT teams and security solutions offer things like network security and web application firewalls. This type of security is different from what Scanta provides. What we are talking about and introducing is on a different level. It goes far beyond removing bias from training data.”
In the Delta Airlines example mentioned previously, someone hacked their chatbot and modified the source code. This hack gave them access to private customer data. “This is because no one was monitoring what was going into the chatbot and what was coming out,” says Hiremath.
“This is the result of the way machine learning technologies are built these days. However, it is imperative to have a mechanism that can interpret if there is any malicious intent. We call this system a zero trust framework. You have to make sure that all aspects are protected. This is just as important as protecting your database or network.” 
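A zero trust approach to a chatbot can be sketched as a thin wrapper that inspects every inbound message and every outbound reply before letting either through. The pattern below is deliberately naive (a single regex for card-like numbers and a hypothetical handler name) and only illustrates the monitoring idea Hiremath describes, not any particular product.

```python
# Illustrative zero-trust style wrapper: nothing reaches the chatbot, and
# nothing leaves it, without an inspection step. The regex and handler name
# are assumptions made for this sketch.
import re

SENSITIVE = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # naive card-number pattern

def guarded_reply(user_text: str, chatbot_handler) -> str:
    """Inspect the inbound message and the outbound reply independently."""
    if SENSITIVE.search(user_text):
        return "Please don't share payment details in chat."
    reply = chatbot_handler(user_text)      # the untrusted model call
    if SENSITIVE.search(reply):
        # Never let the model echo sensitive data back to the user.
        return "[response withheld pending review]"
    return reply
```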
Our daily lives and our personal data are becoming more and more intertwined with computer systems. The increasing digitization of modern society makes heightened data security a top priority. Particularly with data protection laws like the GDPR now in force, it is more important than ever that companies protect their own data and their clients' data at every level.

The Future of Machine Learning Security

Heightened security for machine learning models will benefit both the data science community and everyday users of AI technology. In the first half of 2020, we saw IBM step back from facial recognition technology, citing evidence of inherent racial bias and possible misuse by law enforcement. It is important that more large companies like IBM, Delta, and Tesla take a step back and put security and social impact before development.
Hopefully, more companies like Scanta will emerge in the machine learning field to create safer AI systems for the companies that develop machine learning technologies and the people that use them. 
Previously published on: https://lionbridge.ai/articles/the-importance-of-machine-learning-security-for-chatbots-and-virtual-assistants/

Written by Limarc | Hacker Noon's VP of Editorial by day, VR Gamer and Anime Binger by night.
Published by HackerNoon on 2020/08/20