Challenges in Deep Learning

Written by parthshrivastava | Published 2017/09/13
Tech Story Tags: machine-learning | deep-learning | speech-recognition | image-processing | nlp


A neural network architecture. Credits

Deep Learning has become one of the primary research areas in developing intelligent machines. Most of AI's well-known applications (such as Speech Recognition, Image Processing and NLP) are driven by Deep Learning. Deep Learning algorithms mimic the human brain using artificial neural networks and progressively learn to accurately solve a given problem. But there are significant challenges in Deep Learning systems that we have to look out for.

In the words of Andrew Ng, one of the most prominent names in Deep Learning:

“I believe Deep Learning is our best shot at progress towards real AI.”

If you look around, you might realize the power of the above statement by Andrew. From Siri and Cortana to Google Photos, and from Grammarly to Spotify's music recommendations, Deep Learning powers them all. These are just a few examples of how deeply Deep Learning has entered our lives.

But with great technological advances come complex difficulties and hurdles. In this post, we shall discuss the prominent challenges in Deep Learning.

Challenges in Deep Learning

Lots and lots of data

Deep Learning algorithms are trained to learn progressively from data. Large data sets are needed to make sure that the machine delivers the desired results. Just as the human brain needs many experiences to learn and deduce information, the analogous artificial neural network requires copious amounts of data. The more powerful an abstraction you want, the more parameters need to be tuned, and more parameters require more data.

For example, a speech recognition program would require data from multiple dialects, demographics and time scales. Researchers feed terabytes of data to the algorithm for it to learn a single language. This is a time-consuming process that requires tremendous data-processing capabilities. To some extent, the scope of solving a problem through Deep Learning is subject to the availability of a huge corpus of data to train on.

The complexity of a neural network can be expressed through the number of parameters. In the case of deep neural networks, this number can be in the range of millions, tens of millions and in some cases even hundreds of millions. Let’s call this number P. Since you want to be sure of the model’s ability to generalize, a good rule of thumb for the number of data points is at least P*P.
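To make the number P concrete, here is a minimal sketch (assuming PyTorch is installed) that counts the parameters of a small fully connected network. The layer sizes are purely illustrative.

```python
# A minimal sketch of counting the parameters P of a small
# fully connected network; the layer sizes are arbitrary.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 512),  # 784*512 weights + 512 biases
    nn.ReLU(),
    nn.Linear(512, 256),  # 512*256 weights + 256 biases
    nn.ReLU(),
    nn.Linear(256, 10),   # 256*10 weights + 10 biases
)

P = sum(p.numel() for p in model.parameters())
print(f"Total parameters P = {P:,}")  # 535,818 for this toy network
```

Even this toy network has over half a million parameters, which gives a feel for why real architectures are so data-hungry.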

Overfitting in neural networks

At times, there is a sharp difference between the error on the training data set and the error encountered on a new, unseen data set. Overfitting occurs in complex models, for instance those with too many parameters relative to the number of observations. The efficacy of a model is judged by its ability to perform well on unseen data, not by its performance on the training data fed to it.

Training error in blue, Validation error in red (Overfitting) as a function of the number of cycles. Credits: Wikipedia

In general, a model is trained by maximizing its performance on a particular training data set. The model thus memorizes the training examples but does not learn to generalize to new situations and data sets.
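As a toy illustration of this gap between training and unseen data, here is a minimal sketch using plain NumPy, in which a high-degree polynomial memorizes a tiny noisy training set while doing far worse on held-out points. The degrees and sample sizes are arbitrary.

```python
# Overfitting in miniature: fit polynomials of increasing degree to
# a few noisy samples of a sine wave and compare train vs. test error.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit on training data only
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

The degree-9 fit drives the training error toward zero while its test error blows up, which is exactly the pattern in the plot above.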

Hyperparameter Optimization

Hyperparameters are parameters whose values are defined prior to the commencement of the learning process. Changing the value of such parameters by a small amount can cause a large change in the performance of your model.

Relying on the default parameters and not performing hyperparameter optimization can have a significant impact on model performance. Having too few hyperparameters, or hand-tuning them rather than optimizing them through proven methods, also hurts performance.
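As one illustration of such methods, here is a minimal random-search sketch using scikit-learn's RandomizedSearchCV on a small neural network. The search ranges and the MLP itself are illustrative, not recommendations.

```python
# A minimal sketch of random hyperparameter search instead of
# relying on library defaults; ranges below are illustrative.
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

search = RandomizedSearchCV(
    MLPClassifier(max_iter=300),
    param_distributions={
        "hidden_layer_sizes": [(32,), (64,), (128,), (64, 64)],
        "learning_rate_init": loguniform(1e-4, 1e-1),  # sample on a log scale
        "alpha": loguniform(1e-6, 1e-2),               # L2 regularization strength
    },
    n_iter=20,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Random search is a simple baseline; more sophisticated options such as Bayesian optimization follow the same pattern of searching the space rather than trusting defaults.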

Requires high-performance hardware

Training a Deep Learning model requires a lot of data. To perform a task and solve real-world problems, the machine needs to be equipped with adequate processing power. To ensure better efficiency and less time consumption, data scientists switch to multi-core, high-performance GPUs and similar processing units. These processing units are costly and consume a lot of power.

Facebook’s Oregon Data Center. Credits: MIT Technology Review

Industry-level Deep Learning systems require high-end data centers, while smart devices such as drones, robots and other mobile devices require small but efficient processing units. Deploying a Deep Learning solution to the real world thus becomes a costly and power-consuming affair.
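For context, here is a minimal PyTorch sketch of the usual pattern for running on a GPU when one is available and falling back to the CPU otherwise. The tiny model and batch are purely illustrative.

```python
# A minimal sketch of device placement: use a GPU if present,
# otherwise fall back to the CPU.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(1024, 10).to(device)         # move parameters to the chosen device
batch = torch.randn(32, 1024, device=device)   # data must live on the same device
logits = model(batch)
print(f"Running on {device}, output shape {tuple(logits.shape)}")
```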

Neural networks are essentially a Blackbox

We know which data we feed to the neural network, what its model parameters are and how its layers are put together. But we usually do not understand how it arrives at a particular solution. Neural networks are essentially blackboxes, and researchers have a hard time understanding how they deduce conclusions.

The Neural Network Blackbox. Credits: University of Florida

The inability of neural networks to reason on an abstract level makes it difficult to implement high-level cognitive functions. Also, their operation is largely invisible to humans, rendering them unsuitable for domains in which verification of the process is important.
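One common way researchers try to peek inside the blackbox is gradient-based saliency: asking which inputs most influence a prediction. Below is a minimal PyTorch sketch of the idea; the untrained toy model is illustrative, and in practice this would be applied to a trained network.

```python
# A minimal sketch of input-gradient saliency: backpropagate a class
# score to the input to see which features influenced it most.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

x = torch.randn(1, 20, requires_grad=True)   # track gradients w.r.t. the input
score = model(x)[0, 1]                       # score for one class
score.backward()                             # backpropagate to the input

saliency = x.grad.abs().squeeze()            # large values = influential features
print(saliency.topk(5).indices)              # the five most influential inputs
```

Techniques like this hint at what the network is attending to, but they fall well short of the step-by-step justification that verification-critical domains demand.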

However, Murray Shanahan, Professor of Cognitive Robotics at Imperial College London, and his team have presented a paper on Deep Symbolic Reinforcement Learning, which showcases progress toward overcoming the aforementioned hurdles.

Lack of Flexibility and Multitasking

Deep Learning models, once trained, can deliver a tremendously efficient and accurate solution to a specific problem. However, in the current landscape, neural network architectures are highly specialized to specific domains of application.

Google DeepMind’s Research Scientist Raia Hadsell summed it up:

“There is no neural network in the world, and no method right now that can be trained to identify objects and images, play Space Invaders, and listen to music.”

Most of our systems work on this theme: they are incredibly good at solving exactly one problem. Even solving a very similar problem requires retraining and reassessment. Researchers are working hard to develop Deep Learning models that can multitask without the need to rework the whole architecture.

That said, there are small advances in this direction using Progressive Neural Networks. There is also significant progress towards Multi-Task Learning (MTL). Researchers from the Google Brain Team and the University of Toronto presented a paper on MultiModel, a neural network architecture that draws from the success of vision, language and audio networks to simultaneously solve a number of problems spanning multiple domains, including image recognition, translation and speech recognition.
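To make the multitasking idea concrete, here is a minimal PyTorch sketch of a shared trunk feeding two task-specific heads. The layer sizes and the two tasks are illustrative and not the MultiModel architecture from the paper.

```python
# A minimal sketch of multi-task learning: one shared trunk,
# several task-specific heads.
import torch
import torch.nn as nn

class TwoTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(128, 256), nn.ReLU())  # shared representation
        self.classify_head = nn.Linear(256, 10)   # e.g. a 10-class recognition task
        self.regress_head = nn.Linear(256, 1)     # e.g. a scalar prediction task

    def forward(self, x):
        h = self.trunk(x)
        return self.classify_head(h), self.regress_head(h)

net = TwoTaskNet()
logits, value = net(torch.randn(4, 128))
print(logits.shape, value.shape)  # torch.Size([4, 10]) torch.Size([4, 1])
```

The shared trunk is forced to learn features useful for both heads, which is the core bet behind multi-task architectures.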

Deep Learning may be one of the primary research verticals for Artificial Intelligence, but it certainly is not flawless. While exploring new and less explored territories of cognitive technology, it is only natural to come across hurdles and difficulties, as is the case with any technological progress. The future holds the answer to the question “Is Deep Learning our best solution towards real AI?” As part of an applied AI research group, I am certainly all ears.

I work for Paralleldots, an applied AI research group. We develop AI-powered solutions for real-world problems. The article was originally published here.

If you liked reading the article, please clap it, share it or comment below for further discussions.

