The Tourist’s Guide to Deep Learning

Written by albertlai | Published 2017/01/18
Tech Story Tags: artificial-intelligence | machine-learning | deep-learning


You don’t need a PhD to understand what it is and what it can do

Artificial Intelligence will soon be eating the world, at least if the tech industry is to be believed. Powered by Deep Learning, a technology that seemed to fall out of the sky a few years ago, machines will be able to drive you to work, regularly humiliate you at video games, and detect cancer before it’s too late. If you’ve read a few articles mentioning deep learning, you probably know that this breakthrough was modeled after neurons, unlocking mystical properties of the brain. But what does that even mean? Hasn’t something called Machine Learning been around for a while, and if so, what’s changed? Should we be worried about SkyNet?

Last year I often found myself nodding along blankly whenever deep learning came up in conversation, so I decided to spend the last couple of months getting up to speed on the topic via online courses (see the end of this post for recs) and hacking on fun projects.

Teaching computers Impressionism - I made a neural art Android app and you can look at the source code on Github

Here we’ll go over a few questions that you might have wanted to ask but were afraid to appear woefully behind the times.

What is Machine Learning?

Machine learning is any technique that enables a computer to learn a behavior without having to explicitly declare “if this, then do that,” and deep learning is one of those techniques. Typical machine learning algorithms take in lots of data to make predictions — everything from which movie you may be interested in, to whether a photo contains a cat, to what angle to turn a steering wheel. These algorithms usually rely heavily on statistics and linear algebra.

Some everyday examples of machine learning at work

If you’ve ever used Excel to plot a line of best fit, congratulations! You’ve done machine learning (maybe hold off on putting that on your resume though). For example, let’s say you want to predict software engineering salaries based on years of experience. You might collect a lot of data (your training set), plot it, and click the “trendline” option in Excel, which runs a simple linear regression (your learning algorithm). Using your slick new model you can make inferences on inputs that were not present in the original dataset by applying the trendline’s formula: salary = years * a + b. You didn’t know the values of a and b beforehand — these two parameters were learned! Machine learning isn’t magic, it’s just a lot of math.

Collect data, train a model, make predictions!
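To make the trendline example concrete, here is a minimal sketch of it in Python. The salary numbers are made up, and numpy’s polyfit stands in for Excel’s trendline button.

```python
import numpy as np

# Made-up training set: years of experience vs. salary (in thousands)
years = np.array([1, 2, 3, 5, 8, 10])
salary = np.array([85, 95, 105, 125, 150, 165])

# "Training": fit the line salary = years * a + b (simple linear regression)
a, b = np.polyfit(years, salary, deg=1)
print(f"learned parameters: a={a:.1f}, b={b:.1f}")

# "Inference": predict a salary for an input not in the training set
print(f"predicted salary at 7 years: {7 * a + b:.0f}k")
```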

How is Machine Learning different from Regular Programming?

Programming as most of us know it is fairly deterministic — a long series of instructions following the unforgiving logic of a silicon chip. When you buy a concert ticket, the website looks for available seats and shows them to you. When you press the A button, Mario jumps.

If Netflix were to implement a recommendation system without using machine learning, someone would have to go through thousands of movies and shows to come up with a list of rules like “if you’ve watched Iron Man, suggest X-Men: First Class” or “if you’ve watched a movie starring Emily Blunt, recommend more Emily Blunt movies.” Then some poor developer would tediously code it all up so when you reach their homepage, the system can step through those rules to create a list of recommendations. If you exclusively watch Korean dramas and the rules don’t cover it, you’d see irrelevant suggestions until Netflix hires a K-Drama expert.

A machine learning approach would be to predict what you want to see by learning from other users’ binge-watching habits, similar to how we trained our salary predictor on existing salary data. Whereas our salary equation only has two parameters a and b, Netflix learns potentially millions of parameters for formulas that “score” each piece of content for you. If you’ve just watched Edge of Tomorrow and lots of users who’ve watched Edge of Tomorrow also watch Looper, you might see Looper show up on your Netflix homepage the next day. The system can even learn patterns from the data that its creators never thought of!
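To see what “scoring” content might look like in miniature, here is a toy sketch that just counts how often titles co-occur in other users’ watch histories. The histories are invented, and a real recommender learns far richer parameters than these simple counts.

```python
from collections import Counter

# Invented watch histories from other users
histories = [
    {"Edge of Tomorrow", "Looper", "Arrival"},
    {"Edge of Tomorrow", "Looper"},
    {"Finding Dory", "Moana"},
]

def recommend(just_watched, histories, top_n=2):
    """Score each title by how often it co-occurs with what you just watched."""
    scores = Counter()
    for history in histories:
        if just_watched in history:
            scores.update(history - {just_watched})
    return [title for title, _ in scores.most_common(top_n)]

print(recommend("Edge of Tomorrow", histories))  # ['Looper', 'Arrival']
```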

A downside to the machine-learned approach is that it can be extremely difficult to explain why something happened. Logic in source code written by humans can be inspected and changed based on intuition, but with machine learning there are only dizzying arrays of learned parameters with no human-discernible meaning.

Try explaining to your boss that this was why we recommended Rambo 4 to users who watched Finding Dory

This means the system is essentially a black box, so researchers have to come up with novel techniques to probe what it’s doing. Google’s trippy Deep Dream was the result of such an effort.

What is Deep Learning?

Deep Learning is a machine learning technique built on an old concept that first appeared in the 1950s. It even had a kitschy 1950s-style name — the Perceptron. The Perceptron is based on the idea of an artificial neuron.

A neuron has many inputs called dendrites that receive electrochemical signals from other cells. If the combined strength of its inputs exceeds some threshold, an output signal is generated, surging through the axon to other neurons via output terminals. This signal continues through a web of neurons in your brain until you do something interesting like recognize a friend’s face or eat a potato chip.

A cartoon neuron. The signal won’t get generated unless the strength of the inputs is enough to excite it

An artificial neuron has several inputs and, just like a real neuron, if the weighted sum of those inputs is greater than some threshold it is “activated” and an output signal is generated — otherwise the output is suppressed (the function that determines this is called the ‘activation function’).

A look inside an artificial neuron with three inputs. Kinda looks like a real neuron if you squint
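A minimal sketch of that artificial neuron in Python, with made-up inputs, weights, and threshold, and a hard step standing in for the activation function:

```python
import numpy as np

def artificial_neuron(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = np.dot(inputs, weights)
    return 1 if weighted_sum > threshold else 0

# Three made-up inputs and the weights the neuron is assumed to have learned
inputs = np.array([0.5, 0.9, 0.2])
weights = np.array([0.8, 0.1, 0.4])

print(artificial_neuron(inputs, weights, threshold=0.5))  # 1: the neuron activates
```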

One neuron is not that interesting, but when you arrange them into layers they become quite powerful. The original Perceptron consisted of one layer of these artificial neurons and it made simple “yes/no” predictions. If the sum of all the outputs is positive, it predicts a “yes”; otherwise, a “no.”

A layer of three artificial neurons with four inputs.

Scientists figured if one layer of neurons was good, then stacking many layers on top of each other should be even better. Remarkably, neurons in these networks learn to become specialized and get activated when they detect specific features. For example, in an image classifier (a network that tells you if an image contains a cat, a broom, a truck, etc.), neurons might activate when they detect things like zig-zag patterns, triangles, or even a face. Deep networks even have a fascinating property where neurons learn to recognize increasingly abstract concepts the deeper you go. Each layer can be represented by a simple matrix multiplication — the same one you learned in high school — so again, underneath it all is just more math.

Each layer identifies increasingly abstract concepts built on previous layers, in this case resulting in existential crisis
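Building on the single-neuron sketch above, here is a rough sketch of a small stack of layers, where each layer is just a matrix multiplication followed by an activation. The layer sizes and random weights are arbitrary; a real network would learn those weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary layer sizes: 4 inputs -> 3 neurons -> 2 neurons -> 1 output
layer_weights = [rng.normal(size=(4, 3)),
                 rng.normal(size=(3, 2)),
                 rng.normal(size=(2, 1))]

def forward(x, layer_weights):
    """Pass an input through each layer: a matrix multiply plus an activation."""
    for w in layer_weights:
        x = np.maximum(0, x @ w)  # ReLU activation: suppress negative outputs
    return x

print(forward(np.array([0.1, 0.7, 0.3, 0.9]), layer_weights))
```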

But Why Now?

If neural nets were invented in the ’50s, why is deep learning in vogue now, 60 years later? Since its inception, the field saw a series of decades-long booms and busts — a breakthrough would kick off a wave of excitement only to hit another roadblock. By 2010 deep learning was seen as a dead end at worst and an interesting curiosity at best, easily surpassed by other machine learning techniques.

This recent hype cycle was brought about by several factors:

  1. Researchers finally figured out how to train very deep networks. While it was assumed for a while that “many layers == better,” networks with more than a handful of layers stubbornly refused to be trained. Other machine learning techniques looked more promising at handling complex problems, so neural network research all but died off except at a handful of universities in Canada. Their work has since allowed very deep networks to finally realize their potential.
  2. Large labelled datasets were created. Large networks need lots of training data to become effective. In the last few years, more and more datasets became open to the public — ImageNet is one of the biggest, with over a million images and over 1,000 object categories (amusingly, about 120 of which are breeds of dogs). Datasets for speech, video, human poses, and many others have also been published by universities and companies alike, while the proliferation of smartphones (along with their cameras and sensors) provides incomprehensibly large datasets for the tech giants.
  3. Easy-to-use frameworks. Mobile frameworks like Android and iOS handle common actions like touch, scrolling, and animations so developers can focus on creating cool apps. Analogously, deep learning frameworks like Tensorflow, Torch, and Caffe take care of the mundane nuts and bolts, freeing up researchers to spend more time on interesting problems and less time reinventing the wheel.
  4. Cheaper, faster processing power. Large networks can take an intimidating amount of computing power to crunch their numbers. Just as GPUs accelerate video games and scientific work like protein folding, they give deep learning a similar boost. Most deep learning frameworks now utilize GPUs, and some companies are going even further, using programmable chips or creating special hardware whose sole purpose is to train neural nets. As computing power continues to get cheaper and smaller, networks that currently require supercomputers may soon fit in your HoloLens, smartwatch, AirPods, or any other computer we’ve been convinced to wear in the future.

This chart takes a somewhat liberal interpretation of Moore’s Law, but the point is still made

  5. High-profile successes. All of these bullet points paved the way for some very impressive achievements. In 2012 a team from the University of Toronto stunned an image recognition competition by using deep learning to cut the previous year’s error rate nearly in half (see next chart). In 2016 AlphaGo defeated a professional Go player in a high-profile live-streamed match. Deep learning can now spot skin cancer as well as trained dermatologists can. These are milestones that were thought to be a decade out, yet here we are.

The best image classifiers got 27% of their guesses wrong in 2011; deep learning cut that down to 3.5% by 2016

What Can Deep Learning Really Do?

Robots can already beat humans handily at arm wrestling

Since deep learning is most powerful with tons of training data, its major accomplishments have centered around domains where labelled data is abundant (e.g. images, sequences like voice or translations) or where there are clearly defined rewards (e.g. winning vs losing a video game). Unlike “general intelligences” like C-3PO or Iron Man’s Jarvis, the neural networks designed to power these systems are still fairly specialized and require mountains of human-labelled training data. You can’t use a network trained for detecting pictures of puppies for anything other than detecting puppies, and AlphaGo can’t take over our military just yet (I may be more worried once it learns how to play StarCraft).

What makes neural networks so exciting is that while you can’t reapply a network wholesale, you can still take big pieces, rearrange them, and train on new data to effectively tackle different tasks. This means it’s easier for anyone to build on past successes; if I took a network that was trained for puppies and retrained it on tens of thousands of cat photos, I might have a network that can detect cats. Rearranging these neural Lego blocks results in creative applications from generating music to writing Trump tweets. Current research is looking into ideas like how to train neural networks with less data, adding the concept of memory, and even using machine learning to learn how to learn!
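This reuse trick is usually called transfer learning. Below is a rough tf.keras sketch of the puppies-to-cats retraining idea; a model pretrained on ImageNet stands in for the “puppy network,” and cat_images / cat_labels are hypothetical placeholders for the new photos and labels.

```python
import tensorflow as tf

# Borrow a network pretrained on ImageNet and freeze its learned feature layers
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False

# Bolt a fresh final layer on top and train only that on the new task (cat vs. not cat)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# cat_images / cat_labels are hypothetical arrays of photos and 0/1 labels
# model.fit(cat_images, cat_labels, epochs=5)
```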

How Do I Learn More?

If you have some machine learning background, I highly recommend going through Stanford’s CS231N as it dives into the theoretical foundations of Deep Learning; the instructors are also fantastic at explaining difficult concepts. Udacity’s Deep Learning course is a great overview with no ML background required and focuses more on practical applications and using Tensorflow. If you don’t have the time for either, check out this TED talk by Dr. Fei Fei Li.

It is an exciting time for AI — while it might not live up to all the wild promises that have been made, it has already rendered this XKCD comic obsolete which I think bodes well for the future.

If you want to try to create your own neural networks, my previous post compared developing on a Macbook Pro vs popular cloud services.

