A Difference Between DL and Statistics

One thing I love about being in grad school is the unending innovation that reverberates through the corridors. Sure, I spend part of my time in a cubicle coding my life away, but there are moments when people step out of their holes to talk with those around them.

One structured outlet for this is our weekly update meeting, and one of the many conversations we have there inspired this post.

Let’s start off with statistics.

One of the main goals of statistics is to build a generalizable model, such as a linear or multivariate regression model, that best fits the data and captures the pattern you are investigating.

There are other key topics in statistics like statistical significance, correlation vs. causation, probability theory, and model evaluation that are shared with the machine learning community. However, we like abstraction, so let’s stick with regression for now.

Linear regression is powerful and easy to understand. However, the errors associated with individual data points are aggregated into a single global loss, and in minimizing that loss, the particular character of each point is lost. This is a clear trade-off.
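To make that trade-off concrete, here is a minimal NumPy sketch on made-up data (the data and the "unusual" point are purely illustrative). Ordinary least squares turns 20 per-point errors into one mean squared error and minimizes that single number; the point that breaks the pattern only becomes visible again if you go back and inspect the residuals one by one.

```python
import numpy as np

# Hypothetical toy data: a linear trend plus noise, with one point that breaks the pattern.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.shape)
y[15] += 8.0  # the "unusual" point

# Ordinary least squares: design matrix with an intercept column and a slope column.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Each point has its own residual, but the fit only "sees" their aggregate.
residuals = y - X @ coef
mse = np.mean(residuals ** 2)  # the single global quantity OLS minimizes

# Recovering the individual story requires going back to the residuals.
worst = int(np.argmax(np.abs(residuals)))
print("intercept, slope:", coef)
print("global MSE:", mse)
print("point with the largest residual:", worst)
```

The fitted slope is nudged slightly by the unusual point, but nothing in the fitted model itself tells you which point that was or why.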

Now, don’t get me wrong: deep learning systems still try to minimize a loss across the whole system, and there are many clever ways to carry out that optimization.

However, one thing (among many) that differentiates the deep learning approach from the approach above is that we can now perturb the model to understand what drove its decision.

We can examine the importance of features at the individual level through point-by-point investigation, rather than relying on statistics that generalize at the global level.
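One simple perturbation technique is feature ablation: replace one input feature at a time with a "baseline" value and measure how much the prediction for that single example changes. The sketch below is my own illustration, not a method from the meeting; `model` is a hypothetical stand-in for a trained network, and the zero baseline and the `patient_a`/`patient_b` inputs are assumptions chosen for clarity.

```python
import numpy as np

def model(X):
    """Hypothetical stand-in for a trained network: a fixed nonlinear scoring function."""
    X = np.atleast_2d(X)
    z = 1.5 * X[:, 0] - 2.0 * X[:, 1] * X[:, 2] + 0.1 * X[:, 3]
    return 1.0 / (1.0 + np.exp(-z))  # probability-like score in (0, 1)

def ablation_importance(predict, x, baseline):
    """Per-feature importance for ONE input: swap each feature for its baseline value
    and record the change in prediction (positive = the feature pushed the score up)."""
    x = np.asarray(x, dtype=float)
    base_pred = predict(x)[0]
    scores = np.empty(x.shape[0])
    for j in range(x.shape[0]):
        x_pert = x.copy()
        x_pert[j] = baseline[j]
        scores[j] = base_pred - predict(x_pert)[0]
    return scores

baseline = np.zeros(4)  # assumption: zero stands in for "feature absent"
patient_a = np.array([1.0, 0.0, 2.0, 0.5])  # hypothetical inputs
patient_b = np.array([0.0, 1.0, 2.0, 0.5])

imp_a = ablation_importance(model, patient_a, baseline)
imp_b = ablation_importance(model, patient_b, baseline)
print("patient A:", np.round(imp_a, 3))
print("patient B:", np.round(imp_b, 3))
```

The same model gives two different explanations: feature 0 drives the score for the first input, while the interaction of features 1 and 2 drives it for the second. The choice of baseline matters a great deal (zeros, dataset means, or blurred inputs can give different answers), so treat it as a modeling decision rather than a default.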

We can now peer into the complex black box, and people are definitely starting to.

Photo by Dhruv Deshmukh on Unsplash

More and more methods are emerging to dissect the activations, or neurons, that drove a model’s decision. They are especially useful in imaging applications, because we can literally see which parts of an image helped the model.
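A simple example of "seeing which parts helped" is occlusion sensitivity: slide a blank patch across the image and record how much the model's score drops each time a region is hidden. This is a minimal sketch under toy assumptions; `toy_image_model` is hypothetical and only looks at the top-left corner of an 8x8 image, so we know in advance which regions should light up. Gradient-based methods such as saliency maps or Grad-CAM work differently and are not shown here.

```python
import numpy as np

def toy_image_model(img):
    """Hypothetical stand-in for an image classifier: its score depends only on
    the mean brightness of the top-left 4x4 corner of an 8x8 image."""
    return img[:4, :4].mean()

def occlusion_map(predict, img, patch=2, fill=0.0):
    """Occlusion sensitivity: hide one patch at a time with `fill` values and
    record how much the score drops relative to the unoccluded image."""
    h, w = img.shape
    base = predict(img)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = img.copy()
            occluded[i:i + patch, j:j + patch] = fill
            heat[i // patch, j // patch] = base - predict(occluded)
    return heat

rng = np.random.default_rng(1)
image = rng.uniform(0.5, 1.0, size=(8, 8))  # made-up "image"
heat = occlusion_map(toy_image_model, image)
print(np.round(heat, 3))
```

Only the cells covering the corner the model actually uses get nonzero values. With a real network, the patch size trades spatial resolution against the number of forward passes needed.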

Further, these methods can help us better understand a model’s performance, e.g. where it predicted correctly, where it made mistakes, and where it was learning from noise. Perhaps most significantly, they can show where the model might drive new discoveries by uncovering features not previously thought of as important.

Maybe deep learning can serve not only as a predictive tool but also as a tool for better understanding problems like neurodegeneration or other time-sensitive areas of concern.

Exploring activations could lead to the next frontier of scientific discovery.

Thanks for reading.