The Programming Language For Machine Learning Projects

Written by Iustin_G | Published 2020/05/01
Tech Story Tags: machine-learning | artificial-intelligence | python | programming | coding | programming-languages | ml | machine-learning-benefits

TLDR Python is the de facto programming language used is machine learning. This is owed to it’s simplicity and readability. People use mostly the same flagship modules (keras, scikit-learn, numpy) unlike in other languages such as Javascript which have a multitude of libraries and patterns, or like Java which has many data structures. Python is modular and uses external libraries for diverse tasks and concerns. The language has been widely used in the software space throughout the years. Python 3.x is the latest version, with all its respective new features, but Python 2.x remained widely used even today.via the TL;DR App

…and why Python is the de facto in ML

Python is the de facto programming language used is machine learning. This is owed to it’s simplicity and readability, which allows users to focus on the algorithms and results, rather than wasting time on structuring code efficiently and keeping it manageable.

Simplicity

Python is also consistent across projects. People use mostly the same flagship modules (keras, scikit-learn, numpy) unlike in other languages such as Javascript which have a multitude of libraries and patterns, or like Java which has many data structures. These ample possibilities and oprions would require a lot of design decisions from the programmer, and would inevitably entail technical debt.
The simplicity of python, not only offers a standard way of doing things that make the work easier for the author of the application, but it makes it easier to scrutinise and improve for other people who would review the code.

Modules

Python is modular. This means that it uses external libraries for diverse tasks and concerns. Even though I mentioned earlier that python doesn’t have a plethora of modules that do the same thing, it does have a number of modules which cover pretty much any use case or need that the developer might encounter. 
From off-the-shelf algorithms to graphical visualisation tools. They are available and easy to import in any project.
A brief list of them would be:
  • KerasTensorFlow, and Scikit-learn for frequent machine learning tasks
  • NumPy for high-performance scientific computing and data analysis
  • SciPy for advanced computing
  • Pandas for general-purpose data analysis
  • Seaborn for data visualization

Docstrings

Modules (and functions for that matter), have a feature called Docstrings. They are basically a short description that the developer can attach to functions and modules that describes what it’s supposed to do and instructions on how to use it.
This quirk eliminates the need to shift to the documentation pages and browse to each function that you think you might need. I saves a lot of time and, I think, it makes developers keep their functions and modular small and focused on a single-purpose.
This makes the module/function more reusable and makes the work easier in the long-run.

Short history

Python was created by Guido van Rossum in 1991 and developed by Python Software Foundation. It was designed as a widely general-purpose, high-level programming language.
Development on the programming language started in 1980 and it was intended to be a successor to the ABC programming language, which interfaced with the Amoeba Operating System and had the feature of exception handling. Nowadays, exception handling is common to most (if not all) programming languages.
Through time, there have been 2 versions of Python which became popular and still compete to this day on number of people using them. Pyhton 2.x and Python 3.x.
It is quite unusual for a programming language to have 2 flagship versions. Most software is recommended to be used on its latest stable version. Python 3.x is the latest version, with all its respective new features, but because of Python having been widely used in the software space throughout the years, Python 2.x remained widely used even today because of on factor: compatibility.
I guess this is the drawback of having implemented a programming language that changed and evolved so much that, in a lot of use cases, is not backwards compatible. For how much contribution this programming language set forward, it is a worthy trade-off.
Below we have a graph which depicts Interest over Time for Python (the blue line), spanning from 2004 until today:
Interest over Time (Google Trends)
Python has come a long way to become the most popular coding language in the world. In today’s much more diverse field of programming (with more specialisations than before) Python has established a reputation for itself based on it usefulness.

Virtual Environments

Passing through the era of virtual machines, containers and continuous integration, Python has also made use of virtual environments. This tool makes programming easier by removing common bugs which aren’t even related to machine learning.
Tools such as Venv and Conda were put to use to make sure that any computer has a properly configure environment for developing, debugging and running python apps. Both of the ones mentioned above have been adopted by industry practice, but Venv is more Python-oriented, whereas Conda is meant to be used as a general virtual environment.
Both of them take the approach of containers, making the environment readily available and with minimal or no configuration required. There may be other alternatives available, but these 2 are the ones that I know are currently mostly used.
Photo by Fabian Grohs on Unsplash

Python Notebooks (game-changer)

The next step from using virtual environments to isolate python codebases is using them in a collaborative and share-able place in the cloud. Much like GitHub 😊
Python Notebooks ( Jupiter Notebooks, as they are actually called) are a tool that delivers a Python workspace environment, altogether with an IDE straight to your web browser. It can’t get easier than this!
The reason I state that they are game changers is that the issues that come with sharing code: versioning, environment incompatibilities, and time actually being spent on it, are gone!
Checking out the latest Machine Learning models and algorithms and tweaking and experimenting with them, if you are so inclined, has become as effortless as reading the documentation or checking the code out on Github. Modern machine learning tools such as Google Collaboratory or education sites like Kaggle rely heavily on them. Tackling Python and machine learning algorithms has become so flexible and readily-available that it is hard to imagine what next breakthrough could be…

Conclusion

I highly recommend Kaggle as a hub for Machine Learning enthusiasts, as they have a lot of people who share their algorithms and know-how. It’s a great resource for improving your Machine Learning practices and keeping up-to-date with recent advancements.
In a further post I will describe the most important python modules used within the Machine Learning space.
Also, be sure to grab our Python cheat sheet, which eloquently lists everything you need to start using python in a quite comprehensive manner and eloquently listed with examples.

Published by HackerNoon on 2020/05/01