How to combine some machine learning methods for traffic prediction?

“Four lane urban busy traffic congestion in Bangkok” by Connor Williams on Unsplash

As a data scientist who has worked on geospatial data for more than one year, traffic prediction has always been a great challenge for our team. We use a machine learning algorithm for traffic estimation and a navigation system based on our live traffic estimated data. We have built a simple traffic estimation prediction that is used to predict navigation travel time. The output of our services is surprisingly accurate. On top of that, our travel time prediction has shown brilliant results, except in anomalies like vacation days or unpredictable changes in traffic behaviors. These anomalies happen less than 20 percent of the time but our users need more accurate prediction especially in anomalies moments, so here I will talk about the approach we took to solve the problem.

There are many kinds of solutions for traffic prediction like ARIMA, SARIMA, Kalman filter, particle filter, and theoretical traffic propagation methods. But in the real world, when you must respond to your users in less than half a second and you have a huge amount of dirty/missed/imbalanced data, employing these methods does not seem to be working.

So, here is what we did to predict traffic:

As an initial idea, I proposed to use a simple linear algebra-based method to find the most similar traffic state to live traffic. Suppose we found the most similar state to the current state. We assume that 20 minutes later from now, the traffic should be similar to 20 minutes after the state we found as similar. This method had a bunch of errors. For example, when you use a dot product, you can not avoid the effect of an accident in the north-west of a city on the traffic state of the south-east in predicting similar states.

Then we used a method to do a local search in order to find the most similar traffic state in our traffic history. The whole idea is very simple. Cut the cities map into some pieces. For example, we used square cells with 200m * 200m dimensions. Then we defined a mesh which consists of 5*5 cells, so its area was 1Km². Each mesh had 9 collections of 3*3 cells as illustrated in the following figure.

The basic idea of building mesh is borrowed from convolution neural networks.

The traffic in this local area will be predicted by its own data. The whole amount of data is huge, on the other hand, we face totally missed and imbalanced data in smaller pieces. What we needed here was feature engineering on our data. I defined some features for all traffic data in each cell and did some matrix operations on each mesh and evaluated the live state of the mesh with its historical states. Finally, I found the best method to find the most similar state is to consider a 25-dimensional space so that each mesh is a 25-element vector. Now we can calculate the dot product between each historical states with live traffic state and find the most similar state very fast by the use of cosine similarity. After finding the similar state, in the next step, we tested our initial idea. We examined if we can consider 20 minutes after that state as a prediction of 20 minutes later traffic from now on. The answer is yes. The predictions are amazingly fast and accurate. Another advantage of this method is the fact that it can predict traffic in a database layer via SQL queries.

How to get rid of the network`s issues that cause an error in traffic predictions? Here we go to the last part of our traffic prediction journey. When we are working on an open street map network graph, it always has issues like mistaken dead ends and some reverse directions. So when you are predicting traffic on this network the results may be incorrect. Assume each state of traffic as an image. We can take an image from each mesh by opening our lens for 10 minutes and then take a shot. Next idea is to create a HOG from this images. HOG is a very simple representation that captures the basic structure of a face in a simple way. For more information on HOG please read this post.

The original image is turned into a HOG representation that captures the major features of the image regardless of image brightness.

We have some traffic movements in our mesh. Each of them has a location and a velocity. This velocity has a direction and a magnitude. So we can define our HOG by using their location and normalizing the magnitude of the velocity for the length of that line on the image. Then we divide the length of each line into 2 equal parts and will use intense of white color to represent the vector direction. For instance, if we have a movement from north to south, the top north part of the line in the image will have 2 times more intense than the bottom south part.

Now we converted traffic prediction into a near duplicate image search. If we want to find the most similar image very fast we can create a hash from the image of each state. As I have tested some methods, it seems Locality Sensitive Hashing has very good results in this case. For more information, I recommend reading this post about finding near-duplicate images very fast.

This method can be applied in a variety of situations and can easily predict traffic on any graph network. Even if our information about the network has some errors or enough knowledge does not exist, we can use the last part to predict traffic. Applications of this method can help us to predict a website traffic in a black Friday by collecting social media data about the site. Furthermore, we can use this approach in case of fraud detection, finding anomaly behavior in network traffic, aerial traffic prediction, and maybe in many other situations which you may know.

I would like to thank my teammates Mostafa Jalambadani, Davood Amini and Mahdi Yeilaghi who helped me to solve this problem.

Mahdi R. - Data Scientist & Product Owner - Rajman Information Structures | LinkedIn_View Mahdi R.'s profile on LinkedIn, the world's largest professional community. Mahdi has 3 jobs listed on their…_www.linkedin.com