Traditional vs. Deep Learning in the Telecom Industry: Architecture and Algorithm Categorization

Google Cloud Architecture for Machine Learning Algorithms in the Telecom Industry

Introduction

The unprecedented growth of mobile devices, applications, and services has placed enormous demand on mobile and wireless networking infrastructure. Rapid research and development of 5G systems has found ways to support growing mobile traffic volumes, real-time extraction of fine-grained analytics, and agile management of network resources, so as to maximize user experience.

Moreover, inference on heterogeneous mobile data from distributed devices is challenging because of limited computational and battery power. ML models deployed on edge servers are therefore constrained to be lightweight, trading model complexity against accuracy. Model compression, pruning, and quantization are widely used to achieve this.
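
To make pruning and quantization concrete, here is a minimal sketch in plain Python. It assumes unstructured magnitude pruning and symmetric 8-bit linear quantization, which are only two of many possible schemes; the function names and the sample weights are hypothetical and do not come from any particular framework.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [int(round(w / scale)) for w in weights], scale


def dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical weights of one small layer.
weights = [0.5, -0.05, 0.3, 0.01, -1.27, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Half of the weights become zero and the rest are stored as small integers plus one scale factor, which is where the memory saving comes from. The price is a rounding error of at most half a quantization step per weight.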

In this blog, we look at the use cases, problems, and solutions that ML can address, as follows:

Different telecom use cases solved by traditional ML models to improve customer satisfaction and end-user experience, and in turn deliver higher ROI.
Limitations of traditional models, the evolution of deep learning models, and their usage in the telecom industry.
Categorization of different ML models and how they fit into an end-to-end cloud architecture, from app-level data ingestion to running predictive models in the pipeline.

Use cases of traditional Machine Learning algorithms

In this section, let’s look at the different use cases in the telecom industry where different ML and AI algorithms have played a significant role in network traffic prediction, customer retention, and fraud analysis.

Smart traffic prediction and path optimization

The network and service control layer contains multi-dimension convergent management and control functions to manage and control traditional and SDN/NFV cloud networks.

AI-driven reasoning has accelerated intelligent network operations and management, from anomaly reporting to day-to-day activities. Network performance data can help identify sleeping cells and trigger an automatic restart, and it supports network optimization (coverage optimization, capacity optimization, Massive MIMO optimization), Root Cause Analysis (RCA), intelligent transmission route optimization, and network strategy optimization.

Security

Features governing Network security include:

Fast tracing and filtering of records with Naive Bayes classification, Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), and neural networks.
Deriving rule-based security governance techniques through data extraction and rule formulation with ensemble methods such as aggregated decision trees.
Identification and interception of malicious behavior and prevention of attacks with Naive Bayes, Multilayer Perceptron Neural Networks (MLPNNs), Radial Basis Function Neural Networks (RBFNNs), and SVM algorithms.

Sentiment analysis with social media

Network operators use Machine Learning to infer brand impact and customer sentiment, and end users' social networking posts let them monitor language patterns. They also infer different kinds of sentiment to identify trends, such as how to capture a new market by analyzing the factors that drive new customers to subscribe, or when subscribers seek out a competitor.

Customer Service Recommendation and Business Personalization

Service recommenders may also be used to boost existing services or to identify why users do not adopt some services and, in turn, to suggest value-added services based on their profile and choices. In addition, they can predict churn based on the usage patterns of past churners and changes in other usage profiles.

SVM (Support Vector Machine)-based music recommendation system is often used to extract personal user-level information, timing, location, activity records, along with musical context to suggest suitable music services.

With customer-generated network data, it is easier to automate the process of grouping customers into segments, like profiling customers based on their calling and messaging behavior.

Personalized ads

Operators try to present product/service advertisements that are tailored to an individual, situation, and device. This type of targeted advertising, when directed at the right customer base, helps operators and advertisers to zero in on customers with ads that fit their needs and interests.

Customer segmentation on call-records

Clustering techniques such as K-means, along with classification techniques, group mobile customers based on their call detail records (CDRs) so that their consumer behavior can be analyzed. PCA-based dimensionality reduction techniques can be used to identify relevant and recurrent patterns (e.g., location, to identify common presence patterns) among the CDRs of a given user. Further, matrix factorization is employed to infer location preferences from sparse CDR data and generate location-based recommendations.
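
As a rough illustration of K-means on call records, the sketch below clusters customers on two hypothetical per-customer features derived from CDRs (average calls per day and average SMS per day). The feature choice, the sample values, and the function names are assumptions made for this example; a real pipeline would also scale the features first.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Plain K-means: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable, so we are done
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (calls per day, SMS per day) for six customers.
customers = [(30, 2), (28, 3), (32, 1), (3, 40), (2, 38), (4, 42)]
labels, centroids = kmeans(customers, k=2)
```

On this toy data the first three customers (heavy callers) and the last three (heavy texters) end up in separate segments. Which segment gets label 0 depends on the random initialization.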

Customer Churn Prediction

Customer Churn Prediction applies SVM, Naive Bayes, Decision Tree, Boosting, Bagging, and Random Forest models, using both supervised and unsupervised (clustering) techniques.
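
To show what one of these supervised models looks like in practice, here is a minimal Gaussian Naive Bayes sketch for churn prediction. The two features (complaints in the last quarter and monthly data usage in GB), the labels, and the sample values are hypothetical; they only illustrate the mechanics and are not drawn from any real dataset.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_nb(model, row):
    """Return the class with the highest log posterior for one customer."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Hypothetical training data: (complaints last quarter, monthly usage in GB).
X = [[5, 1.0], [4, 1.5], [6, 0.5], [0, 10.0], [1, 12.0], [0, 9.0]]
y = ["churn", "churn", "churn", "stay", "stay", "stay"]
model = fit_gaussian_nb(X, y)
```

A customer with many complaints and low usage, such as `predict_nb(model, [5, 1.2])`, is classified as a likely churner, while `predict_nb(model, [0, 11.0])` is classified as staying.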

Traffic Flow Prediction

k-NN, Linear Discriminant Analysis (LDA), SVM, and Decision Trees are used to map network traffic into different classes of interest based on QoS requirements. The traffic classification framework uses statistics that leverage both packet-level and flow-level features.
Flow clustering using Expectation Maximization (EM): based on flow features (packet length, inter-arrival time, byte count, etc.), the EM algorithm groups the traffic into a small number of clusters.
AutoClass: an unsupervised Bayesian classifier that uses the EM algorithm to select the best clusters from a set of training data. Because a single EM run converges only to a local maximum, it repeats the EM search multiple times to approach the global maximum.
K-means: unsupervised clustering that uses the first few packets of a traffic flow during the application negotiation phase.
Density-based spatial clustering (DBSCAN) can handle noisy data, in contrast to K-means and AutoClass.
Profiling by association: PBA takes as input an IP-to-IP connectivity graph and information about a small subset of IP hosts and produces a prediction about the class of all the flows (edges) in the graph.
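
As a small illustration of the supervised side of the list above, the following sketch classifies a flow with k-NN using two flow-level features. The features (mean packet length and mean inter-arrival time, both assumed to be already scaled to [0, 1]), the class names, and the sample flows are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label a flow by majority vote among its k nearest training flows.

    train is a list of (features, label) pairs; query is a feature tuple.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labeled flows: (mean packet length, mean inter-arrival time).
train = [
    ((0.90, 0.10), "video"),
    ((0.85, 0.15), "video"),
    ((0.80, 0.20), "video"),
    ((0.10, 0.80), "voip"),
    ((0.15, 0.90), "voip"),
    ((0.20, 0.85), "voip"),
]

print(knn_classify(train, (0.88, 0.12)))  # a flow with large, frequent packets
print(knn_classify(train, (0.12, 0.88)))  # a flow with small, sparse packets
```

Because k-NN relies on distances, features on very different scales (bytes versus milliseconds, for example) should be normalized before use, which is why the toy values are pre-scaled here.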

Topic Models for Mobile Short Messaging Service Communication

Latent Dirichlet Allocation (LDA), a generative topic modeling technique, is used to extract latent features from mobile Short Messaging Service (SMS) communication for automatic discovery of user interests. LDA splits the SMS documents into segments and discovers the topics in each segment through its latent features. This technique can be used to filter malicious SMS communication: topic models can effectively detect distinctive latent features to support automatic content filtering and remove security threats to mobile subscribers and operators.

Customer Segmentation

Segmenting customer profiles by clustering requires complex models based on multivariate time series analysis, which have limitations around scalability and the ability to accurately represent temporal behavior sequences (TBS) of users. The LDA model is well suited to representing the noisy temporal behavior of mobile subscribers. Designing compact and interpretable profiles helps to relax the strict temporal ordering of user preferences.

Categorization of Deep Learning algorithms and their use cases in the Telecom Industry, Source – https://pdfs.semanticscholar.org/55c1/9610017a65319b130911651fbb2e3b552e51.pdf

Advantages of Deep Learning in Mobile and Wireless Networking

The Telecom industry acknowledges several benefits of employing smart efficient Deep Learning to address automated network maintenance tasks:

Traditional ML algorithms require feature engineering. Deep learning can automatically extract high-level features from data that has a complex structure and inner correlations. Feature Engineering needs to be automated, particularly in the context of mobile networks, as mobile data is generated by heterogeneous sources, is often noisy, and exhibits non-trivial spatial/temporal patterns, whose labeling requires an outstanding human effort.

Deep Learning is capable of handling large amounts of data while controlling model over-fitting. Deep Learning models are suited to the high volumes of different types of data that mobile networks generate at a fast pace. Training traditional ML algorithms, e.g., Support Vector Machine (SVM) and Gaussian Process (GP), sometimes requires storing all the data in memory, which is computationally infeasible under big data scenarios. In contrast, Stochastic Gradient Descent (SGD), which is used to train neural networks (NNs), requires only a subset of the data at each training step.
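
The point about SGD needing only a subset of the data per step can be sketched with a tiny example. The code below fits a one-variable linear model with mini-batch SGD; the model, learning rate, batch size, and synthetic data are assumptions chosen for brevity, not a recipe for training a real network.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=300, batch_size=4, seed=0):
    """Fit y = w * x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # The gradient is computed on this mini-batch only, so the
            # full dataset never has to be processed in a single step.
            errors = [(w * xs[i] + b - ys[i], xs[i]) for i in batch]
            grad_w = sum(2 * e * x for e, x in errors) / len(batch)
            grad_b = sum(2 * e for e, _ in errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Synthetic, noise-free data generated from y = 3x + 1.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
```

Each update touches only four samples, yet `w` and `b` converge close to 3 and 1. In a streaming setting the batches could be read from disk or a message queue instead of an in-memory list.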

Traditional supervised learning is only effective when sufficient labeled data is available. However, most current mobile systems generate unlabeled or semi-labeled data, for which Deep Learning approaches such as Restricted Boltzmann Machine (RBM), Generative Adversarial Network (GAN), and one/zero-shot learning offer wider applicability to telecom domain problems.

Compressed representations learned by deep neural networks can be shared across different networks/telecom providers, which is limited or difficult to achieve in other ML paradigms (e.g., linear regression, random forest, etc.). Therefore, a single model can be trained to fulfill multiple objectives, without requiring complete model retraining for different tasks, thereby saving CPU and memory in mobile networks.

Deep Learning is effective in handling multivariate geometric mobile data, such as user location represented by coordinates, surroundings, environment, altitude, topology, metrics, and order, through dedicated Deep Learning architectures such as PointNet++ and Graph CNN.

PointNet++ is a hierarchical neural network, similar in spirit to conventional CNNs. For a concentrated geographical region, it applies PointNet recursively on a nested partitioning of the input point set, which makes it better able to capture local structures and finer details.

Despite the challenges posed by Deep Learning models, emerging tools and technologies make them practical in mobile networks:

(i) Advanced On-Demand Parallel Computing Infrastructure, (ii) Distributed Scalable Machine Learning Systems, (iii) Dedicated Deep Learning libraries like TensorFlow and PyTorch, (iv) Fast Online Optimization algorithms, and (v) Fog Computing.

Deep Learning has a wide range of applications in mobile and wireless networks.

Streamed or real-time mobile big data aggregation/segregation mechanisms help classification algorithms identify different types of traffic, and also enhance Call Detail Record (CDR) mining.
Mobile data analytics on edge devices leverages smart Deep Learning-Driven App-Level Mobile Data Analysis to save power.
Deep Learning-Driven User Mobility Analysis identifies movement patterns of mobile users, either at the group or individual levels.
Deep Learning-Driven User Localization applies different signals received from mobile devices or wireless channels to identify users in indoor or outdoor environments.
Deep Learning-Driven Wireless Sensor Networks find application in centralized vs. de-centralized sensing, WSN data analysis, WSN localization, and other applications.
Deep Learning-Driven Network Control finds the usage of deep reinforcement learning and deep imitation learning on network optimization, routing, scheduling, resource allocation, and radio control.
Deep Learning-Driven Network Security extensively applies Deep Learning to improve network security at an overall infrastructure and software level.
Signal Processing relies on deep learning techniques to monitor the physical layer.
Deep Learning-based R-CNN and Fast R-CNN algorithms are efficient for object recognition problems in telecom inventory management.
Media recognition (applied to pictures, sound, video, and traffic bursts) and photo-tagging help subscribers learn and classify known patterns in a collaborative image-classification system, and then use this to identify the category to which previously unseen patterns belong. A transfer Deep Learning approach with ontology priors provides an effective means of discovering intermediate image representations from deep networks and ensures good generalization across two different domains.

Now, let’s take a quick look at the different Deep Learning platforms available, along with the mobile hardware they support, their speed, and their mobile compatibility.

Comparison of Mobile Deep Learning Models

Cloud Architecture with mobile data ingestion and Model Training, Prediction

The figure below depicts the different components involved in building the ML platform: Network Monitoring/Optimization, Media Settlement, Advertising, Audience Orientation, Pattern Recognition, Sensor Data Mining, and Mobility Analytics.

Incoming real-time data from mobile SDK
Real-time data collection and computing engine receiving data from the SDK, with a messaging pipeline to cache frequently received records
Offline Computing and Analysis Engine
BI and Data Warehousing Engine

Cloud Architecture with GCP for telecom Machine Learning and AI algorithms

Network Monitoring and Optimization

Network State Prediction refers to inferring mobile network traffic or performance indicators from historical cellular measurements of eNodeB, sector, and carrier data. MLPs and LSTM-based Deep Learning techniques are used to predict users’ QoE and to evaluate the best beam for transmission based on:

Average user throughput
Number of active users in a cell
Average data volume per user
Channel quality indicators (uplink and downlink)
Beam Index (BI)
Beam Reference Signal Received Power (BRSRP)
Distance (of UE to serving cell site)
Position (GPS location of UE)
Speed (UE mobility)
Channel Quality Indicator (CQI)
Historic values based on past events and measurements related to above metrics and beam orientation.
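
Before an MLP or LSTM can learn from the historic values listed above, the measurements are typically reshaped into fixed-length input windows with a target value to predict. Here is a minimal sketch of that step; the function name, window length, and the sample throughput series are hypothetical.

```python
def make_windows(series, window, horizon=1):
    """Turn a KPI time series into (input window, target) training pairs.

    Each input holds `window` consecutive measurements, and the target is
    the value `horizon` steps after the end of that window.
    """
    samples = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical hourly average user throughput (Mbps) for one cell.
throughput = [10, 12, 15, 11, 9, 14]
samples = make_windows(throughput, window=3)
for inputs, target in samples:
    print(inputs, "->", target)
```

With several KPIs per time step (throughput, active users, CQI, and so on), each element of the series would be a feature vector rather than a single number, and the same windowing applies.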

By leveraging sparse coding and max-pooling, semi-supervised Deep Learning models have been developed to classify received frame/packet patterns and infer the original properties of flows in a WiFi network.

Further, AI-capable 5G networks aid in:

Building a data map of eNodeB-based network slices using user subscription, network performance, QoS, and event logs.
Forecasting network resources
Anticipating network outages, equipment failures, and performance degradation
Predicting UE mobility in 5G networks, allowing Access and Mobility Management Function (AMF) to update mobility patterns based on user subscription, historical statistics, and instantaneous radio conditions.
Enhancing security in 5G networks, preventing attacks and frauds by recognizing user patterns, and tagging certain events to prevent similar attacks in the future.

Predicting Mobile traffic at city scale

Geographic mobile traffic can be predicted from its spatio-temporal correlations with an AE-based architecture and LSTMs. Global and multiple local stacked AEs are used for spatial feature extraction, dimension reduction, and training parallelism, while the compressed representations they extract are subsequently processed by LSTMs to perform the final forecasting. In a typical AE-LSTM architecture, the AutoEncoder (AE) model extracts features and the LSTM model predicts the traffic flow.
A hybrid multimodal Deep Learning method can be used for short-term traffic flow forecasting. The model is composed of one-dimensional Convolutional Neural Networks (1D CNNs) and Gated Recurrent Units (GRUs) with an attention mechanism, and can jointly and adaptively learn the spatial-temporal correlation features and long temporal interdependence of multi-modality traffic data.
Multiple 3D Convolutional Neural Networks (3D-CNNs) learn the spatio-temporal correlation features of traffic data jointly, from low-level to high-level layers.
Other commonly used traditional ML models for modeling spatio-temporal characteristics include SVM and the Autoregressive Integrated Moving Average (ARIMA).

The ST-DenNetFus Deep Learning framework uses location-based ECI metrics to dynamically predict network demand (i.e., uplink and downlink throughput) in every region of a city. The ST-DenNetFus architecture captures unique properties (e.g., temporal closeness, period, and trend) of spatio-temporal data through various branches of dense neural networks (CNNs). ST-DenNetFus improves network capacity estimation by fusing in external data sources (e.g., crowd mobility patterns, temporal functional regions, and the day of the week) that had not been considered before.

The Mobile Traffic Super-Resolution (MTSR) technique uses probing to infer network-wide fine-grained mobile traffic consumption. MTSR works on the principle of image super-resolution and is designed with a dedicated CNN with multiple skip connections between layers, named the deep zipper network, along with a Generative Adversarial Network (GAN).
MLPs, CNNs, and LSTMs perform encrypted mobile traffic classification, as deep NNs can automatically extract complex features (e.g., to identify protocols in a TCP flow dataset). CNNs have also been used to identify malware traffic: the traffic is represented as images, and the unusual patterns that malware traffic exhibits are classified by representation learning.
CDR Mining filters information related to a phone number, cell ID, session start/end time, traffic consumption, etc.

a. Estimating metro density from streaming CDR data, by using RNNs. The goal is to take the trajectory of a mobile phone user as a sequence of locations, which can then be fed to RNN-based models to handle the sequential data.

b. CDR data can also be used to study demographics, where CNN is used to predict the age and gender of mobile users.

c. CDR data is also used to predict tourists’ next locations.

d. Generating human activity chains with an input-output HMM-LSTM generative model.

RNN-based predictors significantly outperform traditional ML methods, including Naive Bayes, SVM, RF, and MLP.
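
The idea of treating a user's trajectory as a sequence of locations can be sketched as a small pre-processing step. The code below groups CDRs by user, orders them by session start time, and encodes cell IDs as integer tokens that an RNN embedding layer could consume. The record field names (`user`, `start`, `cell_id`) and the sample records are hypothetical, and the RNN itself is left out.

```python
from collections import defaultdict


def build_trajectories(cdrs):
    """Group CDRs by user and return each user's ordered cell sequence."""
    by_user = defaultdict(list)
    for rec in cdrs:
        by_user[rec["user"]].append((rec["start"], rec["cell_id"]))
    trajectories = {}
    for user, visits in by_user.items():
        visits.sort()  # order by session start time
        cells = [cell for _, cell in visits]
        # Collapse consecutive repeats so the sequence records movement
        # between cells rather than how long the user stayed in one.
        trajectories[user] = [
            c for i, c in enumerate(cells) if i == 0 or c != cells[i - 1]
        ]
    return trajectories


def encode(trajectories):
    """Map cell IDs to integer tokens shared across all users."""
    vocab = {}
    encoded = {}
    for user, cells in trajectories.items():
        encoded[user] = [vocab.setdefault(c, len(vocab)) for c in cells]
    return encoded, vocab


# Hypothetical CDRs, deliberately out of time order.
cdrs = [
    {"user": "u1", "start": 3, "cell_id": "B"},
    {"user": "u1", "start": 1, "cell_id": "A"},
    {"user": "u1", "start": 2, "cell_id": "A"},
    {"user": "u2", "start": 5, "cell_id": "C"},
    {"user": "u1", "start": 4, "cell_id": "A"},
]
trajectories = build_trajectories(cdrs)
encoded, vocab = encode(trajectories)
```

For user `u1` this yields the cell sequence A, B, A. Next-location prediction then becomes a sequence modeling task: given the tokens seen so far, predict the next one.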

Deep Learning-Driven App-level Mobile Data Analysis

Analysis of mobile data has therefore become an important and popular research direction in the mobile networking domain, as the rapid emergence of IoT sensors and their data collection strategies provides a powerful basis for app-level data mining.

App-level mobile data analysis includes (i) cloud-based computing and (ii) edge-based computing.

In the former, mobile devices act as data collectors and messengers that constantly send data to cloud servers via local points of access, with limited data pre-processing capabilities. In edge-based computing, pre-trained models are offloaded from the cloud to individual mobile devices. The primary applications include mobile healthcare, mobile pattern recognition, mobile Natural Language Processing (NLP), and Automatic Speech Recognition (ASR).

Mobile Health: Wearable health monitoring devices being introduced in the market incorporate medical sensors that capture the physical condition of their carriers and provide real-time feedback (e.g., heart rate, blood pressure, breath status), or trigger alarms to remind users to take medical action.

The Deep Learning-driven MobiEar system runs efficiently on smartphones and helps deaf people stay aware of emergencies.

Deep Learning (DL) models such as CNNs and RNNs can classify the lifestyle and environmental traits of volunteers and perform different types of Human Activity Recognition on heterogeneous, high-dimensional mobile sensor data, including accelerometer, magnetometer, and gyroscope measurements. ConvLSTMs are known for fusing data gathered from multiple sensors to perform activity recognition.

Mobile motion sensors collect data via video capture, accelerometer readings, and Passive Infra-Red (PIR) motion sensing, covering the specific actions and activities that a human subject performs. Models trained on the server for domain-specific tasks through federated learning can then serve a broad range of devices.

Mobile Pattern Recognition relies on a mobile camera or other sensors to identify useful patterns.

Object Classification finds huge applications in mobile devices as devices take photos and rely on the photo-tagging process.

One such DL-based framework is DeepCham, which generates high-quality domain-aware training instances for adaptation from in-situ mobile photos. It has a distributed algorithm that identifies qualifying images stored on each mobile device for training, and a user labeling process for recognizable objects identified in qualifying images, using suggestions automatically generated by a generic deep model.

Mobile classifiers can also assist Virtual Reality (VR) applications, where Deep Learning object detectors are incorporated into a mobile Augmented Reality (AR) system. CNN-based frameworks perform object detection for facial expression recognition when users wear head-mounted displays in the VR environment.

A lightweight Deep Learning-based object detection framework can be provided that combines spatial relations for:

Training and detection with the lightweight Single Shot Detector (SSD)
Combination of vision-based detection results and spatial relationships
Registration, geo-visualization and interaction.

Deep Learning-Driven Mobile Analytics

CNNs and RNNs are the most successful architectures in such applications as they can effectively exploit spatial and temporal correlations.

The “DeepSpace” model, built with a hierarchical CNN structure, predicts individuals’ trajectories/moving paths with much higher accuracy than naive CNNs, stacked RNNs and LSTMs, n-grams, and the k-nearest-neighbor method. The framework supports two parallel prediction models, a coarse prediction model and fine prediction models, to deal with the continuous mobile data stream, and it also supports online training and learning to extract the optimal feature set size for the online data.
The “DeepMove” model predicts human mobility from lengthy and sparse trajectories using an attentional recurrent network. It uses a multi-modal embedding recurrent neural network to capture sequential human mobility transitions. It is further extended with a historical attention model to capture multi-level periodicity. The historical attention module is equipped with an auto-selector composed of two components:

An attention candidate generator to generate the candidates, which are exactly the regularities of the mobility, and an attention selector to match the candidate vectors with the query vector, i.e., the current mobility status.
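
To illustrate how an attention selector can match candidate vectors against a query vector, here is a minimal sketch using generic dot-product attention with a softmax. This is an assumption made for illustration: DeepMove's actual scoring function and vector dimensions may differ, and the function names and sample vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, candidates):
    """Weight each candidate by its similarity to the query.

    Returns the attention weights and the weighted sum of the candidates
    (the context vector).
    """
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    weights = softmax(scores)
    context = [
        sum(w * cand[d] for w, cand in zip(weights, candidates))
        for d in range(len(query))
    ]
    return weights, context


# Hypothetical 2-d vectors: the query is the current mobility status and
# the candidates summarize regularities found in the user's history.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(query, candidates)
```

The candidate most similar to the query receives the largest weight, so the context vector leans toward the historical pattern that best matches the user's current status.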

GPS records and traffic accident data are combined to understand the correlation between human mobility and traffic accidents. The design includes a stacked denoising autoencoder that learns a compact representation of human mobility, which is subsequently used to predict traffic accident risk.

DBNs (Deep Belief Networks) are used extensively to sense and predict human emergency behavior, mostly during natural disasters, through the use of GPS records.

A Deep Learning-based approach called ST-ResNet is used to collectively forecast the inflow and outflow of crowds in every region of a city. The architecture of ST-ResNet (a residual neural network framework) is based on unique properties of spatio-temporal data and models the temporal closeness, period, and trend properties of crowd traffic. While training on spatio-temporal factors, ST-ResNet assigns different weights to different branches and regions and combines them with external factors, such as weather and day of the week.
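
The weighting of branches and regions can be sketched as an element-wise fusion. The code below assumes that each branch (closeness, period, trend) produces a grid of the same shape, that each branch has its own per-region weight grid, and that the external factors are added before a tanh output. These details, the function name, and all values are assumptions for illustration; in the real model the weights are learned during training.

```python
import math


def fuse(branches, weights, external):
    """Fuse branch outputs region by region.

    branches and weights are lists of equally shaped 2-d grids (one pair
    per branch); external is a 2-d grid of external-factor contributions.
    """
    rows, cols = len(external), len(external[0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Each region gets its own mix of the three branches.
            residual = sum(w[r][c] * x[r][c] for w, x in zip(weights, branches))
            fused[r][c] = math.tanh(residual + external[r][c])
    return fused


# Hypothetical 1 x 2 city grid with three branches.
closeness = [[1.0, 0.0]]
period = [[0.0, 1.0]]
trend = [[0.0, 0.0]]
branch_weights = [[[0.5, 0.5]], [[0.2, 0.8]], [[0.3, 0.3]]]
external = [[0.0, 0.1]]
fused = fuse([closeness, period, trend], branch_weights, external)
```

In this toy grid the first region is driven by the closeness branch and the second by the period branch, which mirrors the idea that recent traffic matters more in some regions and daily patterns more in others.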

Deep Learning-Driven User Localization

Location-based services and applications (e.g., mobile AR, GPS) demand precise individual positioning technology, a need that Deep Learning techniques address for both device-free and device-based localization services.

Drawbacks of Deep Learning in Mobile and Wireless Networking

Although Deep Learning has unique advantages when addressing mobile network problems, it also has several shortcomings, which partially restrict its applicability in this domain. Specifically:

Deep Learning (including deep reinforcement learning) is vulnerable to adversarial/cyber attacks (especially CNNs), in which an attacker intentionally designs artifact inputs to fool Machine Learning models into making mistakes, triggering mis-adjustments.
Deep Learning algorithms are largely black boxes and have low interpretability. This limits the applicability of Deep Learning, e.g. in network economics. Still, businesses continue to employ statistical methods that have high interpretability, whilst sacrificing on accuracy that could be attainable from Deep Learning models.
Deep Learning is heavily reliant on data, and models further benefit from training data augmentation. This creates an opportunity for mobile networking, as networks generate tremendous amounts of data. However, data collection may be costly and may face privacy concerns, so it may be difficult to obtain sufficient information for model training.
Deep Learning can be computationally demanding and heavily relies on advanced parallel computing (e.g., GPUs, high-performance chips).
Deep neural networks usually have many hyperparameters (e.g., for a CNN, the number, shape, stride, and dilation of filters, as well as the residual connections), and finding their optimal configuration can be difficult. The AutoML platform provides a first solution to this problem by employing progressive neural architecture search.
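
To make the first drawback in the list above more tangible, here is a minimal sketch of an adversarial input. It uses the well-known Fast Gradient Sign Method (FGSM) against a tiny logistic-regression classifier rather than a deep network, purely to keep the example short. The weights, the input, and the perturbation size are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of the positive class under logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    """Perturb x by eps in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


# Hypothetical trained classifier and a correctly classified input.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgsm(w, b, x, y, eps=1.0)

print(predict(w, b, x))      # above 0.5: classified as positive
print(predict(w, b, x_adv))  # below 0.5: the perturbed input is misclassified
```

The same principle applies to deep networks, where the input gradient comes from backpropagation and the perturbation needed to flip a prediction is often much smaller relative to the input.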

Conclusion

In this blog, we discussed traditional vs. Deep Learning algorithms, DL-based architectures, their pros and cons, and their applications in the telecom industry. We also explored the data ingestion, categorization, and model deployment architecture in production. We looked at the recent advances in ML-driven mobile-app development (in object detection, speaker identification, emotion recognition, stress detection, and ambient scene analysis), built-in technologies to sustain limited mobile batteries by building memory- and energy-efficient apps, and model compression techniques.

References

Machine-learning technologies in telecommunications: https://pdfs.semanticscholar.org/a367/f8cad03c1353e9fc36970e4cb4b8edc21fc0.pdf
Deep Learning in Mobile and Wireless Networking: A Survey: https://pdfs.semanticscholar.org/55c1/9610017a65319b130911651fbb2e3b552e51.pdf