What Is Edge AI?


TLDR: This series of articles sheds light on the components and challenges of Edge AI. To answer why we should run AI algorithms on the edge rather than in the cloud, two setups are compared: a cloud-based architecture, where inference happens in the cloud, and an edge-based architecture, where inference runs locally on the device. The MobileNet SSD model is used for object detection throughout the series.

Edge AI—also referred to as on-device AI—commonly refers to the components required to run an AI algorithm locally on a hardware device.
Of late, it has come to mean running deep learning algorithms on-device, and most articles tend to focus on only one component: inference.
This series of articles will shed some light on the other components and challenges of Edge AI.
This series is divided as follows:
  • Part 1: Why Run AI Algorithms on Edge
  • Part 2: Edge AI Components - Sensor Data Capture
  • Part 3: Edge AI Components - Pre-processing
  • Part 4: Edge AI Components - Inference
  • Part 5: Edge AI Components - Performance Evaluation
  • Part 6: Edge AI Components - Real Time Metrics
  • Part 7: Edge AI Components - Scheduling & System Architecture
  • Part 8: Edge AI Components - Bridging the gap between Edge AI & Cloud AI

Experimental Setup

Edge devices are very diverse in their cost/capabilities. To make the discussion more concrete, here’s the experimental setup used in this series:
* Qualcomm Snapdragon 855 Development Kit [4].
* Object detection as the deep learning task to be run on an edge device. There are many good articles describing the state of the art in object detection [survey paper]. We will use the MobileNet SSD model for object detection in this series.
* tensorflowjs to quickly run the object detection model in a nodejs environment (see the sketch below).
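To make the setup concrete, here is a minimal sketch of loading a MobileNet SSD model with tensorflowjs in nodejs and detecting objects in a single test image. It assumes the @tensorflow-models/coco-ssd package as a stand-in for the hosted model and @tensorflow/tfjs-node for the CPU backend; the image path is a placeholder.

// Minimal sketch: load a MobileNet SSD model and detect objects in one image.
// Assumptions (not from this article): @tensorflow-models/coco-ssd stands in for
// the hosted model, and @tensorflow/tfjs-node provides the CPU backend.
const tf = require('@tensorflow/tfjs-node');
const cocoSsd = require('@tensorflow-models/coco-ssd');
const fs = require('fs');

async function main() {
  const model = await cocoSsd.load();                  // downloads MobileNet SSD weights
  const imageBuffer = fs.readFileSync('sample.jpg');   // any test image on disk
  const input = tf.node.decodeImage(imageBuffer, 3);   // decode to an RGB tensor (H x W x 3)
  const detections = await model.detect(input);        // [{bbox, class, score}, ...]
  console.log(detections);
  input.dispose();
}

main();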

Why run AI algorithms on Edge

Why can’t we rely on the cloud to run AI algorithms? After all, scaling resources to run an AI/deep learning model to match your performance needs is easier in the cloud. So why should one worry about running them on an edge device with compute and power constraints? To answer this question, let’s consider two scenarios:
a) Cloud based architecture, where inference happens on cloud.
b) Edge based architecture, where inference happens locally on device.
(To keep the comparison as fair as possible, both cases use a nodejs webserver with tensorflowjs (CPU only); the only difference is that in case (a) the webserver runs on an EC2 instance and in case (b) the webserver runs locally on an edge device. The goal here is NOT to have an optimized implementation for either platform (cloud or edge) but rather to have a framework for a fair comparison.)

Cloud based architecture

Here’s what a cloud-based setup would look like; it involves the steps detailed below:
Cloud only Architecture for Inference. (image references at end).

Step 1: Request with input image

There are two possible options here:
* We can send the raw image (RGB or YUV) from the edge device as it’s captured from the camera. Raw images are always bigger and take longer to send to the cloud.
* We can encode the raw image to JPEG/PNG or some other compressed format before sending, and decode it back to a raw image on the cloud before running inference. This approach involves an additional step to decode the compressed image, as most deep learning models are trained on raw images. We will cover more ground on different raw image formats in future articles in this series.
To keep the setup simple, the first approach [RGB image] is used. HTTP is used as the communication protocol to POST the image to a REST endpoint (http://<ip-address>:<port>/detect).
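As a rough sketch (not the exact client used in this article), here is how an edge device could base64-encode a raw RGB frame and POST it to the /detect endpoint as multipart/form-data, using the same boundary as the ab command shown later. The frame path, host, and port are placeholders.

// Sketch: base64-encode a raw RGB frame and POST it to the /detect endpoint.
// Host, port, and frame path are placeholders, not values from this article.
const http = require('http');
const fs = require('fs');

const boundary = '1234567890';
const rgbFrame = fs.readFileSync('frame.rgb');        // raw RGB bytes captured from the camera
const payload = rgbFrame.toString('base64');          // base64-encode before upload

const body =
  '--' + boundary + '\r\n' +
  'Content-Disposition: form-data; name="image"; filename="frame.rgb"\r\n' +
  'Content-Type: application/octet-stream\r\n\r\n' +
  payload + '\r\n' +
  '--' + boundary + '--\r\n';

const req = http.request({
  host: '<ip-address>',                               // placeholder, as in the article
  port: '<port>',                                     // placeholder, as in the article
  path: '/detect',
  method: 'POST',
  headers: {
    'Content-Type': 'multipart/form-data; boundary=' + boundary,
    'Content-Length': Buffer.byteLength(body),
  },
}, (res) => {
  let data = '';
  res.on('data', (chunk) => { data += chunk; });
  res.on('end', () => console.log('detections:', data));
});

req.write(body);
req.end();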

Step 2: Run inference on cloud

* tensorflowjs is used to run inference on an EC2 (t2.micro) instance; only a single nodejs worker instance is used (no load balancing, no failover, etc.).
* The MobileNet version used is hosted here.
* Apache Bench (ab) is used to collect latency numbers for HTTP requests. To use ab, the RGB image is base64-encoded and POSTed to the endpoint. express-fileupload is used to handle the POSTed image (see the server sketch below).
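Here is a minimal sketch of what such a server could look like. It assumes the @tensorflow-models/coco-ssd package as a stand-in for the hosted MobileNet SSD model and a fixed 300x300 RGB frame size; neither of these is specified in this article.

// Sketch of the cloud-side nodejs webserver: express + express-fileupload receive the
// base64-encoded raw RGB frame, rebuild a tensor, and run MobileNet SSD inference.
const tf = require('@tensorflow/tfjs-node');
const cocoSsd = require('@tensorflow-models/coco-ssd');
const express = require('express');
const fileUpload = require('express-fileupload');

const WIDTH = 300, HEIGHT = 300;                      // assumed frame size, not stated in the article

const app = express();
app.use(fileUpload());

let model;                                            // loaded once at startup

app.post('/detect', async (req, res) => {
  // express-fileupload exposes the uploaded multipart field as req.files.image
  const rawRgb = Buffer.from(req.files.image.data.toString(), 'base64');
  const input = tf.tensor3d(Int32Array.from(rawRgb), [HEIGHT, WIDTH, 3], 'int32');
  const detections = await model.detect(input);       // [{bbox, class, score}, ...]
  input.dispose();
  res.json(detections);
});

cocoSsd.load().then((m) => {
  model = m;
  app.listen(process.env.PORT || 3000, () => console.log('detector listening'));
});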
Total latency (RGB) = HTTP request time + inference time + HTTP response time
ab -k -c 1 -n 250 -g out_aws.tsv -p post_data.txt -T "multipart/form-data; boundary=1234567890" http://<ip-address>:<port>/detect

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking <ip-address> (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests
Server Software:
Server Hostname:        <ip-address>
Server Port:            <port>
Document Path:          /detect
Document Length:        22610 bytes
Concurrency Level:      1
Time taken for tests:   170.875 seconds
Complete requests:      250
Failed requests:        0
Keep-Alive requests:    250
Total transferred:      5705000 bytes
Total body sent:        50267500
HTML transferred:       5652500 bytes
Requests per second:    1.46 [#/sec] (mean)
Time per request:       683.499 [ms] (mean)
Time per request:       683.499 [ms] (mean, across all concurrent requests)
Transfer rate:          32.60 [Kbytes/sec] received
                        287.28 kb/s sent
                        319.89 kb/s total
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   5.0      0      79
Processing:   530  683 258.2    606    2751
Waiting:      437  513 212.9    448    2512
Total:        530  683 260.7    606    2771
Percentage of the requests served within a certain time (ms)
  50%    606
  66%    614
  75%    638
  80%    678
  90%    812
  95%   1084
  98%   1625
  99%   1720
 100%   2771 (longest request)
Histogram of end to end Inference Latencies for Cloud based architecture (bucket size of 1s). It shows the inference latencies for requests generated by Apache Bench (ab) in a given second.
End to End Inference Latencies for Cloud based architecture sorted by response time (ms). This article explains the difference between the two plots.
As we can see here, the 95th percentile request latency is around 1084 ms.

Edge based architecture

The web server (which runs tensorflowjs) runs locally on the edge device (Qualcomm Snapdragon 855 Development Kit [4]). We repeat the same steps using Apache Bench (with HTTP requests to localhost this time instead of the remote server), and the results are as follows.
Histogram of end to end Inference Latencies for Edge based architecture (bucket size of 1s). It shows the inference latencies for requests generated by Apache Bench (ab) in a given second.
End to End Inference Latencies for Edge based architecture sorted by response time (ms). This article explains the difference between the two plots.
As we can see here, the 95th percentile request latency is around 357 ms.

Optimization Opportunities

As you can see, the latency numbers are fairly high; the numbers obtained here are more like upper-bound latencies. There are many optimization opportunities, some of which are detailed below:
Cloud based architecture:
* Have multiple nodejs worker instances and load balance between them (see the cluster sketch after this list).
* Have multiple deployments (us-east, us-west, etc.) and route the request to the closest deployment.
* Batch multiple input images and run batched inference on cloud.
* Have a gpu based EC2 instance and use tensorflow-node-gpu to accelerate inference.
* Use a different communication protocol like MQTT, geared more towards IoT/cloud connectivity, to avoid the overheads of HTTP.
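As an illustration of the first point, here is a sketch using node's built-in cluster module to run multiple worker instances of the inference server behind a single port; connections are distributed across the forked workers. The './server' module is assumed to be the webserver described above.

// Sketch of the "multiple nodejs workers" optimization using node's built-in cluster module.
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; incoming connections are distributed across them.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  // Simple fail-over: respawn a worker if it dies.
  cluster.on('exit', (worker) => {
    console.log('worker ' + worker.process.pid + ' died, restarting');
    cluster.fork();
  });
} else {
  require('./server');                                // each worker runs its own copy of the webserver
}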
Edge based architecture:
* Have an optimized implementation for your edge device. In this case, for the Qualcomm Snapdragon 855 Development Kit [4], inference would be accelerated on the GPU, DSP, or NPU.
* Most likely, the on-device implementation would depend on native libraries through vendor frameworks like SNPE or tensorflow-lite.
* Optimize the data path from image capture at the camera to feeding the deep learning model for inference (see the timing sketch after this list).
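Before optimizing the data path, it helps to measure where the time goes. Here is a generic timing sketch with per-stage timestamps; the capture, pre-processing, and inference functions are placeholders for whatever platform-specific code you use.

// Sketch: instrument the on-device data path (capture -> pre-process -> inference)
// with per-stage timestamps, so you know which stage to optimize first.
async function timedPipeline(captureFrame, preprocess, runInference) {
  const t0 = process.hrtime.bigint();
  const frame = await captureFrame();                 // grab a frame from the camera
  const t1 = process.hrtime.bigint();
  const input = await preprocess(frame);              // resize / colour-convert to model input
  const t2 = process.hrtime.bigint();
  const detections = await runInference(input);       // deep learning inference
  const t3 = process.hrtime.bigint();

  const ms = (a, b) => Number(b - a) / 1e6;           // nanoseconds -> milliseconds
  console.log('capture: ' + ms(t0, t1).toFixed(1) + ' ms, ' +
              'pre-process: ' + ms(t1, t2).toFixed(1) + ' ms, ' +
              'inference: ' + ms(t2, t3).toFixed(1) + ' ms');
  return detections;
}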

Conclusion

We looked in detail at one of the factors in deciding whether you need edge-based solutions. As we saw, if your application is tolerant to cloud latencies, then cloud-based inference is the quickest way to get going. However, if your application is latency sensitive, you can consider edge-based solutions. Be sure to benchmark your particular use case to pick one over the other. In addition to latency, here are some other reasons to consider edge-based solutions:
* You already have an existing deployment of Edge devices and want to leverage it to save on cloud compute costs.
* Privacy: you don’t want data to ever leave the edge device.
* For devices which are not fully connected or have poor connectivity to the cloud, edge-based solutions become inevitable.
Future articles in the series will cover the different components involved in an edge-based solution. Stay tuned!

References


Written by hastantram | Engineering Manager/Lead, Edge Platform Team, Nauto Inc
Published by HackerNoon on 2020/01/16