Wehe: Hacking Traffic Differentiation Detection for Mobile Environments

Authors:

(1) Vinod S. Khandkar and Manjesh K. Hanawal, Industrial Engineering and Operations Research Indian Institute of Technology Bombay, Mumbai, India and {vinod.khandkar, mhanawal}@iitb.ac.in.

Table of Links

Abstract & Introduction

Related Work and Background

Challenges in TD Detection Measurement Setup Development

Case Study : Wehe - TD Detection Tool for Mobile Environment

Shortcoming of Wehe on HTTPS Traffic

TD Detection of HTTPS Traffic

Conclusion & References

IV. CASE STUDY : WEHE - TD DETECTION TOOL FOR MOBILE ENVIRONMENT

Wehe [13] is a tool for the detection of traffic differentiation over mobile networks (e.g., cellular and WiFi). It is available as an App on Android and the iOS platform. The user database of the Wehe tool consists of 126,249 users across 2,735 ISPs in 183 countries/regions generating 1,045,413 crowd-sourced measurements. European national telecom regulator, the US FTC and FCC, US Senators, and numerous US state legislators have used the Wehe tool’s findings.

The tool supports TD detection for many popular services such as Netflix, YouTube. The tool runs TD detection tests by coordinating with its server, called the “replay server”. The replay server keeps track of active users and maps replay runs to the correct user’s service.

A. Traffic generation

Wehe uses the “record-and-replay” method for generating probing traffic. The user-client exchanges the probing traffic with the replay server as per the replay script that captures the application-level traffic behavior, including the port number, data sequence, and timing dependencies from the original service’s network logs. Preserving timing is a crucial feature of Wehe’s approach. It expects network devices to use this information in case of non-availability of any other means to classify applications, e.g., HTTPS encrypted data transfer

Wehe tool uses two types of probing traffic streams. While one stream (original replay) is the same as the original service’s application-level network trace, another traffic stream (control replay) differs substantially from the first traffic stream. In one approach, Wehe uses the VPN channel to send a second probing traffic stream. This approach uses the meddle VPN [30] framework for data transfer and server-side packet capture. Another approach uses the bit-reversed version of the first traffic stream sent one the same channel. Currently, Wehe uses the latter approach due to its superior results.

B. Over the network response

Wehe is a differential detector tool that compares the network responses for two types of traffic streams generated by the tool. The service-specific information present in the original replay is useful for network devices with Deep Packet Inspection (DPI) capability to identify and classify the service correctly. So, the original replay’s traffic performance over the Internet closely resembles the original application traffic on the same network. While original replay is exposed for detection to network devices, the traffic streams with bit reversed data or control replay is equally “not detectable” for classification. Thus it is expected that the control replay traffic evades the content-based application-specific traffic differentiation. The performances of two such traffic streams (detectable and nondetectable) differ if network devices apply different traffic management or traffic differentiation on each traffic stream as per content-based classification.

C. TD detection scenario expectations

Wehe compares the throughput performances of original replay and control replay to detect TD. The methodology uses the throughput as a comparison metric due to its sensitivity to bandwidth-limiting traffic shaping. However, the tool expects that the TD detection algorithm does not detect TD based on throughput for traffic streams with traffic rates below the shaping rate. The rationale is that the shaper cannot affect the performance of such an application stream.

Many times both traffic streams get affected by other factors such as signal strength, congestion. It creates an irregularity in the received performance due to bandwidth volatility. It is mentioned to be leading to incorrect differentiation detection. The tool performs multiple test replays to overcome the effect of bandwidth volatility.

D. Operational requirements

Wehe server needs side-channels for each client to associate it with precisely one app replay. This side-channel supplies information about replay runs to the server. Each user directly connected to the Wehe replay server is uniquely identifiable on the server-side with an associated IP address of side channels that maps each replay to exactly one App.

The other operational requirement is that the Wehe clientserver communication uses customized socket connections with specific keep-alive behavior. Sometimes, the usage of translucent proxies by the user-client modifies this behavior. The Replay server handles this situation by handling such unexpected connections. The protocol-specific proxies, e.g., HTTP proxy, connect the user-client to the server through itself using specific port numbers, e.g., 80/443 for HTTP/HTTPS. Nevertheless, it allows the user-client to connect to the server for connections using other protocols directly. The side channels of Wehe do not use HTTP/HTTPS connections. So the IP address for the same user differs for side-channel and replay runs. Wehe server detects such connections and indicates such connections to the user-client using a special message. The special message triggers the exchange of further communication with a customized header.

E. Challenges of validating Wehe

The TD mechanism of Wehe is straightforward to use — the requirement changes when using it for its validation. The validation process may need to launch only one type of replay for different services during one test or may need to launch all replays in parallel. These are not requirements related to TD detection, Wehe’s primary goal, so understandably not supported. Hence the validation of Wehe’s working in such scenarios needs a specific client-server setup. Here the challenge is to separate the intended scenario-specific Wehe’s mechanism so that the resulting system still mimics Wehe’s actual behavior.

Moreover, Wehe does not provide error/failure notifications in all scenarios. Instead, it prompts the user to reopen the App. As a result, the validation setup loses vital feedback information regarding the error/failure induced by its validation scenario.

This paper is available on arxiv under CC 4.0 license.