Why are we (still) sending so much web traffic unencrypted over the Internet?

Written by cryptack | Published 2019/01/20


It’s 2019 and, still, around a quarter of all website visits happen without encryption. Let’s look at why.

This is Part 2 of a three-part series on public Wi-Fi insecurity. In Part 1, I showed how easily, even today, a hacker could victimize users of public Wi-Fi networks. I built a tool to get a glimpse of how prevalent potentially insecure activity is on public Wi-Fi, and compared the results to a similar report from Google.

The results were consistently bleak: about a quarter of all website visits occur without the use of HTTPS.

I was curious why there would be so much traffic flowing over ol’ unencrypted HTTP, but answering that required inspecting packets at a deeper level, and deep-inspecting strangers’ traffic sounded like a trip to jail. So, back at home, I set up a little packet capture lab to capture and deeply inspect my own personal traffic using Wireshark, to see if it would shed any light on what the heck is going on.
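Here’s a minimal sketch of how that kind of capture can be scripted (assuming the third-party pyshark wrapper around Wireshark’s tshark; the interface name is machine-specific, and this is an illustration rather than my exact lab setup):

```python
# Watch for plaintext web traffic on the local interface (sketch).
# Assumes tshark (ships with Wireshark), `pip install pyshark`, and
# sufficient privileges to capture packets.
import pyshark

# Anything on TCP port 80 is unencrypted HTTP by definition.
capture = pyshark.LiveCapture(interface="en0", bpf_filter="tcp port 80")

for packet in capture.sniff_continuously(packet_count=20):
    # Only HTTP request packets carry a Host header worth reporting.
    if hasattr(packet, "http") and hasattr(packet.http, "host"):
        print("cleartext request to:", packet.http.host)
```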

Here’s what I found.

SUMMARY

  • Much of the Web’s traffic remains unencrypted — for no good reason
  • Popular websites, including google.com, netflix.com, and some major financial institutions improperly set security-related HTTP headers, potentially exposing users to man-in-the-middle attacks
  • DNS still undermines attempts at securing the Internet
  • The Internet’s chain of trust is very complex to implement correctly

Popular Websites Are Not Strict Enough about HTTPS

With some existing web browser tabs open, and while already logged in to services (email, social media, bank), my traffic looked pretty secure: the only plaintext HTTP packet appeared when I first connected to my test network, generated by my OS to check whether it was behind a captive portal. No harm in probing the network using HTTP (other than revealing my OS to anyone spying on that specific network).

The next thing to test was what would happen if I closed a tab and then attempted to revisit a website I’d previously visited over HTTPS. Would my browser automatically know to send all traffic over HTTPS even if I didn’t type “https://” at the beginning of the URL? This is what should happen if websites implement HTTP Strict Transport Security (HSTS) policy correctly, which is as simple as using a 301 redirect to switch from HTTP to HTTPS and then including the special Strict-Transport-Security header in every response served over HTTPS.
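In server terms, correct HSTS behavior looks something like this minimal sketch (a hypothetical Python stdlib server; real sites would configure the same two steps in their web server or CDN):

```python
# Step 1: the plain-HTTP listener's only job is to permanently redirect to HTTPS.
# Step 2: every HTTPS response carries the Strict-Transport-Security header.
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectToHTTPS(BaseHTTPRequestHandler):
    def do_GET(self):
        # Never serve content over HTTP; 301 the visitor to HTTPS.
        self.send_response(301)
        self.send_header("Location", "https://example.com" + self.path)
        self.end_headers()

# On the HTTPS side (TLS setup not shown), each response should include:
#   Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
# max-age is in seconds (one year here), includeSubDomains extends the policy
# to subdomains, and preload allows inclusion in browsers' built-in HSTS lists.

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RedirectToHTTPS).serve_forever()
```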

When I opened a new tab and typed in “google.com” (intentionally omitting the “https://”), I expected HSTS policy to kick in and automatically force HTTPS before hurling even one packet over plaintext HTTP.

Google, one of the most visited websites in the world, must implement HSTS correctly, right?

Nope.

I entered “netflix.com” and expected HSTS policy to kick in.

Nope.

Frantically, I visited a popular bank website, closed the tab, opened a new tab, and tried that. I tried a few additional popular websites.

Nope. Nope. Nope. Nope.

I tested on both Firefox and Chrome and got the same results, but then noticed “accounts.google.com” did enforce HSTS policy. That’s when it hit me: HSTS is (sometimes) only being enforced for subdomains, not parent domains, of popular websites.

I confirmed that “www.google.com” and “www.netflix.com” never go over HTTP, and neither does the root “github.com” (good job, GitHub, for doing the right thing!), which shows that browsers are enforcing HSTS policies correctly, but…

…a majority of popular websites, including Google and Netflix, are setting HSTS policies only for subdomains and not their parent domains, or not setting HSTS at all.

This undermines the whole point of HSTS, leading to traffic being sent unencrypted and creating an easy attack vector on any shared or public network.

Popular websites failing to enforce HTTP Strict Transport Security for root domains, potentially exposing visitors to man-in-the-middle attacks (as of March 2019)

An easy test to see if HTTP Strict Transport Security is enforced:

1. Select the Network tab in Developer tools on Chrome and then enter a URL you’ve previously visited (without the “https://”).

2. If you get a Status 307 with the response header “Non-Authoritative-Reason: HSTS” set, then HTTP Strict Transport Security is being enforced. (Chrome synthesizes this 307 internally, before a single packet leaves your machine.) If there’s any other type of redirect as the first redirect, then it’s not.

Great job, Github! You implemented HTTP Strict Transport Security properly!
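If you’d rather script the check than click around DevTools, here’s a rough sketch using Python’s requests library (these domains are just examples, and the results will drift over time):

```python
# Print each site's Strict-Transport-Security header, if any (sketch).
import requests

for host in ["google.com", "www.google.com", "netflix.com", "github.com"]:
    # requests follows redirects, so headers come from the final HTTPS response.
    resp = requests.get(f"https://{host}/", timeout=10)
    hsts = resp.headers.get("Strict-Transport-Security")
    print(f"{host:20s} {hsts or 'NO HSTS HEADER'}")
```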

Okay, so, that can’t explain all the insecure HTTP traffic that my analyzer tool was reporting, but it is hugely concerning that websites that should only ever be accessed over HTTPS, including many that are attempting to enforce strict HTTPS policies, are not configured properly. This means that users who don’t carefully inspect HTTPS certificates and URLs are at risk of man-in-the-middle (MITM) attacks phishing for information. No bueno.

But, HSTS isn’t infallible, either.

Remember UDP Port 123 from the statistics in Part 1 of this series? That port is used by the Network Time Protocol (NTP), which is completely unencrypted and unverified. HSTS tells web browsers to remember a policy until a set expiration time, but a device’s clock can be tampered with simply by performing an NTP MITM attack.

This means that, by simply hacking time, it’s possible to perform a TLS/SSL stripping attack even for websites that implement HSTS.
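To see why hacking time works, note that a browser judges HSTS expiry against the local wall clock. A toy illustration (not an attack tool, just the arithmetic):

```python
# HSTS policies expire based on local time, so a shifted clock "expires" them.
import time

MAX_AGE = 31536000           # one year, as advertised by the server
policy_set_at = time.time()  # when the browser cached the policy

def hsts_active(now: float) -> bool:
    return now - policy_set_at < MAX_AGE

print(hsts_active(time.time()))                # True: policy in force
print(hsts_active(time.time() + MAX_AGE + 1))  # False: NTP-spoofed clock kills it
```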

On a separate note, the use of HSTS also presents considerable privacy concerns. Well-resourced agencies that want to track individuals’ online behavior can exploit HSTS as a kind of supercookie: by setting HSTS policies across a set of controlled hostnames, a tracker can encode a unique identifier in the browser’s HSTS cache and read it back on later visits.

DNS Stands for “Dinosaur”

This problem stems from the domain name system (DNS). DNS is efficient, but it’s horribly insecure and everyone has known that for decades. For example:

  • There is no verification that the IP address responses received over DNS were actually sent by our desired DNS servers and not by some man-in-the-middle attacker (see the sketch after this list)
  • There is no encryption of DNS lookups
  • There’s also no linkage between DNS entries and web server encryption (the one exception being DomainKeys to reduce spoofed spam e-mail, I suppose), so it’s possible to spoof DNS replies to point at nefarious servers and exploit the fact that HSTS policies are not always configured correctly
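Just how bare-bones is a classic DNS lookup? It’s one cleartext UDP datagram out and one unauthenticated datagram back. A minimal sketch (querying Google’s public resolver as an example):

```python
# Hand-roll an A-record query to show there's no encryption or authentication.
import socket
import struct

def build_query(name: str) -> bytes:
    # Header: transaction id, flags (recursion desired), 1 question, 0 answers.
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(3)
sock.sendto(build_query("example.com"), ("8.8.8.8", 53))
reply, _ = sock.recvfrom(512)

# Nothing here proves the reply came from 8.8.8.8: a spoofer who guesses the
# 16-bit transaction id and source port can answer first and win the race.
print("got", len(reply), "bytes; transaction id matches:", reply[:2] == b"\x12\x34")
```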

Let’s get technical for a sec. I don’t want to get my head chewed off by the anti-DNSSEC crowd, so I’ll do my due diligence here: I’m NOT saying that I support the current DNSSEC protocol, and I’m fully aware of the efficiency concerns and administrative headaches that come with DNSSEC/DANE. I even love the idea of HTTP Public Key Pinning and know its virtues; and yet key pinning is now dead, and HSTS is not enough to protect the public.

So, what I AM saying is this:

It’s 2019 and we’re still using the old, insecure DNS system and (despite techniques like HSTS) people are still practically vulnerable in very preventable ways.

I’m a technologist, so let me ask fellow technologists this: What the heck is wrong with technology and its practitioners (us) that we haven’t implemented a solution, and why is this still a problem? Let’s fix the Internet together! This touches on a complicated topic known as the usable security problem, but let’s keep having this discussion and make it our priority to make technology work better for the people who use it.


Why haven’t we fixed the Internet yet?

I would need the space of more than just this article to cover the “Why?” question fully, but I’ll say that the way we got here is (at least) three-fold:

  1. Management nightmares exist: Key pinning was complicated by the fact that encryption keys/certificates needed to be changed from time to time and management of such is a huge headache (poor usability for sysadmins) with current tools — so much so that key pinning is now being deprecated and completely abandoned
  2. Extensive knowledge and accuracy are required: Security is hard to always implement correctly
  3. We are lazy(?): Since there is a possible path where people could surf the web securely (albeit difficult for normal people and websites to do consistently) we’ve lost a sense of urgency in ripping everything out and starting over (and/or we’re really stupid)

There are other reasons as well, but the moral of the story is that this is entirely preventable, but it still isn’t being prevented, and that’s really messed up if you think about it.

In Internet We Trust

All of this is further complicated by the fact that, today, encryption on the Internet still relies upon trusting centralized authorities, certificate authorities (CAs) to be exact, to validate the authenticity of the encryption keys used during the initial handshake between clients/browsers and servers. To make matters worse, the algorithms used to establish privacy are entirely negotiable, determined during that same handshake, and the web is littered with clients and servers supporting mismatched combinations of more- and less-secure algorithms and protocol versions (which is why the TLS article is possibly the longest and ugliest article on all of Wikipedia).

TLS is anything but standard, and shouldn’t be trusted.

First, during the initial handshake, a client “Alice” (e.g. a web browser) that wishes to securely communicate with another entity “Bob” (e.g. a web server) must reach agreement with Bob on what the definition of a “secure communications channel” even is. Next, in order for Alice to establish that secure channel with Bob (call it Channel 1), Alice first needs another secure channel with Bob (Channel 2) for exchanging the keys that will encrypt Channel 1. But to know that Channel 2 is secure, Alice must first validate, over yet another channel (Channel 3), with the certificate authority “Val” that an eavesdropper “Eve” isn’t disguised as Bob.

So, for Alice to trust that communications with Bob are secure from Eve, Alice needs to trust that Bob isn’t stupid and/or lazy and trust that Val isn’t the NSA in addition to trusting the protocols used to create Channels 1, 2, and 3.
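You can watch this chain in action with a few lines of Python (a sketch using the stdlib ssl module; github.com is just an example endpoint):

```python
# Negotiate "what secure means" with Bob, then check Val's signature on his key.
import socket
import ssl

ctx = ssl.create_default_context()  # loads the trusted CA bundle ("Val")
with socket.create_connection(("github.com", 443), timeout=10) as tcp:
    with ctx.wrap_socket(tcp, server_hostname="github.com") as tls:
        # The agreed-upon definition of "secure" for this connection:
        print("negotiated:", tls.version(), tls.cipher()[0])
        # Which CA vouched that this really is Bob and not Eve:
        issuer = dict(field[0] for field in tls.getpeercert()["issuer"])
        print("certificate vouched for by:", issuer.get("organizationName"))
```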

This is known as a chain of trust. But, who watches the watchers? Is the solution to Trust No One?

Trustless is the only way forward.

We need a so-called “trustless” Internet, with privacy and security that works fundamentally differently than it does today. But, contrary to how the “trustless” security paradigm sounds, it isn’t really about eliminating trust completely; instead, it’s about minimizing the scope of risk resulting from trust — to trust less. That’s because our problem with security and privacy on the Internet isn’t that we trust things/people, the problem is that we don’t know who not to trust or when, and we do not always know with what information we are entrusting them. (After all, we trust lots of things and people every day of our lives, including that the food we eat won’t poison us and that the airplanes we fly in won’t crash.)

There are paradigms other than the chain of trust used on the Internet today. For example, the web/network of trust paradigms create a decentralized/distributed model of trust that eliminates single points of failure. (The problem with a chain of trust is that each and every link is a single point of failure, so every link you add multiplies the odds that the chain breaks, whereas redundant paths in a web of trust divide them.) Still, solutions using these paradigms are not yet ready for prime time.

Conclusion

In the future, the Internet will have adopted a trustless security paradigm, which will reduce risk and increase transparency, accountability, and auditability. We will be trusting in systems and (virtual) policies, based upon mathematically provable and verifiable assurances, and darkness will be illuminated and exposed. Institutions will be replaced by communities, aligned through consensus algorithms. This is happening at higher levels of the Internet protocol stack, with systems designed to replace how we govern, how we transfer wealth, etc., but it has yet to happen at the lower, access-enabling and application transport layers.

Is effective encryption all that we need? Absolutely not. But the Internet is built on trust systems, and every paradigm of trust depends upon, first, establishing a secure channel via infallible encryption. The integrity of a chain not only hinges upon its weakest link; dependable secure channels are the fundamental physics holding any of the links together in the first place. Similarly, a sound web/network of trust must be built upon sound maths.

Over at my day job, Magic, we’re actively working on solving these problems by implementing VPN-like functionality and capabilities-based security for the Internet by default.

To learn more and join the conversation on how to build a safer, more performant Internet, check out magic.co or https://github.com/magic-network

But this is where we are today. Solutions are within reach, and are even known, yet they are waiting for implementers. That’s why you should be afraid to use public Wi-Fi (or any shared network) today. That’s why you still must use the HTTPS Everywhere browser plugin and always use a reputable VPN. In Part 3 of this series, I will address the long list of everything needed to surf the web safely at present.

