On the Security and Privacy Properties of Public WiFi

You really shouldn't have to trust the network you're on, but you kind of do

Posted by ekr on 25 Sep 2022

One of the most common security and privacy questions I get is whether it's safe to use public WiFi networks (and whether you should use a VPN). The answer is "it depends", for the reasons I lay out below. If you want to skip the rest of this, I'll tell you that I mostly just use airport and hotel WiFi but am more hesitant about it if I have to log in with my own identity.

"Safe" is a difficult word that covers a lot of territory. At a high level, there three main threats one might be concerned about in this context:

Compromise of your device (information security)
Compromise of the data you are transmitting over the network (communications security)
Monitoring of your use of the network (privacy)

Let's take these in turn.

Compromise of your device #

Often the first thing people worry about is that the network will be malicious and will subvert your device via some vulnerability in the browser, the operating system, etc. I'm certainly not going to tell you that this isn't possible (all software has defects, and some of them will be vulnerabilities) but vendors go to a lot of effort to find and fix these vulnerabilities, so it's also not a trivial matter to find them and they're quite valuable. As a concrete example, at this year's Pwn2Own competition, a full compromise of an iPhone 13 or a Pixel 6 was worth $200,000 USD, and an extra $50K if you got kernel access.

This is not to say that modern devices are somehow impregnable, but rather that it's relatively unlikely that an attacker is going to use a zero-day (i.e., undiscovered) vulnerability to attack random people at an airport Starbucks. Major OS vendors (both desktop and mobile) and major browser vendors are pretty good about quickly fixing vulnerabilities, so if you are running an up to date browser and an up to date OS, you should be relatively safe.

Moreover, even if your local network is safe,^[1] you still have to worry about compromise by other network actors, such as the Web sites you visit. Generally, if your browser and device aren't secure against network attack, you should be pretty concerned about your safety whatever the status of your local network.

Note: this advice does not apply if you are someone who is especially likely to be attacked by a powerful attacker, such as a state-level actor. If you are an activist or a dissident, you need a totally different level of operational security that probably involves having several machines.

Compromise of your communications #

HTTP, HTTPS, TLS, and QUIC #

Historically, the Web used the HTTP protocol, which ran over a channel provided by TCP. When run securely, it was layered over TLS, which sits between HTTP and TCP and provides a secure channel, with the result being called "HTTPS" (for HTTP Secure). The server indicates to the client that a given resource is to be retrieved via HTTPS by giving it a URL starting with https: rather than http:. Recently, the IETF has standardized a new version of HTTP (called HTTP/3) which runs over a network protocol called QUIC rather than TCP. QUIC uses the TLS 1.3 cryptographic handshake and TLS-like encryption, so HTTP/3 provides a similar set of security properties to earlier versions of HTTP over TLS. It still uses https: URLs, and so it's convenient to just call it all HTTPS, even though the protocol is different.

The second potential area of concern is compromise of your communications. The basic situation here is quite simple: the operator of the WiFi network can inspect and/or modify every packet you send, so they get to see anything that's not encrypted. This actually applies to any network you use, not just WiFi networks.

When it comes to Web traffic, the news is generally pretty good: a very large fraction of Web sites are encrypted using either TLS or QUIC. These protocols were designed under the assumption that the attacker has full control of the network, and so provide security even if you are on a malicious WiFi network. In general, as long as you are on an encrypted Web site, you should not need to worry about your passwords, credit card numbers, etc. And if you're not on an encrypted Web site, then you probably shouldn't do anything even if you are on a trusted WiFi network because you have to worry about attackers elsewhere on the Internet between you and the site.

It's a little hard to get a precise estimate of the fraction of traffic that is HTTPS;^[2] below I show measurements from Chrome and Firefox respectively, with Chrome showing rather more use of HTTPS than Firefox does. It's still not clear what the source of the difference is, but in any case the pattern is the same, which is that most traffic is encrypted, especially in the US, and it's gradually increasing.

Chrome HTTPS Stats [Chrome HTTPS data]

Firefox HTTPS Stats [Firefox HTTPS data]

The situation is somewhat worse for mobile apps. In a Web site, the client-side implementation of encryption is located in the browser, so the site only needs to configure their own server correctly—which is fairly standardized, especially if you use a hosting provider which has built in HTTPS support—and then send the client https: URLs. By contrast, mobile apps have to arrange for their own transport security. Historically this has led to a lot of apps not doing encryption at all or doing it in an insecure fashion. The latest work on this appears to be from Oltrogge, Huaman, Amft, Acar, and Backes in 2021, which reports a significant number of vulnerable Android apps, despite attempts from Google to prevent this.

Obviously, it's dangerous to use an app that doesn't implement encryption securely on an untrusted network. A VPN can sort of help here in that it prevents you from attack by the local network. However, this is only a partial solution: even if the last mile is secure there are hundreds to thousands of miles of network between you and the server; if the app doesn't implement encryption correctly, then you are vulnerable to attack anywhere along that path. In general, what you want is for your apps—and web sites—to encrypt their traffic.
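As a rough illustration of what "implementing encryption securely" means on the client side, here is a minimal sketch using Python's standard ssl module as a stand-in for whatever TLS library an app uses. The function names are made up for this example, and this is not how any particular Android app works; it just contrasts the library defaults with the classic anti-pattern of switching verification off.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client settings that verify the certificate chain and the hostname."""
    # create_default_context() enables both checks for client use.
    return ssl.create_default_context()


def broken_client_context() -> ssl.SSLContext:
    """Anti-pattern, shown for contrast only: traffic is encrypted, but
    to whoever answers, including an attacker on the network."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


good = strict_client_context()
bad = broken_client_context()
print("strict verifies certificates:", good.verify_mode == ssl.CERT_REQUIRED)
print("broken verifies certificates:", bad.verify_mode == ssl.CERT_REQUIRED)
```

With the second configuration, a malicious network can present any certificate it likes and the client will accept it, which is exactly the situation where the choice of network starts to matter.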

Monitoring of your use of the network #

The really serious problem here is privacy. While HTTPS does a good job of protecting your actual Web traffic, such as passwords, credit card numbers, etc., it does not effectively conceal the sites you are going to.

Routes for Browsing Behavior Leakage #

There are four main avenues for this leakage (collectively called "metadata"). In order of when they are available to the attacker, they are:

The DNS resolution of the server
The IP address of the server
The TLS server name indication (SNI) field.
Traffic analysis from the pattern of data (message sizes, timing, etc.) sent and received

Taking these in turn...

DNS Resolution #

Typically the URL that the client starts with has a domain name in it, such as https://www.example.com/. Before the client can connect to the server it needs to know the server's IP address (the numeric address of the server). The client uses the Domain Name System (DNS) to resolve the name into an IP address. Historically, the local network has provided the DNS server that the client uses to resolve the name. The result is that the local network learns the name of every server you are going to, with obviously negative implications for privacy. Note that it does not learn which pages on the site you are visiting, just the site names themselves.
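To make the leak concrete, here is a minimal sketch that builds a classic unencrypted DNS query by hand and then reads the name back out of the raw bytes, the way a passive observer could. The helper names and the query ID are arbitrary choices for this example; nothing is sent on the network.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of `hostname`."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def labels_visible_on_wire(packet: bytes) -> list[str]:
    """Recover the queried name from the packet bytes alone."""
    labels, pos = [], 12  # the question starts after the 12-byte header
    while packet[pos] != 0:
        length = packet[pos]
        labels.append(packet[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return labels


packet = build_dns_query("www.example.com")
print(".".join(labels_visible_on_wire(packet)))
```

The query carries the host name but no path, which matches the point above: the network sees the site, not the page.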

In the United States and some other countries, Firefox has deployed a feature called DNS over HTTPS Trusted Recursive Resolver (DoH TRR), which encrypts the DNS traffic and sends it to a separate server with defined privacy policies; this prevents the local network from learning the sites you are going to via your DNS queries. On other browsers, however, you generally are leaking your DNS traffic to the network.
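For a sense of what changes with DoH, here is a sketch of how a query can be packaged in the GET form defined by RFC 8484: the same DNS message is base64url-encoded (padding stripped) and placed in a `dns` parameter of an https: URL, so it travels inside the encrypted HTTPS connection to the resolver. The resolver URL below is a made-up placeholder, and this is an illustration of the wire format rather than of Firefox's implementation.

```python
import base64
import struct

# Hypothetical resolver endpoint, for illustration only.
RESOLVER_URL = "https://doh.example/dns-query"


def build_dns_query(hostname: str) -> bytes:
    """Minimal A-record query; RFC 8484 suggests a query ID of 0."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def doh_get_url(hostname: str, resolver_url: str = RESOLVER_URL) -> str:
    """Encode the DNS message as base64url without padding in a `dns` parameter."""
    message = build_dns_query(hostname)
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


print(doh_get_url("www.example.com"))
```

The local network still sees that you are talking to the resolver, but not which names you are asking it about.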

IPv4 and IPv6 #

The original version of IP, IPv4, had 32-bit addresses, for a maximum of about 4 billion total addresses. For obvious reasons, this isn't enough for every device on the Internet. In 1995, the IETF standardized IPv6, which has 128-bit addresses. However, IPv6 deployment has been extremely slow. For example, over 25 years later, less than half of Google usage is over IPv6. In the meantime, people have developed a number of mechanisms for sharing IPv4 addresses, including NAT on the client side and virtual hosting on the server side. While these may not be the cleanest designs from an architectural perspective, they actually act to improve privacy by grouping together traffic that would otherwise be separable by IP.
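The arithmetic behind "about 4 billion" is easy to check with Python's standard ipaddress module; the variable names here are just for this sketch.

```python
import ipaddress

# Size of the whole address space for each protocol version.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: 2**32 = {ipv4_total:,}")  # 4,294,967,296
print(f"IPv6 is 2**{(ipv6_total // ipv4_total).bit_length() - 1} times larger")
```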

IP Address #

The second major mechanism by which your browsing history leaks to the local network is via the server's IP address. This is a signal of variable quality. Big sites like Amazon or Google run their own servers and so they also have distinct IP addresses: in these cases it's easy to tell which site you are visiting, just by looking to see who operates the IP address in question.

Smaller sites, however, often operate on shared infrastructure, whether via shared hosting, or behind content distribution networks (CDNs), with more than one site on a single IP address. In this case, the IP address only allows you to narrow down the site to the set of all sites on the same IP address, which can be quite a large number of sites, especially with a big CDN.
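One way to picture the difference is as an "anonymity set" per IP address. The sketch below uses entirely made-up site names and documentation-range addresses; the point is only that an observer who sees a destination IP can narrow things down to the sites known to live there, which is one site in the dedicated case and many in the shared case.

```python
from collections import defaultdict

# Hypothetical hosting data: site name -> server IP (documentation ranges).
OBSERVED_HOSTING = {
    "bigshop.example": "192.0.2.10",    # runs its own servers
    "clinic.example": "198.51.100.7",   # the rest share one CDN address
    "dating.example": "198.51.100.7",
    "recipes.example": "198.51.100.7",
}


def sites_by_ip(hosting: dict[str, str]) -> dict[str, set[str]]:
    """Group sites by address: what an observer can infer from the IP alone."""
    groups = defaultdict(set)
    for site, ip in hosting.items():
        groups[ip].add(site)
    return dict(groups)


groups = sites_by_ip(OBSERVED_HOSTING)
for ip, sites in sorted(groups.items()):
    print(ip, "->", sorted(sites))
```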

Server Name Indication #

This kind of shared hosting is convenient operationally but presents a problem for TLS. When a TLS client connects to a server, the server needs to provide a certificate proving that it owns the site (the domain name) that the client is trying to connect to. If there is just one site on a single IP, then the server can provide the corresponding certificate, but if there are many such sites, then the server needs to know which certificate to present.

When TLS was originally deployed (back when it was called "SSL"), this was a real problem and each server needed its own IP address; this was eventually addressed by adding a TLS extension called Server Name Indication (SNI), in which the client provides the name of the server it is trying to connect to. The SNI is not encrypted and so a network observer can just read it off the wire and learn which site the client is trying to connect to.^[3] As with DNS or the IP address, this just leaks the server's name, not the pages on the site you are going to.
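You can see the cleartext SNI without touching the network. The sketch below uses Python's ssl module with in-memory buffers to generate the first handshake message (the ClientHello) a client would send, and then simply searches the raw bytes for the host name. The function name is invented for this example, and it assumes a TLS stack without ECH, which is the case for Python's ssl module as far as I know.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the first TLS flight a client would send to `hostname`."""
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # bytes "from the server" (left empty here)
    outgoing = ssl.MemoryBIO()   # bytes the client wants to send
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a server reply that never arrives.
        pass
    return outgoing.read()


hello = client_hello_bytes("www.example.com")
print("record type:", hello[0])  # 22 means a TLS handshake record
print("name visible in cleartext:", b"www.example.com" in hello)
```

Anyone who can see the packets can do the same substring search, which is why SNI is such a convenient signal for a network observer.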

The TLS community has of course known more or less since the beginning that SNI was a privacy problem. In versions of TLS prior to TLS 1.3 the handshake—including the server's certificate—was largely unencrypted, so this didn't seem like as big a deal, because the certificate also leaked this information, but TLS 1.3 encrypts most of the handshake, and so SNI became the last major privacy leak in TLS proper. In the beginning of the TLS 1.3 design process, a number of attempts were made to design a solution for encrypting the SNI, but it turned out to be a really hard problem and ultimately it didn't make it into the final specification. However, the TLS working group is now working on a specification for Encrypted Client Hello (ECH), which will protect the SNI under some circumstances. ECH is not yet widely deployed, but hopefully we'll start to see more deployment relatively soon.

Traffic Analysis #

The final privacy leak is via traffic analysis, which is the generic term for measuring the traffic patterns of the connection, such as the size of the messages being sent, their timing, etc. This turns out to reveal quite a bit about the sites people are going to. Goldberg, Wang, and Wood provide a good overview of the research in this area. There has been some work on adding countermeasures to TLS or HTTP to prevent this kind of traffic analysis, but the problem isn't that well understood and so far at least, there aren't any agreed upon defenses.

The good news is that traffic analysis is a lot harder than it looks—though Cisco actually sells a product that does some of this. If we were to close the other routes, it would be a pretty substantial privacy improvement.
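To give a flavor of the idea (and only a flavor), here is a toy sketch that matches an observed sequence of encrypted response sizes against per-site profiles. The site names, sizes, and distance function are all invented for illustration; real traffic analysis research uses much richer features and has to cope with far noisier data.

```python
# Hypothetical profiles: sizes in bytes of the first few encrypted responses.
PROFILES = {
    "news.example": [1400, 52000, 8000, 31000],
    "mail.example": [900, 3000, 3000, 2500],
    "video.example": [1200, 150000, 150000, 150000],
}


def distance(a: list[int], b: list[int]) -> int:
    """Sum of size differences, plus the sizes of any unmatched messages."""
    paired = sum(abs(x - y) for x, y in zip(a, b))
    unmatched = sum(a[len(b):]) + sum(b[len(a):])
    return paired + unmatched


def closest_site(observed: list[int], profiles: dict = PROFILES) -> str:
    """Guess the site whose profile is nearest to the observed sizes."""
    return min(profiles, key=lambda site: distance(observed, profiles[site]))


# The observer never decrypts anything; it only sees message lengths.
print(closest_site([1380, 51000, 8200, 30500]))
```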

Privacy Implications #

The upshot of all this is that whoever operates the local network gets to learn quite a bit about the behavior of people on the network. This is true whether they are a public WiFi network or your internet service or mobile provider. [Clarified — 2022-09-24]. Specifically, they get to learn:

The identities of the Web sites you visit (just by looking at the connections).
Many of the apps on your device, because they "phone home" to some server.

That's a lot and is quite likely to include information that many people would consider sensitive. The example I usually give is that you might be visiting some medical site, but there is plenty of other sensitive behavior that people engage in that they don't want others to know about, such as visiting dating sites or watching porn.

The actual privacy impact of this depends a lot on the nature of the network, however. Specifically:

Are you identifiable?
Are the network operator or the people on the network actually bothering to record your behavior?

The answer to the first of these questions is something you can mostly figure out for yourself: how many other people are on the network? Did you have to log in? Was there a shared password? For example, if you are in an airport with shared WiFi and either no password or a simple captive portal where you don't identify yourself, then it's going to be fairly hard to attribute your behavior to you (though the network can generally create a profile corresponding to all the sites you visit).^[4] Note that the apps on your phone may provide a somewhat unique fingerprint that could at least in theory be shared between operators, though I don't know if this really happens.

Encryption and Wireless Networks #

It's very common for wireless networks to be encrypted, but this provides surprisingly weak security. The basic problem is that the encryption only prevents people who are not on the network from seeing the traffic. For consumer and public access points, WiFi Protected Access 2 (WPA2) is usually operated in a pre-shared key mode where the encryption keys for the network are derived from the password via a handshake performed when each device joins. This means that anyone who (1) has the password and (2) is able to observe you joining is able to see all of your traffic. Moreover, because the passwords are usually quite weak, it is often possible to just brute force them. There has been work on a public-key based system (labeled "forward secrecy") that would prevent these forms of attack, but it is not widely deployed and appears to have other flaws.
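Here is a minimal sketch of why a shared password gives so little protection. In WPA2 pre-shared key mode the master key is just PBKDF2-HMAC-SHA1 of the passphrase, salted with the network name, so everyone who knows the passphrase computes the same key; the per-session keys are then derived from it using values exchanged in the clear during the join handshake. The SSID, passphrases, and wordlist below are made up, and the guessing loop is simplified: a real attack checks guesses against a captured handshake rather than against the key itself.

```python
import hashlib


def wpa2_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit pairwise master key for WPA2 pre-shared key mode."""
    return hashlib.pbkdf2_hmac(
        "sha1", passphrase.encode("utf-8"), ssid.encode("utf-8"), 4096, 32
    )


def guess_passphrase(target_pmk: bytes, ssid: str, wordlist: list[str]):
    """Simplified dictionary attack: try each candidate until the keys match."""
    for candidate in wordlist:
        if wpa2_pmk(candidate, ssid) == target_pmk:
            return candidate
    return None


SSID = "AirportFreeWiFi"                      # hypothetical network name
network_pmk = wpa2_pmk("coffee123", SSID)     # hypothetical weak passphrase

# Every user with the passphrase derives the identical master key.
print(wpa2_pmk("coffee123", SSID) == network_pmk)
print(guess_passphrase(network_pmk, SSID, ["password", "12345678", "coffee123"]))
```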

On the other hand, if you had to provide your identity (pro tip: a lot of captive portals don't check the e-mail address you provide them), then the network operator can link the history of your sites to you. So this means that situations where you have a user-specific password or need to actually log in have a much worse privacy situation. Note that in an environment like a hotel, where the operator knows where you are, this is probably enough to identify your traffic even if there is no password or a shared password.

The actual impact of course depends on whether the network operator or other people on the network are actually spying on you. Of course, there's no real way to tell whether they are or not; even if the network has a privacy policy which says that they don't monitor your behavior you can't really tell if they are doing so or not. Moreover, on wireless networks it's generally the case that other users of the same network can observe your behavior—though they probably won't know the information you used to log in—so even if the network operator has a good privacy policy you still have to worry about other people.

This brings us to the topic of VPNs: if you use a VPN then this will (mostly) prevent local attackers from seeing the sites you are connecting to, which is good, but it's a tradeoff because it also provides a neatly labeled traffic set to the VPN operator of which sites you are going to, together with your identity (because you logged in to the VPN), so you're really trusting the VPN operator to protect your privacy. On balance, if you use a reputable VPN service (see Consumer Reports' VPN report), then this likely provides better privacy than just using an untrusted local network, but it's important to remember that ultimately there are only policy and not technical controls on what the VPN operator can do. Note that a multi-hop system like Tor or iCloud private relay doesn't have this property, because there is no single entity who can de-anonymize both you and your traffic.

Closing Thoughts #

People who work in communications security like to talk about the Internet threat model in which the network is maximally malicious. This is often phrased as "you give the packets to the attacker to deliver".^[5] The idea is that protocols need to be designed to be secure even in this very difficult setting. If you succeed, then it doesn't matter what network conditions you're running in, and questions like "is public WiFi safe" would be irrelevant. Unfortunately, while there has been a lot of progress in designing and deploying security protocols such as TLS (and, to a lesser extent, in building secure software), the privacy properties of these protocols leave a lot to be desired. The result is that it actually is important to ask whether you can trust the network to handle your data in the way you would like. The idea behind privacy enhancing technologies like DoH, ECH, and proxying/VPNs is that they replace this trust with technical mechanisms that prevent attack even if the network is malicious, but we're not there yet, and in the meantime, you still need to ask how much you trust the network with knowledge of your activity.

This is actually a lot less likely than you think because consumer networking gear is famously insecure, so it's reasonably likely that your average home network has actually been compromised. ↩︎
It's actually even slightly hard to define what you mean. For instance, should you measure page loads or HTTP transactions, or... ↩︎
SNI is not a perfect signal from the attacker's perspective because HTTP allows the client to coalesce traffic to multiple servers on the same connection as long as they share a certificate. For instance, if the server has a certificate for mail.example.com and calendar.example.com and the client connects to mail.example.com, it can then send traffic destined for calendar.example.com without creating a new TLS connection. This makes the problem of learning which site the client is connecting to slightly harder, but as a practical matter, there are plenty of non-coalesced connections and even when they are coalesced they may be associated with the same server operator, so SNI is a pretty good signal. ↩︎
There is research by Bird, Segall, and Lopatka indicating that browsing history can be used for reidentification, so this is not a perfect case of hiding in the crowd but it would require a fair amount of work to identify you. ↩︎
I've heard Steve Bellovin say this, but I think he may have been quoting. ↩︎

Educated Guesswork