Educated Guesswork

What's with the www prefix in www.example.com?

You might have noticed that it's common for sites to have a domain name like www.example.com and a URL like https://www.example.com. You might wonder what the www is doing here. You're most likely loading this from a Web browser, so surely the browser knows you're on the Web. Why does it need the www prefix? The answer, like many things on the Internet, is that it was the quickest way to get to a result without having to change anything and now we're at a local minimum which is hard to change.

Protocol Separation

In the early days of the Internet, it seemed like sites would be running a number of user-facing services (email, Web, gopher, NNTP, etc.). It quickly became apparent that even though it was technically possible to multiplex them on different TCP ports, you didn't actually want to run them all on the same machine, for several reasons.

First, you may not want them to be managed by the same person. The bigger your system gets, the more you want division of labor, and, for instance, you might not want your mail administrator to have access to your Web server.[1] Second, you might want to use multiple machines to manage load, initially by separating each service onto its own machine and then potentially later by having multiple Web servers. Load is generally more of an issue for Web than it is for other services, principally because it's possible to get flash crowds that suddenly dramatically increase the load on your Web server. For obvious reasons, you don't want a flash crowd that slows your Web server to a crawl to also bring down your mail server, which you may be using to coordinate fixing your Web server.

Unfortunately, in those early days, the DNS had no way to say that if you had the name example.com you should connect to machine A for Web and machine B for NNTP. Recall from an earlier post that a domain name is just an index into a distributed database, with the primary value in the database being the IP address associated with the name. This means that Web and NNTP for example.com have to point to the same IP address and hence the same machine. As you have probably guessed by now, the solution is to give each service a different domain name, e.g., www.example.com for Web, nntp.example.com for NNTP, etc. This allows you to configure a separate machine for each service with its own IP address. This also allows them to be in totally different data centers or even operated by different hosting providers.
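To make the limitation concrete, here is a minimal Python sketch that models address lookup as a plain dictionary. It is only a toy: the addresses come from documentation ranges, the `lookup` and `server_for` helpers are hypothetical names, and nothing here talks to the real DNS.

```python
# Toy model: the DNS maps a name to an address and knows nothing about services.
# All addresses are illustrative (documentation ranges), not real deployments.
ADDRESS_RECORDS = {
    "example.com": "192.0.2.10",
    "www.example.com": "198.51.100.20",
    "nntp.example.com": "203.0.113.30",
}


def lookup(name: str) -> str:
    """Return the address for a name; the caller's protocol is not an input."""
    return ADDRESS_RECORDS[name]


def server_for(service: str, domain: str, use_prefix: bool) -> str:
    """Pick the machine a client would connect to for a given service."""
    name = f"{service}.{domain}" if use_prefix else domain
    return lookup(name)


# Without a prefix, Web and NNTP clients both land on the same machine.
same_machine = (
    server_for("www", "example.com", False) == server_for("nntp", "example.com", False)
)

# With per-service names, each service can live on its own machine.
separate_machines = (
    server_for("www", "example.com", True) != server_for("nntp", "example.com", True)
)
```

The point is that `lookup` has no service argument at all, so the only way to get a different answer per service is to ask about a different name.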

Interestingly, it was possible to say that you should deliver mail for (say) example.com to mail.mailserver.example via something called an MX record; this allowed someone else to run a mailserver on your behalf. However, there was no generic mechanism to do so for other protocols. There are now several such mechanisms, starting with the SRV record and now including the HTTPS record. However, the SRV record never got wide deployment (to the best of my knowledge, no browser supports it) and the HTTPS record is new. The problem with deploying any such record is that there are a significant number of browsers which don't support it, so if you want to steer Web traffic and other traffic to different places, you need to keep doing www.
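The asymmetry between mail and the Web can be sketched in a few lines of Python. This is a toy with made-up data, not a real resolver: `backup.mailserver.example`, the preference values, and the function names are all assumptions added for illustration.

```python
# Toy model of mail routing via MX records (hypothetical data, no real DNS).
# Each MX entry is (preference, mail_host); lower preference is tried first.
MX_RECORDS = {
    "example.com": [
        (20, "backup.mailserver.example"),
        (10, "mail.mailserver.example"),
    ],
}


def mail_hosts(domain: str) -> list[str]:
    """Return the hosts to try for mail delivery, most preferred first."""
    records = MX_RECORDS.get(domain)
    if not records:
        # With no MX record, senders fall back to the domain's own address.
        return [domain]
    return [host for _, host in sorted(records)]


def web_host(domain: str) -> str:
    """With no widely supported equivalent for the Web, the name is the answer."""
    return domain
```

Here `mail_hosts("example.com")` can point at someone else's machines, while `web_host("example.com")` has nowhere to go but example.com itself.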

CNAME and the Apex Zone

Of course, at this point, there are mostly only two kinds of names that users regularly come into contact with: email (e.g., ekr@example.com) and Web (https://example.com). As I mentioned above, it is possible to run email and Web on different machines without the www prefix. So, why does the prefix persist?

In part this is just inertia, but it's also partly a result of another shortcoming of the DNS, which is that it's not possible to have a CNAME at the apex of a zone. Suppose that I want to have my Web site hosted by cdn.example. The natural way to do this is with a CNAME record, which is basically an indication that the real (canonical) name of a domain is what's in the record. So, for instance, consider the following CNAME record:

www.example.com -> www.example.com.cdn.example

This would tell anyone that if they wanted to know about www.example.com they should go look up the records for www.example.com.cdn.example. This works well because it means I don't need to know anything about how the CDN's network is laid out or what IP addresses they have for their machines. I just set up the CNAME and then the CDN can have the name resolve to whatever IP address(es) they want. This allows them, for instance, to provide different answers based on load or where clients are geographically.[2] You can also use a CNAME to point to a service like Cedexis (now Citrix) which will steer traffic to different CDNs depending on network conditions. Unfortunately, while you can use a CNAME for www.example.com, you can't use it for example.com. The reason is that a CNAME is an all-or-nothing proposition: it means "look over here for every record". But you also need to have NS records (as well as probably MX records) for example.com itself. If you CNAME example.com, you are in effect saying "look over here for the name server for example.com", and now you've created a circular dependency: how do people look up the name server (the NS record) that they need in order to look up the CNAME?
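Here is a rough Python sketch of both halves of that argument: following a CNAME to the CDN's name, and spotting a name that has a CNAME alongside other records. The zone contents, the address, and the `example.com.cdn.example` target are invented for illustration, and real resolvers are considerably more involved.

```python
# Toy zone data: name -> {record type -> list of values}. Hypothetical values.
ZONE = {
    "example.com": {"NS": ["ns1.example.com"], "MX": ["mail.mailserver.example"]},
    "www.example.com": {"CNAME": ["www.example.com.cdn.example"]},
    "www.example.com.cdn.example": {"A": ["192.0.2.80"]},
}


def resolve(name, rtype, zone, max_hops=8):
    """Follow CNAMEs until a record of the requested type is found."""
    for _ in range(max_hops):
        records = zone.get(name, {})
        if rtype in records:
            return records[rtype]
        if "CNAME" not in records:
            return []
        name = records["CNAME"][0]  # restart the lookup at the canonical name
    raise RuntimeError("CNAME chain too long")


def cname_conflicts(zone):
    """Names that have a CNAME alongside other records, which is not allowed."""
    return sorted(
        name for name, records in zone.items()
        if "CNAME" in records and len(records) > 1
    )


# Try to put a CNAME at the apex, next to the NS and MX records it still needs.
apex_zone = dict(ZONE)
apex_zone["example.com"] = dict(ZONE["example.com"], CNAME=["example.com.cdn.example"])
```

With this data, `resolve("www.example.com", "A", ZONE)` ends up at the CDN's address, `cname_conflicts(ZONE)` is empty, and `cname_conflicts(apex_zone)` flags example.com.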

The result of all this is that if you want to host your Web site on a CDN and you want it to work without a www (or some other) prefix, you have two main choices:

  1. Host your own DNS and populate your records with the CDN's IP address (this is what I do).
  2. Have the CDN host your DNS, so that they can then resolve the actual IP address however they please.

Neither of these is ideal. If you host your own DNS, you have more control but it's brittle because the CDN has to maintain a stable IP for your domain. If they decide to move things then your site breaks. It also means they can't do DNS-based load distribution.

It's generally a better idea in this case to have the CDN host your DNS, as then they can control how any given name resolves. If they don't also host your email, you'll need to populate the domain with MX records for your email server, but most anyone who hosts DNS will allow this. Even so, this is only a partial solution because, as far as I can tell, you still can't use a traffic management service to steer between CDNs. As I understand it, if you want to do that, you need to have some prefix (like www.) in front of your domain.

One way to try to split the difference here is to serve a page on example.com but then have most of your content on cdn.example.com, which can be load balanced invisibly. You can also redirect users from example.com to www.example.com, which isn't as invisible but lets you load balance even more because (1) the redirect is a short message and (2) you can tell the browser to remember the redirection, thus saving the trip to example.com in the future.
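As a sketch of the redirect approach, here is roughly what the apex server's response could look like using Python's standard `http.server`. The one-year cache lifetime, the `CANONICAL_HOST` constant, and the handler name are my assumptions for illustration; a real deployment would more likely configure this in the Web server or at the CDN.

```python
from http.server import BaseHTTPRequestHandler

CANONICAL_HOST = "www.example.com"  # assumed target of the redirect
ONE_YEAR = 365 * 24 * 60 * 60  # assumed cache lifetime, in seconds


def redirect_response(path: str) -> tuple[int, dict[str, str]]:
    """Build a short permanent redirect that browsers are allowed to cache."""
    headers = {
        "Location": f"https://{CANONICAL_HOST}{path}",
        "Cache-Control": f"max-age={ONE_YEAR}",
        "Content-Length": "0",
    }
    return 301, headers


class ApexRedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET on the apex host with the redirect above."""

    def do_GET(self):
        status, headers = redirect_response(self.path)
        self.send_response(status)
        for key, value in headers.items():
            self.send_header(key, value)
        self.end_headers()
```

The response has no body, so the apex machine does very little work per request, and a browser that caches the 301 can skip example.com on later visits.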

One more thing: because the HTTPS record is needed for Encrypted Client Hello, we should expect to see browsers support it for that reason, and so there should eventually be a fair amount of HTTPS record support, though it won't be universal. Sites will then be able to use an HTTPS record to steer modern browsers (those that support the HTTPS record) to something that can be load balanced. Of course, older browsers will just go to whatever non-load-balanced site example.com is served off of, but that will be an increasingly small fraction, so you'll still get a fair amount of value.

Final Thoughts

The lesson here is the same as for most features on the Internet: if you want people to deploy something, then it has to be incrementally deployable and provide value with low levels of deployment. If your solution doesn't have this, then people will find some solution that does. And that, kids, is why we have www.example.com.


  1. Of course, having access to your mail server is often enough to get a certificate for your Web server, but we just won't talk about that.

  2. Though it's also reasonably common to use anycast for this purpose, in which case there will just be one IP address and BGP will be used for this kind of traffic management.
