What's with the www prefix in www.example.com?
Posted by ekr on 28 Mar 2022
You might have noticed that it's common for sites to have a domain name like www.example.com and a URL like https://www.example.com. You might wonder what the www is doing here. You're most likely loading this from a Web browser, so surely the browser knows you're on the Web. Why does it need the www prefix? The answer, like many things on the Internet, is that it was the quickest way to get to a result without having to change anything, and now we're at a local minimum which is hard to change.
Protocol Separation
In the early days of the Internet, it seemed like sites would be running a number of user-facing services (email, Web, gopher, NNTP, etc.). It quickly became apparent that even though it was technically possible to multiplex them on different TCP ports, you didn't actually want to run them all on the same machine, for several reasons.
First, you may not want them to be managed by the same person. The bigger your system gets, the more you want division of labor, and, for instance, you might not want your mail administrator to have access to your Web server.[1] Second, you might want to use multiple machines to manage load, initially by separating each service onto its own machine and then potentially later by having multiple Web servers. Load is generally more of an issue for Web than it is for other services, principally because it's possible to get flash crowds that suddenly dramatically increase the load on your Web server. For obvious reasons, you don't want a flash crowd that slows your Web server to a crawl to also bring down your mail server, which you may be using to coordinate fixing your Web server.
Unfortunately, in those early days, the DNS had no way to say that if you had the name example.com, you should connect to machine A for Web and machine B for NNTP. Recall from an earlier post that a domain name is just an index into a distributed database, with the primary value in the database being the IP address associated with the name. This means that Web and NNTP for example.com have to point to the same IP address and hence the same machine. As you have probably guessed by now, the solution is to give each service a different domain name, e.g., www.example.com for Web, nntp.example.com for NNTP, etc. This allows you to configure a separate machine for each service with its own IP address. This also allows them to be in totally different data centers or even operated by different hosting providers.
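To make the constraint concrete, here is a minimal Python sketch that models the DNS as a plain dictionary from names to single addresses. Everything in it (the TOY_DNS table, the lookup_a and server_for helpers, and the addresses from the 192.0.2.0/24 documentation range) is a hypothetical stand-in rather than a real resolver API. The point is only that the lookup key is a name with no notion of "service," so separating services means separating names.

```python
# Hypothetical toy model of the DNS: each name maps to exactly one IPv4
# address. Addresses come from 192.0.2.0/24, which is reserved for examples.
TOY_DNS: dict[str, str] = {
    "example.com": "192.0.2.1",        # bare domain: one address for everything
    "www.example.com": "192.0.2.10",   # Web server (machine A)
    "nntp.example.com": "192.0.2.20",  # news server (machine B)
}


def lookup_a(name: str) -> str:
    """Return the single address stored for `name`, like a bare A-record lookup."""
    try:
        return TOY_DNS[name]
    except KeyError:
        raise LookupError(f"no address for {name}") from None


def server_for(service: str, domain: str = "example.com") -> str:
    """Find a service's server by giving each service its own name.

    The query contains only a name, so the only way to send Web and NNTP
    traffic to different machines is to ask about different names.
    """
    return lookup_a(f"{service}.{domain}")
```

With this table, server_for("www") and server_for("nntp") return different machines, while a client that only knows example.com always gets 192.0.2.1, whatever protocol it intends to speak.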
Interestingly, it was possible to say that you should deliver mail for (say) example.com to mail.mailserver.example via something called an MX record; this allowed someone else to run a mailserver on your behalf. However, there was no generic mechanism to do so for other protocols.
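The MX mechanism amounts to a small selection rule on the sending side. The Python sketch below assumes a hypothetical in-memory record table rather than a real resolver; mail.mailserver.example comes from the text, while backup.mailserver.example is an invented second server added to show the ordering. It sorts by preference (lower is tried first) and, when a domain has no MX records, falls back to the domain itself, as RFC 5321's "implicit MX" rule specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MX:
    preference: int  # lower values are tried first
    exchange: str    # host name of a server willing to accept the mail


# Hypothetical records: mail for example.com is handled by someone else's
# servers, independently of wherever example.com's own address points.
MX_RECORDS: dict[str, list[MX]] = {
    "example.com": [
        MX(20, "backup.mailserver.example"),  # invented backup server
        MX(10, "mail.mailserver.example"),
    ],
}


def mail_hosts(domain: str) -> list[str]:
    """Return the hosts to try, in order, when delivering mail to `domain`."""
    records = MX_RECORDS.get(domain, [])
    if not records:
        # No MX records: treat the domain itself as the mail host.
        return [domain]
    return [r.exchange for r in sorted(records, key=lambda r: r.preference)]
```

Nothing comparable existed for Web or NNTP clients, which is why those protocols needed their own names instead.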
There are now several such mechanisms, starting with the SRV record and now including the HTTPS record. However, the SRV record never got wide deployment (to the best of my knowledge, no browser supports it), and the HTTPS record is new. The problem with deploying any such record is that a significant number of browsers don't support it, so if you want to steer Web traffic and other traffic to different places, you need to keep using www.
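The deployment problem is easier to see when you model both kinds of client. This Python sketch is hypothetical throughout: the record tables, the web.example.net host, and especially the _https._tcp.example.com owner name (which merely follows SRV's _service._proto.name naming convention; no browser actually looks it up) are assumptions for illustration. It also simplifies SRV selection to "lowest priority wins" and ignores the weight field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SRV:
    priority: int  # lower is preferred
    weight: int    # load sharing among equal priorities (ignored in this sketch)
    port: int
    target: str


# Hypothetical data: the owner wants Web traffic on a separate host.
SRV_RECORDS: dict[str, list[SRV]] = {
    "_https._tcp.example.com": [SRV(10, 0, 443, "web.example.net")],
}
# The bare name's address, e.g., the machine that also handles other services.
A_RECORDS: dict[str, str] = {"example.com": "192.0.2.1"}


def web_target(domain: str, supports_srv: bool) -> tuple[str, int]:
    """Return (host, port) that a client would connect to for `domain`."""
    if supports_srv:
        records = SRV_RECORDS.get(f"_https._tcp.{domain}", [])
        if records:
            best = min(records, key=lambda r: r.priority)
            return best.target, best.port
    # A client that doesn't know about SRV just uses the bare name's address.
    return A_RECORDS[domain], 443
```

As long as some clients take the second branch, the bare name's address has to be able to serve the Web site, so moving the Web elsewhere still means giving it a different name.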
CNAME and the Apex Zone
Of course, at this point, there are mostly only two places where users regularly encounter domain names: email (e.g., ekr@example.com) and the Web (https://example.com).
As I mentioned above, it is possible (using MX records) to run email and Web on different machines without the www prefix. So, why does the prefix persist?
In part this is just inertia, but it's also partly a result of another shortcoming of the DNS, which is that it's not possible to have a CNAME at the apex of a zone.
Suppose that I want to have my Web site hosted by cdn.example. The natural way to do this is with a CNAME record, which is basically an indication that the real (canonical) name of a domain is what's in the record. So, for instance, consider the following CNAME record:
www.example.com -> www.example.com.cdn.example
This would tell anyone who wanted to know about www.example.com that they should go look up the records for www.example.com.cdn.example. This works well because it means I don't need to know anything about how the CDN's network is laid out or what IP addresses they have for their machines. I just set up the CNAME and then the CDN can have the name resolve to whatever IP address(es) they want. This allows them, for instance, to provide different answers based on load or where clients are geographically.[2]
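Here is a minimal Python sketch of how a resolver follows such a CNAME, using hypothetical in-memory tables instead of real DNS queries. The CDN-side address and the MAX_CHAIN loop limit are invented for illustration. The site owner only publishes the CNAME entry; the CDN owns the A entry and can change it at will.

```python
# Hypothetical records. The site owner controls only the CNAME entry;
# the CDN controls the address its own name resolves to.
CNAME: dict[str, str] = {"www.example.com": "www.example.com.cdn.example"}
A: dict[str, str] = {"www.example.com.cdn.example": "198.51.100.7"}

MAX_CHAIN = 8  # arbitrary guard against CNAME loops


def resolve(name: str) -> str:
    """Follow CNAMEs until reaching a name that has an address."""
    for _ in range(MAX_CHAIN):
        if name in A:
            return A[name]
        if name in CNAME:
            name = CNAME[name]  # "the canonical name is over there"
            continue
        raise LookupError(f"no records for {name}")
    raise LookupError("CNAME chain too long (possible loop)")
```

If the CDN later changes the address behind www.example.com.cdn.example (or answers differently per client), resolve("www.example.com") picks up the new answer without any change to the site owner's records.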
You can also use a CNAME to point to a service like Cedexis (now Citrix), which will steer traffic to different CDNs depending on network conditions.
Unfortunately, while you can use a CNAME for www.example.com, you can't use it for example.com. The reason is that a CNAME is an all-or-nothing proposition: it means "look over here for every record." Because you also need to have NS records (as well as, probably, MX records) for example.com, if you CNAME example.com you have just said "look over here for the name server for example.com," and now you've created a circular dependency: how do people look up the name server (the NS record) that they need in order to look up the CNAME?
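The all-or-nothing rule can be phrased as a simple zone check: a name that owns a CNAME may not own anything else. The Python sketch below assumes a hypothetical, simplified zone representation as (owner, type, value) tuples and is not any real DNS tool's API. It also doesn't exempt the DNSSEC record types that the rules do allow next to a CNAME. Because a zone's apex always carries SOA and NS records, a CNAME there always conflicts.

```python
from collections import defaultdict


def cname_conflicts(records: list[tuple[str, str, str]]) -> list[str]:
    """Return owner names that have a CNAME alongside any other record type."""
    types_at: dict[str, set[str]] = defaultdict(set)
    for owner, rtype, _value in records:
        types_at[owner].add(rtype)
    return sorted(
        owner
        for owner, types in types_at.items()
        if "CNAME" in types and len(types) > 1
    )


# Hypothetical zone: the apex needs SOA and NS (and here MX), so adding a
# CNAME at the apex conflicts; the CNAME at www is the only record there.
zone = [
    ("example.com", "SOA", "ns1.example.com. hostmaster.example.com. 1 7200 3600 1209600 3600"),
    ("example.com", "NS", "ns1.example.com."),
    ("example.com", "MX", "10 mail.mailserver.example."),
    ("example.com", "CNAME", "example.com.cdn.example."),      # not allowed
    ("www.example.com", "CNAME", "www.example.com.cdn.example."),  # fine
]
```

Running cname_conflicts(zone) flags only example.com, which is exactly the name the prefix lets you avoid pointing at the CDN.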
The result of all this is that if you want to host your Web site on a CDN and you don't want it to have a www (or some other) prefix, you have two main choices:
- Host your own DNS and populate your records with the CDN's IP address (this is what I do).
- Have the CDN host your DNS, so that they can then resolve the actual IP address however they please.
Neither of these is ideal. If you host your own DNS, you have more control but it's brittle because the CDN has to maintain a stable IP for your domain. If they decide to move things then your site breaks. It also means they can't do DNS-based load distribution.
It's generally a better idea in this case to have the CDN host your DNS, as then they can control how any given name resolves. If they don't also host your email, you'll need to populate the domain with MX records for your email server, but almost anyone who hosts DNS will allow this. Of course, this is only a partial solution because, as far as I can tell, you still can't use a traffic management service to steer between CDNs. As I understand it, if you want to do that, you need to have some prefix (like www.) in front of your domain.
One way to try to split the difference here is to serve a page on example.com but then have most of your content on cdn.example.com, which can be load balanced invisibly. You can also redirect users from example.com to www.example.com, which isn't as invisible but lets you load balance even more because (1) the redirect is a short message and (2) you can tell the browser to remember the redirection, thus saving the trip to example.com in the future.
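The redirect itself is tiny. This Python sketch builds the status code and headers that the example.com server might send; the apex_redirect function and the one-day max-age are hypothetical choices, not taken from the post. A 301 is a permanent redirect, which browsers typically remember, and the explicit Cache-Control header spells out how long that memory should last.

```python
def apex_redirect(host: str, path: str) -> tuple[int, dict[str, str]]:
    """Build a permanent redirect from the bare domain to the www name.

    Hypothetical helper: only requests for "example.com" are handled.
    """
    if host != "example.com":
        raise ValueError(f"unexpected host {host}")
    headers = {
        # Preserve the requested path so deep links keep working.
        "Location": f"https://www.example.com{path}",
        # Invented one-day lifetime; lets the browser skip example.com next time.
        "Cache-Control": "max-age=86400",
    }
    return 301, headers
```

Only this small response has to come from whatever machine example.com points at; everything after it is served from www.example.com, which can be a CNAME to a CDN or traffic manager.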
One more thing: because the HTTPS record is needed for Encrypted Client Hello, we should expect to see browsers support it for that reason, and so there should eventually be a fair amount of HTTPS record support, though it won't be universal.
Sites will then be able to use an HTTPS record to steer modern browsers (those that support the HTTPS record) to something that can be load balanced. Of course, older browsers will just go to whatever non-load-balanced site example.com is served off of, but that will be an increasingly small fraction, so you'll still get a fair amount of value.
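Here is a Python sketch of that split behavior, assuming hypothetical record tables and addresses. It models only the HTTPS record's AliasMode (a priority of 0, meaning "use this other name instead"), which is the mode that can sit at a zone apex where a CNAME cannot. ServiceMode records and their parameters are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HTTPSRecord:
    priority: int  # 0 means AliasMode: resolve `target` instead
    target: str


# Hypothetical data: the apex keeps a plain address for older browsers and
# adds an HTTPS record pointing newer browsers at a load-balanced name.
HTTPS_RECORDS: dict[str, HTTPSRecord] = {
    "example.com": HTTPSRecord(0, "example.com.cdn.example"),
}
A_RECORDS: dict[str, str] = {
    "example.com": "192.0.2.1",                 # fixed, not load balanced
    "example.com.cdn.example": "198.51.100.7",  # chosen by the CDN
}


def connect_address(name: str, supports_https_rr: bool) -> str:
    """Return the address a browser would connect to for `name`."""
    if supports_https_rr and name in HTTPS_RECORDS:
        rr = HTTPS_RECORDS[name]
        if rr.priority == 0:  # AliasMode: follow the alias
            name = rr.target
    return A_RECORDS[name]
```

Browsers that understand the record end up on the CDN's name, while older ones still reach 192.0.2.1, so the site keeps working for everyone and gains load balancing for the growing share of newer browsers.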
Final Thoughts
The lesson here is the same as for most features on the Internet: if you want people to deploy something, then it has to be incrementally deployable and provide value with low levels of deployment. If your solution doesn't have this, then people will find some solution that does. And that, kids, is why we have www.example.com.
[1] Of course, having access to your mail server is often enough to get a certificate for your Web server, but we just won't talk about that.
[2] Though it's also reasonably common to use anycast for this purpose, in which case there will just be one IP address and BGP will be used for this kind of traffic management.