End-to-End Encryption and Messaging Interoperability

Posted by ekr on 07 Apr 2022

The news that the EU will require messaging companies to provide interoperability has gotten a lot of attention, both positive (matrix.org) and negative (Alex Stamos, Alec Muffett, Steve Bellovin), as detailed in this Wired article (see also this ISOC white paper). At a high level, I'm more positive on the idea of interoperability for messaging systems than some others are, but it's certainly not a trivial problem and at least some of the EU timelines seem pretty unreasonable. Read on for more.

Critiques #

At a high level, there seem to be three broad critiques of messaging system interoperability:

It will weaken security, for instance by requiring decryption and re-encryption at system boundaries or by creating confusion about user identities.
It will hold back innovation by forcing messages to be sent using only features that are common to all systems.
It will make abuse (especially spam) worse.

It's useful to keep these in mind throughout the rest of the discussion.

Before covering messaging, however, it's helpful to look at an existing system that has had interoperability for a long time, so we can see the resulting dynamics: e-mail.

An Interoperable System: E-mail #

E-mail has the opposite problem from messaging: where messaging consists of a number of independent islands of encrypted messaging with no way to talk between them, email is a globally interoperable system that—despite a number of attempts—doesn't have anything like universal encryption.^[1]

E-mail operates on a hub-and-spoke model in which every user is associated with a given mail domain, represented by a domain name (e.g., example.com) as shown below:

Email architecture

Telephone Addressing #

Telephone numbers actually are hierarchically structured but don't map 1-1 with providers.

The basic structure of a phone number is given by the E.164 standard and consists of a country code followed by a subscriber number, with the structure of the subscriber number being defined by the country code. For instance, in the North American Numbering Plan, identified by country code 1, numbers look like: 415.555.1111.

Description	Digits	Example
Numbering plan area (aka area code)	3	`415`
Central office prefix	3	`555`
Line number, denoting subscriber	4	`1111`

I don't know too much about the non-North American setting, so the remainder of this aside is about North America. Until 1984, North American telephony was basically monopolized by the Bell System. In that system, the number hierarchy was geographic, with the area codes and central office prefixes corresponding to geographic regions and specific switches and the line number corresponding to lines on a given switch. However, with the advent of local number competition following the breakup of the Bell System and then mobile telephony, things started to get more complicated.

Initially, central offices were controlled by a single carrier and so the phone number could be used straightforwardly for routing. However, subsequently the US required carriers to provide Local Number Portability, which allowed you to take your number from carrier to carrier. Thus, even if you were originally assigned a number out of Verizon's block, you could "port" it to T-Mobile, which means that this kind of hierarchical routing no longer works. Instead, there's basically a giant—well, not so giant, given that there are only 10 billion possible numbers—database that indicates which carrier has responsibility for each number.
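To make the routing difference concrete, here is a minimal Python sketch. It parses a North American number into the three fields from the table above and then finds the responsible carrier two ways: by block prefix, as in the pre-portability world, and by first consulting a per-number portability database. The block assignment and the ported-number table are made-up data, not real carrier assignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NanpNumber:
    area_code: str      # numbering plan area (3 digits)
    office_prefix: str  # central office prefix (3 digits)
    line: str           # line number, denoting subscriber (4 digits)


def parse_nanp(number: str) -> NanpNumber:
    """Parse '415.555.1111' or '1.415.555.1111' into its NANP fields."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the country code (1 for NANP)
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit NANP number: {number!r}")
    return NanpNumber(digits[:3], digits[3:6], digits[6:])


# Hypothetical data: the carrier originally assigned each (area code, prefix) block.
BLOCK_ASSIGNMENTS = {("415", "555"): "Verizon"}

# Hypothetical portability database: per-number overrides for ported numbers.
PORTED_NUMBERS = {"4155551111": "T-Mobile"}


def carrier_by_prefix(n: NanpNumber) -> str:
    # Hierarchical routing: the block alone determines the carrier.
    return BLOCK_ASSIGNMENTS[(n.area_code, n.office_prefix)]


def carrier_with_portability(n: NanpNumber) -> str:
    # Post-portability routing: check the per-number database first,
    # falling back to the original block assignment.
    full = n.area_code + n.office_prefix + n.line
    if full in PORTED_NUMBERS:
        return PORTED_NUMBERS[full]
    return carrier_by_prefix(n)
```

For 1.415.555.1111, prefix routing says Verizon while the portability lookup says T-Mobile; a never-ported number in the same block, such as 415.555.2222, gets Verizon either way.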

E-mail addresses are hierarchically assigned, which means that if your mail service is example.com, then your address will end in @example.com, as in [email protected]. It's helpful to work through an example here. For instance, here is what happens when Alice ([email protected]) wants to send a message to Bob ([email protected]):

First, she transmits the message to her mail server over a protocol called the Simple Mail Transfer Protocol (SMTP), along with the addressing information for [email protected].
The sending mail server looks up the receiving domain name—in this case gmail.com—in the DNS to get the server associated with it.^[2] It then connects to that server—again over SMTP—and transfers the message, along with the addressing information [email protected].
Assuming that [email protected] is actually a valid user on the receiving server, that server stores the message somewhere (on disk, in a database, whatever) and waits for Bob to come pick it up.
Finally, Bob connects to his mail server (historically over a protocol called Internet Message Access Protocol (IMAP)) and retrieves any new messages.
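The last three steps can be modeled in a few lines of Python (step 1, Alice's client handing the message to her own server, is left out). This is a toy simulation, not real SMTP or IMAP: a dictionary stands in for the DNS MX records, the host names and the addresses are hypothetical placeholders, and "connecting" to a server is just a method call. What it does show is that routing depends only on the part of the address after the @-sign.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for DNS MX records: domain -> [(preference, host)].
# As with real MX records, lower preference values are tried first.
MX_RECORDS = {
    "example.com": [(10, "mx.example.com")],
    "gmail.com": [(20, "mx2.gmail.test"), (10, "mx1.gmail.test")],
}


@dataclass
class MailServer:
    host: str
    users: set[str]
    mailboxes: dict[str, list[str]] = field(default_factory=dict)

    def receive(self, rcpt: str, message: str) -> None:
        # Step 3: check that the recipient exists, then store the message.
        if rcpt not in self.users:
            raise LookupError(f"no such user: {rcpt}")
        self.mailboxes.setdefault(rcpt, []).append(message)

    def fetch(self, user: str) -> list[str]:
        # Step 4: the recipient's client picks up (and clears) new mail.
        return self.mailboxes.pop(user, [])


def relay(rcpt: str, message: str, servers: dict[str, MailServer]) -> str:
    # Step 2: look up the recipient's domain (the part after the @) and
    # try each MX host in preference order until one is reachable.
    domain = rcpt.rsplit("@", 1)[1].lower()
    for _, host in sorted(MX_RECORDS[domain]):
        server = servers.get(host)
        if server is not None:
            server.receive(rcpt, message)
            return host
    raise ConnectionError(f"no reachable mail server for {domain}")


servers = {"mx1.gmail.test": MailServer("mx1.gmail.test", {"bob@gmail.com"})}
print(relay("bob@gmail.com", "Hello from Alice", servers))  # mx1.gmail.test
print(servers["mx1.gmail.test"].fetch("bob@gmail.com"))     # ['Hello from Alice']
```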

This structure has a number of important properties:

Addresses #

Because addresses are scoped by the mail domain they are associated with, it's possible to immediately know where a given message should be delivered just by looking at the right-hand side (RHS) of the address, namely the stuff after the @-sign. That tells you which domain an address is associated with. This is in contrast to addresses on most popular services (e.g., Twitter), which are unqualified: if all I have is the identifier ekr____, I don't know whether it corresponds to Twitter, GitHub, or LinkedIn.^[3]

Conversely, the fact that names are hierarchical means that two people can have the same left-hand side (LHS) as long as the RHS is different (and vice versa). So, [email protected] and [email protected] are totally distinct addresses and quite likely belong to different people. This is of course true with Twitter handles and the like, but because they are unqualified, the bare address isn't enough to tell you who is who. This becomes a real issue when you want to import identities from another namespace, for example, when your address for messaging is actually your telephone number.

Finally, it means that the semantics of the LHS are opaque to the other end. For example, if you had your own mail domain (say, your-lastname.name), you might have every address that ends in @your-lastname.name delivered into the same mailbox. Similarly, Gmail allows you to create new addresses by adding a plus sign and an arbitrary tag to the LHS of your actual address, so [email protected] and [email protected] go to the same place. This is a useful trick that lets you sort your email by giving a different address to each sender.
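Here is a minimal sketch of what a receiving server might do with the LHS, assuming two hypothetical policies: a per-domain catch-all mailbox, and plus addressing where everything from the first + onward is ignored. The mailbox names are made up and this is not Gmail's actual implementation; the point is that none of this logic is visible to the sender, who just hands over the full address.

```python
from typing import Mapping, Optional


def mailbox_for(address: str, catch_all: Optional[Mapping[str, str]] = None) -> str:
    """Map a recipient address to the mailbox that should receive it.

    Hypothetical receiving-server policy:
    - domains listed in `catch_all` deliver every LHS to a single mailbox;
    - otherwise, a '+tag' suffix on the LHS is dropped (plus addressing).
    """
    lhs, _, rhs = address.rpartition("@")
    rhs = rhs.lower()
    if catch_all and rhs in catch_all:
        return catch_all[rhs]
    base = lhs.split("+", 1)[0]
    return f"{base}@{rhs}"


# Plus addressing: both addresses land in the same mailbox.
print(mailbox_for("alice@gmail.com"))             # alice@gmail.com
print(mailbox_for("alice+newsletter@gmail.com"))  # alice@gmail.com

# Catch-all: any LHS at the personal domain goes to one mailbox.
personal = {"your-lastname.name": "me@your-lastname.name"}
print(mailbox_for("dentist@your-lastname.name", personal))  # me@your-lastname.name
```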

Hosted Domains #

Although mail is scoped by domain, as a practical matter many domains are actually hosted by the same service. For instance, Gmail allows you to host your "custom domain" on Gmail (that is how rtfm.com works), so your address can have your own domain in it rather than gmail.com. It's also possible to have your mail delivered to, and your mailbox hosted at, service A, but to send some of your mail from service B. This is useful if you want to send bulk email using a service like Mailgun.

Interoperability #

Because SMTP and IMAP are standardized, any mail endpoint can talk to any other mail endpoint. If you own example.com and want to send and receive mail there, all you have to do is stand up a server (or, more likely, use an existing hosting service), set up the right DNS records, and you're good to go. Similarly, most mail services provide IMAP access, so you can use any number of clients (the built-in mail client on your Mac, Thunderbird, etc.) to read your mail.

Conversely, nothing says that a mail system has to have a separate client at all. For instance, instead of having people use IMAP to read their email you can just put up a Web front end that accesses it directly and, tada, you have Gmail. Or, as is common, you can both have a Web interface and an IMAP interface. As long as you properly speak SMTP, everything will work fine and the other end doesn't even need to know how you have everything set up; it's just a matter of having the right protocol interfaces. In particular, it doesn't matter to the receiver how the sender talks to their mail server and it doesn't matter to the sender how the receiver talks to their mail server. All that's required is that the servers speak SMTP to each other.

This is in contrast to most messaging systems, which are basically silos that don't interoperate with each other.

Extensibility #

The cost of interoperable protocols is limited format extensibility. Email messages are formatted using a standard called MIME, and if you send a compliant MIME message, the receiver should be able to process it, at least well enough to figure out what type of message it is.

Identifying the type of the message is only the first step. Suppose that you want to introduce a new mail feature, say memoji in emails. Even if you write a new standard for it and Alice adds it to her email client, what happens if Bob hasn't upgraded? Ideally, Bob's client would give him some clear indication that something was missing while still showing the parts it could interpret, but this doesn't always work. Depending on exactly how the new feature is designed, it might just not work properly (for instance, the memoji might be replaced with some unknown character like �; for a long time, the :) emoji in emails from Outlook would render as "J" on non-Outlook systems), or the message might not be readable at all (though hopefully you wouldn't design a feature like that). At the end of the day, this kind of mismatch can create a pretty degraded experience and even change the meaning of the message.

The converse of this property, however, is that email processing is highly extensible. Because mail formats are open and standardized, any client that speaks the protocol will work. I gave the example of Webmail before, but this also means that if you want to use a mail client which offers some new feature (automatic email summarization, say), that's your business. By contrast, most messaging systems are closed, and so you're limited to the features supported by the official client.

Security? #

Like many things on the Internet, the e-mail system was designed before modern encryption and so initially everything was in the clear. This allowed for a broad range of attacks:

Anyone on the connection between you and the mail server or between mail servers could read or modify your messages.
Senders weren't authenticated and so it was trivial to forge messages that appeared to come from someone else.
If your mail server was compromised, then it could read your messages in transit or change them.

Some of these issues have been gradually sort-of addressed with partial solutions such as TLS encrypting the traffic between you and the mail server, TLS encrypting the traffic between the mail servers, and server-based signing mechanisms like DKIM. However, they're incompletely applied (for instance, the client-server connection is generally strongly authenticated but the server-server connection often is not) and still don't provide any protection against a malicious or compromised mail server. For that you need end-to-end encryption (E2EE), in which the messages are encrypted (and authenticated) between the sending and receiving endpoints.

There have been quite a few attempts to provide end-to-end encryption for e-mail (PGP, S/MIME, etc.) but I think it's fair to describe them as having largely failed. This isn't to say that there isn't any encrypted mail but it's a fairly small fraction of overall traffic. The reasons for the failure of encrypted email are complicated, but there were a number of deployment problems that most likely contributed.

Key Management #

Like any cryptographic system, encrypted email depends on knowing the cryptographic keys of the people you are talking to. In e-mail, you use keys in two ways:

You sign your messages in order to authenticate them.
People who want to send you secure messages need to encrypt them to your key.

It's technically possible to just start sending people messages with unauthenticated keys, for instance by signing all of your messages and expecting people to remember that this is your key (this is often called trust on first use (TOFU)). Once they have received a message from you, they can use your key to encrypt the return message. Obviously, TOFU is susceptible to attack if the attacker is the first person to send you a message, pretending to be someone else, which makes the system less than ideal, especially for interactions with people you don't talk to frequently. If my bank sends me a signed message, then I want to know it's my bank right away. It's also a problem if you want to send an encrypted message to someone you have never talked to before. What you really want is some system that lets you find out what people's keys are, which means solving two problems:

You need to somehow associate your key(s) with your email address.
You need some way to look up people's keys so that you can send them encrypted messages.
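Here is a minimal sketch of TOFU key pinning in Python, which also shows why the bank example above is a problem. It assumes that each incoming message's signature has already been checked against the key the message carries, so the only question is whether that key is the one we expect; the keys are placeholder byte strings and the class is hypothetical.

```python
import hashlib


class TofuKeyStore:
    """Trust on first use: remember the first key seen for each sender."""

    def __init__(self) -> None:
        self._pinned: dict[str, str] = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        return hashlib.sha256(public_key).hexdigest()

    def check(self, sender: str, public_key: bytes) -> str:
        fp = self.fingerprint(public_key)
        pinned = self._pinned.get(sender)
        if pinned is None:
            # No prior contact: accept and pin, with no real verification.
            self._pinned[sender] = fp
            return "first-use"
        return "match" if pinned == fp else "mismatch"


store = TofuKeyStore()
# If an attacker gets there first, their key is the one that gets pinned...
print(store.check("alerts@bank.example", b"attacker-key"))   # first-use
# ...and the real bank's key then looks like the suspicious one.
print(store.check("alerts@bank.example", b"real-bank-key"))  # mismatch
```

Note that nothing in the store helps with the second problem: until someone has sent you a message, there is no key to pin.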

Deploying the infrastructure for both of these has proven to be quite challenging. The basic problem is that there was never a good way to automatically issue the credentials. This meant that people had to go to a lot of effort to get credentials, which of course meant that most people didn't get them. On the other side of the equation, there was never really a great way to discover people's credentials, which meant that you couldn't send encrypted email to new people. It's in principle possible to build mechanisms for this (ACME and WebFinger respectively are examples of the kind of thing I'm talking about), but we have the usual deployment network effect problems.

Confusing Semantics #

In addition to the keying problems, the fact that email encryption was added after the fact to an established system has resulted in some confusing semantics.

First, the major extension point in e-mail is the message body, which, as noted above, uses an extensible message format called MIME. However, the subject line is carried in the message headers, outside this extensible body, and so encryption applied to the body doesn't cover it. This means that the subject line that appears in the email is neither encrypted nor authenticated. It's of course possible to have an inner subject line inside the encryption envelope, but it's an obvious challenge for users to understand that they can trust the body but not the subject.

Second, because some messages are protected and some are not, you need some way to indicate to the user which are which. This kind of indicator is a notorious source of confusion, especially in a situation where most messages are unprotected, because you don't want a big scary warning for nearly every message. But this also reduces the incentive for people to use secure e-mail, especially to send signed e-mail: if recipients don't notice or care whether messages are signed, then signing them doesn't add a lot of value, as an attacker can just impersonate you with the recipient being none the wiser.

Network Effects #

All of this should be a familiar story to EG readers: you have a situation where it's inconvenient for people to do something (in this case, deploy encryption) and there's not much benefit to doing it. In these cases, you get the expected result: limited or minimal deployment. By contrast, most modern messaging systems were either built with E2EE from the start or underwent some mass upgrade that enabled it for everyone, rather than relying on people to do it themselves.

Messaging Systems #

Modern messaging systems have addressed these issues by making encryption both mandatory and automatic. This is comparatively easy because the messaging service is (usually) vertically integrated: all—or nearly all—users have clients which are provided by the service operator and can be updated as desired. The service operator also provides message routing and identity. This kind of uniform integrated system has a number of operational advantages:

The service can automatically issue credentials based on the user's account information, thus ensuring that every user has a credential. It can also run a directory which makes it easy for any client to learn the credentials of every other client.
When the service wants to add a new feature it can automatically upgrade everyone's client to support it. This means that they don't need to deal with massive heterogeneity of client functionality for very long, and can eventually just refuse to support older clients.^[4]
Spam and other kinds of abuse are easier to handle because every message is authenticated as coming from a user in the system. A single central point through which all messages pass would also make content filtering straightforward, though of course end-to-end encryption makes that kind of filtering more difficult.
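The first of these advantages is easy to see in code. Below is a hypothetical, heavily simplified directory of the kind a vertically integrated service might run: the operator binds each device's public key to an already-verified account, and any client can ask for all the keys for an account. Real services layer much more on top of this; the class and field names are made up for illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCredential:
    account: str      # e.g. the phone number the service verified
    device_id: int
    public_key: bytes


class ServiceDirectory:
    """Hypothetical operator-run directory: account -> device credentials."""

    def __init__(self) -> None:
        self._devices: dict[str, list[DeviceCredential]] = {}

    def enroll(self, account: str, public_key: bytes) -> DeviceCredential:
        # Assumes the service has already verified that the caller controls
        # `account` (say, via an SMS code) before issuing a credential.
        devices = self._devices.setdefault(account, [])
        cred = DeviceCredential(account, len(devices) + 1, public_key)
        devices.append(cred)
        return cred

    def lookup(self, account: str) -> list[DeviceCredential]:
        # Any client can fetch every device key for an account, which is
        # what lets a sender encrypt to someone it has never talked to.
        return list(self._devices.get(account, []))


directory = ServiceDirectory()
directory.enroll("+14155551111", secrets.token_bytes(32))  # phone
directory.enroll("+14155551111", secrets.token_bytes(32))  # laptop
print(len(directory.lookup("+14155551111")))  # 2
```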

Of course, many of these advantages depend on having a closed system: if a significant fraction of people use third-party clients to talk to such a system, then you can no longer update the clients whenever you want to, which makes central extensibility much more difficult. In other words, you're trading off control and extensibility for users in favor of control and extensibility for the system operator. This is in stark contrast to the design of the Web, which is dominated by the principle of end-user control as documented in the HTML Priority of Constituencies and the Mozilla Web Vision.

Another consequence of a closed system is a lack of universal connectivity: with e-mail—or telephony—you can contact anyone no matter which service provider they are on. In fact, you don't even have to think about it: you just e-mail (or dial). Messaging, however, is different: if I want to send a message to someone on WhatsApp, I need to have a WhatsApp account myself. And because people choose different messaging systems, this means that it's now common to have accounts on a variety of messaging systems (I myself use three regular messaging systems, plus countless Slacks).

All of this creates a set of market dynamics dominated by network effects (Metcalfe's Law), which reward getting big: if you have a lot of users, then people have a strong incentive to join so they can talk to their friends. Conversely, if you are a new entrant into the market, it is hard to break in because your early users don't have that many people to talk to. This is probably why we see a lot of regional variation in which apps are popular: people want to use whatever app their friends use. Unsurprisingly, this produces some fairly lopsided market numbers, with Meta controlling two of the top three messaging platforms (WhatsApp and Facebook Messenger):

Messaging platforms

This brings us to the topic of interoperability: if it were possible for anyone to start a new messenger app that could still talk to WhatsApp and Messenger users, then this would remove a big barrier to entry into the market. I don't want to sound too optimistic here: even in a nominally open system like e-mail, we still see a huge amount of market concentration on the big mail systems like Gmail, Outlook, and Yahoo. This isn't too surprising: it's a lot of work to run a good mail system and so we'd expect well-funded players to dominate. However, it's also quite possible to use one of the smaller services like Fastmail, ProtonMail, or DreamHost or even run your own server, whereas there's really no way to run your own WhatsApp server.

Technical Interoperability for Messaging #

The details of what the EU's Digital Markets Act (DMA) will actually require are extraordinarily sketchy; as I understand it, they would need to be filled out by some regulatory agency. However, broadly speaking, there seem to be two options for providing interoperability, as laid out by ISOC:

Require services to offer stable APIs.
Require services to actually interoperate over a standardized protocol.

These require a bit of unpacking.

Stable APIs #

The idea behind a stable API is that the service would design and publish interfaces that others could use. There are actually two ways to offer stable APIs:

To clients, allowing someone else's messenger client to work with your service.
To services, allowing someone else's messenger service to gateway messages in and out of your service.

The first of these is actually a familiar concept in instant messaging: because there was never a single standardized protocol, it was fairly common to have messaging clients, such as Trillian, which would speak multiple protocols but provide a unified interface to the user that hid the details. This isn't really a conceptual change in the architecture of the system, as there would still be a monolithic identifier space and the clients would still have to conform to whatever rules the service laid out. Indeed, some services have open-source clients, so this is already possible for them, though of course third-party clients might not get upgraded when the official clients do, potentially resulting in stability problems. The main result would be somewhat decreased flexibility for the service, because it would need to get users of the API to update whenever it wanted to change something that affected interoperability. However, as a practical matter, this probably wouldn't have much of an impact on interoperability and market concentration, because most people will just use the official client, and people who don't will be annoyed when the service changes something and breaks them.

The second version is less familiar, but the idea is presumably that WhatsApp would have some published API that would allow ekrMessage (TM pending!) to gateway messages into and out of WhatsApp. As with e-mail, each side would handle messages according to its own rules, with the gateway just transiting messages between the systems. This comes with two main problems:

How do you handle identities? For instance, if ekrMessage and WhatsApp both use phone numbers for identities, how do you know which messages stay on WhatsApp and which go to ekrMessage?
How do you manage different encryption protocols? Currently, each messenger has its own encryption protocol; while many of these are built along similar lines, they're not necessarily identical. Making this work requires either gatewaying at the provider, thus breaking end-to-end encryption (which is extremely undesirable from a security perspective), or having each client speak multiple encryption protocols, as in the multi-protocol client case.
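The identity question is easy to state concretely. In this hypothetical sketch, a gateway knows which services each phone number is registered on (the numbers and registrations are made up). As soon as the same number is registered on both services, the identifier alone no longer says where a message should go.

```python
# Hypothetical registrations: the phone numbers with accounts on each service.
REGISTRATIONS: dict[str, set[str]] = {
    "whatsapp": {"+14155551111", "+14155552222"},
    "ekrmessage": {"+14155551111", "+14155553333"},
}


def candidate_services(number: str) -> list[str]:
    """Every service on which this phone number has an account."""
    return sorted(svc for svc, users in REGISTRATIONS.items() if number in users)


def route(number: str) -> str:
    candidates = candidate_services(number)
    if not candidates:
        raise LookupError(f"{number} is not registered anywhere")
    if len(candidates) > 1:
        # A shared namespace gives no principled answer here; something
        # (the sender, the recipient's settings, the UI) has to choose.
        raise ValueError(f"{number} is on {candidates}: ambiguous destination")
    return candidates[0]


print(route("+14155552222"))               # whatsapp
print(candidate_services("+14155551111"))  # ['ekrmessage', 'whatsapp']
```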

Of course, this would all be a lot easier if there was some standardized protocol that everyone spoke, as with e-mail. Note: the difference between a stable API and a standardized protocol isn't really technical so much as social and depends on whether there is some standard or just a document published by the service.^[5]

Standardized Protocol #

Having a standardized protocol is not an all-or-nothing proposition: there are actually a number of levels at which one might have standardization, with the other levels potentially not being standardized:

Key establishment and message encryption
User identity
Message transport
Message contents and features

I go into these in some more detail below.

Key Establishment and Message Encryption #

The basic structure of most messaging encryption systems is that you have an identity (e.g., your phone number) which is tied to a cryptographic key or keys. When Alice and Bob want to exchange messages, there is some protocol that lets them use their keys to establish a pairwise (or groupwise in the case of more than two people) cryptographic key which they then use to encrypt messages.^[6] Obviously, if Alice and Bob don't speak the same protocol, then they will not be able to establish pairwise keys and will not be able to encrypt messages end-to-end, so this is probably the most important place for everyone to use a common protocol.
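As a rough illustration of why a shared protocol matters, here is textbook finite-field Diffie-Hellman in Python. This is a toy, not what any messenger ships: the Mersenne-prime modulus and generator are chosen only for readability and are not a secure group (real protocols use vetted elliptic-curve groups such as X25519), and there is no authentication. It shows that Alice and Bob derive the same key only when they agree on the same parameters.

```python
import hashlib
import secrets

# TOY parameters for illustration only; NOT a secure group.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair(p: int = P, g: int = G) -> tuple[int, int]:
    private = secrets.randbelow(p - 2) + 1  # in [1, p - 2]
    return private, pow(g, private, p)


def shared_key(my_private: int, their_public: int, p: int = P) -> bytes:
    # Both sides compute g^(ab) mod p, then hash it down to a symmetric key.
    secret = pow(their_public, my_private, p)
    return hashlib.sha256(secret.to_bytes((p.bit_length() + 7) // 8, "big")).digest()


alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()
print(shared_key(alice_priv, bob_pub) == shared_key(bob_priv, alice_pub))  # True

# If Bob's client speaks a different "protocol" (here, just a different
# modulus), the two sides end up with different keys.
P_OTHER = 2**89 - 1
bob2_priv, bob2_pub = keypair(p=P_OTHER)
print(shared_key(alice_priv, bob2_pub) == shared_key(bob2_priv, alice_pub, p=P_OTHER))
# False (with overwhelming probability)
```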

Fortunately, while there are technical differences between the various protocols in use, they're similar enough that it would probably not be prohibitive for everyone to converge on a common protocol: a number of the existing messaging systems are based on the Signal protocol or one of its variants, such as Proteus or Megolm, and the IETF is currently in the final stages of standardizing a protocol called Messaging Layer Security (MLS), which contains a number of similar concepts but is intended to be better optimized for group communication. It's too soon to know how much adoption MLS will get, but the WG has had participation from a number of messaging services such as Facebook Messenger, Matrix, Wickr, and Wire (full disclosure: I have also been heavily involved in this effort). It would be a big lift for companies to change out their protocols, but, because right now they're noninteroperable silos, it's still technically feasible.

Identity #

As I said above, we need to have some notion of user identity. Identity is used for two purposes:

By the end-user clients (in an end-to-end system) to establish the keys to use to encrypt a message.
By the service to know how to route messages.

Both of these require identifying other people you want to exchange messages with.

iMessage #

iMessage is an interesting case because the Apple client is actually two clients in one, containing both an SMS client for talking to non-Apple users (the green bubble) and an iMessage client for talking to Apple users (the blue bubble). iMessages are sent over the Internet ("over the top") and are end-to-end encrypted. SMS messages are sent over the phone network and are not. However, both categories of users have the same type of address, namely phone numbers (iMessage also supports email addresses), and Apple automatically detects the capabilities of the message recipient and sends a message of the appropriate type.

iMessage might be one of the strongest cases for the benefits of interoperability, because it already interoperates with Android devices, just in the clear over SMS. If iMessage were forced to interoperate and Android played along, then a large fraction of that traffic would suddenly be encrypted.

At a high level, there are two main identity architectures we can have:

Hierarchical naming in which a given identity indicates which service it is attached to, as in e-mail.
A shared namespace in which a given identity could be attached to any service (like phone numbers).

With messaging, the situation is even more complicated because multiple messaging services use the same kind of identifier (e.g., WhatsApp and iMessage both use phone numbers), so even in an interoperable system we'd need to find some way to manage that case. This seems like a real open question, though of course we already have that problem now when you tell someone "I'm 1.415.555.1111 on WhatsApp", so in the worst case we could just punt the problem to the user. We also have the potential problem that alice on system A may be a different person from alice on system B; this shouldn't happen with phone numbers, because they are uniquely assigned, but it happens all the time with user-chosen handles.

The hierarchical design is obviously easier to manage, but it may be quite hard to retrofit to the existing non-hierarchical system.^[7] One possible approach is to have a hierarchical system under the hood but have UIs present unqualified namespaces, e.g., "Connect with 1.415.555.1111 on WhatsApp" in the UI turns into "Connect with [email protected]" at the protocol layer. This is likely to work OK if there are a small number of messaging systems but less well if there are hundreds, because the UI gets too cluttered. It's also possible to have a kind of hybrid UI, like the one existing e-mail clients use when you add an account, where you have a chooser for the common systems and then people can enter something freeform:

Email account chooser
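Here is a small sketch of the kind of translation layer described above, mapping between what the UI shows and a qualified protocol-layer identifier. Both the service-to-domain table and the identifier format (an E.164-style number in the LHS, the service's domain in the RHS) are assumptions made up for illustration, not taken from any existing specification.

```python
# Hypothetical mapping from the service names shown in a UI to the
# domains used as the RHS of protocol-layer identifiers.
SERVICE_DOMAINS = {"WhatsApp": "whatsapp.com", "ekrMessage": "ekrmessage.example"}


def to_e164(display_number: str) -> str:
    # Assumes the displayed number already includes its country code.
    return "+" + "".join(ch for ch in display_number if ch.isdigit())


def qualify(display_number: str, service: str) -> str:
    """UI form -> protocol form, e.g. ('1.415.555.1111', 'WhatsApp')."""
    return f"{to_e164(display_number)}@{SERVICE_DOMAINS[service]}"


def present(identifier: str) -> str:
    """Protocol form -> UI form: the domain becomes a service name again."""
    number, _, domain = identifier.rpartition("@")
    for name, service_domain in SERVICE_DOMAINS.items():
        if service_domain == domain:
            return f"{number} on {name}"
    return identifier  # unknown service: show the raw qualified identifier


print(qualify("1.415.555.1111", "WhatsApp"))  # +14155551111@whatsapp.com
print(present("+14155551111@whatsapp.com"))   # +14155551111 on WhatsApp
```

The fallback in present() corresponds to the freeform entry in the chooser: an identifier whose domain isn't one of the well-known services is simply shown as typed.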

This brings us to the question of how users learn other users' keying material. In a fully distributed/federated world like e-mail, you'd need some sort of analog to the WebPKI, in which there was a set of agreed-upon roots of trust and those roots were somehow able to attest to identities in a uniform manner, no matter which messaging service people used. This is in contrast to the current situation, where each service runs its own disconnected identity service. If there is a totally shared namespace, then this has a lot of the same problems as the WebPKI, in which anyone can attest to any name, but if the names are arranged hierarchically (even if that's not visible to the user), then we could potentially dodge some of those problems, as only WhatsApp would be able to attest to names under @whatsapp.com, etc.^[8]
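The difference between the two trust models fits in a few lines. In the hypothetical sketch below, an attestation binds an identity to a key fingerprint and names the service that issued it; signature verification is assumed to have already happened and is omitted, and the identifier format is made up. Under the shared-namespace rule any trusted issuer can vouch for any name, while under the hierarchical rule an issuer can only vouch for names under its own domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    issuer_domain: str    # service vouching for the binding, e.g. 'whatsapp.com'
    identity: str         # qualified identity, e.g. '+14155551111@whatsapp.com'
    key_fingerprint: str  # fingerprint of the key bound to that identity


def accept_shared_namespace(att: Attestation, trusted: set[str]) -> bool:
    # WebPKI-style: any trusted issuer may attest to any name.
    return att.issuer_domain in trusted


def accept_hierarchical(att: Attestation, trusted: set[str]) -> bool:
    # Issuers may only attest to names under their own domain.
    _, _, domain = att.identity.rpartition("@")
    return att.issuer_domain in trusted and domain.lower() == att.issuer_domain


trusted = {"whatsapp.com", "ekrmessage.example"}
forged = Attestation("ekrmessage.example", "+14155551111@whatsapp.com", "ab12cd")
print(accept_shared_namespace(forged, trusted))  # True
print(accept_hierarchical(forged, trusted))      # False
```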

It's also possible that one could do something less universal: if there are only a modest number of messaging services, and you have to make special arrangements to federate between services, then each service could continue to maintain its own identity system and just publish documentation about how it works, forcing the other systems to implement it. The likely outcome here would be that the big gatekeeper systems would each have something, and if you wanted to talk to them, you would need to both consume and publish that, which is a burden on the smaller systems, but perhaps a bearable one (the tricky part is when Alice has accounts on WhatsApp and iMessage and wants to talk to someone on ekrMessage: which credentials does she use for the ekrMessage user?).

Message Transport #

Once we have established keys and are sending messages, we still need some way to transport them. There have been attempts to design standardized protocols for this, in particular XMPP and SIMPLE (which is not), but neither has seen the kind of adoption that would make it the obvious choice here.^[9]

As with identity, while it would be convenient to offer something standardized, it's probably not a dealbreaker not to have it, as long as services are required to offer interoperable APIs for message sending and delivery. The good news here is that unlike the cryptographic pieces, those APIs can largely be handled by the messaging service, rather than the client, so my ekrMessage client just needs to know that a given message is destined for someone on WhatsApp and it can route it there.

Message Contents and Features #

All of the above is just concerned with getting messages from point A to point B, but what people actually care about is the messages themselves. In order for messaging to work properly, when the messages finally get to the recipient, they need to be readable, which won't work if (say) system A uses ASCII messages and system B encodes them as images. Moreover, if system B wants to add some new feature, it's a problem if system A doesn't have it (critique 2).

As noted above, this is a sort-of solved problem in e-mail, in that you can send MIME-encoded messages that describe their contents. But of course, describing the contents doesn't help if someone sends me a message of type image/avif and I don't know how to parse that. The conventional solution here is to have some common format that everyone is assumed to be able to read (in e-mail this is 7-bit ASCII text). The sender then sends two copies of the content bundled in the same message: (1) the "basic" version that everyone should be able to read and (2) the "enhanced" version that only newer clients can read.
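Python's standard email package can build exactly this kind of message. In the sketch below, set_content() supplies the basic text/plain version and add_alternative() converts the message to multipart/alternative with an HTML version added after it. The render() helper is a hypothetical stand-in for a receiving client: since RFC 2046 orders alternatives from plainest to richest, it scans from the end and takes the last part it understands.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Lunch?"
msg.set_content("Lunch at noon? :)")  # basic version everyone can read
msg.add_alternative("<p>Lunch at noon? &#128578;</p>", subtype="html")  # enhanced version


def render(message: EmailMessage, supported: set[str]) -> str:
    # Alternatives run from plainest to richest, so prefer the last
    # part whose content type this client understands.
    for part in reversed(list(message.iter_parts())):
        if part.get_content_type() in supported:
            return part.get_content()
    raise ValueError("no alternative this client can display")


print(msg.get_content_type())                            # multipart/alternative
print(render(msg, {"text/plain"}).strip())               # Lunch at noon? :)
print(render(msg, {"text/plain", "text/html"}).strip())  # <p>Lunch at noon? &#128578;</p>
```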

This is a workable, if not ideal, solution, but it's probably possible to do quite a bit better. The reason is that unlike e-mail, where you send messages to people based solely on their address, in order to send someone an encrypted message you need their key. When people publish their keys, they can also publish other capabilities, such as the various media types they understand, which gives senders some information about what messages are safe to send (Rohan Mahy has described such a mechanism for MLS). Unfortunately, it's still possible to get into trouble with larger groups with mixed capabilities, where you probably end up having to send a lowest-common-denominator version. This isn't ideal for ordinary features, but is potentially more problematic for security features, as discussed below.
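Here is a minimal sketch of the group case, assuming each member has published a set of supported media types alongside their key. The data layout is hypothetical and is not the format of the MLS mechanism mentioned above. The sender intersects everyone's capabilities and walks its own preference list; a single older client pulls the whole group down to the lowest common denominator.

```python
def common_types(capabilities: dict[str, set[str]]) -> set[str]:
    """Media types that every group member has advertised."""
    sets = list(capabilities.values())
    return set.intersection(*sets) if sets else set()


def choose_format(preferred: list[str], capabilities: dict[str, set[str]]) -> str:
    """Pick the sender's most preferred type that the whole group supports."""
    common = common_types(capabilities)
    for media_type in preferred:
        if media_type in common:
            return media_type
    raise ValueError("no format every member supports")


sender_prefs = ["image/avif", "image/jpeg", "text/plain"]
group = {
    "alice": {"text/plain", "image/jpeg", "image/avif"},
    "bob": {"text/plain", "image/jpeg", "image/avif"},
}
print(choose_format(sender_prefs, group))  # image/avif

group["carol"] = {"text/plain", "image/jpeg"}  # an older client joins
print(choose_format(sender_prefs, group))  # image/jpeg
```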

As should be clear from the discussion above, any form of interoperability places some limits on the freedom of each service to change their offerings whenever they want. Some of these costs—like using a standardized encryption protocol—are relatively modest, but others may be larger. It's certainly a lot more work to detect the capabilities of every client and carefully craft messages which will work for all of them than it is to just generate messages for one client type which you know works.

Security Implications of Interoperability #

As discussed above, if connecting service A and service B requires some kind of bridge that decrypts and re-encrypts messages, then this has a pretty negative impact on security (critique 1). However, it's also possible to have interoperable end-to-end encryption, and I would argue that with sufficient care it's even possible to design an identity infrastructure that doesn't badly weaken the system as a whole. That isn't to say, though, that requiring interoperability has no security implications.

First, even with a common protocol, there may be differences in application semantics. For example, when WhatsApp detects that a recipient has changed their keys and so a message is undecryptable, it automatically re-sends the message. This is a usability feature, but it differs from Signal's behavior: even though Signal uses the same protocol as WhatsApp, it does not automatically re-send, because it is concerned that the new key might be compromised. This is application behavior, and it's of course harder to frame the security guarantees of a system that has more than one kind of client; in this case, the security decision is made by the sender, but in other cases it might not be.
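The point that the decision lives with the sender can be sketched as a policy hook on the sending client. This is a simplified illustration, not WhatsApp's or Signal's actual code; `KeyChangePolicy` and `on_recipient_key_change` are hypothetical names.

```python
from enum import Enum
from typing import List


class KeyChangePolicy(Enum):
    AUTO_RESEND = "auto_resend"      # WhatsApp-like: re-encrypt to the new key and send
    HOLD_FOR_USER = "hold_for_user"  # Signal-like: send nothing until the user decides


def on_recipient_key_change(policy: KeyChangePolicy,
                            undelivered: List[str],
                            new_key_id: str) -> dict:
    """Runs on the sender when the recipient reports it couldn't decrypt messages.

    Only the sender's own policy decides whether these plaintexts ever get
    encrypted to the new (possibly compromised) key; the recipient's client
    has no say in this choice.
    """
    if policy is KeyChangePolicy.AUTO_RESEND:
        return {"action": "resend", "to_key": new_key_id, "messages": list(undelivered)}
    return {"action": "prompt_user", "to_key": new_key_id, "messages": []}
```

For example, `on_recipient_key_change(KeyChangePolicy.HOLD_FOR_USER, ["hi"], "k2")` returns a `prompt_user` action and re-sends nothing.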

One case where the decision rests with the recipient is "disappearing messages", which many messaging systems support: messages that get automatically deleted after a certain time. This is not a cryptographic feature but rather a client-side one: it depends on the receiving client complying with the sender's request to delete the message, so if the remote client doesn't comply, it's not going to work. I'm less sympathetic to this case because this kind of feature is mostly an example of hope-based security: even in a closed system you have no way of knowing what software is running on the receiver's computer; it could have been hacked, or the receiver could be running a non-compliant client built by reverse-engineering the protocol (the virtue of standards is that they allow for interoperability without reverse engineering). Even if that's not the case, nothing stops the receiver from taking a photo of the screen or, depending on the system, a screenshot. This seems like a case where the recipient can advertise its capabilities and you just have to trust them.
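Here is a minimal sketch of what a compliant receiving client might do, assuming a hypothetical `StoredMessage` record that carries the sender's requested lifetime. The key observation is that this function runs only on the receiver, so the sender has no way to confirm it ever runs.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredMessage:
    body: str
    received_at: float             # receiver's clock, in seconds
    expire_after: Optional[float]  # lifetime requested by the sender; None = keep


def purge_expired(store: List[StoredMessage],
                  now: Optional[float] = None) -> List[StoredMessage]:
    """What a *compliant* client does: keep only messages whose timer hasn't run out.

    A modified or non-compliant client can simply never call this.
    """
    if now is None:
        now = time.time()
    return [m for m in store
            if m.expire_after is None or now - m.received_at < m.expire_after]


store = [StoredMessage("see you at 6", received_at=0.0, expire_after=10.0),
         StoredMessage("old note", received_at=0.0, expire_after=None)]
print([m.body for m in purge_expired(store, now=20.0)])  # ['old note']
```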

There might also be new security features that would not end up in whatever new standardized protocol was settled on, such as metadata protection or post-quantum security. This isn't ideal, of course, but standardized protocols do evolve, and it's possible for messaging services to use private protocol extensions for groups that just consist of their users on new clients, so this doesn't seem like a fatal objection.
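One way a service might decide when a private extension is safe to use is sketched below, assuming members advertise supported extensions along with their keys. The `Member` type and the `x-pq-hybrid` extension name are hypothetical; the rule is simply "every member is on our service and supports it, otherwise use the standard protocol."

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Member:
    service: str
    extensions: FrozenSet[str]  # extensions this member's client has advertised


def can_use_private_extension(members: Iterable[Member], service: str, extension: str) -> bool:
    """True only if every member is on `service` and advertises `extension`;
    otherwise the group falls back to the standardized protocol."""
    members = list(members)
    return bool(members) and all(
        m.service == service and extension in m.extensions for m in members
    )


group = [Member("ekrmessage", frozenset({"x-pq-hybrid"})),
         Member("ekrmessage", frozenset({"x-pq-hybrid"}))]
print(can_use_private_extension(group, "ekrmessage", "x-pq-hybrid"))  # True
```

Adding a single WhatsApp user to the group makes the check fail, so that group gets the standardized protocol's security properties.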

Probably the most serious problem is spam and abuse (critique 3). As I mentioned earlier, this is a much easier problem if you have relationships with all the users and don't need to accept messages from arbitrary counterparties. End-to-end encryption also presents a problem here because it means you can't do content filtering centrally. I'm not sure how serious this would actually be in practice: a lot of what makes email spam work is that you have to accept email from non-contacts, which is somewhat less of an issue in messaging systems, but this still seems like a problem that needs more work.
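Because end-to-end encryption rules out central content filtering, much of the remaining leverage is in who a message comes from. A minimal sketch of client-side triage on the (authenticated) sender identity follows; the "message requests" bucket and the identifier format are illustrative assumptions, not any particular service's design.

```python
from enum import Enum
from typing import Set


class Disposition(Enum):
    INBOX = "inbox"
    REQUESTS = "requests"  # held until the recipient accepts the sender
    DROP = "drop"


def classify_incoming(sender: str, contacts: Set[str], blocked: Set[str]) -> Disposition:
    """Triage on the sender's identity, since the content is opaque to the service."""
    if sender in blocked:
        return Disposition.DROP
    if sender in contacts:
        return Disposition.INBOX
    # Unknown senders aren't rejected outright, but don't reach the inbox either.
    return Disposition.REQUESTS


contacts = {"alice@whatsapp"}
blocked = {"spammer@example"}
print(classify_incoming("stranger@ekrmessage", contacts, blocked))  # Disposition.REQUESTS
```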

Critique Recap #

It's probably useful to recap the critiques from the beginning of this post. I don't think they are entirely without merit, but I also believe that interoperability would have real benefits that need to be weighed against these concerns.

Interoperability will weaken security #

It's certainly true that there are ways to implement interoperability which would have a very negative impact on security. However, as I argue above, I think it's also possible to implement interoperability in ways which would minimize those impacts, in particular by maintaining end-to-end encryption across system boundaries. Clearly, the resulting system would be more complex, which is bad for security, but having a common system would provide a single target for analysis and improvement, which is good.

It's also important to look at the non-technical picture here: right now users largely choose their messaging systems based on who they want to talk to and get whatever security properties those systems have. Interoperability would allow people to choose systems based on security properties (for instance, key transparency and reproducible builds) while still talking to people who have made other choices. Of course, those mixed conversations tend to have the security properties of the weaker system, but at least it would be easy to also talk to people who had made stronger choices. In addition, we see many cases today where people fall back to unencrypted channels in order to interoperate (e.g., iMessage falling back to SMS), a situation that end-to-end interoperability would improve.

Interoperability will hold back innovation #

Here too, the situation is complicated. On the one hand, it's clearly true that messaging services would be less free to innovate than if they were totally vertically integrated (although they would still retain substantial freedom). On the other hand, there would be more room for innovation on the clients themselves, something which is currently very difficult. It's worth noting that the Web is one giant mostly interoperable system which is still experiencing plenty of innovation, so I don't think it's a foregone conclusion that interoperable systems can't innovate; you just need mechanisms to manage compatibility and change.

Interoperability will make abuse worse #

It does seem likely that interoperability will make abuse worse: if you have to accept messages from basically anyone, then reputation and similar systems become harder to run, and e-mail abuse (especially spam) is a serious problem. However, we already see abuse even in monolithic systems, so being closed clearly isn't a panacea. Moreover, messaging is fundamentally different from e-mail in a number of important ways (for example, messaging will have authentication from the start, whereas its absence was a huge problem in e-mail, and there is much less expectation that you'll just accept messages from anyone), so it's not clear how much worse interoperability will make things.

Final Thoughts #

As the extremely long writeup above should indicate, this is far from an easy problem. We have a giant installed base of software that doesn't interoperate, and changing that would be difficult even if the big players wanted to. Famously, Facebook has been trying to get Messenger and WhatsApp to interoperate in an end-to-end secure fashion for years, and it seems likely that they're going to be a lot less excited about interoperating with others. However, that's a separate question from whether it's technically possible to do, which, as the analysis above suggests, I think it is. With that said, this is also a much harder problem than the EU guidelines seem to contemplate: for instance, they require that basic 1-1 messaging be available within three months, and group messaging within two years. Given that the MLS standardization process is just about complete after four years, two years seems pretty aggressive, and three months seems fairly implausible.

Note that email frequently has transport encryption where messages are encrypted between users and mail servers and between mail servers, but they are generally in the clear on the mail server. ↩︎
What it looks up is the mail exchanger (MX) record. ↩︎
And Alice doesn't even need to know that much. For instance, if Gmail suddenly decided to support domains rooted in the blockchain, this would just work transparently for Alice, because only Gmail needs to know which server handles example.eth. ↩︎
Of course, users don't always upgrade instantaneously, so it's possible to have some heterogeneity, but it's typically fairly short term, especially because the service provider can force you to update to continue using the service. ↩︎
The difference between "APIs" and "protocols" is largely a matter of terminology: protocols are just the rules for what goes over the network, but things that run over HTTP are often called "APIs". ↩︎
In many protocols, that pairwise key is itself changed ("ratcheted") frequently. ↩︎
As an aside, am I the only person who thinks that the proliferation of these non-hierarchical namespaces is a huge regression? I'd much rather be [email protected] everywhere than ekr on Github and ekr____ on Twitter. ↩︎
There are also questions about key transparency and the like, but they're largely downstream of these bigger architectural questions. ↩︎
Google chat used to offer an XMPP interface but no longer does. ↩︎

Educated Guesswork