Discovery Mechanisms for Messaging and Calling Interoperability

New phone, who dis?

Posted by ekr on 04 Aug 2022

As I discussed in an earlier post, it looks like the EU [corrected an embarrassing typo that had this as UK -- EKR] Digital Markets Act (DMA) is going to require interoperability between messaging systems. That previous post focused on how to establish end-to-end encryption between messaging systems. In this post I want to talk about the problem of discovering which messaging system someone is on.

Identifier Portability #

Many messaging systems bootstrap off of existing identifiers in the form of phone numbers (jargon: "E.164 number"). Phone numbers are structured, which means that when you place a call over the Public Switched Telephone Network (PSTN) it incrementally routes the call via the country, area code, etc. From the perspective of a messaging system, however, they are opaque and unstructured, which is to say that the identifier +1.415.555.0123 might be for a user who is on iMessage, WhatsApp, or even both. If all I have is someone's phone number, how do I know which service to reach them on?

Phone numbers as a shared namespace #

Phone numbers weren't originally designed to be a single namespace that was shared between carriers, but rather as a single namespace to be used by a single carrier, the Bell System (motto: "One Policy, One System, Universal Service"). Even then, numbers were structured, but the structure represented the topology of the system so that you could incrementally route calls. For instance, you could use the area code to direct traffic to the right region followed by the local office code to direct it to the right switch, and then down to the right subscriber line.

When the Bell System was broken up the breakup was done along geographic lines into what were called Regional Bell Operating Companies (RBOCs). Because the topology of the system was also roughly geographic—unlike, say, the Internet, where number prefixes do not really correspond to geographic regions—you could at least roughly align the RBOC boundaries with the number structure. However, subsequently jurisdictions started to require Local Number Portability, which allowed you to take your number from carrier to carrier. Thus, even if you were originally assigned a number out of Verizon's block, you could "port" it to T-Mobile, with the result that you have a shared namespace.

One possibility would be to simply sidestep this question by having identifiers be scoped, either by having people say "connect with me on WhatsApp at 1.415.555.0123" or by just adding an explicit scoping parameter, so your address is 1.415.555.0123@whatsapp.com (see here for more on this). This is how e-mail works and isn't the worst thing in the world, but does make it more complicated to contact someone else if all you have is their number, as well as making things confusing if they change their preferred app. By contrast, phone numbers are portable across carriers, which is to say that if you move from T-Mobile to Verizon you get to keep your phone number, and I don't need to know what carrier you have in order to call you: I just enter the phone number. This is implemented by having a giant—well, not really that giant, as the entire US number space is less than 10 billion numbers and so basically fits on a USB stick^[1]—database that knows which carrier is responsible for each number. When you want to call someone, your carrier checks this database (technical term: "dip") to see where to route the call.
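To make the "dip" concrete, here is a minimal sketch of what such a lookup might look like. The table contents, the `dip` function, and the prefix-plus-override layout are illustrative assumptions, not a description of how the real portability databases are organized.

```python
from typing import Optional

# Hypothetical data: the carrier originally assigned each number block...
BLOCK_ASSIGNMENTS = {
    "1415555": "Verizon",
    "1650555": "T-Mobile",
}
# ...and per-number overrides for numbers that have since been ported.
PORTED_NUMBERS = {
    "14155550123": "T-Mobile",  # came out of Verizon's block, later ported
}


def normalize(number: str) -> str:
    """Keep only the digits of a number such as '+1.415.555.0123'."""
    return "".join(ch for ch in number if ch in "0123456789")


def dip(number: str) -> Optional[str]:
    """Return the carrier currently responsible for `number`, if known."""
    digits = normalize(number)
    # A porting record for the exact number overrides the block assignment.
    if digits in PORTED_NUMBERS:
        return PORTED_NUMBERS[digits]
    # Otherwise fall back to the longest matching block prefix.
    for length in range(len(digits), 0, -1):
        carrier = BLOCK_ASSIGNMENTS.get(digits[:length])
        if carrier is not None:
            return carrier
    return None
```

The caller never needs to know the carrier in advance: the number is the only input, and the database supplies the routing decision.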

So, what if you want to have this same property for instant messaging or video calling systems? This actually turns out to be surprisingly complicated.

Phone Number-Based Addressing for Single Applications #

Before trying to solve the problem of routing between applications that use phone number-based addresses, it's useful to look at the simpler problem of a single application that uses phone numbers as addresses (e.g., WhatsApp). Instead of using the number portability database, which doesn't really have the information you need here, these apps bootstrap authentication off of SMS.

How does the PSTN authenticate you? #

You might be wondering how the PSTN knows which number is associated with a given device. Back in the days of landline phones, the answer was simple: each subscriber had their own literal line. I.e., there was a separate pair of copper wires that went from the central office to the subscriber's house and the switch knew which pair of wires went with each number.

Obviously this doesn't work with mobile phones. Instead, each phone has its own cryptographic key which it uses to authenticate to the network. When your number is assigned to you, that key is then associated with the number in the carrier's database. In modern phones, that key is generally stored in a Subscriber Identity Module (SIM), which is a small chip embedded in a plastic card:

[From Wikipedia]

The SIM card is actually what gives your phone its identity, and if you swap SIM cards between devices, you will also swap their numbers.

When you register with an app that bootstraps off of SMS, the process typically goes like this:

The app prompts you for a password and your phone number.
The service then sends you an SMS message with a random code.
You enter that code into the app's user interface.

This demonstrates that you can receive messages at the indicated phone number.^[2]

This authentication mechanism relies on the assumption that the PSTN correctly routes messages to the right location and that nobody else can read them. When you think about it, this is actually a bit of an odd assumption to make at the time you are installing a messaging application that offers stronger security than SMS, but that's actually a surprisingly common scenario: certificate issuance on the Web relies on the weak security properties provided by unencrypted DNS to bootstrap up to TLS, after which the DNS no longer needs to be trusted.

The general concept here is that you only trust the weaker system once to form the initial association and from then on you have strong continuity of authentication (in some systems, this is known as Trust On First Use (TOFU)). In both cases, you can build supplementary mechanisms like Certificate Transparency or Key Transparency to detect misissuance.
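As a rough illustration of the trust-once idea, the sketch below pins the first key seen for an identifier and rejects anything different afterwards. `PinStore` and its return strings are made up for this example; real systems store more state and need a story for legitimate key changes.

```python
import hashlib
from typing import Dict


class PinStore:
    """Remembers the first key fingerprint seen for each identifier."""

    def __init__(self) -> None:
        self._pins: Dict[str, str] = {}

    def check(self, identifier: str, public_key: bytes) -> str:
        fingerprint = hashlib.sha256(public_key).hexdigest()
        pinned = self._pins.get(identifier)
        if pinned is None:
            # First contact: accept whatever the weaker channel delivered.
            self._pins[identifier] = fingerprint
            return "first-use"
        # Every later contact must present the same key.
        return "match" if pinned == fingerprint else "mismatch"
```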

One natural question to ask is why the app can't just ask the device, which, after all, knows its own phone number. The problem is that the device can't be trusted. Remember that what we are trying to do is to convince the service that a given device is associated with this number, and even though the service wrote the app in question, it's very difficult for them to determine that an attacker hasn't modified the app to lie about its number. The SMS verification mechanism doesn't have this problem; because it actually checks that you can receive messages, it works even if the device and the code running on it are totally untrusted.

It's easier to see the trust relationships if we look at what's really happening, as shown in the diagram below:

Phone number verification via SMS

In the first phase, the user is interacting with the application, which is what collects the password and the phone number and sends them to the server. The server then sends the code through the phone network to the device. The device shows it to the user, who then gives it to the app. The app then sends it back to the server, which is then able to confirm the code and verify the account. Importantly, even though the server is sending the code to the app (via the user) the SMS channel to the phone is out of band from the app's connection to the server. In fact, they may even be using different technology; for instance, if you are on WiFi, then the connection to the server will use that radio even though the SMS comes in over the mobile telephony network. Even if all the data is going over the mobile channel, the IP communications from the app aren't strongly bound to your phone number.
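The server side of this exchange can be sketched in a few lines. Everything here is an assumption for illustration: the `VerificationServer` class, the six-digit code, and the `send_sms` callback that stands in for the out-of-band path through the phone network. Password handling is left out to keep the focus on the code check.

```python
import hmac
import secrets
from typing import Callable, Dict


class VerificationServer:
    """Server side of SMS-based phone number verification (illustrative)."""

    def __init__(self, send_sms: Callable[[str, str], None]) -> None:
        self._send_sms = send_sms            # out-of-band channel via the PSTN
        self._pending: Dict[str, str] = {}   # phone number -> expected code
        self.verified: Dict[str, str] = {}   # phone number -> account name

    def start(self, number: str) -> None:
        # An unpredictable code that only the holder of the number should see.
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[number] = code
        self._send_sms(number, code)

    def confirm(self, number: str, account: str, code: str) -> bool:
        # Each code is single-use: it is removed whether or not it matches.
        expected = self._pending.pop(number, None)
        if expected is None:
            return False
        if not hmac.compare_digest(expected.encode(), code.encode()):
            return False
        self.verified[number] = account
        return True
```

Note that the server never asks the app what its number is. The only evidence it accepts is that the code came back, which is why this works even if the app itself is untrusted.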

Note that even if you don't trust the answer, if you could ask the device for its number, you could still skip prompting the user. However, the number may not be available. Apple's security and privacy policies forbid this (presumably for privacy reasons) though it appears to be possible on Android. For similar security reasons, the app can't just reach into your SMSes—which are received by the operating system—and grab the confirmation code, as this would allow it to read any SMS.^[3] The exception here is iMessage, which uses similar techniques to verify the phone number, but because it ships as part of the operating system is able to do so silently, even though Apple doesn't permit other apps to do so.

Once the service has associated the user's account with their phone number, the rest of the system is fairly straightforward: the app connects and authenticates as the user and the service just routes messages/calls to the user; no further interaction with the PSTN is required. It is worth noting, however, that this has some funny results if the phone number is ever reassigned because the service won't be notified. The result can be that Alice has an account on some service for a number that has been reassigned to Bob. It's hard to avoid this situation with this kind of loose service coupling, but of course it's not unique to the Internet: I still get paper mail addressed to the people who lived in my house over 20 years ago.

Phone Number-Based Addressing for Multiple Applications #

The basic situation isn't that different when different users use different apps, except that you not only need to determine which device is associated with a given user but also which app they are using. As a simplification, let's assume that everyone just uses a single app (analogous to the situation with mobile phones where each subscriber just has a single carrier); we'll look at the multi-app situation below.

Consider the following three users:

User	App	Number
Alice	A	1.650.555.0011^[4]
Bob	B	1.415.555.0022
Charlie	A	1.510.555.0033

What happens if Alice gets Bob's number and wants to contact him in App A? The obvious thing would be for Alice to just SMS Bob and ask "which app are you using?" She could then tell A to contact "1.415.555.0022 via app B" (assuming that A and B can already talk to each other, as discussed in my earlier post). This will work but it's clumsy and inconvenient; what you want is for Alice to put Bob's number into app A and for A to figure things out. Unfortunately, this doesn't appear to be something that A can do on its own; rather, we need some additional infrastructure.

I'm aware of two major designs here. In the first design, you have a directory service which knows which number is associated with which app. In the second design, each user (or rather their app) has to figure it out for itself.

Directory Services #

The obvious way to approach this is just to use the same approach as for number portability, i.e., to have some sort of global directory service that tells you which app to use for each number.

It's possible you could directly integrate it with the existing PSTN databases, but that's probably going to be a lot of work and it's probably easier to just use the same kind of SMS verification we discussed in the previous section. For instance, suppose you had a single global directory service. When you installed the app you would prove possession of your number to the directory service which would then create a record mapping your number to the app you were using. This directory can then be queried by other people, as shown in the diagram below.

A simple phone number service

[Update: fixed diagram -- 2022-08-04]

In this example, Alice installs app A, which automatically contacts the directory and proves possession of her number. The directory then creates a record mapping her number to app A.^[5] When Bob wants to contact Alice, he puts her number into app B, which contacts the directory and finds out that Alice uses A. B then uses whatever interoperability mechanism it has with A to establish communication.
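Here is a toy version of that directory, just to pin down the two operations involved. The `Directory` class and its method names are hypothetical, and `mark_possession_proven` is a stand-in for the SMS challenge from the previous section rather than a real proof.

```python
from typing import Dict, Optional, Set


class Directory:
    """Toy global directory mapping phone numbers to apps."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}
        self._proven: Set[str] = set()

    def mark_possession_proven(self, number: str) -> None:
        # Stand-in for a successful SMS challenge run by the directory.
        self._proven.add(number)

    def register(self, number: str, app: str) -> bool:
        # Only numbers with a completed possession proof get a record.
        if number not in self._proven:
            return False
        self._records[number] = app
        return True

    def lookup(self, number: str) -> Optional[str]:
        # Anyone can ask which app a number is registered with.
        return self._records.get(number)
```

In the example above, app A would call `register` for Alice's number at install time, and app B would call `lookup` when Bob types that number in.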

This system is obviously massively oversimplified. If we wanted to build something real, we'd need to address some important design questions and fix some—as-yet-unsolved—privacy issues.

Authentication #

The first question we'd need to address is the authentication structure. In the design I sketched above, the directory service is responsible only for knowing which app a given number is associated with, not for authenticating the user; that job is left to each app. For instance, if Alice and Charlie both use app A, then when Bob tries to call Alice, A can redirect the call to Charlie. Of course, A might run some kind of certificate/key transparency type of system to prevent this kind of attack, but that requires every app to do so.

Note that the reverse is also true: when Bob calls Alice, Alice is relying on B's representation that it's really Bob, and B can lie. Moreover, it's important for Alice to check the directory to make sure that Bob's number is actually associated with B. Otherwise, service C could just claim to be speaking for Bob even if he's not a user of app C at all.

An alternate approach would be to have a global authentication system in which the directory issues a credential to each user binding their number to whatever cryptographic credentials their app uses (effectively, this is a certificate authority for phone numbers). In this case, it wouldn't be possible for an app to lie about a user, though of course we now have to trust the directory. The advantage of this design would be that you only have to trust one thing and maybe you could have better auditing and transparency for a global service.

It's also possible to run both kinds of systems simultaneously, where each app uses its own authentication system internally but also is able to make use of a global credential system. This allows for innovation inside an app but also provides interoperability.

Centralization #

Another problem with this design is that it seems to require a centralized directory service, or at best a small number of such services. The basic invariant here is that you need a procedure that takes in a number and outputs the app it's associated with. The easiest way to do that is to have a single service. Perhaps if there were only a small number of apps you could check them individually but if there are tens or hundreds it's a real scalability problem (and may also be a privacy problem, as discussed below).

ENUM #

For the real nerds here, there is actually an RFC documenting a less centralized design rooted in the DNS called ENUM. The idea was that you would store records in the DNS under your phone number (hilariously, reversed, because phone numbers read left to right and DNS addresses read right to left), so you might have 8.4.1.0.6.4.9.7.0.2.4.4.e164.arpa.. This never took off for a host of reasons, and I don't think it's really a viable option here because it requires DNS delegations to match the phone number structure, which seems like a lot of work for everyone involved.
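The name construction itself is simple enough to show directly: strip everything but the digits, reverse them, separate them with dots, and append the suffix. The function name is made up, and the input number in the usage note is inferred by reversing the name quoted above.

```python
import string


def enum_domain(e164_number: str, suffix: str = "e164.arpa") -> str:
    """Build the ENUM-style DNS name for a phone number."""
    digits = [ch for ch in e164_number if ch in string.digits]
    if not digits:
        raise ValueError("number contains no digits")
    # DNS names are most-specific-first, so the digit order is reversed.
    return ".".join(reversed(digits)) + "." + suffix
```

For instance, `enum_domain("+44 20 7946 0148")` yields `8.4.1.0.6.4.9.7.0.2.4.4.e164.arpa`, matching the name above (minus the trailing root dot).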

There are really two objections here: one about deployability and one about network architecture. The deployability objection is that someone has to run the service and that has to be paid for, so who is going to do that? I tend to think that this isn't that big an issue: this really isn't that big a service by modern standards, and we have a reference point for what it costs to run something similar in the form of Let's Encrypt, which has a budget of around 6 million dollars, with the costs scaling sublinearly. The whole premise of the situation is that companies like Apple and Facebook will be required to interoperate, and against that background, this isn't really that much money.

I take the network architecture objection more seriously: yet another centralized service isn't great for the Internet. I think there are some ways to make it somewhat less centralized, for instance by having each app maintain its own mirror of the database, but at the end of the day there's a tradeoff here between the good of interoperability—assuming you think it is good—and the bad of centralization. I tend to think that the balance is in favor of interoperability but it's not a slam dunk, especially if you think that there are other architectures that would do a better job (see below).

Privacy #

Probably the biggest issue with this design is that it has some fairly unfortunate privacy properties. Specifically in the naive version of this design:

The directory service gets to see which app(s) a given phone number is associated with.
It's possible for ordinary users to scrape the directory service and learn which app(s) a given user is associated with.
The directory server gets to see every lookup and so is able to learn who is trying to connect with whom. (This is even worse if the user has to try every possible app.)

It's probably possible to address some of these issues, though it's not immediately obvious that they can be completely fixed. The rest of this section contains some handwaving in the direction of potential solutions. I just came up with these recently, so don't blame me if they are horrifically broken.

The last one is probably the easiest, as there are a number of reasonably efficient private information retrieval (PIR) schemes for allowing a client to retrieve a single value from a server without disclosing to the server which value was retrieved. So, if we just require those values to be retrieved over PIR (or even over a proxy!), we can probably provide some kind of privacy for who is connecting to who.
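To give a feel for how PIR can work at all, here is a sketch of the classic two-server XOR construction, which is only one of many schemes and not necessarily one you would deploy. It assumes two non-colluding servers holding identical copies of the database, fixed-size records, and a client that already knows the position of the record it wants (mapping a phone number to a position is a separate problem). All names are illustrative.

```python
import secrets
from typing import List, Set, Tuple

RECORD_SIZE = 8  # assumption: every record is padded to the same length


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def server_answer(database: List[bytes], selected: Set[int]) -> bytes:
    """Each server XORs together the records at the requested positions."""
    answer = bytes(RECORD_SIZE)
    for index in selected:
        answer = xor_bytes(answer, database[index])
    return answer


def make_queries(db_size: int, wanted: int) -> Tuple[Set[int], Set[int]]:
    """Build one query per server; each on its own is a random subset."""
    if not 0 <= wanted < db_size:
        raise IndexError("wanted position is out of range")
    first = {i for i in range(db_size) if secrets.randbits(1)}
    second = first ^ {wanted}  # differs from `first` only at `wanted`
    return first, second


def private_lookup(database: List[bytes], wanted: int) -> bytes:
    first, second = make_queries(len(database), wanted)
    # In a real deployment these two calls go to different servers.
    return xor_bytes(server_answer(database, first),
                     server_answer(database, second))
```

Every record that appears in both subsets cancels out under XOR, leaving only the wanted one, while each server individually sees a subset that is uniformly random regardless of which record was wanted.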

Similarly, I think it's probably possible to prevent large-scale scraping of user data by clients. This is a pretty typical rate limiting problem and it's already a problem existing apps have to face, so we could probably apply similar techniques here. This doesn't do much to prevent learning about a single individual, though: for instance, suppose I want to know if someone is on WhatsApp. There seems to be an inherent tension here between allowing seamless discovery and connection and providing privacy in this case, so I'm not sure if it's really soluble at the end of the day.

The best idea I have for the directory service getting to see which apps a given number is associated with is to split up the data between two servers. The idea would be that you would have two directory servers operated by unaffiliated entities. The client would then prove its identity to both servers (as above) and this would give it a credential that it could use to authenticate to that server. It would then encrypt its app identity and send the key to one server and the encrypted value to the other. Then when someone wanted to contact you, they would contact both servers and reconstruct the original value, as shown below.^[6]

Split storage for records

[Update: fixed diagram --2022-08-04]

This stops the servers from being able to access the entire database, though you still need to worry about scraping attacks, either against both servers or by one against the other, so it's not perfect.
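A rough sketch of the split: the client encrypts its record under a fresh key, stores the key with one server and the ciphertext with the other, and a lookup fetches both halves. To stay within the standard library this uses a toy stream cipher built from SHAKE-256, which is purely a stand-in; a real design would use a proper authenticated encryption scheme. The two dictionaries play the role of the two servers, and all names are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


def keystream(key: bytes, length: int) -> bytes:
    # Toy construction: SHAKE-256 output seeded by a one-time random key.
    return hashlib.shake_256(key).digest(length)


def split_record(app_identity: bytes) -> Tuple[bytes, bytes]:
    key = secrets.token_bytes(32)  # fresh for every record, never reused
    pad = keystream(key, len(app_identity))
    ciphertext = bytes(a ^ b for a, b in zip(app_identity, pad))
    return key, ciphertext


def reconstruct(key: bytes, ciphertext: bytes) -> bytes:
    pad = keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, pad))


key_server: Dict[str, bytes] = {}    # sees only keys
data_server: Dict[str, bytes] = {}   # sees only ciphertexts


def publish(number: str, app_identity: bytes) -> None:
    key, ciphertext = split_record(app_identity)
    key_server[number] = key
    data_server[number] = ciphertext


def lookup(number: str) -> Optional[bytes]:
    if number not in key_server or number not in data_server:
        return None
    return reconstruct(key_server[number], data_server[number])
```

The key is a fixed 32 bytes no matter how large the record is, which is the size advantage over secret sharing mentioned in the footnote.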

SPIN #

Recently, Jonathan Rosenberg, Cullen Jennings, Alissa Cooper, and Jon Peterson (a group of heavy hitters in real time communications if there ever was one) published an alternative design called SPIN for this problem. The idea is to replace the centralized server by having each client do its own phone number mapping via SMS. I.e., when Alice wants to contact Bob, her device sends an SMS to Bob's device (again, with some unpredictable random value). Bob's device responds with the app(s) that Bob supports and perhaps with his identities on those apps. The reasoning here is the same as with the directory service: only someone who could receive SMS at Bob's number could complete the challenge, so you must be talking to Bob.
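The shape of that exchange can be sketched as follows. This is not SPIN's actual message format: the `Query` and `Response` types, the `Device` class, and the hex nonce are all invented here to show the challenge/response logic only, and the SMS transport is reduced to passing objects between two devices.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Query:
    sender: str
    nonce: str


@dataclass
class Response:
    nonce: str
    apps: List[str]


class Device:
    def __init__(self, number: str, apps: List[str]) -> None:
        self.number = number
        self.apps = apps
        self._outstanding: Dict[str, str] = {}  # target number -> nonce

    def make_query(self, target: str) -> Query:
        # The unpredictable value that only the SMS recipient will learn.
        nonce = secrets.token_hex(16)
        self._outstanding[target] = nonce
        return Query(sender=self.number, nonce=nonce)

    def answer(self, query: Query) -> Response:
        # Echo the nonce back together with the list of supported apps.
        return Response(nonce=query.nonce, apps=list(self.apps))

    def accept(self, target: str, response: Response) -> Optional[List[str]]:
        expected = self._outstanding.pop(target, None)
        if expected is None:
            return None
        if not hmac.compare_digest(expected.encode(), response.nonce.encode()):
            return None
        return response.apps
```

A response is accepted only if it echoes the value sent to that number, so someone who cannot read Bob's SMS cannot answer on his behalf.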

Of course, this leaves us with the problem of Bob knowing who is calling, because Alice just asserts her number. One way to address this would be for Bob to issue a challenge in the opposite direction, but this isn't actually what SPIN does. Instead it assumes that Alice has obtained a credential—presumably using a similar issuance process to the one I indicated above—that she uses to sign her message to Bob, but that's a design choice. If you wanted to entirely eliminate centralized infrastructure you could certainly do that, and that's an obvious selling point of SPIN. Even with this kind of hybrid design, the directory service doesn't need to be available for query and so you don't have the privacy problems I discussed above (it also isn't in the critical path for calls, but availability of this kind of server system seems like a mostly solved problem at this point).

Of course, the SPIN design has a number of drawbacks (in fact, I originally started thinking about this problem because I read the draft and I wanted to try to fix them).

Offline Access #

With SPIN, you can't really do discovery of anyone who isn't online at the same time as you (more precisely, it just stalls until they are online and you can get the return message). This isn't necessarily that big an issue for real-time calls because if someone isn't online then you're not going to be able to call them anyway (though there's voicemail) but it's a big issue for instant messaging, which is inherently asynchronous. Jonathan Rosenberg argues that mobile devices are basically always connected. I'm not sure that this is really true, but if you want to extend to systems which have e-mail style identifiers, then those may be on desktop not mobile devices, so this is a drawback. This isn't an issue for the directory service design: once a user has registered with the directory service then anyone can do a lookup whether that user is offline or not.

One partial mitigation for this might be for the operator of each app to record (cache) phone number validations as they happen, so that they gradually learn some of the mappings and can resolve them immediately. For instance, once Alice (on service A) has discovered that Bob is on service B, Charlie (also on service A) can learn this information from A without a new verification stage. This has the advantage that it's "soft state" in that things work without it, but the disadvantage that some things work and some don't.
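A sketch of what that cache might look like on service A's side. The expiry time is an addition of this example (motivated by numbers being reassigned or users switching apps), and the class and parameter names are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ValidationCache:
    """Soft-state cache of number -> apps learned from past validations."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def record(self, number: str, apps: List[str]) -> None:
        # Called whenever any user of this service completes a validation.
        self._entries[number] = (self._clock(), list(apps))

    def get(self, number: str) -> Optional[List[str]]:
        entry = self._entries.get(number)
        if entry is None:
            return None  # cache miss: fall back to a fresh SMS validation
        stored_at, apps = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[number]  # stale: the mapping may have changed
            return None
        return apps
```

A miss is never an error here; it just means the caller does the slow path, which is what makes this soft state.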

It (mostly) requires changing the operating system #

Because the SPIN design involves every client doing its own phone number verification, people are going to get a lot of SMS messages requiring them to verify, which is annoying. SPIN expects to address this by having the device operating system absorb the messages and respond for you so the user doesn't see them. This isn't necessarily a bad idea, but it's kind of ugly and means that people with older operating systems will have a bad experience.

Again, this isn't an issue with the directory service version because apps can just register themselves. That version does work better if the operating system helps out with SMS verification, but even in the worst case the user is just bothered once for each app they use, not for each person who wants to call them.

Attack Resistance #

As noted above, SMS routing in the PSTN isn't really that secure, and so you have to worry about misissuance. One way to mitigate this is to have the results of verification published in a transparency log. This allows everyone to see which credentials have been assigned to each number and potentially detect misissuance. This works fine in a directory service type system but in a system where each user does their own verification, you might run into a scenario where an attacker hijacked just the connection between Alice and Bob but not between Charlie and Bob. This would need some fancier mechanisms to detect, though we could probably design something.
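To show what "everyone can see which credentials have been assigned" buys you, here is a deliberately simplified log. Real transparency systems use Merkle trees so that clients can check inclusion and consistency efficiently; this sketch substitutes a plain hash chain, and the entry layout, class, and function names are assumptions.

```python
import hashlib
import json
from typing import List, Set, Tuple


class TransparencyLog:
    """Append-only log of (number, key id) bindings, chained by hash."""

    def __init__(self) -> None:
        # Each entry is (number, key_id, chain_hash).
        self.entries: List[Tuple[str, str, str]] = []

    @staticmethod
    def _link(previous: str, number: str, key_id: str) -> str:
        payload = json.dumps([previous, number, key_id]).encode()
        return hashlib.sha256(payload).hexdigest()

    def append(self, number: str, key_id: str) -> str:
        previous = self.entries[-1][2] if self.entries else ""
        chain_hash = self._link(previous, number, key_id)
        self.entries.append((number, key_id, chain_hash))
        return chain_hash

    def verify_chain(self) -> bool:
        # Recompute every link; any rewritten entry breaks the chain.
        previous = ""
        for number, key_id, chain_hash in self.entries:
            if self._link(previous, number, key_id) != chain_hash:
                return False
            previous = chain_hash
        return True


def unexpected_bindings(log: TransparencyLog, number: str,
                        my_key_ids: Set[str]) -> List[str]:
    """Key ids logged for `number` that its owner does not recognize."""
    return [key_id for logged, key_id, _ in log.entries
            if logged == number and key_id not in my_key_ids]
```

Bob's device can periodically run something like `unexpected_bindings` over the log for his own number; a key he never registered is evidence of misissuance.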

Privacy #

As noted above, the privacy situation is largely better without a centralized server, but there's still an issue around probing for individual user information. I.e., Alice wants to know which app(s) Bob has and so sends an SMS and looks at the results. One way to address this is for Bob to have some logic that runs on the device that determines whether to answer the query—perhaps depending on whether Alice's number is in the contact list—though it's not clear how easy that is to configure.^[7]

Multiple Apps Per User #

Multiple apps are a pretty straightforward extension to either of these systems. In both cases, you can basically think of the system as publishing a "record" attached to the phone number. I've implicitly assumed that the record would contain a single app, but there's no technical reason why it can't contain a list of apps (this is slightly more complicated in the directory service version for cryptographic reasons, but not really that hard).

The situation for the initiator is somewhat more complicated: I'm using app A and I want to call someone and learn that they have apps B and C. What now? Presumably each app is going to have a priority list of apps it would prefer to interoperate with (favoring itself!) and will just pick the top one. But this can lead to some obvious problems, such as: will you get the same app in each direction? What happens if someone installs a new app that is more preferred? These aren't strictly discovery problems but are definitely ergonomics issues that apps will need to work out somehow.
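A small sketch makes the asymmetry easy to see. The preference lists below are invented for illustration; the only assumption is the one stated above, that each app ranks itself first.

```python
from typing import Dict, List, Optional


def choose_app(preferences: List[str], peer_apps: List[str]) -> Optional[str]:
    """Pick the most-preferred app (earliest in `preferences`) the peer has."""
    for app in preferences:
        if app in peer_apps:
            return app
    return None  # no app in common that we are willing to interoperate with


# Hypothetical per-app priority lists, each favoring itself.
PREFERENCES: Dict[str, List[str]] = {
    "A": ["A", "B", "C"],
    "B": ["B", "C", "A"],
    "C": ["C", "B", "A"],
}

# I'm on app A and the person I'm calling has apps B and C.
outbound = choose_app(PREFERENCES["A"], ["B", "C"])

# They later call me back from app C; the only app I have is A.
inbound = choose_app(PREFERENCES["C"], ["A"])
```

With these lists my call reaches them in app B, but when they call back they are in app C, so the two directions of the same conversation end up in different apps on their side.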

Final Thoughts #

Obviously this is a difficult problem without a single great solution. I do think it's possible to come up with something reasonably good here, especially if we're willing to make some technical compromises. That's a lot more likely if there really will be a requirement to interoperate; while there are real technical problems, many of the problems are around incentives (e.g., why should I run a server so some people can talk to my users?) and regulation provides those incentives.

This problem would be vastly easier if the addresses people were using had been structured from the very beginning: as an example, e-mail addresses already consist of a user portion and a domain portion, and so it's easy to know where to route any given message. But because instant messaging addresses are largely opaque, you're stuck with clumsier solutions. On the other hand, most e-mail addresses aren't portable—you can't take example@gmail.com over to Hotmail—so if you ever wanted that you'd be back in the soup. To the best of my knowledge there's no real way to have address portability without some kind of routing database, either an explicit one like the DNS or my directory service, or an implicit one like the PSTN fabric that powers SMS verification.

[1] You can now buy 128 GB flash drives, so this gives us 12 bytes per record.
[2] Note that it does not demonstrate that this device is associated with that number. For instance, you could have two devices, one of which is associated with that number and one of which you are installing the app on.
[3] Yes, it's possible to design a system that doesn't require full SMS access, but that's not how these APIs work.
[4] See here for why I am using 555 numbers.
[5] In a real system, we'd probably want to prevent malicious apps on Alice's phone from registering for another app, in what's called an "identity misbinding" attack, but I'm ignoring that here.
[6] Update 2022-08-04: You could also use secret sharing, but encryption has the advantage that if the record you want to store is large then the total size is smaller.
[7] It might be possible to replicate this functionality in the directory service model. Naively, Bob could just upload the algorithm for which numbers to answer for, but this has its own privacy problems because it leaks Bob's contact list to the service. There may be some fancy cryptographic solution that addresses all these privacy problems at once, but I don't have it in my pocket.

Educated Guesswork
